This project seeks to identify the defining features of the American Frontier by detecting “frontier language” and tracking its persistence over time through natural language processing and machine learning. The main difficulties lie in two tasks: first, handling a dataset of considerable size (more than 1,200 lengthy text documents); second, the possibility that no natural language processing algorithm can analyze the data out of the box, in which case either the algorithm or the data would have to be heavily modified before any analysis could proceed.
- Jean P. Vazquez
- Kevin Chen
- Zhiwei Tang
Our data consisted of two historical datasets: Frederick Jackson Turner’s speeches from http://xroads.virginia.edu/~hyper/turner/ and a folder containing the histories of several hundred U.S. counties, organized by state, from www.dropbox.com/county-histories. To compare against these, we used two modern datasets from https://www.presidency.ucsb.edu/documents: presidential nominees’ nomination acceptance speeches and political party platforms. Several steps were required before the data could be used. First, all files were converted to .txt format; the county histories were already in this format, but all other files required conversion. Next, several preprocessing steps streamlined the reading process: all punctuation marks and special characters were filtered out. Lastly, each text was flattened so that no distinction remained between lines or sentences, only separate words.
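The preprocessing steps above can be sketched as follows. This is a minimal sketch, not the project’s actual pipeline; the function names and the exact character filter (keeping only letters and whitespace) are assumptions:

```python
import re
from pathlib import Path


def clean_text(text):
    """Strip punctuation and special characters, lowercase the text,
    and collapse line/sentence breaks so only separate words remain."""
    # Replace anything that is not a letter or whitespace with a space
    cleaned = re.sub(r"[^A-Za-z\s]", " ", text).lower()
    # Splitting on whitespace erases line and sentence boundaries
    return cleaned.split()


def preprocess_file(path):
    """Apply the cleaning step to one .txt document on disk."""
    raw = Path(path).read_text(encoding="utf-8", errors="ignore")
    return clean_text(raw)
```

Running `clean_text` over each converted .txt file yields a flat word list per document, which is the form the later frequency analyses consume.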
The project was divided into two phases. The original first phase consisted of finding as many “frontier words” as possible within the historical datasets: words used much more frequently by “frontier-esque” speakers, which would then serve as the basis for the second phase, in which the change in “frontier-esque” behavior over time would be measured. However, several attempts to find these words with unsupervised machine learning algorithms met with extremely limited success. Consulting with Dr. Lapets about finding words tied to a theme such as “frontier” only confirmed what those results suggested: no machine learning or natural language processing algorithm would yield useful results unless the method was supervised and very heavily modified. Consequently, rather than continuing to search for words, the project shifted immediately to the second phase, using a list of “frontier words” supplied by the project head, Dr. Martin Fitzsbein. The second phase uses these “frontier words” to place candidates on a continuum according to how “frontier-esque” they are determined to be. To determine a candidate’s “frontierness,” three separate methods were applied across both historical datasets: term frequency analysis, tf-idf score analysis, and what was dubbed “near word association.”
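The first two scoring methods can be sketched as below. This is an illustrative sketch only: the frontier word list here is a made-up placeholder (the actual list came from Dr. Fitzsbein), and each document is assumed to already be a flat list of lowercase words.

```python
import math
from collections import Counter

# Placeholder list for illustration; the real list was provided by the project head
FRONTIER_WORDS = {"pioneer", "wilderness", "settler", "homestead"}


def term_frequency_score(words, frontier_words=FRONTIER_WORDS):
    """Fraction of a document's tokens that are frontier words."""
    if not words:
        return 0.0
    counts = Counter(words)
    return sum(counts[w] for w in frontier_words) / len(words)


def tfidf_scores(docs, frontier_words=FRONTIER_WORDS):
    """Per-document sum of tf-idf weights restricted to frontier words.

    A document scores high when it uses frontier words that are
    rare across the rest of the corpus."""
    n = len(docs)
    # Document frequency: in how many documents each frontier word appears
    df = Counter()
    for doc in docs:
        for w in set(doc) & frontier_words:
            df[w] += 1
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1
        score = 0.0
        for w in frontier_words:
            if df[w]:
                tf = counts[w] / total
                idf = math.log(n / df[w])
                score += tf * idf
        scores.append(score)
    return scores
```

Sorting candidates by either score yields the continuum described above; the “near word association” method would additionally inspect the context windows around each frontier word.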