Description: You should work in a group of no more than 4 people on a final data science project. The purpose of the project is to provide you with real data science experience, including posing questions, finding data, exploring and visualizing the data, analyzing the data, and summarizing your findings. As a group you should begin with a certain curiosity, for example, in my lecture 'What happened in Ohio?' I looked at the presidential election in OH. Then we processed the data, visualized it, and asked specific questions.
Data Sources: Based on what you have learned you can extract data from pretty much anywhere, but for inspiration you can look at the following links:
- This big list of datasets
- Data.gov
- The internet
Grading criteria
- Code: we will grade the code according to the rubric
- Exploratory data analysis: Did you explore the data before moving on with your analysis? Looking at the data can mean summary statistics, dealing with missingness, visualization, etc.
- Question and summaries: Are there clear research questions that you asked, and did you address these in an orderly fashion? Do you make well justified conclusions?
- Statistics: is your use of statistics and machine learning valid? Did you choose appropriate methods based on your questions, the data, and your assumptions?
- Visualization: do your visualizations follow the principles of graphical excellence? Do your visualizations support your conclusions?
- Data extraction: does your project display novel ways of extracting data (via web scrapping, etc.) and do you use multiple data sources?
- Data munging and storage: do you process the data in an clear, efficient, and organized way? Do you join multiple data sources appropriately? Do you store your processed data in an efficient way, using databases or well thought out data structures?
- Organization: Is your github repo organized? Are your notebooks clear, with a natural flow, and easy to follow presentation? Is your writing clear and edited?
We will grade each of these according to a scale, with the highest grades going to only the best examples of these categories. Then we will drop the lowest 2 of these scores, so that we will promote excellence without necessarily requiring that the you hit all of these bases. We will also add grades to smaller groups, and penalize larger groups. You should roughly have material proportional to the number of people in your group.
Feel free to remove this README or cache it as another file, then add you own README describing your project. It should be clear what notebook the reader should be looking at.