diff --git a/1_Reference_EDA.ipynb b/1_Reference_EDA.ipynb new file mode 100644 index 0000000..1f75ecc --- /dev/null +++ b/1_Reference_EDA.ipynb @@ -0,0 +1,3253 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "view-in-github", + "colab_type": "text" + }, + "source": [ + "\"Open" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "zb2OPLAEoc29" + }, + "source": [ + "# DonorsChoose" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TnzJ7Nkqoc2_" + }, + "source": [ + "

\n", + "DonorsChoose.org receives hundreds of thousands of proposals each year for classroom projects in need of funding. Currently, a large number of volunteers must manually screen each submission before it is approved for posting on the DonorsChoose.org website.\n", + "

\n", + "

\n", + "Next year, DonorsChoose.org expects to receive close to 500,000 project proposals. As a result, there are three main problems they need to solve:\n", + "

\n", + "The goal of the competition is to predict whether or not a DonorsChoose.org project proposal submitted by a teacher will be approved, using the text of project descriptions as well as additional metadata about the project, teacher, and school. DonorsChoose.org can then use this information to identify projects most likely to need further review before approval.\n", + "

" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "0LUPgFS9oc3A" + }, + "source": [ + "## About the DonorsChoose Data Set\n", + "\n", + "The `train.csv` data set provided by DonorsChoose contains the following features:\n", + "\n", + "Feature | Description\n", + "----------|---------------\n", + "**`project_id`** | A unique identifier for the proposed project. **Example:** `p036502` \n", + "**`project_title`** | Title of the project. **Examples:**
\n", + "**`project_grade_category`** | Grade level of students for which the project is targeted. One of the following enumerated values:
\n", + "**`project_subject_categories`** | One or more (comma-separated) subject categories for the project from the following enumerated list of values:

**Examples:**