diff --git a/1_Reference_EDA.ipynb b/1_Reference_EDA.ipynb
new file mode 100644
index 0000000..1f75ecc
--- /dev/null
+++ b/1_Reference_EDA.ipynb
@@ -0,0 +1,3253 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "view-in-github",
+ "colab_type": "text"
+ },
+ "source": [
+ " "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "zb2OPLAEoc29"
+ },
+ "source": [
+ "# DonorsChoose"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "TnzJ7Nkqoc2_"
+ },
+ "source": [
+ "DonorsChoose.org receives hundreds of thousands of project proposals each year for classroom projects in need of funding. Right now, a large number of volunteers is needed to manually screen each submission before it's approved to be posted on the DonorsChoose.org website.\n",
+ "\n",
+ "Next year, DonorsChoose.org expects to receive close to 500,000 project proposals. As a result, there are three main problems they need to solve:\n",
+ "\n",
+ "- How to scale current manual processes and resources to screen 500,000 projects so that they can be posted as quickly and as efficiently as possible\n",
+ "- How to increase the consistency of project vetting across different volunteers to improve the experience for teachers\n",
+ "- How to focus volunteer time on the applications that need the most assistance\n",
+ "\n",
+ "The goal of the competition is to predict whether or not a DonorsChoose.org project proposal submitted by a teacher will be approved, using the text of project descriptions as well as additional metadata about the project, teacher, and school. DonorsChoose.org can then use this information to identify projects most likely to need further review before approval."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "0LUPgFS9oc3A"
+ },
+ "source": [
+ "## About the DonorsChoose Data Set\n",
+ "\n",
+ "The `train.csv` data set provided by DonorsChoose contains the following features:\n",
+ "\n",
+ "Feature | Description\n",
+ "----------|---------------\n",
+ "**`project_id`** | A unique identifier for the proposed project. **Example:** `p036502` \n",
+ "**`project_title`** | Title of the project. **Examples:** `Art Will Make You Happy!`; `First Grade Fun` \n",
+ "**`project_grade_category`** | Grade level of students for which the project is targeted. One of the following enumerated values: `Grades PreK-2`, `Grades 3-5`, `Grades 6-8`, `Grades 9-12` \n",
+ "**`project_subject_categories`** | One or more (comma-separated) subject categories for the project from the following enumerated list of values: `Applied Learning`, `Care & Hunger`, `Health & Sports`, `History & Civics`, `Literacy & Language`, `Math & Science`, `Music & The Arts`, `Special Needs`, `Warmth`. **Examples:** `Music & The Arts`; `Literacy & Language, Math & Science` \n",
+ "**`school_state`** | State where school is located ([Two-letter U.S. postal code](https://en.wikipedia.org/wiki/List_of_U.S._state_abbreviations#Postal_codes)). **Example:** `WY` \n",
+ "**`project_subject_subcategories`** | One or more (comma-separated) subject subcategories for the project. **Examples:** `Literacy`; `Literature & Writing, Social Sciences` \n",
+ "**`project_resource_summary`** | An explanation of the resources needed for the project. **Example:** `My students need hands on literacy materials to manage sensory needs!` \n",
+ "**`project_essay_1`** | First application essay* \n",
+ "**`project_essay_2`** | Second application essay* \n",
+ "**`project_essay_3`** | Third application essay* \n",
+ "**`project_essay_4`** | Fourth application essay* \n",
+ "**`project_submitted_datetime`** | Datetime when project application was submitted. **Example:** `2016-04-28 12:43:56.245` \n",
+ "**`teacher_id`** | A unique identifier for the teacher of the proposed project. **Example:** `bdf8baa8fedef6bfeec7ae4ff1c15c56` \n",
+ "**`teacher_prefix`** | Teacher's title, one of a fixed set of enumerated values. \n",
+ "**`teacher_number_of_previously_posted_projects`** | Number of project applications previously submitted by the same teacher. **Example:** `2`\n",
+ "\n",
+ "* See the section Notes on the Essay Data for more details about these features.\n",
+ "\n",
+ "Additionally, the `resources.csv` data set provides more data about the resources required for each project. Each line in this file represents a resource required by a project:\n",
+ "\n",
+ "Feature | Description\n",
+ "----------|---------------\n",
+ "**`id`** | A `project_id` value from the `train.csv` file. **Example:** `p036502` \n",
+ "**`description`** | Description of the resource. **Example:** `Tenor Saxophone Reeds, Box of 25` \n",
+ "**`quantity`** | Quantity of the resource required. **Example:** `3` \n",
+ "**`price`** | Price of the resource required. **Example:** `9.95` \n",
+ "\n",
+ "**Note:** Many projects require multiple resources. The `id` value corresponds to a `project_id` in `train.csv`, so you can use it as a key to retrieve all resources needed for a project.\n",
+ "\n",
+ "The data set contains the following label (the value you will attempt to predict):\n",
+ "\n",
+ "Label | Description\n",
+ "----------|---------------\n",
+ "`project_is_approved` | A binary flag indicating whether DonorsChoose approved the project. A value of `0` indicates the project was not approved, and a value of `1` indicates the project was approved."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "4nXezp54oc3B"
+ },
+ "source": [
+ "### Notes on the Essay Data\n",
+ "\n",
+ "\n",
+ "Prior to May 17, 2016, the prompts for the essays were as follows:\n",
+ "- __project_essay_1:__ \"Introduce us to your classroom\"\n",
+ "- __project_essay_2:__ \"Tell us more about your students\"\n",
+ "- __project_essay_3:__ \"Describe how your students will use the materials you're requesting\"\n",
+ "- __project_essay_4:__ \"Close by sharing why your project will make a difference\"\n",
+ " \n",
+ "\n",
+ "\n",
+ "\n",
+ "Starting on May 17, 2016, the number of essays was reduced from 4 to 2, and the prompts for the first 2 essays were changed to the following: \n",
+ "- __project_essay_1:__ \"Describe your students: What makes your students special? Specific details about their background, your neighborhood, and your school are all helpful.\"\n",
+ "- __project_essay_2:__ \"About your project: How will these materials make a difference in your students' learning and improve their school lives?\"\n",
+ "\n",
+ "For all projects with `project_submitted_datetime` of 2016-05-17 and later, the values of `project_essay_3` and `project_essay_4` will be NaN.\n",
+ " \n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 17
+ },
+ "id": "lDzTpG88oc3D",
+ "outputId": "113ac5a6-8381-4219-f29a-382434086caa"
+ },
+ "outputs": [
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/html": [
+ " \n",
+ " "
+ ]
+ },
+ "metadata": {}
+ }
+ ],
+ "source": [
+ "%matplotlib inline\n",
+ "import warnings\n",
+ "warnings.filterwarnings(\"ignore\")\n",
+ "\n",
+ "import sqlite3\n",
+ "import pandas as pd\n",
+ "import numpy as np\n",
+ "import nltk\n",
+ "import string\n",
+ "import matplotlib.pyplot as plt\n",
+ "import seaborn as sns\n",
+ "from sklearn.feature_extraction.text import TfidfTransformer\n",
+ "from sklearn.feature_extraction.text import TfidfVectorizer\n",
+ "\n",
+ "from sklearn.feature_extraction.text import CountVectorizer\n",
+ "from sklearn.metrics import confusion_matrix\n",
+ "from sklearn import metrics\n",
+ "from sklearn.metrics import roc_curve, auc\n",
+ "from nltk.stem.porter import PorterStemmer\n",
+ "\n",
+ "import re\n",
+ "# Tutorial about Python regular expressions: https://pymotw.com/2/re/\n",
+ "from nltk.corpus import stopwords\n",
+ "from nltk.stem.wordnet import WordNetLemmatizer\n",
+ "\n",
+ "from gensim.models import Word2Vec\n",
+ "from gensim.models import KeyedVectors\n",
+ "import pickle\n",
+ "\n",
+ "from tqdm import tqdm\n",
+ "import os\n",
+ "\n",
+ "from chart_studio import plotly\n",
+ "import plotly.offline as offline\n",
+ "import plotly.graph_objs as go\n",
+ "offline.init_notebook_mode()\n",
+ "from collections import Counter"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "from google.colab import files\n",
+ "files.upload()\n"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 73
+ },
+ "id": "rjK9wg5AU_Gn",
+ "outputId": "68ad995e-5a3b-4622-f80c-e857f2252351"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": [
+ ""
+ ],
+ "text/html": [
+ "\n",
+ " \n",
+ " \n",
+ " Upload widget is only available when the cell has been executed in the\n",
+ " current browser session. Please rerun this cell to enable.\n",
+ " \n",
+ " "
+ ]
+ },
+ "metadata": {}
+ },
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Saving train_data.csv to train_data.csv\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "9GiSAGzAoc3R"
+ },
+ "source": [
+ "## 1.1 Reading Data"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 346
+ },
+ "id": "pSrg84iHoc3T",
+ "outputId": "48919635-a94d-4519-e31d-65b2c5e56f27"
+ },
+ "outputs": [
+ {
+ "output_type": "error",
+ "ename": "FileNotFoundError",
+ "evalue": "ignored",
+ "traceback": [
+ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
+ "\u001b[0;31mFileNotFoundError\u001b[0m Traceback (most recent call last)",
+ "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mproject_data\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mpd\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mread_csv\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'train_data.csv'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 2\u001b[0m \u001b[0mresource_data\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mpd\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mread_csv\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'resources.csv'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
+ "\u001b[0;32m/usr/local/lib/python3.7/dist-packages/pandas/util/_decorators.py\u001b[0m in \u001b[0;36mwrapper\u001b[0;34m(*args, **kwargs)\u001b[0m\n\u001b[1;32m 309\u001b[0m \u001b[0mstacklevel\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mstacklevel\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 310\u001b[0m )\n\u001b[0;32m--> 311\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mfunc\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 312\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 313\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mwrapper\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
+ "\u001b[0;32m/usr/local/lib/python3.7/dist-packages/pandas/io/parsers/readers.py\u001b[0m in \u001b[0;36mread_csv\u001b[0;34m(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, encoding_errors, dialect, error_bad_lines, warn_bad_lines, on_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options)\u001b[0m\n\u001b[1;32m 584\u001b[0m \u001b[0mkwds\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mupdate\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkwds_defaults\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 585\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 586\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0m_read\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mfilepath_or_buffer\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mkwds\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 587\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 588\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
+ "\u001b[0;32m/usr/local/lib/python3.7/dist-packages/pandas/io/parsers/readers.py\u001b[0m in \u001b[0;36m_read\u001b[0;34m(filepath_or_buffer, kwds)\u001b[0m\n\u001b[1;32m 480\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 481\u001b[0m \u001b[0;31m# Create the parser.\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 482\u001b[0;31m \u001b[0mparser\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mTextFileReader\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mfilepath_or_buffer\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwds\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 483\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 484\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mchunksize\u001b[0m \u001b[0;32mor\u001b[0m \u001b[0miterator\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
+ "\u001b[0;32m/usr/local/lib/python3.7/dist-packages/pandas/io/parsers/readers.py\u001b[0m in \u001b[0;36m__init__\u001b[0;34m(self, f, engine, **kwds)\u001b[0m\n\u001b[1;32m 809\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0moptions\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m\"has_index_names\"\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mkwds\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m\"has_index_names\"\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 810\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 811\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_engine\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_make_engine\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mengine\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 812\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 813\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mclose\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
+ "\u001b[0;32m/usr/local/lib/python3.7/dist-packages/pandas/io/parsers/readers.py\u001b[0m in \u001b[0;36m_make_engine\u001b[0;34m(self, engine)\u001b[0m\n\u001b[1;32m 1038\u001b[0m )\n\u001b[1;32m 1039\u001b[0m \u001b[0;31m# error: Too many arguments for \"ParserBase\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1040\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mmapping\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mengine\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mf\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0moptions\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;31m# type: ignore[call-arg]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1041\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1042\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m_failover_to_python\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
+ "\u001b[0;32m/usr/local/lib/python3.7/dist-packages/pandas/io/parsers/c_parser_wrapper.py\u001b[0m in \u001b[0;36m__init__\u001b[0;34m(self, src, **kwds)\u001b[0m\n\u001b[1;32m 49\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 50\u001b[0m \u001b[0;31m# open handles\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 51\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_open_handles\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0msrc\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mkwds\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 52\u001b[0m \u001b[0;32massert\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mhandles\u001b[0m \u001b[0;32mis\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 53\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
+ "\u001b[0;32m/usr/local/lib/python3.7/dist-packages/pandas/io/parsers/base_parser.py\u001b[0m in \u001b[0;36m_open_handles\u001b[0;34m(self, src, kwds)\u001b[0m\n\u001b[1;32m 227\u001b[0m \u001b[0mmemory_map\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mkwds\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"memory_map\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;32mFalse\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 228\u001b[0m \u001b[0mstorage_options\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mkwds\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"storage_options\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 229\u001b[0;31m \u001b[0merrors\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mkwds\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"encoding_errors\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m\"strict\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 230\u001b[0m )\n\u001b[1;32m 231\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
+ "\u001b[0;32m/usr/local/lib/python3.7/dist-packages/pandas/io/common.py\u001b[0m in \u001b[0;36mget_handle\u001b[0;34m(path_or_buf, mode, encoding, compression, memory_map, is_text, errors, storage_options)\u001b[0m\n\u001b[1;32m 705\u001b[0m \u001b[0mencoding\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mioargs\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mencoding\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 706\u001b[0m \u001b[0merrors\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0merrors\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 707\u001b[0;31m \u001b[0mnewline\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m\"\"\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 708\u001b[0m )\n\u001b[1;32m 709\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
+ "\u001b[0;31mFileNotFoundError\u001b[0m: [Errno 2] No such file or directory: 'train_data.csv'"
+ ]
+ }
+ ],
+ "source": [
+ "project_data = pd.read_csv('train_data.csv')\n",
+ "resource_data = pd.read_csv('resources.csv')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "ga6SAgwKoc3Z",
+ "outputId": "65842885-302d-4b5b-b8c7-55defe1448b2"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Number of data points in train data (109248, 17)\n",
+ "--------------------------------------------------\n",
+ "The attributes of data : ['Unnamed: 0' 'id' 'teacher_id' 'teacher_prefix' 'school_state'\n",
+ " 'project_submitted_datetime' 'project_grade_category'\n",
+ " 'project_subject_categories' 'project_subject_subcategories'\n",
+ " 'project_title' 'project_essay_1' 'project_essay_2' 'project_essay_3'\n",
+ " 'project_essay_4' 'project_resource_summary'\n",
+ " 'teacher_number_of_previously_posted_projects' 'project_is_approved']\n"
+ ]
+ }
+ ],
+ "source": [
+ "print(\"Number of data points in train data\", project_data.shape)\n",
+ "print('-'*50)\n",
+ "print(\"The attributes of data :\", project_data.columns.values)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "JJG7pF0yoc3f",
+ "outputId": "c30e2577-12e3-46b1-eb67-0858be479499"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Number of data points in train data (1541272, 4)\n",
+ "['id' 'description' 'quantity' 'price']\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": [
+ " id description quantity \\\n",
+ "0 p233245 LC652 - Lakeshore Double-Space Mobile Drying Rack 1 \n",
+ "1 p069063 Bouncy Bands for Desks (Blue support pipes) 3 \n",
+ "\n",
+ " price \n",
+ "0 149.00 \n",
+ "1 14.95 "
+ ]
+ },
+ "execution_count": 6,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "print(\"Number of data points in resource data\", resource_data.shape)\n",
+ "print(resource_data.columns.values)\n",
+ "resource_data.head(2)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "5Vn9_hV9oc3m"
+ },
+ "source": [
+ "## 1.2 Data Analysis"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "9thzW3Bxoc3p",
+ "outputId": "c2159997-a047-4281-ba7a-1cc101a37f31"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Number of projects thar are approved for funding 92706 , ( 84.85830404217927 %)\n",
+ "Number of projects thar are not approved for funding 16542 , ( 15.141695957820739 %)\n"
+ ]
+ },
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "# this code is taken from\n",
+ "# https://matplotlib.org/gallery/pie_and_polar_charts/pie_and_donut_labels.html#sphx-glr-gallery-pie-and-polar-charts-pie-and-donut-labels-py\n",
+ "\n",
+ "\n",
+ "y_value_counts = project_data['project_is_approved'].value_counts()\n",
+ "print(\"Number of projects that are approved for funding \", y_value_counts[1], \", (\", (y_value_counts[1]/(y_value_counts[1]+y_value_counts[0]))*100,\"%)\")\n",
+ "print(\"Number of projects that are not approved for funding \", y_value_counts[0], \", (\", (y_value_counts[0]/(y_value_counts[1]+y_value_counts[0]))*100,\"%)\")\n",
+ "\n",
+ "fig, ax = plt.subplots(figsize=(6, 6), subplot_kw=dict(aspect=\"equal\"))\n",
+ "recipe = [\"Accepted\", \"Not Accepted\"]\n",
+ "\n",
+ "data = [y_value_counts[1], y_value_counts[0]]\n",
+ "\n",
+ "wedges, texts = ax.pie(data, wedgeprops=dict(width=0.5), startangle=-40)\n",
+ "\n",
+ "bbox_props = dict(boxstyle=\"square,pad=0.3\", fc=\"w\", ec=\"k\", lw=0.72)\n",
+ "kw = dict(xycoords='data', textcoords='data', arrowprops=dict(arrowstyle=\"-\"),\n",
+ " bbox=bbox_props, zorder=0, va=\"center\")\n",
+ "\n",
+ "for i, p in enumerate(wedges):\n",
+ " ang = (p.theta2 - p.theta1)/2. + p.theta1\n",
+ " y = np.sin(np.deg2rad(ang))\n",
+ " x = np.cos(np.deg2rad(ang))\n",
+ " horizontalalignment = {-1: \"right\", 1: \"left\"}[int(np.sign(x))]\n",
+ " connectionstyle = \"angle,angleA=0,angleB={}\".format(ang)\n",
+ " kw[\"arrowprops\"].update({\"connectionstyle\": connectionstyle})\n",
+ " ax.annotate(recipe[i], xy=(x, y), xytext=(1.35*np.sign(x), 1.4*y),\n",
+ " horizontalalignment=horizontalalignment, **kw)\n",
+ "\n",
+ "ax.set_title(\"Number of projects that are accepted and not accepted\")\n",
+ "\n",
+ "plt.show()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "6-A5ffngoc3w"
+ },
+ "source": [
+ "### 1.2.1 Univariate Analysis: School State"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "978IRWUsoc3y",
+ "outputId": "036fc87d-2bcd-4194-add4-3facaed61ef6"
+ },
+ "outputs": [
+ {
+ "data": {
+ "application/vnd.plotly.v1+json": {
+ "config": {
+ "linkText": "Export to plot.ly",
+ "plotlyServerURL": "https://plot.ly",
+ "showLink": false
+ },
+ "data": [
+ {
+ "autocolorscale": false,
+ "colorbar": {
+ "title": {
+ "text": "% of pro"
+ }
+ },
+ "colorscale": [
+ [
+ 0,
+ "rgb(242,240,247)"
+ ],
+ [
+ 0.2,
+ "rgb(218,218,235)"
+ ],
+ [
+ 0.4,
+ "rgb(188,189,220)"
+ ],
+ [
+ 0.6,
+ "rgb(158,154,200)"
+ ],
+ [
+ 0.8,
+ "rgb(117,107,177)"
+ ],
+ [
+ 1,
+ "rgb(84,39,143)"
+ ]
+ ],
+ "locationmode": "USA-states",
+ "locations": [
+ "AK",
+ "AL",
+ "AR",
+ "AZ",
+ "CA",
+ "CO",
+ "CT",
+ "DC",
+ "DE",
+ "FL",
+ "GA",
+ "HI",
+ "IA",
+ "ID",
+ "IL",
+ "IN",
+ "KS",
+ "KY",
+ "LA",
+ "MA",
+ "MD",
+ "ME",
+ "MI",
+ "MN",
+ "MO",
+ "MS",
+ "MT",
+ "NC",
+ "ND",
+ "NE",
+ "NH",
+ "NJ",
+ "NM",
+ "NV",
+ "NY",
+ "OH",
+ "OK",
+ "OR",
+ "PA",
+ "RI",
+ "SC",
+ "SD",
+ "TN",
+ "TX",
+ "UT",
+ "VA",
+ "VT",
+ "WA",
+ "WI",
+ "WV",
+ "WY"
+ ],
+ "marker": {
+ "line": {
+ "color": "rgb(255,255,255)",
+ "width": 2
+ }
+ },
+ "text": [
+ "AK",
+ "AL",
+ "AR",
+ "AZ",
+ "CA",
+ "CO",
+ "CT",
+ "DC",
+ "DE",
+ "FL",
+ "GA",
+ "HI",
+ "IA",
+ "ID",
+ "IL",
+ "IN",
+ "KS",
+ "KY",
+ "LA",
+ "MA",
+ "MD",
+ "ME",
+ "MI",
+ "MN",
+ "MO",
+ "MS",
+ "MT",
+ "NC",
+ "ND",
+ "NE",
+ "NH",
+ "NJ",
+ "NM",
+ "NV",
+ "NY",
+ "OH",
+ "OK",
+ "OR",
+ "PA",
+ "RI",
+ "SC",
+ "SD",
+ "TN",
+ "TX",
+ "UT",
+ "VA",
+ "VT",
+ "WA",
+ "WI",
+ "WV",
+ "WY"
+ ],
+ "type": "choropleth",
+ "z": [
+ 0.8405797101449275,
+ 0.8547105561861521,
+ 0.8312678741658722,
+ 0.8383791336748952,
+ 0.8581362100337926,
+ 0.8415841584158416,
+ 0.8689116055321707,
+ 0.8023255813953488,
+ 0.8979591836734694,
+ 0.8316895715440582,
+ 0.8400201867272269,
+ 0.8560157790927022,
+ 0.8528528528528528,
+ 0.8354978354978355,
+ 0.8528735632183908,
+ 0.8450381679389313,
+ 0.8391167192429022,
+ 0.8634969325153374,
+ 0.8312447786131997,
+ 0.8601925491837589,
+ 0.8388375165125496,
+ 0.8475247524752475,
+ 0.8453021195824106,
+ 0.8576158940397351,
+ 0.8548136645962733,
+ 0.8450491307634165,
+ 0.8163265306122449,
+ 0.8550383028874484,
+ 0.8881118881118881,
+ 0.8414239482200647,
+ 0.8735632183908046,
+ 0.8439874832364774,
+ 0.8599640933572711,
+ 0.8536942209217264,
+ 0.859661109592785,
+ 0.8751520064856101,
+ 0.8347978910369068,
+ 0.8502415458937198,
+ 0.8549372788678031,
+ 0.8526315789473684,
+ 0.860010162601626,
+ 0.84,
+ 0.8501184834123223,
+ 0.8131422390481341,
+ 0.8365106874638937,
+ 0.8503667481662591,
+ 0.8,
+ 0.87617823479006,
+ 0.8456486042692939,
+ 0.8548707753479126,
+ 0.8367346938775511
+ ]
+ }
+ ],
+ "layout": {
+ "geo": {
+ "lakecolor": "rgb(255, 255, 255)",
+ "projection": {
+ "type": "albers usa"
+ },
+ "scope": "usa",
+ "showlakes": true
+ },
+ "template": {
+ "data": {
+ "bar": [
+ {
+ "error_x": {
+ "color": "#2a3f5f"
+ },
+ "error_y": {
+ "color": "#2a3f5f"
+ },
+ "marker": {
+ "line": {
+ "color": "#E5ECF6",
+ "width": 0.5
+ },
+ "pattern": {
+ "fillmode": "overlay",
+ "size": 10,
+ "solidity": 0.2
+ }
+ },
+ "type": "bar"
+ }
+ ],
+ "barpolar": [
+ {
+ "marker": {
+ "line": {
+ "color": "#E5ECF6",
+ "width": 0.5
+ },
+ "pattern": {
+ "fillmode": "overlay",
+ "size": 10,
+ "solidity": 0.2
+ }
+ },
+ "type": "barpolar"
+ }
+ ],
+ "carpet": [
+ {
+ "aaxis": {
+ "endlinecolor": "#2a3f5f",
+ "gridcolor": "white",
+ "linecolor": "white",
+ "minorgridcolor": "white",
+ "startlinecolor": "#2a3f5f"
+ },
+ "baxis": {
+ "endlinecolor": "#2a3f5f",
+ "gridcolor": "white",
+ "linecolor": "white",
+ "minorgridcolor": "white",
+ "startlinecolor": "#2a3f5f"
+ },
+ "type": "carpet"
+ }
+ ],
+ "choropleth": [
+ {
+ "colorbar": {
+ "outlinewidth": 0,
+ "ticks": ""
+ },
+ "type": "choropleth"
+ }
+ ],
+ "contour": [
+ {
+ "colorbar": {
+ "outlinewidth": 0,
+ "ticks": ""
+ },
+ "colorscale": [
+ [
+ 0,
+ "#0d0887"
+ ],
+ [
+ 0.1111111111111111,
+ "#46039f"
+ ],
+ [
+ 0.2222222222222222,
+ "#7201a8"
+ ],
+ [
+ 0.3333333333333333,
+ "#9c179e"
+ ],
+ [
+ 0.4444444444444444,
+ "#bd3786"
+ ],
+ [
+ 0.5555555555555556,
+ "#d8576b"
+ ],
+ [
+ 0.6666666666666666,
+ "#ed7953"
+ ],
+ [
+ 0.7777777777777778,
+ "#fb9f3a"
+ ],
+ [
+ 0.8888888888888888,
+ "#fdca26"
+ ],
+ [
+ 1,
+ "#f0f921"
+ ]
+ ],
+ "type": "contour"
+ }
+ ],
+ "contourcarpet": [
+ {
+ "colorbar": {
+ "outlinewidth": 0,
+ "ticks": ""
+ },
+ "type": "contourcarpet"
+ }
+ ],
+ "heatmap": [
+ {
+ "colorbar": {
+ "outlinewidth": 0,
+ "ticks": ""
+ },
+ "colorscale": [
+ [
+ 0,
+ "#0d0887"
+ ],
+ [
+ 0.1111111111111111,
+ "#46039f"
+ ],
+ [
+ 0.2222222222222222,
+ "#7201a8"
+ ],
+ [
+ 0.3333333333333333,
+ "#9c179e"
+ ],
+ [
+ 0.4444444444444444,
+ "#bd3786"
+ ],
+ [
+ 0.5555555555555556,
+ "#d8576b"
+ ],
+ [
+ 0.6666666666666666,
+ "#ed7953"
+ ],
+ [
+ 0.7777777777777778,
+ "#fb9f3a"
+ ],
+ [
+ 0.8888888888888888,
+ "#fdca26"
+ ],
+ [
+ 1,
+ "#f0f921"
+ ]
+ ],
+ "type": "heatmap"
+ }
+ ],
+ "heatmapgl": [
+ {
+ "colorbar": {
+ "outlinewidth": 0,
+ "ticks": ""
+ },
+ "colorscale": [
+ [
+ 0,
+ "#0d0887"
+ ],
+ [
+ 0.1111111111111111,
+ "#46039f"
+ ],
+ [
+ 0.2222222222222222,
+ "#7201a8"
+ ],
+ [
+ 0.3333333333333333,
+ "#9c179e"
+ ],
+ [
+ 0.4444444444444444,
+ "#bd3786"
+ ],
+ [
+ 0.5555555555555556,
+ "#d8576b"
+ ],
+ [
+ 0.6666666666666666,
+ "#ed7953"
+ ],
+ [
+ 0.7777777777777778,
+ "#fb9f3a"
+ ],
+ [
+ 0.8888888888888888,
+ "#fdca26"
+ ],
+ [
+ 1,
+ "#f0f921"
+ ]
+ ],
+ "type": "heatmapgl"
+ }
+ ],
+ "histogram": [
+ {
+ "marker": {
+ "pattern": {
+ "fillmode": "overlay",
+ "size": 10,
+ "solidity": 0.2
+ }
+ },
+ "type": "histogram"
+ }
+ ],
+ "histogram2d": [
+ {
+ "colorbar": {
+ "outlinewidth": 0,
+ "ticks": ""
+ },
+ "colorscale": [
+ [
+ 0,
+ "#0d0887"
+ ],
+ [
+ 0.1111111111111111,
+ "#46039f"
+ ],
+ [
+ 0.2222222222222222,
+ "#7201a8"
+ ],
+ [
+ 0.3333333333333333,
+ "#9c179e"
+ ],
+ [
+ 0.4444444444444444,
+ "#bd3786"
+ ],
+ [
+ 0.5555555555555556,
+ "#d8576b"
+ ],
+ [
+ 0.6666666666666666,
+ "#ed7953"
+ ],
+ [
+ 0.7777777777777778,
+ "#fb9f3a"
+ ],
+ [
+ 0.8888888888888888,
+ "#fdca26"
+ ],
+ [
+ 1,
+ "#f0f921"
+ ]
+ ],
+ "type": "histogram2d"
+ }
+ ],
+ "histogram2dcontour": [
+ {
+ "colorbar": {
+ "outlinewidth": 0,
+ "ticks": ""
+ },
+ "colorscale": [
+ [
+ 0,
+ "#0d0887"
+ ],
+ [
+ 0.1111111111111111,
+ "#46039f"
+ ],
+ [
+ 0.2222222222222222,
+ "#7201a8"
+ ],
+ [
+ 0.3333333333333333,
+ "#9c179e"
+ ],
+ [
+ 0.4444444444444444,
+ "#bd3786"
+ ],
+ [
+ 0.5555555555555556,
+ "#d8576b"
+ ],
+ [
+ 0.6666666666666666,
+ "#ed7953"
+ ],
+ [
+ 0.7777777777777778,
+ "#fb9f3a"
+ ],
+ [
+ 0.8888888888888888,
+ "#fdca26"
+ ],
+ [
+ 1,
+ "#f0f921"
+ ]
+ ],
+ "type": "histogram2dcontour"
+ }
+ ],
+ "mesh3d": [
+ {
+ "colorbar": {
+ "outlinewidth": 0,
+ "ticks": ""
+ },
+ "type": "mesh3d"
+ }
+ ],
+ "parcoords": [
+ {
+ "line": {
+ "colorbar": {
+ "outlinewidth": 0,
+ "ticks": ""
+ }
+ },
+ "type": "parcoords"
+ }
+ ],
+ "pie": [
+ {
+ "automargin": true,
+ "type": "pie"
+ }
+ ],
+ "scatter": [
+ {
+ "marker": {
+ "colorbar": {
+ "outlinewidth": 0,
+ "ticks": ""
+ }
+ },
+ "type": "scatter"
+ }
+ ],
+ "scatter3d": [
+ {
+ "line": {
+ "colorbar": {
+ "outlinewidth": 0,
+ "ticks": ""
+ }
+ },
+ "marker": {
+ "colorbar": {
+ "outlinewidth": 0,
+ "ticks": ""
+ }
+ },
+ "type": "scatter3d"
+ }
+ ],
+ "scattercarpet": [
+ {
+ "marker": {
+ "colorbar": {
+ "outlinewidth": 0,
+ "ticks": ""
+ }
+ },
+ "type": "scattercarpet"
+ }
+ ],
+ "scattergeo": [
+ {
+ "marker": {
+ "colorbar": {
+ "outlinewidth": 0,
+ "ticks": ""
+ }
+ },
+ "type": "scattergeo"
+ }
+ ],
+ "scattergl": [
+ {
+ "marker": {
+ "colorbar": {
+ "outlinewidth": 0,
+ "ticks": ""
+ }
+ },
+ "type": "scattergl"
+ }
+ ],
+ "scattermapbox": [
+ {
+ "marker": {
+ "colorbar": {
+ "outlinewidth": 0,
+ "ticks": ""
+ }
+ },
+ "type": "scattermapbox"
+ }
+ ],
+ "scatterpolar": [
+ {
+ "marker": {
+ "colorbar": {
+ "outlinewidth": 0,
+ "ticks": ""
+ }
+ },
+ "type": "scatterpolar"
+ }
+ ],
+ "scatterpolargl": [
+ {
+ "marker": {
+ "colorbar": {
+ "outlinewidth": 0,
+ "ticks": ""
+ }
+ },
+ "type": "scatterpolargl"
+ }
+ ],
+ "scatterternary": [
+ {
+ "marker": {
+ "colorbar": {
+ "outlinewidth": 0,
+ "ticks": ""
+ }
+ },
+ "type": "scatterternary"
+ }
+ ],
+ "surface": [
+ {
+ "colorbar": {
+ "outlinewidth": 0,
+ "ticks": ""
+ },
+ "colorscale": [
+ [
+ 0,
+ "#0d0887"
+ ],
+ [
+ 0.1111111111111111,
+ "#46039f"
+ ],
+ [
+ 0.2222222222222222,
+ "#7201a8"
+ ],
+ [
+ 0.3333333333333333,
+ "#9c179e"
+ ],
+ [
+ 0.4444444444444444,
+ "#bd3786"
+ ],
+ [
+ 0.5555555555555556,
+ "#d8576b"
+ ],
+ [
+ 0.6666666666666666,
+ "#ed7953"
+ ],
+ [
+ 0.7777777777777778,
+ "#fb9f3a"
+ ],
+ [
+ 0.8888888888888888,
+ "#fdca26"
+ ],
+ [
+ 1,
+ "#f0f921"
+ ]
+ ],
+ "type": "surface"
+ }
+ ],
+ "table": [
+ {
+ "cells": {
+ "fill": {
+ "color": "#EBF0F8"
+ },
+ "line": {
+ "color": "white"
+ }
+ },
+ "header": {
+ "fill": {
+ "color": "#C8D4E3"
+ },
+ "line": {
+ "color": "white"
+ }
+ },
+ "type": "table"
+ }
+ ]
+ },
+ "layout": {
+ "annotationdefaults": {
+ "arrowcolor": "#2a3f5f",
+ "arrowhead": 0,
+ "arrowwidth": 1
+ },
+ "autotypenumbers": "strict",
+ "coloraxis": {
+ "colorbar": {
+ "outlinewidth": 0,
+ "ticks": ""
+ }
+ },
+ "colorscale": {
+ "diverging": [
+ [
+ 0,
+ "#8e0152"
+ ],
+ [
+ 0.1,
+ "#c51b7d"
+ ],
+ [
+ 0.2,
+ "#de77ae"
+ ],
+ [
+ 0.3,
+ "#f1b6da"
+ ],
+ [
+ 0.4,
+ "#fde0ef"
+ ],
+ [
+ 0.5,
+ "#f7f7f7"
+ ],
+ [
+ 0.6,
+ "#e6f5d0"
+ ],
+ [
+ 0.7,
+ "#b8e186"
+ ],
+ [
+ 0.8,
+ "#7fbc41"
+ ],
+ [
+ 0.9,
+ "#4d9221"
+ ],
+ [
+ 1,
+ "#276419"
+ ]
+ ],
+ "sequential": [
+ [
+ 0,
+ "#0d0887"
+ ],
+ [
+ 0.1111111111111111,
+ "#46039f"
+ ],
+ [
+ 0.2222222222222222,
+ "#7201a8"
+ ],
+ [
+ 0.3333333333333333,
+ "#9c179e"
+ ],
+ [
+ 0.4444444444444444,
+ "#bd3786"
+ ],
+ [
+ 0.5555555555555556,
+ "#d8576b"
+ ],
+ [
+ 0.6666666666666666,
+ "#ed7953"
+ ],
+ [
+ 0.7777777777777778,
+ "#fb9f3a"
+ ],
+ [
+ 0.8888888888888888,
+ "#fdca26"
+ ],
+ [
+ 1,
+ "#f0f921"
+ ]
+ ],
+ "sequentialminus": [
+ [
+ 0,
+ "#0d0887"
+ ],
+ [
+ 0.1111111111111111,
+ "#46039f"
+ ],
+ [
+ 0.2222222222222222,
+ "#7201a8"
+ ],
+ [
+ 0.3333333333333333,
+ "#9c179e"
+ ],
+ [
+ 0.4444444444444444,
+ "#bd3786"
+ ],
+ [
+ 0.5555555555555556,
+ "#d8576b"
+ ],
+ [
+ 0.6666666666666666,
+ "#ed7953"
+ ],
+ [
+ 0.7777777777777778,
+ "#fb9f3a"
+ ],
+ [
+ 0.8888888888888888,
+ "#fdca26"
+ ],
+ [
+ 1,
+ "#f0f921"
+ ]
+ ]
+ },
+ "colorway": [
+ "#636efa",
+ "#EF553B",
+ "#00cc96",
+ "#ab63fa",
+ "#FFA15A",
+ "#19d3f3",
+ "#FF6692",
+ "#B6E880",
+ "#FF97FF",
+ "#FECB52"
+ ],
+ "font": {
+ "color": "#2a3f5f"
+ },
+ "geo": {
+ "bgcolor": "white",
+ "lakecolor": "white",
+ "landcolor": "#E5ECF6",
+ "showlakes": true,
+ "showland": true,
+ "subunitcolor": "white"
+ },
+ "hoverlabel": {
+ "align": "left"
+ },
+ "hovermode": "closest",
+ "mapbox": {
+ "style": "light"
+ },
+ "paper_bgcolor": "white",
+ "plot_bgcolor": "#E5ECF6",
+ "polar": {
+ "angularaxis": {
+ "gridcolor": "white",
+ "linecolor": "white",
+ "ticks": ""
+ },
+ "bgcolor": "#E5ECF6",
+ "radialaxis": {
+ "gridcolor": "white",
+ "linecolor": "white",
+ "ticks": ""
+ }
+ },
+ "scene": {
+ "xaxis": {
+ "backgroundcolor": "#E5ECF6",
+ "gridcolor": "white",
+ "gridwidth": 2,
+ "linecolor": "white",
+ "showbackground": true,
+ "ticks": "",
+ "zerolinecolor": "white"
+ },
+ "yaxis": {
+ "backgroundcolor": "#E5ECF6",
+ "gridcolor": "white",
+ "gridwidth": 2,
+ "linecolor": "white",
+ "showbackground": true,
+ "ticks": "",
+ "zerolinecolor": "white"
+ },
+ "zaxis": {
+ "backgroundcolor": "#E5ECF6",
+ "gridcolor": "white",
+ "gridwidth": 2,
+ "linecolor": "white",
+ "showbackground": true,
+ "ticks": "",
+ "zerolinecolor": "white"
+ }
+ },
+ "shapedefaults": {
+ "line": {
+ "color": "#2a3f5f"
+ }
+ },
+ "ternary": {
+ "aaxis": {
+ "gridcolor": "white",
+ "linecolor": "white",
+ "ticks": ""
+ },
+ "baxis": {
+ "gridcolor": "white",
+ "linecolor": "white",
+ "ticks": ""
+ },
+ "bgcolor": "#E5ECF6",
+ "caxis": {
+ "gridcolor": "white",
+ "linecolor": "white",
+ "ticks": ""
+ }
+ },
+ "title": {
+ "x": 0.05
+ },
+ "xaxis": {
+ "automargin": true,
+ "gridcolor": "white",
+ "linecolor": "white",
+ "ticks": "",
+ "title": {
+ "standoff": 15
+ },
+ "zerolinecolor": "white",
+ "zerolinewidth": 2
+ },
+ "yaxis": {
+ "automargin": true,
+ "gridcolor": "white",
+ "linecolor": "white",
+ "ticks": "",
+ "title": {
+ "standoff": 15
+ },
+ "zerolinecolor": "white",
+ "zerolinewidth": 2
+ }
+ }
+ },
+ "title": {
+ "text": "Project Proposals % of Acceptance Rate by US States"
+ }
+ }
+ },
+ "text/html": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "# Pandas dataframe groupby count, mean: https://stackoverflow.com/a/19385591/4084039\n",
+ "\n",
+ "temp = pd.DataFrame(project_data.groupby(\"school_state\")[\"project_is_approved\"].apply(np.mean)).reset_index()\n",
+ "# for a column containing only 0 and 1, the mean is the fraction of 1s, i.e. the approval rate\n",
+ "temp.columns = ['state_code', 'num_proposals']  # 'num_proposals' holds the approval rate, not a count\n",
+ "\n",
+ "# How to plot US state heatmap: https://datascience.stackexchange.com/a/9620\n",
+ "\n",
+ "scl = [[0.0, 'rgb(242,240,247)'],[0.2, 'rgb(218,218,235)'],[0.4, 'rgb(188,189,220)'],\\\n",
+ " [0.6, 'rgb(158,154,200)'],[0.8, 'rgb(117,107,177)'],[1.0, 'rgb(84,39,143)']]\n",
+ "\n",
+ "data = [ dict(\n",
+ " type='choropleth',\n",
+ " colorscale = scl,\n",
+ " autocolorscale = False,\n",
+ " locations = temp['state_code'],\n",
+ " z = temp['num_proposals'].astype(float),\n",
+ " locationmode = 'USA-states',\n",
+ " text = temp['state_code'],\n",
+ " marker = dict(line = dict (color = 'rgb(255,255,255)',width = 2)),\n",
+ " colorbar = dict(title = \"% of pro\")\n",
+ " ) ]\n",
+ "\n",
+ "layout = dict(\n",
+ " title = 'Project Proposals % of Acceptance Rate by US States',\n",
+ " geo = dict(\n",
+ " scope='usa',\n",
+ " projection=dict( type='albers usa' ),\n",
+ " showlakes = True,\n",
+ " lakecolor = 'rgb(255, 255, 255)',\n",
+ " ),\n",
+ " )\n",
+ "\n",
+ "fig = go.Figure(data=data, layout=layout)\n",
+ "offline.iplot(fig, filename='us-map-heat-map')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "IjU9Eai_oc34",
+ "outputId": "0d0e0ad1-5edc-40a3-9b52-c74ae3d82a98"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "States with lowest % approvals\n",
+ " state_code num_proposals\n",
+ "46 VT 0.800000\n",
+ "7 DC 0.802326\n",
+ "43 TX 0.813142\n",
+ "26 MT 0.816327\n",
+ "18 LA 0.831245\n",
+ "==================================================\n",
+ "States with highest % approvals\n",
+ " state_code num_proposals\n",
+ "30 NH 0.873563\n",
+ "35 OH 0.875152\n",
+ "47 WA 0.876178\n",
+ "28 ND 0.888112\n",
+ "8 DE 0.897959\n"
+ ]
+ }
+ ],
+ "source": [
+ "# https://www.csi.cuny.edu/sites/default/files/pdf/administration/ops/2letterstabbrev.pdf\n",
+ "temp.sort_values(by=['num_proposals'], inplace=True)\n",
+ "print(\"States with lowest % approvals\")\n",
+ "print(temp.head(5))\n",
+ "print('='*50)\n",
+ "print(\"States with highest % approvals\")\n",
+ "print(temp.tail(5))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "-Htj_Ldkoc3-"
+ },
+ "outputs": [],
+ "source": [
+ "# stacked bar plots matplotlib: https://matplotlib.org/gallery/lines_bars_and_markers/bar_stacked.html\n",
+ "def stack_plot(data, xtick, col2='project_is_approved', col3='total'):\n",
+ "    ind = np.arange(data.shape[0])\n",
+ "\n",
+ "    plt.figure(figsize=(20,5))\n",
+ "    # draw the totals first, then overlay the approved counts at the same x positions\n",
+ "    p1 = plt.bar(ind, data[col3].values)\n",
+ "    p2 = plt.bar(ind, data[col2].values)\n",
+ "\n",
+ "    plt.ylabel('Projects')\n",
+ "    plt.title('Approved vs. total projects by ' + xtick)\n",
+ "    plt.xticks(ind, list(data[xtick].values))\n",
+ "    plt.legend((p1[0], p2[0]), ('total', 'accepted'))\n",
+ "    plt.show()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "ilYj-EPyoc4F"
+ },
+ "outputs": [],
+ "source": [
+ "def univariate_barplots(data, col1, col2='project_is_approved', top=False):\n",
+ "    grouped = data.groupby(col1)[col2]\n",
+ "    # Count the approved (== 1) rows per group: https://stackoverflow.com/a/51540521/4084039\n",
+ "    temp = pd.DataFrame(grouped.agg(lambda x: x.eq(1).sum())).reset_index()\n",
+ "    print(temp.head(20))\n",
+ "    # Pandas dataframe groupby count: https://stackoverflow.com/a/19385591/4084039\n",
+ "    # all three aggregates come from the same groupby, so their rows line up with temp\n",
+ "    temp['total'] = grouped.count().values\n",
+ "    temp['Avg'] = grouped.mean().values\n",
+ "\n",
+ "    temp.sort_values(by=['total'], inplace=True, ascending=False)\n",
+ "\n",
+ "    if top:\n",
+ "        temp = temp[0:top]\n",
+ "\n",
+ "    stack_plot(temp, xtick=col1, col2=col2, col3='total')\n",
+ "    print(temp.head(5))\n",
+ "    print(\"=\"*50)\n",
+ "    print(temp.tail(5))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "7YGSvPWSoc4K",
+ "outputId": "23455f42-50f9-4613-ff29-909a45ed758b"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ " school_state project_is_approved\n",
+ "0 AK 290\n",
+ "1 AL 1506\n",
+ "2 AR 872\n",
+ "3 AZ 1800\n",
+ "4 CA 13205\n",
+ "5 CO 935\n",
+ "6 CT 1445\n",
+ "7 DC 414\n",
+ "8 DE 308\n",
+ "9 FL 5144\n",
+ "10 GA 3329\n",
+ "11 HI 434\n",
+ "12 IA 568\n",
+ "13 ID 579\n",
+ "14 IL 3710\n",
+ "15 IN 2214\n",
+ "16 KS 532\n",
+ "17 KY 1126\n",
+ "18 LA 1990\n",
+ "19 MA 2055\n"
+ ]
+ }
+ ],
+ "source": [
+ "univariate_barplots(project_data, 'school_state', 'project_is_approved', False)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "ulA_DTSOoc4O"
+ },
+ "source": [
+ "__Every state has an approval rate of at least 80%__"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "FtUHgSzdoc4Q"
+ },
+ "source": [
+ "### 1.2.2 Univariate Analysis: teacher_prefix"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "u3eGSWoPoc4R",
+ "outputId": "16d855f6-e1bf-4cda-d4a7-f2fcca03b79e"
+ },
+ "outputs": [
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "output_type": "display_data"
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ " teacher_prefix project_is_approved total Avg\n",
+ "2 Mrs. 48997 57269 0.855559\n",
+ "3 Ms. 32860 38955 0.843537\n",
+ "1 Mr. 8960 10648 0.841473\n",
+ "4 Teacher 1877 2360 0.795339\n",
+ "0 Dr. 9 13 0.692308\n",
+ "==================================================\n",
+ " teacher_prefix project_is_approved total Avg\n",
+ "2 Mrs. 48997 57269 0.855559\n",
+ "3 Ms. 32860 38955 0.843537\n",
+ "1 Mr. 8960 10648 0.841473\n",
+ "4 Teacher 1877 2360 0.795339\n",
+ "0 Dr. 9 13 0.692308\n"
+ ]
+ }
+ ],
+ "source": [
+ "univariate_barplots(project_data, 'teacher_prefix', 'project_is_approved' , top=False)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "7OVowKoToc4V"
+ },
+ "source": [
+ "### 1.2.3 Univariate Analysis: project_grade_category"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "dXV569pboc4X",
+ "outputId": "0b9c3108-164c-415c-c876-5d745842914c"
+ },
+ "outputs": [
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "output_type": "display_data"
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ " project_grade_category project_is_approved total Avg\n",
+ "3 Grades PreK-2 37536 44225 0.848751\n",
+ "0 Grades 3-5 31729 37137 0.854377\n",
+ "1 Grades 6-8 14258 16923 0.842522\n",
+ "2 Grades 9-12 9183 10963 0.837636\n",
+ "==================================================\n",
+ " project_grade_category project_is_approved total Avg\n",
+ "3 Grades PreK-2 37536 44225 0.848751\n",
+ "0 Grades 3-5 31729 37137 0.854377\n",
+ "1 Grades 6-8 14258 16923 0.842522\n",
+ "2 Grades 9-12 9183 10963 0.837636\n"
+ ]
+ }
+ ],
+ "source": [
+ "univariate_barplots(project_data, 'project_grade_category', 'project_is_approved', top=False)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "aqKWHWBxoc4b"
+ },
+ "source": [
+ "### 1.2.4 Univariate Analysis: project_subject_categories"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "xc5btjFqoc4d"
+ },
+ "outputs": [],
+ "source": [
+ "catogories = list(project_data['project_subject_categories'].values)\n",
+ "# remove special characters from list of strings python: https://stackoverflow.com/a/47301924/4084039\n",
+ "\n",
+ "# https://www.geeksforgeeks.org/removing-stop-words-nltk-python/\n",
+ "# https://stackoverflow.com/questions/23669024/how-to-strip-a-specific-word-from-a-string\n",
+ "# https://stackoverflow.com/questions/8270092/remove-all-whitespace-in-a-string-in-python\n",
+ "cat_list = []\n",
+ "for i in catogories:\n",
+ "    temp = \"\"\n",
+ "    # consider text like \"Math & Science, Warmth, Care & Hunger\"\n",
+ "    for j in i.split(','): # splits it into three parts: [\"Math & Science\", \" Warmth\", \" Care & Hunger\"]\n",
+ "        if 'The' in j.split(): # split the category on whitespace: \"Math & Science\" => \"Math\", \"&\", \"Science\"\n",
+ "            j = j.replace('The','') # if the word \"The\" is present, remove it\n",
+ "        j = j.replace(' ','') # remove every space, e.g. \"Math & Science\" => \"Math&Science\"\n",
+ "        temp += j.strip()+\" \" # \" abc \".strip() returns \"abc\"; categories are joined with a single space\n",
+ "        temp = temp.replace('&','_') # replace '&' with '_', e.g. \"Math&Science\" => \"Math_Science\"\n",
+ "    cat_list.append(temp.strip())"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "3u8K7-c8oc4h",
+ "outputId": "f7ea0700-4d27-4528-d1f5-8d32d63dac1a"
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<div>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ "  <thead>\n",
+ "    <tr style=\"text-align: right;\">\n",
+ "      <th></th>\n",
+ "      <th>Unnamed: 0</th>\n",
+ "      <th>id</th>\n",
+ "      <th>teacher_id</th>\n",
+ "      <th>teacher_prefix</th>\n",
+ "      <th>school_state</th>\n",
+ "      <th>project_submitted_datetime</th>\n",
+ "      <th>project_grade_category</th>\n",
+ "      <th>project_subject_subcategories</th>\n",
+ "      <th>project_title</th>\n",
+ "      <th>project_essay_1</th>\n",
+ "      <th>project_essay_2</th>\n",
+ "      <th>project_essay_3</th>\n",
+ "      <th>project_essay_4</th>\n",
+ "      <th>project_resource_summary</th>\n",
+ "      <th>teacher_number_of_previously_posted_projects</th>\n",
+ "      <th>project_is_approved</th>\n",
+ "      <th>clean_categories</th>\n",
+ "    </tr>\n",
+ "  </thead>\n",
+ "  <tbody>\n",
+ "    <tr>\n",
+ "      <th>0</th>\n",
+ "      <td>160221</td>\n",
+ "      <td>p253737</td>\n",
+ "      <td>c90749f5d961ff158d4b4d1e7dc665fc</td>\n",
+ "      <td>Mrs.</td>\n",
+ "      <td>IN</td>\n",
+ "      <td>2016-12-05 13:43:57</td>\n",
+ "      <td>Grades PreK-2</td>\n",
+ "      <td>ESL, Literacy</td>\n",
+ "      <td>Educational Support for English Learners at Home</td>\n",
+ "      <td>My students are English learners that are work...</td>\n",
+ "      <td>\\\"The limits of your language are the limits o...</td>\n",
+ "      <td>NaN</td>\n",
+ "      <td>NaN</td>\n",
+ "      <td>My students need opportunities to practice beg...</td>\n",
+ "      <td>0</td>\n",
+ "      <td>0</td>\n",
+ "      <td>Literacy_Language</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>1</th>\n",
+ "      <td>140945</td>\n",
+ "      <td>p258326</td>\n",
+ "      <td>897464ce9ddc600bced1151f324dd63a</td>\n",
+ "      <td>Mr.</td>\n",
+ "      <td>FL</td>\n",
+ "      <td>2016-10-25 09:22:10</td>\n",
+ "      <td>Grades 6-8</td>\n",
+ "      <td>Civics &amp; Government, Team Sports</td>\n",
+ "      <td>Wanted: Projector for Hungry Learners</td>\n",
+ "      <td>Our students arrive to our school eager to lea...</td>\n",
+ "      <td>The projector we need for our school is very c...</td>\n",
+ "      <td>NaN</td>\n",
+ "      <td>NaN</td>\n",
+ "      <td>My students need a projector to help with view...</td>\n",
+ "      <td>7</td>\n",
+ "      <td>1</td>\n",
+ "      <td>History_Civics Health_Sports</td>\n",
+ "    </tr>\n",
+ "  </tbody>\n",
+ "</table>\n",
+ "</div>"
+ ],
+ "text/plain": [
+ " Unnamed: 0 id teacher_id teacher_prefix \\\n",
+ "0 160221 p253737 c90749f5d961ff158d4b4d1e7dc665fc Mrs. \n",
+ "1 140945 p258326 897464ce9ddc600bced1151f324dd63a Mr. \n",
+ "\n",
+ " school_state project_submitted_datetime project_grade_category \\\n",
+ "0 IN 2016-12-05 13:43:57 Grades PreK-2 \n",
+ "1 FL 2016-10-25 09:22:10 Grades 6-8 \n",
+ "\n",
+ " project_subject_subcategories \\\n",
+ "0 ESL, Literacy \n",
+ "1 Civics & Government, Team Sports \n",
+ "\n",
+ " project_title \\\n",
+ "0 Educational Support for English Learners at Home \n",
+ "1 Wanted: Projector for Hungry Learners \n",
+ "\n",
+ " project_essay_1 \\\n",
+ "0 My students are English learners that are work... \n",
+ "1 Our students arrive to our school eager to lea... \n",
+ "\n",
+ " project_essay_2 project_essay_3 \\\n",
+ "0 \\\"The limits of your language are the limits o... NaN \n",
+ "1 The projector we need for our school is very c... NaN \n",
+ "\n",
+ " project_essay_4 project_resource_summary \\\n",
+ "0 NaN My students need opportunities to practice beg... \n",
+ "1 NaN My students need a projector to help with view... \n",
+ "\n",
+ " teacher_number_of_previously_posted_projects project_is_approved \\\n",
+ "0 0 0 \n",
+ "1 7 1 \n",
+ "\n",
+ " clean_categories \n",
+ "0 Literacy_Language \n",
+ "1 History_Civics Health_Sports "
+ ]
+ },
+ "execution_count": 14,
+ "metadata": {
+ "tags": []
+ },
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "project_data['clean_categories'] = cat_list\n",
+ "project_data.drop(['project_subject_categories'], axis=1, inplace=True)\n",
+ "project_data.head(2)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "RSQAHa_woc4k",
+ "outputId": "f8364f7d-8fe9-4fe3-85f8-91343da762b8",
+ "scrolled": false
+ },
+ "outputs": [
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "output_type": "display_data"
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ " clean_categories project_is_approved total Avg\n",
+ "24 Literacy_Language 20520 23655 0.867470\n",
+ "32 Math_Science 13991 17072 0.819529\n",
+ "28 Literacy_Language Math_Science 12725 14636 0.869432\n",
+ "8 Health_Sports 8640 10177 0.848973\n",
+ "40 Music_Arts 4429 5180 0.855019\n",
+ "==================================================\n",
+ " clean_categories project_is_approved total Avg\n",
+ "19 History_Civics Literacy_Language 1271 1421 0.894441\n",
+ "14 Health_Sports SpecialNeeds 1215 1391 0.873472\n",
+ "50 Warmth Care_Hunger 1212 1309 0.925898\n",
+ "33 Math_Science AppliedLearning 1019 1220 0.835246\n",
+ "4 AppliedLearning Math_Science 855 1052 0.812738\n"
+ ]
+ }
+ ],
+ "source": [
+ "univariate_barplots(project_data, 'clean_categories', 'project_is_approved', top=20)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "H0UM0Ruyoc4o"
+ },
+ "outputs": [],
+ "source": [
+ "# count of all the words in corpus python: https://stackoverflow.com/a/22898595/4084039\n",
+ "from collections import Counter\n",
+ "my_counter = Counter()\n",
+ "for word in project_data['clean_categories'].values:\n",
+ " my_counter.update(word.split())"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "XzXq5ZERoc4r",
+ "outputId": "cf0e8427-0d1c-4be4-bc6e-2e55a82823cd"
+ },
+ "outputs": [
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "# dict sort by value python: https://stackoverflow.com/a/613218/4084039\n",
+ "cat_dict = dict(my_counter)\n",
+ "sorted_cat_dict = dict(sorted(cat_dict.items(), key=lambda kv: kv[1]))\n",
+ "\n",
+ "\n",
+ "ind = np.arange(len(sorted_cat_dict))\n",
+ "plt.figure(figsize=(20,5))\n",
+ "p1 = plt.bar(ind, list(sorted_cat_dict.values()))\n",
+ "\n",
+ "plt.ylabel('Projects')\n",
+ "plt.title('Number of projects per category')\n",
+ "plt.xticks(ind, list(sorted_cat_dict.keys()))\n",
+ "plt.show()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "-atwZbVFoc4v",
+ "outputId": "49993663-51c3-4578-bd3c-73551f00a2aa",
+ "scrolled": true
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Warmth : 1388\n",
+ "Care_Hunger : 1388\n",
+ "History_Civics : 5914\n",
+ "Music_Arts : 10293\n",
+ "AppliedLearning : 12135\n",
+ "SpecialNeeds : 13642\n",
+ "Health_Sports : 14223\n",
+ "Math_Science : 41421\n",
+ "Literacy_Language : 52239\n"
+ ]
+ }
+ ],
+ "source": [
+ "for i, j in sorted_cat_dict.items():\n",
+ " print(\"{:20} :{:10}\".format(i,j))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "sk6rX2awoc5y"
+ },
+ "source": [
+ "### 1.2.5 Univariate Analysis: project_subject_subcategories"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "5ikIuYryoc5z"
+ },
+ "outputs": [],
+ "source": [
+ "sub_catogories = list(project_data['project_subject_subcategories'].values)\n",
+ "# remove special characters from list of strings python: https://stackoverflow.com/a/47301924/4084039\n",
+ "\n",
+ "# https://www.geeksforgeeks.org/removing-stop-words-nltk-python/\n",
+ "# https://stackoverflow.com/questions/23669024/how-to-strip-a-specific-word-from-a-string\n",
+ "# https://stackoverflow.com/questions/8270092/remove-all-whitespace-in-a-string-in-python\n",
+ "\n",
+ "sub_cat_list = []\n",
+ "for i in sub_catogories:\n",
+ "    temp = \"\"\n",
+ "    # consider text like \"Math & Science, Warmth, Care & Hunger\"\n",
+ "    for j in i.split(','): # splits it into three parts: [\"Math & Science\", \" Warmth\", \" Care & Hunger\"]\n",
+ "        if 'The' in j.split(): # split the subcategory on whitespace: \"Math & Science\" => \"Math\", \"&\", \"Science\"\n",
+ "            j = j.replace('The','') # if the word \"The\" is present, remove it\n",
+ "        j = j.replace(' ','') # remove every space, e.g. \"Math & Science\" => \"Math&Science\"\n",
+ "        temp += j.strip()+\" \" # \" abc \".strip() returns \"abc\"; subcategories are joined with a single space\n",
+ "        temp = temp.replace('&','_') # replace '&' with '_', e.g. \"Math&Science\" => \"Math_Science\"\n",
+ "    sub_cat_list.append(temp.strip())"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "6YN38eProc52",
+ "outputId": "e28be892-53ef-482e-8cf2-6020343ef2fd"
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<div>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ "  <thead>\n",
+ "    <tr style=\"text-align: right;\">\n",
+ "      <th></th>\n",
+ "      <th>Unnamed: 0</th>\n",
+ "      <th>id</th>\n",
+ "      <th>teacher_id</th>\n",
+ "      <th>teacher_prefix</th>\n",
+ "      <th>school_state</th>\n",
+ "      <th>project_submitted_datetime</th>\n",
+ "      <th>project_grade_category</th>\n",
+ "      <th>project_title</th>\n",
+ "      <th>project_essay_1</th>\n",
+ "      <th>project_essay_2</th>\n",
+ "      <th>project_essay_3</th>\n",
+ "      <th>project_essay_4</th>\n",
+ "      <th>project_resource_summary</th>\n",
+ "      <th>teacher_number_of_previously_posted_projects</th>\n",
+ "      <th>project_is_approved</th>\n",
+ "      <th>clean_categories</th>\n",
+ "      <th>clean_subcategories</th>\n",
+ "    </tr>\n",
+ "  </thead>\n",
+ "  <tbody>\n",
+ "    <tr>\n",
+ "      <th>0</th>\n",
+ "      <td>160221</td>\n",
+ "      <td>p253737</td>\n",
+ "      <td>c90749f5d961ff158d4b4d1e7dc665fc</td>\n",
+ "      <td>Mrs.</td>\n",
+ "      <td>IN</td>\n",
+ "      <td>2016-12-05 13:43:57</td>\n",
+ "      <td>Grades PreK-2</td>\n",
+ "      <td>Educational Support for English Learners at Home</td>\n",
+ "      <td>My students are English learners that are work...</td>\n",
+ "      <td>\\\"The limits of your language are the limits o...</td>\n",
+ "      <td>NaN</td>\n",
+ "      <td>NaN</td>\n",
+ "      <td>My students need opportunities to practice beg...</td>\n",
+ "      <td>0</td>\n",
+ "      <td>0</td>\n",
+ "      <td>Literacy_Language</td>\n",
+ "      <td>ESL Literacy</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>1</th>\n",
+ "      <td>140945</td>\n",
+ "      <td>p258326</td>\n",
+ "      <td>897464ce9ddc600bced1151f324dd63a</td>\n",
+ "      <td>Mr.</td>\n",
+ "      <td>FL</td>\n",
+ "      <td>2016-10-25 09:22:10</td>\n",
+ "      <td>Grades 6-8</td>\n",
+ "      <td>Wanted: Projector for Hungry Learners</td>\n",
+ "      <td>Our students arrive to our school eager to lea...</td>\n",
+ "      <td>The projector we need for our school is very c...</td>\n",
+ "      <td>NaN</td>\n",
+ "      <td>NaN</td>\n",
+ "      <td>My students need a projector to help with view...</td>\n",
+ "      <td>7</td>\n",
+ "      <td>1</td>\n",
+ "      <td>History_Civics Health_Sports</td>\n",
+ "      <td>Civics_Government TeamSports</td>\n",
+ "    </tr>\n",
+ "  </tbody>\n",
+ "</table>\n",
+ "</div>"
+ ],
+ "text/plain": [
+ " Unnamed: 0 id teacher_id teacher_prefix \\\n",
+ "0 160221 p253737 c90749f5d961ff158d4b4d1e7dc665fc Mrs. \n",
+ "1 140945 p258326 897464ce9ddc600bced1151f324dd63a Mr. \n",
+ "\n",
+ " school_state project_submitted_datetime project_grade_category \\\n",
+ "0 IN 2016-12-05 13:43:57 Grades PreK-2 \n",
+ "1 FL 2016-10-25 09:22:10 Grades 6-8 \n",
+ "\n",
+ " project_title \\\n",
+ "0 Educational Support for English Learners at Home \n",
+ "1 Wanted: Projector for Hungry Learners \n",
+ "\n",
+ " project_essay_1 \\\n",
+ "0 My students are English learners that are work... \n",
+ "1 Our students arrive to our school eager to lea... \n",
+ "\n",
+ " project_essay_2 project_essay_3 \\\n",
+ "0 \\\"The limits of your language are the limits o... NaN \n",
+ "1 The projector we need for our school is very c... NaN \n",
+ "\n",
+ " project_essay_4 project_resource_summary \\\n",
+ "0 NaN My students need opportunities to practice beg... \n",
+ "1 NaN My students need a projector to help with view... \n",
+ "\n",
+ " teacher_number_of_previously_posted_projects project_is_approved \\\n",
+ "0 0 0 \n",
+ "1 7 1 \n",
+ "\n",
+ " clean_categories clean_subcategories \n",
+ "0 Literacy_Language ESL Literacy \n",
+ "1 History_Civics Health_Sports Civics_Government TeamSports "
+ ]
+ },
+ "execution_count": 20,
+ "metadata": {
+ "tags": []
+ },
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "project_data['clean_subcategories'] = sub_cat_list\n",
+ "project_data.drop(['project_subject_subcategories'], axis=1, inplace=True)\n",
+ "project_data.head(2)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "nZBLRgNGoc55",
+ "outputId": "835ebf56-cb95-4a00-e3f9-721b6e2596b3"
+ },
+ "outputs": [
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "output_type": "display_data"
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ " clean_subcategories project_is_approved total Avg\n",
+ "317 Literacy 8371 9486 0.882458\n",
+ "319 Literacy Mathematics 7260 8325 0.872072\n",
+ "331 Literature_Writing Mathematics 5140 5923 0.867803\n",
+ "318 Literacy Literature_Writing 4823 5571 0.865733\n",
+ "342 Mathematics 4385 5379 0.815207\n",
+ "==================================================\n",
+ " clean_subcategories project_is_approved total Avg\n",
+ "196 EnvironmentalScience Literacy 389 444 0.876126\n",
+ "127 ESL 349 421 0.828979\n",
+ "79 College_CareerPrep 343 421 0.814727\n",
+ "17 AppliedSciences Literature_Writing 361 420 0.859524\n",
+ "3 AppliedSciences College_CareerPrep 330 405 0.814815\n"
+ ]
+ }
+ ],
+ "source": [
+ "univariate_barplots(project_data, 'clean_subcategories', 'project_is_approved', top=50)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "l0hscXcToc57"
+ },
+ "outputs": [],
+ "source": [
+ "# count of all the words in corpus python: https://stackoverflow.com/a/22898595/4084039\n",
+ "from collections import Counter\n",
+ "my_counter = Counter()\n",
+ "for word in project_data['clean_subcategories'].values:\n",
+ " my_counter.update(word.split())"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "f8dg8oiroc5-",
+ "outputId": "2bddce12-5786-464b-cf10-a6b0c15b0d07"
+ },
+ "outputs": [
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "# dict sort by value python: https://stackoverflow.com/a/613218/4084039\n",
+ "sub_cat_dict = dict(my_counter)\n",
+ "sorted_sub_cat_dict = dict(sorted(sub_cat_dict.items(), key=lambda kv: kv[1]))\n",
+ "\n",
+ "\n",
+ "ind = np.arange(len(sorted_sub_cat_dict))\n",
+ "plt.figure(figsize=(20,5))\n",
+ "p1 = plt.bar(ind, list(sorted_sub_cat_dict.values()))\n",
+ "\n",
+ "plt.ylabel('Projects')\n",
+ "plt.title('Number of projects per subcategory')\n",
+ "plt.xticks(ind, list(sorted_sub_cat_dict.keys()))\n",
+ "plt.show()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "fgTozC6woc6B",
+ "outputId": "57592fd7-0b73-4fe3-c220-485e61100e36"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Economics : 269\n",
+ "CommunityService : 441\n",
+ "FinancialLiteracy : 568\n",
+ "ParentInvolvement : 677\n",
+ "Extracurricular : 810\n",
+ "Civics_Government : 815\n",
+ "ForeignLanguages : 890\n",
+ "NutritionEducation : 1355\n",
+ "Warmth : 1388\n",
+ "Care_Hunger : 1388\n",
+ "SocialSciences : 1920\n",
+ "PerformingArts : 1961\n",
+ "CharacterEducation : 2065\n",
+ "TeamSports : 2192\n",
+ "Other : 2372\n",
+ "College_CareerPrep : 2568\n",
+ "Music : 3145\n",
+ "History_Geography : 3171\n",
+ "Health_LifeScience : 4235\n",
+ "EarlyDevelopment : 4254\n",
+ "ESL : 4367\n",
+ "Gym_Fitness : 4509\n",
+ "EnvironmentalScience : 5591\n",
+ "VisualArts : 6278\n",
+ "Health_Wellness : 10234\n",
+ "AppliedSciences : 10816\n",
+ "SpecialNeeds : 13642\n",
+ "Literature_Writing : 22179\n",
+ "Mathematics : 28074\n",
+ "Literacy : 33700\n"
+ ]
+ }
+ ],
+ "source": [
+ "for i, j in sorted_sub_cat_dict.items():\n",
+ " print(\"{:20} :{:10}\".format(i,j))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "YqtXGwRUoc6E"
+ },
+ "source": [
+ "### 1.2.6 Univariate Analysis: Text features (Title)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "UQHyF0ASoc6F",
+ "outputId": "ecd2350b-7399-4456-ddf8-34389eee54fa"
+ },
+ "outputs": [
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "#How to calculate number of words in a string in DataFrame: https://stackoverflow.com/a/37483537/4084039\n",
+ "word_count = project_data['project_title'].str.split().apply(len).value_counts()\n",
+ "word_dict = dict(word_count)\n",
+ "word_dict = dict(sorted(word_dict.items(), key=lambda kv: kv[1]))\n",
+ "\n",
+ "\n",
+ "ind = np.arange(len(word_dict))\n",
+ "plt.figure(figsize=(20,5))\n",
+ "p1 = plt.bar(ind, list(word_dict.values()))\n",
+ "\n",
+ "plt.ylabel('Number of projects')\n",
+ "plt.title('Number of words in each project title')\n",
+ "plt.xticks(ind, list(word_dict.keys()))\n",
+ "plt.show()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "51BD8dOioc6I"
+ },
+ "outputs": [],
+ "source": [
+ "approved_word_count = project_data[project_data['project_is_approved']==1]['project_title'].str.split().apply(len)\n",
+ "approved_word_count = approved_word_count.values\n",
+ "\n",
+ "rejected_word_count = project_data[project_data['project_is_approved']==0]['project_title'].str.split().apply(len)\n",
+ "rejected_word_count = rejected_word_count.values"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "PhMvROIvoc6L",
+ "outputId": "d9d2855a-b5f9-4fe7-8879-6eab5b3cde28"
+ },
+ "outputs": [
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "# https://glowingpython.blogspot.com/2012/09/boxplot-with-matplotlib.html\n",
+ "plt.boxplot([approved_word_count, rejected_word_count])\n",
+ "plt.xticks([1,2],('Approved Projects','Rejected Projects'))\n",
+ "plt.ylabel('Words in project title')\n",
+ "plt.grid()\n",
+ "plt.show()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "ifNxNmA0oc6O",
+ "outputId": "682ce3fd-d1fb-4523-c356-584f444aea61"
+ },
+ "outputs": [
+ {
+ "data": {
+ "image/png": "iVBORw0KGgoAAAANSUhEUgAAAl0AAADFCAYAAABuKEcsAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvIxREBQAAIABJREFUeJzs3Xd81EX++PHX7KZueiUhARJ6b1KkChYQC4qiYvc8z/Msp56e5YrtTu9OvWL9cd6pd18LoKCIZy8U6b0HkFCSkN7rJtns/P747C4JScgm2d1EeT8fDx6Qz87MTgImb98z8x6ltUYIIYQQQniXqasnIIQQQghxJpCgSwghhBDCByToEkIIIYTwAQm6hBBCCCF8QIIuIYQQQggfkKBLCCGEEMIHJOgSQgghhPABCbqEEEIIIXxAgi4hhBBCCB/w6+oJnCo2NlanpKR09TSEEEIIIdq0bdu2Qq11nDttu13QlZKSwtatW7t6GkIIIYQQbVJKHXe3rSwvCiGEEEL4gARdQgghhBA+IEGXEEIIIYQPdLs9XUIIIYQn1dfXk5WVhdVq7eqpiB+woKAgkpOT8ff37/AYEnQJ0QkNdk1GcTWpsSFdPRUhRCuysrIICwsjJSUFpVRXT0f8AGmtKSoqIisri9TU1A6PI8uLQnTCki2ZnP+31ZworenqqQghWmG1WomJiZGAS3SYUoqYmJhOZ0sl6BKiE1YfyqfBrtl6rLirpyKEOA0JuERneeLfkARdQnRQg12z8YgRbG09VtLFsxFCCNHdSdAlRAel5ZRTVlOPv1mx7bgEXUKI0/vwww9RSnHgwIGunopbQkNDW3xuNpsZPXo0w4cP56qrrqK6urpd41500UWUlpa2ez6rVq1i/fr17e7XnUjQJUQHrU8vBGD+WckcyC2nstbWxTMSQnRnixYtYurUqSxevNhjY9psvv++ExwczM6dO9m7dy8BAQEsXLiwyetaa+x2e6v9P/30UyIjI9v9vj+GoEtOLwrRQRvSi+gbF8Kc4Yks2pzJrsxSpvSP7eppCSFO48mP97E/u9yjYw7tGc7jlw47bZvKykrWrVvHypUrmTt3Lk888QRgBBKPPfYYMTExHDx4kOnTp/Pqq69iMpkIDQ3l5z//OStXriQqKorFixcTFxfHjBkzmDx5MuvWrWPu3LnMnz+fW2+9lYKCAuLi4njzzTeJiIhg1KhRHDlyBJPJRHV1NYMGDeLIkSNkZGRw1113UVBQgMVi4V//+heDBw/m6NGjXHfdddhsNi688EK3Pvdp06axe/dujh07xpw5c5g5cyYbNmxg+fLlrF+/nmeeeQatNRdffDF/+ctfgJPX/cXGxvL222/z4osvUldXx8SJE3n11Vcxm818/vnn/OY3v6GhoYHY2Fhef/11Fi5ciNls5u233+all14iNzeXJ598ErPZTEREBGvWrOnU36MvSKZLiA6ob7Cz+Wgxk/vFMLp3JErJvi4hROuWL1/OhRdeyMCBA4mOjmb79u2u1zZv3sxf//pX9uzZQ3p6Oh988AEAVVVVjB07lu3bt3POOefw5JNPuvqUlpayevVqHnjgAe6++25uuukmdu/ezfXXX88vf/lLV9C1evVqAD7++GNmz56Nv78/t99+Oy+99BLbtm3j+eef58477wTg3nvv5Re/+AVbtmwhISGhzc/JZrPx2WefMWLECAAOHjzITTfdxI4dO/D39+fhhx/m22+/ZefOnWzZsoXly5c36Z+WlsaSJUtYt24dO3fuxGw2884771BQUMDPfvYzli1bxq5du3j//fdJSUnhjjvu4P7772fnzp1MmzaNp556ii+++IJdu3axYsWKzv0F+YhkuoTogN1ZZVTVNTC5XyzhQf4M6hHGtgwJuoTo7trKSHnLokWLuO+++wBYsGABixYtYuzYsQBMmDCBvn37AnDttdeydu1a5s+fj8lk4pprrgHghhtu
4IorrnCN53wOsGHDBlegduONN/LQQw+52ixZsoSZM2eyePFi7rzzTiorK1m/fj1XXXWVq39tbS0A69atY9myZa5xHn744RY/l5qaGkaPHg0Yma6f/vSnZGdn06dPH84++2wAtmzZwowZM4iLiwPg+uuvZ82aNVx++eWucb755hu2bdvG+PHjXePGx8ezceNGpk+f7qqHFR0d3eI8pkyZwi233MLVV1/d5GvTnUnQJUQHbHDs5zq7bwwAZ/WJYsXObOx2jcnUdUfTd2SUEBsaSHJUsByRF6KbKCoq4ttvv2Xv3r0opWhoaEApxbPPPgs0L0XQ2n+7jZ+HhLRekNnZbu7cuTz66KMUFxezbds2zj33XKqqqoiMjGTnzp1tvkdrnHu6TtV4TlrrNsfRWnPzzTfzpz/9qcnzFStWuDWPhQsXsmnTJj755BNGjx7Nzp07iYmJabNfV5LlRSE6YMORIgYnhBEdEgAYQVdFrY1D+RVdNqe9J8qY9+p6pj27ktFPfcX1/97IX788SJ2t9Q2tQgjvW7p0KTfddBPHjx/n2LFjZGZmkpqaytq1awFjefHo0aPY7XaWLFnC1KlTAbDb7SxduhSAd9991/X8VJMnT3Ztzn/nnXdc7UJDQ5kwYQL33nsvl1xyCWazmfDwcFJTU3n//fcBI/DZtWsXYGSOGo/TGRMnTmT16tUUFhbS0NDAokWLOOecc5q0Oe+881i6dCn5+fkAFBcXc/z4cSZNmsTq1as5evSo6zlAWFgYFRUnv8emp6czceJEnnrqKWJjY8nMzOzUnH3BraBLKXWhUuqgUuqwUuqRFl6/Qym1Rym1Uym1Vik1tNFrjzr6HVRKzfbk5IXoCtb6BrYeK2Fyv5Ob5s/qEwXQpaUjDuYa34zuO38AF41IoKiyjpe+PSzlLIToYosWLWLevHlNnl155ZW8++67AEyaNIlHHnmE4cOHk5qa6mobEhLCvn37OOuss/j222957LHHWhz/xRdf5M0332TkyJG89dZbvPDCC67XrrnmGt5+++0my5HvvPMOr7/+OqNGjWLYsGF89NFHALzwwgu88sorjB8/nrKysk59zomJifzpT39i5syZjBo1irFjx3LZZZe5XldKMXToUP74xz8ya9YsRo4cyQUXXEBOTg5xcXG89tprXHHFFYwaNco190svvZQPP/yQ0aNH89133/HrX/+aESNGMHz4cKZPn86oUaM6NWdfUG2lAJVSZuAQcAGQBWwBrtVa72/UJlxrXe7481zgTq31hY7gaxEwAegJfA0M1Fo3tPZ+48aN01u3bu3cZyWEF21IL+Laf23k3zeN4/yhPQDj/xbHP/010wfG8berR3fJvP765UFeWXmYA3+YQ4CficziaqY9u5I/XzGCBRN6d8mchOgO0tLSGDJkSFdPo0WrVq3i+eef53//+1+z10JDQ6msrOyCWXlPQ0MD8fHx5Obmduri6K7S0r8lpdQ2rfU4d/q7k+maABzWWh/RWtcBi4HLGjdwBlwOIYAzkrsMWKy1rtVaHwUOO8YT4gdrw5EiTAom9D25uVMpxdjeUWzvwqzS0cIqkqMsBPgZ/1n3jAzG36w4VtS+woVCCOEtw4YN47bbbvtBBlye4M5G+iSg8UJpFjDx1EZKqbuAXwEBwLmN+m48pW9SC31vB24H6N1b/o9cdG8b0gsZkRRBeFDTbxrjUqL4cn8eBRW1xIUF+nxex4qqSIk9uZHVbFL0irKQUVzl87kIIdwzY8YMZsyY0eJrP7YsF/CDqcbvLe5kulo6QtBsTVJr/YrWuh/wMPC7dvZ9TWs9Tms9znm8VIjuqLrOxo6MUib1a14E1bmva3sXlI7QWnOssJrUGEuT531iLBwrlEyXEEJ0B+4EXVlAr0YfJwPZp2m/GHAW4mhvXyG6tR0ZpdjsmrP7Nq8bM6xnBAFmU5csMRZW1lFZa2uS6QLoExNCRnG1W8e3hRBCeJc7QdcWYIBSKlUpFQAsAJqUflVK
DWj04cXA944/rwAWKKUClVKpwABgc+enLUTXyCw2skb945tfBBvkb2Z4UniXnBY8VmQsITYPuixU1tooqqrz+ZyEEEI01WbQpbW2AXcDXwBpwHta631KqaccJxUB7lZK7VNK7cTY13Wzo+8+4D1gP/A5cNfpTi4K0ZLqOhu7s0ppsHd9tiav3KjcHB8W1OLrZ/WJYveJMmwNvq2NdbTQCLpSY5oGXSmOj48Xyb4uIYToam7V6dJaf6q1Hqi17qe1ftrx7DGt9QrHn+/VWg/TWo/WWs90BFvOvk87+g3SWn/mnU9D/Ji9/t1R5r68jsl//oZnPk1jf3Z5ly2X5VVYiQkJcJ0QPFXvmBDqbHafZ5aOFVbhZ1IkRwWfMh9jj9dxOcEoRJdSSvHAAw+4Pn7++eddl163Zvny5ezfv/+0bUaNGsW1117riSl63RNPPMHzzz/f4vOkpCRGjx7N8OHD232P4ooVK/jzn//coTk988wzHerXUVKRXnR7+3PKiQ0NZERSJG+sPcpFL37H3e/u6JK55JdbiQ9vOcsF0MNxajHfkRHzlWNFVfSKtuBnbvqfdHJUMCaFlI0QoosFBgbywQcfUFhY6HaftoKutLQ07HY7a9asoarKM9nshoauWYxyXmT9/vvvc+utt2K3N10tsNlsrfadO3cujzzSrG67W3wddMndi6LbSy+oZHSvSP598ziKq+p48uN9fLwrm1pbA4F+Zp/OJa+8lh7hrZeDcAZkeeVWRhDhq2lxtLCalFNOLgIE+plJjAgmQ5YXhTB89gjk7vHsmAkjYM7pMy1+fn7cfvvt/P3vf+fpp59u8trx48e59dZbKSgoIC4ujjfffJOsrCxWrFjB6tWr+eMf/8iyZcvo169fk37vvvsuN954I2lpaaxYscKV8ZoxYwajR49m8+bNlJeX88YbbzBhwgSeeOIJ0tPTOXHiBJmZmTz00EP87Gc/Y9WqVTz55JMkJiayc+dO9u/fz9/+9jfeeOMNAG677Tbuu+8+Hn74Yfr06cOdd94JGBmqsLAwHnjgAZ577jnee+89amtrmTdvHk8++SQATz/9NP/3f/9Hr169iIuL46yzzjrt12nIkCH4+flRWFjIQw89RHR0NDt27GDs2LH89re/5dZbb+XIkSNYLBZee+01Ro4cyX/+8x+2bt3Kyy+/TEFBAXfccQcZGRkA/OMf/2DKlClUVlZyzz33sHXrVpRSPP7442zZssV1efewYcN47bXXuPrqq8nKyqKhoYHf//73TSr5e4IEXaJbszXYOVZYzczB8QBEhwQwc1A8H+3MJrO4mv7xYT6dT165laGJ4a2+7gzI8it8l+nSWnO8qKrFE5UAKbEWyXQJ0Q3cddddjBw5koceeqjJ87vvvpubbrqJm2++mTfeeINf/vKXLF++nLlz53LJJZcwf/78FsdbsmQJX331FQcPHuTll19ussxYVVXF+vXrWbNmDbfeeit79+4FYPfu3WzcuJGqqirGjBnDxRdfDBj3P+7du5fU1FS2bdvGm2++yaZNm9BaM3HiRM455xwWLFjAfffd5wq63nvvPT7//HO+/PJLvv/+ezZv3ozWmrlz57JmzRpCQkJYvHgxO3bswGazMXbs2DaDrk2bNmEymXCWjzp06BBff/01ZrOZe+65hzFjxrB8+XK+/fZbbrrppmYXb997773cf//9TJ06lYyMDGbPnk1aWhp/+MMfiIiIYM8eI+AuKSnhyiuv5OWXX3aNsWzZMnr27Mknn3wC0OmrkFoiQZfo1rJKaqhrsNM/7uRpQecJvSMFVT4NumwNdgorT5/pig0NRCkjOPOV/IpaqusaSD3l5KJT7+gQPt+b47P5CNGttZGR8qbw8HBuuukmXnzxRYKDT+6/3LBhAx988AEAN954Y7OgrCVbtmwhLi6OPn36kJyczK233kpJSQlRUUa9QGcANn36dMrLyyktLQXgsssuIzg4mODgYGbOnMnmzZuJjIxkwoQJpKam
ArB27VrmzZtHSIjxPeWKK67gu+++45e//CX5+flkZ2dTUFBAVFQUvXv35sUXX+TLL79kzJgxgFHU9fvvv6eiooJ58+ZhsRhZ+Llz59Kav//977z99tuEhYWxZMkSlDLKfF511VWYzWbXvJYtWwbAueeeS1FRUbPA6Ouvv26yJFteXk5FRQVff/216zJvwPV1amzEiBE8+OCDPPzww1xyySVMmzatzb+H9pKgS3Rrh/ONisz9GpVocJ7QO+bjJbOiqjrsmtPu6fI3m4i2BPg00+U8uZgS03LQlRJjoaS6nrKaeiKCz8yrN4ToLu677z7Gjh3LT37yk1bbOAOO01m0aBEHDhwgJSUFMIKLZcuWcdttt7U4hvPj1p47AyzgtAeV5s+fz9KlS8nNzWXBggWu9o8++ig///nPm7T9xz/+4dbnAsaergcffLDZ87bmder4drudDRs2NAlqnX3bmsvAgQPZtm0bn376KY8++iizZs1q9ZLxjpKN9KJbSy9wBF2xJ4OuCIs/0SEBHPVxpXVn9qrHaYIuMIKyggrfZbqOOctFtJLp6uMIxjJkiVGILhcdHc3VV1/N66+/7no2efJkVxbmnXfeYerUqQCEhYVRUVHRbAy73c7777/P7t27OXbsGMeOHeOjjz5i0aJFrjZLliwBjOxQREQEERHGHtOPPvoIq9VKUVERq1atYvz48c3Gnz59OsuXL6e6upqqqio+/PBDV9ZnwYIFLF68mKVLl7qWPWfPns0bb7zhurboxIkT5OfnM336dD788ENqamqoqKjg448/7tTXbvr06bzzzjuAcVF4bGws4eFNt3vMmjWLl19+2fWxc+nw1OclJUY9RX9/f+rr6wHIzs7GYrFwww038OCDD7J9+/ZOzbclEnSJbi29oJLY0EAiLE0zNCkxFo4W+vZeMmeNrtMtLwLEhwW62vrC0aIqAswmekYGt/h6H2fZCLmDUYhu4YEHHmhyivHFF1/kzTffZOTIkbz11lu88MILgBHgPPfcc4wZM4b09HRX+zVr1pCUlERS0smrjKdPn87+/fvJyTG2EkRFRTF58mTuuOOOJgHehAkTuPjiizn77LP5/e9/T8+ePZvNb+zYsdxyyy1MmDCBiRMnctttt7mWDocNG0ZFRQVJSUkkJiYCRkBz3XXXMWnSJEaMGMH8+fOpqKhg7NixXHPNNYwePZorr7yy08t1TzzxBFu3bmXkyJE88sgj/Pe//3W95sxivfjii642Q4cOZeHChQD87ne/o6SkhOHDhzNq1ChWrlwJwO23387IkSO5/vrr2bNnDxMmTGD06NE8/fTT/O53v2s+iU5S3e16kHHjxumtW7d29TREN3Hl/1uPv1mx+PZJTZ4/8N4u1h0uZONvzvPZXN7eeJzfLd/Lpt+cd9ps10NLd7H6UAGbfnO+T+b187e2cji/km8emNHi69V1NoY+9gW/nj2Iu2b298mchOhO0tLSGDJkSFdPw2dmzJjB888/z7hx45o8f+KJJwgNDW1xGe+H7K9//Svl5eWuE5Pe1NK/JaXUNq31uFa6NCGZLtFtaa05nF9Jv7jmV+6kxlrILbdSXdd67RZPyy+3YlIQExJw2nbxYUEUVNT6rIL+scLqVpcWASwBfsSFBbqWIYUQ4sdi4cKF/Oc//+GGG27o6qm4RTbSi26rqKqOspr6FoMu5wnGY4XVDO3ZegkHT8orryU2NLBZAdJT9QgPxK6hqKq21euCPMVu1xwrqmLagNjTtkuJsXC8WPZ0CXEmWLVqVYvP26qA/0N0xx13cMcdd3T1NNwmmS7RbaW3cHLRyZnZ8eUJxrwKa5ub6AHiHIGWL6rS55ZbqbXZm110fare0SFy/6I4o3W3rTTih8cT/4Yk6BLdVnqBEST0byHocpZHOOrDJbO2qtE7nSyQ6v0TjG2dXHRKibGQV15LTZ3cNy/OPEFBQRQVFUngJTpMa01RURFBQZ1bvZDlRdFtHc6vJNjfTGIL2aWQQD96
hAf6NOjKL7cypndkm+2cdbx8kek66shetZnpcpxgzCiuZlCCb6v4C9HVkpOTycrKoqCgoKunIn7AgoKCSE5O7tQYEnSJbiu9oJK+cSGYTC0XtEuJCfFZ0FVns1NUVUcPN/ZoxYUamS5flI04VlhFoJ+pxcC0MWdm8HhRlQRd4ozj7+/vqrYuRFeS5UXRbaUXtHxy0Sk1NsRnJ/IKKt2r0QUQ4GciOiTAJ8uLRwur6RNjaTUwdToZdMlmeiGE6CpuBV1KqQuVUgeVUoeVUo+08PqvlFL7lVK7lVLfKKX6NHqtQSm10/FrhScnL368auoaOFFa02bQ5Tzh6G3uVqN38lWB1GNFVa1e/9NYhMWfiGB/n1+dJIQQ4qQ2gy6llBl4BZgDDAWuVUoNPaXZDmCc1noksBR4ttFrNVrr0Y5frd92KUQjRwor0brlTfROJ8tGeD+QyHcEXfFuZLqMdt6/CqjBrskoOn2NrsZSYixkSNkIIYToMu5kuiYAh7XWR7TWdcBi4LLGDbTWK7XWzu/mG4HO7TQTZzznycV+8a0HFL4sG3HyCqDuk+nKLbdS12B33a3Ylt4xIZLpEkKILuRO0JUEZDb6OMvxrDU/BT5r9HGQUmqrUmqjUuryljoopW53tNkqp0sEGDW6lOK0S2e9oy0o5ZuyEXnlVvxMimjL6avRO/UID6Swsha7F6vSO5c8EyLcy76lxFg4UVJDnc3utTkJIYRonTtBV0s7dFv8SaKUugEYBzzX6HFvx51E1wH/UEr1azaY1q9prcdprcfFxcW5MSXxY5deUEmvKAtB/uZW2wT5m+kZEeyjoKuW+LDANjesO8WHBWGza4qr67w2J2dJCner3veKtmDXkFNW47U5nU56QaXPrkYSQojuyJ2gKwvo1ejjZCD71EZKqfOB3wJztdaudRWtdbbj9yPAKmBMJ+YrzhCH8ytPu5/LqW+cb04w5ldYXfW33BEf5iwb4b19Xc49Y873cndO+RXe3+B/qqOFVVzwt9W8vfG4z99bCCG6C3eCri3AAKVUqlIqAFgANDmFqJQaA/wTI+DKb/Q8SikV6PhzLDAF2O+pyYsfpwa75mhhFf3i2t6rlBITwpHCKq9Xms4rt7pVLsLJVSDViwFOfkWtcQF3qLtBlzGngi4Iuj7dk4Ndwyd7cnz+3kII0V20GXRprW3A3cAXQBrwntZ6n1LqKaWU8zTic0Ao8P4ppSGGAFuVUruAlcCftdYSdInTyi6todZmP225CKeU2BAqrDaKq7y3jAfOK4Dan+nK92KmK7+8lpjQQMzuLnmGe39OrflsrxFsbT1WTFGl74M+IYToDtyqSK+1/hT49JRnjzX68/mt9FsPjOjMBMWZ53BB6xddn6pvoxOM7mZ82sta30BZTX37gi5XgOPNTJfV7aVFgGhLAH4m5fPlxYyiavaeKGfemCQ+3HGCbw7kc/W4Xm13FEKIHxmpSC+6HXcvcYaTtbqOFHhvX9fJDevuBziBfmYiLf7kebFWV35FbbvmZDIpYkMDfR50ObNcv7pgIEmRwXy5L9en7y+EEN2FBF2i28ktt+JvVsSEtF2eITkqGLNJebX+lDNwak+mC6BHWJCXM121bp9cdIoP933Q9eneXEYmR9Ar2sIFQ3uw5vtCqmptPp2DEEJ0BxJ0iW4nv9wIJpRqe6+Sv9lE72gLxwq9V2m9vVcAOXkzwGmwa4oqa92ukO+aU1igT/d0ZZVUsyuzlDnDEwGYPSyBOpudNYekHp8Q4swjQZfodnLLrCREuB/gpMRYvFqr62Q1+hYCHGsZ7HgbbM2Dq/iwIK8FOEWVtdh1+5Y8AeLCAn16evHzvcZS4pzhCQCMT4kiyuLPl/vzfDYHIYToLiToEt1OXoWVhJaySvaWK6knRgaT69VTglYC/ExEBPs3faG2At6+Ej66C774TbN+8eGBFHipKr0zgxbXzuXFuLAgiqvrqG/wTVX6z/bmMjQx3LX3zs9s4rwhPfgmLc9n
cxBCiO5Cgi7R7eSVWZsvm+XsgmdT4KvHwN7Q5KWE8CCKq+qotTV97rH5OGp0NVnurKuGd6+BE9uh//mw5d+wZ2mTfvFhgdQ3aEq8UJU+v6J9F3A3npPWUFTp3RIbYGQstx0v4aIRCU2ezxrag3KrjU1Hir0+ByGE6E4k6BLdSmWtjaq6hqaZrgYbrLgH6q2w7gVYcgPUVrpedrb11qb1vPJaejTOKNVbYfF1cHw9XPEaXLsYek+CFb+EgoOuZj28WCC1IycqG7fP9+KpSqfPHacW54xIbPJ8+sA4gv3NfLlfTjEKIc4sEnSJbiW3rIVN65v+n5HpuuKfMOc5OPQ5vDEbSo172J3ZHm9duZNXYT05H7sdlt4KR1bC3JdgxHww+8P8N8A/GJbc6AoIvXkV0MnlxXYGXV4OUBv7dG8ug3qENStyG+RvZvrAWL7cl+fVC8GFEKK7kaBLdCvNTgoWH4Vvn4ZBF8HQy2Hi7XD9+1CaAf8+D6qLXZvuvbWvK7+80SnBrM1w8BM473EYe+PJRuE9Yf7rUHgIPvlVk8/BK5muCiuRFn8C/Vq5EDz9W1jzfLN9cL66f9Fa38CWY8VcMLRHi6/PHpZAbrmVvdllXp2HEEJ0JxJ0iW7FGXQlRASB1vC/+8HkBxc9D849Vf3Ph+uXQmUe7PvQtbyY54XsTWWtjcpa28kgcN+HYA6ECT9r3rjvDJh6H+xeAqUZriyUN04LGmU1Wshy5e03Nve/NQ++/QPseKvJy7GhvllezCiuRmsY0KPlWwXGp0QDkJZT7tV5CCFEdyJBl+hWcl2ZrkAjeDmyEs5/HCKSmjbsNQFiB8Ge94kI9ifAz+SdZbzG87HbYd9yGHABBIa13GGMI/uV9j+C/M1EBPt7bXmxSWFUWx18fB8snAJZW2DWH419Zl8/AdUnN6wH+JmIDgnweqYro8iom9Y72tLi64kRQfiZFBnF3quvJoQQ3Y0EXaJbyS+vJSzID4ufgi9/D8kTYNxPmzdUCkZeBRkbUGWZJIQHufaDeXQ+Fc4N60GQsQEqc2HYvNY7xPSD+GGQ9rGjX6BX9k8VnHoF0O7FsO1NGH8b/HInTL4HLnoOrKWw8pkmfeNCvV+r67gjmOoT0/JVTn5mE0lyNmykAAAgAElEQVRRwWQU13h1HkII0Z1I0CW6ldwyx6b1E9ugKh/OvgNMrfwzHXGV8fuepSSEB3l1w3p8WKCxtOgXDAMvPH2nIZc6ArR84sMDPX7/otaagopa4hqXi9j2H4gbDHOeBYuxdEfCCCMI2/o65Ox2NfXFVUCZxdWEBvoRZfFvtU3vaItkuoQQZxQJukS34iqM+v1XoEzQd2brjaNSjEzYnveN4MYLQZczIxQX4gf7P4KBsyCw5X1KLkMuATQc/NQr9y+WVtdT12A/ubyYu8cIUs+65eS+N6eZv4XgaPj018YeORxV6b18FVBGcTW9oy2nvcqpV7SFTAm6hBBnEAm6RLfiKox6+CtIHn8ya9OakVdD/n5G+mWRW25Fa8+WICioqCXAbCIif7OReRt2Rdudegw3AsK0j4kLN5byPDmvJtk3gG3/NTb3j7ymeePgSDj/CcjcCLvfc/QLoqDSs3M61fGiqlb3czn1jrZQXFVHhbXea/MQQojuxK2gSyl1oVLqoFLqsFLqkRZe/5VSar9SardS6hulVJ9Gr92slPre8etmT05e/LjY7Zr8ilr6BldD9g7of0HbnYbNA2VmYtU3WOvtlFttHp1TQUUtcWGBqP3Lwd8CA2a13UkpY4nxyGqSgo2sVFmN5wILVzX6sECjMv7uJTD0stYD1NHXQ8+xsOoZ0LpRpXzvBDt2uyazpIY+MW0HXQCZsq9LCHGGaDPoUkqZgVeAOcBQ4Fql1NBTmu0AxmmtRwJLgWcdfaOBx4GJwATgcaVUlOemL35MiqrqsNk1o+u2GQ8GnN92p5BY6H8eA/K/QGH3+BJjfoWV+FA/2L/C2MsVcPpAwmXwpWCvZ1jl
RsCzZSNc1ejDg4x9ZrXlxtJia0wmGHsTlByDggOummPeKhuRV2GlzmanlxuZLkD2dQkhzhjuZLomAIe11ke01nXAYuCyxg201iu11s7vnBuBZMefZwNfaa2LtdYlwFdAG7uQxZnKGTD1Ld0IIXGQMMq9jiOuwlKTw3h10ONBV0FFLVP90qC6EIa7sbTolDweQnuQUvAt4NlipE2WF7f/F2IGQJ/Jp+80cLbx+8HPXHvBvFWV3lkuos1MV4wz0yVBlxDizOBO0JUEZDb6OMvxrDU/BT5rT1+l1O1Kqa1Kqa0FBQVuTEn8GOWVWzFhJz5/rVEAtbVTi6cadBF2v2AuN6/zeNmIwspaptV9BwGhxpzcZTLB4EuIOrGKQOo8m+mqsBISYCak9BBkbmp5A/2pwntC4ig49IVXi7bCyXIRbe3pCg/yJ9LiL5kuIcQZw52fai19N29xB65S6gZgHPBce/pqrV/TWo/TWo+Li4tzY0rixyi33MoolY5fbWn7ApzAUPSgi5lj3kx+WZXH5mNrsFNcZWV4xVpjadE/uH0DDLkUk62G6abdHl3Ky6+oNZYWt/8XzAEw6lr3Og6cA1mbiTdXusbxhsziaswmRc/Itr9eUjZCCHEmcSfoygJ6Nfo4Gcg+tZFS6nzgt8BcrXVte/oKAcY1PjPMu9DKBP3ObVdf8+A5RKlKzLm7227spuKqOoaQgaW+xKhC314pU9FBkVzst9WjWaWC8loSQk2wa7GxYT8kxr2OA2eDthOSsZKQALPX9nRlFFfTMzIIf3Pb316kbIQQ4kziTtC1BRiglEpVSgUAC4AVjRsopcYA/8QIuPIbvfQFMEspFeXYQD/L8UyIZvLKrJzvvxuVNK7tUhGn6jsDO4qEwvUem09+RS1TTXtc47eb2R81YBbTTbsoKPfcCb38CiuT/A4Z1eaHX+l+x8TREJpg7OsKD/Japut4UTV9oluuRH+q3tEWskpqaLB7r3yFEEJ0F20GXVprG3A3RrCUBryntd6nlHpKKTXX0ew5IBR4Xym1Uym1wtG3GPgDRuC2BXjK8UyIZqpLcxmi0zuWVQqJ5XhAfwZUbPbYfAocQVdN1CAIS+jYIP1mEk0ZQSUHPDav/IpaxtdtNZYWU89xv6PJZBR3Tf+WhFATBV7aSJ9ZXN3myUWn3tEW6ho8f+pUCCG6I7d2KmutP9VaD9Ra99NaP+149pjW2hlcna+17qG1Hu34NbdR3ze01v0dv970zqchfgx6FW/AhG7ffq5GjoZPZJAtDazlHplPcWkZE0wHsfVpR2Bzqr4zAEgp2+KROVXW2qiua2Bw5QZImdp2dfxTDZwDteVM8jvoleXFylobRVV1bW6id5KyEUKIM4lUpBfdxvCaLVT6RRnLYB1Q2GMqfthpSF/tkfn4ndhEoKoncNB5HR8kvCf5QSmMqN3ukTnll1vppfKIqj7mXqHWU/U9B8yBTKjb4pXlRXfLRThJ0CWEOJNI0CW6hVpbA6Pt+8iOHOd+qYhT2JLGU6mDqD34tUfmFJO3jjr8COg3rVPj5ERPZKxOw1rT+ZOV+RW1nGvaaXzQkaArIAT6nsOQinVU19moqvVsBf+MYuNzdDfTlRgRhNmkZDO9EOKMIEGX6BaKTqTTUxVTGT+uw2PER4axwT4U89FvPTKnPmWb2W8ebAQqnVDecxrBqo7yQ53f5J9XbmWmaSd1EakQ069jgwycTURNFv1UtsezXc6MVW83M11+ZhNJkcGS6RJCnBEk6BLdgjXdCEh074kdHiMhIojv7CMIrMiA4iOdm1BlPr1qD3PA0vEg0En3mYJNm7Cndz4YLCktZZJpP7p/B7JcTgONSyHOM20n38Mb2I8XVRNp8Sc8yN/tPlKrSwhxppCgS3QLpqxNVOlAQnt3bD8XQHx4IGvsI40PDn/TuQkdMfaFZUWd3blxgOjoGHbo/gRnrun0WJYT6wlU9QQM6cRtWhHJ1EYP5hzTbq9kuvq4ubToJLW6hBBn
Cgm6RLcQlr+V7fYB9Ijs+FJebEggWSqR0sBESF/ZuQkdWUmZDsEaN7xz42Dckbi2YQThJfugunMVU5IK1lBNECplSqfG0akzGGc6RFGpZ056OrWnXIRT72gLhZV1Ht9fJoQQ3Y0EXaLrWcuJqjzMTjWIiGD3l6VOZTIp4sOC2G8ZD0fXQEN9xwbSGnv6t6y1DyM2vH0BREtiQgNZp0eg0Ma8OkprBlVsYE/AaPAL7NScAgfOMDJmOZ4pZQHGtUlZJTVun1x0khOMQogzhQRdoutlbcGEnaPBI1BtXdzchvjwIDabRkNdBWR1MKAoPISpIoe19hHEhXYuuAGM03nBQ7CaLHCkExm4/DRiGwr4PmJyp+ek+kzBhom4go2dHsspp8yKza7dPrnoJEGXEOJMIUGX6HqZm7BjojBiZKeHSggPYlXdYFDmju/rcixNfmcfQVxY54MugJjwENICR3Vu2fN74wat3PjOlbAAICicw36DSKnY1vmxHFwnF928AsjJGXT5cl+X3a6x1je4ftXZ7D57byHEmcuvqycgBBkbSTf1ITwyqtNDJUQEsS7dD3pPgEOfw3m/b/8g339JVUgfsqzxxId7JuiKCwtkc90oxpRugOKjEJ3a7jHsBz/ngL0PQTG92m7shsOhZzGn9F2jgn9QeKfHa2+5CKcIiz/hQX4+y3RprbnkpbXsz2m6n+2ZeSO4bmJvn8xBCHFmkkyX6FoNNnTWVjbbBpIQHtTp4eLDA6mw2qgbeCnk7YXC79s3QHUxHF3NkbiZAB5ZXgRjM/23dUONDzqyxFieg8rcxJf2s0iICPbInHJjJmDGDsfXeWS840XV+JtVh/4ee8f4rmzEsaJq9ueUc8nIRB66cBAPXTiIpMhgPt2T45P3F0KcuSToEl0rby+qvopNtgEkRHQ+6HL+wM9NdtSx2re8fQMc+ATsNnaEzcBsUkRZAjo9JzAyXduqYtGRveHAp+0fYP9HKDQfN0yipwe+TgDWHmdh1f7YOnvS0yGzuJpeURbMplb25WkN9TVgb2j2ki9rda07XAjAry4YyJ0z+nPnjP7MGtaDLceKqbU1n5sQQniKBF2ia2VuAmCrfSDxHsh0OYOuEw3R0Ots2N/OoGvfhxCVwl57KrGhAZhaCyDaKT4sEJsdrAMvMzJdVUXtnNcHlIYNJF0nkRjpmUxXfFQEW+yDsKd3vn4YwPHiqublItJXwt+HwzNJ8GQUPJ0Ar05qdil5r2gLWcU12O3aI3M5nfXphSRGBJEae3Lv2ZR+sdTa7Gw/Xur19xdCnLkk6BJdK2MjtZYEson10PKiMUZeuRWGXW4sMRYccq+zY2mRoZdTUFlHfJhnMkqN55XT62Kw29oXDJZlQeYm0qKNi7cTPZTp6hkZzHr7cAKK0qCyoNPjZRRVNz25WF0MH94BZn8YezNMfxCm/xoKD8HXjzfp2zvaQl2DnbwKz1bIP5XdrlmfXsSU/rFNTspO7BuN2aRYn17o1fcXQpzZ3Aq6lFIXKqUOKqUOK6UeaeH16Uqp7Uopm1Jq/imvNSildjp+rfDUxMWPgNaQsZG8yDEAHgm6nEuUeeVWGHqZ8dDdAOfA/4yAaNg8CiprPXZyEXCNlRXQD2IHwt5l7nd2LJGuCzqH6JAAgvzNHplTYmQQ6+zDjA+Oru7UWGU19ZRbbU2Drs8ehupCmP8mXPgMnPs749fZd8LWN+Dod66mrrIRRd5dYtyfU05pdT1T+sc0eR4W5M+IpAjX0qMQQnhDm0GXUsoMvALMAYYC1yqlhp7SLAO4BXi3hSFqtNajHb/mdnK+4sekLBMqskkPGo5SeGRPV2igH6GBfuSWWyG8J/Se5P6+rn3LISoFEkeRX17rsU30YCwvAuRX1sHw+XB8PZSdcHNeH0DiKPZaYzyW5QLoGRHMXp1KrTm0c0VbOVnuITnKsfSZ9jHseQ+mPQg9T7na6dzfQVQqrLgH6pz9jKArq6SmU/NoizOomtwvttlrU/rH
sCurjAprB4vqCiFEG9zJdE0ADmutj2it64DFwGWNG2itj2mtdwNS7Ea4L8PYz7VbDSYuNJAAP8+sdseHB5Jf7rhTcOjlkL+v7SXG6mI4sgqGzaNBQ1FVnVcyXQUVtTBiPqCN/WNtKTkGJ7bBsHnklFpJ9NDJRYDgADPhlkDSQ8Z0OtOVVWIET72iLcZ+tf/dDwkjYNoDzRsHWGDuS1ByFFY+DUDPyCCU8kHQlV5E//hQerSQVZ3SL5YGu2bz0c5d1SSEEK1x56dcEpDZ6OMsxzN3BSmltiqlNiqlLm+pgVLqdkebrQUFnd9bIn4gMtZDQBjba3vS00Obw8HY83Si1PHDe+hcQLW9xHjgf6AbYOjllFTX0WDXHqvRBWAJMDJw+RVWiOkHiaNh79K2OzoDs2HzyC6roWek5zJdAIkRwezwG2kEdyXHOzyOM1jqFWWBTx+EmlK4fCH4tXL6M3UanPUT2PgqZG0l0M9Mj7AgV/DmDXU2O1uOFjOlX0yLr4/tE0Wgn4l1h9t5yEEIIdzkTtDV0vGt9hwx6q21HgdcB/xDKdWv2WBav6a1Hqe1HhcXF9eOocUP2pHVkDKFzLI6kjwYdCVHWk4GXa4lxjaySvs+NJa8EkcZ2Sg8V6PLKS4s0DU2I+ZD9g4oSm97XklnUWlJpsJq82imC6BnRBCr6ztRP8whs7iasCA/IvI3G8uh5zwECW1cFn7BUxCaAF8aBWyTo4LJ9GLQtSOjhJr6Bib3b760CBDkb2ZcSpRsphdCeI07QVcW0LgEdjKQ7e4baK2zHb8fAVYBY9oxP/FjVZYFxenolGlkl3o2g9MrOpiCilqs9Y6aS8Muh/z9UHCw5Q7VxUYAOGweKEW+M+jy4PKiczzn2AybZ/y+94PWOxSlQ84uGHYFOY4g0uOZrsggNlXEGXvZ0v7X4XEyS2qMLNf6F8ESA5PvabtTUDhM+JmR8Sw8THJUsFeXF9elF2FScHbfljNdYOz1OpBbQWFlrdfmIYQ4c7kTdG0BBiilUpVSAcACwK1TiEqpKKVUoOPPscAUYH9HJyt+RBwn1yoSJ2Ott3s0g+PclO3Kdg2ZC8oEa54zTkyeavO/jKXFYcbqd4EXg65CZ9AVkQy9JxtLjC3NCWDnO8bvwy4nu8wopeDpTFdiRDBlVhv1gy419rTVlHRonMziasaH5BlXL024HfzdnOfo64x7Mne+Q69oi3FpdoN3toauP1zIiORIIoL9W20zxZEFW58uS4xCCM9rM+jSWtuAu4EvgDTgPa31PqXUU0qpuQBKqfFKqSzgKuCfSql9ju5DgK1KqV3ASuDPWmsJuoRxWs4SQ4a/cQehJ/d0OU/QubIm4Ykw4zew530j8Gpsz1JY9YyReUowLtz2VtAV3zjTBTDiSig4YGyUP9XR72DtP2DYFRCR7LVMl3O83KQLwV4PBz9v9xhaa7JKarjc+iH4BcP4n7nfOSwBBlwAuxbRK8KfBrsmp8zztboqa23szCxtdT+X0/Ce4YQF+rFeSkcIIbzAreNiWutPtdYDtdb9tNZPO549prVe4fjzFq11stY6RGsdo7Ue5ni+Xms9Qms9yvH76977VIQnNdi1965E0do4LZcyjewyIwjx6J4uV/mBRvuDpj8Io641Tsvtft94dmS1Ubyzz1Rj07ejWGZBRS2hgX5YAjx7H3xcWCCVtTaq62zGg2FXQGgPePcayN1zsmHZCXj/FmPD/aUvAJBdZkUpWjx11xnOzNmxwIEQ0Qv2f9TuMQor6witL2Rk0Rcw5noIOX1g08yYG6Aih+E1RvDpjSXGzUeLsNm1K5PVGj+ziYl9Y1gn+7qEEF4gFelFM3a75uY3NjPvlfXeWeopPgLlJyB1OtmODE6iBzM48WGB+JsVmcWNfngrZQQwfabAR47inEtugJj+sOAd8D/5/p4ujHpyXsZ7uDbTW6Lhlk/BLxD+cwmc2A62WnjvRuP3a94x9j0BOaU1js/Ls//J9nQEXTlltcYy
bPo3za7oaUtWSTW3+H2BSdtg0l3tn8SA2WCJpU/mB67xPG3d4SIC/Eyc1SeqzbZT+seQWVzjqj0mhBCeIkGXaGbxlkzWHi5kf045H2x3s4BnezhrQqWeQ06ZlQA/EzEhnrlYGsBkUiRFBjf/4e0XCNe8DZG9jTpSgWFwwzIIjmzSLL/c6vGTi9CoQGrjJcbY/vCTTyEoAv7vMnjvJmO58fJXIW6gq1lOmWdrdDn1iDDmlF1WY1Twb6iDQ1+0a4zs/EJuMH9NZd85EN23/ZPwC4BRC7Ac/ZJYVU6mFzJdG48UcVbvKLeq+TuzYRuOyL4uIYRnSdAlmsgrt/KnT9OY1DeGUckR/OPrQydPAXrKkdUQ1hNi+nGitIakyOAm9+B5QnKUpeVlKks0XPeeUTT1hmUQ0bzknLcyXU0KpDYWlQI/+QxC442N6FPvd9QXO8kbNboAAv3MxIUFklNqheTxEJbY7kvCw9MWEaGq8Z92b8cnMuYGlN3GTZaNHs901TfYOZRXwahekW03BvrHhWIJMLM/u30ZPyGEaIsEXaKJxz/aR12DnT9dMYJfzx5MdpmVdzdleO4N7HY49h30PQeU8ni5CKfTlh+I6QdX/xfih7T4ckGFt5YXHZmu8hY2ikckGYHXZa/Cub9v8pLW2uPV6BvrGRFkZLpMJmOJ8fDXUFvpXmdbLcMz3mY7QwhKPbvjk4gfAknjmKe+JcvDy3rpBZXUN2iGJIa51d5kUgzsEcaBXAm6hBCeJUGXcPl8bw6f78vl/gsGkhIbwtQBsUzuF8MrKw9TWWvzzJvk74fqIkidDkC2l4KJ5KhgCitr252ls9Y3UGG1eSXoirIE4GdSFLRWAyo03tiIbmq6BFZWU09NfYNH711sLDEi+OSJwaGXgc0Kh79yr/OWfxNVn8fyiBs6P5ExN9DLlkFY0e7Oj9XIgZwKAIYkhrvdZ0hiGAdyK9CtlfMQQogOkKBLAMYP9sc+2sewnuHcNjXV9fzXswdRVFXHG2uPeuaNnBcrp0yjvsFOfoXVo+UinDp6gXKeIwvljaDLZFLEhja6F9JN2aXGnLzxdQLjEEN2aY0RYPQ+G0Li3TvFWFMCq59lk3kMxT0md34iw6+k3hTITOtX1Nk8d4AjLbecALOJ1NgQt/sMTgintLqevHb+XQkhxOlI0CUAWLg6ncLKWv58xUj8Gp2QG9M7illDe/CvNUcoqarr/BsdXW1sto7sRV65FbuGJC8sL/aKdtbqat9S1fEio32faIvH5wSOq4DaWe08p8xxwtNLma6eEcFU1zVQXmMzsmxDLjU209e18bVb+3e0tYw/Wq8xLrrurKBwchLO5SLTRnKLPbe0dyCngv7xoe06+Tk4wViKTJMlRiGEB0nQJQBYfbCAs/vGMCI5otlrD84eRGWdjYVr2rgnsC0NNji2DlLPAbybweloputYURVAu7Ii7REfFtju7ImzGr03M13G+zi+ViOvhvpqo6ZZa0ozYeNCaoZcxZ6G3sYVQB5QNegKolUllfvbd4LydNJyytu1tAhGpgtOLk36UoNdy7KmED9SEnQJymrqScstZ2JqjFG4dPtbsPJPsPbvsHEhA/M+Y9bASD7fm9u5N8reDnUVjfZzOTM4ng8m4kIDCTCb2h90FVZjCTB7ZXkRoFe0hczi6nb9UM0prcHPsTTpDYmuWl2Or1Xvs2H8bbDhZWNTfUscAdnBob8ETt4C0FmhQ2dTrEMJPXiaOynboaiylvyKWrc30TtFWPzpGRHk8830dTY7V/y/9fzs/7ZJ4CXEj5BnS26LH6Rtx4vRGib2jYZ1L8DXjzdr82jsOVxQ9FPyy63Ed7Qq+s53wC8I+s0ETmZWvHF60WRSJEUFk9nO5cVjRVX0iQnxeAkLp9TYECprbRRU1Lr9dcwps9IjPAizyTtzcn79nZlHAGb90chKfvgL+MV6CI1rNKHdsGsxTLmX9LooIMMzy4tAYnQYi+yTWJC30ijSGtS+DNWpDuYa
mSpn5qo9BieGu/r7ysLV6ezKLAVg0eZMrpvY26fvL4TwLsl0CTYdKSbAbGJszUb4+gnjeprHiuE3OfDQUZjzLCmFq3nB/2W2HMnv2JtYy43rd4ZfCcFGVfDs0hqiLP4ev27H6bRlI1pxrKiKlBjv7OcC6BtnLFumF1S53cdbZTWc4sOMgM6V6QLjwur5r4O1DJb/wsiA1lthy+uw6FqjoOzU+8ksrkYpzwXOfmYTa4PPxd9eCwf+1+nx9ucYmaommS57A3zxW3h+EPz3Uvjyd8YdnFVNi6EOTgjjcH6lRzf1n87h/Ape/vYwF49MZHK/GJ7+ZL9XqvMLIbqOBF2CTUeLuSShhICPbofEUXDZK8aG6gCLUUx04s9pmPUMF5k303vV/cYPrfbavQTqq2DcT12Psku9c3LRKTkqmBPt+KFla7CTWVxNipf2c8HJvWJHC90PurxVjd7JbFL0cBZIbazHMJj9tFE+YulP4IWR8MmvjEuqFyyC4EgyS6pJCA8i0K/tSu/uKo0eTZ45wfg300kHciuICwskxrk0W281PpcNL0PCcKitgE3/hGU/hdfOgepiV9/BieHY7Jr0AjdrlnWC3a55ZNkeggPMPHHpMP5y5Ug08OgHe2SZUYgfEQm6znBVtTZOnMjksco/QEAILHjXCLZOYZ58F4sibmNE6dfw0V1G5sNdWsPWNyFhJCSNdT3OLq3xajCRHGWhsLKOmjr3gsScMiv1DZrUGO8FXT0jggnyN3HEzR/kdrsmt8zq0bspW5IYGXxyI31j42+DQRfBvg+NAqY3fwy3fQ19JgHGQQVPbaJ36hUTwid6qlFepDynU2MdyC13nUSkpgTemmeUw5j1tHEjwe2r4DfZcMMHUJkHy25z/U/FEEc/X+zremfTcbYeL+H3lwwlLiyQXtEWHr1oCN99X8jiLZlef38hhG9I0HWG25FRyp/MCwmvLzQCrhauxXHKH3EHf7PNh12LjEyBuzI3Qf4+GP9T4+Jph+zSGq+Ui3Bybu4+UepetsuZferjxeVFk0mREhPCETczXUVVddQ12F0XU3tLz8hGBVIbUwrmvwl3b4WbPjIOQTT6O8wqrvbYJnqn5Khg3rFOBG2Hvcs6PI6twc6hvEqGJoZDZQG8MQdObIUrX4fJd59saPaH/ufBnL8YF36veQ6AlNgQAswmr59gzC6t4S+fH2TagFiuHHvyv7/rJ/RmUt8Ynv4kjROlnr+PUgjhexJ0neHS927kfPMO6qc+CMnjTtt2fGoUL9rmkZ88G756HDI2uvcmW9+AwHAYPt/1qMJaT7nV5vXlRcDtC5SPe7lchFPfuBC3lxe9XaPLqWdEEDllVuz2FjKY/kEQO6DZ4zqbnZxyK8kermmWHGUh3Z5EbdxI2PNeh8c5WlhFnc3O4IRQ+N99UHwErl8KI+a33OGsn8Coa2HVn+H7r/E3m+gfH0qalzfTP/fFQRrsmmfmjWhygMNkUjw7fyQNds1L33zv1TkIIXzDraBLKXWhUuqgUuqwUuqRFl6frpTarpSyKaXmn/LazUqp7x2/bvbUxIVn9D70H6wEEnj2z9psO6ZXFP5mE+/0eAgie8P7P4GqwtN3qioylqZGXgOBoa7HOV6uPQXtr9V11MvlIpz6xoaSUVzt1gZtb1ejd0qMCKLOZqeoHQVwjSr20MvDmS7neFm9L4WcXZCf1qFxnMHS+KrVxqb8mb8x7vxsjVJw8d8gfih8cBuUZjA4MYwDOd5bXrQ12Pk6LY+5o3q2eAK0V7SFmYPjWH2oQPZ2CfEj0GbQpZQyA68Ac4ChwLVKqaGnNMsAbgHePaVvNPA4MBGYADyulIrq/LSFJ1iLTzCl+lv2xl9ibJhvQ3CAmRFJEazNqjMujK4ugg9uNy6xbs3Ot6GhDsbd2uSxc7nEm6fy4kIDCfAzuX0CzNvlIpz6xoXQYNdkuHGxs68yXYmRp9TqcoOzHIenykU4OTNnuyJnGSVG1rdjKbuRtJxyepgr
SNrwGCSdBZPubrtTgAWuecso5PvZIwxJCCe/opZiT9zG0ILdJ8qosNqYPjCu1TbTBsSRU2b1ySsTfeIAACAASURBVIZ+IYR3uZPpmgAc1lof0VrXAYuByxo30Fof01rvBk796Tsb+EprXay1LgG+Ai70wLyFBxStehU/7NSe9XO3+4xPjWZ3VinW2OEn98Cs/GPLG+trK42lxd6ToEfTOD3HBxkck0mRHOl+2Qhvl4twas8JxpwyK4F+JqJDArw6J+eesexTTzCeRmax8XX1dNDVIywQP5PicFUQjL0Zdi82KuC304Gccp61vIWqrYDLXgWzm6VJYvrB5Hvg4CecFXjcGMtLm+m/O1SIUjClf0yrbab2jwVgzaE2sspCiG7PnaArCWj8HS/L8cwdbvVVSt2ulNqqlNpaUFDg5tCiU+qqid7/Fl/bxzJsxBi3u01Iiaa+QbMjoxTOugVGXw/f/RXevQYq8k42zNoKC6dCyXGYcm+zcbJLazCbFPFh3s3gJLlZq8sX5SKc+sYay6zunGA0TngGeT375jwd2Z5MV1ZJNX4mRUJHi+W2ws9sIjEyyPh7m3yP8XD9S+0eJ+HEl5xTvxZmPALxg9vX+ew7ICiCYYcWAt67Dui77wsYmRRBpKX1oLpXtIW+sSF89718bxTih86doKul7/bubi5wq6/W+jWt9Tit9bi4uNbT7MKDdi8m2FbGlxHzT/sN/1Tj+kSjFGw5VmzsgZn7Msx51rjI+v9Ngv0rYPWz8PossNvglk9g0Jxm42SX1pDgxSrrTslRFrLcWsYzykX4ItMVYfEnJiSAI24USPV2jS6nmJAAAvxMLZ9gbEVmSQ09I4O98nfYK8piLAtH9oJRC2D7f40TiG4qKzjBr+r/SUHYEJjcPOhvU1AETLqbwPTPmRqS6ZVMV7m1nh2ZpUwb0Pb3vKkDYtl4pJhaWwdq5Akhug13gq4soFejj5OBbDfH70xf4S12O3rDq+zVfbH0n9aurhEWfwb1CDOCLgCTCSb+HG5fDeFJ8N6Nxr18w+bBHWshZUqL42SXebfKulNyVDBFVXVU19lO28651JfixRpdjbl7gjGntMbrNboAlFIkRgS57sN0R2ZxNb2ivRMQJkcFnzx1OuV+sNXCxlfc62y3Y//wF4RRw/Fpf3V/WfFUE++AoEh+5f8BB7xwgnFjehENds20AbFttp02II6a+ga2Hy/1+DyEEL7jTtC1BRiglEpVSgUAC4AVbo7/BTBLKRXl2EA/y/FMdKXDX6GKvue1+jlM6Nv6XpLWTEiNZtvxEmwNjbbwxQ+G276B85+E+W8YV8gER7Y6hrer0Tu5anW1scToq3IRTn1jQzlSePrlxdLqOrLLrPSLCz1tO09JdJSNcEeDXXMo7/+3d+fhUVbXA8e/d2ay7yuQBUIWdsIWdkRAUBQEVERc6gKtCy61Vlut1vqzttbWWtSqoOJWFUS0ioogSwHZ90BYk0jITkIgO1lm5v7+eGdiyDohswRyP8+Th8zMOzM3L5PMmXvPPaeMeAeNLSrIm8KyaqpqTRAaD/1nwa534bwNQcfORQTlbuIF4x107zvs4gfh6Q9jHmJo1U7c8w9gaqqcRjv8mHoGb3c9Q7q3vrdoVGwwBp1QS4yKcolrNeiSUhqBh9CCpaPAcinlYSHE80KIGQBCiOFCiGzgZmCxEOKw5b5ngT+jBW67gect1ymutOc9KtzDWGUeyYiere9abGh4TDCVNSYO5zZYcjG4w7hHtf6KLTCbJXkl550UdNlWNsJZ5SKseob5cKa8hpLztc0esy/zHABJPZyz4TciwMvmma70wnIqa0wkRjUfWLeHtUBthiUYZtxjUFMGu99p+Y65B2Dts6T4jWO153XtzxkccR/VbgEsEJ//PBY7+TG1kNGxIbgbWv/s6+fpxtDuQfyYqpLpFeVSZlOdLinlKillLyllnJTyL5brnpVSrrR8v1tKGSWl9JFShkgp+9e773tSynjL1/uO
+TEUm5UXQupatnhPIjLE/6LelKyBWt0SYxudKa+m1iSdEnTV1XxqpWzEKSeVi7CKtWEH495T5zDohMMCm0ZjCvMhr6SKksrmA0GrA1najNOg6ACHjKVvN38AjlgD+26JkHA1bH+j+bpd1eVaD0WfMP6sW0DfCDuMzdOfc4PvY5L+ALkpW9r/eBZZZyvJKKq0aWnRalxCKCm5JQ4rX6EoiuOpivSdTcoXIE28Xz6KIdEX92bexd+T7sHe7Dp5cUGXNVcnwsG1pwBC62p1tTLT5aRyEVaxYa3vYNyTcY7+Ef54uduvmXRLrMtcB7JbX8I7mF2Mr4ehbiemvcWG+uBh0P0cdAFM+TPoPeDdyXD8+wvvUHYavrofitKpnrGIPYWCwRf5+m4oeMJDnJN+dDuw0C6PB9TNWI2zIYne6oqEUKSErWnOm+0ymyVpBWXszzzHltQzrE7JU/XCFKUdVNDV2RxcRk3YQHaUd7Epl6Q5I3oGszvj7EVVyT5qqfDd29qI2IF0OkGPYG+On24+Edpklk4rF2HVPdgbvU40u4Ox1mQmObuYYT3avvx7sRKjAhACDmTaEnSVMDAyAJ2Ddp8a9Dr6dPW7cAk7vA/c+z+tJdHSW+HHV6AsH1b/AV5NhGPfweTnSDYMxGSWDOlun6DL3SeAlb6ziS/ZDlm77fKYP6YWEhHgSVyY7a+5xKhA/D0NTs3r+v0XB5n8ymZueHMbdyzZyf0f72Pu2ztsbiKvKMqFVNDVmRQeh9z9pHabBtCuN6URMcGcq6wlraDtn3pTckoI9HYj0gnLiwCj40LY2cJ2+9zi804rF2HlbtARHeTV7PLi0bxSqmrNDHNSPhdoeUMJ4b7szzrX4nHVRhNH80pJdNDSolW/CH+O5JVeGNj7R8A938OAG2H9/8Er/WDnW9D/Rq0p97hH2W/JhRtkx2XZrLjbOCv9kBtfbPdjGU1mtqad4YqEsMbL2VJq1fCboNcJxiWE8mPqGae0BDqeX8aKfdncNDSK9+8ezvL7RvParUMoLKvmw+0ZDn9+RbkcqaCrM0leBkLPGnEFHgYdfbr6X/RDDbfkde26iLyulFxtlsRZ+VPjLdvt955qOphwdrkIq9gw32aXavZkaGMd2sM5+VxWg6MDOZBV3OKb+rG8MmpN0q5BTVP6RQRQcr62rmVUHTcvuGkJXP0CDP2FFmzd8JZWSR4t36x7sDchvvbbFNGnRwSLjdMR6esha1e7HutgTgmlVUbG1c/nMtbAvo/gtcHwYhR8cjPsegfOnrzgvj+3BLJvUn9TXl1/Ah93A89M68vEPuGM6BnMjEERTOgdxlsb0ymtaj33T1GUC6mgq7Mwm+HgcoibxJY8wcDIAJt2TTUnJsSbUF8Pdrcxr6vGaOZ4fhn97ZHkbKNRcSEYdKLZNirWchHOXF4ErTxFRlEF5iZKEezNPEdkoJdTCqPWN6R7EMWVtWQUNb/x4KAl5ysxyrH/h/0jtA8FjXbJglaYd8zDcP2rdcGW1f7MYrstLVoNigrgI9MUqtyDoZ2zXVtSra1/QsFUq7XKen0YrHxYK8o6+DY4kwqrHteCsC9+WTf79XNLIMcuMR7NK2XVoXzuGRtDUIMWVI9f3ZuS87W8u/knh45BUS5HKujqLE5tgdJsagfOISW3tN1vSkIIRvYMZndGy0tRDZ04rc2SDIx0XtDl62FgaI+gZnNhrOUiwp1ULsIqNsyHqlozeaWNa2PtO3XOqUuLVtbXhXWJrinJ2SWE+Lg7fHm4T1c/hODCZPpW5JdUkV9aZbckeqvYMF907j5sCr0V0jdA5s6LfqwtqWcYEBFAsLsZlt0G3/4GfMPhts+1IsPTX4FfH4CH98G438Chz+G/94LJSHSwNz1CvNmWXmTHn66xV9el4udh4JfjYhvdNiAygGkDu7Fky0mKyqsdOg5FudyooKuzSP4M3P044j+eGqO5XUn0VsNjgsgpPt9qOYb6UnJKABgQefFLmxfjyl5hHM4t
5UwTbxLOLhdhZS3E2nAHY07xefJKqlwSdCWE++Hjrq8rCdGUg9nFlqR7x54vb3cDsaE+Tc90NeOAJR/N3kGXXicYEBnAkpqrwCcMNv71oh6notrI/qxzXBnrC5/dDqk/wLRX4JfroNfV2gyeVUgcTH4Opjyv7Tr+6n4wmxgTF8LOk0V2L9ZqdTi3hNWH87lnXE8CvN2aPOY3U3pxvtbEWxvTHTIGRblcqaCrM6iphCNfQf+Z7MvV8mPssfwy/CLqdaXkluDnaaB7sPOS1oG6ekhbmigu6exyEVZxdWUjLszPseaeuSLo0lvqgu1vZgdjRbWRtIJyp9UO6xcRwJHcEpuP359ZjLteR78I+wf1g6IDOZBXg3HMo/DTRjjydZsfY1fGWfSmKuZn/QHS1mu9S4fPvzDYamjsr+GqZ7UZr68WMDo2iLIqI4fbcF7aYuG6VPw8Dcwf17PZY+LDfblpaBQf7TjVpibpitLZqaCrMzi+CmrKIXEu+zOL6ervaZdcoT5d/fHzMLDrpO1LjIdyShkQ4bwkeqsBEQEEebuxucES4/kak9PLRViF+3ng465vtINx36lzeLvr6eOEkhpNGdI90LJ7svFuz5ScEszScUVRG+of4U9uSRXnbCwIuj+rmH4R/ngY7F/bLDEqgBqTmaPRcyFiKHzzayhtWyvZ3cezeN/9ZQLzt8GsN7WNALa44rcw8Rk4uIyr8rSq/I5YYkzJKWHtkdPMH9eTAK+mZ7msHrkqASklb/wvze7jUJTLlQq6OoPkpRAQDT3Gsj/rnN2SjPU6wbCYIJtnumpNZo7mlTp9aRG0el3jEsIabbd/a1M6tSbJVX3CnT4mIQSxYb5sTy+6IMDZc+osg6MDMehd8+s5ODoQo1nWLQXXdzBbu85ZM13WZPojea0vMRpNZg5ll9h9adHKulszObcCbnxHa8L91QPaJhVbVJcz9eAjjNQdRdywWEuYb4srn4Ahd+Cz8zVuDM5guwOCrve3ZuDnYWBeC7NcVtHB3lw/KIKvD+Q2GaAritKYCroud2WntcTfxDkUVtSSdfa8XXd2jegZTFpBuU0JtWkF5dQYzQxwYhJ9fVckhFJYVs2xfK1QatbZShZvSuf6QREkxTivCGl9D06M50RBGY8tP4DZLKmoNnI0r8wlS4tWg+uS6RsvMSZnFxMZ6EWoHcsxtKRfw3ZALTh+uozztSa771y0igryIsjbTdu9GRoPU1/Ulhl3vNn6navLqP3oRvrVHmFtnz/DoFsubhBTX4Lgnjxb+yrHMrKoMdoY8NmgqtbEmsP5XDuwK/6eLc9yWc0aHElZlZGNx53fiPuj7Rn88sPdbcopVRRXU0HX5S5lBUizZWnRUvvJDkn0ViNirHldrS8x/pxE75qga7yl5Yp1F+NfVx1FJwRPXdvHJeMBmDqgK09f15dVh/L566qjJGcVYzJLhrow6Ar38yQqyKvJIqkHs0scXiqivhBfD7r6e9qUv2RN/h8S7ZhzJ4SW72ad7WPoXdB7mlakNf9Q83esKoWPb0Kfu4dHah8ifMztFz8ID1+48V38jUU8I9/mYCuFbNti3dHTlFcbmTU40ub7jIkLIdTXna8P5NhtHK0xmSXPrTzMs18fZsOxAqa/vsXhJTQUxV5U0HW5S16q5Z+E9WJ/VjEGyy4sexkYpdX7smWJ8XBuKT7ueno6uQipVdcAT3p18eXH1DNsSzvD9yn5LJgQ55TG2y2ZP64nd4+J4d0tJ3n+2yMADHVQ4GCrwdGBjdoBnauoIfNspdOWFq36RfjbtINxf2YxwT7uRAc77v9zUFQAJ06XUVlj1JLfZ7wOXkHw6VzY9x+tyKmV2Qwn1sAH0yBnL592/z9+dBvX/nIpUcOoHvc7rtfv4Oz2/7Tvser5an8uXfw9GBkbYvN9DHod0xMjWH+swCnFUs/XmLj/4718sC2DeWN7svaxK+nq78ld7+/itfWpTda8U5SORAVdl7PTh7VP4IPmAlrtpX4R
/ni62S/J2MOgZ3B0oE1B16GcEvpF+DusX58txieEsfPkWf608jDRwV78anzjOkTOJoTgj9P7cXW/LhzLL6NXF99mt+o7y5DuQeSWVJFf8nMNsYOWmcpBTpzpAi2vK72wvNW8oQNZxQyJDnToJo3EqEDMsl7BVp8QmPspeAfDyofgtSGwczHseAteHwqfzoGKQrjlY94uHKAV6rVDrp7XxMdJMfRnXOpLcLb9RUqLK2vYdKKAGYMi0Df1+1l5Vpuxa8LMwRHUGM2sSclv9zhacraihrnv7GDd0dP86fp+PHt9P+LCfPlywRhmDY7klbUneGTZfoeOQVHaSwVdl7PkZaAzwICbMJrMHMwuYYgDkoxH9gzmcG4pFdVN94wDbUngSG6py5YWra7oFUaN0UxqQTlPX9fPrgFoe+h1glfnDuHKXmHMbMPyjqNYk9EP1Fu+OmhZvhvggqDLLKnLxWtKyXmtD6ijkuitrP0mk+vXMYtKgvs2w+0rICAKvv8drH5SK3g6+z149BBZYVeSebaSsXG2zyK1SKdnQ98XqDELzMvvhtrGBXbbYtWhfGpN8ufXnskIGVtg3f/B4vHw957a1wfTYdvrWsV8i8HRgXQP9mZlctt2crbVP9Yc40huCW/dPox7xv6c6O/tbuCVOYN4ZFI83x7MY1t6050nFKUjsCnoEkJMFUIcF0KkCSGebOJ2DyHEZ5bbdwohYizXxwghzgshDli+Ftl3+EqzzCatrk/8FPAJ5cTpciprTHYpitrQ8JhgTGbJvhaqmP9UWM75WhMDnNj+pykjewbj5aZnXHwo1/Tv4tKxNOTlrufDeSN4cGK8q4dC/wh/3PSC/ZnFpOSUsHDdCT7dlUlsmI/NSdb20q+b9pppKa/L2pposIOS6K3C/TzpFuD5c16XlRCQMAXmr4FfbtCCsPk/wICbQO/G1jQtELig32I7Dejfn9/W3o8uPxl+eKZdj/XVgRziwny03aIFR+Gdidqy6NZXwc1HK1cx5hFtxuuHZ+DfSbBiPtRUIIRg5uAItqadoaCsfcFfc7LPVfL5nmzmDu/O1AFdG90uhGDBxHi6+nvyr7UnnNIQXFEuhqG1A4QQeuANYAqQDewWQqyUUh6pd9h84JyUMl4IMRd4CbBuz0mXUg6287iV1pzcDGV52g4rYO8pbfnPETMBQ3sEodcJfkw9wxWWZPWGUnJdm0Rv5emmZ8UDo4kM9HJ6rbBLiaebnn4RASze/BOLN/+EEDAkOpBHJ/dy+liig73w8zC0uIPxQGYxQmgFTB0tMSqgLshrUtSwRldtTS8i3M+jriCuPQyPCeZXJLEn4naSdr8DMWOh/w1tfpyc4vPsOnmW306OR2x/A9Y/Dx5+MGsR9LlO6wdpNflPUJypNefe/DIUHIFbPmbm4Ahe35DGt8l5NpWbaKu3NqYjBNw/Ia7ZYzzd9Dw4MY4/fn2YLWnN/y1SFFeyZaZrBJAmpfxJSlkDLANmNjhmJvCh5fsVwFVCvaO5VvIy8AiAXtcCsPF4IVFBXvRwQOV1Xw8DU/p2YdmuTMqaSaZNySnF001HXJhrkujr6x8RQKC3e+sHdnLzxsYwLbEb/5idyO6nJ/PlgrGM7+X8NzIhBH1bSKY3myXfHsyjb1d/p8zCJUYFklFUSUmlbYnjZrNkW9oZxsaH2jXQ9/N0Y2BkAP8wzoWo4fD1w1DU9rY8Kw/k0pUifpXxG/jhaYi/ChbsgMG3XhhwWQV2h0nPwB1fQFk+vD2B+HNb6B/hz9cOWGLMLT7P8j1Z3JwU3Wq/zznDo4kI8OQVNduldFC2BF2RQFa9y9mW65o8RkppBEoAa/JCTyHEfiHEJiHEFU09gRDiXiHEHiHEnsJCtfW33arL4eg30H8WuHlSVWtia/oZJvUJd9jszgMT4iitMrJ0V2aTtx/KKaFvN3+XFfxU2m7m4EjeuG0oNydFO60uV3P6R/hzLL+U8zWNk+m/O5TH8dNlLc6C2JO15MqWNNty
h46fLqOoooax8fZbWrQaExfC3uxyKme8Czo9LL8Tygva9BhZu79hjdfTeBYka22J5n4KvjYE1/FXwX2bIDgWls7lydCtJGcVk9Ggw0J7Wfs7LrDh/9fDoOehSQnszyxmkyojoXRAtrwDNvUu3fAjRHPH5AHdpZRDgMeAT4UQjcqRSynfllImSSmTwsLUlHC7JS+F2goYcgeApeK5mUkOrLo+KDqQsfEhvPvjSaqNF74xmi1J9O3eKq90WtMGdqOq1swra49fcL3JLFm47gS9uvgyfWA3p4xlRM9guvp7smJvVusH83O/z7Hxdkqir2d0XAhGs2TnOW8tab8oHd65SsvLao3ZxJlvnuOF8ucw+4TDvRu1tkRt+WAW2B3mrYFe13LFiRe5Q7+Wr+xYsyu/pIrPdmcxe1gUUUENZumNNXD2ZKNNBNqxXiq3S+mQbAm6soHoepejgIZzyHXHCCEMQABwVkpZLaUsApBS7gXSAecnhXQmZrNWITsyCaJHALDhWAFebnpGtaH+zsV44Mp4Csqq+XLfhX90v9yfQ3m10en1nZTLR1JMMLeO6M6SLScv2Dm4MjmH9MIKfjO5l9NKkeh1ghuHRrLpROEFJTWaYjZLlu3OZGBkgF36nTaU1CMYb3c9qw/lazNP96wCUw0suRrS1jV/x9z98PGNhO79F1+Zx2Gavw5CEy5uEG6eMOdD6HUtL7i9T+2OdzHZqV7Wok3pmKVkwYR4qCqBjS9pOyj/NQD+0gVeGwwvJ8DXD8HJH8Fsxt2g45FJCSRnl7DhWNtm/RTF0WwJunYDCUKInkIId2AusLLBMSuBuyzfzwY2SCmlECLMkoiPECIWSADaX1RGad6J1VrdntEPAiClZMOxAsbGhzi8PMLY+BASowJYvCm97o/u9vQinvryIKNjQ5gxKMKhz69c3p66rg9hfh78/ouD1BjNGE1mXl2XSt9u/lzTv/GONke6OSkas4Qv92e3eNym1ELSCyuYNy7GIePwctczPbEb3x7M1Uq2RA6FX62HwB7wyRxY/QdI+UKb+aqt0tIO3rsW3p6AzN7Nc/JeNvZ9ntCgdu5qNnjAnA853XUiTxgXk/bdwnb/bKdLq/h0Vya3DA4m+vAiWJgIG/8Kteehx1gY/wRc/yr0vR4O/xc+nA4LB8KxVdwwNJIeId4sXJeqZruUDqXVoMuSo/UQsAY4CiyXUh4WQjwvhJhhOWwJECKESENbRrSWlRgPHBRCJKMl2N8vpbStO7Jycba/oTW37qv916QWlJNTfJ6JTmjoLITggSvjyCiq5PuUPNIKyrjvP3voEeLDol8Mw92g8rmUi+fv6cYLswZyLL+MxZvS+XJ/DhlFlTw2xXmzXFY9Q30YERPM53uyW3xTf2/LScL9PJg20HEfOOYkRVNRY+K7Q3naFQFRMO976Dsddi6CFfPgzVHazNBnd0BJNlz9Fz4bt4YPqidwt712Gxo8CLp7KZtFEr33PqfNSrUj4Fm8MZ0b2cDzJ2/XWi1Fj9RKcfxqPdy4GCb+AYbdDbPehMdT4aYlWpHaZbfitumvLBgfw6GcErY5oDG4olysVktGAEgpVwGrGlz3bL3vq4Cbm7jfF8AX7RyjYqvcA3BqC1z9Aui1/1rr9Loj87nqu6Z/V2LDfHh9fRoVNUbcDTrev3s4AV6urbCuXB6m9OvC9MRuvL4hjSAfbffe5L7OeW03NDspit+tOMjeU+eabJh+4nQZP6ae4fGrezn0A8ewHkHEhvrw+Z4s5iRZMkE8/GDOR2CshjMntJmuwuPQdSD0mY7U6XnnlU0kRgXYtWCyu6cXu4YvpGD7H5i98a9QfAqmLwRD23YLn8k+zuQ99zLGkAJho2Hyp9B9ZAtP7A0DZ0Of6bDqt7D5H8yO3cvbvneyaFO6QzYxKMrFUFMPl5Mdb4K7Lwy9s+6qDccK6NvN3yH5JE3R6QT3XxnH8dNlnCmvZsldw4kOtn+ZCqXzem5Gf7w99JwureaxKb1c
Vm9t2sBueLvrWb6n6YT697eexMOg47aRPRw6DiEENydFszvjHD8Vll94o8FDC7QS58BVf9R2NOsNbEk7Q3phBXePibH7+btlVBxPGO9jW/S9cOAT+GS2lo9lC5MRdryF/3vjGSjSOTPhJbh7VcsBV31untoOzOkL0Z/awpeGpzmZdoSUHBufX1EcTAVdl4vSXC13Y+iddbV1Sipr2XvqHJP6OHdH6KzBkcweFsWiO4Y5pVil0rmE+nrw+q1DWDAhjgm9Xbfb2cfDYMmnymvUAutsRQ1f7svhxqGRBPs4vibcTUMj0esEn+9tOcfM6sNtGYT6ujMt0f47PqODvRmfEM5jp6dimvEWnNoKb42FnW9DTTPlJGqrYPcSrV/l6ifZZurLvxI+InTC/aBr49uUEJB0D9zzPX5UsMz9Lyxfv739P5ii2IEKui4Xu94BaYaR99VdtSm1EJNZMqmPc9vduBt0vHzzICb0ds2yj3L5uyIhjN9N7ePyrgI3J0VTWWNilTWfyuKTHaeoNpqZN9b+1dmbEu7vyYReYXyxNxujydzisaeKKlh/rIDbRnTHw+CYzTW3j+xOfmkVGzyvgru+Bf8I+P4Jbdfh/16E1LXah8S9H2qXX02E7x4DnzA+T/g782oe5/ZrxrRvEFFJ6O78ilBDJfPSHiEnU+3hUlxPBV2Xg/JC2LME+kyDoJi6q/93rIBgH3eHNwFWlM4qqUcQPUN9+Gx3FuXVRsxmSY3RzEc7TjG+VxgJXfycNpabk6IpKKtmc2rLRUE/2n4KvRDcPspxy56T+oTT1d+TT3aegh6jtT6U89ZoyfCb/qYtOa6YB988ol0O7wt3fUPR3O949lgPrh8UaZ+WSRFDqLh5OaGiBPdPZrW5cKyi2JtNifRKB7f2j1BTCZP+WHeVySzZeLyACb3D0Tt5Z5eidBZaPlUUf199nAF/WoMQ4OWmp7LGxD9mxzh1LJP6hBPi487y3dnNzm5XVBtZvieLawd2o4u/p8PGYtDrGeYs7gAADIZJREFUmDM8mtc3pJJ1tlLL6+w+Cm5bppW0qSgCD18tB9XTvy4l4t3Vx6gymnh4kv2avof0GcebPf/B3Scfx/jBDAz3fAc+jq1ZqCjNUUHXpe7kZq0C/RWPQ1jvuqv3ZZ7jXGWtU0pFKEpnds+YnoT6elBcWUN5lZHSKiOB3m6Md3LDZXeDjhuGRPLBtgwKy6oJ87uwdZPRZObp/x6irMrI3WNiHD6eucOj+feGVP69IY2XZif+fENwrPbVwJnyaj7alsG0gd2ID7fvDOHV193A/IX5fHT2ZfjPLLhrJXi1szaZolwEFXRdyozV8O1j2pLi+McvuOnfG9II8HJjogsTjRWlM/By1/9cqsHF5o6I5v1tGdzy9nYW3zGsbnnTaDLzm+XJfJOcyxPX9GZYD8cHHBGBXtw7Po5Fm9IZlxDK9S0UR641mXnwk33UmiWPTr7IyvgtiA/3I6j/VTxwzMQ7Ba8gPr4JfvGVNsumKE6kcrouZdteg6JUuO6f4PZzSYjt6UVsOlHIgxPj8PNU9bEUpbOID/fj4/kjKT1fy8w3tvL1gRxqTWYeWbafb5JzefLaPjw40X5Ld6357dW9GNYjiKe+PNRiI+znvznCzpNn+ftNiXaf5bJ6Zlo/tosh/Cv4GWRespZXVl3e+h0VxY5U0HWpOvsTbH4Z+s2ChMl1V0speWn1MboFeHLn6BjXjU9RFJcYHRfCd49cQb9u/vx62QGmLtzMqkP5PDOtL/dfGefUsbjpdbx26xD0OsFDS/dRbTQ1OubTnZn8Z8cp7hsfy6whkQ4bS0SgF49f05vXsuPZk/RPyN4DH98IlapJiuI8Kui6FNVUwJf3gc4Npr54wU1rDp/mQFYxj05OcHivRUVROqYu/p4svXcU88f15OSZCp6d3o9fXtE4j8oZIgO9ePnmQaTklPLiqmMX3LY74yx/WpnClb20EiCOdufoGBKjAnhgXyQVM97Vungs
mQJnTzr8uRUFQHS0ZqBJSUlyz549rh5Gx1VTCZ/O0QoOzn4P+t9Qd5PRZOaahZsBWPPoeAx6FVMrSmdXWWPE29316bvPf3OE97aeJDbMB70QCAG5xVWE+3nw3wfHOq1VWEpOCTPf2MqcpCheHFYBy24FnQFu+wwihzllDMrlRQixV0qZZMux6l35UlJbBZ/dDhlb4IbFFwRcAF/syya9sIInrumjAi5FUQA6RMAF8OS1fbhvfCx9u/mT0MWXuDBfJvcNZ4mTe7MOiAxg3tgYlu7KYqepF8xfq+XEfjAdkpe1q0m3orRGzXRdKozV8NkdkPoDzHwThtx+wc0V1UYmv7KJLv6e/HfBGJdX6lYURemoKmuMXLNwM8WVtSy6Yxhju5q1v69ZOyF+Ckx/BQK7u3qYyiVCzXRdbn7aCG9P1AKu6QsbBVw/FZZzw5tbOV1axVPXur41iqIoSkfm7W5g2b2j6RbgyV3v7eLzY9Vwz/cw9SU4tQ3eGAU7FoGxxtVDVS4zKujqyIrSYelt8NFMqCmHuZ9qjVzrWXvkNDP/vZXCsmo+mjeSkbGq0rKiKEprIgO9WPHAGEbFhvDEioP8c10acuR98OAOrXXR6t/Dv/rB+uehONPVw1UuEzYtLwohpgKvAnrgXSnl3xrc7gF8BAwDioBbpJQZltueAuYDJuARKeWalp6rUy8vSglnUiFtrTarlbEVDB5wxW9h1AJw87QcJsk8W8nSXVks2pTOwMgA3rpjKFFB3i7+ARRFUS4ttSYzz/w3hc/2ZNGnqx+zhkQyI7EbEUXbYPcSOLFaOzB2IvQcD91HQ8Rg7W+zotC25cVWgy4hhB44AUwBsoHdwK1SyiP1jlkAJEop7xdCzAVukFLeIoToBywFRgARwDqgl5SycbEWiw4bdElpSbDU/pXSjFlKzGYzZmlGmiVmacZslphNtWA0YjbVYDbVIk21mI1GpKkGWVsF1WVQXQpVpegr8jCUZuJWmoV7yU+4VeQDUBWYQGn0JLJ630OxPpjyaiPFlbUcyCpmx09F5JVUATAnKYrnZw5Q5SEURVEukpSSz/dk8+muTA5kFSMEjIgJZkBkAHHu50gq+pro3DV4lWUAYNa7UxXUm1rfyLovk1cw0t0P6e6H8PAFgwdCb0CndwedHp3BHZ3egNC7af/qdOgE6HQCnbB+r0MnBDoh0Ot1CATUpYs0+B5auNzge8Wh7B10jQaek1JeY7n8FICU8sV6x6yxHLNdCGEA8oEw4Mn6x9Y/rrnnc0rQ9c8+UFVyQRDV1L9SSgSO32hQIAPJkmFkynD2mHuz0TSIHJpu3xPq687I2BBG9QxmdFyIw6o3K4qidEYZZypYmZzLqkN5nCqq5Hztz3MEoZQwTHeCJN1xeolsIkQRkeIM3qLahSNuG7PUgrH672wS0cz3Vq3d3vi2pi7X96R8kHWMtGXI7RIX5ss3D49z6HO0JeiyZS9xJJBV73I2NDpTdcdIKY1CiBIgxHL9jgb3bVRyWAhxL3Cv5WK5EOK4LYO/SKHAGQc+/kUoBaw5A9+3eOQpYC/wpoNHZNEBz1WHpM6T7dS5so06T7Zx6nmy/v1921lPaF8d7DX1nFOe5SggHmnTXS7mPPWw9UBbgq6mQtWGAW5zx9hyX6SUb+Ok17EQYo+tEWlnp86VbdR5sp06V7ZR58k26jzZTp0r2zj6PNmyezEbiK53OQrIbe4Yy/JiAHDWxvsqiqIoiqJc9mwJunYDCUKInkIId2AusLLBMSuBuyzfzwY2SC1ZbCUwVwjhIYToCSQAu+wzdEVRFEVRlEtHq8uLlhyth4A1aCUj3pNSHhZCPA/skVKuBJYA/xFCpKHNcM213PewEGI5cAQwAg+2tHPRSS7R5XiXUOfKNuo82U6dK9uo82QbdZ5sp86VbRx6njpcGyBFURRFUZTLkapIryiKoiiK4gQq6FIURVEURXGCThV0CSGmCiGOCyHShBBPuno8HZEQIloI8T8hxFEh
xGEhxK9dPaaOTgihF0LsF0J86+qxdFRCiEAhxAohxDHLa2u0q8fUEQkhfmP5vUsRQiwVQni6ekwdhRDiPSFEgRAipd51wUKItUKIVMu/Qa4cY0fRzLn6h+X376AQ4r9CiEBXjrEjaOo81bvtcSGEFEKE2vM5O03QZWln9AZwLdAPuNXSpki5kBH4rZSyLzAKeFCdp1b9Gq0Gn9K8V4HVUso+wCDU+WpECBEJPAIkSSkHoG1cmuvaUXUoHwBTG1z3JLBeSpkArLdcVpo+V2uBAVLKRLTWfk85e1Ad0Ac0Pk8IIaLRWh/avdN5pwm60Po/pkkpf5JS1gDLgJkuHlOHI6XMk1Lus3xfhvbm2KiLgKIRQkQB04B3XT2WjkoI4Q+MR9vljJSyRkpZ7NpRdVgGwMtS79AbVdewjpRyM9ru+PpmAh9avv8QmOXUQXVQTZ0rKeUPUkqj5eIOtLqZnVozrymAfwG/o4li7u3VmYKuptoZqWCiBUKIGGAIsNO1I+nQFqL9cppdPZAOLBYoBN63LMO+K4TwcfWgOhopZQ7wMtqn6zygREr5g2tH1eF1kVLmgfaBEQh38XguFfNoredcJyWEmAHkSCmTHfH4nSnosqklkaIRQvgCXwCPSilLXT2ejkgIMR0okFLudfVYOjgDMBR4S0o5BKhALQM1YslHmgn0BCIAHyHEHa4dlXK5EUI8jZZG8omrx9LRCCG8gaeBZx31HJ0p6FItiWwkhHBDC7g+kVJ+6erxdGBjgRlCiAy05epJQoiPXTukDikbyJZSWmdMV6AFYcqFJgMnpZSFUspa4EtgjIvH1NGdFkJ0A7D8W+Di8XRoQoi7gOnA7VIV6WxKHNqHnmTL3/UoYJ8Qoqu9nqAzBV22tDPq9IQQAi335qiU8hVXj6cjk1I+JaWMklLGoL2eNkgp1cxEA1LKfCBLCNHbctVVaF0qlAtlAqOEEN6W38OrUBsOWlO/Bd1dwNcuHEuHJoSYCvwemCGlrHT1eDoiKeUhKWW4lDLG8nc9Gxhq+RtmF50m6LIkEFrbGR0FlkspD7t2VB3SWOAXaLM2Byxf17l6UMol72HgEyHEQWAw8FcXj6fDscwErgD2AYfQ/j6r1i0WQoilwHagtxAiWwgxH/gbMEUIkYq22+xvrhxjR9HMufo34AestfxdX+TSQXYAzZwnxz6nmmFUFEVRFEVxvE4z06UoiqIoiuJKKuhSFEVRFEVxAhV0KYqiKIqiOIEKuhRFURRFUZxABV2KoiiKoihOoIIuRVEURVEUJ1BBl6IoiqIoihP8Pxrw5UqK+ZpVAAAAAElFTkSuQmCC\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "plt.figure(figsize=(10,3))\n",
+ "sns.distplot(approved_word_count, hist=False, label=\"Approved Projects\")\n",
+ "sns.distplot(rejected_word_count, hist=False, label=\"Not Approved Projects\")\n",
+ "plt.legend()\n",
+ "plt.show()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "-rcU9zgXoc6Q"
+ },
+ "source": [
+    "### 1.2.7 Univariate Analysis: Text features (Project Essays)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "Aa4a8iSioc6R"
+ },
+ "outputs": [],
+ "source": [
+    "# merge the four essay columns into a single text column:\n",
+ "project_data[\"essay\"] = project_data[\"project_essay_1\"].map(str) +\\\n",
+ " project_data[\"project_essay_2\"].map(str) + \\\n",
+ " project_data[\"project_essay_3\"].map(str) + \\\n",
+ " project_data[\"project_essay_4\"].map(str)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "hrGwRq-Hoc6U",
+ "outputId": "e941226f-9b5c-413e-c8a4-e6223108bd54"
+ },
+ "outputs": [
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "#How to calculate number of words in a string in DataFrame: https://stackoverflow.com/a/37483537/4084039\n",
+ "word_count = project_data['essay'].str.split().apply(len).value_counts()\n",
+ "word_dict = dict(word_count)\n",
+ "word_dict = dict(sorted(word_dict.items(), key=lambda kv: kv[1]))\n",
+ "\n",
+ "\n",
+ "ind = np.arange(len(word_dict))\n",
+ "plt.figure(figsize=(20,5))\n",
+ "p1 = plt.bar(ind, list(word_dict.values()))\n",
+ "\n",
+ "plt.ylabel('Number of projects')\n",
+    "plt.xlabel('Number of words in each essay')\n",
+ "plt.title('Words for each essay of the project')\n",
+ "plt.xticks(ind, list(word_dict.keys()))\n",
+ "plt.show()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "rhK9C3-Foc6W",
+ "outputId": "8562b5fc-14a8-48f3-9c59-924949aaa938"
+ },
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "D:\\installed\\Anaconda3\\lib\\site-packages\\matplotlib\\axes\\_axes.py:6571: UserWarning:\n",
+ "\n",
+ "The 'normed' kwarg is deprecated, and has been replaced by the 'density' kwarg.\n",
+ "\n"
+ ]
+ },
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "sns.distplot(word_count.values)\n",
+ "plt.title('Words for each essay of the project')\n",
+    "plt.xlabel('Number of words in each essay')\n",
+ "plt.show()\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "yxJn8uEooc6Y"
+ },
+ "outputs": [],
+ "source": [
+ "approved_word_count = project_data[project_data['project_is_approved']==1]['essay'].str.split().apply(len)\n",
+ "approved_word_count = approved_word_count.values\n",
+ "\n",
+ "rejected_word_count = project_data[project_data['project_is_approved']==0]['essay'].str.split().apply(len)\n",
+ "rejected_word_count = rejected_word_count.values"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "aWXaw61joc6a",
+ "outputId": "eb06fdff-4c51-41c6-c268-94a40a2b12f5"
+ },
+ "outputs": [
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "# https://glowingpython.blogspot.com/2012/09/boxplot-with-matplotlib.html\n",
+ "plt.boxplot([approved_word_count, rejected_word_count])\n",
+ "plt.title('Words for each essay of the project')\n",
+ "plt.xticks([1,2],('Approved Projects','Rejected Projects'))\n",
+ "plt.ylabel('Words in project title')\n",
+    "plt.ylabel('Words in project essay')\n",
+ "plt.show()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "gC2O5Xhqoc6d",
+ "outputId": "6a594b11-9ef3-4499-de8b-0936e6bba97a"
+ },
+ "outputs": [
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "plt.figure(figsize=(10,3))\n",
+ "sns.distplot(approved_word_count, hist=False, label=\"Approved Projects\")\n",
+ "sns.distplot(rejected_word_count, hist=False, label=\"Not Approved Projects\")\n",
+ "plt.title('Words for each essay of the project')\n",
+    "plt.xlabel('Number of words in each essay')\n",
+ "plt.legend()\n",
+ "plt.show()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "SBVSAdFRoc6f"
+ },
+ "source": [
+ "### 1.2.8 Univariate Analysis: Cost per project"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "xk4fJ4rKoc6g",
+ "outputId": "6ab35ab8-9bff-4637-8bcc-96f02a90b4ee"
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
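+       "<div>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>id</th>\n",
+       "      <th>description</th>\n",
+       "      <th>quantity</th>\n",
+       "      <th>price</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>0</th>\n",
+       "      <td>p233245</td>\n",
+       "      <td>LC652 - Lakeshore Double-Space Mobile Drying Rack</td>\n",
+       "      <td>1</td>\n",
+       "      <td>149.00</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>1</th>\n",
+       "      <td>p069063</td>\n",
+       "      <td>Bouncy Bands for Desks (Blue support pipes)</td>\n",
+       "      <td>3</td>\n",
+       "      <td>14.95</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"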
+ ],
+ "text/plain": [
+ " id description quantity \\\n",
+ "0 p233245 LC652 - Lakeshore Double-Space Mobile Drying Rack 1 \n",
+ "1 p069063 Bouncy Bands for Desks (Blue support pipes) 3 \n",
+ "\n",
+ " price \n",
+ "0 149.00 \n",
+ "1 14.95 "
+ ]
+ },
+ "execution_count": 35,
+ "metadata": {
+ "tags": []
+ },
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# we get the cost of the project using resource.csv file\n",
+ "resource_data.head(2)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "gLd9rR8Goc6i",
+ "outputId": "4c5d6f5a-c1dd-49d1-a1d4-0f3a77900919"
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
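+       "<div>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>id</th>\n",
+       "      <th>price</th>\n",
+       "      <th>quantity</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>0</th>\n",
+       "      <td>p000001</td>\n",
+       "      <td>459.56</td>\n",
+       "      <td>7</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>1</th>\n",
+       "      <td>p000002</td>\n",
+       "      <td>515.89</td>\n",
+       "      <td>21</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"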
+ ],
+ "text/plain": [
+ " id price quantity\n",
+ "0 p000001 459.56 7\n",
+ "1 p000002 515.89 21"
+ ]
+ },
+ "execution_count": 36,
+ "metadata": {
+ "tags": []
+ },
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# https://stackoverflow.com/questions/22407798/how-to-reset-a-dataframes-indexes-for-all-groups-in-one-step\n",
+ "price_data = resource_data.groupby('id').agg({'price':'sum', 'quantity':'sum'}).reset_index()\n",
+ "price_data.head(2)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "-510M9TKoc6k"
+ },
+ "outputs": [],
+ "source": [
+ "# join two dataframes in python:\n",
+ "project_data = pd.merge(project_data, price_data, on='id', how='left')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "WxMMxRiEoc6p"
+ },
+ "outputs": [],
+ "source": [
+ "approved_price = project_data[project_data['project_is_approved']==1]['price'].values\n",
+ "\n",
+ "rejected_price = project_data[project_data['project_is_approved']==0]['price'].values"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "Irsdqxnuoc6r",
+ "outputId": "e4ad6e99-39c0-4e72-dfca-189446a9b567"
+ },
+ "outputs": [
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "# https://glowingpython.blogspot.com/2012/09/boxplot-with-matplotlib.html\n",
+ "plt.boxplot([approved_price, rejected_price])\n",
+ "plt.title('Box Plots of Cost per approved and not approved Projects')\n",
+ "plt.xticks([1,2],('Approved Projects','Rejected Projects'))\n",
+    "plt.ylabel('Cost of a project')\n",
+ "plt.grid()\n",
+ "plt.show()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "WjeL5bs5oc6s",
+ "outputId": "d7f9b31c-6f88-494d-979e-5c3ff95eb5f3"
+ },
+ "outputs": [
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "plt.figure(figsize=(10,3))\n",
+ "sns.distplot(approved_price, hist=False, label=\"Approved Projects\")\n",
+ "sns.distplot(rejected_price, hist=False, label=\"Not Approved Projects\")\n",
+ "plt.title('Cost per approved and not approved Projects')\n",
+ "plt.xlabel('Cost of a project')\n",
+ "plt.legend()\n",
+ "plt.show()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "m1YvWrg5oc6v",
+ "outputId": "55ff1776-ae3d-45fa-ae25-1eae0176070c"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "+------------+-------------------+-----------------------+\n",
+ "| Percentile | Approved Projects | Not Approved Projects |\n",
+ "+------------+-------------------+-----------------------+\n",
+ "| 0 | 0.66 | 1.97 |\n",
+ "| 5 | 13.59 | 41.9 |\n",
+ "| 10 | 33.88 | 73.67 |\n",
+ "| 15 | 58.0 | 99.109 |\n",
+ "| 20 | 77.38 | 118.56 |\n",
+ "| 25 | 99.95 | 140.892 |\n",
+ "| 30 | 116.68 | 162.23 |\n",
+ "| 35 | 137.232 | 184.014 |\n",
+ "| 40 | 157.0 | 208.632 |\n",
+ "| 45 | 178.265 | 235.106 |\n",
+ "| 50 | 198.99 | 263.145 |\n",
+ "| 55 | 223.99 | 292.61 |\n",
+ "| 60 | 255.63 | 325.144 |\n",
+ "| 65 | 285.412 | 362.39 |\n",
+ "| 70 | 321.225 | 399.99 |\n",
+ "| 75 | 366.075 | 449.945 |\n",
+ "| 80 | 411.67 | 519.282 |\n",
+ "| 85 | 479.0 | 618.276 |\n",
+ "| 90 | 593.11 | 739.356 |\n",
+ "| 95 | 801.598 | 992.486 |\n",
+ "| 100 | 9999.0 | 9999.0 |\n",
+ "+------------+-------------------+-----------------------+\n"
+ ]
+ }
+ ],
+ "source": [
+ "# http://zetcode.com/python/prettytable/\n",
+ "from prettytable import PrettyTable\n",
+ "\n",
+ "x = PrettyTable()\n",
+ "x.field_names = [\"Percentile\", \"Approved Projects\", \"Not Approved Projects\"]\n",
+ "\n",
+ "for i in range(0,101,5):\n",
+ " x.add_row([i,np.round(np.percentile(approved_price,i), 3), np.round(np.percentile(rejected_price,i), 3)])\n",
+ "print(x)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "BKYdquMWoc6y"
+ },
+ "source": [
+    "### 1.2.9 Univariate Analysis: teacher_number_of_previously_posted_projects"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "TyN-Q1aGoc6z"
+ },
+ "source": [
+    "Please do this on your own.\n",
+    "\n",
+    "Follow the same analysis pattern used in the cells above."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "q9XTOMsLT4OE"
+ },
+ "outputs": [],
+ "source": [
+ "approved_number_of_previously_posted = project_data[project_data['project_is_approved']==1]['teacher_number_of_previously_posted_projects']\n",
+ "approved_number_of_previously_posted = approved_number_of_previously_posted.values\n",
+ "\n",
+ "rejected_number_of_previously_posted = project_data[project_data['project_is_approved']==0]['teacher_number_of_previously_posted_projects']\n",
+ "rejected_number_of_previously_posted = rejected_number_of_previously_posted.values"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "5QjLj0hYT4OE"
+ },
+ "outputs": [],
+ "source": [
+ "plt.figure(figsize=(10,3))\n",
+ "sns.distplot(approved_number_of_previously_posted, hist=False, label=\"Approved Projects\")\n",
+ "sns.distplot(rejected_number_of_previously_posted, hist=False, label=\"Not Approved Projects\")\n",
+ "plt.title('No. of previously posted projects per approved and not approved Projects')\n",
+    "plt.xlabel('No. of previously posted projects')\n",
+ "plt.legend()\n",
+ "plt.show()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "U04xDc8ioc6z"
+ },
+ "source": [
+    "### 1.2.10 Univariate Analysis: project_resource_summary"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "WA3QCrCCoc60"
+ },
+ "source": [
+    "Please do this on your own.\n",
+    "\n",
+    "Check whether the `presence of numerical digits` in `project_resource_summary` affects the acceptance of the project.\n",
+    "\n",
+    "If you think it will be helpful for classification, include it in later steps; otherwise you can ignore it."
+ ]
+ }
+ ],
+ "metadata": {
+ "colab": {
+ "provenance": [],
+ "include_colab_link": true
+ },
+ "kernelspec": {
+ "display_name": "Python 3 (ipykernel)",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.9.7"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}
\ No newline at end of file
diff --git a/exercises.md b/exercises.md
index ae40576..c9d18c4 100644
--- a/exercises.md
+++ b/exercises.md
@@ -1,32 +1,178 @@
--
+May 4th 2024
- Question: [(0,"x"), (1, 12), (0, 34), (1,90), (1,89), (0,"s"), (1, "7")]
+    Q1. Create a Base X (e.g. 12, 14, 18) numbering system.
+        - Write two-digit numbers in that number system.
+        - Perform single-digit addition.
+        - Perform double-digit addition.
+        - Write the multiplication table for 1-10 in this number system.
+ - Perform double digit multiplication.
+ - Convert from Base X to Base 10
+ - Convert from Base 10 to Base X
+--
+May 11th and 12th 2024
+
+    Q1. Calculate gross pay given hours and rate.
+ Q2. Rewrite the pay computation to give the employee 1.5 times the hourly rate for hours worked above 40 hours
+    Q3. Write a program to compute the total amount after compound interest.
+
+        Principal: Rate: Time (years):
+        Print the total amount after applying compound interest.
+        Total = Principal * (1 + rate/100)**years
+ Q4. Print all strings that can be generated from a list of letters.
+ Input: abc
+ Output:
+ abc
+ acb
+ bac
+ bca
+ cab
+ cba
+ Q5. BLEU Score. Code it
+
+        BLEU(N) = Brevity Penalty * Geometric Average Precision scores
+ c= predicted length
+ r= target length
+
+ Brevity Penalty
+ =1 , if c>r
+ =e**(1-r/c) , if c<=r
+
+ Geometric Average Precision scores = p1**(1/4) * p2**(1/4) *p3**(1/4) * p4**(1/4)
+ Q6. Code multiplication without using * or loops
+--
+May 18th and 19th 2024
+
+ Q1. Code Tower of Hanoi Problem
+ Q2. Write a wild card character matcher. * matches 0 or more chars. ? matches only one character.
+
+ a* -> abc, ab, ax, a
+ a? -> a1, a2, aa,
+ a*b -> axyzb, a123b, ab
+ a*b*c -> abc, abbbc, a1b1c
+ aa?b*:
+ match: aa1b, aaxby, aa1bcdeffgshshshsh
+ not match: aab, ab,
+ def ismatch(pat, text):
+ return True/False
+
+ Q3. Given two vectors (arrays or list of numbers)
+
+ return the difference of the two vectors
+ diff([1,2,3], [2,3,4]) => [-1, -1, -1]
+ Q4. Given two vectors (arrays or list of numbers). Find absolute distance between them. L1 Distance.
+
+ abs_distance([1,2,3], [2, 3, 4]) -> 3
+ abs_distance([5,4,1], [2, 3, 4]) -> 3+1+3 = 7
+ Q5. Given two 2D vectors representing two points, find the distance between two points.
+
+ distance([1,2], [3,5]) -> sqrt((3 - 1)**2 + (5-2)**2)
+ sqrt(13) = 3.605551275463989
+ Q6. Given two 3D vectors representing two points, find the distance between two points. distance3d([1, 2, 3], [2,3,4] ) -> math.sqrt(3)
- Move all zeros to the begining and all 1s to end without using another list in
- order of n
- Input: [(0,"x"), (1, 12), (0, 34), (1,90), (1,89), (0,"s"), (1, "7")]
- Expected Output: [(0,"x"), (0, 34), (0,"s"), (1,89), (1,90), (1, 12), (1, "7")]
+    Q7. Generalize the distance function from Q5 and Q6 to n dimensions.
+--
+May 25th and 26th 2024
+
+    Q1. Coin Toss: Create a function that returns 0 or 1 with equal probability. Hint: random.random()
+    Q2. Coin Toss: Create a function that returns 0, 1, or 2 with equal probability.
+    Q3. Create an n-faced die that generates a number from 0 to n - 1 with equal probability.
+    Q4. Unfair Coin:
+        def coin_toss(p1, p2): # return 0 with probability p1/(p1 + p2) # return 1 with probability p2/(p1 + p2)
+    Q5. coin_toss_3 takes three probabilities p1, p2, p3 as arguments.
+        Return 0 with p1 probability, 1 with p2 probability, 2 with p3 probability. coin_toss_3(0.7, 0.2, 0.1) # 0 - 70% of the time, 1 - 20% of the time, 2 - 10% of the time
+    Q6. Generalize coin_toss to take n probabilities p1, p2, ..., pn as arguments.
+--
+June 1st and 2nd 2024
+
+ Q1. Code the SQRT
+    Q2. Code the 1/4 power
+ Q3. Code the 1/5 power
+ Q4. Convert to probability
+ 10, 20, 30 -> 10/(10+20+30), 20/(10+20+30), 30/(10+20+30)
+ Q5: Softmax
+        [1,2,3] -> e^1, e^2, e^3 -> convert_to_prob
+--
+June 7th and 8th 2024
+ Q1. Given an array of sorted numbers, check if a number exists in it. Mention the time complexity
+ - using loops
+ - using recursion
+ Q2. You have a sorted list of 0s and 1s. Find the count of 0s. Mention the time complexity
+ count_zeros([0,0,0,0,0,1,1,1,1,1,1]) -> 5
+ Q3. Count 1s in a sorted array of numbers. Mention complexity
+ count_ones([0,0,0,0,0,1,1,1,1,1,1,1.1, 1.2, 2]) ->6
+ Q4. Given a number represented as a string, convert it to an integer. Mention the time complexity
+ to_num("145") -> 145
+ Q5. You have two lists of numbers:
+ First list: 100 numbers, not sorted (size m)
+ Second list: millions of numbers, not sorted (size n)
+ Find all numbers which are common in these two lists
+ Mention time complexity.
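A possible sketch for the search questions above, using binary search on the sorted inputs and a set for the two unsorted lists. `count_zeros` and `count_ones` are the names from the examples; `contains` and `common_numbers` are hypothetical names chosen here.

```python
from bisect import bisect_left, bisect_right


def contains(sorted_nums, target):
    """Iterative binary search: O(log n) time."""
    lo, hi = 0, len(sorted_nums) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_nums[mid] == target:
            return True
        if sorted_nums[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return False


def count_zeros(sorted_bits):
    """In a sorted list of 0s and 1s, the index of the first 1 is the zero count."""
    return bisect_left(sorted_bits, 1)  # O(log n)


def count_ones(sorted_nums):
    """Width of the run of 1s in any sorted list of numbers: O(log n)."""
    return bisect_right(sorted_nums, 1) - bisect_left(sorted_nums, 1)


def common_numbers(small, large):
    """Hash the short list, scan the long one: O(m + n) on average."""
    lookup = set(small)
    return {x for x in large if x in lookup}


print(count_zeros([0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]))               # 5
print(count_ones([0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1.1, 1.2, 2]))  # 6
```

Hashing the smaller list keeps the extra memory at O(m), which matters when the second list holds millions of numbers.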
--
+June 14th and 15th 2024
- Counting Sort:
- Q: Sort the numbers containing age of people. Billion numbers.
+ Q1. Rewrite this code to use a circular buffer
+ read a line
+ append a line
+ if size of the buffer is > 10, knock out the first
+ last_n_lines(file_name, num=10)
+
+ Q2. Write a program to simulate a circular buffer
+ circle_append(lst, element)
+
+ Q3. Find the longest line in the file
+ Q4. Implement last_n_lines method using traversing
+ Q5. Find the frequency of words in a file
+ Q6. Checkout the animations and try to code them and calculate their complexity.
+ - BubbleSort
+ - Insertion Sort
+ - MergeSort
+ - QuickSort
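For the circular-buffer questions, one option is a fixed-size list whose write position wraps around with the modulo operator, so old lines are overwritten instead of being knocked out of the front. `last_n_lines(file_name, num=10)` follows the signature given above; the helper `tail` is a hypothetical name introduced so the logic can be tried on any iterable of lines.

```python
def tail(lines, num=10):
    """Keep only the last `num` items of an iterable using a circular buffer."""
    if num <= 0:
        raise ValueError("num must be positive")
    buffer = [None] * num
    count = 0
    for line in lines:
        buffer[count % num] = line  # overwrite the oldest slot
        count += 1
    if count < num:
        return buffer[:count]       # buffer never filled up
    start = count % num             # position of the oldest surviving item
    return buffer[start:] + buffer[:start]


def last_n_lines(file_name, num=10):
    """Return the last `num` lines of a text file, oldest first."""
    with open(file_name) as handle:
        return tail(handle, num)


print(tail([f"line {i}" for i in range(25)], num=3))
```

Each line is handled in constant time and memory stays at `num` slots, regardless of file size.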
+--
+June 22nd and 23rd 2024
- I maintain an array of 200 numbers. 0th index is for people with 0 yrs....
- 200th elements contains count of people with 200 age.
+ Q1. Write code to solve equations
+
+--
+July 6th and 7th 2024
+ Q1. Move all zeros to the beginning and all 1s to the end without using another list, in O(n) time
+ Input: [(0,"x"), (1, 12), (0, 34), (1,90), (1,89), (0,"s"), (1, "7")]
+ Expected Output: [(0,"x"), (0, 34), (0,"s"), (1,89), (1,90), (1, 12), (1, "7")]
+
+ Q2. Counting Sort:
+ Sort the numbers containing age of people. Billion numbers.
+ I maintain an array of 200 numbers. The 0th index is for people aged 0 yrs....
+ The 200th element contains the count of people aged 200.
+ Q3. Attempt the encoder and decoder problem in problem.pdf
+ - dictionary approach
+ - without dictionary approach
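The first two questions above can be sketched as below. Both function names are hypothetical. The partition uses two pointers that move toward each other and swap misplaced items in place, so the order inside each group may differ from the expected output shown. The counting sort assumes integer ages from 0 to `max_age` inclusive, which needs `max_age + 1` counters so that index 200 exists.

```python
def partition_by_flag(items):
    """Move (0, x) tuples before (1, x) tuples in place, in O(n) time."""
    left, right = 0, len(items) - 1
    while left < right:
        if items[left][0] == 0:
            left += 1                      # already on the correct side
        elif items[right][0] == 1:
            right -= 1                     # already on the correct side
        else:
            items[left], items[right] = items[right], items[left]
            left += 1
            right -= 1
    return items


def counting_sort_ages(ages, max_age=200):
    """Sort ages by counting occurrences: O(n + max_age) time."""
    counts = [0] * (max_age + 1)
    for age in ages:
        if not 0 <= age <= max_age:
            raise ValueError(f"age out of range: {age}")
        counts[age] += 1
    result = []
    for age, count in enumerate(counts):
        result.extend([age] * count)
    return result


data = [(0, "x"), (1, 12), (0, 34), (1, 90), (1, 89), (0, "s"), (1, "7")]
print(partition_by_flag(data))
print(counting_sort_ages([30, 5, 200, 5, 0]))
```

For a billion ages, only the counter array needs to stay in memory; the sorted output can be written out age by age instead of being built as a list.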
--
+July 13th and 14th 2024
- Check problem.pdf
+ Q1. Implement Binary Search Tree Delete (Insert and Find done in class)
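A minimal sketch of deletion, assuming a plain `Node` class with `key`, `left`, and `right` fields; the class used in the session is not shown here, so these names are hypothetical. The two-child case copies the in-order successor (the smallest key in the right subtree) into the node and then deletes that successor.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert key and return the (possibly new) subtree root."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root


def delete(root, key):
    """Delete key if present and return the new subtree root."""
    if root is None:
        return None
    if key < root.key:
        root.left = delete(root.left, key)
    elif key > root.key:
        root.right = delete(root.right, key)
    else:
        # Zero or one child: splice the node out.
        if root.left is None:
            return root.right
        if root.right is None:
            return root.left
        # Two children: replace with the in-order successor.
        successor = root.right
        while successor.left is not None:
            successor = successor.left
        root.key = successor.key
        root.right = delete(root.right, successor.key)
    return root


def inorder(root):
    """Keys in sorted order."""
    if root is None:
        return []
    return inorder(root.left) + [root.key] + inorder(root.right)


tree = None
for k in [50, 30, 70, 20, 40, 60, 80]:
    tree = insert(tree, k)
tree = delete(tree, 50)
print(inorder(tree))  # [20, 30, 40, 60, 70, 80]
```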
+
--
- Code the Bubble Sort: https://yongdanielliang.github.io/animation/web/BubbleSortNew.html
- Insertion Sort
- Quick Sort - Version with O(n) space complexity.
- Merge Sort
+July 20th and 21st 2024
+
+ https://docs.python.org/3/library/heapq.html
+ Q1. Check out Traversal of Tree
+ - Depth first
+ - Breadth first
+ Q2. Implement a simple pattern matcher that matches . with a single character and * with any number (0 or more) of any characters
+ Q3. Write a regular expression to match an email address
+ Q4. Write a regular expression to match a URL
+ Q5. Build a regular expression to extract URLs from the server logs 'access.log.41'
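The pattern matcher in Q2 can be sketched with memoized recursion over positions in the pattern and the text. The name `ismatch(pat, text)` matches the signature used in an earlier exercise; treating `*` as a standalone wildcard (rather than a repeat of the preceding character, as in regular expressions) and requiring the whole text to match are assumptions based on the wording of the question.

```python
from functools import lru_cache


def ismatch(pat, text):
    """True if the whole text matches pat ('.' = one char, '*' = any run)."""

    @lru_cache(maxsize=None)
    def match(i, j):
        if i == len(pat):
            return j == len(text)          # pattern used up: text must be too
        if pat[i] == "*":
            # '*' matches nothing, or swallows one more character of text.
            return match(i + 1, j) or (j < len(text) and match(i, j + 1))
        return (
            j < len(text)
            and pat[i] in (".", text[j])
            and match(i + 1, j + 1)
        )

    return match(0, 0)


print(ismatch("a.c", "abc"))    # True
print(ismatch("a*c", "abbbc"))  # True
print(ismatch("a.c", "ac"))     # False
```

Caching each `(i, j)` pair bounds the work at O(len(pat) * len(text)); very long inputs could still hit Python's recursion limit, in which case an iterative table would be a safer choice.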
--
- Question: Implement Deletion in Binary Search Tree
- Look at animation: https://www.cs.usfca.edu/~galles/visualization/BST.html
- Look at implementation in Jul 13 Notebook
+July 27th and 28th 2024
+
+ Go through the blog post below on sentiment analysis
+ https://cloudxlab.com/blog/understanding-embeddings-and-matrices-with-the-help-of-sentiment-analysis-and-llms-hands-on/
+
+ The code is available in the repo below
+ https://github.com/cloudxlab/Hands-On-LLMs-with-OpenAI-and-Langchain/blob/main/Sentiment%20Analysis%20with%20LLMs/Sentiment%20Analysis%20with%20LLMs.ipynb
+
-
\ No newline at end of file