Learning Pandas

Pandas is a data analysis library for Python.
It is a tool for data wrangling (also called munging), designed for quick and easy data manipulation, reading, aggregation, and visualization.

It takes data from a CSV or TSV file or a SQL database and creates a Python object with rows and columns called a data frame. The data frame is very similar to a table in spreadsheet or statistical software such as Excel or SPSS.
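As a minimal sketch of that idea, the snippet below builds a data frame from CSV text. The column names and values are made up for illustration, and an in-memory string stands in for a file on disk so the example runs without any downloads.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a CSV file on disk.
csv_text = """name,age,country
Alice,30,Canada
Bob,25,India
Carol,35,Brazil
"""

# read_csv accepts a file path or any file-like object.
# For a TSV file, pass sep="\t" as well.
df = pd.read_csv(io.StringIO(csv_text))

print(type(df))   # the DataFrame class
print(df.shape)   # (number of rows, number of columns)
print(df)
```

With a real file, the same call would take the path instead, for example `pd.read_csv("my_data.csv")`, where the file name is a placeholder.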

Below is a list of things that can be achieved using Pandas:

  • Index, manipulate, rename, sort, and merge data frames
  • Update, add, and delete columns in a data frame
  • Impute missing values and handle missing data (NaNs)
  • Plot data as a histogram or box plot
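A few of the operations listed above can be sketched on a tiny, made-up data frame. The names and values here are purely illustrative, and imputing with the column mean is just one possible strategy, not a recommendation.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing age.
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol"],
    "age": [30.0, np.nan, 35.0],
    "city": ["Toronto", "Pune", "Recife"],
})

# Rename a column.
df = df.rename(columns={"city": "hometown"})

# Sort rows by name in descending order.
df = df.sort_values("name", ascending=False)

# Impute the missing age with the mean of the known ages.
df["age"] = df["age"].fillna(df["age"].mean())

# Add a derived column, then delete a column.
df["age_in_months"] = df["age"] * 12
df = df.drop(columns=["hometown"])

# Merge with a second data frame on the shared "name" column.
countries = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol"],
    "country": ["Canada", "India", "Brazil"],
})
merged = df.merge(countries, on="name", how="left")

print(merged)
```

Plotting is left out of this sketch because it needs an additional plotting library such as matplotlib.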

Table of Contents

  1. Install Pandas
  2. Getting started
  3. DataFrame and Series data types
  4. Indexes
  5. Filtering
  6. Updating Rows and Columns
  7. Add and Remove rows and columns

Get sample data to work with

The Stack Overflow survey is a good sample for starting to learn data analysis.

Browse to: https://insights.stackoverflow.com/survey/

Download a Zip file for any year and extract it on your machine.
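Once Pandas is installed (see the next section), the extracted data can be loaded along these lines. The `data` folder and the file name `survey_results_public.csv` are assumptions; check the contents of your extracted archive and adjust the path accordingly.

```python
from pathlib import Path

import pandas as pd

# Assumed location and file name; adjust to match your extracted archive.
survey_csv = Path("data") / "survey_results_public.csv"

if survey_csv.exists():
    df = pd.read_csv(survey_csv)
    print(df.shape)    # (number of rows, number of columns)
    print(df.head())   # first five rows
else:
    print(f"File not found: {survey_csv}")
```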


Install Pandas (in a Virtual Environment)

Create a new environment:

$ python3 -m venv pandas_env

And activate it:

$ source pandas_env/bin/activate

Install Pandas:

$ pip install pandas

Jupyter Notebook

Jupyter Notebook is not required, but it makes it easier to view data in the browser.

Install Jupyter with:

$ pip install jupyterlab

Run it in a separate terminal with the same virtual environment activated, because Jupyter runs only as long as that terminal stays open:

$ jupyter lab

In the browser app, create a new Python 3 notebook and give it a name (instead of Untitled).


We are ready to use Pandas.
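A quick way to confirm the setup is to run a short check in a notebook cell or a Python shell. This is only a sanity-check sketch; the values in the Series are arbitrary.

```python
import pandas as pd

# Print the installed version to confirm the import works.
print(pd.__version__)

# Build a small Series and run one operation on it.
s = pd.Series([1, 2, 3], name="numbers")
print(s.sum())
```

If the import fails, the most likely cause is that the virtual environment is not activated in the terminal where Jupyter or Python was started.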


References