Pandas is a data analysis library for Python.
It is a tool for data wrangling (also called munging), designed for quick and easy data reading, manipulation, aggregation, and visualization.
It takes data from a CSV or TSV file or a SQL database and creates a Python object with rows and columns called a data frame. A data frame is very similar to a table in spreadsheet or statistical software such as Excel or SPSS.
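As a minimal sketch of that idea, the snippet below builds a data frame from CSV text. The inline data and its column names are made up for illustration; in practice you would pass a file path to `read_csv` (with `sep="\t"` for TSV files).

```python
import io

import pandas as pd

# Hypothetical inline CSV standing in for a file on disk.
csv_text = """name,language,years
Ana,Python,5
Ben,R,2
Chen,Python,8
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)             # (number of rows, number of columns)
print(df.columns.tolist())  # column labels taken from the header row
print(df.head())            # first rows of the table
```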
Below is a list of things that can be achieved using Pandas:
- Index, manipulate, rename, sort, and merge data frames
- Update, add, and delete columns in a data frame
- Impute missing values and handle missing data (NaNs)
- Plot data as a histogram or box plot
This tutorial covers:
- Install Pandas
- Getting started
- DataFrame and Series data types
- Indexes
- Filtering
- Updating Rows and Columns
- Add and Remove rows and columns
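Several of the operations mentioned earlier (renaming, sorting, merging, adding and deleting columns, imputing missing values) can be sketched in a few lines. The data frames and column names here are hypothetical, and filling with the column mean is only one possible imputation strategy.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the column names are made up for illustration.
people = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["Ana", "Ben", "Chen"],
    "salary": [5000.0, np.nan, 7000.0],
})
countries = pd.DataFrame({"id": [1, 2, 3], "country": ["PT", "US", "CN"]})

# Rename a column, then sort by it in descending order.
people = people.rename(columns={"name": "full_name"})
people = people.sort_values("full_name", ascending=False)

# Merge two data frames on a shared key column.
merged = people.merge(countries, on="id")

# Impute the missing salary with the mean of the known salaries.
merged["salary"] = merged["salary"].fillna(merged["salary"].mean())

# Add a derived column, then delete a column that is no longer needed.
merged["salary_k"] = merged["salary"] / 1000
merged = merged.drop(columns=["id"])

print(merged)
```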
The Stack Overflow developer survey is a good sample dataset for learning data analysis.
Browse to: https://insights.stackoverflow.com/survey/
Download the Zip file for any year and extract it on your machine.
Create a new environment:
$ python3 -m venv pandas_env
And activate it:
$ source pandas_env/bin/activate
Install Pandas:
$ pip install pandas
Jupyter Notebook is not required, but it makes it easier to inspect data in the browser.
Install Jupyter with:
$ pip install jupyterlab
Run it in a separate terminal with the same virtual environment activated, because Jupyter keeps running only as long as that terminal stays open:
$ jupyter lab
In the browser app, create a new Python3 Notebook.
Give it a name (instead of Untitled).
We are ready to use Pandas.
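A first notebook cell might look like the sketch below. The `data/survey_results_public.csv` path is an assumption about where the survey archive was extracted and what the file is called; adjust it to your download. `load_survey` is a hypothetical helper, not part of Pandas.

```python
from pathlib import Path

import pandas as pd


def load_survey(csv_path):
    """Read a survey CSV file into a data frame (hypothetical helper)."""
    return pd.read_csv(csv_path)


# Assumed location of the extracted survey file; change it to match your machine.
survey_path = Path("data") / "survey_results_public.csv"

if survey_path.exists():
    df = load_survey(survey_path)
    print(df.shape)   # (number of rows, number of columns)
    print(df.head())  # first five rows
else:
    # Pandas still imported fine, so the installation itself works.
    print(f"{survey_path} not found; pandas {pd.__version__} is installed.")
```

If the file is missing, the cell only reports the installed Pandas version, which still confirms that the environment is set up.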