init quickstart #65

Merged
merged 3 commits into from
May 8, 2024
3 changes: 2 additions & 1 deletion docs/gettingstarted.rst
@@ -9,4 +9,5 @@ we encourage you to open an issue on the
:maxdepth: 1

Installing nested-pandas <gettingstarted/installation>
Contribution Guide <gettingstarted/contributing>
Contribution Guide <gettingstarted/contributing>
Quickstart Guide <gettingstarted/quickstart>
233 changes: 233 additions & 0 deletions docs/gettingstarted/quickstart.ipynb
@@ -0,0 +1,233 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Quickstart"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"With a valid Python environment, nested-pandas and it's dependencies are easy to install using the `pip` package manager. The following command can be used to install it:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# % pip install nested-pandas"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Nested-Pandas is tailored towards efficient analysis of nested datasets. Let's load a toy dataset to show how it works."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from nested_pandas.datasets import generate_data\n",
"\n",
"# generate_data creates some toy data\n",
"nf = generate_data(10, 100) # 10 rows, 100 nested rows per row\n",
"nf"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The above dataframe is a `NestedFrame`, which extends the capabilities of the Pandas `DataFrame` to support columns with nested information. In this example, we have the top level dataframe with 10 rows and 2 typical columns, \"a\" and \"b\". The \"nested\" column contains a dataframe in each row. We can inspect the contents of the \"nested\" column using pandas API tooling like `loc`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"nf.loc[0][\"nested\"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here we see that within the \"nested\" column there are `NestedFrame` objects with their own data. In this case we have 3 columns (\"t\", \"flux\", and \"band\"). Alternatively, we could inspect the available columns using some custom properties of the `NestedFrame`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Shows which columns have nested data\n",
"nf.nested_columns"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Provides a dictionary of \"base\" (top-level) and nested column labels\n",
"nf.all_columns"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"nested-pandas extends the Pandas API, meaning any operation you could do in Pandas is available within nested-pandas. However, nested-pandas has additional functionality and tooling to better support working with Nested datasets. For example, let's look at `query`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Normal queries work as expected, rejecting rows from the dataframe that don't meet the criteria\n",
"nf.query(\"a > 0.2\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The above query is native Pandas, however with nested-pandas we can use hierarchical column names to extend `query` to nested layers."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Applies the query to \"nested\", filtering based on \"t >17\"\n",
"nf_g = nf.query(\"nested.t > 17.0\")\n",
"nf_g"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This query does not affect the rows of the top-level dataframe, but rather applies the query to the \"nested\" dataframes. If we look at one of them, we can see the effect of the query."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# All t <= 17.0 have been removed\n",
"nf_g.loc[0][\"nested\"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A limited set of functions have been extended in this way so far, with the aim being to fully support this hierarchical access where applicable in the Pandas API."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, we'll end with the flexible `reduce` function. `reduce` functions similarly to Pandas' `apply` but flattens (reduces) the inputs from nested layers into array inputs to the given apply function. For example, let's find the mean flux for each dataframe in \"nested\":"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# use hierarchical column names to access the flux column\n",
"# passed as an array to np.mean\n",
"nf.reduce(np.mean, \"nested.flux\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This can be used to apply any custom functions you need for your analysis, and just to illustrate that point further let's define a custom function that just returns it's inputs."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def show_inputs(*args):\n",
" return args"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Applying some inputs via reduce, we see how it sends inputs to a given function."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"nf_inputs = nf.reduce(show_inputs, \"a\", \"nested.band\")\n",
"nf_inputs"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"nf_inputs.loc[0]"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.11"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
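
For readers of this diff who do not have nested-pandas installed, the per-row behaviour that the notebook describes for `reduce` can be approximated in plain Python. This is only a rough sketch under stated assumptions, not the library's implementation: the `rows` structure, `resolve`, and `reduce_rows` are hypothetical stand-ins invented for illustration, and only `show_inputs` and the hierarchical `"nested.flux"` naming come from the notebook itself.

```python
from statistics import mean

# Toy stand-in for a NestedFrame (an assumption, not the real data model):
# each top-level row has a scalar "a" and a "nested" table stored as a
# dict of equal-length lists.
rows = [
    {"a": 0.1, "nested": {"flux": [1.0, 2.0, 3.0], "band": ["g", "r", "g"]}},
    {"a": 0.7, "nested": {"flux": [10.0, 20.0], "band": ["r", "r"]}},
]


def resolve(row, column):
    """Return a base column as a scalar, or "layer.field" as that row's list."""
    if "." in column:
        layer, field = column.split(".", 1)
        return row[layer][field]
    return row[column]


def reduce_rows(rows, func, *columns):
    """Hypothetical helper: call func once per top-level row.

    Base columns arrive as scalars and nested columns arrive as
    array-like inputs, mirroring the description in the notebook.
    """
    return [func(*(resolve(row, c) for c in columns)) for row in rows]


def show_inputs(*args):
    # Same function as in the notebook: return the inputs unchanged
    return args


mean_flux = reduce_rows(rows, mean, "nested.flux")
inputs = reduce_rows(rows, show_inputs, "a", "nested.band")

print(mean_flux)
print(inputs[0])
```

Each call receives one top-level row's worth of data, so a function written for a single array (such as a mean) produces one result per row.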