revise intro #81

Merged
merged 7 commits into from
May 13, 2024
5 changes: 3 additions & 2 deletions README.md
@@ -1,10 +1,11 @@
# nested-pandas
Efficient pandas representation for nested associated datasets.
An extension of pandas for efficient representation of nested
associated datasets.

Nested-Pandas extends the [pandas](https://pandas.pydata.org/) package with
tooling and support for nested dataframes packed into values of top-level
dataframe columns. [Pyarrow](https://arrow.apache.org/docs/python/index.html)
is used intrinsically to aid in scalability and performance.
is used internally to aid in scalability and performance.

![image](./nestedframe.png)

58 changes: 31 additions & 27 deletions docs/index.rst
@@ -2,45 +2,49 @@
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.

Welcome to nested-pandas's documentation!
========================================================================================
Nested-Pandas
=============

Dev Guide - Getting Started
---------------------------
An extension of pandas for efficient representation of nested
associated datasets.

Before installing any dependencies or writing code, it's a great idea to create a
virtual environment. LINCC-Frameworks engineers primarily use `conda` to manage virtual
environments. If you have conda installed locally, you can run the following to
create and activate a new environment.
Nested-Pandas extends the `pandas <https://pandas.pydata.org/>`_ package with
tooling and support for nested dataframes packed into values of top-level
dataframe columns. `Pyarrow <https://arrow.apache.org/docs/python/index.html>`_
is used internally to aid in scalability and performance.

.. code-block:: console
.. image:: ../nestedframe.png
   :width: 600
   :align: center
   :alt: Example NestedFrame

   >> conda create -n <env_name> python=3.10
   >> conda activate <env_name>
Nested-Pandas is motivated by time-domain astronomy use cases, which typically
involve two levels of information: information about astronomical objects and,
for each object, an associated set of `N` measurements. Nested-Pandas offers
a performant and memory-efficient package for working with these types of datasets.

Its core advantages are:

Once you have created a new environment, you can install this project for local
development using the following commands:
* hierarchical column access
* efficient packing of nested information into inputs to custom user functions
* avoiding costly groupby operations

.. code-block:: console

   >> pip install -e .'[dev]'
   >> pre-commit install
   >> conda install pandoc
How to Use This Guide
=====================

Begin with the :doc:`Getting Started <gettingstarted/installation>`
guide to learn the basics of installation and walk through a simple example of
using nested-pandas.

Notes:
The :doc:`Tutorials <tutorials>`
section showcases the fundamental features of nested-pandas.

1) The single quotes around ``'[dev]'`` may not be required for your operating system.
2) ``pre-commit install`` will initialize pre-commit for this local repository, so
that a set of tests will be run prior to completing a local commit. For more
information, see the Python Project Template documentation on
`pre-commit <https://lincc-ppt.readthedocs.io/en/latest/practices/precommit.html>`_.
3) Installing ``pandoc`` allows you to verify that automatic rendering of Jupyter notebooks
into documentation for ReadTheDocs works as expected. For more information, see
the Python Project Template documentation on
`Sphinx and Python Notebooks <https://lincc-ppt.readthedocs.io/en/latest/practices/sphinx.html#python-notebooks>`_.
API-level information about nested-pandas is viewable in the
:doc:`API Reference <autoapi/index>`
section.

Learn more about contributing to this repository in our :doc:`Contribution Guide <gettingstarted/contributing>`.

.. toctree::
   :hidden:
4 changes: 2 additions & 2 deletions docs/notebooks.rst → docs/tutorials.rst
@@ -1,6 +1,6 @@
Notebooks
Tutorials
========================================================================================

.. toctree::

   Lower-level interfaces <notebooks/low_level.ipynb>
   Lower-level interfaces <tutorials/low_level.ipynb>
File renamed without changes.
File renamed without changes.
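
The revised intro describes two levels of information: one row per object, with that object's `N` measurements packed into the row. The following is a minimal sketch of that layout in plain pandas. It does not use the nested-pandas API, and it makes no claim about how the package stores data internally. The table contents, the column names, and the `mean_flux` helper are all hypothetical.

```python
import pandas as pd

# Top-level table: one row per astronomical object (hypothetical values).
objects = pd.DataFrame(
    {"ra": [10.0, 20.0], "dec": [-5.0, 5.0]},
    index=pd.Index([0, 1], name="object_id"),
)

# Flat table of measurements: N rows per object, keyed by object_id.
measurements = pd.DataFrame(
    {
        "object_id": [0, 0, 0, 1, 1],
        "time": [1.0, 2.0, 3.0, 1.0, 2.0],
        "flux": [1.0, 2.0, 3.0, 10.0, 20.0],
    }
)

# Pack each object's measurements into list-valued cells, one row per object.
# The grouping cost is paid once here rather than on every later analysis step.
packed = objects.join(
    measurements.groupby("object_id")[["time", "flux"]].agg(list)
)


def mean_flux(flux):
    """Hypothetical user function that receives one object's flux values."""
    return sum(flux) / len(flux)


# Apply the user function row by row; no further groupby is needed.
packed["mean_flux"] = packed["flux"].apply(mean_flux)
print(packed[["ra", "dec", "mean_flux"]])
```

Once the measurements are packed, each per-object computation becomes a row-wise operation on the top-level table. This is the access pattern behind the intro's points about packing inputs for custom user functions and avoiding repeated groupby operations.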