Update Docs (#232)
* Fix auto-generated API Reference page

* Add favicon

* Fix IPC heading levels

* Connecting to OmniSci Cloud

* Clean up querying section

* Add faq
randyzwitch authored May 10, 2019
1 parent 590b203 commit 0ffc1af
Showing 11 changed files with 244 additions and 102 deletions.
1 change: 1 addition & 0 deletions docs/source/conf.py
Original file line number Diff line number Diff line change
@@ -91,6 +91,7 @@
#
html_theme = "sphinx_rtd_theme"
html_theme_path = [sphinx_rtd_theme.get_html_theme_path()]
html_favicon = 'favicon.ico'


# Theme options are theme-specific and customize the look and feel of a theme
36 changes: 36 additions & 0 deletions docs/source/contributing.rst
@@ -132,6 +132,42 @@ After the bindings are generated, copy them to their respective folders in the p
are their own package within the overall pymapd package. Also, take care to remove unneeded imports as shown in this `commit`_, as they
can be problematic, especially when calling pymapd from other languages (specifically, R).

--------------------------
Updating the Documentation
--------------------------

The documentation for pymapd is generated by ReadTheDocs on each commit. Some pages (such as this one) are manually created;
others, such as the API Reference, are generated from the docstrings of each method.

If you are planning on making non-trivial changes to the documentation and want to preview the result before making a commit,
you need to install sphinx and sphinx-rtd-theme into your development environment:

.. code-block:: shell

   pip install sphinx sphinx-rtd-theme

Once sphinx is installed, switch to the ``pymapd/docs`` directory and run ``make html`` to build the documentation. The updated documentation
will be placed in the ``pymapd/docs/build/html`` directory. From that directory, running ``python -m http.server`` will serve the site on ``localhost:8000``
in the browser. Run ``make html`` each time you save a file to see your changes reflected in the documentation.
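
The steps above can be combined into a single shell session. This is a sketch of the workflow, assuming the repository is checked out as ``pymapd``:

```shell
# install the documentation toolchain
pip install sphinx sphinx-rtd-theme

# build the HTML documentation
cd pymapd/docs
make html

# serve the generated site at http://localhost:8000
cd build/html
python -m http.server
```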

--------------------------------
Publishing a new package version
--------------------------------
74 changes: 74 additions & 0 deletions docs/source/faq.rst
@@ -0,0 +1,74 @@
.. _faq:

FAQ and Known Limitations
=========================

This page contains information that doesn't fit into other pages or is
important enough to be called out separately. If you have a question or tidbit
of information that you feel should be included here, please create an `issue`_
and/or `pull request`_ to get it added to this page.

.. note::
While we strive to keep this page updated, bugfixes and new features
are being added regularly. If information on this page conflicts with
your experience, please open an `issue`_ or drop by our `Community forum`_
to get clarification.


FAQ
***

:Q: Why do ``select_ipc()`` and ``select_ipc_gpu()`` give me errors, but ``execute()``
works fine?

:A: Both ``select_ipc()`` and ``select_ipc_gpu()`` require running the pymapd code
on the same machine where OmniSci is running. This also implies that these two
methods will not work on Windows machines, just Linux (CPU and GPU) and OSX (CPU-only).
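
A minimal sketch of the distinction, using the sample ``flights_2008_10k`` table (the calls on a live connection ``con`` are shown as comments, since they require a running server):

```python
# The IPC methods require pymapd to run on the OmniSci host itself;
# execute() works over the network from any client machine.
query = "SELECT depdelay, arrdelay FROM flights_2008_10k LIMIT 5"

# with a live connection `con`:
# rows = con.execute(query).fetchall()  # works from any machine
# df = con.select_ipc(query)            # only on the OmniSci host (Linux/OSX)
# gdf = con.select_ipc_gpu(query)       # only on the host, with an NVIDIA GPU
```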

..
:Q: Why do geospatial data get uploaded as ``TEXT ENCODED DICT(32)``?

:A: When using ``load_table`` with ``create=True`` or ``create='infer'``, data
whose type cannot be easily inferred will default to ``TEXT ENCODED DICT(32)``.
To avoid this, create the table definition before loading the data.



Helpful Hints
*************

* Convert your timestamps to UTC
OmniSci stores timestamps as UTC. When loading data to OmniSci, plain Python
``datetime`` objects are assumed to be UTC. If the ``datetime`` object has
localization, only ``datetime64[ns, UTC]`` is supported.
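
As a hedged sketch of the conversion (the column name is illustrative), naive timestamps can be localized to UTC with pandas before loading:

```python
import pandas as pd

# naive datetimes -- pymapd assumes these are already UTC
df = pd.DataFrame({
    "dep_time": pd.to_datetime(["2008-01-03 08:00:00", "2008-01-03 09:30:00"]),
})

# localize explicitly so the intent is unambiguous; the resulting dtype,
# datetime64[ns, UTC], is the only localized dtype pymapd supports
df["dep_time"] = df["dep_time"].dt.tz_localize("UTC")
print(df["dep_time"].dtype)
```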

* When loading data, hand-create table schema if performance is critical
While ``load_table()`` does provide a keyword argument ``create`` to
auto-create the table before attempting to load to OmniSci, this functionality
is for *convenience purposes only*. The user is in a much better position
to know the exact data types of the input data than the heuristics used by pymapd.

Additionally, pymapd does not attempt to use the smallest possible column
width to represent your data. For example, significant reductions in disk
storage and a larger amount of 'hot data' can be realized if your data fits
in a ``TINYINT`` column vs storing it as an ``INTEGER``.
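
For example, a hand-written schema can size columns explicitly. The table and column names below are illustrative, and the calls on a live connection ``con`` are shown as comments:

```python
import pandas as pd

# illustrative data: small integers that fit comfortably in narrow columns
df = pd.DataFrame({"cancelled": [0, 1, 0], "arrdelay": [12, -3, 40]})

# hand-written schema using the smallest sufficient column widths, rather
# than relying on the auto-create heuristics
ddl = """
CREATE TABLE flights_slim (
    cancelled TINYINT,
    arrdelay SMALLINT
)
"""

# with a live connection `con`, create the table first, then load without
# auto-creation:
# con.execute(ddl)
# con.load_table("flights_slim", df, create=False)
print(ddl.strip())
```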

Known Limitations
*****************

* OmniSci ``BIGINT`` is 64-bit
Be careful using pymapd on 32-bit systems, as we do not check for integer
overflow when returning a query.

* ``DECIMAL`` types returned as Python ``float``
OmniSci stores and performs ``DECIMAL`` calculations within the
database at the column-definition level of precision. However, the results
are currently returned to Python as ``float``. We are evaluating how to
change this behavior, so that exact decimal representations are consistent on
the server and in Python.


.. _issue: https://github.com/omnisci/pymapd/issues
.. _pull request: https://github.com/omnisci/pymapd/issues
.. _Community forum: https://community.omnisci.com/forum
Binary file added docs/source/favicon.ico
1 change: 1 addition & 0 deletions docs/source/index.rst
@@ -34,6 +34,7 @@ the `Apache Arrow`_-based `cudf GPU DataFrame`_ format for efficient data interc
api
contributing
releasenotes
faq


.. _DB-API-2.0: https://www.python.org/dev/peps/pep-0249/
78 changes: 54 additions & 24 deletions docs/source/usage.rst
@@ -11,8 +11,9 @@ clients will feel similar to pymapd.
.. note::

This tutorial assumes you have an OmniSci server running on ``localhost:6274`` with the
default logins and databases, and have loaded the example "flights_2008_10k"
dataset.
default logins and databases, and have loaded the example ``flights_2008_10k``
dataset. This dataset can be obtained from the ``insert_sample_data`` script included
in the OmniSci install directory.

Installing pymapd
-----------------
@@ -31,7 +32,7 @@ pymapd
pip install pymapd
If you have an NVIDIA GPU in the same machine where your pymapd code will be running, you'll want to `install
cudf`_ as well if you want to return results sets into GPU memory as a cudf GPU DataFrame:
cudf`_ as well to return results sets into GPU memory as a cudf GPU DataFrame:

cudf via conda
**************
@@ -59,7 +60,15 @@ cudf via PyPI/pip
Connecting
----------

Create a :class:`Connection` with
Self-Hosted Install
*******************

For self-hosted OmniSci installs, use ``protocol='binary'`` (this is the default)
to connect with OmniSci, as this will have better performance than using
``protocol='http'`` or ``protocol='https'``.

To create a :class:`Connection`, use the ``connect()`` method along with ``user``,
``password``, ``host`` and ``dbname``:

.. code-block:: python
@@ -69,22 +78,22 @@
>>> con
Connection(mapd://mapd:***@localhost:6274/mapd?protocol=binary)
or by passing in a connection string
Alternatively, you can pass in a `SQLAlchemy`_-compliant connection string to
the ``connect()`` method:

.. code-block:: python

   >>> uri = "mapd://mapd:HyperInteractive@localhost:6274/mapd?protocol=binary"
   >>> con = connect(uri=uri)
   Connection(mapd://mapd:***@localhost:6274/mapd?protocol=binary)
See the `SQLAlchemy`_ documentation on what makes up a connection string. The
components are::
OmniSci Cloud
*************

dialect+driver://username:password@host:port/database
When connecting to OmniSci Cloud, the two methods are the same as above;
however, you can only use ``protocol='https'``. For a step-by-step walk-through with
screenshots, please see this `blog post`_.
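
A hedged sketch of building a Cloud-style connection string (the hostname, API key, secret, and port below are placeholders, not real endpoints or credentials):

```python
# OmniSci Cloud only accepts HTTPS; the binary protocol is not available
host = "my-instance.omnisci.cloud"   # hypothetical Cloud hostname
user = "my-api-key"                  # placeholder API key
password = "my-api-secret"           # placeholder API secret
dbname = "mapd"

uri = f"mapd://{user}:{password}@{host}:443/{dbname}?protocol=https"

# with real credentials, you would then connect with:
# from pymapd import connect
# con = connect(uri=uri)
print(uri)
```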

For ``pymapd``, the ``dialect+driver`` will always be ``mapd``, and we look for
a ``protocol`` argument in the optional query parameters (everything following
the ``?`` after ``database``).

Querying
--------
@@ -109,11 +118,13 @@ that your OmniSci database is running on the same machine.
and microseconds granularity. Support for nanoseconds, ``Timestamp(9)`` is in
progress.

GPU Select
^^^^^^^^^^
GPU Shared Memory
*****************

Use :meth:`Connection.select_ipc_gpu` to select data into a ``GpuDataFrame``,
provided by `cudf`_
provided by `cudf`_. To use this method, **the Python code must be running
on the same machine as the OmniSci installation AND you must have an NVIDIA GPU
installed.**

.. code-block:: python
@@ -127,11 +138,13 @@ provided by `cudf`_
3 4 -3
4 12 7
CPU Shared Memory Select
^^^^^^^^^^^^^^^^^^^^^^^^
CPU Shared Memory
*****************

Use :meth:`Connection.select_ipc` to select data into a pandas ``DataFrame``
using CPU shared memory to avoid unnecessary intermediate copies.
using CPU shared memory to avoid unnecessary intermediate copies. To use this
method, **the Python code must be running on the same machine as the OmniSci
installation.**

.. code-block:: python
@@ -144,10 +157,28 @@ using CPU shared memory to avoid unnecessary intermediate copies.
3 4 -3
4 12 7
pandas.read_sql()
*****************

With a :class:`Connection` defined, you can use ``pandas.read_sql()`` to
read your data into a pandas ``DataFrame``. This will be slower than using
:meth:`Connection.select_ipc`, but works regardless of where the Python code
is running (i.e. ``select_ipc()`` must be on the same machine as the OmniSci
install, ``pandas.read_sql()`` works everywhere):

.. code-block:: python

   >>> from pymapd import connect
   >>> import pandas as pd
   >>> con = connect(user="mapd", password="HyperInteractive", host="localhost",
   ...               dbname="mapd")
   >>> df = pd.read_sql("SELECT depdelay, arrdelay FROM flights_2008_10k limit 100", con)
Cursors
-------
*******

A cursor can be created with :meth:`Connection.cursor`
After connecting to OmniSci, a cursor can be created with :meth:`Connection.cursor`:

.. code-block:: python
@@ -225,13 +256,11 @@ If you aren't using arrow or pandas you can pass list of tuples to
The high-level :meth:`Connection.load_table` method will choose the fastest
method available based on the type of ``data`` and whether or not ``pyarrow`` is
installed.
method available based on the type of ``data``.

* lists of tuples are always loaded with :meth:`Connection.load_table_rowwise`
* If ``pyarrow`` is installed, a ``pandas.DataFrame`` or ``pyarrow.Table`` will
be loaded using :meth:`Connection.load_table_arrow`
* If ``pyarrow`` is not installed, a ``pandas.DataFrame`` will be loaded using
* A ``pandas.DataFrame`` or ``pyarrow.Table`` will be loaded using :meth:`Connection.load_table_arrow`
* If upload fails using the arrow method, a ``pandas.DataFrame`` can be loaded using
:meth:`Connection.load_table_columnar`
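
The dispatch rules above can be sketched as follows (``con`` and the table name are illustrative, and the actual calls are commented out since they require a live server):

```python
import pandas as pd

rows = [(1, "a"), (2, "b")]                        # list of tuples
df = pd.DataFrame({"x": [1, 2], "y": ["a", "b"]})  # columnar data

# with a live connection `con`:
# con.load_table("demo_table", rows)         # -> load_table_rowwise
# con.load_table("demo_table", df)           # -> load_table_arrow
# con.load_table_columnar("demo_table", df)  # explicit columnar fallback
```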

Database Metadata
@@ -260,3 +289,4 @@ Some helpful metadata are available on the ``Connection`` object.
.. _Apache Arrow: http://arrow.apache.org/
.. _conda-forge: http://conda-forge.github.io/
.. _install cudf: https://github.com/rapidsai/cudf#installation
.. _blog post: https://www.omnisci.com/blog/using-pymapd-to-load-data-to-omnisci-cloud
2 changes: 1 addition & 1 deletion pymapd/_mutators.py
@@ -8,7 +8,7 @@ def set_tdf(self, tdf):
Parameters
----------
tdf : TDataFrame
tdf: TDataFrame
A SQL select statement
Example
4 changes: 2 additions & 2 deletions pymapd/_transforms.py
@@ -8,12 +8,12 @@ def change_dashboard_sources(dashboard, remap):
Parameters
----------
dashboard : A dictionary containing the old dashboard state
dashboard: A dictionary containing the old dashboard state
remap: A dictionary containing the new dashboard state to be mapped
Returns
-------
dashboard : A base64 encoded json object containing the new dashboard state
dashboard: A base64 encoded json object containing the new dashboard state
"""
dm = json.loads(dashboard.dashboard_metadata)
tlst = map(str.strip, dm.get('table', '').split(','))
