Update intake to v2 #56

Merged: 17 commits, Jul 19, 2024
2 changes: 1 addition & 1 deletion .github/workflows/test.yaml
@@ -14,7 +14,7 @@ jobs:
fail-fast: false
matrix:
os: ["macos-latest", "ubuntu-latest", "windows-latest"]
python-version: ["3.8", "3.9", "3.10"]
python-version: ["3.9", "3.10", "3.11"]
steps:
- name: Checkout source
uses: actions/checkout@v2
28 changes: 11 additions & 17 deletions .pre-commit-config.yaml
@@ -28,18 +28,12 @@ repos:
exclude: docs/conf.py
args: [--max-line-length=105 ]

- repo: https://github.com/pre-commit/mirrors-isort
rev: v5.10.1
- repo: https://github.com/pycqa/isort
rev: 5.12.0
hooks:
- id: isort
additional_dependencies: [toml]
exclude: ^(docs|setup.py)
args: [--project=gcm_filters, --multi-line=3, --lines-after-imports=2, --lines-between-types=1, --trailing-comma, --force-grid-wrap=0, --use-parentheses, --line-width=88]

- repo: https://github.com/asottile/seed-isort-config
rev: v2.2.0
hooks:
- id: seed-isort-config
- id: isort
name: isort (python)
args: ["--profile", "black", "--filter-files", "--lines-after-imports=2", "--project=gcm_filters", "--multi-line=3", "--lines-between-types=1", "--trailing-comma", "--force-grid-wrap=0", "--use-parentheses", "--line-width=88"]

- repo: https://github.com/psf/black
rev: 22.10.0
@@ -56,9 +50,9 @@ repos:
exclude: docs/source/conf.py
args: [--ignore-missing-imports]

# - repo: https://github.com/codespell-project/codespell
# rev: v1.16.0
# hooks:
# - id: codespell
# args:
# - --quiet-level=2
- repo: https://github.com/codespell-project/codespell
rev: v2.1.0
hooks:
- id: codespell
args:
- --quiet-level=2
8 changes: 4 additions & 4 deletions .readthedocs.yaml
@@ -12,10 +12,10 @@ build:
# uncomment to build from this exact version of package
# the downside is the version listed in the docs will be a dev version
# if uncommenting this, comment out installing pypi version of package in docs/env file
# python:
# install:
# - method: pip
# path: ./
python:
install:
- method: pip
path: ./

conda:
environment: docs/environment.yml
9 changes: 0 additions & 9 deletions MANIFEST.in

This file was deleted.

142 changes: 44 additions & 98 deletions README.md
@@ -24,15 +24,13 @@ For changes prior to 2022-10-19, all contributions are Copyright James Munroe, s



Intake is a lightweight set of tools for loading and sharing data in data
science projects. Intake ERDDAP provides a set of integrations for ERDDAP.
Intake is a lightweight set of tools for loading and sharing data in data science projects. Intake ERDDAP provides a set of integrations for ERDDAP.

- Quickly identify all datasets from an ERDDAP service in a geographic region,
or containing certain variables.
- Quickly identify all datasets from an ERDDAP service in a geographic region, or containing certain variables.
- Produce a pandas DataFrame for a given dataset or query.
- Get an xarray Dataset for the Gridded datasets.

The Key features are:
The key features are:

- Pandas DataFrames for any TableDAP dataset.
- xarray Datasets for any GridDAP datasets.
@@ -59,7 +57,7 @@ project is available on PyPI, so it can be installed using `pip`
The following are prerequisites for a developer environment for this project:

- [conda](https://docs.conda.io/en/latest/miniconda.html)
- (optional but highly recommended) [mamba](https://mamba.readthedocs.io/en/latest/) Hint: `conda install -c conda-forge mamba`
- (optional but highly recommended) [mamba](https://mamba.readthedocs.io/en/latest/). Hint: `conda install -c conda-forge mamba`

Note: if `mamba` isn't installed, replace all instances of `mamba` in the following instructions with `conda`.

@@ -83,126 +81,74 @@ Note: if `mamba` isn't installed, replace all instances of `mamba` in the follow
pip install -e .
```

Note that you need to install with `pip install .` once to get the `entry_points` correct too.

## Examples

To create an intake catalog for all of the ERDDAP's TableDAP offerings use:
To create an `intake` catalog for all of the ERDDAP's TableDAP offerings use:

```python
import intake
catalog = intake.open_erddap_cat(
import intake_erddap
catalog = intake_erddap.ERDDAPCatalogReader(
server="https://erddap.sensors.ioos.us/erddap"
)
).read()
```


The catalog objects behave like a dictionary with the keys representing the
dataset's unique identifier within ERDDAP, and the values being the
`TableDAPSource` objects. To access a source object:
The catalog objects behave like a dictionary with the keys representing the dataset's unique identifier within ERDDAP, and the values being the `TableDAPReader` objects. To access a Reader object (for a single dataset, in this case for dataset_id "aoos_204"):

```python
source = catalog["datasetid"]
dataset = catalog["aoos_204"]
```

From the source object, a pandas DataFrame can be retrieved:
From the reader object, a pandas DataFrame can be retrieved:

```python
df = source.read()
df = dataset.read()
```

Find other dataset_ids available with

```python
list(catalog)
```
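
The dictionary-style access described in this README hunk can be tried without a network call. The following is a minimal sketch that uses a plain `dict` as a stand-in for the catalog. The `StubReader` class, its return value, and the second dataset id are hypothetical and are not part of `intake-erddap`; only `list(catalog)` and `catalog[dataset_id]` are taken from the README text.

```python
# Stand-in for a reader object; hypothetical, not the intake-erddap class.
class StubReader:
    def __init__(self, dataset_id):
        self.dataset_id = dataset_id

    def read(self):
        # A real reader would return a pandas DataFrame; this stub returns a label.
        return f"data for {self.dataset_id}"


# A plain dict plays the role of the catalog: keys are dataset ids.
# "example_other_id" is a made-up id used only for illustration.
catalog = {ds: StubReader(ds) for ds in ["aoos_204", "example_other_id"]}

dataset_ids = list(catalog)   # same call the README uses to list ids
reader = catalog["aoos_204"]  # same indexing the README uses
result = reader.read()
```

With a real catalog, the same two access patterns would apply, but `read()` would trigger a request to the ERDDAP server.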

Consider a case where you need to find all wind data near Florida:

```python
import intake
import intake_erddap
from datetime import datetime
bbox = (-87.84, 24.05, -77.11, 31.27)
catalog = intake.open_erddap_cat(
catalog = intake_erddap.ERDDAPCatalogReader(
server="https://erddap.sensors.ioos.us/erddap",
bbox=bbox,
intersection="union",
start_time=datetime(2022, 1, 1),
end_time=datetime(2023, 1, 1),
standard_names=["wind_speed", "wind_from_direction"],
)
variables=["wind_speed", "wind_from_direction"],
).read()

df = next(catalog.values()).read()
dataset_id = list(catalog)[0]
print(dataset_id)
df = catalog[dataset_id].read()
```

Using the `standard_names` input with `intersection="union"` searches for datasets that have both "wind_speed" and "wind_from_direction". Using the `variables` input subsequently narrows the dataset to only those columns, plus "time", "latitude", "longitude", and "z".

<table class="align-default">
<thead>
<tr style="text-align: right;">
<th></th>
<th>time (UTC)</th>
<th>wind_speed (m.s-1)</th>
<th>wind_from_direction (degrees)</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>2022-12-14T19:40:00Z</td>
<td>7.0</td>
<td>140.0</td>
</tr>
<tr>
<th>1</th>
<td>2022-12-14T19:20:00Z</td>
<td>7.0</td>
<td>120.0</td>
</tr>
<tr>
<th>2</th>
<td>2022-12-14T19:10:00Z</td>
<td>NaN</td>
<td>NaN</td>
</tr>
<tr>
<th>3</th>
<td>2022-12-14T19:00:00Z</td>
<td>9.0</td>
<td>130.0</td>
</tr>
<tr>
<th>4</th>
<td>2022-12-14T18:50:00Z</td>
<td>9.0</td>
<td>130.0</td>
</tr>
<tr>
<th>...</th>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<th>48296</th>
<td>2022-01-01T00:40:00Z</td>
<td>4.0</td>
<td>120.0</td>
</tr>
<tr>
<th>48297</th>
<td>2022-01-01T00:30:00Z</td>
<td>3.0</td>
<td>130.0</td>
</tr>
<tr>
<th>48298</th>
<td>2022-01-01T00:20:00Z</td>
<td>4.0</td>
<td>120.0</td>
</tr>
<tr>
<th>48299</th>
<td>2022-01-01T00:10:00Z</td>
<td>4.0</td>
<td>130.0</td>
</tr>
<tr>
<th>48300</th>
<td>2022-01-01T00:00:00Z</td>
<td>4.0</td>
<td>130.0</td>
</tr>
</tbody>
</table>
```python
time (UTC) latitude (degrees_north) ... wind_speed (m.s-1) wind_from_direction (degrees)
0 2022-01-01T00:00:00Z 28.508 ... 3.6 126.0
1 2022-01-01T00:10:00Z 28.508 ... 3.8 126.0
2 2022-01-01T00:20:00Z 28.508 ... 3.6 124.0
3 2022-01-01T00:30:00Z 28.508 ... 3.4 125.0
4 2022-01-01T00:40:00Z 28.508 ... 3.5 124.0
... ... ... ... ... ...
52524 2022-12-31T23:20:00Z 28.508 ... 5.9 176.0
52525 2022-12-31T23:30:00Z 28.508 ... 6.8 177.0
52526 2022-12-31T23:40:00Z 28.508 ... 7.2 175.0
52527 2022-12-31T23:50:00Z 28.508 ... 7.4 169.0
52528 2023-01-01T00:00:00Z 28.508 ... 8.1 171.0

[52529 rows x 6 columns]
```
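
As a quick sanity check on the Florida query in this README hunk, the sketch below tests whether a station position falls inside the bounding box. It assumes `bbox` is ordered `(min_lon, min_lat, max_lon, max_lat)`, which the diff itself does not state. The helper name `inside_bbox` and the example longitude are hypothetical; the latitude 28.508 comes from the sample output.

```python
def inside_bbox(lon, lat, bbox):
    """Return True if (lon, lat) lies within bbox.

    Assumes bbox = (min_lon, min_lat, max_lon, max_lat); this ordering is
    an assumption, not something stated in the README diff.
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


bbox = (-87.84, 24.05, -77.11, 31.27)  # Florida box from the README example

# Latitude 28.508 appears in the sample output; the longitude is made up.
in_box = inside_bbox(-80.5, 28.508, bbox)
out_of_box = inside_bbox(-80.5, 35.0, bbox)
```

A check like this may help catch a swapped longitude/latitude order before sending a search to the server.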
5 changes: 4 additions & 1 deletion ci/environment-py3.10.yml
@@ -3,13 +3,14 @@ channels:
- conda-forge
dependencies:
- python=3.10
- appdirs
- fsspec
- numpy
- dask
- pandas
- erddapy
- panel
- intake
- intake-xarray>=0.6.1
- pytest
- pytest-cov
- isort
@@ -19,6 +20,8 @@ dependencies:
- mypy
- codecov
- coverage[toml]
- xarray
- pip
- pip:
- git+https://github.com/intake/intake
- cf-pandas
9 changes: 6 additions & 3 deletions ci/environment-py3.8.yml → ci/environment-py3.11.yml
@@ -2,14 +2,15 @@ name: test-env
channels:
- conda-forge
dependencies:
- python=3.8
- python=3.11
- appdirs
- fsspec
- numpy
- dask
- pandas
- erddapy
- panel
- intake
- intake-xarray>=0.6.1
# - intake
- pytest
- pytest-cov
- isort
@@ -19,6 +20,8 @@ dependencies:
- mypy
- codecov
- coverage[toml]
- xarray
- pip
- pip:
- git+https://github.com/intake/intake
- cf-pandas
7 changes: 5 additions & 2 deletions ci/environment-py3.9.yml
@@ -3,13 +3,14 @@ channels:
- conda-forge
dependencies:
- python=3.9
- appdirs
- numpy
- dask
- pandas
- erddapy
- fsspec
- panel
- intake
- intake-xarray>=0.6.1
# - intake
- pytest
- pytest-cov
- isort
@@ -19,6 +20,8 @@ dependencies:
- mypy
- codecov
- coverage[toml]
- xarray
- pip
- pip:
- git+https://github.com/intake/intake
- cf-pandas
29 changes: 7 additions & 22 deletions docs/api.rst
@@ -2,27 +2,12 @@
``intake-erddap`` Python API
=============================

.. toctree::
:maxdepth: 2
:caption: Documentation
.. currentmodule:: intake_erddap

.. autosummary::
:toctree: generated/
:recursive:

``intake-erddap`` catalog
-------------------------


.. autoclass:: intake_erddap.erddap_cat.ERDDAPCatalog
:members: get_client, get_search_urls

``intake-erddap`` source
------------------------


.. autoclass:: intake_erddap.erddap.ERDDAPSource
:members: get_client

.. autoclass:: intake_erddap.erddap.TableDAPSource
:members: read, read_partition, read_chunked

.. autoclass:: intake_erddap.erddap.GridDAPSource
:members: read_partition, read_chunked, to_dask, close
ERDDAPCatalogReader
TableDAPReader
GridDAPReader