doc: add pandas code and use markdown tables
juba committed Apr 9, 2024
1 parent 72e8b4a commit 44be57a
Showing 1 changed file with 54 additions and 27 deletions.
81 changes: 54 additions & 27 deletions doc/getting_started.qmd
@@ -14,11 +14,15 @@ To create a Lifemap data visualization, you will have to follow these steps:

## Prepare your data

The date you want to visualize on the Lifemap tree of life must be in a [pandas](https://pandas.pydata.org) or [polars](https://pola.rs) DataFrame. They must contain at least observations (species) as rows, and variables as columns, and at least one column must contain the NCBI taxonomy identifier of the species.
The data you want to visualize on the Lifemap tree of life must be in a [pandas](https://pandas.pydata.org) or [polars](https://pola.rs) DataFrame. The DataFrame must contain observations (species) as rows and variables as columns, and one column must contain the NCBI taxonomy identifier of each species.

`pylifemap` includes an example polars data file generated from [The IUCN Red List of Threatened Species](https://www.gbif.org/dataset/19491596-35ae-4a91-9a98-85cf505f1bd3). It is a CSV file with the Red List category (in 2022) of more than 84000 species.
`pylifemap` includes an example dataset generated from [The IUCN Red List of Threatened Species](https://www.gbif.org/dataset/19491596-35ae-4a91-9a98-85cf505f1bd3). It is a CSV file with the Red List category (in 2022) of more than 84000 species.

We can import it as a polars DataFrame with the following code:
We can import it as a polars or pandas DataFrame with the following code:

::: {.panel-tabset}

## Polars

```{python}
import polars as pl
@@ -28,7 +32,20 @@ iucn = pl.read_csv(
)
```

If we display the resulting table, we can see that it only has two columns, one called `taxid` which contains the species identifiers, and another called `status` with the Red List category of each species:
## Pandas

```{python}
#| eval: false
import pandas as pd
iucn = pd.read_csv(
"https://raw.githubusercontent.com/juba/pylifemap/main/data/iucn.csv"
)
```

:::

The resulting table only has two columns: `taxid`, which contains the species identifiers, and `status`, with the Red List category of each species.

```{python}
iucn
@@ -59,24 +76,19 @@ Lifemap(iucn, taxid_col="taxid", width="100%", height=800)

After initializing our `Lifemap` object, we have to add visualization layers to create graphical representations. There are several different layers available:

<table class="table">
<thead>
<tr><th>Layer</th><th>Description</th></tr>
</thead>
<tbody>
<tr><td>[layer_points](layers/layer_points.qmd)</td><td>Displays each observation with a point. Radius and color can be dependent of an attribute in the DataFrame.</td></tr>
<tr><td>[layer_lines](layers/layer_lines.qmd)</td><td>Using aggregated data, highlights branches of the tree by lines of varying width and color.</td></tr>
<tr><td>[layer_donuts](layers/layer_donuts.qmd)</td><td>Displays aggregated categorical data as donut charts.</td></tr>
<tr><td>[layer_heatmap](layers/layer_heatmap.qmd)</td><td>Displays a heatmap of the observations distribution in the tree.</td></tr>
<tr><td>[layer_screengrid](layers/layer_screengrid.qmd)</td><td>Displays the observations distribution with a colored grid with fixed-size cells..</td></tr>
</tbody>
</table>
| Layer | Description |
| :---------------------------------------------- | :----------------------------------------------------------------------------------------------------------- |
| [layer_points](layers/layer_points.qmd)         | Displays each observation with a point. Radius and color can depend on an attribute in the DataFrame.        |
| [layer_lines](layers/layer_lines.qmd) | Using aggregated data, highlights branches of the tree with lines of varying width and color. |
| [layer_donuts](layers/layer_donuts.qmd) | Displays aggregated categorical data as donut charts. |
| [layer_heatmap](layers/layer_heatmap.qmd) | Displays a heatmap of the observations distribution in the tree. |
| [layer_screengrid](layers/layer_screengrid.qmd) | Displays the observations distribution with a colored grid with fixed-size cells.                            |

To add a layer, we just have to call the corresponding `layer_` method of our `Lifemap` object. For example, to add a points layer:

```{python}
#| eval: false
éLifemap(iucn, taxid_col="taxid").layer_points()
Lifemap(iucn, taxid_col="taxid").layer_points()
```

We can add several layers by calling several methods. For example we could display a heatmap layer, and a points layer above it:
@@ -110,6 +122,7 @@ Lifemap(iucn, taxid_col="taxid").layer_points().save("lifemap.html")

Each layer accepts a certain number of arguments to customize its appearance. For example we can change the radius and opacity of our points and make their color depend on their `status` value:


```{python}
(
Lifemap(iucn, taxid_col="taxid")
@@ -124,23 +137,37 @@ Each layer accepts a certain number of arguments to customize its appearance. Fo

`pylifemap` provides several functions that aggregate data along the branches of the tree:

<table class="table">
<thead>
<tr><th>Function</th><th>Description</th></tr>
</thead>
<tbody>
<tr><td>[aggregate_count](`~pylifemap.aggregations.aggregate_count`)</td><td>Aggregates the number of children of each tree node.</td></tr>
<tr><td>[aggregate_num](`~pylifemap.aggregations.aggregate_num`)</td><td>Aggregates a numerical variable along the tree branches with a given function (sum, mean, max...).</td></tr>
<tr><td>[aggregate_freq](`~pylifemap.aggregations.aggregate_freq`)</td><td>Aggregates the frequencies of the levels of a categorical variable.</td></tr>
</tbody>
</table>


| Function | Description |
| :----------------------------------------------------------- | :-------------------------------------------------------------------------------------------------- |
| [aggregate_count](`~pylifemap.aggregations.aggregate_count`) | Aggregates the number of children of each tree node. |
| [aggregate_num](`~pylifemap.aggregations.aggregate_num`)     | Aggregates a numerical variable along the tree branches with a given function (sum, mean, max...).  |
| [aggregate_freq](`~pylifemap.aggregations.aggregate_freq`) | Aggregates the frequencies of the levels of a categorical variable. |
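
To make the idea of aggregating "along the branches" concrete, here is a minimal pure-Python sketch of counting observations up a tree. It is not `pylifemap`'s implementation: the `PARENT` mapping, the `count_along_branches` helper, and the toy taxids are all hypothetical, and the sketch assumes each observation is credited to its own node and to every ancestor up to the root, which may differ from what `aggregate_count` actually reports.

```python
from collections import Counter

# Hypothetical toy tree: child taxid -> parent taxid (the root, 1, has no entry).
PARENT = {2: 1, 3: 1, 4: 2, 5: 2, 6: 3}


def count_along_branches(taxids, parent):
    """Count, for every node, the observations found at or below it."""
    counts = Counter()
    for taxid in taxids:
        node = taxid
        # Walk from the observation up to the root, crediting each node on the way.
        while node is not None:
            counts[node] += 1
            node = parent.get(node)
    return dict(counts)


observed = [4, 5, 6]  # three observed species (leaves of the toy tree)
counts = count_along_branches(observed, PARENT)
print(counts)
```

In this toy tree the root collects all three observations, node 2 collects the two below it, and node 3 collects one, which is the kind of per-branch value a lines or donuts layer could then map to width or color.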



For example, we could filter our data set to keep only the species with an "Extinct" status:


::: {.panel-tabset}

## Polars

```{python}
iucn_extinct = iucn.filter(pl.col("status") == "Extinct")
```

## Pandas

```{python}
#| eval: false
iucn_extinct = iucn[iucn["status"] == "Extinct"]
```

:::


We can then aggregate their count along the branches with [aggregate_count](`~pylifemap.aggregations.aggregate_count`):

```{python}