From 44be57a4dc25077466c1ab208223ba51229f743f Mon Sep 17 00:00:00 2001
From: Julien Barnier <julien.barnier@cnrs.fr>
Date: Tue, 9 Apr 2024 10:24:51 +0200
Subject: [PATCH] doc: add pandas code and use markdown tables

---
 doc/getting_started.qmd | 81 +++++++++++++++++++++++++++--------------
 1 file changed, 54 insertions(+), 27 deletions(-)
diff --git a/doc/getting_started.qmd b/doc/getting_started.qmd
index 9b9b6b6..94b22c1 100644
--- a/doc/getting_started.qmd
+++ b/doc/getting_started.qmd
@@ -14,11 +14,15 @@ To create a Lifemap data visualization, you will have to follow these steps:
 
 ## Prepare your data
 
-The date you want to visualize on the Lifemap tree of life must be in a [pandas](https://pandas.pydata.org) or [polars](https://pola.rs) DataFrame. They must contain at least observations (species) as rows, and variables as columns, and at least one column must contain the NCBI taxonomy identifier of the species.
+The data you want to visualize on the Lifemap tree of life must be in a [pandas](https://pandas.pydata.org) or [polars](https://pola.rs) DataFrame. It must contain observations (species) as rows and variables as columns, and one column must contain the NCBI taxonomy identifier of each species.
 
-`pylifemap` includes an example polars data file generated from [The IUCN Red List of Threatened Species](https://www.gbif.org/dataset/19491596-35ae-4a91-9a98-85cf505f1bd3). It is a CSV file with the Red List category (in 2022) of more than 84000 species.
+`pylifemap` includes an example dataset generated from [The IUCN Red List of Threatened Species](https://www.gbif.org/dataset/19491596-35ae-4a91-9a98-85cf505f1bd3). It is a CSV file with the Red List category (in 2022) of more than 84000 species.
 
-We can import it as a polars DataFrame with the following code:
+We can import it as a polars or pandas DataFrame with the following code:
+
+::: {.panel-tabset}
+
+## Polars
 
 ```{python}
 import polars as pl
@@ -28,7 +32,20 @@ iucn = pl.read_csv(
 )
 ```
 
-If we display the resulting table, we can see that it only has two columns, one called `taxid` which contains the species identifiers, and another called `status` with the Red List category of each species:
+## Pandas
+
+```{python}
+#| eval: false
+import pandas as pd
+
+iucn = pd.read_csv(
+    "https://raw.githubusercontent.com/juba/pylifemap/main/data/iucn.csv"
+)
+```
+
+:::
+
+The resulting table has only two columns: `taxid`, which contains the species identifiers, and `status`, which contains the Red List category of each species.
 
 ```{python}
 iucn
@@ -59,24 +76,19 @@ Lifemap(iucn, taxid_col="taxid", width="100%", height=800)
 
 After initializing our `Lifemap` object, we have to add visualization layers to create graphical representations. There are several different layers available:
 
-<table class="table">
-<thead>
-<tr><th>Layer</th><th>Description</th></tr>
-</thead>
-<tbody>
-<tr><td>[layer_points](layers/layer_points.qmd)</td><td>Displays each observation with a point. Radius and color can be dependent of an  attribute in the DataFrame.</td></tr>
-<tr><td>[layer_lines](layers/layer_lines.qmd)</td><td>Using aggregated data, highlights branches of the tree by lines of varying width and color.</td></tr>
-<tr><td>[layer_donuts](layers/layer_donuts.qmd)</td><td>Displays aggregated categorical data as donut charts.</td></tr>
-<tr><td>[layer_heatmap](layers/layer_heatmap.qmd)</td><td>Displays a heatmap of the observations distribution in the tree.</td></tr>
-<tr><td>[layer_screengrid](layers/layer_screengrid.qmd)</td><td>Displays the observations distribution with a colored grid with fixed-size cells..</td></tr>
-</tbody>
-</table>
+| Layer                                           | Description                                                                                                  |
+| :---------------------------------------------- | :----------------------------------------------------------------------------------------------------------- |
+| [layer_points](layers/layer_points.qmd)         | Displays each observation with a point. Radius and color can depend on an attribute in the DataFrame.        |
+| [layer_lines](layers/layer_lines.qmd)           | Using aggregated data, highlights branches of the tree with lines of varying width and color.                |
+| [layer_donuts](layers/layer_donuts.qmd)         | Displays aggregated categorical data as donut charts.                                                        |
+| [layer_heatmap](layers/layer_heatmap.qmd)       | Displays a heatmap of the observations distribution in the tree.                                             |
+| [layer_screengrid](layers/layer_screengrid.qmd) | Displays the observations distribution with a colored grid with fixed-size cells.                            |
 
 To add a layer, we just have to call the corresponding `layer_` method of our `Lifemap` object. For example, to add a points layer:
 
 ```{python}
 #| eval: false
-éLifemap(iucn, taxid_col="taxid").layer_points()
+Lifemap(iucn, taxid_col="taxid").layer_points()
 ```
 
 We can add several layers by calling several methods. For example we could display a heatmap layer, and a points layer above it:
@@ -110,6 +122,7 @@ Lifemap(iucn, taxid_col="taxid").layer_points().save("lifemap.html")
 
 Each layer accepts a certain number of arguments to customize its appearance. For example we can change the radius and opacity of our points and make their color depend on their `status` value:
 
+
 ```{python}
 (
     Lifemap(iucn, taxid_col="taxid")
@@ -124,23 +137,37 @@ Each layer accepts a certain number of arguments to customize its appearance. Fo
 
 `pylifemap` provides several aggregation functions that allow to aggregate data along the branches of the tree:
 
-<table class="table">
-<thead>
-<tr><th>Function</th><th>Description</th></tr>
-</thead>
-<tbody>
-<tr><td>[aggregate_count](`~pylifemap.aggregations.aggregate_count`)</td><td>Aggregates the number of children of each tree node.</td></tr>
-<tr><td>[aggregate_num](`~pylifemap.aggregations.aggregate_num`)</td><td>Aggregates a numerical variable along the tree branches with a given function (sum, mean, max...).</td></tr>
-<tr><td>[aggregate_freq](`~pylifemap.aggregations.aggregate_freq`)</td><td>Aggregates the frequencies of the levels of a categorical variable.</td></tr>
-</tbody>
-</table>
+
+
+| Function                                                     | Description                                                                                         |
+| :----------------------------------------------------------- | :-------------------------------------------------------------------------------------------------- |
+| [aggregate_count](`~pylifemap.aggregations.aggregate_count`) | Aggregates the number of children of each tree node.                                                |
+| [aggregate_num](`~pylifemap.aggregations.aggregate_num`)     | Aggregates a numerical variable along the tree branches with a given function (sum, mean, max...).  |
+| [aggregate_freq](`~pylifemap.aggregations.aggregate_freq`)   | Aggregates the frequencies of the levels of a categorical variable.                                 |
+
+
 
 For example, we could filter out in our data set the species which have an "extinct" status:
 
+
+::: {.panel-tabset}
+
+## Polars
+
 ```{python}
 iucn_extinct = iucn.filter(pl.col("status") == "Extinct")
 ```
 
+## Pandas
+
+```{python}
+#| eval: false
+iucn_extinct = iucn[iucn["status"] == "Extinct"]
+```
+
+:::
+
+
 We can then aggregate their count along the branches with [aggregate_count](`~pylifemap.aggregations.aggregate_count`):
 
 ```{python}
