From 44be57a4dc25077466c1ab208223ba51229f743f Mon Sep 17 00:00:00 2001
From: Julien Barnier
Date: Tue, 9 Apr 2024 10:24:51 +0200
Subject: [PATCH] doc: add pandas code and use markdown tables

---
 doc/getting_started.qmd | 81 +++++++++++++++++++++++++++--------------
 1 file changed, 54 insertions(+), 27 deletions(-)

diff --git a/doc/getting_started.qmd b/doc/getting_started.qmd
index 9b9b6b6..94b22c1 100644
--- a/doc/getting_started.qmd
+++ b/doc/getting_started.qmd
@@ -14,11 +14,15 @@ To create a Lifemap data visualization, you will have to follow these steps:
 
 ## Prepare your data
 
-The date you want to visualize on the Lifemap tree of life must be in a [pandas](https://pandas.pydata.org) or [polars](https://pola.rs) DataFrame. They must contain at least observations (species) as rows, and variables as columns, and at least one column must contain the NCBI taxonomy identifier of the species.
+The data you want to visualize on the Lifemap tree of life must be in a [pandas](https://pandas.pydata.org) or [polars](https://pola.rs) DataFrame. They must contain observations (species) as rows, and variables as columns, and one column must contain the NCBI taxonomy identifier of the species.
 
-`pylifemap` includes an example polars data file generated from [The IUCN Red List of Threatened Species](https://www.gbif.org/dataset/19491596-35ae-4a91-9a98-85cf505f1bd3). It is a CSV file with the Red List category (in 2022) of more than 84000 species.
+`pylifemap` includes an example dataset generated from [The IUCN Red List of Threatened Species](https://www.gbif.org/dataset/19491596-35ae-4a91-9a98-85cf505f1bd3). It is a CSV file with the Red List category (in 2022) of more than 84000 species.
 
-We can import it as a polars DataFrame with the following code:
+We can import it as a polars or pandas DataFrame with the following code:
+
+::: {.panel-tabset}
+
+## Polars
 
 ```{python}
 import polars as pl
@@ -28,7 +32,20 @@ iucn = pl.read_csv(
 )
 ```
 
-If we display the resulting table, we can see that it only has two columns, one called `taxid` which contains the species identifiers, and another called `status` with the Red List category of each species:
+## Pandas
+
+```{python}
+#| eval: false
+import pandas as pd
+
+iucn = pd.read_csv(
+    "https://raw.githubusercontent.com/juba/pylifemap/main/data/iucn.csv"
+)
+```
+
+:::
+
+The resulting table only has two columns: `taxid`, which contains the species identifiers, and `status`, with the Red List category of each species.
 
 ```{python}
 iucn
@@ -59,24 +76,19 @@ Lifemap(iucn, taxid_col="taxid", width="100%", height=800)
 
 After initializing our `Lifemap` object, we have to add visualization layers to create graphical representations. There are several different layers available:
 
-<table>
-<thead>
-<tr><th>Layer</th><th>Description</th></tr>
-</thead>
-<tbody>
-<tr><td>[layer_points](layers/layer_points.qmd)</td><td>Displays each observation with a point. Radius and color can be dependent of an attribute in the DataFrame.</td></tr>
-<tr><td>[layer_lines](layers/layer_lines.qmd)</td><td>Using aggregated data, highlights branches of the tree by lines of varying width and color.</td></tr>
-<tr><td>[layer_donuts](layers/layer_donuts.qmd)</td><td>Displays aggregated categorical data as donut charts.</td></tr>
-<tr><td>[layer_heatmap](layers/layer_heatmap.qmd)</td><td>Displays a heatmap of the observations distribution in the tree.</td></tr>
-<tr><td>[layer_screengrid](layers/layer_screengrid.qmd)</td><td>Displays the observations distribution with a colored grid with fixed-size cells..</td></tr>
-</tbody>
-</table>
+| Layer | Description |
+| :---------------------------------------------- | :----------------------------------------------------------------------------------------------------------- |
+| [layer_points](layers/layer_points.qmd) | Displays each observation with a point. Radius and color can be dependent on an attribute in the DataFrame. |
+| [layer_lines](layers/layer_lines.qmd) | Using aggregated data, highlights branches of the tree with lines of varying width and color. |
+| [layer_donuts](layers/layer_donuts.qmd) | Displays aggregated categorical data as donut charts. |
+| [layer_heatmap](layers/layer_heatmap.qmd) | Displays a heatmap of the observations distribution in the tree. |
+| [layer_screengrid](layers/layer_screengrid.qmd) | Displays the observations distribution with a colored grid with fixed-size cells. |
 
 To add a layer, we just have to call the corresponding `layer_` method of our `Lifemap` object. For example, to add a points layer:
 
 ```{python}
 #| eval: false
-éLifemap(iucn, taxid_col="taxid").layer_points()
+Lifemap(iucn, taxid_col="taxid").layer_points()
 ```
 
 We can add several layers by calling several methods. For example we could display a heatmap layer, and a points layer above it:
@@ -110,6 +122,7 @@ Lifemap(iucn, taxid_col="taxid").layer_points().save("lifemap.html")
 
 Each layer accepts a certain number of arguments to customize its appearance. For example we can change the radius and opacity of our points and make their color depend on their `status` value:
 
+
 ```{python}
 (
     Lifemap(iucn, taxid_col="taxid")
@@ -124,23 +137,37 @@ Each layer accepts a certain number of arguments to customize its appearance. Fo
 
 `pylifemap` provides several aggregation functions that allow to aggregate data along the branches of the tree:
 
-<table>
-<thead>
-<tr><th>Function</th><th>Description</th></tr>
-</thead>
-<tbody>
-<tr><td>[aggregate_count](`~pylifemap.aggregations.aggregate_count`)</td><td>Aggregates the number of children of each tree node.</td></tr>
-<tr><td>[aggregate_num](`~pylifemap.aggregations.aggregate_num`)</td><td>Aggregates a numerical variable along the tree branches with a given function (sum, mean, max...).</td></tr>
-<tr><td>[aggregate_freq](`~pylifemap.aggregations.aggregate_freq`)</td><td>Aggregates the frequencies of the levels of a categorical variable.</td></tr>
-</tbody>
-</table>
+
+| Function | Description |
+| :----------------------------------------------------------- | :-------------------------------------------------------------------------------------------------- |
+| [aggregate_count](`~pylifemap.aggregations.aggregate_count`) | Aggregates the number of children of each tree node. |
+| [aggregate_num](`~pylifemap.aggregations.aggregate_num`) | Aggregates a numerical variable along the tree branches with a given function (sum, mean, max...). |
+| [aggregate_freq](`~pylifemap.aggregations.aggregate_freq`) | Aggregates the frequencies of the levels of a categorical variable. |
+
+
 
 For example, we could filter out in our data set the species which have an "extinct" status:
+
+::: {.panel-tabset}
+
+## Polars
+
 
 ```{python}
 iucn_extinct = iucn.filter(pl.col("status") == "Extinct")
 ```
 
+## Pandas
+
+```{python}
+#| eval: false
+iucn_extinct = iucn[iucn["status"] == "Extinct"]
+```
+
+:::
+
+
 We can then aggregate their count along the branches with [aggregate_count](`~pylifemap.aggregations.aggregate_count`):
 
 ```{python}