Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Yet another categorical plotter API #3

Merged
merged 11 commits into from
Feb 5, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
29 changes: 22 additions & 7 deletions docs/_scripts/_hooks.py
Original file line number Diff line number Diff line change
Expand Up @@ -25,17 +25,32 @@ def _add_images(matchobj: re.Match[str]) -> str:
other = code
return "```python\n" + other + "\n```"

line, other = code.split("\n", 1)
assert line.startswith("#!name:")
name = line.split(":", 1)[1].strip()
dest = f"_images/{name}.png"
code, name = _get_image_name(code)
code, width = _get_image_width(code)

reldepth = "../" * page.file.src_path.count(os.sep)
dest = f"{reldepth}_images/{name}.png"
link = f"\n![]({dest}){{ loading=lazy, width=360px }}\n\n"
new_md = "```python\n" + other + "\n```" + link
link = f"\n![]({dest}){{ loading=lazy, width={width}px }}\n\n"
new_md = "```python\n" + code + "\n```" + link
return new_md

md = re.sub("``` ?python\n([^`]*)```", _add_images, md, re.DOTALL)
md = re.sub("``` ?python\n([^`]*)```", _add_images, md, flags=re.DOTALL)

return md

def _get_image_name(code: str) -> tuple[str, str]:
line, other = code.split("\n", 1)
assert line.startswith("#!name:")
name = line.split(":", 1)[1].strip()
return other, name

def _get_image_width(code: str) -> tuple[str, int]:
"""Get the width of the image from the code."""
code = code.strip()
if code.startswith("#!width:"):
line, other = code.split("\n", 1)
width = int(line.split(":", 1)[1].strip())
else:
other = code
width = 360
return other, width
288 changes: 288 additions & 0 deletions docs/categorical/cat_num.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,288 @@
# Categorical × Numerical Data

In this section, following data will be used as an example:

``` python
import numpy as np
from whitecanvas import new_canvas

rng = np.random.default_rng(12345)
df = {
"category": ["A"] * 40 + ["B"] * 50,
"observation": np.concatenate([rng.random(40), rng.random(50) + 1.3]),
"replicate": [0] * 23 + [1] * 17 + [0] * 22 + [1] * 28,
"temperature": rng.normal(scale=2.8, size=90) + 22.0,
}
```

How can we visualize the distributions for each category? There are several plots that
use categorical axis as either the x- or y-axis, and numerical axis as the other.
Examples are:

- Strip plot
- Swarm plot
- Violin plot
- Box plot

Aside from the categorical axis, data points may further be grouped by other features,
such as the marker symbol and the marker size. Things are even more complicated when
the markers represent numerical values, such as their size being proportional to the
value, or colored by a colormap.

`whitecanvas` provides a consistent and simple interface to handle all these cases.
Methods used for this purpose are `cat_x` and `cat_y`, where `cat_x` will deem the
x-axis as categorical, and `cat_y` will do the same for the y-axis.

``` python
#!skip
canvas = new_canvas("matplotlib")

# create the categorical plotter.
cat_plt_x = canvas.cat_x(df, x="category", y="observation")
cat_plt_y = canvas.cat_y(df, x="observation", y="category")
```

`cat_x` and `cat_y` use the argument `x=` and `y=` to specify the columns that are used
for the plot, where `x=` is the categorical axis for `cat_x` and `y=` for `cat_y`.

``` note
This is one of the important difference between `seaborn`. In `seaborn`, `orient` are
used to specify the orientation of the plots. This design forces the user to add the
argument `orient=` to every plot even though the orientation rarely changes during the
use of the same figure. In `whitecanvas`, you don't have to specify the orientation
once a categorical plotter is created by either `cat_x` or `cat_y`.
```

Multiplt columns can be used for the categorical axis, but only one column can be used
for the numerical axis.

``` python
#!skip
# OK
canvas.cat_x(df, x=["category", "replicate"], y="observation")
# OK
canvas.cat_y(df, x="observation", y=["category", "replicate"])
# NG
canvas.cat_x(df, x="category", y=["observation", "temperature"])
```

## Non-marker-type Plots

Since plots without data point markers are more straightforward, we will start with
them. It includes `add_violinplot`, `add_boxplot`, `add_pointplot` and `add_barplot`.

``` python
#!name: categorical_axis_violin_0
canvas = new_canvas("matplotlib")
canvas.cat_x(df, x="category", y="observation").add_violinplot()
canvas.show()
```

Violins can also be shown in different color. Specify the `color=` argument to do that.

``` python
#!name: categorical_axis_violin_1
canvas = new_canvas("matplotlib")
(
canvas
.cat_x(df, x="category", y="observation")
.add_violinplot(color="replicate")
)
canvas.show()
```

By default, groups with different colors do not overlap. This is controlled by the
`dodge=` argument. Set `dodge=False` to make them overlap (although it is not the way
we usually do).

``` python
#!name: categorical_axis_violin_2
canvas = new_canvas("matplotlib")
(
canvas
.cat_x(df, x="category", y="observation")
.add_violinplot(color="replicate", dodge=False)
)
canvas.show()
```

`hatch=` can also be specified in a similar way. It will change the hatch pattern of the
violins.

``` python
#!name: categorical_axis_violin_4
canvas = new_canvas("matplotlib")
(
canvas
.cat_x(df, x="category", y="observation")
.add_violinplot(hatch="replicate")
)
canvas.show()
```

`color` and `hatch` can overlap with each other or the `x=` or `y=` argument.

``` python
#!name: categorical_axis_violin_5
canvas = new_canvas("matplotlib")
(
canvas
.cat_x(df, x="category", y="observation")
.add_violinplot(color="category")
)
canvas.show()
```

`add_boxplot`, `add_pointplot` and `add_barplot` is very similar to `add_violinplot`.

``` python
#!name: categorical_axis_many_plots
#!width: 700
from whitecanvas import hgrid

canvas = hgrid(ncols=3, size=(1600, 600), backend="matplotlib")

c0 = canvas.add_canvas(0)
c0.cat_x(df, x="category", y="observation").add_boxplot()
c0.title = "boxplot"

c1 = canvas.add_canvas(1)
c1.cat_x(df, x="category", y="observation").add_pointplot()
c1.title = "pointplot"

c2 = canvas.add_canvas(2)
c2.cat_x(df, x="category", y="observation").add_barplot()
c2.title = "barplot"

canvas.show()
```

## Marker-type Plots

``` python
#!name: categorical_axis_stripplot
canvas = new_canvas("matplotlib")
(
canvas
.cat_x(df, x="category", y="observation")
.add_stripplot(color="replicate")
)
```

``` python
#!name: categorical_axis_stripplot_dodge
canvas = new_canvas("matplotlib")
(
canvas
.cat_x(df, x="category", y="observation")
.add_stripplot(color="replicate", dodge=True)
)
```

As for the `Markers` layer, `as_edge_only` will convert the face features to the edge features.

``` python
#!name: categorical_axis_stripplot_dodge_edge_only
canvas = new_canvas("matplotlib")
(
canvas
.cat_x(df, x="category", y="observation")
.add_stripplot(color="replicate", dodge=True)
.as_edge_only(width=2)
)
```

Each marker size can represent a numerical value. `with_size` will map the numerical
values of a column to the size of the markers.

``` python
#!name: categorical_axis_stripplot_by_size
canvas = new_canvas("matplotlib")
(
canvas
.cat_x(df, x="category", y="observation")
.add_stripplot()
.with_size("temperature")
)
```

Similarly, each marker color can represent a numerical value. `with_colormap` will map the value with an arbitrary colormap.

``` python
#!name: categorical_axis_stripplot_by_color
canvas = new_canvas("matplotlib")
(
canvas
.cat_x(df, x="category", y="observation")
.add_stripplot()
.with_colormap("temperature", cmap="coolwarm")
)
```

Swarm plot is another way to visualize all the data points with markers.

``` python
#!name: categorical_axis_swarmplot
canvas = new_canvas("matplotlib")
(
canvas
.cat_x(df, x="category", y="observation")
.add_swarmplot(sort=True)
.with_colormap("temperature", cmap="coolwarm")
)
```

## Aggregation

Showing the aggregated data is a common way to efficiently visualize a lot of data. This
task is usually done by the module specific group-by methods, but `whitecanvas` provides
a built-in method to simplify the process.

In following example, `mean()` is used to prepare a mean-aggregated plotter, which has
`add_markers` method to add the mean markers to the plotter.

``` python
#!name: categorical_axis_stripplot_and_agg_mean
canvas = new_canvas("matplotlib")

# create a categorical plotter
cat_plt = canvas.cat_x(df, x="category", y="observation")

# plot all the data
cat_plt.add_stripplot(color="category")
# plot the mean
cat_plt.mean().add_markers(color="category", size=20)

canvas.show()
```

Similar `add_*` methods include `add_line()` and `add_bars()`.

``` python
#!name: categorical_axis_stripplot_and_agg_line
canvas = new_canvas("matplotlib")

# create a categorical plotter
cat_plt = canvas.cat_x(df, x="category", y="observation")

# plot all the data
cat_plt.add_stripplot(color="category")
# plot the mean
cat_plt.mean().add_line(width=3, color="black")

canvas.show()
```

Count plot is a special case of the aggregation. Use `count()` to make the plotter.

``` python
#!name: categorical_axis_countplot
canvas = new_canvas("matplotlib")
(
canvas
.cat_x(df, x="category")
.count()
.add_bars(color="replicate", dodge=True)
)
canvas.show()
```
Loading
Loading