Merge pull request #3 from hanjinliu/split-cat
Yet another categorical plotter API
hanjinliu authored Feb 5, 2024
2 parents ca05c19 + f61403f commit 1915c4d
Showing 45 changed files with 3,396 additions and 2,455 deletions.
29 changes: 22 additions & 7 deletions docs/_scripts/_hooks.py
Expand Up @@ -25,17 +25,32 @@ def _add_images(matchobj: re.Match[str]) -> str:
other = code
return "```python\n" + other + "\n```"

line, other = code.split("\n", 1)
assert line.startswith("#!name:")
name = line.split(":", 1)[1].strip()
dest = f"_images/{name}.png"
code, name = _get_image_name(code)
code, width = _get_image_width(code)

reldepth = "../" * page.file.src_path.count(os.sep)
dest = f"{reldepth}_images/{name}.png"
link = f"\n![]({dest}){{ loading=lazy, width=360px }}\n\n"
new_md = "```python\n" + other + "\n```" + link
link = f"\n![]({dest}){{ loading=lazy, width={width}px }}\n\n"
new_md = "```python\n" + code + "\n```" + link
return new_md

md = re.sub("``` ?python\n([^`]*)```", _add_images, md, re.DOTALL)
md = re.sub("``` ?python\n([^`]*)```", _add_images, md, flags=re.DOTALL)

return md

def _get_image_name(code: str) -> tuple[str, str]:
line, other = code.split("\n", 1)
assert line.startswith("#!name:")
name = line.split(":", 1)[1].strip()
return other, name

def _get_image_width(code: str) -> tuple[str, int]:
"""Get the width of the image from the code."""
code = code.strip()
if code.startswith("#!width:"):
line, other = code.split("\n", 1)
width = int(line.split(":", 1)[1].strip())
else:
other = code
width = 360
return other, width
288 changes: 288 additions & 0 deletions docs/categorical/cat_num.md
@@ -0,0 +1,288 @@
# Categorical × Numerical Data

In this section, the following data will be used as an example:

``` python
import numpy as np
from whitecanvas import new_canvas

rng = np.random.default_rng(12345)
df = {
"category": ["A"] * 40 + ["B"] * 50,
"observation": np.concatenate([rng.random(40), rng.random(50) + 1.3]),
"replicate": [0] * 23 + [1] * 17 + [0] * 22 + [1] * 28,
"temperature": rng.normal(scale=2.8, size=90) + 22.0,
}
```

How can we visualize the distributions for each category? There are several plots that
use a categorical axis as either the x- or y-axis, and a numerical axis as the other.
Examples are:

- Strip plot
- Swarm plot
- Violin plot
- Box plot

Aside from the categorical axis, data points may further be grouped by other features,
such as the marker symbol and the marker size. Things get even more complicated when
the markers represent numerical values, for example when their size is proportional to
the value or their color is taken from a colormap.

`whitecanvas` provides a consistent and simple interface to handle all these cases.
The methods used for this purpose are `cat_x` and `cat_y`: `cat_x` treats the x-axis as
categorical, and `cat_y` does the same for the y-axis.

``` python
#!skip
canvas = new_canvas("matplotlib")

# create the categorical plotter.
cat_plt_x = canvas.cat_x(df, x="category", y="observation")
cat_plt_y = canvas.cat_y(df, x="observation", y="category")
```

`cat_x` and `cat_y` use the arguments `x=` and `y=` to specify the columns that are used
for the plot, where `x=` is the categorical axis for `cat_x` and `y=` for `cat_y`.

``` note
This is one of the important differences from `seaborn`. In `seaborn`, `orient` is
used to specify the orientation of the plots. This design forces the user to add the
argument `orient=` to every plot even though the orientation rarely changes during the
use of the same figure. In `whitecanvas`, you don't have to specify the orientation
once a categorical plotter is created by either `cat_x` or `cat_y`.
```

Multiple columns can be used for the categorical axis, but only one column can be used
for the numerical axis.

``` python
#!skip
# OK
canvas.cat_x(df, x=["category", "replicate"], y="observation")
# OK
canvas.cat_y(df, x="observation", y=["category", "replicate"])
# Not allowed: the numerical axis accepts only one column
canvas.cat_x(df, x="category", y=["observation", "temperature"])
```
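
To give a feel for what passing several columns to the categorical axis means, the following sketch builds the composite categories by hand in plain Python. It is only an illustration, under the assumption that each unique combination of values becomes one position on the axis. The toy table `toy` and the helper `composite_categories` are hypothetical and not part of `whitecanvas`.

``` python
#!skip
toy = {
    "category": ["A", "A", "B", "B", "B"],
    "replicate": [0, 1, 0, 0, 1],
}


def composite_categories(table, columns):
    """Return the unique value combinations of `columns`, in order of first appearance."""
    rows = zip(*(table[col] for col in columns))
    # dict.fromkeys drops duplicates while keeping insertion order
    return list(dict.fromkeys(rows))


print(composite_categories(toy, ["category", "replicate"]))
```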

## Non-marker-type Plots

Since plots without data point markers are more straightforward, we will start with
them. These include `add_violinplot`, `add_boxplot`, `add_pointplot` and `add_barplot`.

``` python
#!name: categorical_axis_violin_0
canvas = new_canvas("matplotlib")
canvas.cat_x(df, x="category", y="observation").add_violinplot()
canvas.show()
```

Violins can also be shown in different colors. Specify the `color=` argument to do that.

``` python
#!name: categorical_axis_violin_1
canvas = new_canvas("matplotlib")
(
canvas
.cat_x(df, x="category", y="observation")
.add_violinplot(color="replicate")
)
canvas.show()
```

By default, groups with different colors do not overlap. This is controlled by the
`dodge=` argument. Set `dodge=False` to make them overlap (although this is not how
violins are usually drawn).

``` python
#!name: categorical_axis_violin_2
canvas = new_canvas("matplotlib")
(
canvas
.cat_x(df, x="category", y="observation")
.add_violinplot(color="replicate", dodge=False)
)
canvas.show()
```
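
The idea behind dodging can be sketched without any plotting library: each category occupies one slot on the axis, and the color groups inside it are shifted sideways so that they sit next to each other. The helper `dodge_offsets` and its `width` parameter below are hypothetical and only illustrate the geometry. `whitecanvas` may compute the positions differently.

``` python
#!skip
def dodge_offsets(n_groups, width=0.8):
    """Evenly spaced center offsets for `n_groups` items placed side by side in a slot of `width`."""
    item_width = width / n_groups
    # center of the i-th item, measured from the center of the slot
    return [(i + 0.5) * item_width - width / 2 for i in range(n_groups)]


print(dodge_offsets(2))  # two groups: one shifted left, one shifted right
print(dodge_offsets(1))  # a single group stays at the center
```

Without dodging, every group would keep an offset of zero, which is why the violins overlap in the figure above.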

`hatch=` can also be specified in a similar way. It will change the hatch pattern of the
violins.

``` python
#!name: categorical_axis_violin_4
canvas = new_canvas("matplotlib")
(
canvas
.cat_x(df, x="category", y="observation")
.add_violinplot(hatch="replicate")
)
canvas.show()
```

`color` and `hatch` can refer to the same column as each other, or to the column already
used for the `x=` or `y=` argument.

``` python
#!name: categorical_axis_violin_5
canvas = new_canvas("matplotlib")
(
canvas
.cat_x(df, x="category", y="observation")
.add_violinplot(color="category")
)
canvas.show()
```

`add_boxplot`, `add_pointplot` and `add_barplot` are very similar to `add_violinplot`.

``` python
#!name: categorical_axis_many_plots
#!width: 700
from whitecanvas import hgrid

canvas = hgrid(ncols=3, size=(1600, 600), backend="matplotlib")

c0 = canvas.add_canvas(0)
c0.cat_x(df, x="category", y="observation").add_boxplot()
c0.title = "boxplot"

c1 = canvas.add_canvas(1)
c1.cat_x(df, x="category", y="observation").add_pointplot()
c1.title = "pointplot"

c2 = canvas.add_canvas(2)
c2.cat_x(df, x="category", y="observation").add_barplot()
c2.title = "barplot"

canvas.show()
```

## Marker-type Plots

``` python
#!name: categorical_axis_stripplot
canvas = new_canvas("matplotlib")
(
canvas
.cat_x(df, x="category", y="observation")
.add_stripplot(color="replicate")
)
```

``` python
#!name: categorical_axis_stripplot_dodge
canvas = new_canvas("matplotlib")
(
canvas
.cat_x(df, x="category", y="observation")
.add_stripplot(color="replicate", dodge=True)
)
```

As with the `Markers` layer, `as_edge_only` converts the face features into edge features.

``` python
#!name: categorical_axis_stripplot_dodge_edge_only
canvas = new_canvas("matplotlib")
(
canvas
.cat_x(df, x="category", y="observation")
.add_stripplot(color="replicate", dodge=True)
.as_edge_only(width=2)
)
```

Each marker size can represent a numerical value. `with_size` will map the numerical
values of a column to the size of the markers.

``` python
#!name: categorical_axis_stripplot_by_size
canvas = new_canvas("matplotlib")
(
canvas
.cat_x(df, x="category", y="observation")
.add_stripplot()
.with_size("temperature")
)
```
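
Mapping a numerical column to marker size amounts to rescaling the values into a size range. The sketch below shows one simple linear rescaling. The function `linear_sizes` and the range of 5 to 20 are assumptions made for illustration, and `with_size` may use different defaults or a different rule.

``` python
#!skip
def linear_sizes(values, min_size=5.0, max_size=20.0):
    """Linearly rescale `values` so the smallest maps to `min_size` and the largest to `max_size`."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # all values are equal: use the midpoint of the size range
        return [(min_size + max_size) / 2] * len(values)
    scale = (max_size - min_size) / (hi - lo)
    return [min_size + (v - lo) * scale for v in values]


temperatures = [18.0, 22.0, 26.0]
print(linear_sizes(temperatures))
```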

Similarly, each marker color can represent a numerical value. `with_colormap` maps the values of a column to colors using an arbitrary colormap.

``` python
#!name: categorical_axis_stripplot_by_color
canvas = new_canvas("matplotlib")
(
canvas
.cat_x(df, x="category", y="observation")
.add_stripplot()
.with_colormap("temperature", cmap="coolwarm")
)
```

A swarm plot is another way to visualize all the data points with markers.

``` python
#!name: categorical_axis_swarmplot
canvas = new_canvas("matplotlib")
(
canvas
.cat_x(df, x="category", y="observation")
.add_swarmplot(sort=True)
.with_colormap("temperature", cmap="coolwarm")
)
```

## Aggregation

Showing aggregated data is a common way to visualize a lot of data efficiently. This
task is usually done with module-specific group-by methods, but `whitecanvas` provides
a built-in method to simplify the process.

In the following example, `mean()` is used to prepare a mean-aggregated plotter, which
has an `add_markers` method to add the mean markers to the plotter.

``` python
#!name: categorical_axis_stripplot_and_agg_mean
canvas = new_canvas("matplotlib")

# create a categorical plotter
cat_plt = canvas.cat_x(df, x="category", y="observation")

# plot all the data
cat_plt.add_stripplot(color="category")
# plot the mean
cat_plt.mean().add_markers(color="category", size=20)

canvas.show()
```
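
For comparison, the per-category means can also be computed by hand. The sketch below is a plain-Python stand-in for a group-by, using a small hypothetical table `toy` and a hypothetical helper `group_mean`. It is not how `whitecanvas` implements `mean()`.

``` python
#!skip
from collections import defaultdict
from statistics import mean

toy = {
    "category": ["A", "A", "B", "B"],
    "observation": [0.25, 0.75, 1.5, 2.5],
}


def group_mean(table, by, value):
    """Average the `value` column separately for each distinct entry of the `by` column."""
    groups = defaultdict(list)
    for key, val in zip(table[by], table[value]):
        groups[key].append(val)
    return {key: mean(vals) for key, vals in groups.items()}


print(group_mean(toy, by="category", value="observation"))
```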

Similar `add_*` methods include `add_line()` and `add_bars()`.

``` python
#!name: categorical_axis_stripplot_and_agg_line
canvas = new_canvas("matplotlib")

# create a categorical plotter
cat_plt = canvas.cat_x(df, x="category", y="observation")

# plot all the data
cat_plt.add_stripplot(color="category")
# plot the mean
cat_plt.mean().add_line(width=3, color="black")

canvas.show()
```

A count plot is a special case of aggregation. Use `count()` to make the plotter.

``` python
#!name: categorical_axis_countplot
canvas = new_canvas("matplotlib")
(
canvas
.cat_x(df, x="category")
.count()
.add_bars(color="replicate", dodge=True)
)
canvas.show()
```
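
What the bars represent can be checked by counting rows per group in plain Python. The table `toy` below is a small hypothetical example, and counting by `(category, replicate)` pairs is an assumption about how the bars are grouped when `color="replicate"` is given.

``` python
#!skip
from collections import Counter

toy = {
    "category": ["A", "A", "A", "B", "B"],
    "replicate": [0, 0, 1, 0, 1],
}

# one count per (category, replicate) combination
counts = Counter(zip(toy["category"], toy["replicate"]))
for key, n in sorted(counts.items()):
    print(key, n)
```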