
Setup benchmarks #64

Merged: maxrjones merged 7 commits into xarray-contrib:main from asv-setup on Aug 10, 2022
Conversation

maxrjones (Member)

This PR adds basic benchmarks using ASV, as proposed by @jhamman in #42. The steps for running the benchmarks are documented in a new contributing guide. This PR does not add any CI workflows for running the benchmarks, since those can be configured separately and are not needed for the more immediate goal of identifying performance bottlenecks.
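
For context, asv collects timings from any method whose name starts with time_. Here is a minimal sketch of the kind of benchmark added here (the dataset shape and names are illustrative, not the exact code in this PR):

import numpy as np
import xarray as xr

import xbatcher


class Generator:
    def setup(self):
        # Runs before each timing; build a small synthetic dataset
        self.ds = xr.Dataset({"foo": (("x", "y"), np.random.rand(1000, 100))})

    def time_batch_generation(self):
        # Iterate through every batch so the full generation cost is timed
        for batch in xbatcher.BatchGenerator(self.ds, input_dims={"x": 10}):
            pass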

As @rabernat noted in #37 (comment), this is only one component of the optimization recipe. While asv can report cProfile results, those results are stored in .json files from which they are possible, but cumbersome, to extract. Instead, it is generally easier to copy the benchmark code and profile it separately with analyzers like snakeviz or pyinstrument (a sketch of that approach follows the notes below). Here are a couple of notes from that approach:

  1. When concat_input_dims is False, calling ds.stack on each batch is the slowest part: https://github.com/pangeo-data/xbatcher/blob/34ca3f9b6acaf23b09ef140b69bc7c79d4f38feb/xbatcher/generators.py#L58-L65
  2. When concat_input_dims is True, modifying the coordinates in _drop_input_dims is the slowest part: https://github.com/pangeo-data/xbatcher/blob/34ca3f9b6acaf23b09ef140b69bc7c79d4f38feb/xbatcher/generators.py#L44-L55

In either case, the main bottleneck is that these functions are called once per batch. Especially for (1), I do not think the per-batch call is necessary, but I would like to modify the tests to explicitly check indexes before submitting a proposed refactor.
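
As an illustration of that copy-and-profile approach, here is a minimal sketch using pyinstrument (the dataset and input_dims are made up for the example):

import numpy as np
import xarray as xr
from pyinstrument import Profiler

import xbatcher

# Illustrative workload; substitute a realistic dataset
ds = xr.Dataset({"foo": (("x", "y"), np.random.rand(1000, 100))})

profiler = Profiler()
profiler.start()
# Exercise the code under study: generate all batches
for batch in xbatcher.BatchGenerator(ds, input_dims={"x": 10}, concat_input_dims=True):
    pass
profiler.stop()

# Prints a call tree showing where the time went
print(profiler.output_text(unicode=True, color=True))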

Addresses #32 and #42

codecov bot commented May 14, 2022

Codecov Report

Merging #64 (b0c67f1) into main (d3e1c2f) will not change coverage.
The diff coverage is n/a.

@@            Coverage Diff            @@
##              main       #64   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            3         3           
  Lines          136       136           
  Branches        32        32           
=========================================
  Hits           136       136           


jhamman (Contributor) commented May 18, 2022

@meghanrjones - can you explain a bit more about the json files that asv produces? Is there one file per parameterized benchmark? Or is there one file per asv run? I'm thinking about the best way to share these results and wondering if we can stage the snakeviz version of the results somewhere public.

maxrjones (Member, Author) commented May 19, 2022

> @meghanrjones - can you explain a bit more about the json files that asv produces? Is there one file per parameterized benchmark? Or is there one file per asv run? I'm thinking about the best way to share these results and wondering if we can stage the snakeviz version of the results somewhere public.

Yes, there is one file per asv run. The structure is as follows:

asv_bench
└── .asv
    └── results
        ├── <machine>
        │   ├── <short-commit-hash1>-conda-py<python-version>.json
        │   └── <short-commit-hash2>-conda-py<python-version>.json
        └── benchmarks.json

where each <short-commit-hash>-conda-py<python-version>.json contains the results of one run (optionally including cProfile results) and benchmarks.json contains general metadata about each benchmark.
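
As a quick illustration, benchmarks.json can be read directly with the standard library (assuming the layout above, with benchmark names as keys plus a version entry):

import json

# Path follows the layout above; adjust for your checkout
with open("asv_bench/.asv/results/benchmarks.json") as f:
    benchmarks = json.load(f)

# Values are metadata dicts (e.g., parameter names and units)
for name, meta in benchmarks.items():
    if isinstance(meta, dict):
        print(name, meta.get("param_names"))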

Here is an example showing how to use the undocumented asv Python library to extract the profiling information (copied from airspeed-velocity/asv#971 (comment)). I expect we could modify it to use snakeviz rather than pstats by calling snakeviz <pstats_file> on the command line:

import pstats

from asv.results import Results

# Placeholders to fill in for your run:
result_json = "<file_path_to_result_json>"  # e.g., .asv/results/<machine>/<hash>-conda-py<version>.json
benchmark_name = "<benchmark_name>"  # as listed in benchmarks.json
pstats_file = "<pstats_file>"  # destination for the raw profile dump

# Load the asv results file and extract the stored cProfile bytes
results = Results.load(result_json)
profile_data_bytes = results.get_profile(benchmark_name)
with open(pstats_file, "wb") as f:
    f.write(profile_data_bytes)

# Inspect with pstats, or run `snakeviz <pstats_file>` instead
p = pstats.Stats(pstats_file)
p.print_stats()

For publishing the results as a snakeviz representation, are you hoping to get a snapshot of the current performance, to use snakeviz as a CI performance-monitoring tool, or something else? If it's the former, it might work to simply share a notebook as a gist with the snakeviz/pyinstrument results. Edit: It would also be simple to share the json files using the method above, although those wouldn't be useful for continuous comparisons due to differences in runtime environments.

maxrjones (Member, Author)

The current setup in this PR uses the parameterized decorator from xarray. This complicates using the profile outputs because the JSON keys do not include the ID of the parameterized run. We may need to modify the parameterization method (e.g., asv's native parameterized benchmarks: https://asv.readthedocs.io/en/stable/writing_benchmarks.html#parameterized-benchmarks) or profile outside of the asv setup.
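
For reference, here is a sketch of what asv's native parameterization could look like (benchmark and dataset details are illustrative). Each value in params is run and recorded as a distinct result, which should make profile output attributable to a specific parameter value:

import numpy as np
import xarray as xr

import xbatcher


class Generator:
    # asv's built-in parameterization: setup and time_* methods
    # receive each parameter value in turn
    params = [True, False]
    param_names = ["concat_input_dims"]

    def setup(self, concat_input_dims):
        self.ds = xr.Dataset({"foo": (("x", "y"), np.random.rand(1000, 100))})

    def time_batch_generation(self, concat_input_dims):
        for batch in xbatcher.BatchGenerator(
            self.ds, input_dims={"x": 10}, concat_input_dims=concat_input_dims
        ):
            pass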

@jhamman jhamman requested a review from andersy005 August 9, 2022 20:23
@maxrjones maxrjones merged commit 5312998 into xarray-contrib:main Aug 10, 2022
@maxrjones maxrjones deleted the asv-setup branch August 10, 2022 15:11
@maxrjones maxrjones added the maintenance Maintenance tasks label Oct 6, 2022