feat: add basic profiling to benchmarks #116

ielashi · 2023-08-23T07:36:42Z

Problem

The benchmarks currently return the number of instructions each benchmark took to execute. While this number is useful for measuring performance, it doesn't show where those instructions are spent or where the performance bottlenecks are. Without this information, making informed performance optimizations requires a lot of trial and error.

Solution

The typical solution to this problem is to use some kind of profiler. ic-repl already supports profiling and can output a flamegraph of where instructions are being spent, but it has a few drawbacks that make it difficult to use:

The names of Rust methods are mangled, even when debug = 1 is turned on, making it hard to make sense of the output.
Each benchmark first runs setup logic, and we only want to profile what comes after setup, so we'd need a way to programmatically tell the profiler to reset its measurements.
Often we'd like to benchmark blocks of code that aren't functions.

To address the issues above, this commit introduces a "poor man's profiler". This profiler is manual, in the sense that the developer adds hints to the code marking the parts they care about profiling. In this PR, I added some basic hints, and the benchmarks now return an output that looks like this:

Benchmarking btreemap_insert_blob_64_1024_v2: Warming up for 1.0000 ms
2023-08-23 07:26:53.560585 UTC: [Canister rwlgt-iiaaa-aaaaa-aaaaa-cai] {
    "node_load_v2": "5_182_358_668 (80%)",
    "node_save_v2": "786_197_957 (12%)",
}

Benchmarking btreemap_insert_blob_64_1024_v2: Collecting 10 samples in estimated 345.63 s (165 iterations
btreemap_insert_blob_64_1024_v2
                        time:   [6474.1 M Instructions 6474.1 M Instructions 6474.1 M Instructions]
                        change: [+0.0000% +0.0000% +0.0000%] (p = NaN > 0.05)
                        No change in performance detected.
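As a rough illustration of what such manual hints could look like, here is a minimal sketch. It is not the code in profiler/src/lib.rs: the names `profile`, `reset`, `results`, and `burn` are hypothetical, and the instruction counter is simulated with a thread-local value so that the example runs outside a canister.

```rust
use std::cell::RefCell;
use std::collections::BTreeMap;

thread_local! {
    // Hypothetical stand-in for the canister's instruction counter.
    static COUNTER: RefCell<u64> = RefCell::new(0);
    // Instructions accumulated so far, keyed by hint label.
    static PROFILE: RefCell<BTreeMap<&'static str, u64>> = RefCell::new(BTreeMap::new());
}

/// Stand-in for real work: advances the simulated counter by `n`.
pub fn burn(n: u64) {
    COUNTER.with(|c| *c.borrow_mut() += n);
}

/// Reads the simulated instruction counter.
fn instruction_count() -> u64 {
    COUNTER.with(|c| *c.borrow())
}

/// Runs `f` and attributes the instructions it consumed to `label`.
/// Works for arbitrary blocks of code, not only named functions.
pub fn profile<R>(label: &'static str, f: impl FnOnce() -> R) -> R {
    let start = instruction_count();
    let result = f();
    let spent = instruction_count() - start;
    PROFILE.with(|p| *p.borrow_mut().entry(label).or_insert(0) += spent);
    result
}

/// Clears all measurements, e.g. once benchmark setup has finished.
pub fn reset() {
    PROFILE.with(|p| p.borrow_mut().clear());
}

/// Returns a snapshot of the instructions recorded per label.
pub fn results() -> BTreeMap<&'static str, u64> {
    PROFILE.with(|p| p.borrow().clone())
}
```

A benchmark would run its setup, call `reset()`, wrap the blocks of interest in `profile("label", || ...)`, and report `results()` at the end. Repeated calls with the same label accumulate into one entry.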

This approach is simple and effective, but it does have the drawback that it makes the instruction count slightly inaccurate, as the profiling logic itself consumes instructions. I think we can limit this inaccuracy by making the profiler crate internally account for its own overhead and deduct it from its measurements.
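One way the overhead deduction could work is sketched below. This is an assumption about a possible design, not something implemented in this PR: the profiler keeps a running total of the instructions it has spent on its own bookkeeping, and each scope subtracts whatever was added to that total while it was open. The counter is simulated, and `BOOKKEEPING_COST`, `profile`, `burn`, `reset`, and `results` are hypothetical names.

```rust
use std::cell::RefCell;
use std::collections::BTreeMap;

// Assumed fixed cost of one bookkeeping step. A real profiler would
// measure this by reading the counter around its own logic.
const BOOKKEEPING_COST: u64 = 7;

#[derive(Default)]
struct State {
    counter: u64,  // simulated instruction counter (hypothetical)
    overhead: u64, // instructions spent by the profiler itself so far
    totals: BTreeMap<&'static str, u64>,
}

thread_local! {
    static STATE: RefCell<State> = RefCell::new(State::default());
}

/// Stand-in for real work: advances the simulated counter by `n`.
pub fn burn(n: u64) {
    STATE.with(|s| s.borrow_mut().counter += n);
}

/// Runs `f` and records its cost under `label`, excluding any profiler
/// overhead that accrued while `f` was running (from nested scopes).
pub fn profile<R>(label: &'static str, f: impl FnOnce() -> R) -> R {
    let (start, overhead_before) = STATE.with(|s| {
        let s = s.borrow();
        (s.counter, s.overhead)
    });
    let result = f();
    STATE.with(|s| {
        let mut s = s.borrow_mut();
        let end = s.counter;
        let nested_overhead = s.overhead - overhead_before;
        *s.totals.entry(label).or_insert(0) += (end - start) - nested_overhead;
        // Simulate the cost of this bookkeeping and remember it, so that
        // enclosing scopes can deduct it from their own measurements.
        s.counter += BOOKKEEPING_COST;
        s.overhead += BOOKKEEPING_COST;
    });
    result
}

/// Clears the counter, the overhead total, and all measurements.
pub fn reset() {
    STATE.with(|s| *s.borrow_mut() = State::default());
}

/// Returns a snapshot of the overhead-adjusted instructions per label.
pub fn results() -> BTreeMap<&'static str, u64> {
    STATE.with(|s| s.borrow().totals.clone())
}
```

With a nested scope, the outer measurement reflects only the work done by the benchmarked code: the bookkeeping performed when the inner scope closes is subtracted from the outer total rather than being counted as benchmark work.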

ielashi · 2023-08-23T07:37:53Z

Note to reviewers: this is a reincarnation of #52.

profiler/src/lib.rs

ielashi added 6 commits August 22, 2023 11:18

feat: BTreeMap v2 beta

fc214cd

update tests

022eaa0

docs

12e300b

.

5af65b2

.

8620b4f

feat: add basic profiling to benchmarks

d45b6f5

ielashi requested review from roman-kashitsyn and a team as code owners August 23, 2023 07:36

ielashi mentioned this pull request Aug 23, 2023

feat: add profiling info to benchmarks #52

Closed

ielashi added 2 commits August 23, 2023 09:38

.

2eb6d86

.

1d480f1

dsarlis approved these changes Aug 29, 2023


Base automatically changed from ielashi/btree_v2 to main September 5, 2023 08:39

ielashi added 3 commits September 5, 2023 10:47

Merge branch 'main' into ielashi/profiler

2ba700a

fix merge issue

d1d1873

.

29b32b6

ielashi enabled auto-merge (squash) September 5, 2023 08:52

ielashi merged commit 6d00eca into main Sep 5, 2023
3 checks passed

ielashi deleted the ielashi/profiler branch September 5, 2023 08:59
