feat: add basic profiling to benchmarks #116
Merged
Problem
The benchmarks currently return the number of instructions it took to execute each benchmark. While this number is useful for measuring performance, it doesn't provide insight into where these instructions are being spent or where the performance bottlenecks are. Without this information, making informed performance optimizations would require a lot of trial and error.
Solution
The typical solution to this problem is to use some kind of profiler. `ic-repl` already supports profiling and can output a flamegraph of where instructions are being spent, but it has a few drawbacks that make it difficult to use:
`debug = 1` is turned on, making it hard to make sense of the output.

To address the issues above, this commit introduces a "poor man's profiler". This profiler is manual, in the sense that the developer adds hints to the code for what they care about profiling. In this PR, I added some basic hints, and the benchmarks now return an output that looks like this:
This approach is simple and effective, but it does have the drawback that it makes the instruction count slightly inaccurate, as the profiling logic itself consumes cycles. I think we can limit this inaccuracy by making the `profiler` crate internally account for its own overhead and deduct it from its measurements.
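
To illustrate the overhead idea, here is a minimal sketch of a scope-based manual profiler that tracks its own bookkeeping cost and deducts it from every enclosing measurement. It is not the `profiler` crate's actual API: `profile`, `Scope`, `bookkeeping`, `burn`, and `BOOKKEEPING_COST` are hypothetical names, and the instruction counter is simulated with a thread-local so the example runs outside a canister.

```rust
use std::cell::{Cell, RefCell};
use std::collections::BTreeMap;

thread_local! {
    // Simulated instruction counter (assumption: stands in for the real counter).
    static COUNTER: Cell<u64> = Cell::new(0);
    // Instructions spent so far inside the profiler's own bookkeeping.
    static OVERHEAD: Cell<u64> = Cell::new(0);
    // Instructions attributed to each named scope, with overhead deducted.
    static RESULTS: RefCell<BTreeMap<&'static str, u64>> = RefCell::new(BTreeMap::new());
}

// Simulated cost of one bookkeeping step (hypothetical value).
const BOOKKEEPING_COST: u64 = 7;

fn instruction_count() -> u64 {
    COUNTER.with(|c| c.get())
}

/// Simulates executing `n` instructions of work.
fn burn(n: u64) {
    COUNTER.with(|c| c.set(c.get() + n));
}

/// Runs the profiler's (simulated) bookkeeping and records its cost as overhead.
fn bookkeeping() {
    let before = instruction_count();
    burn(BOOKKEEPING_COST);
    let cost = instruction_count() - before;
    OVERHEAD.with(|o| o.set(o.get() + cost));
}

/// Guard for a profiled scope; the measurement is recorded when it is dropped.
struct Scope {
    name: &'static str,
    start: u64,
    overhead_at_start: u64,
}

/// The "hint" a developer places at the top of a region they want measured.
fn profile(name: &'static str) -> Scope {
    bookkeeping();
    Scope {
        name,
        start: instruction_count(),
        overhead_at_start: OVERHEAD.with(|o| o.get()),
    }
}

impl Drop for Scope {
    fn drop(&mut self) {
        let end = instruction_count();
        // Overhead accumulated by nested scopes while this scope was open.
        let nested_overhead = OVERHEAD.with(|o| o.get()) - self.overhead_at_start;
        let spent = (end - self.start) - nested_overhead;
        RESULTS.with(|r| {
            *r.borrow_mut().entry(self.name).or_insert(0) += spent;
        });
        bookkeeping();
    }
}

/// Example benchmark body with one nested hint.
fn run_example() -> BTreeMap<&'static str, u64> {
    {
        let _outer = profile("insert");
        burn(100);
        {
            let _inner = profile("node_load");
            burn(40);
        }
        burn(20);
    }
    RESULTS.with(|r| r.borrow().clone())
}
```

In this simulation the raw delta for the outer scope would be 174 instructions, because it includes two bookkeeping steps of the nested scope. After deduction, `insert` is reported as 160 and `node_load` as 40. A real implementation would only approximate this, since reading the counter itself costs instructions that fall outside the measured bookkeeping window.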