Statistically compare benchmark results #21

Open
fyrchik opened this issue Mar 1, 2023 · 1 comment
Labels

- discussion: Talking about something in order to reach a decision or to exchange ideas
- enhancement: New feature or request

Comments

fyrchik commented Mar 1, 2023

Comparing min/avg etc. values is nice but can be misleading.
I propose implementing a separate script for comparing k6 summaries (extending them if needed), similar to benchstat.
Basically, it should be obvious to a performance engineer what improvement a code change produces.
As an example, here is benchstat output:

$ benchstat old.txt new.txt
name        old time/op  new time/op  delta
GobEncode   13.6ms ± 1%  11.8ms ± 1%  -13.31% (p=0.016 n=4+5)
JSONEncode  32.1ms ± 1%  31.8ms ± 1%     ~    (p=0.286 n=4+5)

We can see the deviation from the mean, as well as that the change in the second benchmark is statistically insignificant.

The only difficulty I see is that we might need to store per-operation results in the benchmark, but that is still feasible.
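As a rough illustration of the kind of script proposed here, the sketch below compares two sets of per-run timings and reports a delta with a p-value, in the spirit of the benchstat output above. It uses a simple permutation test on the difference of means as a stand-in for the Mann-Whitney U test that benchstat actually uses; the function name, sample data, and significance threshold are all illustrative assumptions, not part of any existing tooling.

```python
import random
import statistics

def compare_samples(old, new, permutations=10_000, seed=0):
    """Permutation test: shuffle the pooled samples many times and count how
    often a random split produces a mean difference at least as extreme as
    the observed one. The fraction of such splits is the p-value."""
    observed = statistics.mean(new) - statistics.mean(old)
    pooled = list(old) + list(new)
    rng = random.Random(seed)  # fixed seed for reproducible output
    extreme = 0
    for _ in range(permutations):
        rng.shuffle(pooled)
        diff = statistics.mean(pooled[len(old):]) - statistics.mean(pooled[:len(old)])
        if abs(diff) >= abs(observed):
            extreme += 1
    p = extreme / permutations
    delta_pct = observed / statistics.mean(old) * 100
    return delta_pct, p

# Hypothetical per-run times (ms/op), mimicking the GobEncode rows above.
old = [13.6, 13.7, 13.5, 13.6]
new = [11.8, 11.9, 11.7, 11.8, 11.8]

delta, p = compare_samples(old, new)
# Report "~" (no meaningful change) when p exceeds the chosen threshold.
verdict = f"{delta:+.2f}%" if p < 0.05 else "~"
print(f"delta {verdict}  (p={p:.3f} n={len(old)}+{len(new)})")
```

This is exactly why storing per-operation (or at least per-run) results matters: a single aggregated mean or min is not enough to run any such test.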

I believe automated regression tests could also use such a feature.

cc @anikeev-yadro @jingerbread

@anikeev-yadro

FYI @dansingjulia
