Statistically compare benchmark results #21

Open
fyrchik opened this issue Mar 1, 2023 · 1 comment
Labels

- discussion: Talking about something in order to reach a decision or to exchange ideas
- enhancement: New feature or request

Comments

fyrchik commented Mar 1, 2023

Comparing min/avg etc. values is nice but can be misleading.
I propose implementing a separate script for comparing k6 summaries (extending them if needed), similar to benchstat.
Basically, it should be obvious to a performance engineer what improvement a code change produces.
As an example, here is benchstat output:

$ benchstat old.txt new.txt
name        old time/op  new time/op  delta
GobEncode   13.6ms ± 1%  11.8ms ± 1%  -13.31% (p=0.016 n=4+5)
JSONEncode  32.1ms ± 1%  31.8ms ± 1%     ~    (p=0.286 n=4+5)

We can see the deviation from the mean, as well as that the change in the second benchmark is statistically insignificant.

The only difficulty I see is that we might need to store per-operation results in the benchmark, but that is still feasible.
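As a rough illustration of the kind of script proposed here, the sketch below compares two sets of per-run timings and reports a delta with a p-value, in the spirit of the benchstat output above. It uses a simple permutation test on the difference of means as a stand-in for the Mann-Whitney U test that benchstat actually uses; the function name, sample data, and significance threshold are all illustrative assumptions, not part of any existing tooling.

```python
import random
import statistics

def compare_samples(old, new, permutations=10_000, seed=0):
    """Permutation test: shuffle the pooled samples many times and count how
    often a random split produces a mean difference at least as extreme as
    the observed one. The fraction of such splits is the p-value."""
    observed = statistics.mean(new) - statistics.mean(old)
    pooled = list(old) + list(new)
    rng = random.Random(seed)  # fixed seed for reproducible output
    extreme = 0
    for _ in range(permutations):
        rng.shuffle(pooled)
        diff = statistics.mean(pooled[len(old):]) - statistics.mean(pooled[:len(old)])
        if abs(diff) >= abs(observed):
            extreme += 1
    p = extreme / permutations
    delta_pct = observed / statistics.mean(old) * 100
    return delta_pct, p

# Hypothetical per-run times (ms/op), mimicking the GobEncode rows above.
old = [13.6, 13.7, 13.5, 13.6]
new = [11.8, 11.9, 11.7, 11.8, 11.8]

delta, p = compare_samples(old, new)
# Report "~" (no meaningful change) when p exceeds the chosen threshold.
verdict = f"{delta:+.2f}%" if p < 0.05 else "~"
print(f"delta {verdict}  (p={p:.3f} n={len(old)}+{len(new)})")
```

This is exactly why storing per-operation (or at least per-run) results matters: a single aggregated mean or min is not enough to run any such test.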

I believe automated regression tests could also use such a feature.

cc @anikeev-yadro @jingerbread

@anikeev-yadro

FYI @dansingjulia
