Optimized codesize benchmarks do not clearly show the power of individual optimizations #5945

Manishearth · 2024-12-24T01:48:43Z

Came up in #5935

ICU4X has test-c-tiny and test-js-tiny to show how far codesize can be optimized.

These are incremental, applying optimization on top of optimization to slowly reduce codesize. This shows a nice progression, but it does not help with understanding the effect of each optimization in isolation.

I think this is an important function of such a benchmark, because many of these techniques are not uniformly available and impose additional constraints upon the build: some require nightly, some require paired Rust/Clang versions, some force build-std, some require a particular C compiler, some reduce debuggability, and so on.

Furthermore, a lot of these optimizations build on top of each other: using a release build will of course help LTO be more effective (percent-wise).

Providing numbers for every combination is going to be a lot of work and likely an overwhelming amount of data. However, I think what we could do is identify a list of optimizations that are potentially relevant but not necessarily always possible, and then provide numbers for:

- plain release build
- release build with each of these optimizations individually applied
  - for optimizations that build on each other (e.g. -Clinker-plugin-lto needs LTO), apply its dependencies too
- release build with all but one of these optimizations applied
  - similar setup for dependent optimizations: remove both
- release build with all optimizations applied

This would both give us an idea of the immediate wins of individual optimizations, and how they cumulatively work together.
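The four kinds of build proposed above can be enumerated mechanically. The following is only a rough sketch of that enumeration, not anything from the ICU4X test harness: the `DEPENDS_ON` table, its short names, and the `only:`/`without:` labels are all hypothetical. It also assumes one reading of "remove both", namely that leaving out an optimization also leaves out everything that depends on it.

```python
# Hypothetical optimization names. Each maps to the optimizations it requires.
DEPENDS_ON = {
    "lto": [],
    "linker-plugin-lto": ["lto"],
    "gc-sections": [],
    "panic-abort": [],
}


def with_dependencies(name, depends_on):
    """Return `name` plus everything it transitively requires."""
    result = set()
    stack = [name]
    while stack:
        current = stack.pop()
        if current not in result:
            result.add(current)
            stack.extend(depends_on[current])
    return frozenset(result)


def with_dependents(name, depends_on):
    """Return `name` plus everything that transitively requires it."""
    return frozenset(
        other
        for other in depends_on
        if name in with_dependencies(other, depends_on)
    )


def build_matrix(depends_on):
    """Map a label to the set of optimizations enabled for that build."""
    everything = frozenset(depends_on)
    configs = {"plain": frozenset()}
    for name in depends_on:
        # "Individually applied": the optimization plus its dependencies.
        configs[f"only:{name}"] = with_dependencies(name, depends_on)
        # "All but one": drop the optimization and whatever needs it.
        configs[f"without:{name}"] = everything - with_dependents(name, depends_on)
    configs["all"] = everything
    return configs


if __name__ == "__main__":
    for label, enabled in build_matrix(DEPENDS_ON).items():
        print(f"{label:28} {sorted(enabled)}")
```

For N optimizations this yields 2N + 2 builds, which stays manageable compared with the 2^N builds needed to cover every combination.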

The optimizations I can identify are:

- LTO (off, on, thin; it seems like thin gives us the best perf?)
  - -Clinker-plugin-lto
- --gc-sections
  - --strip-all
- panic=abort
  - panic-abort std
    - panic-immediate-abort std
- one-step vs two-step clang
- use of lld (?)
- inclusion of debug symbols in the first place (same as strip? unclear)
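To make the list concrete, here is one possible way to turn some of its entries into flags. This is a sketch under assumptions, not the configuration ICU4X actually uses. The short names, the split into `RUSTC_FLAGS` and `LINKER_FLAGS`, and the `flags_for` helper are all hypothetical. The flags themselves are standard rustc codegen options and options passed to the linker through a GCC/Clang-style compiler driver. The build-std variants and the one-step vs two-step clang distinction are left out because they change the build invocation rather than a single flag.

```python
# Assumed mapping from list entries to flags; not taken from ICU4X's build scripts.
RUSTC_FLAGS = {
    "lto-fat": ["-Clto=fat"],
    "lto-thin": ["-Clto=thin"],
    "linker-plugin-lto": ["-Clinker-plugin-lto"],
    "panic-abort": ["-Cpanic=abort"],
    "no-debuginfo": ["-Cdebuginfo=0"],
}

# Flags for the final link step when it is driven by a C compiler such as clang.
LINKER_FLAGS = {
    "gc-sections": ["-Wl,--gc-sections"],
    "strip-all": ["-Wl,--strip-all"],
    "lld": ["-fuse-ld=lld"],
}


def flags_for(enabled):
    """Return RUSTFLAGS and LDFLAGS strings for a set of optimization names."""
    rustflags, ldflags = [], []
    # Sort so the same set always produces the same command line.
    for name in sorted(enabled):
        if name in RUSTC_FLAGS:
            rustflags.extend(RUSTC_FLAGS[name])
        elif name in LINKER_FLAGS:
            ldflags.extend(LINKER_FLAGS[name])
        else:
            raise KeyError(f"unknown optimization: {name}")
    return {"RUSTFLAGS": " ".join(rustflags), "LDFLAGS": " ".join(ldflags)}


if __name__ == "__main__":
    print(flags_for({"lto-thin", "linker-plugin-lto", "gc-sections"}))
```

Keeping compiler-side and linker-side flags in separate tables mirrors the fact that some entries constrain the Rust toolchain while others constrain the C toolchain doing the final link.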

This list might be larger than necessary, so we could merge some entries if desired. I might also be missing something. I didn't include Rust debug vs release here because I don't think debug build codesize numbers really mean much, and I can't think of a use case for caring about those numbers.

Thoughts? @sffc

Manishearth mentioned this issue Dec 25, 2024

Improved c-tiny benchmark #5948

Draft
