
Optimized codesize benchmarks do not clearly show the power of individual optimizations #5945

Open
Manishearth opened this issue Dec 24, 2024 · 0 comments

Comments

@Manishearth
Member

Came up in #5935

ICU4X has test-c-tiny and test-js-tiny to show how far codesize can be optimized.

These are incremental, applying optimization on top of optimization to slowly reduce codesize. This shows a nice progression, but it does not help in understanding the effect of each optimization in isolation.

I think this is an important function of such a benchmark: many of these techniques are not uniformly available and impose additional constraints on the build. Some require nightly, some require paired Rust/Clang versions, some force build-std, some require a particular C compiler, some reduce debuggability, and so on.

Furthermore, a lot of these optimizations build on top of each other: using a release build will of course help LTO be more effective (percent-wise).

Providing numbers for every combination is going to be a lot of work and likely an overwhelming amount of data. However, I think what we could do is identify a list of optimizations that are potentially relevant but not necessarily always possible, and then provide numbers for:

  • plain release build
  • release build with each of these optimizations individually applied
    • for optimizations that build on each other (e.g. -Clinker-plugin-lto needs LTO), apply their dependencies too
  • release build with all but one of these optimizations applied
    • similar setup for dependent optimizations: remove both
  • release build with all optimizations applied

This would both give us an idea of the immediate wins of individual optimizations, and how they cumulatively work together.
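As a rough illustration of the proposed matrix, here is a minimal Python sketch that enumerates the runs: plain, each optimization alone (plus its dependencies), all but one (also dropping anything that depends on it), and everything. The optimization labels and the `REQUIRES` map are hypothetical shorthand, not real flags; only the "-Clinker-plugin-lto needs LTO" relationship comes from the text above.

```python
# Hypothetical sketch: labels and the dependency map are illustrative only.
# Each optimization maps to the one it requires (None = no prerequisite).
REQUIRES = {
    "lto": None,
    "linker-plugin-lto": "lto",  # from the issue: -Clinker-plugin-lto needs LTO
    "gc-sections": None,
    "panic-abort": None,
}


def with_dependencies(opt, requires):
    """Return opt plus everything it transitively requires."""
    result = set()
    while opt is not None:
        result.add(opt)
        opt = requires[opt]
    return frozenset(result)


def with_dependents(opt, requires):
    """Return opt plus everything that transitively requires it."""
    return frozenset(o for o in requires if opt in with_dependencies(o, requires))


def build_matrix(requires):
    """List (label, enabled_set) pairs for the proposed benchmark runs."""
    everything = frozenset(requires)
    runs = [("plain release", frozenset())]
    # Each optimization on its own, pulling in what it needs.
    for opt in requires:
        runs.append((f"only {opt}", with_dependencies(opt, requires)))
    # Everything except one optimization and whatever depends on it.
    for opt in requires:
        runs.append((f"all but {opt}", everything - with_dependents(opt, requires)))
    runs.append(("all", everything))
    return runs


if __name__ == "__main__":
    for label, enabled in build_matrix(REQUIRES):
        print(f"{label:28} {sorted(enabled)}")
```

With n optimizations this gives 2n + 2 runs, so the amount of data grows linearly with the length of the list rather than with the number of combinations.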

The optimizations I can identify are:

  • LTO (off, on, thin; it seems like thin gives us the best perf?)
    • -Clinker-plugin-lto
  • --gc-sections
    • --strip-all
  • panic=abort
    • panic-abort std
      • panic-immediate-abort std
  • one-step vs two-step clang
  • use of lld (?)
  • inclusion of debug symbols in the first place (same as strip? unclear)

This list might be larger than necessary, so we could merge some entries if desired. I might also be missing something. I didn't include Rust debug vs release here because I don't think debug build codesize numbers really mean much, and I can't think of a use case for caring about those numbers.
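To get a feel for why benchmarking every combination would be overwhelming, here is a small Python sketch that counts the builds each approach would need for the list above. It rests on two assumptions that are not stated in the issue: the nesting of the list is read as "child requires parent", and every entry is treated as a simple on/off toggle (so LTO's off/on/thin choice is collapsed into one switch). The labels are shorthand for the list entries, not real flags.

```python
from itertools import combinations

# Assumption: nesting in the list is read as "child requires parent", and
# every entry is a plain on/off toggle. Labels are shorthand, not real flags.
PARENT = {
    "lto": None,
    "linker-plugin-lto": "lto",
    "gc-sections": None,
    "strip-all": "gc-sections",
    "panic-abort": None,
    "panic-abort-std": "panic-abort",
    "panic-immediate-abort-std": "panic-abort-std",
    "two-step-clang": None,
    "lld": None,
    "no-debug-symbols": None,
}


def is_valid(enabled, parent):
    """A combination is valid if every enabled entry has its parent enabled."""
    return all(parent[o] is None or parent[o] in enabled for o in enabled)


def count_valid_combinations(parent):
    """Brute-force count of every dependency-respecting on/off combination."""
    names = list(parent)
    total = 0
    for size in range(len(names) + 1):
        for combo in combinations(names, size):
            if is_valid(set(combo), parent):
                total += 1
    return total


def count_proposed_runs(parent):
    """Plain + one 'only X' and one 'all but X' per entry + all."""
    return 2 * len(parent) + 2


if __name__ == "__main__":
    print("every valid combination:", count_valid_combinations(PARENT))
    print("proposed matrix:", count_proposed_runs(PARENT))
```

Under these assumptions the exhaustive approach needs a few hundred builds while the proposed matrix needs a couple dozen, and merging entries in the list shrinks both.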

Thoughts? @sffc
