-
Notifications
You must be signed in to change notification settings - Fork 183
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improved c-tiny benchmark #5948
base: main
Are you sure you want to change the base?
Conversation
The current numbers are:
An important wrinkle: LTO bloats binaries unless paired with stripping + gc-sections. |
f34fcbd
to
e98f0ce
Compare
e98f0ce
to
990c948
Compare
Praise: this is a good change to make. I'm glad there's someone else who has gone deep into this test. Question: I don't know if I fully understand the difference between "one step" LTO and "two step" LTO. What I think it means: "one step" runs LTO on the C code and Rust code separately, whereas "two step" runs LTO on both at the same time. Question: even in "one step", you have Rust emitting bitcode. Is this necessary? And I think this is one of the things that causes LLVM version mismatches to be problematic? |
Observation: you found that "panic-abort-linker-plugin-lto-clang-twostep.elf" and "panic-abort-linker-plugin-lto-clang-twostep-stripped.elf" are the same size, but not one-step. My expectation is that this is because in "two-step", LTO is already running on the whole binary, so there is nothing left to strip, whereas in "one-step", there is no dead code elimination until you turn on braindead gc-sections. |
In other words, my understanding has been that the thing you're calling "two-step LTO", aka "linker-plugin-lto", aka the thing I've always just called simply "LTO", applies -Os optimizations even at link time, so it can do things like inline function calls between Rust and C and as a result juice a bit extra code size and performance. In my mental model, the thing you're calling "one-step LTO" to me is "LTO separately in C and Rust, with standard non-LTO cross-language linking". I agree that this does not depend on LLVM version between C and Rust since there is no LLVM bitcode or metadata that needs to be shared. |
No, pure rust LTO still requires that flag (it refuses to even initiate compilation without it, try it!) Remember, Rust is still linking together a giant pile of rlibs! Each rlib does contain some machine code, so if Rust wishes to do proper LTO it needs to include LLVM bitcode in its rlibs alongside the machine code. Why that's not implied I don't know. Cargo's LTO setting implies the other flag. I suppose they wanted explicitness in the lower level CLI. 🤷 I considered using Cargo profiles to do the LTO toggle but didn't want to make too many changes. That would have the nice benefit of not needing
I don't think the "two step" and "linker-plugin-lto" are the same thing (though I suspect linker-plugin-lto needs two-step to work?), and I have some theories as to why the numbers are different (largely to do with ThinLTO or LLD) but I haven't yet investigated. Please let's not make our terminology "two step aka linker-plugin-lto", these test results clearly show them as having different results. two-step has an impact even without linker-plugin-lto. I mostly retained this distinction from the previous test but I haven't dug deeper into it. It's possible it's just a matter of the precise combinations of flags rather than the phasing (in which case we should benchmark those flags independently), or it's deficiencies in the ThinLTO model (in which case we should just try with regular LTO). From a quick glance, it could literally just be the distinction between using LLD and the default system linker. I didn't poke too deeply, my focus was on linker-plugin-lto whilst trying not to lose any innovations in the old benchmark. |
Progress on #5945
This does not fully enact the vision in #5945, but it does start testing various combinations of tactics. I didn't want to test all combinations and I didn't want to spend time figuring out which combinations we really want, so I went ahead and tested the ones most relevant to the current investigation (#5935).
This shows the effects of LTO, linker-plugin-lto, and stripping+gc-sections on panic-abort (with panic-immediate-abort std) release Rust builds.
Benchmarks with just panic=abort but no panic-immediate-abort/panic-abort std would probably be useful to clients but I haven't added them now. It may be worth using makefile magic to truly build a matrix of targets.