-
Notifications
You must be signed in to change notification settings - Fork 12k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[libc++] Take advantage of trivial relocation in std::vector::erase #116268
base: main
Are you sure you want to change the base?
Conversation
In particular, test everything with both a normal and a min_allocator, add tests for a few corner cases and add tests with types that are trivially relocatable. Also add tests that count the number of assignments performed by vector::erase, since that is mandated by the Standard. This patch is a preparation for optimizing vector::erase.
In vector::erase(iter) and vector::erase(iter, iter), we can take advantage of a type being trivially relocatable to open up a gap in the vector and then relocate the tail of the vector into that gap. The benefit is that relocating an object is often more efficient than move-assigning and then destroying the original object. For types that can be relocated trivially but that are complicated enough for the compiler not to optimize by itself (like std::string), this provides around a 2x performance speedup in vector::erase (see below). This optimization requires stopping the usage of Clang's __is_trivially_relocatable builtin, which doesn't currently honour assignment operators like is_trivially_copyable does and can lead us to perform incorrect optimizations. It is also worth noting that __uninitialized_allocator_relocate has to be modified so that we can relocate into an overlapping range. This has an unfortunate impact on its exception safety guarantees, which needs to be investigated further. Previous implementation -------------------------------------------------------------------------------------- Benchmark Time CPU Iterations -------------------------------------------------------------------------------------- BM_erase_iter_in_middle/vector_int/1024 24.9 ns 24.9 ns 28042962 BM_erase_iter_in_middle/vector_int/4096 107 ns 107 ns 6590592 BM_erase_iter_in_middle/vector_int/10240 271 ns 265 ns 2733478 BM_erase_iter_in_middle/vector_string/1024 349 ns 349 ns 2005886 BM_erase_iter_in_middle/vector_string/4096 1410 ns 1406 ns 498355 BM_erase_iter_in_middle/vector_string/10240 3449 ns 3449 ns 201989 BM_erase_iter_at_start/vector_int/1024 47.1 ns 47.1 ns 14836261 BM_erase_iter_at_start/vector_int/4096 204 ns 204 ns 3430414 BM_erase_iter_at_start/vector_int/10240 504 ns 504 ns 1391373 BM_erase_iter_at_start/vector_string/1024 684 ns 684 ns 1025160 BM_erase_iter_at_start/vector_string/4096 2855 ns 2806 ns 254080 BM_erase_iter_at_start/vector_string/10240 7060 ns 7060 ns 94134 New implementation -------------------------------------------------------------------------------------- Benchmark Time CPU Iterations -------------------------------------------------------------------------------------- BM_erase_iter_in_middle/vector_int/1024 26.0 ns 25.9 ns 27127367 BM_erase_iter_in_middle/vector_int/4096 105 ns 105 ns 6515204 BM_erase_iter_in_middle/vector_int/10240 259 ns 258 ns 2800795 BM_erase_iter_in_middle/vector_string/1024 148 ns 147 ns 4725706 BM_erase_iter_in_middle/vector_string/4096 608 ns 606 ns 1168205 BM_erase_iter_in_middle/vector_string/10240 1523 ns 1520 ns 459909 BM_erase_iter_at_start/vector_int/1024 47.1 ns 47.1 ns 14762513 BM_erase_iter_at_start/vector_int/4096 205 ns 205 ns 3403130 BM_erase_iter_at_start/vector_int/10240 507 ns 507 ns 1382716 BM_erase_iter_at_start/vector_string/1024 300 ns 300 ns 2327546 BM_erase_iter_at_start/vector_string/4096 1205 ns 1205 ns 580855 BM_erase_iter_at_start/vector_string/10240 4296 ns 4296 ns 162956
Note that this is stacked on top of #116265. |
#else | ||
// | ||
// Note that we don't use Clang's __is_trivially_relocatable builtin because it doesn't honor the presence | ||
// of non-trivial special members like assignment operators, or even a copy constructor, making it possible |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In particular, this change is required or else we fail the test I added in erase_iter.pass.cpp
which counts the number of assignments. Because of this, and since the semantics of what it means to be trivially relocatable are still being actively debated, I think it makes sense for us to err on the safer side and use is_trivially_copyable
for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TrackedAssignment
has a trivial destructor and a trivial copy constructor, so it's trivially relocatable by destroying it and copy-constructing a new instance. Is the library code not permitted to do that? If not, would it make sense to have separate internal traits for "trivially assignment-relocatable" versus "trivially construction-relocatable" that only detect the case where you could relocate specifically using assignment versus specifically using construction?
In vector::erase(iter) and vector::erase(iter, iter), we can take
advantage of a type being trivially relocatable to open up a gap
in the vector and then relocate the tail of the vector into that gap.
The benefit is that relocating an object is often more efficient than
move-assigning and then destroying the original object. For types that
can be relocated trivially but that are complicated enough for the
compiler not to optimize by itself (like std::string), this provides
around a 2x performance speedup in vector::erase (see below).
This optimization requires stopping the usage of Clang's __is_trivially_relocatable
builtin, which doesn't currently honour assignment operators like
is_trivially_copyable does and can lead us to perform incorrect
optimizations.
It is also worth noting that __uninitialized_allocator_relocate has to
be modified so that we can relocate into an overlapping range. This
has an unfortunate impact on its exception safety guarantees, which
needs to be investigated further.
Previous implementation
New implementation