-
Notifications
You must be signed in to change notification settings - Fork 12k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize __insert_with_sentinel Function in std::vector #113768
base: main
Are you sure you want to change the base?
Optimize __insert_with_sentinel Function in std::vector #113768
Conversation
@llvm/pr-subscribers-libcxx Author: Peng Liu (winner245) ChangesThis PR includes optimizations and enhancements to the Details:
Testing: Full diff: https://github.com/llvm/llvm-project/pull/113768.diff 2 Files Affected:
diff --git a/libcxx/include/__vector/vector.h b/libcxx/include/__vector/vector.h
index 7889e8c2201ac1..02f91b537c7e22 100644
--- a/libcxx/include/__vector/vector.h
+++ b/libcxx/include/__vector/vector.h
@@ -1360,27 +1360,27 @@ vector<_Tp, _Allocator>::__insert_with_sentinel(const_iterator __position, _Inpu
for (; this->__end_ != this->__end_cap() && __first != __last; ++__first) {
__construct_one_at_end(*__first);
}
- __split_buffer<value_type, allocator_type&> __v(__a);
- if (__first != __last) {
-#if _LIBCPP_HAS_EXCEPTIONS
- try {
-#endif // _LIBCPP_HAS_EXCEPTIONS
- __v.__construct_at_end_with_sentinel(std::move(__first), std::move(__last));
- difference_type __old_size = __old_last - this->__begin_;
- difference_type __old_p = __p - this->__begin_;
- reserve(__recommend(size() + __v.size()));
- __p = this->__begin_ + __old_p;
- __old_last = this->__begin_ + __old_size;
-#if _LIBCPP_HAS_EXCEPTIONS
- } catch (...) {
- erase(__make_iter(__old_last), end());
- throw;
- }
-#endif // _LIBCPP_HAS_EXCEPTIONS
+
+ if (__first == __last)
+ (void)std::rotate(__p, __old_last, this->__end_);
+ else {
+ __split_buffer<value_type, allocator_type&> __v(__a);
+ auto __guard =
+ std::__make_exception_guard(_AllocatorDestroyRangeReverse<allocator_type, pointer>(__a, __old_last, __end_));
+ __v.__construct_at_end_with_sentinel(std::move(__first), std::move(__last));
+ __split_buffer<value_type, allocator_type&> __merged(__recommend(size() + __v.size()), __off, __a);
+ std::__uninitialized_allocator_relocate(
+ __a, std::__to_address(__old_last), std::__to_address(__end_), std::__to_address(__merged.__end_));
+ __merged.__end_ += __end_ - __old_last;
+ __end_ = __old_last;
+ __guard.__complete();
+ std::__uninitialized_allocator_relocate(
+ __a, std::__to_address(__v.__begin_), std::__to_address(__v.__end_), std::__to_address(__merged.__end_));
+ __merged.__end_ += __v.size();
+ __v.__begin_ = __v.__end_;
+ __p = __swap_out_circular_buffer(__merged, __p);
}
- __p = std::rotate(__p, __old_last, this->__end_);
- insert(__make_iter(__p), std::make_move_iterator(__v.begin()), std::make_move_iterator(__v.end()));
- return begin() + __off;
+ return __make_iter(__p);
}
template <class _Tp, class _Allocator>
diff --git a/libcxx/test/std/containers/sequences/vector/vector.modifiers/insert_iter_iter_iter.pass.cpp b/libcxx/test/std/containers/sequences/vector/vector.modifiers/insert_iter_iter_iter.pass.cpp
index 934b85ce01c67b..8dce6e5c1a690e 100644
--- a/libcxx/test/std/containers/sequences/vector/vector.modifiers/insert_iter_iter_iter.pass.cpp
+++ b/libcxx/test/std/containers/sequences/vector/vector.modifiers/insert_iter_iter_iter.pass.cpp
@@ -46,6 +46,46 @@ TEST_CONSTEXPR_CXX20 bool tests()
for (; j < 105; ++j)
assert(v[j] == 0);
}
+ { // Vector may or may not need to reallocate because of the insertion -- test both cases.
+ { // The input range is shorter than the remaining capacity of the vector -- ensure no reallocation happens.
+ typedef std::vector<int> V;
+ V v(100);
+ v.reserve(v.size() + 10);
+ int a[] = {1, 2, 3, 4, 5};
+ const int N = sizeof(a) / sizeof(a[0]);
+ V::iterator i =
+ v.insert(v.cbegin() + 10, cpp17_input_iterator<const int*>(a), cpp17_input_iterator<const int*>(a + N));
+ assert(v.size() == 100 + N);
+ assert(is_contiguous_container_asan_correct(v));
+ assert(i == v.begin() + 10);
+ int j;
+ for (j = 0; j < 10; ++j)
+ assert(v[j] == 0);
+ for (std::size_t k = 0; k < N; ++j, ++k)
+ assert(v[j] == a[k]);
+ for (; j < 105; ++j)
+ assert(v[j] == 0);
+ }
+ { // The input range is longer than the remaining capacity of the vector -- ensure reallocation happens.
+ typedef std::vector<int> V;
+ V v(100);
+ v.reserve(v.size() + 2);
+ int a[] = {1, 2, 3, 4, 5};
+ const int N = sizeof(a) / sizeof(a[0]);
+ V::iterator i =
+ v.insert(v.cbegin() + 10, cpp17_input_iterator<const int*>(a), cpp17_input_iterator<const int*>(a + N));
+ assert(v.size() == 100 + N);
+ assert(is_contiguous_container_asan_correct(v));
+ assert(i == v.begin() + 10);
+ int j;
+ for (j = 0; j < 10; ++j)
+ assert(v[j] == 0);
+ for (std::size_t k = 0; k < N; ++j, ++k)
+ assert(v[j] == a[k]);
+ for (; j < 105; ++j)
+ assert(v[j] == 0);
+ }
+ }
{
typedef std::vector<int> V;
V v(100);
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the patch, this is a nice find! It looks like a really worthwhile optimization. I have some comments and questions.
libcxx/include/__vector/vector.h
Outdated
@@ -1360,27 +1360,27 @@ vector<_Tp, _Allocator>::__insert_with_sentinel(const_iterator __position, _Inpu | |||
for (; this->__end_ != this->__end_cap() && __first != __last; ++__first) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Not attached to this line)
Do we have a benchmark for this operation? What is the before/after comparison? If there's no benchmark, one should be added in libcxx/test/benchmarks/vector_operations.bench.cpp
(probably via libcxx/test/benchmarks/ContainerBenchmarks.h
). How to run benchmarks is documented at https://libcxx.llvm.org/TestingLibcxx.html#benchmarks (be careful, these instructions are going to be simplified in the near future but for now they're correct).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are currently no benchmarks available for vector::insert
and vector::assign
in libcxx/test/benchmarks/vector_operations.bench.cpp
(current benchmarks mainly focused on various constructors of std::vector
). I will add relevant benchmarks and share the before/after performance comparisons once I have the results.
libcxx/include/__vector/vector.h
Outdated
auto __guard = | ||
std::__make_exception_guard(_AllocatorDestroyRangeReverse<allocator_type, pointer>(__a, __old_last, __end_)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really subtle: this exception guard is correct, but only because it stores references to __old_last
and __end_
, which get updated in this function. This deserves at least a comment, since _AllocatorDestroyRangeReverse
is a private helper defined in another header.
By the way, if you stored a copy instead of a reference, do any of our tests for std::vector
start failing? If not, that would point out to missing test coverage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! I appreciate your perspective. Yes, it is generally required to pass the iterators by reference to _AllocatorDestroyRangeReverse
if the protected memory range could grow or shrink while the guard is active. A good example is __uninitialized_allocator_copy_impl
, where the guarded target
memory range grows as the copy operations proceed, making it essential to pass the end iterator by reference to keep the guard in sync with growing protected range.
However, my case is somewhat different because the guarded range is a source
memory range [__old_last, __end_)
, which remains fixed during the guard's protection. This made me realize that my initial commit might have been misleading, as I shrink __end_
to __old_last
before calling __guard.complete()
. This shrink operation could actually be done after the call to __guard.complete()
, as its purpose is to ensure that subsequent relocation in __swap_out_circular_buffer
stops at __old_last
. To clarify this, I will add a comment and move the call __guard.complete()
earlier (i.e., release the guard as soon as feasible).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added new benchmark tests for vector::insert(const_iterator, _InputIterator, _InputIterator)
and performed local testing for std::vector<int>
and std::vector<std::string>
. The results show significant performance improvements in various scenarios.
For std::vector<int>
, I tested inserting an input iterator range into an empty, half-full, and nearly-full vector. My local testing demonstrated performance improvements of 1.3x, 1.2x, and 5.8x respectively after applying this patch. Similarly, for std::vectorstd::string, my local testing demonstrated performance improvements of 1.3x, 1.5x, and 3.2x for an empty, half-full, and nearly full vector, respectively.
be9ddb6
to
a31c874
Compare
✅ With the latest revision this PR passed the C/C++ code formatter. |
a31c874
to
9397cb4
Compare
This PR includes optimizations and enhancements to the
__insert_with_sentinel
function in thestd::vector
implementation, focusing on performance improvements and better exception handling.Details:
Performance Improvement:
reserve()
operation, causing a round of memory relocation, followed by an additionalstd::rotate
operation, and finally another memory relocation for elements after the insertion point.__split_buffer
.2 relocations + 1 std::rotate
to1 relocation
without the need forstd::rotate
, improving the overall performance. Note thatstd::rotate
is only needed when the vector has enough space for insertion where reallocation does not happen.Exception Safety:
try-catch
mechanism with the modern approach of exception guard.Testing:
New test cases are added in
libcxx/test/std/containers/sequences/vector/vector.modifiers/insert_iter_iter_iter.pass.cpp
, which check both cases where reallocation happens or not. Passed all existing libcxx tests forstd::vector
based on my local testing.