Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize __insert_with_sentinel Function in std::vector #113768

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

winner245
Copy link
Contributor

This PR includes optimizations and enhancements to the __insert_with_sentinel function in the std::vector implementation, focusing on performance improvements and better exception handling.

Details:

  1. Performance Improvement:

    • The existing implementation triggers a reserve() operation, causing a round of memory relocation, followed by an additional std::rotate operation, and finally another memory relocation for elements after the insertion point.
    • The improved version eliminates redundant operations by directly relocating both existing and new elements to their final positions within the allocated __split_buffer.
    • This optimization reduces the total cost from 2 relocations + 1 std::rotate to 1 relocation without the need for std::rotate, improving the overall performance. Note that std::rotate is only needed when the vector has enough space for insertion where reallocation does not happen.
  2. Exception Safety:

    • Replaced the traditional try-catch mechanism with the modern approach of exception guard.

Testing:
New test cases are added in libcxx/test/std/containers/sequences/vector/vector.modifiers/insert_iter_iter_iter.pass.cpp, which check both cases where reallocation happens or not. Passed all existing libcxx tests for std::vector based on my local testing.

@winner245 winner245 requested a review from a team as a code owner October 26, 2024 17:38
@llvmbot llvmbot added the libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. label Oct 26, 2024
@llvmbot
Copy link

llvmbot commented Oct 26, 2024

@llvm/pr-subscribers-libcxx

Author: Peng Liu (winner245)

Changes

This PR includes optimizations and enhancements to the __insert_with_sentinel function in the std::vector implementation, focusing on performance improvements and better exception handling.

Details:

  1. Performance Improvement:

    • The existing implementation triggers a reserve() operation, causing a round of memory relocation, followed by an additional std::rotate operation, and finally another memory relocation for elements after the insertion point.
    • The improved version eliminates redundant operations by directly relocating both existing and new elements to their final positions within the allocated __split_buffer.
    • This optimization reduces the total cost from 2 relocations + 1 std::rotate to 1 relocation without the need for std::rotate, improving the overall performance. Note that std::rotate is only needed when the vector has enough space for insertion where reallocation does not happen.
  2. Exception Safety:

    • Replaced the traditional try-catch mechanism with the modern approach of exception guard.

Testing:
New test cases are added in libcxx/test/std/containers/sequences/vector/vector.modifiers/insert_iter_iter_iter.pass.cpp, which check both cases where reallocation happens or not. Passed all existing libcxx tests for std::vector based on my local testing.


Full diff: https://github.com/llvm/llvm-project/pull/113768.diff

2 Files Affected:

  • (modified) libcxx/include/__vector/vector.h (+20-20)
  • (modified) libcxx/test/std/containers/sequences/vector/vector.modifiers/insert_iter_iter_iter.pass.cpp (+40)
diff --git a/libcxx/include/__vector/vector.h b/libcxx/include/__vector/vector.h
index 7889e8c2201ac1..02f91b537c7e22 100644
--- a/libcxx/include/__vector/vector.h
+++ b/libcxx/include/__vector/vector.h
@@ -1360,27 +1360,27 @@ vector<_Tp, _Allocator>::__insert_with_sentinel(const_iterator __position, _Inpu
   for (; this->__end_ != this->__end_cap() && __first != __last; ++__first) {
     __construct_one_at_end(*__first);
   }
-  __split_buffer<value_type, allocator_type&> __v(__a);
-  if (__first != __last) {
-#if _LIBCPP_HAS_EXCEPTIONS
-    try {
-#endif // _LIBCPP_HAS_EXCEPTIONS
-      __v.__construct_at_end_with_sentinel(std::move(__first), std::move(__last));
-      difference_type __old_size = __old_last - this->__begin_;
-      difference_type __old_p    = __p - this->__begin_;
-      reserve(__recommend(size() + __v.size()));
-      __p        = this->__begin_ + __old_p;
-      __old_last = this->__begin_ + __old_size;
-#if _LIBCPP_HAS_EXCEPTIONS
-    } catch (...) {
-      erase(__make_iter(__old_last), end());
-      throw;
-    }
-#endif // _LIBCPP_HAS_EXCEPTIONS
+
+  if (__first == __last)
+    (void)std::rotate(__p, __old_last, this->__end_);
+  else {
+    __split_buffer<value_type, allocator_type&> __v(__a);
+    auto __guard =
+        std::__make_exception_guard(_AllocatorDestroyRangeReverse<allocator_type, pointer>(__a, __old_last, __end_));
+    __v.__construct_at_end_with_sentinel(std::move(__first), std::move(__last));
+    __split_buffer<value_type, allocator_type&> __merged(__recommend(size() + __v.size()), __off, __a);
+    std::__uninitialized_allocator_relocate(
+        __a, std::__to_address(__old_last), std::__to_address(__end_), std::__to_address(__merged.__end_));
+    __merged.__end_ += __end_ - __old_last;
+    __end_ = __old_last;
+    __guard.__complete();
+    std::__uninitialized_allocator_relocate(
+        __a, std::__to_address(__v.__begin_), std::__to_address(__v.__end_), std::__to_address(__merged.__end_));
+    __merged.__end_ += __v.size();
+    __v.__begin_ = __v.__end_;
+    __p          = __swap_out_circular_buffer(__merged, __p);
   }
-  __p = std::rotate(__p, __old_last, this->__end_);
-  insert(__make_iter(__p), std::make_move_iterator(__v.begin()), std::make_move_iterator(__v.end()));
-  return begin() + __off;
+  return __make_iter(__p);
 }
 
 template <class _Tp, class _Allocator>
diff --git a/libcxx/test/std/containers/sequences/vector/vector.modifiers/insert_iter_iter_iter.pass.cpp b/libcxx/test/std/containers/sequences/vector/vector.modifiers/insert_iter_iter_iter.pass.cpp
index 934b85ce01c67b..8dce6e5c1a690e 100644
--- a/libcxx/test/std/containers/sequences/vector/vector.modifiers/insert_iter_iter_iter.pass.cpp
+++ b/libcxx/test/std/containers/sequences/vector/vector.modifiers/insert_iter_iter_iter.pass.cpp
@@ -46,6 +46,46 @@ TEST_CONSTEXPR_CXX20 bool tests()
         for (; j < 105; ++j)
             assert(v[j] == 0);
     }
+    {   // Vector may or may not need to reallocate because of the insertion -- test both cases.
+      { // The input range is shorter than the remaining capacity of the vector -- ensure no reallocation happens.
+        typedef std::vector<int> V;
+        V v(100);
+        v.reserve(v.size() + 10);
+        int a[]     = {1, 2, 3, 4, 5};
+        const int N = sizeof(a) / sizeof(a[0]);
+        V::iterator i =
+            v.insert(v.cbegin() + 10, cpp17_input_iterator<const int*>(a), cpp17_input_iterator<const int*>(a + N));
+        assert(v.size() == 100 + N);
+        assert(is_contiguous_container_asan_correct(v));
+        assert(i == v.begin() + 10);
+        int j;
+        for (j = 0; j < 10; ++j)
+          assert(v[j] == 0);
+        for (std::size_t k = 0; k < N; ++j, ++k)
+          assert(v[j] == a[k]);
+        for (; j < 105; ++j)
+          assert(v[j] == 0);
+      }
+      { // The input range is longer than the remaining capacity of the vector -- ensure reallocation happens.
+        typedef std::vector<int> V;
+        V v(100);
+        v.reserve(v.size() + 2);
+        int a[]     = {1, 2, 3, 4, 5};
+        const int N = sizeof(a) / sizeof(a[0]);
+        V::iterator i =
+            v.insert(v.cbegin() + 10, cpp17_input_iterator<const int*>(a), cpp17_input_iterator<const int*>(a + N));
+        assert(v.size() == 100 + N);
+        assert(is_contiguous_container_asan_correct(v));
+        assert(i == v.begin() + 10);
+        int j;
+        for (j = 0; j < 10; ++j)
+          assert(v[j] == 0);
+        for (std::size_t k = 0; k < N; ++j, ++k)
+          assert(v[j] == a[k]);
+        for (; j < 105; ++j)
+          assert(v[j] == 0);
+      }
+    }
     {
         typedef std::vector<int> V;
         V v(100);

Copy link
Member

@ldionne ldionne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patch, this is a nice find! It looks like a really worthwhile optimization. I have some comments and questions.

@@ -1360,27 +1360,27 @@ vector<_Tp, _Allocator>::__insert_with_sentinel(const_iterator __position, _Inpu
for (; this->__end_ != this->__end_cap() && __first != __last; ++__first) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Not attached to this line)

Do we have a benchmark for this operation? What is the before/after comparison? If there's no benchmark, one should be added in libcxx/test/benchmarks/vector_operations.bench.cpp (probably via libcxx/test/benchmarks/ContainerBenchmarks.h). How to run benchmarks is documented at https://libcxx.llvm.org/TestingLibcxx.html#benchmarks (be careful, these instructions are going to be simplified in the near future but for now they're correct).

Copy link
Contributor Author

@winner245 winner245 Nov 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are currently no benchmarks available for vector::insert and vector::assign in libcxx/test/benchmarks/vector_operations.bench.cpp (current benchmarks mainly focused on various constructors of std::vector). I will add relevant benchmarks and share the before/after performance comparisons once I have the results.

Comment on lines 1368 to 1369
auto __guard =
std::__make_exception_guard(_AllocatorDestroyRangeReverse<allocator_type, pointer>(__a, __old_last, __end_));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really subtle: this exception guard is correct, but only because it stores references to __old_last and __end_, which get updated in this function. This deserves at least a comment, since _AllocatorDestroyRangeReverse is a private helper defined in another header.

By the way, if you stored a copy instead of a reference, do any of our tests for std::vector start failing? If not, that would point out to missing test coverage.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! I appreciate your perspective. Yes, it is generally required to pass the iterators by reference to _AllocatorDestroyRangeReverse if the protected memory range could grow or shrink while the guard is active. A good example is __uninitialized_allocator_copy_impl, where the guarded target memory range grows as the copy operations proceed, making it essential to pass the end iterator by reference to keep the guard in sync with growing protected range.

However, my case is somewhat different because the guarded range is a source memory range [__old_last, __end_), which remains fixed during the guard's protection. This made me realize that my initial commit might have been misleading, as I shrink __end_ to __old_last before calling __guard.complete(). This shrink operation could actually be done after the call to __guard.complete(), as its purpose is to ensure that subsequent relocation in __swap_out_circular_buffer stops at __old_last. To clarify this, I will add a comment and move the call __guard.complete() earlier (i.e., release the guard as soon as feasible).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added new benchmark tests for vector::insert(const_iterator, _InputIterator, _InputIterator) and performed local testing for std::vector<int> and std::vector<std::string>. The results show significant performance improvements in various scenarios.

For std::vector<int>, I tested inserting an input iterator range into an empty, half-full, and nearly-full vector. My local testing demonstrated performance improvements of 1.3x, 1.2x, and 5.8x respectively after applying this patch. Similarly, for std::vectorstd::string, my local testing demonstrated performance improvements of 1.3x, 1.5x, and 3.2x for an empty, half-full, and nearly full vector, respectively.

  • Before:
    before1

  • After:
    after1

@winner245 winner245 force-pushed the winner245/vector-insert_with_sentinel branch from be9ddb6 to a31c874 Compare November 7, 2024 22:11
Copy link

github-actions bot commented Nov 7, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@winner245 winner245 force-pushed the winner245/vector-insert_with_sentinel branch from a31c874 to 9397cb4 Compare November 7, 2024 22:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants