[libc++] Add input validation for set_intersection() in debug mode. #101508

ichaer · 2024-08-01T16:14:16Z

The use of one-sided binary search introduced by a066217 changes behaviour on invalid, unsorted input (see #75230 (comment)). Add input validation on _LIBCPP_HARDENING_MODE_DEBUG to help users.

Change interface of __is_sorted_until() so that it accepts a sentinel that's of a different type than the beginning iterator, and to ensure it won't try to copy the comparison function object.
Add one assertion for each input range confirming that they are sorted.
Stop validating complexity of set_intersection() in debug mode, it's hopeless and also not meaningful: there are no complexity guarantees in debug mode, we're happy to trade performance for diagnosability.
Fix bugs in ranges_robust_against_differing_projections.pass: we were using an input range as output for std::ranges::partial_sort_copy(), and using projections which return the opposite value means that algorithms requiring a sorted range can't use ranges sorted with ascending values if the comparator is std::ranges::less. Added const where appropriate to make sure we weren't using inputs as outputs in other places.

The use of one-sided binary search introduced by a066217 changes behaviour on invalid, unsorted input (see llvm#75230 (comment)). Add input validation on `_LIBCPP_HARDENING_MODE_DEBUG` to help users. * Change interface of `__is_sorted_until()` so that it accepts a sentinel that's of a different type than the beginning iterator, and to ensure it won't try to copy the comparison function object. * Add one assertion for each input range confirming that they are sorted. * Stop validating complexity of `set_intersection()` in debug mode, it's hopeless and also not meaningful: there are no complexity guarantees in debug mode, we're happy to trade performance for diagnosability. * Fix bugs in `ranges_robust_against_differing_projections.pass`: we were using an input range as output for `std::ranges::partial_sort_copy()`, and using projections which return the opposite value means that algorithms requiring a sorted range can't use ranges sorted with ascending values if the comparator is `std::ranges::less`. Added `const` where appropriate to make sure we weren't using inputs as outputs in other places.

llvmbot · 2024-08-01T16:14:48Z

@llvm/pr-subscribers-libcxx

Author: Iuri Chaer (ichaer)

Changes

The use of one-sided binary search introduced by a066217 changes behaviour on invalid, unsorted input (see #75230 (comment)). Add input validation on _LIBCPP_HARDENING_MODE_DEBUG to help users.

Change interface of __is_sorted_until() so that it accepts a sentinel that's of a different type than the beginning iterator, and to ensure it won't try to copy the comparison function object.
Add one assertion for each input range confirming that they are sorted.
Stop validating complexity of set_intersection() in debug mode, it's hopeless and also not meaningful: there are no complexity guarantees in debug mode, we're happy to trade performance for diagnosability.
Fix bugs in ranges_robust_against_differing_projections.pass: we were using an input range as output for std::ranges::partial_sort_copy(), and using projections which return the opposite value means that algorithms requiring a sorted range can't use ranges sorted with ascending values if the comparator is std::ranges::less. Added const where appropriate to make sure we weren't using inputs as outputs in other places.

Full diff: https://github.com/llvm/llvm-project/pull/101508.diff

4 Files Affected:

(modified) libcxx/include/__algorithm/is_sorted_until.h (+7-7)
(modified) libcxx/include/__algorithm/set_intersection.h (+11)
(modified) libcxx/test/std/algorithms/alg.sorting/alg.set.operations/set.intersection/set_intersection_complexity.pass.cpp (+24-24)
(modified) libcxx/test/std/algorithms/ranges_robust_against_differing_projections.pass.cpp (+19-18)

diff --git a/libcxx/include/__algorithm/is_sorted_until.h b/libcxx/include/__algorithm/is_sorted_until.h
index 53a49f00de31e..f84c990ff2675 100644
--- a/libcxx/include/__algorithm/is_sorted_until.h
+++ b/libcxx/include/__algorithm/is_sorted_until.h
@@ -20,18 +20,18 @@
 
 _LIBCPP_BEGIN_NAMESPACE_STD
 
-template <class _Compare, class _ForwardIterator>
+template <class _Compare, class _ForwardIterator, class _Sent>
 _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 _ForwardIterator
-__is_sorted_until(_ForwardIterator __first, _ForwardIterator __last, _Compare __comp) {
+__is_sorted_until(_ForwardIterator __first, _Sent __last, _Compare&& __comp) {
   if (__first != __last) {
     _ForwardIterator __i = __first;
-    while (++__i != __last) {
-      if (__comp(*__i, *__first))
-        return __i;
-      __first = __i;
+    while (++__first != __last) {
+      if (__comp(*__first, *__i))
+        return __first;
+      __i = __first;
     }
   }
-  return __last;
+  return __first;
 }
 
 template <class _ForwardIterator, class _Compare>
diff --git a/libcxx/include/__algorithm/set_intersection.h b/libcxx/include/__algorithm/set_intersection.h
index bb0d86cd0f58d..2cfd984b202a2 100644
--- a/libcxx/include/__algorithm/set_intersection.h
+++ b/libcxx/include/__algorithm/set_intersection.h
@@ -11,12 +11,15 @@
 
 #include <__algorithm/comp.h>
 #include <__algorithm/comp_ref_type.h>
+#include <__algorithm/is_sorted_until.h>
 #include <__algorithm/iterator_operations.h>
 #include <__algorithm/lower_bound.h>
+#include <__assert>
 #include <__config>
 #include <__functional/identity.h>
 #include <__iterator/iterator_traits.h>
 #include <__iterator/next.h>
+#include <__type_traits/is_constant_evaluated.h>
 #include <__type_traits/is_same.h>
 #include <__utility/exchange.h>
 #include <__utility/move.h>
@@ -95,6 +98,14 @@ __set_intersection(
     _Compare&& __comp,
     std::forward_iterator_tag,
     std::forward_iterator_tag) {
+#if _LIBCPP_HARDENING_MODE == _LIBCPP_HARDENING_MODE_DEBUG
+  if (!__libcpp_is_constant_evaluated()) {
+    _LIBCPP_ASSERT_INTERNAL(
+        std::__is_sorted_until(__first1, __last1, __comp) == __last1, "set_intersection: input range 1 must be sorted");
+    _LIBCPP_ASSERT_INTERNAL(
+        std::__is_sorted_until(__first2, __last2, __comp) == __last2, "set_intersection: input range 2 must be sorted");
+  }
+#endif
   _LIBCPP_CONSTEXPR std::__identity __proj;
   bool __prev_may_be_equal = false;
 
diff --git a/libcxx/test/std/algorithms/alg.sorting/alg.set.operations/set.intersection/set_intersection_complexity.pass.cpp b/libcxx/test/std/algorithms/alg.sorting/alg.set.operations/set.intersection/set_intersection_complexity.pass.cpp
index ddf4087ddd6cd..7c0c394d1f23f 100644
--- a/libcxx/test/std/algorithms/alg.sorting/alg.set.operations/set.intersection/set_intersection_complexity.pass.cpp
+++ b/libcxx/test/std/algorithms/alg.sorting/alg.set.operations/set.intersection/set_intersection_complexity.pass.cpp
@@ -43,16 +43,15 @@
 
 #include "test_iterators.h"
 
-namespace {
-
-// __debug_less will perform an additional comparison in an assertion
-static constexpr unsigned std_less_comparison_count_multiplier() noexcept {
-#if _LIBCPP_HARDENING_MODE == _LIBCPP_HARDENING_MODE_DEBUG
-  return 2;
+// debug mode provides no complexity guarantees, testing them would be a waste of effort
+// but we still want to run this test, to ensure we don't trigger any assertions
+#ifdef _LIBCPP_HARDENING_MODE_DEBUG
+#  define ASSERT_COMPLEXITY(expression)
 #else
-  return 1;
+#  define ASSERT_COMPLEXITY(expression) assert(expression)
 #endif
-}
+
+namespace {
 
 struct [[nodiscard]] OperationCounts {
   std::size_t comparisons{};
@@ -60,16 +59,16 @@ struct [[nodiscard]] OperationCounts {
     std::size_t proj{};
     IteratorOpCounts iterops;
 
-    [[nodiscard]] constexpr bool isNotBetterThan(const PerInput& other) {
+    [[nodiscard]] constexpr bool isNotBetterThan(const PerInput& other) const noexcept {
       return proj >= other.proj && iterops.increments + iterops.decrements + iterops.zero_moves >=
                                        other.iterops.increments + other.iterops.decrements + other.iterops.zero_moves;
     }
   };
   std::array<PerInput, 2> in;
 
-  [[nodiscard]] constexpr bool isNotBetterThan(const OperationCounts& expect) {
-    return std_less_comparison_count_multiplier() * comparisons >= expect.comparisons &&
-           in[0].isNotBetterThan(expect.in[0]) && in[1].isNotBetterThan(expect.in[1]);
+  [[nodiscard]] constexpr bool isNotBetterThan(const OperationCounts& expect) const noexcept {
+    return comparisons >= expect.comparisons && in[0].isNotBetterThan(expect.in[0]) &&
+           in[1].isNotBetterThan(expect.in[1]);
   }
 };
 
@@ -80,16 +79,17 @@ struct counted_set_intersection_result {
 
   constexpr counted_set_intersection_result() = default;
 
-  constexpr explicit counted_set_intersection_result(std::array<int, ResultSize>&& contents) : result{contents} {}
+  constexpr explicit counted_set_intersection_result(std::array<int, ResultSize>&& contents) noexcept
+      : result{contents} {}
 
-  constexpr void assertNotBetterThan(const counted_set_intersection_result& other) {
+  constexpr void assertNotBetterThan(const counted_set_intersection_result& other) const noexcept {
     assert(result == other.result);
-    assert(opcounts.isNotBetterThan(other.opcounts));
+    ASSERT_COMPLEXITY(opcounts.isNotBetterThan(other.opcounts));
   }
 };
 
 template <std::size_t ResultSize>
-counted_set_intersection_result(std::array<int, ResultSize>) -> counted_set_intersection_result<ResultSize>;
+counted_set_intersection_result(std::array<int, ResultSize>) noexcept -> counted_set_intersection_result<ResultSize>;
 
 template <template <class...> class InIterType1,
           template <class...>
@@ -306,7 +306,7 @@ constexpr bool testComplexityBasic() {
   std::array<int, 5> r2{2, 4, 6, 8, 10};
   std::array<int, 0> expected{};
 
-  const std::size_t maxOperation = std_less_comparison_count_multiplier() * (2 * (r1.size() + r2.size()) - 1);
+  [[maybe_unused]] const std::size_t maxOperation = 2 * (r1.size() + r2.size()) - 1;
 
   // std::set_intersection
   {
@@ -321,7 +321,7 @@ constexpr bool testComplexityBasic() {
     std::set_intersection(r1.begin(), r1.end(), r2.begin(), r2.end(), out.data(), comp);
 
     assert(std::ranges::equal(out, expected));
-    assert(numberOfComp <= maxOperation);
+    ASSERT_COMPLEXITY(numberOfComp <= maxOperation);
   }
 
   // ranges::set_intersection iterator overload
@@ -349,9 +349,9 @@ constexpr bool testComplexityBasic() {
     std::ranges::set_intersection(r1.begin(), r1.end(), r2.begin(), r2.end(), out.data(), comp, proj1, proj2);
 
     assert(std::ranges::equal(out, expected));
-    assert(numberOfComp <= maxOperation);
-    assert(numberOfProj1 <= maxOperation);
-    assert(numberOfProj2 <= maxOperation);
+    ASSERT_COMPLEXITY(numberOfComp <= maxOperation);
+    ASSERT_COMPLEXITY(numberOfProj1 <= maxOperation);
+    ASSERT_COMPLEXITY(numberOfProj2 <= maxOperation);
   }
 
   // ranges::set_intersection range overload
@@ -379,9 +379,9 @@ constexpr bool testComplexityBasic() {
     std::ranges::set_intersection(r1, r2, out.data(), comp, proj1, proj2);
 
     assert(std::ranges::equal(out, expected));
-    assert(numberOfComp < maxOperation);
-    assert(numberOfProj1 < maxOperation);
-    assert(numberOfProj2 < maxOperation);
+    ASSERT_COMPLEXITY(numberOfComp < maxOperation);
+    ASSERT_COMPLEXITY(numberOfProj1 < maxOperation);
+    ASSERT_COMPLEXITY(numberOfProj2 < maxOperation);
   }
   return true;
 }
diff --git a/libcxx/test/std/algorithms/ranges_robust_against_differing_projections.pass.cpp b/libcxx/test/std/algorithms/ranges_robust_against_differing_projections.pass.cpp
index 82792249ef5c7..a5f1ceee6fb3d 100644
--- a/libcxx/test/std/algorithms/ranges_robust_against_differing_projections.pass.cpp
+++ b/libcxx/test/std/algorithms/ranges_robust_against_differing_projections.pass.cpp
@@ -40,19 +40,20 @@ constexpr bool test_all() {
     constexpr auto operator<=>(const A&) const = default;
   };
 
-  std::array in = {1, 2, 3};
-  std::array in2 = {A{4}, A{5}, A{6}};
+  const std::array in  = {1, 2, 3};
+  const std::array in2 = {A{4}, A{5}, A{6}};
 
   std::array output = {7, 8, 9, 10, 11, 12};
   auto out = output.begin();
   std::array output2 = {A{7}, A{8}, A{9}, A{10}, A{11}, A{12}};
   auto out2 = output2.begin();
 
-  std::ranges::equal_to eq;
-  std::ranges::less less;
-  auto sum = [](int lhs, A rhs) { return lhs + rhs.x; };
-  auto proj1 = [](int x) { return x * -1; };
-  auto proj2 = [](A a) { return a.x * -1; };
+  const std::ranges::equal_to eq;
+  const std::ranges::less less;
+  const std::ranges::greater greater;
+  const auto sum   = [](int lhs, A rhs) { return lhs + rhs.x; };
+  const auto proj1 = [](int x) { return x * -1; };
+  const auto proj2 = [](A a) { return a.x * -1; };
 
 #if TEST_STD_VER >= 23
   test(std::ranges::ends_with, in, in2, eq, proj1, proj2);
@@ -67,17 +68,17 @@ constexpr bool test_all() {
   test(std::ranges::find_end, in, in2, eq, proj1, proj2);
   test(std::ranges::transform, in, in2, out, sum, proj1, proj2);
   test(std::ranges::transform, in, in2, out2, sum, proj1, proj2);
-  test(std::ranges::partial_sort_copy, in, in2, less, proj1, proj2);
-  test(std::ranges::merge, in, in2, out, less, proj1, proj2);
-  test(std::ranges::merge, in, in2, out2, less, proj1, proj2);
-  test(std::ranges::set_intersection, in, in2, out, less, proj1, proj2);
-  test(std::ranges::set_intersection, in, in2, out2, less, proj1, proj2);
-  test(std::ranges::set_difference, in, in2, out, less, proj1, proj2);
-  test(std::ranges::set_difference, in, in2, out2, less, proj1, proj2);
-  test(std::ranges::set_symmetric_difference, in, in2, out, less, proj1, proj2);
-  test(std::ranges::set_symmetric_difference, in, in2, out2, less, proj1, proj2);
-  test(std::ranges::set_union, in, in2, out, less, proj1, proj2);
-  test(std::ranges::set_union, in, in2, out2, less, proj1, proj2);
+  test(std::ranges::partial_sort_copy, in, output, less, proj1, proj2);
+  test(std::ranges::merge, in, in2, out, greater, proj1, proj2);
+  test(std::ranges::merge, in, in2, out2, greater, proj1, proj2);
+  test(std::ranges::set_intersection, in, in2, out, greater, proj1, proj2);
+  test(std::ranges::set_intersection, in, in2, out2, greater, proj1, proj2);
+  test(std::ranges::set_difference, in, in2, out, greater, proj1, proj2);
+  test(std::ranges::set_difference, in, in2, out2, greater, proj1, proj2);
+  test(std::ranges::set_symmetric_difference, in, in2, out, greater, proj1, proj2);
+  test(std::ranges::set_symmetric_difference, in, in2, out2, greater, proj1, proj2);
+  test(std::ranges::set_union, in, in2, out, greater, proj1, proj2);
+  test(std::ranges::set_union, in, in2, out2, greater, proj1, proj2);
 #if TEST_STD_VER > 20
   test(std::ranges::starts_with, in, in2, eq, proj1, proj2);
 #endif

...orithms/alg.sorting/alg.set.operations/set.intersection/set_intersection_complexity.pass.cpp

alexfh · 2024-08-02T12:49:11Z

Thanks for adding the runtime checks! I've added a couple of nits, but I'll leave the actual review to someone familiar with libc++ development though.

philnik777 · 2024-08-03T10:56:11Z

...orithms/alg.sorting/alg.set.operations/set.intersection/set_intersection_complexity.pass.cpp

-#if _LIBCPP_HARDENING_MODE == _LIBCPP_HARDENING_MODE_DEBUG
-  return 2;
+// Debug mode provides no complexity guarantees, testing them would be a waste of effort.
+#ifdef _LIBCPP_HARDENING_MODE_DEBUG


This is wrong. _LIBCPP_HARDENING_MODE_DEBUG is always defined.

Suggested change

#ifdef _LIBCPP_HARDENING_MODE_DEBUG

#if defined(_LIBCPP_HARDENING_MODE_DEBUG) && _LIBCPP_HARDENING_MODE_DEBUG

We still need to check for if defined(_LIBCPP_HARDENING_MODE_DEBUG) to avoid -Wundef when testing non-libc++ libraries.

philnik777 · 2024-08-03T10:57:36Z

...orithms/alg.sorting/alg.set.operations/set.intersection/set_intersection_complexity.pass.cpp

-static constexpr unsigned std_less_comparison_count_multiplier() noexcept {
-#if _LIBCPP_HARDENING_MODE == _LIBCPP_HARDENING_MODE_DEBUG
-  return 2;
+// Debug mode provides no complexity guarantees, testing them would be a waste of effort.


I disagree here. While we do not conform to the standards requirements, we do try to avoid increasing the complexity. A constant factor is usually OK, and it would be good to set the current increase here to avoid increasing it accidentally.

I agree with @philnik777. @var-const should chime in, but basically while we are OK with larger performance penalties in the debug mode, we don't "not care" about changing the complexity.

@var-const It turns out this is probably something we should have documented in the Hardening mode documentation, and we should go ahead and document it now.

I agree with what you're saying, I unfortunately have a ton of first-hand experience with not being able to use the best tool for the job because it's too slow, but this isn't as simple as it may seem. I'll argue the case for what I did a little bit out of pride, but mostly because I believe I made this choice for valid reasons which I probably should have shared in the commit description or a code comment.

You'll see we had previously a function called std_less_comparison_count_multiplier(), which was already doing some of what we're continuing here: adding a constant factor to an operation when in debug mode. When I added that I was very uncomfortable with how much it looked like a change detector, but it was just the one and it looked harmless enough, so I just did that. I could make it a less effective change detector by adding some padding to my constant factor and using it for all operations, but the improvement we made to set_intersection() has a twist to it: the worst-case scenario complexity is linear, but in most cases the effective complexity is logarithmic, and we do check some of those cases in the complexity test (because it's an improvement and we don't want to break it, right?). Input validation, however, is linear. So, conceptually, using a larger constant doesn't really work. In practice it would, because, in practice, we only test with (small) fixed-size inputs, but it's conceptually wrong. Then there is the question of the arbitrarily padded constant: if it's arbitrarily padded, how much is it really helping? It's still a change detector, a test which will at some point break when someone adds more debug validations. With all that, is the test still a net positive? And let's say we changed the logic of complexity validation in debug mode so that we checked that it's linear with a less-padded constant, would that even make sense?

I agree that performance on debug mode is not something we want to ignore, but I don't have a good solution for any of it, and, since I don't have a good solution for automating validation, I would personally prefer relying on the safety net of code reviews to catch this sort of stuff. Validating the number of operations is really strict, it's a tricky thing to mix with debug instrumentation.

Having said all that, I'm happy to change this proposal if you disagree. We could have a single constant multiplier to be used for all operations, or one constant for comparisons+projections and another one for iterator operations.

What do you think?

Thanks a lot for writing this out!

I think we've been somewhat consciously postponing making the decision on what kind of performance guarantees the debug mode should provide (if any) -- there was no pressing need to make a commitment there, and I wanted to gain some usage experience to inform this decision. I would say that from what I've seen so far, I'm leaning towards making the debug mode guarantee big-O complexity (the Standard sometimes mandates the exact number of operations which would be too strict, IMO). I'm very concerned that unless we keep even the debug mode relatively lightweight and performant, it will end up a mode that no one enables and thus is effectively useless; I'm especially concerned about a potential "death from a thousand cuts" scenario where many checks, each one not that heavyweight on its own, add up to something that is unacceptably slow for too many users.

IIUC, the new checks don't really check the complexity here as well (it makes the average case worse, but worst case is the same).

We do need to test for the exact complexity in regular modes (since it's mandated by the Standard), but I can relate to your perspective that these tests aren't really well-suited for the debug mode. Not checking for complexity in the debug mode makes sense to me; we should change the comment, however, like Louis suggested.

I'm leaning towards making the debug mode guarantee big-O complexity

I like that. It's tricky to automate validation, but it sounds like the right thing to do. I suppose we could either keep that part only in the benchmark tests, making use of the asymptotic complexity calculation feature it has, or come up with a framework for testing both strict number of steps and asymptotic complexity, and we'd do the curve fitting ourselves... maybe something like:

template <typename T, size_t NumDimensions> struct ComplexityTestCaseInput { T input; std::array<size_t, NumDimensions> dimensions; }; struct ComplexityTestCaseExpectation { std::function<bool(const OutputStepsType&)> isNumStepsGood; AsymptoticComplexityFunc standardGuarantee; std::optional<AsymptoticComplexityFunc> nonDebugExpectation; }; template <class InputRangeType, size_t NumDimensions> bool meetsComplexityRequirements( InputRangeType inputRange, std::function<std::array<size_t, NumDimensions>(decltype(*std::declval<InputRangeType>().begin()))> test, std::array<ComplexityTestCaseExpectation, NumDimensions> expectedComplexity);

lol, sorry, got carried away ;). feels a bit clunky too... needs more time to brew.

ldionne

Thanks for the patch! I like the idea but I do have some comments. IMO this is the right way of addressing @alexfh 's original comments on the patch that added one-sided binary search.

ldionne · 2024-08-05T16:13:31Z

libcxx/include/__algorithm/set_intersection.h

@@ -95,6 +98,14 @@ __set_intersection(
    _Compare&& __comp,
    std::forward_iterator_tag,
    std::forward_iterator_tag) {
+#if _LIBCPP_HARDENING_MODE == _LIBCPP_HARDENING_MODE_DEBUG
+  if (!__libcpp_is_constant_evaluated()) {


Why is this required?

Because__builtin_expect(), which_LIBCPP_ASSERT() expands to, can't be constant-evaluated. I learned that from a compilation error, btw.

Edit: Sorry, now that I said it I'm not sure =/. Maybe it wasn't __builtin_expect(), but _LIBCPP_VERBOSE_ABORT()? Anyway, something inside _LIBCPP_ASSERT() can't be constant-evaluated. I had been using __check_strict_weak_ordering_sorted() as my blueprint for this change, but I left that bit out and the compiler explained to me why I couldn't.

I still don't understand. Can you try removing the if(is-constant-evaluated) and let's see what the CI says? You're probably right, but I'd like to see what the error is.

+1 -- also curious to see the error.

Yep, you're both right, that was bogus! Thanks for challenging me on it :).

ldionne · 2024-08-05T16:14:01Z

libcxx/include/__algorithm/set_intersection.h

@@ -95,6 +98,14 @@ __set_intersection(
    _Compare&& __comp,
    std::forward_iterator_tag,
    std::forward_iterator_tag) {
+#if _LIBCPP_HARDENING_MODE == _LIBCPP_HARDENING_MODE_DEBUG
+  if (!__libcpp_is_constant_evaluated()) {
+    _LIBCPP_ASSERT_INTERNAL(


This should likely be _LIBCPP_ASSERT_ARGUMENT_WITHIN_DOMAIN but I'd like @var-const to chime in.

I should have added a comment: I didn't do that because _LIBCPP_ASSERT_ARGUMENT_WITHIN_DOMAIN is enabled in _LIBCPP_HARDENING_MODE_EXTENSIVE, and I thought the cost wasn't appropriate. More about this in my response to https://github.com/llvm/llvm-project/pull/101508/files#r1702696227.

Ah, I think this should be _LIBCPP_ASSERT_SEMANTIC_REQUIREMENT instead, actually. That matches what we do for __check_strict_weak_ordering_sorted.

+1 to Louis' comment -- internal is meant for checks that aim to catch bugs in our own implementation, bugs that are independent from user input. Since this checks the user-provided arguments, we need to find some other category.

My first intuition is that argument-within-domain is a somewhat better match than semantic-requirement. In __check_strict_weak_ordering_sorted, we're checking at the resulting (presumably) sorted sequence as a way to validate the given comparator -- the comparator has the semantic requirement to provide strict weak ordering, but we cannot check that without resorting to an imperfect heuristic. Here, however, we are checking the given argument directly, and the check is very straightforward, just expensive. argument-within-domain is essentially a catch-all for "the given argument is valid but if it's not, it won't cause UB within our code (but will produce an incorrect result that might well cause UB in user code)", which seems to apply to the situation here.

Since we're wrapping the whole thing in a conditional, it's not really important which modes enable the assertion category we choose -- e.g. if we choose argument-within-domain that is enabled in both extensive and debug, the check for _LIBCPP_HARDENING_MODE_DEBUG still makes sure it only runs in debug. It's a little inelegant, but we already have precedent in __check_strict_weak_ordering_sorted, so I wouldn't try to fix that within this patch.

~~Yep, you're both right, that was bogus! Thanks for challenging me on it :).~~

lol, I somehow responded to the wrong thread, copied the response there. On this thread, I did change it to use _LIBCPP_ASSERT_SEMANTIC_REQUIREMENT().

ldionne · 2024-08-05T16:15:57Z

...orithms/alg.sorting/alg.set.operations/set.intersection/set_intersection_complexity.pass.cpp

-static constexpr unsigned std_less_comparison_count_multiplier() noexcept {
-#if _LIBCPP_HARDENING_MODE == _LIBCPP_HARDENING_MODE_DEBUG
-  return 2;
+// Debug mode provides no complexity guarantees, testing them would be a waste of effort.


I agree with @philnik777. @var-const should chime in, but basically while we are OK with larger performance penalties in the debug mode, we don't "not care" about changing the complexity.

@var-const It turns out this is probably something we should have documented in the Hardening mode documentation, and we should go ahead and document it now.

ldionne

There are some changes to make (per comments) but I think we can move forward with this patch once they are addressed. Thanks!

ldionne · 2024-08-30T14:16:30Z

libcxx/include/__algorithm/is_sorted_until.h

 _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 _ForwardIterator
-__is_sorted_until(_ForwardIterator __first, _ForwardIterator __last, _Compare __comp) {
+__is_sorted_until(_ForwardIterator __first, _Sent __last, _Compare&& __comp) {
  if (__first != __last) {
    _ForwardIterator __i = __first;


Suggested change

_ForwardIterator __i = __first;

_ForwardIterator __prev = __first;

This makes the code quite a bit clearer.

ldionne · 2024-08-30T14:18:54Z

libcxx/include/__algorithm/set_intersection.h

@@ -95,6 +98,14 @@ __set_intersection(
    _Compare&& __comp,
    std::forward_iterator_tag,
    std::forward_iterator_tag) {
+#if _LIBCPP_HARDENING_MODE == _LIBCPP_HARDENING_MODE_DEBUG
+  if (!__libcpp_is_constant_evaluated()) {
+    _LIBCPP_ASSERT_INTERNAL(


Ah, I think this should be _LIBCPP_ASSERT_SEMANTIC_REQUIREMENT instead, actually. That matches what we do for __check_strict_weak_ordering_sorted.

ldionne · 2024-08-30T14:20:12Z

libcxx/include/__algorithm/set_intersection.h

@@ -95,6 +98,14 @@ __set_intersection(
    _Compare&& __comp,
    std::forward_iterator_tag,
    std::forward_iterator_tag) {
+#if _LIBCPP_HARDENING_MODE == _LIBCPP_HARDENING_MODE_DEBUG
+  if (!__libcpp_is_constant_evaluated()) {


I still don't understand. Can you try removing the if(is-constant-evaluated) and let's see what the CI says? You're probably right, but I'd like to see what the error is.

ldionne · 2024-08-30T14:35:50Z

...orithms/alg.sorting/alg.set.operations/set.intersection/set_intersection_complexity.pass.cpp

-static constexpr unsigned std_less_comparison_count_multiplier() noexcept {
-#if _LIBCPP_HARDENING_MODE == _LIBCPP_HARDENING_MODE_DEBUG
-  return 2;
+// Debug mode provides no complexity guarantees, testing them would be a waste of effort.


In light of the discussion above (thanks for the detailed explanation BTW), I would change to this:

Suggested change

// Debug mode provides no complexity guarantees, testing them would be a waste of effort.

// We don't check number of operations in Debug mode because they are not stable enough due to additional validations

That way we're not making a statement about whether the complexity is supposed to be the same or not. I'm basically sweeping this whole thing under the rug.

ldionne · 2024-08-30T14:36:52Z

...orithms/alg.sorting/alg.set.operations/set.intersection/set_intersection_complexity.pass.cpp

-#if _LIBCPP_HARDENING_MODE == _LIBCPP_HARDENING_MODE_DEBUG
-  return 2;
+// Debug mode provides no complexity guarantees, testing them would be a waste of effort.
+#ifdef _LIBCPP_HARDENING_MODE_DEBUG


Suggested change

#ifdef _LIBCPP_HARDENING_MODE_DEBUG

#if defined(_LIBCPP_HARDENING_MODE_DEBUG) && _LIBCPP_HARDENING_MODE_DEBUG

We still need to check for if defined(_LIBCPP_HARDENING_MODE_DEBUG) to avoid -Wundef when testing non-libc++ libraries.

libcxx/include/__algorithm/is_sorted_until.h

ichaer · 2024-09-01T00:18:15Z

@ldionne, thanks for the review, I'm out on holiday until the last week of September but all your comments make sense to me, I'll make the changes as soon as I'm back.

var-const

Thanks a lot for the patch! Sorry about the slow review, unfortunately I didn't have much time for reviewing recently. My comments mostly echo existing feedback.

var-const · 2024-09-12T22:50:22Z

libcxx/include/__algorithm/is_sorted_until.h

-      if (__comp(*__i, *__first))
-        return __i;
-      __first = __i;
+    while (++__first != __last) {


Question: what is the reason to swap first and i here?

It's because we might skip the loop altogether if (__first == __last), and with the sentinel-friendly interface we can no longer return __last. In the old version __first would always be behind __i, so a fully-sorted non-empty input would have us returning the one-before-last position.

var-const · 2024-09-12T23:06:01Z

libcxx/include/__algorithm/set_intersection.h

@@ -95,6 +98,14 @@ __set_intersection(
    _Compare&& __comp,
    std::forward_iterator_tag,
    std::forward_iterator_tag) {
+#if _LIBCPP_HARDENING_MODE == _LIBCPP_HARDENING_MODE_DEBUG
+  if (!__libcpp_is_constant_evaluated()) {
+    _LIBCPP_ASSERT_INTERNAL(


+1 to Louis' comment -- internal is meant for checks that aim to catch bugs in our own implementation, bugs that are independent from user input. Since this checks the user-provided arguments, we need to find some other category.

My first intuition is that argument-within-domain is a somewhat better match than semantic-requirement. In __check_strict_weak_ordering_sorted, we're checking at the resulting (presumably) sorted sequence as a way to validate the given comparator -- the comparator has the semantic requirement to provide strict weak ordering, but we cannot check that without resorting to an imperfect heuristic. Here, however, we are checking the given argument directly, and the check is very straightforward, just expensive. argument-within-domain is essentially a catch-all for "the given argument is valid but if it's not, it won't cause UB within our code (but will produce an incorrect result that might well cause UB in user code)", which seems to apply to the situation here.

Since we're wrapping the whole thing in a conditional, it's not really important which modes enable the assertion category we choose -- e.g. if we choose argument-within-domain that is enabled in both extensive and debug, the check for _LIBCPP_HARDENING_MODE_DEBUG still makes sure it only runs in debug. It's a little inelegant, but we already have precedent in __check_strict_weak_ordering_sorted, so I wouldn't try to fix that within this patch.

var-const · 2024-09-12T23:06:36Z

libcxx/include/__algorithm/set_intersection.h

@@ -95,6 +98,14 @@ __set_intersection(
    _Compare&& __comp,
    std::forward_iterator_tag,
    std::forward_iterator_tag) {
+#if _LIBCPP_HARDENING_MODE == _LIBCPP_HARDENING_MODE_DEBUG
+  if (!__libcpp_is_constant_evaluated()) {


+1 -- also curious to see the error.

var-const · 2024-09-12T23:26:49Z

...orithms/alg.sorting/alg.set.operations/set.intersection/set_intersection_complexity.pass.cpp

-static constexpr unsigned std_less_comparison_count_multiplier() noexcept {
-#if _LIBCPP_HARDENING_MODE == _LIBCPP_HARDENING_MODE_DEBUG
-  return 2;
+// Debug mode provides no complexity guarantees, testing them would be a waste of effort.


Thanks a lot for writing this out!

I think we've been somewhat consciously postponing making the decision on what kind of performance guarantees the debug mode should provide (if any) -- there was no pressing need to make a commitment there, and I wanted to gain some usage experience to inform this decision. I would say that from what I've seen so far, I'm leaning towards making the debug mode guarantee big-O complexity (the Standard sometimes mandates the exact number of operations which would be too strict, IMO). I'm very concerned that unless we keep even the debug mode relatively lightweight and performant, it will end up a mode that no one enables and thus is effectively useless; I'm especially concerned about a potential "death from a thousand cuts" scenario where many checks, each one not that heavyweight on its own, add up to something that is unacceptably slow for too many users.

IIUC, the new checks don't really check the complexity here as well (it makes the average case worse, but worst case is the same).

We do need to test for the exact complexity in regular modes (since it's mandated by the Standard), but I can relate to your perspective that these tests aren't really well-suited for the debug mode. Not checking for complexity in the debug mode makes sense to me; we should change the comment, however, like Louis suggested.

* remove bogus __libcpp_is_constant_evaluated() check in __set_intersection() * change comment about checking number of steps in debug mode * s/_LIBCPP_ASSERT_SEMANTIC_REQUIREMENT/_LIBCPP_ASSERT_INTERNAL/ for input validation

ichaer

Sorry for the radio silence, I don't think I've addressed everything, will look into the remaining comments soon.

ichaer · 2024-11-09T15:32:18Z

libcxx/include/__algorithm/is_sorted_until.h

-      if (__comp(*__i, *__first))
-        return __i;
-      __first = __i;
+    while (++__first != __last) {


It's because we might skip the loop altogether if (__first == __last), and with the sentinel-friendly interface we can no longer return __last. In the old version __first would always be behind __i, so a fully-sorted non-empty input would have us returning the one-before-last position.

ichaer · 2024-11-09T16:43:16Z

...orithms/alg.sorting/alg.set.operations/set.intersection/set_intersection_complexity.pass.cpp

-static constexpr unsigned std_less_comparison_count_multiplier() noexcept {
-#if _LIBCPP_HARDENING_MODE == _LIBCPP_HARDENING_MODE_DEBUG
-  return 2;
+// Debug mode provides no complexity guarantees, testing them would be a waste of effort.


I'm leaning towards making the debug mode guarantee big-O complexity

I like that. It's tricky to automate validation, but it sounds like the right thing to do. I suppose we could either keep that part only in the benchmark tests, making use of the asymptotic complexity calculation feature it has, or come up with a framework for testing both strict number of steps and asymptotic complexity, and we'd do the curve fitting ourselves... maybe something like:

template <typename T, size_t NumDimensions> struct ComplexityTestCaseInput { T input; std::array<size_t, NumDimensions> dimensions; }; struct ComplexityTestCaseExpectation { std::function<bool(const OutputStepsType&)> isNumStepsGood; AsymptoticComplexityFunc standardGuarantee; std::optional<AsymptoticComplexityFunc> nonDebugExpectation; }; template <class InputRangeType, size_t NumDimensions> bool meetsComplexityRequirements( InputRangeType inputRange, std::function<std::array<size_t, NumDimensions>(decltype(*std::declval<InputRangeType>().begin()))> test, std::array<ComplexityTestCaseExpectation, NumDimensions> expectedComplexity);

lol, sorry, got carried away ;). feels a bit clunky too... needs more time to brew.

ichaer · 2024-11-09T16:49:25Z

libcxx/include/__algorithm/set_intersection.h

@@ -95,6 +98,14 @@ __set_intersection(
    _Compare&& __comp,
    std::forward_iterator_tag,
    std::forward_iterator_tag) {
+#if _LIBCPP_HARDENING_MODE == _LIBCPP_HARDENING_MODE_DEBUG
+  if (!__libcpp_is_constant_evaluated()) {
+    _LIBCPP_ASSERT_INTERNAL(


~~Yep, you're both right, that was bogus! Thanks for challenging me on it :).~~

lol, I somehow responded to the wrong thread, copied the response there. On this thread, I did change it to use _LIBCPP_ASSERT_SEMANTIC_REQUIREMENT().

ldionne · 2024-11-11T15:01:02Z

Please ping us when you're ready for someone to take another look!

…ing_input_check * llvm/main: (10810 commits) [LLVM][SVE] Extend dup(extract_elt(v,i)) isel patterns to cover more combinations. (llvm#115189) [lldb][test] TestConstStaticIntegralMember.py: skip dsym variant for older compiler versions [mlir][LLVM] Add nneg flag (llvm#115498) [llvm-debuginfo-analyzer] Incorrect DW_AT_call_line/DW_AT_call_file. (llvm#115701) [llvm] Remove `br i1 undef` from some regression tests [NFC] (llvm#115691) [mlir][Transforms][NFC] CSE: Split tests and fix typo (llvm#115680) Reland "Add clang::lifetimebound annotation to llvm::function_ref" [X86] lowerShuffleAsLanePermuteAndPermute - simplify lane crossing mask based on demanded elts [InstCombine] Fold (sext(a) & c1) == c2 to (a & c3) == trunc(c2) (llvm#112646) [SCEVExpander] Don't try to reuse SCEVUnknown values (llvm#115141) [flang][OpenMP] delayed privatisation lowering for TASK (llvm#113591) adding missing lib MLIRDestinationStyleOpInterface (llvm#115703) [MLIR] NFC. Move leftover memref op test cases out of test/IR (llvm#115583) [lldb][ObjC] Fix method list entry offset calculation (llvm#115571) [GlobalISel] Add G_STEP_VECTOR instruction (llvm#115598) [AArch64][GlobalISel] Do not create LIFETIME instructions in functions. (llvm#115669) [SelectionDAG] Add support for extending masked loads in computeKnownBits (llvm#115450) Fix for OpenMP offloading compilation error with GNU++20 option when using complex header (llvm#115306) Fix for codegen Crash in Clang when using locator omp_all_memory with depobj construct (llvm#114221) [Clang][AArch64] svadda is not available in streaming mode ...

ichaer · 2024-11-12T12:48:17Z

Please ping us when you're ready for someone to take another look!

@ldionne, I think this is good for a new pass, the only thing making me unsure is this CI failure on the macos (generic-cxx23, macos-latest) job, it looks unrelated... I went through a few of the errors, the one for build/generic-cxx23/libcxx/test/libcxx/clang_modules_include.gen.py/cctype.compile.pass.cpp looks particularly convincing, I don't think there is a way in which the changes from this MR could have affected that. The comment in that file about issues with localisation looks relevant to the error message:

2024-11-12T04:54:47.5099180Z # | /Users/runner/work/llvm-project/llvm-project/build/generic-cxx23/libcxx/test-suite-install/include/c++/v1/__locale_dir/support/bsd_like.h:140:91: error: use of undeclared identifier 'MB_CUR_MAX_L'

I may be wrong, though, and I wouldn't want for you to spend time on this if you think I may have caused it. Let me know what you think, I can dig deeper if you believe these changes may have caused it.

ldionne

This looks quite reasonable, just a few comments / questions.

ldionne · 2024-11-12T16:03:27Z

...orithms/alg.sorting/alg.set.operations/set.intersection/set_intersection_complexity.pass.cpp

-#if _LIBCPP_HARDENING_MODE == _LIBCPP_HARDENING_MODE_DEBUG
-  return 2;
+// We don't check number of operations in Debug mode because they are not stable enough due to additional validations.
+#if defined(_LIBCPP_HARDENING_MODE_DEBUG) && _LIBCPP_HARDENING_MODE_DEBUG


Suggested change

#if defined(_LIBCPP_HARDENING_MODE_DEBUG) && _LIBCPP_HARDENING_MODE_DEBUG

#if defined(_LIBCPP_VERSION) && _LIBCPP_HARDENING_MODE == _LIBCPP_HARDENING_MODE_DEBUG

ldionne · 2024-11-12T16:04:18Z

libcxx/test/std/algorithms/ranges_robust_against_differing_projections.pass.cpp


 #if TEST_STD_VER >= 23
  test(std::ranges::ends_with, in, in2, eq, proj1, proj2);
 #endif
  test(std::ranges::equal, in, in2, eq, proj1, proj2);
  test(std::ranges::lexicographical_compare, in, in2, eq, proj1, proj2);
  test(std::ranges::is_permutation, in, in2, eq, proj1, proj2);
-  test(std::ranges::includes, in, in2, less, proj1, proj2);
+  test(std::ranges::includes, in, in2, greater, proj1, proj2);


Why is this change needed?

ldionne · 2024-11-12T16:06:46Z

...orithms/alg.sorting/alg.set.operations/set.intersection/set_intersection_complexity.pass.cpp


-  constexpr void assertNotBetterThan(const counted_set_intersection_result& other) {
+  constexpr void assertNotBetterThan(const counted_set_intersection_result& other) const noexcept {


Just curious, why add noexcept?

ldionne · 2024-11-12T16:09:29Z

libcxx/include/__algorithm/set_intersection.h

@@ -96,6 +99,12 @@ __set_intersection(
    _Compare&& __comp,
    std::forward_iterator_tag,
    std::forward_iterator_tag) {
+#if _LIBCPP_HARDENING_MODE == _LIBCPP_HARDENING_MODE_DEBUG
+  _LIBCPP_ASSERT_SEMANTIC_REQUIREMENT(


This patch should also add tests to trigger these assertions. You can see an example of how that's done in e.g. libcxx/test/libcxx/algorithms/alg.sorting/assert.min.max.pass.cpp. You should be able to just purposefully call the algorithm with something that's badly sorted.

ichaer requested a review from a team as a code owner August 1, 2024 16:14

llvmbot added the libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. label Aug 1, 2024

ichaer mentioned this pull request Aug 1, 2024

[libc++] Speed up set_intersection() by fast-forwarding over ranges of non-matching elements with one-sided binary search. #75230

Merged

Oops, missed std::ranges::includes()! It also requires sorted input.

7f2beae

alexfh reviewed Aug 2, 2024

View reviewed changes

...orithms/alg.sorting/alg.set.operations/set.intersection/set_intersection_complexity.pass.cpp Outdated Show resolved Hide resolved

...orithms/alg.sorting/alg.set.operations/set.intersection/set_intersection_complexity.pass.cpp Outdated Show resolved Hide resolved

Addressing @alexfh's feedback.

3006f13

philnik777 reviewed Aug 3, 2024

View reviewed changes

ldionne requested a review from var-const August 5, 2024 16:12

ldionne requested changes Aug 5, 2024

View reviewed changes

ldionne reviewed Aug 30, 2024

View reviewed changes

var-const added the hardening Issues related to the hardening effort label Sep 12, 2024

var-const requested changes Sep 12, 2024

View reviewed changes

* s/__i/__prev/ in __is_sorted_until()

22c9f1c

* remove bogus __libcpp_is_constant_evaluated() check in __set_intersection() * change comment about checking number of steps in debug mode * s/_LIBCPP_ASSERT_SEMANTIC_REQUIREMENT/_LIBCPP_ASSERT_INTERNAL/ for input validation

ichaer commented Nov 9, 2024

View reviewed changes

ichaer added 3 commits November 11, 2024 15:16

Fix hardening mode preprocessor check.

e8469c1

Add release note. Might be too verbose, let me know what you think.

9a7cceb

ldionne requested changes Nov 12, 2024

View reviewed changes

	#ifdef _LIBCPP_HARDENING_MODE_DEBUG
	#if defined(_LIBCPP_HARDENING_MODE_DEBUG) && _LIBCPP_HARDENING_MODE_DEBUG

	_ForwardIterator __i = __first;
	_ForwardIterator __prev = __first;

	// Debug mode provides no complexity guarantees, testing them would be a waste of effort.
	// We don't check number of operations in Debug mode because they are not stable enough due to additional validations

	#if defined(_LIBCPP_HARDENING_MODE_DEBUG) && _LIBCPP_HARDENING_MODE_DEBUG
	#if defined(_LIBCPP_VERSION) && _LIBCPP_HARDENING_MODE == _LIBCPP_HARDENING_MODE_DEBUG


		constexpr void assertNotBetterThan(const counted_set_intersection_result& other) {
		constexpr void assertNotBetterThan(const counted_set_intersection_result& other) const noexcept {

[libc++] Add input validation for set_intersection() in debug mode. #101508

Are you sure you want to change the base?

[libc++] Add input validation for set_intersection() in debug mode. #101508

Conversation

ichaer commented Aug 1, 2024

llvmbot commented Aug 1, 2024

alexfh commented Aug 2, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ichaer Nov 9, 2024 • edited Loading

Choose a reason for hiding this comment

ldionne left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ichaer Aug 6, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

var-const Sep 12, 2024 • edited Loading

Choose a reason for hiding this comment

ichaer Nov 9, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ldionne left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ichaer commented Sep 1, 2024

var-const left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

var-const Sep 12, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ichaer left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ichaer Nov 9, 2024 • edited Loading

Choose a reason for hiding this comment

ichaer Nov 9, 2024 • edited Loading

Choose a reason for hiding this comment

ldionne commented Nov 11, 2024

ichaer commented Nov 12, 2024

ldionne left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ichaer Nov 9, 2024 •

edited

Loading

ichaer Aug 6, 2024 •

edited

Loading

var-const Sep 12, 2024 •

edited

Loading

ichaer Nov 9, 2024 •

edited

Loading

var-const Sep 12, 2024 •

edited

Loading

ichaer Nov 9, 2024 •

edited

Loading

ichaer Nov 9, 2024 •

edited

Loading