Loop invariant is not deduced in C++-iterator-style loop over pointers #101372
Comments
I think what's missing here is the information that both Without that information, I think we currently have to assume that incrementing For this particular case, one way to add this information would be to add |
Oh good point. I hadn't thought of that dependency. Although it is UB in C to create unaligned pointers, so I think that is a fair assumption for the compiler to make. Otherwise things like pointer subtraction kinda go haywire.
Clang assumes that references are aligned, but not raw pointers. You can do a |
Ah yes. If I read https://en.cppreference.com/w/cpp/language/reinterpret_cast correctly, I guess those casts would violate 5), but has too much widespread use to be too aggressive here. This also pessimizes other parts of libc++, e.g. not being able to compute trip counts for things like |
Could perhaps Clang assume pointers are aligned when it encounters arithmetic with them?
FWIW, it's not that many of them. A Alternatively, we could use
That's doubly supported by the standard. Not only must pointers always be aligned, but if you ever write In the context of #101370, the analogs to For completeness, there is another thorny corner of pointers and alignment, which is taking the address of fields in a packed structs. There's a warning for it (though it's incomplete; see #97091), but if the compiler ever lowers references to pointers without tagging their (lack of) alignment, that might do weird things. |
Oh the alignment discussion is interesting though. It means I can at least try to fix #101370 for |
One option would be to use |
Missing information about begin and end pointers of std::vector can lead to missed optimizations in LLVM. See llvm#101372 for a discussion of missed range check optimizations in hardened mode. Once llvm#108958 lands, the created `llvm.assume` calls for the alignment should be folded into the `load` instructions, resulting in no extra instructions after InstCombine.
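A minimal sketch of how an alignment assumption on a begin/end pointer pair might be expressed in source, using the `__builtin_assume_aligned` builtin available in Clang and GCC. The helper and function names (`hyp_assume_aligned`, `hyp_sum`) are hypothetical and are not taken from the PR, which may attach the assumption differently.

```cpp
#include <cstddef>

// Hypothetical helper (not libc++ code): returns p annotated as being
// aligned to alignof(T), so the optimizer may rely on that alignment.
template <class T>
inline T* hyp_assume_aligned(T* p) {
#if defined(__clang__) || defined(__GNUC__)
  return static_cast<T*>(__builtin_assume_aligned(p, alignof(T)));
#else
  return p;  // No annotation available; behavior is unchanged.
#endif
}

// Hypothetical stand-in for iterating a vector's [begin, end) range.
inline long hyp_sum(const int* begin, const int* end) {
  begin = hyp_assume_aligned(begin);
  end = hyp_assume_aligned(end);
  long total = 0;
  for (const int* p = begin; p != end; ++p) {
    total += *p;
  }
  return total;
}
```

The annotation only states a property that valid `int` pointers already have, so it should not change behavior for correct callers.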
Created #108961 to create assumptions for begin/end pointers of std::vector
Will there be separate versions of that PR for std::array and std::basic_string, etc? In Chromium, I think I will just do this inside the CheckedContiguousIterator (our version of |
@danakj I don't think the others needed those assumptions, at least for basic ranged-for loops.
I guess the issue in this bug is about pointers, because the compiler can't see that |
It would be good to check as well |
No, it's just vector whose iterators were bounded by
Figured that out. Will upload a PR shortly. It is indeed the alignment issue discussed here and then needing to relate the pointers together from the other bug.
Actually, the alignment thing seems to have some connection to #108600; I've been playing around with it. (Though I haven't fully figured out what's up there.) So maybe it'll be useful generally? Not sure. There is a slight risk in doing it generally, in that the compiler is apparently bad at discarding unnecessary assumes. But hopefully ordering assumptions are safe?
@fhahn Actually, is alignment enough? Let's suppose
To solve this, I think the programmer needs to write something that tells the compiler that What if, instead of alignment annotations, we taught Clang to reason about pointer arithmetic preconditions like this? |
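A minimal sketch of one case where alignment alone would not be enough, assuming an element type whose size is not a multiple-preserving power of two (for example size 12 with alignment 4). The function name and the concrete addresses are hypothetical; raw integers stand in for pointers so the stepping can be simulated without undefined behavior.

```cpp
#include <cstddef>
#include <cstdint>

// Simulates `for (p = begin; p != end; ++p)` on raw addresses, where each
// increment advances by `stride` bytes (the element size). Returns true if
// p ever lands exactly on `end`, false if it steps past it.
inline bool hyp_reaches_end(std::uintptr_t begin, std::uintptr_t end,
                            std::size_t stride) {
  for (std::uintptr_t p = begin; p <= end; p += stride) {
    if (p == end) {
      return true;
    }
  }
  return false;
}

// True if `addr` is a multiple of `align`.
inline bool hyp_is_aligned(std::uintptr_t addr, std::size_t align) {
  return addr % align == 0;
}
```

With a stride of 12 and alignment of 4, `begin = 0x1000` and `end = 0x1010` are both suitably aligned, yet stepping from `begin` visits `0x100C` and then `0x1018`, skipping `end`. What rules this out is that `end - begin` is a whole number of elements, which is a stronger fact than alignment.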
Playing around, this seems to address llvm#101370 for `std::vector<char>`, but not `std::vector<int>`. I believe `std::vector<int>` also needs a solution to llvm#101372, which is an alignment issue.

The root problem is that vector uses end_cap instead of end as the hardening fencepost. But user code (be it an actual `iter != vec.end()` check, or one synthesized by the language in a range-for loop) uses the container end as the fencepost. We would like the user fencepost to delete the hardening fencepost.

For that to happen, the compiler must know that if you take your iterator and then steadily `++iter`, stopping at `iter == end`, you won't hit `iter == end_cap` along the way. To figure this out, the compiler needs to know a few things:

1. `iter <= end <= end_cap` at the start
2. `iter`, `end`, and `end_cap` are all compatibly aligned, such that `++iter` cannot skip over `end` and then get to `end_cap`.

The first of these is not obvious in `std::vector` because `std::vector` stores three pointers, rather than one pointer and then sizes. That means the compiler never sees `end` (or `end_cap`) computed as `begin + size` (or `begin + capacity`). Without type invariants, the compiler does not know that the three pointers have any relation at all.

This PR addresses it by putting assumes in `__bounded_iter` itself. We could also place it in `std::vector::__make_iter`, but this invariant is important enough for reasoning about bounds that it seemed worth establishing it across the board. (Note this means we trust container implementations to use the bounded iterators correctly, which we already do. We're interested in catching bugs in user code, not the STL itself.)

That alone is actually enough to handle this because constructing `vector::end()` is enough to tell the compiler that `begin <= end`, and loops usually start at `begin`. But since `__make_iter` is sometimes called on non-endpoint iterators, I added one extra invariant to `__make_iter`.
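The first invariant can be sketched roughly as follows. This is a simplified, hypothetical bounded iterator (`HypBoundedIter`, `HYP_ASSUME`, and `hyp_sum` are names invented for illustration), not the libc++ `__bounded_iter`. It assumes Clang's `__builtin_assume`; on other compilers the macro expands to nothing.

```cpp
#include <cassert>

#if defined(__clang__)
#  define HYP_ASSUME(cond) __builtin_assume(cond)
#else
#  define HYP_ASSUME(cond) ((void)0)  // No-op fallback (assumption dropped).
#endif

// Hypothetical simplified bounded iterator over [begin, end).
template <class T>
class HypBoundedIter {
public:
  HypBoundedIter(T* current, T* begin, T* end)
      : current_(current), end_(end) {
    // Ordering invariant from the text: begin <= current <= end.
    HYP_ASSUME(begin <= current);
    HYP_ASSUME(current <= end);
  }

  T& operator*() const {
    assert(current_ != end_);  // Hardening fencepost check.
    return *current_;
  }

  HypBoundedIter& operator++() {
    assert(current_ != end_);  // Hardening fencepost check.
    ++current_;
    return *this;
  }

  friend bool operator!=(const HypBoundedIter& a, const HypBoundedIter& b) {
    return a.current_ != b.current_;
  }

private:
  T* current_;
  T* end_;
};

// The user fencepost is `end`; the hardening fencepost is `end_cap`.
template <class T>
long hyp_sum(T* begin, T* end, T* end_cap) {
  HypBoundedIter<T> it(begin, begin, end_cap);
  HypBoundedIter<T> last(end, begin, end_cap);  // Tells the compiler begin <= end <= end_cap.
  long total = 0;
  for (; it != last; ++it) {
    total += *it;
  }
  return total;
}
```

Constructing `last` is what relates the three pointers to each other here, mirroring the point that building `vector::end()` is enough to establish `begin <= end`.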
The second issue is llvm#101372. This PR does not address it but will (hopefully) take advantage of it once available.

In working on this, I noticed that _LIBCPP_ASSUME silences -Wassume. Without that warning, I ended up spending a lot of time debugging silently no-op assumes. This seems to be a remnant of when _LIBCPP_ASSUME was part of _LIBCPP_ASSERT. Now that it's standalone, I think we shouldn't disable the warning by default. If we ever need to silence the warning, let's do it explicitly.
Interestingly, Clang does actually already assume that, when you write Looks like this comes from the |
In the following loop, the compiler should be able to optimize out the assert.
See this link for a runnable example, and a longer discussion on why the invariant is true: https://godbolt.org/z/ad5P4d5M5
Also in that link is something interesting: Clang does figure out the invariant when we use integers instead of pointers! It just doesn't apply the same analysis to pointers for some reason.
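A minimal sketch of the two loop shapes being compared, with hypothetical function names; the exact code is in the linked example and may differ. Both functions assume `begin <= end` on entry (made explicit with an early return here), and the `assert` is the check the compiler would ideally remove.

```cpp
#include <cassert>
#include <cstddef>

// Pointer form: an iterator-style loop where `p < end` is the invariant.
inline long hyp_sum_ptr(const int* begin, const int* end) {
  if (begin > end) {
    return 0;  // Assumed precondition: begin <= end.
  }
  long total = 0;
  for (const int* p = begin; p != end; ++p) {
    assert(p < end);  // Always true for valid, aligned int pointers.
    total += *p;
  }
  return total;
}

// Integer form: the same loop expressed with indices instead of pointers.
inline long hyp_sum_idx(const int* data, std::size_t begin, std::size_t end) {
  if (begin > end) {
    return 0;  // Assumed precondition: begin <= end.
  }
  long total = 0;
  for (std::size_t i = begin; i != end; ++i) {
    assert(i < end);  // Stepping by 1 from begin <= end cannot skip end.
    total += data[i];
  }
  return total;
}
```

Compiling both at `-O2` with Clang and comparing the generated code is one way to see whether the assert survives in each form.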
This is the missing compiler piece needed to solve #101370.
(CC @ldionne @var-const @danakj)