Tracking Issue for AArch64 MTE memory tagging intrinsics #129010
Comments
These APIs as-is seem very dangerous to use from Rust, see the discussion here and in rust-lang/miri#3787. The docs for these intrinsics should provide guidance for how to use them correctly in Rust. |
Given that LLVM exposes intrinsics for each of these operations directly, is there as much concern with it tracking these changes incorrectly? I would expect LLVM to expose intrinsics that function correctly within its own model. Though, certainly, I could see an argument that these should perhaps operate on a TBIBox (or similar) in Rust, differing from the ACLE. |
That is, unfortunately, not a valid assumption. LLVM is often internally incoherent. For instance, they also expose operations for non-temporal stores but those do not behave properly in the LLVM memory model (Cc llvm/llvm-project#64521). They also expose intrinsics to set the floating-point status register, but it would be UB to actually change that register to anything else (unless special attributes are set on the surrounding code). LLVM often leaves it to their users to figure out which parts of LLVM can be used together correctly, and which cannot. |
Cc @rust-lang/opsem I am in particular concerned about docs like this
We have to make sure codegen backends understand what is going on here -- I am not sure where the provenance updates and realloc are happening. I also don't really understand what |
Yes, that's correct. The idea behind MTE is that we store a 4-bit tag in the top byte of every virtual memory address - that part is handled by the kernel. By default all of the tags are 0000, so all preexisting code already works with MTE. In this example,
Once tagged, if MTE is enabled for the process/thread, every access to that memory address has to be done with a pointer tagged with a matching tag. If the two don't match, the hardware will fault and the process will get a SIGSEGV. If it can be made to play nice with Rust, MTE could potentially be useful for extra pointer provenance checks. E.g. if we could guarantee that adjacent allocations will always have different memory tags, we could preserve provenance information across pointer-usize-pointer casts and still be able to tell the difference between two pointers with the same address but different provenance. For example:
|
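To make the tag mechanics described above concrete, here is a minimal sketch that models pointers as plain `u64` values, so it runs on any target and touches no real memory. It assumes the 4-bit tag sits in bits 56..=59 of the top byte; the names `with_tag`, `tag_of`, `untagged` and `access_allowed` are hypothetical helpers for illustration, not part of the proposed intrinsics.

```rust
// Assumption: a 4-bit tag stored in bits 56..=59 of a 64-bit pointer value.
const TAG_SHIFT: u32 = 56;
const TAG_MASK: u64 = 0xF << TAG_SHIFT;

/// Returns `addr` with its tag bits replaced by the low 4 bits of `tag`.
fn with_tag(addr: u64, tag: u8) -> u64 {
    (addr & !TAG_MASK) | (((tag & 0xF) as u64) << TAG_SHIFT)
}

/// Extracts the 4-bit tag carried by a pointer value.
fn tag_of(addr: u64) -> u8 {
    ((addr & TAG_MASK) >> TAG_SHIFT) as u8
}

/// Clears the tag bits, leaving the rest of the pointer value unchanged.
fn untagged(addr: u64) -> u64 {
    addr & !TAG_MASK
}

/// Models the hardware check: an access is allowed only when the tag in the
/// pointer matches the tag assigned to the memory being accessed.
fn access_allowed(ptr: u64, memory_tag: u8) -> bool {
    tag_of(ptr) == (memory_tag & 0xF)
}
```

In this model, two values that agree after `untagged` but differ in `tag_of` stay distinguishable across an integer round trip, which is the property the provenance idea above relies on.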
The general problem is that in Rust (and in C), you can't just offset a pointer by |
Yes certainly, there's more to the story than just the hardware. The point is that if compilers can be convinced to play along, there are a lot of interesting use cases, benefits and implications for e.g. strict provenance that could be gained. For instance, if I print out the address of some Box I have, I get this |
Hardware doesn't get to just re-define what the "address" is. Rust (and C) have defined the address to be the entire representation of the pointer. I don't see any easy way to change this; the assumption is baked very deep into LLVM. (But maybe it's not so hard to change in LLVM -- I don't know, I am not an expert on LLVM internals. That would be a conversation to be had with the LLVM people. For the purpose of this discussion I will assume LLVM stays unchanged, until someone points at a design that makes LLVM
Arguably I'd say that this is a case of hardware engineers not thinking about the fact that people generally write code in "high-level" languages (like C or Rust), not in assembly... Hardware can't just unilaterally change the rules and expect that to make sense. There are multiple complicated abstractions interacting here (the ISA and the surface language), and they have to be carefully designed in tandem. Unfortunately one part was now designed and shipped in silicon without considering the other part, so making the entire story work out nicely will be tricky.
If you print the pointer that is returned from |
True, but then I'd also say that it's the OS that should be defining what an address is. To take Linux as an example, Linux does not consider the top bits part of the address. The top bits are either
Agreed on the LLVM side, I'm not an LLVM expert either but I will try to consult some about all this.
Oh yes for sure. This was not a "this proves my point" example, just a somewhat silly illustration, you're obviously right. The fundamental question here is indeed the one of "what constitutes an address". I agree with the broader approach of programming for the abstract machine rather than a real one, but in designing the abstract machine we also need to consider what it actually runs on. If the abstract machine runs on an OS that doesn't use the top byte which runs on an arch that doesn't use the top byte, then pretending as if the top byte is part of the address for the purposes of the abstract machine seems a little pointless. Even more so since these mechanisms are already widely used in practice. Every heap allocation on every Android phone that's been updated in the last couple of years already keeps a tag in the top byte: |
I'd say the language gets to define what values in the language mean. 🤷 Anyway it's kind of moot to discuss who is "supposed to" define this, the fact is that LLVM (and likely GCC) have defined this, and there are very good reasons for defining it the way they do that make it hard to change. We can disagree on whether we think this was a mistake or not, but it is the status quo.
If the tag is set by
It is only changing the tag of an already created allocation that causes problems. |
(getting a bit off topic here, but) For a concrete example of an architecture where this is not the case: This is not allowed on x86_64, trying to access such a non-canonical address in almost any way causes a fault (which will likely be handled by the OS as a SIGSEGV or similar if not baremetal).
(from the Intel® 64 and IA-32 Architectures Software Developer’s Manual) You can tag pointers using these high bits on x86_64 (if you know your target machine has a small enough virtual address space), but the tag must be removed before using the pointer, and tagging as such is still a (wrapping) offset in Rust semantics IIUC. |
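As a rough illustration of the canonical-address rule mentioned above, the following sketch checks whether a 64-bit value is canonical under the assumption of 48-bit virtual addresses, meaning bits 63..=47 must all be equal. The function name `is_canonical_48` is made up for this example, and the check deliberately ignores features such as LAM or 5-level paging.

```rust
/// Returns true if `addr` is canonical for 48-bit virtual addressing,
/// i.e. bits 63..=47 are all copies of bit 47.
/// Assumption: 4-level paging with no top-byte-ignore style feature enabled.
fn is_canonical_48(addr: u64) -> bool {
    // Shift the low 48 bits to the top, then shift back arithmetically so
    // that bit 47 is sign-extended into bits 48..=63.
    let sign_extended = (((addr << 16) as i64) >> 16) as u64;
    sign_extended == addr
}
```

Under this rule, a value with a tag stored in the high bits is not canonical, which is why the tag has to be removed before the pointer is dereferenced on such a machine.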
That is completely beside the point I was making, and while you're right that canonical addressing is a thing, that does not make the top byte part of the address on x86_64. There are slight differences, but x86_64 has its own TBI variants too, and having those enabled makes the canonicality check you quoted behave differently than it normally does.
What I'm saying is that from the OS & arch side 64-bit addresses look more like this:
And so pretending from the language side that all 64 bits constitute an address is simply not correct as soon as your code runs on any machine. The top byte would only need to be used for addressing if we wanted to address more than 65536 TiB of memory, which is unlikely to happen anytime soon to say the least. |
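The sizes quoted in this part of the thread can be checked with a few lines of arithmetic. This is only a sketch: the helper `address_space_bytes` and the constants are illustrative names, and the bit widths (48 bits for a typical address space, 56 bits for everything below the top byte) are taken from the discussion, not from any particular platform specification.

```rust
const TIB: u128 = 1 << 40; // bytes in one tebibyte
const PIB: u128 = 1 << 50; // bytes in one pebibyte

/// Number of bytes addressable with `bits` address bits.
fn address_space_bytes(bits: u32) -> u128 {
    1u128 << bits
}

/// The numeric value of a pointer whose only set bit is `bit`.
fn value_with_only_bit(bit: u32) -> u128 {
    1u128 << bit
}
```

With these helpers, 48 address bits give 256 TiB, 56 bits give 65536 TiB (64 PiB), and a value with only bit 56 set lies exactly at the 64 PiB boundary.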
I don't agree -- if setting that byte to the wrong value leads to a segfault, I would say it surely is part of the address. Unless you have what I would consider a somewhat odd definition of "address"... but as I said it's moot. All 64 bits are treated entirely uniformly. They must all have the exact right value to make the access valid. Whether you call the highest bits "not address but must be zero" or "part of the address" makes no difference at all, so let's not waste time debating that point. The kind of pointer tagging where all accesses to a heap allocation use the exact same high bits is completely compatible with this. For Rust, those high bits are "part of the address"; we can invent new terminology for this if you insist but it doesn't make a difference. The kind of pointer tagging where the allocation "moves around" the 64bit address space (because the high bits change) is not compatible with the LLVM and Rust memory model. They need to be exposed with a |
Except that whether it does or does not is up to the system the code runs on. If you have TBI/UAI/LAM enabled, you can set it to whatever you want and the hardware/OS will not care, because the actual address part of the pointer has not changed. I suppose my issue here is that coming at this from the assumption that an address is 64 bits quickly leads to contradictions and behaviour that makes no sense. The address space of the vast majority of systems is 256 TiB. If I set bit 56 to 1, I get an 'address' which would be in the 64th PiB of memory. That is simply outside the address space. You cannot access that memory address because such an address does not exist, the OS doesn't have it and the CPU will fault if you try. If the abstraction of a 64-bit address space was true, you'd be able to take an address like
We can do the realloc hack for sure, I'm just trying to explore this a bit more because it seems to me that it is just that - an ugly hack to paper over the compiler incorrectly modelling the platforms the code actually runs on. I think talking about the allocation "moving around" when you change the high bits is just inaccurate because there is no address space there to move around in. Essentially, if your OS provides 256 TiB of virtual memory but the correctness of your compiler relies on the assumption that a given allocation has been allocated in the 64th PiB of virtual memory, I just think that assumption is wrong. I understand why it's there, I understand it's much easier to assume that the leading 0s are actually just part of the address instead of blank metadata, but doing so leads to problems like this as soon as the hardware & the OS try to make use of those bits. Which is what those bits are there for. |
That would be true if that memory was allocated. But it's not. 0x0000ffffffffffff+1 behaves just like every other address that is not currently allocated. Just because on Linux that address will never be allocated (as far as we think today), is not sufficient justification for treating it fundamentally differently. But as I keep saying, this is a pointless attempt at re-defining certain terms, without changing any of the fundamental facts. The underlying problem is: having distinct addresses (or whatever you want to call the 64bit thing that is the input to a load/store operation) all access the same memory changes some fundamental properties of memory. Ignoring 4 bits of the 64bit address is basically equivalent to having the same pages mapped 2^4 times in different parts of memory. Changing the tag of a pointer is equivalent to doing pointer arithmetic between these different "mirrors". If compilers were written under the assumption that all memory can have such mirrors, that would make them worse at their job of optimizing code for the common case where no such mirrors exist. Therefore basically all optimizing compilers make the very reasonable assumption that memory they work on is mapped only once, and special care is needed if you violate that assumption. Which mechanism you use to violate that assumption (mmap'ing the same page multiple times, or instructing the hardware to ignore some bits of the "address") is entirely irrelevant.
I would say the ugly hack here is on the hardware side, by having it ignore parts of the input. But I guess we won't come to an agreement on this and it doesn't really matter for this discussion anyway. 🤷
If you remap the same physical pages elsewhere in virtual memory, do they "move"? You could argue either way. This is a similar situation. I can see your perspective, but please don't insist on it being the only perspective.
That's not the assumption compilers are making. See the paragraph above for what the actual assumption is. (I touched on this before when I said that the key thing is that the bits must all be fixed, not that they must be 0.) OSes change how much virtual memory they provide -- Linux switched from 48bit to 56bit at some point in the not-too-distant past. It's a good thing that we didn't hard-code any assumption like that into our compilers.
No, I don't think you can just unilaterally claim "ownership" of those bits here. |
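The "mirrors" framing from this comment can be sketched as a toy model. The code below is a simulation only, assuming 4 ignored bits at positions 56..=59: `MirroredMemory` and `mirrors` are hypothetical names, and the `HashMap` stands in for memory so that nothing here depends on real hardware behaviour.

```rust
use std::collections::HashMap;

// Assumption: the hardware ignores 4 bits at positions 56..=59.
const TAG_MASK: u64 = 0xF << 56;

/// Toy memory in which loads and stores ignore the tag bits, so several
/// distinct 64-bit values reach the same cell.
struct MirroredMemory {
    cells: HashMap<u64, u8>,
}

impl MirroredMemory {
    fn new() -> Self {
        MirroredMemory { cells: HashMap::new() }
    }

    /// Stores `value` at the cell selected by `addr` with tag bits ignored.
    fn store(&mut self, addr: u64, value: u8) {
        self.cells.insert(addr & !TAG_MASK, value);
    }

    /// Loads from the cell selected by `addr` with tag bits ignored.
    fn load(&self, addr: u64) -> Option<u8> {
        self.cells.get(&(addr & !TAG_MASK)).copied()
    }
}

/// All 2^4 distinct 64-bit values that select the same cell as `addr`.
fn mirrors(addr: u64) -> Vec<u64> {
    (0u64..16)
        .map(|tag| (addr & !TAG_MASK) | (tag << 56))
        .collect()
}
```

A store through one mirror is visible through the other fifteen, which is the aliasing that a compiler assuming "memory is mapped only once" does not expect.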
For sure, I can see your perspective as well! I don't think one is particularly more valid than the other, it just depends on whether we start off from the hardware & OS side or from the language side of things. I think we get the best results by doing both and meeting somewhere in the middle, which is what these kinds of discussions are great for :)
I mean that in the sense that hardware across multiple architectures was designed in a way that does not make use of those bits by default, with the aim of using them for something in the future. For the sake of argument & from a purely practical standpoint, what's stopping Rust (or I guess more specifically LLVM) from adopting this alternative view of what a memory address is? It seems to me that all the aliasing and mirroring problems you list are only problems if the compiler accounts for all 64 bits, as opposed to effectively masking out the top ones before considering it as an address. If the compiler did that, then suddenly it's not "two mirrored addresses" but "the same address" (maybe with a tag but irrelevant) which matches the underlying platform the code will run on much better. |
Just in case it's not clear from the tone of the discussion, I do agree with you on what the current approach of making TBI/MTE work within the current Rust/LLVM memory model should be. Just trying to explore if there's more that could be done but that's more of an academic discussion rather than an actual immediate proposal :) |
LLVM currently assumes that if you do
I don't know how big the impact on performance would be that this loss of alias information has, but it would surely be non-trivial to even figure out all the places where the compiler makes this assumption. It's also quite bad that the underlying behavior here becomes so non-portable; generally it is a goal of Rust to make program semantics consistent across targets. That is one reason why we don't expose the x86 or ARM concurrency models, but instead have our own language-level concurrency model (specifically the one from C++) -- people generally don't want to write a version of their concurrency algorithms for each architecture. But here we'd have to say something like "if you offset your pointer by
To me as a language person, a realloc-like API actually seems like a pretty nice way to expose these hardware features. I guess what is and is not a hack is in the eye of the beholder. ;) |
Yeah that makes sense, this being incompatible with alias analysis as it currently stands is pretty unfortunate but most likely not really fixable in practice, as you said, who knows how many assumptions compilers make about this and where. I suppose in practice not being able to support "full TBI" is probably not that much of an issue. Hardly any use-cases will want to change the tag after the allocation, and for the FFI-related ones that do we can provide the TBIBox to do reallocs and make things work under the hood without messing with the memory model. If we eventually want to make more extensive use of pointer tagging in Rust (like for pointer provenance checks), we can always look into tagging pointers when the memory is allocated in the same way that Android use-cases currently do, then it's still fine in the current memory model as you said. Thanks for the discussion, I at least found it very informative! :) |
API-wise I think I'd prefer if we had a raw pointer API for these reallocs exposed as a primitive, and then potentially TbiBox built on top of that (or that could already be done in a user crate). |
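One possible shape for that layering is sketched below. Everything in it is an assumption made for illustration: `retag_as_realloc` and `TbiBox` are hypothetical names, not the stdarch API, and the primitive is modelled portably by really allocating, copying and freeing. An actual implementation would presumably only rewrite the tag bits while being treated as a reallocation by the compiler.

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, Layout};
use std::ptr;

/// Hypothetical raw-pointer primitive with realloc semantics: the old
/// pointer is dead afterwards and only the returned pointer may be used.
/// This portable model really moves the value; the tag itself is unused here.
///
/// Safety: `old` must be the unique live pointer to an initialized `T`
/// allocated by the global allocator with `Layout::new::<T>()`.
unsafe fn retag_as_realloc<T>(old: *mut T, _new_tag: u8) -> *mut T {
    let layout = Layout::new::<T>();
    assert!(layout.size() != 0, "sketch does not handle zero-sized types");
    unsafe {
        let new = alloc(layout) as *mut T;
        if new.is_null() {
            handle_alloc_error(layout);
        }
        ptr::copy_nonoverlapping(old, new, 1);
        dealloc(old as *mut u8, layout);
        new
    }
}

/// Hypothetical owning wrapper built on top of the primitive.
struct TbiBox<T> {
    ptr: *mut T,
    tag: u8,
}

impl<T> TbiBox<T> {
    fn new(value: T) -> Self {
        TbiBox { ptr: Box::into_raw(Box::new(value)), tag: 0 }
    }

    /// Changes the tag by "reallocating"; taking `&mut self` ensures no
    /// reference derived from the old pointer is still alive.
    fn set_tag(&mut self, tag: u8) {
        // SAFETY: `self.ptr` is the unique pointer to a live heap `T`.
        self.ptr = unsafe { retag_as_realloc(self.ptr, tag) };
        self.tag = tag;
    }

    fn tag(&self) -> u8 {
        self.tag
    }

    fn get(&self) -> &T {
        // SAFETY: `self.ptr` always points to a live, initialized `T`.
        unsafe { &*self.ptr }
    }
}

impl<T> Drop for TbiBox<T> {
    fn drop(&mut self) {
        // SAFETY: the allocation came from the global allocator with `T`'s layout.
        unsafe { drop(Box::from_raw(self.ptr)) }
    }
}
```

The design point the sketch tries to capture is that retagging consumes the old pointer, so code can never hold two differently tagged pointers to the same allocation at once.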
Good idea for sure, agreed! |
Feature gate:
#![feature(stdarch_aarch64_mte)]
This is a tracking issue for AArch64 MTE memory tagging intrinsics.
Public API
Steps / History
Unresolved Questions
Footnotes
https://std-dev-guide.rust-lang.org/feature-lifecycle/stabilization.html ↩