Generic Integers V2: It's Time #3686

Open
wants to merge 13 commits into master

clarfonthey
Contributor

@clarfonthey clarfonthey commented Sep 1, 2024

Summary

Adds the builtin types u<N> and i<N>, allowing integers with an arbitrary size in bits.

Rendered

Details

This is a follow-up to #2581, which was previously postponed. A lot has happened since then, and there has been general support for this change from a lot of different people. It's time.

There are a few key differences from the previous RFC, but I trust that you can read.

Thanks

Thank you to everyone who responded to the pre-RFC on Internals with feedback.

This reverts commit 25f85cc105cb04b4e87debf46f4547240c122ae4.
@ehuss ehuss added the T-lang, T-libs-api, and T-types labels Sep 2, 2024
@jhpratt
Member

jhpratt commented Sep 5, 2024

As much as I dislike as casts and would prefer a better solution, until that solution exists (in the standard library), this is probably the best way to go.

👍 from me

@Alonely0

Alonely0 commented Sep 5, 2024

Even if we should probably leave them out of the initial RFC for complexity reasons, I would just cheat with floats, as they rely on system libraries and hardware instructions way more than regular integers. By that, I mean that I'd allow f<32> for consistency reasons, but only those that are actually supported would compile; i.e., f<3> would throw a compile-time error (it could either be done at monomorphisation time, or by disallowing const generics on that one). Overall, I think this RFC is on the right track, but I'd postpone it until we're past Rust 2024.

@matthieu-m

Overall, I think this RFC is on the right track, but I'd postpone it until we're past Rust 2024.

Are you proposing delaying the discussion or the implementation?

My understanding is that with a release early 2025, Rust 2024 will be done by mid November, which is only 2 months away, and it seems quite unlikely this RFC would be accepted and implementation ready to start by then, so I see no conflict with regard to starting on the implementation...

... but I could understand a focus on the edition for the next 2 months, and thus less bandwidth available for discussing RFCs.

@clarfonthey
Contributor Author

clarfonthey commented Sep 5, 2024

Even if we should probably leave them out of the initial RFC for complexity reasons, I would just cheat with floats, as they rely on system libraries and hardware instructions way more than regular integers. By that, I mean that I'd allow f<32> for consistency reasons, but only those that are actually supported would compile; i.e., f<3> would throw a compile-time error (it could either be done at monomorphisation time, or disallowing const generics on that one).

The problem with this approach is that any "cheating" becomes permanently stabilised, and thus, it's worth putting in some thought for the design. This isn't to say that f<N> is a bad design (I personally don't like it, but I won't fault people for wanting to use it), but rather that u<N> and i<N> are good designs in several ways that f<N> is not.

Plus, monomorphisation-time errors were actually one of the big downsides to the original RFC, and I suspect that people haven't really changed their thoughts since then. Effectively, it's okay to allow some edge-case monomorphisation-time errors like the ones this RFC includes (for example, asking for u<0xFFFF_FFFF_FFFF> is a hard error, since it's larger than u32::MAX), but not extremely common errors like just asking for f<N> where N is anything that isn't 16, 32, 64, or 128.

One potential solution that was proposed for unifying u<N>, i<N>, usize, and isize was to have some separate ADT that encapsulates signedness and has different options for "size" and N. This kind of solution feels promising for generic floats since it means that you could have an impl like:

impl<const F: FloatKind> MyTrait for f<F> {
    // ...
}

And it would support all float types, forever, and there would be no invalid values for F since we've explicitly defined it. However, this requires const_adt_params which is currently unstable.

Overall, I think this RFC is on the right track, but I'd postpone it until we're past Rust 2024.

As stated: yes, RFCs take time to discuss and implement and it's very reasonable to expect people to focus on the 2024 edition for now. However, that doesn't mean that we can't discuss this now, especially since there are bound to be things that were missed that would be good to point out.


In general, operations on `u<N>` and `i<N>` should work the same as they do for existing integer types, although the compiler may need to special-case `N = 0` and `N = 1` if they're not supported by the backend.

When stored, `u<N>` should always zero-extend to the size of the type and `i<N>` should always sign-extend. This means that any padding bits for `u<N>` can be expected to be zero, but padding bits for `i<N>` may be either all-zero or all-one depending on the sign.

Member

@RalfJung RalfJung Sep 6, 2024

Please clarify this to say what exactly happens when I transmute e.g. 255u8 to u<7> (and similar to i<N>). I assume it is UB, i.e., the validity invariant of these types says that the remaining bits are zero-extended / sign-extended, but the RFC should make that explicit.

Note that calling this "padding" might be confusing since "padding" in structs is uninitialized, but here padding would be defined to always have very specific values. (That would, e.g. allow, it to be used as a niche for enum optimizations.)
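
To make the question concrete, here is a minimal sketch in stable Rust of the invariant being discussed. The `U7` and `I7` newtypes are hypothetical stand-ins (they are not part of the RFC), assumed to store a 7-bit value zero-extended in a `u8` or sign-extended in an `i8`. The checked constructors reject exactly the bit patterns that a transmute from `255u8` would smuggle in.

```rust
/// Hypothetical 7-bit unsigned integer stored zero-extended in a `u8`.
/// Invariant: the top bit of the inner byte is always zero.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct U7(u8);

impl U7 {
    pub const MAX: u8 = (1 << 7) - 1; // 127

    /// Checked constructor: rejects bit patterns that are not zero-extended.
    pub fn new(raw: u8) -> Option<U7> {
        if raw <= Self::MAX { Some(U7(raw)) } else { None }
    }

    pub fn get(self) -> u8 {
        self.0
    }
}

/// Hypothetical 7-bit signed integer stored sign-extended in an `i8`.
/// Invariant: bit 7 equals bit 6, i.e. the value lies in -64..=63.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct I7(i8);

impl I7 {
    /// Checked constructor: rejects bit patterns that are not sign-extended.
    pub fn new(raw: i8) -> Option<I7> {
        if (-64..=63).contains(&raw) { Some(I7(raw)) } else { None }
    }

    pub fn get(self) -> i8 {
        self.0
    }
}
```

Under this assumed model, `U7::new(255)` is `None`, while `I7::new(-1)` succeeds and its stored byte is `0xFF`, so the extra bit is all-one for negative values and all-zero otherwise.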

Contributor Author

Yeah, I'm not quite sure what a better name is; it's the same as rustc_layout_scalar_valid_range, which is UB if the bits are invalid.

Contributor Author

I guess that since this is the reference description, calling them niche bits would be more appropriate? Would that feel reasonable?

Member

@RalfJung RalfJung Sep 9, 2024

No. Niche bits are an implementation detail of the enum layout algorithm, and mostly not stable nor documented.

Just describe what the valid representations of values of these type are, i.e., what should go into this section about these types.


The compiler should be allowed to restrict `N` even further, maybe even as low as `u16::MAX`, due to other restrictions that may apply. For example, the LLVM backend currently only allows integers with widths up to `u<23>::MAX` (not a typo; 23, not 32). On 16-bit targets, using `usize` further restricts these integers to `u16::MAX` bits.

While `N` could be a `u32` instead of `usize`, keeping it at `usize` makes things slightly more natural when converting bits to array lengths and other length-generics, and these quite high cutoff points are seen as acceptable. In particular, this helps using `N` for an array index until [`generic_const_exprs`] is stabilized.
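
As a rough illustration of the `usize` argument, the sketch below reuses a bit width `N` directly as an array length on stable Rust, which works because array lengths are `usize` const parameters. `BitBuffer` is a hypothetical helper, not something the RFC defines; a packed `[u8; N / 8]` layout would instead need `generic_const_exprs`.

```rust
/// Hypothetical helper: one `bool` per bit, so the bit width `N` is reused
/// directly as the array length without any conversion.
pub struct BitBuffer<const N: usize> {
    bits: [bool; N],
}

impl<const N: usize> BitBuffer<N> {
    /// Keeps the low `N` bits of `value` (bits at index 128 and above stay false).
    pub fn from_u128(value: u128) -> Self {
        let mut bits = [false; N];
        for (i, bit) in bits.iter_mut().enumerate().take(128) {
            *bit = (value >> i) & 1 == 1;
        }
        BitBuffer { bits }
    }

    /// Number of set bits among the stored `N` bits.
    pub fn count_ones(&self) -> usize {
        self.bits.iter().filter(|b| **b).count()
    }

    /// The bit width, which is also the array length.
    pub const fn width(&self) -> usize {
        N
    }
}
```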

Member

You mean "using N for an array length", I assume?

Contributor Author

Yes.


As an example, someone might end up using `u<7>` for a percent since it allows fewer extraneous values (`101..=127`) than `u<8>` (`101..=255`), although this actually just overcomplicates the code for little benefit, and may even make the performance worse.

Overall, things have changed dramatically since [the last time this RFC was submitted][#2581]. Back then, const generics weren't even implemented in the compiler yet, but now, they're used throughout the Rust ecosystem. Additionally, it's clear that LLVM definitely supports generic integers to a reasonable extent, and languages like [Zig] and even [C][`_BitInt`] have implemented them. A lot of people think it's time to start considering them for real.

Member

I wouldn't say Zig has generic integers, it seems like they have arbitrarily-sized integers. Or is it possible to write code that is generic over the integer size?

Member

well actually you can

const std = @import("std");

fn U(comptime bits: u16) type {
    return @Type(std.builtin.Type {
        .Int = std.builtin.Type.Int {
            .signedness = std.builtin.Signedness.unsigned,
            .bits = bits,
        },
    });
}

pub fn main() !void {
    const a: U(2) = 1;
    const b: U(2) = 3;
    // const c: U(2) = 5; // error: type 'u2' cannot represent integer value '5'
    const d = std.math.maxInt(U(147));
    std.debug.print("a={}, b={}, d={}", .{ a, b, d });
    // a=1, b=3, d=178405961588244985132285746181186892047843327
}

Contributor Author

I guess that example is satisfactory enough, @RalfJung? Not really sure if it's worth the effort to clarify explicitly.

Member

Ah, neat.

C and LLVM only have concrete-width integers though, I think?

Contributor Author

I mean, C doesn't have generic anything, so, I guess you're right. Not 100% sure the distinction is worth it.


Clang adds _BitInt to C++ as an extension and the number of bits can be generic: template <size_t N> void example(_BitInt(N) a); will deduce N but it only works on the actual _BitInt types, not just any signed integer type.

@diondokter

diondokter commented Sep 6, 2024

I love this!

One point that is touched upon here is aliases for uN <=> u<N>.

I think that'd be super valuable to have. Rust already has a lot of symbols and being able to not use the angle brackets makes sure that the code is much calmer to look upon. It's also not the first explicit syntax sugar since an async fn is treated the same as fn -> impl Future in a lot of places.

Having the aliases also allows for this while keeping everything consistent:

fn foo<const N: usize>(my_num: u<N>) { ... }

foo(123); // What is the bit width? u32 by default?
foo(123u7); // Fixed it

@clarfonthey
Contributor Author

I love this!

One point that is touched upon here is aliases for uN <=> u<N>.

I think that'd be super valuable to have. Rust already has a lot of symbols and being able to not use the angle brackets makes sure that the code is much calmer to look upon. It's also not the first explicit syntax sugar since an async fn is treated the same as fn -> impl Future in a lot of places.

I agree with you, just didn't want to require them for the initial RFC, since I wanted to keep it simple. Ideally, the language will support uN aliases as well as uN suffixes.

@hanna-kruppe hanna-kruppe left a comment

When the last RFC was postponed, the stated reason was waiting for pure library solutions to emerge and letting the experience with those inform the design. I don't really see much of this in the current RFC, so here's a bunch of questions about it. It would also be great if some non-obvious design aspects of the RFC (such as limits on N, whether and how post-monomorphization errors work, padding, alignment, etc.) could be justified with experience from such libraries.


This was the main proposal last time this RFC rolled around, and as we've seen, it hasn't really worked.

Crates like [`u`], [`bounded-integer`], and [`intx`] exist, but they come with their own host of problems:


As far as I can tell, bounded-integer and intx only provide subsets of the native types up to {i,u}128, not arbitrarily large fixed-size integers. The u crate seems to be about something else entirely, did you mean to link something different there?

So where are the libraries that even try to do what this RFC proposes: arbitrary number of bits, driven by const generics? I've searched and found ruint, which appears relevant.

Contributor Author

That definitely seems like a good option to add to the list. I had trouble finding them, so, I appreciate it.


I'd appreciate a mention of https://crates.io/crates/arbitrary-int, which is (I think) the closest in design to this rfc


Crates like [`u`], [`bounded-integer`], and [`intx`] exist, but they come with their own host of problems:

* None of these libraries can easily unify with the existing `uN` and `iN` types.


A const-generic library type can't provide this and also can't support literals. But what problems exactly does that cause in practice? Which aspects can be handled well with existing language features and which ones really need language support?

Contributor Author

The RFC already mentions how being able to provide a small number of generic impls that cover all integer types has an extremely large benefit over being forced to use macros to implement for all of them individually. You cannot do this without language support.


So this bullet point is "only" about impls like impl<const BITS: usize> Foo for some_library::Int<BITS> { ... } not implementing anything for the primitive integer types? Could From impls and some form of delegation (#3530) also help with this?

Contributor Author

Not really, and this is mentioned in the RFC also. That's 5 impls for unsigned, 5 impls for signed that could just be 2 impls, whether you have delegation or not. Even for simple traits, like Default, you're incentivised to use a macro just because it becomes so cumbersome.
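
For readers counting along, the sketch below shows the macro pattern being described, using a hypothetical `BitWidth` trait that exists only for this illustration. On stable Rust the trait needs one impl per fixed-width primitive (five unsigned, five signed), and the trailing comment shows how that might collapse to two impls under the RFC's proposed syntax, which does not compile today.

```rust
/// Hypothetical trait used only for illustration.
pub trait BitWidth {
    const BITS: u32;
    const SIGNED: bool;
}

// Today: one impl per primitive, usually stamped out by a macro.
macro_rules! impl_bit_width {
    ($signed:expr => $($t:ty),* $(,)?) => {
        $(
            impl BitWidth for $t {
                const BITS: u32 = (std::mem::size_of::<$t>() * 8) as u32;
                const SIGNED: bool = $signed;
            }
        )*
    };
}

impl_bit_width!(false => u8, u16, u32, u64, u128);
impl_bit_width!(true => i8, i16, i32, i64, i128);

// With the RFC's proposed types (NOT valid Rust today), the ten impls above
// could in principle be written as two:
//
// impl<const N: usize> BitWidth for u<N> { /* ... */ }
// impl<const N: usize> BitWidth for i<N> { /* ... */ }

/// Formats the type name from the trait constants, e.g. "u8" or "i128".
pub fn describe<T: BitWidth>() -> String {
    format!("{}{}", if T::SIGNED { "i" } else { "u" }, T::BITS)
}
```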


arbitrary-int provides a unification somewhat using its Number trait. It's somewhat rudimentary but I am working on improving it.


Reading this again, the Number trait fulfills a somewhat different role though. It allows writing generic code against any Number (be it an arbitrary-int or a native int), but it does not expose the bits itself - which can be a plus or a minus, depending on what you're building.

Crates like [`u`], [`bounded-integer`], and [`intx`] exist, but they come with their own host of problems:

* None of these libraries can easily unify with the existing `uN` and `iN` types.
* Generally, they require a lot of unsafe code to work.


What kind of unsafe code, and for what purposes? And is that sufficient reason to extend the language? Usually, if it's something that can be hidden behind a safe abstraction once and for all, then it seems secondary whether that unsafety lives on crates.io, in sysroot crates, or in the functional correctness of the compiler backend.

Contributor Author

Generally, the unsafe code is stuff similar to the bounded-integer crate, where integers are represented using enums and transmuted from primitives. The casting to primitives is safe, but not the transmuting.
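
A heavily simplified sketch of that pattern, assuming a 2-bit type so the enum stays readable. `U2` is hypothetical and is not the actual `bounded-integer` implementation; it only shows where the `transmute` sits and that the cast back to a primitive is safe.

```rust
/// Hypothetical 2-bit unsigned integer represented as an enum, so the
/// compiler knows that only the values 0..=3 are valid.
#[repr(u8)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum U2 {
    V0 = 0,
    V1 = 1,
    V2 = 2,
    V3 = 3,
}

impl U2 {
    /// Checked construction from a primitive; this is where `unsafe` appears.
    pub fn new(raw: u8) -> Option<U2> {
        if raw <= 3 {
            // SAFETY: `U2` is `repr(u8)` and every value in 0..=3 is a
            // declared discriminant, so `raw` is a valid `U2` bit pattern.
            Some(unsafe { std::mem::transmute::<u8, U2>(raw) })
        } else {
            None
        }
    }

    /// Converting back to the primitive is an ordinary safe cast.
    pub fn get(self) -> u8 {
        self as u8
    }
}
```

A wider type needs one variant per representable value, which is why this approach scales poorly as the bit width grows.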


Is that really all? Because that seems trivial to encapsulate without affecting the API, and likely to be solved by any future feature that makes it easier to opt into niche optimizations (e.g., pattern types).

Contributor Author

Yeah, it's easy to encapsulate, but I think it's worth mentioning that unsafe code is involved as a negative because it means many code bases will be more apprehensive to use it.

You are right that it could easily be improved, though, with more compiler features. I just can't imagine it ever being on par with the performance of a compiler-supported version, both at runtime and compile time.


arbitrary-int works without unsafe code (with the exception of the optional function new_unchecked which skips the bounds check)


* None of these libraries can easily unify with the existing `uN` and `iN` types.
* Generally, they require a lot of unsafe code to work.
* These representations tend to be slower and less-optimized than compiler-generated versions.


Do we have any data on what's slower and why? Are there any lower-stakes ways to fix these performance issues by, for example, adding/stabilizing suitable helper functions (like rust-lang/rust#85532) or adding more peephole optimizations in MIR and/or LLVM?

Contributor Author

Main source of slowdown is from using enums to take advantage of niche optimisations; having an enum with a large number of variants to represent this niche is pretty slow to compile, even though most of the resulting code ends up as no-ops after optimisations.

I definitely should mention that I meant slow to compile here, not slow to run. Any library solution can be made fast to run, but will generally suffer in compile time when these features are effectively already supported by the compiler backends, mostly for free.


Is there any compile time issue when not trying to provide niches? Out of the potential use cases the RFC lists, only a couple seem to really care about niche optimizations. In particular, I don't expect that it typically matters for integers larger than 128 bits. (But again, surveying the real ecosystem would help!) If so, the compile time problem for crates like bounded-integer could be addressed more directly by stabilizing a proper way to directly opt into niches instead of having to abuse enums. And that would help with any bounds, while this RFC (without future possibilities) would not.


Well, I would expect some negative compile-time impact from repeatedly monomorphizing code that's const-generics over bit width or bounds. But that's sort of inherent in having lots of code that is generic in this way, so it's no worse for third party libraries than for something built-in.

Contributor Author

That's very fair; I agree that we should have an ability to opt into niches regardless. I guess that my reasoning here is pretty lackluster because I felt that the other reasons to have this feature were strong enough that this argument wasn't worth arguing, although you're right that I should actually put a proper argument for it.

From what I've seen, of the use cases for generic integers:

  1. Generalising primitives
  2. Between-primitives integer types (like u<7> and u<48>)
  3. Larger-than-primitives integer types

For 1, basically no library solution can work, so, that's off the table. For 2, which is mostly the subject of discussion here, you're right that it could probably be improved a lot with existing support. And for 3, most people just don't find the need to make generalised code for their use cases, and just explicitly implement, say, u256 themselves with the few operations they need.

The main argument IMHO is that we can effectively knock out all three of these options easily with generic integers supported by the language, and they would be efficient and optimized by the compiler. We can definitely whittle down the issues with 2 and 3 as we add more support, but the point is that we don't need to if we add in generic integers.

Although, I really need to solidify this argument, because folks like you aren't 100% convinced, and I think that the feedback has been pretty valuable.

@hanna-kruppe hanna-kruppe Sep 7, 2024

Yeah, I appreciate that you're trying to tackle a lot of different problems with a unifying mechanism. I focus on each problem separately because I want to tease out how much value the unifying mechanism adds for each of them, compared to smaller, more incremental additions that may be useful and/or necessary in any case. Only when that's done I feel like I can form an opinion on whether this relatively large feature seems worth it overall.

* None of these libraries can easily unify with the existing `uN` and `iN` types.
* Generally, they require a lot of unsafe code to work.
* These representations tend to be slower and less-optimized than compiler-generated versions.
* They still require you to generalise integer types with macros instead of const generics.


I'm not sure I understand the problem here. If a library provides struct Int<const BITS: usize>(...); then code using this library shouldn't need macros to interact with it (except, perhaps, as workaround for current gaps in const generics). The library itself would have a bunch of impls relating its types to the language primitives, which may be generated with macros. But that doesn't seem like such a drastic problem, if it's constrained to the innards of one library, or a few competing libraries.
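
A minimal sketch of the kind of library type in question, on stable Rust. `Uint` and `Zero` are hypothetical names chosen for this illustration, and the type is assumed to support widths up to 128 bits by storing the value zero-extended in a `u128`. One const-generic impl covers every width of the library type, while each primitive still needs its own impl.

```rust
/// Hypothetical library type: an unsigned integer of `BITS` bits
/// (meaningful for BITS <= 128), stored zero-extended in a `u128`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Uint<const BITS: usize>(u128);

impl<const BITS: usize> Uint<BITS> {
    /// Mask with the low `BITS` bits set.
    pub const MASK: u128 = if BITS >= 128 { u128::MAX } else { (1u128 << BITS) - 1 };

    /// Checked constructor: rejects values that do not fit in `BITS` bits.
    pub fn new(raw: u128) -> Option<Self> {
        if raw <= Self::MASK { Some(Self(raw)) } else { None }
    }

    /// Addition that wraps around at `2^BITS`.
    pub fn wrapping_add(self, rhs: Self) -> Self {
        Self(self.0.wrapping_add(rhs.0) & Self::MASK)
    }

    pub fn get(self) -> u128 {
        self.0
    }
}

/// Hypothetical user trait.
pub trait Zero {
    fn zero() -> Self;
}

// One generic impl covers every width of the library type, no macro needed...
impl<const BITS: usize> Zero for Uint<BITS> {
    fn zero() -> Self {
        Uint(0)
    }
}

// ...but `Uint<8>` is a different type from `u8`, so primitives still need
// separate impls (only one is written out here).
impl Zero for u8 {
    fn zero() -> Self {
        0
    }
}
```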

Contributor Author

I'm not sure I understand your argument. No matter what, a library solution cannot be both generic and unify with the standard library types. I don't see a path forward that would allow, for example, some library Uint<N> type to allow Uint<8> being an alias for u8 while also supporting arbitrary Uint<N>. Even with specialisation, I can't imagine a sound subset of specialisation allowing this to work.

Like, sure, a set of libraries can choose to only use these types instead of the primitives, circumventing the problem. But most people will want to implement their traits for primitives for interoperability.

@hanna-kruppe hanna-kruppe Sep 7, 2024

This overlaps a bit with the bullet point about unification, but I do think it depends a lot on what one is doing. For example, the num-traits crate defines traits that it needs to implement for the primitive types. On the other hand, any code that's currently written against the traits from num-traits may be happy with a third party library that provides Int<N> and Uint<N> and implements the relevant traits for them. And for something like bit fields, you may not need much generalization over primitive types at all: in the MipsInstruction example, you probably want some widening and narrowing conversions, but only with respect to u32 specifically.

It's hard to form an opinion about how common these scenarios are (and whether there are other nuances) without having a corpus of "real" code to look at. Experience reports (including negative ones) with crates like num-traits and bounded-integer may be more useful than discussing it in the abstract.

@Diggsey
Contributor

Diggsey commented Sep 7, 2024

Two things that came to mind:

  1. Are there any issues with the self-referentiality of these types? Although usize is a distinct type, one could easily imagine wanting to make it a "new-type wrapper" around the appropriate integer type, which would make a circular dependency between the two implementations. We could say that usize is not implemented that way, but then it's surprising to me that usize would be the "foundation" rather than the other way around.
  2. Even though LLVM can express integers of arbitrary size, it seems unlikely that these types have seen extensive use with unusual sizes. Maybe these integer types should be lowered to common integer types within rustc, so that backends can be simplified.

@clarfonthey
Contributor Author

When the last RFC was postponed, the stated reason was waiting for pure library solutions to emerge and letting the experience with those inform the design. I don't really see much of this in the current RFC, so here's a bunch of questions about it. It would also be great if some non-obvious design aspects of the RFC (such as limits on N, whether and how post-monomorphization errors work, padding, alignment, etc.) could be justified with experience from such libraries.

So, I agree that this was one of the reasons, but it's worth reiterating that also, at that time, const generics weren't even stable. We had no idea what the larger ecosystem would choose to do with them, considering how many people were waiting for stabilisation to really start using them. (We had an idea of what was possible, but not what would feel most ergonomic for APIs, etc.)

So, I personally felt that the library solution idea was mostly due to the fact that we didn't really know what libraries would do with const generics. And, overwhelmingly, there hasn't been much interest in it for what I believe to be the most compelling use case: generalising APIs without using macros, which right now cannot really be done without language support.

* `From` and `TryFrom` implementations (requires const-generic bounds)
* `from_*e_bytes` and `to_*e_bytes` methods (requires [`generic_const_exprs`])

Currently, the LLVM backend already supports generic integers (you can refer to `iN` and `uN` as much as you want), although other backends may need additional code to work with generic integers.

Member

One thing to emphasize here: getting u128 to work was a huge endeavour, and bigger ones will be even harder for things like division -- even for 128-bit it calls out to a specific symbol for that.

Embarrassingly parallel things like BitAnd or count_ones are really easy to support for bigger widths, but other things might be extremely difficult, so it might be worth exploring what it would look like to allow those only for N ≤ 128 or something, initially.
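
A small sketch of that distinction, assuming a hypothetical `Wide` type built from 64-bit limbs (this is not how rustc or LLVM lower wide integers, only an illustration). `BitAnd` and `count_ones` treat every limb independently, whereas even addition needs a carry chain linking the limbs, and multiplication or division is harder still.

```rust
use std::ops::BitAnd;

/// Hypothetical wide unsigned integer made of `LIMBS` 64-bit limbs,
/// least significant limb first.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Wide<const LIMBS: usize>(pub [u64; LIMBS]);

// Limb-wise operation: each limb is handled independently of the others.
impl<const LIMBS: usize> BitAnd for Wide<LIMBS> {
    type Output = Self;

    fn bitand(self, rhs: Self) -> Self {
        let mut out = [0u64; LIMBS];
        for i in 0..LIMBS {
            out[i] = self.0[i] & rhs.0[i];
        }
        Wide(out)
    }
}

impl<const LIMBS: usize> Wide<LIMBS> {
    /// Also limb-wise: sum the per-limb population counts.
    pub fn count_ones(self) -> u32 {
        self.0.iter().map(|limb| limb.count_ones()).sum()
    }

    /// Addition is no longer independent per limb: the carry out of each
    /// limb feeds into the next one.
    pub fn wrapping_add(self, rhs: Self) -> Self {
        let mut out = [0u64; LIMBS];
        let mut carry = false;
        for i in 0..LIMBS {
            let (sum, c1) = self.0[i].overflowing_add(rhs.0[i]);
            let (sum, c2) = sum.overflowing_add(carry as u64);
            out[i] = sum;
            carry = c1 || c2;
        }
        Wide(out)
    }
}
```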

Member

One thing to emphasize here: getting u128 to work was a huge endeavour, and bigger ones will be even harder for things like division -- even for 128-bit it calls out to a specific symbol for that.

LLVM has a pass specifically for expanding large divisions into a loop that doesn't use a libcall, so that shouldn't really be an issue though libcalls can still be added if you want something faster: llvm/llvm-project@3e39b27

Member

as part of clang gaining support for _BitInt(N) where N > 128, basically all the work to make it work has already been done in LLVM. Div/Rem was the last missing piece and that was added in 2022.


Clang still limits _BitInt(N) to N <= 128 on quite a few targets: https://gcc.godbolt.org/z/8P3sMjavs

Member

i think that's merely because they haven't got around to defining the ABI, but it all works afaict: https://llvm.godbolt.org/z/88K3ox7bh

Contributor Author

@clarfonthey clarfonthey Sep 11, 2024

Also worth mentioning that having N > 128 be a post-monomorphisation error was seen as one of the biggest downsides to the previous RFC, and that this would cause more of a headache than just trying to make it work in general. [citation needed]


That's a good start. However, as long as Clang isn't shipping it, the people working on Clang aren't discovering and fixing any bugs specific to those platforms. The div/rem lowering happens in LLVM IR so it's hopefully pretty target-independent, but most other operations are still legalized later in the backends. That includes the operations the div/rem lowering relies on, but also any other LLVM intrinsics that the standard library uses or may want to use in the future.

Contributor Author

I'm kind of relying a lot on the fact that even though not everyone is using _BitInt(N) right now, by the time we actually would be stabilising this RFC, LLVM would be a lot more robust in that regard. Kind of a role reversal from what happened with 128-bit integers: back then, we were really pushing LLVM to have better support, and C benefited from that, but now, C pushing LLVM to have better support will benefit Rust instead.

As you say, this can be revisited later, but note that there's no guarantee that Clang will ever support _BitInt(129) or larger on any particular target. The C standard only requires BITINT_MAXWIDTH >= ULLONG_WIDTH. If some target keeps it at 128 for long enough, it could become entrenched enough that nobody wants to risk increasing it (e.g., imagine people putting stuff like char bits[BITINT_MAXWIDTH / 8]; in some headers).

Contributor Author

I actually had no idea that was how the standard worked, but I shouldn't really be surprised, considering how it's C. :/

@danlehmann

danlehmann commented Sep 11, 2024

Hey, I'm the author of https://crates.io/crates/arbitrary-int . It seems like this proposal has some overlap with what I've built as a crate, so I can talk a bit about the hurdles I've run into.

Arbitrary-int has a generic type UInt<T, const BITS: usize>. T refers to the underlying type, which is usually the next larger native integer.

It also provides types to shorten the name. For example u3 becomes a shortcut for UInt<u8, 3>, u126 is a shortcut for UInt<u128, 126>. The types all pick the next largest built-in type, though it's also possible to specify this directly through e.g. UInt<u32, 6>.

It also provides a Number trait which unifies arbitrary-ints with regular ints, allowing completely generic programming over all of those types.

In general, implementing this as a crate worked pretty well, but there are some downsides:

  • Implementing From and TryFrom: This can't be done with the current type system. Once you implement From, the standard lib automatically adds TryFrom; I tried many ways of limiting this based on the size of the BITS involved but the type system always complains about multiple implementations. I think the specialization feature would allow me to address that, but it's unstable.
  • Lack of const generics: The other issue with From is that it's a trait which can't be used in const contexts; to work around that, arbitrary-int provides another function new which IS const and which allows creating a type directly.
  • No special syntax: As it's not a built-in feature, I can't provide custom suffixes like 1u4. So the shortest way to create a number is generally u4::new(1). Not horrible, but not quite native.
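
To make the shape of such a crate-level type concrete, here is a minimal sketch. It is an assumption-laden illustration, not arbitrary-int's actual code: the name `SmallUInt`, the fixed `u8` backing type, and the `value` accessor are all hypothetical. It shows a width-checked wrapper with a `const fn new` constructor of the kind described above.

```rust
/// Hypothetical stand-in: an unsigned integer of `BITS` bits stored in a `u8`.
/// This is NOT arbitrary-int's implementation, only an illustration of the idea.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct SmallUInt<const BITS: usize>(u8);

impl<const BITS: usize> SmallUInt<BITS> {
    /// Largest value representable in `BITS` bits (capped at `u8::MAX`).
    pub const MAX: u8 = if BITS >= 8 { u8::MAX } else { (1u8 << BITS) - 1 };

    /// `const` constructor, usable in const contexts where `From` is not.
    /// Panics if `value` does not fit in `BITS` bits.
    pub const fn new(value: u8) -> Self {
        assert!(value <= Self::MAX, "value does not fit in BITS bits");
        Self(value)
    }

    /// Returns the stored value.
    pub const fn value(self) -> u8 {
        self.0
    }
}

/// Short alias in the style of `u3` (hypothetical).
#[allow(non_camel_case_types)]
pub type u3 = SmallUInt<3>;

/// Built at compile time, because `new` is a `const fn`.
pub const FIVE: u3 = u3::new(5);
```

Without literal suffixes, `u3::new(5)` is about as short as construction gets for a library type, which is the ergonomic gap the comment points at.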

[#2581]: https://github.com/rust-lang/rfcs/pull/2581
[Zig]: https://ziglang.org/documentation/master/#Primitive-Types

# Rationale and alternatives
Member

To me, the biggest reason to go this way is the coherence possibilities. I'd propose something like

Suggested change
# Rationale and alternatives
# Rationale and alternatives
## Coherence
One problem with other ways of doing this is that anything trait-based will run afoul of coherence in user code.
For example, if I tried to `impl<T> MyTrait for T where T: UnsignedInteger`, then it takes extra coherence logic -- which doesn't yet exist -- to also allow implementing `MyTrait` for other things. And this is worse if you want blankets for both `T: SignedInteger` and `T: UnsignedInteger` -- which would need like mutually-exclusive traits or similar.
When user code does
```
impl<const N: u32> MyTrait for u<N> { … }
impl<const N: u32> MyTrait for i<N> { … }
```
those are already-distinct types to coherence, no different from implementing a trait for both `Vec<T>` and `VecDeque<T>`.
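
The coherence point can be tried on stable Rust with stand-in types. In the sketch below, `U<N>` and `I<N>` are hypothetical unit structs standing in for the proposed `u<N>` and `i<N>`; the trait and method names are likewise made up for illustration.

```rust
/// Hypothetical stand-ins for the proposed `u<N>` and `i<N>`.
pub struct U<const N: u32>;
pub struct I<const N: u32>;

pub trait MyTrait {
    fn describe(&self) -> String;
}

// Two blanket-style impls over distinct type constructors: coherence accepts
// both, just as it would for `Vec<T>` and `VecDeque<T>`.
impl<const N: u32> MyTrait for U<N> {
    fn describe(&self) -> String {
        format!("unsigned, {} bits", N)
    }
}

impl<const N: u32> MyTrait for I<N> {
    fn describe(&self) -> String {
        format!("signed, {} bits", N)
    }
}

// An unrelated impl can still be added without overlapping either of the above.
impl MyTrait for bool {
    fn describe(&self) -> String {
        String::from("bool")
    }
}
```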

Contributor Author

I think this would likely go in the motivation section rather than the rationale section, but I agree with you that this is a good argument to mention. Will have to ponder where exactly it fits in the RFC.

@danlehmann

Also due to my design decision to base everything on simple types (no arrays), the maximum number of bits supported is u127.

@clarfonthey
Contributor Author

Also due to my design decision to base everything on simple types (no arrays), the maximum number of bits supported is u127.

I hadn't actually read the code yet, but I'm actually a bit curious why the max number of bits is 127 instead of 128. This feels like a weird restriction.

@danlehmann

Also due to my design decision to base everything on simple types (no arrays), the maximum number of bits supported is u127.

I hadn't actually read the code yet, but I'm actually a bit curious why the max number of bits is 127 instead of 128. This feels like a weird restriction.

It is 128 bits actually. UInt<u128, 128> works just fine (though it is just somewhat useless as you might as well use the actual u128). My main point was that larger numbers aren't possible, unlike in e.g. ruint, which operates on arrays.

@danlehmann

By the way, I love this RFC! While arbitrary-int (as well as ux) provide the unusually-sized ints like u48 etc, having a built-in solution will feel more natural and allow numbers to be treated in a much more unified fashion, which I'm looking forward to.

}
```

That's a lot better. Now, as you'll notice, we still have to cover the types `usize` and `isize` separately; that's because they're still separate from the `u<N>` and `i<N>` types. If you think about it, this has always been the case before generic integers; for example, on a 64-bit system, `u64` is not the same as `usize`.

It can certainly come after this RFC, but this could be made much more ergonomic by adding the following APIs:

impl usize {
    fn to_bits(self) -> u<Self::BITS>;
    fn from_bits(bits: u<Self::BITS>) -> Self;
}
impl isize {
    fn to_bits(self) -> i<Self::BITS>;
    fn from_bits(bits: i<Self::BITS>) -> Self;
}

So this is definitely not a problem.

Although now I reälize that won’t work, because usize::BITS is a u32. But then again, it might be helpful to have ::BITS_USIZE constants anyway.

Contributor Author

This also won't work until generic const args are stable, since associated consts aren't allowed in const generics at the moment.

usize and isize could have an associated type alias for the equivalent u<N>/i<N>, and to_bits/from_bits could reference that type alias.

Member

@programmerjake programmerjake Oct 3, 2024

This also won't work until generic const args are stable, since associated consts aren't allowed in const generics at the moment.

no, it works fine since there are no generics in the const expression. what doesn't work is struct S<T: Tr>(S2<{ T::ASSOC_CONST }>);
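
A small sketch of that distinction, assuming a hypothetical stand-in `U<const N: u32>` for the proposed type (the function name `usize_as_bits` is also made up): a const argument that mentions no generic parameter is accepted on stable Rust today, while the generic form quoted above is not.

```rust
/// Hypothetical stand-in for the proposed `u<N>`, with a `u32` width parameter.
pub struct U<const N: u32>;

impl<const N: u32> U<N> {
    /// Reports the width carried in the type.
    pub const fn bits(&self) -> u32 {
        N
    }
}

/// Accepted today: `usize::BITS` is a concrete associated const,
/// so the const argument involves no generic parameters.
pub fn usize_as_bits() -> U<{ usize::BITS }> {
    U
}

// By contrast, this generic form is rejected on stable Rust, because the
// const argument depends on the type parameter `T`:
//
//     struct S<T: Tr>(U<{ T::ASSOC_CONST }>);
```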

Member

Although now I reälize that won’t work, because usize::BITS is a u32. But then again, it might be helpful to have ::BITS_USIZE constants anyway.

I still think that these should be u<const BITS: u32> because of this. Once things like that are allowed, I want u<{FOO.ilog2_ceil()}> to just work, not need casts.

(We shouldn't make things worse forever for a minor mostly-irrelevant convenience today, since u<const BITS: usize> still doesn't even fix to_ne_bytes and such.)

Contributor Author

@clarfonthey clarfonthey Oct 3, 2024

(I changed my mind, I had just forgotten all the justification for choosing usize over u32, which is why I felt amenable to changing it at the time.)

Member

most of why I wanted usize as the bit width parameter type is that it's the same as arrays, and that trying to share a const generic between a u32 bit width and an array size is basically impossible until we get casting in const generic expressions. e.g.:

pub const fn to_binary<const N: usize>(v: u<N>, buf: &mut [u8; N]) -> &str {
    let mut i = 0;
    while i < N {
        buf[i] = if (v >> (N - i - 1)) & 1 == 1 { b'1' } else { b'0' };
        i += 1;
    }
    str::from_utf8(buf).unwrap()
}
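
For experimenting with the same idea on current stable Rust, here is a rough equivalent that swaps the proposed `u<N>` for a plain `u64`. That substitution is an assumption of this sketch: it limits `N` to 64 and drops the `const fn` qualifier to keep things simple. The shared `N` between the array length and the number of bits printed is the part being illustrated.

```rust
/// Writes the low `N` bits of `v` into `buf` as ASCII '0'/'1', most
/// significant bit first, and returns the result as a `&str`.
/// `u64` is a stand-in for the proposed `u<N>`, so `N` must not exceed 64.
pub fn to_binary<const N: usize>(v: u64, buf: &mut [u8; N]) -> &str {
    assert!(N <= 64, "the u64 stand-in only holds 64 bits");
    for i in 0..N {
        // Bit index counts down so that the first byte is the highest bit.
        buf[i] = if (v >> (N - i - 1)) & 1 == 1 { b'1' } else { b'0' };
    }
    // Only ASCII '0' and '1' were written, so this cannot fail.
    std::str::from_utf8(&buf[..]).unwrap()
}
```

Here `N` is inferred from the buffer alone; with `u<N>` it would also be tied to the integer's type, which is why a `usize` parameter avoids a cast in this signature.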

How often do you need an array with one element per bit? If the array length has any other relationship with the number of bits (e.g., adding +2 for a 0b prefix, or the aforementioned to_le_bytes and friends), then you still need const generics expressions. And casts are possibly less problematic than most other operations, which can fail and therefore imply more post-mono errors or need some solution for propagating bounds like “N > 0”.

Member

replied here

@programmerjake
Member

(replying to #3686 (comment) here so it's less likely to get lost when that thread gets resolved)

How often do you need an array with one element per bit?

when doing SIMD with bitmasks -- all the time. using Mask<T, N> and Simd<T, N> will be more common, but using [T; N] and u<N> will still be decently common. Mask<T, N> on many architectures is just a wrapper over u<N> (currently done with macros for a limited set of N), and Simd<T, N> is basically the same as [T; N] except with perhaps more alignment and better codegen.

In particular this is a big motivation for using usize for N in u<N>, since that's what all of Simd<T, N>, Mask<T, N>, and [T; N] need.

@hanna-kruppe

I am not super familiar with core::simd, so please correct me if I misunderstood something. I understand that the N in Simd<T, N> and Mask<T, N> needs to be usize because both often occur in type signatures together and Simd<T, N> is closely tied to [T; N] in various ways. But mixing those types with u<N> seems much more niche, for reasons that I think are pretty fundamental.

From looking at the current implementation it seems that Mask<T, N> is a wrapper for Simd<T, N> on every target except x86 with AVX-512. That's what I would expect, since SIMD masks on most architectures really want to stay in SIMD registers or in dedicated mask registers when they exist -- and the latter are more like <i1 x N> than uN. Even AVX-512 has the k-regs, but for various reasons it's common to pass masks around as integers. Punning between masks and integers is sometimes useful, but it's not free (even on x86, even more so in the rest of the computing world) and should only be done very deliberately. That's one reason why Mask<T, N> and its equivalents in libraries like memchr and hashbrown have a lot of helper methods instead of just conversions to/from integers.

As long as masks are a separate type with platform-dependent representation, it seems to me that the interaction with u<N> is only skin-deep. It could simplify a few functions in the public API, but otherwise masks will still revolve around a cfg'd newtype or associated type. Users who want to abstract over lane count will also be limited in how much they can equivocate between masks and integers: they may want a method like fn to_bits(self) -> u<N> on Mask<T, N> but it's not clear to me whether a return type like u<{N as u32}> would be much worse for them. Ideally there would be no casts anywhere, but as @scottmcm points out, choosing usize also requires casts for some reasonable use cases.

@clarfonthey
Contributor Author

clarfonthey commented Oct 4, 2024

Honestly, if the arguments for usize and u32 are equally compelling, I personally think the tie should be broken toward usize, for the simple reason that it is effectively the default type for generic constants. The number of bits makes sense as an unsigned size, and that's the full reasoning.

I do think that it would be worthwhile to choose one over the other if there actually were a lot of cases where it were valuable, and SIMD does feel compelling enough to be valuable, but honestly, it's kind of tied at this point.

I personally think that choosing u32 for a lot of the integer methods that involve bits was a mistake, but we're stuck with it for better or worse, and it feels like no matter what we do, there is going to be some inconsistency somewhere. So, might as well conceptually treat it as an "array of bits" IMHO.

Like realistically, if we were choosing an appropriate size for the count of bits, we should have chosen u16 instead of u32, since more than 8KiB in a single integer that isn't a big-integer feels kind of absurd. But again, hindsight doesn't really matter when deciding this; we just have to figure out a compromise.

@hanna-kruppe

Using u16 would have the advantage that conversion to u32 and to usize is always lossless while the opposite direction isn’t. usize::from(x) isn’t a const expression today but the monomorphic case could be allowed without touching the whole “maybe-const trait bounds” subject. It’s still too wordy to seriously propose (without adding implicit widening, which is a whole other can of worms). But if we ever get usize: From<u32> then that would be a point in favor of u32 in my opinion.
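
The asymmetry can be seen with the conversions that exist in the standard library today. This sketch uses hypothetical helper names: `u16` widens infallibly to both `u32` and `usize` through `From`, while `u32` to `usize` has to go through the fallible `TryFrom`.

```rust
/// Widening a `u16` bit count is infallible for both target types.
pub fn widen(bits: u16) -> (u32, usize) {
    (u32::from(bits), usize::from(bits))
}

/// There is no `usize: From<u32>`, so this direction uses `TryFrom`
/// and can fail on targets where `usize` is narrower than 32 bits.
pub fn u32_to_usize(bits: u32) -> Option<usize> {
    usize::try_from(bits).ok()
}
```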

Wild idea: is there anything forcing the u<N> and i<N> constructors to pick a single type for N? If they’re mostly compiler magic anyway, couldn’t they accept any unsigned integer? For example, can u<7u32> and u<7usize> be the same type? That might resolve most (not all) clashes of u32 vs usize in const generic APIs.

@hanna-kruppe

hanna-kruppe commented Oct 4, 2024

Nevermind that last bit, the lang item might not have to pick a type for N, but all the impl blocks involving those types have to pick a type for their const generic parameter, so fn foo<const N: u8>(x: u<N>) couldn’t do anything with its argument.

@clarfonthey
Contributor Author

I mean, it would be nice if you could allow {N as T} expressions in const generics as a special case when there's no truncation, under the assumption that it's always unambiguous, but again, this runs into all the other problems with const generics and unification.

I don't think it'd be a good idea to delay deciding what the type of the parameter should be in this case, although we might end up getting the above anyway before this feature is actually implemented, since it's currently one of the project goals; see rust-lang/rust-project-goals#100

@hanna-kruppe

You mean carve it out as something that doesn't even need any of the implementation challenges to be solved because you could mostly ignore the cast completely? Might work, might cause horrible problems down the line, I don't know. But I'm already working under the assumption that "MVP generic const expressions" is a prerequisite for generic integers to be stabilized and widely used. Starting from a smaller type doesn't buy you anything in terms of language design or type system interactions, but it's a nicer model for users (well, for me at least) because it means far fewer places where my "could this truncate?" spidey sense tingles.

@programmerjake
Member

programmerjake commented Oct 4, 2024

I am not super familiar with core::simd, so please correct me if I misunderstood something. I understand that the N in Simd<T, N> and Mask<T, N> needs to be usize because both often occur in type signatures together and Simd<T, N> is closely tied to [T; N] in various ways. But mixing those types with u<N> seems much more niche, for reasons that I think are pretty fundamental.

From looking at the current implementation it seems that Mask<T, N> is a wrapper for Simd<T, N> on every target except x86 with AVX-512.

That's mostly because we only got around to setting it to bitmasks on AVX512, other targets that have bitmasks are at least RISC-V V and ARM SVE.

I fully expect Mask<T, N> to just be a wrapper over u<N> for any architecture that has bitmasks mostly because that gives the correct in-memory size, mask operations would either be direct integer operations on the mask integer and/or intrinsics that bitcast to llvm ir <N x i1> bitvectors. note that none of those intrinsics are architecture-specific, we let llvm translate from architecture independent operations to whatever the architecture uses -- except for dynamic swizzle since llvm doesn't yet have llvm ir instructions/intrinsics for architecture-independent dynamic swizzle.

@clarfonthey
Contributor Author

But I'm already working under the assumption that "MVP generic const expressions" is a prerequisite for generic integers to be stabilized and widely used.

I'm not, actually, which is why I felt comfortable restarting this RFC at this point in time.

Back at the original RFC's time, we were in a similar situation with regard to const generics in general that we are right now with regard to these other const generic features: they were definitely coming, we had plans for them, and we abstractly knew what they would look like, but they weren't complete yet. The difference is that we genuinely cannot implement generic integers without const generics, whereas we totally can have generic integers without generic const expressions.

Yes, it's likely that a lot of these things will be mostly resolved by the time an implementation exists, but even if they were delayed for a year, they didn't live up to what we hoped they'd be, or they take a very long time to become stable, I think that generic integers could still exist and be useful. For example, if we allowed most integer methods and operations via the generic type but still made from_ne_bytes exclusive to the current integer types, I would say that's still very valuable, and we can always improve it later.

So, I think that it's better to consider the feature from the perspective that these features don't exist, so we don't hype up potentially unrealistic fantasies of what we'll be able to do with them. And I think that even without them, this is still an incredibly useful proposal, and it's not incompatible with the improvements we can make later with these features.

@hanna-kruppe

That's mostly because we only got around to setting it to bitmasks on AVX512, other targets that have bitmasks are at least RISC-V V and ARM SVE.

Adding additional platforms that use an integer representation for masks internally doesn't remove the (common and important!) ones that prefer the vector representation. It also doesn't make mask<->integer punning in user code any better for performance portability -- even on SVE and RVV, you want the generated code to stay in predicate/mask land as much as possible.

I also have some doubts about whether integers are really the best representation of masks in RVV and predicates in SVE, considering ISA design, calling conventions, the vendor-defined C intrinsics, and what little I know about existing uarchs. But I don't want to come off as telling you and the others working on portable SIMD how to do your job and this isn't the right venue for a deeper discussion in any case.

@programmerjake
Member

Adding additional platforms that use an integer representation for masks internally doesn't remove the (common and important!) ones that prefer the vector representation. It also doesn't make mask<->integer punning in user code any better for performance portability -- even on SVE and RVV, you want the generated code to stay in predicate/mask land as much as possible.

portable-simd's only vector representation for masks is not bitmasks, but instead full integer elements that are 0 or !0 (we call them full masks). e.g. on avx2 Mask<i16, 16> is basically Simd<i16, 16> but with each element known to be either 0 or !0.

bitmasks currently are just struct wrappers around an integer (and that seems unlikely to change, though operations may change to use more intrinsics). we rely on llvm optimizations to translate that to operations on mask/predicate registers.
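
To illustrate the two representations without relying on the unstable `core::simd` API, the sketch below models a full mask as a plain `[i16; N]` array whose elements are either 0 or !0, and a bitmask as a `u64`. Both stand-ins and the function name are assumptions of this example, not portable-simd's actual types.

```rust
/// Converts a "full mask" (one `i16` per lane, each either 0 or !0) into a
/// bitmask with one bit per lane, lane 0 in the least significant bit.
/// `u64` stands in for the proposed `u<N>`, so `N` must not exceed 64.
pub fn full_mask_to_bitmask<const N: usize>(full: [i16; N]) -> u64 {
    assert!(N <= 64, "the u64 stand-in only holds 64 lanes");
    let mut bits = 0u64;
    for (lane, &elem) in full.iter().enumerate() {
        if elem != 0 {
            bits |= 1u64 << lane;
        }
    }
    bits
}
```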

@hanna-kruppe

I don't get why you're explaining that in response to what I've written. Let's try a different angle. Can you point at some concrete code snippets where u<N> would occur together with Simd<T, N> or Mask<T, N>?

I've tried to guess at where exactly you're trying to go with that connection but it feels a bit like we've been talking past each other. Put differently, I would distinguish between two aspects:

  1. u<N> would be used internally inside Mask<T, N> on some platforms, in place of what's currently [u8; (N+7)/8] behind some macros and traits. I think I understand the benefits of this, but their scope is very bounded.
  2. Code using the portable SIMD APIs would juggle u<N> alongside other type constructors also involving N (and wanting it to be usize). This aspect I don't really understand yet.

I'm aware of the existence of Mask::{to,from}_bitmask (currently dealing in u64 but u<N> would be more natural). I'm specifically wondering about how commonplace and useful those two methods are, as opposed to all the other ways of working with masks. As I said, I don't have much experience with portable SIMD APIs, so I really don't know the answer. From all I know about the codegen that I'd want on various architectures, it seems like a performance footgun for portable code. So if you ask me, I'd only use them in target- and lane-count-specific code that happens to be more convenient to write with core::simd than with core::arch (where the extra generality of u<N> as opposed to u64 is not very useful). If I'm wrong about that, please show me a counter-example.

@clarfonthey
Contributor Author

Unrelated to the above, but another thing that occurred to me as a benefit of this system is exhaustive matching.

In a recent project I have quite a few places where I'm doing bitmasking, then matching on the result, where there always has to be a wildcard unreachable!() branch. This could help reduce the number of cases of that by replacing them with exhaustive matches over smaller integer types.
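
A small example of the pattern being described, as it has to be written today (the function and the direction names are made up for illustration): the value is masked to two bits, yet the match still needs a wildcard arm because the scrutinee's type is `u8`.

```rust
/// Decodes a two-bit field. The mask guarantees a value in 0..=3, but the
/// compiler only sees a `u8`, so a catch-all arm is still required.
pub fn direction(bits: u8) -> &'static str {
    match bits & 0b11 {
        0 => "north",
        1 => "east",
        2 => "south",
        3 => "west",
        // With a two-bit type such as the proposed `u<2>`, the four arms
        // above would already be exhaustive and this arm could be removed.
        _ => unreachable!("value was masked to two bits"),
    }
}
```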

@juntyr
Contributor

juntyr commented Oct 5, 2024

Unrelated to the above, but another thing that occurred to me as a benefit of this system is exhaustive matching.

In a recent project I have quite a few places where I'm doing bitmasking, then matching on the result, where there always has to be a wildcard unreachable!() branch. This could help reduce the number of cases of that by replacing them with exhaustive matches over smaller integer types.

That’s a really good point, that I think even applies to current types.

Let’s say I have a random number generator that gives me a u64, and I want to split that into two u32s. For explicitness' sake, I will usually bitmask both the original number and the down-shifted one with u32::MAX before as-casting to u32. Clippy of course still complains and so I need to allow some truncation lint.

What would be great, especially for generic ints, would be to have a method truncate_into<const N: usize>(self) -> u<N> that would do both the masking and type conversion and be as explicit as doing the bit masking myself.
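
For reference, the manual version of that split looks roughly like this on current Rust (the function name is hypothetical). The masks are redundant for the `as` cast's result, but they make the intended truncation explicit, which is what a `truncate_into`-style method would fold into one call.

```rust
/// Splits a `u64` into its high and low 32-bit halves, masking explicitly
/// before each narrowing cast.
pub fn split_u64(x: u64) -> (u32, u32) {
    let mask = u64::from(u32::MAX);
    let lo = (x & mask) as u32;
    let hi = ((x >> 32) & mask) as u32;
    (hi, lo)
}
```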

@clarfonthey
Contributor Author

What would be great, especially for generic ints, would be to have a method truncate_into<const N: usize>(self) -> u<N> that would do both the masking and type conversion and be as explicit as doing the bit masking myself.

So, at least that part will hopefully be covered by the WrappingFrom RFC, but I was thinking of going further than this and maybe offering a bitfield<const N: usize, const M: usize>(self) -> u<{M-N}> function, maybe returning u<M> depending on how long generic const exprs take.

Sure, this doesn't cover cases where the bit fields are discontinuous, but it covers most of them.
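
As a rough idea of what such a helper computes, here is a stable-Rust sketch over `u64`. The parameter names `LO` and `HI`, the half-open range convention, and the `u64` return type are all assumptions of this example; the proposed version would return a narrower `u<...>` instead.

```rust
/// Extracts bits `LO..HI` (half-open, bit 0 is least significant) of `x`,
/// shifted down so that bit `LO` becomes bit 0.
pub fn bitfield<const LO: usize, const HI: usize>(x: u64) -> u64 {
    assert!(LO < HI && HI <= 64, "bit range must be non-empty and within 64 bits");
    let width = HI - LO;
    // A full-width shift would overflow, so handle 64 bits separately.
    let mask = if width == 64 { u64::MAX } else { (1u64 << width) - 1 };
    (x >> LO) & mask
}
```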

@programmerjake
Member

I don't get why you're explaining that in response to what I've written. Let's try a different angle. Can you point at some concrete code snippets where u<N> would occur together with Simd<T, N> or Mask<T, N>?

I've tried to guess at where exactly you're trying to go with that connection but it feels a bit like we've been talking past each other. Put differently, I would distinguish between two aspects:

  1. u<N> would be used internally inside Mask<T, N> on some platforms, in place of what's currently [u8; (N+7)/8] behind some macros and traits. I think I understand the benefits of this, but their scope is very bounded.

yes they are bounded, but also critical for getting arbitrary length Simd types so we can remove the trait bounds on the length, which makes portable-simd much more ergonomic when dealing with generic lengths.

  2. Code using the portable SIMD APIs would juggle u<N> alongside other type constructors also involving N (and wanting it to be usize). This aspect I don't really understand yet.

I'm aware of the existence of Mask::{to,from}_bitmask (currently dealing in u64 but u<N> would be more natural). I'm specifically wondering about how commonplace and useful those two methods are, as opposed to all the other ways of working with masks.

One example is when counting the number of Mask elements that are set, there you'd use my_mask.to_bitmask().count_ones(). Also for figuring out the first/last index that is set/clear (e.g. searching for stuff), where you'd use my_mask.to_bitmask().trailing_zeros() to return the index of the first set element, and similarly for the other combinations.
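
Assuming the bitmask has already been obtained as a `u64` (as the thread says `to_bitmask` currently returns), the two operations reduce to ordinary integer methods. The helper names below are hypothetical.

```rust
/// Number of lanes whose mask bit is set.
pub fn set_lanes(bitmask: u64) -> u32 {
    bitmask.count_ones()
}

/// Index of the first (lowest) set lane, or `None` if no lane is set.
/// `trailing_zeros` alone would return 64 for an empty mask.
pub fn first_set_lane(bitmask: u64) -> Option<u32> {
    if bitmask == 0 {
        None
    } else {
        Some(bitmask.trailing_zeros())
    }
}
```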

@tmccombs

tmccombs commented Oct 6, 2024

One example is when counting the number of Mask elements that are set, there you'd use my_mask.to_bitmask().count_ones(). Also for figuring out the first/last index that is set/clear (e.g. searching for stuff), where you'd use my_mask.to_bitmask().trailing_zeros() to return the index of the first set element, and similarly for the other combinations.

Those both seem like things that it would make sense to have methods for directly on Mask.

@programmerjake
Member

programmerjake commented Oct 6, 2024

One example is when counting the number of Mask elements that are set, there you'd use my_mask.to_bitmask().count_ones(). Also for figuring out the first/last index that is set/clear (e.g. searching for stuff), where you'd use my_mask.to_bitmask().trailing_zeros() to return the index of the first set element, and similarly for the other combinations.

Those both seem like things that it would make sense to have methods for directly on Mask.

Yeah, that's part of the problem: either things are popular enough to warrant a method on Mask or are not quite popular enough so they need to depend on to_bitmask()...

This issue on adding something like to_bitmask() to WASM also has some other uses: WebAssembly/simd#131
e.g. one commenter wanted to use to_bitmask() for indexing for a decision tree for JPEG stuff: WebAssembly/simd#131 (comment)

also, AVX512 has kadd* for adding two masks to each other, so they presumably thought adding two bitmasks was useful enough to add to AVX512, idk what it's used for though... This seems like something that won't be added to Mask though.

@hanna-kruppe

hanna-kruppe commented Oct 7, 2024

yes they are bounded, but also critical for getting arbitrary length Simd types so we can remove the trait bounds on the length, which makes portable-simd much more ergonomic when dealing with generic lengths.

If the end goal is to not have any bounds, then that's a fair point. There are other possible solutions (e.g., enough "generic const expressions" to make struct Mask<N: usize>([u8; (N + 7) / 8]) feasible) but they're not better either and who knows which will get implemented first.
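
The byte-count arithmetic itself already works on stable Rust as long as the lane count is concrete. In the sketch below, `mask_bytes` and `Mask12` are hypothetical names; only the generic `[u8; (N + 7) / 8]` form needs generic const expressions.

```rust
/// Bytes needed to hold one bit per lane, rounding up.
pub const fn mask_bytes(lanes: usize) -> usize {
    (lanes + 7) / 8
}

/// Accepted today because the argument `12` is concrete rather than a
/// generic parameter; 12 lanes need two bytes of storage.
pub struct Mask12(pub [u8; mask_bytes(12)]);
```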

One example is when counting the number of Mask elements that are set, there you'd use my_mask.to_bitmask().count_ones(). Also for figuring out the first/last index that is set/clear (e.g. searching for stuff), where you'd use my_mask.to_bitmask().trailing_zeros() to return the index of the first set element, and similarly for the other combinations.

I know these use cases well, but as mentioned I would prefer not to write them that way in portable SIMD because there are much more efficient ways to implement them on most non-x86 targets. Sometimes even the most efficient spelling is not worth it (e.g., SwissTable/Hashbrown switches between 128-bit SSE2, 64-bit NEON, and integer-based SWAR depending on the target mostly because of the wildly varying cost of the necessary mask operations). As far as possible, a portable API should avoid guiding people into performance portability cliffs, and provide more abstract operations whenever feasible. This was also a theme in the WebAssembly discussion you linked. As you say, there's always a long tail of creative uses that the mask abstraction can't cover completely, so I would never argue for not having conversions between masks and integers at all. But I think in the context of "using core::simd instead of core::arch, and abstracting over lane count" they're so niche that nicer APIs for that specific combination come very low on the list of priorities. Removing SupportedLaneCount<N>-style bounds from portable SIMD code that only ever mentions Mask<T, N> and Simd<T, N> seems much more significant to me.

Aside about AVX-512 `kadd*`

also, AVX512 has kadd* for adding two masks to each other, so they presumably thought adding two bitmasks was useful enough to add to AVX512, idk what it's used for though... This seems like something that won't be added to Mask though.

I would love to hear why the architects added this one. I googled a bit and found virtually no mention of it and only one potential use. But that article also says that the instruction is too slow to be worthwhile on the CPUs considered and simply doing the mask->GPR->mask round-trip that kadd* is supposed to avoid turns out faster. Indeed Agner Fog lists it as latency 4 throughput 1 on all existing Intel chips that implement it. Zen 4 has put more effort into it, which could either be because they care about some code that uses or could use the instruction, or because it just happened to be easier for them because of other design decisions. In any case, this instruction just replaces three cheap uops (move mask register into GPR, do add in GPR, move result from GPR to mask register) with one -- whether that's a win depends on their relative cost and on which uarch resources are the bottleneck for whatever you're doing. It's potentially very useful if and only if you're already counting uops and critical path length on a particular target and spending hours thinking about how you can reduce those figures by warping how you do a computation around the idiosyncrasies of your target hardware. That sort of work can be interesting and useful, but in my experience it's rarely if ever done from behind the veil of portable SIMD abstractions. Yeah, I wouldn't add this operation to Mask. But also, if that was the only operation that required mask<->uN conversions (it's not), I don't know if that would be sufficient motivation for adding it.

@programmerjake
Member

programmerjake commented Oct 7, 2024

As far as possible, a portable API should avoid guiding people into performance portability cliffs, and provide more abstract operations whenever feasible.

yes, but I also think that translating from a reasonable portable implementation to whatever weirdness your particular cpu architecture requires should be mostly llvm's problem to handle -- since that's how almost all of portable-simd currently operates and that allows portable-simd's users to easily write actually portable simd and still get good performance without having to spend weeks researching how every different target does stuff its own special way.

iirc so far the only exceptions to portable-simd's leaving it up to llvm are switching between bitmasks/fullmasks, implementing dynamic swizzle (since llvm just plain doesn't have a non-arch-specific operation for that), and working around aarch64 backend bugs for integer division.

so, in summary, I think a programmer writing my_mask.to_bitmask().leading_zeros() or any other common bitmask op should get code that compiles to the optimal arch-specific weirdness for that, even if we wrap that to_bitmask call sequence into a Mask helper method.

@hanna-kruppe

hanna-kruppe commented Oct 7, 2024

I don't disagree but I also don't think it conflicts with what I said, considering that LLVM does not (and might never) 100% live up to that goal. Providing higher-level operations on Mask and guiding people towards those doesn't require users to know about different targets and implementation strategies, they just need the methods to exist and documentation to encourage their use. The higher-level operations are strictly better whenever they're applicable: they are easier to use than manually translating to integer bit twiddling, more self-documenting, give at least equally good codegen, and make it easier for Rust to work around missing optimizations in LLVM or other changes in backends.

As I said before, getting rid of the bounds on N in Simd<T, N> and Mask<T, N> is a good argument independently of the above. Everything beyond that has only a very loose relation to u<N>, so I would suggest we either stop here or take it elsewhere (you can open an issue anywhere and ping me, if you want).


## Documentation decluttering

Having generic impls would drastically reduce the noise in the "implementations" section of rustdoc. For example, the number of implementations for `Add` for integer types really drowns out the fact that it's also implemented for strings and `std::time` types, which is useful to know too.

Suggested change
Having generic impls would drastically reduce the noise in the "implementations" section of rustdoc. For example, the number of implementations for `Add` for integer types really drowns out the fact that it's also implemented for strings and `std::time` types, which is useful to know too.
Having generic impls would drastically reduce the noise in the "implementations" section of rustdoc. For example, the number of implementations for `Add` for integer types really drowns out the fact that it's also [implemented for strings](https://doc.rust-lang.org/stable/std/ops/trait.Add.html#impl-Add%3C%26str%3E-for-String) and `std::time` types, which is useful to know too.
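To make the decluttering point concrete, here is a rough sketch of how a single generic impl covers every width. `Uint<N>` is a hypothetical stand-in for the proposed `u<N>`, stored in a `u128`, and for simplicity it wraps on overflow, which is an assumption of this sketch rather than anything the RFC specifies.

```rust
use std::ops::Add;

/// Hypothetical stand-in for the proposed `u<N>`: an N-bit unsigned value
/// stored in the low bits of a `u128`.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Uint<const N: u32>(u128);

impl<const N: u32> Uint<N> {
    /// Bit mask selecting the low N bits.
    fn mask() -> u128 {
        if N >= 128 {
            u128::MAX
        } else {
            (1u128 << N) - 1
        }
    }

    /// Builds a value, truncating anything above N bits.
    fn new(value: u128) -> Self {
        Uint(value & Self::mask())
    }
}

/// One generic impl for every width, so rustdoc would list a single entry
/// instead of one per concrete integer type.
/// Assumption of this sketch: addition wraps modulo 2^N.
impl<const N: u32> Add for Uint<N> {
    type Output = Self;

    fn add(self, rhs: Self) -> Self {
        Uint::new(self.0.wrapping_add(rhs.0))
    }
}
```

With this shape, the `Add` page would show one `impl<const N: u32> Add for Uint<N>` line alongside the impls for strings and `std::time` types.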
