Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Declarative macro_rules! derive macros #3698

Open
wants to merge 24 commits into
base: master
Choose a base branch
from

Conversation

joshtriplett
Copy link
Member

@joshtriplett joshtriplett commented Sep 21, 2024

Many crates support deriving their traits with derive(Trait). Today, this
requires defining proc macros, in a separate crate, typically with several
additional dependencies adding substantial compilation time, and typically
guarded by a feature that users need to remember to enable.

However, many common cases of derives don't require any more power than an
ordinary macro_rules! macro. Supporting these common cases would allow many
crates to avoid defining proc macros, reduce dependencies and compilation time,
and provide these macros unconditionally without requiring the user to enable a
feature.

I've reviewed several existing proc-macro-based derives in the ecosystem, and
it appears that many would be able to use this feature to avoid needing proc
macros at all.

Rendered

@joshtriplett joshtriplett added the T-lang Relevant to the language team, which will review and decide on the RFC. label Sep 21, 2024
@joshtriplett
Copy link
Member Author

Nominated as a follow-up to recent lang discussions about this.

@joshtriplett joshtriplett added the I-lang-nominated Indicates that an issue has been nominated for prioritizing at the next lang team meeting. label Sep 22, 2024
@liigo
Copy link
Contributor

liigo commented Sep 23, 2024

Since its call-side syntax is different from normal macro_rules! (i.e. macroname!()), the definition-side syntax should also variants e.g. macro_rules_derive! (v.s. #[proc_macro_derive(TraitName)]).

The same to #3697 where we could name it macro_rules_attribute! (v.s. #[proc_macro_attribute]).

@coolreader18
Copy link

coolreader18 commented Sep 26, 2024

One thing that would be really nice (i.e. greatly increase usability of decl macros) (though would not at all decrease the usefulness of this feature) is proper, full-featured parsing of meta fragments of decl macros. Right now, if you want to parse arguments to a decl macro in a MetaList form, you have to either enforce that the named arguments come in a specific order, or use a tt-muncher (and even then, it's not clear at first thought how you'd do it!). Obviously this is something that would introduce a whole new axis of complexity to decl macro declarations and evaluation, but being able to specify something like $[$(foo = $foo:expr)? $(bar($bar:expr))*],* and accept 0 or 1 foos and 0 or more bars in any order would be a game changer, and I can't see using attr or derive decl macros being very ergonomic for anything but the simplest cases without something like that. This RFC lets you define helper attributes for decl derive macros, but without any easy way of extracting them from out of the set of other attributes on the item, I don't see that being very useful for most people. An accessible and intuitive macro definition format tt-munchers are not.

(Shoulda brought this up at the hopes and dreams for the language session at Unconf 🙃)

@joshtriplett
Copy link
Member Author

@coolreader18 I would love to see many parsing improvements for macros, including something to address this kind of parsing difficulty. I don't think that's specific to derive or attribute macros, though it certainly makes them more useful.

@matthieu-m
Copy link

To define helper attributes, put an attributes key in the macro_derive attribute, with a comma-separated list of identifiers for helper attributes: #[macro_derive(attributes(helper))]. The derive macro can process the #[helper] attribute, along with any arguments to it, as part of the item the derive macro was applied to.

In order to avoid name collisions in helper attributes between different derives, I think it would be worth it to take a page out of serde here and require namespacing of those attributes.

For example, simple serde code:

#[derive(Debug, Deserialize)]
struct Message {
    #[serde(rename = "type")]
    type_: String,
    #[serde(default = "Message::default_payload")]
    payload: String,
}

Some names (like default) are likely to be particularly sought after, and mixing dependencies which clash on those helper names will be nightmarish.

Convention for the namespace could dictate that it be either a parent module/crate name, or the name of the trait, snake-cased.

@jplatte
Copy link
Contributor

jplatte commented Sep 26, 2024

I don't think that's a good idea. It's already the case that other libraries than serde look at serdes attributes, sometimes even in a context where the input type does not use serdes derives at the same time (but a proc-macro copies the serde attributes to some types it generates). As one example, the derive macro schemars::JsonSchema currently "registers" serde, schemars and validate, likely for good reason.

@ssokolow
Copy link

ssokolow commented Sep 26, 2024

I don't think that's a good idea. It's already the case that other libraries than serde look at serdes attributes, sometimes even in a context where the input type does not use serdes derives at the same time (but a proc-macro copies the serde attributes to some types it generates). As one example, the derive macro schemars::JsonSchema currently "registers" serde, schemars and validate, likely for good reason.

Perhaps some kind of pub annotation then? Yes, that would have made achieving the current state of Serde attribute cross-compatibility difficult, but it has been the Rust way to default to pushing back against Hyrum's law.

@epage
Copy link
Contributor

epage commented Sep 26, 2024

There has also been talk of "common" attributes, like skip.

For clap, namespacing by crate or derive name was insufficient and it now processes 4 different namespaces.

If we did encourage something by default, i think it should be derive name so there is a clear relationship.

On a simlar note of constraining users, imo derives should only produce a trait impl for the derive and considered proposing that be enforced but figured that deviating for what proc-macros provide would also be a downside. Also, I've seen with clap how it can be useful to include mostly-internal trait impls with the requested one.

@kpreid
Copy link
Contributor

kpreid commented Sep 26, 2024

On a simlar note of constraining users, imo derives should only produce a trait impl for the derive

Note that imposing such restriction would make it difficult to write derives that implement traits like IntoIterator, where each implementation typically introduces a new type that must be nameable by the impl, until associated type position impl Trait is available.

I was recently exploring this space while working on exhaust 0.2, which needs to generate two types and several impls along with the nominally derived impl, and my current strategy for being nice to the macro’s users is that the derive macro’s generated items are all inside a const _: () = { ... } block; this ensures that the macro’s expansion has no effect on the caller’s namespace, even though more items than just a trait impl are generated. One could imagine inserting that block wrapper automatically as part of the semantics of derive macros (whether declarative or procedural), but that should probably be its own design discussion separate from introducing a new kind of macro implementation.

@matthieu-m
Copy link

I don't think that's a good idea.

For clap, namespacing by crate or derive name was insufficient and it now processes 4 different namespaces.

I would note that I specifically mentioned "as a convention".

If anything, the fact that schemars also looks at serde namespace is a validation of the need for namespacing to me. Otherwise if there's a #[default = ...] it wouldn't even know whether that's for serialization/deserialization or some unrelated purpose.

I think enforcing namespacing and having a simple convention for simple usecases -- so folks can just follow along -- would work well. Perhaps a clippy lint which has to be explicitly #[expect] for the handful of usecases requiring more schemas.

There has also been talk of "common" attributes, like skip.

Common attributes are fine: a standardized meaning should not cause "mishaps".

@joshtriplett
Copy link
Member Author

@PoignardAzur I've added an item to the future possibilities section about making it easier to derive where bounds.

@weiznich
Copy link
Contributor

weiznich commented Oct 2, 2024

Thanks for filling this RFC. I really like this proposal, as it can also solve some other issues around proc macro derives. That includes things like not having a $crate equivalent for proc macros which make it notoriously hard to couple a proc macro to a crate and issues around feature unification. Both would be solved by allowing to write such macros as declarative macros in the main crate, as you just don't have that separate crate to deal with then.

That written I would like to point to an in my opinion important detail. The RFC mentions as future possibility that:

We should provide a way for macro_rules! macros to provide better error reporting, with spans, rather than just pointing to the macro.

I think it's rather important to provide this functionality from the beginning. I believe that many of the more popular derive marcos wouldn't be interested in migrating to something that does regress their error messages significantly.

@scottmcm
Copy link
Member

scottmcm commented Oct 2, 2024

We might want to provide a generate_where_bounds macro:

I don't think this is possible from tokens. To me, these are the reasonable options:

  • Bound all the type parameters, which is most semver stable, like all the built-in derives do.
  • Bound all the field types directly (like impl<T> Default for Foo<T> where Option<T>: Default, u32: Default, ActuallyLetsSkipV: Default), which repeats exactly what you use in the impl.
  • Have an attribute saying which bounds to emit, for people who want something in the middle between those other options.

With TAITs and associated types in traits and such, knowing what's happening from tokens doesn't make sense to me.

@nikomatsakis
Copy link
Contributor

My concern with this RFC is that it will lead to a large number of hacky derive macros that don't support the full variety of Rust syntax. Writing a macro-rules that will handle anonymous fields, named fields, enums, attributes, (in the future) default values, etc feels pretty hard and likely to ultimately require tt-munging, which we don't really have the ability to abstract over.

I was hoping to see a somewhat more declarative approach where the compiler handled the parsing on your behalf, with syn and synstructure as the fallback case.

@joshtriplett
Copy link
Member Author

joshtriplett commented Oct 2, 2024

@weiznich My expectation is that we'll try it in nightly, experiment with it for various macros in the ecosystem, see how it goes (including what the errors look like), and figure out what we need before stabilization. If the error messages are wildly worse in common cases, I certainly think we'd want to address that before stabilization.

I've added an unresolved question to that effect to make sure we evaluate that before stabilization.

@joshtriplett
Copy link
Member Author

joshtriplett commented Oct 2, 2024

I was hoping to see a somewhat more declarative approach where the compiler handled the parsing on your behalf

I'd love to see that one day as well, but I think this is one step in that direction. We can start out (in nightly) with "this is possible at all", and then incrementally add mechanisms for macros to more easily extract things from syntax, as we experiment and discover what people need.

For instance (as one possible example, not meant to be normative), imagine adding a "struct field" matcher, and then extending the macro metavariable expressions mechanism to have a way to extract pieces of structure from a higher-level AST match (e.g. "field name" and "field type"), via something like ${f.name}. That would make many simple cases really easy.

@weiznich
Copy link
Contributor

weiznich commented Oct 3, 2024

@joshtriplett Given that there are quite a few questions that were answered with: Let's land something on nightly first and then iterate from there, it's maybe a good idea to explicitly state that in the RFC itself by marking it as experimental RFC?

@joshtriplett
Copy link
Member Author

@joshtriplett Given that there are quite a few questions that were answered with: Let's land something on nightly first and then iterate from there, it's maybe a good idea to explicitly state that in the RFC itself by marking it as experimental RFC?

It's already stated in the RFC in the form of unresolved questions, which turn into checklist items in the tracking issue for subsequent evaluation. I've also explicitly captured the concern about crate maintainers getting pressure to use something that doesn't have all the features they want yet; that's a problem we can address with appropriate messaging, in addition to subsequent improvements.

The RFC aims to define a chunk of useful functionality that will be useful for some crates, without the expectation that every possible proc macro will immediately be a good idea to convert.

@PoignardAzur
Copy link

PoignardAzur commented Oct 3, 2024

(Repeating from my post in #3681 for the sake of transparency, though I expect the main discussion to happen there.)

Bound all the type parameters, which is most semver stable, like all the built-in derives do.

The built-in derives do more than that. If you write:

#[derive(Default)]
struct MyStruct<T: SomeTrait>
{
    t: T,
    ta: T::Assoc,
}

The generated derive will be:

impl<T: ::core::default::Default + SomeTrait> ::core::default::Default for MyStruct<T>
    where T::Assoc: ::core::default::Default 
{
    // ...
}

The macro already needs to parse the fields to generate the T::Assoc: ::core::default::Default bound.

So even if we stuck to the "one bound per type parameter" approach, we would still likely want a generate_where_bounds macro.

@weiznich
Copy link
Contributor

weiznich commented Oct 4, 2024

The RFC aims to define a chunk of useful functionality that will be useful for some crates, without the expectation that every possible proc macro will immediately be a good idea to convert.

I'm honestly not sure if this will be really that useful for anyone without a lot of additional work. One advantage as the maintainer of an old crate like diesel is that I can look back in the past and see how things were back then in the "beginning". In this case: Diesel is older than stable proc-macro support, so I remember the ways how this was implemented back then before we had access to stable proc-macros. You can find the code here. Funnily the solution we used back then is very similar to what this RFC proposes, although it did not have the syntax sugar to be able to just write #[derive(MyMacro)]. There was the custom_derives crate back then, which essentially enabled the syntax proposed by this RFC, with the minor cavecat that you would need to put the struct definition into a macro call.
Niko pointed out that it really hard to parse all possible syntax variants of rust structs/types/enums in a declarative macro today. I fully agree with that. See the old code from simpler days for some estimation how complex that would be today with even more syntax variants. So having support from the compiler here seems the be necessary from my point of view. This becomes especially important if you think about what happens if the rust syntax is extended. The current approach would likely lead to quite a few such crates fail to accept the new syntax, as their macro parser cannot handle these new cases. That's not an issue with the syn based approach or if the compiler does the parsing and only invokes some sort of special macro syntax.

As another counter point diesel has a table! with relatively simple syntax that is fully under our control as crate authors. Even for that we decided to implement that parser as proc-macro, as the macro was quite complex and the error reporting was quite bad if something went wrong. (It essentially boiled down to: You macro is wrong, checkout the documentation for the correct syntax).

Now my main points are:

  • If this RFC is only for experimentation: It's already possible to do that on stable rust, you can just use the same approach as the old custom_derives approach
  • There is existing old code that essentially used this approach already. There are a lot of reasons why crates like diesel moved away from this approach. You likely find a few more examples for this by checking which other crates also used custom_derives

@joshtriplett
Copy link
Member Author

joshtriplett commented Oct 21, 2024

I've looked at the linked code, and I can understand that the full generality of the macro infrastructure that Diesel had is not something you want to have to maintain. From what I've seen in the ecosystem, many proc macro derives will not require that degree of complexity, and some will be able to switch over earlier than Diesel will want to. I've spelled out in the RFC the potential issue of pressure on crate maintainers, and included a stabilization-blocking requirement that we provide guidance for users and crate maintainers to attempt to avert that. There's a limit to how much we can do there, but we should do what we can.

Beyond that, there's a long history of potential macro improvements stalling out because scope creep or because "wait until macros 2.0", rather than taking place incrementally. There is no one-way door here, so we can ship features incrementally without blocking on having everything users may want.

@weiznich
Copy link
Contributor

Beyond that, there's a long history of potential macro improvements stalling out because scope creep or because "wait until macros 2.0", rather than taking place incrementally. There is no one-way door here, so we can ship features incrementally without blocking on having everything users may want.

I'm not sure I agree with that. Yes not shipping any features because they are not 100% perfect is bad, sure. But on the other hand: Shipping features and promising to "fix" more complex cases later on is also bad, because quite often that fixing did not happen anytime soon. (Examples: Proc-macro diagnostics, various async limitations (traits, drop, …)) That leaves crate maintainers in a situation where there might be a considerable pressure to use a new feature but it is not possible to use it in an good.

I personally would wish that the RFC either clears up that this feature is supposed to be used in special situations (and declares which ones) or that it includes a more general approach to handle at least something rather simple like #[derive(Default)] (without #[default]) or #[derive(Clone)]. We hopefully all agree that these derives are not complex in terms of code parsing is done there. I personally do not see a way to express without at least a considerable amount of parsing code expressed as macro rules. That code wouldn't be a problem if it would be maintained as part of the compiler, but it becomes a problem as soon as each crate includes it's own likely incomplete and broken version.

@joshtriplett
Copy link
Member Author

Before stabilizing this feature, we should receive feedback from crate
maintainers, and potentially make further improvements to macro_rules to make
it easier to use for their use cases. This feature will provide motivation to
evaluate many new use cases that previously weren't written using
macro_rules, and we should consider quality-of-life improvements to better
support those use cases.

@ijackson
Copy link

ISTM that this feature is very difficult to use because of (1) macro_rules's severe parsing limitations, as others have said (2) even if those were improved, the results would not be very ergonomic, since you'd still need some kind of matcher for the derive input.

These problems could be addressed by taking an approach like derive-deftly's: don't have the user write a matcher for the struct definition, and instead give them pre-canned bindings for the pieces of the input. In principle something like derive-deftly could be done in-language.

I think rust-lang's development effort could probably be better spent by trying to improve macro_rules's limitations (some of this has already been done, but there's more needed; the ambiguity rule is particularly troublesome and also nontrivial to get rid of). Until that's done, there's little value in providing merely a mildly improved invocation syntax.

IOW: In this area, the hard problems need to be tackled before we start adding sugar.

@kennytm
Copy link
Member

kennytm commented Oct 23, 2024

@ijackson I believe that derive-deftly approach is being addressed by #3714.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
I-lang-nominated Indicates that an issue has been nominated for prioritizing at the next lang team meeting. T-lang Relevant to the language team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.