Support CDDL marking and linking #2977

Draft
wants to merge 5 commits into
base: main

Conversation

tidoust (Contributor) commented Dec 3, 2024

I'm creating this pull request as a draft because there are still a couple of things missing. Before I finalize the PR, I'd appreciate feedback on whether this does what people want for #2072, and whether it does it in a way that works for everyone. See below for more specific questions.

TL;DR

This adds:

  1. CDDL autolinking capabilities (and highlighting but that was sort of already the case).
  2. A new shorthand notation for CDDL autolinks: {^foo^}
  3. A CDDL index, generated at the end of the spec, split per CDDL module when needed.

Implementation notes

I copied most of the logic from the code used to process IDL blocks.

As opposed to IDL definitions, CDDL definitions are not exported by default because CDDL is typically not imported/exported between specs (there is ongoing work on a CDDL module structure at IETF, but that would be done through explicit "import" statements in any case).

Support for CDDL definitions means 4 new definition types get introduced:

  • cddl-type: roughly the equivalent of an IDL interface. Type definitions do not have a data-dfn-for attribute.
  • cddl-key: roughly the equivalent of an IDL attribute. Key definitions always have a data-dfn-for attribute that links back to a CDDL type.
  • cddl-value: roughly the equivalent of an enum-value in IDL. Value definitions always have a data-dfn-for attribute that links back to a CDDL type or to a CDDL key.
  • cddl-parameter: used for generic parameter names (no known spec uses CDDL generics for the time being).
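
The rules above can be sketched as a small data model. This is only an illustration, not the code in the PR: the `CddlDfn` class and the `check_dfn` helper are hypothetical names, and since the description does not say whether `cddl-parameter` definitions carry a `data-dfn-for` attribute, the sketch leaves that case unchecked.

```python
from dataclasses import dataclass
from typing import Optional

# The four definition types listed in the PR description.
CDDL_DFN_TYPES = {"cddl-type", "cddl-key", "cddl-value", "cddl-parameter"}


@dataclass(frozen=True)
class CddlDfn:
    """Hypothetical record for one CDDL definition."""
    name: str
    dfn_type: str
    dfn_for: Optional[str] = None  # stands in for the data-dfn-for attribute


def check_dfn(dfn):
    """Return a list of problems with the definition (empty if none)."""
    problems = []
    if dfn.dfn_type not in CDDL_DFN_TYPES:
        problems.append(f"unknown definition type: {dfn.dfn_type}")
    elif dfn.dfn_type == "cddl-type" and dfn.dfn_for is not None:
        # Type definitions do not have a data-dfn-for attribute.
        problems.append("cddl-type must not have data-dfn-for")
    elif dfn.dfn_type in {"cddl-key", "cddl-value"} and dfn.dfn_for is None:
        # Keys link back to a type; values link back to a type or a key.
        problems.append(f"{dfn.dfn_type} requires data-dfn-for")
    # cddl-parameter: not specified in the description, so not checked here.
    return problems
```

For instance, `check_dfn(CddlDfn("param", "cddl-key", dfn_for="LocalCommand"))` reports no problem, while the same key without `dfn_for` does.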

The definition types are prefixed with cddl- to avoid collisions with types used for other purposes (e.g., value in CSS).

Once CDDL definitions start appearing in the cross-references database, it may become useful to add a fifth type, cddl-operator, to cover control operators such as .default or .within that may appear in type definitions, and to link them back to their definition in RFC 8610.

The code collects CDDL definitions to produce a CDDL index at the end of the spec. To accommodate specs like WebDriver BiDi that define two sets of CDDL (remote end and local end), it is possible to associate CDDL definitions with one or more CDDL modules:

  1. Any CDDL module must be defined somewhere in the spec with a <dfn> with a data-cddl-module attribute set to a shortname for the CDDL module.
  2. A CDDL block lists the modules it belongs to in a data-cddl-module attribute set to a comma-separated list of CDDL module shortnames. If a CDDL block does not have that attribute, the code treats it as defined in all CDDL modules.

For example:

The <dfn data-cddl-module="local">local end definitions</dfn> contain foo.
The <dfn data-cddl-module="remote">remote end definitions</dfn> contain bar.

<xmp class=cddl data-cddl-module=local>
LocalCommand = {
  param: common
}
</xmp>

<xmp class=cddl data-cddl-module=remote>
RemoteCommand = {
  param: common
}
</xmp>

<xmp class=cddl data-cddl-module="local,remote">
common = any
</xmp>

The CDDL index is split per module, using the <dfn> text as title for each module.
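
The grouping logic can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's implementation: `group_by_module` is a hypothetical helper, blocks are modeled as `(attribute value or None, CDDL text)` pairs, and raising an error on an unknown module shortname is a choice made for the sketch.

```python
def group_by_module(modules, blocks):
    """Build a per-module CDDL index.

    modules: dict mapping a module shortname to its <dfn> text (the title).
    blocks:  list of (data_cddl_module, cddl_text) pairs, where
             data_cddl_module is the raw attribute value or None.
    Returns a dict mapping each module title to its list of CDDL texts.
    """
    index = {shortname: [] for shortname in modules}
    for attr, text in blocks:
        if attr is None:
            # No attribute: the block is treated as defined in all modules.
            targets = list(modules)
        else:
            targets = [name.strip() for name in attr.split(",") if name.strip()]
        for shortname in targets:
            if shortname not in index:
                # Assumption of this sketch: unknown shortnames are an error.
                raise ValueError(f"unknown CDDL module: {shortname}")
            index[shortname].append(text)
    return {modules[shortname]: texts for shortname, texts in index.items()}


# The example from the description, reduced to rule names for brevity.
modules = {"local": "local end definitions", "remote": "remote end definitions"}
blocks = [
    ("local", "LocalCommand = { param: common }"),
    ("remote", "RemoteCommand = { param: common }"),
    ("local,remote", "common = any"),
]
cddl_index = group_by_module(modules, blocks)
```

With this input, `common = any` ends up under both titles, after the command specific to each end.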

Note: Even when modules are used, CDDL definitions in a spec are part of a single namespace, meaning that a foo rule cannot be defined differently within a single spec for two different CDDL modules. I believe that's fine for all specs that define CDDL, and that keeps the autolinking logic simple.

CDDL parsing is done through a hand-made CDDL parser that follows the CDDL
grammar defined in RFC 8610, currently available as tidoust/cddlparser.

That parser started as a port of the cddl Node.js parser but has since diverged significantly from it, both to stay closer to the grammar and to allow re-serialization and marking up of the tree in a way that preserves whitespace and comments. I tested the parser against CDDL extracts from IETF RFCs and a few W3C specs.

To ease authoring, a new shorthand notation gets introduced to reference CDDL:
{^foo^} is an autolink to a CDDL definition. FWIW, the shorthand was chosen
to mean "shortcut for an autolink to CDDL code" on the grounds that:

  • {} would hint at a code block (as for IDL blocks)
  • ^ means "cut" in CDDL. Granted, the notion of "cut" in CDDL has nothing to do with a shorthand. Anyway.
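
To make the shorthand concrete, here is a rough sketch of how `{^foo^}` could be recognized in prose and turned into a link placeholder. It is not how Bikeshed actually processes shorthands: the `expand_cddl_shorthand` function, the regular expression, and the `data-link-type="cddl"` output are all assumptions made for illustration.

```python
import re
from html import escape

# Matches {^term^}, where term contains no "^" and no braces.
CDDL_SHORTHAND = re.compile(r"\{\^([^\^{}]+)\^\}")


def expand_cddl_shorthand(text):
    """Replace each {^term^} with a placeholder autolink element."""
    def to_link(match):
        term = match.group(1).strip()
        # Hypothetical output markup; the real attribute values may differ.
        return f'<a data-link-type="cddl">{escape(term)}</a>'
    return CDDL_SHORTHAND.sub(to_link, text)


print(expand_cddl_shorthand("The {^LocalCommand^} rule uses {^common^}."))
```

Text that does not match the pattern, including IDL shorthands such as `{{foo}}`, is left untouched by this sketch.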

CDDL type definitions can become arbitrarily convoluted, so the code does not attempt to be too smart and won't autolink definitions that are too complex.

CDDL defines a number of types in the standard prelude, such as int, tstr, or bool. RFC 8610 has no specific anchor for individual prelude types, so the code simply does not link these types back to RFC 8610 for now.
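
A sketch of that behavior, assuming a simple name-based check: `PRELUDE_TYPES` below is a partial list taken from the standard prelude in RFC 8610, and `should_autolink` is a hypothetical helper, not a function from the PR.

```python
# Partial list of names from the RFC 8610 standard prelude.
PRELUDE_TYPES = frozenset({
    "any", "uint", "nint", "int", "bstr", "bytes", "tstr", "text",
    "float16", "float32", "float64", "float", "number",
    "false", "true", "bool", "nil", "null", "undefined",
})


def should_autolink(name, spec_definitions):
    """Link a referenced name only if the spec defines it itself."""
    if name in PRELUDE_TYPES:
        # No per-type anchor exists in RFC 8610, so no link is produced.
        return False
    return name in spec_definitions


spec_definitions = {"LocalCommand", "RemoteCommand", "common"}
links = [n for n in ["common", "tstr", "bool"] if should_autolink(n, spec_definitions)]
```

Here only `common` would be linked; `tstr` and `bool` are rendered without a link.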

Still missing

Before the PR can be merged, I still need to:

  • Publish the parser as a PyPI package (once I've figured out how to do that ;)) so that it may be properly imported. The proposed code won't work on any computer other than mine for now because it uses a local import. This problem also explains the current linter failures. Edit: cddlparser package published and tested with Python >=3.9
  • Update the doc to reflect the CDDL capabilities.

(Editorial updates may also be needed afterwards to specs that currently define CDDL. I'm happy to help with that.)

Questions

  • Does the overall approach look sound?
  • What about the new CDDL definition types? Should we rather only have a generic cddl type without creating more specific ones? (CDDL really is all about "types", the nuance between different constructs is not always super explicit)
  • It seemed useful to autolink enumeration values as done for IDL, but then there's nothing that distinguishes an enumeration from a single literal text value in CDDL, so the code will end up autolinking single values as well (things like the "browser.close" in WebDriver BiDi). Is it too much?
  • What about the new shorthand syntax? Should another shorthand be used? Should we start without a shorthand?
  • Should standard types link back to the prelude section in RFC 8610? Ideally, they would become available as exported definitions in the cross-references database, but again the problem is that there is no specific anchor in the RFC for them. Having links back to the RFC would also add it to the list of normative references, which would be a good thing as well.
  • Is the data-cddl-module mechanism to split CDDL per module in the CDDL index suitable?
  • Regarding the parser, would it be a good idea to move it under the speced organization on GitHub (provided the Speced CG agrees with it)? I ended up writing it because I didn't see any immediate way to proceed without a parser but then I'm neither good at writing Python code nor an expert in CDDL, so I'd happily share that mostly-shaven yak ;) I'm happy to maintain the code though.
  • Anything I forgot?

This makes Bikeshed process CDDL blocks à la `<pre class=cddl>` as described
in speced#2072 to:
- add highlighting (done by Pygments, the code merely sets the right class)
- wrap terms in `<dfn>` and `<a>` automatically, making it possible to
define them in, or reference them from, the rest of the prose.

Most of the logic is copied from the logic used to process IDL blocks.

As opposed to IDL definitions, CDDL definitions are not exported by default as
most CDDL definitions only apply to the underlying spec.

Support for CDDL definitions means 4 new definition types get introduced:
- `cddl-type`: roughly the equivalent of an IDL interface. Type definitions do
not have a `data-dfn-for` attribute.
- `cddl-key`: roughly the equivalent of an IDL attribute. Key definitions
always have a `data-dfn-for` attribute that links back to a CDDL type.
- `cddl-value`: roughly the equivalent of an enum-value in IDL. Value
definitions always have a `data-dfn-for` attribute that links back to a CDDL
type or to a CDDL key.
- `cddl-parameter`: used for generic parameter names (noting that no known spec
uses CDDL generics for the time being).

The definition types are prefixed with `cddl-` to avoid collisions with
types used for other purposes (e.g., `value` in CSS).

The code also collects CDDL definitions to produce a CDDL index at the end of
the spec. To accommodate specs like WebDriver BiDi that define two sets of
CDDL (remote end and local end), a mechanism gets added to associate CDDL
definitions with a given module:
1. The CDDL module must be defined with a `<dfn>` with a `data-cddl-module`
attribute set to a shortname for the CDDL module
2. CDDL blocks must add a `data-cddl-module` attribute set to a comma-separated
list of CDDL module shortnames they belong to. If a CDDL block does not have
that attribute, the code considers it is defined in all CDDL modules.

The index is split per module, using the `<dfn>` text as title for each module.

Note: Even when modules are used, CDDL definitions in a spec are part of the
same namespace, meaning a `foo` rule cannot be defined differently within a
single spec for two different CDDL modules.

CDDL parsing is done through a hand-made CDDL parser that follows the CDDL
grammar defined in RFC 8610, currently sitting under:
 https://github.com/tidoust/cddlparser

To ease authoring, a new shorthand notation gets introduced to reference CDDL:
`{^foo^}` is an autolink to a CDDL definition. FWIW, the shorthand was chosen
to mean "shortcut to CDDL code" on the grounds that:
- `{}` indicates a code block (for IDL)
- `^` means "cut" in CDDL

CDDL type definitions can become somewhat convoluted, so the code does not
attempt to be too smart and won't autolink definitions that are too complex.

`mypy` started reporting type issues as it can now look into the `cddlparser`
package. This update bumps the version of `cddlparser` to get better typing
info, and adds assertions when appropriate.

The previous version was not fully compatible with Python 3.9 and Python 3.10.
The `cddlparser` project now runs tests under multiple versions of Python to
guarantee compatibility.

tidoust (Contributor, Author) commented Dec 12, 2024

@tabatkins, coding-wise, this should be ready for review.

I published the CDDL parser as a PyPI package, adapted it to run with Python >= 3.9, and linted the code here. The parser is also more closely aligned with the CDDL grammar and reports an error when the CDDL syntax is invalid, which seems like a good thing.

FYI, there's an issue right now with the WebDriver BiDi spec: it uses a <pre class="cddl"> block to reference the CDDL rule EmptyResult defined elsewhere, which creates an invalid CDDL block. That explains why the test update for WebDriver BiDi features a "CDDL syntax error" message.

If the PR looks reasonable, I'll update the documentation as well.

@tidoust tidoust requested a review from tabatkins December 12, 2024 12:33