Ensure stable IDs for dfn refs in domintro sections #2094

Open
wants to merge 5 commits into
base: main

sideshowbarker
Contributor

@sideshowbarker sideshowbarker commented Jun 29, 2021

MDN, and possibly other sites, need to link not just to the normative dfn terms/targets in specs but also (in fact, often) to secondary references to those terms in other spec sections: in particular, informative (non-normative) sections written specifically for web developers.

WHATWG specs follow a convention of using class=domintro to mark up such written-for-web-developers sections, and some other specs (e.g., the IndexedDB spec) have also adopted that convention.

To facilitate having stable links to term references in those class=domintro sections, this change ensures that those references are output with IDs that have -dev appended, rather than IDs prefixed with ref-for-.

It’s true that some references in class=domintro sections may still end up with -dev-②, etc., suffixes (that is, with circled digits appended after -dev). But that will likely happen mostly for ancillary references that sites such as MDN won’t need to link to anyway; the “primary” initial references that MDN and others do need to link to will most likely end up with plain -dev suffixes, with no circled digits appended.

The changes to the test-case output in this PR branch bear that out: the references that MDN needs to link to do in fact end up with stable -dev suffixes, with no circled digits appended.

Without this change, even term references/links inside class=domintro sections are output with IDs of the form ref-foo-①, etc., just like the IDs output for references/links outside class=domintro sections.

So this change helps make it likely that primary references/links to terms inside class=domintro sections end up with stable IDs useful for referencing from MDN and other places — rather than ref-foo-②, etc., IDs that might change if some new reference to a term is added to a spec somewhere preceding a class=domintro section where the term is referenced.
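
To make the stability argument concrete, here is a minimal sketch of the two ID schemes. It is a simplified model, not Bikeshed's actual code: the names `circled` and `assign_ref_ids`, the `use_dev` flag, and the exact `-①` suffix format are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def circled(n: int) -> str:
    """Return the circled digit for 1..20 (U+2460 is the circled digit one)."""
    if not 1 <= n <= 20:
        raise ValueError("this sketch only supports 1..20")
    return chr(0x2460 + n - 1)


def assign_ref_ids(refs: List[Tuple[str, bool]], use_dev: bool = True) -> List[str]:
    """Assign IDs to references given as (term_id, in_domintro) in document order.

    The first reference with a given base ID keeps it; later ones get a
    circled-digit suffix (assumed de-dupe behaviour for this sketch).
    """
    seen: Dict[str, int] = {}
    ids = []
    for term, in_domintro in refs:
        if use_dev and in_domintro:
            base = f"{term}-dev"
        else:
            base = f"ref-for-{term}"
        count = seen.get(base, 0)
        seen[base] = count + 1
        ids.append(base if count == 0 else f"{base}-{circled(count)}")
    return ids


# One ordinary reference to "foo", then one inside a domintro section.
before = [("foo", False), ("foo", True)]
# The same spec after someone adds another ordinary reference earlier on.
after = [("foo", False)] + before

# Old scheme: the domintro reference's ID shifts from -① to -②.
print(assign_ref_ids(before, use_dev=False)[-1])
print(assign_ref_ids(after, use_dev=False)[-1])

# New scheme: the domintro reference stays "foo-dev" in both versions.
print(assign_ref_ids(before)[-1])
print(assign_ref_ids(after)[-1])
```

In this model, the domintro ID only becomes numbered if the same term is referenced more than once inside domintro sections.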

cc @foolip


See the related discussion at mdn/browser-compat-data#11088.

For MDN and possibly other sources, it’s not desirable to link from MDN to the spec target IDs for defined terms (<dfn>s) themselves (that is, the places which give implementors the normative definitions for the terms); it’s instead desirable to link from MDN to the spec target IDs for the places in the spec that for any given defined term provide non-normative information written specifically for web developers (that is, the sections for which the class=domintro convention is used in WHATWG specs and in other specs).

And for class=domintro sections, IDs in the ref-foo-① form are suboptimal — since the digit suffix part may change if some new reference to a term is added to a spec somewhere preceding a class=domintro section where the term is referenced.

So those ref-foo-① IDs rightly need to be considered unstable. And so that’s why it’d be better to have stable -dev IDs if we can get them. And the test-case output in this PR branch seems to show that in fact we can successfully get them.

And while this change is particularly useful for WHATWG specs, which all use the class=domintro convention (in particular, the DOM, Streams, and Fetch specs, which each use it quite heavily), we can also say the following:

  • the IndexedDB spec and the other specs with changed test-case output in this PR also already use the class=domintro convention, so this change will give the same benefits for those specs; and
  • some other specs, e.g., the WebUSB and Device Orientation specs, already have domintro-like parts (written specifically for web developers) that could benefit from this same change if they were to adopt the class=domintro convention

@sideshowbarker sideshowbarker force-pushed the sideshowbarker/stable-dev-dominto-id-values branch from ebec859 to 3c32d8c Compare June 29, 2021 07:21
@sideshowbarker sideshowbarker changed the base branch from master to main June 29, 2021 07:30
This change ensures that dfn references/links inside sections which follow
the class=domintro convention are output with IDs ending with “-dev” —
and most likely, without any additional ①, ②, etc., suffix.

Otherwise, without this change, dfn references/links inside class=domintro
sections are output with IDs of the form “ref-foo-①”, etc. — just as the
IDs output for such references/links outside of class=domintro sections.

So this change helps make it likely that dfn references/links inside
class=domintro sections end up with stable IDs useful for referencing from
MDN and other places, rather than “ref-foo-②”, etc., IDs that might change
if some new reference to a term is added to a spec somewhere preceding a
class=domintro section where the term is referenced.

Relates to mdn/browser-compat-data#11088
@sideshowbarker sideshowbarker force-pushed the sideshowbarker/stable-dev-dominto-id-values branch from 3c32d8c to e9eb9ea Compare June 29, 2021 07:33
@sideshowbarker sideshowbarker force-pushed the sideshowbarker/stable-dev-dominto-id-values branch from c5bae5a to 3ce8770 Compare June 29, 2021 11:12
@foolip
Collaborator

foolip commented Jun 29, 2021

I haven't reviewed the change, leaving that to @tabatkins, but I like the approach, thanks @sideshowbarker!

Collaborator

@tabatkins tabatkins left a comment

You cite the possibility of these refs still getting number-suffixed; do we want to allow the author to specify an attribute on the domintro ancestor whose value will be used to further dedup the IDs in that block? If it's absent we'd just continue to use the approach already outlined here.

@@ -45,7 +45,10 @@ def addDfnPanels(doc, dfns):
    for i, el in enumerate(els):
        refID = el.get("id")
        if refID is None:
            refID = f"ref-for-{id}"
            if hasAncestor(el, lambda x: hasClass(x, "domintro")):
Collaborator

I'd rather abstract this to a method in h.dom, just to give it a bit more semantic meaning.

Collaborator

Actually, since this full conditional block is duplicated in unsortedtext, it would probably be good to shift the entire thing over, to a generateRefId method or something. (Feel free to use a better name.)

Contributor Author

> Actually, since this full conditional block is duplicated in unsortedtext, it would probably be good to shift the entire thing over, to a generateRefId method or something. (Feel free to use a better name.)

OK will do — and to be clear, this generateRefId method should best be in h.dom?

Contributor Author

> I'd rather abstract this to a method in h.dom, just to give it a bit more semantic meaning.

OK, done now
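
For readers following along, the refactoring discussed above might look roughly like the sketch below. This is not the code from the PR branch: it uses the standard library's `xml.etree.ElementTree` rather than Bikeshed's own DOM helpers, and `has_class`, `in_domintro`, `generate_ref_id`, and the explicit `parents` map are hypothetical names introduced only for this example.

```python
import xml.etree.ElementTree as ET


def has_class(el, name):
    """True if the element's class attribute contains the given class name."""
    return name in el.get("class", "").split()


def in_domintro(el, parents):
    """Walk up the ancestor chain looking for a class=domintro element."""
    node = parents.get(el)
    while node is not None:
        if has_class(node, "domintro"):
            return True
        node = parents.get(node)
    return False


def generate_ref_id(el, dfn_id, parents):
    """Single place that decides the (pre-dedupe) ID for a dfn reference."""
    existing = el.get("id")
    if existing is not None:
        return existing  # author-supplied IDs are left alone
    if in_domintro(el, parents):
        return f"{dfn_id}-dev"
    return f"ref-for-{dfn_id}"


root = ET.fromstring(
    '<body><p><a href="#foo">foo</a></p>'
    '<dl class="domintro"><dt><a href="#foo">foo</a></dt></dl></body>'
)
# ElementTree has no parent pointers, so build a child-to-parent map once.
parents = {child: parent for parent in root.iter() for child in parent}

ids = [generate_ref_id(a, "foo", parents) for a in root.findall(".//a")]
print(ids)  # ['ref-for-foo', 'foo-dev']
```

Having one helper means both call sites (the dfn panels code and unsortedtext) would share the same rule instead of duplicating the conditional.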

@sideshowbarker
Contributor Author

> You cite the possibility of these refs still getting number-suffixed; do we want to allow the author to specify an attribute on the domintro ancestor whose value will be used to further dedup the IDs in that block?

Yes, good point — I’ll add that.

As far as the attribute name, if we want consistency with what we do for the HTML spec, we could call it subdfn. That’s admittedly a bit arcane — but then so is domintro as a class name.

But anyway maybe having consistency with what HTML does isn’t super important.

@sideshowbarker
Contributor Author

> As far as the attribute name, if we want consistency with what we do for the HTML spec, we could call it subdfn. That’s admittedly a bit arcane, but then so is domintro as a class name.

To be a bit more clear about what subdfn does: it’d be a value-less boolean attribute that effectively means “don’t de-dupe the ID generated for this element”. So if the element is a reference to a foo dfn, then subdfn on its own would deterministically ensure the element gets a foo-dev ID with no number suffix.

Anyway, I’ll take a shot at implementing it that way, and see if I can make it work.

For any element with an attribute named “subdfn”, this change ensures
the element’s ID remains stable rather than getting “de-duped”; that is,
this change prevents any element with a “subdfn” attribute from being
among the ones that end up with ①, ②, etc., numbered suffixes appended.
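
As a rough illustration of the behaviour described in that commit message, the sketch below models de-duping in which an element marked subdfn always keeps the un-numbered ID. It is an assumption-laden toy, not the implementation in this branch: `circled`, `dedupe_ids`, the tuple input format, and the `-①` suffix format are all invented for the example, and it does not handle two subdfn elements claiming the same ID.

```python
from typing import Dict, List, Tuple


def circled(n: int) -> str:
    """Return the circled digit for 1..20 (U+2460 is the circled digit one)."""
    if not 1 <= n <= 20:
        raise ValueError("this sketch only supports 1..20")
    return chr(0x2460 + n - 1)


def dedupe_ids(refs: List[Tuple[str, bool]]) -> List[str]:
    """De-dupe IDs given as (base_id, has_subdfn) in document order.

    Elements with subdfn "claim" the bare ID up front, so every other element
    sharing that base ID is numbered, wherever it sits in the document.
    """
    # Pre-seed the counter for claimed IDs so unmarked elements start at 1.
    counts: Dict[str, int] = {base: 1 for base, has_subdfn in refs if has_subdfn}
    ids = []
    for base, has_subdfn in refs:
        if has_subdfn:
            ids.append(base)
            continue
        n = counts.get(base, 0)
        counts[base] = n + 1
        ids.append(base if n == 0 else f"{base}-{circled(n)}")
    return ids


# Two references to foo in domintro sections; the second is the primary one.
print(dedupe_ids([("foo-dev", False), ("foo-dev", False)]))
# -> ['foo-dev', 'foo-dev-①']   (the primary reference is numbered)

print(dedupe_ids([("foo-dev", False), ("foo-dev", True)]))
# -> ['foo-dev-①', 'foo-dev']   (subdfn keeps the stable ID)
```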
@sideshowbarker
Contributor Author

> You cite the possibility of these refs still getting number-suffixed; do we want to allow the author to specify an attribute on the domintro ancestor whose value will be used to further dedup the IDs in that block? If it's absent we'd just continue to use the approach already outlined here.

OK, implemented that using subdfn as the attribute name, and also added info about it to the docs

@tabatkins
Collaborator

So if I'm understanding this code correctly, the presence of subdfn means that the link "claims" the un-numbered version of the dedupe'd ID (thus ensuring that its ID isn't dependent on the presence or absence of other refs earlier in the document), right?

Hm. It seems, then, like people would still need to liberally sprinkle subdfn in their document to actually get anything like a guarantee of stability, right? If they forget, then it's reasonably likely they might still run into a collision; only convention (WHATWG doesn't put many links in its domintro sections besides the one corresponding to the thing being defined) makes the "-dev" switch much more likely to succeed at stability. And even if they do use it, it still depends on site authors manually grabbing the ID from a link that doesn't expose it in a natural way.

I think I'd like to revisit this problem a little more thoroughly and solve it more reliably.


So, the problem at hand is that domintro sections, which provide an informative description of something officially defined somewhere nearby, are often more useful for reference sites to link to, since they're more human-readable for authors. However, domintro blocks don't guarantee that they have an ID or expose that ID for linkability if they have it, and so people end up linking to the ID of the main link instead; that link, however, just has a generated ID which is very much not stable if the document preceding it changes.

So here's another proposal:

  1. Formalize domintro sections a little bit more, ensuring they're fit for this purpose. Require an ID on them, and visibly expose that anchor like we do for headings and such, so it's easy to spot what should be linked to.
  2. Possibly, allow an author to indicate the "primary" link in a domintro section instead; this removes the need to manually provide an ID. Instead, we generate an ID for that link in a guaranteed-stable way and then visibly expose that link on the domintro block. (That is, the visible "§" or whatever links to the indicated link, rather than to the domintro block itself.)
  3. Maybe still do an alternative ref-generation scheme on the rest of the links in a domintro, like what you have here, just so if someone still grabs one of those links manually it's at least more likely that the ID will be stable, even if it's not guaranteed.

Anything else I'm missing that we could address?
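
A minimal sketch of what step 1 of that proposal could look like, assuming a hypothetical `expose_domintro_anchors` helper and a `self-link` class for the visible anchor (neither is taken from this PR). It uses the standard library's `xml.etree.ElementTree` rather than Bikeshed's DOM layer, and only covers the "require an ID and expose it" part, not the primary-link idea in step 2.

```python
import xml.etree.ElementTree as ET


def expose_domintro_anchors(root):
    """Add a visible self-link to each domintro block that has an ID.

    Returns the domintro elements that lack an ID, so the caller can
    report them as errors (the "require an ID" half of the proposal).
    """
    missing = []
    # Snapshot the element list first, since the tree is modified below.
    for el in list(root.iter()):
        if "domintro" not in el.get("class", "").split():
            continue
        block_id = el.get("id")
        if block_id is None:
            missing.append(el)
            continue
        anchor = ET.Element("a", {"class": "self-link", "href": f"#{block_id}"})
        anchor.text = "§"
        el.insert(0, anchor)  # becomes the first child of the block
    return missing


root = ET.fromstring(
    '<body>'
    '<dl class="domintro" id="foo-dev"><dt>foo</dt></dl>'
    '<dl class="domintro"><dt>bar</dt></dl>'
    '</body>'
)
missing = expose_domintro_anchors(root)

print(root[0][0].get("href"))  # '#foo-dev'
print(len(missing))            # 1 (the block without an ID)
```

With an author-supplied ID on the block itself, the link target no longer depends on how many references to the term appear earlier in the document.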

@sideshowbarker
Contributor Author

> So if I'm understanding this code correctly, the presence of subdfn means that the link "claims" the un-numbered version of the dedupe'd ID (thus ensuring that its ID isn't dependent on the presence or absence of other refs earlier in the document), right?

Yes, that’s it exactly.

> It seems, then, like people would still need to liberally sprinkle subdfn in their document to actually get anything like a guarantee of stability, right?

Right, yeah — that’s the downside.

> I think I'd like to revisit this problem a little more thoroughly and solve it more reliably.

Sounds great. I’m definitely not wedded to the particular implementation of this that I wrote up.

> 1. Formalize domintro sections a little bit more, ensuring they're fit for this purpose. Require an ID on them, and visibly expose that anchor like we do for headings and such, so it's easy to spot what should be linked to.

✅ All sounds reasonable

> 2. Possibly, allow an author to indicate the "primary" link in a domintro section instead; this removes the need to manually provide an ID. Instead, we generate an ID for that link in a guaranteed-stable way and then visibly expose that link on the domintro block. (That is, the visible "§" or whatever links to the indicated link, rather than to the domintro block itself.)

✅ I very much like the idea of exposing a visible indicator — which’ll help encourage people to use that anchor for citing elsewhere.

I think we do need some way for authors to indicate which reference is the primary link, rather than just taking the first reference in document order within a domintro section. The reason for that (which I maybe mentioned earlier) is that we do in fact have a number of domintro sections in which one or more references precede, in document order, what needs to be the “primary” reference.

> 3. Maybe still do an alternative ref-generation scheme on the rest of the links in a domintro, like what you have here, just so if someone still grabs one of those links manually it's at least more likely that the ID will be stable, even if it's not guaranteed.

✅ OK — though perhaps we’re going to end up finding there will be no need for that if we get the mechanism working that you outlined in the steps #1 and #2.

> Anything else I'm missing that we could address?

I don’t think you missed anything.

3 participants