Draft versions need to relate to each other #6

Closed
strogonoff opened this issue Dec 31, 2021 · 30 comments

@strogonoff

  • We have this draft: draft-tsou-softwire-6rd-multicast-01
  • We also have this draft: draft-tsou-softwire-6rd-multicast-02
  • It looks like the second document supersedes the first document, but there is no way to discover the relationship
@strogonoff strogonoff changed the title Draft versions don’t relate to each other. Draft versions don’t relate to each other Dec 31, 2021
@ronaldtse

@strogonoff in IETF Internet-Drafts, the last two numeric digits provide a sequential "draft number", as described here: ietf-ribose/bibxml-service#13

i.e.

Pattern 2: https://{hostname}/public/rfc/bibxml-ids/reference.I-D.draft-{example-name}-{draft-number}.xml

The draft number is a strictly increasing number validated by the Datatracker service when first uploaded. From the filename, it can be immediately inferred that draft-tsou-softwire-6rd-multicast-02 supersedes draft-tsou-softwire-6rd-multicast-01.

However, some drafts start with a draft number 00. If we know that a 00 document exists, we know 01 supersedes it. If we only have 01, we do not know if 00 exists.
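As a rough illustration of inferring supersession from the filename alone, here is a minimal Python sketch. The helper names `split_draft_name` and `supersedes` are hypothetical, and the sketch assumes the draft number is always exactly two digits at the end of the name.

```python
import re
from typing import Tuple

# Assumption: a versioned draft name is "draft-<name>-<nn>" with a two-digit number.
_VERSIONED = re.compile(r"^(draft-.+)-(\d{2})$")


def split_draft_name(name: str) -> Tuple[str, int]:
    """Split a versioned draft name into (series name, draft number)."""
    match = _VERSIONED.match(name)
    if match is None:
        raise ValueError(f"not a versioned draft name: {name!r}")
    return match.group(1), int(match.group(2))


def supersedes(newer: str, older: str) -> bool:
    """True if both names belong to one series and `newer` has the higher number."""
    new_series, new_number = split_draft_name(newer)
    old_series, old_number = split_draft_name(older)
    return new_series == old_series and new_number > old_number
```

Because the pattern is anchored on the last two digits, a name such as draft-farmer-6man-exceptions-64-00 splits into the series draft-farmer-6man-exceptions-64 and the number 0.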

@strogonoff
Author

@ronaldtse Why was this closed? There are documents that supersede each other. Relaton provides a field for that, but it’s not being used.

This shouldn’t be difficult at the GHA stage. While parsing 01, check whether version 00 exists and, if so, create a relation. Different approaches are possible (a single pass with drafts sorted by ID in advance, or a double pass that fills in relations after initial metadata is formed).

@strogonoff
Author

In fact, no, we should not even check if versions exist. If we know at build time that draft 01 supersedes 00, it is semantically correct to link to 00. Maybe it doesn’t exist yet, but that’s not the problem of Relaton.
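A minimal Python sketch of that idea, deriving the superseded version purely from the name without checking whether it exists. The function name `implied_predecessor` is hypothetical, and a two-digit draft number is assumed.

```python
import re
from typing import Optional

# Assumption: versioned names look like "draft-<name>-<nn>" with two digits.
_VERSIONED = re.compile(r"^(draft-.+)-(\d{2})$")


def implied_predecessor(name: str) -> Optional[str]:
    """Return the name of the version this draft would supersede, or None for -00.

    No lookup is performed: the returned name may not exist in the data set.
    """
    match = _VERSIONED.match(name)
    if match is None:
        raise ValueError(f"not a versioned draft name: {name!r}")
    series, number = match.group(1), int(match.group(2))
    if number == 0:
        return None
    return f"{series}-{number - 1:02d}"
```

The trade-off is that the relation can point at a version that was never issued or is missing from the data set.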

@strogonoff
Author

This is important from a UX standpoint. The BibXML service provides a way to go back/forward to superseded/superseding versions, and missing relations mean readers don’t get this opportunity with Internet Drafts.

@strogonoff strogonoff reopened this Jan 24, 2022
@ronaldtse

@strogonoff can you explain what we need to do here? You want Relaton to analyse the supersession relationships? If so please help file a ticket at relaton-ietf.

@ronaldtse ronaldtse added the question Further information is requested label Feb 18, 2022
@ronaldtse ronaldtse changed the title Draft versions don’t relate to each other Draft versions need to relate to each other Feb 18, 2022
@strogonoff
Author

> @strogonoff can you explain what we need to do here? You want Relaton to analyse the supersession relationships? If so please help file a ticket at relaton-ietf.

I think they should, because they supersede each other, but I have filed this as a question for a reason.

@andrew2net
Contributor

I don't understand what the question is. If we need to implement the relations for Internet-Drafts, then ok, it's possible to do.

@TonyLHansen

#11 is absolutely correct: there are two patterns of the names

Legacy pattern(s) to implement:

Pattern 1: https://{hostname}/public/rfc/bibxml-ids/reference.I-D.{example-name}.xml
Pattern 2: https://{hostname}/public/rfc/bibxml-ids/reference.I-D.draft-{example-name}-{draft-number}.xml

(The draft number will sometimes be referred to as the sequence number or generation number.)
(Note: The "draft-" prefix (after "reference.I-D.") is an important part of the differentiator for the patterns to indicate that there IS a draft number at the end.)

I did a check of the IDs collected on tools.ietf.org. There are 36016 drafts with -00, and 22662 with -01, sequence numbers. Out of the -01 drafts, there are only 268 where a -00 is not also saved. So about 1.2% of the drafts with a -01 saved there (268 of 22662) did not have a -00 preceding it. So with roughly 98.8% certainty, I can claim that series almost always start with a draft number of -00.
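As a quick check of the arithmetic, the share depends on which count is used as the denominator. The snippet below only uses the three counts quoted above; the variable names are illustrative.

```python
drafts_with_00 = 36016   # drafts saved with a -00
drafts_with_01 = 22662   # drafts saved with a -01
orphan_01 = 268          # -01 drafts with no saved -00

# Share of -01 drafts that lack a -00.
share_of_01 = orphan_01 / drafts_with_01
# The same count measured against the number of -00 drafts instead.
share_of_00 = orphan_01 / drafts_with_00

print(f"{share_of_01:.2%}")  # 1.18%
print(f"{share_of_00:.2%}")  # 0.74%
```

Either way, the conclusion that a series almost always starts at -00 holds.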

However, the IDs collected on tools.ietf.org are not complete. There are cases where, for example, a -07 is stored, but the Datatracker has evidence of -00 through -06 existing. On the flip side, there are a number of drafts that tools.ietf.org has that the Datatracker doesn't.

We can say with confidence that there is a definite relationship between -00 and -01, between -01 and -02, and so on.

In the rfc-index, there are some documents that were assigned numbers, but were never issued. They are still catalogued, but the data for them says "Not issued".

I think the best path forward is to assume that the relationship exists, but in some strange cases, a given sequence number might not have been issued or is missing from the various databases.

@strogonoff
Author

strogonoff commented Feb 20, 2022

@andrew2net: I filed this to at least clarify how things should work, even if nothing is to be done. I work on a service that provides access to the data, but I am not deeply familiar with what the data should look like or with the organizational specifics.

In my view, Internet Draft versions looked like documents that supersede each other. If so, they probably should be related this way, and if e.g. no data for a previous version exists (in cases like the ones @TonyLHansen pointed out) then the relation could be empty.

Or maybe Internet Draft versions are actually a single document, just with version history exposed as separate bibliographic items. That would imply that in our data a single document does not mean a single bibliographic item, and this should be clarified. (The service already kind of allows this, by putting multiple bibliographic items with the same identifier, which I-D versions have, on the same page, but whether that is a good design decision or a workaround for the lack of relationships is unclear.) Then adding relations could be conceptually wrong? I don’t know. It’s a subtle distinction…

@TonyLHansen

The drafts do form a series of documents, with each version superseding the previous one in the series. There is a definite relationship.

Each version can also be individually referenced, allowing us to reference something that was said specifically in (say) version -03 of the draft, and something else that was said specifically in (say) version -17.

I don't know if it affects the work here, but there are also relationships between series when drafts become adopted by working groups, or divorced from working groups. The datatracker knows most of this data. These relationships probably do NOT need to be stored in this database.

@ronaldtse

Thanks @TonyLHansen, agreed that we should have a main "I-D" that groups the versions together whenever possible.

That said, there is a potential issue with recognizing the name of the document. There are 200 documents that have a name ending with '-\d+'.

e.g.

  • reference.I-D.draft-weis-gdoi-iec62351-9-00.xml
  • reference.I-D.draft-schulzrinne-sip-911-01.xml

If we limit the pattern to end with \-\d\d, we still have 49:

reference.I-D.draft-farmer-6man-exceptions-64-00.xml
reference.I-D.draft-farmer-6man-exceptions-64-01.xml
reference.I-D.draft-farmer-6man-exceptions-64-02.xml
reference.I-D.draft-farmer-6man-exceptions-64-03.xml
reference.I-D.draft-farmer-6man-exceptions-64-04.xml
reference.I-D.draft-farmer-6man-exceptions-64-05.xml
reference.I-D.draft-farmer-6man-exceptions-64-06.xml
reference.I-D.draft-farmer-6man-exceptions-64-07.xml
reference.I-D.draft-farmer-6man-exceptions-64-08.xml
reference.I-D.draft-farmer-6man-exceptions-64-09.xml
reference.I-D.draft-farmer-6man-routing-64-00.xml
reference.I-D.draft-farmer-6man-routing-64-01.xml
reference.I-D.draft-farmer-6man-routing-64-02.xml
reference.I-D.draft-ietf-16ng-ip-over-ethernet-over-802-dot-16-12.xml
reference.I-D.draft-ietf-ipsec-ah-hmac-md5-96-00.xml
reference.I-D.draft-ietf-ipsec-ah-hmac-sha-1-96-00.xml
reference.I-D.draft-ietf-ipsec-auth-hmac-md5-96-02.xml
reference.I-D.draft-ietf-ipsec-auth-hmac-ripemd-160-96-03.xml
reference.I-D.draft-ietf-nfsv4-03-00.xml
reference.I-D.draft-ietf-tsvwg-ieee-802-11-00.xml
reference.I-D.draft-ietf-tsvwg-ieee-802-11-01.xml
reference.I-D.draft-ietf-tsvwg-ieee-802-11-02.xml
reference.I-D.draft-ietf-tsvwg-ieee-802-11-03.xml
reference.I-D.draft-ietf-tsvwg-ieee-802-11-04.xml
reference.I-D.draft-ietf-tsvwg-ieee-802-11-05.xml
reference.I-D.draft-ietf-tsvwg-ieee-802-11-06.xml
reference.I-D.draft-ietf-tsvwg-ieee-802-11-07.xml
reference.I-D.draft-ietf-tsvwg-ieee-802-11-08.xml
reference.I-D.draft-ietf-tsvwg-ieee-802-11-09.xml
reference.I-D.draft-ietf-tsvwg-ieee-802-11-10.xml
reference.I-D.draft-ietf-tsvwg-ieee-802-11-11.xml
reference.I-D.draft-mahy-sipping-16-04.xml
reference.I-D.draft-songlee-aes-cmac-96-04.xml
reference.I-D.draft-spaghetti-idr-deprecate-8-9-10-00.xml
reference.I-D.draft-srinivasan-fr-over-mpls-with-frf-16-00.xml
reference.I-D.draft-szigeti-tsvwg-ieee-802-11-00.xml
reference.I-D.draft-szigeti-tsvwg-ieee-802-11-01.xml
reference.I-D.draft-szigeti-tsvwg-ieee-802-11-02.xml
reference.I-D.draft-zhang-mif-api-extension-802-21-00.xml
reference.I-D.draft-zhang-mif-api-extension-802-21-01.xml
reference.I-D.draft-zhang-mif-api-extension-802-21-02.xml
reference.I-D.draft-zhang-mif-api-extension-802-21-03.xml
reference.I-D.draft-zhang-mif-api-extension-802-21-04.xml
reference.I-D.draft-zhang-mif-api-extension-802-21-05.xml
reference.I-D.draft-zhang-mif-api-extension-802-21-06.xml
reference.I-D.draft-zhang-mif-api-extension-802-21-07.xml
reference.I-D.draft-zhang-mif-api-extension-802-21-08.xml
reference.I-D.draft-zhang-mif-api-extension-802-21-09.xml
reference.I-D.draft-zhang-mif-api-extension-802-21-10.xml

The confusion is this: if we are given a document reference reference.I-D.draft-szigeti-tsvwg-ieee-802-11.xml, the service does not necessarily know whether this is a versioned I-D or an unversioned I-D.

Possibly not a big deal for existing documents, since we can index this pattern. However, for new documents, the IETF may want to enforce that an I-D name cannot end in two digits.

> I don't know if it affects the work here, but there are also relationships between series when drafts become adopted by working groups, or divorced from working groups. The datatracker knows most of this data. These relationships probably do NOT need to be stored in this database.

Probably not. This is interesting information but probably not necessary for citation purposes.

@TonyLHansen

This statement is false:

> The confusion is this: if we are given a document reference reference.I-D.draft-szigeti-tsvwg-ieee-802-11.xml, the service does not necessarily know whether this is a versioned I-D or an unversioned I-D.

That is where the "draft-" at the beginning comes into play. It is ONLY present in the versioned names. For the UN-versioned names, "draft-" must NOT be at the beginning of the name. (After "reference.I-D.", of course.)

reference.I-D.draft-zhang-mif-api-extension-802-21-00.xml # versioned
reference.I-D.zhang-mif-api-extension-802-21.xml # unversioned
reference.I-D.zhang-mif-api-extension-802-21-00.xml # NOT ALLOWED
reference.I-D.draft-zhang-mif-api-extension-802-21.xml # NOT ALLOWED

There is NO ambiguity in the reference names.
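One way to read the four examples above is as a lookup rule: the "draft-" prefix decides how the name is parsed, and the two disallowed forms then fail because they resolve to a series or number that does not exist. The following is a minimal Python sketch of that reading. The function `resolve` and the `known` index (series name mapped to the set of issued draft numbers) are hypothetical, not part of any existing service.

```python
import re
from typing import Dict, Optional, Set, Tuple

PREFIX, SUFFIX = "reference.I-D.", ".xml"
# Assumption: versioned names end in a two-digit draft number.
_VERSIONED = re.compile(r"^(draft-.+)-(\d{2})$")


def resolve(filename: str, known: Dict[str, Set[int]]) -> Optional[Tuple[str, Optional[int]]]:
    """Return (series, number) for a valid reference name, or None (i.e. 404)."""
    if not (filename.startswith(PREFIX) and filename.endswith(SUFFIX)):
        return None
    rest = filename[len(PREFIX):-len(SUFFIX)]
    if rest.startswith("draft-"):
        # With the "draft-" prefix the name must be versioned.
        match = _VERSIONED.match(rest)
        if match is None:
            return None
        series, number = match.group(1), int(match.group(2))
        return (series, number) if number in known.get(series, set()) else None
    # Without the prefix the whole remainder is the unversioned series name.
    series = "draft-" + rest
    return (series, None) if series in known else None


known = {"draft-zhang-mif-api-extension-802-21": set(range(11))}
for name in (
    "reference.I-D.draft-zhang-mif-api-extension-802-21-00.xml",
    "reference.I-D.zhang-mif-api-extension-802-21.xml",
    "reference.I-D.zhang-mif-api-extension-802-21-00.xml",
    "reference.I-D.draft-zhang-mif-api-extension-802-21.xml",
):
    print(name, "->", resolve(name, known))
```

Under this reading, the last example is parsed as number 21 of a series named draft-zhang-mif-api-extension-802, which is not in the index, so it is rejected.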

@ronaldtse

@TonyLHansen thank you for the clarification! I have been confused about this all along.

@TonyLHansen

Traditionally, the invalid paths did not work.

When the TCL scripts broke, I "temporarily" switched to a redirect script that fails to block the invalid paths.

But please do reject the invalid paths in the new implementation.

@andrew2net
Contributor

> That said there is a potential issue for recognizing the name of the document. There are 200 documents that have a name that ends with '-\d+'.

@ronaldtse I noticed the Internet-Draft documents have an anchor without a version. For example, the document draft-weis-gdoi-iec62351-9-00 has an anchor I-D.weis-gdoi-iec62351-9. It allows recognizing the version in the name of the document.
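A small Python sketch of that observation: given the document name and its anchor, whatever follows the series name is the version. The function name `version_from_anchor` is hypothetical, and the sketch assumes the anchor is always "I-D." followed by the series name without the "draft-" prefix.

```python
def version_from_anchor(name: str, anchor: str) -> str:
    """Return the version suffix of `name`, using the unversioned anchor as a guide."""
    if not anchor.startswith("I-D."):
        raise ValueError(f"unexpected anchor: {anchor!r}")
    # Assumption: the series name is "draft-" plus the part after "I-D.".
    series = "draft-" + anchor[len("I-D."):]
    if not name.startswith(series + "-"):
        raise ValueError(f"{anchor!r} does not match {name!r}")
    return name[len(series) + 1:]


print(version_from_anchor("draft-weis-gdoi-iec62351-9-00", "I-D.weis-gdoi-iec62351-9"))  # 00
```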

@ronaldtse

Thank you @TonyLHansen for the determination, this will greatly help @strogonoff refine the correct behavior for the BibXML service!

@strogonoff
Author

strogonoff commented Mar 25, 2022

@ronaldtse Regarding this:

> I guess the real confusion is this: in the current implementation, all of these paths work.

Actually, this issue does not really relate to returned XML data, and actually should not affect xml2rfc paths or XML output. The relationships are made use of in GUI only, when users search for and explore documents.

@ronaldtse

ronaldtse commented Mar 26, 2022

> this issue does not really relate to returned XML data, and actually should not affect xml2rfc paths or XML output. The relationships are made use of in GUI only, when users search for and explore documents.

@strogonoff maybe there is some misunderstanding in my comment https://github.com/ietf-ribose/relaton-data-ids/issues/6#issuecomment-1047371606. It does affect the URL path patterns.

These paths work, and are correct:

These paths work right now, but are incorrect and should not work (return a 404 instead):

What I meant is that the BibXML service should reject the last two path patterns, which is what @TonyLHansen requested.

@strogonoff
Author

strogonoff commented Mar 27, 2022

@ronaldtse

> These paths work right now, but are incorrect and should not work (return a 404 instead):

This may be a fine distinction, but I believe the requirement was that preexisting paths should return correct data, while behavior for not-exactly-matching paths was not specified (so if a not-exactly-correct path returns the same data, it does not violate that requirement).

If it is a requirement that non-matching paths must return 404, then some logic at the xml2rfc path compatibility layer needs to be adjusted ASAP.

Based on Tony’s comment requesting

> do reject the invalid paths in the new implementation.

I take it that we need this. Should be done within the upcoming week…

@strogonoff
Author

strogonoff commented Mar 27, 2022

Correction, I think the issue is less global than I initially thought. I’ll just make it so that versioned URLs for I-Ds are rejected, while other xml2rfc-style paths maintain their existing behavior.

(NOTE: this means non-exactly-matching xml2rfc paths may return bibliographic data and not 404. If this is definitely undesirable let me know. It may be tricky to implement since we need to deal with new bibliographic data being available under xml2rfc paths.)

Should be done by Monday (https://github.com/ietf-ribose/bibxml-service/issues/157)

@ronaldtse

Not sure why this is so complicated?

It just means:

  • Unversioned I-D has path pattern: reference.I-D.xxx.xml
  • Versioned I-D has path pattern: reference.I-D.draft-xxx-nn.xml

In the following cases, the path should return 404:

  • Unversioned I-D has path pattern: reference.I-D.draft-xxx.xml
  • Versioned I-D has path pattern: reference.I-D.xxx-nn.xml

This change ONLY applies to I-Ds.

> (NOTE: this means non-exactly-matching xml2rfc paths may return bibliographic data and not 404. If this is definitely undesirable let me know. It may be tricky to implement since we need to deal with new bibliographic data being available under xml2rfc paths.)

This should not happen because by definition, the name of a draft never starts with draft-xxx.

@strogonoff
Author

strogonoff commented Mar 29, 2022

> This change ONLY applies to I-Ds.

Yes, once I understood that, it is a simple change. At first I thought this was a request for all xml2rfc paths. Due to fuzzy matching, it is by design that they may return valid data for more than one path, so inexact paths do not guarantee 404.

I-D behavior is a special case of the above behavior, and a specific provision for I-Ds can be made to return 404 for versioned paths.

@strogonoff
Author

Since https://github.com/ietf-ribose/relaton-data-ids/issues/15 is stalled for now (Nick is against using primary ID and I can’t switch to docnumber), let’s add these superseded/supersedes relations between I-D versions? The GUI needs to give the user a way to navigate to the latest draft, at least by clicking through relations. cc @ronaldtse

@ronaldtse

ronaldtse commented Apr 3, 2022

Sorry I think this thread got sidetracked possibly by my comment a while ago.

What @strogonoff needs here is this:

  • For each bibliographic item, create a supersession relation between draft-xx-{nn} and draft-xx-{nn+1}:
    • draft-xx-{nn} has a relationship of superseded_by to draft-xx-{nn+1}
    • draft-xx-{nn+1} has a relationship of supersedes to draft-xx-{nn}
  • There are three catches we need to account for:
    • The first draft may be 00 or 01 or any number. The first draft does not supersede any other item (for now).
    • The draft number sequence may be disjoint, e.g. 00 then 02. We have to accommodate this situation.
    • An item with the latest draft number is clearly not superseded by any other item.
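
The rules above could be sketched roughly as follows in Python. This is only an illustration, not the relaton-ietf implementation: `build_relations` is a hypothetical name, the output shape (a dict of relation type to target name) is an assumption, and a gap in the sequence is handled by linking each draft to the nearest earlier draft that is actually present.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Set

# Assumption: versioned names look like "draft-<name>-<nn>" with two digits.
_VERSIONED = re.compile(r"^(draft-.+)-(\d{2})$")


def build_relations(names: Iterable[str]) -> Dict[str, Dict[str, str]]:
    """Map each draft name to its supersedes / superseded_by relations."""
    numbers_by_series: Dict[str, Set[int]] = defaultdict(set)
    for name in names:
        match = _VERSIONED.match(name)
        if match is None:
            continue  # not a versioned draft name; nothing to relate
        numbers_by_series[match.group(1)].add(int(match.group(2)))

    relations: Dict[str, Dict[str, str]] = {}
    for series, numbers in numbers_by_series.items():
        ordered = sorted(numbers)
        for number in ordered:
            relations[f"{series}-{number:02d}"] = {}
        # Link neighbours in sorted order, so 00 -> 02 works when 01 is missing.
        for older, newer in zip(ordered, ordered[1:]):
            older_name = f"{series}-{older:02d}"
            newer_name = f"{series}-{newer:02d}"
            relations[older_name]["superseded_by"] = newer_name
            relations[newer_name]["supersedes"] = older_name
    return relations
```

With this approach the first draft present in a series gets no supersedes relation and the last one gets no superseded_by relation, regardless of where the numbering starts.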

These relationships are important for the BibXML service to be able to show an I-D in a series of versions.

@andrew2net can you help implement this in relaton-ietf?

@ronaldtse ronaldtse added enhancement New feature or request and removed question Further information is requested labels Apr 3, 2022
@andrew2net
Contributor

> These relationships are important for the BibXML service to be able to show an I-D in a series of versions.
>
> @andrew2net can you help implement this in relaton-ietf?

@ronaldtse sure, I can, as soon as I finish relaton-w3c and relaton-bipm.

@ronaldtse

@strogonoff this issue belongs in relaton-ietf. I'm creating an issue there.

@ronaldtse

> Since #15 is stalled for now (Nick is against using primary ID and I can’t switch to docnumber), let’s add these superseded/supersedes relations between I-D versions? The GUI needs to give the user a way to navigate to the latest draft, at least by clicking through relations. cc @ronaldtse

I've created the corresponding issues that deal with this. I believe this is sufficient for the current use case, let me know if not.

@ronaldtse

@strogonoff is this completed? If so please help close it. Thanks.

@rjsparks
Member

@strogonoff, @ronaldtse - I'm closing this - if there's anything left to do, please reopen, or better yet - create other issues for what remains.
