[wip] Web Pathing Specification: initial outline with TODOs #453

Draft: wants to merge 1 commit into base branch `main`

@lidel (Member) commented Nov 12, 2023

The goal of this specification is to close #432 and define a subset of possible content paths that ensures compatibility with existing HTTP and Web Platform standards. It should have clear MUSTs and SHOULDs that we can use when discussing implementation details of projects like ipfs-chromium's Intent to Prototype: Verifying IPFS client.

Pushing an extremely early draft of the scope to get early feedback.

Everyone is invited to comment on the PR, focusing on TODOs, MUSTs and SHOULDs and suggest improvements, especially if something is missing 🙏

pushing extremely early draft of the scope to get early feedback
from stakeholders that requested this specification to be created
@lidel changed the title from "web pathing: initial outline with TODOs" to "[wip] Web Pathing Specification: initial outline with TODOs" on Nov 12, 2023

The resulting specification should be detailed enough to allow competing,
interoperable implementations.

### TODO: things to cover

Member Author

cc @Stebalien @dignifiedquire @hacdias @aschmahmann @Jorropo @rvagg @ribasushi @alanshaw @2color @autonome @darobin for visibility and sourcing early feedback on the scope of this spec.

Feel free to drop a comment about any tricky/painful pathing edge cases you've encountered over the years that we should clarify web behavior for by including them in this spec 🙏

Comment on lines +60 to +61
- `sha2-384` (`0x20`, aka SHA-384; as specified by [FIPS 180-4](https://csrc.nist.gov/pubs/fips/180-4/upd1/final)) TODO: where is this used? why is this on the list?
- sha3-512 TODO: code for such label does not exist, a typo in prior notes? follow up required

@lidel (Member Author) commented Nov 12, 2023

@John-LittleBearLabs these two were included in your draft for WICG proposal, do you remember the reason/source?

I've found the code for the second one in https://github.com/multiformats/multicodec/blob/master/table.csv but I'm not sure whether we intended sha3-512 (`0x14`) or should switch to sha2-512 (`0x13`) here.

Contributor

What is meant by 'label'?

I don't recall, no. sha2-384 doesn't ring a bell; perhaps it came from one of the comments that's now deleted (HackMD doesn't seem to let me mark things as resolved/hidden). As for sha3-512, it was probably not a good source: I think I found someone discussing future-proofing hashes, and I looked for one of the recommendations that was also marked as permanent in the table.

I'm definitely open to this list being altered.
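
To make the sha2 vs sha3 question concrete, here is a minimal Python sketch of how the hash label changes the multihash bytes. The codes are the ones quoted in this thread from the multicodec table (plus `0x12` for sha2-256); the names `MULTIHASH_CODES`, `HASHLIB_NAMES` and `multihash` are hypothetical helpers for illustration, not part of the draft.

```python
import hashlib

# Multihash codes as discussed in this thread (multicodec table.csv).
MULTIHASH_CODES = {
    "sha2-256": 0x12,
    "sha2-512": 0x13,
    "sha3-512": 0x14,
    "sha2-384": 0x20,
}

# Mapping from multihash label to the hashlib algorithm name.
HASHLIB_NAMES = {
    "sha2-256": "sha256",
    "sha2-512": "sha512",
    "sha3-512": "sha3_512",
    "sha2-384": "sha384",
}


def multihash(label: str, data: bytes) -> bytes:
    """Return <code><digest length><digest> for the given label."""
    digest = hashlib.new(HASHLIB_NAMES[label], data).digest()
    # Every code and digest length used here is below 0x80,
    # so each varint fits in a single byte.
    return bytes([MULTIHASH_CODES[label], len(digest)]) + digest


if __name__ == "__main__":
    for label in MULTIHASH_CODES:
        print(label, multihash(label, b"hello").hex()[:12])
```

sha2-512 and sha3-512 produce digests of the same length, so only the leading code byte tells a verifier which function to run; that is why the choice of label in the list matters.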


### TODO: things to cover

- TODO: why it's called "web pathing": ensuring pathing is interoperable with how the existing HTTP and web platform work; covers both /ipfs and /ipns namespace semantics; defines a logical content root CID that can be mapped to the URL root (`/`), which enables subdomain/DNSLink gateways and ipfs:// and ipns:// protocol handlers to load existing datasets, websites, and assets with relative pathing without modifying them;

Contributor

> why it's called "web pathing"

I'm curious about this myself. It doesn't strike me as being particularly web-specific, at least not immediately.

@bumblefudge (Collaborator) commented Jan 17, 2024

Would "URL-safe pathing" or "web-deterministic pathing" or "web-compatible pathing" be more precise? It isn't pathing FOR or OF the web, but rather a web-compatible subset of the pathing currently possible with the tech to date, right?
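
As an illustration of the "content root mapped to URL root" idea from the TODO, here is a minimal Python sketch that turns a content path into a subdomain gateway URL and a native `ipfs://` / `ipns://` URL. The function names and the host `gateway.example` are hypothetical, and the sketch assumes the root identifier is already DNS-safe and the path needs no percent-encoding.

```python
def split_content_path(content_path: str):
    """Split "/ipfs/<root>/<rest>" into (namespace, root, rest)."""
    parts = content_path.split("/", 3)
    if len(parts) < 3 or parts[0] != "" or parts[1] not in ("ipfs", "ipns") or not parts[2]:
        raise ValueError(f"not a content path: {content_path!r}")
    rest = parts[3] if len(parts) > 3 else ""
    return parts[1], parts[2], rest


def to_subdomain_url(content_path: str, gateway_host: str) -> str:
    """Map the content root to the origin, and the remainder to the URL path."""
    namespace, root, rest = split_content_path(content_path)
    return f"https://{root}.{namespace}.{gateway_host}/{rest}"


def to_native_url(content_path: str) -> str:
    """Same mapping, expressed as an ipfs:// or ipns:// URL."""
    namespace, root, rest = split_content_path(content_path)
    return f"{namespace}://{root}/{rest}"


if __name__ == "__main__":
    path = "/ipfs/bafyexample/css/site.css"
    print(to_subdomain_url(path, "gateway.example"))
    print(to_native_url(path))
```

In both forms the root CID becomes the origin, so a relative reference such as `../img/logo.png` inside the site resolves against `/` of the same root without the dataset being modified.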

- TODO make it clear if both DAG variants of CBOR and JSON are a MUST, or if JSON is a SHOULD (right now conformance tests require both as a MUST).

- TODO: MUST what happens when we can't traverse part of the path
- TODO: separate errors for traversal errors due to missing codec vs missing content

Contributor

👍

TODO: List relevant CIDs. Describe how implementations can use them to determine
specification compliance.

TODO: [gateway-conformance](https://github.com/ipfs/gateway-conformance) tests for all MUSTs in this spec

Contributor

👍 👍


- TODO: MUST support UnixFS pathing
- TODO: traversing HAMTs
- TODO: traversing symlinks

Contributor

I have questions, and I'm not sure whether I should be commenting here instead?

There are a few basic forms I could imagine this working in, and they're not necessarily incompatible:

  • /ipfs/cid1/a = "/ipfs/cid2/c" : /ipfs/cid1/a/b -> /ipfs/cid2/c/b
    • Replace all left of and including current path element with link contents.
    • IIRC the gateway conformance test has this, so I'm guessing this is the real thing.
    • Are we allowed to link to /ipns/ namespace?
    • If so... even DNSLink? The link would still be immutable, but once fully resolved, what looks like part of your tree now depends on your local DNS setup?
  • /ipfs/cid1/a = "c" : /ipfs/cid1/a/b -> /ipfs/cid1/c/b (i.e. not starting with /)
    • Replace current path element with link contents.
    • I read someone talking about converting tar to car and if so there's an important special case...
    • "../b" : If allowed we might need rules about this.
  • /ipfs/cid1/a/b = "/c" : /ipfs/cid1/a/b -> /ipfs/cid1/c
    • Replace current path element and everything between the root and current element.
    • Need rules about DAGs that contain a directory under root named either /ipfs/ or /ipns/ etc.
    • I don't love features that break DAG symmetry, but others seem to 🤷
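
The three forms above can be written down as a small Python sketch, purely to make the rewrites explicit. `rewrite_for_symlink` is a hypothetical helper; it assumes that a target starting with `/ipfs/` or `/ipns/` always means the first form (which is exactly the ambiguity raised in the third form), and it does not handle `..` at all.

```python
from typing import Sequence


def rewrite_for_symlink(root: str, segments: Sequence[str], index: int, target: str) -> str:
    """Rewrite a content path when segments[index] is a symlink to target.

    root is the content root, e.g. "/ipfs/cid1"; segments are the path
    elements below it.
    """
    before = list(segments[:index])
    rest = list(segments[index + 1:])
    if target.startswith(("/ipfs/", "/ipns/")):
        # Form 1: replace everything left of and including the link.
        return "/".join([target.rstrip("/"), *rest])
    if target.startswith("/"):
        # Form 3: replace the link and everything between the root and it.
        return "/".join([root, target.strip("/"), *rest])
    # Form 2: replace only the link element itself.
    return "/".join([root, *before, target, *rest])


if __name__ == "__main__":
    print(rewrite_for_symlink("/ipfs/cid1", ["a", "b"], 0, "/ipfs/cid2/c"))
    print(rewrite_for_symlink("/ipfs/cid1", ["a", "b"], 0, "c"))
    print(rewrite_for_symlink("/ipfs/cid1", ["a", "b"], 1, "/c"))
```

Written this way, the conflict is visible in the ordering of the two `startswith` checks: a DAG with a top-level directory named `ipfs` can never be reached through a root-relative link under this assumption.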

Contributor

Also, how does this interact with _redirects (since it both has to be in root and its redirects can be relative to the root)?
Site A has a _redirects with splat to /a.html
Site B has a symlink (called link) to a's root, and its own redirects splat to /b.html
ipfs://B/link/notfound.html
becomes what exactly?

In my current PR it would redirect to ipfs://B/link/a.html (i.e. it respects A's _redirects file and applies it relative to A's root). But if A did not have a redirect, it would be not found (i.e. B's _redirects is ignored).

Feels weird.

Member Author

@John-LittleBearLabs (I've realized we discussed this during one of the sync calls but I did not reply here)

  • Symlinks are generally underspecified and not used much. I would mark this as unspecified behavior in this spec until we land #331 (Publish UnixFS specifications at specs.ipfs.tech).
  • That being said, if you have already implemented symlink support, it's OK. The only caveat is that following a symlink should not allow going beyond the content root (/ipfs/cid), so /ipfs/cid1/a pointing at /ipfs/cid2/b or ../cid2/b must error.
  • Rules from _redirects are executed only when the requested content path is missing within the same origin (based on root CID). In the scenario you described, you operate under origin B, and it is not aware of _redirects from origin A (so _redirects is not executed).
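
The content-root caveat can be sketched in Python as a check that runs before a symlink is followed. `follow_symlink` and `SymlinkEscapesRoot` are hypothetical names; rejecting every absolute target is a conservative assumption of this sketch, since the comment only says that `/ipfs/cid2/b` and `../cid2/b` must error.

```python
import posixpath


class SymlinkEscapesRoot(ValueError):
    """Raised when a symlink target would leave the content root."""


def follow_symlink(link_dir: str, target: str) -> str:
    """Resolve target relative to link_dir, staying inside the content root.

    link_dir is the directory that contains the link, relative to the
    content root ("" for the root itself). The result is also relative
    to the content root.
    """
    if target.startswith("/"):
        # Assumption: absolute targets (including /ipfs/... and /ipns/...)
        # are treated as leaving the current content root.
        raise SymlinkEscapesRoot(target)
    resolved = posixpath.normpath(posixpath.join(link_dir, target))
    if resolved == ".." or resolved.startswith("../"):
        raise SymlinkEscapesRoot(target)
    return "" if resolved == "." else resolved


if __name__ == "__main__":
    print(follow_symlink("a", "../b"))       # stays inside the root
    try:
        follow_symlink("", "../cid2/b")      # climbs above /ipfs/cid1
    except SymlinkEscapesRoot as err:
        print("rejected:", err)
```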

- TODO: multicodecs that are required to facilitate path traversal
- DAG-PB
- RAW
- libp2p-key (for IPNS names)

Contributor

Is verifying an IPNS record outside the scope of this document? It's not exactly pathing, even if that's where it happens to show up in my codebase.

This makes me think there really are only 4 we care about, and 2 of them are MAY, and none of them are listed as permanent here.

Member Author

Good question. My initial idea is to refer to the IPNS spec, which states that only Ed25519 is a MUST (RSA is a SHOULD, other key types are MAY).


- TODO: MUSTs, SHOULDs and MAYs in relation to

- TODO: multihash functions

Member

Is the intention of this section to clarify baseline multihash & codecs that must be supported to provide content for libraries such as @helia/verified-fetch?


- TODO: cid versions
- MUST:
- CIDv1 (`0x01`)

Member

if we MUST support CIDv1, we should call out the multibase/hash/codecs that aren't guaranteed to be supported by web-pathing spec implementers.
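
One way to picture "calling out what isn't guaranteed" is a check that reads the version, codec and multihash code out of a CIDv1 and compares them with a supported set. The Python sketch below is an assumption-laden illustration: it only handles the base32 multibase (prefix `b`), and `SUPPORTED_CODECS` / `SUPPORTED_HASHES` are hypothetical placeholders rather than lists taken from the draft. The codec codes themselves are the multicodec table values for the codecs named in this PR.

```python
import base64

# Multicodec codes for the codecs named in this PR.
SUPPORTED_CODECS = {0x55: "raw", 0x70: "dag-pb", 0x71: "dag-cbor",
                    0x72: "libp2p-key", 0x0129: "dag-json"}
# Hypothetical baseline: sha2-256 only.
SUPPORTED_HASHES = {0x12: "sha2-256"}


def read_varint(buf: bytes, pos: int):
    """Decode an unsigned LEB128 varint starting at pos."""
    value = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def parse_cidv1(cid: str) -> dict:
    """Return version, codec and multihash code of a base32 CIDv1."""
    if not cid.startswith("b"):
        raise ValueError("this sketch only handles multibase base32 ('b')")
    body = cid[1:].upper()
    body += "=" * (-len(body) % 8)  # restore the padding multibase omits
    raw = base64.b32decode(body)
    version, pos = read_varint(raw, 0)
    if version != 1:
        raise ValueError(f"unexpected CID version {version}")
    codec, pos = read_varint(raw, pos)
    hash_code, pos = read_varint(raw, pos)
    digest_len, pos = read_varint(raw, pos)
    if len(raw) - pos != digest_len:
        raise ValueError("digest length mismatch")
    return {"version": version, "codec": codec, "hash_code": hash_code}


def is_supported(cid: str) -> bool:
    info = parse_cidv1(cid)
    return info["codec"] in SUPPORTED_CODECS and info["hash_code"] in SUPPORTED_HASHES
```

An implementation could run a check like this up front and return a distinct "unsupported codec or hash" error, instead of failing later during traversal.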

- RAW
- libp2p-key (for IPNS names)
- DAG-CBOR
- DAG-JSON

Member

we should make raw JSON a MUST as well, or is the intent to use RAW for that?


TODO: Explain the security implications/considerations relevant to the spec.

TODO: length limit for entire path

Member

Are we limiting to browser URLs, or do we want to support longer lengths? https://stackoverflow.com/a/417184/592760 is a really thorough answer talking about variants.

TODO: length limit for a path segment

Member

I don't think we should limit path segment lengths, but we should prevent `/` within path segments, unlike IPLD pathing.


- TODO: MUST what happens when we can't traverse part of the path
- TODO: separate errors for traversal errors due to missing codec vs missing content
- TODO: `/ipfs/valid-cid-dag-pb/invalid-path` (logical "not found", translates to HTTP 404 to indicate content does not exist, mention implicit HTTP caching of 404 vs 500)

Member Author

TODO: a browser/HTTP specific section with additional behaviors that are possible when HTTP redirects can be executed:

4 participants