From c1e121eb0c569246c13f877f587c20f5039aaed4 Mon Sep 17 00:00:00 2001 From: Jorropo Date: Wed, 11 Oct 2023 06:18:01 +0200 Subject: [PATCH 1/4] ipip(0445): add skip-leaves --- src/http-gateways/trustless-gateway.md | 22 +++++- src/ipips/ipip-0445.md | 105 +++++++++++++++++++++++++ 2 files changed, 126 insertions(+), 1 deletion(-) create mode 100644 src/ipips/ipip-0445.md diff --git a/src/http-gateways/trustless-gateway.md b/src/http-gateways/trustless-gateway.md index 949e2b0bf..cf7528743 100644 --- a/src/http-gateways/trustless-gateway.md +++ b/src/http-gateways/trustless-gateway.md @@ -214,7 +214,7 @@ The Body hash MUST match the Multihash from the requested CID. A CAR stream for the requested [application/vnd.ipld.car](https://www.iana.org/assignments/media-types/application/vnd.ipld.car) -content type (with optional `order` and `dups` params), path and optional +content type (with optional `order`, `dups` and `skip-leaves` params), path and optional `dag-scope` and `entity-bytes` URL parameters. ## CAR version @@ -301,6 +301,26 @@ of their presence in the DAG or the value assigned to the "dups" parameter, as the raw data is already present in the parent block that links to the identity CID. +## CAR `skip-leaves` (content type parameter) + +The `skip-leaves` parameter specifies whether blocks with the multicodec `raw` +`0x55` must be sent. + +It accepts two values: +- `y`: Blocks with `raw` multicodec MUST NOT be sent. +- `n`, or unspecified: Blocks with `raw` multicodec MUST be sent. + +A gateway MUST NOT assume this field is `y` if unspecified. +When not specified it always MUST be understood as `n`. + +:::note Notes for implementers + +A request which is rooted at a `raw` block and has `skip-leaves=y` does not +make sense and SHOULD NOT be sent by clients, it is fair for servers to +error in this situation. + +::: + ## CAR format parameters and determinism The default header and block order in a CAR format is not specified by IPLD specifications. 
diff --git a/src/ipips/ipip-0445.md b/src/ipips/ipip-0445.md new file mode 100644 index 000000000..9e57d0306 --- /dev/null +++ b/src/ipips/ipip-0445.md @@ -0,0 +1,105 @@ +--- +title: "IPIP-0445: trustless gateway skip-leaves option" +date: 2023-10-09 +ipip: open +editors: + - name: Hugo VALTIER + github: Jorropo + url: https://jorropo.net/ + affiliation: + name: Protocol Labs + url: https://protocol.ai/ +relatedIssues: + - https://github.com/ipfs/specs/issues/444 +order: 445 +tags: ['ipips'] +--- + +## Summary + +Introduce `skip-leaves` flag for the :cite[trustless-gateway]. + +## Motivation + +Allow clients to read a stream which only contain proofs in a bottom heavy +graph using `raw` codec for it's leaves. + +Usefull with unixfs for features like webseeds [#444](https://github.com/ipfs/specs/issues/444). + +## Detailed design + +The `skip-leaves` CAR Content-Type parameter on :cite[trustless-gateway] +allows clients to download an entity except blocks with the multicodec +`raw` (`0x55`). + +- When set to `y`, the parameter instructs the gateway not to transmit + blocks tagged with the `raw` multicodec. +- If set to `n`, or left unspecified, the gateway MUST transmit `raw` + multicodec blocks. + +Importantly, unless explicitly specified as `y`, the default operational +mode of the gateway MUST assume the value of `skip-leaves` to be `n`. + +## Design rationale + +### User Benefit + +Implementing the `skip-leaves` parameter offers several benefits to users: + +1. **Verification Flexibility:** Clients can verify out-of-band (OOB) received + files in their deserialized form without necessitating the transmission of + raw blocks from the gateway. +2. **Incremental Download:** Clients can incrementally download files in + deserialized forms from non-IPFS servers. Allowing applications to share + distribution for IPFS and non IPFS clients. +3. 
**Efficient Block Discovery:** With the `skip-leaves` option enabled, + clients can quickly discover numerous candidate blocks without being + bottlenecked by the gateway's transmission of raw blocks. + +### Compatibility + +Setting the default value of the `skip-leaves` parameter to `n` ensures +backward compatibility with existing clients and systems that are unaware +of this new flag. + +### Prevention of Amplification Attacks and Efficient Server Operation + +By utilizing the `raw` (`0x55`) codec servers can trivially determine whether +to fetch or skip a block without having to learn any new information. +Although more limited and not able to handle unixfs file using dag-pb for their +leaves, it allows both the client and server to trivially verify a block +must not be fetched. Preventing issues of Amplification where a server could +need to fetch multiple orders more data than the client when executing the +request. + +### Why not `dag-scope=skip-leaves` ? + +The `dag-scope` parameter determines the overall range of blocks to retrieve, +while `skip-leaves` selectively filters specific blocks within that range. +Combining them under one parameter would restrict their combined utility. + +For example: +- A client is streaming a video from a webseed and the user seeked through the + video, then the client would send `dag-scope=entity&entity-bytes=42:1337` + with `skip-leaves=y` to download the proofs for the required section of the + video. +- A client is verifying an OOB transfered directory in deserialized form, + then `dag-scope=all` with `skip-leaves=y` makes sense. + +### Alternatives + +An alternative approach would be to request blocks individually. +However it adds extra round trips and more per HTTP request overhead +and thus is undesireable. + +## Security + +None. + +## Test fixtures + +TODO + +### Copyright + +Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/). 
From 131b29d2c39c0f210d1f4449d9d416793350538c Mon Sep 17 00:00:00 2001 From: Marcin Rataj Date: Wed, 25 Oct 2023 20:04:00 +0200 Subject: [PATCH 2/4] ipip-445: rename to skip-raw-blocks URL param + basic editorials --- src/http-gateways/trustless-gateway.md | 50 ++++----- src/ipips/ipip-0445.md | 140 ++++++++++++++++++------- 2 files changed, 131 insertions(+), 59 deletions(-) diff --git a/src/http-gateways/trustless-gateway.md b/src/http-gateways/trustless-gateway.md index cf7528743..2fcc9d895 100644 --- a/src/http-gateways/trustless-gateway.md +++ b/src/http-gateways/trustless-gateway.md @@ -183,6 +183,28 @@ returned: returned to the client, the HTTP status code has already been sent to the client. +### :dfn[skip-raw-blocks] (request query parameter) + +The optional `skip-raw-blocks` parameter is available only for CAR requests. + +It specifies whether blocks with the multicodec `raw` `0x55` MUST be present in +the CAR response. + +It accepts two values: +- `y`: Blocks with `raw` multicodec MUST NOT be returned. +- `n`, or missing (unspecified): no-op, no special handling of `raw` blocks. + +When not specified a gateway implementation MUST assume `n`. + +:::note Notes for implementers + +A `skip-raw-blocks=y` request for a content path with `raw` root CID does not +make sense and SHOULD NOT be sent by clients. + +A Gateway SHOULD return HTTP error 400 Bad Request + +::: + # HTTP Response Below MUST be implemented **in addition** to "HTTP Response" of :cite[path-gateway]. @@ -212,10 +234,10 @@ The Body hash MUST match the Multihash from the requested CID. # CAR Responses (application/vnd.ipld.car) -A CAR stream for the requested -[application/vnd.ipld.car](https://www.iana.org/assignments/media-types/application/vnd.ipld.car) -content type (with optional `order`, `dups` and `skip-leaves` params), path and optional -`dag-scope` and `entity-bytes` URL parameters. 
+A CAR stream ([application/vnd.ipld.car](https://www.iana.org/assignments/media-types/application/vnd.ipld.car) +with optional `order` and `dups` content type parameters) for the requested +content path (and optional `dag-scope`, `entity-bytes` and/or `skip-raw-blocks` +URL parameters). ## CAR version @@ -301,26 +323,6 @@ of their presence in the DAG or the value assigned to the "dups" parameter, as the raw data is already present in the parent block that links to the identity CID. -## CAR `skip-leaves` (content type parameter) - -The `skip-leaves` parameter specifies whether blocks with the multicodec `raw` -`0x55` must be sent. - -It accepts two values: -- `y`: Blocks with `raw` multicodec MUST NOT be sent. -- `n`, or unspecified: Blocks with `raw` multicodec MUST be sent. - -A gateway MUST NOT assume this field is `y` if unspecified. -When not specified it always MUST be understood as `n`. - -:::note Notes for implementers - -A request which is rooted at a `raw` block and has `skip-leaves=y` does not -make sense and SHOULD NOT be sent by clients, it is fair for servers to -error in this situation. - -::: - ## CAR format parameters and determinism The default header and block order in a CAR format is not specified by IPLD specifications. 
diff --git a/src/ipips/ipip-0445.md b/src/ipips/ipip-0445.md index 9e57d0306..e1414eefb 100644 --- a/src/ipips/ipip-0445.md +++ b/src/ipips/ipip-0445.md @@ -1,14 +1,20 @@ --- -title: "IPIP-0445: trustless gateway skip-leaves option" +title: "IPIP-0445: Option to Skip Raw Blocks in Gateway Responses" date: 2023-10-09 ipip: open editors: - - name: Hugo VALTIER + - name: Hugo Valtier github: Jorropo url: https://jorropo.net/ affiliation: name: Protocol Labs url: https://protocol.ai/ + - name: Marcin Rataj + github: lidel + url: https://lidel.org/ + affiliation: + name: Protocol Labs + url: https://protocol.ai/ relatedIssues: - https://github.com/ipfs/specs/issues/444 order: 445 @@ -17,88 +23,152 @@ tags: ['ipips'] ## Summary -Introduce `skip-leaves` flag for the :cite[trustless-gateway]. +Introduce `skip-raw-blocks` flag for the :cite[trustless-gateway]. ## Motivation Allow clients to read a stream which only contain proofs in a bottom heavy graph using `raw` codec for it's leaves. -Usefull with unixfs for features like webseeds [#444](https://github.com/ipfs/specs/issues/444). +Usefull for UnixFS for features like webseeds +([ipfs/specs#444](https://github.com/ipfs/specs/issues/444)), where metadata +about a DAG is fetched from a trustless gateway, but the actual raw data can be +fetched from any source that supports either trustless gateway specification, +or plain HTTP Range Requests, allowing for trustless and verifiable data +retrieval from plain HTTP (non-IPFS) data sources. ## Detailed design -The `skip-leaves` CAR Content-Type parameter on :cite[trustless-gateway] +The `skip-raw-blocks` URL query parameter on :cite[trustless-gateway] allows clients to download an entity except blocks with the multicodec `raw` (`0x55`). - When set to `y`, the parameter instructs the gateway not to transmit - blocks tagged with the `raw` multicodec. -- If set to `n`, or left unspecified, the gateway MUST transmit `raw` - multicodec blocks. 
+ blocks referenced with a CID with the `raw` multicodec. +- If set to `n`, or left unspecified, there is no special handling of `raw` + multicodec blocks (the existing default behavior remains the same). Importantly, unless explicitly specified as `y`, the default operational -mode of the gateway MUST assume the value of `skip-leaves` to be `n`. +mode of the gateway MUST assume the value of `skip-raw-blocks` to be `n`. ## Design rationale ### User Benefit -Implementing the `skip-leaves` parameter offers several benefits to users: +Implementing the `skip-raw-blocks` parameter offers several benefits to users: 1. **Verification Flexibility:** Clients can verify out-of-band (OOB) received files in their deserialized form without necessitating the transmission of raw blocks from the gateway. + 2. **Incremental Download:** Clients can incrementally download files in deserialized forms from non-IPFS servers. Allowing applications to share - distribution for IPFS and non IPFS clients. -3. **Efficient Block Discovery:** With the `skip-leaves` option enabled, + distribution for IPFS and non-IPFS clients. + +3. **Efficient Block Discovery:** With the `skip-raw-blocks` option enabled, clients can quickly discover numerous candidate blocks without being bottlenecked by the gateway's transmission of raw blocks. +4. **Non-IPFS HTTP Mirrors Become Useful:** Legacy data that is already exposed + over HTTP in deserialized form can now act as sources for specific block + byte ranges, without having to support any IPFS specific APIs. Plain HTTP + Range Requests can be used for fetching remaining raw block data, and the + metadata read via `skip-raw-blocks=y` is enough for a client to verify the + remaining raw block byte ranges fetched from non-IPFS system match expected + CIDs. 
+ ### Compatibility -Setting the default value of the `skip-leaves` parameter to `n` ensures +Setting the default value of the `skip-raw-blocks` parameter to `n` ensures backward compatibility with existing clients and systems that are unaware of this new flag. -### Prevention of Amplification Attacks and Efficient Server Operation +### Alternatives -By utilizing the `raw` (`0x55`) codec servers can trivially determine whether -to fetch or skip a block without having to learn any new information. -Although more limited and not able to handle unixfs file using dag-pb for their -leaves, it allows both the client and server to trivially verify a block -must not be fetched. Preventing issues of Amplification where a server could -need to fetch multiple orders more data than the client when executing the -request. +An alternative approach would be to request blocks individually. +However, it adds extra round trips and more per HTTP request overhead +and thus is undesirable. -### Why not `dag-scope=skip-leaves` ? +#### Why not `dag-scope=skip-raw-blocks` ? -The `dag-scope` parameter determines the overall range of blocks to retrieve, -while `skip-leaves` selectively filters specific blocks within that range. +The existing `dag-scope` parameter determines the overall range of blocks to retrieve, +while `skip-raw-blocks` selectively filters specific blocks across all scopes and ranges. Combining them under one parameter would restrict their combined utility. For example: -- A client is streaming a video from a webseed and the user seeked through the +- A client is streaming a video from a webseed and the user seeks through the video, then the client would send `dag-scope=entity&entity-bytes=42:1337` - with `skip-leaves=y` to download the proofs for the required section of the - video. -- A client is verifying an OOB transfered directory in deserialized form, - then `dag-scope=all` with `skip-leaves=y` makes sense. 
+ with `skip-raw-blocks=y` to download the proofs for the required section of the + video, and then fetches remaining raw data byte ranges from a faster CDN. +- A client is verifying an OOB transferred directory in deserialized form, + then `dag-scope=all` with `skip-raw-blocks=y` makes sense. -### Alternatives +#### Why not CAR content type parameter ? -An alternative approach would be to request blocks individually. -However it adds extra round trips and more per HTTP request overhead -and thus is undesireable. +CAR content type's +([application/vnd.ipld.car](https://www.iana.org/assignments/media-types/application/vnd.ipld.car)) +optional parameters like `order` and `dups` impact the way data is represented +when returned as a CAR stream, but do not modify the scope of the data itself: +they neither add nor subtract data from the response. + +The scope of the data is controlled by URL content path and optional +`dag-scope`, `entity-bytes` URL parameters. This is where `skip-raw-blocks` +belongs. + +This is not just a matter of aesthetics: the URL path and query parameters +allow for caching of different subsets of a DAG in a way that is interoperable +with existing HTTP tools and clients and minimizes the risk of caching an incomplete DAG +response due to HTTP cache misconfiguration. Thanks to `skip-raw-blocks` being +in the URL query, we ensure CAR responses without `raw` blocks will be cached +under a different key than full responses (just like already existing `dag-scope` +and `entity-bytes`). + +#### Why not generic `skip-leaves` that skips all leaves, not just `raw` blocks? + +Prevention of amplification attacks and efficient server operation. + +By utilizing the `raw` (`0x55`) codec servers can trivially determine whether +to fetch or skip a block without having to fetch it to learn any new +information. + +If we framed this feature around skipping all leaf nodes, that would require +the server to fetch the leaves to learn if they have any child nodes. 
This would +force the server to fetch data that is never returned to the client. + +Although `skip-raw-blocks` is more limited and not able to handle UnixFS files +chunked without `--raw-leaves` option, it allows both the client and server to +trivially verify that a block must not be fetched. This prevents +amplification issues where a server could need to fetch orders of magnitude more data than +the client receives when executing the request. ## Security -None. +This IPIP does not impact security model of trustless gateway. ## Test fixtures -TODO +:::issue + +TODO: update below section with CIDs or CARs from conformance tests + +Scenarios we should check: +- [ ] reuse existing UnixFS DAG that has raw-leaves, request it with + `skip-raw-blocks=n`, confirm the response includes expected raw leaves' CIDs +- [ ] create a new CAR fixture that only have non-raw blocks. Request it with + `skip-raw-blocks=y`, confirm the response includes expected CIDs and does not + include raw blocks referenced by parents. + - important part is creating CAR fixture by hand, and ensure the raw blocks are + NEVER announced anywhere (generate fixture with random data, add to ipfs + with raw-leaves option, then export DAG without `raw` blocks (use go-car's + [`filter`](https://github.com/ipld/go-car/tree/master/cmd/car#readme) or + similar)) + - Why? This goes the extra mile, but ensures every conformant gateway + implementation is not doing useless work of fetching raw blocks which are + not required for fulfilling `skip-raw-blocks=y` requests. We did + a similar thing for `entity-bytes` and it was the only way we could show + bugs in Saturn project's cache implementation at the time. + +::: ### Copyright From f96a92a5262fba7c9f788192aaea3f37ab0d8f06 Mon Sep 17 00:00:00 2001 From: Marcin Rataj Date: Wed, 25 Oct 2023 20:20:52 +0200 Subject: [PATCH 3/4] ipip-445: HTTP 400 on raw root cid Ref. 
https://github.com/ipfs/specs/pull/445#discussion_r1357342245 --- src/http-gateways/trustless-gateway.md | 10 ++-------- src/ipips/ipip-0445.md | 1 + 2 files changed, 3 insertions(+), 8 deletions(-) diff --git a/src/http-gateways/trustless-gateway.md b/src/http-gateways/trustless-gateway.md index 2fcc9d895..6ae33f2ad 100644 --- a/src/http-gateways/trustless-gateway.md +++ b/src/http-gateways/trustless-gateway.md @@ -196,14 +196,8 @@ It accepts two values: When not specified a gateway implementation MUST assume `n`. -:::note Notes for implementers - -A `skip-raw-blocks=y` request for a content path with `raw` root CID does not -make sense and SHOULD NOT be sent by clients. - -A Gateway SHOULD return HTTP error 400 Bad Request - -::: +A Gateway MUST return HTTP error 400 Bad Request when `skip-raw-blocks=y` is +sent for a content path with a root CID with the `raw` multicodec. # HTTP Response diff --git a/src/ipips/ipip-0445.md b/src/ipips/ipip-0445.md index e1414eefb..07621ab65 100644 --- a/src/ipips/ipip-0445.md +++ b/src/ipips/ipip-0445.md @@ -152,6 +152,7 @@ This IPIP does not impact security model of trustless gateway. TODO: update below section with CIDs or CARs from conformance tests Scenarios we should check: +- [ ] request for `/ipfs/cid` with `skip-raw-blocks=y` where CID has `raw` codec MUST return HTTP 400 (Bad Request) - [ ] reuse existing UnixFS DAG that has raw-leaves, request it with `skip-raw-blocks=n`, confirm the response includes expected raw leaves' CIDs - [ ] create a new CAR fixture that only have non-raw blocks. 
Request it with From ceb8b1d6fd7eee2ec04dd60978847aa20f861571 Mon Sep 17 00:00:00 2001 From: Marcin Rataj Date: Thu, 9 Nov 2023 04:23:14 +0100 Subject: [PATCH 4/4] chore: update editors --- src/http-gateways/trustless-gateway.md | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/src/http-gateways/trustless-gateway.md b/src/http-gateways/trustless-gateway.md index 6ae33f2ad..82750ef04 100644 --- a/src/http-gateways/trustless-gateway.md +++ b/src/http-gateways/trustless-gateway.md @@ -10,9 +10,21 @@ editors: - name: Marcin Rataj github: lidel url: https://lidel.org/ + affiliation: + name: Protocol Labs + url: https://protocol.ai/ - name: Henrique Dias github: hacdias url: https://hacdias.com/ + affiliation: + name: Protocol Labs + url: https://protocol.ai/ + - name: Hugo Valtier + github: Jorropo + url: https://jorropo.net/ + affiliation: + name: Protocol Labs + url: https://protocol.ai/ xref: - url - path-gateway
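
Illustration only, not part of the patch series: a minimal Python sketch of the gateway-side `skip-raw-blocks` rules as they stand after patch 3/4. The names `make_cidv1`, `cid_codec`, `status_for_request` and `filter_blocks` are hypothetical and come from no IPFS library. The sketch assumes CIDs arrive as CIDv0 or base32 CIDv1 strings, and it does not model block fetching or CAR framing.

```python
import base64
import hashlib
from typing import Iterable, List, Optional, Tuple

RAW_CODEC = 0x55     # multicodec code for `raw`
DAG_PB_CODEC = 0x70  # multicodec code for `dag-pb`


def read_varint(data: bytes, offset: int) -> Tuple[int, int]:
    """Decode one unsigned varint; return (value, next_offset)."""
    value, shift = 0, 0
    while True:
        byte = data[offset]
        offset += 1
        value |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return value, offset
        shift += 7


def make_cidv1(codec: int, payload: bytes) -> str:
    """Build a base32 CIDv1 (sha2-256) for demo purposes; codec must be < 0x80."""
    digest = hashlib.sha256(payload).digest()
    binary = bytes([0x01, codec, 0x12, len(digest)]) + digest
    return "b" + base64.b32encode(binary).decode("ascii").lower().rstrip("=")


def cid_codec(cid: str) -> int:
    """Return the multicodec of a CIDv0 or base32 CIDv1 string."""
    if cid.startswith("Qm") and len(cid) == 46:
        return DAG_PB_CODEC  # CIDv0 is implicitly dag-pb
    if not cid.startswith("b"):
        raise ValueError("sketch only handles CIDv0 and base32 CIDv1")
    body = cid[1:].upper()
    body += "=" * (-len(body) % 8)  # restore the padding b32decode expects
    data = base64.b32decode(body)
    version, offset = read_varint(data, 0)
    if version != 1:
        raise ValueError("unsupported CID version")
    codec, _ = read_varint(data, offset)
    return codec


def status_for_request(root_cid: str, skip_raw_blocks: Optional[str] = None) -> int:
    """HTTP status decided before streaming: 400 for a raw root with skip-raw-blocks=y."""
    if skip_raw_blocks == "y" and cid_codec(root_cid) == RAW_CODEC:
        return 400
    return 200


def filter_blocks(
    blocks: Iterable[Tuple[str, bytes]], skip_raw_blocks: Optional[str] = None
) -> List[Tuple[str, bytes]]:
    """Drop blocks whose CID has the raw codec when skip-raw-blocks=y; otherwise no-op."""
    skip = skip_raw_blocks == "y"  # anything other than `y` is treated as `n`
    return [(cid, data) for cid, data in blocks
            if not (skip and cid_codec(cid) == RAW_CODEC)]
```

The skip decision reads only the codec embedded in each CID, never the block body, which is the property the "Why not generic `skip-leaves`" rationale relies on. A real gateway would apply the check to links before fetching them, so skipped blocks are never retrieved at all.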