From 1c2c1dd82ca90b4353fb272fcf47d9f950771686 Mon Sep 17 00:00:00 2001
From: Adin Schmahmann
Date: Mon, 11 Jan 2021 05:45:47 -0500
Subject: [PATCH 1/3] rfc: added rfc for handling arbitrary block sizes

---
 RFC/rfcBBL209/README.md | 104 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 104 insertions(+)
 create mode 100644 RFC/rfcBBL209/README.md

diff --git a/RFC/rfcBBL209/README.md b/RFC/rfcBBL209/README.md
new file mode 100644
index 0000000..0e92e14
--- /dev/null
+++ b/RFC/rfcBBL209/README.md
@@ -0,0 +1,104 @@
+# RFC|BB|L2-08: Handle Arbitrary Block Sizes
+
+* Status: `Brainstorm`
+
+## Abstract
+
+This RFC proposes adding a new type of data exchange to Bitswap for handling blocks of data arbitrarily larger than the 1MiB limit, by using a feature of common hash functions that allows the hashing of large objects to be paused and later resumed.
+
+## Shortcomings
+
+Bitswap has a maximum block size of 1MiB, which means it cannot transfer all forms of content-addressed data. A prominent example is Git repos: even though a repo can be represented as a content-addressed IPLD graph, it cannot necessarily be transferred over Bitswap if any of the objects in it exceeds 1MiB.
+
+## Description
+
+The major hash functions work by taking some data `D`, chunking it into `n` pieces `P_0...P_n-1`, and then modifying an internal state `S` as the pieces are loaded into the hash function. This means there are points in the hashing process where we can pause and capture the state of the hash function so far. Bitswap can utilize this state to effectively break up large blocks into smaller ones.
+
+### Example: Merkle–Damgård constructions like SHA-1 or SHA-2
+
+MD pseudo-code looks roughly like:
+
+```golang
+func Hash(D []byte) []byte {
+	pieces := getChunks(D)
+
+	var S state
+	for _, p := range pieces {
+		S = process(S, p) // After processing piece i, call this result S_i
+	}
+
+	return finalize(S) // Call this H, the final hash
+}
+```
+
+From the above we can see that:
+
+1. At any point in the process of hashing `D` we could stop, say after piece `j`, save the state `S_j`, and then resume later
+2. We can always calculate the final hash `H` given only `S_j` and all the pieces `P_j+1..P_n-1`
+
+The implication for Bitswap is that if each piece is at most 1MiB then we can send the file **backwards** in 1MiB increments. In particular, a server can send `(S_n-2, P_n-1)` and the client can use that to verify that `P_n-1` is in fact the last part of the data associated with the final hash `H`. The server can then send `(S_n-3, P_n-2)` and the client can verify that `P_n-2` is the last block of `S_n-2`, and therefore also the second-to-last block of `H`, and so on.
+
+#### Extension
+
+This scheme requires downloading a file linearly, which is quite slow with even modest latencies. However, utilizing a scheme like [RFC|BB|L2 - Speed up Bitswap with GraphSync CID only query](https://github.com/protocol/beyond-bitswap/issues/25) (i.e. downloading metadata manifests up front) we can make this fast/parallelizable.
+
+#### Security
+
+In order for this scheme to be secure it must be true that only a single pair `(S_i-1, P_i)` can be produced that matches a given `S_i`. If the attacker could only vary the piece — that is, if the pair must be of the form `(S_i-1, P_malicious)` — then this is certainly true, since otherwise one could create a collision on the overall hash function. However, given that there are two parameters to vary, it seems possible this could be computationally easier than finding a collision on the overall hash function.
+
+#### SHA-3
+
+While SHA-3 is not a Merkle–Damgård construction, it follows the same pseudocode structure above.
+
+### Example: Tree constructions like BLAKE3, KangarooTwelve, or ParallelHash
+
+In tree constructions we are not restricted to downloading the file backwards and can instead download just the parts of the file that we are looking for, which includes downloading the file forwards for sequential streaming.
+
+There is detail about how to do this for BLAKE3 in the [BLAKE3 paper](https://github.com/BLAKE3-team/BLAKE3-specs/blob/master/blake3.pdf), section 6.4, Verified Streaming.
+
+### Implementation Plan
+
+#### Bitswap changes
+
+* When a server responds to a request for a block, if the block is too large then instead send a traversal-order list for the block as defined by the particular hash function used (e.g. linear and backwards for SHA-1,2,3)
+* Large Manifests
+  * If the list is more than 1MiB long then only send the first 1MiB along with an indicator that the manifest is not complete
+  * When the client is ready to process more of the manifest it can send a WANT_LARGE_BLOCK_MANIFEST request containing the multihash of the entire large block and the last hash in the manifest
+* When requesting subblocks send requests as `(full block multihash, start index, end index)`
+  * Process subblock responses separately from full block responses, verifying the results as they come in
+* As in [RFC|BB|L2 - Speed up Bitswap with GraphSync CID only query](https://github.com/protocol/beyond-bitswap/issues/25), specify how much trust goes into a given manifest; examples include:
+  * Download at most 20 unverified blocks at a time from a given manifest
+  * Grow trust geometrically (e.g. 10 blocks, then if those are good 20, 40, ...)
+
+#### Datastore
+
+* Servers should cache/store a particular chunking for the traversal that is defined by the implementation for the particular hash function (e.g.
256 KiB segments for SHA-2)
+  * Once clients receive the full block they should process it and store the chunking, reusing the work from validating the block
+* Clients and servers should have a way of aliasing large blocks as a concatenated set of smaller blocks
+* Need to quarantine subblocks until the full block is verified, as in [RFC|BB|L2 - Speed up Bitswap with GraphSync CID only query](https://github.com/protocol/beyond-bitswap/issues/25)
+
+#### Hash function support
+
+* Add support for SHA-1/2 (the two should be very nearly identical)
+* Make it possible for people to register new hash functions locally, but some should be built into the protocol
+
+## Evaluation Plan
+
+* IPFS file transfer benchmarks as in [RFC|BB|L2 - Speed up Bitswap with GraphSync CID only query](https://github.com/protocol/beyond-bitswap/issues/25)
+
+## Prior Work
+
+* This proposal is almost identical to the one @Stebalien proposed [here](https://discuss.ipfs.io/t/git-on-ipfs-links-and-references/730/6)
+* Utilizes overlapping principles with [RFC|BB|L2 - Speed up Bitswap with GraphSync CID only query](https://github.com/protocol/beyond-bitswap/issues/25)
+
+### Alternatives
+
+An alternative way to deal with this problem would be a succinct and efficient cryptographic proof that shows the equivalence of two different DAG structures under some constraints. For example, showing that a single large block with a SHA-2 hash is equivalent to a tree whose concatenated leaf nodes give the single large block.
+
+### References
+
+This was largely taken from [this draft](https://hackmd.io/@adin/sha256-dag)
+
+## Results
+
+## Future Work

From 8268291c984cf16da40515ad63cef51719576189 Mon Sep 17 00:00:00 2001
From: Adin Schmahmann
Date: Mon, 11 Jan 2021 06:35:02 -0500
Subject: [PATCH 2/3] rfc: added rfc for identifying UnixFS files using the
 hash of the full content

---
 RFC/rfcBBL210/README.md | 45 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)
 create mode 100644 RFC/rfcBBL210/README.md

diff --git a/RFC/rfcBBL210/README.md b/RFC/rfcBBL210/README.md
new file mode 100644
index 0000000..e1f68fb
--- /dev/null
+++ b/RFC/rfcBBL210/README.md
@@ -0,0 +1,45 @@
+# RFC|BB|L2-10: UnixFS files identified using hash of the full content
+
+* Status: `Brainstorm`
+
+## Abstract
+
+This RFC proposes that for UnixFS files we allow downloading data using a CID corresponding to the hash of the entire file, instead of only the CID of a particular UnixFS DAG (which depends on tree width, chunking, internal node hash function, etc.).
+
+Note: This is really more about IPFS than Bitswap, but it's close by and dependent on another RFC.
+
+## Shortcomings
+
+There exists a large quantity of content on the internet that is already content addressable and yet not downloadable via IPFS and Bitswap. For example, many binaries, videos, archives, etc. that are distributed today have their SHA-256 listed alongside them so that users can run `sha256sum file` and compare the output with what they were expecting. When these files are added to IPFS they can be added as: a) an application-specific DAG format for files (such as UnixFSv1), identified by a DAG root CID which is different from a CID over the multihash of the file data itself, or b) a single large raw block, which cannot be transferred by Bitswap.
+
+Additionally, for users using application-specific DAGs with some degree of flexibility to them (e.g.
UnixFS where there are multiple chunking strategies), two users who import the same data could end up with different CIDs for that data.
+
+## Description
+
+Utilizing the results of [RFCBBL209](../rfcBBL209/README.md) we can download arbitrarily sized raw blocks. We allow UnixFS files that have raw leaves to be stored internally as they are now, but also aliased as a single virtual block.
+
+## Implementation Plan
+
+* Implement [RFCBBL209](../rfcBBL209/README.md)
+* Add an option when doing `ipfs add` that creates a second aliased block in a segregated blockstore
+* Add the second blockstore to the provider queue
+
+## Impact
+
+This scheme allows a given version of IPFS to have a canonical hash for files (e.g. the SHA-256 of the file data itself), which allows for independent chunking schemes, and by supporting the advertising/referencing of one or more common file hash schemes it allows people to find some hash on a random website and check whether it's discoverable in IPFS.
+
+There are also some larger ecosystem wide impacts to consider here, including:
+
+1. There's a lot of confusion around UnixFS CIDs not being derivable from SHA256 of a file, this approach may either tremendously help or cause even more confusion (especially as we move people from UnixFS to IPLD). An example [thread](https://discuss.ipfs.io/t/cid-concept-is-broken/9733) about this
+2. Storage overhead for multiple "views" on the same data and extra checking + advertising of the data
+3.
Are there any deduplication use-case issues we could run into here if users download data chunked not as the data creator chunked it, but based on how they want to chunk it (or, more likely, the default chunker)?
+
+## Evaluation Plan
+
+TBD
+
+## Prior Work
+
+## Results
+
+## Future Work

From 871947157cc4ee0bf02d3341bc4e7bd6ad61fe64 Mon Sep 17 00:00:00 2001
From: Adin Schmahmann
Date: Wed, 13 Jan 2021 18:28:37 -0500
Subject: [PATCH 3/3] rfc: rfcBBL210 add additional impact of verifiable HTTP
 gateway responses

Co-authored-by: Marcin Rataj

---
 RFC/rfcBBL210/README.md | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/RFC/rfcBBL210/README.md b/RFC/rfcBBL210/README.md
index e1f68fb..1505eda 100644
--- a/RFC/rfcBBL210/README.md
+++ b/RFC/rfcBBL210/README.md
@@ -33,6 +33,9 @@ There are also some larger ecosystem wide impacts to consider here, including:
 1. There's a lot of confusion around UnixFS CIDs not being derivable from SHA256 of a file, this approach may either tremendously help or cause even more confusion (especially as we move people from UnixFS to IPLD). An example [thread](https://discuss.ipfs.io/t/cid-concept-is-broken/9733) about this
 2. Storage overhead for multiple "views" on the same data and extra checking + advertising of the data
 3. Are there any deduplication use-case issues we could run into here if users download data chunked not as the data creator chunked it, but based on how they want to chunk it (or, more likely, the default chunker)?
+4. Files identified using the hash of the full content enable [validation of HTTP gateway responses](https://github.com/ipfs/in-web-browsers/issues/128) without running a full IPFS stack, which allows for:
+   - user agents such as web browsers: display an integrity indicator when the HTTP response matches the CID from the request
+   - IoT devices: downloading firmware updates over HTTPS without the need to trust a gateway or a CA
 
 ## Evaluation Plan