-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TODO: DAG walk for Block API #118
Comments
Thinking through this more - I'd like to get closer to go-ipld-prime selectors functionality, without going full selectors. The most commonly used selector is "explore all" which simply walks a DAG, visiting every block, but not matching any nodes. You provide it with a loader, and then watch that loader for block loads and those loads tell you the order of block traversal. We typically use it for complete-DAG CAR creation (in go-ipfs' So we already have async Block.exploreAll(cid:CID, loaderFn:(cid)=>Promise<Block>) And your const writer = ... // likely a CAR output writer
const loaderFn = async (cid) => {
const bytes = await repo.blocks.get(cid, options) // get the bytes from somewhere, the loader's main concern
const codec = await codecs.getCodec(cid.code) // let the loader do the decoding, `Block` doesn't have codecs
if (codec) {
const block = Block.createUnsafe({ bytes, cid, codec })
await writer.write(block) // this is where we're doing double-duty, a load is telling us that this block is _next_ in the DAG order
return block // give the block to `exploreAll()` to keep walking
} else if (!NO_LINKS_CODECS.includes(cid.code)) {
throw new Error(`Can't decode links in block with codec 0x${cid.code.toString(16)} to form complete DAG`)
}
// not quite sure what to do here, but we know it doesn't contain links so we probably return some faux block with
// the right bytes and CID but not the right contents, we can defer this decision
}
await Block.exploreAll(root, loaderFn) |
To check if I am on the right track, something like this? /**
* @template T
* @param {Object} options
* @param {CID} options.cid
* @param {(cid: CID) => Promise<Block<T>>} options.loaderFn
*/
const exploreAll = async ({ cid, loaderFn }) => {
const block = await loaderFn(cid)
for (const link of block.links()) {
const linkCid = link[1]
if (!(linkCid.equals(cid))) {
await exploreAll({ cid: linkCid, loaderFn })
}
}
} This function will take as input a CID and a loaderFn. It will then leave the responsibility of getting a block by its CID to the loader function. Once it has got the block, it iterates through the links in the block using the links() iterator. N.B. There is an assumption here that the loaderFn is trying to work its way down a DAG and will therefore run out of block.links as it gets to leaf nodes. However, there are loader functions where the above won't terminate. But since the above exploreAll function is just a utility library function, perhaps it is not its responsibility to ensure that the process terminates? Another thing to note with the above function is that it will load blocks right down to the leaves below the first child of the root before loading the second child of the root node. |
Yes, this is close, some thoughts:
A test should involve a graph of blocks that include at least one duplicate link to a block, maybe from different places, even as simple as this: a -> b And then you'd want to check that you strictly loaded 4 blocks, no repeats, in the depth-first order: a, b, d, c |
Closes: #118 Basic traversal functionality for deterministic DAG walking with no repeat block visits and support for block skipping. User supplies a block loader, which can be used to watch the block ordering of the walk.
Closes: #118 Basic traversal functionality for deterministic DAG walking with no repeat block visits and support for block skipping. User supplies a block loader, which can be used to watch the block ordering of the walk.
We've had APIs for DAG walks in various iterations but don't have a generic one for the js-multiformats stack and it would be a nice fit in the core package.
e.g. https://github.com/ipld/js-block/blob/master/reader.js#L4 was for the short-lived js-block experiment that predates js-multiformats, and I think there's a library that did it for the older interfaces too. Not a hard algorithm, but the complicating factor now is that we don't want to bundle codecs in the core API—the user needs to provide them. So the API might be a little gnarly with an array of codecs, or we could move forward with something like #38 to make this easier/cleaner.
https://github.com/ipfs/js-ipfs/blob/6a2c710e4b66e76184320769ff9789f1fbabe0d8/packages/ipfs-core/src/components/dag/export.js#L82-L107 was the last implementation of this done. Any implementation here should aim to replace most of that code with a call into this library.
ipfs-car would also be a logical consumer of such an API, to help make deterministic CARs: storacha/ipfs-car#76
Questions around ordering will need to have some clarity when comparing to Go APIs. Such DAG walks using go-ipld-prime will strictly use map field ordering as they appear in the encoded block. We have a little more difficulty in JS since we have to rely on
Object
property ordering to do this work. It would be good to test and document any potential footguns for users wrt compatibility and "determinism" related to this.The text was updated successfully, but these errors were encountered: