Illustrate multiblock encoder/decoder API #38
base: context-binding
Conversation
```js
const c = json.or(raw)
c.decodeBlock(raw.encodeBlock(Buffer.from('hello')))
// => SyntaxError: Unexpected token h in JSON at position 0
```

This is going to get annoying I reckon, especially when you scale up through the …

One way to solve this is to do away with …

Other than that, I don't really have strong feelings about any of these changes. I mostly just want to try these patterns out to form opinions; I don't like churn but don't see good reasons to object other than that.

For …
I know this discussion is probably breaking your nice inheritance patterns here, but this …

When you're calling … The same could be said for …

Maybe composition for codecs doesn't actually make as much sense, since you are dealing with a pair rather than a singular thing like with the multibase and multihash encoders.

Similar problems exist when you get to …

Again, it's on the decode where this stuff matters. I need to have a bank (I'm avoiding calling it a "registry" because that seems to be a burdened term by now) of multibases, multihashes and multicodecs to pick from. If I have CID+Binary then the bank lets me figure out what I have and form it into a …
Thanks @rvagg for going through and providing feedback. It really helps, especially because I had been debating some of the decisions that you've called out here myself. I'll respond to individual points inline below:
I agree, in fact in the first iteration I wonder if the original approach would be less of a footgun here. We could also alleviate … @rvagg do you think that would make more sense?
If you do have …

That said, this is geared towards different use cases, e.g.
Decode especially makes more sense for cases like …
I don't think this is true for …

It is true that …
I think it's fine to call it a registry (I call it a composition) and yes, the intent of all this is a better registry. As I mentioned in the original issue and pull, I think a registry is needed for higher-level APIs; I just did not want it everywhere. There were also a bunch of issues that I called out with …
I think symmetry does come with a cost and maybe it's taken too far here? But as I did point out, I do have a use case (with workers) for encoder composition where simply using a codec directly isn't really viable. I think the main problem you're calling out is with defaults. I do think that …
Yes, I think separating them might be the better option, and maybe the API challenge can be avoided by not performing the composition action on the …
You've gone to all this trouble to inject explicitness into the interfaces, factories, methods and the way that it's used. Explicitness is good. But this …

I think real-world use of these APIs will be most informative. It doesn't feel quite right, but that could be biases I'm bringing that aren't helpful in designing this. Mostly I'm suspicious of having so many free-floating things that need to be pulled together into one place, but that's kind of inherent in what we're trying to achieve here, so it's a matter of: how can we do it best?
The main downside I see with this approach is that codecs exposed by multiformats aren't going to be composable, and they'd need to be wrapped, e.g.:

```js
import raw from 'multiformats/codecs/raw'
import json from 'multiformats/codecs/json'
import cbor from '@ipfs/dag-cbor'

const codec = multicodec(raw)
  .or(multicodec(json))
  .or(multicodec(cbor))
```

In my experience, when you have to import another library to compose things it is too tempting to craft a solution inline. That said, there could be other options worth considering. Maybe instead of:

```js
const c = json.or(raw)
c.decodeBlock(raw.encodeBlock(Buffer.from('hello')))
// => SyntaxError: Unexpected token h in JSON at position 0
```

you'll have:

```js
const codec = json.or(raw)
codec.defaultDecoder.decode(Buffer.from('hello'))
// => SyntaxError: Unexpected token h in JSON at position 0
```

Not sure if that makes things any more explicit though. Another option could be to make …
This is a fair critique. I think the primary motivation for combining …

Given all this, I think the only viable options are:
What you lose is either …

```js
const { encoder, decoder } = dag.codec({
  multicodec: codecs.json.or(codecs.raw),
  defaultCodec: codecs.json,
  hasher: hashes.sha256.or(hashes.sha512),
  defaultHasher: hasher
})
```

Which otherwise was deterministically inferred from the composition. Furthermore, you still have the same problem as before, just in the next layer:

```js
decoder.decodeBlock(raw.encodeBlock(Buffer.from('hello')))
```

This also means you have the same problem when you do:
multicodec: defaultCodec.or(givenMulticodec),
hasher
}) All this is to say that you can either:
My hope was to strike a balance between the two by:
That way the user is empowered to make a more informed decision between explicitness and convenience based on the constraints they're facing.
Sure …

To summarize my position: I am not opposed to being fully explicit everywhere, but that has tradeoffs, and the approach so far had been leaning towards convenience. Which is why I think it would be best to allow a choice between convenience and explicitness, and to make that choice on a case-by-case basis. We could improve the naming of things to make those tradeoffs more clear to users, and we can also switch the semantics of …

```js
c = json.or(raw)
c.decodeBlock(raw.encodeBlock(Buffer.from('hello'))).toString()
// => 'hello'
```
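For what it's worth, the "switched semantics" idea can be tried out in isolation. Below is a hedged sketch: the `json` / `raw` objects and the `or` helper are toy stand-ins (not the multiformats implementations, which expose `or` as a method), but they show the behavior being proposed — a composed `decodeBlock` that tries each codec in turn instead of only the default:

```javascript
// Toy codecs: illustrative stand-ins for the real json/raw codecs.
const json = {
  name: 'json',
  encodeBlock: (value) => new TextEncoder().encode(JSON.stringify(value)),
  decodeBlock: (bytes) => JSON.parse(new TextDecoder().decode(bytes))
}
const raw = {
  name: 'raw',
  encodeBlock: (bytes) => bytes,
  decodeBlock: (bytes) => bytes
}

// Alternative `or` semantics: decodeBlock falls back to the right
// operand when the left operand fails to decode.
const or = (left, right) => ({
  ...left, // the left operand still provides encodeBlock etc.
  decodeBlock (bytes) {
    try {
      return left.decodeBlock(bytes)
    } catch {
      return right.decodeBlock(bytes)
    }
  }
})

const c = or(json, raw)
const bytes = c.decodeBlock(raw.encodeBlock(new TextEncoder().encode('hello')))
const text = new TextDecoder().decode(bytes)
// json throws a SyntaxError on the raw bytes, so the raw fallback runs
```

Whether silent fallback is better than an explicit error is exactly the convenience-versus-explicitness tradeoff discussed above.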
I think you've done really well at exposing our lack of mechanism to identify a block type within the binary (as per your discussion on "multiblock"). But that's unavoidable, at some layer of the stack, anyway (binary formats we have to consume but can't control, storage mechanisms that wouldn't support additional pieces, etc.), so we have to deal with it. Trying to think of the way I would want to interact with this kind of system, what I see is:
So right now, my takeaway from all of this composition work at this layer is that I can't really see myself reaching for it. I'm either going to go lower or higher. You seem to have a use case that makes the in-between a little more interesting? It might also be the case that I'm not quite seeing the broad utility yet. I really just want to see what the higher portion is going to be, and I still feel like you and @mikeal have slightly different visions about what that is, so I'll wait and see what the synthesis is.
Problem Statement
#35 removes registries, which simplifies things for the encoder: it had to communicate which encoder and hashing algorithm to use by code or name, and now it does so just by reference. It does, however, push some complexity onto the decoder, which needs a table of decoders (and in some instances hashers) in order to be able to decode dags encoded with multiple codecs.
Overview of changes
Changes in this pull request are based on #35 and attempt to introduce a solution for the above-described problem, that is, make decoding dags that use multiple codecs as convenient and simple as it was with registries. Conceptually it does so by introducing an `or` combinator, allowing one to combine several multi(thing)s and turn them into one multi(thing) that can encode / decode in either of the combined things.

Base decoders / encoders
There is `encoder.baseEncode(bytes)` that encodes bytes with the underlying base encoding (without the multibase prefix) and a corresponding `decoder.baseDecode(string)` that decodes a string (which isn't multibase prefixed) with the underlying base. There is also `encoder.encode(bytes)` that encodes bytes with the underlying base encoding and returns it with the multibase prefix, and a corresponding `decoder.decode(multibase)` that checks the base and, if supported, decodes it, otherwise throws an exception.

The primary idea is that if you need a base decoder that handles multiple bases, you can create a composition and not worry about which one you're dealing with.
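To make the composition idea concrete, here is a small self-contained sketch of that base encoder / decoder shape. The `base16` / `base64` codecs are hand-rolled stand-ins rather than the real multiformats bases; only the method names (`baseEncode`, `baseDecode`, `encode`, `decode`, `or`) follow the description above:

```javascript
// Hand-rolled base codec factory; `prefix` is the multibase prefix char.
const makeBase = ({ prefix, baseEncode, baseDecode }) => ({
  prefix,
  baseEncode, // bytes -> string, no multibase prefix
  baseDecode, // string -> bytes, no multibase prefix
  encode: (bytes) => prefix + baseEncode(bytes),
  decode (text) {
    if (text[0] !== prefix) {
      throw Error(`Unable to decode: expected multibase prefix "${prefix}"`)
    }
    return baseDecode(text.slice(1))
  },
  or (other) {
    // Composition: dispatch on the multibase prefix at decode time.
    const bases = { [prefix]: this, [other.prefix]: other }
    return {
      ...this, // the left operand keeps the default (encode) role
      decode: (text) => {
        const base = bases[text[0]]
        if (!base) throw Error(`No base registered for prefix "${text[0]}"`)
        return base.decode(text)
      }
    }
  }
})

// 'f' and 'm' are the real multibase prefixes for base16 / base64.
const base16 = makeBase({
  prefix: 'f',
  baseEncode: (bytes) => Buffer.from(bytes).toString('hex'),
  baseDecode: (text) => Uint8Array.from(Buffer.from(text, 'hex'))
})
const base64 = makeBase({
  prefix: 'm',
  baseEncode: (bytes) => Buffer.from(bytes).toString('base64'),
  baseDecode: (text) => Uint8Array.from(Buffer.from(text, 'base64'))
})

const either = base16.or(base64)
const hello = new TextEncoder().encode('hello')
const a = new TextDecoder().decode(either.decode(base16.encode(hello)))
const b = new TextDecoder().decode(either.decode(base64.encode(hello)))
// both decode back to 'hello': the composition picks the base by prefix
```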
Codec
The same idea as above has been expanded to codecs. Just like `baseEncode` / `baseDecode`, codecs now have `encodeBlock` / `decodeBlock` that encode / decode via the concrete codec. In addition there is now `encode({ code, value }) => { code, bytes }` and `decode({ code, bytes }) => { code, value }`. Just like `baseEncode` / `baseDecode` operated on strings and `encode` / `decode` operated on multibase-tagged strings, `encodeBlock` / `decodeBlock` operate on raw `value <-> bytes` and `encode` / `decode` operate on code-tagged `value <-> bytes`, hence `{ code, value } <-> { code, bytes }`.
Similar to base decoders codecs can also be composed:
And encoders compose as well
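As a concrete (if toy) illustration of the composed shape described here — assuming the `encodeBlock` / `decodeBlock` / `encode` / `decode` / `or` names from this description, but with hand-rolled codecs rather than the real multiformats ones:

```javascript
// Toy codec factory: encodeBlock / decodeBlock work on raw value <-> bytes,
// encode / decode on code-tagged { code, value } <-> { code, bytes }.
const codec = ({ code, name, encodeBlock, decodeBlock }) => ({
  code,
  name,
  encodeBlock,
  decodeBlock,
  encode: ({ code, value }) => ({ code, bytes: encodeBlock(value) }),
  decode: ({ code, bytes }) => ({ code, value: decodeBlock(bytes) }),
  or (other) {
    // Composition dispatches on the code tag; the left operand keeps the
    // default role for encodeBlock / decodeBlock.
    const codecs = { [this.code]: this, [other.code]: other }
    const pick = (code) => {
      const c = codecs[code]
      if (!c) throw Error(`Unknown codec code 0x${code.toString(16)}`)
      return c
    }
    return {
      ...this,
      encode: (input) => pick(input.code).encode(input),
      decode: (input) => pick(input.code).decode(input)
    }
  }
})

// 0x55 and 0x0200 are the real multicodec codes for raw and json.
const raw = codec({
  code: 0x55,
  name: 'raw',
  encodeBlock: (bytes) => bytes,
  decodeBlock: (bytes) => bytes
})
const json = codec({
  code: 0x0200,
  name: 'json',
  encodeBlock: (value) => new TextEncoder().encode(JSON.stringify(value)),
  decodeBlock: (bytes) => JSON.parse(new TextDecoder().decode(bytes))
})

const multi = json.or(raw)
const block = multi.encode({ code: 0x0200, value: { hello: 'world' } })
const { value } = multi.decode(block)
// a 0x55-tagged block would route to raw instead
```

Because `multi` exposes the same surface as `json`, it can stand in wherever a single codec was expected, which is the drop-in property described below.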
Just like logical or, `json.or(raw)` is also a drop-in replacement for `json`, because the operand on the left assumes the default role.

So what's going on here is that there is a `BlockEncoder` / `BlockDecoder` pair of interfaces and a `MulticodecEncoder` / `MulticodecDecoder` pair of interfaces. When a codec is created it implements all four, which makes codecs and codec compositions swappable. This enables high-level APIs like the `dag` API (see below) to not care how many codecs are used in the dag; it just needs a codec (or a composition of them) to encode / decode it.

Originally this pull request separated `BlockEncoder` / `BlockDecoder` from `MulticodecEncoder` / `MulticodecDecoder`, and you had to build the latter with the former. I think that allowing codecs to compose without having to introduce another layer makes the API simple yet powerful, but I am also curious to learn how others perceive it.

Hashers
Hashers got similar treatment. There is `digestBytes(bytes: Uint8Array): Promise<MultihashDigest<Code>>` and `digest(input: { code: Code, bytes: Uint8Array }): Promise<MultihashDigest<Code>>`, which makes them composable:

Dag
The previous `block.js` was replaced with a more flexible `dag.js` that can be used to create a dag encoder / decoder with blocks in various encodings.

Just like all the other things, dag codecs also implement the `or` interface, so that one could take a dag codec from one sub-system and compose it with a dag codec from another and get a codec that can be used with both:
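Finally, a very rough self-contained sketch of how a dag-level codec built from a multicodec and a hasher could delegate `or` to its parts. Everything here is an assumption drawn from this description — the `dag` factory, the hypothetical `codecFor` helper, and the synchronous stand-in hasher are illustrative, not the actual `dag.js` implementation (whose `digest` is promise-based):

```javascript
import { createHash } from 'node:crypto'

// Toy code-dispatching multicodec; `codecFor` is a hypothetical helper
// for looking a codec up by its code tag.
const codec = ({ code, encodeBlock, decodeBlock }) => ({
  code,
  encodeBlock,
  decodeBlock,
  codecFor (wanted) { return wanted === this.code ? this : undefined },
  or (other) {
    const table = { [this.code]: this, [other.code]: other }
    return { ...this, codecFor: (wanted) => table[wanted] }
  }
})

const json = codec({
  code: 0x0200, // real multicodec code for json
  encodeBlock: (value) => new TextEncoder().encode(JSON.stringify(value)),
  decodeBlock: (bytes) => JSON.parse(new TextDecoder().decode(bytes))
})
const raw = codec({ code: 0x55, encodeBlock: (b) => b, decodeBlock: (b) => b })

// Synchronous stand-in hasher (the API described above returns promises).
const sha256 = {
  code: 0x12, // real multihash code for sha2-256
  digestBytes: (bytes) =>
    new Uint8Array(createHash('sha256').update(bytes).digest())
}

const dag = ({ multicodec, hasher }) => ({
  multicodec,
  hasher,
  // Encode with the default (left-most) codec, tag with its code, hash.
  encode (value) {
    const bytes = multicodec.encodeBlock(value)
    return { code: multicodec.code, bytes, digest: hasher.digestBytes(bytes) }
  },
  // Decode by dispatching on the code tag.
  decode ({ code, bytes }) {
    const c = multicodec.codecFor(code)
    if (!c) throw Error(`Unsupported codec 0x${code.toString(16)}`)
    return c.decodeBlock(bytes)
  },
  // Compose two dag codecs by composing their multicodecs.
  or (other) {
    return dag({ multicodec: multicodec.or(other.multicodec), hasher })
  }
})

const jsonDag = dag({ multicodec: json, hasher: sha256 })
const rawDag = dag({ multicodec: raw, hasher: sha256 })
const both = jsonDag.or(rawDag)

// `both` decodes blocks produced by either sub-system's dag codec.
const value = both.decode(jsonDag.encode({ hello: 'world' }))
const bytes = both.decode(rawDag.encode(new TextEncoder().encode('hi')))
```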