Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
IPIP-337: Delegated Content Routing HTTP API #337
IPIP-337: Delegated Content Routing HTTP API #337
Changes from 3 commits
1d9ec9c
65d178b
4c024dd
0acdb01
f7b4437
13d695c
e3e744a
451b1e9
27d23e8
a9984a9
fce070f
fff68c3
11f4ca5
39c467e
87ff0ac
96d55d0
4264a2d
0f49dcf
7238e63
e823d9e
19fff93
1aac44c
acc397b
325ca1e
9c47a31
512bc05
655b1f2
d343189
573417e
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just write the spec within the IDL of the data and define the transport to be this? It seems like it'd be easy enough except for the areas where the divergence of this API runs counter to some of the Reframe goals, which seem worth discussing. For example, I put an alternative that seems to capture some of your major changes below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a property of Reframe this is a property of the transport defined in this document https://github.com/ipfs/specs/blob/main/reframe/REFRAME_HTTP_TRANSPORT.md, while Reframe is independent of individual transports.
e.g. see #327 which doesn't touch any method specifications just adds an alternative (v2) HTTP transport.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What types of routers are "inbounds" here if streaming support is off the table?
Here are a few routing systems in place today:
By making the API non-streaming you effectively only end up supporting cid.contact, in which case it doesn't seem like it's so different then using the Indexer-specific API with the
/multihash
endpoint and calling it a day?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Etag
handling in edelwieiss is a bug and that is all – see proposed fix based onLast-Modified
in IPIP-327: Reframe over HTTP version=2 (DAG-CBOR and better cache controls) #327\n
as delimiter)application/json-seq
) standard for streaming multiple JSON documentsThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should not be true, when the discussion for ETags was brought up it was flagged that FindProviders cannot be forced to buffer data although some implementations may choose to (e.g. storetheindex). If this happened it was a breaking change that was not flagged as such. There was intentional bug fixing here to make sure streaming was supported ipfs/go-delegated-routing#26.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some use cases I think it'd be quite useful for a delegated routing API to support.
bafyfoo
but actually the graphbafyfoo;adl=unixfs
byte range 10000-20000There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason this doesn't include streaming is because there is no immediate need for it, and we can add it later. The narrow scope is intentional so that we can focus on nailing this particular use case.
"Later" can be mean immediately after this spec; I just don't think it's helpful to block indexer support on it, since they don't need it. For adding it later, consider a streaming response as just a different response format which can be included in content negotiation, such as ndjson with the same provider record schema.
I believe there will remain value in supporting standard
application/json
responses for operators since existing services, infrastructure, middleware, and other tools can produce/parse/manipulate the response without having to write custom code for the format.What does this mean, concretely? Provider records have a
Protocols
section that can accomodate any multicodec identifier and an opaque JSON value payload, is that not the kind of extensibility you're talking about? Could you provide a complete example that we can work through?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the indexer POV would be great to see paging support so that we can use the protocol with the larger IPFS nodes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it depends on whose use cases you're thinking about, but IMO there are uses for it today (see some support in #337 (comment)). The library https://github.com/libp2p/js-libp2p-delegated-content-routing which is particularly useful in browsers cannot be replaced in a meaningful way with this API, which as I've mentioned means we're not making any progress here other than adding a new API for an existing system (Indexers) which already has an API.
I think it depends. For example, if the v1 protocol doesn't support streaming and v2 does how long will it be expected for the ecosystem to support v1 for? Note that there is already an HTTP API that does exactly what the Indexers want that IMO is largely unsuitable for content routing as a whole (and I suspect most participants in this PR agree or else we'd be pushing to reuse that spec), unless this is really another attempt at the "Indexer HTTP API" with the wrong title name.
IMO even if we were sticking with this it would still be important for developers to be able to figure out what the data meant which means declaring what they mean here or in a sibling spec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like I've listed multiple examples in my post earlier in this thread which are not supported, but I can work through them more explicitly.
There are a variety of ways to describe fetching IPLD data over HTTP (some examples here). However, to see the problems with the proposed scheme consider the demo advertisement doing this with the Indexer. The flexibility in this PR matches exactly the flexibility in the Indexer protocol and its
/multihash
HTTP endpoint.However, it has a problem ... there's no good place to put the information. The demo above has a blank HTTP protocol ID, a bogus peerID, and an HTTP multiaddr. This leads to some hacks that being required to address this such as:
/p2p/<peerID>
is not appended to the HTTP multiaddr""
and make sure it isn't validated)Note: BitTorrent is roughly a similar story.
I gave multiple examples there, but to flesh out one. Consider that I'd like to support
SHA256(100 MB)
within major ecosystem tooling (e.g. pinning services) without requiring anyone else in the ecosystem to do much of anything. So the way I do this is I advertise some metadata into the content routing system claiming thatSHA256(100 MB data)
is downloadable in an incrementally verifiable way with a given proof with CIDP
.Just like in the HTTP case there's no PeerID and in this case there's not even a multiaddr to be associated with. So where does this data go?
For those wondering there are a number of important applications of this for bringing IPFS compatibility to existing systems. You can take a look at https://github.com/aschmahmann/mdinc (and this PR) for background and a demo of making Docker data available over IPFS.
As an example: I should be able to create an endpoint (e.g. at routing.delegate.ipfs.io) that ingests FindProviders responses from say cid.contact, the DHT, and experimental-new-system.tld and responds with both of them as they arrive. If either system adds new types of responses (e.g. new provider types) that shouldn't require updating the proxy in the middle to understand them in order for the new provider types to propagate their way through.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no need for a version bump, it can be introduced in a backwards-compatible way by adding a new content type such as
application/x-ndjson
, and clients and servers can do the usual content negotiation.I agree but I don't think this API should be interpreting the meaning of it, it is pass-through data and asserting semantics here would hinder forwards compatibility. I would hesitate to even call it a "sibling" spec as this API should be completely agnostic to the contents of the payload. IMO it is the responsibility of the producer of the data to document its structure and semantics for the consumer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I get the theoretical ideas, but I am having trouble understanding what it means concretely. Are you saying that the peer ID, multiaddr etc. are details of the transfer protocol and should really be part of the opaque payload?? If so, that would make sense and I can change provider records to use tagged unions. What about something like this:
Provider record schema:
Bitswap provider record (multicodec_code=2320):
So then the full response becomes something like
Is that like what you have in mind? (Protocols are stringified multicodec codes so that this is compatible with OpenAPI discriminators, which require discriminator properties to be string values, and they are codes instead of multicodec names because apparently names aren't as stable as codes.)
Agreed, that is a design goal of this proposal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO the alternatives section is not obviated by the existence of existing specs. The existing spec doesn't solve a problem you have which is why you proposed an alternative. It seems unlikely that this is the only way to solve the problem, just the one you currently think is the best.
An alternative could be to just define a new HTTP-based transport for Reframe (e.g. #327) that takes some of the good ideas from this proposal.
Below is a strawman taking most of the listed issues with Reframe, putting the ideas from here into a new transport and seeing what's left.
All method names are
/reframe/<method-name>
Proposed in #327
This would resolve:
Removing changes related to backwards compatibility since this breaks everything anyway
Proposed in #327
All parameters in request messages are mapped to query parameters
e.g. for
FindProviders
Key is a CID, so we could have/reframe/findproviders?Key=bafyfoo
, similarly there can be mappings for most if not all of the other IPLD Data Model Kinds.This might mean we have to forbid IPLD kinded unions from the request struct, but if it helps UX then it's probably worth it 😄.
Some like maps might be complicated, so we could just disallow those until a need comes up (e.g. someone wants a union in their request), or have some system such as flattening the request map into query parameters or URL-encoding them. For example, for
{ "Key" : "bafyfoo", "CheckRouters" : { "DHT" : { "ID" : "/ipfs/kad/1.0.0", "timeout" : "10s" } }
it could look likereframe/findproviders?Key=bafyfoo&CheckRouters-DHT-ID=/ipfs/kad/1.0.0&CheckRouters-DHT-timeout=10s}
orreframe/findproviders?Key=bafyfoo&CheckRouters=base64(encoded map data)
.This resolves:
Add a pagination token on the relevant methods (or more generally if it really seems globally applicable)
Remaining issues
There is complexity associated with streaming and caching
Yes, As described above IMO this is worthwhile complexity since not supporting streaming limits the utility of this API. The most commonly used (explicit) content routing system in the IPFS ecosystem today is the IPFS Public DHT which will hurt when put behind a non-streaming API.
Client and server implementations are difficult to write correctly, because of the non-standard wire formats and conventions
A lot of this seems to be alleviated by the proposal above. IIUC the remaining special things are:
The Go implementation is complex and brittle, and is currently maintained by IPFS Stewards who are already over-committed with other priorities
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good point, the alternatives section is definitely missing some real alternatives. I'll add this alternative, and possibly some others, and could you then add these comments to it? Having a conversation on all of these points on this one thread will be pretty difficult.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM 🙏. Some of these are things I cribbed from @lidel and #327, but happy to discuss here where we can break out more threads so it's easier to parse individual discussions out.