Skip to content
This repository has been archived by the owner on Mar 28, 2023. It is now read-only.

Allow serving of FindProvider responses for unknown protocol types #39

Open
aschmahmann opened this issue Aug 22, 2022 · 7 comments
Open
Labels
kind/enhancement A net-new feature or improvement to an existing feature need/triage Needs initial labeling and prioritization

Comments

@aschmahmann
Copy link
Contributor

Background

Reframe is extensible and can support processing responses even when it is not fully aware of the detailed schema of the response. For example, the FindProviders method indicates an extensible (i.e. unioned with "Any") TransportProtocol type and Node type.

In order to enable experimentation Reframe servers should not always need to know about every TransportProtocol/Node and therefore require server-side updates in order to respond to updated/experimental clients with new transport types.

Proposal

Instead of having the DelegatedRoutingService require FindProviders to return whatever the client is exposed to, instead have it expose whatever the server actually needs. For example, instead of

FindProviders(ctx context.Context, key cid.Cid) (<-chan client.FindProvidersAsyncResult, error)

Return something more broad like:

type Provider struct {

or a struct/interface that captures a similar idea but isn't tied to the generated code itself.

Impact

Having this functionality run on the StoreTheIndex Reframe endpoint will enable users to experiment with advertising data with new protocols and serve them via Reframe even before the groups using them have a) finalized their schemas, b) gotten updated code into storetheindex.

This means a user can just roll up one day, make a protocol advertising that is has blocks it'll serve over HTTP at https://foo.tld/blocks<cid> publish it to the Indexers and just pull the responses over Reframe next to the Bitswap and GraphSync responses without coordinating with anyone. Sure, they may still need to decide on a protocol code in the code table but they can get started by either temporarily using a number in the application reserved range, or by choosing some application-specific text as the key.

This code currently has some Bitswap hard-coding that shouldn't be there, but there also isn't enough space in the interface to allow much in the way of other types of responses.
https://github.com/filecoin-project/storetheindex/blob/5fcd096da1095bf6d4f77e7b89b7e0a85ff1304e/server/reframe/reframe.go#L39.

WDYT @petar @willscott @ajnavarro ?

@aschmahmann aschmahmann added kind/enhancement A net-new feature or improvement to an existing feature need/triage Needs initial labeling and prioritization labels Aug 22, 2022
@willscott
Copy link
Contributor

  • should we extend the query to allow the client to specify protocols it can handle and filter to those?
  • I'm fine expanding the signature to allow for the direct provider struct data structure / allow arbitrary protocol metadata and have the client filter rather than implicit in reframe

@aschmahmann
Copy link
Contributor Author

should we extend the query to allow the client to specify protocols it can handle and filter to those?

SGTM, we left that out initially since it seemed like the load difference didn't seem like a big deal to start with. However, yes I suspect over time we may benefit from adding more query/filtering parameters to FindProviders if we find the number of responses get out of hand. Having one based on supported protocols and/or Node types seems fine. Ideally we should be able to add them as we go without causing breakages with older implementations when new optional filters are added.

@aschmahmann
Copy link
Contributor Author

aschmahmann commented Aug 22, 2022

I'm fine expanding the signature to allow for the direct provider struct data structure / allow arbitrary protocol metadata and have the client filter rather than implicit in reframe

@willscott How do you feel about effectively specifying something for the Indexers where there's an extra description for Metadata:
https://github.com/filecoin-project/storetheindex/blob/21ec141f580359015cd8a379019006f361ad8830/doc/ingest.md#metadata

Where if the data is in a particular format it can be easily translated into a Reframe response? The details for what that might look like can be tackled in another issue (e.g. in storetheindex, or wherever you think is best to engage), but wondering if that seems reasonable since without that this issue largely doesn't matter (at least in the shorter term/storetheindex use case) since we'd need to bump code in storetheindex anyway.

@willscott
Copy link
Contributor

what do you think would be needed to support such a translation?
we could take the arbitrary bytes pushed into indexer metadata, and pretend they fit the any in reframe
or we can add a assumption, like that any subsequent transport will either (a) encode it's metadata as cbor, or (b) add a varint for the size of it's metadata

@petar
Copy link
Contributor

petar commented Aug 22, 2022 via email

@aschmahmann
Copy link
Contributor Author

what do you think would be needed to support such a translation?

It shouldn't be too big, it depends on what (if any) information the Indexers want from the metadata vs what Reframe needs.

Listing out some levels of flexibility which we can choose to have or not:

  1. Do we care about the structure of the data at all?
    • If yes then some other features like filtering, etc. might still work for us. If no, there's no coupling but all the work has to happen client side
  2. Is the encoding flexible or do we insist it's something like dag-cbor? (i.e. do we need some codec prefix)
    • Could see either way. We can also add the codec prefix and insist it match the dag-cbor one if we're not sure which way we want to go yet.
  3. Do we support the Provider struct, a list of Provider or something else?
    • Consider for example a node (like Boost) is going to want here in terms of serving data over both HTTP and libp2p and therefore have both different Node types and different TransportProtocols. This means we might want to allow storing a list of Providers. Not sure how the Indexers plan on handling this though

Add a new transport type at the protocol level, called STIMeta, which holds a byte field for the uninterpreted metadata

This seems like a potential fallback, but it seems like it'd make sense to allow/encourage users to build things which will end up looking like other Reframe responses which can then be added to the schemas (they might not even have to make changes if they use unique names for their new keys). This is basically saying "no" to option 1.

@petar
Copy link
Contributor

petar commented Aug 22, 2022

This seems like a potential fallback, but it seems like it'd make sense to allow/encourage users to build things which will end up looking like other Reframe responses which can then be added to the schemas (they might not even have to make changes if they use unique names for their new keys). This is basically saying "no" to option 1.

Sure, it would be better if we can unpack the STI meta into a TransferProtocol structure. This would require some processing on the STI side though:

  • STI would have to look into the meta to determine which protocol is being used (and convert the meta into the appropriate reframe TransferProtocol structure), OR
  • the meta is already an encoding of a TransferProtocol structure

@BigLep BigLep moved this to 🥞 Todo in IPFS Shipyard Team Aug 30, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
kind/enhancement A net-new feature or improvement to an existing feature need/triage Needs initial labeling and prioritization
Projects
No open projects
Status: 🥞 Todo
Development

No branches or pull requests

3 participants