Check block cache across multiple rainbow instances #109
Comments
cc @aschmahmann: (A) is a brain dump of the idea I mentioned earlier this week for logically sharing the block caches. A sanity check would be appreciated.
@hacdias FYSA: after discussing with @aschmahmann, it seems that option (C) is the easiest to wire up and get working today (we already have bitswap), while still allowing us to leverage HTTP in the future, once a client exists. I imagine the end user would only need to set up a single list: This will both:
So one premise of rainbow vs. the older gateways was to avoid hosting: if data is not retrievable from somewhere in the ipfs network, then that is not rainbow's problem. This moves in the direction of actually using rainbow to host things, by relying on other rainbow peers' caches, which in turn assumes that rainbow peers do cache things for non-negligible times. Big baggage. Reg RAINBOW_PEERING_ADDRS, the idea of
The idea is to limit the baggage by enabling "hosting of cached things" only for safelisted peerids. This "cache-sharing" requires mutual agreement and is opportunistic: it has no SLA for how long things are cached, and the default bitswap behavior for non-safelisted peers remains to always respond "i dont have it". Perhaps we should rename this feature and move away from "peering" to "cache sharing" to set expectations closer to reality and avoid feature creep? In the case of ipfs.io, "cache sharing" will be with other rainbow instances, but we have use cases where people self-host their own datasets and want to use rainbow as a dedicated gateway in front of kubo or ipfs-cluster. I hoped to create a config option which works for them too, that is why explicit Reusing If we go with
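To make the safelist idea concrete, here is a minimal sketch in Python (not rainbow's actual Go code). The names `peer_id_from_multiaddr` and `CacheSharingPolicy`, and the dict standing in for the local block cache, are hypothetical and exist only for illustration. The only behavior assumed is what the comment describes: safelisted peer IDs may learn about cached blocks, everyone else always gets a "don't have" answer.

```python
from dataclasses import dataclass


def peer_id_from_multiaddr(addr: str) -> str:
    """Return the peer ID following the last /p2p/ component of a multiaddr string."""
    parts = addr.strip().split("/")
    # A multiaddr string looks like /dns4/example.net/tcp/4001/p2p/<peerid>
    for i in range(len(parts) - 2, -1, -1):
        if parts[i] == "p2p" and parts[i + 1]:
            return parts[i + 1]
    raise ValueError(f"no /p2p/<peerid> component in {addr!r}")


@dataclass(frozen=True)
class CacheSharingPolicy:
    """Hypothetical policy object: which peers may be served from the local cache."""

    safelist: frozenset

    @classmethod
    def from_multiaddrs(cls, addrs):
        # Only the peer ID matters for the permission check.
        return cls(frozenset(peer_id_from_multiaddr(a) for a in addrs))

    def respond(self, peer_id: str, cid: str, local_cache) -> str:
        """Answer a want for `cid` coming from `peer_id`."""
        if peer_id in self.safelist and cid in local_cache:
            return "HAVE"
        # Default read-only behavior: never admit to having data.
        return "DONT_HAVE"
```

With this shape, a non-safelisted peer cannot distinguish "not cached" from "not allowed", which keeps the default read-only behavior intact for everyone outside the list.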
I would separate the feature where rainbow peers autodiscover and connect to each other as something of its own... perhaps call it RAINBOW_SEEDS_PEERING, or RAINBOW_SEEDS_SWARM, or RAINBOW_SEEDS_NETWORK. It's not only useful for caches. My original thought was to mount diverse functionality on top, in particular a distributed rate-limiting/quota system across a swarm of rainbows. So after implementing that, on the side you could have RAINBOW_SEEDS_BITSWAP, or RAINBOW_SHARED_CACHE or whatever, which relies on the rainbow swarm.
Regarding indexes: I would index 0-100 by default, then maybe have a MAX_INDEX option to go higher/lower. I fear having ranges may just be one more way for users to configure things wrong.
Sgtm. We can add opt-in config:
Then the infra at ipfs.io would run with the same SEED,
Sounds good. Only nitpick is that Also I understand these become flags too, like everything else.
Problem
At inbrowser.dev (backed by rainbow from the ipfs.io gateway, so a general problem in our infra), we see inconsistent page load times across regions, and sometimes across requests within the same region.
A user can get an instant response from one instance, and then on a subsequent page load or request get a stalled page load and a timeout, even though the data exists in the cache of one of the other rainbows in the global cluster. We also see inconsistency across subresources on a single page.
Scope
Solutions
A: Add HTTP Retrieval Client to Rainbow, leverage Cache-Control: only-if-cached
We know we need an HTTP retrieval client for Kubo to enable HTTP Gateway over Libp2p by default, and to make direct HTTP retrieval from service providers more feasible. We can't do that without a client and end-to-end tests. Prototyping one in Rainbow sounds like a good plan, improving multiple work streams at the same time.
The idea here is to introduce an HTTP client which runs in addition to, or in parallel with, bitswap retrieval.
Keep it simple, don't mix abstractions, do opportunistic block retrieval like bitswap, but over HTTP.
Using application/vnd.ipld.raw and the trustless gateway protocol is a good match here: it allows us to benefit from HTTP caching and middleware, making it more flexible than bitswap.
Rainbow could:
Cache-Control: only-if-cached going over list in sequence.
This way, once a block lands in any of our rainbow caches, we will discover it, and requests won't time out after 1m in unlucky scenarios.
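The sequential probing described in (A) could be sketched roughly as follows, in Python for brevity (not rainbow's Go code). The function names `http_probe` and `fetch_cached_block`, the peer base URLs, and the 2-second timeout are assumptions made for illustration. The sketch also assumes that a gateway without the block answers an only-if-cached request with an error status (the gateway specs describe 412 Precondition Failed for this case), which is simply treated as a miss.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, Optional


def http_probe(base_url: str, cid: str, timeout: float = 2.0) -> Optional[bytes]:
    """Ask one gateway for a raw block, but only if it is already cached there."""
    req = urllib.request.Request(
        f"{base_url.rstrip('/')}/ipfs/{cid}?format=raw",
        headers={
            "Accept": "application/vnd.ipld.raw",
            "Cache-Control": "only-if-cached",
        },
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read() if resp.status == 200 else None
    except (urllib.error.URLError, OSError):
        # HTTP error statuses and network failures both count as a cache miss.
        return None


def fetch_cached_block(
    cid: str,
    peers: Iterable[str],
    probe: Callable[[str, str], Optional[bytes]] = http_probe,
) -> Optional[bytes]:
    """Go over the peer list in sequence and return the first cached copy found."""
    for base_url in peers:
        block = probe(base_url, cid)
        if block is not None:
            return block
    return None  # nobody had it cached; the caller falls back to normal retrieval
```

Since this is trustless retrieval, the caller would still need to verify that the returned bytes hash to the requested CID before using them.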
Open questions:
B: Set up reverse proxy (nginx, lb) to try rainbows with Cache-Control: only-if-cached first
Writing this down just to have something other than (A), I don't personally believe (B) is feasible.
The idea here is to update the way our infrastructure proxies gateway requests to rainbow instances: first ask all upstream instances within the region for the resource with Cache-Control: only-if-cached, and if none of them has it, retry with a normal request that will trigger p2p retrieval.
The downside here is that this feels like an antipattern:
Cache-Control
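For comparison, the two-phase flow of (B) can be modelled in a few lines of Python. This is a toy model of the proxy logic, not an nginx or load balancer configuration; `proxy_request` and the `send(upstream, path, headers)` callback are hypothetical names, and retrying against the first upstream in phase two is an arbitrary choice made for the sketch.

```python
from typing import Callable, Dict, Sequence, Tuple

Response = Tuple[int, bytes]  # (status code, body)


def proxy_request(
    path: str,
    upstreams: Sequence[str],
    send: Callable[[str, str, Dict[str, str]], Response],
) -> Response:
    """Try every upstream cache-only first, then fall back to a normal request."""
    if not upstreams:
        raise ValueError("at least one upstream is required")
    # Phase 1: cache-only probes against every rainbow in the region.
    for upstream in upstreams:
        status, body = send(upstream, path, {"Cache-Control": "only-if-cached"})
        if status == 200:
            return status, body
    # Phase 2: nobody had it cached, so issue a normal request
    # that is allowed to trigger p2p retrieval.
    return send(upstreams[0], path, {})
```

On a full miss this makes len(upstreams) + 1 upstream requests for a single client request, which is one way to see the cost of doing this at the proxy layer.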
C: Reuse Bitswap client and server we already have
Right now, Rainbow runs Bitswap in read-only mode. It always says it does not have data when asked over bitswap.
What we could do is a permissioned version of peering:
/ip|dns*/.../p2p/peerid, otherwise we
/p2p/ multiaddrs (quick and easy), leverage existing peering config / libraries where possible (Add peering support #35)
/http)
D: ?
Ideas welcome.