Crawler based DHT client #709
Conversation
What does the general extra load on the network look like for this? Crawling 'every peer' every hour seems... potentially expensive, especially as the network grows. Additionally, there's added resource consumption to consider here, and we should quantify it. And finally, what specific metrics should we be gathering on the network to help us better understand these implications?
All excellent questions.
I'm not sure, in that I haven't done large-scale testground tests to simulate behavior here, or thought of a good way to test this on a smaller scale. However, at a high level, resource consumption should probably look like this:
The winning question! Here are some that are on my mind (but more are welcome, cc @petar @barath):
This is an interesting DHT model I hadn't thought of yet. While I don't expect this to replace the default DHT mode, it does offer great potential for bulk providers for one reason: it gets more efficient the more query throughput gets fed through it. Over time, as short-lived and unreliable nodes get culled from the peer table, only long-running, high-quality peers should remain, and failure rates should drop quite quickly (if the query rate is high enough). Some thoughts on this...
This should net you a few hundred peers throughout the DHT to add to your table each time you run it. You could then run this either on a fixed interval (every 30 seconds?) or dynamically, whenever the node thinks it needs to refresh the table. Additionally, it may be necessary once in a while (once a week/month?) to clear the unreliable-peer Bloom filter and do a full crawl, both to keep the false-positive rate on the filter down and to give peers another shot at proving to be reliable.
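To make that schedule concrete, here is a minimal sketch of such a maintenance loop in Go. The interval values and the crawler method names are illustrative assumptions, not APIs from this PR.

package crawlersketch

import (
	"context"
	"time"
)

// crawler is a stand-in for whatever the crawler-based client exposes; the
// method names are assumptions for illustration only.
type crawler interface {
	PartialCrawl(ctx context.Context) error // a few random-key lookups to top up the table
	FullCrawl(ctx context.Context) error    // walk the whole DHT
	ClearUnreliablePeerFilter()             // reset the Bloom filter of "unreliable" peers
}

// maintainTable runs the schedule sketched above: frequent partial refreshes
// plus an occasional filter reset and full crawl.
func maintainTable(ctx context.Context, c crawler) {
	partial := time.NewTicker(30 * time.Second) // fixed-interval partial refresh
	full := time.NewTicker(7 * 24 * time.Hour)  // weekly-ish full crawl + filter reset
	defer partial.Stop()
	defer full.Stop()

	for {
		select {
		case <-ctx.Done():
			return
		case <-partial.C:
			_ = c.PartialCrawl(ctx)
		case <-full.C:
			// Clearing the filter keeps its false-positive rate down and gives
			// previously-unreliable peers another chance to prove themselves.
			c.ClearUnreliablePeerFilter()
			_ = c.FullCrawl(ctx)
		}
	}
}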
@robertschris 👍 to basically all of your suggestions. Some thoughts:
I'd like to clarify that there are three separate optimizations here that work independently of each other (although they're complementary):
Some of these could likely be added to the standard DHT client, and some things from the standard client (like the routing logic) could potentially be reused here in a more limited way to help with gradually filling the trie. Additionally, when this new client is deemed sufficiently stable, I'd like to see it become useful as a DHT server as well, since these nodes could accelerate lookups by less powerful peers that are using the standard client lookup logic. I'm hoping to do a writeup this month about the work so far, time permitting 🤞. Generally, I'm hoping this shows how much progress client code alone can give us, so we can then push on where the protocol needs changes and on the impacts of some of the tunable client + protocol parameters (e.g. number of peers in a response, routing table refresh periods, how many peers to wait for responses from before considering an operation successful, periodicity of repeating provides/puts, etc.).
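For reference, a hedged sketch of how those tunables might be grouped; the field names and comments are assumptions for illustration, not options defined by this PR.

package crawlersketch

import "time"

// TunableParams groups the client/protocol knobs mentioned above into one
// place purely for illustration.
type TunableParams struct {
	PeersPerResponse     int           // how many closer peers a server returns per query
	TableRefreshInterval time.Duration // how often the routing table/trie is refilled
	SuccessQuorum        int           // how many peers must respond before an operation counts as successful
	ReprovideInterval    time.Duration // how often provides/puts are repeated
}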
internal/config/config.go

// QueryFilterFunc is a filter applied when considering peers to dial when querying
type QueryFilterFunc func(dht interface{}, ai peer.AddrInfo) bool

// RouteTableFilterFunc is a filter applied when considering connections to keep in
// the local route table.
type RouteTableFilterFunc func(dht interface{}, conns []network.Conn) bool
Note: this is a breaking change, as these used to take a *IpfsDHT. We could have them take a thing that returns a host, since that's all that was being used, but this might be workable as well while we explore this space.
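As an illustration of the loosened signature, here is a minimal sketch of a route table filter that asserts the dht argument down to a small interface exposing just a host, along the lines suggested above. The hasHost interface and the filtering rule are assumptions, not code from this PR; import paths are the go-libp2p-core ones in use at the time.

package crawlersketch

import (
	"github.com/libp2p/go-libp2p-core/host"
	"github.com/libp2p/go-libp2p-core/network"
)

// hasHost is the "thing that returns a host" idea: filters only need the host,
// so they can assert the interface{} parameter to this rather than depending
// on a concrete DHT type.
type hasHost interface {
	Host() host.Host
}

// keepConnected has the RouteTableFilterFunc shape: keep a route table entry
// only if at least one of its connections is still live according to the host.
func keepConnected(dht interface{}, conns []network.Conn) bool {
	d, ok := dht.(hasHost)
	if !ok {
		// Unknown DHT implementation: don't filter anything out.
		return true
	}
	h := d.Host()
	for _, c := range conns {
		if h.Network().Connectedness(c.RemotePeer()) == network.Connected {
			return true
		}
	}
	return false
}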
I just wanted to say that this is pretty cool work, and it will likely be useful for Infura. 👍 👍
…ning the component it is testing
…to the filter interfaces to support more DHT implementations
… we've already finished the query to help us deal with backoffs + invalid addresses
…g GetClosestPeers calls. Only keep the backup addresses for peers found during a crawl that we actually connected with. Properly clear out peermap between crawls
closes #619
Still very WIP. Should merge after the crawler gets merged.
TODO:
- Expose two implementations of the MessageSender interface that we already have (will have to wait for another PR):
  - One for sending one-off messages to peers
  - One that tries to reuse streams and keep them alive
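A rough sketch of the distinction between those two implementations; the interface below is only roughly the shape of the existing MessageSender, and the message type and struct fields are placeholders, not code from this PR.

package crawlersketch

import (
	"context"

	"github.com/libp2p/go-libp2p-core/peer"
)

// message stands in for the DHT protobuf message type.
type message struct{}

// messageSender is roughly the shape of the interface the TODO refers to:
// a request/response call plus a fire-and-forget send.
type messageSender interface {
	SendRequest(ctx context.Context, p peer.ID, m *message) (*message, error)
	SendMessage(ctx context.Context, p peer.ID, m *message) error
}

// oneOffSender would open a fresh stream per message and close it afterwards:
// simple, no per-peer state, but it pays stream-setup cost on every send.
type oneOffSender struct{ /* host, protocol IDs, ... */ }

// reusingSender would cache a stream per peer and keep it alive across sends,
// trading extra bookkeeping (and cleanup of dead streams) for lower latency.
type reusingSender struct{ /* host, protocol IDs, per-peer stream map, ... */ }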