feat: Call provide endpoint in batches. #64
This will avoid huge provide payloads. Signed-off-by: Antonio Navarro Perez <[email protected]>
Quick nits:
- a batch size based on item count alone is not enough; we may need a second limit based on byte size (see the comment below for why)
- the server side does not enforce batch size limits, which is a DoS vector
- limits should be documented in the specs (IPIP-337: Delegated Content Routing HTTP API, specs#337)
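To illustrate the second nit, here is a minimal sketch of what server-side enforcement could look like. It is not the code in this repository: the `provideRequest` type, the plain-JSON body, the `/provide` path, and both limit constants are assumptions made up for the example. The only library behavior relied on is `http.MaxBytesReader`, which caps how much of the request body the handler will read.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
)

// Hypothetical limits; the PR and the spec do not define these values.
const (
	maxProvideItems = 30000   // maximum number of keys per request
	maxProvideBytes = 2 << 20 // maximum request body size (2 MiB)
)

var errTooManyItems = errors.New("provide batch exceeds item limit")

// provideRequest is an illustrative request shape, not the real wire format.
type provideRequest struct {
	Keys []string `json:"keys"`
}

// checkItemLimit rejects requests that carry more than maxItems keys.
func checkItemLimit(req provideRequest, maxItems int) error {
	if len(req.Keys) > maxItems {
		return errTooManyItems
	}
	return nil
}

// provideHandler enforces the byte limit first, then the item limit.
func provideHandler(w http.ResponseWriter, r *http.Request) {
	// Reads beyond maxProvideBytes fail, so decoding an oversized body errors out.
	r.Body = http.MaxBytesReader(w, r.Body, maxProvideBytes)
	var req provideRequest
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
		http.Error(w, "invalid or oversized body", http.StatusBadRequest)
		return
	}
	if err := checkItemLimit(req, maxProvideItems); err != nil {
		http.Error(w, err.Error(), http.StatusRequestEntityTooLarge)
		return
	}
	w.WriteHeader(http.StatusOK)
}

func main() {
	// Register the handler (no server is started in this sketch).
	http.HandleFunc("/provide", provideHandler)
	// Two keys against a limit of one is rejected.
	fmt.Println(checkItemLimit(provideRequest{Keys: []string{"a", "b"}}, 1))
}
```

Checking the byte limit before decoding means an oversized request is rejected without being fully buffered in memory.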
}

if c.ProvideBatchSize == 0 {
	c.ProvideBatchSize = 30000 // this will generate payloads of ~1MB in size
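For readers who have not opened the diff, count-based batching of this kind can be sketched as follows. This is an assumption-laden illustration, not the PR's implementation: the `batchByCount` helper and the use of string keys are hypothetical, and only the default of 30000 comes from the excerpt above.

```go
package main

import "fmt"

// batchByCount splits items into consecutive slices of at most size elements.
// A non-positive size falls back to 30000, mirroring the default in the excerpt.
func batchByCount(items []string, size int) [][]string {
	if size <= 0 {
		size = 30000
	}
	var batches [][]string
	for start := 0; start < len(items); start += size {
		end := start + size
		if end > len(items) {
			end = len(items)
		}
		// Sub-slices share the backing array with items; nothing is copied.
		batches = append(batches, items[start:end])
	}
	return batches
}

func main() {
	keys := []string{"k1", "k2", "k3", "k4", "k5"}
	for i, b := range batchByCount(keys, 2) {
		fmt.Println(i, b)
	}
}
```

Each returned batch would then be sent as its own provide request.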
nit (1): this relies on assumptions about the average size of a single item, which will quickly become outdated once WebTransport is enabled by default (`/webtransport` multiaddrs with two `/certhash` segments will balloon the size of the batch beyond the initial estimate) or when we add more transports in the future.
In other places, such as UnixFS autosharding, we've moved away from ballpark item counts that assume some arbitrary average size, and switched to calculating the total size of the final block.
Thoughts on switching to byte size, or having two limits: `ProvideBatchSize` (the current one) and `ProvideBatchByteSize` (a new one)?
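The two-limit idea could be sketched roughly like this. It is only an illustration under stated assumptions: `batchByCountAndBytes` is a hypothetical helper, items are modeled as already-encoded byte slices, and per-request envelope overhead is ignored, so the byte total is an approximation of the final payload size.

```go
package main

import "fmt"

// batchByCountAndBytes groups items so that each batch holds at most maxItems
// entries and at most maxBytes of encoded data. A single item larger than
// maxBytes is still emitted, alone in its own batch.
func batchByCountAndBytes(items [][]byte, maxItems, maxBytes int) [][][]byte {
	var batches [][][]byte
	var cur [][]byte
	curBytes := 0
	for _, it := range items {
		full := len(cur) >= maxItems || curBytes+len(it) > maxBytes
		if len(cur) > 0 && full {
			// Adding this item would break a limit: flush the current batch.
			batches = append(batches, cur)
			cur, curBytes = nil, 0
		}
		cur = append(cur, it)
		curBytes += len(it)
	}
	if len(cur) > 0 {
		batches = append(batches, cur)
	}
	return batches
}

func main() {
	items := [][]byte{[]byte("aaaa"), []byte("bbbb"), []byte("cccc")}
	// The byte limit of 8 is hit before the item limit of 10.
	for i, b := range batchByCountAndBytes(items, 10, 8) {
		fmt.Println(i, len(b))
	}
}
```

Whichever limit is reached first closes the batch, so longer multiaddrs shrink the item count per request instead of growing the payload.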
> nit (1): this comes with various assumptions about average size of a single item, which will quickly get outdated when we have WebTransport enabled by default (/webtransport multiaddrs with two /certhash segments will balloon the size of the batch beyond initial estimate), or add more transports in the future.
In that case we would need to add a specific limit on the number of multiaddrs allowed, but that is not the problem right now.
Checking the raw byte size of the payload would make the code hard to read and understand. In my opinion, it is not worth it: the worst case is a payload of ~2 MB instead of ~900 KB because we added more multiaddresses.
LGTM. Thanks for implementing.
See my comment on why I think checking the byte size is not worth it in this case.
It is up to the implementation to limit it.
They are here: https://github.com/ipfs/specs/blob/main/reframe/REFRAME_KNOWN_METHODS.md#provide
Closes #55.