Skip to content
This repository has been archived by the owner on Mar 28, 2023. It is now read-only.

feat: Call provide endpoint in batches. #64

Closed
wants to merge 1 commit into from

Conversation

ajnavarro
Copy link
Member

This will avoid huge provide payloads.

It closes #55

Signed-off-by: Antonio Navarro Perez [email protected]

This will avoid huge provide payloads.

Signed-off-by: Antonio Navarro Perez <[email protected]>
@ajnavarro ajnavarro self-assigned this Nov 8, 2022
@ajnavarro ajnavarro changed the title Call provide endpoint in batches. feat: Call provide endpoint in batches. Nov 8, 2022
Copy link
Member

@lidel lidel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick nits:

  1. batch size based on item count is not enough, we may need second one based on byte size (see comment below why)
  2. server side is does not enforce batch size limits, which is a DoS vector
  3. limits should be documented in specs (IPIP-337: Delegated Content Routing HTTP API specs#337)

}

if c.ProvideBatchSize == 0 {
c.ProvideBatchSize = 30000 // this will generate payloads of ~1MB in size
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit (1): this comes with various assumptions about average size of a single item, which will quickly get outdated when we have WebTransport enabled by default (/webtransport multiaddrs with two /certhash segments will baloon the size of the batch beyond initial estimate), or add more transports in the future.

In other places, such as UnixFS autosharding, we've moved away from ballpark counting items assuming they are of some arbitrary average size, and switched to calculating the total size of the final block.

Thoughts on swithcing to byte size, or having two limits? ProvideBatchSize (current one) and ProvideBatchByteSize (new one)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit (1): this comes with various assumptions about average size of a single item, which will quickly get outdated when we have WebTransport enabled by default (/webtransport multiaddrs with two /certhash segments will baloon the size of the batch beyond initial estimate), or add more transports in the future.

Then we need to add a specific limit on the amount of multiaddrs allowed. But that is not the problem right now.

Checking the raw byte size of the payload will make the code hard to read and understand. In my opinion, is not worth it (have a payload of ~2Mb instead of ~900Kb because we added more multiaddresses)

Copy link
Contributor

@ischasny ischasny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for implementing.

@ajnavarro
Copy link
Member Author

@lidel:

  1. batch size based on item count is not enough, we may need second one based on byte size (see comment below why)

See my comment about why I think check byte size is not worth-it in that case.

  1. server side is does not enforce batch size limits, which is a DoS vector

Is up to the implementation to limit it.

  1. limits should be documented in specs

They are here: https://github.com/ipfs/specs/blob/main/reframe/REFRAME_KNOWN_METHODS.md#provide

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Split provide messages (message size limits)
4 participants