docs: produce message size rfc #4202

Open · fraidev wants to merge 1 commit into master from rfc_produce_msg_size
Conversation

@fraidev fraidev (Contributor) commented Oct 2, 2024

TL;DR

The `batch_size` producer config must not reject large records; it should just send them directly as standalone requests.

Create a new `max_request_size` producer config that must reject messages larger than its limit.
I am using `max_request_size` because Kafka uses `max.request.size`, but we can change it to another config name.

Compressed sizes should not be used for these producer configs.
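For illustration, a minimal Rust sketch of the two proposed knobs and their intended semantics (the struct and field layout are hypothetical, not Fluvio's actual API; the defaults mirror the Kafka values discussed later in this thread):

```rust
/// Hypothetical producer size settings, illustrating the proposed semantics.
#[derive(Debug, Clone)]
pub struct ProducerSizeConfig {
    /// Target batch size. Records larger than this are sent as
    /// standalone requests, never rejected.
    pub batch_size: usize,
    /// Hard per-record limit. Records larger than this are rejected.
    pub max_request_size: usize,
}

impl Default for ProducerSizeConfig {
    fn default() -> Self {
        Self {
            batch_size: 16 * 1024,         // 16 KB, matching Kafka's batch.size default
            max_request_size: 1024 * 1024, // 1 MB, matching Kafka's max.request.size default
        }
    }
}
```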

Related:

@fraidev fraidev force-pushed the rfc_produce_msg_size branch 3 times, most recently from 8b495a7 to 7efe6bb on October 2, 2024 02:30
@fraidev fraidev marked this pull request as ready for review October 2, 2024 02:32

1. Handling Larger Messages than Batch Size

If a single record exceeds the defined `batch_size`, Fluvio will process the record as a standalone request, ensuring that larger messages are not discarded or delayed. If the record does not exceed the `batch_size`, Fluvio will add it to an existing batch, or create a new batch if the current one is full.
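A minimal sketch of that decision flow (plain Rust with hypothetical names, not Fluvio's actual code):

```rust
/// Where an incoming record should go under the proposed semantics.
enum Placement {
    /// Record exceeds batch_size: send it directly as its own request.
    Standalone,
    /// Record fits in the currently open batch.
    CurrentBatch,
    /// Open batch has no room left: start a new batch for the record.
    NewBatch,
}

fn place_record(record_len: usize, batch_size: usize, batch_used: usize) -> Placement {
    if record_len > batch_size {
        Placement::Standalone
    } else if batch_used + record_len <= batch_size {
        Placement::CurrentBatch
    } else {
        Placement::NewBatch
    }
}
```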
Contributor

Hmmm, is this true? Customers are complaining of packets being dropped if they are larger than the batch size.

Contributor Author

I wrote the expected behavior; I'll rewrite it to describe what it does now and what should be changed.

Contributor

Not sure why this behavior is needed. Should move to enhancement if this doesn't apply to the original problem.

Contributor

This can be subsumed into existing batching behavior: if `batch_size` is exceeded, then the batch is flushed.


3. Compression Behavior

Fluvio will ensure that configuration limits that use size constraints, such as `batch_size` and `max_request_size`, apply only to the uncompressed message size.
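A sketch of what that accounting could look like, assuming both checks read the record's uncompressed length (hypothetical names, not the actual implementation):

```rust
/// Returns Err if the record must be rejected, Ok(true) if it must be sent
/// standalone, and Ok(false) if it can be batched. Only the uncompressed
/// length is consulted; compression never enters the accounting.
fn check_record(
    uncompressed_len: usize,
    batch_size: usize,
    max_request_size: usize,
) -> Result<bool, String> {
    if uncompressed_len > max_request_size {
        return Err(format!(
            "record of {uncompressed_len} bytes exceeds max_request_size ({max_request_size})"
        ));
    }
    Ok(uncompressed_len > batch_size)
}
```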
Contributor

That's the problem we need to solve. The algorithm needs to compute the result after compression, as that's what's carried in the payload.

Contributor Author (@fraidev, Oct 2, 2024)

As batching is a step that happens before compression, I really think that we should not use, or try to guess, the compressed size for it.

Right now, Fluvio is just "trying to guess" the compression, multiplying every uncompressed size by half (0.5) when it's a compressed batch.

About `max_request_size`, I don't see a problem with using the compressed size for it, but Kafka does not use the compressed size for `max.request.size`. So if we want that, it's better to use another name to avoid confusing clients.
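For reference, a sketch of the current estimation described in this comment (the 0.5 factor is taken from the comment; the function name is hypothetical):

```rust
/// How a record's contribution to a compressed batch is currently guessed:
/// every uncompressed size is simply halved when the batch is compressed.
fn estimated_size_in_batch(uncompressed_len: usize, batch_is_compressed: bool) -> usize {
    if batch_is_compressed {
        uncompressed_len / 2 // "trying to guess" compression at a fixed 0.5 ratio
    } else {
        uncompressed_len
    }
}
```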

@sehz sehz (Contributor) commented Oct 2, 2024

Never mind my previous comments. Here are the clarifying semantics:

`max_request_size` applies to the pre-compression record size, so it can be applied as a per-record constraint. `batch_size` applies to the post-compression limit (or the underlying raw records limit). We can set `max_request_size` high for the default case and bump up `batch_size` as well.
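A minimal sketch of these semantics with hypothetical helper names (not Fluvio's API): the per-record gate reads pre-compression bytes, the batch gate reads post-compression bytes:

```rust
/// Per-record constraint: checked against the record's pre-compression size.
fn record_accepted(record_uncompressed_len: usize, max_request_size: usize) -> bool {
    record_uncompressed_len <= max_request_size
}

/// Batch constraint: checked against the batch's post-compression
/// (raw on-the-wire) size to decide when to flush.
fn batch_should_flush(batch_compressed_len: usize, batch_size: usize) -> bool {
    batch_compressed_len >= batch_size
}
```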


### Batch Size

`batch_size` will define the maximum size of a batch of records that can be sent by the producer. If a record exceeds this size, Fluvio will process the record as a standalone message.
Contributor

`batch_size` should be the ultimate raw limit. Again, not sure what this addresses.

Contributor Author

I think this behavior will be confusing; other products have the same `batch_size` config parameter, and in those products its behavior is just to check whether the record fits inside a batch or not.

Contributor Author (@fraidev, Oct 2, 2024)

never reject them

@fraidev fraidev (Contributor Author) commented Oct 2, 2024

> `max_request_size` applies to the pre-compression record size, so it can be applied as a per-record constraint. `batch_size` applies to the post-compression limit (or the underlying raw records limit).

I think that we should use the compressed size for `batch_size`, as we only compress the whole batch, not each record of the batch.

> We can set `max_request_size` high for the default case and bump up `batch_size` as well.

Sure. Kafka, for example, uses 16 KB for `batch_size` like us, and 1 MB for `max.request.size`.
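To illustrate the point that compression is applied to the whole batch rather than per record, here is a sketch using gzip via the flate2 crate (the crate choice is an assumption for illustration; Fluvio's actual wire format is not shown):

```rust
use flate2::{write::GzEncoder, Compression};
use std::io::Write;

/// The batch payload is compressed in one pass; individual records are never
/// compressed on their own, so only the batch has a meaningful compressed size.
fn compressed_batch_len(records: &[&[u8]]) -> std::io::Result<usize> {
    let mut encoder = GzEncoder::new(Vec::new(), Compression::default());
    for record in records {
        encoder.write_all(record)?;
    }
    Ok(encoder.finish()?.len())
}
```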


### Compression

`batch_size` and `max_request_size` will only use the uncompressed message size.
Contributor

See suggestion: having `batch_size` apply to the final raw batch should simplify the calculation.
