docs: produce message size rfc #4202
base: master
@@ -0,0 +1,99 @@
# Enhancing Fluvio's Handling of Message Sizes

This RFC proposes modifications to Fluvio's handling of message sizes.

## Proposed Enhancements

1. Handling Messages Larger than the Batch Size

If a single record exceeds the defined `batch_size`, Fluvio will process the record as a standalone request, ensuring that larger messages are not discarded or delayed. If the record does not exceed `batch_size`, Fluvio will add it to the existing batch, or create a new batch if the existing one is full.
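
The routing rule in this item can be sketched as follows. This is a minimal illustration, not Fluvio's implementation: `plan_requests` is a hypothetical helper, records are represented only by their byte counts, and closing the open batch before sending an oversized record (to preserve ordering) is an assumption that the RFC does not state.

```python
from typing import List


def plan_requests(record_sizes: List[int], batch_size: int) -> List[List[int]]:
    """Group record sizes into requests following the proposed rule.

    Hypothetical helper for illustration only; not part of Fluvio's API.
    """
    requests: List[List[int]] = []
    current: List[int] = []  # sizes of the records in the open batch

    for size in record_sizes:
        if size > batch_size:
            # Oversized record: it goes out alone instead of being dropped.
            # Assumption: the open batch is closed first to keep ordering.
            if current:
                requests.append(current)
                current = []
            requests.append([size])
        elif sum(current) + size > batch_size:
            # The open batch cannot take this record: close it, start a new one.
            requests.append(current)
            current = [size]
        else:
            current.append(size)

    if current:
        requests.append(current)
    return requests


if __name__ == "__main__":
    # Three small records, the 500000-byte record from the CLI example, one more.
    print(plan_requests([6000, 6000, 6000, 500_000, 100], batch_size=16_536))
```

In this sketch the oversized record always ends up in a request of its own, so it is neither rejected nor held back waiting for a batch to fill.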

2. Handling Messages Larger than the Max Request Size

Fluvio will have a new configuration parameter, `max_request_size`, that defines the maximum size of a request that the producer can send. With this parameter set, Fluvio will return an error when a request exceeds `max_request_size`, whether the request contains a single record or a batch of records.
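
A minimal sketch of this check, assuming the limit is applied to the summed size of the records in one request and ignoring any protocol overhead. `RequestTooLargeError` and `check_request_size` are hypothetical names, and the error text is only modeled on the CLI message shown in the Max Request Size section of this RFC.

```python
from typing import List


class RequestTooLargeError(Exception):
    """Hypothetical error type; Fluvio's real error may differ."""


def check_request_size(records: List[bytes], max_request_size: int) -> int:
    """Return the request size in bytes, or raise if it exceeds the limit."""
    total = sum(len(record) for record in records)
    if total > max_request_size:
        raise RequestTooLargeError(
            f"request of {total} bytes is larger than the "
            f"max_request_size ({max_request_size} bytes)"
        )
    return total


if __name__ == "__main__":
    print(check_request_size([b"a" * 100], 16_384))
    # A single large record and a batch of smaller ones are both rejected.
    for request in ([b"x" * 500_000], [b"x" * 9_000, b"y" * 9_000]):
        try:
            check_request_size(request, 16_384)
        except RequestTooLargeError as err:
            print(f"rejected: {err}")
```

The same function handles both cases named in the text: one oversized record, and several records that only exceed the limit together.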

3. Compression Behavior

Fluvio will ensure that configuration limits that use size constraints, such as `batch_size` and `max_request_size`, will only use the uncompressed message size.

Review comment: That's the problem we need to solve. The algorithm needs to compute the result

Review comment: As batch is a concept before the message, I really think that we should not use or try to guess the compression size for that. Right now, Fluvio is just "trying to guess" the compression, multiplying every uncompressed size by half (0.5) when it's a compressed batch. About

Compression affects the size of messages in transit but doesn't change the maximum request size constraints.
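
To illustrate why the choice of size matters, the sketch below compares three numbers for the same payload: its uncompressed size, its actual gzip size, and the 0.5 estimate mentioned in the review discussion. The helper name `size_report` is hypothetical, the payload mimics the sample file used in this RFC, and Python's `gzip` module merely stands in for whatever codec the producer would use.

```python
import gzip


def size_report(payload: bytes, limit: int) -> dict:
    """Compare the sizes a limit check could be based on (illustrative only)."""
    uncompressed = len(payload)
    compressed = len(gzip.compress(payload))
    estimated = uncompressed // 2  # the 0.5 guess described in the review
    return {
        "uncompressed": uncompressed,
        "compressed": compressed,
        "estimated": estimated,
        # Proposed rule: only the uncompressed size is checked.
        "exceeds_limit": uncompressed > limit,
    }


if __name__ == "__main__":
    line = b"This is a sample line. "
    payload = (line * (500_000 // len(line) + 1))[:500_000]
    print(size_report(payload, limit=16_384))
```

For a highly repetitive payload like this one, the gzip output is far smaller than the 0.5 estimate, which shows how loosely a fixed ratio tracks the real compressed size. Checking the uncompressed size avoids that guesswork and makes the limit independent of the codec.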

## Fluvio CLI

Preparing the environment, with a topic and a large data file:

```bash
fluvio topic create large-data-topic
printf 'This is a sample line. ' | awk -v b=500000 '{while(length($0) < b) $0 = $0 $0}1' | cut -c1-500000 > large-data-file.txt
```

### Batch Size

`batch_size` will define the maximum size of a batch of records that can be sent by the producer. If a record exceeds this size, Fluvio will process the record as a standalone message.

Review comment: I think this behavior will be confusing; other products have the same parameter config

Review comment: never reject them

```bash
fluvio produce large-data-topic --batch-size 16536 --file large-data-file.txt --raw
```

No error will be displayed, even if the message exceeds the batch size; the record will simply be processed as a standalone message.

### Max Request Size

`max_request_size` will define the maximum size of a message that can be sent by the producer. If a message exceeds this size, Fluvio will return an error, whether the message contains a single record or a batch of them.

```bash
fluvio produce large-data-topic --max-request-size 16384 --file large-data-file.txt --raw
```

The following error will be displayed:

```bash
the given record is larger than the max_request_size (16384 bytes).
```

### Compression

`batch_size` and `max_request_size` will only use the uncompressed message size.

Review comment: see suggestion. Having

```bash
fluvio produce large-data-topic --batch-size 16536 --compression gzip --file large-data-file.txt --raw
fluvio produce large-data-topic --max-request-size 16384 --compression gzip --file large-data-file.txt --raw
```

Both commands will evaluate their limit against the uncompressed message size. Only the second one will display an error, because the uncompressed message exceeds the max request size.

## References

### Kafka Behavior

Kafka behaves similarly when handling messages larger than the batch size and the max request size.

Preparing the environment, with a topic and a large data file:

```bash
kafka-topics --create --topic large-data-topic --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1
printf 'This is a sample line. ' | awk -v b=500000 '{while(length($0) < b) $0 = $0 $0}1' | cut -c1-500000 > large-data-file.txt
```

Producing large messages to the topic with a small batch size will not display any errors:

```bash
kafka-console-producer --topic large-data-topic --bootstrap-server localhost:9092 --producer-property batch.size=16384 < large-data-file.txt
```

Producing large messages to the topic with a small max request size will display an error:

```bash
kafka-console-producer --topic large-data-topic --bootstrap-server localhost:9092 --producer-property max.request.size=16384 < large-data-file.txt
org.apache.kafka.common.errors.RecordTooLargeException: The message is 500087 bytes when serialized which is larger than 16384, which is the value of the max.request.size configuration.
```

Producing large messages to the topic with compression will not use the compressed size to calculate the batch size:

```bash
kafka-console-producer --topic large-data-topic --bootstrap-server localhost:9092 --producer-property batch.size=16384 --producer-property compression.type=gzip < large-data-file.txt
kafka-console-producer --topic large-data-topic --bootstrap-server localhost:9092 --producer-property max.request.size=16384 --producer-property compression.type=gzip < large-data-file.txt
```

Neither command uses the compressed size to calculate the batch size or the max request size, respectively. Only the second one will display an error, because the message exceeds the max request size.

Review comment: Hmmm, is this true? Customers are complaining of packets being dropped if they are larger than the batch size.

Review comment: I wrote the expectation; I'll rewrite it with what it is doing now and what should be changed.

Review comment: Not sure why this behavior is needed. It should move to enhancements if this doesn't apply to the original problem.

Review comment: This can be subsumed into the existing batching behavior. If `batch_size` is exceeded, then the batch is flushed.