
Add an option to buffer based on S3 minimum chunk size #326

Open
mdedetrich opened this issue Oct 28, 2022 · 0 comments
Labels
s3 Specifically related to Amazon's S3 storage backend

Comments

@mdedetrich
Contributor

mdedetrich commented Oct 28, 2022

What is currently missing?

When gzip compression was added to Guardian (see #196), the implementation was not ideal due to how gzip compression works. Specifically, we compress each Kafka record individually, whereas ideally we would compress larger chunks of data at once, since that yields much better space savings from compression.

How could this be improved?

This could be improved by buffering the data in memory until it reaches the S3 minimum chunk size (5 MiB, the minimum part size for S3 multipart uploads) and then compressing that entire in-memory chunk at once, rather than compressing per message.
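As a rough illustration of the buffering strategy described above (not Guardian's actual implementation, which would live in its Scala streaming pipeline), the following sketch accumulates raw record bytes until the 5 MiB threshold is reached and then gzip-compresses the whole buffer as one unit; `compress_in_chunks` and its parameters are hypothetical names:

```python
import gzip

# S3 multipart uploads require every part except the last to be at
# least 5 MiB, so we buffer records up to that size before compressing.
MIN_CHUNK_SIZE = 5 * 1024 * 1024

def compress_in_chunks(records, min_chunk_size=MIN_CHUNK_SIZE):
    """Buffer raw record bytes until min_chunk_size is reached, then
    gzip-compress the whole buffer at once (illustrative sketch)."""
    buffer = bytearray()
    for record in records:
        buffer.extend(record)
        if len(buffer) >= min_chunk_size:
            # Compressing one large buffer lets gzip exploit redundancy
            # across records, unlike per-record compression.
            yield gzip.compress(bytes(buffer))
            buffer.clear()
    if buffer:
        # Flush the final, possibly undersized chunk (allowed as the
        # last part of a multipart upload).
        yield gzip.compress(bytes(buffer))
```

Each yielded chunk could then be uploaded as one multipart-upload part; the trade-off is up to 5 MiB of extra memory held per in-flight upload.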

Is this a feature you would work on yourself?

  • I plan to open a pull request for this feature
@mdedetrich mdedetrich added the s3 Specifically related to Amazon's S3 storage backend label Oct 28, 2022