Add zstd as a compression option? #30
Comments
No objection from me.
On Thu, Feb 14, 2019 at 4:32 AM Marius van Niekerk wrote:
> Would there be interest in adding zstd to partd? At lowish compression levels I've found it to have better compression and to be around twice as fast as snappy.
@mariusvniekerk, seems like you'd have to make the PR if you want this to progress :)
Does this mean Zstandard is now supported in Dask? Also, is there some documentation on how I would use it? I have some zstandard-compressed files filled with msgpack-serialized data in chunks, and I would like to use Dask (multiprocessing) to speed up the read, or to operate on the data without reading everything into memory.
This repo is not really about data access, but about temporary storage for dask. You should be able to load a single file of your data like this:
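A minimal sketch, assuming the files are concatenated msgpack records compressed with zstd and that the `zstandard` and `msgpack` packages are installed (the `load` helper and `filename` are illustrative names, not code from this thread):

```python
import msgpack
import zstandard

def load(filename):
    """Decompress one zstd file and return its msgpack records as a list."""
    with open(filename, "rb") as f:
        # stream_reader exposes the decompressed bytes as a file-like object,
        # so the compressed payload is not held in memory twice.
        with zstandard.ZstdDecompressor().stream_reader(f) as reader:
            # Unpacker iterates over the concatenated msgpack objects.
            return list(msgpack.Unpacker(reader, raw=False))
```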
Then you could make a set of delayed functions for dask to work on, one chunk per file, in parallel:
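A sketch of that step with `dask.delayed`, reusing the hypothetical `load` function above and an assumed list of `filenames`:

```python
import dask

# One lazy task per file; nothing is read until compute() is called.
lazy_chunks = [dask.delayed(load)(fn) for fn in filenames]

# Materialize all chunks in parallel, e.g. with the multiprocessing scheduler.
chunks = dask.compute(*lazy_chunks, scheduler="processes")
```

If a bag interface is more convenient, `dask.bag.from_delayed(lazy_chunks)` would also work on these same tasks.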
Now the question becomes: what do you want to do with this data? By the way, zstd supports internal streams and blocks which can, in theory, provide near random access. Dask/fsspec does not support this, so you cannot read a single file chunk-wise using the method above. However, msgpack does support streaming object-by-object, so you could change the function to work that way (with much lower memory usage) if you intend to output just aggregated values from each file.
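A hedged sketch of such a streaming variant, which walks the msgpack objects one at a time and keeps only an aggregate (the `aggregate_file` name and the per-record handling are assumptions for illustration):

```python
import msgpack
import zstandard

def aggregate_file(filename):
    """Stream msgpack records out of one zstd file, returning only a count.

    Only one record is decoded at a time, so memory stays low even for
    large files; replace the loop body with whatever aggregation is
    actually needed.
    """
    count = 0
    with open(filename, "rb") as f:
        with zstandard.ZstdDecompressor().stream_reader(f) as reader:
            for _record in msgpack.Unpacker(reader, raw=False):
                count += 1
    return count
```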
Would there be interest in adding zstd to partd? At lowish compression levels I've found it to have better compression and to be around twice as fast as snappy.
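For reference, a hedged illustration of the round trip such an option would wrap, using the `zstandard` package at a lowish level (this is not partd's API, just the underlying calls):

```python
import zstandard

payload = b"example bytes to be spilled to disk" * 1000

# "Lowish" compression level; zstd levels range from 1 to 22, with 3 as the default.
compressed = zstandard.ZstdCompressor(level=3).compress(payload)
restored = zstandard.ZstdDecompressor().decompress(compressed)
assert restored == payload
```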