Make age parallel #109

Open
paulmillr opened this issue Mar 9, 2020 · 6 comments

@paulmillr

paulmillr commented Mar 9, 2020

If you encrypt files on a machine with tons of RAM and cores, age isn't any faster than on a basic slow PC.

I think it would be great to utilize resources when they're available.

I tried this on Linux, both via piping and via -i -o, and saw only a tiny load on one core.

@RKinsey
Contributor

RKinsey commented Mar 18, 2020

The problem is that it isn't feasible to do that without overhauling Go's cryptography libraries (and it might be unsafe; I don't know enough about goroutine security to say for sure).

The only functions in age that actually handle the plaintext are EncryptOAEP/DecryptOAEP from crypto/rsa and Seal/Open from x/crypto/chacha20poly1305, neither of which is parallel. Both could be parallelized, but RSA generally hasn't been, because it needs a parallel-friendly modular exponentiation function. ChaCha is fairly easy to parallelize, but Go's implementation is handwritten assembly using vector instructions when available (unless you're using a purego build, gccgo, or an uncommon CPU architecture). I have a feeling that probably outperforms a goroutine version, but maybe not.

@xorhash

xorhash commented Apr 3, 2020

@RKinsey I'm not sure if this argument actually holds. internal/stream/stream.go seems to read and write in chunks of 64 KiB (plus 16 bytes of Poly1305 tag for each encrypted chunk). Therefore, there's parallelization potential there by queueing up the encryption/decryption of chunks (or multiples of chunks) between cores. Orchestrating the whole thing so that there's no bottleneck when reading or writing is another story though.
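
The chunk-queueing idea can be sketched roughly as below. This is only an illustration, not age's code: `sealParallel`, `nonceFor` and `newDemoAEAD` are hypothetical names, the nonce layout is an assumption, and AES-GCM from the standard library stands in for ChaCha20-Poly1305 so the sketch runs without extra modules (`chacha20poly1305.New` from golang.org/x/crypto also returns a `cipher.AEAD`, so it should slot in the same way). It also assumes the AEAD value tolerates concurrent `Seal` calls and holds the whole input in memory, which sidesteps the read/write orchestration problem entirely.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"encoding/binary"
	"fmt"
	"sync"
)

const chunkSize = 64 * 1024 // plaintext bytes per chunk, as described in the comment

// nonceFor derives a distinct 12-byte nonce from the chunk index.
// The layout (counter plus last-chunk flag) is an assumption for this sketch.
func nonceFor(index int, last bool) []byte {
	n := make([]byte, 12)
	binary.BigEndian.PutUint64(n[3:11], uint64(index))
	if last {
		n[11] = 1
	}
	return n
}

// newDemoAEAD returns AES-GCM with an all-zero key. Demo only: never use a fixed key.
func newDemoAEAD() cipher.AEAD {
	block, err := aes.NewCipher(make([]byte, 32))
	if err != nil {
		panic(err)
	}
	aead, err := cipher.NewGCM(block)
	if err != nil {
		panic(err)
	}
	return aead
}

// sealParallel splits plaintext into chunks and seals them on several goroutines.
// Each worker writes only to its own slot in out, so chunk order is preserved.
func sealParallel(aead cipher.AEAD, plaintext []byte, workers int) [][]byte {
	if workers < 1 {
		workers = 1
	}
	numChunks := (len(plaintext) + chunkSize - 1) / chunkSize
	if numChunks == 0 {
		numChunks = 1 // empty input still produces one (empty) final chunk
	}
	out := make([][]byte, numChunks)
	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				start := i * chunkSize
				end := start + chunkSize
				if end > len(plaintext) {
					end = len(plaintext)
				}
				out[i] = aead.Seal(nil, nonceFor(i, i == numChunks-1), plaintext[start:end], nil)
			}
		}()
	}
	for i := 0; i < numChunks; i++ {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return out
}

func main() {
	aead := newDemoAEAD()
	plaintext := make([]byte, 3*chunkSize+100)
	chunks := sealParallel(aead, plaintext, 4)
	fmt.Println("chunks:", len(chunks), "last chunk bytes:", len(chunks[len(chunks)-1]))
}
```

A streaming version would need bounded buffering and an ordered writer on top of this, which is where most of the real complexity would likely sit.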

@joonas-fi

joonas-fi commented Oct 29, 2020

Yeah @RKinsey was talking about the key-wrapping phase. The actual symmetric stream encryption is where the bulk of Age's work happens (at least on larger file sizes) and it looks like it could be parallelizable.

The stream is divided into fixed-size chunks of 64 KiB, and each chunk uses the same encryption key but of course a different nonce. The nonce is calculated based on the chunk number. It's a seekable stream and thus theoretically easily parallelizable. In practice, though, the code will be more complex than it currently is, so it'd need a pretty good test suite.
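
To make the "nonce from chunk number" and "seekable" points concrete, here is a small sketch. The function names are hypothetical, and the nonce layout (an 11-byte big-endian counter followed by a one-byte final-chunk flag) reflects my understanding of the age format rather than anything stated in this thread, so check it against the spec before relying on it. Offsets are relative to the start of the encrypted payload, after the header.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const (
	chunkSize = 64 * 1024 // plaintext bytes per chunk
	tagSize   = 16        // Poly1305 tag appended to each encrypted chunk
	nonceSize = 12        // ChaCha20-Poly1305 nonce size
)

// chunkNonce builds the nonce for a chunk: bytes 0..10 hold a big-endian
// counter, byte 11 is 0x01 for the final chunk and 0x00 otherwise (assumed layout).
func chunkNonce(index uint64, last bool) [nonceSize]byte {
	var n [nonceSize]byte
	// A uint64 fills the low 8 bytes of the 11-byte counter field.
	binary.BigEndian.PutUint64(n[3:11], index)
	if last {
		n[11] = 0x01
	}
	return n
}

// locateChunk maps a plaintext offset to the index of the chunk containing it
// and to the payload offset at which that encrypted chunk starts.
func locateChunk(plainOff int64) (index, cipherStart int64) {
	index = plainOff / chunkSize
	cipherStart = index * (chunkSize + tagSize)
	return index, cipherStart
}

func main() {
	idx, start := locateChunk(200_000)
	nonce := chunkNonce(uint64(idx), false)
	fmt.Printf("chunk %d starts at payload offset %d, nonce %x\n", idx, start, nonce[:])
}
```

Because both the nonce and the chunk position depend only on the chunk index, any worker can process any chunk without waiting for the ones before it.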

@Tronic

Tronic commented Oct 31, 2021

Just running chacha20-poly1305 in parallel for a few blocks easily more than doubles the speed. My own tool is written in Python and does 2.2 GB/s encryption and decryption (using 4 threads for chacha, otherwise single-threaded). It is a shame that the crypto libraries don't offer threaded implementations of these algorithms.

This is on a machine where age does 1 GB/s and rage only 400 MB/s.

@paulmillr
Author

@Tronic are you using the latest rage? The speed difference should be minimal right now.

@Tronic

Tronic commented Nov 1, 2021

@paulmillr rage 0.7.0 on Windows.
