Encryption #43
Hello. Most likely yes, encryption will be added. AFAIK lots of users want this feature. But I cannot promise that it will be done soon - it might look simple, but it's actually pretty complicated to make it work with all current and planned features. For now you can encrypt files with 3rd-party software (like PGP) and then upload them. I will leave this ticket open and will post updates here. (I also had the idea to post a technical RFC before implementing this, to get some feedback and ideas from the community.)
Hello, first and foremost, thanks to you Victor for this very good job. I would also like some client-side encryption, and would like to help out (assuming I'm able to). I've been working on a small Perl script (in the frame of a different project) where we directly used the power of gpg via system calls. I realize this would be platform-locked (I use Linux), but having the gpg magic working under the hood of your Perl script would allow encryption without the additional disk space, via proper piping to a system call. I'm thinking Linux, of course, but maybe the same approach could be extended to other OSes. What do you think of this approach? I assume you were more into using Perl modules like Crypt::GPG? I'm willing to contribute to this, but I would maybe need a few guidelines. Thanks again, Davide
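A minimal sketch of that piping idea (not mtglacier code; the passphrase file, paths, and the `upload_part` callback are all hypothetical):

```perl
#!/usr/bin/perl
# Sketch: stream a file through gpg and consume encrypted chunks directly,
# so no encrypted copy of the file ever lands on disk.
use strict;
use warnings;

# hypothetical stand-in for the real upload code
sub upload_part { print STDERR 'got ', length($_[0]), " encrypted bytes\n" }

# gpg reads the plaintext itself; we read the ciphertext from its stdout
open(my $gpg, '-|', 'gpg --batch --symmetric --passphrase-file /tmp/pass'
               . ' --output - /data/file.bin') or die "cannot spawn gpg: $!";
binmode $gpg;
while (read($gpg, my $buf, 64 * 1024)) {
    upload_part($buf);
}
close $gpg or die "gpg failed: $?";
```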
Actually I was going to use GPG/PGP too. There are problems with Perl encryption modules, and other problems with implementing encryption in mtglacier itself:
So I need to encrypt metadata too. The problem: if you retrieve an inventory with 10,000 archives, you'll have to decrypt 10,000 metadata records and thus call the GPG command 10,000 times.
Yes, it's hard to contribute now. I have good testing tools for this, like a Glacier emulator (somewhere in the test code). Currently I am busy with refactoring the FSM engine to queue HTTP requests, and with writing another integration testing tool on top of it. I'll let you know in this ticket when I have better development docs and when I have a specification (an RFC, to be discussed) on how encryption should work.
Actually, that looks like a show-stopper. I don't know a good way to solve this. It seems there is no batch mode in GPG. Possible workarounds: (a) use different encryption for the metadata (via CPAN modules); (b) keep the metadata off Amazon and back up an encrypted copy of the journal instead. I don't like either (a) or (b).
Sorry if my reasoning is flawed, I don't know the code. If I understand correctly, all the metadata records are saved in the journal, in the form of one single file. Isn't it possible to keep that file (the journal) as it is locally, and only encrypt the copy to be uploaded to Amazon? I don't know how much of the code would have to be rearranged for this to work, of course...
yes, correct.
yes, that's possible, and it does not look like a security problem. Also the user can manually encrypt and back up his journal on his own (to Amazon Glacier or Amazon S3 or other ways).
yes, that would work. However, even if we implement it, I don't believe it's a clever idea to automate it (because uploading a new copy of the journal after each small change to the journal is not really effective). I believe I suggested the same above. The only problem: metadata records are stored in the journal file, but also on the Amazon Glacier servers (there is a special field for this). Currently you can drop your journal and restore it with the retrieve-inventory/download-inventory commands. I think this feature is pretty useful. Also, it's natural to the Amazon Glacier workflow - pretty much all Amazon Glacier clients use it. Also, usually different clients do not understand the metadata format of other clients, but sometimes they do, and if they do, they understand each other's archives. So I would like to preserve this feature, and store encrypted metadata on the Amazon side somehow.
I've just thought of another approach. We don't really need to encrypt the SHA256/TreeHash sums, because we don't need the actual TreeHash of the plaintext (probably!). So we can store the checksums of the encrypted data instead, unencrypted. I am not really 100% sure yet if this will work. Also, this way we'll completely lose the original TreeHash checksums, which will make it harder for the end user to verify their files later.
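For illustration, a minimal sketch of the Glacier tree hash computed over the uploaded file - which under this idea would be the ciphertext (`archive.gpg` is a hypothetical name; this is not mtglacier code):

```perl
#!/usr/bin/perl
# Amazon Glacier tree hash: SHA-256 over 1 MiB chunks, then pairwise
# combination of the hashes up to a single root hash.
use strict;
use warnings;
use Digest::SHA qw(sha256);

sub tree_hash {
    my ($fh) = @_;
    my @hashes;
    while (read($fh, my $chunk, 1024 * 1024)) {
        push @hashes, sha256($chunk);            # leaf hashes
    }
    die "empty input" unless @hashes;
    while (@hashes > 1) {
        my @level;
        while (my ($a, $b) = splice(@hashes, 0, 2)) {
            push @level, defined $b ? sha256($a . $b) : $a;  # odd node moves up
        }
        @hashes = @level;
    }
    return unpack 'H*', $hashes[0];
}

open(my $fh, '<:raw', 'archive.gpg') or die $!;
print tree_hash($fh), "\n";
```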
Seems there is a way to decrypt several files at once (at least with symmetric encryption): you can pass multiple encrypted files to a single gpg invocation, so the number of files is limited only by the command-line length. Interestingly, the same does not work for encryption. There is also a way to read filenames from STDIN (gpg's multifile mode), but it asks for the passphrase - I think this can be worked around in a Perl script by supplying the passphrase via --passphrase-fd.
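For example, a sketch in Perl (the file layout and passphrase are assumptions; behavior as with classic gpg 1.x):

```perl
#!/usr/bin/perl
# Decrypt thousands of small metadata files with one gpg process instead of
# one process per file. The passphrase is fed on stdin via --passphrase-fd 0,
# so the filenames must go on the command line rather than on stdin.
use strict;
use warnings;

my @files = glob('metadata/*.gpg') or die "no files";   # hypothetical layout

open(my $gpg, '|-', 'gpg', '--batch', '--quiet', '--yes',
     '--passphrase-fd', '0', '--decrypt-files', @files)
    or die "cannot spawn gpg: $!";
print $gpg "secret passphrase\n";   # assumption: symmetric encryption
close($gpg) or die "gpg failed: $?";
# each FILE.gpg is decrypted to FILE next to it; those can then be read
# back as journal lines / metadata entries
```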
What about gpgdir? I remember I used it for encrypting several files in a directory...
Just in case, I want to clarify that those examples of decrypting multiple files with PGP were about decrypting metadata (i.e. filenames and checksum information) received from Amazon with the inventory. I want to represent it as files (one file = one line in the journal = one metadata entry) and decrypt them with one gpg call. Encryption/decryption of actual user files will probably be done with one file = one pgp command (and using pipes, no intermediate files).
I checked the gpgdir source; it's not small. And it does not look like it decrypts files with a single gpg invocation.
Probably not. Its main plus is that it allows encrypting/decrypting multiple directories from the command line - nothing that can't be done in a better way with Perl tools (or gpg itself). I was also thinking: what about temporarily dropping the metadata-on-Amazon feature, at least in case the user chooses to encrypt? I understand your thinking, and why it would be desirable to have the journal in Amazon as well. But here's my reasoning: if you set up a remote backup process, a very big use case is disaster recovery (actually, I can't think of any other uses for Glacier). So when you need the data, it's because your house or office was destroyed by fire or earthquake, and your entire IT infrastructure is likely gone together with the data. If you're worried about hard-disk failures you shouldn't use Glacier, because of the long delay in retrieval. This means it is certainly of the utmost importance to have the journal somewhere other than the local machine, but not only that. I'm currently backing up the entire "infrastructure" needed to recover the data from Glacier (the journal, of course, but also scripts, passwords, vault names, etc.) somewhere else (encrypted in Dropbox, but that's just an example), and I think that anyone seriously planning for disaster is likely to do the same - so that the recovery can bootstrap itself anew just by remembering the encryption password of the stuff in Dropbox, instead of passphrases, keys, secrets, vault names and all the related configs. If the above makes sense, then it seems to me less essential that the encrypted journal be in Glacier as well, at least in my use case...
Other uses for Glacier: secondary backup (because restore is too expensive for a primary backup), archiving, log archiving enforced by law.
Good strategy. But some people might want to back up only scripts+passwords+names, and only once (perhaps by just printing them on paper together with a "backup restoration policy"). Even if they wish to back up the journal as well, backing up the journal after each journal modification can be ineffective in some rare cases (for example, when the journal size is much higher than the size of the backup increment).
I agree that such an implementation of encryption is higher priority (for end users) than the "proper" implementation with metadata encryption. There are just a few small disadvantages: (a) if we implement the "encryption-without-metadata" feature and then, later, the "encryption-with-metadata" feature, there would be a few additional complexities for end users when they decide whether it's safe to drop their journal or not; (b) there will be another branching in the journal/metadata format (i.e. there will be two). Next, I think solving that problem is not a very big part of the whole work - maybe ~20%. So I am not sure it's worth releasing an intermediate version and introducing the additional overhead.

Well, let's talk about ways to actually contribute to this feature. There are two issues, #39 and #40, related to file versioning. There are also the important issues #3 and #37 (not sure yet if they're more important than encryption or not). And, as I said, I am busy now with the rework of the HTTP queue engine.

My plan for the encryption implementation starts with (a): I introduce some docs for developers. I think step (d) can happen pretty soon (maybe 0.5-1 month), but (f) won't happen soon (maybe ~6 months or a bit more).

Also I want to mention here a few things missing in the (development) docs about what features mtglacier should have and how it should behave. In the first place, features natural to Amazon Glacier: for example, Glacier has a range-retrievals feature, so it's possible to hack up some command to minimize retrieval cost with it. Not natural for a Glacier client: file deduplication, file encryption (both can be done with 3rd-party tools), backup rotation (can be done with scripting). (NOTE: sometimes this priority is violated; for example, I often think that encryption is more important than range retrieval.) Priority during development goes in the same order. (NOTE: sometimes those priorities are violated, but not much.)
I made some benchmarks: https://gist.github.com/vsespb/6776512 - GPG is able to decrypt ~1000 small files per second when the files are passed to a single gpg process.
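A rough timing sketch in the same spirit (this is not the gist's code; the file layout and passphrase file are assumptions):

```perl
#!/usr/bin/perl
# Time one multi-file gpg decryption run over a directory of small files.
use strict;
use warnings;
use Time::HiRes qw(time);

my @files = glob('metadata/*.gpg') or die 'no test files';
my $t0 = time();
system('gpg', '--batch', '--quiet', '--yes',
       '--passphrase-file', '/tmp/pass',       # hypothetical passphrase file
       '--decrypt-files', @files) == 0 or die "gpg failed: $?";
printf "%d files in %.2fs (%.0f files/sec)\n",
       scalar(@files), time() - $t0, @files / (time() - $t0);
```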
Another problem is how to prevent people from, say, running a sync with one encryption password and then another sync with a different password - i.e. mixing two different passwords for the same journal/vault, and perhaps the same filenames (when we'll have versioning for files). And, even worse, if metadata is encrypted too, they will not be able to decrypt their metadata into a journal if they ever uploaded files with different passwords to the same vault. A similar problem exists already - people are allowed to use one journal to upload files to different vaults (this should not be allowed - more info: https://github.com/vsespb/mt-aws-glacier#why-journal-does-not-contain-regionvault-information ).
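One conceivable guard (a hypothetical scheme, not something mtglacier implements): record a derived check value for the vault/password pair in the journal header and refuse to run when it changes.

```perl
#!/usr/bin/perl
# Sketch: detect a password (or vault) mix-up before uploading anything.
use strict;
use warnings;
use Digest::SHA qw(sha256_hex);

# Non-reversible check value. A real implementation should use a slow KDF;
# this truncated hash with a per-journal salt only shows the idea.
sub check_value {
    my ($vault, $passphrase, $salt) = @_;
    return substr(sha256_hex("$salt\0$vault\0$passphrase"), 0, 16);
}

my $recorded = 'deadbeefdeadbeef';   # hypothetical value from journal header
my $current  = check_value('myvault', 'secret', 'per-journal-salt');
die "journal was used with a different password or vault\n"
    if $recorded ne $current;
```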
Another problem with metadata encryption on the Amazon Glacier servers: if we allow encryption with asymmetric crypto, the encrypted data can exceed (or eat much of) the 1024 bytes allowed by Amazon for metadata. For a small sample record, symmetric encryption produces 102 bytes; asymmetric encryption produces considerably more.
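A sketch for measuring that overhead (the key id, paths, and sample record are made up; it assumes a gpg binary and an existing public key):

```perl
#!/usr/bin/perl
# Compare ciphertext sizes of one small metadata record against the
# 1024-byte limit of Amazon's x-amz-archive-description field.
use strict;
use warnings;
use IPC::Open2;

my $record = 'mtime=1380000000 size=12345 filename=some/file.txt';

sub cipher_size {
    my @cmd = @_;
    my $pid = open2(my $out, my $in, @cmd);
    binmode $out;
    print $in $record;
    close $in;
    local $/;
    my $cipher = <$out>;
    waitpid($pid, 0);
    return length $cipher;
}

my $sym  = cipher_size(qw(gpg --batch --yes --symmetric
                          --passphrase-file /tmp/pass --output -));
my $asym = cipher_size(qw(gpg --batch --yes --encrypt
                          -r test@example.com --output -));
printf "symmetric: %d bytes, asymmetric: %d bytes (limit: 1024)\n",
       $sym, $asym;
# note: metadata usually has to be base64-encoded for the HTTP header,
# which inflates these numbers by another ~33%
```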
Encryption is important, but it is orthogonal to what mt-aws-glacier does. Here is what I do right now:
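Roughly, a pre-encryption step like this sketch (placeholder paths and passphrase file, not exact commands):

```perl
#!/usr/bin/perl
# Encrypt with stock tools first, then let mt-aws-glacier upload the
# resulting opaque blob like any other file.
use strict;
use warnings;

my $src  = '/data/projects';            # placeholder paths
my $blob = '/backup/projects.tar.gz.gpg';

system("tar -cz $src | gpg --batch --symmetric"
     . " --passphrase-file /tmp/pass --output $blob") == 0
    or die "encryption pipeline failed";
# then: mtglacier sync (or upload-file) on /backup as usual
```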
Others have been doing this with other forms of cloud storage, e.g. http://shrp.me/docs/encrypted_offsite_backup.php . So building encryption facilities into mt-aws-glacier might not actually be worth the effort, which seems to be considerable...
I agree.
I tried that, and indeed it works great. Thank you for the information.
Well, indeed, maybe - need to think about it; there is one disadvantage with that approach that I see for now. Anyway, it's definitely worth mentioning. Btw, another example of an often-requested feature which is orthogonal to mtglacier functionality is bandwidth throttling.
I may suggest Crypt::OpenPGP. Just recently I've adopted this module, and I'm planning to bring it up to speed in the near future.
That is interesting! It looks like a pure-Perl thing (i.e. not calling an external PGP command for each operation). That would be slow for encryption of data (I was going to call an external GPG command for that), but it is OK, and will be even faster, for encrypting metadata (thousands of small records) while keeping the metadata format compatible with GPG.
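A quick sketch of what that could look like for one metadata record (assuming Crypt::OpenPGP's symmetric mode; the record text and passphrase are made up):

```perl
#!/usr/bin/perl
# Encrypt/decrypt a metadata record in-process with Crypt::OpenPGP,
# keeping the output in a format that gpg itself can also read.
use strict;
use warnings;
use Crypt::OpenPGP;

my $pgp    = Crypt::OpenPGP->new;
my $record = 'filename=some/file.txt mtime=1380000000';

my $ciphertext = $pgp->encrypt(
    Data       => $record,
    Passphrase => 'secret',     # symmetric ("conventional") encryption
    Armour     => 1,
) or die $pgp->errstr;

my ($plaintext) = $pgp->decrypt(
    Data       => $ciphertext,
    Passphrase => 'secret',
) or die $pgp->errstr;
print $plaintext, "\n";
```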
I would love to dump some of our company's old data into Glacier,
but I cannot do that unencrypted.
This tool seems to solve a lot of my problems, but plaintext is a no-go for me. Any chance encryption could be added?