Attack on Bloom filters built from multiple identifiers #40
Comments
My first (brief) read-through indicates that yes, it is applicable. It looks like the slightly dubious double hash is what introduces the weakness they are able to exploit. We need to look into this further. It referenced Cryptanalysis of Basic Bloom Filters Used for
Some good news: one of the proposed solutions is to use multiple identifiers in the same Bloom filter (which we already do).

Regarding the double hash, the same paper strongly suggests using k independent hash functions.

If we use a single strong hash with a counter up to k as the salt, would that meet that criterion?
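To make the concern about the double hash concrete, here is a minimal sketch, assuming the usual double-hashing construction g_i = (h1 + i * h2) mod L. The thread does not spell out the exact scheme, so the function name and the choice of SHA-256 and SHA-512 as the two base hashes are placeholders, not our implementation. It shows that the k positions for a token are fully determined by two values and form an arithmetic progression modulo L, which is the kind of structure an attacker can look for.

```python
import hashlib


def double_hash_positions(token: bytes, k: int, filter_len: int) -> list[int]:
    """Bit positions g_i = (h1 + i * h2) mod filter_len for i = 0..k-1."""
    # Assumption: SHA-256 and SHA-512 stand in for the two base hashes.
    h1 = int.from_bytes(hashlib.sha256(token).digest(), "big")
    h2 = int.from_bytes(hashlib.sha512(token).digest(), "big")
    return [(h1 + i * h2) % filter_len for i in range(k)]


positions = double_hash_positions(b"an", k=5, filter_len=1024)

# Every consecutive pair of positions differs by the same step (h2 mod L),
# so the k "hash functions" are far from independent.
steps = {(b - a) % 1024 for a, b in zip(positions, positions[1:])}
```

With k independent hash functions there would be no such fixed step between positions.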
Slightly different version of "Cryptanalysis of basic Bloom filters used for Privacy
Question still stands.
Warning: I feel like I'm making up crypto here... I would like help. The problem is to hash a token
Note: these are just thoughts, without any proofs. We need to be sure that the hash we use is at least known to be secure against known-plaintext attacks.
So the attacker gets
I need to brush up on this stuff, but one potentially useful observation occurs to me: a hash h := SHA256(m) is 256 bits long, so for a filter of length L, perhaps we can use the first lg(L) ≈ 10 bits as the first hash function, the second lg(L) bits as the second hash function, and so on. I'm pretty sure disjoint substrings of a hash should be independent of one another, though I haven't verified this. With 10 bits per index, that would give us floor(256 / 10) = 25 independent hash functions for the price of one actual call to SHA256.
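A rough sketch of the slicing idea, assuming the filter length is a power of two (L = 2^10 = 1024 here) so that each 10-bit slice is directly a valid index. The function name and parameters are made up for illustration, and this says nothing about whether the slices are independent enough for our purposes.

```python
import hashlib


def sliced_positions(token: bytes, k: int, bits_per_index: int = 10) -> list[int]:
    """Derive k filter indices from disjoint slices of one SHA-256 digest."""
    if k * bits_per_index > 256:
        raise ValueError("not enough digest bits for k indices")
    digest = int.from_bytes(hashlib.sha256(token).digest(), "big")
    mask = (1 << bits_per_index) - 1
    positions = []
    for i in range(k):
        # Slice i covers bits [i * b, (i + 1) * b), counted from the top of the digest.
        shift = 256 - (i + 1) * bits_per_index
        positions.append((digest >> shift) & mask)
    return positions


# 25 indices in range(1024) from a single hash call.
positions = sliced_positions(b"an", k=25)
```

For a filter length that is not a power of two, the slices would need to be reduced modulo L, which introduces a small bias.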
A couple of random posts on StackOverflow suggest that using different seeds results in independent hash functions (i.e. initialise your hash function with each seed, then hash the bigram as usual). The two parties in the protocol will have to agree on the sequence (set + ordering) of seeds. It's not clear whether the seeds need to be secret, or whether they can be chosen to be a fixed set of values (e.g. 0, ..., k-1).
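One way this could look, as a sketch only: BLAKE2b from Python's `hashlib` accepts a `salt` parameter, so each seed can initialise a separate hash instance. The function name, the 8-byte seed encoding, and the fixed seeds 0, ..., k-1 are assumptions for illustration; whether public seeds are acceptable is exactly the open question above.

```python
import hashlib


def seeded_positions(token: bytes, seeds: list[int], filter_len: int) -> list[int]:
    """One filter index per seed; both parties must use the same seed sequence."""
    positions = []
    for seed in seeds:
        # The seed is passed as the BLAKE2b salt (at most 16 bytes).
        salt = seed.to_bytes(8, "big")
        digest = hashlib.blake2b(token, salt=salt, digest_size=8).digest()
        positions.append(int.from_bytes(digest, "big") % filter_len)
    return positions


seeds = list(range(5))  # assumed fixed, public seeds 0, ..., k-1
positions = seeded_positions(b"an", seeds, filter_len=1024)
```

Because each index depends only on its own seed, reordering the seeds reorders the indices but sets the same bits in the filter. A secret shared key could be supplied through BLAKE2b's `key` parameter if the seeds turn out to need secrecy.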
There is also the idea of using XOR-folding.
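For reference, a minimal sketch of XOR-folding as I understand it: split the Bloom filter into two halves and XOR them bitwise, producing a filter of half the length. The bit-list representation and the function name are just for illustration.

```python
def xor_fold(bits: list[int]) -> list[int]:
    """Fold a Bloom filter in half by XOR-ing its first and second halves."""
    if len(bits) % 2 != 0:
        raise ValueError("filter length must be even to fold")
    half = len(bits) // 2
    return [a ^ b for a, b in zip(bits[:half], bits[half:])]


# [1, 0, 1, 1] XOR [0, 0, 1, 0] -> [1, 0, 0, 1]
folded = xor_fold([1, 0, 1, 1, 0, 0, 1, 0])
```

A set bit in the folded filter no longer maps to a single position in the original, which is the point of the hardening, at some cost to similarity accuracy.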
Weaknesses in our approach that can be exploited (as described in those attack papers):
I think we might be able to wrap this issue up, opening follow-up issues to track any loose threads.
@wilko77 can you look over the above and make sure I've captured everything before closing?
There are two separate issues with independence:
In "Who Is 1011011111…1110110010? Automated Cryptanalysis of Bloom Filter Encryptions of Databases with Several Personal Identifiers", Kroll and Steinmetzer present a cryptanalysis of, and an attack on, Bloom filters built from multiple identifiers.
Is that applicable to our use-case?
Discuss!