Releases: data61/anonlink
Release 0.9.0
This release contains a major overhaul of Anonlink’s API and introduces support for multi-party linkage.
The changes are all additive, so the previous API continues to work. That API has now been deprecated and will be removed in a future release. The deprecation timeline is:
- v0.9.0: old API deprecated
- v0.10.0: use of old API raises a warning
- v0.11.0: remove old API
Major changes
- Introduce abstract similarity functions. The Sørensen–Dice coefficient is now just one possible similarity function.
- Implement Hamming similarity as a similarity function.
- Permit linkage of records other than CLKs (BYO similarity function).
- Similarity functions now return multiple contiguous arrays instead of a list of tuples.
- Candidate pairs from similarity functions are now always sorted.
- Introduce a standard type for storing candidate pairs. This is now used consistently throughout the API.
- Provide a function for multiparty candidate generation. It takes multiple datasets and compares them against each other using a similarity function.
- Extend the greedy solver to multiparty problems.
- The greedy solver also takes the new candidate pairs type.
- Implement serialisation and deserialisation of candidate pairs.
- Multiple files with serialised candidate pairs can be merged without loading everything into memory at once.
- Introduce type annotations in the new API.
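To make the idea of interchangeable similarity functions concrete, here is a minimal pure-Python sketch. It does not use Anonlink's API: the names `dice`, `hamming_similarity`, and `candidate_pairs` are hypothetical, the records are modelled as plain integers used as bit vectors, and the sort order (decreasing similarity) and the definition of Hamming similarity (fraction of matching bit positions) are assumptions made for illustration.

```python
from array import array
from functools import partial


def popcount(x: int) -> int:
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def dice(a: int, b: int) -> float:
    """Sørensen–Dice coefficient of two bit vectors stored as integers."""
    total = popcount(a) + popcount(b)
    return 2 * popcount(a & b) / total if total else 0.0


def hamming_similarity(a: int, b: int, n_bits: int) -> float:
    """Fraction of the n_bits positions at which a and b agree (assumed definition)."""
    return 1 - popcount(a ^ b) / n_bits


def candidate_pairs(left, right, similarity, threshold):
    """Compare every record in `left` with every record in `right`.

    Returns three parallel contiguous arrays (similarities, left indices,
    right indices) holding the pairs at or above `threshold`, sorted by
    decreasing similarity (assumed order), ties broken by index.
    """
    hits = [
        (similarity(a, b), i, j)
        for i, a in enumerate(left)
        for j, b in enumerate(right)
    ]
    hits = [h for h in hits if h[0] >= threshold]
    hits.sort(key=lambda h: (-h[0], h[1], h[2]))
    sims = array("d", (h[0] for h in hits))
    left_idx = array("I", (h[1] for h in hits))
    right_idx = array("I", (h[2] for h in hits))
    return sims, left_idx, right_idx


if __name__ == "__main__":
    left = [0b1100, 0b0011]
    right = [0b1100, 0b0111]
    # Same generation routine, two different similarity functions.
    print(candidate_pairs(left, right, dice, 0.5))
    print(candidate_pairs(left, right, partial(hamming_similarity, n_bits=4), 0.5))
```

Because the candidate generation routine only calls `similarity(a, b)`, any function with that shape can be substituted, which is the point of the "BYO similarity function" item above.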
Minor changes
- Automatically test on Python 3.7.
- Remove support for Python 3.5 and below.
- Update Clkhash dependency to 0.11.
- Minor documentation and style fixes in anonlink.concurrency.
- Provide a convenience function for generating valid candidate pairs from a chunk.
- Change the format of a chunk and move the type definition to anonlink.typechecking.
See the changelog for details.
Release 0.8.2
Minor updates:
- Fix discrepancies between Python and C++ versions #102
- Utility added to anonlink/concurrency.py to help with chunking.
- Better GitHub status messages posted by Jenkins.
Release 0.8.1
Just minor fixes and improvements in this release.
Release 0.8.0
Fix to greedy solver, so that mappings are set by the first match, not repeatedly overwritten. #89
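The "first match wins" behaviour described in this fix can be sketched as follows. This is an illustrative two-party greedy matcher, not Anonlink's solver: the function name `greedy_solve` and the input format (an iterable of `(similarity, left_index, right_index)` tuples already sorted by decreasing similarity) are assumptions.

```python
def greedy_solve(candidates):
    """Greedily map left records to right records.

    `candidates` must be sorted by decreasing similarity. A record is
    assigned the first time it appears in an acceptable pair and is never
    reassigned afterwards.
    """
    mapping = {}           # left index -> right index
    matched_right = set()  # right indices that are already taken
    for _similarity, i, j in candidates:
        if i in mapping or j in matched_right:
            # One side is already matched: keep the earlier, better match.
            continue
        mapping[i] = j
        matched_right.add(j)
    return mapping


if __name__ == "__main__":
    pairs = [(0.9, 0, 1), (0.8, 0, 0), (0.7, 1, 1), (0.6, 1, 0)]
    print(greedy_solve(pairs))
```

Without the membership check, a later and weaker pair such as `(0.8, 0, 0)` would overwrite the mapping for record 0, which is the kind of repeated overwriting the fix rules out.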
Other improvements
- Order of k and threshold parameters now consistent across library
- Limit size of k to prevent OOM DoS
- Fix misaligned pointer handling #77
Install from PyPI:
pip install anonlink==0.8.0
0.7.0
Introduces support for comparing "arbitrary" length cryptographic linkage keys.
The benchmark is now much more comprehensive and more comparable between releases; see the readme for an example report.
Other improvements
- Internal C/C++ cleanup/refactoring and optimization.
- Expose the native popcount implementation to Python.
- Bug fix to avoid configuring a logger.
- Testing now uses py.test and runs on Travis CI.
You can test the release from PyPI:
$ pip install anonlink==0.7.0
$ pip install clkhash
$ python -m anonlink.benchmark
0.6.2
Available on PyPI:
$ pip install anonlink==0.6.2
To run the benchmarks, first install clkhash:
$ pip install clkhash
$ python -m anonlink.benchmark
100000 x 1024 bit popcounts in 0.016641 seconds
Popcount speed: 733.55 MiB/s
Size 1 | Size 2 | Comparisons | Compute Time | Million Comparisons per second
1000 | 1000 | 1000000 | 0.073s | 13.710
2000 | 2000 | 4000000 | 0.129s | 31.024
3000 | 3000 | 9000000 | 0.247s | 36.464
4000 | 4000 | 16000000 | 0.406s | 39.425
5000 | 5000 | 25000000 | 0.510s | 49.067
6000 | 6000 | 36000000 | 0.533s | 67.603
7000 | 7000 | 49000000 | 0.543s | 90.299
8000 | 8000 | 64000000 | 0.594s | 107.682
9000 | 9000 | 81000000 | 0.627s | 129.188
10000 | 10000 | 100000000 | 0.824s | 121.289
20000 | 20000 | 400000000 | 2.902s | 137.815
Single Core:
5000 | 5000 | 25000000 | 0.243s | 102.941
(These results are from a high-end laptop.)
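The derived columns in this report can be recomputed from the raw figures. The sketch below assumes that the popcount speed is bytes processed per second expressed in MiB/s and that the comparison rate is simply size 1 × size 2 divided by the compute time; both helper names are hypothetical. Because the printed times are rounded to milliseconds, the recomputed rates agree with the printed ones only approximately.

```python
def popcount_speed_mib_s(n_keys: int, bits_per_key: int, seconds: float) -> float:
    """Throughput in MiB/s for popcounting n_keys keys of bits_per_key bits."""
    total_bytes = n_keys * bits_per_key / 8
    return total_bytes / 2**20 / seconds


def million_comparisons_per_second(size1: int, size2: int, seconds: float) -> float:
    """Rate for an all-pairs comparison of two datasets."""
    return size1 * size2 / seconds / 1e6


if __name__ == "__main__":
    # Figures taken from the report above.
    print(round(popcount_speed_mib_s(100000, 1024, 0.016641), 2))
    print(round(million_comparisons_per_second(20000, 20000, 2.902), 3))
```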
Notable changes since 0.5.x:
- client side code has been removed
- C/C++ performance improvements
- packaging and testing improvements
- testing/benchmarking against clkhash
0.8