- Allow wider range of dependency versions after changes were inadvertently dropped from 0.18.2 release.
- Memory and performance improvements on large files. #675
- Allow wider range of dependency versions
- Performance improvements by caching hashes of tokens. #664
- Switch to using `blakeHash` for benchmarking. #664
- Remove implicit dependency on `setuptools`. #663
- Migrate to pyproject.toml for dependency management and packaging. #659
- Remove use of bitarray fork as upstream project now publishes wheels. #557, #567, #573
- Update dependencies
- `generate_clk_from_csv` and `generate_clks` now accept an optional `max_workers` argument. This means systems that can't create sub-processes, such as celery workers and AWS lambda jobs, can now use `clkhash`. #424
- fixed bug in `strategy` definition in the schema. #383
- fixed doc for numeric comparison. #385
- removed support for Python 3.5 #406
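As a rough illustration of why an optional `max_workers` argument helps in restricted environments, the sketch below shows the general pattern: hash chunks in a process pool by default, but stay in the current process when `max_workers` is 1. The names `hash_chunk` and `hash_all` are hypothetical, and this is not clkhash's actual implementation.

```python
import hashlib
from concurrent.futures import ProcessPoolExecutor
from typing import List, Optional, Sequence


def hash_chunk(records: Sequence[str]) -> List[str]:
    """Hypothetical stand-in for the per-chunk hashing work."""
    return [hashlib.sha256(r.encode("utf-8")).hexdigest() for r in records]


def hash_all(chunks: Sequence[Sequence[str]],
             max_workers: Optional[int] = None) -> List[str]:
    """Hash every chunk, optionally without creating sub-processes.

    Assumption for this sketch: max_workers=None lets the pool pick a
    default, values above 1 cap the pool size, and 1 (or less) runs
    everything in the calling process.
    """
    if max_workers is None or max_workers > 1:
        with ProcessPoolExecutor(max_workers=max_workers) as pool:
            per_chunk = list(pool.map(hash_chunk, chunks))
    else:
        # No pool at all: safe where sub-processes cannot be created.
        per_chunk = [hash_chunk(chunk) for chunk in chunks]
    return [digest for chunk in per_chunk for digest in chunk]
```

A caller inside a worker that cannot fork would pass `max_workers=1`; everything else about the call stays the same.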
Removes `rest_client` and `cli` modules. This functionality has been migrated to anonlink-client.
- clkhash continues its metamorphosis from a client to a support library. Clkhash now returns the computed CLKs as bitarrays and not as base64-serialized strings any more. (#370)
- Fixes bug validating linkage schemas with ignored fields. #342
- Added warnings about upcoming removal of `rest_client` and `cli`. This functionality has been migrated to anonlink-client.
- Update dependencies
- fixed issue where NumericComparison couldn't tokenize empty inputs #323
Introduced linkage schema v3 that permits you to specify different comparison techniques. The hashing schema documentation provides more details. There is also a tutorial describing the different comparison techniques.
- CLI can handle rate limiting from the entity service #277
- introduce hypothesis testing #280
- improvements to Azure CI pipeline #284, #294, #312, #313
- Added ability to define alternative comparison techniques #286
- Exact comparison #290
- improved schema documentation #293
- update rest client #297
- renamed the strategies #302
- Switch to using a fork of `bitarray` that distributes binary wheels. This means installing clkhash no longer requires a C compiler. #308
- added new command for schema conversion to clkutil #309
- update randomnames schema #311
- addressed warnings in tests #315
- added numeric comparison #316
- remove mapping type from tutorials and cli #317
- tutorial about comparisons #318
- The cli method `hash` requires only one secret instead of two. #303
- The clks generated with `clkhash` <= 0.14.0 are not compatible with clks from version 0.15.0 onwards.
- Fix bug where empty inputs don't generate tokens.
- CLI commands to delete runs and projects. #265
- Migrate to Azure DevOps for CI testing. #262
- Synthetic data generation using distributions. #271, #275
- Fix example and test linkage schemas using v2.
- Fix mismatch between double hash and blake hash key requirement.
- Update to use newer anonlink-entity-service api.
- Updates to dependencies.
- Better test coverage
- CI now executes tutorial notebooks
- CI now automatically releases to PyPI
- Support packaging the command line tool into a windows executable.
- Additional testing
- New describe command added to cli
- Bugfix to ensure we run on pypy3
- Updates to dependencies
- Bugfix in restclient to support Python 3.7
- Bugfix in progress messages.
- Dependency updates.
- Updates to dependencies.
- Addition of code coverage metrics from travis, appveyor.
- Abstract rest calls out of command line tool. More comprehensive testing of cli and rest client.
Changes to the clkhash command line tool to support new entity service api.
- Code format update and general cleanup following internal review.
- Tutorial's schema was missing value definitions.
- Removal of `HKDFConfig`
Introduced a new schema system that permits you to:
- change the settings for hashing, such as the hash length and the number of bits set per token,
- change the tokenisation settings for each field,
- provide a spec against which the input is validated, so you know that whatever you're hashing has been formatted correctly,
- define sentinels for missing values, which will then be exempt from validation and can optionally be replaced with another value (e.g.: 'Null' -> ''),
- choose between three different hashing schemes.
The hashing schema documentation provides more details.
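To make the missing-value behaviour concrete, here is a minimal sketch of how a sentinel can bypass validation and optionally be replaced before hashing. The `FieldSpec` class and its attributes are hypothetical and do not reflect the actual schema format; the hashing schema documentation remains the reference for that.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class FieldSpec:
    """Hypothetical per-field spec, not the real clkhash schema format."""
    name: str
    pattern: str                         # regex the value must fully match
    sentinel: Optional[str] = None       # marker for a missing value
    replace_with: Optional[str] = None   # optional substitute for the sentinel


def prepare(value: str, spec: FieldSpec) -> str:
    """Return the value to hash, validating everything except sentinels."""
    if spec.sentinel is not None and value == spec.sentinel:
        # Missing values skip validation and may be swapped out.
        return value if spec.replace_with is None else spec.replace_with
    if re.fullmatch(spec.pattern, value) is None:
        raise ValueError(
            f"{spec.name}: {value!r} does not match {spec.pattern!r}")
    return value


# Example spec: ISO-style dates, with 'Null' standing in for missing data.
dob = FieldSpec("dob", r"\d{4}-\d{2}-\d{2}", sentinel="Null", replace_with="")
```

With this spec, a well-formed date passes through unchanged, `'Null'` becomes the empty string without being validated, and anything else raises a `ValueError`.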
- With the new schema, the old schema format will no longer be accepted. This is fine since the previous schema didn't do much.
- You must now provide a schema to perform hashing where previously it was optional.
- Major documentation updates.
- Improvements and bug fix in data generation.
- CI fix disable storing artifacts on AppVeyor.
- Introduced a more secure variant of the double hash encoding scheme.
- Introduced a Blake2 based encoding scheme. Still working on documentation.
- Concurrent hashing now works on Windows as well as Linux. This has also been backported to Python 2.
- Command line tool now outputs basic statistics while hashing.
- Command line tool is now officially supported on Windows.
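For readers unfamiliar with these encoding schemes, the sketch below shows the classic double hashing idea and a keyed BLAKE2 alternative for choosing which Bloom filter bits a token sets. The function names, the choice of HMAC-SHA1 and HMAC-MD5, and the way the BLAKE2 digest is cut into indices are assumptions for illustration. Neither function reproduces clkhash's exact encoding, and the more secure double hash variant is not shown.

```python
import hashlib
import hmac
from typing import List


def double_hash_indices(token: str, key1: bytes, key2: bytes,
                        k: int, length: int) -> List[int]:
    """Classic double hashing: index_i = (h1 + i * h2) mod length."""
    data = token.encode("utf-8")
    h1 = int.from_bytes(hmac.new(key1, data, hashlib.sha1).digest(), "big")
    h2 = int.from_bytes(hmac.new(key2, data, hashlib.md5).digest(), "big")
    return [(h1 + i * h2) % length for i in range(k)]


def blake_indices(token: str, key: bytes, k: int, length: int) -> List[int]:
    """Keyed BLAKE2b: cut one 64-byte digest into k 4-byte integers."""
    if k > 16:
        raise ValueError("this sketch supports at most 16 indices per token")
    digest = hashlib.blake2b(token.encode("utf-8"), key=key).digest()
    return [int.from_bytes(digest[4 * i:4 * i + 4], "big") % length
            for i in range(k)]
```

In the double hashing sketch every index is a linear combination of only two hash values, whereas the BLAKE2 sketch draws each index from a different part of the digest.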
We now build clkhash with continuous integration tools that anyone can access: Travis CI and AppVeyor.
- Adds the option to perform XOR folding. Schnell (2016) claims that it improves privacy whilst having little effect on accuracy; see XOR-Folding for hardening Bloom Filter based Encryptions for PPRL for details.
- Supports online documentation at http://clkhash.readthedocs.io/.
- Fixes minor inconsistency between the treatment of base64 string in Python 2 and Python 3.
- Permits changing of fields' weight in the hash. For example, if the `surname` field has a weight of 2 and the `first name` field has a weight of 1, then the similarity score between two hashes is twice as dependent on the surname. We do this by permitting the surname to set twice as many bits in the hash.
- Adds a simple progress bar for the command line utility.
- Added type checking with MyPy for both Python 2 and 3.
Try running the type checker yourself with:
pip install mypy
mypy clkhash --ignore-missing-imports --strict-optional --no-implicit-optional --disallow-untyped-calls
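The XOR folding option listed above can be pictured as follows: the Bloom filter is split into two halves that are combined with a bitwise XOR, giving a filter half as long. This is a sketch of the general technique on a plain list of bits; `xor_fold` is a hypothetical helper, not clkhash's own function.

```python
from typing import List


def xor_fold(bits: List[int], folds: int = 1) -> List[int]:
    """Halve a bit list `folds` times by XOR-ing its two halves."""
    for _ in range(folds):
        if len(bits) % 2 != 0:
            raise ValueError("length must be even to fold")
        half = len(bits) // 2
        bits = [a ^ b for a, b in zip(bits[:half], bits[half:])]
    return bits
```

For example, folding `[1, 0, 1, 1, 0, 0, 1, 0]` once XORs `[1, 0, 1, 1]` with `[0, 0, 1, 0]` and yields `[1, 0, 0, 1]`.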
Each identifier is hashed using different keys derived with a HKDF.
- The `bloomfilter` api has changed. In `calculate_bloom_filters(dataset, schema, keys)` the keys have changed into two lists of keys (from just two keys).
- Added cryptography dependency. Removing support for Python 3.3.
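To illustrate the per-identifier key derivation described above, here is a minimal HKDF (RFC 5869) sketch built on the standard library. The helper `derive_field_keys` and the use of the field name as the `info` parameter are assumptions for illustration; clkhash may derive its keys differently.

```python
import hashlib
import hmac
from typing import Dict, Sequence


def hkdf(master: bytes, info: bytes, length: int = 32,
         salt: bytes = b"") -> bytes:
    """HKDF with SHA-256: extract a pseudorandom key, then expand it."""
    hash_len = hashlib.sha256().digest_size
    if length > 255 * hash_len:
        raise ValueError("requested key is too long for HKDF-SHA256")
    # Extract step; RFC 5869 uses a zero-filled salt when none is given.
    prk = hmac.new(salt or b"\x00" * hash_len, master,
                   hashlib.sha256).digest()
    # Expand step: chain HMAC blocks until enough output is available.
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]),
                         hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_field_keys(master: bytes,
                      field_names: Sequence[str]) -> Dict[str, bytes]:
    """Hypothetical helper: one key per identifier, bound to its name."""
    return {name: hkdf(master, info=name.encode("utf-8"))
            for name in field_names}
```

Because each field name feeds into the `info` parameter, two identifiers with the same value end up hashed under different keys.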
Several improvements to continuous testing with Jenkins - such as adding in code coverage, posting github status checks.
More e2e testing.
Soft launch - First version on PyPI.