new version

0.18.3

Allow a wider range of dependency versions, after these changes were inadvertently dropped from the 0.18.2 release.

0.18.2

Memory and performance improvements on large files. #675

0.18.1

Allow wider range of dependency versions

0.18.0

Performance improvements by caching hashes of tokens. #664
Switch to using blakeHash for benchmarking. #664
Remove implicit dependency on setuptools. #663
Migrate to pyproject.toml for dependency management and packaging. #659
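
The token hash caching noted for this release can be illustrated with a minimal sketch. This is not the clkhash implementation: the function name `token_digest`, the choice of keyed BLAKE2b and the 16-byte digest size are all assumptions made for illustration. It only shows why memoising per-token digests helps when the same n-grams recur across many records.

```python
import hashlib
from functools import lru_cache


@lru_cache(maxsize=None)
def token_digest(token: str, key: bytes) -> bytes:
    """Hypothetical helper: keyed BLAKE2b digest, memoised per (token, key)."""
    return hashlib.blake2b(token.encode("utf-8"), key=key, digest_size=16).digest()


# Repeated bigrams ("sm", "mi") are served from the cache on second use.
tokens = ["sm", "mi", "it", "th", "sm", "mi"]
digests = [token_digest(t, b"secret-key") for t in tokens]
info = token_digest.cache_info()
```

An unbounded cache trades memory for speed; a bounded `maxsize` would be the safer choice on very large inputs.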

0.17.0

Remove use of bitarray fork as upstream project now publishes wheels. #557, #567, #573
Update dependencies

0.16.1

generate_clk_from_csv and generate_clks now accept an optional max_workers argument. This means systems that can't create sub-processes such as celery workers and AWS lambda jobs can now use clkhash. #424
fixed bug in strategy definition in the schema. #383
fixed doc for numeric comparison. #385
removed support for Python 3.5 #406
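
The idea behind the max_workers argument can be sketched with the standard library alone. The names `encode_chunk` and `process_chunks` are hypothetical, and treating the value 1 as "run in the calling process" is an assumption of this sketch rather than a statement about clkhash internals.

```python
from concurrent.futures import ProcessPoolExecutor
from typing import List, Optional, Sequence


def encode_chunk(chunk: Sequence[str]) -> List[int]:
    """Stand-in for real hashing work: return the length of each record."""
    return [len(record) for record in chunk]


def process_chunks(chunks: Sequence[Sequence[str]],
                   max_workers: Optional[int] = None) -> List[List[int]]:
    """Process chunks in-process when max_workers == 1, else in a process pool."""
    if max_workers == 1:
        # No sub-processes are created, which suits restricted environments.
        return [encode_chunk(chunk) for chunk in chunks]
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(encode_chunk, chunks))


results = process_chunks([["alice", "bob"], ["carol"]], max_workers=1)
```

In an environment that forbids sub-processes, a caller would pass the sequential setting explicitly, as in the last line above.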

0.16.0

Removes rest_client and cli modules. This functionality has been migrated to anonlink-client.

Breaking Changes

clkhash continues its metamorphosis from a client to a support library. Clkhash now returns the computed CLKs as bitarrays and not as base64-serialized strings any more. (#370)

0.15.2

Fixes bug validating linkage schemas with ignored fields. #342
Added warnings about upcoming removal of rest_client and cli. This functionality has been migrated to anonlink-client
Update dependencies

0.15.1

fixed issue where NumericComparison couldn't tokenize empty inputs #323

0.15.0

Introduced linkage schema v3 that permits you to specify different comparison techniques. The hashing schema documentation provides more details. There is also a tutorial describing the different comparison techniques.

CLI can handle rate limiting from the entity service #277
introduce hypothesis testing #280
improvements to Azure CI pipeline #284, #294, #312, #313
Added ability to define alternative comparison techniques #286
Exact comparison #290
improved schema documentation #293
update rest client #297
renamed the strategies #302
Switch to using a fork of bitarray that distributes binary wheels. This means installing clkhash no longer requires a C compiler. #308
added new command for schema conversion to clkutil #309
update randomnames schema #311
addressed warnings in tests #315
added numeric comparison #316
remove mapping type from tutorials and cli #317
tutorial about comparisons #318

Breaking Changes

The cli method hash requires only one secret instead of two. #303
The clks generated with clkhash <= 0.14.0 are not compatible with clks from version 0.15.0 onwards.

0.14.0

Fix bug where empty inputs don't generate tokens.
CLI commands to delete runs and projects. #265
Migrate to Azure DevOps for CI testing. #262
Synthetic data generation using distributions. #271, #275

0.13.0

Fix example and test linkage schemas using v2.
Fix mismatch between double hash and blake hash key requirement.
Update to use newer anonlink-entity-service api.
Updates to dependencies.
Better test coverage
CI now executes tutorial notebooks
CI now automatically releases to PyPI

0.12.1

Support packaging the command line tool into a windows executable.
Additional testing

0.12.0

New describe command added to cli
Bugfix to ensure we run on pypy3
Updates to dependencies

0.11.3

Bugfix in restclient to support Python 3.7
Bugfix in progress messages.
Dependency updates.

0.11.2

Updates to dependencies.
Addition of code coverage metrics from travis, appveyor.
Abstract rest calls out of command line tool. More comprehensive testing of cli and rest client.

0.11.1

Changes to the clkhash command line tool to support new entity service api.

Minor changes

Code format update and general cleanup following internal review.
Tutorial's schema was missing value definitions.
Removal of HKDFConfig

0.11.0

Introduced a new schema system that permits you to:

change the settings for hashing, such as the hash length and the number of bits set per token,
change the tokenisation settings for each field,
provide a spec against which the input is validated, so you know that whatever you're hashing has been formatted correctly,
define sentinels for missing values, which will then be exempt from validation and can optionally be replaced with another value (e.g.: 'Null' -> ''),
choose between three different hashing schemes.

The hashing schema documentation provides more details.

Breaking changes

With the new schema, the old schema format will no longer be accepted. This is fine since the previous schema didn't do much.
You must now provide a schema to perform hashing where previously it was optional.

0.10.1

Major documentation updates.
Improvements and bug fix in data generation.
CI fix: disable storing artifacts on AppVeyor.

0.10.0

Introduced a more secure variant of the double hash encoding scheme.
Introduced a Blake2 based encoding scheme. Still working on documentation.
Concurrent hashing now works on Windows as well as Linux. This has also been backported to Python 2.
Command line tool now outputs basic statistics while hashing.
Command line tool is now officially supported on Windows.

We now build clkhash with continuous integration tools that anyone can access: Travis CI and AppVeyor.

0.9.0

Adds the option to perform XOR folding. Schnell (2016) claims that it improves privacy whilst having little effect on accuracy; see XOR-Folding for hardening Bloom Filter based Encryptions for PPRL for details.
Supports online documentation at http://clkhash.readthedocs.io/.
Fixes minor inconsistency between the treatment of base64 string in Python 2 and Python 3.
Permits changing of fields' weight in the hash. For example, if the surname field has a weight of 2 and the first name field has a weight of 1, then the similarity score between two hashes is twice as dependent on the surname. We do this by permitting the surname to set twice as many bits in the hash.
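
XOR folding, as mentioned in the first item of this release, can be sketched in a few lines. This is a minimal illustration on a plain list of bits, not the clkhash code: the function name `xor_fold` is hypothetical, and the sketch assumes a single fold of an even-length filter.

```python
from typing import List


def xor_fold(bits: List[int]) -> List[int]:
    """Fold a bit vector in half by XOR-ing the first half with the second."""
    if len(bits) % 2:
        raise ValueError("bit vector length must be even to fold")
    half = len(bits) // 2
    return [a ^ b for a, b in zip(bits[:half], bits[half:])]


bloom = [1, 0, 1, 1, 0, 0, 1, 0]
folded = xor_fold(bloom)  # [1, 0, 1, 1] XOR [0, 0, 1, 0] -> [1, 0, 0, 1]
```

The folded filter is half the length of the original, so both parties in a linkage would need to apply the same number of folds.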

0.8.1

Adds a simple progress bar for the command line utility.
Added type checking with MyPy for both Python 2 and 3.

Try running the type checker yourself with:

pip install mypy
mypy clkhash --ignore-missing-imports --strict-optional --no-implicit-optional --disallow-untyped-calls

0.8.0

Each identifier is hashed using different keys derived with a HKDF.
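
Per-identifier key derivation with an HKDF can be sketched using only the standard library. This follows the extract-and-expand construction of RFC 5869 with SHA-256. The function names, the use of the field name as the `info` input and the 32-byte key length are assumptions for illustration; clkhash itself relies on the cryptography package added in this release.

```python
import hashlib
import hmac
from typing import List, Sequence

HASH_LEN = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    """RFC 5869 extract step: derive a pseudorandom key from the input key."""
    if not salt:
        salt = b"\x00" * HASH_LEN
    return hmac.new(salt, ikm, hashlib.sha256).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """RFC 5869 expand step: stretch the pseudorandom key to `length` bytes."""
    if length > 255 * HASH_LEN:
        raise ValueError("requested key material is too long")
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_field_keys(master: bytes, fields: Sequence[str], key_len: int = 32) -> List[bytes]:
    """Hypothetical helper: one key per identifier, using the field name as info."""
    prk = hkdf_extract(b"", master)
    return [hkdf_expand(prk, name.encode("utf-8"), key_len) for name in fields]


keys = derive_field_keys(b"master secret", ["first_name", "surname", "dob"])
```

Because each identifier gets its own key, the same token appearing in two different fields is hashed differently.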

Breaking Changes

The bloomfilter API has changed. In calculate_bloom_filters(dataset, schema, keys), the keys argument is now two lists of keys rather than just two keys.
Added cryptography dependency. Removed support for Python 3.3.

Other Changes

Several improvements to continuous testing with Jenkins - such as adding in code coverage, posting github status checks.

More e2e testing.

0.7.3

Soft launch - First version on PyPI.
