CLK hash: hash PII for entity matching

CLK Hash

clkhash is a Python implementation of cryptographic linkage key (CLK) hashing as described by Rainer Schnell, Tobias Bachteler, and Jörg Reiher in "A Novel Error-Tolerant Anonymous Linking Code".
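To give a feel for the idea, here is a minimal standard-library sketch of a Bloom-filter style linkage key. It is not clkhash's implementation. The function names (`bigrams`, `toy_clk`, `dice`), the filter size, the number of hash positions per token, and the choice of HMAC-SHA1 and HMAC-SHA256 are all assumptions made for illustration. Each field is split into overlapping two-character tokens, each token sets a few keyed-hash positions in a fixed-size bit set, and two records are compared with the Dice coefficient of their bit sets.

```python
import hashlib
import hmac


def bigrams(value: str) -> list[str]:
    """Split a space-padded, lower-cased string into overlapping 2-character tokens."""
    padded = f" {value.strip().lower()} "
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


def toy_clk(fields: list[str], secret: str, size: int = 1024, k: int = 20) -> set[int]:
    """Return the set bit positions of a toy Bloom-filter encoding of `fields`.

    Hypothetical helper for illustration only; parameters are assumptions.
    """
    key = secret.encode("utf-8")
    bits: set[int] = set()
    for field in fields:
        for token in bigrams(field):
            data = token.encode("utf-8")
            # Two keyed hashes combined by double hashing:
            # position_i = (h1 + i * h2) mod size
            h1 = int.from_bytes(hmac.new(key, data, hashlib.sha1).digest(), "big")
            h2 = int.from_bytes(hmac.new(key, data, hashlib.sha256).digest(), "big")
            for i in range(k):
                bits.add((h1 + i * h2) % size)
    return bits


def dice(a: set[int], b: set[int]) -> float:
    """Sørensen-Dice coefficient of two sets of bit positions."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))
```

Because a small typo changes only a few bigrams, the encodings of "john smith" and "jon smith" share most of their set bits, while an unrelated name shares far fewer. That overlap is what makes the code error-tolerant.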

Installation

Install clkhash with all dependencies using pip:

pip install clkhash

Documentation

https://clkhash.readthedocs.io

Python API

To hash a CSV file of entities using the default schema:

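from clkhash import clk, randomnames

fake_pii_schema = randomnames.NameList.SCHEMA
# Open the CSV in a context manager so the file handle is closed after hashing.
with open('fake-pii-out.csv', 'r') as csv_file:
    clks = clk.generate_clk_from_csv(csv_file, 'secret', fake_pii_schema)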

Command Line Interface

See Anonlink Client for a command line interface to clkhash.

Citing

clkhash and the wider Anonlink project are designed, developed, and supported by CSIRO's Data61. If you use any part of this library in your research, please cite it using the following BibTeX entry:

@misc{Anonlink,
  author = {CSIRO's Data61},
  title = {Anonlink Private Record Linkage System},
  year = {2017},
  publisher = {GitHub},
  journal = {GitHub Repository},
  howpublished = {\url{https://github.com/data61/clkhash}},
}