01 Sep 23:57

wilko77

2fc0169

Version 1.15.1 Latest

Spring Cleaning Release

Dependency updates

Implemented in #687

Delete upload files on object store after ingestion

If a data provider uploads its data via the object store, we now clean up afterwards.

Implemented in #686

Fixed Record Linkage API tutorial

Adjusted to changes in the clkhash library.

Implemented in #684

Delete encodings from database at project deletion

Encodings will be deleted at project deletion, but only for projects created with this version or higher.

Implemented in #683

23 Aug 07:31

hardbyte

v1.15.0

e02c39f

Version 1.15.0

Highlights

Similarity scores are deduplicated

Previously, candidate pairs that appeared in more than one block produced more than one similarity score.
The iterator that processes similarity scores now de-duplicates them before they are stored.

Implemented in: #660
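
As a rough illustration of the idea (not the service's actual iterator), a generator can remember which pairs it has already emitted and skip repeats. The tuple layout and the `deduplicate` name below are assumptions made for this sketch.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical shape of one result: ((dataset, record), (dataset, record), score).
Pair = Tuple[Tuple[int, int], Tuple[int, int], float]


def deduplicate(pairs: Iterable[Pair]) -> Iterator[Pair]:
    """Yield each candidate pair once, even if several blocks produced it."""
    seen = set()
    for left, right, score in pairs:
        key = (left, right)
        if key in seen:
            continue  # same pair already emitted from an earlier block
        seen.add(key)
        yield left, right, score


# Two blocks that both contain the pair (0, 1) <-> (1, 7).
block_a = [((0, 1), (1, 7), 0.93), ((0, 2), (1, 5), 0.81)]
block_b = [((0, 1), (1, 7), 0.93)]
unique = list(deduplicate(block_a + block_b))
```

The set of seen keys grows with the number of distinct pairs, so this simple form trades memory for a single pass over the stream.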

Provided Block Identifiers are now hashed

We now hash the user-provided block identifier before storing it in the database.

Implemented in: #633
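
A minimal sketch of what hashing a block identifier before storage could look like. The choice of SHA-256 and the `hash_block_id` helper are assumptions for illustration; the notes do not say which hash function the service uses.

```python
import hashlib


def hash_block_id(block_id: str) -> str:
    """Return a fixed-length hex digest standing in for the raw block identifier."""
    # SHA-256 is an assumption made for this sketch, not the service's documented choice.
    return hashlib.sha256(block_id.encode("utf-8")).hexdigest()


# Made-up block identifier, as a data provider might supply it.
stored = hash_block_id("surname-soundex:S530")
```

Equal identifiers still map to equal digests, so records can be grouped by block without the database holding the original identifier text.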

Failed runs return message indicating the failure reason

The run status for a failed run now includes a message attribute with information on what went wrong.

Implemented in: #624
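
A client might surface the new attribute roughly as follows. Only the `message` attribute comes from the note above; the `state` key, its `"error"` value, and the example payload are assumptions for this sketch.

```python
def describe_run(status: dict) -> str:
    """Summarise a run status payload, surfacing the failure reason if present."""
    state = status.get("state", "unknown")  # "state" key is assumed for this sketch
    if state == "error":
        # "message" is the attribute added in this release.
        return "run failed: " + status.get("message", "no reason given")
    return "run is " + state


# Hypothetical payload; the message text is invented for illustration.
failed = {"state": "error", "message": "too many candidate pairs"}
summary = describe_run(failed)
```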

Other changes

The run status endpoint now includes total_number_of_comparisons for completed runs.
Implemented in: #651

As usual, lots of version upgrades: we are now using the latest stable redis and postgresql.

24 Feb 04:04

wilko77

v1.14.0

f94cdd1

Version 1.14.0

Highlights

API now supports directly downloading similarity scores from the internal object store

If the request includes the header RETURN-OBJECT-STORE-ADDRESS, the response will be a small json payload with
temporary download credentials to pull the binary similarity scores directly from the object store. The json object
has credentials and object keys::

{
  "credentials": {
    "AccessKeyId": "",
    "SecretAccessKey": "",
    "SessionToken": "",
    "Expiration": "<ISO 8601 datetime string>"
  },
  "object": {
      "endpoint": "<config.DOWNLOAD_OBJECT_STORE_SERVER>",
      "secure": "<config.DOWNLOAD_OBJECT_STORE_SECURE>",
      "bucket": "bucket_name",
      "path": "path"
  }
}

The binary file is serialized using anonlink.serialization; you can convert the stream into Python types with::

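    import anonlink.serialization
    from minio import Minio
    # file_info is assumed to be the "object" part of the JSON payload above.
    mc = Minio(file_info['endpoint'], ...)
    candidate_pair_stream = mc.get_object(file_info['bucket'], file_info['path'])
    sims, (dset_is0, dset_is1), (rec_is0, rec_is1) = anonlink.serialization.load_candidate_pairs(candidate_pair_stream)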
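
To make the link between the JSON payload and the client construction concrete, here is a small sketch that maps the payload onto keyword arguments for the `Minio` constructor. The `minio_kwargs` helper and the example values are hypothetical, and the handling of `secure` as either a boolean or a string is an assumption, since the notes do not state its type.

```python
def minio_kwargs(payload: dict) -> dict:
    """Map the JSON payload onto keyword arguments for minio.Minio(...)."""
    creds, obj = payload["credentials"], payload["object"]
    secure = obj["secure"]
    if isinstance(secure, str):
        # Tolerate a string flag; whether the service sends a bool or a string
        # is not stated in the notes.
        secure = secure.strip().lower() in ("1", "true", "yes")
    return {
        "endpoint": obj["endpoint"],
        "access_key": creds["AccessKeyId"],
        "secret_key": creds["SecretAccessKey"],
        "session_token": creds["SessionToken"],
        "secure": bool(secure),
    }


# Placeholder values only; a real response carries temporary credentials.
example = {
    "credentials": {"AccessKeyId": "id", "SecretAccessKey": "secret",
                    "SessionToken": "token", "Expiration": "2021-02-24T16:04:00Z"},
    "object": {"endpoint": "minio.example.com:9000", "secure": "false",
               "bucket": "bucket_name", "path": "path"},
}
kwargs = minio_kwargs(example)
# With the minio package installed: mc = Minio(**kwargs)
```

Because the credentials are temporary, a client would normally check `Expiration` and request a fresh payload rather than caching these values.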

The following settings control the optional feature of using an external object store:

======================================= ==========================================
Environment Variable                    Helm Config
======================================= ==========================================
DOWNLOAD_OBJECT_STORE_SERVER            anonlink.objectstore.downloadServer
DOWNLOAD_OBJECT_STORE_SECURE            anonlink.objectstore.downloadSecure
DOWNLOAD_OBJECT_STORE_ACCESS_KEY        anonlink.objectstore.downloadAccessKey
DOWNLOAD_OBJECT_STORE_SECRET_KEY        anonlink.objectstore.downloadSecretKey
DOWNLOAD_OBJECT_STORE_STS_DURATION      - (default 43200 seconds)
======================================= ==========================================

Implemented in: #594, #612, #613, #614

Service now uses sqlalchemy for database migrations

SQLAlchemy models have been added for all database tables, and initial database setup
now uses Alembic for migrations. The database and object store init scripts can now
be run multiple times without causing issues.

Implemented in #603, #611

New configurable limits on maximum number of candidate pairs

These limits protect the service from running out of memory due to excessive numbers of
candidate pairs being processed. A side effect is that the service now keeps
track of the number of candidate pairs in a run (as well as the number of comparisons).

The limits are controlled by the following two environment variables, shown here with their
initial default values::

SOLVER_MAX_CANDIDATE_PAIRS="100_000_000"
SIMILARITY_SCORES_MAX_CANDIDATE_PAIRS="500_000_000"

If a run exceeds these limits, the run is put into an error state and further processing is
abandoned to protect the service from running out of memory.

Implemented in #595, #605
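
The underscore-separated defaults parse directly with Python's `int()`. The sketch below shows how such a limit could be read and enforced; the `max_pairs` and `check_candidate_pairs` helpers and the use of `RuntimeError` are assumptions for illustration, not the service's code.

```python
import os


def max_pairs(name: str, default: str) -> int:
    """Read a limit from the environment; int() accepts digit-group underscores."""
    return int(os.environ.get(name, default))


def check_candidate_pairs(count: int, limit: int) -> None:
    """Raise if a run produced more candidate pairs than the configured limit."""
    if count > limit:
        raise RuntimeError(f"{count} candidate pairs exceeds limit of {limit}")


# Falls back to the documented default when the variable is not set.
solver_limit = max_pairs("SOLVER_MAX_CANDIDATE_PAIRS", "100_000_000")
```

In the service, hitting the limit puts the run into an error state; raising an exception here simply stands in for that transition.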

Other changes

Ingress now supports a user supplied path. We no longer assume an nginx ingress controller. #587
Migrate off deprecated k8s chart repos #596, #588
Helm chart now uses standard recommended Kubernetes labels. #616
Fix an issue with case sensitivity in object store metadata #590
If the object store bucket doesn't exist it is now automatically created. #577
Ignore but log failures to delete from object store #576
Many dependency updates #578, #579, #580, #582, #581, #583, #596, #604, #609, #615
Update the base image, all base dependencies and migrated from minio-py v5 to v7 #601, #608, #610
CI e2e tests on Kubernetes will now correctly fail if the tests don't run. #618
Add optional pod annotations to init jobs. #619

15 Jun 13:57

wilko77

v1.13.0

e6e6d60

Version 1.13.0

Highlights

The entity service now supports user-provided blocking information. This can significantly reduce the number of required comparisons and thus allows linkage between larger datasets.
The server can be configured to use an object store for dataset uploads. This allows the use of libraries such as boto3 or minio to improve reliability, especially for large uploads.

Docker Images

data61/anonlink-app:v1.13.0
data61/anonlink-nginx:v1.4.6
data61/anonlink-benchmark:v0.3.3

Breaking Changes

The similarity_score output type has been modified. It now returns a JSON array whose entries each look like [[party_id_0, row_index_0], [party_id_1, row_index_1], score]. #464
Integration test configuration is now consistent with benchmark config. Instead of setting ENTITY_SERVICE_URL including /api/v1, now just set the host address in SERVER. #495
The matching output type was removed. Use the equivalent groups output type instead. #458
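
For clients adapting to the new similarity_score layout, a sketch of unpacking the `[[party_id_0, row_index_0], [party_id_1, row_index_1], score]` entries might look like this. The `Score` type, the `parse_similarity_scores` helper, and the example body are hypothetical, and a bare top-level array is assumed; the real response may wrap the array in an outer object.

```python
import json
from typing import List, NamedTuple


class Score(NamedTuple):
    party_0: int
    row_0: int
    party_1: int
    row_1: int
    score: float


def parse_similarity_scores(body: str) -> List[Score]:
    """Flatten [[party, row], [party, row], score] entries into named tuples."""
    return [Score(p0, r0, p1, r1, s) for (p0, r0), (p1, r1), s in json.loads(body)]


# Made-up response body; a bare top-level array is assumed here.
scores = parse_similarity_scores("[[[0, 3], [1, 8], 0.97], [[0, 5], [1, 2], 0.84]]")
```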

Other Changes

Use latest stable minio release #572
Add section to API tutorial about uploads to object store #573
Plus all the changes introduced in the alpha and beta versions below.

11 Jun 00:16

wilko77

v1.13.0-beta.3

29fbb5b

Version 1.13.0-beta3 Pre-release

Improved performance for blocks of small size #563
Fix a problem with the upload to the external object store #564
Updated documentation #567, #569

30 Apr 03:52

hardbyte

v1.13.0-beta.2

a3cfe26

Version 1.13.0-beta2 Pre-release

Adds support for users to supply blocking information along with encodings. Data can now be uploaded to
an object store and pulled by the Anonlink Entity Service instead of uploaded via the REST API.
This release includes substantial internal changes as encodings are now stored in Postgres instead of
the object store.

Feature to pull data from an object store and create temporary upload credentials. #537, #544, #551
Blocking implementation #510, #527
Benchmark container now includes support for blocking #478, #541
Encodings are now stored in Postgres database instead of files in an object store. #516, #522
Start to add integration tests to complement our end to end tests. #520, #528
Use anonlink-client instead of clkhash #536
Use Python 3.8 in base image. #518
A base image is now used for all our Docker images. #506, #511, #517, #519
Binary encodings now stored internally with their encoding id. #505
REST API implementation for accepting clknblocks #503
Update Open API spec to version 3. Add Blocking API #479
CI Updates #476
Chart updates #496, #497, #539
Documentation updates (production deployment, debugging with PyCharm) #473, #504
Fix Jaeger #500, #523

Misc changes/fixes:

Detect invalid encoding size as early as possible #507
Use local benchmark cache #531
Cleanup docker-compose #533, #534, #547
Calculate number of comparisons accounting for user supplied blocks. #543
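
One plausible way to count comparisons under blocking, for two parties, is to sum the product of the two parties' block sizes over every shared block. The sketch below illustrates that arithmetic only; the `count_comparisons` helper and the input layout are assumptions, and the service's own calculation may differ.

```python
from collections import Counter
from typing import Iterable


def count_comparisons(blocks_a: Iterable[str], blocks_b: Iterable[str]) -> int:
    """Sum, over shared blocks, the product of the two parties' block sizes."""
    sizes_a, sizes_b = Counter(blocks_a), Counter(blocks_b)
    # A block missing from party B contributes 0, since Counter defaults to 0.
    return sum(n * sizes_b[block] for block, n in sizes_a.items())


# One block id per (record, block) membership, for two hypothetical parties.
party_a = ["x", "x", "y", "z"]
party_b = ["x", "y", "y", "y"]
total = count_comparisons(party_a, party_b)
```

Here block `x` contributes 2 × 1, block `y` contributes 1 × 3, and block `z` contributes nothing, giving 5 comparisons instead of the 16 needed without blocking.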

Try it out

You can pull this repository and try with Docker Compose. The Docker images are all hosted on Docker Hub:

Component	Docker Hub
Base Image	data61/anonlink-base
Backend/Worker	data61/anonlink-app
E2E Tests	data61/anonlink-test
Nginx Proxy	data61/anonlink-nginx
Benchmark	data61/anonlink-benchmark
Docs	data61/anonlink-docs-builder

Using Kubernetes (follow the detailed deployment docs):

helm repo add data61 https://data61.github.io/charts
helm repo update
helm install data61/entity-service --version 1.13.1 [--values...]

All the documentation, including tutorials can be found at https://anonlink-entity-service.readthedocs.io/en/latest/index.html

10 Feb 22:23

hardbyte

v1.13.0-beta

21d20e0

v1.13.0-beta Pre-release

Fixed a bug where a data provider could upload their CLKs multiple times in a project using the same upload token. (#463)
Fixed a bug where workers accepted work after failing to initialize their database connection pool. (#477)
Modified similarity_score output to follow the group format in preparation for extending this output type to more
parties. (#464)
Tutorials have been improved following an internal review. (#467)
Database schema and CLK upload API have been modified to support blocking. (#470)
Benchmarking results can now be saved to an object store without authentication, allowing an AWS user to save to S3
using node permissions. (#490)
Removed duplicate/redundant tests. (#466)
Updated dependencies:
- We have enabled dependabot (https://dependabot.com/) on GitHub to keep our Python dependencies up to date.
- anonlinkclient now used for benchmarking. (#490)
- Chart dependencies redis-ha, postgres and minio all updated. (#496, #497)

Breaking Changes

The similarity_score output type has been modified. It now returns a JSON array whose entries each
look like [[party_id_0, row_index_0], [party_id_1, row_index_1], score]. (#464)
Integration test configuration is now consistent with benchmark config. Instead of setting ENTITY_SERVICE_URL including
/api/v1, now just set the host address in SERVER. (#495)

Database Changes (Internal)

The dataproviders table uploaded field has been changed from a BOOL to an ENUM type. (#463)
The projects table has a new uses_blocking field. (#470)

Docker Images

data61/anonlink-app:v1.13.0-beta
data61/anonlink-nginx:v1.4.6-beta
data61/anonlink-benchmark:v0.3.1

Install to Kubernetes using the helm chart:

helm repo add data61 https://data61.github.io/charts
helm repo update
helm install data61/entity-service [--values...]

05 Nov 05:03

wilko77

v1.13.0-alpha

f72c466

v1.13.0-alpha Pre-release

Fixed a bug where invalid state changes could occur when starting a run. (#459)
The matching output type has been removed as redundant with the groups output with 2 parties. (#458)
Update dependencies:
- requests from 2.21.0 to 2.22.0 (#459)

Breaking Change

The matching output type is no longer available. (#458)

18 Oct 07:09

gusmith

v1.12.0

f0ef5d8

v1.12.0

Created docker images:

data61/anonlink-app:v1.12.0
data61/anonlink-nginx:v1.4.5
data61/anonlink-benchmark:v0.3.0

Changelog:

Logging is now configurable in the deployed entity service via the key loggingCfg. (#448)
Several old settings have been removed from the default values.yaml and docker
files which have been replaced by CHUNK_SIZE_AIM (#414):
- SMALL_COMPARISON_CHUNK_SIZE
- LARGE_COMPARISON_CHUNK_SIZE
- SMALL_JOB_SIZE
- LARGE_JOB_SIZE
Remove ENTITY_MATCH_THRESHOLD environment variable (#444)
Celery configuration updates to solve threads and memory leaks in deployment. (#427)
Update docker-compose files to use these new preferred configurations.
Update helm charts with the preferred configuration; the default deployment is now a minimal working deployment.
New environment variables: CELERY_DB_MIN_CONNECTIONS, FLASK_DB_MIN_CONNECTIONS, CELERY_DB_MAX_CONNECTIONS
and FLASK_DB_MAX_CONNECTIONS to configure the database connections pool. (#405)
Simplify access to the database from services relying on a single way to get a connection via a connection pool. (#405)
Deleting a run is now implemented. (#413)
Added some missing documentation about the output type groups (#449)
Sentinel name is configurable. (#436)
Improvement on the Kubernetes deployment test stage on Azure DevOps:
- Re-order cleaning steps to first purge the deployment and then delete whatever remains. (#426)
- Run integration tests in parallel, reducing pipeline stage Kubernetes deployment tests from 30 minutes to 15 minutes. (#438)
- Tests running on a deployed entity-service on k8s now create an artifact containing the logs of all the containers, which is useful for debugging. (#445)
- Test container not restarted on test failure. (#434)
Benchmark improvements:
- Benchmark output has been modified to handle multi-party linkage.
- Benchmark can now handle more than 2 parties, repeat experiments,
  and push the results to the minio object store. (#406, #424 and #425)
- Azure DevOps benchmark stage runs a 3 parties linkage. (#433)
Improvements on Redis cache:
- Refactor the cache. (#430)
- Run state kept in cache (instead of fully relying on database) (#431 and #432)
Update dependencies:
- anonlink to v0.12.5. (#423)
- redis from 3.2.0 to 3.2.1 (#415)
- alpine from 3.9 to 3.10.1 (#404)
Add some release documentation. (#455)

16 Oct 22:48

gusmith

v1.12-b1

c12f73b

v1.12 pre release Pre-release

Pre-release

We are creating this tag so that we can deploy an entity-service with all the configuration changes introduced in develop that our testing service on Kubernetes requires.

Releases: data61/anonlink-entity-service