Sort similarity scores #335

nbgl · 2019-03-05T05:54:23Z

Similarity scores need to be sorted before solving. There are some helper tools in anonlink.concurrency for this.

nbgl · 2019-03-06T02:42:32Z

I’ve disabled two tests in test_result_correctness.py because the output of anonlink v0.11 is correct while the Entity Service's output is still incorrect. They need to be re-enabled.

hardbyte · 2019-06-28T00:48:40Z

I think this may have been incorrectly closed by a comment in #339. The entity service doesn't do a merge sort on similarity scores; however, the solving occurs in a single Celery task (on a high-memory machine), and the anonlink library sorts before solving. @nbgl, is there any reason to still want the entity service to merge sort (other than if/when we have to support a parallel solver)?

See:
#336

nbgl · 2019-06-28T06:40:53Z

The greedy solver needs to consider scores from highest to lowest to maximise accuracy. Sorting before solving is the usual way of doing this (I describe a more efficient way in data61/anonlink#212, but it is more complex to implement).
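To illustrate why the order matters, here is a minimal sketch of a greedy one-to-one matcher. It is not anonlink's solver: the function name `greedy_solve`, the `(score, left_index, right_index)` tuple layout, and the `presorted` flag are all assumptions made for this example.

```python
def greedy_solve(candidates, presorted=False):
    """Greedily pair records one-to-one from (score, left, right) tuples.

    Hypothetical helper for illustration only; not part of anonlink.
    If presorted is False, candidates are sorted by descending score first.
    """
    if not presorted:
        candidates = sorted(candidates, key=lambda c: c[0], reverse=True)

    matched_left, matched_right = set(), set()
    pairs = []
    for score, left, right in candidates:
        # Accept a pair only if neither record has been matched already.
        if left not in matched_left and right not in matched_right:
            pairs.append((left, right))
            matched_left.add(left)
            matched_right.add(right)
    return pairs


candidates = [(0.6, 0, 0), (0.9, 0, 1), (0.8, 1, 0)]

# Highest scores considered first: both strong pairs are accepted.
sorted_result = greedy_solve(candidates)

# Input order used as-is: the weak pair (0, 0) blocks both stronger pairs.
unsorted_result = greedy_solve(candidates, presorted=True)
```

With sorting, the solver accepts the 0.9 and 0.8 pairs; when the unsorted list is consumed as-is, the 0.6 pair is taken first and blocks both of them.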

The Entity Service does actually sort similarity scores now, so this issue may be closed. The code is here as part of #339.

A concern might be that this aggregation is single-threaded, so it might not be the most efficient. Parallelising this merge sort is not a research question, but a software engineering one. (Not hard, but annoying.) But this should be identified as a bottleneck before any work is done, and it should be a separate issue.
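For reference, the single-threaded aggregation step can be sketched with the standard library's `heapq.merge`, assuming each chunk of results is already sorted by descending score. The function name `merge_sorted_chunks` and the tuple layout are assumptions for illustration; this is not the Entity Service's actual code.

```python
import heapq


def merge_sorted_chunks(chunks):
    """Merge chunks of (score, left, right) tuples into one sorted list.

    Hypothetical helper for illustration only. Each chunk must already be
    sorted by descending score; the merge itself runs in a single thread.
    """
    return list(heapq.merge(*chunks, key=lambda c: c[0], reverse=True))


chunk_a = [(0.9, 0, 1), (0.4, 2, 2)]
chunk_b = [(0.8, 1, 0), (0.6, 0, 0)]
merged = merge_sorted_chunks([chunk_a, chunk_b])
```

A parallel version would merge pairs of chunks in separate workers and then merge the results, which is the engineering work described above.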

hardbyte · 2019-06-29T00:05:15Z

Thanks for the clarification, Jakub.

nbgl self-assigned this Mar 5, 2019

hardbyte added this to the Entity Service v1.10 milestone Mar 5, 2019

hardbyte added the "bug" and "P0: critical" labels Mar 5, 2019

nbgl mentioned this issue Mar 6, 2019

Use anonlink’s new similarities api #336

Merged

nbgl mentioned this issue Mar 13, 2019

Use Anonlink binary format for serialisation #339

Merged

nbgl closed this as completed in #339 Mar 15, 2019

hardbyte reopened this Jun 28, 2019

hardbyte unassigned nbgl Jun 28, 2019

hardbyte removed this from the Entity Service v1.10 milestone Jun 28, 2019

hardbyte closed this as completed Jun 29, 2019
