Exception while calculating comparisons for multiparty linkage #387

Open
hardbyte opened this issue Jun 23, 2019 · 0 comments
hardbyte commented Jun 23, 2019

@wilko77 I noticed your comment that the testing deployment was down and had a look to see what was going on. The anonlink-entity-service v1.11.0 has a traceback in the logs:

 [2019-06-11 00:27:09,208: ERROR/ForkPoolWorker-4] Task entityservice.tasks.stats.calculate_comparison_rate[aec9c7b3-2274-48a7-a220-36242ba08f16] raised unexpected: ValueError('expected at most 2 datasets, got 3',)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/celery/app/trace.py", line 382, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/celery/app/trace.py", line 641, in __protected_call__
    return self.run(*args, **kwargs)
  File "/var/www/entityservice/tasks/stats.py", line 18, in calculate_comparison_rate
    comparisons = get_total_comparisons_for_project(dbinstance, run['project_id'])
  File "/var/www/entityservice/database/selections.py", line 249, in get_total_comparisons_for_project
    raise ValueError(f'expected at most {expected_datasets} '
ValueError: expected at most 2 datasets, got 3 

Looking at get_total_comparisons_for_project in selections.py, it reads expected_datasets from the database (the parties column of the projects table). The number of datasets actually found comes from this query, which returns one row per uploaded dataset:

SELECT bloomingdata.count as rows
from dataproviders, bloomingdata
where
    bloomingdata.dp=dataproviders.id AND dataproviders.project=%s

It seems to me that either a multiparty project is being created without projects.parties being set correctly, or a two-party linkage is uploading multiple bloomingdata entries for a single upload.
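
To make the two hypotheses concrete, here is a minimal sketch of the kind of check that raises this error. It is not the code from selections.py: the function name `check_and_count_comparisons` is hypothetical, the database lookups are replaced by plain arguments, and the pairwise-product formula for the comparison count is an assumption. Only the error message is taken from the traceback above.

```python
from itertools import combinations


def check_and_count_comparisons(expected_datasets, dataset_sizes):
    """Hypothetical stand-in for the check described above.

    expected_datasets: value of projects.parties for the project.
    dataset_sizes: one entry per row returned by the bloomingdata query.
    """
    if len(dataset_sizes) > expected_datasets:
        # Same message as in the traceback.
        raise ValueError(f'expected at most {expected_datasets} '
                         f'datasets, got {len(dataset_sizes)}')
    if len(dataset_sizes) < expected_datasets:
        # Assumption: not every party has uploaded yet, nothing to count.
        return None
    # Assumption: every pair of datasets is compared record by record.
    return sum(a * b for a, b in combinations(dataset_sizes, 2))


# Hypothesis 1: three parties uploaded, but parties was stored as 2.
# Hypothesis 2: two parties, but one upload produced two bloomingdata rows.
# Both look identical to the check: three rows against an expected 2.
try:
    check_and_count_comparisons(2, [100, 100, 100])
except ValueError as err:
    print(err)
```

Because both cases produce the same three rows, the traceback alone cannot tell them apart; the dataproviders rows for the affected project would need to be inspected.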

Looking into the project.parties value, we see in models/project.py that the number_parties is optional and defaults to 2:

        # Get optional fields from JSON data
        name = data.get('name', '')
        notes = data.get('notes', '')
        parties = data.get('number_parties', 2)
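
A small sketch of how that default could lead to the mismatch, assuming a client creates a three-party project but omits number_parties from the request body. The helper `parse_parties` and both payloads are hypothetical; only the `data.get('number_parties', 2)` expression comes from models/project.py.

```python
def parse_parties(data):
    # Mirrors the expression quoted from models/project.py.
    return data.get('number_parties', 2)


# Hypothetical request bodies for project creation.
payload_without_field = {'name': 'three party test', 'notes': ''}
payload_with_field = {'name': 'three party test', 'notes': '',
                      'number_parties': 3}

# The omitted field silently falls back to 2, so a third upload would
# later exceed the expected number of datasets.
print(parse_parties(payload_without_field))
print(parse_parties(payload_with_field))
```

If this is the cause, rejecting or warning on a missing number_parties for multiparty result types would surface the problem at project creation rather than in the stats task.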

hardbyte self-assigned this Jun 24, 2019