-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use Anonlink binary format for serialisation #339
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good to ship once the deletion is dealt with. A few other comments inline.
csvwriter.writerows(partial_sparse_result) | ||
bytes_iter, file_size \ | ||
= anonlink.serialization.dump_candidate_pairs_iter(chunk_results) | ||
iter_stream = iterable_to_stream(bytes_iter) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice - I like the removal of a write to local disk
generate_code(12)) | ||
merged_file_stream = iterable_to_stream(merged_file_iter) | ||
try: | ||
mc.put_object(Config.MINIO_BUCKET, merged_file_name, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are they partial merge files deleted after merge is complete?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They’re deleted on line 257 :)
file0 = heapq.heappop(files) | ||
file1 = heapq.heappop(files) | ||
merged_file = _merge_files(mc, log, file0, file1) | ||
heapq.heappush(files, merged_file) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps delete file0 and file1 at this point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are deleted on line 257.
assert dset_i1 == 1 | ||
mapping[str(rec_i0)] = str(rec_i1) # Strings make JSON happy. | ||
mapping = {str(i): str(j) | ||
for i, j in anonlink.solving.pairs_from_groups(groups)} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we can get rid of the cast to str
- are the candidate_pairs
loaded via anonlink.serialization.load_candidate_pairs(score_file)
strings already?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are integers when they are loaded by load_candidate_strings
. greedy_solve
requires an array of integers and it will not accept strings.
# Process the CSV into JSON | ||
csv_text_stream = io.TextIOWrapper(csv_data_stream, encoding="utf-8") | ||
sims_data_stream = mc.get_object(config.MINIO_BUCKET, filename) | ||
# TODO: Below is an Anonlink 'private' API. It should be made |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please create an issue in anonlink for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done: data61/anonlink#191.
The file deletion is already there. Pls approve. |
NB: There is a bug in anonlink that causes the tests to fail. This is fixed in data61/anonlink#190, which must be followed by a release and a version bump in the requirements file.