-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix paragraph deletion problems #2259
Conversation
Keep Bw/c with old paragraph_id deletion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Benchmark
Benchmark suite | Current: d45903d | Previous: 0d03d9f | Ratio |
---|---|---|---|
tests/search/unit/search/test_fetch.py::test_highligh_error |
2248.574612098063 iter/sec (stddev: 0.0000014307247281066337 ) |
2841.0684406726436 iter/sec (stddev: 0.000004954958228416619 ) |
1.26 |
This comment was automatically generated by workflow using github-action-benchmark.
self.brain.paragraphs_to_delete.append( | ||
ids.FieldId(rid=self.rid, field_id=field_key, subfield_id=subfield).full() | ||
) | ||
# TODO: Bw/c, remove this when the new paragraph deletion by field |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is paragraph deletion by field?
Isn't that what we have implemented in this pr?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, maybe the new is not the best word. But yes, is the one implemented in this PR
# is promoted | ||
self.brain.paragraphs_to_delete.append( | ||
f"{self.rid}/{field_key}/{paragraph.start}-{paragraph.end}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can't this be problematic? AFAIK the index writer is now expecting always the field key, not including the start-end paragraphs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've changed the index to expect a mix of paragraph_id and field_id. It tries to remove from both terms. I think it's the safer way to promote this without breaking anything. Although, if we plan to do a rollover, maybe it's not really important
Description
Paragraph deletion is currently done by paragraph_id, this sometimes lead to not deleted paragraphs. This PR adds the capability to delete paragraphs by field_id
How was this PR tested?
Integration tests