Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix paragraph deletion problems #2259

Merged
merged 14 commits into from
Jun 20, 2024
Merged

Fix paragraph deletion problems #2259

merged 14 commits into from
Jun 20, 2024

Conversation

jotare
Copy link
Contributor

@jotare jotare commented Jun 19, 2024

Description

Paragraph deletion is currently done by paragraph_id, this sometimes lead to not deleted paragraphs. This PR adds the capability to delete paragraphs by field_id

How was this PR tested?

Integration tests

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Benchmark

Benchmark suite Current: d45903d Previous: 0d03d9f Ratio
tests/search/unit/search/test_fetch.py::test_highligh_error 2248.574612098063 iter/sec (stddev: 0.0000014307247281066337) 2841.0684406726436 iter/sec (stddev: 0.000004954958228416619) 1.26

This comment was automatically generated by workflow using github-action-benchmark.

self.brain.paragraphs_to_delete.append(
ids.FieldId(rid=self.rid, field_id=field_key, subfield_id=subfield).full()
)
# TODO: Bw/c, remove this when the new paragraph deletion by field
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is paragraph deletion by field?
Isn't that what we have implemented in this pr?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, maybe the new is not the best word. But yes, is the one implemented in this PR

Comment on lines 245 to 247
# is promoted
self.brain.paragraphs_to_delete.append(
f"{self.rid}/{field_key}/{paragraph.start}-{paragraph.end}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't this be problematic? AFAIK the index writer is now expecting always the field key, not including the start-end paragraphs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed the index to expect a mix of paragraph_id and field_id. It tries to remove from both terms. I think it's the safer way to promote this without breaking anything. Although, if we plan to do a rollover, maybe it's not really important

@jotare jotare merged commit da49fac into main Jun 20, 2024
54 checks passed
@jotare jotare deleted the reproduce-concurrences-error branch June 20, 2024 09:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants