[Feature Request] The index adds automatic force merge function to reduce segments #16376

Open
kkewwei opened this issue Oct 18, 2024 · 2 comments
Labels
enhancement Enhancement or improvement to existing feature or request Search:Performance

Comments

@kkewwei
Contributor

kkewwei commented Oct 18, 2024

Is your feature request related to a problem? Please describe

In our product, small indices (about 20 GB or less) are written to and updated infrequently, yet we have to run force merge often to reduce the segment count and improve query performance. This is because OpenSearch uses TieredMergePolicy, which only merges segments of approximately equal size.

ISM provides the ability to force merge periodically, but doing so causes a lot of query glitches, and newly created segments still can't be merged quickly.

Describe the solution you'd like

Customizing/extending MergePolicy is a supported API in Lucene and is designed for users. Could we support another MergePolicy in OpenSearch, one that automatically merges as much as possible to reduce the number of segments?

If this is reasonable, I will follow up with a proposal for how to design the rules for automatic force merging.

Related component

Search:Performance

Describe alternatives you've considered

No response

Additional context

No response

@kkewwei kkewwei added enhancement Enhancement or improvement to existing feature or request untriaged labels Oct 18, 2024
@andrross
Member

@msfroh, can you follow up with any tuning parameters available for the existing merge policies which might be able to solve the problem here?

@msfroh
Collaborator

msfroh commented Oct 23, 2024

Hey @kkewwei -- I think there might be a few knobs you can try tuning on TieredMergePolicy to merge small segments without making the merges too expensive:

  1. index.merge.policy.floor_segment: This sets the size of the lowest "tier", where everything <= the value is considered part of that same tier and eligible to participate in a merge (where the output is one tier higher). The default is 2MB. If you increase it, then more small segments can be eligible to get merged. (I think something like 50 or 100MB is probably more reasonable for all but the tiniest indices.) Small segments may get merged multiple times as a result (e.g. 2MB segments get merged into a 20MB segment, then that gets merged with other 2MB segments to produce a 38MB segment, etc). Usually that's fine, though, as the merges under 100MB tend to be really fast.
  2. index.merge.policy.segments_per_tier: This is the number of segments in the same tier that will get selected for a merge. The default value of 10 means that you need a lot of segments in the same tier before a merge kicks in. Lowering it to something like 5 will encourage small segments to get merged and generally reduce the overall segment count. It does mean that a little more overall compute effort will be spent on merging. For index-heavy workloads, maybe it's not worth it, but for search-heavy workloads, the lower value is usually better.
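
To make the repeated-merge arithmetic in item 1 concrete, here is a minimal sketch. It is not Lucene's actual merge selection logic. It assumes each merge combines one existing segment with nine fresh 2MB segments (a merge width of 10, inferred from the 20MB and 38MB figures above), and the function name `merged_sizes` is made up for illustration.

```python
def merged_sizes(small_mb, merge_width, rounds):
    """Size (in MB) of the growing segment after each merge round.

    Assumption: every round merges the current segment with
    (merge_width - 1) fresh small segments of size small_mb.
    """
    sizes = []
    current = small_mb  # start from a single small segment
    for _ in range(rounds):
        current += (merge_width - 1) * small_mb
        sizes.append(current)
    return sizes


sizes = merged_sizes(small_mb=2, merge_width=10, rounds=2)
print(sizes)       # [20, 38]
print(sum(sizes))  # 58 (MB written in total to end up with 38 MB of data)
```

Under these assumptions, the first 20MB of data is written twice, which is the "merged multiple times" cost described above. At these sizes the extra work is small.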

There might be some other parameters worth tuning, but I think those two would be a good start.
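
As a starting point, the two settings could be applied with an index settings update. The sketch below only builds and prints the request body. The index name `my-small-index` is hypothetical, the values `50mb` and `5` are the suggestions from this comment rather than tested recommendations, and it assumes both settings can be changed on a live index through `PUT /<index>/_settings`.

```python
import json


def merge_policy_settings(floor_segment="50mb", segments_per_tier=5):
    """Build a settings body for the two TieredMergePolicy knobs above."""
    return {
        "index.merge.policy.floor_segment": floor_segment,
        "index.merge.policy.segments_per_tier": segments_per_tier,
    }


index_name = "my-small-index"  # hypothetical index name
body = merge_policy_settings()

# Print the request so it can be sent with any HTTP client.
print(f"PUT /{index_name}/_settings")
print(json.dumps(body, indent=2))
```

After applying the change, comparing segment counts before and after (for example with `GET _cat/segments/<index>`) should show whether the new values are having the intended effect.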
