
[Backport 2.18] Fix race condition in PageListener #1353

Merged (1 commit) on Oct 24, 2024


opensearch-trigger-bot[bot]

Backport f62885a from #1351.

* Fix race condition in PageListener

This PR:
- Introduced an `AtomicInteger` called `pagesInFlight` to track the number of pages currently being processed.
- Incremented `pagesInFlight` before processing each page and decremented it after processing is complete.
- Adjusted the condition in `scheduleImputeHCTask` so that the `imputeHC` task is scheduled only when both `pagesInFlight.get() == 0` (all pages have been processed) and `sentOutPages.get() == receivedPages.get()` (all responses have been received) hold.
- Removed the previous final check in `onResponse` that decided when to schedule `imputeHC`, relying instead on the updated counters for accurate synchronization.

These changes fix a race condition in which `sentOutPages` might not yet have been incremented when the code checked whether to schedule the `imputeHC` task. By accurately tracking the number of in-flight pages and sent pages, we ensure that `imputeHC` runs only after all pages have been fully processed and all responses have been received.
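A minimal Java sketch of the counter scheme described above follows. It is not the plugin's `ResultProcessor`/`PageListener` code: only `pagesInFlight`, `sentOutPages`, `receivedPages`, `scheduleImputeHCTask`, and `imputeHC` come from the description. The `PageTracker` class, its method names, the `lastPageSeen` flag (marking that no further pages will arrive), and the `imputeScheduled` run-once guard are assumptions added so the example is self-contained. It also assumes pages are started in order and that every request for a page is sent before that page ends.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical sketch of the counter scheme; not the plugin's code.
 * pagesInFlight, sentOutPages, receivedPages, scheduleImputeHCTask and imputeHC
 * come from the PR description; everything else is an assumption.
 */
final class PageTracker {
    private final AtomicInteger pagesInFlight = new AtomicInteger();
    private final AtomicInteger sentOutPages = new AtomicInteger();
    private final AtomicInteger receivedPages = new AtomicInteger();
    // Assumption: becomes true once the pager has produced its last page.
    private volatile boolean lastPageSeen = false;
    // Assumption: run-once guard so concurrent callers cannot both schedule imputeHC.
    private final AtomicBoolean imputeScheduled = new AtomicBoolean(false);
    private final Runnable imputeHC;

    PageTracker(Runnable imputeHC) {
        this.imputeHC = imputeHC;
    }

    /** Call before processing a page; the counter is raised before the flag is published. */
    void beginPage(boolean isLastPage) {
        pagesInFlight.incrementAndGet();
        if (isLastPage) {
            lastPageSeen = true;
        }
    }

    /** Call before dispatching a request for the current page. */
    void beforeSend() {
        sentOutPages.incrementAndGet();
    }

    /** Call from the response listener once a response arrives. */
    void onResponse() {
        receivedPages.incrementAndGet();
        scheduleImputeHCTask();
    }

    /** Call after a page has been fully processed (all its requests already sent). */
    void endPage() {
        pagesInFlight.decrementAndGet();
        scheduleImputeHCTask();
    }

    // Both the last endPage() and the last onResponse() re-run this check,
    // so whichever happens last triggers imputeHC, and the guard makes it run once.
    private void scheduleImputeHCTask() {
        if (lastPageSeen
                && pagesInFlight.get() == 0
                && sentOutPages.get() == receivedPages.get()
                && imputeScheduled.compareAndSet(false, true)) {
            imputeHC.run();
        }
    }
}
```

The point of the design is that neither "all pages done" nor "all responses in" is sufficient alone: a page that has finished but whose response is still outstanding keeps `sentOutPages != receivedPages`, and a page still being processed keeps `pagesInFlight > 0`.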

Testing done:
1. Reproduced the race condition by starting two detectors with imputation enabled. The race caused RCF to throw an out-of-order `IllegalArgumentException`. Verified that this change fixes the problem.
2. Added an integration test (IT) for this scenario.

Signed-off-by: Kaituo Li <[email protected]>

* make sure increment before schedule

Signed-off-by: Kaituo Li <[email protected]>
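The follow-up commit ("make sure increment before schedule") is about ordering. A hedged way to see why it matters: if `sentOutPages` were incremented only after a request had been handed to another thread, a fast response could be counted first, and `sentOutPages == receivedPages` could hold too early. The self-contained Java sketch below is hypothetical (the class, the thread pool standing in for the transport layer, and the `violations` counter are assumptions); it increments first, then dispatches, and checks that no response is ever observed ahead of its send.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical illustration of "increment before schedule"; not the plugin's code. */
final class IncrementBeforeSchedule {
    private final AtomicInteger sentOutPages = new AtomicInteger();
    private final AtomicInteger receivedPages = new AtomicInteger();
    // Times a response saw received > sent, which would break a sent == received check.
    private final AtomicInteger violations = new AtomicInteger();
    // Assumption: a small pool stands in for the asynchronous transport layer.
    private final ExecutorService pool = Executors.newFixedThreadPool(4);

    void send() {
        sentOutPages.incrementAndGet();   // 1. count the page first
        pool.execute(this::onResponse);   // 2. then hand the "request" to another thread
    }

    private void onResponse() {
        int received = receivedPages.incrementAndGet();
        if (received > sentOutPages.get()) {
            violations.incrementAndGet();
        }
    }

    /** Sends n pages, waits for every response, and returns the violation count. */
    int run(int n) {
        for (int i = 0; i < n; i++) {
            send();
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("responses did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return violations.get();
    }

    int sent() { return sentOutPages.get(); }

    int received() { return receivedPages.get(); }
}
```

With the order shown, a violation cannot occur, because each send increment happens-before the task that counts its response. Swapping the two lines in `send()` would allow `received > sent` to be observed on some runs.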

---------

Signed-off-by: Kaituo Li <[email protected]>
(cherry picked from commit f62885a)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

codecov bot commented Oct 24, 2024

Codecov Report

Attention: Patch coverage is 85.71429% with 1 line in your changes missing coverage. Please review.

Project coverage is 80.17%. Comparing base (71ddd8f) to head (de825d1).
Report is 1 commit behind head on 2.18.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...ensearch/timeseries/transport/ResultProcessor.java | 85.71% | 0 Missing and 1 partial ⚠️ |
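The patch percentage can be checked by hand. Assuming Codecov's usual definition, in which a partially covered line counts against coverage, one partial line and zero missed lines imply six fully hit lines out of seven coverable lines in the patch:

```math
\text{coverage} = \frac{\text{hits}}{\text{hits} + \text{misses} + \text{partials}},
\qquad
\frac{h}{h + 0 + 1} = 0.8571429 \;\Longrightarrow\; h = 6,
\qquad
\frac{6}{7} \approx 85.714\%
```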
Additional details and impacted files


@@             Coverage Diff              @@
##               2.18    #1353      +/-   ##
============================================
- Coverage     80.19%   80.17%   -0.03%     
+ Complexity     5684     5682       -2     
============================================
  Files           533      533              
  Lines         23409    23410       +1     
  Branches       2333     2332       -1     
============================================
- Hits          18773    18769       -4     
- Misses         3540     3543       +3     
- Partials       1096     1098       +2     
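The project-level numbers in the diff are internally consistent: hits, misses, and partials add up to the line count, and each ratio matches the reported percentage when rounded down to two decimals (which appears to be how this report rounds, since 80.1957% is shown as 80.19% rather than 80.20%):

```math
\begin{aligned}
\text{base: } & 18773 + 3540 + 1096 = 23409, & \tfrac{18773}{23409} &\approx 80.1957\% \to 80.19\% \\
\text{head: } & 18769 + 3543 + 1098 = 23410, & \tfrac{18769}{23410} &\approx 80.1751\% \to 80.17\%
\end{aligned}
```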
| Flag | Coverage Δ |
|---|---|
| plugin | 80.17% <85.71%> (-0.03%) ⬇️ |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ |
|---|---|
| ...imeseries/transport/ResultBulkTransportAction.java | 70.58% <ø> (-8.83%) ⬇️ |
| ...ensearch/timeseries/transport/ResultProcessor.java | 78.90% <85.71%> (+0.05%) ⬆️ |

... and 11 files with indirect coverage changes

kaituo merged commit 63996b8 into 2.18 on Oct 24, 2024
25 of 29 checks passed