Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix two flaky tests #30278

Merged
merged 5 commits into from
Feb 11, 2024
Merged

Fix two flaky tests #30278

merged 5 commits into from
Feb 11, 2024

Conversation

shunping
Copy link
Contributor

@shunping shunping commented Feb 9, 2024

  • For test of testCloseVisibleToAwaitCompletionCallerAndProducer

    • It is flaky due to a race condition.
    • There are two threads running (future and future2). Thread 2 may execute and close the observer before Thread 1 reaching the while loop. This will cause BeamFnDataInboundObserver.CloseException thrown at calling future.get().
    • The fix is just to let Thread2 wait a little time so that Thread 1 can reach the infinite while loop before the observer is closed by Thread 2.
    • I was able to reproduce the flake 10/10 prior to the fix, but after that I have tested it about 30 times with no failure.
  • For test testThreadsAreAddedOnlyAsNeededWithContention, my attempt is to increase the threshold so we can allow more extra threads being created.


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Mention the appropriate issue in your description (for example: addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment fixes #<ISSUE NUMBER> instead.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests
Go tests

See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.

@github-actions github-actions bot added the java label Feb 9, 2024
@shunping shunping force-pushed the flaky-test branch 2 times, most recently from 410b069 to 907d784 Compare February 9, 2024 15:39
@shunping shunping changed the title Fix a flaky test that is caused by race condition Fix a flaky test caused by race condition Feb 9, 2024
@@ -165,6 +165,7 @@ public void testCloseVisibleToAwaitCompletionCallerAndProducer() throws Exceptio
Future<?> future2 =
executor.submit(
() -> {
Thread.sleep(500); // give the previous future some time to reach the inf while loop
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Busy sleep is generally not preferable. Consider use wait/notify which sounds exactly what we need here? That is block this thread until future1 reaches while(true)

Copy link
Contributor

github-actions bot commented Feb 9, 2024

Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment assign set of reviewers

@shunping
Copy link
Contributor Author

shunping commented Feb 9, 2024

Run Java PreCommit

@shunping shunping changed the title Fix a flaky test caused by race condition Fix two flaky tests Feb 9, 2024
@shunping
Copy link
Contributor Author

shunping commented Feb 9, 2024

Run Java PreCommit

@@ -553,7 +553,7 @@ public void testThreadsAreAddedOnlyAsNeededWithContention() throws Exception {
LOG.info("Created {} threads to execute at most 100 parallel tasks", largestPool);
// Ideally we would never create more than 100, however with contention it is still possible
// some extra threads will be created.
assertTrue(largestPool <= 104);
assertTrue(largestPool <= 110);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

out of curiosity, you mind explain briefly about the change

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the comment above the change says, there could be more than 100 threads created in the UnboundedScheduledExecutorService. The previous threshold is set to 104 but it causes some flakiness recently. I attempt to increase to a little bit more to see if it can stop or at least reduce the chance of failure.

@@ -165,6 +171,11 @@ public void testCloseVisibleToAwaitCompletionCallerAndProducer() throws Exceptio
Future<?> future2 =
executor.submit(
() -> {
synchronized (isReady) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, actually this cannot be synchronized (either using this or isReady). Because synchronized block will block other thread's same synchronized block. Here it will wait for signal indefinitely and cause deadlock.

Copy link
Contributor Author

@shunping shunping Feb 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using synchronized block on isReady works here. I don't see there is a deadlock.

(1) If it runs the first synchronized block first, then it will call notify() and isReady is set to true. Then when the second synchronized block is executed, it will not reach the wait() call because isReady is true.

(2) If it runs the second synchronized block first, then it will wait() and give up the lock. Then in the first synchronized block, it will pick up the lock, set isReady to true, and then call notify().

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, I confused myself, LGTM

@Abacn Abacn merged commit 6d10e0f into apache:master Feb 11, 2024
17 of 19 checks passed
@shunping shunping deleted the flaky-test branch February 13, 2024 14:56
@shunping
Copy link
Contributor Author

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants