
Fix hangs due to tasks being stuck inside of local queues #104

Merged
notgull merged 2 commits into master from notgull/tests on Mar 13, 2024

Conversation

notgull (Member) commented Feb 24, 2024

This should catch the errors from earlier.

notgull (Member, Author) commented Feb 24, 2024

I've confirmed that these tests pass on master with 7592d41 dropped

james7132 (Contributor) left a comment:

Bevy was failing because we were ticking the executor on another thread that wasn't spawning new tasks, as a way to send and execute futures on a single target thread. All of these tests seem to either block on or locally tick the executor in one way or another.

use std::future::Future;
use std::sync::Arc;
use std::time::Duration;

use async_executor::Executor;

fn do_run<Fut: Future<Output = ()>>(mut f: impl FnMut(Arc<Executor<'static>>) -> Fut) {
    // This should not run for longer than two minutes.
Review comment (Contributor):
Might want to split this into multiple tests to figure out which specific ones are deadlocking and which are working.

notgull (Member, Author) commented Mar 2, 2024

It turns out that with the current strategy it is possible for tasks to
be stuck in the local queue without any hope of being picked back up.
In practice this seems to happen when the only entities polling the
system are tickers, as opposed to runners. Since tickers don't steal
tasks, it is possible for tasks to be left over in the local queue that
never get drained.

One possible solution is to make it so tickers steal tasks, but this
kind of defeats the point of tickers. So I've instead elected to replace
the current strategy with one that accounts for the corner cases with
local queues.

The main difference is that I replace the Sleepers struct with two
event_listener::Events: one that handles tickers subscribed to the
global queue and one that handles tickers subscribed to the local queues.
The other main difference is that each local queue now has a reference
counter. If this count reaches zero, no tasks will be pushed to this
queue. Only runners increment or decrement this counter.

This makes the previously instituted tests pass, so hopefully this works
for most use cases.
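For illustration, here is a minimal sketch of the bookkeeping described above. The names (State, LocalQueue, try_push) and the exact layout are hypothetical and do not match the real async-executor internals; the point is the two wakeup Events plus the per-queue runner count that gates pushes.

use std::sync::atomic::{AtomicUsize, Ordering};

use async_task::Runnable;
use concurrent_queue::ConcurrentQueue;
use event_listener::Event;

struct State {
    // Notified when a task is pushed onto the global queue.
    global_wakeup: Event,
    // Notified when a task is pushed onto any local queue.
    local_wakeup: Event,
}

struct LocalQueue {
    queue: ConcurrentQueue<Runnable>,
    // Number of runners currently attached to this queue. Only runners
    // increment or decrement it; tickers never touch it.
    runners: AtomicUsize,
}

impl LocalQueue {
    // Refuse the task if no runner is left to drain this queue, so a task
    // can never be stranded here; the caller falls back to the global queue.
    fn try_push(&self, runnable: Runnable) -> Result<(), Runnable> {
        if self.runners.load(Ordering::Acquire) == 0 {
            return Err(runnable);
        }
        self.queue.push(runnable).map_err(|e| e.into_inner())
    }
}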

@notgull notgull marked this pull request as ready for review March 2, 2024 05:18
@notgull notgull requested a review from fogti March 2, 2024 05:19
@notgull notgull changed the title from "Add tests with more complicated futures" to "Fix hangs due to tasks being stuck inside of local queues" Mar 2, 2024
notgull (Member, Author) commented Mar 2, 2024

This has a worrying effect on benchmarks. Specifically, it significantly impacts the time needed to spawn futures.

cargo bench
   Compiling async-executor v1.9.0 (/home/jtnunley/Projects/smol-rs/async-executor)
    Finished `bench` profile [optimized] target(s) in 6.64s
     Running unittests src/lib.rs (/hard_data/cargo_target/release/deps/async_executor-0bbdb4d134e2792f)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running benches/executor.rs (/hard_data/cargo_target/release/deps/executor-d0a4570616a96d1b)
Benchmarking executor::create: Collecting 100 samples in estimated 5.0029 s (4.0
executor::create        time:   [1.2513 µs 1.2524 µs 1.2535 µs]
                        change: [+4.8189% +4.9816% +5.1490%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  4 (4.00%) high mild
  2 (2.00%) high severe

Benchmarking single_thread/executor::spawn_one: Collecting 100 samples in estima
single_thread/executor::spawn_one
                        time:   [5.0070 µs 5.2179 µs 5.3522 µs]
                        change: [+4.9170% +8.2084% +11.374%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  5 (5.00%) low severe
  1 (1.00%) high mild
Benchmarking single_thread/executor::spawn_many_local: Collecting 100 samples in
single_thread/executor::spawn_many_local
                        time:   [11.191 ms 11.445 ms 11.642 ms]
                        change: [-2.8703% +0.0866% +3.1215%] (p = 0.93 > 0.05)
                        No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
  4 (4.00%) low severe
  1 (1.00%) low mild
  1 (1.00%) high mild
Benchmarking single_thread/executor::spawn_recursively: Collecting 100 samples i
single_thread/executor::spawn_recursively
                        time:   [32.584 ms 32.709 ms 32.853 ms]
                        change: [-1.1096% -0.5031% +0.1012%] (p = 0.11 > 0.05)
                        No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) high mild
  7 (7.00%) high severe
Benchmarking single_thread/executor::yield_now: Collecting 100 samples in estima
single_thread/executor::yield_now
                        time:   [3.4152 ms 3.4236 ms 3.4317 ms]
                        change: [-4.1054% -3.7504% -3.3735%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) low severe
  4 (4.00%) high mild
  1 (1.00%) high severe

Benchmarking multi_thread/executor::spawn_one: Collecting 100 samples in estimat
multi_thread/executor::spawn_one
                        time:   [5.5760 µs 5.8597 µs 6.1143 µs]
                        change: [+53.945% +64.434% +75.289%] (p = 0.00 < 0.05)
                        Performance has regressed.
Benchmarking multi_thread/executor::spawn_many_local: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.7s, or reduce sample count to 70.
Benchmarking multi_thread/executor::spawn_many_local: Collecting 100 samples in 
multi_thread/executor::spawn_many_local
                        time:   [59.348 ms 59.555 ms 59.820 ms]
                        change: [+183.08% +198.35% +214.78%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe
Benchmarking multi_thread/executor::spawn_recursively: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 11.8s, or reduce sample count to 40.
Benchmarking multi_thread/executor::spawn_recursively: Collecting 100 samples in
multi_thread/executor::spawn_recursively
                        time:   [118.47 ms 119.09 ms 119.69 ms]
                        change: [-4.2800% -3.6190% -2.9537%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) low mild
Benchmarking multi_thread/executor::yield_now: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.1s, enable flat sampling, or reduce sample count to 50.
Benchmarking multi_thread/executor::yield_now: Collecting 100 samples in estimat
multi_thread/executor::yield_now
                        time:   [1.5802 ms 1.5864 ms 1.5927 ms]
                        change: [-45.618% -45.135% -44.675%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe

Still, removing bugs takes priority over speed here.

james7132 (Contributor) commented Mar 2, 2024

A cursory test on Bevy's end seems to confirm that this fixes the issue; however, unlike #102, this fix causes upwards of a 50% regression in frame time, with our scheduler tasks taking up to 10x longer on average. It does seem like all of the threads are actively running tasks now, which tackles #100, but the contention on them seems to negatively impact all of our running tasks. From the instrumentation on the scheduler tasks on a common Bevy benchmark:

[screenshot: instrumentation of the scheduler tasks on a common Bevy benchmark]

src/lib.rs (outdated):
return false;
}
// Try for both again.
if let Some(runnable) = local_ticker(true).or_else(|| self.queue.pop().ok()) {
Review comment (Member):

Do we usually try to pull from self.queue.pop twice simultaneously here?

notgull (Member, Author) replied:

No, local_ticker only reads from local queues.

@notgull notgull requested a review from fogti March 7, 2024 04:17
notgull (Member, Author) commented Mar 7, 2024

> A cursory test on Bevy's end seems to confirm that this fixes the issue; however, unlike #102, this fix causes upwards of a 50% regression in frame time, with our scheduler tasks taking up to 10x longer on average. It does seem like all of the threads are actively running tasks now, which tackles #100, but the contention on them seems to negatively impact all of our running tasks. From the instrumentation on the scheduler tasks on a common Bevy benchmark:

This is fixable, but it will have to wait for a later release. Since Bevy mostly uses tick, we'll have to deal with ticks in a better way.

james7132 (Contributor) commented Mar 7, 2024

> This is fixable, but it will have to wait for a later release. Since Bevy mostly uses tick, we'll have to deal with ticks in a better way.

We're using run as normal on all threads, but we occasionally nest and interleave other executors' ticking to send tasks to specific threads, due to how we're handling !Send values. We're trying to get rid of this usage, but that may take some time.
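As a rough sketch of that pattern (illustrative only, not Bevy's actual code; the function and variable names here are made up): each worker drives the shared executor with run, and the thread that owns the !Send values also ticks a thread-local executor so futures sent to it execute only on that thread.

use async_executor::{Executor, LocalExecutor};
use futures_lite::future;

fn pinned_worker(global: &Executor<'static>, local: &LocalExecutor<'static>) {
    // Drive the shared executor, but interleave ticks of the local executor
    // so futures spawned onto it run only on this thread.
    future::block_on(global.run(async {
        loop {
            local.tick().await;
        }
    }));
}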

fogti (Member) commented Mar 7, 2024

Why does this change interact with the way tasks are spawned (in the benchmarks)?

notgull (Member, Author) commented Mar 9, 2024

> Why does this change interact with the way tasks are spawned (in the benchmarks)?

Because spawning a task immediately schedules it.
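For context, a hypothetical snippet (not from the benchmark suite) showing why the spawn path is affected: the task is scheduled the moment spawn is called, before anything ever ticks or runs the executor, so any extra wakeup bookkeeping is paid on every spawn.

use async_executor::Executor;

fn main() {
    let ex = Executor::new();
    // `spawn` pushes the new task onto a queue and notifies a sleeping
    // ticker right away, before `run` or `tick` has ever been called.
    let task = ex.spawn(async { 1 + 1 });
    // Dropping the handle cancels the task; the scheduling cost was
    // already incurred above.
    drop(task);
}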

@notgull notgull requested a review from fogti March 12, 2024 01:04
notgull added 2 commits March 12, 2024 20:34
This should catch the errors from earlier.

Signed-off-by: John Nunley <[email protected]>
It turns out that with the current strategy it is possible for tasks to
be stuck in the local queue without any hope of being picked back up.
In practice this seems to happen when the only entities polling the
system are tickers, as opposed to runners. Since tickers don't steal
tasks, it is possible for tasks to be left over in the local queue that
never get drained.

Signed-off-by: John Nunley <[email protected]>
@notgull notgull merged commit 22a9e8b into master Mar 13, 2024
8 checks passed
@notgull notgull deleted the notgull/tests branch March 13, 2024 03:38
@notgull notgull mentioned this pull request Mar 13, 2024