
Fix hangs due to tasks being stuck inside of local queues #104

Merged
notgull merged 2 commits into master from notgull/tests on Mar 13, 2024

Conversation

notgull (Member) commented Feb 24, 2024

This should catch the errors from earlier.

notgull (Member, Author) commented Feb 24, 2024

I've confirmed that these tests pass on master with 7592d41 dropped

james7132 (Contributor) left a comment:

Bevy was failing because we were ticking the executor on another thread that wasn't spawning new tasks, as a way to send and execute futures on a single target thread. All of these tests seem to either block on or locally tick the executor in one way or another.

use std::future::Future;
use std::sync::Arc;
use std::time::Duration;

use async_executor::Executor;

fn do_run<Fut: Future<Output = ()>>(mut f: impl FnMut(Arc<Executor<'static>>) -> Fut) {
    // This should not run for longer than two minutes.
Review comment (Contributor):
Might want to split this into multiple tests to figure out which specific ones are deadlocking and which are working.

notgull (Member, Author) commented Mar 2, 2024

It turns out that with the current strategy it is possible for tasks to
be stuck in the local queue without any hope of being picked back up.
In practice this seems to happen when the only entities polling the
system are tickers, as opposed to runners. Since tickers don't steal
tasks, it is possible for tasks to be left over in the local queue that
never get drained.

One possible solution is to make it so tickers steal tasks, but this
kind of defeats the point of tickers. So I've instead elected to replace
the current strategy with one that accounts for the corner cases with
local queues.

The main difference is that I replace the Sleepers struct with two
event_listener::Events: one that handles tickers subscribed to the
global queue and one that handles tickers subscribed to the local queues.
The other main difference is that each local queue now has a reference
counter. If this count reaches zero, no tasks will be pushed to this
queue. Only runners increment or decrement this counter.

This makes the previously instituted tests pass, so hopefully this works
for most use cases.
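For illustration, here is a minimal sketch of the bookkeeping described above. The names (State, LocalQueue, try_push) and the exact layout are hypothetical and do not match the real async-executor internals; the point is the two wakeup Events plus the per-queue runner count that gates pushes.

use std::sync::atomic::{AtomicUsize, Ordering};

use async_task::Runnable;
use concurrent_queue::ConcurrentQueue;
use event_listener::Event;

struct State {
    // Notified when a task is pushed onto the global queue.
    global_wakeup: Event,
    // Notified when a task is pushed onto any local queue.
    local_wakeup: Event,
}

struct LocalQueue {
    queue: ConcurrentQueue<Runnable>,
    // Number of runners currently attached to this queue. Only runners
    // increment or decrement it; tickers never touch it.
    runners: AtomicUsize,
}

impl LocalQueue {
    // Refuse the task if no runner is left to drain this queue, so a task
    // can never be stranded here; the caller falls back to the global queue.
    fn try_push(&self, runnable: Runnable) -> Result<(), Runnable> {
        if self.runners.load(Ordering::Acquire) == 0 {
            return Err(runnable);
        }
        self.queue.push(runnable).map_err(|e| e.into_inner())
    }
}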

@notgull notgull marked this pull request as ready for review March 2, 2024 05:18
@notgull notgull requested a review from fogti March 2, 2024 05:19
@notgull notgull changed the title from "Add tests with more complicated futures" to "Fix hangs due to tasks being stuck inside of local queues" Mar 2, 2024
notgull (Member, Author) commented Mar 2, 2024

This has a worrying effect on benchmarks. Specifically, it significantly impacts the time needed to spawn futures.

cargo bench
   Compiling async-executor v1.9.0 (/home/jtnunley/Projects/smol-rs/async-executor)
    Finished `bench` profile [optimized] target(s) in 6.64s
     Running unittests src/lib.rs (/hard_data/cargo_target/release/deps/async_executor-0bbdb4d134e2792f)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running benches/executor.rs (/hard_data/cargo_target/release/deps/executor-d0a4570616a96d1b)
Benchmarking executor::create: Collecting 100 samples in estimated 5.0029 s (4.0
executor::create        time:   [1.2513 µs 1.2524 µs 1.2535 µs]
                        change: [+4.8189% +4.9816% +5.1490%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  4 (4.00%) high mild
  2 (2.00%) high severe

Benchmarking single_thread/executor::spawn_one: Collecting 100 samples in estima
single_thread/executor::spawn_one
                        time:   [5.0070 µs 5.2179 µs 5.3522 µs]
                        change: [+4.9170% +8.2084% +11.374%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  5 (5.00%) low severe
  1 (1.00%) high mild
Benchmarking single_thread/executor::spawn_many_local: Collecting 100 samples in
single_thread/executor::spawn_many_local
                        time:   [11.191 ms 11.445 ms 11.642 ms]
                        change: [-2.8703% +0.0866% +3.1215%] (p = 0.93 > 0.05)
                        No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
  4 (4.00%) low severe
  1 (1.00%) low mild
  1 (1.00%) high mild
Benchmarking single_thread/executor::spawn_recursively: Collecting 100 samples i
single_thread/executor::spawn_recursively
                        time:   [32.584 ms 32.709 ms 32.853 ms]
                        change: [-1.1096% -0.5031% +0.1012%] (p = 0.11 > 0.05)
                        No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) high mild
  7 (7.00%) high severe
Benchmarking single_thread/executor::yield_now: Collecting 100 samples in estima
single_thread/executor::yield_now
                        time:   [3.4152 ms 3.4236 ms 3.4317 ms]
                        change: [-4.1054% -3.7504% -3.3735%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) low severe
  4 (4.00%) high mild
  1 (1.00%) high severe

Benchmarking multi_thread/executor::spawn_one: Collecting 100 samples in estimat
multi_thread/executor::spawn_one
                        time:   [5.5760 µs 5.8597 µs 6.1143 µs]
                        change: [+53.945% +64.434% +75.289%] (p = 0.00 < 0.05)
                        Performance has regressed.
Benchmarking multi_thread/executor::spawn_many_local: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.7s, or reduce sample count to 70.
Benchmarking multi_thread/executor::spawn_many_local: Collecting 100 samples in 
multi_thread/executor::spawn_many_local
                        time:   [59.348 ms 59.555 ms 59.820 ms]
                        change: [+183.08% +198.35% +214.78%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe
Benchmarking multi_thread/executor::spawn_recursively: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 11.8s, or reduce sample count to 40.
Benchmarking multi_thread/executor::spawn_recursively: Collecting 100 samples in
multi_thread/executor::spawn_recursively
                        time:   [118.47 ms 119.09 ms 119.69 ms]
                        change: [-4.2800% -3.6190% -2.9537%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) low mild
Benchmarking multi_thread/executor::yield_now: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.1s, enable flat sampling, or reduce sample count to 50.
Benchmarking multi_thread/executor::yield_now: Collecting 100 samples in estimat
multi_thread/executor::yield_now
                        time:   [1.5802 ms 1.5864 ms 1.5927 ms]
                        change: [-45.618% -45.135% -44.675%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe

Still, removing bugs takes priority over speed here.

james7132 (Contributor) commented Mar 2, 2024

A cursory test on Bevy's end seems to confirm that this fixes the issue; however, unlike #102, this fix causes upwards of a 50% regression in frame time, with our scheduler tasks taking up to 10x longer on average. It does seem like all of the threads are actively running tasks now, which tackles #100, but the contention on them seems to negatively impact all of our running tasks. From the instrumentation on the scheduler tasks on a common Bevy benchmark:

[screenshot: instrumentation of the scheduler tasks on a common Bevy benchmark]

src/lib.rs (outdated):
return false;
}
// Try for both again.
if let Some(runnable) = local_ticker(true).or_else(|| self.queue.pop().ok()) {
Review comment (Member):

Do we usually try to pull from self.queue.pop twice simultaneously here?

notgull (Member, Author) replied:

No, local_ticker only reads from local queues.

@notgull notgull requested a review from fogti March 7, 2024 04:17
notgull (Member, Author) commented Mar 7, 2024

> A cursory test on Bevy's end seems to confirm that this fixes the issue; however, unlike #102, this fix causes upwards of a 50% regression in frame time, with our scheduler tasks taking up to 10x longer on average. It does seem like all of the threads are actively running tasks now, which tackles #100, but the contention on them seems to negatively impact all of our running tasks. From the instrumentation on the scheduler tasks on a common Bevy benchmark:

This is fixable, but it will have to wait for a later release. Since Bevy mostly uses tick, we'll have to deal with ticks in a better way.

james7132 (Contributor) commented Mar 7, 2024

> This is fixable, but it will have to wait for a later release. Since Bevy mostly uses tick, we'll have to deal with ticks in a better way.

We're using run as normal on all threads, but we occasionally nest and interleave other executors' ticking to send tasks to specific threads, due to how we're handling !Send values. We're trying to get rid of this usage, but that may take some time.
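As a rough sketch of that pattern (illustrative only, not Bevy's actual code; the function and variable names here are made up): each worker drives the shared executor with run, and the thread that owns the !Send values also ticks a thread-local executor so futures sent to it execute only on that thread.

use async_executor::{Executor, LocalExecutor};
use futures_lite::future;

fn pinned_worker(global: &Executor<'static>, local: &LocalExecutor<'static>) {
    // Drive the shared executor, but interleave ticks of the local executor
    // so futures spawned onto it run only on this thread.
    future::block_on(global.run(async {
        loop {
            local.tick().await;
        }
    }));
}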

fogti (Member) commented Mar 7, 2024

Why does this change interact with the way tasks are spawned (in the benchmarks)?

notgull (Member, Author) commented Mar 9, 2024

> Why does this change interact with the way tasks are spawned (in the benchmarks)?

Because spawning a task immediately schedules it.
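For context, a hypothetical snippet (not from the benchmark suite) showing why the spawn path is affected: the task is scheduled the moment spawn is called, before anything ever ticks or runs the executor, so any extra wakeup bookkeeping is paid on every spawn.

use async_executor::Executor;

fn main() {
    let ex = Executor::new();
    // `spawn` pushes the new task onto a queue and notifies a sleeping
    // ticker right away, before `run` or `tick` has ever been called.
    let task = ex.spawn(async { 1 + 1 });
    // Dropping the handle cancels the task; the scheduling cost was
    // already incurred above.
    drop(task);
}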

@notgull notgull requested a review from fogti March 12, 2024 01:04
notgull added 2 commits March 12, 2024 20:34
This should catch the errors from earlier.

Signed-off-by: John Nunley <[email protected]>
It turns out that with the current strategy it is possible for tasks to
be stuck in the local queue without any hope of being picked back up.
In practice this seems to happen when the only entities polling the
system are tickers, as opposed to runners. Since tickers don't steal
tasks, it is possible for tasks to be left over in the local queue that
never get drained.

Signed-off-by: John Nunley <[email protected]>
@notgull notgull merged commit 22a9e8b into master Mar 13, 2024
8 checks passed
@notgull notgull deleted the notgull/tests branch March 13, 2024 03:38
@notgull notgull mentioned this pull request Mar 13, 2024