Only a single CPU core is utilized when join-ing on tokio runtime #185
Thanks for reporting this @manifest. I can reproduce this locally, and we should look into the root cause of this further. You mentioned Tokio only uses a single core with this; how did you measure that? I've been able to see the slowdown, but I didn't yet look at core utilization. I'd like to test the scaling with
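A minimal sketch of one way to test that scaling (my own sketch, not from this thread, and assuming "scaling" here means scaling with the number of tokio worker threads): run the same join workload on runtimes built with different worker_threads settings and compare the rates. The Vec-join usage mirrors the test cases below.

use std::time::{Duration, Instant};

fn main() {
    // Hypothetical scaling harness: same workload, varying worker thread counts.
    for workers in [1, 2, 4, 8] {
        let rt = tokio::runtime::Builder::new_multi_thread()
            .worker_threads(workers)
            .enable_all()
            .build()
            .unwrap();

        let n_tasks = 10_000usize;
        let task_duration = Duration::from_millis(100);
        let start = Instant::now();

        rt.block_on(async {
            use futures_concurrency::prelude::*;
            let tasks = (0..n_tasks)
                .map(|_| tokio::time::sleep(task_duration))
                .collect::<Vec<_>>();
            // Assumes Join is implemented for Vec<impl Future>, as in recent
            // futures-concurrency releases.
            tasks.join().await;
        });

        let total = start.elapsed().as_secs_f64();
        println!(
            "{} worker(s): {:.4} s, {:.4} tasks/s",
            workers,
            total,
            n_tasks as f64 / total
        );
    }
}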
I've just checked the CPU usage in htop. The process running the latest test case on the Tokio runtime used up to 100% CPU (i.e. a single core). My comment isn't entirely accurate because different CPU cores were involved over time, but it looked like sequential execution.
There is another interesting case resulting in a stalled execution. At least on my machine, I can consistently reproduce it with:

use std::time::{Duration, Instant};
use tokio::time::sleep;

#[derive(Debug, Clone)]
struct Workload {
    task_duration: Duration,
    n_tasks: usize,
    start: Instant,
}

#[tokio::main]
async fn main() {
    let workload = Workload {
        task_duration: Duration::from_millis(100),
        n_tasks: 10,
        start: Instant::now(),
    };
    futures_concurrency_co_stream(workload.clone()).await;
    let total_duration = (Instant::now() - workload.start).as_secs_f64();
    let rate = workload.n_tasks as f64 / total_duration;
    println!("Total duration: {:.4} s", total_duration);
    println!("Rate: {:.4} tasks/s", rate);
}

async fn futures_concurrency_co_stream(workload: Workload) {
    use futures_concurrency::prelude::*;
    (0..workload.n_tasks)
        .collect::<Vec<_>>()
        .into_co_stream()
        .for_each(|_| async {
            sleep(workload.task_duration.clone()).await;
        })
        .await;
}
@manifest The last test case you posted almost certainly will hit #182 -
I see, thank you for clarifying!
I've created two more test cases using join_all and FuturesUnordered:

use futures::future::join_all;
use futures::stream::{FuturesUnordered, StreamExt};
use tokio::time::sleep;

async fn future_util_join_all(workload: Workload) {
    let tasks = (0..workload.n_tasks)
        .map(|_| async move {
            sleep(workload.task_duration.clone()).await;
        })
        .collect::<Vec<_>>();
    join_all(tasks).await;
}

async fn futures_unordered_join(workload: Workload) {
    let tasks = (0..workload.n_tasks)
        .map(|_| async move {
            sleep(workload.task_duration.clone()).await;
        })
        .collect::<Vec<_>>();
    let mut group = FuturesUnordered::from_iter(tasks.into_iter());
    while let Some(_) = group.next().await {}
}

The results:

tokio naive
Total duration: 0.1288 s
Rate: 77654.0458 reqs/s

futures_concurrency_join
Total duration: 0.6355 s
Rate: 15736.2799 reqs/s

futures_join_all
Total duration: 0.7635 s
Rate: 13097.1088 reqs/s

futures_unordered
Total duration: 0.8850 s
Rate: 11299.5670 reqs/s

Comparatively, the
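The numbers above also reference a futures_concurrency_join case whose code isn't shown in this thread; a rough reconstruction (my own sketch, assuming the same Workload and sleep helpers and that futures-concurrency implements Join for Vec) would be:

async fn futures_concurrency_join(workload: Workload) {
    use futures_concurrency::prelude::*;

    // Build the same N sleep futures and join them with futures-concurrency
    // instead of join_all / FuturesUnordered.
    let tasks = (0..workload.n_tasks)
        .map(|_| sleep(workload.task_duration))
        .collect::<Vec<_>>();
    tasks.join().await;
}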
Interestingly, the excessive await points seem to be hurting performance. That is counter-intuitive to me, because I was thinking of them as zero-cost abstractions. There is an almost 30% degradation in performance between the following test cases. I use:

async fn futures_unordered_join(workload: Workload) {
    use futures::stream::{FuturesUnordered, StreamExt};
    let tasks = (0..workload.n_tasks)
        .map(|_| {
            sleep(workload.task_duration.clone())
        })
        .collect::<Vec<_>>();
    let mut group = FuturesUnordered::from_iter(tasks.into_iter());
    while let Some(_) = group.next().await {}
}

async fn futures_unordered_join_excessive_await_points(workload: Workload) {
    use futures::stream::{FuturesUnordered, StreamExt};
    let tasks = (0..workload.n_tasks)
        .map(|_| async {
            sleep(workload.task_duration.clone()).await
        })
        .collect::<Vec<_>>();
    let mut group = FuturesUnordered::from_iter(tasks.into_iter());
    while let Some(_) = group.next().await {}
}

I've compared naive vs futures_unordered_join:
use tokio::spawn;

async fn naive(workload: Workload) {
    let tasks = (0..workload.n_tasks)
        .map(|_| spawn({
            sleep(workload.task_duration)
        }))
        .collect::<Vec<_>>();
    for task in tasks {
        task.await.ok();
    }
}

Here are the stats:

naive
Total duration: 0.1079 s
Rate: 92649.3189 tasks/s

futures_unordered_join
Total duration: 0.1049 s
Rate: 95305.6797 tasks/s

futures_unordered_join_excessive_await_points
Total duration: 0.1048 s
Rate: 95426.9807 tasks/s

futures_concurrency_join
Total duration: 0.1198 s
Rate: 83464.3257 tasks/s
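On the zero-cost point: the extra await point is not literally free at the type level. Each async block is compiled into its own state machine that wraps the inner sleep future and adds one more poll indirection per wake-up. A small illustration of that extra layer (my own example, not from this thread) is to compare the sizes of the two per-task futures the variants above build:

use std::time::Duration;

#[tokio::main]
async fn main() {
    let task_duration = Duration::from_millis(100);

    // Shape 1: the sleep future stored directly (futures_unordered_join).
    let direct = tokio::time::sleep(task_duration);

    // Shape 2: the same sleep wrapped in an extra async-block state machine
    // (futures_unordered_join_excessive_await_points).
    let wrapped = async { tokio::time::sleep(task_duration).await };

    println!("direct:  {} bytes", std::mem::size_of_val(&direct));
    println!("wrapped: {} bytes", std::mem::size_of_val(&wrapped));
}

Whether that extra layer shows up as a measurable slowdown depends on the workload and on how the futures are driven.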
My previous analysis wasn't right; I spent some time on a plane debugging this today, and I realized a number of things:

I think what needs to happen is to change the internal
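One hedged way to dig into this kind of root cause (my own instrumentation, not from this thread) is to count how often each child future is polled while a combinator drives them, and compare the total against the number of tasks; a combinator that re-polls every child on every wake-up shows the total growing much faster than linearly:

use std::{
    future::Future,
    pin::Pin,
    sync::atomic::{AtomicUsize, Ordering},
    task::{Context, Poll},
    time::Duration,
};

static POLLS: AtomicUsize = AtomicUsize::new(0);

/// Wraps a future and counts how many times it gets polled.
struct CountPolls<F>(Pin<Box<F>>);

impl<F: Future> Future for CountPolls<F> {
    type Output = F::Output;
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        POLLS.fetch_add(1, Ordering::Relaxed);
        self.get_mut().0.as_mut().poll(cx)
    }
}

#[tokio::main]
async fn main() {
    let n_tasks = 1_000;
    let tasks: Vec<_> = (0..n_tasks)
        .map(|_| CountPolls(Box::pin(tokio::time::sleep(Duration::from_millis(10)))))
        .collect();

    // Swap in whichever combinator is under investigation here
    // (join_all, FuturesUnordered, futures-concurrency's join, ...).
    futures::future::join_all(tasks).await;

    println!("tasks: {}, total polls: {}", n_tasks, POLLS.load(Ordering::Relaxed));
}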
Only a single CPU core is utilized when join-ing futures on the tokio runtime. The issue does not arise when using async_std.

Some stats:

tokio test cases:

async_std test cases:
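As a point of reference, an async_std variant of the same workload might look roughly like this (my own reconstruction, not the reporter's exact code; the #[async_std::main] attribute requires async-std's "attributes" feature, and the Vec join assumes futures-concurrency implements Join for Vec):

use std::time::{Duration, Instant};
use futures_concurrency::prelude::*;

#[async_std::main]
async fn main() {
    let n_tasks = 10_000usize;
    let task_duration = Duration::from_millis(100);
    let start = Instant::now();

    // Same shape as the tokio test cases: N sleeps joined concurrently,
    // but driven by async_std's runtime and timer.
    let tasks = (0..n_tasks)
        .map(|_| async_std::task::sleep(task_duration))
        .collect::<Vec<_>>();
    tasks.join().await;

    let total = start.elapsed().as_secs_f64();
    println!("Total duration: {:.4} s", total);
    println!("Rate: {:.4} tasks/s", n_tasks as f64 / total);
}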