
Add a way to batch spawn tasks #92

Merged 1 commit into master on Mar 30, 2024
Conversation

@notgull (Member) commented Feb 13, 2024

For some workloads, many tasks are spawned at a time. This requires locking and unlocking the executor's inner lock every time you spawn a task; if you spawn many tasks, this can be expensive.

This commit exposes a new "spawn_many" method on both types, which allows the user to spawn an entire set of tasks at a time.

Closes #91

@notgull notgull requested a review from fogti February 13, 2024 15:32
src/lib.rs (outdated, resolved)
/// [`Executor::spawn_many`]: Executor::spawn_many
pub fn spawn_many<T: Send + 'a, F: Future<Output = T> + Send + 'a>(
&self,
futures: impl IntoIterator<Item = F>,
Contributor commented:

This forces the futures to be the same type and also requires something to collect the tasks. If we're ok with users potentially injecting extra non-blocking code via iterator implementations (with the doc comments as a deterrent), wouldn't a SpawnScope<'a> be a more flexible API?

notgull (Member, Author) replied:

It's much harder to accidentally create a blocking Iterator than it is to accidentally hold a lock type across an await point. I've seen the former a couple of times and the latter all the time.

src/lib.rs (resolved)
futures: impl IntoIterator<Item = F>,
handles: &mut impl Extend<Task<F::Output>>,
) {
let mut active = self.inner().state().active.lock().unwrap();
Contributor commented:
For a local executor, why does it need a lock at all? Probably not necessary to address in this PR, but I thought the local executor itself could be !Send and !Sync, so wouldn't a RefCell be sufficient for a LocalExecutor state?

notgull (Member, Author) replied:
It would require a reimplementation of LocalExecutor's internals, which is something I don't feel like doing right now as the current implementation works well enough. I would accept a PR for this.

Member commented:
OK. As a change, this would propagate through the entire code base, because actually eliminating all the mutexes would require a different State struct; that in turn probably needs either a duplication of all the other internal structs that hold references to State, or an abstraction over the two variants of State.

Contributor replied:
It's probably doable with macro_rules, but that might have a serious impact on readability.

Member replied:
I think a proper abstraction would require a lot of interior mutability; I suppose we just can't directly abstract all that without pulling in niche dependencies.

notgull (Member, Author) replied:
Generally, thread-unsafe primitives are provided by unsend rather than smol; in this case, see unsend::Executor.

src/lib.rs (outdated, resolved)
src/lib.rs (outdated, resolved)
src/lib.rs (outdated)
for future in futures {
    // SAFETY: F and T are both `Send`.
    handles.extend(Some(unsafe {
Contributor commented:
std::iter::once may be preferable to Option, though I assume the branch in its Iterator implementation gets optimized out at some point.

notgull (Member, Author) replied:
I've realized that we can just convert futures into an Iterator of Tasks, and then use handles.extend with that iterator. That should probably enable more optimizations.

@notgull notgull force-pushed the notgull/batch-spawn branch from b139978 to 72febd5 Compare February 17, 2024 19:49
@notgull (Member, Author) commented Feb 22, 2024

@smol-rs/admins I would appreciate a review on this, as it's blocking performance work in Bevy.

@taiki-e (Collaborator) commented Feb 22, 2024

A common bug with *_many functions in Rust is that memory safety is violated when the iterator panics, so could you please add a Miri-compatible test for such a case? (I do not believe that this implementation violates memory safety in that case, but the implementation does not seem to account for the iterator panicking.)
Otherwise it looks ok to me.

@notgull notgull force-pushed the notgull/batch-spawn branch from 4f03951 to bbefd6a Compare February 22, 2024 04:44
@notgull (Member, Author) commented Feb 22, 2024

> A common bug with *_many functions in Rust is that memory safety is violated when the iterator panics. So could you please add a Miri-compatible test for such a case? (I do not believe that this implementation violates memory safety on that case, but at least the implementation does not seem to care about cases where the iterator panics.) Otherwise it looks ok to me.

Added! I've also added a workaround for the mutex poisoning that shows up here.

@taiki-e (Collaborator) commented Feb 22, 2024

It seems Miri reported memory leaks:

error: memory leaked: alloc44788 (Rust heap, size: 120, align: 8), allocated here:

https://github.com/smol-rs/async-executor/actions/runs/7999577501/job/21847610801?pr=92

@notgull notgull force-pushed the notgull/batch-spawn branch from bbefd6a to dff5a59 Compare March 30, 2024 15:10
@notgull (Member, Author) commented Mar 30, 2024

Hmm, looks like the issue was related to the thread-optimized backend? It looks like it doesn't leak memory now.

For some workloads many tasks are spawned at a time. This requires
locking and unlocking the executor's inner lock every time you spawn a
task. If you spawn many tasks this can be expensive.

This commit exposes a new "spawn_batch" method on both types. This
method allows the user to spawn an entire set of tasks at a time.

Closes #91

Signed-off-by: John Nunley <[email protected]>
@notgull notgull force-pushed the notgull/batch-spawn branch from dff5a59 to b0c6f7a Compare March 30, 2024 15:14
@notgull notgull merged commit d319699 into master Mar 30, 2024
8 checks passed
@notgull notgull deleted the notgull/batch-spawn branch March 30, 2024 15:18
@notgull notgull mentioned this pull request Apr 7, 2024
Successfully merging this pull request may close these issues.

Efficient way of batch spawning a large number of tasks
4 participants