Add a schematic state machine implementing Future #2048

Open
wants to merge 6 commits into
base: main
2 changes: 2 additions & 0 deletions src/SUMMARY.md
@@ -382,6 +382,8 @@
- [Async Basics](concurrency/async.md)
- [`async`/`await`](concurrency/async/async-await.md)
- [Futures](concurrency/async/futures.md)
- [State Machine](concurrency/async/state-machine.md)
- [Recursion](concurrency/async/state-machine/recursion.md)
- [Runtimes](concurrency/async/runtimes.md)
- [Tokio](concurrency/async/runtimes/tokio.md)
- [Tasks](concurrency/async/tasks.md)
11 changes: 4 additions & 7 deletions src/concurrency/async-pitfalls/pin.md
@@ -4,13 +4,10 @@ minutes: 20

# `Pin`

Async blocks and functions return types implementing the `Future` trait. The
type returned is the result of a compiler transformation which turns local
variables into data stored inside the future.

Some of those variables can hold pointers to other local variables. Because of
that, the future should never be moved to a different memory location, as it
would invalidate those pointers.
Recall that an async function or block creates a type that implements `Future`
and contains all of its local variables. Some of those variables can hold
references (pointers) to other local variables. To ensure those references
remain valid, the future can never be moved to a different memory location.
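
As a minimal sketch of such a future, the hypothetical `sum_via_reference` below keeps a reference to one of its own locals across an `.await`. The names `NoopWaker` and `poll_pinned` are illustrative helpers, not part of the course material. The future is pinned with `std::pin::pin!` before it is polled, so it cannot be moved afterwards.

```rust
use std::future::{ready, Future};
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// A waker that does nothing; enough to poll a future by hand.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

async fn sum_via_reference() -> i32 {
    let data = [1, 2, 3];
    let data_ref = &data; // a reference to another local variable
    // `data` and `data_ref` are both needed after this `.await`, so the
    // compiler stores them inside the future.
    ready(()).await;
    data_ref.iter().sum()
}

fn poll_pinned() -> Poll<i32> {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    // `pin!` pins the future to this stack frame; `poll` only accepts
    // `Pin<&mut Self>`.
    let mut fut = pin!(sum_via_reference());
    fut.as_mut().poll(&mut cx)
}

fn main() {
    println!("{:?}", poll_pinned());
}
```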

To prevent moving the future type in memory, it can only be polled through a
pinned pointer. `Pin` is a wrapper around a reference that disallows all
112 changes: 112 additions & 0 deletions src/concurrency/async/state-machine.md
@@ -0,0 +1,112 @@
---
minutes: 7
---

# State Machine

Rust transforms an async function or block to a hidden type that implements
`Future`, using a state machine to track the function's progress. The details of
this transform are complex, but it helps to have a schematic understanding of
what is happening.
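
One observable consequence of the transform is that calling an async function runs none of its body. The sketch below checks this using only the standard library. `BODY_RAN`, `mark_and_return`, `NoopWaker`, and `observe_laziness` are hypothetical names chosen for illustration.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// A waker that does nothing; enough to poll a future by hand.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

// Set to true by the body of `mark_and_return`.
static BODY_RAN: AtomicBool = AtomicBool::new(false);

async fn mark_and_return() -> u32 {
    BODY_RAN.store(true, Ordering::SeqCst);
    42
}

/// Returns whether the body had run (before the first poll, after it).
fn observe_laziness() -> (bool, bool) {
    BODY_RAN.store(false, Ordering::SeqCst);

    // Calling the function only constructs the future.
    let fut = mark_and_return();
    let before = BODY_RAN.load(Ordering::SeqCst);

    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);

    // The body runs during `poll`.
    let result = fut.as_mut().poll(&mut cx);
    assert_eq!(result, Poll::Ready(42));
    let after = BODY_RAN.load(Ordering::SeqCst);

    (before, after)
}

fn main() {
    let (before, after) = observe_laziness();
    println!("body ran before poll: {before}, after poll: {after}");
}
```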

```rust,editable,compile_fail
use futures::executor::block_on;
use pin_project::pin_project;
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll};

async fn send(s: String) -> usize {
    println!("{}", s);
    s.len()
}

/*
async fn example(x: i32) -> usize {
    let double_x = x*2;
    let mut count = send(format!("x = {x}")).await;
    count += send(format!("double_x = {double_x}")).await;
    count
}
*/

fn example(x: i32) -> ExampleFuture {
    ExampleFuture::Init { x }
}

#[pin_project(project=ExampleFutureProjected)]
enum ExampleFuture {
    Init {
        x: i32,
    },
    FirstSend {
        double_x: i32,
        #[pin]
        fut: Pin<Box<dyn Future<Output = usize>>>,
    },
    SecondSend {
        count: usize,
        #[pin]
        fut: Pin<Box<dyn Future<Output = usize>>>,
    },
}

impl std::future::Future for ExampleFuture {
    type Output = usize;
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        loop {
            match self.as_mut().project() {
                ExampleFutureProjected::Init { x } => {
                    let double_x = *x * 2;
                    let fut = Box::pin(send(format!("x = {x}")));
                    *self = ExampleFuture::FirstSend { double_x, fut };
                }
                ExampleFutureProjected::FirstSend { double_x, mut fut } => {
                    match fut.as_mut().poll(cx) {
                        Poll::Pending => return Poll::Pending,
                        Poll::Ready(count) => {
                            let fut =
                                Box::pin(send(format!("double_x = {double_x}")));
                            *self = ExampleFuture::SecondSend { count, fut };
                        }
                    }
                }
                ExampleFutureProjected::SecondSend { count, mut fut } => {
                    match fut.as_mut().poll(cx) {
                        Poll::Pending => return Poll::Pending,
                        Poll::Ready(tmp) => {
                            *count += tmp;
                            return Poll::Ready(*count);
                        }
                    }
                }
            }
        }
    }
}

fn main() {
    println!("result: {}", block_on(example(5)));
}
```

<details>

While this code will run, it is simplified from what the real state machine
would do. The important things to notice here are:

- Calling an async function does nothing but construct and return the future,
  which starts running on the first call to `poll`.
- Local variables that are still needed after an `.await` are stored in the
  function's future. Here each enum variant identifies where execution is
  currently suspended and holds the variables needed from that point on.
Comment on lines +100 to +102
Collaborator

@fw-immunant fw-immunant May 6, 2024

I think it makes more sense for explanation to first trace the control flow so we can enumerate states and get in our heads something like "we'll have three possible execution states for this async fn, so it makes sense that its future type would be (morally) an enum with three variants". Then we can look at variable liveness at each of the awaits to determine the payload of each variant. This lets us get to "futures store live locals" without having to introduce the notion of liveness explicitly like a compilers course would--these are just the variables whose values we'll need to run the rest of the function.

I think it's better for an example of the state machine transform to use an async fn that doesn't have async in a loop, so that it's easy for readers to enumerate the entire set of states in their head.

So I might suggest an example more like this:

async fn send(s: String) -> usize {
    println!("{}", s);
    s.len()
}

async fn example(x: i32) -> usize {
    let double_x = x*2;
    let mut bytes_written = send(format!("x = {x}")).await;
    bytes_written += send(format!("double_x = {double_x}")).await;
    bytes_written
}

This gives us three states:

  • an initial state holding the argument x
  • a state holding double_x until we return from our first await
  • a state holding bytes_written until we return from our second await

This is, I think, less confusing than a state we return to across iterations of our loop on 1..=count.

Collaborator Author

Implicit in this suggestion is to use enum variants to represent the liveness of variables. That's great, but does require a bit more mucking about with pins than we want to present at this point. In fact, it uses pin_project, which we don't even talk about in the Pin section. Should I maybe revert this to a flat struct with a fut field, just to avoid this?

Collaborator

I don't think it's so important to make this example have perfect fidelity or be compilable/runnable--pseudocode to capture the transformation would be fine, and would allow us to side-step Pin questions. The sub-future, if we really want to represent it, could just be a std::future::Ready or isomorphic. The most significant thing is not the state of the child future but the fact that at await points we capture live state.

- An `.await` on a call to an async function is translated into a call to that
  function, followed by polling the future it returns until it yields
  `Poll::Ready`. The real generated state machine would contain the future type
  defined by `send`, but that type cannot be named in Rust source, so the
  example boxes it as a trait object instead.
- Execution continues eagerly until there's some reason to block. Try returning
  `Poll::Pending` in the `ExampleFutureProjected::Init` branch of the match, in
  hopes that `poll` will be called again with state `ExampleFuture::FirstSend`.
  `block_on` will not do so, because nothing has woken the waker in `cx`.
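
A variant that avoids `pin_project` and external crates can be sketched as follows, on the assumption that the child future is a `std::future::Ready` (as suggested in review) rather than the real future of an async `send`. `Ready` is `Unpin`, so the whole enum is `Unpin` and `poll` can work through a plain `&mut`. The `Done` variant, `NoopWaker`, and `poll_once` are hypothetical additions for this sketch.

```rust
use std::future::{ready, Future, Ready};
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Stand-in for the async `send`: does its work immediately and returns a
// future that is already complete.
fn send(s: String) -> Ready<usize> {
    println!("{s}");
    ready(s.len())
}

// One variant per suspension point, holding only what the rest of the
// function still needs.
enum ExampleFuture {
    Init { x: i32 },
    FirstSend { double_x: i32, fut: Ready<usize> },
    SecondSend { count: usize, fut: Ready<usize> },
    Done,
}

fn example(x: i32) -> ExampleFuture {
    ExampleFuture::Init { x }
}

impl Future for ExampleFuture {
    type Output = usize;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<usize> {
        // Every field is `Unpin`, so a plain `&mut Self` is available.
        let this = self.get_mut();
        loop {
            match this {
                ExampleFuture::Init { x } => {
                    let double_x = *x * 2;
                    let fut = send(format!("x = {x}"));
                    *this = ExampleFuture::FirstSend { double_x, fut };
                }
                ExampleFuture::FirstSend { double_x, fut } => {
                    match Pin::new(fut).poll(cx) {
                        Poll::Pending => return Poll::Pending,
                        Poll::Ready(count) => {
                            let fut = send(format!("double_x = {double_x}"));
                            *this = ExampleFuture::SecondSend { count, fut };
                        }
                    }
                }
                ExampleFuture::SecondSend { count, fut } => {
                    match Pin::new(fut).poll(cx) {
                        Poll::Pending => return Poll::Pending,
                        Poll::Ready(n) => {
                            let total = *count + n;
                            *this = ExampleFuture::Done;
                            return Poll::Ready(total);
                        }
                    }
                }
                ExampleFuture::Done => panic!("polled after completion"),
            }
        }
    }
}

// A waker that does nothing; enough to poll a future by hand.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

// Polls an `Unpin` future a single time.
fn poll_once<F: Future + Unpin>(mut fut: F) -> Poll<F::Output> {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    Pin::new(&mut fut).poll(&mut cx)
}

fn main() {
    println!("{:?}", poll_once(example(5)));
}
```

Because `Ready` never returns `Poll::Pending`, a single call to `poll` runs all three states to completion. A real child future could suspend at either `FirstSend` or `SecondSend`.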

</details>
43 changes: 43 additions & 0 deletions src/concurrency/async/state-machine/recursion.md
@@ -0,0 +1,43 @@
---
minutes: 3
---

# Recursion

An async function's future type _contains_ the futures for all functions it
calls. This means an async function cannot call itself directly: its future
would have to contain itself.
Collaborator

I would clarify this by invoking the analogy with recursive enums: they need an indirection to avoid being infinite-sized types. Prior to Rust 1.77, recursion in async fn was forbidden entirely and code had to forgo the async fn transform in favor of explicitly returning a Box<impl Future<Output=...>> type, but now only "bare" recursion (without an indirection) is forbidden.


```rust,editable,compile_fail
use futures::executor::block_on;

async fn count_to(n: u32) {
    if n > 0 {
        count_to(n - 1).await;
        println!("{n}");
    }
}

fn main() {
    block_on(count_to(5));
}
```

<details>

This is a quick illustration of how understanding the state machine helps to
understand errors. Recursion would require `CountToFuture` to contain a field of
type `CountToFuture`, which would make the type infinitely large. Compare this
to the common Rust error of building an `enum` that contains itself, such as:

```rust,compile_fail
enum BinTree<T> {
    Node { value: T, left: BinTree<T>, right: BinTree<T> },
    Nil,
}
```
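
For comparison, a sketch of the usual fix for the enum: each recursive field goes through a `Box`, so a `Node` holds pointers of known size rather than whole subtrees. The helpers `leaf`, `sum`, and `sample_tree` are hypothetical, added only to exercise the type.

```rust
enum BinTree<T> {
    // Each child is a pointer of known size, not a nested `BinTree<T>`.
    Node { value: T, left: Box<BinTree<T>>, right: Box<BinTree<T>> },
    Nil,
}

impl<T> BinTree<T> {
    // Builds a node with two empty children.
    fn leaf(value: T) -> Self {
        BinTree::Node {
            value,
            left: Box::new(BinTree::Nil),
            right: Box::new(BinTree::Nil),
        }
    }
}

impl BinTree<i32> {
    // Adds up every value in the tree.
    fn sum(&self) -> i32 {
        match self {
            BinTree::Node { value, left, right } => *value + left.sum() + right.sum(),
            BinTree::Nil => 0,
        }
    }
}

fn sample_tree() -> BinTree<i32> {
    BinTree::Node {
        value: 2,
        left: Box::new(BinTree::leaf(1)),
        right: Box::new(BinTree::leaf(3)),
    }
}

fn main() {
    println!("sum: {}", sample_tree().sum());
}
```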

Fix the async function by writing `Box::pin(count_to(n - 1)).await;`, which
boxes the future returned from `count_to` so that it is stored behind a pointer.
This has only been possible since Rust 1.77.0; before that, recursion in an
`async fn` was prohibited entirely.
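
A sketch of the boxed version, assuming Rust 1.77 or later. To keep it free of external crates it uses a hypothetical busy-polling `block_on` in place of `futures::executor::block_on`, and `count_to` returns the numbers it printed so the result can be checked. Both changes are illustrative, not part of the slide.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

async fn count_to(n: u32) -> Vec<u32> {
    if n == 0 {
        return Vec::new();
    }
    // `Box::pin` stores the recursive future on the heap, so this
    // function's future only holds a pointer to it.
    let mut seen = Box::pin(count_to(n - 1)).await;
    println!("{n}");
    seen.push(n);
    seen
}

// A waker that does nothing; enough to poll a future by hand.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

// Simplistic executor: polls in a loop until the future completes.
fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
        std::thread::yield_now();
    }
}

fn main() {
    let seen = block_on(count_to(5));
    println!("{seen:?}");
}
```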

</details>