How does Rust async work - behind the magic

Preface

When programming, we often perform actions that require "waiting", such as waiting for an HTTP response.

But in reality, we don't want to stop the entire program from running. We want our program to do something else while it is waiting on this one task to complete. For this, Rust provides a flexible and extensive framework for async. Hence, in a request-handling loop, we can just use the async/await pattern: receive_request().await;

async fn

There are a lot of resources on how to use async Rust, like the one from Tokio-rs. In this blog post, I would instead like to talk about how async Rust works behind the scenes.

What is a Future

In Rust, an async function returns a Future. A Future in Rust is essentially a trait, like so (taken from Rust core):

pub trait Future {
    // the type that this future will produce on completion
    type Output;
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output>;
}

As a result, any type that implements this trait can be used with the ".await" syntax.
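To make this concrete, here is a minimal sketch of a hand-written future being awaited. The names Countdown, NoopWake, block_on and launch are made up for illustration, and block_on is a deliberately naive busy-polling loop, not how a real executor waits.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical future: reports `Pending` `remaining` times, then completes.
struct Countdown {
    remaining: u32,
}

impl Future for Countdown {
    type Output = &'static str;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        if self.remaining == 0 {
            Poll::Ready("liftoff")
        } else {
            self.remaining -= 1;
            // Ask to be polled again; without this, a real executor
            // would never come back to this future.
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

/// A waker that does nothing, enough for a loop that polls constantly.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Naive driver (an assumption of this sketch): polls in a busy loop.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

/// Because `Countdown` implements `Future`, it can be awaited directly.
async fn launch() -> String {
    let word = Countdown { remaining: 3 }.await;
    format!("{word}!")
}
```

Calling block_on(launch()) drives the countdown through its three Pending results and then returns the final string.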

What is this??

The poll function seems a bit intimidating, so let's break down everything in it.

The poll function takes self as a mutable reference wrapped in a Pin. According to the Rust documentation: "We say that a value has been pinned when it has been put into a state where it is guaranteed to remain located at the same place in memory from the time it is pinned until its drop is called." This matters because a future generated from an async function may hold references to its own data, which would dangle if the value were moved. Here, drop refers to the destructor of the value, which runs when the value goes out of scope or when drop is called on it explicitly. You can find more information about this concept in RAII (Resource Acquisition Is Initialization) here. The poll function also takes in a Context, which we will cover later.
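As a small illustration of pinning, the sketch below pins a value on the heap with Box::pin. Sensor and address_is_stable are hypothetical names, and PhantomPinned is only there to opt the type out of Unpin, the way a compiler-generated future may be.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

/// Hypothetical type that must not move once pinned.
struct Sensor {
    reading: u32,
    _pin: PhantomPinned, // makes `Sensor` !Unpin
}

fn address_is_stable() -> bool {
    let pinned: Pin<Box<Sensor>> = Box::pin(Sensor {
        reading: 42,
        _pin: PhantomPinned,
    });
    let before = &*pinned as *const Sensor as usize;

    // Moving the `Pin<Box<_>>` handle moves only the pointer;
    // the `Sensor` itself stays where it is on the heap.
    let moved = pinned;
    let after = &*moved as *const Sensor as usize;

    // Shared access still works. What `Pin` takes away is safe access to
    // `&mut Sensor`, so safe code cannot swap or move the value out.
    before == after && moved.reading == 42
}
```

For types that are Unpin (most ordinary structs), Pin places no restriction at all, which is why simple hand-written futures can still mutate themselves inside poll.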

The poll function will return the specified Output in a Poll enum, which contains two variants:

Poll::Ready(T) - indicates that the future has completed and produced a value of type T.

Poll::Pending - indicates that the future is not yet complete and needs to be polled again later.
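The two variants can be observed by polling a future by hand. This is a sketch: poll_once, describe and NoopWake are hypothetical helpers, while std::future::ready and std::future::pending are standard library futures that are always ready and never ready, respectively.

```rust
use std::fmt::Debug;
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// A waker that does nothing; enough for a single manual poll.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls a future exactly once and hands back the raw `Poll` value.
fn poll_once<F: Future>(fut: F) -> Poll<F::Output> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    fut.as_mut().poll(&mut cx)
}

/// What a caller of `poll` typically does with the result.
fn describe<T: Debug>(state: Poll<T>) -> String {
    match state {
        Poll::Ready(value) => format!("ready with {value:?}"),
        Poll::Pending => "pending, poll again later".to_string(),
    }
}
```

For instance, poll_once(std::future::ready(7)) yields Poll::Ready(7), while poll_once(std::future::pending::<i32>()) yields Poll::Pending.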

Now we talk about Context and Waker

The poll function also takes in a Context, which gives access to a Waker through its waker() method. A Waker can be triggered with one of two functions: wake(self) and wake_by_ref(&self). As a result, the poll function can pass a clone of this waker to whatever code will eventually complete the future, and when the future is ready to make progress, that code can call waker.wake() to notify the executor. The implementation of a waker is up to the executor, which allows significant flexibility in how an executor is designed.
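Here is a minimal sketch of the executor side of a waker, built with the standard std::task::Wake trait. CountingWaker and count_wakeups are hypothetical names; a real executor would reschedule a task in wake instead of bumping a counter.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::task::{Context, Wake, Waker};

/// Hypothetical executor-side state: counts how often it has been woken.
struct CountingWaker {
    wakeups: AtomicUsize,
}

impl Wake for CountingWaker {
    fn wake(self: Arc<Self>) {
        self.wakeups.fetch_add(1, Ordering::SeqCst);
    }
}

fn count_wakeups() -> usize {
    let state = Arc::new(CountingWaker {
        wakeups: AtomicUsize::new(0),
    });
    let waker = Waker::from(Arc::clone(&state));
    let cx = Context::from_waker(&waker);

    // Inside `poll`, a future would clone the waker out of the context
    // and hand it to whoever will complete the work later.
    let handed_off: Waker = cx.waker().clone();

    cx.waker().wake_by_ref(); // wakes without consuming the waker
    handed_off.wake(); // consumes the cloned waker

    state.wakeups.load(Ordering::SeqCst)
}
```

Both calls end up in CountingWaker::wake, so count_wakeups() returns 2.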

The executor

The simplest executor is a single-threaded event loop that polls futures in a loop. It will generally contain a queue of tasks, each of which wraps a future to poll. After it calls poll, it checks the result and keeps going until all futures are complete. Take an example in a bare-metal environment where we depend on interrupts to read data from a sensor. We create a Future that completes when the interrupt is triggered. The executor polls this future once and gets Pending back. When the interrupt is triggered, the interrupt handler calls waker.wake() to notify the executor that the future can make progress, which is normally done by putting the task back onto the executor's task queue. In the meantime, the executor is free to poll other futures.
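The loop described above can be sketched in a few dozen lines. This is only an illustration under some assumptions: Task, Executor, YieldOnce and demo are hypothetical names, the sketch uses Box, Arc and Mutex from std for brevity (a bare-metal executor would likely avoid them), and the wake_by_ref call inside YieldOnce stands in for the interrupt handler.

```rust
use std::collections::VecDeque;
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

type BoxFuture = Pin<Box<dyn Future<Output = ()> + Send>>;
type Queue = Arc<Mutex<VecDeque<Arc<Task>>>>;

/// A spawned future plus a handle to the queue it rejoins when woken.
struct Task {
    future: Mutex<Option<BoxFuture>>,
    queue: Queue,
}

impl Wake for Task {
    fn wake(self: Arc<Self>) {
        // Waking a task means putting it back on the run queue.
        let queue = Arc::clone(&self.queue);
        queue.lock().unwrap().push_back(self);
    }
}

struct Executor {
    queue: Queue,
}

impl Executor {
    fn new() -> Self {
        Executor { queue: Arc::new(Mutex::new(VecDeque::new())) }
    }

    fn spawn(&self, fut: impl Future<Output = ()> + Send + 'static) {
        let fut: BoxFuture = Box::pin(fut);
        let task = Arc::new(Task {
            future: Mutex::new(Some(fut)),
            queue: Arc::clone(&self.queue),
        });
        self.queue.lock().unwrap().push_back(task);
    }

    /// Polls queued tasks until the queue is empty; returns the number of polls.
    fn run(&self) -> usize {
        let mut polls = 0;
        loop {
            // Pop in its own statement so the queue lock is released before polling.
            let next = self.queue.lock().unwrap().pop_front();
            let Some(task) = next else { return polls };
            let waker = Waker::from(Arc::clone(&task));
            let mut cx = Context::from_waker(&waker);
            let mut slot = task.future.lock().unwrap();
            if let Some(mut fut) = slot.take() {
                polls += 1;
                if fut.as_mut().poll(&mut cx).is_pending() {
                    *slot = Some(fut); // not finished: keep it for the next wake-up
                }
            }
        }
    }
}

/// Hypothetical future that returns `Pending` once before finishing.
struct YieldOnce {
    yielded: bool,
}

impl Future for YieldOnce {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.yielded {
            return Poll::Ready(());
        }
        self.yielded = true;
        cx.waker().wake_by_ref(); // stands in for the interrupt handler
        Poll::Pending
    }
}

fn demo() -> (usize, Vec<&'static str>) {
    let log = Arc::new(Mutex::new(Vec::new()));
    let exec = Executor::new();
    for name in ["a", "b"] {
        let log = Arc::clone(&log);
        exec.spawn(async move {
            YieldOnce { yielded: false }.await;
            log.lock().unwrap().push(name);
        });
    }
    let polls = exec.run();
    let order = log.lock().unwrap().clone();
    (polls, order)
}
```

In demo, each task is polled twice (once to reach the yield, once to finish), so the two tasks interleave on a single thread and run returns after four polls.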

Of course, there are more complex executors that can run on multiple threads and use work-stealing to balance the load across threads. However, the core concept of polling futures and using wakers to notify the executor remains the same. There are several design choices that can be made. For example, some executors may prefer keeping a task on the same thread for the sake of less context switching and better CPU cache performance.

What are async functions?

Now that we can create structs that implement Future, we would like to use them in actual async functions. However, a function carries more state than a simple future, such as the local variables that would normally live in its stack frame. For this reason, async fn is essentially syntactic sugar for a function that returns an impl Future. The compiler transforms the function body into a state machine, where each state corresponds to a point in the function where it can yield control (such as an .await point), and local variables that live across such a point are stored inside the state machine. The state machine implements the Future trait, and its poll function runs the state machine until it reaches a yield point or completes.
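To give a feel for that transformation, here is a hand-written state machine that behaves roughly like a small async function with two .await points. It is a simplified sketch, not the compiler's actual output: Sum, Step, NoopWake and poll_to_completion are hypothetical names, and the inner futures are assumed to be Unpin so that no unsafe pin projection is needed.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Roughly what could be generated for the hypothetical function
///     async fn sum(a: F, b: F) -> u32 { let x = a.await; let y = b.await; x + y }
enum Sum<F> {
    /// Suspended at the first `.await`; `b` is saved for later.
    AwaitingA { a: F, b: F },
    /// Suspended at the second `.await`; the local `x` is kept across it.
    AwaitingB { x: u32, b: F },
    Done,
}

impl<F: Future<Output = u32> + Unpin> Future for Sum<F> {
    type Output = u32;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        loop {
            // Take the current state out, leaving `Done` as a placeholder.
            match std::mem::replace(&mut *self, Sum::Done) {
                Sum::AwaitingA { mut a, b } => match Pin::new(&mut a).poll(cx) {
                    Poll::Ready(x) => *self = Sum::AwaitingB { x, b },
                    Poll::Pending => {
                        *self = Sum::AwaitingA { a, b };
                        return Poll::Pending;
                    }
                },
                Sum::AwaitingB { x, mut b } => match Pin::new(&mut b).poll(cx) {
                    Poll::Ready(y) => return Poll::Ready(x + y),
                    Poll::Pending => {
                        *self = Sum::AwaitingB { x, b };
                        return Poll::Pending;
                    }
                },
                Sum::Done => panic!("polled after completion"),
            }
        }
    }
}

/// Hypothetical leaf future: pending on the first poll, then yields `value`.
struct Step {
    value: u32,
    polled: bool,
}

impl Future for Step {
    type Output = u32;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        if self.polled {
            return Poll::Ready(self.value);
        }
        self.polled = true;
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls the state machine by hand and records every result.
fn poll_to_completion() -> Vec<Poll<u32>> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Sum::AwaitingA {
        a: Step { value: 1, polled: false },
        b: Step { value: 2, polled: false },
    };
    let mut trace = Vec::new();
    loop {
        let state = Pin::new(&mut fut).poll(&mut cx);
        trace.push(state);
        if state.is_ready() {
            return trace;
        }
    }
}
```

The recorded trace is Pending, Pending, Ready(3): the first poll stops at the first await, the second poll finishes a and stops at b, and the third poll completes. Note how x lives inside the AwaitingB variant rather than on a stack.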

Conclusion

With this system, one can construct basically any kind of async operation, with executors tailored to their needs. It is also worth noting that the core system of wakers and futures does not require a heap allocator, and each individual task does not require an actual thread with its own stack. This allows a near-zero-cost async system that can be used in all environments, from embedded systems to high-performance servers. I am still learning about async Rust while developing an async runtime for my custom operating system kernel, so if you have any suggestions or corrections, please let me know!