609 - Futurelock / RFD / Oxide

RFD

609

Authors

Updated

This RFD describes futurelock: a type of deadlock where a resource owned by Future A is required for another Future B to proceed, while the Task responsible for both Futures is no longer polling A. Futurelock is a particularly subtle risk in writing asynchronous Rust.

Oxide initially saw this problem in oxidecomputer/omicron#9259.

Example of the problem

Consider the following program (in the playground):

use std::sync::Arc;
use std::time::Duration;
use tokio::sync::Mutex;
use tokio::time::sleep;
use futures::FutureExt;

#[tokio::main]
async fn main() {
    // Create a lock that will be shared by multiple tasks.
    let lock = Arc::new(Mutex::new(()));

    // Start a background task that takes the lock and holds it for a few
    // seconds.  This is just to simulate some contention.  This function only
    // returns once the lock has been taken in the background task.
    start_background_task(lock.clone()).await;

    // The guts of the example.
    do_stuff(lock.clone()).await;
}

// Starts a background task that grabs the lock, holds it for 5 seconds,
// and then drops it.  Returns once the task is holding the lock.
// The purpose of this is to simulate contention.
async fn start_background_task(lock: Arc<Mutex<()>>) {
    // Use a channel to coordinate with the task so that it can tell us when
    // its taken the lock.
    let (tx, rx) = tokio::sync::oneshot::channel();
    let _ = tokio::spawn(async move {
        println!("background task: start");
        let _guard = lock.lock().await;
        let _ = tx.send(());
        sleep(Duration::from_secs(5)).await;
        println!("background task: done (dropping lock)")
    });
    // Wait for the task to take the lock before returning.
    let _ = rx.await;
}

// The guts of the example
async fn do_stuff(lock: Arc<Mutex<()>>) {
    let mut future1 = do_async_thing("op1", lock.clone()).boxed();

    // Try to execute `future1`.  If it takes more than 500ms, do
    // a related thing instead.
    println!("do_stuff: entering select");
    tokio::select! {
        _ = &mut future1 => {
            println!("do_stuff: arm1 future finished");
        }
        _ = sleep(Duration::from_millis(500)) => {
            do_async_thing("op2", lock.clone()).await;
        }
    };
    println!("do_stuff: all done");
}

async fn do_async_thing(label: &str, lock: Arc<Mutex<()>>) {
    println!("{label}: started");
    let _ = lock.lock().await;
    println!("{label}: acquired lock");
    println!("{label}: done");
}

This program reliably deadlocks. This surprises a lot of people! A background Task takes a lock, waits 5s, drops the lock and exits. In the meantime, we do_stuff. That stuff consists of waiting for two Futures concurrently via select!. One future waits for the lock while the other sleeps for 0.5s and waits for the lock. So there’s just one lock and all logical streams of execution seem to execute concurrently. How could this possibly hang?

The interesting bits are all in do_stuff():

async fn do_stuff(lock: Arc<Mutex<()>>) {
    let mut future1 = do_async_thing("op1", lock.clone()).boxed();

    // Try to execute `future1`.  If it takes more than 500ms, do
    // a related thing instead.
    println!("do_stuff: entering select");
    tokio::select! {
        _ = &mut future1 => { 
            println!("do_stuff: arm1 future finished");
        }
        _ = sleep(Duration::from_millis(500)) => { 
            do_async_thing("op2", lock.clone()).await; 
        }
    };
    println!("do_stuff: all done");
}

1	`future1` is the (boxed) future returned by `do_async_thing()`, an async function.
2	We’ll call the future returned by `sleep`: `future2` (or, the "sleep" future).
3	The second branch of the `select!` is its own future. We’ll call this `future3`.

It’s really important to understand what’s happening here so let’s be clear about the sequence.

First:

background task takes lock, begins holding it for 5 seconds
tokio::select! begins polling &mut future1.^[1] This future attempts to take the lock, blocks, returns Poll::Pending.
tokio::select! begins polling future2 (the sleep future) and blocks, returning Poll::Pending.

At this point:

the background task holds the lock
the main task is blocked in tokio::select! on two different futures:
- future1 is blocked on taking the lock
- future2 (the sleep future) waiting for 500ms

500ms later, tokio wakes up the main task because future2 (the sleep future) is ready. Inside tokio::select!:

The task polls both futures.
- future1 is still blocked on the lock and returns Pending.^[2]
- future2 (the sleep future) is ready and returns Ready.
tokio::select! chooses the second branch
- &mut future1 is dropped, but this is just a reference and so has no effect. Importantly, the future itself (future1) is not dropped.
- the second branch is entered. do_async_thing("op2", …) is called, creating a new future future3. This future immediately blocks trying to take the lock, which is still held by the background task.

At this point, we have:

the lock (still) held by the background task
the lock’s wait queue contains two waiting futures:
- future1
- future3 (the second arm of the tokio::select!)

There are two key points here:

The lock’s wait queue is literally a queue: only future1 can take the lock once it is released by the background task (unless future1 is dropped).
The behavior of tokio::select! is to poll all branches' futures only until one of them returns `Ready`. At that point, it drops the other branches' futures and only runs the body of the branch that’s ready.

Critically: the same task is responsible for both of the futures waiting on the lock. But that task is currently only polling on one of them. Unfortunately, it’s the wrong one.

About 4.5 seconds later:

The background task drops the lock.
- The lock is given to future1. (See below for more on why.)
- The task that polled future1 (the main task) is woken up.
However, that task is not polling future1. future1 is polled at the top-level tokio::select!. But the tokio::select! has already chosen the other branch. It’s now only polling future3. (In fact, even absent the imminent hang, future1 would never be polled again. It would be cancelled without having completed when it got dropped at the end of do_stuff.)

Thus:

There is only one task left. It’s blocked on future3.
future3 is blocked on a Mutex that’s owned by future1.
future1 cannot run (and therefore cannot drop the Mutex) until the task starts running it.

We call this specific kind of deadlock futurelock. The program is stuck in this state forever.

FAQ: why doesn’t the Mutex wake up the other future?

This particular example uses tokio::sync::Mutex, which is a fair Mutex. That means that the lock is given to waiters in the order that they started waiting. It has to give it to future1.

An unfair Mutex would not fix things. The problem wouldn’t be guaranteed to happen with an unfair Mutex, but it wouldn’t be guaranteed not to, either. The Mutex does not (and cannot) know which future would be "better" to wake up, or which one is being polled. You could imagine an unfair Mutex that always woke up all waiters and let them race to grab the lock again. That would not suffer from risk of futurelock, but it would have the thundering herd problem plus all the liveness issues associated with unfair synchronization primitives. And it’s not how many synchronization primitives work.

It’s helpful to view this in terms of responsibilities: the Mutex’s job here is to wake up the next task waiting for the lock. And it’s doing that. It’s that task’s responsibility to check on all the futures that it’s responsible for. The Mutex cannot do that.

FAQ: why isn’t the `tokio::select!` polling on `future1`? Isn’t that the whole idea of `tokio::select!`

The idea of tokio::select! is to poll on multiple futures concurrently and enter the branch for whichever one finishes first. Once one of the futures does finish (as the sleep one has in our case), control enters that specific branch. It essentially commits to that branch and it’s only running that branch at that point.

The tokio::select! docs are explicit about this:

By running all async expressions on the current task, the expressions are able to run concurrently but not in parallel. This means all expressions are run on the same thread and if one branch blocks the thread, all other expressions will be unable to continue. If parallelism is required, spawn each async expression using tokio::spawn and pass the join handle to select!.

FAQ: doesn’t `future1` get cancelled?

When one of the futures that tokio::select! is polling on completes, the others get dropped. In this case, what’s dropped is &mut future1. But future1 is not dropped, so the actual future is not cancelled.

If future1 did get cancelled, you’d get no deadlock. Try it: change the above to wait on future1 instead of &mut future1. Alternatively, you can add an explicit drop(future1); at line 51 between the sleep and the do_async_thing. This mimics what select! does if we use future1 rather than &mut future1.

Tasks vs. Futures

Note

Tasks vs. Futures

When first learning async Rust, it’s common to think of tasks and futures almost interchangeably. When you want parallelism, you spawn a new task and give it the future that you want to run. If you want to do 10 things in parallel, you spawn 10 tasks and then wait for them all to finish.

You can have concurrency without tasks (and without parallelism) using something like tokio::select!. Within a single task, you can do 10 things concurrently (not in parallel) using tokio::select! or FuturesUnordered or the like. In this case, your one task is polling on all these futures and getting woken up when any of them might be ready.

Tasks are the top-level entities that the runtime executes. Each task runs one top-level future. That future can choose to do only do one thing at a time (as in the case of sequential code using await), or it can choose to do things concurrently by polling many futures, using tokio::select! or FuturesUnordered or the like.

What causes futurelock?

The general problem here is that you have:

task T is blocked on future F1 completing (and T is directly awaiting F1)
future F1 is blocked on future F2 in some way (e.g., acquiring a shared Mutex)
future F2 is blocked on task T polling it, but T isn’t polling it because it’s only polling F1

Most commonly this happens after T started polling F2 earlier, but then switched to F1. This can happen in a bunch of different cases:

using tokio::select! with a &mut future and awaiting in one of the other branches (our example above)
polling futures from a FuturesOrdered/FuturesUnordered (e.g., by calling next()) and then awaiting on any other future (e.g., each time one of the futures completes from the set, you do some async activity)
in a hand-written Future impl that behaves analogously

Note

You can hit futurelock even if you never start polling one of the futures. Consider this example:

use futures::FutureExt;

#[tokio::main]
async fn main() {
    let (tx, rx) = tokio::sync::oneshot::channel();
    let _future1 = async { tx.send(()).unwrap(); }.boxed();
    let future2 = async { rx.await.unwrap(); };

    future2.await;
}

This deadlocks, too. And for the same reason: this task is waiting on a future that itself depends on a future that this task is responsible for running. This is possible but feels contrived. This RFD focuses on cases where the dependency between futures relates to a shared resource. That generally requires that the futures all start running so they can get in line for the resource.

How you can hit this with `tokio::select!`

Hitting this problem with tokio::select! (as in the example above) requires two things to be true:

You must be passing a &mut future to one of the branches. If you’re passing an owned future, then it will get dropped when the tokio::select! enters a different branch. That generally releases the resources that might have been blocking other futures.
You must be using await in one of the branches' handlers. If you’re not doing this, then the task does not get blocked on any particular future at the expense of the others.

That said, it’s just as problematic to have an owned future across a tokio::select! and await after it (full example):

async fn do_stuff(lock: Arc<Mutex<()>>) {
    let mut future1 = do_async_thing("op1", lock.clone()).boxed();

    // Try to execute `future1`.  If it takes more than 500ms, do
    // a related thing instead.
    println!("do_stuff: entering select");
    tokio::select! {
        _ = &mut future1 => {
            println!("do_stuff: arm1 future finished");
        }
        _ = sleep(Duration::from_millis(500)) => {}
    };

    do_async_thing("op2", lock.clone()).await;
    println!("do_stuff: all done");
}

This results in exactly the same behavior.

How you can hit this with Streams

If you pull a future from a Stream and then await a future that somehow depends on another Future in the stream, you can wind up with futurelock. Here’s what it looks like with FuturesOrdered (full example):

async fn do_stuff(lock: Arc<Mutex<()>>) {
    let future1 = sleep(Duration::from_millis(500)).boxed();
    let future2 = do_thing("op1", lock.clone()).boxed();

    let mut futs = FuturesOrdered::new();
    futs.push_back(future1);
    futs.push_back(future2);

    let _ = futs.next().await;
    println!("one future finished");
    do_thing("op2", lock.clone()).await;
}

A nearly identical example creates futurelock with FuturesUnordered.

These are often used in a loop, so it may tend to look more like this (full example):

async fn do_stuff(lock: Arc<Mutex<()>>) {
    let future1 = do_thing("op1", lock.clone()).boxed();
    let future2 = sleep(Duration::from_millis(500)).boxed();

    let mut futs = FuturesUnordered::new();
    futs.push(future2);
    futs.push(future1);

    while let Some(_) = futs.next().await {
        println!("a future finished");
        do_thing("op2", lock.clone()).await;
    }
}

It seems likely that futurelock is a risk when using many other Stream functions.

What about `join_all`?

You can’t hit this with futures::future::join_all. That’s because it polls all of its futures and does not stop polling any of the pending futures.

Failure mode, debugging

Futurelock is a type of deadlock and tends to manifest as a hang of part or all of the program. When we saw this in omicron#9259, every future attempting to access the database became part of the futurelock. Since authorization uses the database, essentially every incoming HTTP request hung indefiniteily.

Debugging this problem from direct observation can be next to impossible. Typically, you’d only start looking at data long after the problem happened. At that point, it’s not clear what evidence you’d find even if you could peer into the executor state. The problem looks like a pending future whose task has been woken up because of that future, but the task has not polled the future. (Maybe tokio-console could help?)

In omicron#9259, we were able to determine (by tracing individual function calls with DTrace) that:

all incoming requests were blocking on attempts to send on an mpsc channel with capacity 1
the receiving end of this channel was regularly checking it and finding no messages queued

This confused us for quite a while. Why are senders blocking if there’s nothing in the channel? In hindsight, the answer’s implied by the documentation for Sender, which says:

Sends a value, waiting until there is capacity.

…

This channel uses a queue to ensure that calls to send and reserve complete in the order they were requested.

One can infer that a given call to send may block either because there is no capacity or because another sender’s send() is not completing. That could be because the channel is full, but in our case it’s because the future for that sender had run into futurelock.

It’s hard to give useful advice for debugging this sort of problem aside from advising that you consider futurelock as a possibility if you’re debugging a hang and some future appears blocked when other evidence suggests that it shouldn’t be.

Determinations: avoiding this problem

Like async cancellation (see [rfd397]), futurelock defeats Rust’s goal of being able to reason locally about correctness. If we look at the pieces involved in our example:

Using tokio::select! to wait for any of a few things to happen
Using await in a tokio::select! branch
Using a &mut future with tokio::select!
Using a Mutex^[3]

None of these by itself is wrong, but combining them results in futurelock. Remember too that the Mutex could be buried beneath several layers of function calls in different modules or packages. It could require looking across many layers of the stack at once to be able to see the problem.

There’s no one abstraction, construct, or programming pattern we can point to here and say "never do this". Still, we can provide some guidelines.

In general

The most specific general advice we can give is: any time you have a single task polling multiple futures concurrently, be extremely careful that the task never stops polling a future that it previously started polling.

One way to avoid this situation is to bias towards spawning futures in new tasks instead. There are other considerations with this approach: futures would be cancelled when they’re dropped, but tasks won’t be aborted when their JoinHandle is dropped. If you want this, see AbortOnDropHandle.

When using `tokio::select!`

If you find yourself writing or reviewing code that does either of these:

Uses a &mut future as one of the async expressions in the tokio::select!
Awaits inside the handler of a tokio::select! branch or after the tokio::select! before the future has been dropped

then look for the other as well. If both are present, pay close attention to the risk of futurelock. To avoid it, you either need to avoid doing both of these things in the same tokio::select! call or else be very sure that future never blocks with shared resources held that could block other futures.

Let’s consider a variation of our original example:

    let mut future1 = do_async_thing("op1", lock.clone()).boxed();

    // Execute `future1`.  Every 500ms, do something related
    // (e.g., report progress).
    loop {
        println!("do_stuff: entering select");
        tokio::select! {
            _ = &mut future1 => {
                println!("do_stuff: arm1 future finished");
                break;
            }
            _ = sleep(Duration::from_millis(500)) => {
                do_async_thing("op2", lock.clone()).await;
            }
        };
    }
    println!("do_stuff: all done");

Here, we’ve wrapped the tokio::select! in a loop. This is a common pattern. The idea here is mainly to run future1, but every 500ms we do something related (like report progress or check if we should cancel the like).

The easiest way to make this safer is to spawn future in a new task. Then use the JoinHandle in the tokio::select!, like this version:

    let future1 = do_async_thing("op1", lock.clone());
    let mut future1_task = tokio::spawn(future1);

    // Execute `future1`.  Every 500ms, do something related
    // (e.g., report progress).
    loop {
        println!("do_stuff: entering select");
        tokio::select! {
            _ = &mut future1_task => {
                println!("do_stuff: arm1 future finished");
                break;
            }
            _ = sleep(Duration::from_millis(500)) => {
                do_async_thing("op2", lock.clone()).await;
            }
        };
    }
    println!("do_stuff: all done");

This has the same desired effect of keeping future1 running, but now future1_task is a separate future. It’s cancellable, and cancelling it won’t cancel future1. (If you want that, you can still use future1_task.abort().) This construction cannot result in futurelock.

If you’re not using a loop, this approach is even better: then you can just pass future1_task to tokio::select! (rather than &mut future1_task) and it’ll be more obvious that this is safe.

In the end, you should always be extremely careful with tokio::select!. That’s because:

If you use it with borrowed futures, beware of futurelock.
If you use it with owned futures, beware of cancel-safety (see [rfd397] and [rfd400]).

So either way you’ve got a subtle, non-locally-reasonable, undebuggable problem to worry about that the compiler can’t really help with.

When using `Stream`

When using a FuturesOrdered or FuturesUnordered, consider instead using tokio’s JoinSet. This provides a similar interface, but the futures you’re waiting for are all running in separate tasks.

If for whatever reason that’s not appropriate (e.g., you’re not using tokio, or you really need a Stream interface), then in the body of a loop that pulls completed futures from the Stream, do not await any other futures. If you’re working with a FuturesUnordered, consider putting those futures into the set instead.

When using bounded channels

Bounded channels are not really the issue here. Even in omicron#9259, the capacity=1 channel was basically behaving as documented and as one would expect. It woke up a sender when capacity was available, and the other senders were blocked to maintain the documented FIFO property. However, some of the patterns that we use with bounded channels are problematic on their own and, if changed, could prevent the channel from getting caught up in a futurelock.

In Omicron, we commonly use bounded channels with send(msg).await. The bound is intended to cap memory usage and provide backpressure, but using the blocking send creates a second unbounded queue: the wait queue for the channel. Instead, we could consider using a larger capacity channel plus try_send() and propagate failure from try_send().

As an example, when we use the actor pattern, we typically observe that there’s only one actor and potentially many clients, so there’s not much point in buffering messages in the channel. So we use capacity = 1 and let clients block in send().await. But we could instead have capacity = 16 and have clients use try_send() and propagate failure if they’re unable to send the message. The value 16 here is pretty arbitrary. You want it to be large enough to account for an expected amount of client concurrency, but not larger. If the value is too small, you’ll wind up with spurious failures when the client could have just waited a bit longer. If the value is too large, you can wind up queueing so much work that the actor is always behind (and clients are potentially even timing out at a higher level). One might observe:

Channel limits, channel limits: always wrong!

Some too short and some too long!

But as with timeouts, it’s often possible to find values that work in practice.

Using send_timeout() is not a mitigation because this still results in the sender blocking. It needs to be polled after the timeout expires in order to give up. But with futurelock, it will never be polled.

Anti-pattern: just make the channel bigger

In our initial encounter with this problem, we had a bounded tokio::sync::mpsc channel of capacity 1. Why not bump the capacity up?

To avoid futurelock, the channel would have to have capacity big enough that nobody in the call stack could possibly have that many futures that they’ve started and aren’t polling. There is of course no way to know how big this needs to be, and it could change over time as the program evolves. Further, there are big side effects to having big channels like this in terms of latency, backpressure, and memory usage.

Anti-pattern: try to avoid dependencies between futures

In principle, you could avoid this problem if you avoid dependencies between futures. Aside from using spawn to do this, we do not recommend this in general because it’s brittle and risky.

First, it’s hard to know there are no dependencies. Any shared resource can be such a dependency: a bounded channel of any kind, a Mutex, a request to a remote service, etc. And it can be anywhere in the stack, including several dependency packages down the call chain.

Even if there’s no such dependency now, one could be added later. You could imagine future1 calling some_crate::func1() and future2 calling other_crate::func2() that seem like simple functions. some_crate could decide to add a global Mutex that is otherwise safe and correct, but this would now break your tokio::select! that was previously assuming these futures shared no dependencies.

The exception to this is that using tokio::spawn is a good way to replace one or more futures that could be subject to futurelock with ones that can’t. The returned JoinHandle is a future that becomes ready under the same conditions as the underlying one, but it does not hold shared resources and it’s very unlikely that that would ever change as tokio evolves. (Such a change would almost certainly break lots of correctly-written programs.)

Open Questions

Can we write clippy lints to:

Warn when passing &mut future to a tokio::select! arm and suggest that tokio::spawn be used instead, and
Warn when using await in a tokio::select! arm? (This is problematic for other reasons anyway when select! is used in a loop.)

There are certainly cases to do this and it’s okay to override the warning, but it’d be nice to have that guard rail.

Security Considerations

None actionable. Futurelock is a potential vector for denial of service, but it’s bad anyway, and we know we want to avoid it.

External References

[rfd397] Oxide Computer Co. RFD 397 Challenges with async/await in the control plane. 2023.
[rfd400] Oxide Computer Co. RFD 400 Dealing with cancel safety in async Rust

Footnotes

1
Note that steps 2 and 3 can happen in the other order because tokio::select! polls its futures in random order. But the result is the same.
View
2
As above, it’s possible that future2 is polled first, in which case future1 will not be polled at all. That doesn’t change anything else.
View
3
Although the use of the Mutex is simplistic, it’s realistic and correct: there are no lock ordering problems nor unsafe operations done with the lock held, etc.
View

RFD 609 Futurelock

Table of Contents