This RFD describes futurelock: a type of deadlock where a resource owned by Future A is required for another Future B to proceed, while the Task responsible for both Futures is no longer polling A. Futurelock is a particularly subtle risk in writing asynchronous Rust.
Oxide initially saw this problem in oxidecomputer/omicron#9259.
Example of the problem
Consider the following program (in the playground):
use std::sync::Arc;
use std::time::Duration;
use tokio::sync::Mutex;
use tokio::time::sleep;
use futures::FutureExt;
#[tokio::main]
async fn main() {
// Create a lock that will be shared by multiple tasks.
let lock = Arc::new(Mutex::new(()));
// Start a background task that takes the lock and holds it for a few
// seconds. This is just to simulate some contention. This function only
// returns once the lock has been taken in the background task.
start_background_task(lock.clone()).await;
// The guts of the example.
do_stuff(lock.clone()).await;
}
// Starts a background task that grabs the lock, holds it for 5 seconds,
// and then drops it. Returns once the task is holding the lock.
// The purpose of this is to simulate contention.
async fn start_background_task(lock: Arc<Mutex<()>>) {
// Use a channel to coordinate with the task so that it can tell us when
// its taken the lock.
let (tx, rx) = tokio::sync::oneshot::channel();
let _ = tokio::spawn(async move {
println!("background task: start");
let _guard = lock.lock().await;
let _ = tx.send(());
sleep(Duration::from_secs(5)).await;
println!("background task: done (dropping lock)")
});
// Wait for the task to take the lock before returning.
let _ = rx.await;
}
// The guts of the example
async fn do_stuff(lock: Arc<Mutex<()>>) {
let mut future1 = do_async_thing("op1", lock.clone()).boxed();
// Try to execute `future1`. If it takes more than 500ms, do
// a related thing instead.
println!("do_stuff: entering select");
tokio::select! {
_ = &mut future1 => {
println!("do_stuff: arm1 future finished");
}
_ = sleep(Duration::from_millis(500)) => {
do_async_thing("op2", lock.clone()).await;
}
};
println!("do_stuff: all done");
}
async fn do_async_thing(label: &str, lock: Arc<Mutex<()>>) {
println!("{label}: started");
let _ = lock.lock().await;
println!("{label}: acquired lock");
println!("{label}: done");
}This program reliably deadlocks. This surprises a lot of people! A background Task takes a lock, waits 5s, drops the lock and exits. In the meantime, we do_stuff. That stuff consists of waiting for two Futures concurrently via select!. One future waits for the lock while the other sleeps for 0.5s and waits for the lock. So there’s just one lock and all logical streams of execution seem to execute concurrently. How could this possibly hang?
The interesting bits are all in do_stuff():
async fn do_stuff(lock: Arc<Mutex<()>>) {
let mut future1 = do_async_thing("op1", lock.clone()).boxed();
// Try to execute `future1`. If it takes more than 500ms, do
// a related thing instead.
println!("do_stuff: entering select");
tokio::select! {
_ = &mut future1 => {
println!("do_stuff: arm1 future finished");
}
_ = sleep(Duration::from_millis(500)) => {
do_async_thing("op2", lock.clone()).await;
}
};
println!("do_stuff: all done");
}| 1 | future1 is the (boxed) future returned by do_async_thing(), an async function. |
| 2 | We’ll call the future returned by sleep: future2 (or, the "sleep" future). |
| 3 | The second branch of the select! is its own future. We’ll call this future3. |
It’s really important to understand what’s happening here so let’s be clear about the sequence.
First:
background task takes
lock, begins holding it for 5 secondstokio::select!begins polling&mut future1.[1] This future attempts to take the lock, blocks, returnsPoll::Pending.tokio::select!begins pollingfuture2(the sleep future) and blocks, returningPoll::Pending.
At this point:
the background task holds the lock
the main task is blocked in
tokio::select!on two different futures:future1is blocked on taking the lockfuture2(thesleepfuture) waiting for 500ms
500ms later, tokio wakes up the main task because future2 (the sleep future) is ready. Inside tokio::select!:
The task polls both futures.
future1is still blocked on the lock and returnsPending.[2]future2(the sleep future) is ready and returnsReady.
tokio::select!chooses the second branch&mut future1is dropped, but this is just a reference and so has no effect. Importantly, the future itself (future1) is not dropped.the second branch is entered.
do_async_thing("op2", …)is called, creating a new futurefuture3. This future immediately blocks trying to take the lock, which is still held by the background task.
At this point, we have:
the lock (still) held by the background task
the lock’s wait queue contains two waiting futures:
future1future3(the second arm of thetokio::select!)
There are two key points here:
The lock’s wait queue is literally a queue: only
future1can take the lock once it is released by the background task (unlessfuture1is dropped).The behavior of
tokio::select!is to poll all branches' futures only until one of them returns `Ready`. At that point, it drops the other branches' futures and only runs the body of the branch that’s ready.
Critically: the same task is responsible for both of the futures waiting on the lock. But that task is currently only polling on one of them. Unfortunately, it’s the wrong one.
About 4.5 seconds later:
The background task drops the lock.
The lock is given to
future1. (See below for more on why.)The task that polled
future1(the main task) is woken up.
However, that task is not polling
future1.future1is polled at the top-leveltokio::select!. But thetokio::select!has already chosen the other branch. It’s now only pollingfuture3. (In fact, even absent the imminent hang,future1would never be polled again. It would be cancelled without having completed when it got dropped at the end ofdo_stuff.)
Thus:
There is only one task left. It’s blocked on
future3.future3is blocked on a Mutex that’s owned byfuture1.future1cannot run (and therefore cannot drop the Mutex) until the task starts running it.
We call this specific kind of deadlock futurelock. The program is stuck in this state forever.
FAQ: why doesn’t the Mutex wake up the other future?
This particular example uses tokio::sync::Mutex, which is a fair Mutex. That means that the lock is given to waiters in the order that they started waiting. It has to give it to future1.
An unfair Mutex would not fix things. The problem wouldn’t be guaranteed to happen with an unfair Mutex, but it wouldn’t be guaranteed not to, either. The Mutex does not (and cannot) know which future would be "better" to wake up, or which one is being polled. You could imagine an unfair Mutex that always woke up all waiters and let them race to grab the lock again. That would not suffer from risk of futurelock, but it would have the thundering herd problem plus all the liveness issues associated with unfair synchronization primitives. And it’s not how many synchronization primitives work.
It’s helpful to view this in terms of responsibilities: the Mutex’s job here is to wake up the next task waiting for the lock. And it’s doing that. It’s that task’s responsibility to check on all the futures that it’s responsible for. The Mutex cannot do that.
FAQ: why isn’t the tokio::select! polling on future1? Isn’t that the whole idea of tokio::select!
The idea of tokio::select! is to poll on multiple futures concurrently and enter the branch for whichever one finishes first. Once one of the futures does finish (as the sleep one has in our case), control enters that specific branch. It essentially commits to that branch and it’s only running that branch at that point.
The tokio::select! docs are explicit about this:
By running all async expressions on the current task, the expressions are able to run concurrently but not in parallel. This means all expressions are run on the same thread and if one branch blocks the thread, all other expressions will be unable to continue. If parallelism is required, spawn each async expression using tokio::spawn and pass the join handle to select!.
FAQ: doesn’t future1 get cancelled?
When one of the futures that tokio::select! is polling on completes, the others get dropped. In this case, what’s dropped is &mut future1. But future1 is not dropped, so the actual future is not cancelled.
If future1 did get cancelled, you’d get no deadlock. Try it: change the above to wait on future1 instead of &mut future1. Alternatively, you can add an explicit drop(future1); at line 51 between the sleep and the do_async_thing. This mimics what select! does if we use future1 rather than &mut future1.
When first learning async Rust, it’s common to think of tasks and futures almost interchangeably. When you want parallelism, you spawn a new task and give it the future that you want to run. If you want to do 10 things in parallel, you spawn 10 tasks and then wait for them all to finish.
You can have concurrency without tasks (and without parallelism) using something like tokio::select!. Within a single task, you can do 10 things concurrently (not in parallel) using tokio::select! or FuturesUnordered or the like. In this case, your one task is polling on all these futures and getting woken up when any of them might be ready.
Tasks are the top-level entities that the runtime executes. Each task runs one top-level future. That future can choose to do only do one thing at a time (as in the case of sequential code using await), or it can choose to do things concurrently by polling many futures, using tokio::select! or FuturesUnordered or the like.
What causes futurelock?
The general problem here is that you have:
task
Tis blocked on futureF1completing (andTis directly awaitingF1)future
F1is blocked on futureF2in some way (e.g., acquiring a shared Mutex)future
F2is blocked on taskTpolling it, butTisn’t polling it because it’s only pollingF1
Most commonly this happens after T started polling F2 earlier, but then switched to F1. This can happen in a bunch of different cases:
using
tokio::select!with a&mut futureand awaiting in one of the other branches (our example above)polling futures from a
FuturesOrdered/FuturesUnordered(e.g., by callingnext()) and then awaiting on any other future (e.g., each time one of the futures completes from the set, you do some async activity)in a hand-written
Futureimpl that behaves analogously
You can hit futurelock even if you never start polling one of the futures. Consider this example:
use futures::FutureExt;
#[tokio::main]
async fn main() {
let (tx, rx) = tokio::sync::oneshot::channel();
let _future1 = async { tx.send(()).unwrap(); }.boxed();
let future2 = async { rx.await.unwrap(); };
future2.await;
}This deadlocks, too. And for the same reason: this task is waiting on a future that itself depends on a future that this task is responsible for running. This is possible but feels contrived. This RFD focuses on cases where the dependency between futures relates to a shared resource. That generally requires that the futures all start running so they can get in line for the resource.
How you can hit this with tokio::select!
Hitting this problem with tokio::select! (as in the example above) requires two things to be true:
You must be passing a
&mut futureto one of the branches. If you’re passing an owned future, then it will get dropped when thetokio::select!enters a different branch. That generally releases the resources that might have been blocking other futures.You must be using
awaitin one of the branches' handlers. If you’re not doing this, then the task does not get blocked on any particular future at the expense of the others.
That said, it’s just as problematic to have an owned future across a tokio::select! and await after it (full example):
async fn do_stuff(lock: Arc<Mutex<()>>) {
let mut future1 = do_async_thing("op1", lock.clone()).boxed();
// Try to execute `future1`. If it takes more than 500ms, do
// a related thing instead.
println!("do_stuff: entering select");
tokio::select! {
_ = &mut future1 => {
println!("do_stuff: arm1 future finished");
}
_ = sleep(Duration::from_millis(500)) => {}
};
do_async_thing("op2", lock.clone()).await;
println!("do_stuff: all done");
}This results in exactly the same behavior.
How you can hit this with Streams
If you pull a future from a Stream and then await a future that somehow depends on another Future in the stream, you can wind up with futurelock. Here’s what it looks like with FuturesOrdered (full example):
async fn do_stuff(lock: Arc<Mutex<()>>) {
let future1 = sleep(Duration::from_millis(500)).boxed();
let future2 = do_thing("op1", lock.clone()).boxed();
let mut futs = FuturesOrdered::new();
futs.push_back(future1);
futs.push_back(future2);
let _ = futs.next().await;
println!("one future finished");
do_thing("op2", lock.clone()).await;
}These are often used in a loop, so it may tend to look more like this (full example):
async fn do_stuff(lock: Arc<Mutex<()>>) {
let future1 = do_thing("op1", lock.clone()).boxed();
let future2 = sleep(Duration::from_millis(500)).boxed();
let mut futs = FuturesUnordered::new();
futs.push(future2);
futs.push(future1);
while let Some(_) = futs.next().await {
println!("a future finished");
do_thing("op2", lock.clone()).await;
}
}It seems likely that futurelock is a risk when using many other Stream functions.
What about join_all?
You can’t hit this with futures::future::join_all. That’s because it polls all of its futures and does not stop polling any of the pending futures.
Failure mode, debugging
Futurelock is a type of deadlock and tends to manifest as a hang of part or all of the program. When we saw this in omicron#9259, every future attempting to access the database became part of the futurelock. Since authorization uses the database, essentially every incoming HTTP request hung indefiniteily.
Debugging this problem from direct observation can be next to impossible. Typically, you’d only start looking at data long after the problem happened. At that point, it’s not clear what evidence you’d find even if you could peer into the executor state. The problem looks like a pending future whose task has been woken up because of that future, but the task has not polled the future. (Maybe tokio-console could help?)
In omicron#9259, we were able to determine (by tracing individual function calls with DTrace) that:
all incoming requests were blocking on attempts to send on an
mpscchannel with capacity 1the receiving end of this channel was regularly checking it and finding no messages queued
This confused us for quite a while. Why are senders blocking if there’s nothing in the channel? In hindsight, the answer’s implied by the documentation for Sender, which says:
Sends a value, waiting until there is capacity.
…
This channel uses a queue to ensure that calls to send and reserve complete in the order they were requested.
One can infer that a given call to send may block either because there is no capacity or because another sender’s send() is not completing. That could be because the channel is full, but in our case it’s because the future for that sender had run into futurelock.
It’s hard to give useful advice for debugging this sort of problem aside from advising that you consider futurelock as a possibility if you’re debugging a hang and some future appears blocked when other evidence suggests that it shouldn’t be.
Determinations: avoiding this problem
Like async cancellation (see [rfd397]), futurelock defeats Rust’s goal of being able to reason locally about correctness. If we look at the pieces involved in our example:
Using
tokio::select!to wait for any of a few things to happenUsing
awaitin atokio::select!branchUsing a
&mut futurewithtokio::select!Using a Mutex[3]
None of these by itself is wrong, but combining them results in futurelock. Remember too that the Mutex could be buried beneath several layers of function calls in different modules or packages. It could require looking across many layers of the stack at once to be able to see the problem.
There’s no one abstraction, construct, or programming pattern we can point to here and say "never do this". Still, we can provide some guidelines.
In general
The most specific general advice we can give is: any time you have a single task polling multiple futures concurrently, be extremely careful that the task never stops polling a future that it previously started polling.
One way to avoid this situation is to bias towards spawning futures in new tasks instead. There are other considerations with this approach: futures would be cancelled when they’re dropped, but tasks won’t be aborted when their JoinHandle is dropped. If you want this, see AbortOnDropHandle.
When using tokio::select!
If you find yourself writing or reviewing code that does either of these:
Uses a
&mut futureas one of the async expressions in thetokio::select!Awaits inside the handler of a
tokio::select!branch or after thetokio::select!before thefuturehas been dropped
then look for the other as well. If both are present, pay close attention to the risk of futurelock. To avoid it, you either need to avoid doing both of these things in the same tokio::select! call or else be very sure that future never blocks with shared resources held that could block other futures.
Let’s consider a variation of our original example:
let mut future1 = do_async_thing("op1", lock.clone()).boxed();
// Execute `future1`. Every 500ms, do something related
// (e.g., report progress).
loop {
println!("do_stuff: entering select");
tokio::select! {
_ = &mut future1 => {
println!("do_stuff: arm1 future finished");
break;
}
_ = sleep(Duration::from_millis(500)) => {
do_async_thing("op2", lock.clone()).await;
}
};
}
println!("do_stuff: all done");Here, we’ve wrapped the tokio::select! in a loop. This is a common pattern. The idea here is mainly to run future1, but every 500ms we do something related (like report progress or check if we should cancel the like).
The easiest way to make this safer is to spawn future in a new task. Then use the JoinHandle in the tokio::select!, like this version:
let future1 = do_async_thing("op1", lock.clone());
let mut future1_task = tokio::spawn(future1);
// Execute `future1`. Every 500ms, do something related
// (e.g., report progress).
loop {
println!("do_stuff: entering select");
tokio::select! {
_ = &mut future1_task => {
println!("do_stuff: arm1 future finished");
break;
}
_ = sleep(Duration::from_millis(500)) => {
do_async_thing("op2", lock.clone()).await;
}
};
}
println!("do_stuff: all done");This has the same desired effect of keeping future1 running, but now future1_task is a separate future. It’s cancellable, and cancelling it won’t cancel future1. (If you want that, you can still use future1_task.abort().) This construction cannot result in futurelock.
If you’re not using a loop, this approach is even better: then you can just pass future1_task to tokio::select! (rather than &mut future1_task) and it’ll be more obvious that this is safe.
In the end, you should always be extremely careful with tokio::select!. That’s because:
So either way you’ve got a subtle, non-locally-reasonable, undebuggable problem to worry about that the compiler can’t really help with.
When using Stream
When using a FuturesOrdered or FuturesUnordered, consider instead using tokio’s JoinSet. This provides a similar interface, but the futures you’re waiting for are all running in separate tasks.
If for whatever reason that’s not appropriate (e.g., you’re not using tokio, or you really need a Stream interface), then in the body of a loop that pulls completed futures from the Stream, do not await any other futures. If you’re working with a FuturesUnordered, consider putting those futures into the set instead.
When using bounded channels
Bounded channels are not really the issue here. Even in omicron#9259, the capacity=1 channel was basically behaving as documented and as one would expect. It woke up a sender when capacity was available, and the other senders were blocked to maintain the documented FIFO property. However, some of the patterns that we use with bounded channels are problematic on their own and, if changed, could prevent the channel from getting caught up in a futurelock.
In Omicron, we commonly use bounded channels with send(msg).await. The bound is intended to cap memory usage and provide backpressure, but using the blocking send creates a second unbounded queue: the wait queue for the channel. Instead, we could consider using a larger capacity channel plus try_send() and propagate failure from try_send().
As an example, when we use the actor pattern, we typically observe that there’s only one actor and potentially many clients, so there’s not much point in buffering messages in the channel. So we use capacity = 1 and let clients block in send().await. But we could instead have capacity = 16 and have clients use try_send() and propagate failure if they’re unable to send the message. The value 16 here is pretty arbitrary. You want it to be large enough to account for an expected amount of client concurrency, but not larger. If the value is too small, you’ll wind up with spurious failures when the client could have just waited a bit longer. If the value is too large, you can wind up queueing so much work that the actor is always behind (and clients are potentially even timing out at a higher level). One might observe:
Channel limits, channel limits: always wrong!
Some too short and some too long!
But as with timeouts, it’s often possible to find values that work in practice.
Using send_timeout() is not a mitigation because this still results in the sender blocking. It needs to be polled after the timeout expires in order to give up. But with futurelock, it will never be polled.
Anti-pattern: just make the channel bigger
In our initial encounter with this problem, we had a bounded tokio::sync::mpsc channel of capacity 1. Why not bump the capacity up?
To avoid futurelock, the channel would have to have capacity big enough that nobody in the call stack could possibly have that many futures that they’ve started and aren’t polling. There is of course no way to know how big this needs to be, and it could change over time as the program evolves. Further, there are big side effects to having big channels like this in terms of latency, backpressure, and memory usage.
Anti-pattern: try to avoid dependencies between futures
In principle, you could avoid this problem if you avoid dependencies between futures. Aside from using spawn to do this, we do not recommend this in general because it’s brittle and risky.
First, it’s hard to know there are no dependencies. Any shared resource can be such a dependency: a bounded channel of any kind, a Mutex, a request to a remote service, etc. And it can be anywhere in the stack, including several dependency packages down the call chain.
Even if there’s no such dependency now, one could be added later. You could imagine future1 calling some_crate::func1() and future2 calling other_crate::func2() that seem like simple functions. some_crate could decide to add a global Mutex that is otherwise safe and correct, but this would now break your tokio::select! that was previously assuming these futures shared no dependencies.
The exception to this is that using tokio::spawn is a good way to replace one or more futures that could be subject to futurelock with ones that can’t. The returned JoinHandle is a future that becomes ready under the same conditions as the underlying one, but it does not hold shared resources and it’s very unlikely that that would ever change as tokio evolves. (Such a change would almost certainly break lots of correctly-written programs.)
Open Questions
Can we write clippy lints to:
Warn when passing
&mut futureto atokio::select!arm and suggest thattokio::spawnbe used instead, andWarn when using
awaitin atokio::select!arm? (This is problematic for other reasons anyway whenselect!is used in a loop.)
There are certainly cases to do this and it’s okay to override the warning, but it’d be nice to have that guard rail.
Security Considerations
None actionable. Futurelock is a potential vector for denial of service, but it’s bad anyway, and we know we want to avoid it.
External References
[rfd397] Oxide Computer Co. RFD 397 Challenges with async/await in the control plane. 2023.
[rfd400] Oxide Computer Co. RFD 400 Dealing with cancel safety in async Rust