Uazú blog

See the sidebar for blog entries.

This is for publishing material related to Uazú projects in Rust that may be of wider interest.

2021-04-06: Interfacing a low-level actor system to Rust async/await, part 1

I've been coding on never-blocking actor systems for maybe 8 years, and that is "home" to me and the natural way to go about things. But in Rust most of the async ecosystem is based around async/await. So in order to join that ecosystem and make use of some of those crates, I need to interface my actor runtime to async/await. So Stakker needs to become an async/await executor.

So inspired by the Async Foundations Visioning exercise, I'm documenting this process to provide some hard data for a possible status quo story about interfacing to async/await from a foreign runtime, and perhaps to highlight what is needed to better support executor-independence.

Ground rules

First of all, here are the relevant characteristics of the runtime that I'm interfacing from:

  • Never-blocking. This means that all events or messages must be handled by the actor immediately on delivery, and the runtime delivers messages ASAP. The actor can't temporarily block its queue whilst waiting for some external process to complete, nor selectively accept just certain types of messages, like some actor systems allow. This may seem limiting but actually it works out fine in practice, not least because there can't be deadlocks in the messaging layer. So I don't anticipate this being a big problem for interfacing to async/await. Note that in this runtime an actor message IS an event which IS an asynchronous actor call which IS an FnOnce closure on the queue. They are equivalent.

  • No futures or promises. Everything is imperative and direct. You simply make an asynchronous call (i.e. conceptually send a message) when you have something to communicate to another actor. If you want to be notified of something or receive data or a response at some point in the future, you provide a callback in the form of a Ret or Fwd instance. Ret is effectively the opposite of a Future, the other end of the conceptual pipe passing a result back to the code that requested it, and Fwd is the opposite of Stream. So the problem is to interface Ret and Fwd to the common async/await traits. Note that Fwd and Ret handlers run inline at the callsite but typically result in asynchronous calls (i.e. FnOnce closures) being pushed to the queue. (A minimal sketch of this callback style follows this list.)

  • Anything might fail. It is expected that actors may fail and be restarted, and the rest of the actor system should continue running fine. This is normal operation. This raises questions about how to deal with failure when async/await code is waiting for data from an actor that goes away.

  • Single-threaded. Stakker makes a conscious choice to optimise for single-threaded operation and insist that load-balancing/etc be done at a higher level. This encourages load-balancing of larger units of work, which should improve parallel performance when several Stakker runtimes work in parallel. This might cause some problems because async/await seems to be oriented around multi-threaded operation.
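
To make the callback style concrete, here is a minimal sketch of an actor method that takes a Ret callback. The Ret type and the ret! and CX! macros are from the Stakker documentation, but treat the exact signatures here as my assumptions rather than a verified example:

    use stakker::{ret, Ret, CX};

    // Hypothetical actor holding a sensor reading
    struct Sensor {
        last_reading: f64,
    }

    impl Sensor {
        // An asynchronous actor call: rather than returning a future for the
        // caller to await, the method accepts a Ret<f64> callback and "pushes"
        // the value back through it.  The caller constructs the Ret with one
        // of Stakker's ret_* macros; if the Ret is dropped before being called
        // (e.g. this actor fails), the caller's handler sees None instead.
        fn query(&self, _cx: CX![], ret: Ret<f64>) {
            ret!([ret], self.last_reading);
        }
    }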

The characteristics of the target ecosystem (Rust async/await) presumably don't need describing.

Impressions from an actor perspective

Futures and streams

First of all, futures seem weird as a concept. You want a result and you effectively get given an IOU. What use is that? What purpose does a future serve? Why can't the other end just wait and pass us the final result when it is done, instead of giving us a proxy for the result? But then I realized that a future is effectively a temporary mailbox. If the receiving code does not already have some kind of a mailbox, i.e. some concept of a component and a way for events to be delivered to that component, then this may be the only way to get the response delivered. However Stakker has no need for futures as it already has a means for messages to be delivered asynchronously to a destination. So Stakker works the other way around: A Ret sends a value to an end-point, whereas a Future is held by an end-point to receive a value.

Stream in futures-core seems to work similarly, i.e. a Stream acts as a mailbox where values will be received by an end-point. Contrast this with Fwd which sends a stream of values to an end-point, i.e. conceptually Fwd is at the opposite end of the pipe. Both Stream and Future operate on a "pull" model. The Stakker primitives on the other hand are clearly "push" operations. So this is a difference in approach.

Forced use of RefCell

The poll method of the Future trait seems like a narrow door in a wall between two bodies of code. There is no way to do qcell/GhostCell-style statically-checked cell borrowing within a Future, because there is no way to communicate an active borrow up through the poll calls from the runtime. Given that, the path of least resistance leads to using RefCell::borrow_mut, which IMHO is a bad habit to get into. I found myself writing "Borrow-safety: ..." comments to justify my use of borrow_mut() and how/why it was going to be panic-free, just as if I was dealing with unsafe. (It's hard to go back to manually-verified cell borrowing once you've got used to statically-checked cell borrowing ...)
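
For illustration, here is roughly what that path of least resistance looks like: a hand-written Future that shares a slot with non-async code on the same thread through Rc<RefCell<..>>, complete with the kind of "Borrow-safety:" comment described above. The types are invented for this example; they are not taken from Stakker or stakker_async_await.

    use std::cell::RefCell;
    use std::future::Future;
    use std::pin::Pin;
    use std::rc::Rc;
    use std::task::{Context, Poll};

    // Slot filled in by some other (non-async) code on the same thread
    struct Slot<T> {
        value: Option<T>,
    }

    struct SlotFuture<T> {
        shared: Rc<RefCell<Slot<T>>>,
    }

    impl<T> Future for SlotFuture<T> {
        type Output = T;

        fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<T> {
            // Borrow-safety: the only other borrow of this RefCell is taken by
            // the producer, which never calls poll() re-entrantly, so this
            // borrow_mut() cannot panic.  (Manually verified, which is exactly
            // the reasoning that statically-checked cell borrowing eliminates.)
            match self.shared.borrow_mut().value.take() {
                Some(v) => Poll::Ready(v),
                None => Poll::Pending, // a real version would also store _cx.waker()
            }
        }
    }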

Could this work differently and still be executor-independent? Maybe. The std::task::Context is where you'd have to put a borrow of a cell-owner (or "brand" owner), but then it would have to be built into the standard library and be one that all executors could support. For QCell, TCell and TLCell the poll signature is already adequate: the Context<'_> already indicates that it can contain borrows of other things. However for LCell or GhostCell, it would also need an <'id> added to the signature. The GhostCell style has the fewest restrictions, but unfortunately adding <'id> to all poll implementations would be awkward. Could the compiler derive this automatically? I would be totally in favour of Context adding GhostCell-like cell borrowing if the compiler didn't require the <'id>.

In Stakker, I pass active borrows to cell-owners up through all the calls, which allows statically-checked access to two independent classes of data: both actor-state and Share-state. But I have full control in Stakker and I don't have to conform to any external traits, nor worry about compatibility with other actor runtimes. To allow statically-checked cell borrowing in async/await, the standard library would have to adopt one single qcell/GhostCell-like solution for std::task::Context -- and probably this is not an easy decision right now.

Cache-efficiency

Rust async/await seems like it might be more cache-efficient than Stakker. Since the code is "pull"-based, it will keep on pulling data until there is no more data immediately available. So that exhausts a single resource in one go, whilst all the related code is still in cache. On the other hand, Stakker processes actor calls in submission order. So if the input events are interleaved, then so will be the processing. However Stakker can still do bulk operations, e.g. if what is queued is a notification to examine a resource rather than the individual chunks of data, then the entire resource can still be flushed in one go. There are probably pros and cons of both approaches.

Complexity

It may be that the complexity behind async/await is the minimum necessary complexity to get the job done, but it doesn't seem very simple or elegant on the surface. In particular working with pinning just seems really awkward. Maybe this is conceptually elegant underneath and it's just the initial implementation that is a bit rough, but I haven't got to that point with it yet.

Multi-threaded

The standard library Waker is Send, so this wake-up mechanism is clearly not designed for efficient single-threaded use. Given that I'm writing a single-threaded executor, the synchronization that implies is an unacceptable cost, so I implemented a separate wake-up mechanism that Stakker-specific glue code can use instead. This means that where async/await code is passing data to or from actor code, no synchronization operations (atomics, mutexes, etc) are required at all. I will later add a separate spawn_with_waker call to spawn with a traditional Waker where that is required, e.g. where some async code spawns threads and needs to send wake-up notifications back across threads to the Stakker thread.
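
To illustrate why a standard Waker drags in synchronization even on one thread: the state it wraps has to be Send + Sync. Here is a minimal generic sketch (not Stakker's actual mechanism):

    use std::sync::atomic::{AtomicBool, Ordering};
    use std::sync::Arc;
    use std::task::{Wake, Waker};

    // A wake flag usable with std::task::Waker.  Because Waker is Send + Sync,
    // the flag must be too, which forces Arc plus an atomic even if the
    // executor and all of its tasks live on a single thread.  The cheap
    // single-threaded equivalent, Rc<Cell<bool>>, cannot be wrapped in a
    // Waker at all since it is neither Send nor Sync.
    struct WakeFlag(AtomicBool);

    impl Wake for WakeFlag {
        fn wake(self: Arc<Self>) {
            self.0.store(true, Ordering::Relaxed);
        }
    }

    fn make_waker() -> (Arc<WakeFlag>, Waker) {
        let flag = Arc::new(WakeFlag(AtomicBool::new(false)));
        let waker = Waker::from(flag.clone());
        (flag, waker)
    }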

Mapping between Ret and Future

Conceptually Ret and Future are like opposite ends of the same pipe, so it's quite natural to interface the two together. There are a few combinations:

  • Spawn: Future → Ret: Connects an existing Future and Ret together and runs the future to completion, passing the final value to the Ret. Allows flow of data from async/await to the actor system.

  • Push pipe: Ret → Future: Create a new Future / Ret pair where the future will resolve to the value passed to the Ret, as soon as that value is provided. Allows flow of data from the actor system to async/await. (A sketch of this mapping follows this list.)

  • Pull pipe: Future → Ret / Ret: Create a Future which, when first polled, sends a message to the provided Ret requesting data, providing another Ret to return it with; when that reply arrives, the future resolves to that value. This is like a push pipe, except that data doesn't have to be generated until it is needed.
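
Here is a sketch of how the push pipe glue could be wired up (hypothetical types, not the actual stakker_async_await implementation): the Ret end writes the value into a shared slot and wakes the task, and the Future end polls that slot. Since everything is on one thread, Rc + RefCell suffices rather than Arc + Mutex:

    use std::cell::RefCell;
    use std::future::Future;
    use std::pin::Pin;
    use std::rc::Rc;
    use std::task::{Context, Poll, Waker};

    struct Shared<T> {
        value: Option<Option<T>>, // Some(None) means the Ret was dropped
        waker: Option<Waker>,
    }

    struct PipeFuture<T> {
        shared: Rc<RefCell<Shared<T>>>,
    }

    impl<T> Future for PipeFuture<T> {
        type Output = Option<T>; // None => the Ret was dropped, e.g. actor failure

        fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
            let mut shared = self.shared.borrow_mut();
            match shared.value.take() {
                Some(v) => Poll::Ready(v),
                None => {
                    // Not ready yet: store the waker so the Ret end can wake the task
                    shared.waker = Some(cx.waker().clone());
                    Poll::Pending
                }
            }
        }
    }

    // Called by the glue when the Ret fires with a value, or with None when
    // it is dropped without being called
    fn deliver<T>(shared: &Rc<RefCell<Shared<T>>>, value: Option<T>) {
        let waker = {
            let mut s = shared.borrow_mut();
            s.value = Some(value);
            s.waker.take()
        };
        if let Some(w) = waker {
            w.wake();
        }
    }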

Within those combinations there are also choices about handling of failure. Ret has the property that if it is dropped (e.g. the actor handling it fails), None is sent back. There are two ways of mapping that failure onto async/await:

  • Drop the whole async/await task. This means that from the point of view of the async/await code, it will abruptly stop executing at whatever .await it was stuck on. However, the plus side is that no special handling of failure is required.

  • Pass the error through, using Result<T, ActorFail> as the type returned by the Future. That way the task can handle the failure and continue executing.

Mapping between Fwd and Stream

First of all, Stream → Fwd can be supported with a spawn-like operation, i.e. running the stream in the background until it terminates, forwarding all values to the Fwd. This is straightforward.

For Fwd → Stream, it is more complex. There is a fundamental difference in semantics between Fwd and Stream. Fwd is effectively just a connection between A and B, allowing an endless stream of values to be pushed, whereas Stream is a "pull" connection and supports termination. Connecting a Fwd to a Stream directly as a "push pipe" requires a queue in between, because we can't force the owner of the Stream to handle values if it doesn't want to. So given the requirement for queuing and no mechanism for backpressure, this is probably not the ideal setup in most cases.

However a "push pipe" can be made more manageable with a Fwd<()> callback that is called whenever the queue becomes empty. That way the sender can refill batches of data when requested. If the sender pushes just one value each time it is called, then the "push pipe" has become a "pull pipe". So this allows the full spectrum of implementations. To handle termination, we can require that the Fwd passes Option<T> values the same as the Stream does.
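
As a sketch of what such a queue-backed pipe could look like (again with invented types, not the stakker_async_await API, using the Stream trait from futures-core): the Fwd end pushes Option<T> items into a queue and wakes the task, poll_next drains the queue, and an empty queue triggers the "queue empty" refill callback:

    use std::cell::RefCell;
    use std::collections::VecDeque;
    use std::pin::Pin;
    use std::rc::Rc;
    use std::task::{Context, Poll, Waker};

    use futures_core::Stream;

    struct PipeShared<T> {
        queue: VecDeque<Option<T>>, // a None item terminates the stream
        waker: Option<Waker>,
    }

    struct PipeStream<T> {
        shared: Rc<RefCell<PipeShared<T>>>,
        refill: Box<dyn Fn()>, // stands in for the Fwd<()> "queue empty" callback
    }

    impl<T> Stream for PipeStream<T> {
        type Item = T;

        fn poll_next(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<T>> {
            let mut shared = self.shared.borrow_mut();
            match shared.queue.pop_front() {
                Some(Some(v)) => Poll::Ready(Some(v)),
                Some(None) => Poll::Ready(None), // regular termination
                None => {
                    // Queue empty: remember the waker, then ask the sender for
                    // more data.  If the sender pushes one value per request,
                    // this "push pipe" behaves as a "pull pipe".  The Fwd end
                    // pushes into `queue` and wakes `waker`, just as in the
                    // push pipe sketch above.
                    shared.waker = Some(cx.waker().clone());
                    drop(shared);
                    (self.refill)();
                    Poll::Pending
                }
            }
        }
    }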

Then there is the question of handling actor failures. If the last reference to the Fwd goes away before the final None is sent, then that's assumed to be an irregular termination of the stream. As in the Future case, there are two ways to handle this:

  • Drop the whole async/await task. This means that poll_next can return Option<T> as normal, and the async/await code doesn't have to do any special handling of failure.

  • Pass the error through, meaning that the return from the poll_next will be Option<Result<T, ActorFail>>. So normally you'd get zero or more Some(Ok(value)) values, then a None for termination of the stream. However in the case of actor failure instead you'd get Some(Err(ActorFail)) then None to terminate the stream.

State of play

Spawning of futures and streams, and the actor interfaces to futures and streams, are running fine, with basic tests, in the current version of the stakker_async_await crate.

So far things have not been too bad. Figuring out how best to implement a single-threaded task wake-up mechanism within Stakker took a while, trying various different methods. Pinning was awkward but manageable after going through the docs. Finding the best mapping between the two sides took some consideration.

Next steps

The aim is to attempt to support all the executor-independent async/await interfaces available in the ecosystem, to see how that goes and what the differences are. Also to see how much executor-independent code is out there, and what precisely it requires of the runtime.

So these crates will be looked at, to see how much can be supported:

Are there any other crates out there that would be worth looking at?

In addition, it would be good to be able to make asynchronous actor calls from async/await code written specifically to run on Stakker. But that will probably need quite a bit of work to get the ergonomics right.

2021-02-17: The Stakker actor runtime: Beyond "Go++"

It's been more than a year since Stakker's first release, and it is now shipping in a commercial product, so it seems like time to announce it and to compare notes on Rust actor systems in general.

Rust + async/await + runtime = Go++

Rust's async/await was a big step forward, but now a lot of people seem to think that that's the end of the story, and that's how concurrency should be done in Rust. However Rust can go much much further than that. Async/await plus an associated runtime is effectively "Go++", i.e. you can do almost everything Go can do but in a low-level Rust style.

However the fact that there is a proliferation of actor crates shows that it doesn't cover everyone's requirements. Async/await emphasises sequential execution. Awaiting on something means your coroutine is blocking on one thing and can't receive any other events. Whilst you can choose to await on multiple things, that is not what this model is streamlined for at the source level.

The actor model on the other hand is the opposite. In the purest actor model, nothing blocks. By default everything is asynchronous and events can arrive from any source at any time. Multiple calls can be outstanding on an actor, not just one as in the case of await.

For things that are asynchronous in a big way with events arriving from all directions, or where it's just not going to work out to try to force things down a mostly sequential code path, or where it is much cleaner to reason about things as a state machine plus incoming events (which is essentially what an actor is), then you really need an actor system.

Different interpretations of the actor model

There are different points of view on how to manage actor queues.

Pony-like: My own crate Stakker takes a pure "nothing blocks" approach, something like the Pony language. Once an actor is up and running, nothing blocks — ever. Events arriving at an actor must be handled immediately. If handling an event starts a process that may take some time, then the actor must arrange to be notified when that process is complete. This is very clean and easy to reason about. Events from all different directions multiplex seamlessly at the actor.

Await-blocking: It seems that some actor implementations choose to temporarily block their queue to facilitate interop with async/await. For example an actor may stop processing its own queue whilst it waits for an external process to complete, e.g. an await call on something from the async/await ecosystem. It seems to me that leaving stuff unprocessed on the queue adds latency and maybe could even lead to deadlocks if the coder isn't careful. What happens if something arrives that changes your view on whether you should continue with the awaited operation? It seems like a half-way compromise between the async/await and actor models.

Erlang-like: In Erlang you can scan your queue looking for particular types of messages. I don't know whether anyone implements this in Rust, but it certainly makes the queuing more complicated. Again it's up to the coder to make sure that they don't leave important messages unprocessed on their queue. Given the popularity of Erlang and Elixir, it must work out for those coders, though.

Note that there is an essential impedance mismatch between the async/await model and the actor model. If we have an awaitable object and wish to interface it to an actor system, the cleanest way is to wrap it in an actor and have that actor accept calls and queue them up internally, and then feed them to the awaitable object one by one. Then the rest of the actor system can run without blocking. This is because you can't have two awaits running at the same time on an awaitable object, which is a limitation that actors don't have.

When multi-threaded is slower than single-threaded

It's known, with rayon for example, that if your unit of work is too small, then parallelizing the job will make it slower. So even though you now have 4 or 8 threads working on the problem, it takes longer to execute than with a single thread. This is because the synchronization overheads dominate the execution time. So to parallelize your job effectively, you have to split it into larger units of work.

The same applies to actor systems. Typically the work done when an actor handles a message is very small: update the state, and maybe send a message or two. So spreading the handling of individual actor messages across threads is a bad idea because the unit of work is way too small. So any implementation of an actor system on top of channels will not be very efficient. You need to find a way to break things into larger units of work before sharing the work between threads.

This is one of the reasons why Stakker chooses to be single-threaded at its core (although several independent Stakker runtimes can be run on different threads). This does shift the burden of dividing work between threads back to the coder, a job which the coder would perhaps prefer not to have to think about. However if you care about efficiency, then you have to think about it. And in any case, without a lot of care, throwing more threads at an actor system will just make it slower. So typical actor code will likely run faster single-threaded anyway.

There has been a lot of pressure on coders over the years to move away from single-threaded coding styles and to accept multi-threaded coding. But this has been reduced to "single-threaded bad, multi-threaded good", so people see a multi-threaded solution and think that it must be better. But that's just not true. The whole thing is a lot more nuanced than that. Really it comes down to: How big is your unit of work? i.e. how much work can you get done between synchronization points.

People may say "but single-threaded doesn't scale". Well, multi-threaded doesn't scale either — at least it only lets you scale to the capacity of the machine. Beyond that you have to think about load balancing or sharding or some other mechanism to distribute work between different machines. So in any case adopting multi-threading is only going to solve your problems temporarily, and for actors will possibly even make things worse.

So given that for an actor work-load, the unit of work is often small compared to synchronization overheads, running multiple single-threaded actor runtimes has its advantages. Load-balancing is something you're going to have to solve at a higher level anyway if you really want to scale.

Stakker features

Here are some of the features that hopefully make Stakker stand out:

  • Static checks: Everything possible is statically checked to find bugs at compile-time rather than run-time. Whereas other actor systems may rely on dynamic checks behind the scenes to maintain safety (e.g. RefCell), Stakker does this at compile-time. It uses the qcell crate to extend Rust's borrow-checking into actors. For example, this guarantees and proves at compile-time that no actor can access any other actor's state. So you can have confidence that your code will not unexpectedly panic due to a coding error causing a check to fail. It also means that the checks have no runtime overhead.

  • High efficiency: Message queueing and execution does not require locks or atomics or allocations or match. Stakker does not use structures for messages, so does not need to match on them. Instead a message is a closure that makes a static call to a method on the destination actor struct. When an actor sends a message, it adds a FnOnce to an internal queue that directly calls that method. Rust is free to inline all that code, so handling a message can be reduced to a single branch to a piece of optimised inlined code that directly modifies the actor's state. Rust may even choose to inline the constant values that you're passing within the message, effectively giving you specialization of the handler too. The FnOnce queue is a flat area of memory, so typically no allocations are required to queue or execute a message either. (A simplified sketch of this closure-queue idea follows this feature list.)

  • Choice of implementations: Stakker provides a choice of internal implementations controlled by cargo features, all behind the same fixed API. So if you're running just one Stakker instance, internally it can use globals as an optimisation, but if you change your mind, you can just add a feature and it will switch off that optimisation. If you don't want to risk the unsafe code within Stakker, you can turn off unsafe, and it will use safe alternatives (at some cost in memory and time).

  • Rust-native: Stakker is as low-level and Rust-like as possible. Everything that makes Rust what it is has been extended into the actor model. So it is not an emulation of some other actor system on top of Rust, with all the inefficiencies that brings. Rather it aims to be a fully Rust-native actor system. Amongst other things, this means everything possible is borrow-checked and type-checked; it means that the Rust compiler has static knowledge of what you're doing and can inline and optimise; it means that you can count on drop cleaning things up; and it means that everything is safe.

  • Event system independent: It is not tied to any particular underlying I/O or event system. So it can layer on top of anything. Already implemented is mio, but it should be possible to layer it on top of tokio or async_std if required, or even on top of event loops from other languages. (One of the design requirements was that it should integrate well with C++ applications.) It requires just one timer from the external event system to drive the whole Stakker timer queue.

  • Shared state: It's pragmatic about intentionally shared state. Whilst shared state is not allowed in the pure actor model, there is no way in Rust to stop someone passing around an Rc<RefCell<V>> — at least not without limiting other useful features of Rust. So Stakker has a Share<V> to make this explicit, and also to make it statically-checked and safe from panics. Since shared state is explicit, its use can be monitored in the codebase.

  • Single-threaded: Each Stakker instance runs single-threaded, so there are no locks, no atomics, no memory fences, etc, etc. Your code will run unimpeded on that thread, using the full speed of the core.

  • Solid handling of failure: Actors can be arranged in trees, and if required actors can be set up to automatically fail upwards, destroying each actor and its children as the failure propagates. In addition a Ret will inform the caller if it is dropped, i.e. if the actor it was sent to (or that was storing it) has died, so callers have a way to deal with actor failure too.

  • Virtualization of time: Stakker doesn't know where you got your Instant from, and it doesn't care. So you can make time run faster for tests.

  • Zero overhead: Really zero overhead and as close to the metal as possible, wherever possible.
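
Here is a much-simplified model of the closure-queue idea from the "High efficiency" point above (not Stakker's actual internals, which avoid the boxing shown here by storing the FnOnce closures in a flat memory area): a message is just a deferred method call, with no message enum and no match.

    // A queue where each "message" is a deferred call to a concrete method.
    struct Queue {
        items: Vec<Box<dyn FnOnce()>>,
    }

    impl Queue {
        fn push(&mut self, f: impl FnOnce() + 'static) {
            self.items.push(Box::new(f));
        }
        fn run(&mut self) {
            for f in self.items.drain(..) {
                f();
            }
        }
    }

    struct Light {
        on: bool,
    }

    impl Light {
        fn set(&mut self, on: bool) {
            self.on = on;
            println!("Light on: {}", self.on);
        }
    }

    fn main() {
        let mut queue = Queue { items: Vec::new() };
        // In Stakker the actor sits behind a shared handle; a plain value is
        // used here just to show the shape of a queued call.
        let mut light = Light { on: false };

        // "Sending a message" pushes a closure that calls the method directly;
        // the compiler sees the concrete method and argument and can inline both.
        queue.push(move || light.set(true));

        queue.run();
    }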

History

  • Initial development of qcell happened 2018-2019
  • A simple actor library was developed in Rust in 2019
  • This was redesigned and rewritten as the open-source Stakker crate later in 2019, with first release in Jan-2020
  • As of Feb-2021 Stakker is shipping in a commercial product

Some future possibilities for Stakker

  • Actor coroutines that can be started by the actor itself, and that run alongside the actor with direct safe access to the actor's state. This would allow certain parts of the actor's behaviour to be coded in a sequential way where that suits them. Multiple coroutines could be run for a single actor. Unfortunately it's impossible to implement this on top of async/await, because it needs an 'until_next_yield lifetime. With a planned feature of Rust generators it should be possible.

  • Remote actors, i.e. allow sending calls to actors in other threads or on other machines.

  • Crates to layer Stakker on top of tokio or async_std, and to wrap awaitable objects.

  • Support for offloading CPU-intensive work or I/O work to a threadpool. This would be cleanest if integrated with actor coroutines.

  • Maybe allow Actor<dyn Trait> instead of Actor<Box<dyn Trait>>, if Rust's new support for ManuallyDrop fields in unions turns out to be helpful.