About Stakker

Stakker is a lightweight low-level single-threaded actor runtime for Rust. Some features:

  • It is designed to be layered on top of whatever event loop the user prefers to use, including ones from other languages.
  • Asynchronous calls are addressed directly to individual methods within an actor, rather like Pony behaviours.
  • All calls and argument types are known and statically checked at compile-time, which is very efficient and gives the optimiser a lot of scope.
  • It provides a timer queue for timeouts or delayed calls, a lazy queue to allow batching recent operations, and an idle queue for running a call when nothing else is outstanding.
  • It uses unsafe code by default for added efficiency at runtime. However, enabling the "no-unsafe" cargo feature switches to a fully safe implementation, built with #[forbid(unsafe_code)] to guarantee that no unsafe code is present.

Resources:

Questions not covered in the FAQ can be asked in the GitHub discussion area. Please raise bugs and other issues on the GitHub issues page.

In the future, if more people come to use Stakker, it would be good to have some means of notifying them when new features are being considered, to include them in discussions. This could be a forever-open GitHub issue, or some other low-tech channel. I'm open to suggestions if someone has done this before.

Related crates:

Let me know if you have a crate you'd like added here.

Stakker FAQ

To submit a question, please raise an issue on the Stakker GitHub page.

The cargo features seem confusing. Which ones do I need?

Remember that changing the cargo features does not change the API of the Stakker crate. Rather the features change the implementations that back the API, to support different ways of using Stakker. If you are only using a single Stakker instance in your app, the default features will be fine. If you plan to run more than one instance, then check the features documentation on the front page of the Stakker rustdoc on docs.rs, or else you'll get a panic.

How can I create an ActorOwn<dyn Trait> and call it?

See the actor_of_trait! macro for details of the best current solution, and the Traits Design Notes for alternative solutions and the reasons why Actor<dyn Trait> is not possible right now.

Stakker Design Notes

This section explains the reasoning behind some of the design choices of the Stakker crate.

Origin

The Stakker crate didn't start out as an actor library. Rather it started out as an exploration of how to deliver events to the correct contexts for them to be handled, and doing that in a way that is fully compatible with "the Rust way", i.e. that takes full advantage of Rust's approach to borrowing and ownership.

So effectively this is a low-level actor crate that evolved directly and organically from Rust's fundamental principles, from the ground up.

Addressing events to objects

The most natural way to organize objects or components in core Rust is in tree-like graphs with single ownership and no cross-references or back-references within the tree. However, to manage an event system, events must be delivered to specific objects or components, so some form of reference to those objects is required to support the event system.

In Rust's standard library, multiple ownership or long-lived direct references within a heterogeneous set of objects means using Rc. However this also means giving up compile-time borrowing checks and reverting to RefCell, which does run-time checks instead. Immediately we've lost one of Rust's most important compile-time checks. Investigating how to regain this compile-time check resulted in the qcell crate, which re-enables zero-cost compile-time checks of ownership behind Rc references.

So this means that we can now have both Rc references and safe mutable access to the Rc contents without any run-time checks or run-time cost.
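
As an illustration, here is a minimal sketch of the qcell approach, using qcell's QCell type (see the qcell rustdoc for the full API and the other cell types):

use qcell::{QCell, QCellOwner};
use std::rc::Rc;

fn main() {
    let mut owner = QCellOwner::new();

    // Many Rc references to the same cell may exist...
    let item = Rc::new(QCell::new(&owner, 100u32));
    let item_ref = item.clone();

    // ...but mutable access is checked at compile time against the
    // owner, with no run-time borrow flag and no run-time cost
    *owner.rw(&item) += 1;
    assert_eq!(*owner.ro(&item_ref), 101);
}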

Delivering events

The next problem is how to deliver events. One approach is to deliver an event to the destination as soon as it occurs, as a direct method call. However this results in nested callbacks on the stack, i.e. one event causes an event handler to be called on another object, which in responding to that event causes other events to be generated, calling event handlers in other objects, all before the original event handler has completed. The main problem with this approach is that it can lead to re-entrant calls to the same object, which in other languages can result in very hard-to-understand bugs.

However in Rust there is a bigger problem, because a re-entrant call will require borrowing the same object twice. If we use RefCell, we'll only discover this bug if we're lucky enough to come across it in testing. However if we use qcell instead, there is absolutely no way to construct a program that will execute a re-entrant call on an object. So letting Rust do compile-time borrowing checks on data behind Rc pays off, because it forces us to adopt a design that has none of the problems that are easy to produce in other languages accidentally.

So to avoid borrowing problems and re-entrant calls, event sending and event delivery must be separated, which means that a queue is necessary. The most fundamental queue would be one that stores a list of FnOnce closures to execute later. It's possible to make this efficient by storing the closures in a flat Vec<u8>, which means that no allocations are required to send a message, so long as the queue buffer has grown big enough.
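
As a minimal sketch of the principle (using boxed closures for simplicity; Stakker's real queue instead serializes the closures into a flat buffer to avoid per-call allocations):

// A stand-in for the state passed to each deferred call
struct Runtime;

#[derive(Default)]
struct FnOnceQueue {
    queue: Vec<Box<dyn FnOnce(&mut Runtime)>>,
}

impl FnOnceQueue {
    // Sending a "message" is just pushing a closure
    fn push(&mut self, f: impl FnOnce(&mut Runtime) + 'static) {
        self.queue.push(Box::new(f));
    }

    // Run to completion: executed calls may themselves push more calls
    fn run(&mut self, rt: &mut Runtime) {
        while !self.queue.is_empty() {
            for f in std::mem::take(&mut self.queue) {
                f(rt);
            }
        }
    }
}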

This also demonstrates another pattern in Rust: its rules seem to lead to shallow call-stacks. This is because when a borrow is active, it often restricts access to other things; to regain access to those things means dropping back down the stack again. Also, when borrows are passed deeper and deeper into the code, they tend to become more and more invasive and restrictive, as you end up having to annotate more and more functions and structures with lifetimes.

Using a FnOnce queue to defer operations untangles all of this and means that each operation is run directly from the main loop with the minimum restrictions from the borrow checker.

Becoming an actor system

So at this point we have:

  • Components which can be addressed from anywhere, with ref-counting references to them

  • No access to component state and no possibility to make direct calls to a component from another component, all guaranteed by the borrow checker

  • A queue that allows calls to methods on other components to be deferred to run later

This is so close to an actor system, that we might as well formalize it as one to make it easier to reason about.

However, compared to other actor systems, there are no per-actor message queues, and the 'messages' are actually FnOnce closures which call a method directly on the actor, rather than some arbitrary message structure that needs to be interpreted. So it is much closer to actors as provided by the Pony language, rather than some classical actor system where all the messages are visible and dealt with by hand.

Effectively each actor method on a Stakker actor takes the role of a message type, and the arguments of that method take the role of the data of that message.
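
For example, with a hypothetical light actor, each method plays the role of a differently-typed message:

// "Message" with one data field
call!([light], set_brightness(50));

// "Message" with no data
call!([light], turn_off());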

Actor guarantees

Stakker is not a 100% strict actor system. Due to Rust's interior mutability (e.g. Rc<Cell>, Rc<RefCell>, Arc<Mutex>, etc), it is hard to totally forbid shared mutable state between actors. Also there are cases when shared mutable state may be useful for performance reasons. So Stakker accepts the existence of shared mutable state and also offers the Share type which provides zero-cost compile-time-borrow-checked shared mutable data between actors.

Sharing mutable state breaks the actor model and so must be used with care (if used at all), as it might complicate the coder's ability to reason cleanly about their code. The danger is about the same as using IPC shared memory between processes, or shared buffers between threads. However there are no locking concerns as Rust's borrow checker ensures that the coder has exclusive access whilst they hold a mutable reference to shared state.
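
A minimal sketch of Share usage, assuming a hypothetical Writer actor holding a clone of a Share<Common> created elsewhere (e.g. with Share::new):

use stakker::*;

struct Common {
    buf: Vec<u8>,
}

struct Writer {
    // A hypothetical Reader actor would hold a clone of this too
    shared: Share<Common>,
}

impl Writer {
    fn add(&mut self, cx: CX![], byte: u8) {
        // The borrow is checked at compile time: while this mutable
        // reference exists, cx is locked up, so no other access is possible
        self.shared.rw(cx).buf.push(byte);
    }
}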

The external interface of a group of actors bound together by shared mutable data still follows the actor model. So whilst the actor model is broken within the group, the larger system is still easy to reason about. It's only necessary to take care about interactions within that group.

So this means that it's possible to break up a single actor into many smaller actors where it is cleaner in the code to receive events directly to those actors, and yet they may still work together using shared mutable state if necessary for efficiency reasons.

Remember that the purpose of actors in Stakker is only to provide a context that is capable of receiving events. You can create complex trees and tables of logical components within a single actor, and that is fine. Not everything needs to be an actor. However as soon as an individual component needs to receive and handle events itself (including timers), it's likely that the code will be simplified by splitting it out as a separate actor. The cost of doing so in Stakker is small.

Making Stakker an actor system by default means that most of the code is easy to reason about locally, but those parts that need some extra performance through shared mutable state can still be written efficiently.

So, in Stakker:

  • Rust's guarantees always apply

  • The actor-model guarantees always apply regarding forbidding access to another actor's state or methods. (Not even interior mutability can break this.)

  • But the actor-model assumption of no shared mutable state only applies by default, until the coder intentionally passes sharable data between two actors.

Privilege levels

Another way of looking at the statically-checked cell borrowing approach used by Stakker (i.e. qcell-based borrowing) is to consider all the code associated with the actor system as belonging to one of three privilege levels.

Level 0: Code in the highest level of privilege has access to a &mut Stakker. This is the code outside of the actor system (the main loop and any associated external code that the actor system interfaces to) and the code called by the main loop that handles a queue-deferred action (actor call, return, forward, etc). In terms of statically-checked borrowing, both the Actor-owner and Share-owner are available.

Level 1: Actor methods all run in this level. They have a &mut Core (maybe via a &mut Cx), but have no access to Stakker methods. In terms of statically-checked borrowing, the Actor-owner is unavailable (because it was used to get access to the actor state), but the Share-owner is still free.

Level 2: Methods called on Share objects run in this level, as well as Drop handlers. In fact any code which doesn't accept a &mut Core or &mut Cx argument runs in this level. Neither of the cell owners are available. In the case of a method call on a Share object (e.g. share.rw(cx).method(args)), the &mut Core isn't available because it was used to get access to the share.

If you take a snapshot of the callstack at any point in time you'd find code running in one or more of these levels. At the base of the stack would be the main loop code, in level 0. If an actor method is running then this would be code in level 1. If the actor is calling out to Share code or arbitrary external libraries, this would be code in level 2. The levels form bands on the callstack.

Here are some examples of different things you can do at different levels:

| Level                | 0: main loop | 1: actor methods     | 2: share methods, drop handlers |
|----------------------|--------------|----------------------|---------------------------------|
| Available borrow     | &mut Stakker | &mut Core or &mut Cx | none                            |
| Run the queues       | Yes          | -                    | -                               |
| Run an actor call    | Yes          | -                    | -                               |
| query! an actor      | Yes          | -                    | -                               |
| lazy!, idle!         | Yes          | Yes                  | -                               |
| after!, at!, etc     | Yes          | Yes                  | -                               |
| Access a Share       | Yes          | Yes                  | -                               |
| ret! data to Ret     | Yes          | Yes                  | Yes                             |
| fwd! data to Fwd     | Yes          | Yes                  | Yes                             |
| call! an actor       | Yes          | Yes                  | Yes                             |
| defer using Deferrer | Yes          | Yes                  | Yes                             |

Remember that these levels are statically enforced by the compiler (without any runtime overhead), so there is no way around it in safe code. However note that even in level 2, you can defer an operation, or forward data elsewhere. So the system of "privileges" only stops you doing things right now. It doesn't stop you doing them a little bit later. So it blocks synchronous operations when those could potentially cause issues, but doesn't stop you doing those same things asynchronously.

Also note that even in the internal code of Stakker it's impossible to break these rules. The rules are enforced by the Rust compiler and a tiny bit of code in qcell. If a borrow is performed to get access to an actor's state, then the &mut Stakker borrow is locked up until that actor borrow is released. Similarly a share borrow locks up the &mut Cx or &mut Core. See qcell documentation for more details. So this provides a very strong guarantee of correctness.

Low-level, not medium- or high-level

The aim is that the cost of a deferred inter-actor call should be on the order of a normal direct inter-object call. Obviously we can never get as low as a direct call, but we want to get as close as possible. This means that Stakker is not aiming to replace multi-threaded asynchronous runtimes, multi-threaded asynchronous message systems, inter-thread channels, distributed systems, or anything else like that.

Rather Stakker's aim is to replace a mess of Rc<RefCell<...>> and a tangle of direct and indirect inter-object calls (or any other improvised collection of communicating components in the same thread) with a nice ordered, well-behaved set of actors, easy to reason about and maintain.

Higher-level inter-thread load balancing, work distribution and message passing can be layered on top of Stakker as necessary.

Fast message execution

When an actor call is deferred to the FnOnce queue, for example with the call! macro, the type and target method and any constant arguments are fully known to the Rust compiler. The only variables are the target actor's address and the remaining arguments. Rust can inline and optimise this closure, effectively specializing it to the type and method and the constant arguments provided. So this means that the closure might never even branch to the actor's code, since that might all have been inlined and optimised down. So the queue can execute much faster than any kind of traditional actor messaging system. The call effectively bypasses all the message creation and interpretation, and directly calls (or inlines) the actual actor code that needs to be run.

Similarly, where an arbitrary callback is required (using Fwd or Ret), for example where the type of the target actor is not known to the calling actor, this is handled as two closures. The first closure accepts the arguments in the Fwd signature, and pushes the second closure to the FnOnce queue. The second closure is just like a normal call!, so can be fully optimised down to a specific type and method. The first closure is just some glue that assembles the variable arguments ready for the second closure. So again, this is as direct as you can get, with no superfluous activity.

So again, aligning with Rust's strengths and making full use of Rust's compile-time checks and compile-time knowledge pays off.

Queue execution behaviour

The normal pattern is to introduce one or more events into the system as actor calls pushed onto the main queue, and then to run the queue to completion. This means that the queue is run repeatedly until it is empty. You can imagine that a queued call to the actor may trigger other calls as a consequence, and those might cause other calls, but eventually all the necessary changes will finish propagating through the system of actors, and then things will go quiet again.

Stakker always runs the main queue to completion before doing anything else. This means that if you wish to avoid saturating the CPU, it's necessary to regulate fetching input data or accepting input events. Load regulation does not occur within the actor system.

To aid with this, two additional queues are provided:

  • The first is the "lazy" queue. This is run when the main queue is fully exhausted, but before checking for new I/O events. As an example, it can be used to help batch output together into a single big flush. For example when writing to TCP, executing the main queue might mean several chunks of data being written to a stream's output buffer. Flushing after each write would be inefficient, and flushing after a delay would be too slow. Instead an actor can put the flush call on the lazy queue, and do one big flush once the current batch of processing is complete.

  • The second is the "idle" queue. This is executed only when the thread becomes idle, i.e. the main and lazy queues are empty and there are no new external events that need handling. This may be used to apply back-pressure on an incoming stream, by fetching more data only when there is nothing else to do. When there are several items on the idle queue, they will execute in a round-robin fashion, assuming each call pushes a new call back onto the queue when it executes.
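
Calls are queued with macros that mirror call!. For example, from inside an actor method (a sketch; flush and fetch_more are hypothetical methods on the current actor):

// Defer a single flush until the current batch of main-queue
// processing is complete
lazy!([cx], flush());

// Fetch more input only when there is nothing else to do,
// applying back-pressure
idle!([cx], fetch_more());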

Time virtualization

Actors should use cx.now() to get the current Instant. The current time is provided to Stakker by the external code that is running the event loop.

This has several consequences. For one, a batch of processing will all occur at the same logical time. Another is that the overhead of constantly calling Instant::now() throughout the code is avoided. (Instant::now() uses a Mutex on some platforms.) In addition, when it's necessary to integrate Stakker into non-Rust code, the current time from that code can be used instead of Rust's idea of time.

However a more interesting aspect of this is that it allows time to be virtualized. So you can make time appear to go faster or slower than realtime. You can skip over long sleeps when testing your application, or trigger timeouts when testing an individual actor, without consuming real time to do so. If you have a suitable external tool communicating with your main loop, you can coordinate a group of processes to skip over common sleeps to accelerate any testing that isn't CPU-bound. The Stakker runtime and the application won't know the difference.
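
For example, a test harness might drive the runtime with virtual time like this (a sketch; timer delivery depends only on the now value passed to run):

use stakker::*;
use std::time::{Duration, Instant};

fn main() {
    let mut now = Instant::now();
    let mut stakker = Stakker::new(now);
    let s = &mut stakker;

    // ... create actors and set timers here ...

    // Skip over an hour of virtual time in one-minute steps,
    // without consuming real time
    for _ in 0..60 {
        now += Duration::from_secs(60);
        s.run(now, false); // all timers due by `now` will fire
    }
}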

Event source independence

The core actor runtime crate of Stakker has no dependency on any event system. An interface to mio is provided as the stakker_mio crate, but it should be possible to integrate Stakker into any event system, even ones written in other languages.

All that Stakker requires is that the external event system provides it with one timeout (to wake it when the next Stakker timer expires), and that it delivers events to it as calls pushed onto the Stakker queue.

Maximised single-threaded performance

Each Stakker instance is oriented around running as fast as possible within its own thread, avoiding any synchronization completely unless specifically requested. So a single Stakker instance and all of its associated actors are intentionally limited to running on a single thread. Unless something uses a Waker, there should be no code in Stakker that will cause execution of any kind of a CPU memory fence or other synchronization primitive. So the CPU core can run at full speed.

This means that inter-actor calls can be very fast. With the default features, making or executing an actor call does not require a memory allocation. A closure is added to the end of a queue stored in a flat preallocated buffer, and is then later executed straight out of that buffer. The average call cost varies by load, but overall it is roughly similar to calling things through Rc<RefCell>, once code has been added to handle reentrant calls in a Rc<RefCell>-based solution.

Some example deployment scenarios:

  • When the workload can be comfortably handled by a single core, Stakker works more efficiently than a multi-threaded runtime because there are no synchronization costs. This fits the traditional efficient "main thread" select/epoll non-blocking I/O model used by many internet servers and GUI applications.

  • When the event-driven workload requires more than one core, several Stakker instances (each with their associated actors) may be run, on different cores and/or on different machines. Workload must then be distributed between them at a higher level using sharding or some other form of load balancing.

  • In both cases when there is heavy processing to do, then that processing could be offloaded to a threadpool. In this case the synchronization costs are small compared to the saving in processing costs in the event-driven thread, and typically there will be no contention with other threads as the background processing runs.

Essentially if you're going to scale beyond one machine, you're going to have to solve the problem of work distribution between processing units anyway. So you might as well make your processing units run as fast as possible, which means running each of them on a single thread. Using a certain amount of synchronization between a group of independently-running threads is fine, for example to share common immutable data, but it's best to think hard before introducing shared mutable state between threads, or too much locking, or too much inter-thread messaging. By staying low-level, Stakker forces you to think about that. What is easy is fast, and what is slower takes more effort.

Stakker's approach won't suit all applications, but that is fine. There are other crates to handle the other scenarios. Stakker concentrates on being efficient within its own niche.

Why single-threaded?

Consider the levels of locality that different actor systems operate over, along with the different restrictions on message contents for each. (Note that if actors are intended to migrate, then the restrictions apply to actor state as well.)

Here are some levels with example types to illustrate this:

| Locality    | Message      | Actor ref | Pass data by ref  | Share mutably     |
|-------------|--------------|-----------|-------------------|-------------------|
| Thread      | 'static      | Rc        | Box or Vec        | Rc<Cell> or Share |
| Process     | Send         | Arc       | Box or Vec + Send | Arc<Mutex>        |
| Distributed | serializable | ID        | n/a               | n/a               |

(Note that 'static means "no active borrows to stuff that might go away", which is just Rust's borrow checker making sure you don't crash your process.)

If your whole actor system is going to have a uniform interface, you need to pick your level, commit to it and optimise for it. Doing half of one and half of another might give some of the benefits of both, but it also brings the restrictions of both. This results in implementations having to take "opinionated" positions.

For a multi-threaded actor runtime, you pay the cost of synchronization and then must hope that maxing out as many cores as possible makes up for those costs in the application area of interest. For a distributed actor runtime, you accept the limitations and costs of everything being serializable in order to gain the benefit of distributed execution. However if we choose to limit ourselves to a single thread, then we can avoid those restrictions entirely and stay low-level and fast, free to use non-Send types and so on.

Stakker's design decisions mean that it naturally fits the fully-committed single-thread approach, which it takes full advantage of. Allowing seamless migration of actors between threads and redirection of queued calls to other threads would mean abandoning a lot of the single-thread performance and distinctive features of Stakker. So for now Stakker concentrates on the low-level goal of fast operation within a single thread.

Maybe in the future it might be possible to add some kind of a distributed or inter-thread message-passing layer above existing Stakker actors. There are some questions, though:

  • What kind of application scenario are we targeting?
  • Should we enable actors to migrate? To other threads? To other machines?
  • Should there be proxy actors to redirect calls, or some kind of direct message-sending mechanism?
  • How can we protect local same-thread calls from these overheads?
  • How should we represent references to actors that are on another thread or another machine? With the existing actor-reference types and proxies, or with new types?

There's not just the question of whether it can be done (which obviously it can), but whether it can be done efficiently, and whether the ergonomics can be made natural and comfortable for the coder.

This might be an interesting thing to investigate at some point.

Timer queue

The timer queue contains a list of FnOnce calls to execute at specific instants. What we need from this queue is:

  • To give us the next expiry time
  • To execute items from the front of the queue when time advances
  • To add and delete items

A BinaryHeap might be good for this queue, except for the problem of deletions being O(N). So instead for the moment a BTreeMap is used. The map is partitioned to split off the items to execute when time advances. This should scale much better than a binary heap, especially considering deletions. However a BTreeMap generates a lot of code, so it is likely that a combined N-ary heap (tuned to cache-line size) and Vec to support deletion might perform better. So the underlying implementation will probably change at some point, once various scenarios have been benchmarked.

Apart from fixed timers, there are also "max" and "min" timers, that can be adjusted very cheaply (just a memory write), without having to add and delete timers from the timer queue most of the time.

For fixed timers, the BTreeMap maps a 64-bit key (32-bit wrapping expiry time + 31-bit unique value) to a boxed FnOnce. For min/max timers, the 64-bit key in the map consists of a 32-bit provisional expiry time and a 31-bit slot number. The actual current expiry time which is updated by the calls is kept in a separate array. When the timer expires, the current expiry time is checked, and another timer added back in if necessary.
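
From an actor method, fixed timers are typically added with macros such as after! and at! (a sketch; timeout and tick are hypothetical methods on the current actor):

// Run `timeout()` on this actor 30 seconds from cx.now()
after!(Duration::from_secs(30), [cx], timeout());

// Run `tick()` at a specific Instant
at!(self.deadline, [cx], tick());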

Deferrer

A Deferrer is an object that allows a call to be submitted to the main queue. Mostly the coder does not need to worry about this, because there are ways to submit a call from almost everywhere:

  • In all actor calls, there is the cx context available, which allows calls to be submitted directly to the runtime

  • Each actor reference has access to a Deferrer kept in the actor's external data, which means that it is always possible to submit a call if you have an Actor or ActorOwn reference.

However in the case that you need to submit a call from a drop handler, and that drop handler does not have any actor references available, you may need to obtain a Deferrer and store it in the struct that will be dropped.

Note that a Deferrer takes zero bytes unless either the crate feature "multi-stakker" or "inline-deferrer" is enabled. This is because if there is only ever a single Stakker instance running in the whole process (or thread), we can optimise a Deferrer to use a global variable (or thread-local). Only in the case of needing multiple Stakker instances in a single thread (a much rarer case) does the Deferrer need to be a direct reference, in which case it consumes one usize. So you don't pay the cost unless you need it.
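
A sketch of the drop-handler case described above, assuming the Deferrer was obtained from the runtime beforehand (e.g. via Core's deferrer() accessor; check the rustdoc for the exact call) and stored in the struct:

use stakker::*;

// A struct whose drop handler needs to submit a call, but which holds
// no actor references.  The Deferrer costs zero bytes with the default
// features.
struct Cleanup {
    deferrer: Deferrer,
}

impl Drop for Cleanup {
    fn drop(&mut self) {
        // Push a closure onto the main queue to run later
        self.deferrer.defer(|_stakker| {
            // ... clean-up actions that need the runtime ...
        });
    }
}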

Actor Prep state

An actor may be in one of three states: Prep, Ready and Zombie. The purpose of the Prep state is to allow time for an actor to set itself up before accepting calls. Actors are created instantaneously, but the initialisation call is made asynchronously, so for even the simplest actor, there will be some time where the actor is in the Prep state.

Since actor initialisation is asynchronous and has no time bound, the actor may do quite complex operations in the Prep state, for example attempting to make a connection to a remote server, or calling and receiving responses back from other actors. In this case remaining in the Prep state means that the actor is signalling to the runtime that it is "not yet ready" to accept normal actor calls.

Any actor calls made to it are queued until it enters the Ready state. This simplifies the logic of actors since otherwise the actor would be forced to do its own queuing of requests if it was not yet ready to service them.

To support doing asynchronous calls to other actors in the Prep state, it's possible to create Ret and Fwd instances that call back to Prep state methods instead of normal actor methods.

Signatures of actor methods

Methods that handle calls made to an actor in the Ready state have the following signature, where "...args..." indicates 0 or more additional arguments:

fn method(&mut self, cx: CX![], ...args...) {...}

A &mut self or &self argument gives access to the actor state, and cx gives access to the runtime via the Cx type. Methods that handle calls in the Prep state use the following signature:

fn method(cx: CX![], ...args...) -> Option<Self> {...}

These do not have a self argument because in the Prep state the actor does not yet have a Self value as it is not yet initialised. If the Prep method is ready to put the actor into the Ready state and start handling normal actor calls, it should return a Self value as Some(...). If it is not yet ready then it should return None and make sure that some external callback or timer is active that will guarantee that another Prep method will run in due course, to continue with the preparation of the actor, or to initialise it, or to terminate it with a failure.

Note that methods with any signature other than the ones above are not callable through the actor system. Normal Rust visibility rules apply to the methods, so if these calls need to be accessed outside of the module, they should be marked as pub, pub(super), etc as necessary.

Alternatives to the actor method signature

An actor method in Stakker has this signature:

fn method(&mut self, cx: CX![], ...args...) {...}

There are four things that need to be passed into an actor method:

  • A &mut self reference, to allow direct access to the actor's state
  • A context to allow stopping and failing the actor and getting an Actor<Self> reference
  • A reference to the runtime to allow adding timers, deferring calls and to support borrowing Share instances and so on
  • The arguments to the call

So this is handled as (&mut self, cx: CX![], ...args...), where cx gives access to both the actor's specific context Cx and by auto-deref to the runtime Core. Note that cx: CX![] is used to avoid boilerplate and expands to cx: &mut Cx<'_, Self>.

However, some alternative approaches were considered:

  • Pass just (&mut self, ...args...) and include cx in Self as self.cx. This means storing an extra 8 bytes in every actor struct, wasting memory and forcing a write to memory just for the short time that Cx is required during a call. This seems like a bad idea.

  • Require the coder to put actor methods into separate impl Prep<MyActor> {...} and impl Ready<MyActor> {...} sections, where the Ready wrapper is effectively (&mut MyActor, &mut Cx<MyActor>). If the method self argument is mut self or &mut self then it can be made to auto-deref to &mut MyActor so that the actor state is directly accessible through self as normal, and also offer access to the other functions of Cx through for example self.stop() or self.core.

    The most immediate problem with this is that Rust currently does not permit that impl when Ready and MyActor are in different crates, with the error "cannot define inherent impl for a type outside of the crate where the type is defined". I could find no workaround for this that didn't bring along its own issues.

    This approach gives shorter argument lists and conveniently separates Prep, Ready, and instance methods (making the actor API clearer), at the cost of having two impl sections, and a possible additional overhead for accessing actor state. Also it is less obvious what is happening behind the scenes, since self is overloaded for two (or three) different purposes.

  • Using procedural macros, it would be possible to write the calls any way we want, and transform them into the right form for the compiler. However Stakker intentionally avoids this kind of thing because it is not transparent, i.e. the coder can't see what is going on. Procedural macros in general can generate a huge amount of code behind the scenes without the coder realizing. Really you're no longer writing Rust in this case. So the preference is to keep things explicit and transparent, and use macros only for small regions, where they are necessary to keep things clear, and not to wrap large regions of code.

So, the cx: CX![] approach is kept because it is more explicit, low-level and unabstracted. Everything is exactly what you see: Self access is direct, and self and cx can be used independently as necessary. It's more Rust's style to make things explicit in the code.

Data kept alongside the actor internal state

Due to the borrowing approach, the actor state is split into two parts. The first part is outside the actor cell, and is accessible to any runtime call that has an actor reference:

  • Weak reference count
  • Strong reference count
  • Actor state: Prep, Ready, Zombie
  • Termination notifier Ret<StopCause>
  • A Deferrer instance

The Deferrer is required in order to support dropping the last ActorOwn. It is also used for Fwd and Ret instances calling the actor, and to support call! when only the actor is mentioned. It also means that any drop handler that has access to an actor reference also has a Deferrer available.

Note that Rc is not used to handle the weak and strong references, because we need to keep some data outside the cell that is accessible to a weak reference even if the strong reference count has gone to zero. Also we need to be able to terminate the actor even when there are still strong references.

The second part is inside the actor cell, and so is only accessible when there is a &mut Stakker reference available. So this is not accessible within calls to other actors, where the &mut Stakker reference is occupied by the borrow that enables access to the actor cell for that call. The actor cell contains:

  • For Prep: FnOnce queue to store calls attempted before the actor is Ready
  • For Ready: Self value for the actor

Overhead of an actor

With default features on a 64-bit platform, an actor requires one allocation of 48 + Self bytes for the actor, and a second allocation of 8 bytes for the termination notifier. The details follow:

The table below shows the sizes requested from malloc, so they include the internal data used by the reference-counting implementation.

| Features     | 1000-byte actor | Overhead | Notifier | 0-byte actor |
|--------------|-----------------|----------|----------|--------------|
| default      | 1048            | 48       | 8        | 72           |
| no-unsafe    | 1056            | 56       | 8        | 80           |
| all features | 1080            | 80       | 8        | 104          |

The Overhead column shows the bytes used above the actor's own Self instance size.

The Notifier column shows the bytes required for a simple Ret instance created using ret_to! to notify a parent of the termination of this actor. This is a separate allocation because it is variable-sized in general. Note that the Notifier overhead is optional, as you can use ret_nop! which avoids that allocation, but normally it will be required.

Regarding the 0-byte actor column: The actor's Self structure is in a union with a FnOnce queue, so Self structures smaller than 24 bytes still consume the minimum 24 bytes.

Design of macro argument structure

The design of the macro argument structure, e.g. for call! or fwd_to!, required several attempts before the syntax felt comfortable. One aim was for the syntax within the macro call to be valid if interpreted as Rust syntax, so that rustfmt would format it automatically. Another was for the structure to be reasonably intuitive to understand without having to constantly refer back to documentation. It is much too easy to end up with a list of anonymous fields to fill in, or confusing arguments that appear in some places but not others.

So to give it more structure, the destination being addressed was put in brackets, for example [cx] or [self.other_actor] or [fwd], and things were made to look like actual methods being called as far as possible.

For fwd_to!, it ends up looking something like currying, with the constant arguments given first, and the variable argument types following in a tuple after as, which is also valid Rust syntax to introduce a type, encouraging IDEs to help with type completion if they support that.
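
For example, a Fwd that curries a constant prefix and accepts a String each time it is invoked might look like this (a sketch; logger and log are hypothetical):

// Constant argument first; the variable argument types follow `as`
let fwd = fwd_to!([self.logger], log("net".to_string()) as (String));

// Later, possibly from elsewhere, supply the variable part
fwd!([fwd], "connection established".to_string());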

Another aspect of the macros is that a lot of tuning was done on the order of evaluation of the arguments. Whereas in plain Rust code, you'd often get a borrow-checker error due to mentioning the same variable more than once, in the macro you will often find that you can get away with it due to the macro's internal order of evaluation. All arguments are evaluated in the caller's context before the call is deferred, to keep the code intelligible.

Dropping things to clean up

Stakker maintains the Rust convention of easy clean-up by simply dropping things. If dropping something in the Stakker API doesn't clean things up correctly, then that is probably a bug.

So for example if you drop the last ActorOwn referring to an actor, then the actor will be terminated and the actor's drop handler called. Or if you drop a Ret instance, then None is sent back, which indicates that the message containing the Ret wasn't replied to. Or if you drop a Waker in another thread, the wake handler is informed and the slot released.

The intention is that if you keep to certain simple conventions, then you can rely on drop-based cleanup to take care of all problem situations. For example if you use ActorOwn links in a DAG (e.g. a tree of actors with parents and children), then when one actor fails, the whole tree of actors that it owns will also be cleaned up correctly. This also means that if anything goes wrong in an actor, then calling fail or fail_str should always be a safe way to bail out. The actor and all its children will clean up, and the parent actor will be informed of the failure.

However, if you decide to try and implement some more complicated form of inter-actor ownership that isn't a DAG, perhaps with ActorOwn loops and manual kill calls to do cleanup, then it's your responsibility to make sure that clean-up occurs correctly in all failure modes.

Another issue occurs when your actor allocates internal resources to service another actor's request, and you wish to know if that actor fails in order to release those resources. This can be solved by creating a droppable "guard" object which is passed to the associated actor for it to store. If the actor dies, then the drop handler of your guard runs, which can send a message back to your actor to clean up, via an Actor reference to your actor that it holds.
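
A sketch of such a guard (release is a hypothetical clean-up method; note that call! works even in a drop handler, i.e. privilege level 2, because the actor reference carries a Deferrer):

use stakker::*;

struct ResourceOwner { /* ... resources indexed by id ... */ }

impl ResourceOwner {
    fn release(&mut self, _cx: CX![], id: u32) {
        // ... free the resources associated with `id` ...
    }
}

// Passed to the served actor for it to store; if that actor dies,
// this drop handler runs and notifies the owner
struct ResourceGuard {
    owner: Actor<ResourceOwner>,
    id: u32,
}

impl Drop for ResourceGuard {
    fn drop(&mut self) {
        let id = self.id;
        call!([self.owner], release(id));
    }
}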

Cargo features and safety

By default Stakker uses some unsafe for efficiency. This means it uses implementations that require less memory or less CPU time. However, if you wish, you can enable the "no-unsafe" Cargo feature, and the whole crate will be compiled with #[forbid(unsafe_code)], and fully safe implementations are used instead. You lose some efficiency, but for many applications that would make little difference, so use this if minimising unsafe is important for your project.

There are other cargo features that switch in and out different underlying implementations to optimise for different cases. However the external API of the crate does not change when features are changed. The API should operate identically.

Note that since Cargo features are additive, it's necessary that when more than one crate in a build uses Stakker, the Stakker build should be compatible with all of them, i.e. provide the lowest common denominator. So this typically means using the less optimal implementation.

Note that crates should not enable Stakker cargo features unless they really need them. It should be up to the top-level application to add features as required. Note that even if the top-level application does not use Stakker directly, it is still possible to list the crate as a dependency in order to select cargo features.

Why not allow detached actors?

A detached actor would be one without any ActorOwn owner, which stays alive only so long as another actor references it using Actor, or an outstanding Ret or Fwd callback is active.

The problem is that with only weak Actor references keeping it alive, and no owner to enforce cleanup, it's too easy to create reference cycles. Maybe these can be avoided by careful attention during the initial implementation, but it is too easy to add a few references or callbacks later during maintenance that create a reference cycle, meaning that the actor will never be cleaned up, even outliving the Stakker runtime.

So it seems like a feature that would create foot-guns down the line. Unless a really strongly-motivating case comes along, detached actors will not be allowed for now.

For some of the cases where you might want a detached actor, e.g. a listener spawning child actors to handle incoming connections, this is better handled using ActorOwnSlab, which will handle cleanup correctly when the parent actor terminates.

Why use actors?

If you find that your code has to accept events from more than one direction, and has to react correctly to each event based on the current state, and has to deal with a lot more variations than a "happy path, or fail", then actors provide a convenient way to manage that complexity. There is a reason why a lot of networking is based around state machines! An actor is essentially the implementation of a state machine and the transitions between those states in response to events.

In addition, dividing a larger problem into a set of asynchronously interacting actors means that each small part of the problem can be analysed and understood clearly in isolation, and tested asynchronously independently of the rest of the system.

Also, the abstraction provided by actors naturally allows interacting actors to be separated and run remotely if necessary. Since inter-actor calls are asynchronous, no actor can depend on synchronous responses, so distributed and remote operation comes naturally. The only thing required is some glue to pass the inter-actor calls over a protocol.

However for a long sequence of asynchronous operations that either advance to the next or fail, a sequential "actor coroutine" style might make the code clearer than an event-driven actor style. So adding async/await or generators on top of the actors, to provide something like a coroutine that can be driven forward by actor events is a possible future direction for Stakker, if it can be made efficient.

Stakker Guide

This section provides more practical information on coding with Stakker.

Getting used to the actor way of thinking

If you're accustomed to sequential-style coding, e.g. async/await or Go, maybe the switch to actor-based coding will seem very strange. Despite its simplicity it may take some getting used to.

Remember that in an actor system there are only two principal things to consider:

  • The actor's own state (i.e. the contents of its struct and any other data it has synchronous access to)
  • The incoming messages (aka "events" or "incoming calls")

When you write a behaviour for an actor, i.e. an actor method that handles an incoming "message" or "call", then your focus is very narrow. You consider how the actor should react to the incoming event, considering only the current state of the actor.

Then your reaction to the event will likely be one or more of the following:

  • Update the actor's state
  • Send one or more messages to other actors
  • Set or update timers
  • Call methods outside of the actor system (e.g. send I/O)

Once you have completed your handling of that event, the job is done and you return. Anything that you need to remember to do later will either have been encoded into the state of the actor, or else already put in motion as a timer or an actor call.

This is in some ways similar to some approaches to personal time-management. By keeping the focus narrow, it makes it easier to reason about the problem, one small step at a time.

So this is the essence of how an actor works within an actor system: It simply makes a sequence of small decisions as events come in, one small step at a time. However this is a surprisingly powerful way of handling complexity.

When events may be coming in from various directions, and when using a sequential model of thinking that focuses on the "happy path", it is hard to be sure that all the combinations of asynchronous error conditions and event orderings have been handled correctly. However in the actor model, each event is evaluated in the context of the actor state as it exists at the moment of the event's arrival. The "happy state" for that event has equal weight to all the "error states" in the consideration of the coder.

The other thing that an actor system allows you to do is to have multiple requests active (i.e. "in flight" or "in progress") with various other actors at the same time, if that is required. Those responses might come back quickly or slowly, and will be dealt with one by one as they arrive. Maybe other operations will be triggered as some of them come back, and complete before or after the other responses arrive. All this complexity is handled easily by just considering each incoming message one at a time in the context of the actor state. To do something similar in async/await-style code would be impossibly complicated. That level of asynchronicity is just not easy to handle in a sequential model, but it comes completely naturally in the actor model.

This is not to say that there is no value in sequential coding styles. There are "painfully sequential" problems which can't be broken down into concurrent operations, which can be written very naturally and simply in the sequential style. But when the asynchronous complexity rises, the actor model scales much better to handle it. It is no accident that many low-level network protocols are specified as state machines, and that's essentially what an actor is: The concrete implementation of a state machine.

Finding Stakker-compatible crates

Here are some considerations to help decide whether a crate is usable with Stakker:

  • Non-blocking: If it is going to be called directly from an actor, the crate must never block or sleep, because that would hold up all the work of the whole thread.

  • Data processing only: If it just does data processing when called, and doesn't require any I/O or timers or anything, then it will be straightforward to use. (For example, the regex crate.)

  • Event-loop independent: If it requires I/O but says it can run on top of any event loop, then that is a very good sign. Even crates that may appear dependent on a particular I/O system (mio, Tokio, etc) might still be usable if the protocol handling can be called independently. (For example, the tungstenite crate.)

  • Event-loop providers: If a crate provides an event loop (or the basis for an event loop), for example SDL or mio, then Stakker can almost certainly be run on top of it, so long as the underlying crate can guarantee that Stakker is always called from the same thread.

Where a required crate can't be run on top of Stakker, for example a crate that depends on Tokio, then you can still run it in the same process by running Tokio in another thread and communicating with the Stakker thread via channels.

A possible future extension to Stakker would be to add a simple async/await executor, one that allows a subset of async/await crates to run, for example ones that can process futures or streams passed from the actor runtime, and that don't require direct I/O themselves.

Difference between Ret and Fwd

Both of these types allow specifying a callback or call-forward, usually to an actor method but possibly to some other destination. But there are some important differences:

  • Ret can be called only once, and Fwd may be called many times
  • Ret cannot be cloned, but Fwd has ref-counting and there may be many references to the same Fwd callback
  • Ret can capture "move" types
  • Ret is consumed when it is called
  • Ret notifies the callback if it is dropped without being called

Whilst the names suggest uses of returning or forwarding data, there are no restrictions about where the data is sent. So a Ret may 'return' data back to some other actor than the caller if required.

Detecting delivery failure using Ret

A Ret callback is guaranteed to always be called eventually, even if it is lost and dropped. So this means that if a call is made to an actor that terminates before the call is serviced, any Ret included in the arguments to that call will be dropped and a None response will be sent back.

So any call that requires some action when the message cannot be handled should include a Ret in its arguments, even if it is just a Ret<()> that is called with no arguments on successfully processing the call. The ret_to! macro supports this scenario.

However where it is not important to handle the case where the call is lost, the ret_some_to! macro unwraps the value and ignores the None (dropped) case.
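
A sketch of the acknowledgement pattern, assuming ret_to! delivers the value to the handler as a single Option argument, with None meaning the Ret was dropped (check the ret_to! rustdoc for the exact form):

// In the parent: include a Ret<()> in the call so that loss of the
// message is detected
fn start(&mut self, cx: CX![]) {
    let ret = ret_to!([cx], done() as ());
    call!([self.child], process(ret));
}

fn done(&mut self, _cx: CX![], reply: Option<()>) {
    match reply {
        Some(()) => {} // call was serviced successfully
        None => {}     // child terminated first; take recovery action
    }
}

// In the child: acknowledge by consuming the Ret
fn process(&mut self, _cx: CX![], ret: Ret<()>) {
    // ... handle the request ...
    ret!([ret]);
}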

Handling expected and unexpected actor termination

There are five ways that an actor can terminate. These are enumerated by the StopCause enum:

  • Stopped: Successful termination
  • Failed: Actor terminated itself due to some problem
  • Killed: Actor was killed by another entity
  • Dropped: The last ActorOwn referencing this actor was dropped
  • Lost: This indicates that the connection to a remote actor has been lost. Remote actors are not implemented yet.

Both Failed and Killed have an associated boxed Error. When an actor is created, a notification handler Ret<StopCause> is normally provided. This receives the reason for the actor's termination when it terminates. So usually a parent actor will keep a hold of the ActorOwn for the child actor within its own state, and will receive the termination notification. However other patterns are also possible.

On receiving a termination notification, the parent actor might choose to restart the child, or terminate itself, or take some other action. The parent actor has a free choice on how to handle it. It is possible to downcast the Error to take a different action depending on the type of failure if necessary.
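
For example, a parent's termination handler might look like this (a sketch, assuming the notification arrives as an Option<StopCause> via a ret_to! handler):

fn child_died(&mut self, _cx: CX![], cause: Option<StopCause>) {
    match cause {
        Some(StopCause::Stopped) => {}    // clean shutdown
        Some(StopCause::Failed(_e)) => {} // `_e` is a boxed Error; maybe restart
        Some(StopCause::Killed(_e)) => {} // killed by another entity
        Some(StopCause::Dropped) => {}    // last ActorOwn was dropped
        Some(StopCause::Lost) => {}       // remote connection lost
        None => {}                        // the notifier itself was dropped
    }
}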

Actor<dyn Trait>

If you need to have a group of different actors that all implement the same interface and that can be used interchangeably behind that standard interface, there are several options available. However Actor<dyn Trait> is not one of them, for reasons that will be explained below!

Use a trait on the actor side: Actor<Box<dyn Trait>>

There is a macro actor_of_trait! to support this. This all looks clean and minimal in the source. On the caller side, all they see is a standard-looking actor interface. However compared to a non-trait actor, this adds an extra indirection to all calls due to the Box. Here's an example:

use stakker::*;
use std::time::Instant;

// Trait definition
type Animal = Box<dyn AnimalTrait>;
trait AnimalTrait {
    fn sound(&mut self, cx: CX![Animal]);
}

struct Cat;
impl Cat {
    fn init(_: CX![Animal]) -> Option<Animal> {
        Some(Box::new(Cat))
    }
}
impl AnimalTrait for Cat {
    fn sound(&mut self, _: CX![Animal]) {
        println!("Miaow");
    }
}

struct Dog;
impl Dog {
    fn init(_: CX![Animal]) -> Option<Animal> {
        Some(Box::new(Dog))
    }
}
impl AnimalTrait for Dog {
    fn sound(&mut self, _: CX![Animal]) {
        println!("Woof");
    }
}

pub fn main() {
    let mut stakker = Stakker::new(Instant::now());
    let s = &mut stakker;

    let animal1 = actor_of_trait!(s, Animal, Cat::init(), ret_nop!());
    let animal2 = actor_of_trait!(s, Animal, Dog::init(), ret_nop!());

    let mut list: Vec<Actor<Animal>> = Vec::new();
    list.push(animal1.clone());
    list.push(animal2.clone());

    for a in list {
        call!([a], sound());
    }
    s.run(Instant::now(), false);
}

Use a trait on the caller side: Box<dyn Trait>

This involves wrapping the actors in a trait that forwards calls, and then boxing it to make it dynamic. So this also adds an indirection, but on the caller side. This is more verbose than doing it on the actor side, and the calls don't look like other actor calls. Here's an example:

use stakker::*;
use std::time::Instant;

// External interface of all Animals
trait Animal {
    fn sound(&self);
}

// A particular animal, wraps any actor that implements AnimalActor
struct AnAnimal<T: AnimalActor + 'static>(ActorOwn<T>);
impl<T: AnimalActor + 'static> Animal for AnAnimal<T> {
    fn sound(&self) {
        call!([self.0], sound());
    }
}

// Internal interface of animal actors
trait AnimalActor: Sized {
    fn sound(&self, cx: CX![]);
}

struct Cat;
impl Cat {
    fn init(_: CX![]) -> Option<Self> {
        Some(Self)
    }
}
impl AnimalActor for Cat {
    fn sound(&self, _: CX![]) {
        println!("Miaow");
    }
}

struct Dog;
impl Dog {
    fn init(_: CX![]) -> Option<Self> {
        Some(Self)
    }
}
impl AnimalActor for Dog {
    fn sound(&self, _: CX![]) {
        println!("Woof");
    }
}

fn main() {
    let mut stakker = Stakker::new(Instant::now());
    let s = &mut stakker;

    let animal1 = AnAnimal(actor!(s, Dog::init(), ret_nop!()));
    let animal2 = AnAnimal(actor!(s, Cat::init(), ret_nop!()));

    let mut list: Vec<Box<dyn Animal>> = Vec::new();
    list.push(Box::new(animal1)); // <- dyn coercion occurs here
    list.push(Box::new(animal2)); // <- dyn coercion occurs here

    for a in list {
        a.sound();
    }
    s.run(Instant::now(), false);
}

Use Fwd and ActorOwnAnon

Instead of using a trait, it's also possible to use Fwd to capture the entry point of an arbitrary actor, and to pass that to other actors that only care about the forwarding interface. The extra indirection is also present in this solution, since the call must pass via the Fwd handler. However this is a lot more flexible than traits.

Where you want another actor to not only have a Fwd instance but also to hold the owning reference to the actor, then you can use ActorOwnAnon. That way if that actor dies, the referenced actor dies too. This allows owning the actor without being exposed to the type. So you can keep a Vec<ActorOwnAnon> pointing to different kinds of actors for example.

Why Actor<dyn Trait> can't be supported

Rc<dyn Trait> can be done, so why isn't Actor<dyn Trait> possible?

To enable dyn Trait requires the actor runtime to be changed to use A: ?Sized, where A is the actor's Self type. Unfortunately Rust does not support ?Sized values inside an enum, apparently due to it inhibiting layout optimisations, and Stakker requires an enum to enable switching between the three actor states (Prep, Ready and Zombie). Maybe Rust could have a #[repr(unsizable)] for enums to support this one day, but it doesn't right now.

In addition CoerceUnsized is still unstable at the time of writing. This is the approved way to do the "dyn coercion" which converts an Rc<impl Trait> to an Rc<dyn Trait>. However that can be worked around, I believe. So that isn't the blocker.

Looking at alternative approaches, it seemed like implementing a custom enum in unsafe code might be possible using union, but that is also a dead end due to union only supporting Copy types on stable at present. I have an unsized_enum crate which I believe is sound and could be the basis for Actor<dyn Trait> in Stakker, but I don't want to force it on all Stakker users. I'd like to be able to offer a safe alternative as well. (Update: As of Feb-2021 'union' supports ManuallyDrop which allows ?Sized, so that might offer a better way, although it still requires unsafe.)

So unfortunately it's not possible to do Actor<dyn Trait> right now, and one of the alternatives must be used instead.

Top-level actor template

The following template may be helpful for writing big top-level actors that need to accept configuration and need to be connected up to other actors through Fwd instances.

/// `Widget` actor configuration.  Includes serde deserialization
/// support.
#[derive(Deserialize, Clone)]
#[serde(deny_unknown_fields)]
pub struct WidgetConf {
    // ... configuration values ...
}

/// `Widget` instance callbacks
pub struct WidgetFwds {
    // ... `Fwd` and `Share` values that the actor needs to talk to
    //   other actors and to access any shared resources ...
}

/// `Widget` actor
pub struct Widget {
    conf: WidgetConf,
    fwds: WidgetFwds,
    // ... actor state ...
}

impl Widget {
    /// Initialise the Widget actor
    pub fn init(cx: CX![], conf: WidgetConf, fwds: WidgetFwds) -> Option<Self> {
        // ...
        Some(Self {
            conf,
            fwds,
            // ...
        })
    }

    //... all other actor methods ...
}

Inter-thread communication with Waker

The foundation for inter-thread communication in Stakker is the Waker mechanism. This uses an atomic bitmap-tree, which means that many wakeup events from other threads will be accumulated into a single I/O event on the Stakker thread, using only a small number of atomic operations to recover them, keeping the load on the Stakker thread low.

A Waker will normally be paired with a channel or some other shared mutable state, e.g. data within a Mutex. So the Waker is used to notify a handler in the Stakker thread that it needs to examine the shared state and respond to whatever it finds there.

PipedThread is an example of this, combining a thread, two message pipes and a Waker. This provides a convenient way to handle some very simple scenarios for running heavy or blocking calls in another thread. However it should be straightforward to build other inter-thread communication methods on top of Waker.

One thing to note is that when using channels such as crossbeam, ideally we'd want to only notify the Waker when the channel is empty at the instant that the message is added. However the channel APIs typically don't give us a way to detect this condition. (Perhaps it's not even possible to detect this in some cases due to how the channel is implemented.) Attempting to detect it with an is_empty() call on the channel before adding the message is doomed to intermittent failure due to races. So this means that another thread adding something to a channel for the Stakker thread to pick up must notify the Waker on every message sent. So Waker is designed to make this as cheap as possible. After the first time, it will take only one atomic operation.
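
The sending side might look like this (a sketch; it assumes a Waker created in the Stakker thread and moved to the sending thread along with a crossbeam channel Sender):

use crossbeam_channel::Sender;
use stakker::Waker;

// Runs in the non-Stakker thread: notify on every message sent
fn sender_thread(tx: Sender<String>, waker: Waker) {
    for msg in ["a", "b", "c"] {
        tx.send(msg.to_string()).unwrap();
        waker.wake(); // after the first call, only one atomic op
    }
}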

Roadmap

Currently Stakker is performing well for the applications it is being used in, so there is no urgent need for new features. However that does not mean it is "done". Things will be added gradually as new requirements are discovered and refined.

Here are some near-term plans (6 months):

  • More fully test running on top of a C++ language event loop within another application, e.g. using Stakker in a library with a C++ interface

  • Benchmark timer-queue performance, and then optimise, e.g. changing to an N-ary heap

Waiting for new Rust language features:

  • If Rust one day supports enums or unions containing a DST on one variant (like an Option<dyn Trait> or a MaybeUninit<dyn Trait>), then it would be possible to do Actor<dyn Trait> instead of Actor<Box<dyn Trait>> as at present (see actor_of_trait!).

  • If Rust had better support for working with VTables and fat pointers, then a lot of unsafe code could be eliminated from the flat FnOnceQueue. See: https://github.com/rust-lang/rfcs/pull/2580

  • If Rust supported passing borrows and lifetimes into the generator resume function, and generators were stabilized, then they could be used to implement actor coroutines. See: https://github.com/rust-lang/rust/issues/68923

Further ahead:

  • Write more varied applications with Stakker, and see if that suggests any more features that need adding. If other people start using Stakker, that may also suggest more features.

  • Maybe switch all macros over to procedural macros to allow cx to be picked up automatically from the context, to make code more concise. (macro_rules! hygiene normally forbids this, but procedural macros have different rules.)

  • Look at ways to proxy calls between actors on different threads or different machines, i.e. where a local actor acts as a proxy for a remote actor and forwards calls and responses

  • Investigate writing crates to allow Stakker to be layered on top of other runtimes, e.g. tokio or async_std.

  • Investigate writing a simple async executor on top of Stakker to interface directly to async/await style code in the same thread. However if this is to be done at all it needs to be very low-level and efficient, ideally avoiding any synchronisation, memory fences, atomic operations, mutexes and so on. Otherwise it might be better to run one of the existing executors in another thread instead, and communicate data with channels to keep the Stakker thread lean.

  • Possibly look at making use of generators or async/await to allow writing sequential-style actor coroutines within an actor. This would allow making a call to another actor and receiving the Ret directly in the code. The main difficulty is passing the Cx (with its lifetime) up into the coroutine at each resume, and having it drop at each yield. Because of the low-level nature of Stakker, this needs to be efficient to be a good fit. This will probably require a good deal of experimentation and tweaking to find the right ergonomics, and the best low-level fit. There's no sense in rushing it.

  • Support off-loading CPU-intensive or I/O work to a threadpool. If actor coroutines are implemented, then we could simply mark a block of code as offload! to move it to a threadpool, which would be very convenient.