About Stakker
Stakker is a lightweight low-level single-threaded actor runtime for Rust. Some features:
- It is designed to be layered on top of whatever event loop the user prefers to use, including ones from other languages.
- Asynchronous calls are addressed directly to individual methods within an actor, rather like Pony behaviours.
- All calls and argument types are known and statically checked at compile-time, which is very efficient and gives the optimiser a lot of scope.
- It provides a timer queue for timeouts or delayed calls, a lazy queue to allow batching recent operations, and an idle queue for running a call when nothing else is outstanding.
- Uses `unsafe` by default for added efficiency at runtime. However, enabling the "no-unsafe" cargo feature switches to a fully safe implementation, and builds with `#[forbid(unsafe_code)]` to guarantee that no unsafe code is present.
Resources:
Questions not covered in the FAQ can be asked in the GitHub discussion area. Please raise bugs and other issues on the GitHub issues page.
In future if more people come to use Stakker, it would be good to have some means of notifying those people if new features are being considered, to include them in discussions. This could be a forever-open GitHub issue, or some other low-tech channel. I'm open to suggestions if someone has done this before.
Related crates:
Let me know if you have a crate you'd like added here.
Stakker FAQ
To submit a question, please raise an issue on the Stakker GitHub page.
The cargo features seem confusing. Which ones do I need?
Remember that changing the cargo features does not change the API of the Stakker crate. Rather, the features change the implementations that back the API, to support different ways of using Stakker. If you are only using a single Stakker instance in your app, the default features will be fine. If you plan to run more than one instance, then check the features documentation on the front page of the Stakker rustdoc on docs.rs, or else you'll get a panic.
How can I create an `ActorOwn<dyn Trait>` and call it?
See the `actor_of_trait!` macro for details of the best current solution, and the Traits Design Notes for alternative solutions and the reasons why `Actor<dyn Trait>` is not possible right now.
Stakker Design Notes
This section explains the reasoning behind some of the design choices of the Stakker crate.
Origin
The Stakker crate didn't start out as an actor library. Rather it started out as an exploration of how to deliver events to the correct contexts for them to be handled, and doing that in a way that is fully compatible with "the Rust way", i.e. that takes full advantage of Rust's approach to borrowing and ownership.
So effectively this is a low-level actor crate that evolved directly and organically from Rust's fundamental principles, from the ground up.
Addressing events to objects
The most natural way to organize objects or components in core Rust is in tree-like graphs with single ownership and no cross-references or back-references within the tree. However to manage an event system, events must be delivered to specific objects or components. So some form of references to those objects are required to support the event system.
In Rust's standard library, multiple ownership or long-lived direct references within a heterogeneous set of objects means using `Rc`. However this also means giving up compile-time borrowing checks and reverting to `RefCell`, which does run-time checks instead. Immediately we've lost one of Rust's most important compile-time checks. Investigating how to regain this compile-time check resulted in the qcell crate, which re-enables zero-cost compile-time checks of ownership behind `Rc` references.

So this means that we can now have both `Rc` references and safe mutable access to the `Rc` contents without any run-time checks or run-time cost.
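As a minimal illustration of what qcell enables (using `QCellOwner` directly, outside of Stakker): mutable access to data behind `Rc` is checked at compile time against a separate owner value, with no run-time borrow tracking:

```rust
use qcell::{QCell, QCellOwner};
use std::rc::Rc;

fn main() {
    let mut owner = QCellOwner::new();
    // Two ref-counted cells, both borrow-checked against the same owner
    let a = Rc::new(QCell::new(&owner, 1_u32));
    let b = Rc::new(QCell::new(&owner, 2_u32));
    *owner.rw(&a) += 10; // mutable access, no run-time check
    let sum = *owner.ro(&a) + *owner.ro(&b);
    assert_eq!(sum, 13);
    // A second simultaneous `owner.rw(...)` borrow would fail to compile
}
```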
Delivering events
The next problem is how to deliver events. One approach is to deliver an event to the destination as soon as it occurs, as a direct method call. However this results in nested callbacks on the stack, i.e. one event causes an event handler to be called on another object, which in responding to that event causes other events to be generated, calling event handlers in other objects, all before the original event handler has completed. The main problem with this approach is that it can lead to re-entrant calls to the same object, which in other languages can result in bugs that are very hard to understand.
However in Rust there is a bigger problem, because a re-entrant call will require borrowing the same object twice. If we use `RefCell`, we'll only discover this bug if we're lucky enough to come across it in testing. However if we use qcell instead, there is absolutely no way to construct a program that will execute a re-entrant call on an object. So letting Rust do compile-time borrowing checks on data behind `Rc` pays off, because it forces us to adopt a design that has none of the problems that are easy to produce in other languages accidentally.
So to avoid borrowing problems and re-entrant calls, event sending and event delivery must be separated, which means that a queue is necessary. The most fundamental queue would be one that stores a list of `FnOnce` closures to execute later. It's possible to make this efficient by storing the closures in a flat `Vec<u8>`, which means that no allocations are required to send a message, so long as the queue buffer has grown big enough.
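Here is a conceptual sketch of such a queue, using boxed closures for simplicity where Stakker's real queue flattens the closures into a `Vec<u8>`:

```rust
// Sketch only: deferred FnOnce closures, run later from the main loop.
// Each closure receives the queue again so it can defer further calls,
// roughly mirroring how Stakker's queue items are run with access to
// the runtime.
struct Queue(Vec<Box<dyn FnOnce(&mut Queue)>>);

impl Queue {
    fn push(&mut self, f: impl FnOnce(&mut Queue) + 'static) {
        self.0.push(Box::new(f));
    }

    // Run to completion: executing an item may push further items
    fn run(&mut self) {
        while !self.0.is_empty() {
            for f in std::mem::take(&mut self.0) {
                f(self);
            }
        }
    }
}
```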
This also demonstrates another pattern in Rust: Rust's rules seem to lead to shallow call-stacks. This is because when a borrow is active, it often restricts access to other things. To regain access to those things means dropping back down the stack. Also, when borrows are passed deeper and deeper into the code, they become more and more invasive and restrictive, as you end up having to annotate more and more functions and structures with lifetimes.
Using a `FnOnce` queue to defer operations untangles all of this and means that each operation is run directly from the main loop with the minimum restrictions from the borrow checker.
Becoming an actor system
So at this point we have:

- Components which can be addressed from anywhere, with ref-counting references to them
- No access to component state and no possibility to make direct calls to a component from another component, all guaranteed by the borrow checker
- A queue that allows calls to methods on other components to be deferred to run later

This is so close to an actor system that we might as well formalize it as one to make it easier to reason about.
However, compared to other actor systems, there are no per-actor message queues, and the 'messages' are actually `FnOnce` closures which call a method directly on the actor, rather than some arbitrary message structure that needs to be interpreted. So it is much closer to actors as provided by the Pony language than to some classical actor system where all the messages are visible and dealt with by hand.
Effectively each actor method on a Stakker actor takes the role of a message type, and the arguments of that method take the role of the data of that message.
Actor guarantees
Stakker is not a 100% strict actor system. Due to Rust's interior mutability (e.g. `Rc<Cell>`, `Rc<RefCell>`, `Arc<Mutex>`, etc.), it is hard to totally forbid shared mutable state between actors. Also there are cases when shared mutable state may be useful for performance reasons. So Stakker accepts the existence of shared mutable state and also offers the `Share` type, which provides zero-cost compile-time-borrow-checked shared mutable data between actors.
Sharing mutable state breaks the actor model and so must be used with care (if used at all), as it might complicate the coder's ability to reason cleanly about their code. The danger is about the same as using IPC shared memory between processes, or shared buffers between threads. However there are no locking concerns as Rust's borrow checker ensures that the coder has exclusive access whilst they hold a mutable reference to shared state.
The external interface of a group of actors bound together by shared mutable data still follows the actor model. So whilst the actor model is broken within the group, the larger system is still easy to reason about. It's only necessary to take care about interactions within that group.
So this means that it's possible to break up a single actor into many smaller actors where it is cleaner in the code to receive events directly to those actors, and yet they may still work together using shared mutable state if necessary for efficiency reasons.
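As a sketch of how that looks (with a hypothetical `Hits` counter shared between worker actors): the `Share` is borrowed through `cx`, so exclusive access is checked at compile time:

```rust
use stakker::*;

// Hypothetical state shared between several worker actors
struct Hits(u64);

struct Worker {
    hits: Share<Hits>,
}

impl Worker {
    fn init(_: CX![], hits: Share<Hits>) -> Option<Self> {
        Some(Self { hits })
    }

    fn work(&mut self, cx: CX![]) {
        // `rw` gives `&mut Hits`; the borrow ends with the statement,
        // so no locks and no run-time checks are involved
        self.hits.rw(cx).0 += 1;
    }
}
```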
Remember that the purpose of actors in Stakker is only to provide a context that is capable of receiving events. You can create complex trees and tables of logical components within a single actor, and that is fine. Not everything needs to be an actor. However as soon as an individual component needs to receive and handle events itself (including timers), it's likely that the code will be simplified by splitting it out as a separate actor. The cost of doing so in Stakker is small.
Making Stakker an actor system by default means that most of the code is easy to reason about locally, but those parts that need some extra performance through shared mutable state can still be written efficiently.
So, in Stakker:

- Rust's guarantees always apply
- The actor-model guarantees always apply regarding forbidding access to another actor's state or methods. (Not even interior mutability can break this.)
- But the actor-model assumption of no shared mutable state only applies by default, until the coder intentionally passes sharable data between two actors.
Privilege levels
Another way of looking at the statically-checked cell borrowing approach used by Stakker (i.e. qcell-based borrowing) is to consider all the code associated with the actor system as belonging to one of three privilege levels.
Level 0: Code in the highest level of privilege has access to a `&mut Stakker`. This is the code outside of the actor system (the main loop and any associated external code that the actor system interfaces to) and the code called by the main loop that handles a queue-deferred action (actor call, return, forward, etc). In terms of statically-checked borrowing, both the `Actor`-owner and `Share`-owner are available.
Level 1: Actor methods all run in this level. They have a `&mut Core` (maybe via a `&mut Cx`), but have no access to `Stakker` methods. In terms of statically-checked borrowing, the `Actor`-owner is unavailable (because it was used to get access to the actor state), but the `Share`-owner is still free.
Level 2: Methods called on `Share` objects run in this level, as well as `Drop` handlers. In fact any code which doesn't accept a `&mut Core` or `&mut Cx` argument runs in this level. Neither of the cell owners is available. In the case of a method call on a `Share` object (e.g. `share.rw(cx).method(args)`), the `&mut Core` isn't available because it was used to get access to the share.
If you take a snapshot of the callstack at any point in time, you'd find code running in one or more of these levels. At the base of the stack would be the main loop code, in level 0. If an actor method is running then this would be code in level 1. If the actor is calling out to `Share` code or arbitrary external libraries, this would be code in level 2. The levels form bands on the callstack.
Here are some examples of different things you can do at different levels:
Level | 0: main loop | 1: actor methods | 2: share methods, drop handlers
---|---|---|---
Available borrow | `&mut Stakker` | `&mut Core` or `&mut Cx` | none
Run the queues | Yes | - | -
Run an actor call | Yes | - | -
`query!` an actor | Yes | - | -
`lazy!`, `idle!` | Yes | Yes | -
`after!`, `at!`, etc | Yes | Yes | -
Access a `Share` | Yes | Yes | -
`ret!` data to `Ret` | Yes | Yes | Yes
`fwd!` data to `Fwd` | Yes | Yes | Yes
`call!` an actor | Yes | Yes | Yes
Defer using `Deferrer` | Yes | Yes | Yes
Remember that these levels are statically enforced by the compiler (without any runtime overhead), so there is no way around it in safe code. However note that even in level 2, you can defer an operation, or forward data elsewhere. So the system of "privileges" only stops you doing things right now. It doesn't stop you doing them a little bit later. So it blocks synchronous operations when those could potentially cause issues, but doesn't stop you doing those same things asynchronously.
Also note that even in the internal code of Stakker it's impossible to break these rules. The rules are enforced by the Rust compiler and a tiny bit of code in qcell. If a borrow is performed to get access to an actor's state, then the `&mut Stakker` borrow is locked up until that actor borrow is released. Similarly a share borrow locks up the `&mut Cx` or `&mut Core`. See the qcell documentation for more details. So this provides a very strong guarantee of correctness.
Low-level, not medium- or high-level
The aim is that the cost of a deferred inter-actor call should be on the order of a normal direct inter-object call. Obviously it can never be as cheap as a direct call, but we want to get as close as possible. This means that Stakker is not aiming to replace multi-threaded asynchronous runtimes, multi-threaded asynchronous message systems, inter-thread channels, distributed systems, or anything else like that.
Rather, Stakker's aim is to replace a mess of `Rc<RefCell<...>>` and a tangle of direct and indirect inter-object calls (or any other improvised collection of communicating components in the same thread) with a nicely ordered, well-behaved set of actors, easy to reason about and maintain.
Higher-level inter-thread load balancing, work distribution and message passing can be layered on top of Stakker as necessary.
Fast message execution
When an actor call is deferred to the `FnOnce` queue, for example with the `call!` macro, the type and target method and any constant arguments are fully known to the Rust compiler. The only variables are the target actor's address and the remaining arguments. Rust can inline and optimise this closure, effectively specializing it to the type and method and the constant arguments provided. So this means that the closure might never even branch to the actor's code, since that might all have been inlined and optimised down. So the queue can execute much faster than any kind of traditional actor messaging system. The call effectively bypasses all the message creation and interpretation, and directly calls (or inlines) the actual actor code that needs to be run.
Similarly, where an arbitrary callback is required (using `Fwd` or `Ret`), for example where the type of the target actor is not known to the calling actor, this is handled as two closures. The first closure accepts the arguments in the `Fwd` signature, and pushes the second closure onto the `FnOnce` queue. The second closure is just like a normal `call!`, so can be fully optimised down to a specific type and method. The first closure is just some glue that assembles the variable arguments ready for the second closure. So again, this is as direct as you can get, with no superfluous activity.
So again, aligning with Rust's strengths and making full use of Rust's compile-time checks and compile-time knowledge pays off.
Queue execution behaviour
The normal pattern is to introduce one or more events into the system as actor calls pushed onto the main queue, and then to run the queue to completion. This means that the queue is run repeatedly until it is empty. You can imagine that a queued call to the actor may trigger other calls as a consequence, and those might cause other calls, but eventually all the necessary changes will finish propagating through the system of actors, and then things will go quiet again.
Stakker always runs the main queue to completion before doing anything else. This means that if you wish to avoid saturating the CPU, it's necessary to regulate fetching input data or accepting input events. Load regulation does not occur within the actor system.
To aid with this, two additional queues are provided:
- The first is the "lazy" queue. This is run when the main queue is fully exhausted, but before checking for new I/O events. As an example, it can be used to help batch output together into a single big flush. For example when writing to TCP, executing the main queue might mean several chunks of data being written to a stream's output buffer. Flushing after each write would be inefficient, and flushing after a delay would be too slow. Instead an actor can put the flush call on the lazy queue, and do one big flush once the current batch of processing is complete (see the sketch after this list).
- The second is the "idle" queue. This is executed only when the thread becomes idle, i.e. the main and lazy queues are empty and there are no new external events that need handling. This may be used to apply back-pressure on an incoming stream, by fetching more data only when there is nothing else to do. When there are several items on the idle queue, they will execute in a round-robin fashion, assuming each call pushes a new call back onto the queue when it executes.
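For illustration, here is a sketch of the lazy-flush pattern for a hypothetical buffered `Writer` actor, assuming `lazy!` takes the same `[cx], method(args)` form as `call!`:

```rust
use stakker::*;

// Hypothetical actor that batches writes into one flush per queue run
struct Writer {
    buf: Vec<u8>,
    flush_queued: bool,
}

impl Writer {
    fn init(_: CX![]) -> Option<Self> {
        Some(Self { buf: Vec::new(), flush_queued: false })
    }

    fn write(&mut self, cx: CX![], data: Vec<u8>) {
        self.buf.extend_from_slice(&data);
        if !self.flush_queued {
            self.flush_queued = true;
            // Queue one flush to run after the main queue is exhausted
            lazy!([cx], flush());
        }
    }

    fn flush(&mut self, _cx: CX![]) {
        self.flush_queued = false;
        // ... write out `self.buf` to the stream ...
        self.buf.clear();
    }
}
```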
Time virtualization
Actors should use `cx.now()` to get the current `Instant`. The current time is provided to Stakker by the external code that is running the event loop.

This has several consequences. For one, a batch of processing will all occur at the same logical time. Another is that the overhead of constantly calling `Instant::now()` throughout the code is avoided. (`Instant::now()` uses a `Mutex` on some platforms.) In addition, when it's necessary to integrate Stakker into non-Rust code, the current time from that code can be used instead of Rust's idea of time.
However a more interesting aspect of this is that it allows time to be virtualized. So you can make time appear to go faster or slower than realtime. You can skip over long sleeps when testing your application, or trigger timeouts when testing an individual actor, without consuming real time to do so. If you have a suitable external tool communicating with your main loop, you can coordinate a group of processes to skip over common sleeps to accelerate any testing that isn't CPU-bound. The Stakker runtime and the application won't know the difference.
Event source independence
The core actor runtime crate of Stakker has no dependency on any event system. An interface to mio is provided as the stakker_mio crate, but it should be possible to integrate Stakker into any event system, even ones written in other languages.
All that Stakker requires is that the external event system provides it with one timeout (to wake it when the next Stakker timer expires), and that it delivers events to it as calls pushed onto the Stakker queue.
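For a thread with no I/O of its own, the glue can be as small as the following sketch, modelled on the main-loop example in the Stakker rustdoc (a real integration would poll its event system instead of sleeping):

```rust
use stakker::Stakker;
use std::time::{Duration, Instant};

fn main_loop(mut stakker: Stakker) {
    stakker.run(Instant::now(), false);
    while stakker.not_shutdown() {
        // Sleep until the next timer is due (capped at 10s), then let
        // Stakker process expired timers and run its queues to completion
        let maxdur =
            stakker.next_wait_max(Instant::now(), Duration::from_secs(10), false);
        std::thread::sleep(maxdur);
        stakker.run(Instant::now(), false);
    }
}
```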
Maximised single-threaded performance
Each Stakker instance is oriented around running as fast as possible within its own thread, avoiding any synchronization completely unless specifically requested. So a single Stakker instance and all of its associated actors are intentionally limited to running on a single thread. Unless something uses a `Waker`, there should be no code in Stakker that will cause execution of any kind of CPU memory fence or other synchronization primitive. So the CPU core can run at full speed.

This means that inter-actor calls can be very fast. With the default features, making or executing an actor call does not require a memory allocation. A closure is added to the end of a queue stored in a flat preallocated buffer, and is then later executed straight out of that buffer. The average call cost varies by load, but overall it is roughly similar to calling things through `Rc<RefCell>`, once code has been added to handle reentrant calls in an `Rc<RefCell>`-based solution.
Some example deployment scenarios:
- When the workload can be comfortably handled by a single core, Stakker works more efficiently than a multi-threaded runtime because there are no synchronization costs. This fits the traditional efficient "main thread" select/epoll non-blocking I/O model used by many internet servers and GUI applications.
- When the event-driven workload requires more than one core, several Stakker instances (each with their associated actors) may be run, on different cores and/or on different machines. Workload must then be distributed between them at a higher level using sharding or some other form of load balancing.
- In both cases, when there is heavy processing to do, that processing could be offloaded to a threadpool. In this case the synchronization costs are small compared to the saving in processing costs in the event-driven thread, and typically there will be no contention with other threads as the background processing runs.
Essentially if you're going to scale beyond one machine, you're going to have to solve the problem of work distribution between processing units anyway. So you might as well make your processing units run as fast as possible, which means running each of them on a single thread. Using a certain amount of synchronization between a group of independently-running threads is fine, for example to share common immutable data, but it's best to think hard before introducing shared mutable state between threads, or too much locking, or too much inter-thread messaging. By staying low-level, Stakker forces you to think about that. What is easy is fast, and what is slower takes more effort.
Stakker's approach won't suit all applications, but that is fine. There are other crates to handle the other scenarios. Stakker concentrates on being efficient within its own niche.
Why single-threaded?
Consider the levels of locality that different actor systems operate over, along with the different restrictions on message contents for each. (Note that if actors are intended to migrate, then the restrictions apply to actor state as well.)
Here are some levels with example types to illustrate this:
Locality | Message | Actor ref | Pass data by ref | Share mutably
---|---|---|---|---
Thread | `'static` | `Rc` | `Box` or `Vec` | `Rc<Cell>` or `Share`
Process | `Send` | `Arc` | `Box` or `Vec` + `Send` | `Arc<Mutex>`
Distributed | serializable | ID | n/a | n/a
(Note that `'static` means "no active borrows to stuff that might go away", which is just Rust's borrow checker making sure you don't crash your process.)
If your whole actor system is going to have a uniform interface, you need to pick your level, commit to it and optimise for it. Doing half of one and half of another might give some of the benefits of both, but it also brings the restrictions of both. This results in implementations having to take "opinionated" positions.
For a multi-threaded actor runtime, you pay the cost of synchronization and then must hope that maxing out as many cores as possible makes up for those costs in the application area of interest. For a distributed actor runtime, you accept the limitations and costs of everything being serializable in order to gain the benefit of distributed execution. However if we choose to limit ourselves to a single thread, then we can avoid those restrictions entirely and stay low-level and fast, free to use non-`Send` types and so on.
Stakker's design decisions mean it naturally fits the fully-committed single-thread approach, which it takes full advantage of. Allowing seamless migration of actors between threads and redirection of queued calls to other threads would mean abandoning a lot of the single-thread performance and distinctive features of Stakker. So for now Stakker concentrates on the low-level goal of fast operation within a single thread.
In the future it might be possible to add some kind of distributed or inter-thread message-passing layer above existing Stakker actors. There are some questions, though:
- What kind of application scenario are we targeting?
- Should we enable actors to migrate? To other threads? To other machines?
- Should there be proxy actors to redirect calls, or some kind of direct message-sending mechanism?
- How can we protect local same-thread calls from these overheads?
- How should we represent references to actors that are on another thread or another machine? With the existing actor-reference types and proxies, or with new types?
There's not just the question of whether it can be done (which obviously it can), but whether it can be done efficiently, and whether the ergonomics can be made natural and comfortable for the coder.
This might be an interesting thing to investigate at some point.
Timer queue
The timer queue contains a list of `FnOnce` calls to execute at specific instants. What we need from this queue is:

- To give us the next expiry time
- To execute items from the front of the queue when time advances
- To add and delete items

A `BinaryHeap` might be good for this queue, except for the problem of deletions being O(N). So instead for the moment a `BTreeMap` is used. The map is partitioned to split off the items to execute when time advances. This should scale much better than a binary heap, especially considering deletions. However a `BTreeMap` generates a lot of code, so it is likely that a combined N-ary heap tuned to cache line size plus a `Vec` to support deletion might perform better. So the underlying implementation will probably change at some point, once various scenarios have been benchmarked.
Apart from fixed timers, there are also "max" and "min" timers, that can be adjusted very cheaply (just a memory write), without having to add and delete timers from the timer queue most of the time.
For fixed timers, the `BTreeMap` maps a 64-bit key (32-bit wrapping expiry time + 31-bit unique value) to a boxed `FnOnce`. For min/max timers, the 64-bit key in the map consists of a 32-bit provisional expiry time and a 31-bit slot number. The actual current expiry time, which is updated by the calls, is kept in a separate array. When the timer expires, the current expiry time is checked, and another timer is added back in if necessary.
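As an illustration of the key layout (not Stakker's actual code, and glossing over how wrapping expiry times are ordered relative to the current time):

```rust
// Pack a 32-bit wrapping expiry time and a 31-bit unique value or slot
// number into one 64-bit BTreeMap key, ordered by expiry time first
fn timer_key(expiry: u32, unique: u32) -> u64 {
    debug_assert!(unique < (1 << 31));
    ((expiry as u64) << 31) | (unique as u64)
}
```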
Deferrer
A `Deferrer` is an object that allows a call to be submitted to the main queue. Mostly the coder does not need to worry about this, because there are ways to submit a call from almost everywhere:

- In all actor calls, there is the `cx` context available, which allows calls to be submitted directly to the runtime
- Each actor reference has access to a `Deferrer` kept in the actor's external data, which means that it is always possible to submit a call if you have an `Actor` or `ActorOwn` reference.
However in the case that you need to submit a call from a drop handler, and that drop handler does not have any actor references available, you may need to obtain a `Deferrer` and store it in the struct that will be dropped.
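A sketch of that case follows, assuming (per the rustdoc) that `Deferrer::defer` queues a `FnOnce(&mut Stakker)` closure onto the main queue:

```rust
use stakker::*;

// Hypothetical guard held outside the actor system; when dropped, it
// queues a call via its stored Deferrer rather than calling anything
// synchronously
struct Guard {
    deferrer: Deferrer,
}

impl Drop for Guard {
    fn drop(&mut self) {
        // Assumption: defer() takes a FnOnce(&mut Stakker) closure
        self.deferrer.defer(|_stakker| {
            // runs later from the main queue, in level 0
        });
    }
}
```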
Note that a `Deferrer` takes zero bytes unless either the crate feature "multi-stakker" or "inline-deferrer" is enabled. This is because if there is only ever a single `Stakker` instance running in the whole process (or thread), we can optimise a `Deferrer` to use a global variable (or thread-local). Only in the case of needing multiple `Stakker` instances in a single thread (a much rarer case) does the `Deferrer` need to be a direct reference, in which case it consumes one `usize`. So you don't pay the cost unless you need it.
Actor Prep state
An actor may be in one of three states: Prep, Ready and Zombie. The purpose of the Prep state is to allow time for an actor to set itself up before accepting calls. Actors are created instantaneously, but the initialisation call is made asynchronously, so for even the simplest actor, there will be some time where the actor is in the Prep state.
Since actor initialisation is asynchronous and has no time bound, the actor may do quite complex operations in the Prep state, for example attempting to make a connection to a remote server, or calling and receiving responses back from other actors. In this case remaining in the Prep state means that the actor is signalling to the runtime that it is "not yet ready" to accept normal actor calls.
Any actor calls made to it are queued until it enters the Ready state. This simplifies the logic of actors since otherwise the actor would be forced to do its own queuing of requests if it was not yet ready to service them.
To support making asynchronous calls to other actors in the Prep state, it's possible to create `Ret` and `Fwd` instances that call back to Prep-state methods instead of normal actor methods.
Signatures of actor methods
Methods that handle calls made to an actor in the Ready state have the following signature, where "...args..." indicates 0 or more additional arguments:
```
fn method(&mut self, cx: CX![], ...args...) {...}
```
A `&mut self` or `&self` argument gives access to the actor state, and `cx` gives access to the runtime via the `Cx` type. Methods that handle calls in the Prep state use the following signature:
```
fn method(cx: CX![], ...args...) -> Option<Self> {...}
```
These do not have a `self` argument because in the Prep state the actor does not yet have a `Self` value, as it is not yet initialised. If the Prep method is ready to put the actor into the Ready state and start handling normal actor calls, it should return a `Self` value as `Some(...)`. If it is not yet ready then it should return `None` and make sure that some external callback or timer is active that will guarantee that another Prep method will run in due course, to continue with the preparation of the actor, or to initialise it, or to terminate it with a failure.
Note that methods with any signature other than the ones above are not callable through the actor system. Normal Rust visibility rules apply to the methods, so if these calls need to be accessed outside of the module, they should be marked as `pub`, `pub(super)`, etc. as necessary.
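Putting the two signatures together, a minimal actor looks like this (adapted from the introductory example in the Stakker documentation):

```rust
use stakker::*;
use std::time::Instant;

struct Light {
    start: Instant,
    on: bool,
}

impl Light {
    // Prep-state method: no `self`, returns Option<Self>
    pub fn init(cx: CX![]) -> Option<Self> {
        Some(Self { start: cx.now(), on: false })
    }

    // Ready-state method: `&mut self` plus the `cx` context
    pub fn set(&mut self, cx: CX![], on: bool) {
        self.on = on;
        let t = cx.now() - self.start;
        println!("{:04}.{:03} Light on: {}", t.as_secs(), t.subsec_millis(), on);
    }
}
```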
Alternatives to the actor method signature
An actor method in Stakker has this signature:

```
fn method(&mut self, cx: CX![], ...args...) {...}
```
There are four things that need to be passed into an actor method:

- A `&mut self` reference, to allow direct access to the actor's state
- A context to allow stopping and failing the actor and getting an `Actor<Self>` reference
- A reference to the runtime to allow adding timers, deferring calls and to support borrowing `Share` instances and so on
- The arguments to the call
So this is handled as `(&mut self, cx: CX![], ...args...)`, where `cx` gives access to both the actor's specific context `Cx` and, by auto-deref, to the runtime `Core`. Note that `cx: CX![]` is used to avoid boilerplate and expands to `cx: &mut Cx<'_, Self>`.
However, some alternative approaches were considered:
- Pass just `(&mut self, ...args...)` and include `cx` in `Self` as `self.cx`. This means storing an extra 8 bytes in every actor struct, wasting memory and forcing a write to memory just for the short time that `Cx` is required during a call. This seems like a bad idea.
- Require the coder to put actor methods into separate `impl Prep<MyActor> {...}` and `impl Ready<MyActor> {...}` sections, where the `Ready` wrapper is effectively `(&mut MyActor, &mut Cx<MyActor>)`. If the method self argument is `mut self` or `&mut self` then it can be made to auto-deref to `&mut MyActor` so that the actor state is directly accessible through `self` as normal, and also offer access to the other functions of `Cx` through for example `self.stop()` or `self.core`.

  The most immediate problem with this is that Rust currently does not permit that `impl` when `Ready` and `MyActor` are in different crates, with the error "cannot define inherent `impl` for a type outside of the crate where the type is defined". I could find no workaround for this that didn't bring along its own issues.

  This approach gives shorter argument lists and conveniently separates Prep, Ready, and instance methods (making the actor API clearer), at the cost of having two `impl` sections, and a possible additional overhead for accessing actor state. Also it is less obvious what is happening behind the scenes, since `self` is overloaded for two (or three) different purposes.
- Using procedural macros, it would be possible to write the calls any way we want, and transform them into the right form for the compiler. However Stakker intentionally avoids this kind of thing because it is not transparent, i.e. the coder can't see what is going on. Procedural macros in general can generate a huge amount of code behind the scenes without the coder realizing. Really you're no longer writing Rust in this case. So the preference is to keep things explicit and transparent, and use macros only for small regions, where they are necessary to keep things clear, and not to wrap large regions of code.
So, the `cx: CX![]` approach is kept because it is more explicit, low-level and unabstracted. Everything is exactly what you see: `Self` access is direct, and `self` and `cx` can be used independently as necessary. It's more Rust's style to make things explicit in the code.
Data kept alongside the actor internal state
Due to the borrowing approach, the actor state is split into two parts. The first part is outside the actor cell, and is accessible to any runtime call that has an actor reference:
- Weak reference count
- Strong reference count
- Actor state: Prep, Ready, Zombie
- Termination notifier `Ret<StopCause>`
- A `Deferrer` instance
The `Deferrer` is required in order to support dropping the last `ActorOwn`. It is also used for `Fwd` and `Ret` instances calling the actor, and to support `call!` when only the actor is mentioned. It also means that any drop handler that has access to an actor reference also has a `Deferrer` available.
Note that `Rc` is not used to handle the weak and strong references, because we need to keep some data outside the cell that is accessible to a weak reference even if the strong reference count has gone to zero. Also we need to be able to terminate the actor even when there are still strong references.
The second part is inside the actor cell, and so is only accessible when there is a `&mut Stakker` reference available. So this is not accessible within calls to other actors, where the `&mut Stakker` reference is occupied by the borrow that enables access to the actor cell for that call. The actor cell contains:

- For Prep: a `FnOnce` queue to store calls attempted before the actor is Ready
- For Ready: the `Self` value for the actor
Overhead of an actor
With default features on a 64-bit platform, an actor requires one allocation of 48 + `Self` bytes for the actor, and a second allocation of 8 bytes for the termination notifier. The details follow:

The table below shows the sizes requested from `malloc`, so these include the internal data used by the reference-counting implementation.
Features | 1000-byte actor | Overhead | Notifier | 0-byte actor
---|---|---|---|---
default | 1048 | 48 | 8 | 72
no-unsafe | 1056 | 56 | 8 | 80
all features | 1080 | 80 | 8 | 104
The Overhead column shows the bytes used above the actor's own `Self` instance size. The Notifier column shows the bytes required for a simple `Ret` instance created using `ret_to!` to notify a parent of the termination of this actor. This is a separate allocation because it is variable-sized in general. Note that the Notifier overhead is optional, as you can use `ret_nop!` which avoids that allocation, but normally it will be required.
Regarding the 0-byte actor column: the actor's `Self` structure is in a union with a `FnOnce` queue, so `Self` structures smaller than 24 bytes still consume the minimum 24 bytes.
Design of macro argument structure
The design of the macro argument structure, e.g. for `call!` or `fwd_to!`, required several attempts before the syntax felt comfortable. One aim was for the syntax within the macro call to be valid if interpreted as Rust syntax, so that `rustfmt` would format it automatically. Another was for the structure to be reasonably intuitive to understand without having to constantly refer back to documentation. It is much too easy to end up with a list of anonymous fields to fill in, or confusing arguments that appear in some places but not others.
So to give it more structure, the destination being addressed was put in brackets, for example `[cx]` or `[self.other_actor]` or `[fwd]`, and things were made to look like actual methods being called as far as possible.
For `fwd_to!`, it ends up looking something like currying, with the constant arguments given first, and the variable argument types following in a tuple after `as`, which is also valid Rust syntax to introduce a type, encouraging IDEs to help with type completion if they support that.
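For example, the following sketch uses a hypothetical `receive` method; the exact forms supported are listed in the `fwd_to!` rustdoc:

```rust
use stakker::*;

struct Sink;
impl Sink {
    fn init(cx: CX![]) -> Option<Self> {
        // Destination in brackets; constant argument `7` given first;
        // the variable argument types follow in a tuple after `as`
        let fwd = fwd_to!([cx], receive(7) as (String));
        // Each `fwd!` supplies the variable arguments, queuing a call
        // equivalent to: receive(cx, 7, "hello".to_string())
        fwd!([fwd], "hello".to_string());
        Some(Sink)
    }

    fn receive(&mut self, _cx: CX![], tag: u32, line: String) {
        println!("{}: {}", tag, line);
    }
}
```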
Another aspect of the macros is that a lot of tuning was done on the order of evaluation of the arguments. Whereas in plain Rust code, you'd often get a borrow-checker error due to mentioning the same variable more than once, in the macro you will often find that you can get away with it due to the macro's internal order of evaluation. All arguments are evaluated in the caller's context before the call is deferred, to keep the code intelligible.
Dropping things to clean up
Stakker maintains the Rust convention of easy clean-up by simply dropping things. If dropping something in the Stakker API doesn't clean things up correctly, then that is probably a bug.
So for example if you drop the last `ActorOwn` referring to an actor, then the actor will be terminated and the actor's drop handler called. Or if you drop a `Ret` instance, then `None` is sent back, which indicates that the message containing the `Ret` wasn't replied to. Or if you drop a `Waker` in another thread, the wake handler is informed and the slot released.
The intention is that if you keep to certain simple conventions, then you can rely on drop-based cleanup to take care of all problem situations. For example if you use `ActorOwn` links in a DAG (e.g. a tree of actors with parents and children), then when one actor fails, the whole tree of actors that it owns will also be cleaned up correctly. This also means that if anything goes wrong in an actor, then calling `fail` or `fail_str` should always be a safe way to bail out. The actor and all its children will clean up, and the parent actor will be informed of the failure.
However, if you decide to try to implement some more complicated form of inter-actor ownership that isn't a DAG, perhaps with `ActorOwn` loops and manual `kill` calls to do cleanup, then it's your responsibility to make sure that clean-up occurs correctly in all failure modes.
Another issue occurs when your actor allocates internal resources to service another actor's request, and you wish to know if that actor fails in order to release those resources. This can be solved by creating a droppable "guard" object which is passed to the associated actor for it to store. If that actor dies, then the drop handler of your guard runs, which can send a message back to your actor to clean up, via an `Actor` reference to your actor that it holds.
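Here is a sketch of that guard pattern, with hypothetical `Server` and `release` names; it relies on the fact that actor references carry a `Deferrer`, so `call!` works even from a drop handler:

```rust
use stakker::*;

// Hypothetical guard passed to a client actor to store; if the client
// dies for any reason, the drop handler queues a cleanup call back
struct Guard {
    server: Actor<Server>,
    id: u32,
}

impl Drop for Guard {
    fn drop(&mut self) {
        let id = self.id;
        call!([self.server], release(id));
    }
}

struct Server {
    // ... resource table indexed by id ...
}

impl Server {
    fn release(&mut self, _cx: CX![], id: u32) {
        // ... free the resources associated with `id` ...
        let _ = id;
    }
}
```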
Cargo features and safety
By default Stakker uses some `unsafe` code for efficiency. This means it uses implementations that require less memory or less CPU time. However, if you wish, you can enable the "no-unsafe" Cargo feature, and the whole crate will be compiled with `#[forbid(unsafe_code)]`, and fully safe implementations are used instead. You lose some efficiency, but for many applications that would make little difference, so use this if minimising `unsafe` is important for your project.
There are other cargo features that switch in and out different underlying implementations to optimise for different cases. However the external API of the crate does not change when features are changed. The API should operate identically.
Note that since Cargo features are additive, it's necessary that when more than one crate in a build uses Stakker, the Stakker build should be compatible with all of them, i.e. provide the lowest common denominator. So this typically means using the less optimal implementation.
Note that crates should not enable Stakker cargo features unless they really need them; it should be up to the top-level application to add features as required. Even if the top-level application does not use Stakker directly, it can still list the crate as a dependency in order to select cargo features.
Why not allow detached actors?
A detached actor would be one without any `ActorOwn` owner, which stays alive only so long as another actor references it using `Actor`, or an outstanding `Ret` or `Fwd` callback is active.
The problem is that with only weak `Actor` references keeping it alive, and no owner to enforce cleanup, it's too easy to create reference cycles. Maybe they can be avoided by awareness during the initial implementation, but it is too easy to add a few references or callbacks later during maintenance that create a reference cycle, meaning that the actor will never be cleaned up, even outliving the Stakker runtime.
It seems like a feature that would cause foot-guns down the line, so unless a really strongly-motivating case comes along, detached actors will not be allowed for now.
For some of the cases where you might want a detached actor, e.g. a listener spawning child actors to handle incoming connections, this is better handled using `ActorOwnSlab`, which will handle cleanup correctly when the parent actor terminates.
Why use actors?
If you find that your code has to accept events from more than one direction, and has to react correctly to each event based on the current state, and has to deal with a lot more variations than a "happy path, or fail", then actors provide a convenient way to manage that complexity. There is a reason why a lot of networking is based around state machines! An actor is essentially the implementation of a state machine and the transitions between those states in response to events.
In addition, dividing a larger problem into a set of asynchronously interacting actors means that each small part of the problem can be analysed and understood clearly in isolation, and tested asynchronously independently of the rest of the system.
Also, the abstraction provided by actors naturally allows interacting actors to be separated and run remotely if necessary. Since inter-actor calls are asynchronous, no actor can depend on synchronous responses, so distributed and remote operation comes naturally. The only thing required is some glue to pass the inter-actor calls over a protocol.
However for a long sequence of asynchronous operations that either advance to the next or fail, a sequential "actor coroutine" style might make the code clearer than an event-driven actor style. So adding async/await or generators on top of the actors, to provide something like a coroutine that can be driven forward by actor events is a possible future direction for Stakker, if it can be made efficient.
Stakker Guide
This section provides more practical information on coding with Stakker.
Getting used to the actor way of thinking
If you're accustomed to sequential-style coding, e.g. async/await or Go, maybe the switch to actor-based coding will seem very strange. Despite its simplicity it may take some getting used to.
Remember that in an actor system there are only two principal things to consider:
- The actor's own state (i.e. the contents of its `struct` and any other data it has synchronous access to)
- The incoming messages (aka "events" or "incoming calls")
When you write a behaviour for an actor, i.e. an actor method that handles an incoming "message" or "call", then your focus is very narrow. You consider how the actor should react to the incoming event, considering only the current state of the actor.
Then your reaction to the event will likely be one or more of the following:
- Update the actor's state
- Send one or more messages to other actors
- Set or update timers
- Call methods outside of the actor system (e.g. send I/O)
Once you have completed your handling of that event, the job is done and you return. Anything that you need to remember to do later will either have been encoded into the state of the actor, or else already put in motion as a timer or an actor call.
This is in some ways similar to some approaches to personal time-management. By keeping the focus narrow, it makes it easier to reason about the problem, one small step at a time.
So this is the essence of how an actor works within an actor system: It simply makes a sequence of small decisions as events come in, one small step at a time. However this is a surprisingly powerful way of handling complexity.
When events may be coming in from various directions, and when using a sequential model of thinking that focuses on the "happy path", it is hard to be sure that all the combinations of asynchronous error conditions and event orderings have been handled correctly. However in the actor model, each event is evaluated in the context of the actor state as it exists at the moment of the event's arrival. The "happy state" for that event has equal weight to all the "error states" in the consideration of the coder.
The other thing that an actor system allows you to do is to have multiple requests active (i.e. "in flight" or "in progress") with various other actors at the same time, if that is required. Those responses might come back quickly or slowly, and will be dealt with one by one as they arrive. Maybe other operations will be triggered as some of them come back, and complete before or after the other responses arrive. All this complexity is handled easily by just considering each incoming message one at a time in the context of the actor state. To do something similar in async/await-style code would be impossibly complicated. That level of asynchronicity is just not easy to handle in a sequential model, but it comes completely naturally in the actor model.
This is not to say that there is no value in sequential coding styles. There are "painfully sequential" problems which can't be broken down into concurrent operations, which can be written very naturally and simply in the sequential style. But when the asynchronous complexity rises, the actor model scales much better to handle it. It is no accident that many low-level network protocols are specified as state machines, and that's essentially what an actor is: The concrete implementation of a state machine.
Finding Stakker-compatible crates
Here are some considerations to help decide whether a crate is usable with Stakker:
- Non-blocking: If it is going to be called directly from an actor, the crate must never block or sleep, because that would hold up all the work of the whole thread.
- Data processing only: If it just does data processing when called, and doesn't require any I/O or timers or anything, then that would be straightforward to use. (For example the regex crate.)
- Event-loop independent: If it requires I/O but says it can run on top of any event loop, then that is a very good sign. Even crates that may appear dependent on a particular I/O system (`mio`, Tokio, etc) might still be usable if the protocol handling can be called independently. (For example tungstenite.)
- Event-loop providers: If a crate provides an event loop (or the basis for an event loop), for example SDL or `mio`, then Stakker can almost certainly be run on top of it, so long as the underlying crate can guarantee that Stakker is always called from the same thread.
Where a required crate can't be run on top of Stakker, for example a crate that depends on Tokio, then you can still run it in the same process by running Tokio in another thread and communicating with the Stakker thread via channels.
A possible future extension to Stakker would be to add a simple async/await executor, one that allows a subset of async/await crates to run, for example ones that can process futures or streams passed from the actor runtime, and that don't require direct I/O themselves.
Difference between `Ret` and `Fwd`
Both of these types allow specifying a callback or call-forward, usually to an actor method but possibly to some other destination. But there are some important differences:

- `Ret` can be called only once, whereas `Fwd` may be called many times
- `Ret` cannot be cloned, but `Fwd` has ref-counting and there may be many references to the same `Fwd` callback
- `Ret` can capture "move" types
- `Ret` is consumed when it is called
- `Ret` notifies the callback if it is dropped without being called
Whilst the names suggest uses of returning or forwarding data, there are no restrictions on where the data is sent. So a `Ret` may 'return' data to some other actor than the caller if required.
Detecting delivery failure using `Ret`
A `Ret` callback is guaranteed to always be called eventually, even if it is lost and dropped. So this means that if a call is made to an actor that terminates before the call is serviced, any `Ret` included in the arguments to that call will be dropped and a `None` response will be sent back.
So this means that any call where there needs to be an action if the message cannot be handled needs to include a `Ret` in its arguments, even if it is just a `Ret<()>` which is called with no arguments on successfully processing the call. The `ret_to!` macro supports this scenario.
However where it is not important to handle the case where the call is lost, the `ret_some_to!` macro unwraps the value and ignores the `None` (dropped) case.
Handling expected and unexpected actor termination
There are five ways that an actor can terminate. These are enumerated by the `StopCause` enum:

- `Stopped`: Successful termination
- `Failed`: Actor terminated itself due to some problem
- `Killed`: Actor was killed by another entity
- `Dropped`: The last `ActorOwn` referencing this actor was dropped
- `Lost`: This indicates that the connection to a remote actor has been lost. Remote actors are not implemented yet.
Both `Failed` and `Killed` have an associated boxed `Error`. When an actor is created, a notification handler `Ret<StopCause>` is normally provided. This receives the reason for the actor's termination when it terminates. So usually a parent actor will keep hold of the `ActorOwn` for the child actor within its own state, and will receive the termination notification. However other patterns are also possible.
On receiving a termination notification, the parent actor might choose to restart the child, or terminate itself, or take some other action. The parent actor has a free choice on how to handle it. It is possible to downcast the `Error` to take a different action depending on the type of failure if necessary.
`Actor<dyn Trait>`
If you need to have a group of different actors that all implement the same interface and that can be used interchangeably behind that standard interface, there are several options available. However `Actor<dyn Trait>` is not one of them, for reasons that will be explained below!
Use a trait on the actor side: `Actor<Box<dyn Trait>>`
There is a macro `actor_of_trait!` to support this. This all looks clean and minimal in the source. On the caller side, all they see is a standard-looking actor interface. However compared to a non-trait actor, this adds an extra indirection to all calls due to the `Box`. Here's an example:
```rust
use stakker::*;
use std::time::Instant;

// Trait definition
type Animal = Box<dyn AnimalTrait>;
trait AnimalTrait {
    fn sound(&mut self, cx: CX![Animal]);
}

struct Cat;
impl Cat {
    fn init(_: CX![Animal]) -> Option<Animal> {
        Some(Box::new(Cat))
    }
}
impl AnimalTrait for Cat {
    fn sound(&mut self, _: CX![Animal]) {
        println!("Miaow");
    }
}

struct Dog;
impl Dog {
    fn init(_: CX![Animal]) -> Option<Animal> {
        Some(Box::new(Dog))
    }
}
impl AnimalTrait for Dog {
    fn sound(&mut self, _: CX![Animal]) {
        println!("Woof");
    }
}

pub fn main() {
    let mut stakker = Stakker::new(Instant::now());
    let s = &mut stakker;

    let animal1 = actor_of_trait!(s, Animal, Cat::init(), ret_nop!());
    let animal2 = actor_of_trait!(s, Animal, Dog::init(), ret_nop!());

    let mut list: Vec<Actor<Animal>> = Vec::new();
    list.push(animal1.clone());
    list.push(animal2.clone());

    for a in list {
        call!([a], sound());
    }
    s.run(Instant::now(), false);
}
```
Use a trait on the caller side: `Box<dyn Trait>`
This involves wrapping the actors in a trait that forwards calls, and then boxing it to make it dynamic. So this also adds an indirection, but on the caller side. This is more verbose than doing it on the actor side, and the calls don't look like other actor calls. Here's an example:
```rust
use stakker::*;
use std::time::Instant;

// External interface of all Animals
trait Animal {
    fn sound(&self);
}

// A particular animal, wraps any actor that implements AnimalActor
struct AnAnimal<T: AnimalActor + 'static>(ActorOwn<T>);
impl<T: AnimalActor + 'static> Animal for AnAnimal<T> {
    fn sound(&self) {
        call!([self.0], sound());
    }
}

// Internal interface of animal actors
trait AnimalActor: Sized {
    fn sound(&self, cx: CX![]);
}

struct Cat;
impl Cat {
    fn init(_: CX![]) -> Option<Self> {
        Some(Self)
    }
}
impl AnimalActor for Cat {
    fn sound(&self, _: CX![]) {
        println!("Miaow");
    }
}

struct Dog;
impl Dog {
    fn init(_: CX![]) -> Option<Self> {
        Some(Self)
    }
}
impl AnimalActor for Dog {
    fn sound(&self, _: CX![]) {
        println!("Woof");
    }
}

fn main() {
    let mut stakker = Stakker::new(Instant::now());
    let s = &mut stakker;

    let animal1 = AnAnimal(actor!(s, Dog::init(), ret_nop!()));
    let animal2 = AnAnimal(actor!(s, Cat::init(), ret_nop!()));

    let mut list: Vec<Box<dyn Animal>> = Vec::new();
    list.push(Box::new(animal1)); // <- dyn coercion occurs here
    list.push(Box::new(animal2)); // <- dyn coercion occurs here

    for a in list {
        a.sound();
    }
    s.run(Instant::now(), false);
}
```
Use `Fwd` and `ActorOwnAnon`
Instead of using a trait, it's also possible to use `Fwd` to capture the entry point of an arbitrary actor, and to pass that to other actors that only care about the forwarding interface. The extra indirection is also present in this solution, since the call must pass via the `Fwd` handler. However this is a lot more flexible than traits.
Where you want another actor not only to have a `Fwd` instance but also to hold the owning reference to the actor, then you can use `ActorOwnAnon`. That way if that actor dies, the referenced actor dies too. This allows owning the actor without being exposed to its type. So you can keep a `Vec<ActorOwnAnon>` pointing to different kinds of actors, for example.
Why `Actor<dyn Trait>` can't be supported
`Rc<dyn Trait>` can be done, so why isn't `Actor<dyn Trait>` possible? To enable `dyn Trait` requires the actor runtime to be changed to use `A: ?Sized`, where `A` is the actor's `Self` type. Unfortunately Rust does not support `?Sized` values inside an `enum`, apparently due to it inhibiting layout optimisations, and Stakker requires an `enum` to enable switching between the three actor states (Prep, Ready and Zombie). Maybe Rust could have a `#[repr(unsizable)]` for enums to support this one day, but it doesn't right now.
In addition, `CoerceUnsized` is still unstable at the time of writing. This is the approved way to do the "dyn coercion" which converts an `Rc<impl Trait>` to an `Rc<dyn Trait>`. However that can be worked around, I believe. So that isn't the blocker.
Looking at alternative approaches, it seemed like implementing a custom `enum` in unsafe code might be possible using `union`, but that is also a dead end due to `union` only supporting `Copy` types on stable at present. I have an unsized_enum crate which I believe is sound and could be the basis for `Actor<dyn Trait>` in Stakker, but I don't want to force it on all Stakker users. I'd like to be able to offer a safe alternative as well. (Update: as of Feb-2021 `union` supports `ManuallyDrop` which allows `?Sized`, so that might offer a better way, although it still requires `unsafe`.)
So unfortunately it's not possible to do `Actor<dyn Trait>` right now, and one of the alternatives must be used instead.
Top-level actor template
The following template may be helpful for writing big top-level actors that need to accept configuration and need to be connected up to other actors through `Fwd` instances:
```rust
/// `Widget` actor configuration. Includes serde deserialization
/// support.
#[derive(Deserialize, Clone)]
#[serde(deny_unknown_fields)]
pub struct WidgetConf {
    // ... configuration values ...
}

/// `Widget` instance callbacks
pub struct WidgetFwds {
    // ... `Fwd` and `Share` values that the actor needs to talk to
    // other actors and to access any shared resources ...
}

/// `Widget` actor
pub struct Widget {
    conf: WidgetConf,
    fwds: WidgetFwds,
    // ... actor state ...
}

impl Widget {
    /// Initialise the Widget actor
    pub fn init(cx: CX![], conf: WidgetConf, fwds: WidgetFwds) -> Option<Self> {
        // ...
        Some(Self {
            conf,
            fwds,
            // ...
        })
    }

    // ... all other actor methods ...
}
```
Inter-thread communication with `Waker`
The foundation for inter-thread communication in Stakker is the `Waker` mechanism. This uses an atomic bitmap-tree, which means that many wakeup events from other threads will be accumulated into a single I/O event on the Stakker thread, using only a small number of atomic operations to recover them, keeping the load on the Stakker thread low.
A `Waker` will normally be paired with a channel or some other shared mutable state, e.g. data within a `Mutex`. So the `Waker` is used to notify a handler in the Stakker thread that it needs to examine the shared state and respond to whatever it finds there.
`PipedThread` is an example of this, combining a thread, two message pipes and a `Waker`. This provides a convenient way to handle some very simple scenarios for running heavy or blocking calls in another thread. However it should be straightforward to build other inter-thread communication methods on top of `Waker`.
One thing to note is that when using channels such as those from the `crossbeam` crate, ideally we'd want to notify the `Waker` only when the channel is empty at the instant that the message is added. However the channel APIs typically don't give us a way to detect this condition. (Perhaps it's not even possible to detect this in some cases, due to how the channel is implemented.) Attempting to detect it with an `is_empty()` call on the channel before adding the message is doomed to intermittent failure due to races. So this means that another thread adding something to a channel for the Stakker thread to pick up must notify the `Waker` on every message sent. So `Waker` is designed to make this as cheap as possible. After the first time, it will take only one atomic operation.
Roadmap
Currently Stakker is performing well for the applications it is being used in, so there is no urgent need for new features. However that does not mean it is "done". Things will be added gradually as new requirements are discovered and refined.
Here are some near-term plans (6 months):
- More fully test running on top of a C++ language event loop within another application, e.g. using Stakker in a library with a C++ interface
- Benchmark timer-queue performance, and then optimise, e.g. changing to an N-ary heap
Waiting for new Rust language features:
- If Rust one day supports enums or unions containing a DST in one variant (like an `Option<dyn Trait>` or a `MaybeUninit<dyn Trait>`), then it would be possible to do `Actor<dyn Trait>` instead of `Actor<Box<dyn Trait>>` as at present (see `actor_of_trait!`).
- If Rust had better support for working with VTables and fat pointers, then a lot of unsafe code could be eliminated from the flat FnOnceQueue. See: https://github.com/rust-lang/rfcs/pull/2580
- If Rust supported passing borrows and lifetimes into the generator resume function, and generators were stabilized, then they could be used to implement actor coroutines. See: https://github.com/rust-lang/rust/issues/68923
Further ahead:
- Write more varied applications with Stakker, and see if that suggests any more features that need adding. If other people start using Stakker, that may also suggest more features.
- Maybe switch all macros over to procedural macros to allow `cx` to be picked up automatically from the context, to make code more concise. (`macro_rules!` hygiene normally forbids this, but procedural macros have different rules.)
- Look at ways to proxy calls between actors on different threads or different machines, i.e. where a local actor acts as a proxy for a remote actor and forwards calls and responses
- Investigate writing crates to allow Stakker to be layered on top of other runtimes, e.g. tokio or async_std.
- Investigate writing a simple async executor on top of Stakker to interface directly to async/await style code in the same thread. However if this is to be done at all it needs to be very low-level and efficient, ideally avoiding any synchronisation, memory fences, atomic operations, mutexes and so on. Otherwise it might be better to run one of the existing executors in another thread instead, and communicate data with channels to keep the Stakker thread lean.
- Possibly look at making use of generators or async/await to allow writing sequential-style actor coroutines within an actor. This would allow making a call to another actor and receiving the `Ret` directly in the code. The main difficulty is passing the `Cx` (with its lifetime) up into the coroutine at each resume, and having it drop at each yield. Because of the low-level nature of Stakker, this needs to be efficient to be a good fit. This will probably require a good deal of experimentation and tweaking to find the right ergonomics, and the best low-level fit. There's no sense in rushing it.
- Support off-loading CPU-intensive or I/O work to a threadpool. If actor coroutines are implemented, then we could simply mark a block of code as `offload!` to move it to a threadpool, which would be very convenient.