Edit: a previous version of this post contained a description of a naive sparse set implementation, whereas actual implementations use more optimal solutions. I changed the post to reflect this.
If you are using an ECS (Entity Component System) for the development of a game, chances are that at some point you had to ask yourself whether you wanted to store the state machine in ECS. I have, and I reached the conclusion that it was a bad idea, for more than one reason. In this post I will go over why this is a bad idea, why having state machines in ECS would be nice nonetheless, and how I overcame the issues with them in Flecs.
First let’s talk about those issues. At first state machines don’t seem like a big problem. Most ECS implementations allow applications to define tags, which are essentially components without data. A state in a state machine, it seems, could easily be mapped to a tag. When we want to move an entity between states, we remove one tag and add another. Seems easy enough.
It turns out that each ECS flavor has unique challenges when trying to implement state machines this way. Let’s start with …
Archetype-based ECS implementations store entities that have the same components together in an “archetype”. They are known for their fast iteration speeds, as components are always stored in contiguous arrays. The more entities are stored in a single archetype, the better these ECS implementations perform. Examples of Archetype-based implementations are Unity DOTS, Flecs and Legion.
State machines though can easily cripple an archetype-based ECS. The first reason is that state machines introduce fragmentation. That is a fancy way of saying that they increase the number of archetypes. Why is this a bad thing? Each archetype switch introduces cache misses, which makes data retrieval from RAM less efficient.
Consider a game where we use two state machines, one for tracking movement and one for the current action of an entity. For Movement we have the following states:
For Action we have the following states:
Because in an archetype-based system the combination of components is what counts, these two state machines create 5x5 = 25 archetypes. In other words, an archetype that previously had 1000 entities, is now split up over 25 archetypes with only 40 entities (assuming equal distribution). That is 25 cache misses * the number of components in the archetype. And that is for a trivial example with two small state machines; a real game likely has dozens if not hundreds of individual states. Ouch.
It doesn’t stop there. Because of the way archetypes work, each time a component is added or removed, all components of an entity need to be copied from one archetype to another. If an entity has lots of components (several dozen), switching states becomes a very expensive operation, and if you have a state machine that switches states rapidly, which is not uncommon, this can become a serious performance bottleneck.
Okay, not great, but archetypes are not the only way to implement an ECS. What about…
A sparse-set based ECS stores components in sparse sets, which are a clever data structure for storing arrays that aren’t 100% populated. Sparse set-based ECS implementations generally don’t have the same iteration performance as archetype-based implementations, but perform better when it comes to adding/removing components, which doesn’t require copying data. Sparse set-based implementations also do not suffer from fragmentation, as they do not split up the sparse set per component combination: there is just a single sparse set per component. An example of a sparse-set based implementation is EnTT
So far it sounds like a sparse set addresses all of the problems that archetype-based implementations suffer from. So why can’t we use them for building state machines?
The problem is that they create a sparse set for every component. If we take the previous Movement and Action state machines, we would add 10 sparse sets to our application. That doesn’t seem so bad though, why is this a problem? Well, a real world game doesn’t just have 10 states, but can have dozens or even hundreds of them, which would create just as many sparse sets. In a naive sparse-set implementation this could balloon memory utilization, as the entity id is used to index the sparse array, which means that the sparse array needs to grow so it can contain the largest id stored in the set.
Smarter implementations of sparse sets don’t have this problem, as they can use paging to reduce the memory overhead. However, having lots of sparse sets introduces another problem, which is that of deleting entities.
When an entity is deleted from a sparse-set based ECS, it needs to be deleted from all the sparse sets that it is stored in. There are various approaches to address this. In EnTT this happens by iterating every single sparse set in the registry, checking if it contains the entity, and deleting it if it does. This means that the performance of a delete decreases linearly with the total number of sparse sets in the registry, which doesn’t scale well if that number is large.
So storing stats as tags in sparse sets isn’t ideal either, and if you’re using a sparse-set based ECS, you’d be better off implementing a different approach.
Bitset based ECS implementations store components in a big matrix where the entities are stored as rows, and components as columns. A bitset indicates whether a component is set or not. There are numerous variations on this theme, but they more or less boil down to this idea. An example of a bitset-based implementation is EntityX.
We can keep it short here- the disadvantages of a bitset are very similar to that of a naive sparse set, where we need to grow the sparse arrays to hold the maximum entity id. Whether stored as a matrix or not, a bitset implementation is likely to have large arrays that are extremely sparse, and will ultimately end up wasting memory as well.
Darn. What about…
Class-based ECS implementations (sometimes called OOP-style ECS) store components on an entity class. There are no global data structures, sparse sets or big arrays, and as a result these implementations tend to be a lot simpler, but are often considerably slower than their counterparts.
Except when implementing state machines, where class-based ECS implementations actually hold their own reasonably well, at least when compared with the “smarter” approaches. There are no operations that approach their degenerate performance, no massive memory wasting, so have we found a solution?
Not really. Class-based ECS implementations weren’t very fast to begin with, and the fact that a state machine doesn’t slow them down doesn’t make them faster. Specifically, a downside of a class-based ECS implementation is that there is no fast way to query all components for a particular state.
Additionally, class-based ECS implementations are fragmented by default which is partly why their performance isn’t great. However, the framework overhead of switching between two entities is likely to be lower than switching between archetypes, which is why this approach may actually be faster than a heavily fragmented archetype-based implementation.
Still, switching to a class-based ECS design just for state machines doesn’t feel right, especially considering that it’s not particularly fast.
Okay, what if we forget for a moment about using builtin ECS features, and we just build it on top of ECS. We could just define a component (one for every state machine) in which we store an integer value that represents our state and be done with it. Good?
Of course that is not ideal either. Now our ECS framework has no idea what the state of the entity is, which means we can’t filter on it, nor can we do clever things to optimize for the use case. But all things considered, this is probably the best solution if you’re using any of the above implementations.
This could have been the conclusion of this post, as I could have just resigned to the fact that not everything has to be stored in ECS. But that didn’t sit well with me. So:
Why Storing State Machines in ECS is not a bad idea.
ECS means many things for many different people. What we all more or less seem to agree on however is this:
In an ECS you have entities, you can add stuff to those entities, and you can query on the stuff.
It’s not my best attempt at a definition for ECS, but it’s surprisingly robust. And as it turns out, state machines fit this definition just fine. States are “stuff” that I can add to entities. I would like to be able to query on the states, so that I can implement state-specific behavior.
I think it is fair to say that it is not the definition of ECS that makes state machines so hard to implement, but it is the ECS data structures that people have come up with so far that aren’t equipped for the task. And that is something we can fix. What if we were to invent a new data structure that is designed to handle this particular use case? What if we could make it really fast and queryable? What if we could seamlessly integrate it with an existing ECS framework?
That is what I ended up building in Flecs. State machines are a common enough use case, and the question comes up often enough to provide builtin support for it. So how did I do it? First let me show an API example in C++:
// Declare cases
auto Standing = flecs::entity(world, "Standing");
auto Walking = flecs::entity(world, "Walking");
auto Running = flecs::entity(world, "Running"); // Declare switch
auto Movement = flecs::type(world, "Movement",
"Standing, Walking, Running"); // Create entity with Walking
auto e = flecs::entity(world)
.add_case(Walking); // Add case Running, replace Walking
A few things happen here. First I define a number of entities, which represent my states. Then I create a “type” that contains those entities, which represents all of the states in my state machine. Then I add the type as a “switch” and the state as a “case”.
The last statement is where it gets interesting. When I add the “Running” state, I automatically remove the “Walking” state.
The data structure that powers the “switchable” cases, let’s call it a “switch list”, is chosen so that individual cases do not fragment the archetypes (Flecs is an archetype-based solution), but still allow for iteration over individual states. An application can create a regular system that subscribes for sleepwalking entities like so:
flecs::system<>(world, "Walk", "CASE | Walking, CASE | Sleeping")
The “switch list” data type takes advantage of the fact that an entity can only have one active case per switch. This allows for much more efficient storage, as we do not need a sparse set (or similar) per state to keep track of which entity has which state. Rather, every switch (state machine) adds a single component array to the entity with the case id, which scales much better.
To enable iteration over a single case, the data structure contains a number of interleaved linked lists, one for each case, that contains all the entities for that state. This makes finding all the entities for a specific state quite fast at the cost of some iteration performance, as we cannot iterate data for a single case as a contiguous array.
And that’s it! I hope this gave you some ideas on how to implement your state machines, and in general opened the aperture a bit on what ECS implementations could do with a bit of tinkering here and there.
If you’d like to know more about the specifics of the implementation, feel free to join the discussion on Discord!
Join the flecs Discord Server!
Check out the flecs community on Discord - hang out with 166 other members and enjoy free voice and text chat.
Of course, if you feel compelled to try out Flecs after reading this, far be it from me to withhold the repository link: https://github.com/SanderMertens/flecs