Why Storing State Machines in ECS is a bad idea.
Disclaimer: I am the author of Flecs, a C/C++ Entity Component System. If you’d like to know more about ECS, see this FAQ.
If you are using an ECS (Entity Component System) for the development of a game, chances are that at some point you had to ask yourself whether you wanted to store the state machine in ECS. I have, and it turned out to be a lot harder than expected, for more than one reason. In this post I will go over the challenges I ran into, why having state machines in ECS would be nice nonetheless, and which solution I came up with to address the issues.
First let’s look at some of the features and operations that are common when working with state machines:
- Replace an entity’s state with another state
- Have entities participate in multiple state machines
- Get current entities for a state
- Get current state for an entity/state machine
- Get current entities for a state machine
- Change the list of states that are part of a state machine
- Get a list of all the states in a state machine
A state machine implementation should not only be able to provide efficient implementations for these, but should also allow for code that is readable, while minimizing the amount of refactoring when state machines change.
Before getting into details, I want to put special emphasis on the importance of readable, maintainable code. It can be tempting to only focus on the performance of a particular solution. As this post will show, state machines introduce a number of design challenges that when left unaddressed can become a source of technical debt.
There is one intuitive way in which a state machine could be implemented in ECS. Let’s take a look at it, and see how it scores on the above list.
A tag per state
A tag in ECS is a component that has no data, and we could create a single tag for each state. This approach performs generally well when it comes to querying for all entities for a given state, since finding all entities for a given component/tag is what ECS implementations are usually good at.
How about the other operations? For starters, to change the state, we would have to add one tag, and remove another. In pseudo code this is what it would look like to change the state of an entity from Walking
to Running
:
e.add<Running>()
e.remove<Walking>()
Even though the code seems simple at first glance, there are a few real issues here. The first one is that in order to remove Walking
, an application first has to know that the entity was in fact in the Walking
state, which is only one of the potentially many states for a particular state machine. So how could this be solved? There isn’t really a good way to do this without adding more complexity. One approach is to store the current state in another component:
Movement& m = e.get<Movement>();
if (m == MovementStateWalking) {
e.remove<Walking>();
} else if ( /* test other states ... */) {
}m = MovementStateRunning;
e.add<Running>()
Another approach would be to write a utility function that tests for states and use it in place of the state machine component:
MovementState get_movement(entity e) {
if (e.has<Walking>()) return MovementStateWalking;
if (e.has<Running>()) return MovementStateRunning;
/* test other states ... */
}
Neither approach is particularly appealing, and adds maintainability and performance issues to parts of the code that use state machines. Each time a state machine changes, the corresponding utilities would have to be updated as well.
The same problem arises when an application needs to get the current state for a particular state machine. This would again rely on a component to store the current state or on a state machine specific function.
Getting the list of entities for a particular state machine is also not possible without creating additional utilities. An application could for example create a query for all states in a state machine that has to be updated whenever the state machine changes:
query<Sitting, Standing, Walking, Running, ...>();
Depending on what kind of ECS is used, such a query could end up slowing down as more states are added to the query. The alternative approach would again rely on a component per state machine.
Let’s also take a look at storage implications. There are many different ways to implement an ECS, with two popular approaches being archetypes and sparse sets. The tag-based approach can put a lot of stress on either storage implementation. Let’s take a look why:
Archetypes
An archetype implementation stores entities with the same sets of components together in a table. Without getting into details, this provides cache-efficient component iteration as multiple components can always be iterated as contiguous arrays. This does however come at a cost.
To keep entities with the same set of components together, entities need to move between tables when components are added or removed. This also applies to tags. As a result, frequently adding/removing tags associated with states gets very expensive as all components for an entity need to be copied for each state transition (in fact, this requires two copies per component because of storage specifics).
Another challenge is that when an entity participates in multiple orthogonal state machines, each combination of states can cause a combinatorial explosion of tables, which increases fragmentation and can, depending on the ECS implementation, slow down other parts of the code as well.
This is nothing short of a deal breaker for state machines.
What about…
Sparse Sets
In a sparse-set based ECS each component and tag has its own sparse set which contains the entities that have that specific component. Adding and removing to a single sparse set is generally fast, so sparse sets do not have the same problems that archetype storages have. There are other issues though.
The first problem that arises is that because each state is stored in a separate sparse set, the number of sparse sets can grow very rapidly as games can have hundreds if not thousands of individual states. This causes issues as some operations in a sparse ECS (such as an entity delete or visit) need to iterate all sparse sets in an ECS world to test if an entity is a member of the sparse set. With hundreds of sparse sets, such operations can become very expensive.
An additional problem is memory utilization. Sparse ECS implementations typically use a so-called “paged sparse set” that allocates memory in pages. While this typically acts as a memory-conserving strategy (without paging a sparse set would have to allocate an array the size of the largest entity id), in the case of state machines this still can cause significant overhead.
To put some numbers on this, let’s consider a game with 10.000 entities that has a total of 100 states. Many of those entities will get recycled and used for different purposes, so we can assume that over time entity ids are more or less randomly distributed across state machines. Let’s also assume a page size of 4096 entities, which typically provides an optimal balance between not creating too many pages, and not allocating too much memory.
In this scenario we would have on average 100 entities stored per sparse set (only counting the states). Because the ids of the entities are randomly distributed, each sparse set will have at least allocated 3 pages, which adds up to space for 12288 entities in total, per sparse set. We have 100 sparse sets, so the total amount of memory for this setup is more than 100x larger than what we would strictly need for 10.000 entities! This is cringy, especially when considering that many of these states are mutually exclusive.
With 4-byte entity identifiers that would amount to around 5MB memory utilization which at face value is not the end of the world, but the fact that data is stored in a very sparse way would destroy cache efficiency as randomly testing for states (which would happen a lot) would ensure that data won’t stay in L1/L2 caches for long.
To conclude, even though initially the tag based approach seems intuitive, it requires the application to write utility code for each state machine, and all operations except getting the entities for a current state incur overhead that in most cases scales linearly with the number of states. Both archetype storages and sparse set storages struggle with this approach (though to be fair archetypes are a non-starter, whereas sparse sets are merely inefficient).
This is not great.
It is not all bad news though. Archetype and sparse storages are advancing at a rapid pace, and some of the storage issues I mention are being addressed as we speak. Those improvements however still do not address the more fundamental “designy” problems that we saw earlier. I believe that those design issues are at least as important as the storage/performance ones, and are much more likely to make the lives of the average game developer miserable.
We could of course cop out and argue that the state machine logic should not be implemented in ECS at all. This is a valid argument on paper, but in practice having multiple systems for dealing with iterating and running logic over entities is all but guaranteed to introduce code smell.
So I decided to take a look at whether all of these problems are solvable, and it turns out they are.
Why Storing State Machines in ECS is not a bad idea.
The solution I’ll describe is the one I implemented in Flecs, which is an archetype-based ECS. The used approach is however not only applicable to an archetype solution, and could be implemented in any ECS.
State machines have a unique property that none of our previous solutions took advantage of: states are mutually exclusive. This property can heavily be exploited with the creative use of a few data structures.
For starters, we want to avoid having to move the entity between archetypes. This means that as a result we have to be able to store entities with any state in a single archetype. Let’s first consider the simplest approach, which is to create a single component per state machine that contains an integer with the state for an entity, as mentioned earlier.
This would allow us to quickly change the state (just overwriting an integer value), which has the neat benefit that an entity cannot be in more than one state at the same time. It would also let us iterate all states for a state machine (just iterate its component) and get the current state for an entity efficiently. Not bad for a start.
What this approach does not let us do is iterate all entities for a state efficiently. We would have to manually skip the entities that do not belong to the state we are looking for. This would likely lead to “god” systems that handle every state, or code that does inefficient dispatching. We need something better than that.
The solution here is to use a linked list per state that is stored adjacent to the state machine component array with the state identifiers. To iterate an entity for a specific state, one simply has to find the head for the corresponding linked list, and follow the nodes. While this is not as fast as regular component iteration, it is still a very rapid way to query, and also does not significantly increase the overhead of changing a state (a linked list remove+insert).
The linked list itself can be stored efficiently as another adjacent array of integers, where the integer points to the next node in the array. Because states are mutually exclusive, the lists for different states can be nicely “braided” together in a single array which can be stored alongside the regular component array.
The only thing left is to define a way to tell the ECS storage which tags belong to which state machine. For that I added a few small extensions to the API that allow an application to add a “switch” to an entity, which contains the list of state identifiers, and a “case” which is the state itself. Adding a switch adds the state machine component. Adding a case replaces the state for an entity.
In code it looks like this:
// Declare cases
auto Standing = flecs::entity(world, "Standing");
auto Walking = flecs::entity(world, "Walking");
auto Running = flecs::entity(world, "Running"); // Declare switch
auto Movement = flecs::type(world, "Movement",
"Standing, Walking, Running"); // Create entity with Walking
auto e = flecs::entity(world)
.add_switch(Movement)
.add_case(Walking); // Add case Running, replace Walking
e.add_case(Running);
This has the additional benefit that the ECS store is now aware of the states that belong to a state machine, which comes in handy for the development of tools.
News & noteworthy
Both archetype & sparse set ECS implementations have come up with ways to get around some of the previously described edge cases where performance starts to degrade.
Some archetype implementations are implementing a storage variant that separates an archetype from a table. In this approach a table only stores the (data) components of an entity, whereas the archetype stores the components + tags. Multiple archetypes can point to the same table. With this in place, an application can add & remove tags (and thus states) in constant time, which is a massive improvement over having to copy all components. The idea was first proposed by the author of Bevy, and currently both Bevy ECS and Flecs are working towards this solution.
An additional improvement that has been finding its way into archetype implementations (Unity DOTS & Flecs) is the usage of bitset arrays to indicate whether a component is “enabled” or not. This provides the ability to quickly turn on or off components by flipping a bit, without actually having to move the entity between tables. The tradeoff is that iteration is a bit slower as queries have to find contiguous sets of enabled bits, but this can be implemented efficiently. The approach could be applied to state machines, where each state gets its own bitset array.
Sparse set implementations have also been improving towards approaches that do not rely on iterating all sparse sets for delete/visit operations. One such approach is to “borrow” the idea of an archetype where the archetype itself only stores the list of component identifiers. When archetypes are organized in a graph, changing archetypes and finding the total set of components can be done in a single cheap O(1) operation. Alternative solutions are also being worked on, as the author of EnTT is also working on an approach that achieves similar benefits.
An optimization that a sparse set ECS could make to reduce memory overhead (at the cost of increasing allocations) is freeing up pages when they become empty. I would probably not do this unless I’m strapped for memory, but is likely to provide a significant decrease in memory utilization.
Last but not least, there are a number of ECS frameworks that have adopted a multi-storage approach where an application can choose to store components either in a sparse set or in a table (or table-like way). For archetype solutions storing states in sparse sets can reduce the performance overhead of switching states, but still suffers from some of the other problems outlined above.
And that’s it! I hope this gave you some ideas on the issues with “naive” state machine implementations, and, if you’re building your own ECS, how you can address them. For anyone who’s interested in implementing one, the implementation of the braided linked list can be found here: https://github.com/SanderMertens/flecs/blob/master/src/switch_list.c
If you’d like to know more about the specifics of the implementation, feel free to join the discussion on Discord!
More reading: