Why Vanilla ECS Is Not Enough


Disclaimer: I am the author of Flecs, an Entity Component System for C99. Discord: https://discord.gg/ZSSyqty

When I started writing my first ECS a year ago, I was excited. It seemed to do something unique by offering more flexibility and performance to a developer at the same time. Also, having high-level design primitives that translate well to cache- and vectorization-friendly code sounded great.

This has all proven to be true, at least for me. And yet.

First I have to preface this blog with something. There are two ways you can look at ECS. One is that it is a data container, like a vector or a hashmap. The other is that of a design pattern. The line between the two is often blurred as ECS heavily relies on data, and this data needs to be stored somewhere. This is why ECS implementations put a lot of emphasis on how data is stored.

Much has been said (and will be said) about what are the best and most performant ways to store your data in ECS. In this blog I want to spend some time reflecting on the design aspects, and in particular where I think ECS as a design pattern is lacking.

If you’re short on time, scroll to the end of this blog to see the changes I would make to the definition of vanilla ECS.

To know why ECS is lacking, we need to know what its purpose is. This is already a contentious question. When approaching this question from the “ECS as a data structure” perspective, there isn’t necessarily a single purpose, besides offering a performant way to store and retrieve data. When we treat ECS as a design pattern though, we naturally need to ask the question, “a design pattern for what?”.

Most often ECS is mentioned as a pattern for implementing game logic, where if done right, it produces code that is easier to extend, refactor, and maintain (and yes, it will probably run faster too). But what about a game engine? Could we implement core engine features like an input manager or renderer in ECS? What about a user interface? Or a low-level data structure like a quad tree? If not, why not?

Over the past year I have experimented with implementing several things such as an HTTP wrapper and REST endpoint, a reflection framework, a metrics collection backend and several core engine systems in ECS. So far I am not unhappy with the results. There is one big takeaway from these projects, and also one big caveat.

Let’s start with the takeaway: coding these features in ECS made them easier to extend, easier to build and made them plug & play, which I will talk about later. It also made the code much more data-driven, which I believe helped in keeping things simple and readable. Did it make the code faster? Possibly, but that wasn’t the goal, as ECS was not in the critical path for any of the projects.

Now the caveat: I did not just use “vanilla ECS”. Vanilla ECS can be summarized in the following four rules:

  • An entity is a unique identifier
  • A component is a plain old datatype
  • An entity can have 0 .. N components
  • A system is logic matched with entities based on their components
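
As a rough illustration, the four rules fit in a few lines of C++. This is a minimal sketch, not how Flecs or any other framework stores data: the `World` type, the `move_system` function and the one-map-per-component storage are assumptions made up for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Rule 1: an entity is a unique identifier.
using Entity = std::uint64_t;

// Rule 2: a component is a plain old datatype.
struct Position { float x, y; };
struct Velocity { float x, y; };

// Rule 3: an entity can have 0..N components.
// Hypothetical storage: one map per component type (simple, not cache-friendly).
struct World {
    Entity next_id = 1;
    std::unordered_map<Entity, Position> positions;
    std::unordered_map<Entity, Velocity> velocities;

    Entity create() { return next_id++; }
};

// Rule 4: a system is logic matched with entities based on their components.
// Move matches every entity that has both Position and Velocity and
// returns how many entities it matched.
inline int move_system(World& world, float dt) {
    int matched = 0;
    for (auto& [entity, position] : world.positions) {
        auto it = world.velocities.find(entity);
        if (it == world.velocities.end()) continue;  // no Velocity: not matched
        position.x += it->second.x * dt;
        position.y += it->second.y * dt;
        ++matched;
    }
    return matched;
}
```

Note that nothing in this sketch says when `move_system` runs, how entities relate to each other, or how a component could be added twice. Those gaps are what the rest of this post is about.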

These four rules proved to be insufficient for my purposes. So I started tweaking ECS. Why bother, you may ask, and not just implement them in a different way? In short: because I like the style of ECS, and I think it has measurable benefits over non-ECS code. With a few tweaks I was able to at least overcome some of the shortcomings of vanilla ECS, which are:

  • Some data cannot be instantiated in a scalable or performant way
  • The semantics of systems are underspecified

Ok, enough high-level fluff, let’s get into some specifics.

Hierarchies

One of the first things any developer is going to run into when starting with ECS is, “how do I create my scene hierarchy”. Many game engines provide the ability to organize objects in a game hierarchically, where child objects have local coordinates relative to their parent. Before rendering, these coordinates need to be translated from local space to world space, and this requires iterating the hierarchy from top to bottom.

Despite how common hierarchies in games are, vanilla ECS provides no facilities out of the box for specifying hierarchies. To make matters worse: it is actually impossible to implement a performant hierarchy in vanilla ECS, where by “performant” I mean an approach that lets us iterate over coordinates in a contiguous array with code that can be vectorized (if you think otherwise and you have a solution, do let me know ;).
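
To make the problem concrete, here is a sketch of the obvious vanilla approach. It assumes a hypothetical `Parent` component that stores the parent’s id as ordinary data, and it only handles translation (no rotation or scale) to keep the example short.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

using Entity = std::uint64_t;

struct Position { float x, y; };

// Hypothetical component: in vanilla ECS the only way to say "child of"
// is to store the parent's id as plain component data.
struct Parent { Entity id; };

struct World {
    std::unordered_map<Entity, Position> local;   // local-space coordinates
    std::unordered_map<Entity, Parent>   parent;  // absent for root entities
};

// Resolves a world-space position by walking up the parent chain.
// Every step is a lookup by id, so the access pattern jumps around in
// memory instead of streaming through a contiguous array.
// Assumes the hierarchy contains no cycles.
inline Position world_position(const World& w, Entity e) {
    Position result = w.local.at(e);
    auto it = w.parent.find(e);
    while (it != w.parent.end()) {
        const Position& p = w.local.at(it->second.id);
        result.x += p.x;
        result.y += p.y;
        it = w.parent.find(it->second.id);
    }
    return result;
}
```

This works, but the storage has no idea that `Parent` expresses a relationship, so it cannot order or group entities by depth. Each child chases ids one at a time, which is the opposite of vectorizable.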

Component sharing

It is not uncommon that two or more entities need to share a component. A typical example is a set of entities that share a bounding box for quick AABB evaluation. Another example could be entities that share a set of static common component values (like a mesh) to conserve memory. Many ECS frameworks support component sharing in one way or another, often in a way that is tightly coupled with the underlying storage. For a feature as common as this it would be nice if component sharing relationships could be expressed in the language of ECS itself, rather than relying on a framework-specific feature.

Multiple component instances

Vanilla ECS states that a component can be added to an entity only once. In many scenarios this is fine (why would you want to have two positions for a single entity?), but when an application needs it, this can quickly complicate application code. Consider creating a “Timer” component that removes a component after N seconds, for example to remove a buff from an entity. An entity can have multiple buffs at the same time, yet I can only add the Timer component once. Solutions to this problem get unwieldy (storing a map in the component value or creating a Timer component for each buff component). This is a serious limitation when applications want to implement generic systems.

Runtime Tags

ECS frameworks often provide the ability to add tags to an entity, where a tag is essentially a component without the datatype. Tags make it easy to query for a subset of entities. With a generous interpretation of the rules of vanilla ECS you could argue that a tag is a specialization of a component, just with an empty datatype. However, this glosses over one thing, which is that components often have to be defined in advance, and an application can’t just add new components on the fly.

This would not be a problem if we only ever had to query subsets of entities that are known before we start the game, but this is not the case. What if a game allows you to create platoons dynamically, and you want to get all of the entities for a platoon? What if you have different teams in your game and you want to tag the players in each team? This is only possible if a game can create tags at runtime, which is not possible in vanilla ECS.

State machines

State machines are yet another thing that is very common in games, yet not straightforward to implement in vanilla ECS. The most obvious way to assign state to an entity is to define a tag for each state, and add and remove tags as the entity moves from one state to another.

This is a poor man’s approach for a few reasons. First of all, there is no inherent relationship between the tags associated with the states, which makes the state machine implicit and error prone. Secondly, there is no way to prevent two states from being added to an entity at once, or the entity from being in no state at all. This is all a step back from a state machine that does not rely on ECS, where we can group states as constants in an enumeration. Yet it would be nice to have the state machine represented in ECS, as this likely impacts the kinds of systems we want to run on our entities.
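
The difference is easy to see side by side. The sketch below is purely illustrative: the state names, `TaggedEntity` and `EnumEntity` are made up, and tags are modeled as strings only to keep the example small.

```cpp
#include <cassert>
#include <initializer_list>
#include <set>
#include <string>

// Tag-per-state approach: states are unrelated tags on an entity.
struct TaggedEntity {
    std::set<std::string> tags;
};

// Counts how many of the movement-state tags are present.
// The entity is only valid when the result is exactly 1,
// but nothing in the data model enforces that.
inline int movement_state_count(const TaggedEntity& e) {
    int count = 0;
    for (const char* state : {"Idle", "Walking", "Running"}) {
        count += static_cast<int>(e.tags.count(state));
    }
    return count;
}

// Enumeration approach: exactly one state at all times, by construction.
enum class Movement { Idle, Walking, Running };

struct EnumEntity {
    Movement movement = Movement::Idle;
};
```

With tags, an entity can end up with both “Walking” and “Running”, or with neither. With the enumeration those situations cannot be represented at all, but the state is now hidden inside a component value where systems cannot match on it.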

Extreme declarative programming

ECS seems to be a good fit for declarative programming, where behavior is driven not by imperative statements (“do this”) but by declarations (“there shall be this”). After all, we create entities and assign components to them, and systems get executed as a result. Imperative code is only found in systems, and everything is great.

Or is it? Vanilla ECS says notoriously little about this. The most common interpretation of a system is that it is a function that runs periodically in the main loop. But what if I want a system that runs when I set the component? What about a system when I unset the component? Consider this example:

entity.set<Window>({.width = 800, .height = 600});

This is a declarative statement (“there shall be a component Window on this entity with this value”) that is an example of what I’ve coined “extreme declarative programming”. The code clearly wants there to be a window, but in vanilla ECS no such thing would happen. Ideally there would be a well-defined construct in ECS so that we can describe these kinds of interactions, and have our window be created.
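
One way such a construct could look is a store that notifies registered “on set” systems whenever a value is assigned. This is a minimal sketch under my own assumptions: `WindowStore`, `on_set` and `get` are hypothetical names for this example, not the API of any framework.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>

using Entity = std::uint64_t;

struct Window { int width; int height; };

// Hypothetical store for a single component type that invokes
// "on set" systems whenever a value is assigned to an entity.
class WindowStore {
public:
    using OnSet = std::function<void(Entity, const Window&)>;

    // Registers a system that runs as a result of a set, not per frame.
    void on_set(OnSet system) { on_set_systems_.push_back(std::move(system)); }

    // Stores the value, then notifies every registered system.
    void set(Entity e, Window value) {
        values_[e] = value;
        for (const auto& system : on_set_systems_) system(e, value);
    }

    // Returns the stored value, or nullptr if the entity has no Window.
    const Window* get(Entity e) const {
        auto it = values_.find(e);
        return it == values_.end() ? nullptr : &it->second;
    }

private:
    std::unordered_map<Entity, Window> values_;
    std::vector<OnSet> on_set_systems_;
};
```

A system registered through `on_set` is where the imperative code that actually opens the window would live, so the application code stays a single declarative `set`.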

System execution order

If there is one thing that is sorely missing in ECS, it is the ability to define the execution order of systems. In vanilla ECS a system is logic (a function) that is matched with entities that have a certain set of components. No mention is made of when it should run though, and without this information we cannot provide a formal description of its functionality that is correct, i.e. that will work in any situation.

For all the promises ECS makes about code reusability, none of that will come to pass if we cannot specify execution order. If a “Move” system progresses Position with Velocity, but Velocity isn’t set yet for the current frame, the system will not work.

Any design language that does not allow me to specify the preconditions for something to work correctly is in my humble opinion flawed. ECS implementations have gotten “around” this problem by either ignoring it or providing overly complex or broken solutions. That’s strong language, so let’s qualify this a bit more.

ECS promises systems that are decoupled through components. Specifying direct dependencies between systems is in direct violation of this. It makes code fragile, as systems are prone to change during refactoring. So don’t do that.

Another approach is to let the game developer specify system order. This may work for small projects, but what if you just imported 100 systems from an asset store that you did not write yourself? Long story short: this doesn’t work either.

So how should system dependencies be defined? Just like with everything else in ECS, we should return to the data. Our Move system needs to run after Velocity is set, not after the system that writes Velocity (there can be many). In order to schedule a Move system, we need to be able to identify a part of our frame where we can safely assume that Velocity has been set, and assign our system to that part of the frame.

A new definition for ECS

Enough ranting, let’s look at how we can improve the vanilla definition of ECS to fix the aforementioned issues. Without further ado, here is the new definition as it relates to the storage:

  • An entity is a unique identifier
  • A component can optionally be associated with a plain old datatype
  • A component identifier is an entity
  • An entity can have 0 .. N components
  • A component can be annotated with a role
  • An <entity, component> tuple can have 0 .. N components

I did a few things here. Entities are still just simple unique identifiers. Components can now optionally be associated with a datatype, which releases them from the constraint of having to be defined in advance.

The next part is where it gets interesting: “A component identifier is an entity”. This allows us to treat entities and components in the same way in many cases, and more importantly, it lets us add entities to entities. The first problem this solves is that we can now generate tags on the fly, and we can add tags to entities for things that aren’t known in advance.
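
A sketch of what this buys us, using the platoon example from earlier. The `World` type and its `add` and `query` functions are assumptions for illustration; the point is only that the thing being added is itself an entity id created at runtime.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <set>
#include <unordered_map>
#include <vector>

using Entity = std::uint64_t;

// Sketch: a component identifier is itself an entity, so what an entity
// "has" is simply the set of entity ids that were added to it.
struct World {
    Entity next_id = 1;
    std::unordered_map<Entity, std::set<Entity>> added;

    Entity create() { return next_id++; }

    // Adds `component` (any entity id) to entity `e`.
    void add(Entity e, Entity component) { added[e].insert(component); }

    // Returns all entities that have `component`, in ascending id order.
    std::vector<Entity> query(Entity component) const {
        std::vector<Entity> result;
        for (const auto& [e, ids] : added) {
            if (ids.count(component) != 0) result.push_back(e);
        }
        std::sort(result.begin(), result.end());
        return result;
    }
};
```

A platoon is now just `world.create()`, and adding a soldier to it is `world.add(soldier, platoon)`. No tag has to be declared before the game starts.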

The next part “A component can be annotated with a role” is a catch-all mechanism that lets us specify what the “role” of a component (entity) is for an entity. Here we can specify things like, this entity is a parent, or I want to share components from this entity.
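
One possible way to annotate an id with a role is to reserve a few bits of the identifier. The layout below (top byte for the role, lower 56 bits for the entity) and the role names are assumptions picked for this sketch, not a description of how any particular framework encodes roles.

```cpp
#include <cassert>
#include <cstdint>

using Entity = std::uint64_t;

// Hypothetical roles: "this entity is my parent" and
// "I share components from this entity".
enum class Role : std::uint8_t { None = 0, ChildOf = 1, InstanceOf = 2 };

// Hypothetical layout: top byte holds the role, lower 56 bits the entity.
constexpr int kRoleShift = 56;
constexpr Entity kEntityMask = (Entity{1} << kRoleShift) - 1;

// Combines a role and an entity into a single id that can be added
// to another entity like any other component.
constexpr Entity with_role(Role role, Entity e) {
    return (static_cast<Entity>(role) << kRoleShift) | (e & kEntityMask);
}

// Extracts the role from an id (Role::None for a plain entity id).
constexpr Role role_of(Entity id) {
    return static_cast<Role>(id >> kRoleShift);
}

// Extracts the entity the role applies to.
constexpr Entity entity_of(Entity id) {
    return id & kEntityMask;
}
```

Adding `with_role(Role::ChildOf, parent)` to an entity then reads as “this entity is a child of parent”, while the storage still only deals with plain ids.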

The last rule lets us add components multiple times. The aforementioned “Timer” component could be added to “entity,HealthBuff” and “entity,StaminaBuff”. Because a component is simply an identifier, we could even do things like, add “Timer” to “entity, 1000” and “entity, 1001”.
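
Here is a sketch of the Timer example under this rule. The storage layout and the `expire_system` function are hypothetical; what matters is that the Timer is keyed by an <entity, component> tuple, so a single generic system handles any number of buffs per entity.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <utility>

using Entity = std::uint64_t;

struct Timer { float remaining; };  // seconds until the component is removed

struct World {
    // entity -> ids of the components it has
    std::map<Entity, std::set<Entity>> components;
    // <entity, component> -> Timer attached to that tuple
    std::map<std::pair<Entity, Entity>, Timer> timers;
};

// Generic expiry system: counts every timer down and, once it runs out,
// removes both the timer and the component it was attached to.
inline void expire_system(World& w, float dt) {
    for (auto it = w.timers.begin(); it != w.timers.end();) {
        it->second.remaining -= dt;
        if (it->second.remaining <= 0.0f) {
            w.components[it->first.first].erase(it->first.second);
            it = w.timers.erase(it);
        } else {
            ++it;
        }
    }
}
```

The system never mentions HealthBuff or StaminaBuff. It works for any component id, including ones that only exist at runtime.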

It is a bit harder to provide a definition for systems that is as tight as the ones for entities and components, but let me try anyway:

  • A system is logic matched with entities based on their components
  • A system is invoked as a result of an event
  • A component mutation is an event
  • Computing a simulation frame is an event
  • A frame is divided into N phases
  • Each system is assigned to a phase

This is a much more precise definition than the original one. It recognizes that systems can be run as a result of the simulation progressing (what many people would consider the default) and as a result of “mutations”, which essentially means adding, removing and setting components.

The “phase” construct is introduced, which splits up the frame into several different parts. This idea is not new, as many engines have similar concepts that have proven to work well. Each of these phases is associated with a specific state the frame is in, and this provides the right kind of context to ensure the preconditions for our systems are met.
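
A minimal sketch of phase-based scheduling could look like this. The phase names and the `Pipeline` class are assumptions for the example; the definition above only says that a frame has N phases and that each system is assigned to one.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical phases, declared in the order they run within a frame.
enum class Phase { OnLoad, OnUpdate, OnStore };

class Pipeline {
public:
    using System = std::function<void()>;

    // Assigns a system to a phase. Registration order across phases
    // does not matter; only the phase does.
    void add(Phase phase, std::string name, System run) {
        systems_[phase].emplace_back(std::move(name), std::move(run));
    }

    // Computes one frame and returns the system names in execution order.
    std::vector<std::string> progress() {
        std::vector<std::string> order;
        for (auto& [phase, list] : systems_) {  // std::map iterates in enum order
            for (auto& entry : list) {
                entry.second();
                order.push_back(entry.first);
            }
        }
        return order;
    }

private:
    std::map<Phase, std::vector<std::pair<std::string, System>>> systems_;
};
```

If every system that writes Velocity is assigned to an early phase and Move is assigned to a later one, Move can assume Velocity has been set without knowing which systems wrote it or how many there are.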

Conclusion

The definition of ECS is unlikely to change, so don’t treat the above as a call to action and start editing the Wikipedia page (though it definitely can be improved, so feel free to do that). The goal of the blog is to open the aperture a bit on what is possible if we only slightly tweak the original ECS definition.

Do I think that an ECS framework has to conform to all of the above points in order to be ECS? Obviously not, nor do I think this should ever be the case. I can write code in C that uses inheritance, even though the language doesn’t support it and still call it inheritance. Ultimately what is going to move ECS adoption forward is establishing a set of patterns, and for this to emerge it helps to have a richer set of primitives than what vanilla ECS provides.

As a final note, all of this is solely based on my own experiences of the last year, and may be completely different from your own. If you have insights to share, feel like I misrepresented things or just feel like having a more in-depth discussion on any of this, feel free to join the Flecs Discord: https://discord.gg/ZSSyqty

If you’re curious about what an implementation of these concepts looks like, check out Flecs v2, which is about to launch: https://github.com/SanderMertens/flecs/tree/v2_in_progress
