Building an ECS #2: Archetypes and Vectorization

Flecs Discord: https://discord.gg/MRSAZqb

This is the second in a series of posts about the guts of Flecs, an Entity Component System for C and C++. Each post will cover a different part of the design and turn it inside out, so that at the end of each post you will have enough information to implement it yourself.

The blog is intended both for Flecs users as well as people interested in ECS and game design. I won’t explain ECS in great depth as there are many resources that already do a great job at this, such as the ECS FAQ or the ECS back & forth series.

The first entry explained how we can efficiently keep track of which components an entity has. In this second entry we will explore one way we can store the component data. ECS is often touted for its performance benefits, and after reading this entry you will know why.

Arrays, array, arrays

As a rule of thumb, if you want to get things done fast, use arrays. In data intensive applications the amount of data that needs to be streamed from RAM to the CPU can quickly become a bottleneck, as RAM access is comparatively slow. Storing data in plain arrays ensures that data retrieval happens as efficiently as possible. I won’t go into the details of the why and how, but if you’re interested I can highly recommend watching this talk from Mike Acton: https://www.youtube.com/watch?v=rX0ItVEVjHc.

Arrays have one other massive benefit, which is that code written for arrays can be vectorized. Vectorization, in plain English, is the ability of a CPU to execute multiple operations in the time it takes to do a single operation, using what are so called “SIMD” instructions (Single Instruction, Multiple Data). Well written vectorized code can run anywhere from 2x to 16x faster than the same, non-vectorized code.

When building our ECS, we of course want users to be able to leverage both fast streaming and vectorization. To accomplish this, we need to not just store data in arrays, but arrays need to be contiguous as well which is a fancy way of saying they should not contain any unused elements, and are stored as a single block of memory. Arrays with unused elements not only cause us to stream more data than needed, but also prevent us from using SIMD instructions, which only work on contiguous arrays.

The ABC problem

Unfortunately storing all data in perfect contiguous arrays is not trivial. To see why, let’s look at a simple example first. Imagine we have an application with 3 entities that all have component . We can store this in a single array:

A a[3];

We can use the array index for entity id, so that gives us the value of for entity 2. Now imagine that two of these entities also have component . For this we need to add an array for B, but here we run into a problem. We could create an array with the same size of A, like this:

B b[3];

This is easy, as it still allows us to use the entity id as the index (A and B for entity 2 are stored in A[2] and B[2]). But, because we have an array of 3 and only two entities with B, we cannot guarantee the array is contiguous! This means that we cannot vectorize our code, which is no good.

Instead, let’s try to only store as many B’s as we need by creating an array with just two elements:

B b[2];

This is a contiguous array for which we can write vectorized code, but now we run into a different problem. Imagine our entities look like this:

0: [A B]
1: [A ]
2: [A B]

Even though both A and B are stored as contiguous arrays, the entity ids don’t match up with the array indices! This means that we cannot write vectorized code that requires both A and B. This is still no good, but we can use a trick here, which is to store all entities with B at the top of array A:

0: [A B]
2: [A B]
1: [A ]

Now A and B line up nicely. We will have to come up with a mapping from entity ids to array indices, since they no longer are the same, but that is something we can live with as long as we can write fast code. This approach even works if some entities only have B:

3: [  B]
0: [A B]
2: [A B]
1: [A ]

Problem solved? Unfortunately not. Let’s see what happens if we add another component into the mix:

0: [  B  ]
1: [ B C]
2: [A B C]
3: [A B ]
4: [A ]
5: [A C]
6: [ C]

There is no way we can order our entities so that we get three perfectly contiguous arrays for A, B and C (try it!). Did we chase an impossible goal? Not quite, but we need a relaxation in order to support vectorization.

Archetypes

What if we stop trying to create a single array per component, but instead create an array per component, per type? Imagine we have 2 entities with , 2 entities with and 2 entities with . We could store them like this:

// Type [A]
A a[2];
// Type [A, B]
A a[2];
B b[2];
// Type [A, C]
A a[2];
C c[2];

Our entities now look like this:

0: [A]
1: [A]
2: [A B]
3: [A B]
4: [A C]
5: [A C]

We now have 3 separate arrays for which introduces a little bit of overhead for when we only want to iterate , but other than that we fixed all of our previous problems. All of the arrays are contiguous, and the array indices correspond with a single entity. If we wanted to iterate , we could do so with a loop that looks like this (and can be vectorized):

for (int i = 0; i < 10; i ++) {
a[i] += b[i];
}

This way of storing data is often referred to as the “archetypes” approach. An ECS framework internally groups all entities with the same type together in a set of packed arrays which we call an “archetype”. Archetypes are typically created automatically when a combination of components first occurs in an application.

Archetypes are not only good for fast iterations. They also make matching systems with entities a lot easier. Imagine we have a 1000 entities with and another 1000 with . If we create a system that matches with we don’t need to evaluate 2000 entities. Instead, we only need to look at our total set of archetypes, and test which ones match.

Let’s create a data structure that can represent an archetype:

struct ComponentArray {
void *elements; // vector<T>
int size;
};
struct Archetype {
Type type;
vector<ComponentArray> components;
int length;
};

Note that we use a since our ECS framework doesn’t know in advance which component types it will be storing. The “type” contains the list of components that are stored in the archetype (see the first post for the definition of a Type). To keep things easy, the order of the components in the type will be the same as that of the component arrays.

This design lets us write fast, vectorizable loops. Let’s take a look at a some of the other operations in an archetype-based ECS.

Finding entities and components

Now that we have a way to store component data for our entities and we can iterate them, we still need to be able to find a single entity. We can do this by taking the entity index from the previous post, and modify it to store a “record” instead of simply a type:

struct Record {
Archetype *archetype;
int row;
};
unordered_map<EntityId, Record> entity_index;

Here, the row is the array index in the component arrays. If we now want to find component for entity 10, we can do this:

Record& r = entity_index[10];
Type& type = r.archetype->type;
for (int i = 0; i < type.size(); i ++) {
if (type[i] == A) {
return r.archetype->components[i].elements[r.row];
}
}

Note that we took a shortcut here, as we can’t actually use the array operator to get the row from as it is of type . As a result the actual code looks a bit more convoluted, but essentially this is what is happening.

We also want to be able to access our entity ids when iterating over the component arrays. For that we can simply add an additional array to the archetype:

struct Archetype {
Type type;
vector<EntityId> entity_ids; // Entities stored in archetype
vector<ComponentArray> components;
int length;
vector<Edge> edges;
};

This is the big trade off of archetypes: we gain iteration speed at the cost of copying components when adding (and removing) components.

Adding components

If a component is added to an entity, its type changes. Because the type determines in which archetype an entity is stored, this means that we need to move the entity from one archetype to another. This is the big trade off of archetypes: we gain iteration speed at the cost of copying components when adding (and removing) components.

Let’s walk through an example that shows how to add to an entity in archetype . First we need to find archetype , which is our destination. There are various ways to do that, which we will discuss in a bit. Once we have the destination archetype, we need to insert a new row into it. We then copy components and into the new archetype, and remove the entity from the previous archetype. In summary:

  • Find destination archetype
  • Insert entity into component arrays of destination
  • Copy overlapping components from source to destination
  • Remove entity from component arrays of source

This sequence is the same for removing components.

Finding an archetype

An important factor in how well add / remove operations perform is how fast an ECS framework can find an archetype. A simple approach would be to hash the component ids of an archetype, and then use a hash map to lookup an archetype:

using TypeHash = uint64_t;
unordered_map<TypeHash, Archetype*> type_index;

This approach is neither slow nor is it particularly fast:

First we need to create our destination type. If we take our previous example where we add to , we need to create . In our case, where a type is stored as a vector of component ids this is an O(n) operation (to avoid adding duplicates). Then we need to compute the hash, which is another O(n) operation, followed by a hash table lookup which is O(1) on average, and O(n) worst case.

This seems like a lot of work for one of the most common ECS operations.

A faster approach is to build a graph, where each archetype represents a node in the graph, and the components to add / remove are the edges. Consider this example:

An example archetype graph

Each arrow to the right represents an “add” operation, and each arrow to the left represents a “remove” operation. Now, to move from archetype to we simply follow the edge, which is an O(1) operation. When we follow an edge that does not have an archetype yet, we create it.

Let’s add the graph to our previous archetype data structure:

struct Archetype;struct ComponentArray {
void *elements; // vector<T>
int size;
};
struct Edge {
Archetype *add; // traverse right
Archetype *remove; // traverse left
};
struct Archetype {
Type type;
vector<EntityId> entity_ids;
vector<ComponentArray> components;
int length;
vector<Edge> edges;
};

We can now write this function to find our next archetype:

Archetype *node = root;
for (int i = 0; i < type.size(); i ++) {
Edge *edge = &node->edges[type[i]];
if (!edge->add) {
edge->add = create_archetype(node, type[i]);
}

node = edge->add;
}

Note how we use a type to traverse the graph. If we only add a single component to an entity, the type will only contain a single element, but by writing the function this way, we can quickly traverse multiple nodes.

AoS vs. SoA

So far we assumed that an archetype stores an array per component. This is usually referred to as the “struct of arrays” (SoA) approach. The reason it is named like this is because the way data is stored resembles that of a struct with array members:

struct SoA_Archetype {
A a[100];
B b[100];
C c[100];
};

It turns out that SoA works really well in combination with ECS. This is because a system is usually not interested in all components of an entity, and the SoA approach lets us load only the arrays into the CPU cache that we need. There is another valid approach though which is called “array of structs” (AoS). AoS can be represented like this:

struct EntityType {
A a;
B b;
C c;
};
EntityType AoS_ArcheType[100];

This example stores the same data, but it is laid out differently in memory. The biggest difference between AoS and SoA is that with AoS, all data of an entity is loaded into the CPU cache. This may slow down applications when only iterating a few components. AoS can however perform better when randomly accessing components. Flecs uses the SoA approach.

Summary

Our ECS framework is starting to shape up nicely. Let’s combine the data structures of our archetype with those of the previous post:

// Entities, types and entity index
using EntityId = uint64_t;
using Type = vector<EntityId>;
unordered_map<EntityId, Type> entity_index;
// Type flags
const EntityId INSTANCEOF = 1 << 63;
const EntityId CHILDOF = 2 << 62;
// Component data
struct ComponentArray {
void *elements; // vector<T>
int size;
};
// Archetype graph
struct Archetype;
struct Edge {
Archetype *add;
Archetype *remove;
};
struct Archetype {
Type type;
vector<EntityId> entity_ids;
vector<ComponentArray> components;
int length;
vector<Edge> edges;
};

Archetypes are a popular approach that provide good general performance, but they are not the only way to implement an ECS. I hope this gave you some background on their pro’s and con’s.

If you liked what you read, consider checking out the Flecs repository (https://github.com/SanderMertens/flecs) and the Flecs Discord channel!

More reading:

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store