Friday, October 3, 2014

Building a Data-Oriented Entity System (Part 3: The Transform Component)

In the last post, I talked generally about the design of components. Today I will focus on a specific real-world component, the TransformComponent.

The Transform Component

The purpose of the TransformComponent is to allow entities to be positioned in the world.

It handles positioning of entities in the world and child-parent linking. For example, you may want to link a “wheel” entity to a “car” entity, so that the wheel follows the car around when it moves.

In that sense, the TransformComponent is what forms the scene graph of an engine world.

Design Decisions

Should every entity have a transform component?

In some engines, every entity has to have a transform component, even if it is just a purely “logical” entity that doesn’t really have a position in the world.

To me it seems strange to force an entity to have a position when that has no real meaning. I also want entities to be as cheap as possible. So it seems better to have the transform component optional, just as any other component. An entity that doesn’t have a transform component doesn’t have any position in the world.

Actually, talking about the world is a bit of misnomer. The Bitsquid engine does not have a single World where everything has to live. Instead you can create multiple worlds, each populated with its own objects. So you might have one world for your “main game”, one world for the “inventory screen”, one world for the “loading screen”, etc.

This is a lot better than having an “inventory room” at some secret hidden place in the main game world.

Each world has its own TransformComponent manager, and an entity can create transform components in several of these managers, if it so desires. So the same entity can exist and be positioned at different places in different game worlds. Since a MeshComponent manager also exists in each world, the entity can have different graphical representations in each world.

This is a bit esoteric, and I don’t expect many entities to make use of this, but there are situations when it could be interesting. For example, a player’s pickaxe could exist both in the “game world” and in the “inventory world” and still be managed as the same entity.

Entity scene graphs and model scene graphs

In the entity system there are really two kinds of “scene graphs” that we need to deal with.

The first is the one we have already talked about, the graph formed by entities and their linked child entities.

The second is the graph of nodes within an entity. For example, a character entity may have a model with hundreds of bones that can be individually animated.

What should the relationship be between these two graphs?

In previous engine code, I have always treated these two graphs as parts of the same system. The model scene graphs were linked to nodes in the entity scene graphs and computed their transforms in world space. This creates an update order dependency. We can’t compute the world positions in the model scene graph until we have computed the world position in the entity scene graph. This limits what kinds of things we can do in parallel.

For the entity system I’ve decided to decouple these two concepts. The model scene graph won’t compute world space poses, instead it will compute poses relative to the entity pose. This means that we can evaluate the animations and compute the model pose without knowing anything about the entity pose. (Ignoring world space constraints, of course, but they will be handled in a later pass.)

Of course it also requires us to multiply the model node transforms with the entity transform to get the actual world position of the model nodes.

I have not completed the design of the model scene graph component yet, but maybe I’ll get a chance to return to this in a future post.

Immediate or deferred updates

In previous engines I have always used deferred updates of the world transforms. I.e., changing the local transform of a node would not immediately update its world transform (or the world transforms of its children). Instead it would simply set a “dirty” flag in the entity. Later, I would compute the world transforms of all the dirty nodes (and their children) as a single step.

This has the advantage that we never have to compute the world transform of a node more than once.

Consider the worst case scenario, a long chain of nodes:

[ node_1 ] ---> [ node_2 ] ---> [ node_3 ] ---> ... ---> [ node_n ]

With a deferred update, changing the local pose of every node will still just require O(n) computations to compute all the world transforms. With an immediate update, where we compute the world transforms of all children as soon as the parent transform changes, we will need O(n^2) computations.

On the other hand, there is a drawback to using deferred updates. Whenever we ask for an object’s world position we won’t get its actual world position, but its world position from the last frame (unless we ask after the world transform update). This can lead to a lot of confusion and subtle bugs. Solving them often requires ugly hacks, such as forcing graph updates at different times.

So what should we choose?

I think that with the decision to decouple the model scene graphs from the entity scene graphs the performance problems of immediate updates are a lot less serious. Long chains of nodes that are all moving can certainly exist in the model scene graph. (Consider an animation of a character swinging a whip.) But I would guess that long chains of objects that are all moving at once are a lot less common in the entity scene graph.

Note that the performance problems do not appear if it is just the root entity that is moving. In that case, both the immediate and the deferred update will be O(n). It is only when the parent and the children are moving that the immediate update does worse.

I don’t expect there to be very long chains of entities (n <= 5 ???) and I don’t expect all of the objects in those chains to be moving simultaneously. So I have decided to go with immediate updates so that we always have accurate world transforms.

Note: If we run into performance problems as a result of this, we can always create an API function that allows us to set multiple local transforms at once while doing a single world transform update, thus getting back the O(n) performance.

A side note on deferred updates

Note that if you want to do deferred updates, you want to keep the entity array sorted so that parents always appear before children. That way you can just walk the array from beginning to end and compute the transforms and be sure that the world transform of a parent has been computed before you compute the world transform of its children.

Also, you don’t want to loop over the entire array to look for dirty objects:

for (int i=0; i<n; ++i) {
    if (dirty[i])

Typically, in a scene, only a small percentage of the objects are moving at any one time (maybe as little as 1 %). So looping over all objects, even just to check a flag, can waste a lot of time.

A better solution is to sort all the dirty objects to the end of the array, so we can loop over just them:

for (int i=first_dirty; i<n; ++i)

Since we only need a partial sorting of the array, we don’t have to run an expensive O(n log n) sorting algorithm. (It would kind of defeat the purpose to run an O(n log n) sort to avoid an O(n) update.) Instead, we can achieve this by judicious swapping.

When a node becomes dirty we move it to the start of the dirty list by swapping it with the element before the dirty list and decreasing first_dirty:

                                 =============== dirty ==============
|   |   |   | D |   |   |   | X |   |   |   |   |   |   |   |   |   |

                             ================= dirty ================
|   |   |   | X |   |   |   | D |   |   |   |   |   |   |   |   |   |

We do the same for all children of the node and the children’s children, etc.

As we process the items in the dirty array, whenever we find a child that has its parent at a later position in the array, we swap the child and the parent.

                             ================= dirty ================
|   |   |   |   |   |   |   |   |   |   | C |   |   | P |   |   |   |

                             ================= dirty ================
|   |   |   |   |   |   |   |   |   |   | P |   |   | C |   |   |   |

This guarantees that parents are always processed before their children.

We also need a way to move items off the dirty list, or it will continue to grow indefinitely. We could clear the list every frame, but that might lead to a lot of swapping as items are moved in and out of the list. A better approach might be to check if an item hasn’t moved in five frames or so, and in that case we move it off the dirty list. This avoids swapping those items which are always moving.

When using the immediate update strategy, sorting the list is not as important, but we can employ similar swapping strategies to make sure that a parent node and its children are kept close together in the array, so that the immediate update is cache friendly.


With the design thought through, there is really not that much to the implementation.

Just as in the last post, we store the transform component data for all instances in a single big memory block:

struct Instance {int i;};

/// Instance data.
struct InstanceData {
    unsigned size;              ///< Number of used entries in arrays
    unsigned capacity;          ///< Number of allocated entries in arrays
    void *buffer;               ///< Raw buffer for data.

    Entity *entity;             ///< The entity owning this instance.
    Matrix4x4 *local;           ///< Local transform with respect to parent.
    Matrix4x4 *world;           ///< World transform.
    Instance *parent;           ///< The parent instance of this instance.
    Instance *first_child;      ///< The first child of this instance.
    Instance *next_sibling;     ///< The next sibling of this instance.
    Instance *prev_sibling;     ///< The previous sibling of this instance.

The parent, first_child, next_sibling and prev_sibling arrays all store instance indexes. We can find all the children of a particular entity by following the first_child link and then the next_sibling links of that link.

We can use that to do the immediate transform update:

void TransformComponent::set_local(Instance i, const Matrix4x4 &m)
    _data.local[i.i] = m;
    Instance parent = _data.parent[i.i];
    Matrix4x4 parent_tm = is_valid(parent) ?[ parent.i ] :
    transform(parent_tm, i);

void TransformComponent::transform(const Matix4x4 &parent, Instance i)
{[i.i] = _data.local[i.i] * p;

    Instance child = _data.first_child[i.i];
    while (is_valid(child)) {
       transform([i.i], child);
       child = _data.next_sibling[child.i];

Note: I’ve written this as a recursive function for easier reading, but you might want to rewrite it as an iterative function for better performance.

Note that when you swap two instances in the array (to do swap-erase or to sort the array as described above), in addition to swapping the entries in the array you also need to take care to keep all the parent, first_child, next_sibling and prev_sibling references intact. This can get a little hairy, especially when you are changing references and trying to walk those lists of references at the same time. My suggestion when you want to swap two instances [A] and [B] is to use the element at the end of the array [size] as a temporary storage slot and instead of trying to do everything at once, use three steps:

// Move element at A (and references to it) to size.
[size] <--- [A]

// Now nothing refers to A, so we can safely move element at B (and references
// to it) to A.
[A] <--- [B]

// And finally move the element at size to B.
[B] <-- [size]

In the next post I’ll look at compiling entities into resource files.


  1. Great post.

    Are you rewriting old engine code to fit this system (eg: physics actors are now a component that can be added to any entity)?

    Looking forward to the next post!

    1. Yes, physics and everything else is being moved into this system.

    2. How do you synchronize position of an entity between physics and transform systems? As far as I understood, in order to get position of an entity we need to do a lookup in hashmap and then in array. It may be not that fast if do this every frame for every entity.
      I mean, if a system (transform or rendering) needs an entity position in order to, for instance, draw it, the system should do the following, right?

      for every entity:
      lookup transformation in transform system;
      use transformation;

      How do you handle this case?


    3. There are two things here imo:

      1) When you change the transform component you also need to update the associated physics body. Otherwise the raycasts or volume queries will not work as expected. This is usually implemented through some query system that channels all query requests. If a transform is changed the underlying body is added (e.g. through some callback system) to a dirty list. Then when you trigger a raycast the broadphase is updated so everything is at the right location.

      2) For updating entities after a physics tick you usually maintain a list of active entities (e.g. not sleeping). You register a callback with the physics system when a body wakes up and add it to the list. Then in the post tick function you synchronize the transform of the active bodies and remove those that are now sleeping. The number of active bodies is usually small and you want to control that number based on performance of the client machine.

    4. As Dirk says... usually the number of things that change is a lot smaller than the total number of things. I.e. there may be 100 000 objects in a level but only 1000 that are currently moving. So key to getting good performance is to only transmit the things that change.

      This means that the renderer (in your example) should not query the position of all objects every frame. Instead the renderer should cache the positions internally. Then every frame it should ask "give me a list of the things that have moved since last frame and their new positions". Depending on how the transform component is implemented, this can be a super fast operation (i.e. it could keep this list already prepared and ready to send all the time, by updating it as objects move).

      // Niklas

    5. Let's follow the rendering<->transform example: that's clear that a Transform and/or Physics system can give a list of "transformed" entities but the Renderer system has to do a lookup in its own entity->component map, right? Because the transformed entity does not even have a renderer component and the ids can be different.

  2. Hey, cool post!

    It's really frustrating in UE and Unity when you are forced to have useless transform information on your entities which don't need physical location in the game world. It just seems like a way to make bugs for no reason. (Also agree that it's odd to keep everything in the same world space. You can avoid some of the problems with this in UE and Unity by using visibility and collisions channels/layers but it feels a bit strange.)

    One thing I've been thinking about recently is the necessity of including scale as a base parameter for game world transform. Most engines have it. But for many types of games, it doesn't get used at all. And for some types of entities, it doesn't make sense to have it. For example, a particle system entity. Location and rotation, that usually makes sense. (Scale? What is the behavior supposed to be for particle systems? What about scale on point and direction lights? Should scale affect the mass of rigid body physics components? What does it even mean?)

    Even with meshes and stuff, it's not too common to be dynamically adjusting scale. And in some rendering pipelines, it has negative performance consequences if you touch the scale parameter.

    So, what happens in Bitsquid when scale is changed on particle systems and lights? And for meshes, does it affect instancing/batching behavior?

    1. Scale doesn't effect instancing/batching. I don't think scale should affect particle systems or lights either, but you could argue that either way.

  3. Great post, Niklas! Thanks for sharing.

    I think two question come up:
    1) How about scale? You use matrices so you allow for non-uniform scale? How about transform decomposition?

    2) How do you deal with attachments? E.g. how do you attach a sword entity to the hand of a character entity? In your terminology you would attach a world hierarchy object into the model hierarchy

    1. 1) Actually I've simplified things a bit for this example. (Because I wanted to focus on a single concept.) For the local transform we actually don't use a matrix, but separate position, rotation and scale values. But we support non-uniform scaling (along the principal axes).

      2) That part isn't completely designed yet. But one way of doing it would be to see that as a "constraint" that is applied after the model/animation update has finished. We would then process all those constraints and
      move the attached objects to the desired positions.

  4. When sorting any component data(or for eg on render queues) , don't you lost the instance index?
    (ps. in my case i´m using a more c++ oriented code, and std::sort)

    1. Yes, the instance indices will change when you move things around. That's why I said you must take care to keep references intact when you swap items. So using std::sort won't work.

  5. Hi, what is the point in having all the local/world transforms in one array? I can not think of any usage, where transforms are accessed in a linear way, so cache is not an issue.

    1. How would you rather organize them? In my opinion an array is the best default option, unless you have strong reasons to pick something else. And cache can certainly be an issue even if the access is not strictly linear. (Such as when walking the list of parent/child objects.)

    2. At the moment I have them in one array too, but it was somehow an automatic decision as everything in my engine is that way. I am playing with an idea to completely remove "transform component" because everyone has its own copy anyway. Rendering store its own copy, animtion system store its own copy, PhysX too. Basically everybody who needs it, caches it. There is only one system that uses transform component - the editor. Anyway, it's just an idea at the moment.

  6. Hi Niklas,
    First, thank you very much for this blog. It is a very precious information source for me. I learn a lot reading it, so again thanks to share all these articles with us.
    I've implemented a transform component system based on your articles. I've been confronted to a crash happening randomly only in Release. It turns out that my Matrix4 structure needed a memory alignment which was not handle by the single big allocation done by the transform manager.
    So my solution was to allocate a bit more memory, using an offset when settings the first array pointer, and using the first arrays to store matrix.
    _data.local_matrix = _data.buffer + offset
    _data.world_matrix = _data.local_matrix + _data.capacity
    That's the first time I'm faced to alignement issue. I'm glad to have found the problem and the solution, but I was wondering how you are handling that.
    I guess your solution is more elegant.

  7. This comment has been removed by the author.

  8. First of all, good article,
    but I would like to ask a design question:

    In case the user set the world transform directly, how do you behave if that transform has a parent? how do you deal with the "world matrix accumulation" if then the user move the transform's parent? the target world transform is overridden? there is a flag to avoid world-overriding? you disconnect the two transform?


    1. We don't allow setting the world transform directly... you can only set the world transform by setting the local transform so that parent * local = the world transform you want.

  9. Firstly, I'm a big fan of this blog. Thank you for continuing to write.

    A common problem I find is how to efficiently process components when they have dependencies on other components. Lets say our LocalSteeringComponent needs its own instance data, the corresponding TransformComponent data and the RigidBodyComponent data during its update.

    A naive approach would be to linearly walk the LocalSteeringComponents and look up the corresponding components it needs. This suffers from the fact the look ups for the corresponding components may thrash cache if there is no ordering coherency or if there is a big multiplicity disparity.

    An improvement would be request a copy of all the required component data from the corresponding component mangers. This still suffers from needing to make "jumps" but given the full required set to the component manager, it may be able to optimize the data aggregation.

    What techniques have you found best mitigates this issue?

    1. Haven't thought that much about this, but two approaches I can think of:

      1) Have each component own data that it needs... i.e. the RigidBodyComponent would have a position. It would only need to synchronize with the TransformComponent when the position changed.

      2) Keep component data sorted by instance ID for each component... that way, we will access data from other components linearly (with jumps) without needing a "gather" step.

    2. Actually, didn't you mean Entity ID instead of instance ID?


  10. Hi Niklas,
    how about reorder the components as depth-first and access them with a for-loop instead of jumping around using hierarchy-info? then jumps occurs only to collect parent's data and reordering needs to be recomputed only when the hierarchy changes, have you try a similar solution? what do you think about? keeping data in depth-first format can be so time consuming than jump here and there result in a better approach/performance (especially when child-parent data are close together in memory)?


  11. Great post, Niklas! Nice to see discussions that actually thinks one step further than just keeping pointers or whatnot. I do have some concerns that I can't wrap my head around though.

    1. Many people seem to imply that systems iterate entities containing the required components. These components could possibly be actual objects of the entity, or they could reside in a manager; which one is used is of no importance, the systems still iterate them. The first variant is obviously not cache friendly, while the second one is, provided the component does not depend on any other component.

    Now, let's consider an audio emitter, which depends on the transform (it has a physical location). An audio system works with entities containing both an emitter as well as a transform. For each emitter, the system would need to find the transform corresponding to the owning entity. Doesn't this kill cache coherency? While the emitters are stored in a linear fashion, the corresponding transforms might be located anywhere.

    The same people that discuss designs not taking cache or memory layout into consideration also seem to
    not consider notifications a problem, arguing that after a translation system is finished, the soon-to-execute audio system would work with updated positions (assuming one goes the route of finding dependent components). However, as soon as multi threading enters the picture, this becomes a problem, because we can no longer be guaranteed that the translation system is finished.

    Considering these points, I think it would be appropritate for components to store everything they need, and have dependent data synchronize with "the one true data". How could one go about implementing this sync without coupling components too much?

    2. I don't quite understand how child components are supported using this transform component. I do understand that it links to other transform components, updating them when updated itself, but this does not in any way take other components into consideration, does it? Is the idea to create two entities -- each with their own transform -- and link those transforms together, letting the two entities have their respective mesh for instance?

    Take a police vehicle as an example (just to try to bind as many parts together as possible). It has a body represented by a mesh, and a translation. On the roof we have a siren, which has a mesh, an audio emitter, and a rotating light, all of which depend on a transform relative to the car. How would this hierarchy be built?

    This has turned into quite a lengthy question, but I believe it deals with several key points that glues it all together.

  12. 1. You have a choice whether you should mirror the data in the component or read it from the transform. The first option gives better cache locality for component updates but requires an extra effort to synchronize the data. You have to choose in each case what you prefer.

    2. In this case I would probably do car -> siren -> light, so that the light can be rotated independently. The mesh and audio would be components on the siren entity and the light would be a component on the light entity.

  13. This is an amazing approach. But I do have a concern about this.

    How would you go about threading this approach? A normal scene graph can easily have multiple strategies for dividing the work into the other cores. Such as the sing thread can run down a branch for a short while, then dispatch entire branches of the trees to the job system. Climb back up, and repeat.

    With the array... I can't imagine that being easy to do effectively without wasting cycles. A parent's children can easily be scattered out in the array. Threading it's update would mean having to throw back jobs into the queue if the parent has not been updated.

    Unless you did something a little more clever? Such as remove attached entities from the update list? And recursively update them?

    1. Addition onto the previous post, because this was an after thought.

      How do you handle attatching objects to a characters bone? Does the entity then become part of the Character's model space?

    2. Why do you think having the parent's children being "scattered in the array" would lead to less efficient multithreading than having them "scattered in memory" which is what you would have with a "traditional" scene graph?

      On the contrary, having an array is a lot better because you could do things such as sorting the array to have all the leaves in the end and then you can have multithreaded workers updating these leaves while accessing memory linearly.

      Anyways, since we use the "immediate" update strategy we don't have a single step where we update the entire scene graph, instead we have lots of small updates and multithreading those is probably not worth it.

    3. This comment has been removed by the author.

    4. I think I see what you are doing now.

      At the start of each entity's turn to be updated, you Update it's Physics then world transforms. And sense you have to multiply the matricies twice on the entity, you run through your logic which determines, then animations.

  14. Sorry but one final question on this topic. First, let me go ahead and say that this blog has been an incredible resource.

    I have spent more time on my projects scene graph, than I probably had on integrating the physics, rendering system for both OpenGL and Directx, and culling combined. The scene graph is relatively important to the engine's scene and model processing, as when files are loaded, it loads up branches of the graph. This is so we can translate to view-space for arbitrarily large scenes. It also acts as an area filter, as the engine allows up to eight characters in a CRPG party to be in various locations across a massive world. The prospect of a small cost paid for eliminating future bugs sounded quite nice.

    I tried implementing a "immediate" update on transforms... and well. I think I broke something. So while the math was right... I wounded up pegging my scene graph for an update transforms N times for each entity that interacts with it. So... what I see is an out of whack erratic movement with some sliding of out of sync animations.

    In an abstract answer, what was your method of approach for an immediate update?

    1. Not so complicated... when the position of a node is changed, immediately update the position of its children by recomputing their world transforms from parent transform & local transform.

  15. Hi Niklas,

    Thanks a lot for this series of articles, a really interesting reading. It's pretty clear to me the benefit of storing packed transform data to improve data access performance.

    Ignoring the dirty list, this works quite well and straightforward if the hierarchy is static.
    I have some doubt on how you handle re-parenting transforms. How do you handle this? What if a node with a possibly deep hierarchy of child changes its parent? How do you move data in such scenario in order to keep the transforms ordered but still packed?

    1. The algorithm outlined in this post still works. As you process the array, if you encounter an item that is sorted earlier than its parent, swap it with the highest of its parent's parents that is sorted after it in the list. If a deep hierarchy is reparented this might potentially lead to a lot of swaps, but that is unavoidable.

  16. Actually I was thinking about this recently again and I was wondering if entity transforms and model bones are distinct classes if you handle both separately? E.g. you cannot make a bone a child of an entity transform?

    I am really torn here. If I look at this from a performance perspective treating both graphs separately makes a lot of sense. But using just a single hierarchy makes many things so much easier and more natural to use (e.g. attachments). Now in retrospect if you compare your old approach to this new one - are you still convinced this was the right decision?

    1. It still feels right to me. I agree it is more conceptually complicated, but for performance I think it is good to differ between two distinct cases:

      * Placement of meshes (shallow hierarchy, little animation)
      * Characters (deep hierarchy, lots of animation)