Friday, March 15, 2013

What is gimbal lock and why do we still have to worry about it?

If you have ever worked with rotations and Euler angles you are probably at least somewhat familiar with the phrase "gimbal lock". But like many things concerning rotations, angles and spaces it can be tricky to visualize and get a good grasp of.

Sometimes it feels like every time I need to think about gimbal lock I have forgotten everything about it and have to go back the beginning and ask myself: OK, but what is it really that is happening?

Hopefully, this article will take care of that problem.

The Wikipedia page shows how gimbal lock can happen in a mechanical system. But it isn't necessarily self-evident how this translates to the computer game world. In the computer there are no mechanical limitations, we can rotate an object however we like. How can anything be "locked"?

Euler angles

When we are using Euler angles, we represent an object's orientation as three consecutive rotations around the object's axes. We can choose the axes and the order in which we apply the rotations arbitrarily, and depending on what we choose we get different Euler representations. So XYZ is the Euler representation where the first angle rotates the object around its X-axis, the second around its (new) Y-axis and the third around its (new) Z-axis. YZX gives us a different representation. We can even have representations with repeated axes, such as XZX.

So if we want to talk about the "Euler angles" of an object, we really must also talk about what axes we are rotating around and in what order. Otherwise we have no idea at all what we are talking about. Unfortunately, many articles about Euler angles are pretty sloppy with this and throw around terms like yaw, pitch and roll as if they had completely well-defined and unambiguous meanings. I prefer to use more wordy, but descriptive names, such as euler_xyz[0] that unambiguously state the axis rotation order and the index of the angle we are talking about.

An object has three rotational degrees of freedom and it is quite easy to see that the three Euler angles for a particular axis order (XYZ) are enough to define any possible orientation of an object. Note though that the representation is not unique. There are many possible Euler angles that represent the same orientation. For example, adding 360 degrees to any of the three angles will give us a different representation that results in the same object orientation.

So "gimbal lock" doesn't mean that there are rotations that can't be expressed as Euler angles. We can express any rotation in Euler angle form. Given an object, we can convert its orientation to Euler angles, and from that orientation we can rotate the object however we like and convert the new orientation to other Euler angles.

So what exactly is it that is "locked"? It seems we can do whatever we like.

The "lock" in gimbal lock

The term "gimbal lock" comes from the mechanical world. If the problem had originated in the world of computers, it would probably have been called something less confusing, such as "Euler angle flip" or "coordinate singularity".

Because, in the computer world, there is really nothing that gets "locked". Instead, the problem is this: when the Euler angles have particular values, there are orientations that are very similar to the current orientation which can't be achieved by just making small changes to the Euler angles. In particular, this happens when one of the angles is at 90 degrees, so that two rotation axes coincide.

So even though the orientations are "close" in the real world, they are not close in the Euler representation. In fact, at least one of the Euler angles will have to flip 180 degrees in order for us to represent the new orientation.

So one of the angles have to flip? What is the big deal? Can't we just flip it and get on with our stuff?

We can, as long as the angles only represent instantaneous "snapshots" of the object's orientation. However, if the angles represent key frames in an animation and we want to interpolate between those key frames we run into trouble. If one of the angles flips 180 degrees between two key frames and we interpolate between those values, we will see the the object animating through all those 180 degrees. In the viewport, we will see the object doing a "flip" or "roll" that shouldn't be there.

Note that it is only the interpolation that creates this unwanted behavior. If we just displayed the actual key frames and didn't interpolate between them -- everything would look right. We could work in Euler angles as much as we liked and be as close to the gimbal lock position as we wanted and no-one would ever know.

So the only thing we need to fix to get rid of gimbal lock is the interpolation. If you have done any work with 3D graphics you probably already know the answer -- to use quaternions instead of Euler angles to represent angles. Quaternions don't have the weird singularity points that Euler angles have and we can interpolate between any keyframes by just lerping the quaternions.

It doesn't matter if the animation package is using Euler angles internally, as long as we convert everything to quaternions before we do the interpolation. Note that interpolation in quaternion space is not the same as interpolation in Euler space though, so to get as close as possible to what the animator intended, we probably want to sample the animation at our target frame rate and generate our quaternion key frames from those samples, rather than directly converting the animator's key frames (which may be further apart).

Well, there is one caveat actually. If we have more than 180 degrees of rotation in a single frame we can't represent that nicely with quaternions. Quaternions always lerp the shortest path between two orientations and you can't represent several "laps" of rotation with quaternions as you can do with Euler angles (by setting one of the angles to 9000 degrees, for example). But you can fix that by sampling at a higher frame rate if you need to represent really fast rotations with quaternions.

So with that we can say good bye and good riddance to Euler angles and never have to worry about their sorry gimbal locking asses ever again.

Or so you may think...

The return of gimbal lock

I certainly thought so, until I started working on the new cutscene animation system for our level editor.

You see, animators really like to work with curves. They like to see a visual representation of what the animation will do to an object over time with key points that can be moved and handles that can be adjusted to change the slope of the curve.

Curves! Animators love them!

Quaternions are great for interpolation, but they are no good for curve editing.

Sure, you could probably draw some curves that represented a quaternion (the laziest thing would be to just draw the x, y and z components of the quaternion), but those curves wouldn't mean anything to an animator, the way the Euler angle curves do. They wouldn't be able to do anything with them.

So, animators want curves with keyframe interpolation. Curves need Euler angles. But what happens when we mix Euler angles with keyframe interpolation? Presto! Our old friend the gimbal lock is back again! Haven't we missed him.

That's it. We're stuck. Gimbal lock is here to stay and the animators will just have to work around it.

And we have to add support for all the usual tricks and workarounds that animators use to get around gimbal lock, such as changing the axis order (from XYZ, to XYZ, XZX or another of the twelve possible permutations), converting to quaternion and back again, applying an "Euler filter", etc.

But who said this game engine gig should be easy?

Friday, March 1, 2013

A Bug in Object Replication and Message Reordering

The Bitsquid network system supports a peer-to-peer model with migration of network objects -- i.e., changing the owner of a network object from one peer to another. This recently lead to an rare race condition.

To understand this bug you must first understand a little bit about how our network system works.

Background

The entire network model is based on an packet delivery system (on top of UDP) that provides ACKs for unreliable packets as well as a reliable (and ordered) packet stream between any two network endpoints. At the next layer we have implemented a remote-procedure-call service for Lua as well as an object replication system.

Games can use these services however they like, but our recommendation is to do as much as possible with the object replication system and as little as possible with RPC calls, since using explicit RPC messages tends to require more bandwidth and be more error prone.

The network be run in both client-server and peer-to-peer mode. The only difference is that in client-server mode, the server relays all messages (clients never talk directly to each other) and owns most of the network objects. (Clients can own objects in client-server mode, in that case the changes to the objects are relayed by the server.)

Objects are replicated with a message stream that looks something like this:

A: CREATE [wait for ack] UPDATE_1 UPDATE_2 ... UPDATE_n DESTROY

Here, A (the owner of the object) first sends a reliable message that creates the object. When it has received an ACK for that message, it starts sending updates, informing the other players about changes to the object. (By monitoring ACKs, it knows which changes the other players have received, so it only sends updates when necessary and it will resend updates if the messages are lost.) Finally, at some future point, the object is destroyed, through another reliable message.

The UPDATE messages are sent on the unreliable stream (for maximum performance), so they can potentially arrive before CREATE or after DELETE. But this is not a problem, because we simply ignore UPDATE messages that arrive out of order.

This approach ensures that everybody that participates in the game session will see the same set of objects with the same properties (at least eventually, keeping in mind that messages can be delayed).

Migration

Migration complicates this picture somewhat.

Migrating a network object means changing the owner of the object from one peer to another. There are a number of reasons why you might want to do that. First, if a player drops out of the game, the objects owned by that player may need to be taken over by somebody else. Second, in a peer-to-peer game we may want to load balance, so that each peer is managing about the same amount of objects. Finally, sometimes a particular player is interacting directly with a particular object (picking up a rock, etc). It can then be beneficial to make that player owner of the object, so that the interaction is not affected by network latency.

In our network, migration is implemented with a reliable MIGRATION message that tells everybody in the session about the object's new owner. The migration message is always sent by a special peer, the HOST of the game session. (To ensure that peers do not compete for the ownership of an object.)

So if we look at a message stream with migration involved, it looks something like this:

   A:  C Ua Ua Ua Ua Ua
HOST:                    M_ab
   B:                          Ub Ub Ub Ub Ub Ub Ub Ub D

If you are an experienced network programmer you should start to smell trouble at this point.

The problem is that while the message system provides an ordered stream of messages between any two endpoints, there is no ordering of messages between different endpoints.

Consider an additional network peer X. There is an ordered stream of messages A → X. There is also an ordered stream of messages B → X. But there is no guaranteed ordering between the messages sent from A and the messages sent from B and HOST. So, suppose the messages from A → X are delayed. Then X could see the following message stream:

M_ab Ub Ub Ub D C Ua Ua Ua

So X gets a request to migrate the object before it has been created. And the creation message arrives after DELETE. In other words, a complete mess.

To be sure, this only happens if the object gets migrated really close to being created or deleted and if there are asymmetric network delays on top of that. But of course, it always happens to someone.

The Fix

There are many possible ways of fixing this. Here are some:

  • We could impose a global message ordering. We could make sure that the reliable message streams are globally ordered to prevent "paradoxes" of this kind. I.e., if HOST sends M_ab after receiving C, no peer should receive M_ab before C. Unfortunately, this is not as easy as it sounds. For example, what if A dies before it has sent C to X? In that case, that failed delivery will also block the channels HOST → X and B → X, since they are not allowed to deliver any messages before X has received C.

  • We could use a migration handshake. We could do some kind of handshake procedure to make sure that everybody has received M_ab, before B takes over ownership. But this would require a lot of extra messages and temporarily put the object in limbo.

  • We could fix the ACKs. We could make it so that X doesn't ACK M_ab until C has arrived, thus forcing HOST to keep resending it, until we are ready to receive it. This would work, but would require us to implement ACKing of individual messages. Currently, we just ACK an entire UDP packet (containing many messages) on reception, which is simpler and more performant.

  • We could create an internal message queue. We could queue up migration, create and delete messages in some sort of internal queue if they arrive out of order and try to fix things up later. This is a truly horrible "solution" that increases code complexity and is likely to cause lots of confusing bugs in the future.

All these solutions are probably workable, but they all have the drawback of increasing complexity. And I really don't like to increase the complexity of network code. Reasoning about network code is hard enough as it is, we should always strive for the simplest solution possible.

So, instead, the first thing I did was to simplify the problem by eliminating the host from the equation. I simply let the new owner send out the migration message instead of the host:

   A:  C Ua Ua Ua Ua Ua
   B:                    M_ab Ub Ub Ub Ub Ub Ub Ub Ub D

This is already a lot better. Now we only have two parties to worry about (apart from X), instead of three.

We still want the host to be in charge of migration. Otherwise we run into tricky problems of what should happen if several peers try to assume ownership of an object at the same time. So we let the host initiate the migration by sending a message to the new owner (B). Then, B is responsible for notifying everybody else about this.

With this approach, we can use the same "wait for ack" trick that we used during creation to make sure that B doesn't send any updates to peers that haven't acked the migration:

   A:  C [wait] Ua Ua Ua Ua Ua
   B:                            M_ab [wait] Ub Ub Ub Ub Ub Ub Ub Ub D

We still haven't completely solved the problem, X can still see weird message orderings such as:

M_ab   C   D
M_ab   D   C

But this won't be a problem as long as we do two things:

  • We let MIGRATE act as a CREATE message, if we get MIGRATE for an object that doesn't exist.

  • We ignore "old" CREATE messages. (The C that arrives after M.)

To be able to distinguish old messages I introduced a migration counter. This is just a number that starts at zero when the object is created and is increased (by HOST) every time the object is migrated.

We tag all CREATE, DESTROY and MIGRATE messages with the migration counter and simply ignore "old" messages. With this approach, the message streams will look like this:

   A:  C_0 [wait] Ua Ua Ua Ua Ua
   B:                             M_ab_1 [wait] Ub Ub Ub Ub Ub Ub Ub Ub D_1

We can now verify that all possible message orderings that X can see work correctly:

C_0      M_ab_1  D_1  -- ok, the expected order
M_ab_1   C_0     D_1  -- ok, M_ab_1 creates the object with migration counter 1 and C_0 is ignored
M_ab_1   D_1     C_0  -- ok, M_ab_1 creates the object with migration counter 1 and C_0 is ignored

The system works equally well if there are multiple migration steps:

   A:  C_0 [wait] Ua Ua 
   B:                   M_ab_1 [wait] Ub Ub Ub
   C:                                            M_bc_2 [wait] Uc Uc Uc D_2

No matter in which order the messages arrive we will end up in the correct state.

Tuesday, February 19, 2013

Why Lua?

A question that I get asked regularly is why we have chosen Lua as our engine scripting language. I guess as opposed to more well-known languages, such as JavaScript or C#. The short answer is that Lua is lighter and more elegant than both those languages. It is also faster than JavaScript and more dynamic than C#.

When we started Bitsquid, we set out four key design principles for the engine:

  • Simplicity. (A small, manageable codebase with a minimalistic, modular design.)

  • Flexibility. (A completely data-driven engine that is not tied to any particular game type.)

  • Dynamism. (Fast iteration times, with hot reload of everything on real target platforms.)

  • Speed. (Excellent multicore performance and cache-friendly data-oriented layouts.)

Whenever we design new systems for the engine, we always keep these four goals in mind. As we shall see below, Lua does very well on all four counts, which makes it a good fit for our engine.

Simplicity in Lua

As I grow older (and hopefully more experienced) I find myself appreciating simplicity more and more. My favorite scripting language has gone from "Swiss army chainsaw" Perl (I claim youthful ignorance!) to "kitchen-drawer-esque" Ruby, to minimalistic Lua.

Lua is really small for a programming language. The entire Lua syntax fits on a single page. In fact, here it is:

chunk ::= {stat [`;´]} [laststat [`;´]]
block ::= chunk
stat ::=  varlist `=´ explist | 
     functioncall | 
     do block end | 
     while exp do block end | 
     repeat block until exp | 
     if exp then block {elseif exp then block} [else block] end | 
     for Name `=´ exp `,´ exp [`,´ exp] do block end | 
     for namelist in explist do block end | 
     function funcname funcbody | 
     local function Name funcbody | 
     local namelist [`=´ explist] 
laststat ::= return [explist] | break
funcname ::= Name {`.´ Name} [`:´ Name]
varlist ::= var {`,´ var}
var ::=  Name | prefixexp `[´ exp `]´ | prefixexp `.´ Name 
namelist ::= Name {`,´ Name}
explist ::= {exp `,´} exp
exp ::=  nil | false | true | Number | String | `...´ | function | 
     prefixexp | tableconstructor | exp binop exp | unop exp 
prefixexp ::= var | functioncall | `(´ exp `)´
functioncall ::=  prefixexp args | prefixexp `:´ Name args 
args ::=  `(´ [explist] `)´ | tableconstructor | String 
function ::= function funcbody
funcbody ::= `(´ [parlist] `)´ block end
parlist ::= namelist [`,´ `...´] | `...´
tableconstructor ::= `{´ [fieldlist] `}´
fieldlist ::= field {fieldsep field} [fieldsep]
field ::= `[´ exp `]´ `=´ exp | Name `=´ exp | exp
fieldsep ::= `,´ | `;´
binop ::= `+´ | `-´ | `*´ | `/´ | `^´ | `%´ | `..´ | 
     `<´ | `<=´ | `>´ | `>=´ | `==´ | `~=´ | 
     and | or
unop ::= `-´ | not | `#´

The same minimalistic philosophy is applied across the entire language. From the standard libraries to the C interface to the actual language implementation. You can understand all of Lua by just understanding a few key concepts.

Lua's simplicity and size does not mean that it lacks features. Rather it is just really well designed. It comes with a small set of orthogonal features that can be combined in lots of interesting ways. This gives the language a feeling of elegance, which is quite rare in the programming world. It is not a perfect language (perfect languages don't exist), but it is a little gem that fits very well into its particular niche. In that way, Lua is similar to C (the original, not the C++ monstrosity) -- it has a nice small set of features that fit very well together. (I suspect that Smalltalk and LISP also have this feeling of minimalistic elegance, but I haven't done enough real-world programming in those languages to really be able to tell.)

As an example of how powerful Lua's minimalism can be, consider this: Lua does not have a class or object system, but that doesn't matter, because you can implement a class system in about 20 lines or so of Lua code. In fact, here is one:

function class(klass, super)
    if not klass then
        klass = {}
        
        local meta = {}
        meta.__call = function(self, ...)
            local object = {}
            setmetatable(object, klass)
            if object.init then object:init(...) end
            return object
        end
        setmetatable(klass, meta)
    end
    
    if super then
        for k,v in pairs(super) do
            klass[k] = v
        end
    end
    klass.__index = klass
    
    return klass
end

If you prefer prototype based languages -- no problem -- you can make a prototype object system in Lua too.

Smallness and simplicity makes everything easier. It makes Lua easier to learn, read, understand, port, master and optimize. A project such as LuaJIT -- created by a single developer -- would not have been possible in a more complicated language.

Flexibility in Lua

Lua is a fully featured language, and in the Bitsquid engine, Lua is not just used as an extension language, rather it has direct control over the gameplay loop. This means that you have complete control over the engine from Lua. You can create completely different games by just changing the Lua code. (Examples: First person medieval combat War of the Roses, top-down RTS Krater, beat-em-up platformer Showdown and hand-held puzzler Hamilton.)

Dynamism in Lua

Unlike C#, which only has limited support for Edit and Continue, Lua makes it possible to reload everything -- the entire program -- on all target platforms, including consoles, mobiles and tablets.

This means that gameplay programmers can work on the code, tweak constants, fix bugs and add features without having to restart the game. And they can do this while running on the real target hardware, so that they know exactly what performance they get, how the controls feel and how much memory they are using. This enables fast iterations which is the key to increasing productivity and improving quality in game development.

Speed of Lua

Measuring the performance of a language is always tricky, but by most accounts, LuaJIT 2 is one of the fastest dynamic language implementations in the world. It outperforms other dynamic languages on many benchmarks, often by a substantial margin.

On the platforms where JITting isn't allowed, LuaJIT can be run in interpreter mode. The interpreter mode of LuaJIT is very competitive with other non-JITed language implementations.

Furthermore, Lua has a very simple C interoperability interface (simplified further by LuaJIT FFI). This means that in performance critical parts of the code it is really easy to drop into C and get maximum performance.

Lua's weak points

As I said above, no language is perfect. The things I miss most when programming in Lua don't have that much to do with the actual language, but rather with the ecosystem around it. C# has spoiled me with things like an integrated debugger, Intellisense, a very active StackOverflow community and the wonderfully helpful ReSharper. Lua has no "official" debugger, and not much in the way of autocompletion or refactoring tools.

Some people would argue that this shouldn't be counted as an argument against Lua, since it doesn't really concern the language Lua. I disagree. A language is not a singular, isolated thing. It is part of a bigger programming experience. When we judge a language we must take that entire experience into account: Can you find help in online forums? Are there any good free-to-use development tools? Is the user base fragmented? Can you easily create GUIs with native look-and-feel? Etc.

The lack of an official debugger is not a huge issue. Lua has an excellent debugging API that can be used to communicate with external debuggers. Using that API you can quite easily write your own debugger (we have) or integrate a debugger into your favorite text editor. Also, quite recently, the Decoda IDE was open sourced, which means there is now a good open source debugger available.

Getting autocompletion and refactoring to work well with Lua is trickier. Since Lua is dynamically typed the IDE doesn't know the type of variables, parameters or return values. So it doesn't know what methods to suggest. And when doing refactoring operations, it can't distinguish between methods that have the same name, but operate on different types.

But I don't think it necessarily has to be this way. An IDE could do type inference and try to guess the type of variables. For example, if a programmer started to write something like this:

local car = Car()
car:

the IDE could infer that the variable car was of type Car. It could then display suitable autocompletion information for the Car class.

Lua's dynamic nature makes it tricky to write type inference code that is guaranteed to be 100 % correct. For example, a piece of Lua code could dynamically access the global _G table and change the math.sin() function so that returned a string instead of a number. But such examples are probably not that common in regular Lua code. Also, autocompletion backed by type inference could still be very useful to the end user even if it wasn't always 100 % correct.

Type inference could be combined with explicit type hinting to cover the cases where the IDE was not able to make a correct guess (such as for functions exposed through the C API). Hinting could be implemented with a specially formatted comment that specified the type of a variable or a function:

-- @type Car -> number
function top_speed(car)
    ...
end

In the example above, the comment would indicate that top_speed is a function that takes a Car argument and returns a number.

Type hinting and type inference could also be used to detect "type errors" in Lua code. For example, if the IDE saw something like this:

local bike = Bicycle()
local s = top_speed(bike)

it could conclude that since bike is probably a Bicycle object and since top_speed expects a Car object, this call will probably result in a runtime error. It could indicate this with a squiggly red line in the source code.

I don't know of any Lua IDE that really explores this possibility. I might try it for my next hack day.

Thursday, January 31, 2013

Garbage Collection and Memory Allocation Sizes

As a performance conscious programmer in a soft-realtime environment I've never been too fond of garbage collection.

Incremental garbage collectors (like the one in Lua) make it tolerable (you get rid of the horrible garbage collection stalls), but there is still something unsettling about it. I keep looking at the garbage collection time in the profiler, and I can't shake the feeling that all that time is wasted, because it doesn't really do anything.

Of course that isn't true. Garbage collection frees the programmers from a lot of busywork. And the time they gain can go into optimizing other systems, which leads to a net performance win.

It also simplifies some of the hairy ownership questions that arise when data is transferred between systems. Without garbage collection, those questions must be solved in some other way. Either by reference counting (error-prone) or by making local copies of the data to assume ownership (ugly and costly).

But still, there is that annoying performance hit.

I was pretty surprised to see that the developers Go, a language that looks well-designed and targets low-level programmers, decided to go with garbage collection rather than manual memory management. It seemed like a strange choice.

But recently I've started to see things differently.

One thing I've noticed as I delve deeper and deeper into data-oriented design is that I tend to allocate memory in much larger chunks than before. It's a natural consequence of trying to keep things continuous in memory, treating resources as large memory blobs and managing arrays of similar objects together.

This has interesting consequences for garbage collection, because when the garbage collector only has to keep track of a small number of large chunks, rather than a large number of small chunks, it can perform a lot better.

Let's look at a simple example in Lua. Say we want to write a class for managing bullets. In the non-data-oriented solution, we allocate each bullet as a separate object:

function Bullet:update(dt)
    self.position = self.position + self.velocity * dt
end

function Bullets:update(dt)
    for i,bullet in ipairs(self.bullets) do
        bullet:update(dt)
    end
end

In the data-oriented solution, we instead use two big arrays to store the position and velocity of all the bullets:

function Bullets:update(dt)
    for i=1,#self.pos do
        self.pos[i] = self.pos[i] + dt * self.vel[i]
    end
end

I tested these two solutions with a large number of bullets and got two interesting results:

  • The data-oriented solution runs 50 times faster.

  • The data-oriented solution only needs half as much time for garbage collection.

That the data-oriented solution runs so much faster shows what cache coherence can do for you. It is also a testament to how awesome LuaJIT is when you give it tight inner loops to work with.

Note that in this test, the Bullet code itself did not create any garbage. The speed-up comes from being faster at collecting the garbage created by other systems. And the reason for this is simply that with fewer, larger memory allocations, there is less stuff that the garbage collector has to trawl through. If we add in the benefit that the data-oriented solution will create fewer objects and generate less garbage, the benefits will be even greater.

So maybe the real culprit in isn't garbage collection, but rather having many small memory allocations. And having many small memory allocations does not just hurt the garbage collector, it is bad for other reasons as well. It leads to bad cache usage, high overhead in the memory allocator, fragmentation and bad allocator performance. It also makes all kinds of memory problems harder to deal with: memory leaks, dangling pointers, tracking how much memory is used by each system, etc.

So it is not just garbage-collected languages like Lua that would benefit from allocating memory in larger chunks, but manually managed languages like C++ as well.

Recently, I've come to think that the best solution to memory management issues in C++ is to avoid the kitchen-sink global memory allocator as much as possible and instead let each subsystem take a much more hands-on approach to managing its own memory.

What I mean by this is that instead of having the sound system (for example) send lots of memory requests to the kitchen-sink memory manager, it would only request a few large memory blocks. Then, it would be the responsibility of the system to divide that up into smaller, more manageable pieces that it can make practical use of.

This approach has a number of advantages:

  • Since the system knows the usage patterns for its data, it can arrange the memory efficiently. A global memory allocator has no such knowledge.

  • It becomes much easier to track memory use by system. There will be a relatively small number of global memory allocations, each tagged by system. It becomes obvious how much memory each system is consuming.

  • Memory inside a system can be easily tracked, since the system knows what the memory means and can thus give useful information about it (such as the name of the object that owns it).

  • When a system shuts down it can quickly and efficiently free all of its memory.

  • Fragmentation problems are reduced.

  • It actively encourages good memory behavior. It makes it easier to do good things (achieve cache locality, etc) and harder to do bad things (lots of small memory allocations).

  • Buffer overflows will tend to overwrite data within the same system or cause page faults, which will make them easier to find.

  • Dangling pointer access will tend to cause page faults, which will make them easier to find.

I'm tempted to go so far as to only allow whole page allocations on the global level. I.e., a system would only be allowed to request memory from the global manager in chunks of whole system pages. Then it would be up to the system to divide that up into smaller pieces. For example, if we did the bullet example in C++, we might use one such chunk to hold our array of Bullet structs.

This has the advantage of completely eliminating external fragmentation. (Since everything is allocated in chunks of whole pages and they can be remapped by the memory manager.) We can still get address space fragmentation, but using a 64-bit address space should take care of that. And with this approach using 64-bit pointers is less expensive, because we have fewer individually allocated memory blocks and thus fewer pointers.

Instead we get internal fragmentation. If we allocate the bullet array as a multiple of the page size (say 4 K), we will on average have 2 K of wasted space at the end of the array (assuming the number of bullets is random).

But internal fragmentation is a much nicer problem to deal with than external fragmentation. When we have internal fragmentation, it is one particular system that is having trouble. We can go into that system and do all kinds of things to optimize how its handling memory and solve the problem. With external fragmentation, the problem is global. There is no particular system that owns it and no clear way to fix it other than to try lots of things that we hope might "improve" the fragmentation situation.

The same goes for out-of-memory problems. With this approach, it is very clear which system is using too much memory and easy to fix that by reducing the content or doing optimizations to that system.

Dealing with bugs and optimizations on a system-by-system simplifies things enormously. It is quite easy to get a good grasp of everything that happens in a particular system. Grasping everything happens in the entire engine is a superhuman task.

Another nice thing about this approach is that it is quite easy to introduce it on a system-by-system basis. All we have to do is to change one system at a time so that it allocates its memory using the page allocator, rather than the kitchen-sink allocator.

And if we have some messy systems left that are too hard to convert to this approach we can just let them keep using the kitchen-sink allocator. Or, even better, we can give them their own private heaps in memory that they allocate from the page allocator. Then they can make whatever mess they want there, without disturbing the other systems.

Tuesday, December 11, 2012

Four meditations on bad design decisions

I've recently been doing a major rewrite of one of our core engine systems, the graph that we use for our visual scripting language Flow. Taking it from something that looks like this:

To something that looks like this:

A major rewrite like this is always a humbling experience. When you have to rewrite your own code, every bad decision you made comes back to haunt you. And you don't have anybody else to blame them on.

As if facing your own inadequacy wasn't enough -- rewriting an existing system is always harder than writing one from scratch. When you write a new system you start with a blank slate and can do whatever you want. When you rewrite, you are constrained by what the old system did -- at least if you want to maintain any kind of backwards compatibility.

In addition, a new system can be written iteratively. You can start with a very small, simple system, release early and get feedback. Based on that feedback you can tweak the system. You don't have to think about adding features until you have a good stable base.

When you are doing a rewrite you can't release the new system until it is at least as good as the old one. Otherwise, your users will question why you have spent all that time working on a system that is worse than what you had before. And they will be right.

So a rewrite forces you away from the comfortable land of early releases and quick iterations and into the ugly old waterfall model.

With the power of hindsight, I'd like to reflect a bit on four design mistakes I made when I wrote the first version of the system that made this rewrite a lot harder than it could have been.

Don't use strings for non-text things

Strings have one really good use -- to hold pieces of text that either gets displayed to or inputted by the user. All other use of strings should be regarded as suspicious.

Strings are scary because they are both ambiguous and powerful. Does "a/b.txt" and "A//b.txt" represent the same path? Hard to tell. But maybe you can use case conversion, search and replace and some regular expression monstrosity to figure that out.

If you are doing that kind of string manipulation in any part of the code that is not directly related to user input or output, it is a clear warning sign that your code might be too "stringified".

The most obvious example stringified code is the use of "stringly typed" data, for example, storing a date as the string "2012-12-09". But the problem with strings can also manifest more subtle ways.

The original version of Flow used strings to identify connectors, both internally (as a representation of the connection) and visually (to show the name of the connector):

As a consequence, a Flow node couldn't have two connectors with the same name, and a connector couldn't be renamed (even visually) without breaking all existing connections.

In retrospect, rather than having a single Name property, it would be much better to have separate Id and DisplayName properties. The Id would be a GUID that uniquely identified the property, and the DisplayName would be a (localizable) name, suitable for displaying to the end user.

Using names/strings as identifiers has bitten me in other ways as well. In one system I knew that the names had to be unique (because that is how the script would refer to the objects) so I thought it would be safe to use them as identifiers. What I didn't consider was that there could be situations when there temporarily were two objects that had the same name. For example, if the user had created a rock object, and wanted to create a rock_small object -- as she was half-way through typing that name, there would suddenly be two objects named rock. This created problems for the system.

Lesson learned, I now avoid using strings as identifiers.

When in doubt, you should opt-out

Every system acquires features over time. That is good of course. Those features make the system more powerful and easier to work with.

But among the good features there are usually a few that don't feel quite right. That don't really fit into the design of the system. You can do them of course. You can do anything.

But usually it is best not to. Most of the time when I have added a feature that didn't quite feel right, I have regretted it later. In retrospect it would have been better to try to find a different way of doing what the users wanted that was more natural to the ideas behind the system.

An example: Users of Flow wanted some way to specify the order in which events were triggered, when multiple connections are connected to the same Out connector. This is needed in some situations, for example you may want to make sure that a unit is spawned before it is used.

In the old version of Flow, this was implemented with a context menu on the connection where you could select if it should be a "Do First", "Do Last" or "Do Normal" connection.

This solution never felt 100 % right to me. It was hard to find a good intuitive way to visually represent the "Do First" and "Do Last" connections, and as a result the Flow graphs became harder to understand.

In retrospect, it would have been much better to avoid this feature and wait until I had come up with the more elegant alternative: a sequence node that triggers each of its outputs sequentially:

Be explicit or you'll miss it

Writing code where a lot of things happen implicitly feels great -- to begin with. It is amazing how much you are able to do with just a few lines of code.

But in my experience, implicit code almost always ends up more costly in the long run. It is harder to understand, harder to debug and harder to change. It tends to lock you down in a "local minimum" that can be tricky to come out of.

In Flow, a lot of things are done implicitly. The definition of a Flow node is just a simple C# class:

[Category("Animation")]
public class AnimationEvent : Node
{
    public InVariableUnit Unit;
    public StringVariable Event;
    public InEvent In;
    public OutEvent Out;
}

Through reflection, Flow finds out the members in the class and their types and automatically generates Flow nodes for them. This process involves some ugly string processing (bad decision #1), such as stripping In and Variable from the type name to find the underlying type of members. Reflection is also used to serialize the graphs.

While it is nice to be able to express a node so concisely, there are also a lot of problematic consequences. For example, since the class names get serialized, we can't change the names of classes or properties without breaking the ability to load old files. Also, we have to use some really ugly C# hacks to make sure that the reflection system always returns the members of a class in the order they are declared in the file (so that we can control the order of the connectors).

In retrospect, it would been much better to avoid all this clever reflection stuff and instead just define the node types in configuration files.

Avoid the road of complex code

There is some code that needs to be complex, because it is dealing with fundamentally tricky stuff (like computational geometry) or because it needs to run really, really fast. But in all other cases, complexity is just a cost.

If your code starts to feel complex and hard to keep track of, it is a sign that you are probably doing something wrong. And if you are not careful, you may lock yourself in, so that when you write the next version of the system, you have to recreate all that complex behavior in your new, simpler system. You have to deliberately make your code uglier.

The old version of Flow had a way of "folding" nodes. You could select a number of nodes, group them, and then "fold" the group, collapse it to a single node.

The system had a lot of really hairy code for dealing with this. The code takes a bunch of nodes and creates a new node from them, with connectors matching only the external connectors of the collapsed nodes. it also keeps track of the internal nodes and their connections so they can be recreated if the node is later "expanded".

As you might imagine, this was complicated further by the need for connector names to be unique (see bad decision #1), which meant that some of the external connectors in the new node had to be renamed (since they could come from different internal nodes that had connectors with the same name). So a mapping table was needed to keep track of these renames. Obviously a bad idea, but once you have started down the path of wrongness, it can be hard to turn around.

The new version handles this a lot better. Collapse and expansion is just a visual feature. There are no new nodes created and no other strange things happening to the data, the visualizer just chooses to draw the data in a different way when it is collapsed. In retrospect, that is a much better choice.

That is all, four simple lessons
to guide your future coding sessions
now let your code be light and merry
until its time for Charon's ferry

Wednesday, November 21, 2012

A Formal Language for Data Definitions

Lately, I've started to think again about the irritating problem that there is no formal language for describing binary data layouts (at least not that I know of). So when people attempt to describe a file format or a network protocol they have to resort to vague and nondescript things like:

Each section in the file starts with a header with the format:

4 bytes   header identifier
2 bytes   header length
0--20 bytes  extra data in header

The extra data is described below.

As anyone who has tried to decipher such descriptions can testify, they are not always clear-cut, which leads to a lot of unnecessary work when trying to coax data out of a document.

It is even worse when I create my own data formats (for our engine's runtime data). I would like to document those format in a clear and unambiguous way, so that others can understand them. But since I have no standardized way of doing that, I too have to resort to ad-hoc methods.

This whole thing reminds me of the state of mathematics before formal algebraic notation was introduced. When you had to write things like: the sum of the square of these two numbers equals the square of the previous number. Formal notation can bring a lot of benefits (just look at what it has done for mathematics, music, and chess).

For data layouts, a formal definition language would allow us to write a tool that could open any binary file (that we had a data definition) for and display its content in a human readable way:

height = 128
width = 128
comment = "A funny cat animation"
frames = [
 {display_time = 0.1 image_data = [100 120 25 ...]}
 ...
]

The tool could even allow us to edit the readable data and save it back out as a binary file.

A formal language would also allow debuggers to display more useful information. By writing data definition files, we could make the debugger understand all our types and display them nicely. And it would be a lot cleaner than the hackery that is autoexp.dat.

Just to toss something out there, here's an idea of what a data definition might look like:

typdedef uint32_t StringHash;

struct Light
{
 StringHash name;
 Vector3  color;
 float  falloff_start;
 float   falloff_end;
};

struct Level
{
 uint32_t version;
 uint32_t num_lights;
 uoffset32_t light_data_offset;

 ...

light_data_offset:
 Light lights[num_lights];
};

This is a C-inspired approach, with some additions. Array lengths can be parametrized on earlier data in the file and a labels can be used to generate offsets to different sections in the file..

I'm still tossing around ideas in my head about what the best way would be to make a language like this a reality. Some of the things I'm thinking about are:

Use Case

I don't think it would do much good to just define a langauge. I want to couple it with something that makes it immediately useful. First, for my own motivation. Second, to provide a "reality check" to make sure that the choices I make for the language are the right ones. And third, as a reference implementation for anyone else who might want to make use of the language.

My current idea is to write a binary-to-JSON converter. I.e., a program that given a data definition file can automatically convert back and forth between a binary and a JSON-representation of that same data.

Syntax

The syntax in the example is very "C like". The advantage of that is that it will automatically understand C structs if you just paste them into the data definition file, which reduces the work required to set up a file.

The disadvantage is that it can be confusing with a language that is very similar to C, but not exactly C. It is easy to make mistakes. Also, C++ (we probably want some kind of template support) is quite tricky to parse. If we want to add our own enhancements on top of that, we might just make a horrible mess.

So maybe it would be better to go for something completely different. Something Lisp-like perhaps. (Because: Yay, Lisp! But also: Ugh, Lisp.)

I'm still not 100 % decided, but I'm leaning towards a restricted variant of C. Something that retains the basic syntatic elements, but is easier to parse.

Completeness

Should this system be able to describe any possible binary format out there?

Completeness would be nice of course. It is kind of annoying to have gone through all the trouble of defining language and creating the tools and still not be able to handle all forms of binary data.

On the other hand, there are a lot of different formats out there and some of them have a complexity that is borderline insane. The only way to be able to describe everything is to have a data definition language that is Turing complete and procedural (in other words, a detailed list of the instructions required to pack and unpack the data).

But if we go down that route, we haven't really raised the abstraction level. In that case, why even bothering with creating a new language. The format description could just be a list of the C instructions needed to unpack the data. That doesn't feel like a step forward.

Perhaps some middle ground could be found. Maybe we could make language that was simple and readable for "normal" data, but still had the power to express more esoteric constructs. One approach would be to regard the "declarative statements" as syntactic sugar in a procedural language. With this approach, the declaration:

struct LightCollection
{
 unsigned num_lights;
 LightData lights[num_lights];
};

Would just be syntactic sugar for:

function unpack_light_collection(stream)
 local res = {}
 res.num_lights = unpack_unsigned(stream)
 res.lights = []
 for i=1,res.num_lights do
  res.lights[i] = unpack_light_data(stream)
 end
end

This would allow the declarative syntax to be used in most places, but we could drop out to full-featured Turing complete code whenever needed.