Friday, February 25, 2011

Managing Decoupling Part 3 - C++ Duck Typing

Some systems need to manipulate objects whose exact nature are not known. For example, a particle system has to manipulate particles that sometimes have mass, sometimes a full 3D rotation, sometimes only 2D rotation, etc. (A good particle system anyway, a bad particle system could use the same struct for all particles in all effects. And the struct could have some fields called custom_1,custom_2 used for different purposes in different effects. And it would be both inefficient, inflexible and messy.)

Another example is a networking system tasked with synchronizing game objects between clients and servers. A very general such system might want to treat the objects as open JSON-like structs, with arbitrary fields and values:

{
    "score" : 100,
    "name": "Player 1"
}

We want to be able to handle such “general” or “open” objects in C++ in a nice way. Since we care about structure we don’t want the system to be strongly coupled to the layout of the objects it manages. And since we are performance junkies, we would like to do it in a way that doesn’t completely kill performance. I.e., we don’t want everything to inherit from a base class Object and define our JSON-like objects as:

typedef std::map OpenStruct;

Generally speaking, there are three possible levels of flexibility with which we can work with objects and types in a programming language:


1. Exact typing - Only ducks are ducks


We require the object to be of a specific type. This is the typing method used in C and for classes without inheritance in C++.

2. Interface typing - If it says it’s a duck


We require the object to inherit from and implement a specific interface type. This is the typing method used by default in Java and C# and in C++ when inheritance and virtual methods are used. It is more flexible that the exact approach, but still introduces a coupling, because it forces the objects we manage to inherit a type defined by us.

Side rant: My general opinion is that while inheriting interfaces (abstract classes) is a valid and useful design tool, inheriting implementations is usually little more than a glorified “hack”, a way of patching parent classes by inserting custom code here and there. You almost always get a cleaner design when you build your objects with composition instead of with implementation inheritance.

3. Duck typing - If it quacks like a duck


We don’t care about the type of the object at all, as long as it has the fields and methods that we need. An example:

      def integrate_position(o, dt):
          o.position = o.position + o.velocity * dt

This method integrates the position of the object o. It doesn’t care what the type of o is, as long as it has a “position” field and a “velocity” field.

Duck typing is the default in many “scripting” languages such as Ruby, Python, Lua and JavaScript. The reflection interface of Java and C# can also be used for duck typing, but unfortunately the code tends to become far less elegant than in the scripting languages:

      o.GetType().GetProperty(“Position”).SetValue(o, o.GetType().
         GetProperty(“Position”).GetValue(o, null) + o.GetType().
         GetProperty(“Velocity”).GetValue(o, null) * dt, null)

What we want is some way of doing “duck typing” in C++.

Let’s look at inheritance and virtual functions first, since that is the standard way of “generalizing” code in C++. It is true that you could do general objects using the inheritance mechanism. You would create a class structure looking something like:

class Object {...};
class Int : public Object {...};
class Float : public Object{...};

and then use dynamic_cast or perhaps your own hand-rolled RTTI system to determine an object’s class.
But there are a number of drawbacks with this approach. It is quite verbose. The virtual inheritance model requires objects to be treated as pointers so they (probably) have to be heap allocated. This makes it tricky to get a good memory layout. And that hurts performance. Also, they are not PODs so we will have to do extra work if we want to move them to a co-processor or save them to disk.

So I prefer something much simpler. A generic object is just a type enum followed by the data for the object:



To pass the object you just pass its pointer. To make a copy, you make a copy of the memory block. You can also write it straight to disk and read it back, send it over network or to an SPU for off-core processing.

To extract the data from the object you would do something like:

unsigned type = *(unsigned *)o;
if (type == FLOAT_TYPE)
    float f = *(float *)(o + 4);

You don’t really need that many different object types: boolintfloatvector3quaternionstring,array and dictionary is usually enough. You can build more complicated types as aggregates of those, just as you do in JSON.

For a dictionary object we just store the name/key and type of each object:



I tend to use a four byte value for the name/key and not care if it is an integer, float or a 32-bit string hash. As long as the data is queried with the same key that it was stored with, the right value will be returned. I only use this method for small structs, so the probability for a hash collision is close to zero and can be handled by “manual resolution”.

If we have many objects with the same “dictionary type” (i.e. the same set of fields, just different values) it makes sense to break out the definition of the type from the data itself to save space:



Here the offset field stores the offset of each field in the data block. Now we can efficiently store an array of such data objects with just one copy of the dictionary type information:



Note that the storage space (and thereby the cache and memory performance) is exactly the same as if we were using an array of regular C structs, even though we are using a completely open free form JSON-like struct. And extracting or changing data just requires a little pointer arithmetic and a cast.

This would be a good way of storing particles in a particle system. (Note: This is an array-of-structures approach, you can of course also use duck typing with a sturcture-of-arrays approach. I leave that as an exercise to the reader.)

If you are a graphics programmer all of this should look pretty familiar. The “dictionary type description” is very much like a “vertex data description” and the “dictionary data” is awfully similar to “vertex data”. This should come as no big surprise. Vertex data is generic flexible data that needs to be processed fast in parallel on in-order processing units. It is not strange that with the same design criterions we end up with a similar solution.


Morale and musings

It is OK to manipulate blocks of raw memory! Pointer arithmetic does not destroy your program! Type casts are not “dirty”! Let your freak flag fly!

Data-oriented-design and object-oriented design are not polar opposites. As this example shows a data-oriented design can in a sense be “more object-oriented” than a standard C++ virtual function design, i.e., more similar to how objects work in high level languages such as Ruby and Lua.

On the other hand, data-oriented-design and inheritance are enemies. Because designs based on base class pointers and virtual functions want objects to live individually allocated on the heap. Which means you cannot control the memory layout. Which is what DOD is all about. (Yes, you can probably do clever tricks with custom allocators and patching of vtables for moving or deserializing objects, but why bother, DOD is simpler.)

You could also store function pointers in these open structs. Then you would have something very similar to Ruby/Lua objects. This could probably be used for something great. This is left as an exercise to the reader.

Friday, February 11, 2011

Managing Coupling Part 2 — Polling, Callbacks and Events


In my last post, I talked a bit about the importance of decoupling and how one of the fundamental challenges in system design is to keep systems decoupled while still allowing the necessary interactions to take place.

This time I will look at one specific such challenge: when a low level system needs to notify a high level system that something has happened. For example, the animation system may want to notify the gameplay system that the character’s foot has touched the ground, so that a footstep sound can be played.

(Note that the reverse is not a problem. The high level system knows about the low level system and can call it directly. But the low level system shouldn’t know or care about the high level system.)

There are three common techniques for handling such notifications: polling, callbacks and events.

Polling

A polling system calls some function every frame to check if the event it is interested in has occurred. Has the file been downloaded yet? What about now? Are we there yet?

Polling is often considered “ugly” or “inefficient”. And indeed, in the desktop world, polling is very impolite, since it means busy-waiting and tying up 100 % of the CPU in doing nothing.

But in game development the situation is completely different. We are already doing a ton of stuff every 33 ms (or half a ton of stuff every 17 ms). As long as we don’t poll a huge amount of objects, polling won’t have any impact on the framerate.

And code that uses polling is often easier to write and ends up better designed than code that uses callbacks or events. For example, it is much easier to just check if the A key is pressed inside the character controller, than to write a callback that gets notified if A is pressed and somehow forward that information to the character controller.

So, in my opinion, you should actually prefer to use polling whenever possible (i.e., when you don’t have to monitor a huge number of objects).

Some areas where polling work well are: file downloads, server browsing, game saving, controller input, etc.

An area less suited for polling is physics collisions, since there are N*N possible collisions that you would have to poll for. (You could argue that rather than polling for a collision between two specific objects, you could poll for a collision between any two objects. My reply would be that in that case you are no longer strictly polling, you are in fact using a rudimentary effect system.)

Callbacks

In a callback solution, the low level system stores a list of high level functions to call when certain events occur.

An important question when it comes to callbacks is if the callback should be called immediately when the event occurs, or if it should be queued up and scheduled for execution later in the frame.

I much prefer the latter approach. If you do callbacks immediately you not only trash your instruction and data caches. You also prevent multithreading (unless you use locks everywhere to prevent the callbacks from stepping on each other). And you open yourself up to the nasty bug where a callback through a chain of events ends up destroying the very objects you are looping over.

It is much better to queue up all callbacks and only execute them when the high level system asks for it (with an execute_callbacks() call). That way you always know when the callbacks occur. Side effects can be minimized and the code flow is clearer. Also, with this approach there is no problem with generating callbacks on the SPU and merging the queue with other callback queues later.

The only thing you need to worry about with delayed callbacks is that the objects that the callback refers to might have been destroyed between the time when the callback was generated and the time when it was actually called. But this is neatly handled by using the ID reference system that I talked about in the previous post. Using that technique, the callback can always determine if the objects still exist.

Note that the callback system outlined here has some similarities with the polling system — in that the callbacks only happen when we explicitly poll for them.

It is not self-evident how to represent a callback in C++. You might be tempted to use a member function pointer. Don’t. The casting and typing rules make it near impossible to use them for any kind of generic callback mechanism. Also, don’t use an “observer pattern”, where the callback must be some object that inherits from an AnimationEventObserver class and overrides handle_animation_event(). That just leads to tons of typing and unnecessary heap allocation.

There is an interesting article about fast and efficient C++ delegates at http://www.codeproject.com/KB/cpp/FastDelegate.aspx. It looks solid, but personally I’m not comfortable with making something that requires so many platform specific tricks one of the core mechanisms of my engine. 

So instead I use regular C function pointers for callbacks. This means that if I want to call a member function, I have to make a little static function that calls the member function. That is a bit annoying, but better than the alternatives.

(Isn’t it interesting that when you try to design a clean and flexible C++ API it often ends up as pure C.)

When you use C callbacks you typically also want to pass some data to them. The typical approach in the C world is to use a void * to “user data” that is passed to the callback function. I actually prefer a slightly different approach. Since I sometimes want to pass more data than a single void * I use something like this:

struct Callback16
{
  void (*f)(void);
  char data[12];
};

There aren’t a huge amount of callbacks, so using 16 bytes instead of 8 to store them doesn’t matter. You could go to Callback32 if you want the option to store even more data.

When calling the callback, I cast the function pointer to the appropriate type and pass a pointer to its data as the first parameter.

typedef void (*AnimationEventCallback)(void *, unsigned);
AnimationEventCallback f = (AnimationEventCallback)callback.f;
f(callback.data, event_id);

I’m not worried about casting the function pointer back and forth between a generic type and a specific one or about casting the data in and out of a raw buffer. Type safety is nice, but there is an awful lot of power in juggling blocks of raw memory. And you don’t have to worry that much about someone casting the data to the wrong type, because doing so will 99% of the time cause a huge spectacular crash, and the error will be fixed immediately.

Events

Event systems are in many ways similar to callback systems. The only difference is that instead of storing a direct pointer to a callback function, they store an event enum. The high level system that polls the events decides what action to take for each enum.

In my opinion, callbacks work better when you want to listen to specific notifications: “Tell me when this sound has finished playing.” Events work better when you process them in bulk: “Check all collision notifications to see if the forces involved are strong enough to break the objects.” But much of it is a matter of taste.

For storing the event queues (or callback queues) I just use a raw buffer (Vector orchar[FIXED_SIZE]) where I concatenate all events and their data:

[event_1_enum] [event_1_data] [event_2_enum] [event_2_data] …

The high level system just steps through this buffer, processing each event in turn. Note that event queues like this are easy to move, copy, merge and transfer between cores. (Again, the power of raw data buffers.)

In this design there is only a single high level system that polls the events of a particular low level system. It understands what all the events mean, what data they use and knows how to act on them. The sole purpose of the event system (it is not even much of a “system”, just a stream of data) is to pass notifications from the low level to the high.

This is in my opinion exactly what an event system should be. It should not be a magic global switchboard that dispatches events from all over the code to whoever wants to listen to them. Because that would be horrid!

Sunday, January 30, 2011

Managing Coupling

(This post has also been posted to http://altdevblogaday.com/.)


The only way of staying sane while writing a large complex software system is to regard it as a collection of smaller, simpler systems. And this is only possible if the systems are properly decoupled.
Ideally, each system should be completely isolated. The effect system should be the only system manipulating effects and it shouldn’t do anything else. It should have its own update() call just for updating effects. No other system should care how the effects are stored in memory or what parts of the update happen on the CPU, SPU or GPU. A new programmer wanting to understand the system should only have to look at the files in the effect_system directory. It should be possible to optimize, rewrite or drop the entire system without affecting any other code.
Of course, complete isolation is not possible. If anything interesting is going to happen, different systems will at some point have to talk to one another, whether we like it or not.
The main challenge in keeping an engine “healthy” is to keep the systems as decoupled as possible while still allowing the necessary interactions to take place. If a system is properly decoupled, adding features is simple. Want a wind effect in your particle system? Just write it. It’s just code. It shouldn’t take more than a day. But if you are working in a tightly coupled project, such seemingly simple changes can stretch out into nightmarish day-long debugging marathons.
If you ever get the feeling that you would prefer to test an idea out in a simple toy project rather than in “the real engine”, that’s a clear sign that you have too much coupling.
Sometimes, engines start out decoupled, but then as deadlines approach and features are requested that don’t fit the well-designed APIs, programmers get tempted to open back doors between systems and introduce couplings that shouldn’t really be there. Slowly, through this “coupling creep” the quality of the code deteriorates and the engine becomes less and less pleasant to work with.
Still, programmers cannot lock themselves in their ivory towers. “That feature doesn’t fit my API,” is never an acceptable answer to give a budding artist. Instead, we need to find ways of handling the challenges of coupling without destroying our engines. Here are four quick ideas to begin with:
1. Be wary of “frameworks”.
By a “framework” I mean any kind of system that requires all your other code to conform to a specific world view. For example, a scripting system that requires you to add a specific set of macro tags to all your class declarations.
Other common culprits are:
  • Root classes that every object must inherit from
  • RTTI/reflection systems
  • Serialization systems
  • Reference counting systems
Such global systems introduce a coupling across the entire engine. They rudely enforce certain design choices on all subsystems, design choices which might not be appropriate for them. Sometimes the consequences are serious. A badly thought out reference system may prevent subsystems from multithreading. A less than stellar serialization system can make linear loading impossible.
Often, the motivation given for such global systems is that they increase maintainability. With a global serialization system, we just have to make changes at a single place. So refactoring is much easier, it is claimed.
But in practice, the reverse is often true. After a while, the global system has infested so much of the code base that making any significant change to it is virtually impossible. There are just too many things that would have to be changed, all at the same time.
You would be much better off if each system just defined its own save() and load() functions.
2. Use high level systems to mediate between low level systems.
Instead of directly coupling low level systems, use a high level system to shuffle data between them. For example, handling footstep sounds might involve the animation system, the sound system and the material system. But none of these systems should know about the others.
So instead of directly coupling them, let the gameplay system handle their interactions. Since the gameplay system knows about all three systems, it can poll the animation system for events defined in the animation data, sample the ground material from the material system and then ask the sound system to play the appropriate sound.
Make sure that you have a clear separation between this messy gameplay layer, that can poke around in all other systems, and your clean engine code that is isolated and decoupled. Otherwise there is always a risk that the mess propagates downwards and infects your clean systems.
In the BitSquid Tech we put the messy stuff either in Lua or in Flow (our visual scripting tool, similar to Unreal’s Kismet). The language barrier acts as a firewall, preventing the spread of the messiness.
3. Duplicating code is sometimes OK!
Avoiding duplicated code is one of the fundamentals of software design. Entities should not be needlessly multiplied. But there are instances when you are better off breaking this rule.
I’m not advocating copy-paste-programming or writing complicated algorithms twice. I’m saying that sometimes people can get a little overzealous with their code reuse. Code sharing has a price that is not always recognized, in that it increases system coupling. Sometimes a little judiciously applied code duplication can be a better solution.
An typical example is the String class (or std::string if you are thusly inclined). In some projects you see the String class used almost everywhere. If something is a string, it should use the Stringclass, the reasoning seems to be. But many systems that handle strings do not need all the features that you find in your typical String class: locales, find_first_of(), etc. They are fine with just aconst char *strcmp() and maybe one custom written (potentially duplicated) three-line function. So why not use that, the code will be much simpler and easier to move to SPUs.
Another culprit is FixedArray a. Sure, if you write int a[5] instead you will have to duplicate the code for bounds checking if you want that. But your code can be understood and compiled without fixed_array.h and template instantiation.
And if you have any method that takes a const Vector &v as argument you should probably take const T *begin, const T *end instead. Now you don’t need the vector.h header, and the caller is not forced to use a particular Vector class for storage.
A final example: I just wrote a patching tool that manipulates our bundles (aka pak-files). That tool duplicates the code for parsing the bundle headers, which is already in the engine. Why? Well, the tool is written in C# and the engine in C++, but in this case that is kind of beside the point. The point is that sharing that code would have been a significant effort.
First, it would have had to be broken out into a separate library, together with the related parts of the engine. Then, since the tool requires some functionality that the engine doesn’t (to parse bundles with foreign endianness) I would have to add a special function for the tool, and probably a #define TOOL_COMPILE since I don’t want that function in the regular builds. This means I need a special build configuration for the tool. And the engine code would forever be dirtied with the TOOL_COMPILE flag. And I wouldn’t be able to rearrange the engine code as I wanted in the future, since that might break the tool compile.
In contrast, rewriting the code for parsing the headers was only 10 minutes of work. It just reads a vector of string hashes. It's not rocket science. Sure, if I ever decide to change the bundle format, I might have to spend another 10 minutes rewriting that code. I think I can live with that.
Writing code is not the problem. The messy, complicated couplings that prevent you from writing code is the problem.
4. Use IDs to refer to external objects.
At some point one of your systems will have to refer to objects belonging to another system. For example, the gameplay layer may have to move an effect around or change its parameters.
I find that the most decoupled way of doing that is by using an ID. Let’s consider the alternatives.
Effect *, shared_ptr
A direct pointer is no good, because it will become invalid if the target object is deleted and the effect system should have full control over when and how its objects are deleted. A standardshared_ptr won’t work for the same reason, it puts the life time of Effect objects out of the control of the effect system.
Weak_ptr, handle
By this I mean some kind of reference-counted, indirect pointer to the object. This is better, but still too strongly coupled for my taste. The indirect pointer will be accessed both by the external system (for dereferencing and changing the reference count) and by the effect system (for deleting the Effect object or moving it in memory). This has the potential for creating threading problems.
Also, this construct kind of implies that external systems can dereference and use the Effectwhenever they want to. Perhaps the effect system only allows that when its update() loop is not running and want to assert() that. Or perhaps the effect system doesn’t want to allow direct access to its objects at all, but instead double buffer all changes.
So, in order to allow the effect system to freely reorganize its data and processing in any way it likes, I use IDs to identify objects externally. The IDs are just an integers uniquely identifying an object, that the user can throw away when she is done with them. They don’t have to be “released” like aweak_ptr, which removes a point of interaction between the systems. It also means that the IDs are PODs. We can copy and move them freely in memory, juggle them in Lua and DMA them back-and-forth to our heart’s content. All of this would be a lot more complicated if we had to keep reference counts.
In the system we need a fast way of mapping IDs back to objects. Note that std::map is not a fast way! But there are a number of possibilities. The simplest is to just use a fixed size array with object pointers:
Object *lookup[MAX_OBJECTS];
If your system has a maximum of 4096 objects, use 12 bits from the key to store an index into this array and the remaining 20 bits as a unique identifier (i.e., to detect the case when the original object has been deleted and a new object has been created at the same index). If you need lots of objects, you can go to a 64 bit ID.
That's it for today, but this post really just scratches the surface of decoupling. There are a lot of other interesting techniques to look at, such as events, callbacks and “duck typing”. Maybe something for a future entry...

Wednesday, October 20, 2010

A* is Overrated


Open any textbook on AI and you are sure to find a description of the A*-algorithm. Why? Because it is a provably correct solution to well defined, narrow problem. And that just makes it so... scientific and... teachable. Universities teach Computer Science, because nobody knows how to teach Computer Artistry or Computer Craftsmanship. Plus science is important, everybody knows that.

But I digress.

A* is a useful algorithm. You should know it. But when we are talking about AI navigation, A* is just an imperfect solution to a tiny and not very difficult part of the problem. Here are three reasons why you shouldn't give A* so much credit:

1. Pathfinding is often not that important.

In many games, the average enemy is only on-screen a couple of seconds before it gets killed. The player doesn't even have a chance to see if it is following a path or not. Make sure it does something interesting that the player can notice instead.

2. A* is not necessarily the best solution to path finding problems.

A* finds the shortest path, but that is usually not that important. As long as the path "looks reasonable" and is "short enough" it works fine. This opens the floor for a lot of other algorithms.

For long-distance searches you get the best performance by using a hierarchical search structure. A good design of the hierarchy structure is more important for performance than the algorithm you use to search at each level.

Path-finding usually works on a time scale of seconds, which means that we can allow up to 30 frames of latency in answering search queries. To distribute the work evenly, we should use a search algorithm that runs incrementally. And it should of course parallelize to make the most use of modern hardware.

So what we really want is an incremental, parallelizable, hierarchical algorithm to find a shortish path, not cookie cutter A*.

3. Even if we use A*, there are a lot of other navigational issues that are more important and harder to solve than path finding.

Implementation, for starters. A* only reaches its theoretical performance when the data structures are implemented efficiently. You can still find A* implementations where the open nodes are kept in a sorted list rather than a heap. Ouch. Some implementation details are non-trivial. How do you keep track of visited nodes? A hashtable? (Extra memory and CPU.) A flag in the nodes themselves? (Meaning you can only run one search at a time over the nodes.)

How is the graph created? Hand edited in the editor? How much work is it to redo it every time the level changes? Automatic? What do you do when the automatic generation fails? Can you modify it? Will these modifications stick if the level is changed and the graph is regenerated? Can you give certain nodes special properties when the graph is auto-generated?

How do you handle dynamic worlds were paths can be blocked and new paths can be opened? Can you update the graph dynamically in an efficient way? What happens to running queries? How do past queries get invalidated when their path is no longer valid?

Once you have a path, how do you do local navigation? I.e. how do you follow the path smoothly without colliding with other game agents or dynamic objects? Do AI units collide against the graph or against real physics? What happens if the two don't match up? How do you match animations to the path following movement?

In my opinion, local navigation is a lot more important to the impression a game AI makes than path finding. Nobody will care that much if an AI doesn't follow the 100 % best part towards the target. To err is human. But everybody will notice if the AI gets stuck running against a wall because its local navigation system failed.

In the next blog post I will talk a bit about how local navigation is implemented in the BitSquid engine.

Monday, October 18, 2010

Time Step Smoothing

Today I'm going to argue for smoothing your update delta time, i.e. to apply a simple low pass filter to it before using it to update the game state:

float dt = elapsed_time();
dt = filter(dt);
update_game(dt);

This assumes of course that you are using a variable time step. A fixed time step is always constant and doesn't need smoothing.

Some people will say that you should always use a fixed time step, and there is a lot of merit to that. A stable frame rate always looks better and with a fixed time step the gameplay code behaves more deterministically and is easier to test. You can avoid a whole class of bugs that "only appear when the frame rate drops below 20 Hz".

But there are situations where using a fixed frame rate might not be possible. For example if your game is dynamic and player driven, the player can always create situations when it will drop below 30 fps (for example, by gathering all the physics objects in the same place or aggroing all the enemies on the level). If you are using a fixed time step, this will force the game into slow motion, which may not be acceptable.

On the PC you have to deal with a wide range of hardware. From the über-gamer that wants her million dollar rig to run the game in 250 fps (even though the monitor only refreshes at 90 Hz) to the poor min spec user who hopes that he can at least get the game to boot and chug along at 15 fps. It is hard to satisfy both customers without using a variable time step.

The BitSquid engine does not dictate a time step policy, that is entirely up to the game. You can use fixed, variable, smoothed or whatever else fits your game.

So, why am I arguing for smoothing variable time steps? A basic update loop without time smoothing looks something like this:

float dt = elapsed_time();
update_game(dt);

Each frame, we measure the time that has elapsed and use that to advance the game timer.

This seems straight forward enough. Every frame we advance the game clock by the time that has elapsed since the last frame. Even if the time step oscillates you would expect this to result in objects moving in a smooth and natural manner:



Here the dots represents the times (x-axis) at which we sample an object's position (y-axis). As you can see the samples result in a straight line for objects moving at a constant velocity, even though the time steps vary. All seems well.

Except this graph doesn't tell the entire truth. The time shown in this graph is the time when we start simulating a frame, not when that frame is drawn on screen. For each frame we simulate, there is an amount of latency from the point when we start simulating the frame until it is drawn. In fact, the variation in that latency is what accounts for the varying elapsed times that we measure. (Well not entirely, there can be GPU latency that doesn't show up in the delta time, but for the sake of simplicity, let's assume that the latency of the current frame is the delta time we measure in the next frame.)

If the take the latency into account and offset each time value in the graph above with the latency (the elapsed time of the next frame), we get a truer representation of what the player actually sees in the game:



Oops, the line is no longer straight and there is a good chance that the user will notice this as jerky motion.

If we could predict perfectly what the latency of the next frame would be, we could use that rather than the elapsed time of the last frame to advance our timers. This would compensate perfectly for the latency and the user would again see a straight line.

Unfortunately, perfectly predicting how long the next frame will take to simulate and render is impossible. But we can make a guess and that is one of the goals of our time step filter:
  • To make a better guess of how long it will take to update and draw the next frame.
When we are using the standard update loop, we are guessing that the next frame will take as long to draw as the current frame. This is a decent guess, but we can do better. For example, if we calculate the mean delta time of the last few frames and use that instead, we can reduce the average error by about 30 %. (Says my quick and dirty guesstimate calculation. If you want a real figure you have to presume a delta time probability distribution and do the math. Because I'm too lazy to.)

Here is the same graph again, but here we have smoothed the time step that we use to evaluate the object's position.



As you can see, we get a straighter line than we got by using the "raw" variable time steps.

Can we do anything to improve the prediction even further? Yes, if we know more about the data we are predicting. For example, on the PC, you can get occasional "glitch" frames with very long update times if you get interrupted by other processes. If you are using a straight mean, any time you encounter such a glitch frame you will predict that the next few frames will also take very long to update. But the glitch frame was an anomaly. You will get a better predictor by ignoring such outliers when you calculate the mean.

Another pattern in the frame rate that is quite common is a zig-zag pattern where the game oscillates between two different frame rates:



You could make a predictor for this. The predictor could detect if there is a strong correlation between the delta times for even and odd frames and if so, it could use only the even/odd frames to calculate the mean.

But I don't recommend doing that. For two reasons. First, the zig-zag pattern is usually a result of a bad setup in the engine (or, if you are unfortunate, in the driver). It is better to fix that problem and get rid of the oscillations than to work around them. Second, heavily oscillating time steps, which you would get if you tried to follow the zig-zag pattern, tend to make gameplay code behave badly.

So this gives us a second goal of time step filtering:
  • Keep the time step reasonably stable from frame to frame.
Why do I say that oscillating time steps make gameplay code behave badly? Certainly, at any good studio, the gameplay code is written to be time step independent. Whenever an object is moved, the length of the time step is taken into account, so that it moves with the same speed regardless of the frame rate:

s = s + v*dt

Yes, it is easy to make the update code for a single object frame rate independent in this manner. But once you start to have complex interactions between multiple objects, it gets a lot more difficult.

An example: Suppose you want a camera to follow behind a moving object, but you don't want it to be completely stiff, you want some smoothness to it, like maybe lerping it to the right position, or using a dampened spring. Now try to write that in a completely time step independent manner. I.e., an oscillation in the time step should not cause the camera to oscillate with respect to the object it is following.

Not so easy.

Game play code faces many similar issues. With 20+ game play programmers banging a way at the code base you can be quite certain that there are several places where oscillations in the time step lead to badly looking oscillating visuals in the game. And the best way to prevent that, in my opinion, is to smooth the time step so that it doesn't oscillate.

In summary, this is our default method for time step smoothing in the BitSquid engine (though as I said above, you are of course free to use whatever time step you want):
  • Keep a history of the time step for the last 11 frames.
  • Throw away the outliers, the two highest and the two lowest values.
  • Calculate the mean of the remaining 7 values.
  • Lerp from the time step for the last frame to the calculated mean (adding more smoothness)
Here is an example of the filter in action. The yellow line in this graph shows the raw time step. The red line is the smoothed time step.



One last thing to mention. This calculation can cause your game clock to drift from the world clock. If you need them to be synchronized (for network play for example) you need to keep track of your "time debt" -- i.e., how far your clock has drifted from the world clock and "pay off" that debt over a number of frames by increasing or decreasing your time step, until you are back at a synchronized state.

Tuesday, October 5, 2010

The Dependency Checker

Maintaining referential integrity between resources in a big game project can be challenging:

  • Someone might accidentally delete an entity that is used somewhere in a level.
  • Someone might change a texture to improve the look of one object, without knowing that that texture is shared by two other objects.
  • A resource may have a misspelled name, but no one dares change it, because it is used in too many places.
  • There may be "dead" resources in the project, that aren't used anywhere, but no one knows how to find them, so that they can be deleted.

To help with these issues, I've created a new tool for the BitSquid tool chain, the Dependency Checker. The Dependency Checker understands all BitSquid file formats and knows how they can refer to other resources. By parsing the source tree it is thus able to create the complete dependency graph of a project.

This isn't as complicated as it sounds, because we don't have that many different file formats, they are all based on SJSON and they use a standardized way of referring to other resources (type, name). The entire code for parsing and understanding all the different file formats is just 500 lines long.

Once we have the dependency graph, we can do lots of interesting things with it. The first is to find all missing and dangling resources:


Missing resources are resources that are referred somewhere, but that don't actually exist in the source tree. Dangling resources are existing resources that aren't referred anywhere.

We can click on any resource in the list (or any other resource in the project) to see its dependencies.


But the real interesting thing is the ability to patch dependencies. The Dependency Checker does not only know how to parse dependencies, it also knows how to modify them. (Yes, that is included in the 500 lines of code.) That means that it can replace, move and copy resources.

For example, we can replace the missing font texture with something else.


All files that used the font texture will be patched up to refer to the new resource.


Move is useful when we have given something a bad name and just want to clean up our resources a bit:


Copy can be used to quickly clone resources. But since the dependency checker lets you decide which (if any) references you want to redirect to the new copy, it can also be used as a quick way of splitting resources. For example if you decide that you want to use a different key entity on the himalaya level, you can make a copy of the key entity for just those levels.