Wednesday, October 20, 2010

A* is Overrated

Open any textbook on AI and you are sure to find a description of the A*-algorithm. Why? Because it is a provably correct solution to well defined, narrow problem. And that just makes it so... scientific and... teachable. Universities teach Computer Science, because nobody knows how to teach Computer Artistry or Computer Craftsmanship. Plus science is important, everybody knows that.

But I digress.

A* is a useful algorithm. You should know it. But when we are talking about AI navigation, A* is just an imperfect solution to a tiny and not very difficult part of the problem. Here are three reasons why you shouldn't give A* so much credit:

1. Pathfinding is often not that important.

In many games, the average enemy is only on-screen a couple of seconds before it gets killed. The player doesn't even have a chance to see if it is following a path or not. Make sure it does something interesting that the player can notice instead.

2. A* is not necessarily the best solution to path finding problems.

A* finds the shortest path, but that is usually not that important. As long as the path "looks reasonable" and is "short enough" it works fine. This opens the floor for a lot of other algorithms.

For long-distance searches you get the best performance by using a hierarchical search structure. A good design of the hierarchy structure is more important for performance than the algorithm you use to search at each level.

Path-finding usually works on a time scale of seconds, which means that we can allow up to 30 frames of latency in answering search queries. To distribute the work evenly, we should use a search algorithm that runs incrementally. And it should of course parallelize to make the most use of modern hardware.

So what we really want is an incremental, parallelizable, hierarchical algorithm to find a shortish path, not cookie cutter A*.

3. Even if we use A*, there are a lot of other navigational issues that are more important and harder to solve than path finding.

Implementation, for starters. A* only reaches its theoretical performance when the data structures are implemented efficiently. You can still find A* implementations where the open nodes are kept in a sorted list rather than a heap. Ouch. Some implementation details are non-trivial. How do you keep track of visited nodes? A hashtable? (Extra memory and CPU.) A flag in the nodes themselves? (Meaning you can only run one search at a time over the nodes.)

How is the graph created? Hand edited in the editor? How much work is it to redo it every time the level changes? Automatic? What do you do when the automatic generation fails? Can you modify it? Will these modifications stick if the level is changed and the graph is regenerated? Can you give certain nodes special properties when the graph is auto-generated?

How do you handle dynamic worlds were paths can be blocked and new paths can be opened? Can you update the graph dynamically in an efficient way? What happens to running queries? How do past queries get invalidated when their path is no longer valid?

Once you have a path, how do you do local navigation? I.e. how do you follow the path smoothly without colliding with other game agents or dynamic objects? Do AI units collide against the graph or against real physics? What happens if the two don't match up? How do you match animations to the path following movement?

In my opinion, local navigation is a lot more important to the impression a game AI makes than path finding. Nobody will care that much if an AI doesn't follow the 100 % best part towards the target. To err is human. But everybody will notice if the AI gets stuck running against a wall because its local navigation system failed.

In the next blog post I will talk a bit about how local navigation is implemented in the BitSquid engine.

Monday, October 18, 2010

Time Step Smoothing

Today I'm going to argue for smoothing your update delta time, i.e. to apply a simple low pass filter to it before using it to update the game state:

float dt = elapsed_time();
dt = filter(dt);

This assumes of course that you are using a variable time step. A fixed time step is always constant and doesn't need smoothing.

Some people will say that you should always use a fixed time step, and there is a lot of merit to that. A stable frame rate always looks better and with a fixed time step the gameplay code behaves more deterministically and is easier to test. You can avoid a whole class of bugs that "only appear when the frame rate drops below 20 Hz".

But there are situations where using a fixed frame rate might not be possible. For example if your game is dynamic and player driven, the player can always create situations when it will drop below 30 fps (for example, by gathering all the physics objects in the same place or aggroing all the enemies on the level). If you are using a fixed time step, this will force the game into slow motion, which may not be acceptable.

On the PC you have to deal with a wide range of hardware. From the ├╝ber-gamer that wants her million dollar rig to run the game in 250 fps (even though the monitor only refreshes at 90 Hz) to the poor min spec user who hopes that he can at least get the game to boot and chug along at 15 fps. It is hard to satisfy both customers without using a variable time step.

The BitSquid engine does not dictate a time step policy, that is entirely up to the game. You can use fixed, variable, smoothed or whatever else fits your game.

So, why am I arguing for smoothing variable time steps? A basic update loop without time smoothing looks something like this:

float dt = elapsed_time();

Each frame, we measure the time that has elapsed and use that to advance the game timer.

This seems straight forward enough. Every frame we advance the game clock by the time that has elapsed since the last frame. Even if the time step oscillates you would expect this to result in objects moving in a smooth and natural manner:

Here the dots represents the times (x-axis) at which we sample an object's position (y-axis). As you can see the samples result in a straight line for objects moving at a constant velocity, even though the time steps vary. All seems well.

Except this graph doesn't tell the entire truth. The time shown in this graph is the time when we start simulating a frame, not when that frame is drawn on screen. For each frame we simulate, there is an amount of latency from the point when we start simulating the frame until it is drawn. In fact, the variation in that latency is what accounts for the varying elapsed times that we measure. (Well not entirely, there can be GPU latency that doesn't show up in the delta time, but for the sake of simplicity, let's assume that the latency of the current frame is the delta time we measure in the next frame.)

If the take the latency into account and offset each time value in the graph above with the latency (the elapsed time of the next frame), we get a truer representation of what the player actually sees in the game:

Oops, the line is no longer straight and there is a good chance that the user will notice this as jerky motion.

If we could predict perfectly what the latency of the next frame would be, we could use that rather than the elapsed time of the last frame to advance our timers. This would compensate perfectly for the latency and the user would again see a straight line.

Unfortunately, perfectly predicting how long the next frame will take to simulate and render is impossible. But we can make a guess and that is one of the goals of our time step filter:
  • To make a better guess of how long it will take to update and draw the next frame.
When we are using the standard update loop, we are guessing that the next frame will take as long to draw as the current frame. This is a decent guess, but we can do better. For example, if we calculate the mean delta time of the last few frames and use that instead, we can reduce the average error by about 30 %. (Says my quick and dirty guesstimate calculation. If you want a real figure you have to presume a delta time probability distribution and do the math. Because I'm too lazy to.)

Here is the same graph again, but here we have smoothed the time step that we use to evaluate the object's position.

As you can see, we get a straighter line than we got by using the "raw" variable time steps.

Can we do anything to improve the prediction even further? Yes, if we know more about the data we are predicting. For example, on the PC, you can get occasional "glitch" frames with very long update times if you get interrupted by other processes. If you are using a straight mean, any time you encounter such a glitch frame you will predict that the next few frames will also take very long to update. But the glitch frame was an anomaly. You will get a better predictor by ignoring such outliers when you calculate the mean.

Another pattern in the frame rate that is quite common is a zig-zag pattern where the game oscillates between two different frame rates:

You could make a predictor for this. The predictor could detect if there is a strong correlation between the delta times for even and odd frames and if so, it could use only the even/odd frames to calculate the mean.

But I don't recommend doing that. For two reasons. First, the zig-zag pattern is usually a result of a bad setup in the engine (or, if you are unfortunate, in the driver). It is better to fix that problem and get rid of the oscillations than to work around them. Second, heavily oscillating time steps, which you would get if you tried to follow the zig-zag pattern, tend to make gameplay code behave badly.

So this gives us a second goal of time step filtering:
  • Keep the time step reasonably stable from frame to frame.
Why do I say that oscillating time steps make gameplay code behave badly? Certainly, at any good studio, the gameplay code is written to be time step independent. Whenever an object is moved, the length of the time step is taken into account, so that it moves with the same speed regardless of the frame rate:

s = s + v*dt

Yes, it is easy to make the update code for a single object frame rate independent in this manner. But once you start to have complex interactions between multiple objects, it gets a lot more difficult.

An example: Suppose you want a camera to follow behind a moving object, but you don't want it to be completely stiff, you want some smoothness to it, like maybe lerping it to the right position, or using a dampened spring. Now try to write that in a completely time step independent manner. I.e., an oscillation in the time step should not cause the camera to oscillate with respect to the object it is following.

Not so easy.

Game play code faces many similar issues. With 20+ game play programmers banging a way at the code base you can be quite certain that there are several places where oscillations in the time step lead to badly looking oscillating visuals in the game. And the best way to prevent that, in my opinion, is to smooth the time step so that it doesn't oscillate.

In summary, this is our default method for time step smoothing in the BitSquid engine (though as I said above, you are of course free to use whatever time step you want):
  • Keep a history of the time step for the last 11 frames.
  • Throw away the outliers, the two highest and the two lowest values.
  • Calculate the mean of the remaining 7 values.
  • Lerp from the time step for the last frame to the calculated mean (adding more smoothness)
Here is an example of the filter in action. The yellow line in this graph shows the raw time step. The red line is the smoothed time step.

One last thing to mention. This calculation can cause your game clock to drift from the world clock. If you need them to be synchronized (for network play for example) you need to keep track of your "time debt" -- i.e., how far your clock has drifted from the world clock and "pay off" that debt over a number of frames by increasing or decreasing your time step, until you are back at a synchronized state.

Tuesday, October 5, 2010

The Dependency Checker

Maintaining referential integrity between resources in a big game project can be challenging:

  • Someone might accidentally delete an entity that is used somewhere in a level.
  • Someone might change a texture to improve the look of one object, without knowing that that texture is shared by two other objects.
  • A resource may have a misspelled name, but no one dares change it, because it is used in too many places.
  • There may be "dead" resources in the project, that aren't used anywhere, but no one knows how to find them, so that they can be deleted.

To help with these issues, I've created a new tool for the BitSquid tool chain, the Dependency Checker. The Dependency Checker understands all BitSquid file formats and knows how they can refer to other resources. By parsing the source tree it is thus able to create the complete dependency graph of a project.

This isn't as complicated as it sounds, because we don't have that many different file formats, they are all based on SJSON and they use a standardized way of referring to other resources (type, name). The entire code for parsing and understanding all the different file formats is just 500 lines long.

Once we have the dependency graph, we can do lots of interesting things with it. The first is to find all missing and dangling resources:

Missing resources are resources that are referred somewhere, but that don't actually exist in the source tree. Dangling resources are existing resources that aren't referred anywhere.

We can click on any resource in the list (or any other resource in the project) to see its dependencies.

But the real interesting thing is the ability to patch dependencies. The Dependency Checker does not only know how to parse dependencies, it also knows how to modify them. (Yes, that is included in the 500 lines of code.) That means that it can replace, move and copy resources.

For example, we can replace the missing font texture with something else.

All files that used the font texture will be patched up to refer to the new resource.

Move is useful when we have given something a bad name and just want to clean up our resources a bit:

Copy can be used to quickly clone resources. But since the dependency checker lets you decide which (if any) references you want to redirect to the new copy, it can also be used as a quick way of splitting resources. For example if you decide that you want to use a different key entity on the himalaya level, you can make a copy of the key entity for just those levels.

Friday, October 1, 2010

Static Hash Values

We use 32-bit string hashes instead of strings in many places to save memory and improve performance. (When there is a risk for collision we use 64-bit hashes instead.)

At a number of places in the code we want to check these hashes against predefined values. For example, we may want to check if a certain object is the "root_point". With a straight forward implementation, you get code that looks like this:
const char *root_point_str = "root_point";
static unsigned root_point_id = murmur_hash(root_point_str, 
    strlen(root_point_str), 0);
if ( == root_point_id)
We use a static variable to avoid having to hash the string more than once, but this is still pretty inefficient. There is the extra application data, the computation of the hash the first time the function is run. On subsequent invocations there is still the check to see if the static variable has been initialized.

It would be a lot more efficient if we could precompute the hashes somehow to avoid that cost in the runtime. I can see three ways:
  • We could run a code generation pass in a pre-build step that generates the hash values and patches the code with them.
  • We could use the preprocessor to generate the values.
  • We could compute the values offline and hard-code them in the code.
I'm not too found of code generation. It is nice in theory, but to me it always seems kind of messy the way it interacts with the build system, the debugger, etc.

Rewriting the murmur hash algorithm in the preprocessor requires me to bring out some serious preprocessor-fu. But it is fun. It is almost like functional programming: With these lovely macros in place, we can now write:
if ( == HASH_STR_10('r','o','o','t','_','p','o','i','n','t'))
Having completed this task I feel a bit empty. That is certainly a lot of macro code for an end result that still is kind of meh.

I disregarded hard coding the values to begin with because no one wants to look at code like this:
if ( == 0x5e43bd96)
Even dressed up in comments, it is still kind of scary:
unsigned root_point_id = 0x5e43bd96; // hash of "root_point"
if ( == root_point_id)
What if someone types in the wrong value? What if we decide to change hash algorithm at some later point? Scary. But maybe we can ameliorate those fears:
#ifdef _DEBUG
    inline unsigned static_hash(const char *s, unsigned value) {
        assert( murmur_hash(s, strlen(s), 0) == value );
        return value;
    #define static_hash(s,v) (v)


if ( == static_hash("root_point", 0x5e43bd96)
That looks better and is completely safe. If something goes wrong, the assert will trigger in the debug builds.

I think I like this better than the preprocessor solution. It will make the debug builds run a bit slower, but that's what debug builds are for, right?