Thursday, May 26, 2011

Monitoring your game

Many bugs are easy to fix with debuggers, stack traces and printf-statements. But some are hard to even see with such tools. I'm thinking of things like frame rate hitches, animation glitches and camera stutters. You can't put a breakpoint on the glitch because what constitutes a glitch is only defined in relation to what happened in the frame before or what will happen in the next frame. And even if you are able to break exactly when the glitch occurs, you might not be able to tell what is going on from the call stack.

In these situations, some way of monitoring and visualizing your game's behavior can be invaluable. Indeed, if we graph the delta time for each frame, the hitches stand out clear as day.


Delta-time graph with frame rate drops.


A graph like this opens up many new ways of attacking glitch bugs. You can play the game with the graph displayed and try to see what game actions trigger the glitches. Do they happen when a certain enemy is spawned? When a particular weapon is fired? Another approach is to draw the total frame time together with the time spent in all the different subsystems. This immediately shows you which subsystem is causing the frame rate to spike. You can constrain the problem further by graphing the time spent in narrower and narrower profiler scopes.

Visualization tools like these can help with many other issues as well. Want to find out where a weird camera stutter comes from? Plot the camera position, the position of its look-at target and any other variables that may influence its behavior to pin down the source of the problem. Draw a graph representing your memory fragmentation to find problematic allocations and get an overall feeling for how bad the situation is. Does something look slightly off with the animations? Graph the bone rotations to make sure that you don't have any vibrations or discontinuities. Graph your network usage to make sure you stay below the bandwidth cap.


Rotation of a bone during a jump animation.


When you study your game in this way, you will most likely learn things that surprise you. Games are highly complex systems built by a large number of people over a long period of time. Like all complex systems, they show emergent behavior. You can be quite certain that at least someone has done something that is completely unexpected and totally weird. You can't hope to discover these things using just a bottom-up approach. There is too much code and too much data. Instead you must study your game as if it was an alien organism. Prod it and see how it reacts. Keep the graphs on screen and make sure that they look sane.

There are many different kinds of data that can be interesting and many ways of visualizing them - graphs, bars, charts, etc. But in all cases the pattern is pretty much the same. We have some data that we record from the game and then we have a visualizer that takes this data and draws it in some interesting way. Schematically, we can represent it like this:


Basic monitoring system schematic.


I will refine this picture shortly, but first let's do a little data-oriented design and ask ourselves how we can best store and process this data.

If you have read any of my earlier blog posts you will know that I'm a fan of big dumb continuous memory buffers and data structures that look like "file formats for memory". And this approach works perfectly for this problem. We can just store the data as a big block of concatenated structs, where each struct represents some recorded data. We begin each record with an enum specifying the type of recorded event and follow that with a variable sized struct with data for that particular event.


Data buffer layout.


The event types might be things such as ENTER_PROFILER_SCOPE, LEAVE_PROFILER_SCOPE, ALLOCATE_MEMORY, FREE_MEMORY, RECORD_GLOBAL_FLOAT, etc.

RECORD_GLOBAL_FLOAT is the event type used for all kinds of data that we want to draw in graphs. We record the data with calls like these:

record_global_float("application.delta_time", dt);
record_global_float("application.frame_rate", 1.0f / dt);

The corresponding data struct is just:

struct RecordGlobalFloatEvent {
    const char *name;
    float value;
};

Note that there is an interesting little trick being used here. When we record the events, we just record the string pointers, not the complete string data. This saves memory, makes the struct fixed size and gives us faster string compares. This works because record_global_float() is called with static string data that is always at the same address and kept in memory throughout the lifetime of the application. (In the rare case where you want to call record_global_float() with a dynamic string, you must allocate a copy of that string at some permanent location, i.e. do a form of string interning.)
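For the rare dynamic-string case, interning can be as simple as keeping the strings in a container that never moves its elements. This is a minimal sketch (a hypothetical helper, not the actual engine code):

```cpp
#include <set>
#include <string>

// Returns a pointer that stays valid and unique for the lifetime of
// the program, so it can be stored directly in RecordGlobalFloatEvent.
// std::set never invalidates pointers to its elements on insert.
const char *intern(const std::string &s)
{
    static std::set<std::string> strings;
    return strings.insert(s).first->c_str();
}
```

Interning the same text twice yields the same pointer, so the fast pointer-compare trick keeps working for dynamic strings too.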

Now, let's refine the picture slightly. There is a problem with recording all data to a single memory buffer and that is multithreading. If all threads record their data to the same memory buffer then we need lots of mutex locking to make sure they don't step on each other's toes.

We might also want to add support for some kind of off-line (i.e., not in-game) visualization. Off-line visualizers can take advantage of the full power of your development PC to implement more powerful visualization algorithms. And since they have near unlimited memory, they can record the entire data history so that you can explore it back and forth after the game session has ended.

With these refinements our monitoring system now looks like this:


Advanced monitoring system schematic.


Each thread has a small TLS (thread-local-storage) cache with 64 K or so of debug memory where it records its events. When the cache gets full or we reach the end of the frame, the thread acquires the lock to the global event buffer and flushes its data there.
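A sketch of what that flush might look like, assuming a global buffer guarded by a mutex (all names here are made up for illustration; the cache is a fixed array, where the engine code would point into memory from the debug heap):

```cpp
#include <cstring>
#include <mutex>
#include <vector>

// Hypothetical global event buffer shared by all threads.
static std::mutex _global_mutex;
static std::vector<char> _global_buffer;

// Thread-local cache where each thread records its events.
static thread_local char _thread_buffer[64 * 1024];
static thread_local unsigned _thread_buffer_count = 0;

// Copies the thread's cached events into the global buffer under the
// lock and resets the cache. Called when the cache fills up or at the
// end of the frame.
void flush_thread_buffer()
{
    std::lock_guard<std::mutex> lock(_global_mutex);
    _global_buffer.insert(_global_buffer.end(),
        _thread_buffer, _thread_buffer + _thread_buffer_count);
    _thread_buffer_count = 0;
}
```

The point of the design is that the mutex is only touched once per flush, not once per recorded event.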

The active on-line visualizers process the events in the buffer and visualize them. Simultaneously, we send the data over TCP so that it can be processed by any off-line visualizers. In the process we consume the buffer data, and the buffer can then be filled with new data from the threads.

(We allocate all the buffers we use on a special debug heap, so that we separate the allocations which we only do for debugging purposes from the allocations done by the main game.)

Recording float data requires just a few lines of code.

enum {RECORD_GLOBAL_FLOAT_EVENT = 17};
enum {THREAD_BUFFER_SIZE = 64*1024};
__thread char *_thread_buffer;
__thread unsigned _thread_buffer_count;

void flush_thread_buffer();

inline void record_global_float(const char *name, float value)
{
    const unsigned size = sizeof(unsigned) + sizeof(RecordGlobalFloatEvent);
    if (_thread_buffer_count + size > THREAD_BUFFER_SIZE)
        flush_thread_buffer();

    char *p = _thread_buffer + _thread_buffer_count;
    *(unsigned *)p = RECORD_GLOBAL_FLOAT_EVENT;
    RecordGlobalFloatEvent *e = (RecordGlobalFloatEvent *)(p + sizeof(unsigned));
    e->name = name;
    e->value = value;
    _thread_buffer_count += size;
}

When you have the data, writing the graph visualizer is not much work. Just save the data over a couple of frames and plot it using a line drawer.
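For the data side of such a visualizer, a fixed-size history per channel is enough. A minimal sketch (hypothetical helper, not the actual engine code):

```cpp
#include <algorithm>
#include <vector>

// Keeps the last N samples of one channel and reports their range,
// which the line drawer needs to scale the graph vertically.
class GraphHistory {
public:
    explicit GraphHistory(unsigned capacity) : _capacity(capacity) {}

    void add(float v)
    {
        if (_samples.size() == _capacity)
            _samples.erase(_samples.begin()); // drop the oldest sample
        _samples.push_back(v);
    }

    float min() const {return *std::min_element(_samples.begin(), _samples.end());}
    float max() const {return *std::max_element(_samples.begin(), _samples.end());}
    const std::vector<float> &samples() const {return _samples;}

private:
    unsigned _capacity;
    std::vector<float> _samples;
};
```

Each frame, the visualizer feeds the recorded floats into the history and draws the samples as a polyline scaled by min() and max().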

In the BitSquid engine, we also expose all the data recording functions to Lua scripting. This makes it possible to dynamically create graphs for all kinds of data while the game is running.

As an example of this, a couple of days ago a game programmer suspected that some problematic behavior was caused by a low update frequency in the mouse driver. We quickly bashed out a couple of lines in the game console to produce a graph of the mouse data and could immediately confirm that this indeed was the case:

Core.Debug.add_updator(
  function ()
    Profiler.record_statistics("mouse", Mouse.axis(0))
  end 
)
graph make mousegraph
graph add_vector3 mousegraph mouse
graph range mousegraph -20 20


Graph of mouse input showing frames with no input.

Tuesday, April 26, 2011

Universal Undo, Copy and Paste

Undo, Copy and Paste are the bane of any tools programmer. Especially when they are bolted on to an already existing program. But even when they are properly planned from the start, these small (but essential) features can consume a lot of development time and be the source of many bugs.

Wouldn't it be nice if all that could be eliminated?

In an earlier post I presented a generic model for storing data: objects-with-properties. Like any model, it consists of a combination of generalizations and restrictions. The generalizations make the model broadly applicable. The restrictions let us reason about it and prevent it from becoming an "inner platform".

To quickly recap, here is the gist of the model:

  • The data consists of a set of objects-with-properties.
  • Each object is identified by a GUID.
  • Each property is identified by a string.
  • The property value can be null, a bool, a double, a vector3, a quaternion, a string, a data blob, a GUID or a set of GUIDs.
  • The data has a root object with GUID 0.

We need only five operations to manipulate data stored using this model:

create(guid)
    creates the object with the specified GUID
destroy(guid)
    destroys the object with the specified GUID
set_property(guid, key, value)
    sets the specified property of the object to the value (set to nil to remove the property)
add_to_set(guid, key, item_guid)
    adds the item to the GUID set property identified by the key
remove_from_set(guid, key, item_guid)
    removes the item from the GUID set property identified by the key
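A minimal sketch of a database supporting these five operations might look like this (strings stand in for the full variant value type, and all the names are made up for illustration):

```cpp
#include <map>
#include <set>
#include <string>

typedef unsigned long long Guid;

// Minimal objects-with-properties database. A real implementation
// would store a variant type for property values; plain strings
// stand in for them here to keep the sketch short.
struct Database {
    std::set<Guid> objects;
    std::map<Guid, std::map<std::string, std::string>> properties;
    std::map<Guid, std::map<std::string, std::set<Guid>>> sets;

    void create(Guid guid) {objects.insert(guid);}
    void destroy(Guid guid)
    {
        objects.erase(guid);
        properties.erase(guid);
        sets.erase(guid);
    }
    void set_property(Guid guid, const std::string &key, const std::string &value)
    {
        if (value.empty())
            properties[guid].erase(key); // empty stands in for nil
        else
            properties[guid][key] = value;
    }
    void add_to_set(Guid guid, const std::string &key, Guid item)
    {
        sets[guid][key].insert(item);
    }
    void remove_from_set(Guid guid, const std::string &key, Guid item)
    {
        sets[guid][key].erase(item);
    }
};
```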

The interesting thing about this model is that it is generic enough to represent almost any kind of data, yet restricted enough to make it possible to define and perform a variety of interesting operations on the data. For example, in the previous post we saw that it was possible to define a property-based merge operation on the data (which for content files is much more useful than the line-based merge used by most version control systems).

Other operations that are easy to perform on this data are:

  • referential integrity checks (check that all GUIDs used exist in the database)
  • checks for "dangling" objects
  • object replacement (replace all references to an object's GUID with references to another object)

And, of course, the topic for the day: Undo, Copy and Paste.

Undo


To implement undo in this model, note that each of the five mutating operations we can perform on the data has a simple inverse:

Operation                               Inverse
create(guid)                            destroy(guid)
destroy(guid)                           create(guid)
set_property(guid, key, value)          set_property(guid, key, old_value)
add_to_set(guid, key, item_guid)        remove_from_set(guid, key, item_guid)
remove_from_set(guid, key, item_guid)   add_to_set(guid, key, item_guid)

To implement Undo, all we have to do is to make sure that whenever the user performs one of the mutating operations, we save the corresponding inverse operation to a stack. To undo the latest action, we pop that last action from the stack and perform it. (We also save its inverse operation to a redo queue, so the user can redo it.)
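As a sketch of the idea, the undo stack can be as simple as a stack of inverse operations (closures here, for brevity; the names are hypothetical):

```cpp
#include <functional>
#include <vector>

// Every mutating operation pushes its inverse onto the stack. Undoing
// pops the most recent inverse and runs it.
struct UndoStack {
    std::vector<std::function<void()>> inverses;

    void record(std::function<void()> inverse)
    {
        inverses.push_back(inverse);
    }

    void undo()
    {
        if (inverses.empty())
            return;
        inverses.back()(); // perform the inverse operation
        inverses.pop_back();
    }
};
```

For example, before performing set_property(guid, key, value) you would record a closure that calls set_property(guid, key, old_value).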

Since the Undo operation is implemented on the low-level data model, all high-level programs that use it will automatically get "Undo" for free.

In the high level program you typically want to group together all the mutations that resulted from a single user action as one "undo item", so the user can undo them with a single operation. You can do that by recording "restore points" in the undo stack whenever your program is idle. To undo an action, you undo all operations up to the last restore point.

Copy


To copy a set of objects, create a new database that holds just the copied objects. Copy the objects with their keys and values to the new database. Also copy all the objects they reference. (Use a set to remember the GUIDs of the objects you have already copied.)

In the root object of the new database, store the GUIDs of all the copied objects under some suitable key (for example: "copied-models").

Then serialize the database copy to the clipboard (using your standard method for serialization).

Paste


To paste data, first unserialize it from the clipboard to a new temporary database. Then rename all the objects (give them new GUIDs) to make sure they don't collide with existing objects.

Renaming is simple, just generate a new GUID for every object in the database. Use a dictionary to record the mapping from an object's old GUID to the new GUID. Then, using that dictionary, translate all the references in the object properties from the old GUIDs to the new ones.
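The renaming step might be sketched like this (generate_guid() is a stand-in for a real GUID generator):

```cpp
#include <map>
#include <set>

typedef unsigned long long Guid;

// Stand-in for a real GUID generator.
static Guid _next_guid = 1000;
Guid generate_guid() {return ++_next_guid;}

// Build the dictionary mapping each object's old GUID to a fresh one.
std::map<Guid, Guid> make_renaming(const std::set<Guid> &objects)
{
    std::map<Guid, Guid> renaming;
    for (Guid g : objects)
        renaming[g] = generate_guid();
    return renaming;
}

// Translate a GUID reference through the dictionary. References to
// objects outside the pasted set are kept as-is.
Guid translate(const std::map<Guid, Guid> &renaming, Guid ref)
{
    std::map<Guid, Guid>::const_iterator it = renaming.find(ref);
    return it == renaming.end() ? ref : it->second;
}
```

You would run translate() over every GUID and GUID-set property in the temporary database before copying the objects across.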

Finally, copy the objects from the temporary database to your main database.

Again, since Copy and Paste were implemented on the underlying data model and don't depend on the high level data (what kind of objects you actually store) you get them for free in all programs that use the data model.

Tuesday, April 12, 2011

Extreme Bug Hunting

Put on your camouflage vest and step out onto the hot motherboard plains. Squint against the searing rays of burning processor cycles and feel the warm wind of chassis fans fill the air with anticipation. Today we go bug hunting.

Our prey: the worst kind. Crashes only in release builds. Only on PS3. At different places every time. With a low reproduction rate. And there are only a few days left until submission. (Aren't there always?)

What can we do? Luckily, the situation is not as hopeless as it might seem. I recently dealt with a bug of this kind and here are my tips and tricks for bringing down such beasts:

Don't Panic!


No bug is impossible to fix. The reason you feel that way is because you don't know anything about it. The more you learn, the less scary the bug will seem.

Instead of focusing on fixing the bug, something you can't possibly do at this point, focus on finding out more about it. Gather information. Take a sheet of paper and write down everything you know and don't know about the bug. Write down ideas of what might be causing the bug as you think of them and cross them out as you eliminate them. Don't get stressed out by the fact that you are not fixing the bug right now. Instead be confident that everything you learn about the bug takes you one step closer to finding the cause.

Actually, the very things that make tricky bugs tricky already tell you some things about them:

Only in release builds. There can be several reasons for this. It could be that some of the code that is stripped out in release builds protects against the bug. The bug could be timing related, making it disappear in slower debug builds. Or the bug could be caused by uninitialized variables.

Only on PS3. This indicates that the bug might be in a PS3 specific system.

Low reproduction rate. This indicates that the bug depends on something random. Could be uninitialized memory (can contain random data) or a thread timing issue.

Different call stacks. This indicates that a bad system is causing failures in multiple other systems. The most likely explanation is that the bad system is overwriting the memory used by the other systems.

All taken together. This gives us a pretty decent working hypothesis:

Timing issues or uninitialized variables are causing a system (possibly a PS3-only system) to overwrite memory that doesn't belong to it.

Get a Stable Repro Case


To learn more about the bug, you need to be able to do experiments. I.e., change something and see if the bug is still there or not.

To do that effectively you need a reliable way of reproducing the bug. Can you isolate the behavior that produces the bug? Can you find a way of getting a better reproduction rate? Can you script what you just did, so that you have a way of reproducing the bug that doesn't require user input?

Even if you can't find a 100 % reliable repro case, an automated test is still useful. If the bug has a 30 % chance of occurring and you run the test 20 times without seeing the bug you can be pretty certain that it has disappeared. And if you have a completely automated test process, it should be able to run the tests while you procure a tasty beverage of your choice.

Gather Information


As already mentioned, the next step is to try to gather as much information about the bug as possible. The more you learn about the bug, the better chance you have of fixing it.

Just running the same repro case again and again quickly leads to diminishing returns. Instead, try manipulating the system slightly on each attempt and see what happens to the bug. Does it disappear? Does it become more frequent? Does it move to a different place? What does this tell you about the bug? Below are some useful manipulations to try.

Turn off System by System


Try turning off system by system in the engine until the bug disappears. Disable the sound system. Is the bug still there? Disable rendering. Can you still get the bug? And so on. If you have a modular engine design, it should be easy to turn off individual engine systems.

When a bug has a random component you can't be certain that a fix that made the bug disappear really fixed the bug. It might have just masked it. Still, if you don't make any assumptions at all you won't get anywhere. Just as when you solve a difficult crossword puzzle, you may have to make some guesses to get started. So if the bug disappears when you disable a particular system and reappears when you enable it, you can assume as a working hypothesis that the bug is caused by something in that system. But you should be ready to abandon that hypothesis if you find evidence to the contrary.

Search the Version History


Was the bug discovered recently? Try reverting to an earlier version of the code/data and see if the bug is still there.

If the bug disappears in an earlier version you can do a binary search of the revisions until you find the point where the bug was introduced. Git even has a cool command for this: git bisect. When you find the revision that introduced the bug it should be easy to spot the error.

Look at the Data


When you get a crash because of overwritten memory, look at the data that was written. If you are really lucky, you might recognize it and can make a decent guess of what system it came from.

Memory Breakpoint


Another lucky break is if it is the same memory location that is being trashed every time you run the program. In that case, you can just place a data breakpoint at that location and get the debugger to break when the memory is overwritten.

Fill Allocated Memory with Bad Data


Could the error be caused by uninitialized data? One way of finding out is to fill memory with specific values on malloc() and see if the behavior of the bug changes. This requires that you have implemented your own memory allocators, but you should do that anyway.

Try changing malloc() (or whatever function you use to allocate memory) to always memset() the allocated memory to zero. Does the behavior of the bug change? Try a different pattern: 0xffffffff or 0x12345678. Does anything happen?
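As a sketch, such a pattern-filling wrapper is only a few lines (a hypothetical helper around the standard malloc()):

```cpp
#include <cstdlib>
#include <cstring>

// Debug allocator wrapper that fills fresh memory with a known
// pattern, so reads of uninitialized data return recognizable
// garbage instead of whatever happened to be there.
void *debug_malloc(std::size_t size, unsigned char pattern)
{
    void *p = std::malloc(size);
    if (p)
        std::memset(p, pattern, size);
    return p;
}
```

If the bug's behavior changes when you switch the pattern from 0x00 to 0xff, uninitialized data is almost certainly involved.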

Disable Multi-Threading


Could the error be caused by race conditions between execution threads? Try running your systems synchronously instead of asynchronously. Run them all on the same processor. Is the bug still there?

Clear on Free


The two most common causes of random memory overwrites are:

  1. Code that writes to a memory address after having called free().
  2. Code that allocates a buffer of a certain size and writes beyond that size (buffer overflow).

Errors of type (1) can sometimes be found by clearing the memory when free() is called. If a system is accessing memory after having called free(), you might trigger an error in that system by clearing out the memory or filling it with a pattern.

Canary Values


Buffer overflow problems can be detected with something called "canary values" (named after the way canary birds were used to detect gas leaks in mines).

The idea is that every time you allocate memory, you allocate some extra bytes and fill them with a "canary value", a known pattern, such as 0x12345678. In the call to free() you check that the canary value is still intact. If some code is writing beyond the end of its buffers, it will overwrite the canary value and cause an assert() in the call to free().
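A sketch of the idea (a hypothetical wrapper; a real allocator would assert on a dead canary rather than return a flag):

```cpp
#include <cstdlib>
#include <cstring>

static const unsigned CANARY = 0x12345678;

// Allocates a few extra bytes after the user's buffer and plants the
// canary value there.
void *canary_malloc(std::size_t size)
{
    char *p = (char *)std::malloc(size + sizeof(CANARY));
    std::memcpy(p + size, &CANARY, sizeof(CANARY));
    return p;
}

// Returns true if the canary survived. A dead canary means some code
// wrote beyond the end of the buffer.
bool canary_free(void *p, std::size_t size)
{
    unsigned c;
    std::memcpy(&c, (char *)p + size, sizeof(c));
    bool ok = (c == CANARY);
    std::free(p);
    return ok;
}
```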

Memory Verification


Many memory allocators have some kind of internal consistency check. For example in dlmalloc you can check that you are able to walk through all allocated memory blocks. If something is trashing the block headers, the consistency check will fail. By running the consistency check at regular intervals you can find out when the corruption occurs.

Once you have a time interval where the memory is okay at the start and corrupted at the end you can do a binary search of that interval by inserting more and more consistency checks until you find the exact point where the headers are overwritten.

Change Allocators


Sometimes just changing what allocator you use can move the crash to a different place and make it easier to see the real problem. Try switching between dlmalloc, the system allocator and your own allocators (if you have any).

Use the Virtual Memory System


Using virtual memory allocations is a good way of finding out if memory is being accessed after free(), since access to a page that has been freed results in a page fault.

If you suspect that the error is in a particular system, you can switch its allocations over to using the virtual memory allocator. Typically, you can't switch the entire engine over to virtual allocations since it has huge overheads. (You must round up all allocations to the page size.)

The Bug That Inspired This Article


Using these techniques we were able to hunt down a really tricky bug reasonably quickly. We wrote a script that could reproduce the bug with a rate of about 30 %. System shutdown and version history tests indicated that the bug was in the SPU decompression library, a relatively new system. This indication was strengthened by the fact that the bug occurred only on PS3. Switching that system to using the virtual memory allocator gave us a DMA error when the bad write occurred (from an SPU). From that we could immediately see the problem -- a race condition could cause the SPUs to continue DMAing decompressed data even after the destination buffer had been freed. With that information, the problem was easily fixed.

Sunday, March 27, 2011

Collaboration and Merging

(We are looking for a tools programmer.)

Games are huge collaborative efforts, but usually they are not developed that way. Mostly, assets can only be worked on by one person at a time and need to be locked in version control to prevent conflicting changes. This can be a real time sink, especially for level design, but all assets would benefit from more collaborative workflows. As tool developers, it is time we start thinking seriously about how to support that.

Recently I faced this issue while doing some work on our localization tools. (Localization is interesting in this context because it involves collaboration over long distances -- a game studio in one country and a translation shop in another.) In the process I had a small epiphany: the key to collaboration is merging. When data merges nicely, collaborative work is easy. If you can't merge changes it is really hard to do collaboration well, no matter what methods you use.

Why databases aren't a magic solution

A central database can act as backend storage for a collaborative effort. But that, by itself, does not solve all issues of synchronization and collaboration.

Consider this: if you are going to use a database as your only synchronization mechanism then all clients will have to run in lockstep with the database. If you change something, you have to verify with the database that the change hasn't been invalidated by something done by somebody else, perform the change as a single transaction and then wait for the database to acknowledge it before continuing. Every time you change something, you will have to wait for this round trip to the database and the responsiveness of your program is now completely at its mercy.

Web applications have faced this issue for a long time and they all use the same solution. Instead of synchronizing every little change with the database, they gather up their changes and send them to the database asynchronously. This change alone is what has made "web 2.0" applications competitive with desktop software.

But once you start talking to the database asynchronously, you have already entered "merge territory". You send your updates to the server, they arrive at some later point, potentially after changes made by other users. When you get a reply back from the server you may already have made other, potentially conflicting, changes to your local data. Both at the server and in the clients, changes made by different users must be merged.

So you need merging. But you don't necessarily need a database. If your merges are robust you can just use an ordinary version control system as the backend instead of a database. Or you can work completely disconnected and send your changes as patch files. The technology you use for the backend storage doesn't matter that much, it is the ability to merge that is crucial.

A merge-based solution has another nice property that you don't get with a "lockstep database": the possibility of keeping a local changeset and only submitting it to others when it is "done". This is of course crucial for code (imagine keeping all your source files in constantly mutating Google Documents). But I think it applies to other assets as well. You don't want half-finished, broken assets all over your levels. An update/commit workflow is useful here as well.

Making assets mergable

If you have tried to merge assets in regular version control systems you will know that they usually don't do so well. The merge tool can mess up the JSON/XML structure, mangle the file in other ways or just plain fail (because of a merge conflict). All of these problems arise because the merge tool treats the data as "source code" -- a line-oriented text document with no additional structure. The reason for this is of course historic, version control systems emerged as a way of managing source code and then grew into other areas.

The irony of this is that source code is one of the hardest things to merge. It has complicated syntax and even more complicated semantics. Source code is so hard to merge that even humans, with all their intelligence, find it taxing. In contrast, most assets are easy to merge, at least conceptually.

Take localization, for instance. The localization data is just a bunch of strings with translations for different languages. If one person has made a bunch of German translations, another person has made some Swedish translations and a third person has added some new source strings, we can merge all that without a hitch. The only time when we have any problem at all is if two people have provided different translations for the same string in the same language. We can solve such standoffs by just picking the most recent value. (Optionally, we could notify the user that this happened by highlighting the string in the tool.)

Many other assets have a similar structure. They can be described as "objects-with-properties". For example, in a level asset the objects are the entities placed in the level and their properties are position, rotation, color, etc. All data that has this structure is easy to merge, because there are essentially just three types of operations you can perform on it: create an object, destroy an object and change a property of an object. All these operations are easy to merge. Again, the only problem is if two different users have changed the same property of the same object.

So when we try to merge assets using regular merge tools we are doing something rather silly. We are taking something that is conceptually very easy to merge, completely ignoring that and trying to merge it using rather complex algorithms that were designed for something completely different, something that is conceptually very hard to merge. Silly, when you think about it.

The solution to this sad state of affairs is of course to write custom merge tools that take advantage of the fact that assets are very easy to merge. Tools that understand the objects-with-properties model and know how to merge that.

A first step might be to write a merge program that understands XML or JSON files (the program in the link has some performance issues -- I will deal with that in my next available time slot) and can interpret them as objects-with-properties.

This only goes half the way though, because you will need some kind of extra markup in the file for the tool to understand it as a set of objects-with-properties. For example, you probably need some kind of id field to mark object identity. Otherwise you can't tell if a user has changed some properties of an old object or deleted the old object and created a new one. And that matters when you do the merge.

Instead of adding this extra markup, which can be a bit fragile, I think it is better to explicitly represent your data as objects-with-properties. I've blogged about this before, but since then I feel my thoughts on the subject have clarified and I've also had the opportunity to try it out in practice (with the localization tool). Such a representation could have the following key elements.

  • The data consists of a set of objects-with-properties.
  • Each object is identified by a GUID.
  • Each property is identified by a string.
  • The property value can be null, a bool, a double, a vector3, a quaternion, a string, a data blob, a GUID or a set of GUIDs.
  • The data has a root object with GUID 0.

We use a GUID to identify the object, since that means the ids of objects created by different users won't collide. GUID values are used to make links between objects. Note that we don't allow arrays, only sets. That is because array operations (move object from 5th place to 3rd place) are hard to merge. Set operations (insert object, remove object) are easy to merge.

Here is what a change set for creating a player entity in a level might look like using this model. (I have shortened the GUIDs to 2 bytes to make the example more readable.)

create #f341
set_property #f341 "entity-type" "player"
set_property #f341 "position" vector3(0,0,0)
add_to_set #0000 "entities" #f341

Note that the root object (which represents the level) has a property "entities" that contains the set of all entities in the level.

To merge two such change sets, you could just append one to the other. You could even use the change set itself as your data format, if you don't want to use a database backend (that is actually what I did for the localization tool).

I think most assets can be represented in the objects-with-properties model and it is a rather powerful way of making sure that they are mergable and collaboration-friendly. I will write all the new BitSquid tools with the object-with-properties model in mind and retrofit it into our older tools.

Sunday, March 13, 2011

A Tiny Expression Language

Putting some of the power of programming into the hands of artists and designers can be a great thing. When they can customize the behavior of an object directly, without making the roundtrip through a programmer, there is a lot more room for experimentation and iteration. As a result you get better looking things with more interesting interactions.

Plus, if the artists do their own damn programming it means less work for me, so everybody wins.

Of course I don’t expect artists to actually program, but rather to use tools that expose that power, such as shader graphs, visual scripting systems, or — the topic of this post — expression languages.

By an expression language I mean a tiny little programming language that can be used to (and only used to) write one-line mathematical expressions, such as:

sin(t) + 0.1 * cos(10 * t)

So it is a really simple little calculator language. Simpler than Lisp. Simpler than Forth. (Well maybe not, but simpler than trying to teach artists Lisp or Forth.) This simplicity has two advantages. First, it makes it easier to write and understand the expressions. Second, it makes it possible to compute the expressions efficiently, which is important, because it allows us to use them in more places without worrying too much about the performance or memory costs.

The expression language can be used to replace static values where we want the artist to be able to specify more unique behaviors. Some examples:

  • In the particle system it can be used to script complicated custom particle behaviors that are hard to produce with other types of controllers.
  • In the animation system it can be used to compute the play speed and blend values of animations based on controller variables.
  • In the physics system it can be used to define custom force fields to achieve special effects, such as tornados, explosions or whirlwinds.


Computing the Expressions

Since the expressions are so simple, usually not more than a few operators, we need to be able to evaluate them with as little overhead as possible. Otherwise, the overhead will dominate the execution cost. This means that we should use a simple design, such as a stack-based virtual machine. That may sound complicated, but the concepts are really quite simple. What it means is that we convert our expression to a sequence of operations that pushes or pops data from a computation stack. So our example from above:

sin(t) + 0.1 * cos(10 * t)

Gets converted into:

t sin 0.1 10 t * cos * +

Here t pushes the value of the variable t to the stack. sin pops the top value from the stack, computes its sine and pushes the result to the stack. 0.1 pushes the value 0.1 to the stack. + pops two values from the stack, adds them together and pushes the result to the stack. * works the same way. If you go through the operations in the example you see that it computes the same result as the original expression.

This way of writing expressions is called Reverse Polish notation (RPN) or postfix notation and it’s the basis for the programming language Forth.

If we examine the issue, we see that we really just need four types of operations in our byte code:

PUSH_VARIABLE
pushes the content of a variable to the stack
PUSH_FLOAT
pushes a floating point number to the stack
COMPUTE_FUNCTION
pops its arguments off the stack, computes the result and pushes it to the stack
END
marks the end of the byte code

For simplicity I use 32 bits for each bytecode word. The upper 8 bits specify the type of the operation and the lower 24 bits are the data. For a variable, the data is the index of the variable in a variable list. When compiling the byte code you specify a list of variable names: {“t”, “x”}, and when executing you specify a corresponding list of variable values: {0.5, 20.1}. Similarly, for COMPUTE_FUNCTION, the data is an index into a function table. For PUSH_FLOAT we need an extra code word to hold the data, since we want full 32-bit floats.
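The word layout described above can be sketched like this; the enum values and helper names are assumptions for illustration, not the engine's actual code:

```cpp
#include <cassert>
#include <cstring>

// Assumed operation codes; the article only names the operation types.
enum OpType {BC_PUSH_VAR = 0, BC_PUSH_FLOAT = 1, BC_FUNCTION = 2, BC_END = 3};

// Pack one operation into a 32-bit word: 8 bits of type, 24 bits of data.
unsigned pack(OpType op, unsigned data)
{
    return (unsigned(op) << 24) | (data & 0xffffff);
}

// PUSH_FLOAT keeps its payload in an extra code word that holds the raw
// bits of the float; memcpy avoids any aliasing issues.
unsigned float_to_unsigned(float f) {unsigned u; std::memcpy(&u, &f, 4); return u;}
float unsigned_to_float(unsigned u) {float f; std::memcpy(&f, &u, 4); return f;}
```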

We can now write the function that runs the virtual machine, it is not much code at all:

struct Stack
{
    float *data;
    unsigned size;
    unsigned capacity;
};

bool run(const unsigned *byte_code, const float *variables, Stack &stack)
{
    const unsigned *p = byte_code;
    while (true) {
        unsigned bc = *p++;
        unsigned op = (bc >> 24);   // upper 8 bits: operation type
        int i = bc & 0xffffff;      // lower 24 bits: data
        switch (op) {
            case BC_PUSH_FLOAT:
                if (stack.size == stack.capacity) return false;
                stack.data[stack.size++] = unsigned_to_float(*p++);
                break;
            case BC_PUSH_VAR:
                if (stack.size == stack.capacity) return false;
                stack.data[stack.size++] = variables[i];
                break;
            case BC_FUNCTION:
                compute_function((OpCode)i, stack);
                break;
            case BC_END:
                return true;
        }
    }
}
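To see the machine in action, here is a self-contained sketch that hand-assembles byte code for the example expression and runs it. run() is repeated from above so the sketch compiles on its own; the enum values, opcode names and the compute_function dispatch are my assumptions, not the engine's actual code:

```cpp
#include <cassert>
#include <cmath>
#include <cstring>
#include <vector>

enum {BC_PUSH_VAR = 0, BC_PUSH_FLOAT = 1, BC_FUNCTION = 2, BC_END = 3};
enum OpCode {OP_SIN, OP_COS, OP_ADD, OP_MUL};

struct Stack {float *data; unsigned size; unsigned capacity;};

static unsigned word(unsigned op, unsigned data) {return (op << 24) | (data & 0xffffff);}
static unsigned float_bits(float f) {unsigned u; std::memcpy(&u, &f, 4); return u;}
static float unsigned_to_float(unsigned u) {float f; std::memcpy(&f, &u, 4); return f;}

// Assumed function dispatch: pop arguments, push the result.
static void compute_function(OpCode op, Stack &s)
{
    float *top = s.data + s.size; // one past the top element
    switch (op) {
        case OP_SIN: top[-1] = std::sin(top[-1]); break;
        case OP_COS: top[-1] = std::cos(top[-1]); break;
        case OP_ADD: top[-2] += top[-1]; --s.size; break;
        case OP_MUL: top[-2] *= top[-1]; --s.size; break;
    }
}

// The virtual machine loop, as in the article.
static bool run(const unsigned *byte_code, const float *variables, Stack &stack)
{
    const unsigned *p = byte_code;
    while (true) {
        unsigned bc = *p++;
        unsigned op = (bc >> 24);
        int i = bc & 0xffffff;
        switch (op) {
            case BC_PUSH_FLOAT:
                if (stack.size == stack.capacity) return false;
                stack.data[stack.size++] = unsigned_to_float(*p++);
                break;
            case BC_PUSH_VAR:
                if (stack.size == stack.capacity) return false;
                stack.data[stack.size++] = variables[i];
                break;
            case BC_FUNCTION:
                compute_function((OpCode)i, stack);
                break;
            case BC_END:
                return true;
        }
    }
}

// Byte code for the postfix program "t sin 0.1 10 t * cos * +".
std::vector<unsigned> example_byte_code()
{
    std::vector<unsigned> bc;
    bc.push_back(word(BC_PUSH_VAR, 0));                                    // t
    bc.push_back(word(BC_FUNCTION, OP_SIN));                               // sin
    bc.push_back(word(BC_PUSH_FLOAT, 0)); bc.push_back(float_bits(0.1f)); // 0.1
    bc.push_back(word(BC_PUSH_FLOAT, 0)); bc.push_back(float_bits(10.f)); // 10
    bc.push_back(word(BC_PUSH_VAR, 0));                                    // t
    bc.push_back(word(BC_FUNCTION, OP_MUL));                               // *
    bc.push_back(word(BC_FUNCTION, OP_COS));                               // cos
    bc.push_back(word(BC_FUNCTION, OP_MUL));                               // *
    bc.push_back(word(BC_FUNCTION, OP_ADD));                               // +
    bc.push_back(word(BC_END, 0));
    return bc;
}
```

With t = 0 this computes sin(0) + 0.1 * cos(0) = 0.1, matching a hand evaluation of the original expression.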

Compiling the Byte Code

Compiling an expression involves three phases: tokenizing the input into a stream of symbols, transforming that stream from infix to postfix notation, and finally generating the byte code from the postfix stream.

Tokenization means matching the identifiers in the expressions against a list of variable names and function names. We can also support constants that get converted to floats directly in the tokenization process. That is useful for things like pi.

The tokenization process converts our sample expression to something like this:

{ sin, (, t, ), +, 0.1, *, cos, (, 10, *, t, ) }

Now we need to convert this to postfix notation. One way would be to write a full-blown yacc parser with all that entails, but for simple expressions like these we can get away with something simpler, such as Dijkstra's Shunting Yard algorithm.

I actually use an even simpler variant that doesn't support right-associative operators, where I just process the input tokens one by one. If the token is a value or a variable I put it directly in the output. If the token is a function or an operator I push it to a function stack. But before I do that, I pop all functions with higher precedence from the function stack and put them in the output. Precedence takes parenthesis level into account, so a + nested in three parentheses has higher precedence than a * nested in two.
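This simplified loop can be sketched as follows. The token set and the precedence numbers are my assumptions; the key idea is that the parenthesis nesting level adds a large precedence bonus, so an operator pushed inside parentheses binds tighter than any operator outside them:

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

static bool is_operator(const std::string &t) {return t == "+" || t == "*";}
static bool is_function(const std::string &t) {return t == "sin" || t == "cos";}

// Assumed precedence table: functions bind tightest, and each level of
// parenthesis nesting outranks any operator further out.
static int precedence(const std::string &t, int paren_level)
{
    int base = t == "+" ? 1 : t == "*" ? 2 : 9;
    return base + paren_level * 10;
}

std::vector<std::string> to_postfix(const std::vector<std::string> &tokens)
{
    std::vector<std::string> output;
    std::vector<std::pair<std::string, int> > stack; // operator + its precedence
    int level = 0;
    for (size_t i = 0; i < tokens.size(); ++i) {
        const std::string &t = tokens[i];
        if (t == "(")
            ++level;
        else if (t == ")")
            --level;
        else if (is_operator(t) || is_function(t)) {
            int p = precedence(t, level);
            // Pop everything that binds at least as tightly.
            while (!stack.empty() && stack.back().second >= p) {
                output.push_back(stack.back().first);
                stack.pop_back();
            }
            stack.push_back(std::make_pair(t, p));
        } else {
            output.push_back(t); // number or variable goes straight to output
        }
    }
    // Flush the remaining operators.
    while (!stack.empty()) {
        output.push_back(stack.back().first);
        stack.pop_back();
    }
    return output;
}
```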

Let us see how this works for our simple example:

Input                              Output                     Stack
sin ( t ) + 0.1 * cos ( 10 * t )
( t ) + 0.1 * cos ( 10 * t )                                 sin
+ 0.1 * cos ( 10 * t )             t                         sin
0.1 * cos ( 10 * t )               t sin                     +
* cos ( 10 * t )                   t sin 0.1                 +
cos ( 10 * t )                     t sin 0.1                 + *
( 10 * t )                         t sin 0.1                 + * cos
* t )                              t sin 0.1 10              + * cos
t )                                t sin 0.1 10              + * cos (*)
)                                  t sin 0.1 10 t            + * cos (*)
                                   t sin 0.1 10 t *          + * cos
                                   t sin 0.1 10 t * cos      + *
                                   t sin 0.1 10 t * cos *    +
                                   t sin 0.1 10 t * cos * +

Here (*) marks the * that was pushed inside the parentheses, where it has raised precedence.


Constant Folding

To further improve efficiency we may want to distinguish the cases where the users have actually written an expression (such as “sin x”) from the cases where they have just written a constant (“0.5”) or a constant-valued expression (“2*sin(pi)”). Luckily, constant folding is really easy to do on an RPN expression.

After tokenizing and RPN conversion, the expression “2 * sin(pi)” has been converted to:

2 3.14159265 sin *

We can constant fold a function of arity n if the n arguments that precede it are constants. So in the sample above we can constant fold sin to:

2 3.14159265 sin *
2 0 *

Continuing, we can fold *

2 0 *
0

If we end up with a constant expression, the byte code will be a single PUSH_FLOAT operation. We can detect that and bypass the expression evaluation altogether in that case.
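A folding pass over the postfix token stream might look like the sketch below. The arity table, the handful of supported tokens and the numeric formatting are assumptions for illustration; note that sin(pi) folds to a value near zero rather than exactly zero, since pi is only stored to float precision:

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <string>
#include <vector>

// A token is a constant if strtod consumes the whole string.
static bool is_constant(const std::string &t)
{
    char *end = 0;
    strtod(t.c_str(), &end);
    return end != t.c_str() && *end == 0;
}

std::vector<std::string> fold(const std::vector<std::string> &postfix)
{
    std::vector<std::string> out;
    for (size_t i = 0; i < postfix.size(); ++i) {
        const std::string &t = postfix[i];
        // Assumed arity table for the tokens used in the examples.
        int arity = (t == "sin") ? 1 : (t == "+" || t == "*") ? 2 : 0;
        bool can_fold = arity > 0 && out.size() >= (size_t)arity;
        for (int a = 0; can_fold && a < arity; ++a)
            can_fold = is_constant(out[out.size() - 1 - a]);
        if (can_fold) {
            // All arguments are constants: evaluate now and emit one token.
            double x = atof(out[out.size() - arity].c_str());
            double y = arity == 2 ? atof(out.back().c_str()) : 0.0;
            double r = t == "sin" ? sin(x) : t == "+" ? x + y : x * y;
            out.resize(out.size() - arity);
            char buf[32];
            snprintf(buf, sizeof(buf), "%g", r);
            out.push_back(buf);
        } else {
            out.push_back(t);
        }
    }
    return out;
}
```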

Source Code

If you want to start playing with these things you can start with my expression language source code.

Tuesday, March 8, 2011

BitSquid Tech: Benefits of data-driven renderer

Here are the slides from the talk I did last Wednesday at GDC in Nvidia's Game Technology Theater:



The presentation is also available with synced audio here.