Wednesday, June 18, 2014

What Is In a Name?

Today I'd like to revisit one of the most basic questions when designing a resource system for a game engine:

How should resources refer to other resources?

It seems like a simple, almost trivial question. Yet, as we shall see, no matter what solution we choose, there are hidden pitfalls along the way.

To give some context to the question, let's assume we have a pretty typical project setup. We'll assume that our game project consists of a number of individual resources stored in a disk hierarchy that is checked into source control.

There are three basic ways of referring to resources that I can think of:

  • By path

  • By GUID

  • By "name"

By Path

texture = "textures/flowers/rose"

This is the most straightforward approach. To refer to a particular resource you just specify the path to that resource.

A word of warning: If you use paths as references I would recommend that you don't accept ridiculous things such as "./././models\../textures\FLOWers/////rose" even though your OS may think that is a perfectly valid path. Accepting that will just lead to lots of headaches later when you try to determine whether two paths refer to the same resource. Only use a canonical path format, from the root of the project, so that the path to the same resource is always the same identical string (and can be hashed).
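
To make this concrete, here is a rough sketch of such a normalization step (the function is hypothetical, not an engine API, and whether you fold case depends on whether your project treats paths as case-insensitive):

#include <algorithm>
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: produce a unique project-relative string that can
// be hashed and compared. Forward slashes only, no "." / ".." / empty
// components, and a single consistent case.
std::string canonical_path(const std::string &path)
{
 std::string s = path;
 std::replace(s.begin(), s.end(), '\\', '/');
 std::transform(s.begin(), s.end(), s.begin(),
  [](unsigned char c) { return (char)std::tolower(c); });

 std::vector<std::string> parts;
 std::istringstream stream(s);
 std::string part;
 while (std::getline(stream, part, '/')) {
  if (part.empty() || part == ".")
   continue;              // collapse "//" and skip "./"
  if (part == "..") {
   if (!parts.empty())
    parts.pop_back();     // resolve "dir/.."
   continue;
  }
  parts.push_back(part);
 }

 std::string result;
 for (size_t i = 0; i < parts.size(); ++i)
  result += (i ? "/" : "") + parts[i];
 return result;
}

// canonical_path("./././models\\../textures\\FLOWers/////rose")
//   -> "textures/flowers/rose"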

Path references run into problems when you want to rename a resource:

textures/flowers/rose -> textures/flowers/less-sweet-rose

Suddenly, all the references that used to point to the rose no longer work and your game will break.

There are two possible ways around this:

Redirects

You can do what HTML does and use a redirect.

I.e., when you move rose, you put a little placeholder there that notifies anyone who is interested that this file is now called less-sweet-rose. Anyone looking for rose will follow the redirect and find the resource in its new place.

There are three problems with this. First, the disk gets littered with these placeholder files. Second, if you at some point in the future want to create a new resource called rose, you are out of luck, because that name is now forever occupied by the placeholder. Third, with a lot of redirects it can be hard to determine whether two references point to the same resource.
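
To show the mechanics, here is a toy sketch of the resolution step (the redirects map stands in for placeholder files on disk; real code would read them from the project):

#include <map>
#include <set>
#include <string>

// Toy stand-in for placeholder files on disk: a redirected path maps to
// the path it now points at.
static std::map<std::string, std::string> redirects = {
 {"textures/flowers/rose", "textures/flowers/less-sweet-rose"},
};

// Follow placeholders to the real resource, guarding against cycles.
std::string resolve(std::string path)
{
 std::set<std::string> visited;
 for (;;) {
  auto it = redirects.find(path);
  if (it == redirects.end())
   return path;          // a real resource, not a placeholder
  if (!visited.insert(path).second)
   return path;          // redirect cycle -- treat as a broken reference
  path = it->second;     // follow the redirect
 }
}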

Renaming tool

You can use a renaming tool that understands all your file formats, so that when you change the path of a resource, the tool can find all the references to that path and update them to point to the new location.

Such a tool can be quite complicated to write -- depending on how standardized your file formats are. It can also be very slow to run, since it potentially has to parse all the files in your project to find out which other resources refer to the one being renamed. To get decent performance, you have to keep an up-to-date cache of the reference information so that you don't have to re-parse all the files every time.

Another problem with this approach can occur in distributed workflows. If one user renames a resource while another creates references to it, the references will break when the changes are merged. (Note that using redirects avoids this problem.)

Both these methods require renames to be done with a tool. If you just change the file name on disk, without going through the tool, the references will break.

By GUID

The problems with renaming can be fixed by using GUIDs instead of paths. With this approach, each resource specifies a GUID that uniquely identifies it:

guid = "a54abf2e-d4a1-4f21-a0e5-8b2837b3b0e6"

And other resources refer to it by using this unique identifier:

texture = "a54abf2e-d4a1-4f21-a0e5-8b2837b3b0e6"

In the compile step, we create an index that maps from GUIDs to compiled resources that we can use to look things up by GUID.
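
As a sketch of what that index might look like (the names here are illustrative, not the actual Bitsquid data structures):

#include <string>
#include <unordered_map>

// What the compile step knows about each compiled resource.
struct CompiledResource {
 std::string compiled_path;  // where the compiled data lives
 // ... type information, offsets, etc.
};

// GUID -> compiled resource, built during the compile step.
std::unordered_map<std::string, CompiledResource> guid_index;

void register_resource(const std::string &guid, const CompiledResource &r)
{
 guid_index[guid] = r;
}

// Runtime lookup: resolve a GUID reference, or null if it is broken.
const CompiledResource *resolve(const std::string &guid)
{
 auto it = guid_index.find(guid);
 return it == guid_index.end() ? nullptr : &it->second;
}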

Now files can be freely moved around on disk and the references will always be intact. There is not even a need for a special tool, everything will work automatically. But unfortunately there are still lots of bad things that can happen:

  • If a file is copied on disk, there will be two files with the same GUID, creating a conflict that needs to be resolved somehow (with a special tool?)

  • Lots of file formats that we might want to use for our resources (.png, .wav, .mp4, etc) don't have any self-evident place where we can store the GUID. So the GUID must be stored in a metadata file next to the original file. This means extra files on disk and potential problems if the files are not kept in sync.

  • Referring to resources from other resources is not enough. We also need some way of referring to resources from code, and writing:

    spawn_unit("a54abf2e-d4a1-4f21-a0e5-8b2837b3b0e6")

    is not very readable.

  • If a resource is deleted on disk, the references will break. Also if someone forgets to check in all the required resources, the references will break. This will happen no matter what reference system we use, but with GUIDs, everything is worse because the references:

    texture = "a54abf2e-d4a1-4f21-a0e5-8b2837b3b0e6"

    are completely unreadable. So if/when something breaks we don't have any clue what the user meant. Was that resource meant to be a rose, a portrait, a lolcat or something else?

In summary, the big problem is that GUIDs are unreadable and when they break there is no clue to what went wrong.

By "Name"

Perhaps we can fix the unreadability of GUIDs by using human readable names instead. So instead of a GUID we would put in the file:

name = "garden-rose"

And the reference would be:

texture = "garden-rose"

To me, this approach doesn't have any advantages over using paths. Sure, we can move and rename files freely on disk, but if we want to change the name of the resource, we run into the same problems as we did before. Also, it is pretty confusing that a resource has a name and a file name and those can be different.

By Path and GUID?

Could we get the best of both worlds by combining a path and a GUID?

I.e., the references would look like:

texture = {
 path = "textures/flower/rose"
 guid = "a54abf2e-d4a1-4f21-a0e5-8b2837b3b0e6"
}

The GUID would make sure that file renames and moves were handled properly. The path would give us the contextual information we need if the GUID link breaks. We would also use the path to refer to resources from code.

This still has the issue of needing a metadata file to specify the GUID. Duplicate GUIDs can also still be an issue.

Also, if you move a file, the paths stored in the references will be incorrect unless you run a tool similar to the one discussed above to update all the paths.
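
A sketch of how such a combined reference might be resolved (illustrative names only): trust the GUID when it resolves, fall back to the path otherwise, and in either failure case there is at least a readable name to report:

#include <cstdio>
#include <string>
#include <unordered_map>

struct Resource;  // opaque compiled resource

std::unordered_map<std::string, Resource *> by_guid;
std::unordered_map<std::string, Resource *> by_path;

Resource *resolve(const std::string &guid, const std::string &path)
{
 auto g = by_guid.find(guid);
 if (g != by_guid.end())
  return g->second;     // normal case: the GUID link is intact

 auto p = by_path.find(path);
 if (p != by_path.end()) {
  // The GUID link broke (lost metadata file, resolved duplicate, etc.)
  // but the path still identifies the resource.
  printf("warning: GUID %s missing, falling back to path '%s'\n",
   guid.c_str(), path.c_str());
  return p->second;
 }

 printf("error: broken reference to '%s' (%s)\n", path.c_str(), guid.c_str());
 return nullptr;
}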

Conclusions

In the Bitsquid engine we refer to resources by path. Frustrating as that can be sometimes, to me it still seems like the best option. The big problem with GUIDs is that they are non-transparent and unreadable, making it much harder to fix stuff when things go wrong. This also makes file merging harder.

Using a (GUID, path) combination is attractive in some ways, but it also adds a lot of complexity to the system. I really don't like adding complexity. I only want to do it when it is absolutely necessary. And the (GUID, path) combination doesn't feel like a perfect solution to me. It would also require us to come up with a new scheme for handling localization and platform specific resources. Currently we do that with extensions on the file name, so a reference to textures/flowers/rose may open textures/flowers/rose.fr.dds if you are using French localization. If we switched to GUIDs we would have to come up with a new system for this.

We already have a tool (the Dependency Checker) that understands references and can handle renames by patching references. So it seems to me that the best strategy going forward is to keep using paths as references and just add caching of reference information to the tool so that it is quicker to use.

Friday, April 25, 2014

Building an Engine Plugin System

A plugin system is a useful way to allow developers to extend the capabilities of a game engine. Of course, an engine can also be extended by directly modifying the source code, but there are several drawbacks with that approach:

  • Changing the code requires you to recompile the engine. Anyone who wants to modify the engine must have the full source code, access to all the libraries and the build environment set up correctly.

  • Every time you pull changes from upstream you will have to merge your changes with the incoming patches. Over time, this adds up to a significant chunk of work.

  • Since you work directly in the source code, instead of against a published API, refactoring of engine systems might force you to rewrite your code from scratch.

  • There is no easy way to share the modifications you have made with other people.

A plugin system solves all these issues. Plugins can be distributed as compiled DLLs. They are easily shared and you can install them by just putting them in the engine's plugin folder. Since the plugins use an explicit API, they will continue to work with new versions of the engine (unless backwards compatibility is explicitly broken).

Of course, the plugin API can never cover everything, so there will always be things you can do by modifying the engine that you can't do through the plugin API. Nevertheless, it is a good complement.

A Tale of Two APIs

When building a plugin system, there are actually two APIs that you need to think about.

The first, and most obvious one, is the API that the plugin exposes to the engine: a set of exported functions that the engine will call at predefined times. For a very basic system, it could look something like this:

__declspec(dllexport) void init();
__declspec(dllexport) void update(float dt);
__declspec(dllexport) void shutdown();

The other API, which usually is a lot more complicated, is the API that the engine exposes to the plugin.

In order to be useful, the plugin will want to call on the engine to do stuff. This can be things like spawning a unit, playing a sound, rendering some meshes, etc. The engine needs to provide some way for plugins to call on these services.

There are a number of ways of doing this. One common solution is to put all the shared functionality in a common DLL and then link both the engine application and the plugin against this DLL.

The drawback of this approach is that the more functionality that the plugins need access to, the more must go in the shared DLL. Eventually you end up with most of the engine in the shared DLL, which is pretty far from the clean and simple APIs that we strive for.

This creates a very strong coupling between the engine and the plugins. Every time we want to modify something in the engine, we will probably have to modify the shared DLL and thus likely break all of the plugins.

As anyone who has read my previous articles knows, I really don't like these kinds of strong couplings. They prevent you from rewriting and refactoring your systems and thus eventually cause your code to stagnate.

Another approach is to let the engine's scripting language (in our case Lua) work as the engine's API. With this approach, any time a plugin wanted the engine to do something it would use a Lua call.

For lots of applications I think this can be a really good solution, but in our case it doesn't seem like a perfect fit. First, the plugins will need access to a lot of stuff that is more "low level" than what you can access from Lua. And I'm not too keen on exposing all of the engine's innards to Lua. Second, since both the plugins and the engine are written in C++, marshalling all the calls between them through Lua seems both overly complicated and inefficient.

I prefer to have an interface that is minimalistic, data-oriented and C-based (because of C++ ABI compatibility issues and also because of... well... C++).

Interface Querying

Instead of linking the plugin against a DLL that provides the engine API, we can send the engine API to the plugin when we initialize it. Something like this (a simplified example):

plugin_api.h

typedef struct EngineApi
{
 void (*spawn_unit)(World *world, const char *name, float pos[3]);
 ...
} EngineApi;

plugin.h

#include "plugin_api.h"

__declspec(dllexport) void init(EngineApi *api);
__declspec(dllexport) void update(float dt);
__declspec(dllexport) void shutdown();

This is pretty good. The plugin developer does not need to link against anything, just include the header file plugin_api.h, and then she can call the functions in the EngineApi struct to tell the engine to do stuff.

The only thing that is missing is versioning support.

At some point in the future we will probably want to modify the EngineApi. Perhaps we discover that we want to add a rotation argument to spawn_unit(), or something else. We can achieve this by introducing versioning in the system. Instead of sending the engine API directly to the plugin, we send the plugin a function that lets it query for a specific version of the engine API.

With this approach, we can also break the API up into smaller submodules that can be queried for individually. This gives us a cleaner organization.

plugin_api.h

#define WORLD_API_ID    0
#define LUA_API_ID      1

typedef struct World World;

typedef struct WorldApi_v0 {
 void (*spawn_unit)(World *world, const char *name, float pos[3]);
 ...
} WorldApi_v0;

typedef struct WorldApi_v1 {
 void (*spawn_unit)(World *world, const char *name, float pos[3], float rot[4]);
 ...
} WorldApi_v1;

typedef struct lua_State lua_State;
typedef int (*lua_CFunction) (lua_State *L);

typedef struct LuaApi_v0 {
 void (*add_module_function)(const char *module, const char *name, lua_CFunction f);
 ...
} LuaApi_v0;

typedef void *(*GetApiFunction)(unsigned api, unsigned version);

When the engine loads the plugin, it passes along get_engine_api(), which the plugin can use to get hold of the different engine APIs.

The plugin will typically set up the APIs in the init() function:

static WorldApi_v1 *_world_api = nullptr;
static LuaApi_v0 *_lua_api = nullptr;

void init(GetApiFunction get_engine_api)
{
 _world_api = (WorldApi_v1 *)get_engine_api(WORLD_API_ID, 1);
 _lua_api = (LuaApi_v0 *)get_engine_api(LUA_API_ID, 0);
}

Later, the plugin can use these APIs:

_world_api->spawn_unit(world, "player", pos, rot);

If we need to make a breaking change to an API, we can just introduce a new version of that API. As long as get_engine_api() can still return the old API version when requested for it, all existing plugins will continue to work.
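
As a sketch of what that might look like on the engine side (reusing the structs from plugin_api.h above; spawn_unit_v0/v1 and the identity-rotation default are illustrative placeholders), the old entry point can simply be implemented on top of the new one:

static void spawn_unit_v1(World *world, const char *name, float pos[3], float rot[4])
{
 // ... actual spawn implementation ...
}

// v0 had no rotation argument, so we forward to v1 with the identity
// quaternion.
static void spawn_unit_v0(World *world, const char *name, float pos[3])
{
 float identity_rot[4] = {0.0f, 0.0f, 0.0f, 1.0f};
 spawn_unit_v1(world, name, pos, identity_rot);
}

void *get_engine_api(unsigned api, unsigned version)
{
 if (api == WORLD_API_ID && version == 0) {
  static WorldApi_v0 v0;
  v0.spawn_unit = spawn_unit_v0;
  return &v0;
 }
 if (api == WORLD_API_ID && version == 1) {
  static WorldApi_v1 v1;
  v1.spawn_unit = spawn_unit_v1;
  return &v1;
 }
 return 0;
}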

With this querying system in place for the engine, it makes sense to use the same approach for the plugin as well. I.e. instead of exposing individual functions init(), update(), etc, the plugin just exposes a single function get_plugin_api() which the engine can use in the same way to query APIs from the plugin.

plugin_api.h

#define PLUGIN_API_ID 2

typedef struct PluginApi_v0
{
 void (*init)(GetApiFunction get_engine_api);
 ...
} PluginApi_v0;

plugin.c

__declspec(dllexport) void *get_plugin_api(unsigned api, unsigned version);

Since we now have versioning on the plugin API as well, this means we can modify it (add new required functions, etc) without breaking existing plugins.

Putting It All Together

Putting all this together, here is a complete (but very small) example of a plugin that exposes a new function to the Lua layer of the engine:

plugin_api.h

#define PLUGIN_API_ID       0
#define LUA_API_ID          1

typedef void *(*GetApiFunction)(unsigned api, unsigned version);

typedef struct PluginApi_v0
{
 void (*init)(GetApiFunction get_engine_api);
} PluginApi_v0;

typedef struct lua_State lua_State;
typedef int (*lua_CFunction) (lua_State *L);

typedef struct LuaApi_v0
{
 void (*add_module_function)(const char *module, const char *name, lua_CFunction f);
 double (*to_number)(lua_State *L, int idx);
 void (*push_number)(lua_State *L, double number);
} LuaApi_v0;

plugin.c

#include "plugin_api.h"

LuaApi_v0 *_lua;

static int test(lua_State *L)
{
 double a = _lua->to_number(L, 1);
 double b = _lua->to_number(L, 2);
 _lua->push_number(L, a+b);
 return 1;
}

static void init(GetApiFunction get_engine_api)
{
 _lua = get_engine_api(LUA_API_ID, 0);

 if (_lua)
  _lua->add_module_function("Plugin", "test", test);
}

__declspec(dllexport) void *get_plugin_api(unsigned api, unsigned version)
{
 if (api == PLUGIN_API_ID && version == 0) {
  static PluginApi_v0 api;
  api.init = init;
  return &api;
 }
 return 0;
}

engine.c

// Initialized elsewhere.
LuaEnvironment *_env = 0;

void add_module_function(const char *module, const char *name, lua_CFunction f)
{
 _env->add_module_function(module, name, f);
}

void *get_engine_api(unsigned api, unsigned version)
{
 if (api == LUA_API_ID && version == 0 && _env) {
  static LuaApi_v0 lua;
  lua.add_module_function = add_module_function;
  lua.to_number = lua_tonumber;
  lua.push_number = lua_pushnumber;
  return &lua;
 }
 return 0;
}

void load_plugin(const char *path)
{
 HMODULE plugin_module = LoadLibrary(path);
 if (!plugin_module) return;
 GetApiFunction get_plugin_api = (GetApiFunction)GetProcAddress(plugin_module, "get_plugin_api");
 if (!get_plugin_api) return;
 PluginApi_v0 *plugin = (PluginApi_v0 *)get_plugin_api(PLUGIN_API_ID, 0);
 if (!plugin) return;
 plugin->init(get_engine_api);
}

Wednesday, September 25, 2013

Scripted Network Debugging

Debugging network problems is horrible. Everything is asynchronous. Messages can get lost, scrambled or delivered out-of-order. The system is full of third-party black boxes: external transport layers (PSN, Steam, etc), routers, firewalls and ineptly written packet intercepting anti-virus programs. (I've got my eye on you!)

Reproducing a problem requires setting up multiple machines that are all kept in sync with any changes you make to the code and the data in order to try to fix the problem. It might also require roping in multiple players to actually sit down and play the game on all those machines. This can make a simple bug turn into a multi-day or even a multi-week problem.

Here are some quick tips for making network debugging easier:

  • Have a single place for disabling timeouts. Few things are as frustrating as looking at a problem in the debugger, almost finding the solution and then having the entire game shut down because the server flagged your machine as unresponsive while you were halted in the debugger. Having a single place where you can disable all such timeouts makes the debugger a lot more useful for solving network problems.

  • Attach Visual Studio to multiple processes. Not everybody is aware of this, but you can actually attach the Visual Studio debugger to multiple processes simultaneously. So you can start a network session with eight players and then attach your debugger to all of them. This can be used to follow messages and code flow between different network nodes.

  • Make sure you can start multiple nodes on the same machine (using different ports). This allows you to debug many network issues locally, without running between different machines in the office or gathering a stack of laptops on your desk. Of course this doesn't work if you are targeting consoles or Steam, since you can't use multiple Steam accounts simultaneously on the same machine. (Please fix, k thx bye!)

  • Have a way to log network traffic. We have a switch that allows us to log all network traffic (both incoming and outgoing) to a file. That file can be parsed by a custom GUI program that understands our network protocol, letting us see all the messages to and from each node, when they were sent and when they were received. This makes it possible to diagnose many network issues after the fact. We also have replay functionality, where we can feed such a saved log back to the network layer and get it to behave exactly as it did in the recording session.

But today I'm going to focus on a different part of network debugging: scripted tests.

The idea is that instead of running around manually to a lot of different machines, copying executables and data, booting the game, jumping into menus, etc, etc, we write a little Ruby script that does all that for us:

  • Distribute executables

  • Distribute project data

  • Start the game

  • Set up a multi-player session

  • Play the game

I recently had to debug a network issue with a low reproduction rate. With the script I was able to set up and run 500 sample matches in just a few hours and reproduce the bug. Doing that by hand wouldn't even have been possible.

Let's look at each of the tasks above and see how we can accomplish them:

Distribute executables

This could be done by the script, but to simplify things as much as possible, I just use a BitTorrent Sync folder for this. I've shared the tool-chain directory on my development machine (the tool-chain contains the tools and executables for all platforms) and connected all the other machines to that directory. That way, whenever I build a new executable it will automatically be distributed to all the nodes.

I have a nodes-config.rb file for defining the nodes, where I specify the tool-chain directory used by each node:

LOCAL = Node.new(
 :toolchain => 'c:\work\toolchain')

MSI = Node.new(
 :ip => '172.16.8.33', 
 :toolchain => 'd:\toolchain', 
 :exec => PsExec.new(:name => 'bitsquid-msi', :user => 'bitsquid', :password => ask_password('bitsquid-msi')))

MACBOOK = Node.new(
 :ip => '172.16.8.22', 
 :toolchain => 'c:\toolchain', 
 :exec => PsExec.new(:name => 'bitsquidmacbook', :user => 'bitsquid', :password => ask_password('bitsquidmacbook')))

NODES = [LOCAL, MSI, MACBOOK]

Distribute project data

Since the Bitsquid engine can be run in file server mode I don't actually need to distribute the project data. All I have to do is start a file server on my development machine and then tell all the network nodes to pull their data from that file server. I do that by starting the engine with the arguments:

-host 172.16.8.14 -project samples/network

The nodes will pull the data for the project samples/network from the file server at IP 172.16.8.14 and all get the latest data.

Start the game

On the local machine I can start the game directly with a system() call. To start the game on the remote machines I use PsExec. The relevant source code in the script looks like this:

require_relative 'console'

# Class that can launch executables on the local machine.
class LocalExec
 def launch(arg)
  system("start #{arg}")
 end
end

# Class used for executables launched by other means.
class ExternalExec
 def launch(arg)
 end
end

# Class used for executables launched on remote machines with psexec.
class PsExec
 def initialize(args)
  @name = args[:name]
  @user = args[:user]
  @password = args[:password]
 end

 def launch(arg)
  system("psexec \\\\#{@name} -i -d -u #{@user} -p #{@password} #{arg}")
 end
end

# Class that represents a node in the network test.
class Node
 # Initializes the node from hash data
 #
 # :ip => '127.0.0.1'
 #   The IP address of the node.
 # :toolchain
 #   Path to the toolchain folder on the node machine.
 # :exec => LocalExec.new
 #   Class for executing programs (LocalExec, ExternalExec, PsExec)
 # :port => 64000
 #   Port that the node should use.
 def initialize(args)
  @ip = args[:ip] || '127.0.0.1'
  @toolchain = args[:toolchain]
  @exec = args[:exec] || LocalExec.new
  @port = args[:port] || 64000
 end

 # Starts the project on the remote node and returns a console connection for talking to it.
 def start_project(arg)
  @exec.launch "#{exe_path} -port #{@port} #{arg}"
  return Console.new(@ip, @port)
 end

private
 def exe_path
  return @toolchain + '\engine\win32\bitsquid_win32_dev.exe'
 end
end

Each node specifies its own method for launching the game with the :exec parameter, and that method is used by start_project() to launch the game.

Additional execution methods could be added, for example for launching the game on PS3s and X360s.

Set up a multi-player session

To get the game to do what we want once it has started we use the in-game console.

All Bitsquid games act as TCP/IP servers when running in development mode. By connecting to the server port of a running game we can send Lua script commands to that game. The Ruby code for doing that is mercifully short:

require 'socket'

# Class that handles console communication with a running bitsquid executable.
class Console
 JSON = 0
 JSON_WITH_BINARY = 1

 # Create a new console connection to specified host and port.
 def initialize(host, port)
  @socket = TCPSocket.new(host, port)
 end

 # Send the specified JSON-encoded string to the target.
 def send(json)
  msg = [JSON, json.length].pack("NN") + json
  @socket.write(msg)
 end

 # Send the specified lua script to be executed on the target.
 def send_script(lua)
  lua = lua.gsub('"', '\\"')
  send("{type: \"script\", script: \"#{lua}\"}")
 end
end

# Handles multiple console connections
class Consoles
 attr_reader :consoles

 def initialize(arg)
  @consoles = arg.respond_to?(:each) ? arg : [arg]
 end

 def send_script(lua)
  @consoles.each do |c| c.send_script(lua) end
 end
end

Node.start_project() returns a Console object that can be used to talk with the newly created network node. Since all the gameplay code for Bitsquid games is written in Lua, setting up a multi-player game is just a matter of sending the right Lua commands over that connection.

Those commands will be game specific. In the network sample where I implemented this, there is a global variable called force_menu_choice which, when set, will force a selection in the in-game menus. We can use this to set up a network game:

require_relative 'nodes-config'

consoles = NODES.collect do |n| n.start_project("-host 172.16.8.14 -project samples/network") end
server = consoles[0]
clients = Consoles.new(consoles[1..-1])
all = Consoles.new(consoles)

puts "Waiting for exes to launch..."
sleep(1)
puts "Launching steam..."
all.send_script %q{force_menu_choice = "Steam Game"}
sleep(1)
server.send_script %q{force_menu_choice = "Create Lobby"}
sleep(1)
clients.send_script %q{force_menu_choice = "Find Lobby"}
sleep(1)
clients.send_script %q{force_menu_choice = "Niklas Test Lobby"}
sleep(1)
server.send_script %q{force_menu_choice = "Start Game"}

This will start a Steam Lobby on the server, all the clients will search for and join this lobby and then the server will start the game.

Play the game

Playing the game is again just a question of sending the right script commands to expose the bugs you are interested in. In my case, I just tested spawning some network synchronized boxes:

server.send_script %q{
 local self = Sample.screen
 local camera_pos = Unit.world_position(self.world_screen.camera_unit, 0)
 local camera_forward = Quaternion.forward(Unit.world_rotation(self.world_screen.camera_unit, 0))
 local box_unit = World.spawn_unit(self.world_screen.world, "units/box/box", camera_pos)
 local box_id = GameSession.create_game_object(self.game, "box", {position=camera_pos})
 self.my_boxes[box_id] = box_unit
 Actor.set_velocity(Unit.actor(box_unit, 0), camera_forward*20)
}
sleep(40)
clients.send_script %q{
 local self = Sample.screen
 local camera_pos = Unit.world_position(self.world_screen.camera_unit, 0)
 local camera_forward = Quaternion.forward(Unit.world_rotation(self.world_screen.camera_unit, 0))
 local box_unit = World.spawn_unit(self.world_screen.world, "units/box/box", camera_pos)
 local box_id = GameSession.create_game_object(self.game, "box", {position=camera_pos})
 self.my_boxes[box_id] = box_unit
 Actor.set_velocity(Unit.actor(box_unit, 0), camera_forward*20)
}

And that is really all. I also added some similar code for shutting down the gameplay session and returning to the main menu so that I could loop the test.

And 500 iterations later, running on the three machines on my desk, the bug was reproduced.

Friday, August 16, 2013

Finding nearby stuff

A problem that crops up quite often in game programming is the need to find stuff that is "close to" other stuff. For example, you may want to find all enemies close to the player, all treasures close to a goblin, etc.

Most recently, I ran into this problem when I added support for merging navigation meshes to the Bitsquid engine.

To merge meshes, I need to find all the vertices that are "sufficiently close" (within some tolerance limit) in the meshes that I'm merging. These vertices should be considered "the same" in the merged mesh and need to be treated specially.

The naive algorithm for finding these coinciding vertices is to just do a double loop:

foreach (v1 in vertices)
   foreach (v2 in vertices)
      if (distance(v1, v2) < tolerance)
         ...

But since this algorithm is O(n^2) in the number of vertices, it is often prohibitively expensive.

To improve the performance you need some kind of data structure for accelerating spatial queries. There are lots of different possibilities. Real-Time Collision Detection by Christer Ericson has a good coverage of them.

One of the simplest approaches is to use some kind of grid bucket system. You divide the world into a grid and for each grid cell you store a list of the vertices that fall in that cell. To check for "close" vertices you look through the list of vertices in your cell.

Depending on what your definition of "close" is (how big the search radius is compared to the grid size), you may need to search neighboring cells as well. Note that if the search starts close enough to a grid corner you always have to search at least four cells, no matter how small the search radius is. (That is for a two-dimensional grid; in a three-dimensional grid you have to search eight cells.)

If you know what the search radius is going to be (for example, in my vertex merging case I always use the same merge distance) you can adapt the grid so that you never have to search more than four cells, by setting the grid size to be equal to the search diameter.
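
As a standalone sketch of that four-cell lookup (not engine code; the per-vertex distance test is elided):

#include <math.h>

// With cell_size == 2*r, the search square [p - r, p + r] overlaps at
// most a 2 x 2 block of cells, so the loops below visit at most four.
void find_close(float px, float py, float r, float cell_size)
{
   int x0 = (int)floorf((px - r) / cell_size);
   int y0 = (int)floorf((py - r) / cell_size);
   int x1 = (int)floorf((px + r) / cell_size);
   int y1 = (int)floorf((py + r) / cell_size);

   for (int y = y0; y <= y1; ++y) {
      for (int x = x0; x <= x1; ++x) {
         // ... test each vertex stored in cell (x, y) against
         // distance(p, v) < r ...
      }
   }
}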

With a larger grid size you can sometimes get by with searching just one or two cells, depending on how close to a grid line the search starts, but there will always be some positions where you will have to look at four cells.

The fact that you at most have to look at four cells can be used to make an interesting variant of the algorithm. Instead of checking the four neighboring cells in the lookup, you can store the item in all four neighboring cells. That way you will only have to check a single cell when you do the lookup. This approach will make lookups four times faster, insertions four times slower and use four times as much memory. It can be a good optimization if you have a high ratio of lookups to insertions.

For my particular case (vertex merging) I only have one lookup per insertion, so this would not be a net win.

Designing the grid can be tricky.

How large should it be? You may need to do a pre-pass over your data to find the range, which isn't possible if you don't have all the data up front. (Letting the grid coordinates wrap around can solve this in some cases.)

How big should the cells be? If you make them too small, the grid will consume too much memory. If you make them too big, you may end up with lots of points that you need to check in each cell, which will make the algorithm slow.

What if the data is laid out on a diagonal? In this case most grid cells will be empty, wasting memory.

Most of these concerns can be fixed by not storing the grid in a traditional matrix, but instead use a hash that maps from grid coordinates to cell data:

struct GridCoordinate {
   int x;
   int y;
};

HashMap<GridCoordinate, CellData> grid;

// To insert an item
GridCoordinate gc;
gc.x = (int)floor(p.x / cell_size);
gc.y = (int)floor(p.y / cell_size);
grid[gc] = cell_data;

This way, you will only use memory for the cells that actually contain data and lookups are still O(1). You also don't have to care about how big the grid is or what happens if data ends up outside the grid. In fact, you only have one parameter left to worry about, the cell_size.

As mentioned above, a good heuristic is to use:

float cell_size = 1.0 * search_diameter;

That way you have to investigate exactly four cells in each lookup.

If the data is sparse compared to the search radius you can use a bigger cell size, which means that you don't always have to check all four neighbors. But note that the average number of cells you need to search in each lookup goes down slowly, while the cell area (and correspondingly the average number of items in each cell) grows quickly. So only use this optimization when you know that the data is sparse.

Multiplier   Avg. cells to search   Cell area
1.0          4.00                     1.00
1.5          2.78                     2.25
2.0          2.25                     4.00
3.0          1.78                     9.00
5.0          1.44                    25.00
10.0         1.21                   100.00

(For a multiplier m, the average number of cells to search is (1 + 1/m)^2, while the cell area grows as m^2.)

The final piece of this puzzle is what CellData should look like. It might be tempting to do something like:

typedef Vector<VertexId> CellData;

However, this would be highly inefficient. In many cases cells will only contain a few items and allocating a Vector for them is total overkill. Using a Vector will mean tons of pointer chasing, cache misses and data copying.

For me, a general rule of thumb when writing high performance C++ code is to avoid collections stored inside other collections. If you store collections in collections it is very easy to create innocuous looking code that does tons of data copying behind your back.

So what can you do instead? If you have a good MultiHashMap implementation, you can use that. Otherwise, you can do something like this:

struct CellData {
   VertexId id;
   unsigned next; 
};
Array<CellData> items;

Here, items contains linked lists of the items in each cell, stored in a continuous array (which is the only sane way to store linked lists). The HashMap gives the first cell item. Then you follow the next references in the items list to find subsequent items in the same cell until next == UINT_MAX, which marks the end of the list.
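
A sketch of what insert and lookup might look like with this layout (HashMap and Array are your own containers; the has()/get()/set() calls are assumed APIs, and here the hash map stores the index of the first item in each cell):

#include <limits.h>

HashMap<GridCoordinate, unsigned> first_item;
Array<CellData> items;

// Insert: prepend the vertex to its cell's list.
void insert(GridCoordinate gc, VertexId id)
{
   CellData cd;
   cd.id = id;
   cd.next = first_item.has(gc) ? first_item.get(gc) : UINT_MAX;
   items.push_back(cd);
   first_item.set(gc, items.size() - 1);
}

// Lookup: walk the chain of items belonging to one cell.
template <class Visitor>
void for_each_in_cell(GridCoordinate gc, Visitor visit)
{
   if (!first_item.has(gc))
      return;
   for (unsigned i = first_item.get(gc); i != UINT_MAX; i = items[i].next)
      visit(items[i].id);
}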

This basic pattern: grid coordinates -> hash map -> linked list in array is my standard go-to solution for making spatial queries. It is straightforward, easy to understand and uses memory reasonably while providing O(1) lookups.

Tuesday, April 30, 2013

Code Share: Source Censoring, Part 2

A while ago I shared the tool we use for censoring source code in the Bitsquid engine.

Quick recap: We need to censor the code in our source distributions because there are parts of the code that are covered by NDAs to third parties and cannot be publicly disclosed. We do this with a tool that strips out the secret code and replaces it with blank lines, based on preprocessor definitions.

The stripping tool is only part of the solution, though. It works well if you only distribute code drops. You take a snapshot of the code, run the stripping tool to strip out secrets, zip it up. Done!

But frankly this is a terrible way of distributing source code. There is no history, no indication of what has changed from version to version and keeping local changes merged with the mainline is a constant pain in the ass.

The only sane way of distributing source code is to expose a mercurial (or git) source repository that you can pull changes from. This lets customers examine the history, find out which version introduced a particular bug, maintain their own branches that they merge with the mainline at their convenience, etc.

But of course, we cannot just share our own internal repository (because it contains secrets).

hg-clone.rb

We handle this with another tool, that we have inventively decided to call hg-clone.rb.

What hg-clone.rb does is pretty straightforward. Given two repositories as arguments, a SOURCE and a DESTINATION, it checks out each revision in the SOURCE repository, runs a filter program (to strip out any secrets) and checks the result into the DESTINATION repository.

SRC:    0  --> 1  --> 2  --> 3  --> 4  --> 5  --> ...
        |      |      |      |      |      |
        F      F      F      F      F      F
        |      |      |      |      |      |
        v      v      v      v      v      v
DST:    0' --> 1' --> 2' --> 3' --> 4' --> 5' --> ...

You call the program as

hg-clone SOURCE DESTINATION --filter FILTER --target TARGET-REV --cutoff CUTOFF-REV

SOURCE and DESTINATION are the source and destination repositories. DESTINATION does not need to exist; if it doesn't, it will be created. FILTER is the filter program; it will be run once in the destination directory before each revision is committed.

TARGET-REV is the target revision that should be copied from the source to the destination. hg-clone will first transfer the parent(s) of the target revision to the destination repository (if they haven't already been transferred), then it will transfer the target revision. This process is applied recursively, so if the parents' parents haven't been transferred, they will be transferred first, etc. Only the revisions that are ancestors of TARGET-REV will be transferred, so you can have secret development branches that won't be copied to the destination until they have been merged with your release branch.

If you don't specify a TARGET-REV, the last revision in the source repository will be used.

CUTOFF-REV can be used to cutoff the recursive parent transfer at a specific revision. If you set the cutoff to revision 1000, then any revision that has a parent before revision 1000 will be reparented to revision 1000 in the destination repository. Essentially, in the destination repository, history will start at revision 1000. This can be used to hide a shady past.

hg-clone tries its best to preserve authors, dates, messages, branches, etc. between the source and destination repositories. It cannot, however, preserve revision numbers, since those are based on a content hash, which changes when the filter is applied. What it does instead is insert a marker [clonedfrom:116:91fe33c1a569] in the commit message that specifies which revision in the source repository the current revision in the destination repository was cloned from. This marker is also used to determine the mapping between revisions in the source and the destination and whether a particular revision has already been copied over or not.

To use this in practice, you would typically set up one external repository for each customer with a corresponding filter program for stripping out the things that customer is not allowed to see. Then you would set up a cron job to run hg-clone and copy revisions from the main repository to the customer's.

Instead of having one repository per customer, you could alternatively have one repository for each possible NDA combination (e.g., +PS3 +PS4 -X360). However, this can be problematic, because if a customer becomes disclosed for a platform you will have to switch them over to a new repository, which might be painful. If you have one repository per customer you can just change the filter function.

The hg-clone program is available from our bitbucket repository.

Friday, March 15, 2013

What is gimbal lock and why do we still have to worry about it?

If you have ever worked with rotations and Euler angles you are probably at least somewhat familiar with the phrase "gimbal lock". But like many things concerning rotations, angles and spaces it can be tricky to visualize and get a good grasp of.

Sometimes it feels like every time I need to think about gimbal lock I have forgotten everything about it and have to go back the beginning and ask myself: OK, but what is it really that is happening?

Hopefully, this article will take care of that problem.

The Wikipedia page shows how gimbal lock can happen in a mechanical system. But it isn't necessarily self-evident how this translates to the computer game world. In the computer there are no mechanical limitations, we can rotate an object however we like. How can anything be "locked"?

Euler angles

When we are using Euler angles, we represent an object's orientation as three consecutive rotations around the object's axes. We can choose the axes and the order in which we apply the rotations arbitrarily, and depending on what we choose we get different Euler representations. So XYZ is the Euler representation where the first angle rotates the object around its X-axis, the second around its (new) Y-axis and the third around its (new) Z-axis. YZX gives us a different representation. We can even have representations with repeated axes, such as XZX.

So if we want to talk about the "Euler angles" of an object, we really must also talk about what axes we are rotating around and in what order. Otherwise we have no idea at all what we are talking about. Unfortunately, many articles about Euler angles are pretty sloppy with this and throw around terms like yaw, pitch and roll as if they had completely well-defined and unambiguous meanings. I prefer to use more wordy, but descriptive names, such as euler_xyz[0] that unambiguously state the axis rotation order and the index of the angle we are talking about.

An object has three rotational degrees of freedom and it is quite easy to see that the three Euler angles for a particular axis order (XYZ) are enough to define any possible orientation of an object. Note though that the representation is not unique. There are many possible Euler angles that represent the same orientation. For example, adding 360 degrees to any of the three angles will give us a different representation that results in the same object orientation.

So "gimbal lock" doesn't mean that there are rotations that can't be expressed as Euler angles. We can express any rotation in Euler angle form. Given an object, we can convert its orientation to Euler angles, and from that orientation we can rotate the object however we like and convert the new orientation to other Euler angles.

So what exactly is it that is "locked"? It seems we can do whatever we like.

The "lock" in gimbal lock

The term "gimbal lock" comes from the mechanical world. If the problem had originated in the world of computers, it would probably have been called something less confusing, such as "Euler angle flip" or "coordinate singularity".

Because, in the computer world, there is really nothing that gets "locked". Instead, the problem is this: when the Euler angles have particular values, there are orientations that are very similar to the current orientation which can't be achieved by just making small changes to the Euler angles. In particular, this happens when one of the angles is at 90 degrees, so that two rotation axes coincide.

So even though the orientations are "close" in the real world, they are not close in the Euler representation. In fact, at least one of the Euler angles will have to flip 180 degrees in order for us to represent the new orientation.

So one of the angles has to flip? What is the big deal? Can't we just flip it and get on with our stuff?

We can, as long as the angles only represent instantaneous "snapshots" of the object's orientation. However, if the angles represent key frames in an animation and we want to interpolate between those key frames, we run into trouble. If one of the angles flips 180 degrees between two key frames and we interpolate between those values, we will see the object animating through all those 180 degrees. In the viewport, we will see the object doing a "flip" or "roll" that shouldn't be there.

Note that it is only the interpolation that creates this unwanted behavior. If we just displayed the actual key frames and didn't interpolate between them -- everything would look right. We could work in Euler angles as much as we liked and be as close to the gimbal lock position as we wanted and no-one would ever know.

So the only thing we need to fix to get rid of gimbal lock is the interpolation. If you have done any work with 3D graphics you probably already know the answer -- use quaternions instead of Euler angles to represent orientations. Quaternions don't have the weird singularity points that Euler angles have, and we can interpolate between any keyframes by just lerping the quaternions.

It doesn't matter if the animation package is using Euler angles internally, as long as we convert everything to quaternions before we do the interpolation. Note that interpolation in quaternion space is not the same as interpolation in Euler space though, so to get as close as possible to what the animator intended, we probably want to sample the animation at our target frame rate and generate our quaternion key frames from those samples, rather than directly converting the animator's key frames (which may be further apart).
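
As a minimal sketch of that conversion and interpolation (plain C++ with a made-up Quat type, assuming the intrinsic XYZ order described earlier; a real engine would use its own math library):

#include <math.h>

struct Quat { float w, x, y, z; };

// Hamilton product: composes two rotations.
Quat quat_mul(Quat a, Quat b)
{
   Quat r;
   r.w = a.w*b.w - a.x*b.x - a.y*b.y - a.z*b.z;
   r.x = a.w*b.x + a.x*b.w + a.y*b.z - a.z*b.y;
   r.y = a.w*b.y - a.x*b.z + a.y*b.w + a.z*b.x;
   r.z = a.w*b.z + a.x*b.y - a.y*b.x + a.z*b.w;
   return r;
}

// Quaternion for a rotation of `angle` radians around a unit axis.
Quat quat_axis_angle(float ax, float ay, float az, float angle)
{
   float s = sinf(angle*0.5f);
   Quat r = { cosf(angle*0.5f), ax*s, ay*s, az*s };
   return r;
}

// Intrinsic XYZ Euler angles: rotate around X, then the new Y, then the
// new Z, which corresponds to the product qx * qy * qz.
Quat quat_from_euler_xyz(float rx, float ry, float rz)
{
   Quat qx = quat_axis_angle(1, 0, 0, rx);
   Quat qy = quat_axis_angle(0, 1, 0, ry);
   Quat qz = quat_axis_angle(0, 0, 1, rz);
   return quat_mul(quat_mul(qx, qy), qz);
}

// Normalized lerp between two keys, flipping sign to take the shortest
// path. This interpolation has no gimbal lock.
Quat quat_nlerp(Quat a, Quat b, float t)
{
   float dot = a.w*b.w + a.x*b.x + a.y*b.y + a.z*b.z;
   float sign = dot < 0.0f ? -1.0f : 1.0f;
   Quat r = {
      a.w + (sign*b.w - a.w)*t,
      a.x + (sign*b.x - a.x)*t,
      a.y + (sign*b.y - a.y)*t,
      a.z + (sign*b.z - a.z)*t };
   float len = sqrtf(r.w*r.w + r.x*r.x + r.y*r.y + r.z*r.z);
   Quat n = { r.w/len, r.x/len, r.y/len, r.z/len };
   return n;
}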

Well, there is one caveat actually. If we have more than 180 degrees of rotation in a single frame we can't represent that nicely with quaternions. Quaternions always lerp the shortest path between two orientations and you can't represent several "laps" of rotation with quaternions as you can do with Euler angles (by setting one of the angles to 9000 degrees, for example). But you can fix that by sampling at a higher frame rate if you need to represent really fast rotations with quaternions.

So with that we can say goodbye and good riddance to Euler angles and never have to worry about their sorry gimbal locking asses ever again.

Or so you may think...

The return of gimbal lock

I certainly thought so, until I started working on the new cutscene animation system for our level editor.

You see, animators really like to work with curves. They like to see a visual representation of what the animation will do to an object over time with key points that can be moved and handles that can be adjusted to change the slope of the curve.

Curves! Animators love them!

Quaternions are great for interpolation, but they are no good for curve editing.

Sure, you could probably draw some curves that represented a quaternion (the laziest thing would be to just draw the x, y and z components of the quaternion), but those curves wouldn't mean anything to an animator, the way the Euler angle curves do. They wouldn't be able to do anything with them.

So, animators want curves with keyframe interpolation. Curves need Euler angles. But what happens when we mix Euler angles with keyframe interpolation? Presto! Our old friend the gimbal lock is back again! Haven't we missed him.

That's it. We're stuck. Gimbal lock is here to stay and the animators will just have to work around it.

And we have to add support for all the usual tricks and workarounds that animators use to get around gimbal lock, such as changing the axis order (from XYZ to XZX or another of the twelve possible permutations), converting to quaternion and back again, applying an "Euler filter", etc.

But who said this game engine gig should be easy?