Friday, September 17, 2010

Visual Scripting the Data-Oriented Way

The BitSquid engine has two separate scripting systems. The first is Lua. The entire engine is exposed to Lua in such a way that you can (and are encouraged to) write an entire game in Lua without touching the C++ code at all.

The second system, which is the focus of this post, is a visual scripting system that allows artists to add behaviors to levels and entities. We call this system Flow.

Lua and Flow have different roles and complement each other. Lua allows gameplay programmers to quickly test designs and iterate over gameplay mechanics. Flow lets artists enrich their objects by adding effects, destruction sequences, interactivity, etc. When the artists can do this themselves, without the help of a programmer, they can iterate much faster and the quality is raised.

Flow uses a pretty standard setup with nodes representing actions and events.  Black links control how events cascade through the graph causing actions to be taken. Blue links represent variables that are fed into the flow nodes:


In this graph, the green nodes represent events and the red node represents an action. The yellow node is an action implemented in Lua. The gameplay programmers can set up such actions on a per-project basis and make them available to the artists.

Each unit (= entity) type can have its own flow graph, specifying that unit's behavior, e.g. an explosion sequence for a barrel. There is also a separate flow graph for the entire level that the level designers can use to set up the level logic.

Since Flow will be used a lot I wanted to implement it as efficiently as possible, both in terms of memory and performance. This means that I don't use a standard object-oriented design where each node is a separate heap-allocated object that inherits from an abstract Node class with a virtual do_action() method. That way lies heap allocation and pointer chasing madness.

Sure, we might use such a design in the editor, where we don't care about performance and where the representation needs to be easy to use, modifiable, stable to version changes, etc. In the runtime, where the graph is static and compiled for a particular platform, we can do a lot better.

(This is why you should keep your editor (source) data format completely separate from your in-game data format. They have very different requirements. One needs to be dynamic, multi-platform and able to handle file format version changes. The other is static, compiled for a specific platform and does not have to care about versioning -- since we can just recompile from source if we change the format. If you don't already -- just use JSON for all your source data, and binary blobs for your in-game data, it's the only sane option.)

The runtime data doesn't even have to be a graph at all. There are at least two other options:

  • We could convert the entire graph into a Lua program, representing the nodes as little snippets or functions of Lua code. This would be a very flexible approach that would be easy to extend. However, I don't like it because it would introduce significant overhead in both CPU and memory usage.
  • We could "unroll" the graph. For each possible external input event we could trace how it flows through the graph and generate a list of triggered actions as a sort of "bytecode". The runtime data would then just be a bytecode snippet for each external event. This has the potential of being very fast but can be complicated by nodes with state that may cause branching or looping. Also, it could potentially use a lot of extra memory if there are many shared code paths.

So I've decided to use a graph. But not a stupid object-oriented graph. A data-oriented graph, where we put the entire graph in a single blob of cohesive memory:


Oh, these blobs, how I love them. They are really not that complicated. I just concatenate all the node data into a single memory chunk. The data for each node begins with a node type identifier that lets me switch on the node type and do the appropriate action for each node. (Yes, I'll take that over your virtual calls any day.) Pointers to nodes are stored as offsets within the blob. To follow a pointer, just add the offset to the blob's start address, cast the pointer to something useful and presto!

Yes, it is ok to cast pointers. You can do it. You don't have to feel bad about it. You know that there is a struct ParticleEffectNode at that address. Just cast the pointer and get it over with.

The many nice things about blobs with offsets deserve to be repeated:

  • We can allocate the entire structure with a single memory allocation. Efficient! And we know exactly how much memory it will use.
  • We can make mem-copies of the blob without having to do any pointer patching, because there are no pointers, only offsets.
  • We can even DMA these copies over to a SPU. Everything is still valid.
  • We can write the data to disk and read it back with a single call. No pointer patching or other fixups needed.

In fact, doesn't this whole blob thing look suspicously like a file format? Yes! That is exactly what it is. A file format for memory. And it shouldn't be surprising. The speed difference between RAM and CPU means that memory is the new disk!

If you are unsure about how to do data-oriented design, thinking "file formats for memory" is not a bad place to start.


Note that there is a separate blob for dynamic data. It is used for the nodes' state data, such as which actor collided with us for a collision event (the blue data fields in the flow graph). We build that blob in the same way. When we compile the graph, any node that needs dynamic data reserves a little space in the dynamic blob and stores the offset to that space from the start of the dynamic block.

When we clone a new entity, we allocate a new dynamic data block and memcopy in the template data from the dynamic data block that is stored in the compiled file.

Sharing of dynamic data (the blue links in the graph) is implemented by just letting nodes point to the same offset in the dynamic data blob.

Since we are discussing performance, I should perhaps also say something about multithreading. How do we parallelize the execution of flow graphs?

The short answer: we don't.

The flow graph is a high level system that talks to a lot of other high level systems. A flow graph may trigger an effect, play a sound, start an animation, disable a light, etc, etc. At the same time there isn't really any heavy CPU processing going on in the flow graph itself. In fact it doesn't really do anything other than calling out to other systems. Multithreading this wouldn't gain a lot (since the compute cost is low to begin with) and add significant costs (since all the external calls would have to be synchronized in some way).

Wednesday, September 8, 2010

Code Share: Motion Builder Exporter

Writing exporters is boring. It mostly involves navigating huge unwieldy and poorly documented APIs.

I'd like to spend as little of my time as possible writing exporters, and I'm guessing most other developers feel the same. So, in that spirit I'm sharing the code of the simple Motion Builder exporter I just wrote:

BitSquid Python Motion Builder Exporter

Feel free to cannibalize the code for your own exporter. Hopefully you can save a little time and spend it doing something more interesting than writing exporters.

Wednesday, August 25, 2010

BitSquid's Dual Mode GUIs

The BitSquid engine uses a dual mode GUI system. That is, the GUI system can be run both in retained and immediate mode. For GUIs with lots of static data, retained mode can be used for increased efficiency. For smaller or more dynamic GUIs it is simpler to work in immediate mode.

The retained mode and the immediate mode use the same API and the same implementation with just a simple flag that controls the mode. To see how that is possible, it is easiest to begin by looking at how our GUIs are rendered.

Despite their simplicity, ordinary 2D GUIs can be quite taxing to a renderer. The reason is that they often contain many small individual objects. You can easily have a GUI with 500 little icons, text strings, radar blips, etc. If you render them as individual objects your batch count will go through the roof. The key to efficient GUI rendering is thus to batch together similar objects into larger buffers to get fewer draw calls.

In the BitSquid engine, the GUI batching works like this: When the main thread wants to render a GUI object it generates three pieces of data and sends them to the renderer.

  • An id that uniquely identifies the object.
  • A batch key consisting of (gui layer, material). Objects in the same layer with the same material can be batched together.
  • The vertex data (positions, normals, vertex colors, uv-coordinates) for the object to be rendered.
The renderer finds an existing batch with a matching batch key and appends the vertex data to the vertex buffer of that batch. If no matching batch exists a new batch is created.
When it is time to render, the renderer just renders all its batches with their corresponding data.

The main thread can modify an object by resending the same id with a new batch key and new vertex data. The renderer will delete the old data from the batch buffers and insert the new data. The main thread can also send an id to the renderer and request the object to be deleted. The renderer will delete the object's vertex data from the batch buffers.

To higher level systems, the GUI exposes an interface that looks something like this:

create_text(pos, text, font, color) : id
update_text(id, pos, text, font, color)
destroy_text(id)

create_text() creates a new id, generates the vertex data for the text object and sends it to the renderer. update_text()generates new vertex data and sends it to the renderer to replace the old data. destroy_text() tells the renderer to delete the vertex data corresponding to the object.

What is interesting about this API is that there are no separate move(), set_text(), set_font() and set_color() functions. If you want to change the text object, you have to provide all the necessary data to the update_text() function. This means that update_text() has all the data required to generate the object's data from scratch, so we don't have to retain any information about the objects in the main thread. The only data that is retained anywhere are the batch vertex buffers kept by the renderer. In this way we save memory, reduce the number of functions in the API and make the implementation a lot simpler. It also becomes easy to add new object types to the GUI, you just have to write a function that generates the batch key and the vertex data for the object.

You could argue that the API makes things more complicated for the user, since she now has to supply all the parameters to configure the text even if she just wants to change one of them (the color, for instance).  In my experience, that is usually not a problem. Typically, the user already has all the needed data stored somewhere and can just pass it to the update() function. For instance, the text to be displayed might be stored in a player_name variable. Retaining the data in the GUI would just mean that the data would be stored in two different places and add the burden of keeping them synchronized.

With all this in place, it is easy to see how we can support both retained mode and immediate mode in the same implementation.

In retained mode everything works as described above. The user calls create() to create an object, update() to modify it and destroy() to destroy it.

In immediate mode, only the create() function is used and the renderer is set to clear its batches every frame. Thus, any object drawn with create() will be drawn exactly one frame and then get cleared by the renderer.

Wednesday, August 18, 2010

A new data storage model

In my head, I am toying with an idea of a new data storage model that combines the flexibility and simplicity of JSON with the multi-user friendliness of a traditional database.

The basic idea is as follows:
  • A database is a collection of objects.
  • Each object is identified by a GUID.
  • An object consists of a set of key-value pairs.
  • The keys are always strings.
  • A value can be one of:
    • null (this is the same as the key not existing)
    • true/false
    • a number
    • a string
    • a generic data blob (texture, vertex data, etc)
    • a reference to another object (GUID)
    • a set of references to other objects (GUIDs)
  • A special object with GUID 0000-00000000-0000 acts as the root object of the database.
This simple setup has many nice properties.

It is easy to map objects back and forth between this storage representation and an in-memory representation in C++, C#, Lua, etc.

We can perform referential integrity checks on the GUIDs to easily locate "dangling pointers" or "garbage objects".

We can add new fields to objects and still be "backwards compatible" with old code.

It is easy to write batch scripts that operate on the database. For example, we can lookup the key "textures" in the root object to find all texture objects and then loop over them and examine their "height" and "width" to find any non-power-of-two textures.

All modifications to the data can be represented by a small set of operations:

    create(guid)
    destroy(guid)
    change_key(guid, key, value)
    add_to_set(guid, key, object_guid)
    remove_from_set(guid, key, object_guid)

These operations can also be used to represent a diff between two different versions of the database.

A user of the database can have a list of such operations that represents her local changes to the data (for testing purposes, etc). She can then commit all or some of these local changes to the central database. The database can thus be used in both online and offline mode. Versioning and branching systems can be built on this without too much effort.

Merge conflicts are eliminated in this system. The only possible conflict is when two users have changed the same key of the same object to two different values. In that case we resolve the conflict by letting the later change overwrite the value of the earlier one.

Note that this model only supports sets, not arrays. The reason is that array reordering operations are tricky to merge and in most cases the order of objects does not matter. In the few cases where order really does matter, you can use a key in the objects to specify the sort order.

Thursday, June 3, 2010

Avoiding Content Locks and Conflicts -- 3-way Json Merge

Locking content files in a CVS is annoying, doesn't scale well and prevents multiple people from working on different parts of the same level (unless you split the level in many small files which have to be locked individually -- which is even more annoying).

But having content conflicts is no fun either. A level designer wants to work in the level editor, not manage strange content conflicts in barely understandable XML-files. The level designer should never have to mess with WinMerging the engine's file formats.

And conflicts shouldn't be necessary. Most content conflicts are not actual conflicts. It is not that often that two people have moved the exact same object or changed the exact same settings parameter. Rather, the conflicts occur because a line-based merge tool tries to merge hierarchical data (XML or JSON) and messes up the structure.

In those rare cases when there is an actual conflict, the content people don't want to resolve it in WinMerge. If two level designers have moved the same object, we don't really help them address the issue by bringing up a dialog box with a ton of XML mumbo-jumbo. Instead, it is much better to just pick one of the two locations and go ahead with merging the file. Then, the level designers can fix any problems that might have occurred in the level editor -- the right tool for the job.

At BitSquid we use JSON for all our content files (actually, a slightly simplified version of JSON that we call SJSON). So to get rid of our conflict issues, I have written a 3-way merger that understands the structure of JSON files and resolves any remaining actual conflicts by always picking the right-hand branch.

If we disregard arrays for the moment, merging JSON files is quite simple. A diff between two JSON files can be expressed as a list of object[key] = value operations. Deleting a key is represented by changing its value to null. Adding a key is represented by changing a null value to something else. Merging these operations is simple. We only have trouble when the same key in the same object is changed to two different values, but then we just pick one of the values, as explained above.

Arrays are trickier because without context, it is impossible to tell what a change to an array means semantically. If the array [1, 2, 3] is changed to [1, 2, 4] is that a single operation that changed the last value from 3 to 4. Or is it two operations, deleting the 3 from the array and inserting 4. How we interpret it will affect the result of our 3-way merges. For example, the 3-way merge of [1, 2, 3], [1, 2, 4] and [1, 2, 5] can give either the result [1, 2, 5] or [1, 2, 4, 5].

I have resolved this by adding extra information to the arrays in our source files. Most of our arrays are arrays of objects. For such arrays, I require that the objects have an "id"-field with a GUID that uniquely identifies the object. With such an id in our array [ {x = 1, id = a}, {x = 2, id = b}, {x = 3, id = c} ] it becomes possible to distinguish between updating an existing value  [ {x = 1, id = a}, {x = 2, id = b}, {x = 4, id = c} ] and removing + adding a value [ {x = 1, id = a}, {x = 2, id = b}, {x = 4, id = d} ].

The 3-way merge algorithm I'm using applies some heuristics to guess array transformations even when no id-field is present, but the recommendation is to always add id-fields to array elements to get perfect merges.

You can download my 3-way Json merger here. I just wrote it today, so it haven't received much testing yet. But it is public domain software, so free free to fix the bugs and do whatever else you like with it.

Friday, April 23, 2010

Our Tool Architecture

The BitSquid tool architecture is based on two main design principles:
  • Tools should use the "real" engine for visualization.
  • Tools should not be directly linked or otherwise strongly coupled to the engine.
Using the real engine for visualization means that everything will look and behave exactly the same in the tools as it does in-game. It also saves us the work of having to write a completely separate "tool visualizer" as well as the nightmare of trying to keep it in sync with changes to the engine.

By decoupling the tools from the engine we achieve freedom and flexibility, both in the design of the tools and in the design of the engine. The tools can be written in any language (C#, Ruby, Java, Lisp, Lua, Python, C++, etc), using any methodology and design philosophy. The engine can be optimized and the runtime data formats changed without affecting the tools.

What we envision is a Unix-like environment with a plethora of special purpose tools (particle editor, animation editor, level editor, material editor, profiler, lua debugger, etc) rather than a single monolithic Mega-Editor. We want it to be easy for our licensees to supplement our standard tool set with their own in-house tools, custom written to fit the requirements of their particular games. For example, a top-down 2D game may have a custom written tile editor. Another programmer may want to hack together a simple batch script that drops a MIP-step from all vegetation textures.

At first glance, our two design goals may appear conflicting. How can we make our tools use the engine for all visualization without strongly coupling the tools to the engine? Our solution is shown in the image below:


Note that there is no direct linkage between the tool and the engine. The tool only talks to the engine through the network. All messages on the network connection are simple JSON structs, such as:

{
    "type" : "message",
    "level" : "info",
    "system" : "D3DRenderDevice",
    "message" : "Resizing swap chain: 1626 1051"
}

This applies for all tools. When the lua debugger wants to set a breakpoint, it sends a message to the engine with the lua file and line number. When the breakpoint is hit, the engine sends a message back. (So you can easily swap in your own lua debugger integrated with your favorite editor, by simply receiving and sending these messages.) When the engine has gathered a bunch of profiling data, it sends a profiler message. Et cetera.

For visualization, the tool creates a window where it wants the engine to render and sends the window handle to the engine. The engine then creates a swap chain for that window and renders into it.

(In the future we may also add support for a VNC-like mode where we instead let the engine send the content of the frame buffer over the network. This would allow the tools to work directly against consoles, letting the artists see, directly in their editors, how everything will look on the lead platform.)

A tool typically boots the engine in a special mode where it runs a custom lua script designed to collaborate with that particular tool. For example, the particle editor boots the engine with particle_editor_slave.lua which sets up a default scene for viewing particle effects with a camera, skydome, lights, etc. The tool then sends script commands over the network connection that tells the engine what to do, for example to display a particular effect:

{
    type = "script",
    script = "ParticleEditorSlave:test_effect('fx/grenade/explosion')"
}

These commands are handled by the slave script. The slave script can also send messages back if the tool is requesting information.

The slave scripts are usually quite simple. The particle editor slave script is just 120 lines of lua code.

To make the tools independent of the engine data formats we have separated the data into human-readable, extensible and backwards compatible generic data and fast, efficient, platform specific runtime data. The tools always work with the generic data, which is pretty much all in JSON (exceptions are textures and WAVs). Thus, they never need to care about how the engine represents its runtime data and the engine is free to change and optimize the runtime format however it likes.

When the tool has changed some data and wants to see the change in-engine, it launches the data compiler to generate the runtime data. (The data compiler is in fact just the regular Win32 engine started with a -compile flag, so the engine and the data compiler are always in sync. Any change of the runtime formats triggers a recompile.) The data compiler is clever about just compiling the data that has actually changed.

When the compile is done, the tool sends a network message to the engine, telling it to reload the changed data file at which point you will see the changes in-game. All this happens nearly instantaneously allowing very quick tweaking of content and gameplay (by reloading lua files).

This system has worked out really well for us. The decoupling has allowed for fast development of both the tools and the engine. Today we have about ten different tools that use this system and we have been able to make many optimizations to the engine and the runtime formats without affecting the tools or the generic data.