Sunday, May 20, 2012

Playing (with) Video

So you want to play some video? Shouldn't be too hard, right? Just download some video playing library and call the play_video() function. Easy-peasy-lemon-squeezy.

Well, you have to make sure that the video is encoded correctly, that the library works on all platforms and plays nice with your memory, file, sound and streaming abstractions, and that the audio and video don't desynchronize, which for some inexplicable reason seems to be a huge problem.

But this is just technical stuff. We can deal with that. What is worse is that video playback is also a legal morass.

There are literally thousands of broad patents covering different aspects of video decompression. If you want to do some video coding experiments of your own you will have to read, understand and memorize all these patents so that you can carefully tip-toe your code and algorithms around them.

Of course, if you had a big enough pool of patents of your own you might not have to care as much, since if someone sued you, you could sue them right back with something from your own stockpile. Mutually assured destruction through lawyers. Ah, the wonderful world of software patents.

So, creating your own solution is pretty much out of the question. You have to pick one of the existing alternatives and do the best you can with it. In this article I'm going to look at some different options and discuss the advantages and drawbacks of each one:

  • Just say no

  • Bink

  • Platform specific

  • H.264

  • WebM

There are other alternatives that didn't make it to this list, such as Dirac, Theora, and DivX. I've decided to focus on these five, since in my view H.264 is the best of the commercial formats and WebM the most promising of the "free" ones.

An initial idea might be: Why not just do whatever it is VLC does? Everybody's favorite video player plays pretty much whatever you throw at it and is open source software.

Unfortunately that doesn't work, for two reasons. First, VLC's code is a mix of GPL and LGPL stuff. Even if you just use the LGPL parts you will run into trouble on platforms that don't support dynamic linking. Second, the VLC team doesn't really care about patents and just infringes away. You probably can't afford to do the same. (As a result, there is a very real threat that VLC might be sued out of existence.)

A quick introduction

Before we start looking at the alternatives I want to briefly explain what a video file is, since there is some confusion on the matter, even among educated people.

A video file has three main parts:

  • Video data (H.264, DivX, Theora, VP8, ...)

  • Audio data (MP3, AAC, Vorbis, ...)

  • A container format (AVI, MKV, MP4, Ogg, ...)

The container format is just a way of packing together the audio and video data in a single file, together with some additional information.

The simplest possible container format would be to just concatenate the audio data to the video data and be done with it. But typically we want more functionality. We want to be able to stream the content, i.e., start playing it before we have downloaded the whole file, which means that audio and video data must be multiplexed. We also want to be able to quickly seek to specific time codes, so we may need an index for that. We might also want things like audio tracks in different languages, subtitling, commentary, DVD menus, etc. Container formats can become quite intricate once you start to add all this stuff.

A common source of confusion is that the extension of a video file (.avi, .mkv, .mp4, .ogg) only tells you the container format, not the codecs used for the audio and video data in the container. So a video player may fail to play a file even though it understands the container format (because it doesn't understand what's inside it).
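To make the multiplexing idea concrete, here is a sketch of a toy container: each chunk is tagged with a stream id and a size, and audio and video chunks are interleaved so a player can consume both streams while the file is still downloading. This is an invented illustration of the concept only, not any real format.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// A toy container: interleaved chunks, each tagged with a stream id and
// a size. Real containers (AVI, MP4, MKV) add timestamps, an index for
// seeking, metadata, etc.
enum StreamId : uint8_t { STREAM_VIDEO = 0, STREAM_AUDIO = 1 };

struct Chunk {
    uint8_t stream;            // which stream this chunk belongs to
    std::vector<uint8_t> data; // compressed payload (codec-specific)
};

// Multiplex: interleave the chunks into a single byte stream.
// (The 32-bit size is written in host byte order for brevity.)
std::vector<uint8_t> mux(const std::vector<Chunk> &chunks)
{
    std::vector<uint8_t> out;
    for (size_t c = 0; c < chunks.size(); ++c) {
        out.push_back(chunks[c].stream);
        uint32_t size = (uint32_t)chunks[c].data.size();
        const uint8_t *p = (const uint8_t *)&size;
        out.insert(out.end(), p, p + 4);
        out.insert(out.end(), chunks[c].data.begin(), chunks[c].data.end());
    }
    return out;
}

// Demultiplex: split the byte stream back into per-stream chunks.
std::vector<Chunk> demux(const std::vector<uint8_t> &in)
{
    std::vector<Chunk> chunks;
    size_t i = 0;
    while (i + 5 <= in.size()) {
        Chunk c;
        c.stream = in[i];
        uint32_t size;
        memcpy(&size, &in[i + 1], 4);
        i += 5;
        if (i + size > in.size())
            break; // truncated stream
        c.data.assign(in.begin() + i, in.begin() + i + size);
        i += size;
        chunks.push_back(c);
    }
    return chunks;
}
```

The core job of a container really is this simple; all the intricacy comes from the extra features layered on top.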

Option 1: Just say no

Who says there has to be video in a game? The alternative is to do all cut scenes, splash screens, logos, etc. in-game and use the regular renderer for everything. As technology advances and real-time visuals come closer and closer in quality to offline renders, this becomes an increasingly attractive option. It also has a number of advantages:

  • You can re-use the in-game content.

  • Production is simpler. If you change something you don't have to re-render the entire movie.

  • You don't have to decide on resolution and framerate, everything is rendered at the user's settings.

  • You can dynamically adapt the content, for example dress the players in their customized gear.

  • Having everything be "in-game visuals" is good marketing.

If I were making a game I would do everything in-game. But I'm not; I'm making an engine. And I can't really tell my customers what they can and cannot do. The fact is that there are a number of legitimate reasons for using video:

  • Some scenes are too complex to be rendered in-game.

  • Producing videos can be simpler than making in-game content, since it is easier to outsource. Anybody can make a video, but only the core team can make in-game content, and they may not have much time to spare.

  • Playing a video while streaming in content can be used to hide loading times. An in-game scene could be used in the same way, but a high-fidelity in-game scene might require too much memory, not leaving enough for the content that is streaming in.

As engine developers it seems we should at least provide some way of playing video, even if we recommend that our customers do their cutscenes in-game.

Option 2: Bink

Bink from RAD Game Tools is as close as you can get to a de facto standard in the games industry, being used in more than 5800 games on 14 different platforms.

The main drawback of Bink is the pricing. At $ 8500 per platform per game it is not exorbitant, but for a smaller game targeting multiple platforms it is still a noticeable sum.

Many games have quite modest video needs. Perhaps they will just use the video player for a 30 second splash screen at the start of the game and nothing more. Paying $ 34 000 to get that on four platforms seems excessive.

At Bitsquid our goal has always been to develop an engine that works for both big budget and small budget titles. This means that all the essential functionality of an engine (animation, sound, gui, video, etc) should be available to the licensees without any additional licensing costs (above what they are already paying for an engine). Licensees who have special interest in one particular area may very well choose to integrate a special middleware package to fulfill their needs, but we don't want to force everybody to do that.

So, in terms of video, this means that we want to include a basic video player without the $ 8500 price tag of Bink. That video player may not be as performant as Bink in terms of memory and processor use, but it should work well enough for anyone who just wants to play a full screen cutscene or splash screen when the CPU isn't doing much else. People who want to play a lot of video in CPU taxing situations can still choose to integrate Bink. For them, the price and effort will be worth it.

Option 3: Platform specific

One approach to video playing is to not develop a platform-independent library but instead use the video playing capabilities inherent in each platform. For example, Windows has Windows Media Foundation, MacOS has QuickTime, etc.

Using the platform's own library has several advantages. It is free to use, even for proprietary formats, because the platform manufacturers have already paid the license fees for the codecs. (Note though, that for some formats you need a license not just for the player, but for the distribution of content as well.) The implementation is already there, even if the APIs are not the easiest to use.

The biggest advantage is that on low-end platforms, using the built-in platform libraries can give you access to special video decoding hardware. For example, many phones have built-in H.264 decoding hardware. This means you can play video nearly for free, something that otherwise would be very costly on a low-end CPU.

But going platform specific also has a lot of drawbacks. If you target many platforms you have your work cut out for you in integrating all their different video playing backends. It adds an additional chunk of work that you need to do whenever you want to add a new platform. Furthermore, it may be tricky to support the same capabilities on all different platforms. Do they all support the same codecs, or do you have to encode the videos specifically for each platform? Do all platforms support "play to texture" or can you only play the videos full screen? What about the sound? Can you extract that from the video and position it as a regular source that reverbs through your 3D sound world? Some platforms (e.g. Vista) have almost no codecs installed by default, forcing you to distribute codecs together with your content.

Since we are developing a generic engine we want to cover as many platforms as possible and minimize the effort required to move a project from one platform to another. For that reason, we need a platform independent library as the primary implementation. But we might want to complement it with platform specific libraries for low end platforms that have built-in decoding hardware.

Option 4: H.264 (MPEG-4, AVC)

Over the last few years H.264 has emerged as the most popular commercial codec. It is used in Blu-ray players, video cameras, on iTunes, YouTube, etc. If you want a codec with good tool support and high quality, H.264 is the best choice.

However, H.264 is covered by patents. Patents that need to be licensed if you want to use H.264 without risking a lawsuit.

The H.264 patents are managed by an entity known as MPEG LA. They have gathered all the patents that they believe pertain to H.264 in a "patent pool" that you can license all at once, with a single agreement. That patent pool contains 1700 patents. Yes, you read that right. The act of encoding/decoding an H.264 file is covered by 1700 patents. You can find the list in all its 97 page glory at http://www.mpegla.com/main/programs/avc/Documents/avc-att1.pdf.

I am not a lawyer, as they say on Slashdot, but this is my best understanding of how this patent game works:

  • Buying a license from MPEG LA gives you the right to use the 1700 patents in the pool.

  • This doesn't mean you can't be sued for patent infringement. Anyone who holds a patent that is not one of the 1700 in the pool could claim that H.264 infringes on it and sue you. That seems unlikely, since MPEG LA has made an effort to gather all relevant patents, but there is no way to be certain.

  • MPEG LA doesn't by itself go after people who use H.264 without a license, that is up to the holders of the 1700 patents in the pool.

The licensing terms of H.264 are irritating, but not necessarily a big financial burden:

  • If you distribute an encoder or decoder you can distribute 100 000 copies for free, then you have to pay $ 0.20 per unit.

  • If you distribute an H.264-encoded movie, it is free if it is shorter than 12 minutes; otherwise you have to pay $ 0.02 per copy.

Note that unlike the case with other popular codecs such as MP3, it is not just the decoder/encoder that you need to license, you also need a license just for distributing H.264 content.

From what I've been able to discern (but don't take my word for it), a game that only plays a fixed set of movies/cutscenes would not be regarded as a general decoder (even though it contains decoding software), but rather as content. This means you would pay $ 0.02 per copy sold if your game had more than 12 minutes of video, and nothing otherwise (though you would still need to obtain a license).
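To put those numbers in perspective, here is the arithmetic as a tiny sketch. The thresholds are the ones quoted above; the function names are mine, and none of this is legal advice.

```cpp
// Royalties in whole cents, to avoid floating-point rounding issues.

// Decoder/encoder royalty: the first 100 000 units are free,
// then $0.20 (20 cents) per unit.
unsigned long long decoder_royalty_cents(unsigned long long units)
{
    const unsigned long long free_units = 100000;
    return units <= free_units ? 0 : (units - free_units) * 20;
}

// Per-copy content royalty: free for titles with at most 12 minutes
// of video, otherwise $0.02 (2 cents) per copy sold.
unsigned long long content_royalty_cents(unsigned long long copies,
                                         unsigned minutes_of_video)
{
    return minutes_of_video <= 12 ? 0 : copies * 2;
}
```

So a game with 30 minutes of cutscenes selling a million copies would owe $ 20 000 in content royalties — noticeable, but hardly ruinous.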

Of course, if you support H.264, you may also want to support AAC, the standard audio format that accompanies it. AAC is covered by a separate licensing body (Via Licensing) that has its own licensing terms. I haven't investigated them in any great detail.

You have to decide for yourself how well these terms sit with you. At Bitsquid we finally decided that if we should have a standard video playing facility, it should be one that people could use without thinking too much about patents and licensing (to the extent that is possible).

Option 5: VP8 (WebM)

VP8 is a "free" video codec owned by Google. It is covered by patents, but Google has granted free use of those patents and also provides a BSD licensed library libvpx for encoding and decoding video files. The format is also endorsed by the Free Software Foundation.

It is generally acknowledged that when it comes to quality, VP8 is not quite as good as H.264, though the differences are not enormous. So you are trading some visual quality for the convenience of a license free format.

There has been some discussion (most of it about a year ago) about whether VP8 really is unencumbered by patents. MPEG LA claimed that it knew several patents that VP8 was infringing and showed interest in creating a "patent pool" for VP8 similar to the one it holds for H.264. Nothing has come of that yet and MPEG LA has not disclosed which patents it thinks VP8 is infringing, which means that Google cannot really respond to the allegations. It is hard to know how much stock to put in this and whether anything will come of it.

You could argue that with this potential "threat" against VP8, it would be better to use another "free" alternative such as Dirac or Theora. However, there is not much evidence that they would fare better. Everyone who makes a "free" codec tries their best to make sure that they don't infringe on any patents. But with thousands of these patents around, each open to legal interpretation, there is just no way to be sure.

This is just the sad state of affairs of software patents. And you are not safe with the commercial formats either. Even if you have licensed the 1700 patents in the MPEG LA patent pool, someone can still sue you for violating patent number 1701. No one in this business offers indemnification against patent suits. Not Google. Not MPEG LA. Not Bink (I think).

It all becomes a question of risk. Bink has been around a long time without being sued, which is reassuring. VP8 hasn't been around that long.

Will there be patent claims made against VP8? Maybe. Who knows. Similar threats were made against Theora, but nothing happened there. If it does happen, Google will most likely fight back and the whole thing will drag on in the courts. Will there be patent claims against you for using VP8? Seems extremely unlikely. Games are not interesting enough targets. Video decoding is not our main business, and we can easily switch technology if needed. Phone manufacturers, YouTube and TV companies are more interesting targets.

Do you have to care about any of this at all? Up to you to decide.

Our conclusion

None of these alternatives is really attractive; it is more a question of picking the least bad one than finding the best, which is frustrating for a perfectionist with self-worth issues. Good cases can be made for all of them. This is what we have decided:

  • We will provide VP8 decoding on all platforms through libvpx. All other things equal, the "most free" format will give us the most flexibility in the long run.

  • We will not (at least not right now) support matroska or other advanced container formats. Instead, we will play the videos from simple IVF streams. Sound will be played as Vorbis files through our regular sound system so we get positioning, reverb, etc.

  • If needed, we will complement this basic approach with platform-specific libraries that take advantage of the hardware decoders on low-end platforms (phones and tablets).

  • Customers that need to play a lot of movies while doing other CPU intense tasks and that aren't happy with the performance of libvpx are recommended to look into Bink.

  • Customers that are worried about "patent risk" with VP8 are recommended to do whatever their lawyers tell them to do. (Use Bink, a platform specific library, obtain an H.264 license or avoid video altogether.)
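For the curious, the IVF streams mentioned in the decisions above are about as simple as containers get: to my understanding, an IVF file is a 32-byte header followed by frames, each prefixed by a 12-byte frame header (32-bit size plus 64-bit timestamp), all little-endian. A sketch of a header parser, assuming a little-endian host:

```cpp
#include <cstdint>
#include <cstring>

// The 32-byte IVF file header as used by the libvpx tools.
// All fields are little-endian. This struct has no padding, so on a
// little-endian host we can memcpy straight into it; a portable
// implementation should decode the fields byte by byte instead.
struct IvfHeader {
    char signature[4];    // "DKIF"
    uint16_t version;     // 0
    uint16_t header_size; // 32
    char fourcc[4];       // codec, e.g. "VP80"
    uint16_t width;
    uint16_t height;
    uint32_t timebase_den;
    uint32_t timebase_num;
    uint32_t frame_count;
    uint32_t unused;
};

// Returns false if the buffer doesn't start with a valid IVF header.
bool parse_ivf_header(const uint8_t *bytes, IvfHeader *h)
{
    memcpy(h, bytes, sizeof(IvfHeader));
    return memcmp(h->signature, "DKIF", 4) == 0 && h->header_size == 32;
}
```

After the header, you just read frame headers and hand the raw frame data to the VP8 decoder — no index, no interleaving, no menus.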

Saturday, May 5, 2012

Embracing Dynamism

Are you stuck in static thinking? Do you see your program as a fixed collection of classes and functions with unchanging behavior?

While that view is mostly true for old school languages such as C++ and Java, the game is different for dynamic languages: Lua, JavaScript, Python, etc. That can be easy to forget if you spend most of your time in the static world, so in this article I'm going to show some of the tricks you can apply when everything is fluid and malleable.

At Bitsquid our dynamic language of choice is Lua. Lua has the advantage of being fast, fully dynamic, small, simple and having a traditional (i.e. non-LISP-y) syntax. We use Lua for most gameplay code and it interfaces with the engine through an API with exposed C functions, such as World.render() or Unit.set_position().

I will use Lua in all the examples below, but the techniques can be used in most dynamic languages.

1. Read-eval-print-loop

Dynamic languages can compile and execute code at runtime. In Lua, it is as simple as:

loadstring("print(10*10)")()

This can be used to implement a command console where you can type Lua code and directly execute it in the running game. This can be an invaluable debugging and tuning tool. For example if you need to debug some problem with the bazooka:

World.spawn_unit("bazooka", Unit.position(player))

Or tune the player's run speed:

Unit.set_data(player, "run_speed", 4.3)

2. Reload code

The console can be used for more than giving commands, you can also use it to redefine functions. If the gameplay code defines a scoring rule for kills:

function Player.register_kill(self, enemy)
 self.score = self.score + 10
end

you can use the console to redefine the function and change the rules:

function Player.register_kill(self, enemy)
 if enemy.type == "boss" then
  self.score = self.score + 100
 else
  self.score = self.score + 10
 end
end

Executing this code will replace the existing Player.register_kill function with the new one. All code that previously called the old function will now call the new one and the new scoring rules will apply immediately.

If you take some care with how you use the global namespace you can write your Lua code so that all of it is reloadable using this technique. Then the gameplay programmer can just edit the Lua files on disk and press a key to reload them in-game. The game will continue to run with the new gameplay code, without any need for a reboot. Pretty cool.

You can even get this to work for script errors. If there is an error in the Lua code, don't crash the game, just freeze it and allow the gameplay programmer to fix the error, reload the code and continue running.

3. Override system functions

The functions in the engine API don't have any special privileges; they can be redefined just like any other Lua functions. This can be used to add custom functionality or for debugging purposes.

Say, for example, that you have some units that are mysteriously popping up all over the level. You know they are being spawned somewhere in the gameplay code, but you can't find where. One solution would be to override the World.spawn_unit function and print a stack trace whenever the offending unit is spawned:

old_spawn_unit = World.spawn_unit
function World.spawn_unit(type, position)
 if type == "tribble" then
  print "Tribble spawned by:"
  print_stack_trace()
 end
 return old_spawn_unit(type, position)
end

Now, whenever a tribble is spawned by the script, a call stack will be printed and we can easily find who is doing the spawning.

Note that before we replace World.spawn_unit, we save the original function in the variable old_spawn_unit. This enables us to call old_spawn_unit() to do the actual spawning.

This technique could also be used to find all (potentially expensive) raycasts being done by the script.

4. Handle deprecated functions

Sometimes we need to deprecate functions in the engine API. It can be annoying to the people using the engine of course, but backwards compatibility is the mother of stagnation. If you never throw away old code, you will eventually have a huge ugly code mess on your hands.

Luckily, since the script can create functions in the engine namespace, the script can provide the backwards compatibility when needed.

For example, we used to have a function PhysicsWorld.clear_kinematic(world, actor). That naming was inconsistent with some of our other functions so we changed it to Actor.set_kinematic(actor, false).

One way of dealing with this change would be to go through all the code in the project, find all uses of PhysicsWorld.clear_kinematic and change them to use Actor.set_kinematic instead. But another way would be to just implement PhysicsWorld.clear_kinematic in the script:

function PhysicsWorld.clear_kinematic(world, actor)
 Actor.set_kinematic(actor, false)
end

Now the rest of the code can go on using PhysicsWorld.clear_kinematic without even caring that the function has been removed from the engine API. You could even use a combination of the two strategies -- implementing the deprecated function in Lua for a quick fix, and then looking into removing the uses of it.

5. Dynamically inserting profiling

Top-down profiling with explicit profiler scopes is a good way of finding out where a game is spending most of its time. However, to be useful, explicit profiler scopes need to be inserted in all the "right" places (all potentially expensive functions).

In C we need to guess where these right places are before compiling the program. In Lua, we can just insert the profiler scopes dynamically. We can even create a function that adds profiling to any function we want:

function profile(class_name, method_name)
 local f = _G[class_name][method_name]
 _G[class_name][method_name] = function (...)
  Profiler.start(class_name .. "." .. method_name)
  local results = {f(...)}
  Profiler.stop()
  return unpack(results)
 end
end

When we call this function as profile('Player', 'update') it will first save the existing Player.update function and then replace it with a function that calls Profiler.start("Player.update") before calling the original function and Profiler.stop() before returning.

Using this technique, we can dynamically add profiling to any function we want during our optimization session.

6. Tab completion

If you implement an interactive Lua console, it is nice to support tab completion, so the user doesn't have to remember all function names. But how do you build the list of callable functions to use with tab completion?

Using Lua of course! Just find all tables (i.e., classes) in the global namespace and all functions stored in those tables:

t = {}

for class_name,class in pairs(_G) do
 if type(class) == 'table' then
  for function_name,func in pairs(class) do
   if type(func) == 'function' then
    t[#t+1] = class_name .. '.' .. function_name
   end
  end
 end
end

After running this, t will contain the full list of function names.

7. Looping through all objects

By recursing through _G you can enumerate all reachable objects in the Lua runtime.

function enumerate(f)
 local seen = {}
 local function recurse(t)
  if type(t) ~= 'table' then return end
  if seen[t] == true then return end
  f(t)
  seen[t] = true
  recurse(getmetatable(t))
  for k,v in pairs(t) do
   recurse(k)
   recurse(v)
  end
 end
 recurse(_G)
end

Calling enumerate(f) will call f(o) on all objects o in the runtime. (Assuming they are reachable from _G. Potentially, there could also be objects only reachable through Lua references held in C.)

Such an enumeration could be used for many things. For example, you could use it to print the health of every object in the game.

function print_health(o)
 if o.health then print(o.health) end
end
enumerate(print_health)

The technique could also be used for memory optimizations. You could loop through all Lua objects and find the memory used by each object type. Then you could focus your optimization efforts on the resource hogs.

Friday, April 20, 2012

Inheriting Velocity in Ragdolls

After a slew of abstract articles about C++ and code structuring I'd like to get back to some more meaty game engine stuff. So today I'll talk about ragdolls. In particular, how to preserve the momentum of animated objects, so that when you switch over to the ragdoll it continues to stumble forward in the same direction that the animation was moving, before crashing to a gruesome death.

So this is a small, but important problem. We want to somehow get the velocities of the animated objects and then apply them to the bodies in the ragdoll. The only snag is that animated objects typically don't know anything about velocities. Also, we need some way of matching up the physics bodies with the animated objects.

First, some background information. In the Bitsquid engine, physics, scene graph and animation are completely separate systems. We strongly believe in minimizing the couplings between different systems since that makes the engine easier to understand, reason about, modify, optimize and rewrite.

  • The physics system simulates a number of bodies, possibly connected by joints.

  • The scene graph handles local-to-world transforms for a collection of nodes in a hierarchy.

  • The animation system evaluates and blends animation curves for bones.

Bones and bodies hold references (just integer indices, really) to nodes in the scene graph, and this is how the systems communicate. After the animation has been evaluated, the resulting local transforms are written to the bones' nodes in the scene graph.

For keyframed physics (animated hit bodies), the animation drives the physics, which means the physics' bodies will read their world transforms from the corresponding nodes in the scene graph. For ragdolled physics, the world transforms of the bodies are written to the scene graph after the simulation has completed.
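The index-based coupling described above might look something like this in code — a simplified illustration; the names and layouts are mine, not the actual Bitsquid ones:

```cpp
#include <vector>

// Each system stores plain integer indices into the scene graph;
// that index is the only coupling between the systems.
struct Matrix4x4 { float m[16]; };

struct SceneGraph {
    std::vector<Matrix4x4> local; // written by animation
    std::vector<Matrix4x4> world; // computed from local, shared with physics
};

struct Bone { int node; }; // animation writes its result to local[node]
struct Body { int node; }; // physics reads or writes world[node]

// Keyframed body: the animation drives the physics, so the body reads
// its target pose from the scene graph.
Matrix4x4 read_keyframed_pose(const SceneGraph &sg, const Body &b)
{
    return sg.world[b.node];
}

// Ragdolled body: the physics drives the node, so the simulated pose is
// written back to the scene graph after the simulation step.
void write_ragdoll_pose(SceneGraph &sg, const Body &b, const Matrix4x4 &pose)
{
    sg.world[b.node] = pose;
}
```

Neither the physics nor the animation system needs to know anything about the other; both only know about node indices.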

For partial ragdolls (such as a non-functioning, but still attached limb) or powered ragdolls (ragdolls driven by motors to achieve animation poses) it gets a little more involved (perhaps a topic for a future post), but the basic setup is the same.

Given this setup there are two ways of calculating the animation velocities:

  • We can calculate the velocities directly by differentiating the animation curves.

  • We can record a node's transform at two different time steps and compute the velocity from the difference.

The first approach is doable, but not very practical. Not only do we have to differentiate all the animation curves, we must also take into account how those velocities are affected by the blend tree and local-to-world transforms. And even if we did all that, we still wouldn't account for movements from sources other than animation, such as scripted movements, IK or interactions with the character controller.

The second option is the more reasonable one. Now all we need is a way of obtaining the transforms from two different time steps. There are a number of possible options:

  • We could add an array of Matrix4x4:s, last_world, to our scene graph, where we store the last world transform of every node. Then, whenever we want to go to ragdoll, we always have a last_world transform to calculate velocities from.

  • We could simulate the character backwards in time when we want to go to ragdoll and obtain a last_world transform that way.

  • We could delay the transition to ragdoll one frame, so that we have enough time to gather two world transforms for computing the velocity.

The first approach is conceptually simple, but costly. We are increasing the size of all our scene graphs by about 50 % (previously they contained local and world transforms, now they will also need last_world). In addition we must memcpy(last_world, world) before we compute new world transforms. That's a significant cost to pay all the time for something that happens very seldom (transition to ragdoll).

The second approach sounds a bit crazy, but some games actually already have this functionality. Servers in competitive multi-player FPS games often need to rewind players in time in order to accurately determine if they were able to hit each other. Still, I find the approach a bit too complicated and involved just to get a velocity.

The third approach seems simple and cheap, but it violates one of our Bitsquid principles: Thou Shalt Not Have Any Frame Delays. Delaying something a frame can be a quick fix to many hairy problems, but it puts your game in a very weird transitional state where it at the same time both is and isn't (yet) something. The character isn't really a ragdoll yet, but it will be the next frame, whether I want it to or not.

This new slightly self-contradictory state invites a host of bugs and before you know it, little logic pieces will start to seep into the code base "do this unless you are in the special transition-to-ragdoll state". Congratulations, you have just made your codebase a lot more complicated and bug prone.

If this is not enough, consider the poor sucker who just wants to write a routine that does A, B, C and D, when A, B and C require frame delays. Suddenly what was supposed to be a simple function has turned into a state machine that needs to run for four frames to produce its result.

The simple rule that actions should take place immediately protects against such insanity.

So, three options, none of them especially palatable.

I actually went with the first one, to always compute and store last_world in the scene graph, but with a flag so that this is only used for units that actually need it (characters that can go to ragdoll). When there is no clear winner, I always pick the simplest solution, because it is a lot easier to optimize later if the need should arise. (We could for example track last_world only for the nodes which have a corresponding ragdoll actor. Also we could store last_world as (p,q) instead of as a matrix.)

For completeness, given the two transforms, the code for computing the velocities will look something like this:

Vector3 p0 = translation(tm_0);
Vector3 p1 = translation(tm_1);
Vector3 velocity = (p1 - p0) / dt;

Quaternion q0 = rotation(tm_0);
Quaternion q1 = rotation(tm_1);
Quaternion q = q1 * inverse(q0);
AxisAngle aa = q.decompose();
Vector3 angular_velocity = aa.axis * aa.angle / dt;
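Filled out with minimal, self-contained math types — stand-ins for whatever your math library provides, with translation() and rotation() assumed to extract position and orientation from a transform — the computation can be written and sanity-checked like this:

```cpp
#include <math.h>

// Minimal stand-in math types; a real engine has these in its math library.
struct Vector3 { float x, y, z; };
struct Quaternion { float x, y, z, w; };

Vector3 operator-(Vector3 a, Vector3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
Vector3 operator/(Vector3 a, float s)   { return {a.x / s, a.y / s, a.z / s}; }
Vector3 operator*(Vector3 a, float s)   { return {a.x * s, a.y * s, a.z * s}; }

// For unit quaternions the inverse is just the conjugate.
Quaternion inverse(Quaternion q) { return {-q.x, -q.y, -q.z, q.w}; }

// Hamilton product.
Quaternion operator*(Quaternion a, Quaternion b)
{
    return {a.w * b.x + a.x * b.w + a.y * b.z - a.z * b.y,
            a.w * b.y - a.x * b.z + a.y * b.w + a.z * b.x,
            a.w * b.z + a.x * b.y - a.y * b.x + a.z * b.w,
            a.w * b.w - a.x * b.x - a.y * b.y - a.z * b.z};
}

// Linear velocity from two sampled positions.
Vector3 linear_velocity(Vector3 p0, Vector3 p1, float dt)
{
    return (p1 - p0) / dt;
}

// Angular velocity from two sampled orientations: the delta rotation
// q1 * inverse(q0), decomposed to axis-angle and divided by dt.
// (For simplicity this doesn't take the shortest path for rotations
// beyond 180 degrees.)
Vector3 angular_velocity(Quaternion q0, Quaternion q1, float dt)
{
    Quaternion q = q1 * inverse(q0);
    float w = q.w;
    if (w > 1.0f) w = 1.0f;
    if (w < -1.0f) w = -1.0f;
    float angle = 2.0f * acosf(w);
    float s = sinf(angle * 0.5f);
    if (s < 1e-6f)
        return {0.0f, 0.0f, 0.0f}; // (nearly) no rotation this frame
    Vector3 axis = {q.x / s, q.y / s, q.z / s};
    return axis * (angle / dt);
}
```

For example, an orientation that has advanced 0.2 radians about the z axis over a 0.1 second frame yields an angular velocity of 2 rad/s about z.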

Wednesday, March 21, 2012

PIMPL vs Pure Virtual Interfaces

In C++, separating the interface (public declarations) of a class from its implementation (private methods, members and definitions) serves several useful purposes:

  • Implementation details are hidden, making interfaces easier to read and understand.

  • Smaller header files with fewer dependencies means faster compile times.

  • A weaker coupling between the interface and the implementation gives greater freedom in reorganizing and refactoring the implementation internals.

In pure C, we can achieve this separation by using a pointer to a forward declared struct:

struct SoundWorld;
struct SoundResource;
typedef unsigned SoundInstanceId;

SoundWorld *make_sound_world();
void destroy_sound_world(SoundWorld *world);
SoundInstanceId play(SoundWorld *world, SoundResource *sound);
void stop(SoundWorld *world, SoundInstanceId id);

The struct is opaque to the users of the API. The actual content is defined in the .cpp file:

struct SoundWorld {
    SoundInstance playing_instances[MAX_PLAYING_INSTANCES];
    Matrix4x4 listener_pose;
    ...
};
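Here is the whole pattern as a compilable toy, with a made-up two-field implementation standing in for a real sound system — the point being that callers only ever see the forward declaration and the free functions:

```cpp
#include <stdlib.h>

// --- What would live in the header: an opaque struct and free functions.
struct SoundWorld;
typedef unsigned SoundInstanceId;

SoundWorld *make_sound_world();
void destroy_sound_world(SoundWorld *world);
SoundInstanceId play(SoundWorld *world);
unsigned playing_count(SoundWorld *world);

// --- What would live in the .cpp file: the actual definition.
// (A made-up toy implementation that just counts "playing" instances.)
struct SoundWorld {
    unsigned next_id;
    unsigned playing;
};

SoundWorld *make_sound_world()
{
    SoundWorld *w = (SoundWorld *)malloc(sizeof(SoundWorld));
    w->next_id = 1;
    w->playing = 0;
    return w;
}

void destroy_sound_world(SoundWorld *world)
{
    free(world);
}

SoundInstanceId play(SoundWorld *world)
{
    ++world->playing;
    return world->next_id++;
}

unsigned playing_count(SoundWorld *world)
{
    return world->playing;
}
```

Since the header exposes nothing about the struct's size or layout, the implementation can change freely without recompiling the callers.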

C++ programmers are often recommended to use the PIMPL idiom (pointer to implementation) to achieve the same thing:

class SoundWorldImplementation;

class SoundWorld
{
public:
    typedef unsigned InstanceId;

    SoundWorld();
    ~SoundWorld();

    InstanceId play(SoundResource *sound);
    void stop(InstanceId id);

private:
    SoundWorldImplementation *_impl;
};

Here, SoundWorld is the external interface of the class. All the messy stuff (instance variables, private methods, etc.) is found in the SoundWorldImplementation class, which is in the .cpp file.

The _impl pointer is created in the constructor and calls to the methods in SoundWorld are forwarded to the implementation object via method stubs:

SoundWorld::SoundWorld()
{
    _impl = new SoundWorldImplementation();
}

SoundWorld::InstanceId SoundWorld::play(SoundResource *sound)
{
    return _impl->play(sound);
}

Another solution to the same problem is to write the interface as an abstract, pure virtual class in the .h file and then create the implementation as a subclass in the .cpp file.

You don't see this solution recommended as much (at least not as a solution to this particular problem), but I actually like it better. With this approach, the header file will look something like this:

class SoundWorld
{
public:
    typedef unsigned InstanceId;

    virtual ~SoundWorld() {}
    virtual InstanceId play(SoundResource *sound) = 0;
    virtual void stop(InstanceId id) = 0;

    static SoundWorld *make(Allocator &a);
    static void destroy(Allocator &a, SoundWorld *sw);
};

Note that since the class is now abstract, we cannot create actual instances of it; to do that we need the factory functions make() and destroy(). I've added an allocator parameter for good measure, because I always want to specify explicit allocators for all memory operations.

The corresponding .cpp file looks something like:

class SoundWorldImplementation : public SoundWorld
{
    friend class SoundWorld;

    SoundInstance _playing_instances[MAX_PLAYING_INSTANCES];
    Matrix4x4 _listener_pose;

    SoundWorldImplementation()
    {
        ...
    }

    virtual InstanceId play(SoundResource *sound) 
    {
        ...
    }

    virtual void stop(InstanceId) 
    {
        ...
    }
};

SoundWorld *SoundWorld::make(Allocator &a)
{
    return a.make<SoundWorldImplementation>();
}

void SoundWorld::destroy(Allocator &a, SoundWorld *sw)
{
    a.destroy(sw);
}

The reason why most people recommend the PIMPL approach is that it has some distinct advantages:

  • Factory functions are not needed; you can use new and delete, or create objects on the stack.

  • The SoundWorld class can be subclassed.

  • The interface methods are not virtual, so calling them might be faster. (On the other hand, we need an extra memory fetch to get to the implementation object.)

  • PIMPL can be introduced in an existing class without changing its external interface or its relation to other classes.

For my use cases, none of these advantages matter that much. Since I want to supply my own allocators, I'm not interested in new and delete. And I only use this for "big" objects, that are always heap (rather than stack) allocated.

I don't make much use of implementation inheritance. In my opinion, it is almost always a bad design decision that leads to strongly coupled code and hard-to-follow code paths. Inheritance should be limited to interface inheritance.

The performance cost of virtual calls is not a huge issue, since I only use this approach for "big" objects (Systems and Managers). Also, I design the API so that the number of API calls is minimized. I.e., instead of a function:

void set_sound_position(InstanceId id, const Vector3 &pos);

I have:

void set_sound_positions(unsigned count, const InstanceId *ids, const Vector3 *positions);

This reduces the virtual call overhead, but also has additional benefits, such as being DMA friendly and allowing for parallelization and batch optimizations.

In the words of Mike Acton: Where there's one, there's more than one.

The abstract class method has some advantages of its own:

  • Cleaner code and a lot less typing, since we don't have to write forwarding stubs for the methods in the public interface.

  • Multiple classes can implement the same interface. We can statically or dynamically select which particular implementation we want to use, which gives us more flexibility.

To me, not having to write a ton of stupid boilerplate cruft is actually kind of a big deal. I know some people don't mind boilerplate. It's just a little extra typing, they say. Since there is nothing complicated or difficult in the boilerplate code, it doesn't pose a problem. Programmers are not limited by typing speed, so how much you have to type doesn't matter.

I don't agree at all. In my view, every line of code is a burden. It comes with a cost that you pay again and again as you write, read, debug, optimize, improve, extend and refactor your code. For me, the main benefit of "higher-level" languages is that they let me do more with less code. So I'm happy to pay the overhead of a virtual call if it saves me from having 150 lines of idiotic boilerplate.

A nice thing about the interface and implementation separation is that it gets rid of another piece of hateful C++ boilerplate: method declarations (hands up everybody who enjoys keeping their .h and .cpp files synchronized).

Methods defined inside a C++ class do not have to be declared and can be written in any order. So if we want to add helper methods to our implementation class, that are not part of the public interface, we can just write them anywhere in the class:

class SoundWorldImplementation : public SoundWorld
{
    virtual InstanceId play(SoundResource *resource) {
        InstanceId id = allocate_id();
        ...
    }

    // A private method - no declaration necessary.
    InstanceId allocate_id() {
        ...
    }
};

It's interesting that this small, purely syntactical change -- getting rid of method declarations -- makes a significant difference in how the language "feels". At least to me.

With this approach, adding a helper method feels like "less work" and so I'm more inclined to do it. This favors better structured code that is decomposed into a larger number of functions. More like Smalltalk than traditional C (home of the mega-method). The Sapir-Whorf hypothesis appears to hold some merit, at least in the realm of programming languages.

Another interesting thing to note is that the pure C implementation of opaque pointers stacks up pretty well against the C++ variants. It is simple, terse and fast (no virtual calls, no forwarding functions).

Every year I'm a little more impressed by C and a little more depressed by C++.

Sunday, March 4, 2012

Caring by Sharing: The Bitsquid Documentation System

In a previous article I talked a bit about our documentation system. It has now solidified into something interesting enough to be worth sharing.

The system consists of a collection of Ruby files that read input files (with extension .bsdoc) written in a simple markup language:

# Header

Some text.

* And
* A list

and converts them to HTML:

<h1>Header</h1>

<p>Some text.</p>

<ul>
 <li><p>And</p></li>
 <li><p>A list</p></li>
</ul>

We then use the HTML Help Compiler to convert the help files to .chm.

You can find the repository at:

Motivation

Why have we created our own markup system instead of just using an existing one? (Markdown, Textile, RDoc, POD, Restructured Text, Doxygen, BBDoc, Wikimedia, Docbook, etc.)

For two reasons. First, none of these existing systems work exactly the way that we want.

An example. A large part of our documentation consists of Lua interface documentation. To make that as easy as possible to write, we use a special tag @api to enter an API documentation mode. In that mode, each unindented line documents a new Lua function. The indented lines that follow contain the documentation for the function.

## Application (singleton)

Interface to access global application functionality. Note that since the
application is a singleton (there is only one application), you don’t need
to pass any %r Application object to the application functions. All the
functions operate on the application singleton.

@api

resolution() : width, height
 Returns the screen resolution.
 
argv() : arg1, arg2, arg3, ...
 Returns the command line arguments supplied to the application.

The documentation system recognizes the Lua function definitions and formats them appropriately. It also creates index entries for the functions in the .chm file. In addition, it can create cross-references between classes and functions (with the %r marker).

No out-of-the-box system can provide the same level of convenience.

In any documentation system, the documentation files are the most valuable resource. What really matters is that documentation is easy to write and easy to modify. In particular, my main concerns are:

  • Preserving semantic information.

  • Avoiding unnecessary markup and clutter.

By preserving semantic information I mean that we should be able to say, for example, that something is a Lua function definition, or a piece of sample C++ code, rather than just saying that something is italic or preformatted. If we have enough semantic information, we can do all kinds of things to the data in post-processing. We can parse the function definition using a Lua parser, or run the C++ code through a syntax highlighter. We can convert the files to some other format if we ever decide to switch documentation system.

If the documentation format doesn't preserve semantic data, there is no way of getting that data back, except by going through all the documentation and adding it manually. That's painful.

Avoiding markup and clutter is all about making the documents easy to write and easy to modify. That's the whole point of using a markup language (instead of plain HTML) in the first place.

Our custom markup language lets us achieve both these goals in a way that no off-the-shelf solution could.

The second reason for writing our own system is that there is no fundamentally hard problem that the existing systems solve. If they did something really advanced that would take us months to duplicate, then it might be better to use an existing system even if it wasn't perfectly adapted to our needs. But parsing some text and converting it to HTML isn't hard. The entire documentation system is just a few hundred lines of Ruby code.

(In contrast, Doxygen actually does solve a hard problem. Parsing general C++ code is tricky. That's why we use Doxygen to document our C++ code, but our own system for stand-alone documentation.)

The System Design

If I've done my job and convinced you that the best thing to do is to write your own documentation system, then what's the point of sharing my code with you?

Well, the system we use consists of two parts. One part (the bulk of it) is generic and can be used to implement any markup language. The rules that are specific to our markup language are all kept in a single file (bsdoc.rb). To write your own documentation system, you could re-use the generic parts and just write your own markup definition.

The generic part of the system consists of four files:

paragraph_parser.rb

Parses the paragraphs of a document into block-level HTML code.

span_parser.rb

Does span-level parsing inside a HTML block.

generator.rb

Generates the output HTML.

toc.rb

Adds section numbering and a table of contents to an HTML file.

Most of the code is pretty straightforward. A rule set is a collection of regular expressions. The expressions are tested in turn against the content and the first one that matches is applied. There are separate rules for parsing the document on the block level (the ParagraphParser) and inside each line (the SpanParser).

There are some ideas in the system that I think are interesting enough to mention though:

Line-by-line parsing

On the paragraph level, the document is parsed line-by-line. Each rule regex is tested in turn and the first one that matches is applied. This ensures that the process is speedy for all kinds of input (O(N) in the number of lines). It also makes the system simpler to reason about.

No intermediate representation

The system does not build any intermediate representation of the document. It is converted directly from the .bsdoc source format to HTML. This again simplifies the system, because we don't have to devise an intermediate representation for all the kinds of data that we want to handle.

HTML "contexts" for lines

When a rule is applied, it doesn't write raw HTML code to the output. Instead, it gives the generator a piece of text and a list of tags that should be applied to it. I call this the "context" of the text.

env.write(%w(ul li p), "Hi!")

The generator will add tags as appropriate to ensure that the line is printed in the right context:

<ul><li><p>Hi!</p></li></ul>

When several lines are printed, the generator only opens and closes the minimum number of tags that are necessary to give each line the right context. It does this by matching the list of contexts for neighboring lines:

This:

env.write(%w(ul li p), "First item!")
env.write(%w(ul li p), "First paragraph!")
env.write(%w(ul li), nil)
env.write(%w(ul li p), "First item, second paragraph!")
env.write(%w(ul), nil)
env.write(%w(ul li p), "Second item!")

ends up as:

<ul>
 <li>
  <p>
   First item!
   First paragraph!
  </p>
  <p>First item, second paragraph!</p>
 </li>
 <li><p>Second item!</p></li>
</ul>

Note the trick of writing nil to explicitly close a scope.

Since I really, really hate badly formatted HTML documents, I've made sure that the output from the generator looks (almost) as good as hand-written HTML.

Using contexts in this way gets rid of a lot of the complexities of HTML generation. When we write our rules we don't have to think about opening and closing tags, we just have to make sure that we use an appropriate context for each line.

Nested scopes

The final idea is to automatically handle nested markup by applying the rules recursively. Consider this input document:

* Caught in the jungle
 * By a bear 
 * By a lion
 * By something else
* Caught in the woods

I don't have any special parsing rules for dealing with nested lists. Instead, the first line of this document creates a scope with the context %w(ul li). That scope is applied to all indented lines that follow it. The system strips the indentation from the line, processes it using the normal rule set, and then prepends %w(ul li) to its context. When it reaches a line without indentation, it drops the scope. Scopes can be stacked for multiple levels of nesting.

This way we can deal with arbitrarily complex nested structures (a code sample in a list in a blockquote) without any special processing rules.

A Bonus for AltDevBlogADay Writers

As a bonus for my fellow AltDevBlogADay writers I've added a syntax module for writing AltDevBlogADay articles. It converts source documents to a format suitable for publishing on AltDevBlogADay. (This includes taking care of the tricky <pre> tags.)

There is also a package for Sublime Text 2 (my favorite text editor) that gives you syntax highlighting and a build command for converting a document to HTML and previewing it in a browser. I'm currently writing all my AltDevBlogADay articles in this way.

(This article has also been posted to The Bitsquid blog.)