Friday, February 25, 2011

Managing Decoupling Part 3 - C++ Duck Typing

Some systems need to manipulate objects whose exact nature is not known. For example, a particle system has to manipulate particles that sometimes have mass, sometimes a full 3D rotation, sometimes only 2D rotation, etc. (A good particle system, anyway. A bad particle system could use the same struct for all particles in all effects, with some fields called custom_1, custom_2 used for different purposes in different effects. That would be inefficient, inflexible and messy.)

Another example is a networking system tasked with synchronizing game objects between clients and servers. A very general such system might want to treat the objects as open JSON-like structs, with arbitrary fields and values:

{
    "score" : 100,
    "name": "Player 1"
}

We want to be able to handle such “general” or “open” objects in C++ in a nice way. Since we care about structure, we don’t want the system to be strongly coupled to the layout of the objects it manages. And since we are performance junkies, we would like to do it in a way that doesn’t completely kill performance. I.e., we don’t want everything to inherit from a base class Object and define our JSON-like objects as:

typedef std::map<std::string, Object *> OpenStruct;

Generally speaking, there are three possible levels of flexibility with which we can work with objects and types in a programming language:


1. Exact typing - Only ducks are ducks


We require the object to be of a specific type. This is the typing method used in C and for classes without inheritance in C++.

2. Interface typing - If it says it’s a duck


We require the object to inherit from and implement a specific interface type. This is the typing method used by default in Java and C# and in C++ when inheritance and virtual methods are used. It is more flexible than the exact approach, but it still introduces a coupling, because it forces the objects we manage to inherit from a type defined by us.

Side rant: My general opinion is that while inheriting interfaces (abstract classes) is a valid and useful design tool, inheriting implementations is usually little more than a glorified “hack”, a way of patching parent classes by inserting custom code here and there. You almost always get a cleaner design when you build your objects with composition instead of with implementation inheritance.

3. Duck typing - If it quacks like a duck


We don’t care about the type of the object at all, as long as it has the fields and methods that we need. An example:

      def integrate_position(o, dt):
          o.position = o.position + o.velocity * dt

This method integrates the position of the object o. It doesn’t care what the type of o is, as long as it has a “position” field and a “velocity” field.

Duck typing is the default in many “scripting” languages such as Ruby, Python, Lua and JavaScript. The reflection interface of Java and C# can also be used for duck typing, but unfortunately the code tends to become far less elegant than in the scripting languages:

      o.GetType().GetProperty("Position").SetValue(o, o.GetType().
         GetProperty("Position").GetValue(o, null) + o.GetType().
         GetProperty("Velocity").GetValue(o, null) * dt, null)

What we want is some way of doing “duck typing” in C++.

Let’s look at inheritance and virtual functions first, since that is the standard way of “generalizing” code in C++. It is true that you could do general objects using the inheritance mechanism. You would create a class structure looking something like:

class Object {...};
class Int : public Object {...};
class Float : public Object {...};

and then use dynamic_cast or perhaps your own hand-rolled RTTI system to determine an object’s class.

But there are a number of drawbacks with this approach. It is quite verbose. The virtual inheritance model requires objects to be treated as pointers so they (probably) have to be heap allocated. This makes it tricky to get a good memory layout. And that hurts performance. Also, they are not PODs so we will have to do extra work if we want to move them to a co-processor or save them to disk.

So I prefer something much simpler. A generic object is just a type enum followed by the data for the object:

[type enum] [object data …]

To pass the object you just pass its pointer. To make a copy, you make a copy of the memory block. You can also write it straight to disk and read it back, send it over network or to an SPU for off-core processing.
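
For example, creating a float object in a raw memory buffer could look something like this (a minimal sketch; the enum value and the function name are illustrative, not from the original post):

#include <string.h>

enum {FLOAT_TYPE = 1};

// Pack a float object into a caller-provided buffer of at least 8 bytes:
// a 4-byte type enum followed by the 4-byte float value.
void pack_float(char *buffer, float value)
{
    unsigned type = FLOAT_TYPE;
    memcpy(buffer, &type, 4);
    memcpy(buffer + 4, &value, 4);
}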

To extract the data from the object you would do something like:

unsigned type = *(unsigned *)o;      // o is a char * to the object's memory block
if (type == FLOAT_TYPE) {
    float f = *(float *)(o + 4);     // the data follows the 4-byte type enum
    // ... use f ...
}

You don’t really need that many different object types: bool, int, float, vector3, quaternion, string, array and dictionary is usually enough. You can build more complicated types as aggregates of those, just as you do in JSON.

For a dictionary object we just store the name/key and type of each object:

[key_1] [type_1] [data_1] [key_2] [type_2] [data_2] …

I tend to use a four byte value for the name/key and not care if it is an integer, float or a 32-bit string hash. As long as the data is queried with the same key that it was stored with, the right value will be returned. I only use this method for small structs, so the probability for a hash collision is close to zero and can be handled by “manual resolution”.
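
For example, the 32-bit key could come from any decent string hash; FNV-1a is shown here as one possibility (not necessarily what the original engine uses):

// Hashes a field name into a 4-byte key (32-bit FNV-1a).
unsigned hash_key(const char *s)
{
    unsigned h = 2166136261u;
    while (*s)
        h = (h ^ (unsigned char)*s++) * 16777619u;
    return h;
}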

If we have many objects with the same “dictionary type” (i.e. the same set of fields, just different values) it makes sense to break out the definition of the type from the data itself to save space:

Type description: [key_1] [type_1] [offset_1] [key_2] [type_2] [offset_2] …
Data: [data_1] [data_2] …

Here the offset field stores the offset of each field in the data block. Now we can efficiently store an array of such data objects with just one copy of the dictionary type information:

[type description] [object_1 data] [object_2 data] [object_3 data] …

Note that the storage space (and thereby the cache and memory performance) is exactly the same as if we were using an array of regular C structs, even though we are using a completely open free form JSON-like struct. And extracting or changing data just requires a little pointer arithmetic and a cast.
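
In code, a type description and a field accessor might look something like this (a minimal sketch; the struct names and the fixed field limit are illustrative, not from the original post):

struct FieldDesc
{
    unsigned key;     // 32-bit hash of the field name
    unsigned type;    // type enum of the field
    unsigned offset;  // byte offset of the field within one data block
};

struct DictType
{
    unsigned num_fields;
    unsigned block_size;    // size in bytes of one object's data block
    FieldDesc fields[16];   // illustrative fixed maximum
};

// Returns a pointer to the field `key` of the i-th object in a tightly
// packed array of data blocks, or 0 if the type has no such field.
const char *field(const DictType &type, const char *array, unsigned i, unsigned key)
{
    const char *object = array + i * type.block_size;
    for (unsigned f = 0; f != type.num_fields; ++f) {
        if (type.fields[f].key == key)
            return object + type.fields[f].offset;
    }
    return 0;
}

Reading a particular field is then just a cast of the returned pointer, exactly as if you were indexing into an array of regular C structs.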

This would be a good way of storing particles in a particle system. (Note: This is an array-of-structures approach; you can of course also use duck typing with a structure-of-arrays approach. I leave that as an exercise to the reader.)

If you are a graphics programmer all of this should look pretty familiar. The “dictionary type description” is very much like a “vertex data description” and the “dictionary data” is awfully similar to “vertex data”. This should come as no big surprise. Vertex data is generic flexible data that needs to be processed fast in parallel on in-order processing units. It is not strange that with the same design criteria we end up with a similar solution.


Moral and musings

It is OK to manipulate blocks of raw memory! Pointer arithmetic does not destroy your program! Type casts are not “dirty”! Let your freak flag fly!

Data-oriented design and object-oriented design are not polar opposites. As this example shows, a data-oriented design can in a sense be “more object-oriented” than a standard C++ virtual function design, i.e., more similar to how objects work in high level languages such as Ruby and Lua.

On the other hand, data-oriented design and inheritance are enemies. Because designs based on base class pointers and virtual functions want objects to live individually allocated on the heap. Which means you cannot control the memory layout. Which is what DOD is all about. (Yes, you can probably do clever tricks with custom allocators and patching of vtables for moving or deserializing objects, but why bother, DOD is simpler.)

You could also store function pointers in these open structs. Then you would have something very similar to Ruby/Lua objects. This could probably be used for something great. This is left as an exercise to the reader.

Friday, February 11, 2011

Managing Coupling Part 2 — Polling, Callbacks and Events


In my last post, I talked a bit about the importance of decoupling and how one of the fundamental challenges in system design is to keep systems decoupled while still allowing the necessary interactions to take place.

This time I will look at one specific such challenge: when a low level system needs to notify a high level system that something has happened. For example, the animation system may want to notify the gameplay system that the character’s foot has touched the ground, so that a footstep sound can be played.

(Note that the reverse is not a problem. The high level system knows about the low level system and can call it directly. But the low level system shouldn’t know or care about the high level system.)

There are three common techniques for handling such notifications: polling, callbacks and events.

Polling

A polling system calls some function every frame to check if the event it is interested in has occurred. Has the file been downloaded yet? What about now? Are we there yet?

Polling is often considered “ugly” or “inefficient”. And indeed, in the desktop world, polling is very impolite, since it means busy-waiting and tying up 100 % of the CPU in doing nothing.

But in game development the situation is completely different. We are already doing a ton of stuff every 33 ms (or half a ton of stuff every 17 ms). As long as we don’t poll a huge amount of objects, polling won’t have any impact on the framerate.

And code that uses polling is often easier to write and ends up better designed than code that uses callbacks or events. For example, it is much easier to just check if the A key is pressed inside the character controller, than to write a callback that gets notified if A is pressed and somehow forward that information to the character controller.

So, in my opinion, you should actually prefer to use polling whenever possible (i.e., when you don’t have to monitor a huge number of objects).

Some areas where polling works well are: file downloads, server browsing, game saving, controller input, etc.
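
For example, checking on an asynchronous game save might look something like this (the SaveSystem interface here is made up for illustration):

// Hypothetical low level save system. The high level code just polls it.
class SaveSystem
{
public:
    int  start_save(const char *slot);   // kicks off an asynchronous save
    bool is_done(int save_id);           // true once the save has completed
};

// Called once per frame by the high level UI code.
void update_save_icon(SaveSystem &saves, int save_id, bool &show_icon)
{
    // No callback registration, no observer objects. Just ask.
    show_icon = !saves.is_done(save_id);
}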

An area less suited for polling is physics collisions, since there are N*N possible collisions that you would have to poll for. (You could argue that rather than polling for a collision between two specific objects, you could poll for a collision between any two objects. My reply would be that in that case you are no longer strictly polling, you are in fact using a rudimentary event system.)

Callbacks

In a callback solution, the low level system stores a list of high level functions to call when certain events occur.

An important question when it comes to callbacks is if the callback should be called immediately when the event occurs, or if it should be queued up and scheduled for execution later in the frame.

I much prefer the latter approach. If you do callbacks immediately you not only trash your instruction and data caches. You also prevent multithreading (unless you use locks everywhere to prevent the callbacks from stepping on each other). And you open yourself up to the nasty bug where a callback through a chain of events ends up destroying the very objects you are looping over.

It is much better to queue up all callbacks and only execute them when the high level system asks for it (with an execute_callbacks() call). That way you always know when the callbacks occur. Side effects can be minimized and the code flow is clearer. Also, with this approach there is no problem with generating callbacks on the SPU and merging the queue with other callback queues later.
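
A minimal sketch of such a delayed callback queue (the struct and class names are illustrative; the post's own Callback16 record appears further down):

#include <vector>

struct QueuedCallback
{
    void (*f)(void *data, unsigned event);
    void *data;
    unsigned event;
};

class AnimationSystem
{
public:
    // The low level system just records the callback when the event occurs.
    void queue_callback(const QueuedCallback &cb) {_queue.push_back(cb);}

    // The high level system decides when the queued callbacks actually run.
    void execute_callbacks()
    {
        for (unsigned i = 0; i != _queue.size(); ++i)
            _queue[i].f(_queue[i].data, _queue[i].event);
        _queue.clear();
    }

private:
    std::vector<QueuedCallback> _queue;
};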

The only thing you need to worry about with delayed callbacks is that the objects that the callback refers to might have been destroyed between the time when the callback was generated and the time when it was actually called. But this is neatly handled by using the ID reference system that I talked about in the previous post. Using that technique, the callback can always determine if the objects still exist.

Note that the callback system outlined here has some similarities with the polling system — in that the callbacks only happen when we explicitly poll for them.

It is not self-evident how to represent a callback in C++. You might be tempted to use a member function pointer. Don’t. The casting and typing rules make it near impossible to use them for any kind of generic callback mechanism. Also, don’t use an “observer pattern”, where the callback must be some object that inherits from an AnimationEventObserver class and overrides handle_animation_event(). That just leads to tons of typing and unnecessary heap allocation.

There is an interesting article about fast and efficient C++ delegates at http://www.codeproject.com/KB/cpp/FastDelegate.aspx. It looks solid, but personally I’m not comfortable with making something that requires so many platform specific tricks one of the core mechanisms of my engine. 

So instead I use regular C function pointers for callbacks. This means that if I want to call a member function, I have to make a little static function that calls the member function. That is a bit annoying, but better than the alternatives.

(Isn’t it interesting that when you try to design a clean and flexible C++ API, it often ends up as pure C?)

When you use C callbacks you typically also want to pass some data to them. The typical approach in the C world is to use a void * to “user data” that is passed to the callback function. I actually prefer a slightly different approach. Since I sometimes want to pass more data than a single void * I use something like this:

struct Callback16
{
  void (*f)(void);
  char data[12];
};

There isn’t a huge number of callbacks, so using 16 bytes instead of 8 to store them doesn’t matter. You could go to Callback32 if you want the option to store even more data.

When calling the callback, I cast the function pointer to the appropriate type and pass a pointer to its data as the first parameter.

typedef void (*AnimationEventCallback)(void *, unsigned);
AnimationEventCallback f = (AnimationEventCallback)callback.f;
f(callback.data, event_id);

I’m not worried about casting the function pointer back and forth between a generic type and a specific one or about casting the data in and out of a raw buffer. Type safety is nice, but there is an awful lot of power in juggling blocks of raw memory. And you don’t have to worry that much about someone casting the data to the wrong type, because doing so will 99% of the time cause a huge spectacular crash, and the error will be fixed immediately.
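
As an illustration, here is how the “little static function” mentioned above might be wired up to a Callback16 (the class and function names are made up):

#include <string.h>

class FootstepPlayer
{
public:
    void play(unsigned event_id);

    // Static trampoline matching the C callback signature.
    static void play_callback(void *data, unsigned event_id)
    {
        FootstepPlayer *self = *(FootstepPlayer **)data;
        self->play(event_id);
    }
};

Callback16 make_footstep_callback(FootstepPlayer *player)
{
    Callback16 cb;
    cb.f = (void (*)(void))&FootstepPlayer::play_callback;
    memcpy(cb.data, &player, sizeof(player));   // store the object pointer in the data block
    return cb;
}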

Events

Event systems are in many ways similar to callback systems. The only difference is that instead of storing a direct pointer to a callback function, they store an event enum. The high level system that polls the events decides what action to take for each enum.

In my opinion, callbacks work better when you want to listen to specific notifications: “Tell me when this sound has finished playing.” Events work better when you process them in bulk: “Check all collision notifications to see if the forces involved are strong enough to break the objects.” But much of it is a matter of taste.

For storing the event queues (or callback queues) I just use a raw buffer (Vector<char> or char[FIXED_SIZE]) where I concatenate all events and their data:

[event_1_enum] [event_1_data] [event_2_enum] [event_2_data] …

The high level system just steps through this buffer, processing each event in turn. Note that event queues like this are easy to move, copy, merge and transfer between cores. (Again, the power of raw data buffers.)
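
A sketch of what stepping through the buffer might look like (the event enum and the payload struct are made up for illustration):

enum {COLLISION_EVENT = 0};

struct CollisionData {unsigned actor_a; unsigned actor_b; float force;};

// Steps through a queue laid out as [event enum][event data][event enum][event data]...
void process_physics_events(const char *p, const char *end)
{
    while (p < end) {
        unsigned event = *(const unsigned *)p;
        p += 4;
        if (event == COLLISION_EVENT) {
            const CollisionData &d = *(const CollisionData *)p;
            p += sizeof(CollisionData);
            if (d.force > 100.0f) {
                // Strong enough to break the objects, so act on it here.
            }
        }
        // Other event types would step past their own data in the same way.
    }
}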

In this design there is only a single high level system that polls the events of a particular low level system. It understands what all the events mean, what data they use and knows how to act on them. The sole purpose of the event system (it is not even much of a “system”, just a stream of data) is to pass notifications from the low level to the high.

This is in my opinion exactly what an event system should be. It should not be a magic global switchboard that dispatches events from all over the code to whoever wants to listen to them. Because that would be horrid!