<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-1994130783874175266</id><updated>2012-01-29T21:22:29.255+01:00</updated><title type='text'>Bitsquid</title><subtitle type='html'></subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://bitsquid.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://bitsquid.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Niklas</name><uri>http://www.blogger.com/profile/10055379994557504977</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>54</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-1994130783874175266.post-3844357319206593951</id><published>2012-01-22T10:49:00.000+01:00</published><updated>2012-01-22T10:49:00.719+01:00</updated><title type='text'>Sensible Error Handling: Part 1</title><content type='html'>To err is human. But it is also quite computery. 
Unfortunately, error handling tends to bring out the worst in APIs.&lt;br /&gt;&lt;br /&gt;Error handling is what makes your program go from something nice, clear and readable such as:&lt;br /&gt;&lt;br /&gt;&lt;pre lang="cpp"&gt;Stuff s = open_something(x);&lt;br /&gt;int len = get_size(s);&lt;br /&gt;...&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;To some horrible monstrosity such as:&lt;br /&gt;&lt;br /&gt;&lt;pre lang="cpp" escaped="true"&gt;Stuff s;&lt;br /&gt;int err = open_something(x, &amp;amp;s);&lt;br /&gt;if (err == E_STUFFNOTFOUND) {&lt;br /&gt;    fprintf(stderr, "Something was not found");&lt;br /&gt;    goto exit;&lt;br /&gt;} else if (err == E_INVAL) {&lt;br /&gt;    fprintf(stderr, "Something was invalid");&lt;br /&gt;    goto exit;&lt;br /&gt;} else if (err == E_RETRY || err == E_COMPUTERNOTINTHEMOOD) &lt;br /&gt;    goto exit;&lt;br /&gt;int len = 0;&lt;br /&gt;err = get_size(s, &amp;amp;len);&lt;br /&gt;if (err == E_HULLNOTPOLARIZED)&lt;br /&gt;    goto close_and_exit;&lt;br /&gt;...&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;In this article (and the follow-up) I’m going to discuss how I think you should design systems so that the error handling is as sensible as possible and the burden on the callers is minimized.&lt;br /&gt;&lt;br /&gt;Note that I’m discussing this from the perspective of game development where errors will never cause serious damage to humans or property (I’m disregarding the keyboards smashed in frustration when a game crashes during the final minutes of a three-hour boss fight).&lt;br /&gt;&lt;br /&gt;&lt;h2&gt;Types of Errors&lt;/h2&gt;&lt;br /&gt;There are three main types of errors that we need to deal with:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt; &lt;li&gt;Expected errors&lt;/li&gt; &lt;li&gt;Unexpected errors&lt;/li&gt; &lt;li&gt;Warnings&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;By an &lt;i&gt;expected error&lt;/i&gt; I mean any kind of error that happens in a situation where the caller can reasonably expect that something might go wrong and has a 
plan for dealing with that. The most typical example is network code. Since the network may die at any time, the caller cannot just call &lt;i&gt;fetch_web_page()&lt;/i&gt; and assume that she will get a valid result. She must always check for and be prepared to handle errors.&lt;br /&gt;&lt;br /&gt;An &lt;i&gt;unexpected error&lt;/i&gt; is an error that happens when the caller has no reason to assume that something might go wrong. A typical example might be a &lt;i&gt;NULL&lt;/i&gt; pointer returned by an allocator that is out of memory or a corrupted internal state caused by a buffer overflow problem.&lt;br /&gt;&lt;br /&gt;What errors can be considered ”expected” depends on context. When opening a saved game or a user config file, &lt;i&gt;File Not Found&lt;/i&gt; might be an expected error, because we can expect the user to muck around with those files. When opening our main .&lt;i&gt;pak&lt;/i&gt; bundles, &lt;i&gt;File Not Found&lt;/i&gt; is an unexpected error, because we don’t expect the user to partially delete an installed game. And besides, there is not much we can do beyond displaying an error message if our data isn’t there.&lt;br /&gt;&lt;br /&gt;A &lt;i&gt;warning&lt;/i&gt; happens when someone has done something that is kind-of sort-of bad, probably, but we are able to continue running without any ill effects. An example might be a call to a deprecated function.&lt;br /&gt;&lt;br /&gt;&lt;h2&gt;Unexpected Errors&lt;/h2&gt;&lt;br /&gt;The unexpected errors are the most common ones. Expected errors only happen in a few well-defined places, such as network code. Unexpected errors can happen everywhere. 
It is always safe to assume that your program contains lots of bugs that you have no idea about.&lt;br /&gt;&lt;br /&gt;My policy for handling unexpected errors is simple:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;Crash the engine as soon as possible with an informative error message.&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;This may seem like a totally irresponsible thing to do. Crashing is... bad, right?&lt;br /&gt;&lt;br /&gt;Actually it is exactly the opposite.&lt;br /&gt;&lt;br /&gt;If we didn’t crash it would be up to the caller to handle the error. So the programmer writing that code wouldn’t only have to think about what she wanted to achieve with our API, but also in what ways our code might fail and how she would have to handle that. That is more work and leads to cluttered code, as in the example above. It is also nearly impossible to do in a good way. Remember, these are unexpected errors. Anything might happen.&lt;br /&gt;&lt;br /&gt;By crashing, the API is &lt;i&gt;taking full responsibility&lt;/i&gt; for performing what the caller asks of it. We are saying: either we will do what you wanted or, if there is a problem with that, we will deal with that too. In either case, you don’t have to worry about it.&lt;br /&gt;&lt;br /&gt;Crashing makes APIs simpler and reduces the mental burden of the caller. Here is what a file API might look like if designed with the ”crash”-philosophy in mind.&lt;br /&gt;&lt;br /&gt;&lt;pre lang="cpp"&gt;bool exists(const char *path);&lt;br /&gt;Archive open(const char *path);&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Note the curious absence of any error codes. If the caller passes a malformed path, we crash; we do not return an &lt;i&gt;E_INVALIDARGUMENT&lt;/i&gt; error. If the file doesn’t exist, we crash. The caller is responsible for using &lt;i&gt;exists()&lt;/i&gt; to check for files that might not exist. 
There are no errors for the caller to handle and the code will be clean and readable.&lt;br /&gt;&lt;br /&gt;Since life is so much simpler for the caller when she doesn’t have to think about errors, we write our code with that in mind. Instead of functions returning error codes, such as:&lt;br /&gt;&lt;br /&gt;&lt;pre lang="cpp" escaped="true"&gt;/// Returns E_PARSE_ERROR on badly formatted Json, E_NULL if&lt;br /&gt;/// passed a null pointer, E_OVERFLOW if too big, etc.&lt;br /&gt;int parse_json_number(const char *s, double &amp;amp;number);&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;we have functions that crash on errors:&lt;br /&gt;&lt;br /&gt;&lt;pre lang="cpp"&gt;double parse_json_number(const char *s);&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;In most cases this is all we need, because we expect the Json to be well formed. If it isn't, some other part of our tech has made an error that needs to be fixed. If we had any situations where we could expect bad Json (perhaps hand-entered through the in-game console), we would add a validating function:&lt;br /&gt;&lt;br /&gt;&lt;pre lang="cpp"&gt;bool is_valid_json_number(const char *s);&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Now we can have some code that deals with bad data without forcing error handling into all our code.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;But do we really need to crash?&lt;/h3&gt;&lt;br /&gt;At this point, some people will probably agree with most of the things I say, but still feel uneasy about crashing. Because crashing is... bad, right? Nobody wants to be the programmer that crashed the engine. Surely it is better to write a really serious, really super-stern error message that can’t be ignored but then try to patch things up and soldier on so that we don’t crash. If a file doesn’t exist perhaps we can pretend that it did exist but was empty. If the Json we tried to parse was malformed, perhaps we can just return the part of it that we managed to parse. 
If the caller wants to access data beyond the end of an array, perhaps we can just return the last element.&lt;br /&gt;&lt;br /&gt;No thanks.&lt;br /&gt;&lt;br /&gt;I have two problems with this.&lt;br /&gt;&lt;br /&gt;First, this makes programmers expend a lot of mental energy thinking about how to patch up an erroneous state. Most likely, this work is completely futile. They won’t be able to think about all the errors that might possibly occur. The attempts at patching things up will probably just cause a cascade of other errors and a more serious (and confusing) crash later on. And the ”error fixing” code will be strange and ugly. More code is always a burden, a cost. Let’s not spend it on magically patching up errors in ways that won’t work. Let’s focus on fixing the errors instead. &lt;br /&gt;&lt;br /&gt;Second, I don’t care how stern your error message is, I promise you it will be ignored. If it happens infrequently, if it is just on one machine, if it is in a new system, if we just need to send these screen shots off today, if a deadline is coming up, if we’re past the deadline, if there’s another deadline. It will be ignored. Your code will gather more and more errors that don’t get fixed, until it is a glitchy, horrible mess.&lt;br /&gt;&lt;br /&gt;That’s why I love crashing. It is an error that can’t be ignored. Of course it is unacceptable for an engine to crash. And that’s why the error will be fixed. Which will make everybody happier in the long run. Crashes improve the production process and lead to better quality code.&lt;br /&gt;&lt;br /&gt;Nobody wants the game to crash for the end user, but the way to achieve that is with testing and bug fixing, not by finding ways of ignoring the errors that you detect.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Exceptions&lt;/h3&gt;&lt;br /&gt;Rather than crashing, isn’t it better to throw an exception? If the exception isn’t caught we get a crash, just as before. 
But we also have the option, if we really want to, to catch the exception and handle the error. It would seem that by using exceptions we can have our cake and eat it too.&lt;br /&gt;&lt;br /&gt;Low-level programmers tend to abhor exceptions because they come with some performance overheads, even when they aren’t thrown. I’m not actually sure what the current status is, whether this is something that you still have to worry about or if exceptions are ”fast enough” on all current compilers and platforms.&lt;br /&gt;&lt;br /&gt;I haven’t needed to care about that, because I dislike exceptions for the complexity they add. The crash model is dead simple: the code either works or not. The caller knows that she is not responsible for any error handling.&lt;br /&gt;&lt;br /&gt;With exceptions, this clear and useful distinction between expected and unexpected errors is muddled and the caller is faced with a number of questions:&lt;br /&gt;&lt;br /&gt;This function throws exceptions. Do I need to handle those? What kind of exceptions might it throw? Even if I don’t catch the exception, might someone higher up in the call hierarchy do it? Does this mean that I need to write all my code so that the state is valid if an exception is thrown somewhere (might be anywhere, really) by one of the functions I call? What if I’m in a constructor? What if I’m in a destructor?&lt;br /&gt;&lt;br /&gt;By using exceptions instead of just crashing we are creating a more complicated API (the API now includes all the different exceptions that the different functions might throw) and significantly increasing the mental burden on the caller for very little gain.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Good error reports&lt;/h3&gt;&lt;br /&gt;When we crash, we try to create an error message and a log report that is as informative as possible to facilitate debugging of the problem. 
Our reports always include:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt; &lt;li&gt;A description of the error&lt;/li&gt; &lt;li&gt;The call stack&lt;/li&gt; &lt;li&gt;The error context&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;We use printf-formatting to create the error message. Note that the C preprocessor supports variadic macros, so you can create macros that work like printf:&lt;br /&gt;&lt;br /&gt;&lt;pre lang="cpp"&gt;#if defined(DEVELOPMENT)&lt;br /&gt;    #define XASSERT(test, msg, ...) do {if (!(test)) error(__LINE__, __FILE__, \&lt;br /&gt;        "Assertion failed: %s\n\n" msg, #test,  __VA_ARGS__);} while (0)&lt;br /&gt;#else&lt;br /&gt;    #define XASSERT(test, msg, ...) ((void)0)&lt;br /&gt;#endif&lt;br /&gt;&lt;br /&gt;XASSERT(exists(file), "File %s does not exist", file);&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Call stack generation and translation from raw addresses to file names and line numbers is platform specific and a lot more cumbersome than it ought to be. But it is still well worth doing. Call stacks let you diagnose many errors at a glance. It is a lot faster than loading up crash dumps in the debugger.&lt;br /&gt;&lt;br /&gt;On Windows, use &lt;i&gt;StackWalk64&lt;/i&gt; to generate the call stack and the &lt;i&gt;Sym*&lt;/i&gt; functions to translate it.&lt;br /&gt;&lt;br /&gt;The error context is our way of providing contexts for error messages. The problem is that sometimes crashes happen in deeply nested code that doesn’t have all the information we would like to give to the user. For example:&lt;br /&gt;&lt;br /&gt;&lt;pre lang="cpp"&gt;double parse_json_number(const char *s);&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;If there is a parse error, it would be very helpful for the user to know in which file the error occurred. But the &lt;i&gt;parse_json_number&lt;/i&gt; function doesn’t know that. It doesn’t even know if there is a file. 
It might have been asked to parse data from the network or from memory.&lt;br /&gt;&lt;br /&gt;If we were using exceptions we could handle this by catching the exception at a higher level, adding some information to it (such as the file name) and rethrowing it. But that is rather tedious and also tricky to do in a good way. If we want to add the information to the original exception, then it must already have members for all the possible information that all higher level functions might want to add. That’s a bit strange. Should we throw a new exception? Then the exception gets thrown from the ”wrong place”. The result of all this is that people seldom bother ”decorating” their exceptions in this way. At least I’ve never seen a code base that does it systematically.&lt;br /&gt;&lt;br /&gt;What we do instead is to allow the programmer to define error contexts using scope variables:&lt;br /&gt;&lt;br /&gt;&lt;pre lang="cpp"&gt;void init(const char *file)&lt;br /&gt;{&lt;br /&gt;    ErrorContext ec("Parsing JSON:", file);&lt;br /&gt;    JsonDoc *doc = parse_json(file);&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;The error contexts get stored on a stack:&lt;br /&gt;&lt;br /&gt;&lt;pre lang="cpp" escaped="true"&gt;__THREAD Array&amp;lt;const char *&gt; *_error_context_name;&lt;br /&gt;__THREAD Array&amp;lt;const char *&gt; *_error_context_data;&lt;br /&gt;&lt;br /&gt;class ErrorContext&lt;br /&gt;{&lt;br /&gt;public:&lt;br /&gt;    ErrorContext(const char *name, const char *data) {&lt;br /&gt;        _error_context_name-&gt;push_back(name);&lt;br /&gt;        _error_context_data-&gt;push_back(data);&lt;br /&gt;    }&lt;br /&gt;    ~ErrorContext() {&lt;br /&gt;        _error_context_name-&gt;pop_back();&lt;br /&gt;        _error_context_data-&gt;pop_back();&lt;br /&gt;    }&lt;br /&gt;};&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Note that we only store string pointers, not the full string data. 
We assume that whatever string the user gives us lives in the same scope as the error context and is valid as long as the error context is. This means that setting the error context just requires pushing 8 bytes to a stack, so the performance overhead is very small.&lt;br /&gt;&lt;br /&gt;Note also that the stack uses thread local storage, so we have separate error context stacks for our different execution threads.&lt;br /&gt;&lt;br /&gt;When an error occurs, we print all the contexts in the stack, giving the user a good idea of where the error occurred:&lt;br /&gt;&lt;br /&gt;&lt;pre lang="text"&gt;When spawning level: big_world&lt;br /&gt;When spawning unit: big_bird&lt;br /&gt;When applying material: feathers&lt;br /&gt;Assertion failed: texture != NULL&lt;br /&gt;    Texture not loaded: yellow_feathers&lt;br /&gt;    In material_manager.cpp:1337&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;h3&gt;Next time&lt;/h3&gt;&lt;br /&gt;Next time, I’ll look at the other kinds of errors: expected errors and warnings.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1994130783874175266-3844357319206593951?l=bitsquid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bitsquid.blogspot.com/feeds/3844357319206593951/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bitsquid.blogspot.com/2012/01/sensible-error-handling-part-1.html#comment-form' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/3844357319206593951'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/3844357319206593951'/><link rel='alternate' type='text/html' href='http://bitsquid.blogspot.com/2012/01/sensible-error-handling-part-1.html' title='Sensible Error Handling: Part 
1'/><author><name>Niklas</name><uri>http://www.blogger.com/profile/10055379994557504977</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1994130783874175266.post-4650814670661042597</id><published>2012-01-06T13:17:00.002+01:00</published><updated>2012-01-06T13:17:59.118+01:00</updated><title type='text'>5 Tips for Programmer Productivity</title><content type='html'>&lt;h2&gt;1. Embrace the now-principle&lt;/h2&gt;&lt;br /&gt;If something takes less than five minutes to do, do it immediately.&lt;br /&gt;&lt;br /&gt;It seems like the lazy option, but postponing something actually takes a lot of effort. The task needs to be written down somewhere. Then you need to track it and prioritize it with respect to other tasks. You will probably think &lt;em&gt;about&lt;/em&gt; doing it lots of times, before you actually get down to &lt;em&gt;doing&lt;/em&gt; it. And then you have to understand what you meant when you wrote it down and try to get back in that same mindset.&lt;br /&gt;&lt;br /&gt;For small tasks it is just not worth it.&lt;br /&gt;&lt;br /&gt;Instead, just do it. Fix the issue right now while you are already thinking about it. It is faster, simpler and saves you the agony of an ever growing todo-list.&lt;br /&gt;&lt;br /&gt;&lt;h2&gt;2. Fix the cause, not just the symptom&lt;/h2&gt;&lt;br /&gt;Don’t just fix problems. Fix the &lt;em&gt;processes&lt;/em&gt; that allowed the problems to occur, so that the same problems never occur again. 
See bugs not as nuisances, but as chances to improve your processes and increase the quality of your code.&lt;br /&gt;&lt;br /&gt;If an artist tells you that she gets an ”Error when compiling unit” error, don’t just diagnose it and tell her: ”It’s because you have two nodes with the same name, that is not allowed.” At the &lt;em&gt;very least&lt;/em&gt; fix the error message so that it says ”Error when compiling unit ‘bed’. Two nodes have the same name ‘pillow’. One of them must be renamed so that names are unique.” Even better, fix the exporter or the tool, so that it is &lt;em&gt;impossible&lt;/em&gt; for the artist to create two nodes with the same name.&lt;br /&gt;&lt;br /&gt;If you find an error that could have been caught by an assert, then add that assert so that it will find the error next time.&lt;br /&gt;&lt;br /&gt;If someone asks you ”How can I configure the animation compression?”, don’t just answer them. Also write a short text that explains how it is done, and &lt;em&gt;add that text to the documentation&lt;/em&gt;.&lt;br /&gt;&lt;br /&gt;In this way, you are not just patching holes and fixing leaks, you are actively making things better. This not only pleases the people who come to you with problems, it also makes your work feel more meaningful.&lt;br /&gt;&lt;br /&gt;&lt;h2&gt;3. Try not to break concentration while ”the computer is working”&lt;/h2&gt;&lt;br /&gt;The job of programming is filled with a lot of weird little micro pauses. The code is compiling. The console is rebooting. The level is loading. The client is connecting. Etc.&lt;br /&gt;&lt;br /&gt;In the best of worlds, these pauses would not exist, and by all means, do all you can to get rid of them. Make your code build faster. Hot reload data and scripts. Make a tool for quickly setting up a bunch of PS3s for a network test. Etc.&lt;br /&gt;&lt;br /&gt;But even with the best of efforts, some pauses will likely remain. The question is what to do with them. 
The temptation is to take a short break from programming and do something else: check mail, answer a Skype, read two paragraphs of an interesting article, update the twitter feed, etc.&lt;br /&gt;&lt;br /&gt;For me, these constant mental context switches can put a real damper on productivity, since they make it impossible to maintain concentration and flow.&lt;br /&gt;&lt;br /&gt;Nor are these micro-excursions particularly relaxing. Reading two paragraphs of a web page while constantly glancing at a progress bar on the other monitor is not something that soothes my mind. Quite the contrary, it is much more stressful than remaining in the zen-like state of concentrated work. It is much better to take one real break than a hundred micro breaks.&lt;br /&gt;&lt;br /&gt;So for both productivity and peace of mind I now make a conscious effort to stay focused on the problem at hand while ”the computer is working”. I have a &lt;a href="http://www.sublimetext.com/2"&gt;separate text editor&lt;/a&gt;, unaffected by IDE freezes, where I can work on related tasks, such as:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Adding documentation&lt;/li&gt;&lt;li&gt;Refactoring and code review&lt;/li&gt;&lt;li&gt;Planning the next stage of implementation&lt;/li&gt;&lt;li&gt;Writing script code that tests the system&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;It is still an ongoing battle against the Lure of the Internet, but I find that when I manage to stay focused I am both more productive and more relaxed.&lt;br /&gt;&lt;br /&gt;&lt;h2&gt;4. Use source control even more than you think you should&lt;/h2&gt;&lt;br /&gt;Source control is not just for source code. With modern distributed source control systems such as Mercurial and Git it is dead simple to create a source repository anywhere and then later (if needed) push it to a server for backup/sharing.&lt;br /&gt;&lt;br /&gt;Do you have configuration and settings files for your text editor, IDE, etc? 
Put them in source control so that you can easily share them between your different machines. Do they need to be installed in special locations? Put them in source control together with a script that installs them in the right place.&lt;br /&gt;&lt;br /&gt;Do you use any third party libraries such as zlib, LuaJIT or stb_vorbis? Check them into source control. That way, if you have to do any modifications (bug fixes, fixes for compiler warnings, platform fixes, your own personal optimizations, etc) you will know exactly what you have changed. If a new version of the library is released you can use the source control diff to see what has changed upstream and merge it with your own local changes.&lt;br /&gt;&lt;br /&gt;Does an API come with sample code? Before you start playing with that sample code, check it into source control. That way, you can always revert the samples back to their pristine state, without having to reinstall the API. And if you find a bug in the API and manage to reproduce it in one of the samples, you can use the source control tool to produce a .patch file for the sample that you can send to the API manufacturers as part of your bug report. That will keep both you and them happy.&lt;br /&gt;&lt;br /&gt;&lt;h2&gt;5. Monitor your builds&lt;/h2&gt;&lt;br /&gt;Set up a build server that continuously builds all your executables (engine, tools, exporters, ...) in all configurations (debug, development, release) for all platforms, so you know as soon as possible if something breaks. Fixing a problem right away is much easier than doing it two months down the line.&lt;br /&gt;&lt;br /&gt;The build server doesn’t have to be a complicated thing. It is more important that it exists than that it has all the bells and whistles. If you don’t have time to do something advanced &lt;a href="http://altdevblogaday.com/2011/05/11/write-a-script-for-it/"&gt;just write a script&lt;/a&gt; that compiles everything and reports the result. 
You can expand on that later.&lt;br /&gt;&lt;br /&gt;Do the same for content. Write a script that loads all levels and spawns all units.&lt;br /&gt;&lt;br /&gt;Use the report system that works best for you. We use Skype for internal real-time communication, so it makes sense to report broken builds over Skype. If e-mail or IRC works better for you, use that instead.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1994130783874175266-4650814670661042597?l=bitsquid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bitsquid.blogspot.com/feeds/4650814670661042597/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bitsquid.blogspot.com/2012/01/5-tips-for-programmer-productivity.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/4650814670661042597'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/4650814670661042597'/><link rel='alternate' type='text/html' href='http://bitsquid.blogspot.com/2012/01/5-tips-for-programmer-productivity.html' title='5 Tips for Programmer Productivity'/><author><name>Niklas</name><uri>http://www.blogger.com/profile/10055379994557504977</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1994130783874175266.post-2585709356068731600</id><published>2011-12-28T12:53:00.000+01:00</published><updated>2011-12-28T12:53:56.820+01:00</updated><title type='text'>Code Share: Patch link.exe to ignore LNK4099</title><content type='html'>By default, Visual Studio's &lt;i&gt;link.exe&lt;/i&gt; does not let you ignore the linker 
warning &lt;a href="http://msdn.microsoft.com/en-us/library/b7whw3f3(v=vs.80).aspx"&gt;LNK4099&lt;/a&gt; (PDB file was not found).&lt;br /&gt;&lt;br /&gt;This can be a real nuisance when you have to link with third party libraries that reference (but do not come with) PDBs. You can get hundreds of linker warnings that you have no way of getting rid of.&lt;br /&gt;&lt;br /&gt;The only way I've found of fixing the problem is to patch &lt;i&gt;link.exe&lt;/i&gt; so that it allows warning 4099 to be ignored. Luckily, it is not as scary as it sounds. You only need to patch a single location to remove 4099 from a list of warnings that cannot be ignored. An outline of the procedure can be found &lt;a href="http://www.bottledlight.com/docs/lnk4099.html"&gt;here&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Following my general philosophy to &lt;a href="http://altdevblogaday.com/2011/05/11/write-a-script-for-it/"&gt;write-a-script-for-it&lt;/a&gt; I wrote a short ruby script that does the patching. I'm sharing it here for everybody who wants to do voodoo on their &lt;i&gt;link.exe&lt;/i&gt; and get rid of the warning.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://pastebin.com/RrkbXYZu"&gt;(Click here for pastebin version.)&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;# This ruby program will patch the linker executable (link.exe)&lt;br /&gt;# so that linker warning LNK4099 is ignorable.&lt;br /&gt;#&lt;br /&gt;# Reference: http://www.bottledlight.com/docs/lnk4099.html&lt;br /&gt;&lt;br /&gt;require "fileutils"&lt;br /&gt;&lt;br /&gt;def link_exes()&lt;br /&gt; res = []&lt;br /&gt; res &lt;&lt; File.join(ENV["VS90COMNTOOLS"], "../../VC/bin/link.exe") if ENV["VS90COMNTOOLS"]&lt;br /&gt; res &lt;&lt; File.join(ENV["VS100COMNTOOLS"], "../../VC/bin/link.exe") if ENV["VS100COMNTOOLS"]&lt;br /&gt; res &lt;&lt; File.join(ENV["XEDK"], "bin/win32/link.exe") if ENV["XEDK"]&lt;br /&gt; return res&lt;br /&gt;end&lt;br /&gt;&lt;br /&gt;def patch_link_exe(exe)&lt;br /&gt; data = nil&lt;br /&gt; 
File.open(exe, "rb") {|f| data = f.read}&lt;br /&gt; unpatched = [4088, 4099, 4105].pack("III")&lt;br /&gt; patched = [4088, 65535, 4105].pack("III")&lt;br /&gt;&lt;br /&gt; if data.scan(patched).size &gt; 0&lt;br /&gt;  puts "* Already patched #{exe}"&lt;br /&gt;  return&lt;br /&gt; end&lt;br /&gt;&lt;br /&gt; num_unpatched = data.scan(unpatched).size&lt;br /&gt; raise "Multiple patch locations in #{exe}" if num_unpatched &gt; 1&lt;br /&gt; raise "Patch location not found in #{exe}" if num_unpatched == 0&lt;br /&gt;&lt;br /&gt; offset = data.index(unpatched)&lt;br /&gt; puts "* Found patch location #{exe}:#{offset}"&lt;br /&gt; bak = exe + "-" + Time.now.strftime("%y%m%d-%H%M%S") + ".bak"&lt;br /&gt; puts "  Creating backup #{bak}"&lt;br /&gt; FileUtils.cp(exe, bak)&lt;br /&gt; puts "  Patching exe"&lt;br /&gt; data[offset,unpatched.size] = patched&lt;br /&gt; File.open(exe, "wb") {|f| f.write(data)}&lt;br /&gt; return true&lt;br /&gt;end&lt;br /&gt;&lt;br /&gt;link_exes.each do |exe|&lt;br /&gt; patch_link_exe(exe)&lt;br /&gt;end&lt;br /&gt;&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1994130783874175266-2585709356068731600?l=bitsquid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bitsquid.blogspot.com/feeds/2585709356068731600/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bitsquid.blogspot.com/2011/12/code-share-patch-linkexe-to-ignore.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/2585709356068731600'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/2585709356068731600'/><link rel='alternate' type='text/html' href='http://bitsquid.blogspot.com/2011/12/code-share-patch-linkexe-to-ignore.html' title='Code 
Share: Patch link.exe to ignore LNK4099'/><author><name>Niklas</name><uri>http://www.blogger.com/profile/10055379994557504977</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1994130783874175266.post-6459071078257716793</id><published>2011-12-22T22:36:00.001+01:00</published><updated>2011-12-22T22:36:42.622+01:00</updated><title type='text'>Platform Specific Resources</title><content type='html'>I recently added a new feature to the BitSquid tool chain – support for source and destination platforms in the data compiler. What it means is that you can take the data for one platform (the source) and compile it to run on a different platform (the destination). So you can take the data for the mobile version of a game (with all its content optimizations) and compile it so that it runs on your development PC.&lt;br /&gt;&lt;br /&gt;This is nice for two reasons. First, access to target hardware can be limited. In a perfect world, every artist would have a dev kit for every target platform. In practice, this might not be economically possible. It might not even be &lt;em&gt;electrically&lt;/em&gt; possible (those main fuses can only take so much). 
Being able to preview and play console/handheld content on PC is better than nothing, in this less-than-perfect world.&lt;br /&gt;&lt;br /&gt;Second, since all our editors use the engine for visualization, if we have specified a handheld device as our source platform, all the editors will automatically show the resources as they will appear on that device.&lt;br /&gt;&lt;br /&gt;This new feature gives me a chance to talk a little bit about how we have implemented support for platform specific resources, something I haven’t touched on before in this blog.&lt;br /&gt;&lt;br /&gt;The BitSquid Tech uses the regular file system for its source data. A resource is identified by its name and type, both of which are determined from the path to the source file:&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/-GW4VF34Rc34/TvOirF7OP8I/AAAAAAAAAME/SxeQAukuCzM/s1600/properties_1.png" imageanchor="1" style="margin-left:1em; margin-right:1em"&gt;&lt;img border="0" height="129" width="400" src="http://4.bp.blogspot.com/-GW4VF34Rc34/TvOirF7OP8I/AAAAAAAAAME/SxeQAukuCzM/s400/properties_1.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;Note that even though the name &lt;em&gt;is&lt;/em&gt; a path, it is not treated as one, but as a unique identifier. It is hashed to a 64-bit integer by the engine and to refer to a resource you must always specify its full name (and get the same hash result). In the compiled data, the raw names don’t even exist anymore, the files are stored in flat directories indexed by the hash values.&lt;br /&gt;&lt;br /&gt;In addition to name and type a resource can also have a number of properties. 
Properties are dot-separated strings that appear before the type in the file name:&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-CfgY57Cr5T0/TvOiyD9saOI/AAAAAAAAAMQ/cnWiOo5O_kY/s1600/properties_2.png" imageanchor="1" style="margin-left:1em; margin-right:1em"&gt;&lt;img border="0" height="93" width="400" src="http://2.bp.blogspot.com/-CfgY57Cr5T0/TvOiyD9saOI/AAAAAAAAAMQ/cnWiOo5O_kY/s400/properties_2.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;Properties are used to indicate different variants of the same resource. So all these files represent variants of the same resource:&lt;br /&gt;&lt;br /&gt;&lt;pre escaped="true" lang="lua"&gt;buttons.texture&lt;br /&gt;buttons.ps3.texture&lt;br /&gt;buttons.en.x360.texture&lt;br /&gt;buttons.fr.x360.texture&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;The two most important forms of properties are &lt;em&gt;platforms&lt;/em&gt; and &lt;em&gt;languages&lt;/em&gt;. &lt;br /&gt;&lt;br /&gt;&lt;em&gt;Platform properties&lt;/em&gt; (x360, ps3, android, win32, etc) are used to provide platform specific versions of resources. This can be used for platform optimized versions of units and levels. Another use is for controller and button images that differ from platform to platform. Since BitSquid is scripted in Lua and Lua files are just a resource like any other, this can also be used for platform specific gameplay code:&lt;br /&gt;&lt;br /&gt;&lt;pre escaped="true" lang="lua"&gt;PlayerController.android.lua&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;em&gt;Language properties&lt;/em&gt; (en, fr, jp, it, sv, etc) are used for localization. Since all resources have properties, all resources can be localized.&lt;br /&gt;&lt;br /&gt;But the property system is not limited to platforms and languages. 
A developer can make up whatever properties she needs and use them to provide different variants of resources:&lt;br /&gt;&lt;br /&gt;&lt;pre escaped="true" lang="lua"&gt;bullet_hit.noblood.particle_effect&lt;br /&gt;foliage.withkittens.texture&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Properties can be resolved either at data compile time or at runtime.&lt;br /&gt;&lt;br /&gt;Platform properties are resolved at compile time. When we compile for PS3 and a resource has &lt;em&gt;ps3&lt;/em&gt;-specific variants, only those variants are included in the compiled data. (If the resource doesn’t have any ps3 variants, we include all variants that do not have a specified platform.)&lt;br /&gt;&lt;br /&gt;Language properties and other custom properties are resolved at runtime. All variants are compiled to the runtime data. When running, the game can specify what resource variants it wants with a &lt;em&gt;property preference order&lt;/em&gt;. The property preference order lists the properties the game wants, most preferred first.&lt;br /&gt;&lt;br /&gt;&lt;pre escaped="true" lang="lua"&gt;Application.set_property_preference_order {"withkittens", "noblood", "fr"}&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;This means that the game would prefer to get a resource that has lots of kittens, no blood and is in French. But if it can’t get all that, it would rather have something that is kitten-full than blood-free. 
And it prefers a bloodless English resource to a bloody French one.&lt;br /&gt;&lt;br /&gt;In other words, if we requested the resource &lt;em&gt;buttons.texture&lt;/em&gt; with these settings, the engine would look for variants in the order:&lt;br /&gt;&lt;br /&gt;&lt;pre escaped="true" lang="lua"&gt;buttons.withkittens.noblood.fr.texture&lt;br /&gt;buttons.withkittens.noblood.texture&lt;br /&gt;buttons.withkittens.fr.texture&lt;br /&gt;buttons.withkittens.texture&lt;br /&gt;buttons.noblood.fr.texture&lt;br /&gt;buttons.noblood.texture&lt;br /&gt;buttons.fr.texture&lt;br /&gt;buttons.texture&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;To add support for different source and destination platforms to this system all I had to do was to add a feature that lets the data compiler use &lt;em&gt;one&lt;/em&gt; platform for resolving properties and a &lt;em&gt;different&lt;/em&gt; platform as the format for the runtime files it produces.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1994130783874175266-6459071078257716793?l=bitsquid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bitsquid.blogspot.com/feeds/6459071078257716793/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bitsquid.blogspot.com/2011/12/platform-specific-resources.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/6459071078257716793'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/6459071078257716793'/><link rel='alternate' type='text/html' href='http://bitsquid.blogspot.com/2011/12/platform-specific-resources.html' title='Platform Specific 
Resources'/><author><name>Niklas</name><uri>http://www.blogger.com/profile/10055379994557504977</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/-GW4VF34Rc34/TvOirF7OP8I/AAAAAAAAAME/SxeQAukuCzM/s72-c/properties_1.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1994130783874175266.post-8701337091213636484</id><published>2011-12-08T01:21:00.001+01:00</published><updated>2011-12-08T01:24:04.292+01:00</updated><title type='text'>A Pragmatic Approach to Performance</title><content type='html'>Is premature optimization the root of all evil? Or is the fix-it-later attitude to performance turning programmers from proud ”computer scientists” into despicable ”script kiddies”?&lt;br /&gt;&lt;br /&gt;These are questions without definite answers, but in this article I’ll try to describe my own approach to performance: how I go about ensuring that my systems run decently without compromising other goals, such as modularity, maintainability and flexibility.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;§1 Programmer time is a finite resource&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;If you are writing a big program, some parts of the code will not be as fast as theoretically possible. Sorry, let me rephrase. If you are writing a big program, &lt;em&gt;no part&lt;/em&gt; of the code will be as fast as theoretically possible. Yes, I think it is reasonable to assume that every single line of your code could be made to run a little tiny bit faster.&lt;br /&gt;&lt;br /&gt;Writing fast software is not about maximum performance all the time. It is about &lt;em&gt;good performance where it matters&lt;/em&gt;. 
If you spend three weeks optimizing a small piece of code that only gets called once a frame, then that’s three weeks of work you could have spent doing something more meaningful. If you had spent it on optimizing code that actually mattered, you could even have made a significant improvement to the game’s frame rate.&lt;br /&gt;&lt;br /&gt;There is never enough time to add all the features, fix all the bugs and optimize all the code, so the goal should always be maximum performance for minimum effort.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;§2 Don’t underestimate the power of simplicity&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;Simple solutions are easier to implement than complex solutions. But that’s only the tip of the iceberg. The real benefits of simple solutions come in the long run. Simple solutions are easier to understand, easier to debug, easier to maintain, easier to port, easier to profile, easier to optimize, easier to parallelize and easier to replace. Over time, all these savings add up.&lt;br /&gt;&lt;br /&gt;Using a simple solution can save so much time that even if it is slower than a more complex solution, as a whole your program will run faster, because you can use the time you saved to optimize other parts of the code. The parts that really matter.&lt;br /&gt;&lt;br /&gt;I only use complex solutions when they are really justified, i.e. when the complex solution is significantly faster than the simple one (a factor of 2 or so) and when it is in a system that matters (that consumes a significant percentage of the frame time).&lt;br /&gt;&lt;br /&gt;Of course simplicity is in the eye of the beholder. I think arrays are simple. I think POD data types are simple. I think blobs are simple. I don’t think class structures with 12 levels of inheritance are simple. I don’t think classes templated on 8 policy class parameters are simple. 
I don’t think geometric algebra is simple.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;§3 Take advantage of the system design opportunity&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;Some people seem to think that to avoid ”premature optimization” you should design your systems without any regard to performance whatsoever. You should just slap something together and fix it later when you ”optimize” the code.&lt;br /&gt;&lt;br /&gt;I wholeheartedly disagree. Not because I love performance for its own sake, but for purely pragmatic reasons.&lt;br /&gt;&lt;br /&gt;When you design a system you have a clear picture in your head of how the different pieces fit together, what the requirements are and how often different functions get called. At that point, it is not much extra effort to take a few moments to think about how the system will perform and how you can set up the data structures so that it runs as fast as possible.&lt;br /&gt;&lt;br /&gt;In contrast, if you build your system without considering performance and have to come in and ”fix it” at some later point, that will be much harder. If you have to rearrange the fundamental data structures or add multithreading support, you may have to rewrite the entire system almost from scratch. Only now the system is in production, so you may be restricted by the published API and dependencies on other systems. Also, you cannot break any of the projects that are using the system. And since it has been several months since you (or someone else) wrote the code, you have to start by understanding all the thoughts that went into it. And all the little bug fixes and feature tweaks that have been added over time will most likely be lost in the rewrite. You will start again with a fresh batch of bugs.&lt;br /&gt;&lt;br /&gt;So by just following our general guideline ”maximum performance for minimum effort”, we see that it is better to consider performance up front. 
Simply because that requires a lot less effort than fixing it later.&lt;br /&gt;&lt;br /&gt;Within reason of course. The performance improvements we make up front are easier, but we are less sure that they matter in the big picture. Later, profile-guided fixes require more effort, but we know better where to focus our attention. As in all of life, balance is important.&lt;br /&gt;&lt;br /&gt;When I design a system, I do a rough estimate of how many times each piece of code will be executed per frame and use that to guide the design:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt; &lt;li&gt;1-10: Performance doesn’t matter. Do whatever you want.&lt;/li&gt; &lt;li&gt;100: Make sure it is O(n), data-oriented and cache friendly&lt;/li&gt; &lt;li&gt;1000: Make sure it is multithreaded&lt;/li&gt; &lt;li&gt;10000: Think really hard about what you are doing&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;I also have a few general guidelines that I try to follow when writing new systems:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt; &lt;li&gt;Put static data in immutable, single-allocation memory blobs&lt;/li&gt; &lt;li&gt;Allocate dynamic data in big contiguous chunks&lt;/li&gt; &lt;li&gt;Use as little memory as possible&lt;/li&gt; &lt;li&gt;Prefer arrays to complex data structures&lt;/li&gt; &lt;li&gt;Access memory linearly (in a cache friendly way)&lt;/li&gt; &lt;li&gt;Make sure procedures run in O(n) time&lt;/li&gt; &lt;li&gt;Avoid ”do nothing” updates -- instead, keep track of active objects&lt;/li&gt; &lt;li&gt;If the system handles many objects, support data parallelism&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;By now I have written so many systems in this ”style” that it doesn’t require much effort to follow these guidelines. And I know that by doing so I get a decent baseline performance. 
The guidelines focus on the most important low-hanging fruit (algorithmic complexity, memory access and parallelization) and thus give good performance for a relatively small effort.&lt;br /&gt;&lt;br /&gt;Of course it is not always possible to follow all guidelines. For example, some algorithms really require more than O(n) time. But I know that when I go outside the guidelines I need to stop and think things through, to make sure I don’t trash the performance.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;§4 Use top-down profiling to find bottlenecks&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;No matter how good your up-front design is, your code will be spending time in unexpected places. The content people will use your system in crazy ways and expose bottlenecks that you’ve never thought about. There will be bugs in your code. Some of these bugs will not result in outright crashes, just bad performance. There will be things you haven’t really thought through.&lt;br /&gt;&lt;br /&gt;To understand where your program is &lt;em&gt;actually&lt;/em&gt; spending its time, a top-down profiler is an invaluable tool. We use explicit profiler scopes in our code and pipe the data live over the network to an external tool that can visualize it in various ways:&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-LShVe1Vfeoc/TuADO41suwI/AAAAAAAAALs/fGMmNoDzwOo/s1600/Pragmatic%2BPerformance%2B1.jpg" imageanchor="1" style="margin-left:1em; margin-right:1em"&gt;&lt;img border="0" height="178" width="400" src="http://2.bp.blogspot.com/-LShVe1Vfeoc/TuADO41suwI/AAAAAAAAALs/fGMmNoDzwOo/s400/Pragmatic%2BPerformance%2B1.jpg" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;p align="center"&gt;&lt;em&gt;An (old) screenshot of the BitSquid Profiler.&lt;/em&gt;&lt;/p&gt;&lt;br /&gt;The top-down profiler tells you where your optimization efforts need to be focused. 
If you spend 60 % of the frame time in the animation system and 0.5 % in the Gui, then any optimizations you can make to the animations will really pay off, but what you do with the Gui won’t matter one iota.&lt;br /&gt;&lt;br /&gt;With a top-down profiler you can insert narrower and narrower profiler scopes in the code to get to the root of a performance problem -- where the time is actually being spent.&lt;br /&gt;&lt;br /&gt;I use the general design guidelines to get a good baseline performance for all systems and then drill down with the top-down profiler to find those systems that need a little bit of extra optimization attention.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;§5 Use bottom-up profiling to find low-level optimization targets&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;I find that as a general tool, interactive top-down profiling with explicit scopes is more useful than a bottom-up sampling profiler.&lt;br /&gt;&lt;br /&gt;But sampling profilers still have their uses. They are good at finding hotspot functions that are called from many different places and thus don’t necessarily show up in a top-down profiler. Such hotspots can be a target for low-level, instruction-by-instruction optimizations. Or they can be an indication that you are doing something bad.&lt;br /&gt;&lt;br /&gt;For example, if strcmp() is showing up as a hotspot, then your program is being very very naughty and should be sent straight to bed without any cocoa.&lt;br /&gt;&lt;br /&gt;A hotspot that often shows up in our code is luaV_execute(). This is not surprising. That is the main Lua VM function, a big switch statement that executes most of Lua’s opcodes. 
But it does tell us that some low-level, platform-specific optimizations of that function might actually result in real measurable performance benefits.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;§6 Beware of synthetic benchmarks&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;I don’t do much synthetic benchmarking, i.e., looping the code 10 000 times over some made-up piece of data and measuring the execution time.&lt;br /&gt;&lt;br /&gt;If I’m at a point where I don’t know whether a change will make the code faster or not, then I want to verify that with data from an actual game. Otherwise, how can I be sure that I’m not just optimizing the benchmark in ways that won’t carry over to real-world cases?&lt;br /&gt;&lt;br /&gt;A benchmark with 500 instances of the same entity, all playing the same animation, is quite different from the same scene with 50 different unit types, all playing different animations. The data access patterns are completely different. Optimizations that improve one case may not matter at all in the other.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;§7 Optimization is gardening&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;Programmers optimize the engine. Artists put in more stuff. It has always been thus. And it is good.&lt;br /&gt;&lt;br /&gt;Optimization is not an isolated activity that happens at a specific time. It is a part of the whole life cycle: design, maintenance and evolution. Optimization is an ongoing dialog between artists and programmers about what the capabilities of the engine should be.&lt;br /&gt;&lt;br /&gt;Managing performance is like tending a garden: checking that everything is ok, rooting out the weeds and finding ways for the plants to grow better.&lt;br /&gt;&lt;br /&gt;It is the job of the artists to push the engine to its knees. And it is the programmers’ job to bring it back up again, only stronger. 
In the process, a middle ground will be found where the games can shine as bright as possible.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1994130783874175266-8701337091213636484?l=bitsquid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bitsquid.blogspot.com/feeds/8701337091213636484/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bitsquid.blogspot.com/2011/12/pragmatic-approach-to-performance.html#comment-form' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/8701337091213636484'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/8701337091213636484'/><link rel='alternate' type='text/html' href='http://bitsquid.blogspot.com/2011/12/pragmatic-approach-to-performance.html' title='A Pragmatic Approach to Performance'/><author><name>Niklas</name><uri>http://www.blogger.com/profile/10055379994557504977</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/-LShVe1Vfeoc/TuADO41suwI/AAAAAAAAALs/fGMmNoDzwOo/s72-c/Pragmatic%2BPerformance%2B1.jpg' height='72' width='72'/><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1994130783874175266.post-2294413372896659289</id><published>2011-11-07T23:30:00.000+01:00</published><updated>2011-11-07T23:30:23.643+01:00</updated><title type='text'>An Example in Data-Oriented Design: Sound Parameters</title><content type='html'>The BitSquid sound system allows arbitrary parameters to be set on playing sounds:&lt;br /&gt;&lt;br /&gt;&lt;pre escaped="true" lang="lua"&gt;force = 
35.3&lt;br /&gt;material = "wood"&lt;br /&gt;weapon = "axe"&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;In the sound editor the sound designer can set up curves and switches that depend on these parameters. So, for example, the designer can choose to play different wav files for a weapon impact, depending on the weapon that was used and the material it hit. In addition, the volume and pitch of the sound can be controlled by a curve connected to the force of the impact.&lt;br /&gt;&lt;br /&gt;To implement this behavior, we need a way of representing such parameter sets in the engine. Since there can potentially be lots of playing sounds, we need a representation that is as efficient as possible.&lt;br /&gt;&lt;br /&gt;If you did a by-the-book C++ design for this problem, you might end up with an abomination like this:&lt;br /&gt;&lt;br /&gt;&lt;pre escaped="true" lang="cpp"&gt;struct ParameterValue&lt;br /&gt;{&lt;br /&gt; enum Type {STRING_TYPE, NUMERIC_TYPE};&lt;br /&gt; Type type;&lt;br /&gt; std::string string_value;&lt;br /&gt; float numeric_value;&lt;br /&gt;};&lt;br /&gt;&lt;br /&gt;typedef std::map&amp;lt;std::string, ParameterValue&amp;gt; Parameters;&lt;br /&gt;&lt;br /&gt;struct SoundInstance&lt;br /&gt;{&lt;br /&gt; // Other members...&lt;br /&gt; Parameters *parameters;&lt;br /&gt;};&lt;br /&gt;&lt;br /&gt;std::vector&amp;lt;SoundInstance&amp;gt; playing_sounds;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;which would result in tons of pointer chasing, memory allocation and data copying.&lt;br /&gt;&lt;br /&gt;So let’s fix it!&lt;br /&gt;&lt;br /&gt;First, let’s get rid of the strings. Strings should be used almost exclusively for text that is &lt;em&gt;displayed to the end user&lt;/em&gt;. For everything else, they are usually a bad idea. 
In this case, since the only thing we need to do is match strings that are equal (find the parameter named ”material”, check if its value is ”wood”, etc.), we can use a hash instead of the full string value:&lt;br /&gt;&lt;br /&gt;&lt;pre escaped="true" lang="cpp"&gt;struct ParameterValue&lt;br /&gt;{&lt;br /&gt; enum Type {STRING_TYPE, NUMERIC_TYPE};&lt;br /&gt; Type type;&lt;br /&gt; union {&lt;br /&gt;  IdString32 string_value;&lt;br /&gt;  float numeric_value;&lt;br /&gt; };&lt;br /&gt;};&lt;br /&gt;&lt;br /&gt;typedef std::map&amp;lt;IdString32, ParameterValue&amp;gt; Parameters;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;IdString32 is our type for representing hashed strings. It just stores a 4-byte string hash. Since it is a POD type, we can put it in a union together with the numeric value. This takes the ParameterValue struct down to a manageable 8 bytes with no dynamic data allocation.&lt;br /&gt;&lt;br /&gt;But we can actually make it even smaller (4 bytes), by just getting rid of the type:&lt;br /&gt;&lt;br /&gt;&lt;pre escaped="true" lang="cpp"&gt;union ParameterValue {&lt;br /&gt; IdString32 string_value;&lt;br /&gt; float numeric_value;&lt;br /&gt;};&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;We can do this because when we access the parameter we know which type we want. If we are evaluating a curve, we want a numeric value. If we want to compare it to a hash, we want a string value. Getting rid of the type means we can’t &lt;em&gt;assert()&lt;/em&gt; on type errors (if someone has done something silly like setting the ”material” to 3.5 or the ”force” to ”banana”). But other than that everything will work as before.&lt;br /&gt;&lt;br /&gt;Next, let’s attack the map:&lt;br /&gt;&lt;br /&gt;&lt;pre escaped="true" lang="cpp"&gt;typedef std::map&amp;lt;IdString32, ParameterValue&amp;gt; Parameters;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Just like std::string, std::map should set off all kinds of warning bells in your head. std::map is almost never a good choice. 
Better alternatives are: linear search in a std::vector (for smallish maps), binary search in a sorted array (for larger, static maps) or hash_map.&lt;br /&gt;&lt;br /&gt;In this case, we don’t expect there to be that many parameters set on a sound (fewer than 10 in the typical case), so linear search is fine:&lt;br /&gt;&lt;br /&gt;&lt;pre escaped="true" lang="cpp"&gt;struct Parameter {&lt;br /&gt; IdString32 key;&lt;br /&gt; union {&lt;br /&gt;  IdString32 string_value;&lt;br /&gt;  float numeric_value;&lt;br /&gt; };&lt;br /&gt;};&lt;br /&gt;&lt;br /&gt;typedef std::vector&amp;lt;Parameter&amp;gt; Parameters;&lt;br /&gt;&lt;br /&gt;struct SoundInstance&lt;br /&gt;{&lt;br /&gt; // Other members...&lt;br /&gt; Parameters *parameters;&lt;br /&gt;};&lt;br /&gt;&lt;br /&gt;std::vector&amp;lt;SoundInstance&amp;gt; playing_sounds;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;A lot better than what we started with. But I’m still not 100 % satisfied.&lt;br /&gt;&lt;br /&gt;I don’t like the fact that we have a vector of sound instances, and each of those points to a vector of parameters. Vectors-in-vectors raise performance warning flags for me. I like it when my data structures are just arrays of POD structs. Then I know that they are cache friendly and don’t put much strain on the memory system. 512 parameter vectors allocated on the heap for 512 playing sounds make me uneasy.&lt;br /&gt;&lt;br /&gt;So what can we do? 
We could go to a fixed number of parameters:&lt;br /&gt;&lt;br /&gt;&lt;pre escaped="true" lang="cpp"&gt;struct SoundInstance&lt;br /&gt;{&lt;br /&gt; // Other members...&lt;br /&gt; unsigned num_parameters;&lt;br /&gt; Parameter parameters[MAX_INSTANCE_PARAMETERS];&lt;br /&gt;};&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Now the SoundInstance is a POD and all the data is just one big happy blob.&lt;br /&gt;&lt;br /&gt;The drawback of this approach is that you might need to set &lt;em&gt;MAX_INSTANCE_PARAMETERS&lt;/em&gt; pretty high to be able to handle the most complicated sounds. This would waste some memory for all the sounds that use just one or two parameters.&lt;br /&gt;&lt;br /&gt;Say you have 512 sounds and MAX_INSTANCE_PARAMETERS = 32. With 8 bytes in the Parameter struct, that totals 512 × 32 × 8 = 131,072 bytes (128 KB). Not terrible, but not a tuppence either.&lt;br /&gt;&lt;br /&gt;There should be some way of doing better. But if we can’t use a dynamic vector or a static array, what can we then possibly use?&lt;br /&gt;&lt;br /&gt;A linked list!&lt;br /&gt;&lt;br /&gt;Regular linked lists have horrible cache behavior and are best stayed away from. But we can achieve the benefits of linked lists while still having decent cache performance by putting the list in an array:&lt;br /&gt;&lt;br /&gt;&lt;pre escaped="true" lang="cpp"&gt;struct ParameterNode {&lt;br /&gt; IdString32 key;&lt;br /&gt; union {&lt;br /&gt;  IdString32 string_value;&lt;br /&gt;  float numeric_value;&lt;br /&gt; };&lt;br /&gt; ParameterNode *next;&lt;br /&gt;};&lt;br /&gt;&lt;br /&gt;ParameterNode nodes[MAX_PARAMETERS];&lt;br /&gt;&lt;br /&gt;struct SoundInstance&lt;br /&gt;{&lt;br /&gt; // Other members...&lt;br /&gt; ParameterNode *parameters;&lt;br /&gt;};&lt;br /&gt;&lt;br /&gt;std::vector&amp;lt;SoundInstance&amp;gt; playing_sounds;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Now we have all the parameters stored in a single memory blob. 
And instead of having a maximum number of parameters per sound, we have a total limit on the number of set parameters (which works much better when most sounds have few parameters). We could get rid of that limit as well if we needed to, by using a vector instead of an array to store the nodes and indices instead of pointers for the ”links”.&lt;br /&gt;&lt;br /&gt;You can use many different strategies for allocating nodes from the array. My favorite method is to walk over the array until the next free node is found:&lt;br /&gt;&lt;br /&gt;&lt;pre escaped="true" lang="cpp"&gt;unsigned last_allocated = MAX_PARAMETERS-1;&lt;br /&gt;&lt;br /&gt;// Scans forward from the last allocated slot, wrapping around at the end.&lt;br /&gt;// Assumes at least one node is free; otherwise the loop never terminates.&lt;br /&gt;ParameterNode *allocate_node()&lt;br /&gt;{&lt;br /&gt; while (true) {&lt;br /&gt;  last_allocated = (last_allocated + 1) % MAX_PARAMETERS;&lt;br /&gt;  if (nodes[last_allocated].key == 0)&lt;br /&gt;   break;&lt;br /&gt; }&lt;br /&gt; return &amp;amp;nodes[last_allocated];&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Here, an empty key is used to indicate free nodes.&lt;br /&gt;&lt;br /&gt;The advantage of this method is that nodes that are allocated at the same time end up in adjacent array slots. 
This means that all the parameters of a particular sound (which tend to get set at the same time) get stored next to each other in memory, which means they can be accessed without cache misses.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1994130783874175266-2294413372896659289?l=bitsquid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bitsquid.blogspot.com/feeds/2294413372896659289/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bitsquid.blogspot.com/2011/11/example-in-data-oriented-design-sound.html#comment-form' title='7 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/2294413372896659289'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/2294413372896659289'/><link rel='alternate' type='text/html' href='http://bitsquid.blogspot.com/2011/11/example-in-data-oriented-design-sound.html' title='An Example in Data-Oriented Design: Sound Parameters'/><author><name>Niklas</name><uri>http://www.blogger.com/profile/10055379994557504977</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>7</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1994130783874175266.post-6230496158150463586</id><published>2011-10-23T20:45:00.000+02:00</published><updated>2011-10-23T20:45:20.223+02:00</updated><title type='text'>Low Level Animation -- Part 2</title><content type='html'>Some time ago I wrote an &lt;a href="http://bitsquid.blogspot.com/2009/11/bitsquid-low-level-animation-system.html"&gt;article&lt;/a&gt; describing how animation compression is implemented in the BitSquid engine. 
In that article I made a vague promise that I would follow up with a description of how to pack the data in a cache-friendly way. Now, the time has come to deliver on that vague promise.&lt;br /&gt;&lt;br /&gt;A quick recap: after curve fitting, each track of our animation consists of a number of curve points that describe its curve:&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/-IRgzl_to1AU/TqRfsslr4TI/AAAAAAAAAJc/CeK5ILJGkwU/s1600/animation_1.png" imageanchor="1" style="margin-left:1em; margin-right:1em"&gt;&lt;img border="0" height="169" width="400" src="http://4.bp.blogspot.com/-IRgzl_to1AU/TqRfsslr4TI/AAAAAAAAAJc/CeK5ILJGkwU/s400/animation_1.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;By an &lt;em&gt;animation track&lt;/em&gt; I mean the animation of a single parameter, typically the position or rotation of a bone.&lt;br /&gt;&lt;br /&gt;The data for the track is a sequence of times and curve data:&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-5kQ1L8BoKUM/TqRfyvzxMyI/AAAAAAAAAJo/1-5ys5NQMfo/s1600/animation_2.png" imageanchor="1" style="margin-left:1em; margin-right:1em"&gt;&lt;img border="0" height="61" width="400" src="http://2.bp.blogspot.com/-5kQ1L8BoKUM/TqRfyvzxMyI/AAAAAAAAAJo/1-5ys5NQMfo/s400/animation_2.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;Here t_i is the time of a curve point and A_i is the corresponding curve data.&lt;br /&gt;&lt;br /&gt;To evaluate the curve at any particular point t we need the curve points both before and after the time t:&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-CMscYo81tdM/TqRf5CnclRI/AAAAAAAAAJ0/bTdui16OUEQ/s1600/animation_3.png" imageanchor="1" style="margin-left:1em; margin-right:1em"&gt;&lt;img border="0" height="204" width="400" 
src="http://3.bp.blogspot.com/-CMscYo81tdM/TqRf5CnclRI/AAAAAAAAAJ0/bTdui16OUEQ/s400/animation_3.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;Depending on what curve type you use (Hermite, Bézier, B-spline, etc.) you might actually need more than two curve points to evaluate a segment, but that doesn’t really affect the discussion in this article, so for the sake of simplicity, let’s stick with two.&lt;br /&gt;&lt;br /&gt;Note that the time points for the different tracks in the animation typically do not match up. For example, one curve may be completely flat and only require one sample at the start and one sample at the end. Another curve may be complicated and require lots of samples.&lt;br /&gt;&lt;br /&gt;To simplify the discussion further, assume that the animation only contains two tracks (it is easy to generalize the solution to more tracks). We will call the curve points of one (t_i, A_i) and the curve points of the other (s_i, B_i):&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/-3AH3PF1OpMQ/TqRf-XEcBNI/AAAAAAAAAKA/QqxCcualSTU/s1600/animation_4.png" imageanchor="1" style="margin-left:1em; margin-right:1em"&gt;&lt;img border="0" height="203" width="400" src="http://4.bp.blogspot.com/-3AH3PF1OpMQ/TqRf-XEcBNI/AAAAAAAAAKA/QqxCcualSTU/s400/animation_4.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;How can we organize this data to be as cache friendly as possible?&lt;br /&gt;&lt;br /&gt;The most natural approach is perhaps to sort the data first by track and then by time. Let’s see what this means for the cache. To evaluate the animation for some particular time t, we have to go into the data for each track at that time to look up the two neighboring curve points. Even if we assume that we have somehow cached our current position in each track, so that we don’t have to search for it, we will still have at least one cache miss for each track. 
A modern character can have over 100 bones, with two tracks per bone. That’s over 200 cache misses for just a single frame of a single animation.&lt;br /&gt;&lt;br /&gt;To do better, we need to organize the data by time somehow. But it is not immediately clear how. Just sorting the data by time won’t help, because then a flat curve with just two curve points, one at the beginning and one at the end, will have them at complete opposite ends of the data and no matter what we do we will get cache misses when touching them.&lt;br /&gt;&lt;br /&gt;Let’s consider all the data we need to evaluate the tracks at time t. We need (t_i, A_i), (t_i+1, A_i+1) and (s_j, B_j), (s_j+1, B_j+1) where t_i &amp;lt;= t &amp;lt;= t_i+1 and s_j &amp;lt;= t &amp;lt;= s_j+1. This is our “hot” data, because we will need to refer to it several times as we evaluate the curve at different points in time. In fact, we can keep using this same data until we reach the smaller of t_i+1 and s_j+1.&lt;br /&gt;&lt;br /&gt;A general rule in memory access optimization is to keep the “hot” data together, so let’s create an additional data structure, an array with the currently active curve points for a playing animation instance.&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;br /&gt;&lt;a href="http://1.bp.blogspot.com/-DRQzOOqDci0/TqRgD-Mb9yI/AAAAAAAAAKM/ph_D7GfYAoo/s1600/animation_5.png" imageanchor="1" style="margin-left:1em; margin-right:1em"&gt;&lt;img border="0" height="259" width="361" src="http://1.bp.blogspot.com/-DRQzOOqDci0/TqRgD-Mb9yI/AAAAAAAAAKM/ph_D7GfYAoo/s400/animation_5.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;Now we’re getting somewhere. We have significantly improved the cache behavior: as long as we don’t need to fetch new curve points we only need to refer to the active array, a single memory access. We have also decomposed our animation evaluation problem into two simpler tasks: evaluating curves and fetching new curve points. 
This makes our code both simpler and more flexible.&lt;br /&gt;&lt;br /&gt;Let’s look at the second issue, fetching new curve points. In the example above, when we reach the time t_i+1 we will need to fetch the new curve point (t_i+2, A_i+2) and when we reach the time s_j+1 we will need to fetch (s_j+2, B_j+2).&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/-5z_1TX7_ISo/TqRgLbzBlOI/AAAAAAAAAKY/uzgdcebNPwE/s1600/animation_6.png" imageanchor="1" style="margin-left:1em; margin-right:1em"&gt;&lt;img border="0" height="245" width="400" src="http://1.bp.blogspot.com/-5z_1TX7_ISo/TqRgLbzBlOI/AAAAAAAAAKY/uzgdcebNPwE/s400/animation_6.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;Generalizing, we always need to fetch the point (t_i, A_i) at the time t_i-1, and we always need to fetch the point (s_i, B_i) at the time s_i-1. This is excellent: since we know the time when each of our curve points will be needed, we can put them all in a single stream of data, sorted by the time when they will be needed.&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/-pYuL6YaPU1w/TqRgRXKE_9I/AAAAAAAAAKk/M1sM9q0IaNM/s1600/animation_7.png" imageanchor="1" style="margin-left:1em; margin-right:1em"&gt;&lt;img border="0" height="62" width="400" src="http://4.bp.blogspot.com/-pYuL6YaPU1w/TqRgRXKE_9I/AAAAAAAAAKk/M1sM9q0IaNM/s400/animation_7.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;This means that our animation player only needs to keep a single pointer into the animation stream. That pointer will always point to the next curve point that needs to be moved to the &lt;em&gt;active&lt;/em&gt; list. 
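&lt;br /&gt;&lt;br /&gt;A minimal sketch of such a player could look like the following. The names &lt;em&gt;StreamPoint&lt;/em&gt;, &lt;em&gt;ActiveTrack&lt;/em&gt;, &lt;em&gt;advance&lt;/em&gt; and &lt;em&gt;evaluate&lt;/em&gt; are hypothetical, the curve data is reduced to a single float, and linear interpolation stands in for the real curve evaluation, so treat it as an illustration of the idea rather than the engine’s actual code:&lt;br /&gt;&lt;br /&gt;&lt;pre escaped="true" lang="cpp"&gt;// Hypothetical layout of one entry in the time-sorted animation stream.&lt;br /&gt;struct StreamPoint {&lt;br /&gt;    float fetch_time; // when the point must enter the active array&lt;br /&gt;    unsigned track;   // track the point belongs to&lt;br /&gt;    float t;          // time of the curve point&lt;br /&gt;    float value;      // curve data (a single float in this sketch)&lt;br /&gt;};&lt;br /&gt;&lt;br /&gt;// The two curve points that currently surround the playback time.&lt;br /&gt;struct ActiveTrack {&lt;br /&gt;    float t0, v0;&lt;br /&gt;    float t1, v1;&lt;br /&gt;};&lt;br /&gt;&lt;br /&gt;// Copies every stream point that is due at `time` into the active array&lt;br /&gt;// and returns the new stream position.&lt;br /&gt;inline unsigned advance(const StreamPoint *stream, unsigned count,&lt;br /&gt;                        unsigned cursor, float time, ActiveTrack *active)&lt;br /&gt;{&lt;br /&gt;    while (cursor &amp;lt; count &amp;amp;&amp;amp; stream[cursor].fetch_time &amp;lt;= time) {&lt;br /&gt;        const StreamPoint &amp;amp;p = stream[cursor++];&lt;br /&gt;        ActiveTrack &amp;amp;a = active[p.track];&lt;br /&gt;        a.t0 = a.t1; a.v0 = a.v1; // the old "after" point becomes "before"&lt;br /&gt;        a.t1 = p.t; a.v1 = p.value;&lt;br /&gt;    }&lt;br /&gt;    return cursor;&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;// Evaluates one track; only the active array is touched.&lt;br /&gt;inline float evaluate(const ActiveTrack &amp;amp;a, float time)&lt;br /&gt;{&lt;br /&gt;    if (a.t1 == a.t0)&lt;br /&gt;        return a.v1;&lt;br /&gt;    return a.v0 + (a.v1 - a.v0) * (time - a.t0) / (a.t1 - a.t0);&lt;br /&gt;}&lt;/pre&gt;&lt;br /&gt;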
As time is advanced, curve points are copied from the animation data into the &lt;em&gt;active&lt;/em&gt; list and then the curve is evaluated.&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/-XP-lDgqVvNQ/TqRgXQ1kkxI/AAAAAAAAAKw/X2iyTRWS49E/s1600/animation_8.png" imageanchor="1" style="margin-left:1em; margin-right:1em"&gt;&lt;img border="0" height="227" width="400" src="http://4.bp.blogspot.com/-XP-lDgqVvNQ/TqRgXQ1kkxI/AAAAAAAAAKw/X2iyTRWS49E/s400/animation_8.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;Note the excellent cache behavior this gives us. To fetch new curve points, we just move a pointer forward in memory. And then, to evaluate the curves, we just need to access our &lt;em&gt;active&lt;/em&gt; array, a single contiguous memory block. This gives us a grand total of just two memory accesses.&lt;br /&gt;&lt;br /&gt;Another nice property is that since we are now accessing the animation data as a stream (strictly linearly, from beginning to end) we can gzip it and get another factor of two in compression. We can also easily stream it from disk.&lt;br /&gt;&lt;br /&gt;One drawback of this system is that it only supports playing an animation forward: you cannot jump to a particular time in an animation without “fast forwarding” through all intermediate curve points.&lt;br /&gt;&lt;br /&gt;If you need support for jumping, the easiest way to achieve it is perhaps to add a separate index with &lt;em&gt;jump frames&lt;/em&gt;. A &lt;em&gt;jump frame&lt;/em&gt; consists of the state of the &lt;em&gt;active&lt;/em&gt; array at some point in time, together with an offset into the data stream. In other words, all the state information that the animation player needs to jump to that time point and resume playing.&lt;br /&gt;&lt;br /&gt;Using jump frames lets you balance performance and memory use. 
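&lt;br /&gt;&lt;br /&gt;As a rough sketch of what such an index could look like: the struct &lt;em&gt;JumpFrame&lt;/em&gt;, the constant &lt;em&gt;MAX_ACTIVE&lt;/em&gt; and the function &lt;em&gt;find_jump_frame&lt;/em&gt; below are hypothetical names, the active state is reduced to a fixed-size float array, and the first jump frame is assumed to sit at the start of the animation:&lt;br /&gt;&lt;br /&gt;&lt;pre escaped="true" lang="cpp"&gt;const unsigned MAX_ACTIVE = 8; // assumed size of the active array&lt;br /&gt;&lt;br /&gt;// Hypothetical jump frame: a snapshot of the player state at `time`.&lt;br /&gt;struct JumpFrame {&lt;br /&gt;    float time;               // time the snapshot was taken&lt;br /&gt;    unsigned stream_offset;   // position of the next unread stream entry&lt;br /&gt;    float active[MAX_ACTIVE]; // copy of the active array at `time`&lt;br /&gt;};&lt;br /&gt;&lt;br /&gt;// Binary search over frames sorted by time. Returns the index of the last&lt;br /&gt;// frame with time &amp;lt;= target, or 0 if the target lies before all frames.&lt;br /&gt;inline unsigned find_jump_frame(const JumpFrame *frames, unsigned count, float target)&lt;br /&gt;{&lt;br /&gt;    unsigned lo = 0, hi = count;&lt;br /&gt;    while (hi - lo &amp;gt; 1) {&lt;br /&gt;        unsigned mid = lo + (hi - lo) / 2;&lt;br /&gt;        if (frames[mid].time &amp;lt;= target)&lt;br /&gt;            lo = mid;&lt;br /&gt;        else&lt;br /&gt;            hi = mid;&lt;br /&gt;    }&lt;br /&gt;    return lo;&lt;br /&gt;}&lt;/pre&gt;&lt;br /&gt;From the frame that is found, the player would restore the &lt;em&gt;active&lt;/em&gt; array and the stream offset, and then fast forward the remaining distance.&lt;br /&gt;&lt;br /&gt;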
If you add more jump frames you will use more memory but, on the other hand, you will be able to find a jump frame closer to the time you &lt;em&gt;actually&lt;/em&gt; want to go to, which means less fast forwarding.</content><link rel='replies' type='application/atom+xml' href='http://bitsquid.blogspot.com/feeds/6230496158150463586/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bitsquid.blogspot.com/2011/10/low-level-animation-part-2.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/6230496158150463586'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/6230496158150463586'/><link rel='alternate' type='text/html' href='http://bitsquid.blogspot.com/2011/10/low-level-animation-part-2.html' title='Low Level Animation -- Part 2'/><author><name>Niklas</name><uri>http://www.blogger.com/profile/10055379994557504977</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/-IRgzl_to1AU/TqRfsslr4TI/AAAAAAAAAJc/CeK5ILJGkwU/s72-c/animation_1.png' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1994130783874175266.post-4979941714963709032</id><published>2011-10-08T18:25:00.001+02:00</published><updated>2011-10-08T18:25:31.201+02:00</updated><title type='text'>Caring by Sharing: Header Hero</title><content type='html'>&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a 
href="http://4.bp.blogspot.com/-2EVlfIEiSd4/TpB42zwmuII/AAAAAAAAAIw/iYNA1tOaHWs/s1600/header_hero_1.png" imageanchor="1" style="margin-left:1em; margin-right:1em"&gt;&lt;img border="0" height="301" width="400" src="http://4.bp.blogspot.com/-2EVlfIEiSd4/TpB42zwmuII/AAAAAAAAAIw/iYNA1tOaHWs/s400/header_hero_1.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;Compile times get worse over time; that is the second law of C++ programming dynamics. There are many small day-to-day changes that each exacerbate the problem slightly: The project grows. New header files get included. Clever templates get written. And so on. There are comparatively few forces that work in the other direction. Once an &lt;em&gt;#include&lt;/em&gt; has been added, it stays.&lt;br /&gt;&lt;br /&gt;The only exception is when some hero steps up, says &lt;em&gt;Enough!&lt;/em&gt; and starts to crunch down on those header files. It is thankless menial work that offers few rewards, save the knowledge that you are contributing to the public good. &lt;br /&gt;&lt;br /&gt;Today, I want to give something back to these unsung heroes, so I’ve made a small tool to make their drudgery a bit less... drudgery-ish? It is called &lt;em&gt;Header Hero&lt;/em&gt;:&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/-gY0ldAx_jH8/TpB4-s8sHTI/AAAAAAAAAI4/NxhMG5Oc1kU/s1600/header_hero_2.png" imageanchor="1" style="margin-left:1em; margin-right:1em"&gt;&lt;img border="0" height="285" width="397" src="http://1.bp.blogspot.com/-gY0ldAx_jH8/TpB4-s8sHTI/AAAAAAAAAI4/NxhMG5Oc1kU/s400/header_hero_2.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;To run &lt;em&gt;Header Hero&lt;/em&gt; you specify the directories where your &lt;em&gt;.cpp&lt;/em&gt; files can be found as well as the directories to search for included headers. The program scans your &lt;em&gt;.h&lt;/em&gt; and &lt;em&gt;.cpp&lt;/em&gt; files to find all the include links. 
It presents the result in a summarized report that shows you what the worst headers are. You can think of it as a header file profiler.&lt;br /&gt;&lt;br /&gt;You don’t need to specify all your include directories, but only the ones you have specified will be scanned.&lt;br /&gt;&lt;br /&gt;I’ve focused on making the tool &lt;em&gt;fast&lt;/em&gt; by caching as much information as possible and using a simple parser that just looks for &lt;em&gt;#include&lt;/em&gt; patterns rather than running the real C preprocessor. The downside is that if you are using any fancy preprocessor tricks, they will most likely be missed. On the other hand, the tool can scan a huge project in seconds. And after the initial scan, new scans can be done in a fraction of that time.&lt;br /&gt;&lt;br /&gt;The program produces a report that looks something like this:&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-aXcjWRswSYA/TpB5EhMlSVI/AAAAAAAAAJA/dVSpSrInT1U/s1600/header_hero_3.PNG" imageanchor="1" style="margin-left:1em; margin-right:1em"&gt;&lt;img border="0" height="319" width="400" src="http://2.bp.blogspot.com/-aXcjWRswSYA/TpB5EhMlSVI/AAAAAAAAAJA/dVSpSrInT1U/s400/header_hero_3.PNG" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;At the top are some statistics, such as the total number of files and lines in the project. &lt;em&gt;Total Parsed&lt;/em&gt; counts how many lines would actually be parsed in a full recompile of the project. So, a header that is included by several &lt;em&gt;.cpp&lt;/em&gt; files adds to that number every time. The &lt;em&gt;Blowup Factor&lt;/em&gt; is the ratio of the last two items. It specifies how many times, on average, each line gets parsed. A value of 35 means that on average, each line in our project is parsed 35 times. 
That seems like quite a lot.&lt;br /&gt;&lt;br /&gt;Below the summary is a list of the header files sorted by how many lines they contributed to the &lt;em&gt;Total Parsed&lt;/em&gt; number. In other words, the size of that file multiplied by the number of times it was included.&lt;br /&gt;&lt;br /&gt;Looking at the sample report above, it seems pretty reasonable. At the top we find big templated collection classes (&lt;em&gt;map, set, string, vector&lt;/em&gt;) that have big header files and are used in a lot of places. Math (&lt;em&gt;matrix4x4, vector3&lt;/em&gt;) and utility (&lt;em&gt;critical_section, file_system&lt;/em&gt;) files also end up high on the list.&lt;br /&gt;&lt;br /&gt;But when you dig into it, there are also things that seem a bit fishy. &lt;em&gt;Set&amp;lt;T&amp;gt;&lt;/em&gt; is not a very popular collection class. Sets are used less than maps, and &lt;em&gt;HashSet&lt;/em&gt; is usually preferable to &lt;em&gt;Set&lt;/em&gt;. Why does it end up so high on the list? What is &lt;em&gt;shader.h&lt;/em&gt; doing there? That seems too specialized to end up so high. And &lt;em&gt;file_system.h&lt;/em&gt;? There shouldn’t be that much code that directly accesses the file system; only the resource loader needs to do that.&lt;br /&gt;&lt;br /&gt;To answer those questions, you can click on any file in the report to get a detailed view of its relations:&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-teIVN3An2Yc/TpB5LMyAvTI/AAAAAAAAAJI/Gx5JhBGjUGE/s1600/header_hero_4.PNG" imageanchor="1" style="margin-left:1em; margin-right:1em"&gt;&lt;img border="0" height="319" width="400" src="http://3.bp.blogspot.com/-teIVN3An2Yc/TpB5LMyAvTI/AAAAAAAAAJI/Gx5JhBGjUGE/s400/header_hero_4.PNG" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;In the middle we find the file we are looking at. To the left are the files that directly include it. 
The number after each file name specifies how many files directly or indirectly include &lt;em&gt;that&lt;/em&gt; file. To the right are the files included by the file. The numbers count all the files directly or indirectly included by &lt;em&gt;those&lt;/em&gt; files. You can double click on any file name in the view to refocus on it.&lt;br /&gt;&lt;br /&gt;Here we clearly see that the main culprit is &lt;em&gt;data_compiler.h&lt;/em&gt;. It includes &lt;em&gt;set.h&lt;/em&gt; and is in turn included by 316 other files. To fix the compile times we can make &lt;em&gt;data_compiler.h&lt;/em&gt; not include &lt;em&gt;set.h&lt;/em&gt; or we can try to reduce the number of files that include &lt;em&gt;data_compiler.h&lt;/em&gt; (that number also seems high). If we also fix &lt;em&gt;scene_graph.h&lt;/em&gt; we can really make a difference.&lt;br /&gt;&lt;br /&gt;Breaking dependencies is a whole topic in itself, especially when it comes to templates and inlined code. Here are some quick tips though:&lt;br /&gt;&lt;br /&gt;1) Predeclare the structs and classes that you use instead of including the header file. Don’t forget that you can predeclare templates and typedefs as well as regular classes:&lt;br /&gt;&lt;br /&gt;&lt;pre escaped="true" lang="cpp"&gt;class MyClass;&lt;br /&gt;typedef int Id;&lt;br /&gt;template &amp;lt;class T&amp;gt; class Vector;&lt;/pre&gt;&lt;br /&gt;2) Predeclared types can only be used as pointers and references. You can’t have a member variable of a type whose actual size is unknown. So you may have to change your member variables to pointers in order to get rid of the header dependency. You can also use the &lt;a href="http://en.wikipedia.org/wiki/Opaque_pointer"&gt;pimpl idiom&lt;/a&gt;, if you can live with the extra indirection and lack of inlining.&lt;br /&gt;&lt;br /&gt;3) Switching from in-place variables to pointers can lead to bad memory access patterns. 
One way of fixing that is to placement new the object directly into a raw memory buffer.&lt;br /&gt;&lt;br /&gt;&lt;pre escaped="true" lang="cpp"&gt;// a.h&lt;br /&gt;&lt;br /&gt;class B;&lt;br /&gt;&lt;br /&gt;class A {&lt;br /&gt;public:&lt;br /&gt;    A();&lt;br /&gt;private:&lt;br /&gt;    B *_b;&lt;br /&gt;    static const int SIZE_OF_B = 20;&lt;br /&gt;    char _b_storage[SIZE_OF_B]; // must also be suitably aligned for B&lt;br /&gt;};&lt;br /&gt;&lt;br /&gt;// a.cpp&lt;br /&gt;&lt;br /&gt;#include &amp;lt;new&amp;gt; // placement new&lt;br /&gt;#include "a.h"&lt;br /&gt;#include "b.h"&lt;br /&gt;&lt;br /&gt;A::A()&lt;br /&gt;{&lt;br /&gt;    XASSERT(sizeof(B) == SIZE_OF_B);&lt;br /&gt;    _b = new (_b_storage) B();&lt;br /&gt;}&lt;/pre&gt;&lt;br /&gt;With this technique, you get the data for &lt;em&gt;B&lt;/em&gt; stored inside &lt;em&gt;A&lt;/em&gt;, without having to include the &lt;em&gt;b.h&lt;/em&gt; header in &lt;em&gt;a.h&lt;/em&gt;. But the code isn’t exactly easy to read, so you should only use this in desperate situations.&lt;br /&gt;&lt;br /&gt;4) For files with small type definitions, but lots of inlined methods (e.g., &lt;em&gt;matrix4x4.h&lt;/em&gt;), a good strategy is to split the file, so you have just the type in one file and all the methods in the other. Header files can then include just the type definition, while &lt;em&gt;.cpp&lt;/em&gt; files pull in the whole shebang.&lt;br /&gt;&lt;br /&gt;Using these techniques you can get rid of the header dependencies one by one, until you are back at reasonable compile times. Since a rescan takes just a fraction of a second it is easy to see how your changes affect the compile time. 
Just make sure you have your integration test running; it is easy to break build configurations when you are fiddling around with the headers.&lt;br /&gt;&lt;br /&gt;Here is the result of about a day and a half of header optimization in our code base:&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/-0NOW2T7Z534/TpB5Q7SGRPI/AAAAAAAAAJQ/3YZNodc2S3M/s1600/header_hero_5.PNG" imageanchor="1" style="margin-left:1em; margin-right:1em"&gt;&lt;img border="0" height="332" width="400" src="http://4.bp.blogspot.com/-0NOW2T7Z534/TpB5Q7SGRPI/AAAAAAAAAJQ/3YZNodc2S3M/s400/header_hero_5.PNG" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;From 6 million to 4.3 million lines, that’s not too shabby. We can now do a complete rebuild in 37 seconds on a reasonably modern machine. With this tool we can hopefully keep that number.&lt;br /&gt;&lt;br /&gt;You can download the C# source code here. Feel free to do whatever you like with it:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;a href="https://bitbucket.org/bitsquid/header_hero"&gt;https://bitbucket.org/bitsquid/header_hero&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;</content><link rel='replies' type='application/atom+xml' href='http://bitsquid.blogspot.com/feeds/4979941714963709032/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bitsquid.blogspot.com/2011/10/caring-by-sharing-header-hero.html#comment-form' title='10 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/4979941714963709032'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/4979941714963709032'/><link rel='alternate' 
type='text/html' href='http://bitsquid.blogspot.com/2011/10/caring-by-sharing-header-hero.html' title='Caring by Sharing: Header Hero'/><author><name>Niklas</name><uri>http://www.blogger.com/profile/10055379994557504977</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/-2EVlfIEiSd4/TpB42zwmuII/AAAAAAAAAIw/iYNA1tOaHWs/s72-c/header_hero_1.png' height='72' width='72'/><thr:total>10</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1994130783874175266.post-6244074227797836660</id><published>2011-09-23T22:32:00.000+02:00</published><updated>2011-09-24T11:46:44.151+02:00</updated><title type='text'>Managing Decoupling Part 4 -- The ID Lookup Table</title><content type='html'>Today I am going to dig deeper into an important and versatile data structure that pops up all the time in the BitSquid engine -- the ID lookup table.&lt;br /&gt;&lt;br /&gt;I have &lt;a href="http://altdevblogaday.com/2011/01/26/managing-decoupling/"&gt;already talked&lt;/a&gt; about the advantages of using IDs to refer to objects owned by other systems, but let me just quickly recap.&lt;br /&gt;&lt;br /&gt;IDs are better than direct pointers because we don’t get dangling references if the other system decides that the object needs to be destroyed.&lt;br /&gt;&lt;br /&gt;IDs are better than &lt;em&gt;shared_ptr&amp;lt;&amp;gt;&lt;/em&gt; and &lt;em&gt;weak_ptr&amp;lt;&amp;gt;&lt;/em&gt; because they allow the other system to reorganize its objects in memory and delete them at will, and they don’t require thread synchronization to maintain a reference count. 
They are also POD (plain old data) structures, so they can be copied and moved in memory freely, passed back and forth between C++ and Lua, etc.&lt;br /&gt;&lt;br /&gt;By an ID I simply mean an opaque data structure of &lt;em&gt;n&lt;/em&gt; bits. It has no particular meaning to us, we just use it to refer to an object. The system provides the mechanism for looking up an object based on it. Since we seldom create more than 4 billion objects, 32 bits is usually enough for the ID, so we can just use a standard integer. If a system needs a lot of objects, we can go to 64 bits.&lt;br /&gt;&lt;br /&gt;In this post I’m going to look at what data structures a system might use to do the lookup from ID to system object. There are some requirements that such data structures need to fulfill:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;There should be a 1-1 mapping between live objects and IDs.&lt;/li&gt;&lt;li&gt;If the system is supplied with an ID to an old object, it should be able to detect that the object is no longer alive.&lt;/li&gt;&lt;li&gt;Lookup from ID to object should be very fast (this is the most common operation).&lt;/li&gt;&lt;li&gt;Adding and removing objects should be fast.&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;Let’s look at three different ways of implementing this data structure, with increasing degrees of sophistication.&lt;br /&gt;&lt;br /&gt;&lt;h2&gt;The STL Method&lt;/h2&gt;&lt;br /&gt;The by-the-book object oriented approach is to allocate objects on the heap and use a &lt;em&gt;std::map&lt;/em&gt; to map from ID to object.&lt;br /&gt;&lt;br /&gt;&lt;pre escaped="true" lang="cpp" line="1"&gt;typedef unsigned ID;&lt;br /&gt;&lt;br /&gt;struct System&lt;br /&gt;{&lt;br /&gt;	ID _next_id;&lt;br /&gt;	std::map&amp;lt;ID, Object *&amp;gt; _objects;&lt;br /&gt;&lt;br /&gt;	System() {_next_id = 0;}&lt;br /&gt;&lt;br /&gt;	inline bool has(ID id) {&lt;br /&gt;		return _objects.count(id) &amp;gt; 0;&lt;br /&gt;	}&lt;br /&gt;	&lt;br /&gt;	inline Object 
&amp;amp;lookup(ID id) {&lt;br /&gt;		return *_objects[id];&lt;br /&gt;	}&lt;br /&gt;	&lt;br /&gt;	inline ID add() {&lt;br /&gt;		ID id = _next_id++;&lt;br /&gt;		Object *o = new Object();&lt;br /&gt;		o-&amp;gt;id = id;&lt;br /&gt;		_objects[id] = o;&lt;br /&gt;		return id;&lt;br /&gt;	}&lt;br /&gt;	&lt;br /&gt;	inline void remove(ID id) {&lt;br /&gt;		Object &amp;amp;o = lookup(id);&lt;br /&gt;		_objects.erase(id);&lt;br /&gt;		delete &amp;amp;o;&lt;br /&gt;	}&lt;br /&gt;};&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Note that if we create more than four billion objects, the &lt;em&gt;_next_id&lt;/em&gt; counter will wrap around and we risk getting two objects with the same ID.&lt;br /&gt;&lt;br /&gt;Apart from that, the only problem with this solution is that it is really inefficient. All objects are allocated individually on the heap, which gives bad cache behavior and the map lookup results in tree walking which is also bad for the cache. We can switch the map to a &lt;em&gt;hash_map&lt;/em&gt; for slightly better performance, but that still leaves a lot of unnecessary pointer chasing.&lt;br /&gt;&lt;br /&gt;&lt;h2&gt;Array With Holes&lt;/h2&gt;&lt;br /&gt;What we really want to do is to store our objects linearly in memory, because that will give us the best possible cache behavior. We can either use a fixed size array &lt;em&gt;Object[MAX_SIZE]&lt;/em&gt; if we know the maximum number of objects that will ever be used, or we can be more flexible and use a &lt;em&gt;std::vector&lt;/em&gt;.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Note:&lt;/strong&gt; If you care about performance and use &lt;em&gt;std::vector&amp;lt;T&amp;gt;&lt;/em&gt; you should make a variant of it (call it &lt;em&gt;array&amp;lt;T&amp;gt;&lt;/em&gt; for example) that doesn’t call constructors or initialize memory. Use that for simple types, when you don’t care about initialization. 
A dynamic &lt;em&gt;vector&amp;lt;T&amp;gt;&lt;/em&gt; buffer that grows and shrinks a lot can spend a huge amount of time doing completely unnecessary constructor calls.&lt;br /&gt;&lt;br /&gt;To find an object in the array, we need to know its index. But just using the index as ID is not enough, because the object might have been destroyed and a new object might have been created at the same index. To check for that, we also need an id value, as before. So we make the ID type a combination of both:&lt;br /&gt;&lt;br /&gt;&lt;pre escaped="true" lang="cpp" line="1"&gt;struct ID {&lt;br /&gt;	unsigned index;&lt;br /&gt;	unsigned inner_id;&lt;br /&gt;};&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Now we can use the index to quickly look up the object and the &lt;em&gt;inner_id&lt;/em&gt; to verify its identity.&lt;br /&gt;&lt;br /&gt;Since the object index is stored in the ID which is exposed externally, once an object has been created it cannot move. When objects are deleted they will leave holes in the array.&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-9rHMIMlZBpc/TnzsIxoQW_I/AAAAAAAAAIY/p2eklRw6Gmw/s1600/id_lookup_1.png" imageanchor="1" style="margin-left:1em; margin-right:1em"&gt;&lt;img border="0" height="36" width="400" src="http://2.bp.blogspot.com/-9rHMIMlZBpc/TnzsIxoQW_I/AAAAAAAAAIY/p2eklRw6Gmw/s400/id_lookup_1.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;When we create new objects we don’t just want to add them to the end of the array. We want to make sure that we fill the holes in the array first.&lt;br /&gt;&lt;br /&gt;The standard way of doing that is with a free list. We store a pointer to the first hole in a variable. In each hole we store a pointer to the next hole. 
These pointers thus form a linked list that enumerates all the holes.&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-eKZSXXp2rZE/TnzsTIDk1oI/AAAAAAAAAIg/5Y2Ubw-Zh2g/s1600/id_lookup_2.png" imageanchor="1" style="margin-left:1em; margin-right:1em"&gt;&lt;img border="0" height="128" width="400" src="http://3.bp.blogspot.com/-eKZSXXp2rZE/TnzsTIDk1oI/AAAAAAAAAIg/5Y2Ubw-Zh2g/s400/id_lookup_2.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;An interesting thing to note is that we usually don’t need to allocate any memory for these pointers. Since the pointers are only used for holes (i.e., dead objects) we can reuse the objects’ own memory for storing them. The objects don’t need that memory, since they are dead.&lt;br /&gt;&lt;br /&gt;Here is an implementation. For clarity, I have used an explicit member &lt;em&gt;next&lt;/em&gt; in the object for the free list rather than reusing the object’s memory:&lt;br /&gt;&lt;br /&gt;&lt;pre escaped="true" lang="cpp" line="1"&gt;struct System&lt;br /&gt;{&lt;br /&gt;	unsigned _next_inner_id;&lt;br /&gt;	std::vector&amp;lt;Object&amp;gt; _objects;&lt;br /&gt;	unsigned _freelist;&lt;br /&gt;&lt;br /&gt;	System() {&lt;br /&gt;		_next_inner_id = 0;&lt;br /&gt;		_freelist = UINT_MAX;&lt;br /&gt;	}&lt;br /&gt;&lt;br /&gt;	inline bool has(ID id) {&lt;br /&gt;		return _objects[id.index].id.inner_id == id.inner_id;&lt;br /&gt;	}&lt;br /&gt;	&lt;br /&gt;	inline Object &amp;amp;lookup(ID id) {&lt;br /&gt;		return _objects[id.index];&lt;br /&gt;	}&lt;br /&gt;	&lt;br /&gt;	inline ID add() {&lt;br /&gt;		ID id;&lt;br /&gt;		id.inner_id = _next_inner_id++;&lt;br /&gt;		if (_freelist == UINT_MAX) {&lt;br /&gt;			Object o;&lt;br /&gt;			id.index = _objects.size();&lt;br /&gt;			o.id = id;&lt;br /&gt;			_objects.push_back(o);&lt;br /&gt;		} else {&lt;br /&gt;			id.index = _freelist;&lt;br /&gt;			_freelist = _objects[_freelist].next;&lt;br /&gt;			_objects[id.index].id = id; // mark the reused slot as live&lt;br /&gt;		}&lt;br /&gt;		return 
id;&lt;br /&gt;	}&lt;br /&gt;	&lt;br /&gt;	inline void remove(ID id) {&lt;br /&gt;		Object &amp;amp;o = lookup(id);&lt;br /&gt;		o.id.inner_id = UINT_MAX;&lt;br /&gt;		o.next = _freelist;&lt;br /&gt;		_freelist = id.index;&lt;br /&gt;	}&lt;br /&gt;};&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;This is a lot better than the STL solution. Insertion and removal are O(1). Lookup is just array indexing, which means it is very fast. In a quick-and-dirty-don’t-take-it-too-seriously test this was 40 times faster than the STL solution. In real life it all depends on the actual usage patterns, of course.&lt;br /&gt;&lt;br /&gt;The only part of this solution that is not an improvement over the STL version is that our ID structs have increased from 32 to 64 bits.&lt;br /&gt;&lt;br /&gt;There are things that can be done about that. For example, if you never have more than 64 K objects live at the same time, you can get by with 16 bits for the index, which leaves 16 bits for the &lt;em&gt;inner_id&lt;/em&gt;. Note that the &lt;em&gt;inner_id&lt;/em&gt; doesn’t have to be globally unique; it is enough if it is unique for that index slot. So a 16 bit &lt;em&gt;inner_id&lt;/em&gt; is fine if we never create more than 64 K objects in the same index slot.&lt;br /&gt;&lt;br /&gt;If you want to go down that road you probably want to change the implementation of the free list slightly. The code above uses a standard free list implementation that acts as a LIFO stack. This means that if you create and delete objects in quick succession, they will all be assigned to the same index slot, which means you quickly run out of &lt;em&gt;inner_ids&lt;/em&gt; for that slot. To prevent that, you want to make sure that you always have a certain number of elements in the free list (allocate more if you run low) and rewrite it as a FIFO. 
If you always have &lt;em&gt;N&lt;/em&gt; free objects and use a FIFO free list, then you are guaranteed that you won’t see an &lt;em&gt;inner_id&lt;/em&gt; collision until you have created at least &lt;em&gt;N&lt;/em&gt; * 64 K objects.&lt;br /&gt;&lt;br /&gt;Of course you can slice and dice the 32 bits in other ways if you have different limits on the maximum number of objects. You have to crunch the numbers for your particular case to see if you can get by with a 32 bit ID.&lt;br /&gt;&lt;br /&gt;&lt;h2&gt;Packed Array&lt;/h2&gt;&lt;br /&gt;One drawback with the approach sketched above is that since the index is exposed externally, the system cannot reorganize its objects in memory for maximum performance. &lt;br /&gt;&lt;br /&gt;The holes are especially troubling. At some point the system probably wants to loop over all its objects and update them. If the object array is nearly full, no problem. But if the array has 50 % objects and 50 % holes, the loop will touch twice as much memory as necessary. That seems suboptimal.&lt;br /&gt;&lt;br /&gt;We can get rid of that by introducing an extra level of indirection, where the IDs point to an array of indices that point to the objects themselves:&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-rDA7nY7EI8Q/TnzsdAbMV1I/AAAAAAAAAIo/VRgddAjzesc/s1600/id_lookup_3.png" imageanchor="1" style="margin-left:1em; margin-right:1em"&gt;&lt;img border="0" height="155" width="400" src="http://3.bp.blogspot.com/-rDA7nY7EI8Q/TnzsdAbMV1I/AAAAAAAAAIo/VRgddAjzesc/s400/id_lookup_3.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;This means that we pay the cost of an extra array lookup whenever we resolve the ID. On the other hand, the system objects are packed tight in memory, which means that they can be updated more efficiently. Note that the system update doesn’t have to touch or care about the index array. 
Whether this is a net win depends on how the system is used, but my guess is that in most cases more items are touched internally than are referenced externally.&lt;br /&gt;&lt;br /&gt;To remove an object with this solution we use the standard trick of swapping it with the last item in the array. Then we update the index so that it points to the new location of the swapped object.&lt;br /&gt;&lt;br /&gt;Here is an implementation. To keep things interesting, this time with a fixed array size, a 32 bit ID and a FIFO free list.&lt;br /&gt;&lt;br /&gt;&lt;pre escaped="true" lang="cpp" line="1"&gt;typedef unsigned ID;&lt;br /&gt;&lt;br /&gt;#define MAX_OBJECTS (64*1024)&lt;br /&gt;#define INDEX_MASK 0xffff&lt;br /&gt;#define NEW_OBJECT_ID_ADD 0x10000&lt;br /&gt;&lt;br /&gt;struct Index {&lt;br /&gt;	ID id;&lt;br /&gt;	unsigned short index;&lt;br /&gt;	unsigned short next;&lt;br /&gt;};&lt;br /&gt;&lt;br /&gt;struct System&lt;br /&gt;{&lt;br /&gt;	unsigned _num_objects;&lt;br /&gt;	Object _objects[MAX_OBJECTS];&lt;br /&gt;	Index _indices[MAX_OBJECTS];&lt;br /&gt;	unsigned short _freelist_enqueue;&lt;br /&gt;	unsigned short _freelist_dequeue;&lt;br /&gt;&lt;br /&gt;	System() {&lt;br /&gt;		_num_objects = 0;&lt;br /&gt;		for (unsigned i=0; i&amp;lt;MAX_OBJECTS; ++i) {&lt;br /&gt;			_indices[i].id = i;&lt;br /&gt;			_indices[i].index = USHRT_MAX;	// slot is unused until add() assigns it&lt;br /&gt;			_indices[i].next = i+1;&lt;br /&gt;		}&lt;br /&gt;		_freelist_dequeue = 0;&lt;br /&gt;		_freelist_enqueue = MAX_OBJECTS-1;&lt;br /&gt;	}&lt;br /&gt;&lt;br /&gt;	inline bool has(ID id) {&lt;br /&gt;		Index &amp;amp;in = _indices[id &amp;amp; INDEX_MASK];&lt;br /&gt;		return in.id == id &amp;amp;&amp;amp; in.index != USHRT_MAX;&lt;br /&gt;	}&lt;br /&gt;	&lt;br /&gt;	inline Object &amp;amp;lookup(ID id) {&lt;br /&gt;		return _objects[_indices[id &amp;amp; INDEX_MASK].index];&lt;br /&gt;	}&lt;br /&gt;	&lt;br /&gt;	inline ID add() {&lt;br /&gt;		Index &amp;amp;in = _indices[_freelist_dequeue];&lt;br /&gt;		_freelist_dequeue = in.next;&lt;br /&gt;		in.id += 
NEW_OBJECT_ID_ADD;&lt;br /&gt;		in.index = _num_objects++;&lt;br /&gt;		Object &amp;amp;o = _objects[in.index];&lt;br /&gt;		o.id = in.id;&lt;br /&gt;		return o.id;&lt;br /&gt;	}&lt;br /&gt;	&lt;br /&gt;	inline void remove(ID id) {&lt;br /&gt;		Index &amp;amp;in = _indices[id &amp;amp; INDEX_MASK];&lt;br /&gt;		&lt;br /&gt;		Object &amp;amp;o = _objects[in.index];&lt;br /&gt;		o = _objects[--_num_objects];&lt;br /&gt;		_indices[o.id &amp;amp; INDEX_MASK].index = in.index;&lt;br /&gt;		&lt;br /&gt;		in.index = USHRT_MAX;&lt;br /&gt;		_indices[_freelist_enqueue].next = id &amp;amp; INDEX_MASK;&lt;br /&gt;		_freelist_enqueue = id &amp;amp; INDEX_MASK;&lt;br /&gt;	}&lt;br /&gt;};&lt;br /&gt;&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1994130783874175266-6244074227797836660?l=bitsquid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bitsquid.blogspot.com/feeds/6244074227797836660/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bitsquid.blogspot.com/2011/09/managing-decoupling-part-4-id-lookup.html#comment-form' title='16 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/6244074227797836660'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/6244074227797836660'/><link rel='alternate' type='text/html' href='http://bitsquid.blogspot.com/2011/09/managing-decoupling-part-4-id-lookup.html' title='Managing Decoupling Part 4 -- The ID Lookup Table'/><author><name>Niklas</name><uri>http://www.blogger.com/profile/10055379994557504977</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail 
xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/-9rHMIMlZBpc/TnzsIxoQW_I/AAAAAAAAAIY/p2eklRw6Gmw/s72-c/id_lookup_1.png' height='72' width='72'/><thr:total>16</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1994130783874175266.post-3177776006133553555</id><published>2011-09-08T23:38:00.000+02:00</published><updated>2011-09-08T23:38:57.837+02:00</updated><title type='text'>A Simple Roll-Your-Own Documentation System</title><content type='html'>I like to roll my own documentation systems. There, I’ve said it. Not for inline documentation, mind you. For that there is Doxygen and for that I am grateful. Because while I love coding, there is fun coding and not-so-fun coding, and writing C++ parsers tends to fall in the latter category.&lt;br /&gt;&lt;br /&gt;So for inline documentation I use Doxygen, but for everything else, I roll my own. Why?&lt;br /&gt;&lt;br /&gt;I don’t want to use Word or Pages or any other word processing program because I want my documents to be plain text that can be diffed and merged when necessary. And I want to be able to output it as &lt;em&gt;clean&lt;/em&gt; HTML or in any other format I may like.&lt;br /&gt;&lt;br /&gt;I don’t want to use HTML or LaTeX or any other presentation-oriented language, because I want to be able to massage the content in various ways before presenting it. Reordering it, adding an index or a glossary, removing deprecated parts, etc. Also, writing &amp;lt;p&amp;gt; gets boring very quickly.&lt;br /&gt;&lt;br /&gt;I don’t want to use a Wiki, because I want to check in my documents together with the code, so that code versions and document versions match in the repository. I definitely &lt;em&gt;don’t&lt;/em&gt; want to manage five different Wikis, corresponding to different engine release versions. 
Also, Wiki markup languages tend to be verbose and obtuse.&lt;br /&gt;&lt;br /&gt;I &lt;em&gt;could&lt;/em&gt; use an existing markup language, such as DocBook, Markdown or reStructuredText. But all of them contain lots of stuff that I don’t need and lack some stuff that I &lt;em&gt;do&lt;/em&gt; need. For example I want to include snippets of syntax-highlighted Lua code, margin notes and math formulas. And I want to do it in a way that is easy to read and easy to write. Because I want there to be as few things as possible standing in the way of writing good documentation.&lt;br /&gt;&lt;br /&gt;So I roll my own. But as you will see, it is not that much work.&lt;br /&gt;&lt;br /&gt;I’ve written a fair number of markup systems over the years (perhaps one too many, but hey, that is how you learn) and I’ve settled on a pretty minimalistic structure that can be implemented in a few hundred lines of Ruby. In general, I tend to favor simple, minimalistic systems over big frameworks that try to “cover everything”. Covering everything is usually impossible and when you discover that you need new functionality, the lightweight systems are a lot easier to extend than the behemoths.&lt;br /&gt;&lt;br /&gt;There are two basic components to the system. Always two there are, a &lt;em&gt;parser&lt;/em&gt; and a &lt;em&gt;generator&lt;/em&gt;. The parser reads the source document and converts it to some kind of structured representation. The generator takes the structured representation and converts it to an output format. 
Here I’ll only consider HTML, because to me that is the only output format that really matters.&lt;br /&gt;&lt;br /&gt;To have something concrete to talk about, let’s use this source document, written in a syntax that I just made up:&lt;br /&gt;&lt;br /&gt;&lt;pre escaped="true" lang="text"&gt;@h1 Flavors of ice cream&lt;br /&gt;&lt;br /&gt;My favorite ice cream flavors are:&lt;br /&gt;&lt;br /&gt;@li Strawberry&lt;br /&gt;@li Seagull&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;h2&gt;The Parser&lt;/h2&gt;&lt;br /&gt;The most crucial point of the system is what the structured representation should look like. How should the parser communicate with the generator? My minimalistic solution is to just let the representation be a list of lines, with each line consisting of a type marker and some text.&lt;br /&gt;&lt;br /&gt;&lt;pre escaped="true" lang="text"&gt;(:h1, "Flavors of...")&lt;br /&gt;(:empty, "")&lt;br /&gt;(:text, "My favorite...")&lt;br /&gt;(:empty, "")&lt;br /&gt;(:li, "Strawberry")&lt;br /&gt;(:li, "Seagull")&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;To some this will probably seem like complete heresy. Surely I need some kind of hierarchical representation. How can I otherwise represent things like a list-in-a-list-in-a-cat-in-a-hat?&lt;br /&gt;&lt;br /&gt;No problem. To represent a list item nested in another list, I just use a &lt;em&gt;@li_li&lt;/em&gt; tag and a corresponding &lt;em&gt;:li_li&lt;/em&gt; type marker. If someone wants three or more levels of nesting I suggest that they rewrite their document. This is supposed to be &lt;em&gt;readable&lt;/em&gt; documentation, not Tractatus Logico-Philosophicus. I simply don’t think that deep nesting is important enough to warrant a complicated hierarchical design. 
As I said, I prefer the simple things in life.&lt;br /&gt;&lt;br /&gt;So, now that we know the output format, we can write the parser in under 20 lines:&lt;br /&gt;&lt;br /&gt;&lt;pre escaped="true" lang="ruby" line="1"&gt;class Parser&lt;br /&gt;  attr_reader :lines&lt;br /&gt;  &lt;br /&gt;  def initialize()&lt;br /&gt;    @lines = []&lt;br /&gt;  end&lt;br /&gt;  &lt;br /&gt;  def parse(line)&lt;br /&gt;    case line&lt;br /&gt;    when /^$/&lt;br /&gt;      @lines &amp;lt;&amp;lt; {:type =&amp;gt; :empty, :line =&amp;gt; ""}&lt;br /&gt;    when /^@(\S+)\s+(.*)$/&lt;br /&gt;      @lines &amp;lt;&amp;lt; {:type =&amp;gt; $1.intern, :line =&amp;gt; $2}&lt;br /&gt;    when /^(.*)$/&lt;br /&gt;      @lines &amp;lt;&amp;lt; {:type =&amp;gt; :text, :line =&amp;gt; line}&lt;br /&gt;    end&lt;br /&gt;  end&lt;br /&gt;end&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Of course you can go a lot fancier with the parser than this. For example, you can make a more Markdown-like syntax where you create lists by just starting lines with bullet points. But this doesn’t really change the basic structure; you just need to add more whens in your case-statement.&lt;br /&gt;&lt;br /&gt;One useful approach, as you make more advanced parsers, is to have markers that put the parser in a particular state. For example, you could have a marker &lt;em&gt;@lua&lt;/em&gt; that made the parser consider all the lines following it to be of type &lt;em&gt;:lua&lt;/em&gt; until the marker &lt;em&gt;@endlua&lt;/em&gt; was reached.&lt;br /&gt;&lt;br /&gt;&lt;h2&gt;The Generator&lt;/h2&gt;&lt;br /&gt;A useful trick when writing HTML generators is to always keep track of the HTML tags that you have currently opened. 
This lets you write a method &lt;em&gt;context(tags)&lt;/em&gt; which takes a list of tags as arguments and closes and opens tags so that exactly the tags specified in the list are open.&lt;br /&gt;&lt;br /&gt;With such a method available, it is simple to write the code for outputting tags:&lt;br /&gt;&lt;br /&gt;&lt;pre escaped="true" lang="ruby" line="1"&gt;class Generator&lt;br /&gt;  def h1(line)&lt;br /&gt;    context(%W(h1 #{"a name=\"#{line}\""}))&lt;br /&gt;    print line&lt;br /&gt;  end&lt;br /&gt;  &lt;br /&gt;  def text(line)&lt;br /&gt;    context(%w(p))&lt;br /&gt;    print line&lt;br /&gt;  end&lt;br /&gt;&lt;br /&gt;  def empty(line)&lt;br /&gt;    context(%w())&lt;br /&gt;    print line&lt;br /&gt;  end&lt;br /&gt;  &lt;br /&gt;  def li(line)&lt;br /&gt;    context(%w(ul li))&lt;br /&gt;    print line&lt;br /&gt;    context(%w(ul))&lt;br /&gt;  end&lt;br /&gt;end&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Notice how this works. The &lt;em&gt;li()&lt;/em&gt; method makes sure that we are in a &lt;em&gt;&amp;lt;ul&amp;gt; &amp;lt;li&amp;gt;&lt;/em&gt; context, so it closes all other open tags and opens the right ones. Then, after printing its content, it says that the context should just be &lt;em&gt;&amp;lt;ul&amp;gt;&lt;/em&gt; which forces the closure of the &lt;em&gt;&amp;lt;li&amp;gt;&lt;/em&gt; tag. If we wanted to support the &lt;em&gt;:li_li&lt;/em&gt; tag, mentioned above, we could write it simply as:&lt;br /&gt;&lt;br /&gt;&lt;pre escaped="true" lang="ruby" line="1"&gt;class Generator&lt;br /&gt;  def li_li(line)&lt;br /&gt;    context(%w(ul li ul li))&lt;br /&gt;    print line&lt;br /&gt;    context(%w(ul li ul))&lt;br /&gt;  end&lt;br /&gt;end&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Notice also that this approach allows us to just step through the lines in the data structure and print them. 
We don’t have to look back and forward in the data structure to find out where a &lt;em&gt;&amp;lt;ul&amp;gt;&lt;/em&gt; should begin and end.&lt;br /&gt;&lt;br /&gt;The rest of the Generator class implements the &lt;em&gt;context()&lt;/em&gt; function and handles indentation:&lt;br /&gt;&lt;br /&gt;&lt;pre escaped="true" lang="ruby" line="1"&gt;class Generator&lt;br /&gt;  def initialize()&lt;br /&gt;    @out = ""&lt;br /&gt;    @context = []&lt;br /&gt;    @indent = 0&lt;br /&gt;  end&lt;br /&gt;  &lt;br /&gt;  def print(s)&lt;br /&gt;    @out &amp;lt;&amp;lt; ("  " * @indent) &amp;lt;&amp;lt; s &amp;lt;&amp;lt; "\n"&lt;br /&gt;  end&lt;br /&gt;  &lt;br /&gt;  def open(ci)&lt;br /&gt;    print "&amp;lt;#{ci}&amp;gt;"&lt;br /&gt;    @indent += 1&lt;br /&gt;  end&lt;br /&gt;  &lt;br /&gt;  def close(ci)&lt;br /&gt;    @indent -= 1&lt;br /&gt;    print "&amp;lt;/#{ci[/^\S*/]}&amp;gt;"&lt;br /&gt;  end&lt;br /&gt;  &lt;br /&gt;  def context(c)&lt;br /&gt;    i = 0&lt;br /&gt;    while @context[i] != nil &amp;amp;&amp;amp; @context[i] == c[i]&lt;br /&gt;      i += 1&lt;br /&gt;    end&lt;br /&gt;    while @context.size &amp;gt; i&lt;br /&gt;      close(@context.last)&lt;br /&gt;      @context.pop&lt;br /&gt;    end&lt;br /&gt;    while c.size &amp;gt; @context.size&lt;br /&gt;      @context.push( c[@context.size] )&lt;br /&gt;      open(@context.last)&lt;br /&gt;    end&lt;br /&gt;  end&lt;br /&gt;  &lt;br /&gt;  def format(lines)&lt;br /&gt;    lines.each {|line| self.send(line[:type], line[:line])}&lt;br /&gt;    context(%w())&lt;br /&gt;    return @out&lt;br /&gt;  end&lt;br /&gt;end&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Used as:&lt;br /&gt;&lt;br /&gt;&lt;pre escaped="true" lang="ruby" line="1"&gt;parser = Parser.new&lt;br /&gt;text.each_line {|line| parser.parse(line)}&lt;br /&gt;puts Generator.new.format(parser.lines)&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;So there you have it, the start of a custom documentation system, easy to extend with new tags in under 100 lines 
of Ruby code.&lt;br /&gt;&lt;br /&gt;There are some things I haven’t touched on here, like TOC generation or inline formatting (bold and emphasized text). But it is easy to write them as extensions of this basic system. For example, the TOC could be generated with an additional pass over the structured data. If there is enough interest I could show an example in a follow-up post.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1994130783874175266-3177776006133553555?l=bitsquid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bitsquid.blogspot.com/feeds/3177776006133553555/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bitsquid.blogspot.com/2011/09/simple-roll-your-own-documentation.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/3177776006133553555'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/3177776006133553555'/><link rel='alternate' type='text/html' href='http://bitsquid.blogspot.com/2011/09/simple-roll-your-own-documentation.html' title='A Simple Roll-Your-Own Documentation System'/><author><name>Niklas</name><uri>http://www.blogger.com/profile/10055379994557504977</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1994130783874175266.post-9177141622504759963</id><published>2011-08-25T15:55:00.000+02:00</published><updated>2011-08-25T15:55:13.312+02:00</updated><title type='text'>Code Snippet: Murmur hash inverse / pre-image</title><content type='html'>Today's caring by sharing. 
I needed this non-trivial code snippet today and couldn't find it anywhere on the internet, so here it is for future reference. It computes the inverse / pre-image of a murmur hash. I.e., given a 32 bit murmur hash value, it computes a 32 bit value that when hashed produces that hash value:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;/// Inverts a (h ^= h &gt;&gt; s) operation with 8 &lt;= s &lt;= 16&lt;br /&gt;unsigned int invert_shift_xor(unsigned int hs, unsigned int s)&lt;br /&gt;{&lt;br /&gt;	XENSURE(s &gt;= 8 &amp;&amp; s &lt;= 16);&lt;br /&gt;	unsigned hs0 = hs &gt;&gt; 24;&lt;br /&gt;	unsigned hs1 = (hs &gt;&gt; 16) &amp; 0xff;&lt;br /&gt;	unsigned hs2 = (hs &gt;&gt; 8) &amp; 0xff;&lt;br /&gt;	unsigned hs3 = hs &amp; 0xff;&lt;br /&gt;&lt;br /&gt;	unsigned h0 = hs0;&lt;br /&gt;	unsigned h1 = hs1 ^ (h0 &gt;&gt; (s-8));&lt;br /&gt;	unsigned h2 = (hs2 ^ (h0 &lt;&lt; (16-s)) ^ (h1 &gt;&gt; (s-8))) &amp; 0xff;&lt;br /&gt;	unsigned h3 = (hs3 ^ (h1 &lt;&lt; (16-s)) ^ (h2 &gt;&gt; (s-8))) &amp; 0xff;&lt;br /&gt;	return (h0&lt;&lt;24) + (h1&lt;&lt;16) + (h2&lt;&lt;8) + h3;&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;unsigned int murmur_hash_inverse(unsigned int h, unsigned int seed)&lt;br /&gt;{&lt;br /&gt;	const unsigned int m = 0x5bd1e995;&lt;br /&gt;	const unsigned int minv = 0xe59b19bd;	// Multiplicative inverse of m under % 2^32&lt;br /&gt;	const int r = 24;&lt;br /&gt;&lt;br /&gt;	h = invert_shift_xor(h,15);&lt;br /&gt;	h *= minv;&lt;br /&gt;	h = invert_shift_xor(h,13);&lt;br /&gt;&lt;br /&gt;	unsigned int hforward = seed ^ 4;&lt;br /&gt;	hforward *= m;&lt;br /&gt;	unsigned int k = hforward ^ h;&lt;br /&gt;	k *= minv;&lt;br /&gt;	k ^= k &gt;&gt; r;&lt;br /&gt;	k *= minv;&lt;br /&gt;&lt;br /&gt;	#ifdef PLATFORM_BIG_ENDIAN&lt;br /&gt;		// unsigned char, so that bytes &gt;= 0x80 are not sign-extended&lt;br /&gt;		unsigned char *data = (unsigned char *)&amp;k;&lt;br /&gt;		k = (data[0]) + (data[1] &lt;&lt; 8) + (data[2] &lt;&lt; 16) + (data[3] &lt;&lt; 24);&lt;br /&gt;	#endif&lt;br /&gt;&lt;br /&gt;	return k;&lt;br /&gt;}&lt;br 
/&gt;&lt;/pre&gt;And for reference, here is the full code, with both the regular murmur hash and the inverses for 32- and 64-bit hashes:&lt;pre&gt;unsigned int murmur_hash ( const void * key, int len, unsigned int seed )&lt;br /&gt;{&lt;br /&gt;	// 'm' and 'r' are mixing constants generated offline.&lt;br /&gt;	// They're not really 'magic', they just happen to work well.&lt;br /&gt;&lt;br /&gt;	const unsigned int m = 0x5bd1e995;&lt;br /&gt;	const int r = 24;&lt;br /&gt;&lt;br /&gt;	// Initialize the hash to a 'random' value&lt;br /&gt;&lt;br /&gt;	unsigned int h = seed ^ len;&lt;br /&gt;&lt;br /&gt;	// Mix 4 bytes at a time into the hash&lt;br /&gt;&lt;br /&gt;	const unsigned char * data = (const unsigned char *)key;&lt;br /&gt;&lt;br /&gt;	while(len &gt;= 4)&lt;br /&gt;	{&lt;br /&gt;		#ifdef PLATFORM_BIG_ENDIAN&lt;br /&gt;			unsigned int k = (data[0]) + (data[1] &lt;&lt; 8) + (data[2] &lt;&lt; 16) + (data[3] &lt;&lt; 24);&lt;br /&gt;		#else&lt;br /&gt;			unsigned int k = *(unsigned int *)data;&lt;br /&gt;		#endif&lt;br /&gt;&lt;br /&gt;		k *= m;&lt;br /&gt;		k ^= k &gt;&gt; r;&lt;br /&gt;		k *= m;&lt;br /&gt;&lt;br /&gt;		h *= m;&lt;br /&gt;		h ^= k;&lt;br /&gt;&lt;br /&gt;		data += 4;&lt;br /&gt;		len -= 4;&lt;br /&gt;	}&lt;br /&gt;&lt;br /&gt;	// Handle the last few bytes of the input array&lt;br /&gt;&lt;br /&gt;	switch(len)&lt;br /&gt;	{&lt;br /&gt;	case 3: h ^= data[2] &lt;&lt; 16;&lt;br /&gt;	case 2: h ^= data[1] &lt;&lt; 8;&lt;br /&gt;	case 1: h ^= data[0];&lt;br /&gt;		h *= m;&lt;br /&gt;	};&lt;br /&gt;&lt;br /&gt;	// Do a few final mixes of the hash to ensure the last few&lt;br /&gt;	// bytes are well-incorporated.
&lt;br /&gt;&lt;br /&gt;	h ^= h &gt;&gt; 13;&lt;br /&gt;	h *= m;&lt;br /&gt;	h ^= h &gt;&gt; 15;&lt;br /&gt;&lt;br /&gt;	return h;&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;/// Inverts a (h ^= h &gt;&gt; s) operation with 8 &lt;= s &lt;= 16&lt;br /&gt;unsigned int invert_shift_xor(unsigned int hs, unsigned int s)&lt;br /&gt;{&lt;br /&gt;	XENSURE(s &gt;= 8 &amp;&amp; s &lt;= 16);&lt;br /&gt;	unsigned hs0 = hs &gt;&gt; 24;&lt;br /&gt;	unsigned hs1 = (hs &gt;&gt; 16) &amp; 0xff;&lt;br /&gt;	unsigned hs2 = (hs &gt;&gt; 8) &amp; 0xff;&lt;br /&gt;	unsigned hs3 = hs &amp; 0xff;&lt;br /&gt;&lt;br /&gt;	unsigned h0 = hs0;&lt;br /&gt;	unsigned h1 = hs1 ^ (h0 &gt;&gt; (s-8));&lt;br /&gt;	unsigned h2 = (hs2 ^ (h0 &lt;&lt; (16-s)) ^ (h1 &gt;&gt; (s-8))) &amp; 0xff;&lt;br /&gt;	unsigned h3 = (hs3 ^ (h1 &lt;&lt; (16-s)) ^ (h2 &gt;&gt; (s-8))) &amp; 0xff;&lt;br /&gt;	return (h0&lt;&lt;24) + (h1&lt;&lt;16) + (h2&lt;&lt;8) + h3;&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;unsigned int murmur_hash_inverse(unsigned int h, unsigned int seed)&lt;br /&gt;{&lt;br /&gt;	const unsigned int m = 0x5bd1e995;&lt;br /&gt;	const unsigned int minv = 0xe59b19bd;	// Multiplicative inverse of m under % 2^32&lt;br /&gt;	const int r = 24;&lt;br /&gt;&lt;br /&gt;	h = invert_shift_xor(h,15);&lt;br /&gt;	h *= minv;&lt;br /&gt;	h = invert_shift_xor(h,13);&lt;br /&gt;&lt;br /&gt;	unsigned int hforward = seed ^ 4;&lt;br /&gt;	hforward *= m;&lt;br /&gt;	unsigned int k = hforward ^ h;&lt;br /&gt;	k *= minv;&lt;br /&gt;	k ^= k &gt;&gt; r;&lt;br /&gt;	k *= minv;&lt;br /&gt;&lt;br /&gt;	#ifdef PLATFORM_BIG_ENDIAN&lt;br /&gt;		char *data = (char *)&amp;k;&lt;br /&gt;		k = (data[0]) + (data[1] &lt;&lt; 8) + (data[2] &lt;&lt; 16) + (data[3] &lt;&lt; 24);&lt;br /&gt;	#endif&lt;br /&gt;&lt;br /&gt;	return k;&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;uint64 murmur_hash_64(const void * key, int len, uint64 seed)&lt;br /&gt;{&lt;br /&gt;	const uint64 m = 0xc6a4a7935bd1e995ULL;&lt;br /&gt;	
const int r = 47;&lt;br /&gt;&lt;br /&gt;	uint64 h = seed ^ (len * m);&lt;br /&gt;&lt;br /&gt;	const uint64 * data = (const uint64 *)key;&lt;br /&gt;	const uint64 * end = data + (len/8);&lt;br /&gt;&lt;br /&gt;	while(data != end)&lt;br /&gt;	{&lt;br /&gt;		#ifdef PLATFORM_BIG_ENDIAN&lt;br /&gt;			uint64 k = *data++;&lt;br /&gt;			char *p = (char *)&amp;k;&lt;br /&gt;			char c;&lt;br /&gt;			c = p[0]; p[0] = p[7]; p[7] = c;&lt;br /&gt;			c = p[1]; p[1] = p[6]; p[6] = c;&lt;br /&gt;			c = p[2]; p[2] = p[5]; p[5] = c;&lt;br /&gt;			c = p[3]; p[3] = p[4]; p[4] = c;&lt;br /&gt;		#else&lt;br /&gt;			uint64 k = *data++;&lt;br /&gt;		#endif&lt;br /&gt;&lt;br /&gt;		k *= m;&lt;br /&gt;		k ^= k &gt;&gt; r;&lt;br /&gt;		k *= m;&lt;br /&gt;		&lt;br /&gt;		h ^= k;&lt;br /&gt;		h *= m;&lt;br /&gt;	}&lt;br /&gt;&lt;br /&gt;	const unsigned char * data2 = (const unsigned char*)data;&lt;br /&gt;&lt;br /&gt;	switch(len &amp; 7)&lt;br /&gt;	{&lt;br /&gt;	case 7: h ^= uint64(data2[6]) &lt;&lt; 48;&lt;br /&gt;	case 6: h ^= uint64(data2[5]) &lt;&lt; 40;&lt;br /&gt;	case 5: h ^= uint64(data2[4]) &lt;&lt; 32;&lt;br /&gt;	case 4: h ^= uint64(data2[3]) &lt;&lt; 24;&lt;br /&gt;	case 3: h ^= uint64(data2[2]) &lt;&lt; 16;&lt;br /&gt;	case 2: h ^= uint64(data2[1]) &lt;&lt; 8;&lt;br /&gt;	case 1: h ^= uint64(data2[0]);&lt;br /&gt;			h *= m;&lt;br /&gt;	};&lt;br /&gt; &lt;br /&gt;	h ^= h &gt;&gt; r;&lt;br /&gt;	h *= m;&lt;br /&gt;	h ^= h &gt;&gt; r;&lt;br /&gt;&lt;br /&gt;	return h;&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;uint64 murmur_hash_64_inverse(uint64 h, uint64 seed)&lt;br /&gt;{&lt;br /&gt;	const uint64 m = 0xc6a4a7935bd1e995ULL;&lt;br /&gt;	const uint64 minv = 0x5f7a0ea7e59b19bdULL; // Multiplicative inverse of m under % 2^64&lt;br /&gt;	const int r = 47;&lt;br /&gt;&lt;br /&gt;	h ^= h &gt;&gt; r;&lt;br /&gt;	h *= minv;&lt;br /&gt;	h ^= h &gt;&gt; r;&lt;br /&gt;	h *= minv;&lt;br /&gt;&lt;br /&gt;	uint64 hforward = seed ^ (8 * m);
&lt;br /&gt;	uint64 k = h ^ hforward;&lt;br /&gt;&lt;br /&gt;	k *= minv;&lt;br /&gt;	k ^= k &gt;&gt; r;&lt;br /&gt;	k *= minv;&lt;br /&gt;&lt;br /&gt;	#ifdef PLATFORM_BIG_ENDIAN&lt;br /&gt;		char *p = (char *)&amp;k;&lt;br /&gt;		char c;&lt;br /&gt;		c = p[0]; p[0] = p[7]; p[7] = c;&lt;br /&gt;		c = p[1]; p[1] = p[6]; p[6] = c;&lt;br /&gt;		c = p[2]; p[2] = p[5]; p[5] = c;&lt;br /&gt;		c = p[3]; p[3] = p[4]; p[4] = c;&lt;br /&gt;	#endif&lt;br /&gt;	&lt;br /&gt;	return k;&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1994130783874175266-9177141622504759963?l=bitsquid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bitsquid.blogspot.com/feeds/9177141622504759963/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bitsquid.blogspot.com/2011/08/code-snippet-murmur-hash-inverse-pre.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/9177141622504759963'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/9177141622504759963'/><link rel='alternate' type='text/html' href='http://bitsquid.blogspot.com/2011/08/code-snippet-murmur-hash-inverse-pre.html' title='Code Snippet: Murmur hash inverse / pre-image'/><author><name>Niklas</name><uri>http://www.blogger.com/profile/10055379994557504977</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1994130783874175266.post-3503951135457086019</id><published>2011-08-24T23:48:00.000+02:00</published><updated>2011-08-24T23:48:27.031+02:00</updated><title 
type='text'>An idea for better watch windows</title><content type='html'>Watch windows suck. I’ve spent a large part of my career looking at them (that’s how those bugs get fixed) and it’s often a frustrating experience.&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-7FP3PoQXPkc/TlVxEBhbnzI/AAAAAAAAAGw/8fnVWS-Lw9I/s1600/image1.png" imageanchor="1" style="margin-left:1em; margin-right:1em"&gt;&lt;img border="0" height="96" width="400" src="http://3.bp.blogspot.com/-7FP3PoQXPkc/TlVxEBhbnzI/AAAAAAAAAGw/8fnVWS-Lw9I/s400/image1.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;Visual Studio’s watch window is one of the better ones, but it still has many issues that make the debugging experience a lot less pleasant than it could be.&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;	&lt;li&gt;Custom data types such as &lt;em&gt;MyTree&lt;/em&gt;, &lt;em&gt;MyHashSet&lt;/em&gt; and &lt;em&gt;MyLinkedList&lt;/em&gt; are difficult to look at. To get to the content you have to understand the internal data layout and expand the links by hand.&lt;/li&gt;	&lt;li&gt;I like to pack my resource data into &lt;a href="http://bitsquid.blogspot.com/2010/02/blob-and-i.html" title="The Blob and I"&gt;tight static blobs&lt;/a&gt; -- &lt;em&gt;file formats for memory&lt;/em&gt;. A simple such blob might have a header with a variable number of offsets into a buffer of tightly packed strings. Such memory layouts cannot be described with just C structs and the watch window can’t inspect them. 
You have to cast pointers by hand or use the &lt;em&gt;Memory&lt;/em&gt; view.&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/-EF7butDHj-s/TlVxSdWHXII/AAAAAAAAAG4/_nnXkWRWBIE/s1600/image2.png" imageanchor="1" style="margin-left:1em; margin-right:1em"&gt;&lt;img border="0" height="191" width="400" src="http://4.bp.blogspot.com/-EF7butDHj-s/TlVxSdWHXII/AAAAAAAAAG4/_nnXkWRWBIE/s400/image2.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;em&gt;I don’t even see the code. All I see is a hermite curve fitted, time key sorted, zlib compressed reload animation.&lt;/em&gt;&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;	&lt;li&gt;If I have an array with 10 000 floats and one of them is a &lt;em&gt;#NaN&lt;/em&gt;, I have no way of finding out except to expand it and scroll through the numbers until I find the bad one.&lt;/li&gt;	&lt;li&gt;The watch window can’t do reverse lookup of string hashes, so when I see a hash value in the data I have no idea what it refers to.&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;Yes, I know that some of these things can be fixed. I know that you can get the Visual Studio Debugger to understand your own data types by editing &lt;em&gt;autoexp.dat&lt;/em&gt;. And since I’ve done that for all our major collection types (&lt;em&gt;Vector&lt;/em&gt;, &lt;em&gt;Deque&lt;/em&gt;, &lt;em&gt;Map&lt;/em&gt;, &lt;em&gt;SortMap&lt;/em&gt;, &lt;em&gt;HashMap&lt;/em&gt;, &lt;em&gt;Set&lt;/em&gt;, &lt;em&gt;SortSet&lt;/em&gt;, &lt;em&gt;HashSet&lt;/em&gt;, &lt;em&gt;ConstConfigValue&lt;/em&gt; and &lt;em&gt;DynamicConfigValue&lt;/em&gt;) I know what a pain it is, and I know I don’t want to do it any more. Also, it doesn’t help the debuggers for the other platforms.&lt;br /&gt;&lt;br /&gt;I also know that you can do some tricks with Visual Studio extensions. At my previous company we had reverse hash lookup through a Visual Studio extension. 
That was also painful to write, and a single platform solution.&lt;br /&gt;&lt;br /&gt;So yes, you can fix some things, and that will make your work environment a little better. But I think we should aim higher.&lt;br /&gt;&lt;br /&gt;Consider this: The variable watcher has access to the entire game memory &lt;em&gt;and&lt;/em&gt; plenty of time to analyze it. (Variable watching is not a time critical task.)&lt;br /&gt;&lt;br /&gt;Imagine what a well written C program that knew the layout of all your data structures could do with that information. It could expand binary trees and display them in a nice view, reverse lookup your hashes, highlight uninitialized &lt;em&gt;0xdeadbeef&lt;/em&gt; variables, spell check your strings, etc.&lt;br /&gt;&lt;br /&gt;&lt;h2&gt;The idea&lt;/h2&gt;&lt;br /&gt;So this is my idea: instead of writing plug-ins and extensions for all the IDEs and platforms in the world, we write the watcher as a separate external program. The user starts the program, connects to a process, enters a memory address and a variable type and gets presented with a nice view of the data:&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/-PqsAOg5_LyQ/TlVxYgNuXtI/AAAAAAAAAHA/j-vENc_466k/s1600/image3.png" imageanchor="1" style="margin-left:1em; margin-right:1em"&gt;&lt;img border="0" height="315" width="400" src="http://4.bp.blogspot.com/-PqsAOg5_LyQ/TlVxYgNuXtI/AAAAAAAAAHA/j-vENc_466k/s400/image3.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;The connection backend would be customizable so that we could use it both for local processes and remote devices (Xbox/PS3). The front end sends an &lt;em&gt;(address, size)&lt;/em&gt; request and the backend replies with a bunch of data. So the platform doesn’t matter. As long as there is some way of accessing the memory of the device we can connect it to the watcher.&lt;br /&gt;&lt;br /&gt;We can even use it to look at file contents. 
All we need is a backend that can return data from different offsets in the file. This works especially well for &lt;a href="http://bitsquid.blogspot.com/2010/02/blob-and-i.html" title="The Blob and I"&gt;data blobs&lt;/a&gt;, where the file and memory formats are identical. The watcher would function as a general data viewer that could be used for both files and memory.&lt;br /&gt;&lt;br /&gt;For this to work, we need a way to describe our data structures to the program. It should understand regular C structs, of course, but we also need some way of describing more complex data, such as variable length objects, offsets, choices, etc. Essentially, what we need is a generic way to describe blobs of structured data, no matter what the format and layout.&lt;br /&gt;&lt;br /&gt;I’m not sure what such a description language might look like (or if one already exists), but it might be something loosely based on C structs and then extended to cover more cases. Perhaps something like:&lt;br /&gt;&lt;br /&gt;&lt;pre escaped="true" lang="C"&gt;struct Data&lt;br /&gt;{&lt;br /&gt;	zero_terminated char[] name;&lt;br /&gt;	pad_to_4_bytes_alignment;&lt;br /&gt;	platform_endian unsigned count;&lt;br /&gt;	Entry entries[count];&lt;br /&gt;};&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;The program also needs an extension mechanism so that we can write custom code for processing objects that can’t be described using even this more advanced syntax. 
This could be used for things like reverse hash lookups, or other queries that depend on external data.&lt;br /&gt;&lt;br /&gt;Going further the program could be extended with more visualizers that could allow you to view and edit complex objects in lots of interesting ways:&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/-BG56JmdwxDY/TlVxfQ7TSsI/AAAAAAAAAHI/5FdvYoR3XZo/s1600/image4.png" imageanchor="1" style="margin-left:1em; margin-right:1em"&gt;&lt;img border="0" height="315" width="400" src="http://4.bp.blogspot.com/-BG56JmdwxDY/TlVxfQ7TSsI/AAAAAAAAAHI/5FdvYoR3XZo/s400/image4.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;I think this could be a really useful tool, both for debugging and for inspecting files (as a sort of beefed up hex editor). All I need is some time to write it.&lt;br /&gt;&lt;br /&gt;What do you think?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1994130783874175266-3503951135457086019?l=bitsquid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bitsquid.blogspot.com/feeds/3503951135457086019/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bitsquid.blogspot.com/2011/08/idea-for-better-watch-windows.html#comment-form' title='7 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/3503951135457086019'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/3503951135457086019'/><link rel='alternate' type='text/html' href='http://bitsquid.blogspot.com/2011/08/idea-for-better-watch-windows.html' title='An idea for better watch windows'/><author><name>Niklas</name><uri>http://www.blogger.com/profile/10055379994557504977</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/-7FP3PoQXPkc/TlVxEBhbnzI/AAAAAAAAAGw/8fnVWS-Lw9I/s72-c/image1.png' height='72' width='72'/><thr:total>7</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1994130783874175266.post-1283778140961294373</id><published>2011-08-09T09:14:00.000+02:00</published><updated>2011-08-09T09:14:01.015+02:00</updated><title type='text'>Fixing memory issues in Lua</title><content type='html'>Garbage collection can be both a blessing and a curse. On the one hand, it frees you from manually managing memory. This saves development time, reduces bugs, and avoids tricky decisions about objects' ownerships and lifetimes.&lt;br /&gt;&lt;br /&gt;On the other hand, when you &lt;i&gt;do&lt;/i&gt; run into memory issues (and you most likely will), they can be a lot harder to diagnose and fix, because you don't have detailed control over how memory is allocated and freed.&lt;br /&gt;&lt;br /&gt;In this post I'll show some techniques that you can use to address memory issues in Lua (and by extension, in other garbage collected languages).&lt;br /&gt;&lt;br /&gt;All Lua memory issues essentially boil down to one of two things:&lt;br /&gt;&lt;br /&gt;&lt;dl&gt;&lt;dt&gt;Lua uses too much memory&lt;/dt&gt;&lt;dd&gt;On consoles memory is a precious resource and sometimes Lua is just using too much of it. The root cause can either be memory leaks or badly constructed/bloated data structures.&lt;/dd&gt;&lt;dt&gt;Garbage collection is taking too long&lt;/dt&gt;&lt;dd&gt;Too much garbage collection is (not surprisingly) caused by having too much garbage. The code must be rewritten so that it generates less garbage.&lt;/dd&gt;&lt;/dl&gt;&lt;br /&gt;Let's look at each issue in turn and see how we can address it.&lt;br /&gt;&lt;br /&gt;&lt;h2&gt;1. 
Lua uses too much memory&lt;/h2&gt;&lt;br /&gt;The first step towards plugging leaks and reducing memory use is to find out where the memory is going. Once we know that, the problems are usually quite easy to fix.&lt;br /&gt;&lt;br /&gt;So how do we find out where the memory is going? One way would be to add tracing code to the &lt;i&gt;lua_Alloc()&lt;/i&gt; function, but actually there is a much simpler method that doesn't require any C code and is more in line with Lua's dynamic nature. We can just use Lua to count all the objects in the runtime image:&lt;br /&gt;&lt;br /&gt;&lt;pre escaped="true" lang="lua"&gt;function count_all(f)&lt;br /&gt;	local seen = {}&lt;br /&gt;	local count_table&lt;br /&gt;	count_table = function(t)&lt;br /&gt;		if seen[t] then return end&lt;br /&gt;		f(t)&lt;br /&gt;		seen[t] = true&lt;br /&gt;		for k,v in pairs(t) do&lt;br /&gt;			if type(v) == "table" then&lt;br /&gt;				count_table(v)&lt;br /&gt;			elseif type(v) == "userdata" then&lt;br /&gt;				f(v)&lt;br /&gt;			end&lt;br /&gt;		end&lt;br /&gt;	end&lt;br /&gt;	count_table(_G)&lt;br /&gt;end&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Here we just start with the global table &lt;i&gt;_G&lt;/i&gt; and recursively enumerate all subtables and userdata. For each object that we haven't seen before, we call the enumeration function &lt;i&gt;f&lt;/i&gt;. This will enumerate all the objects in the Lua runtime that can be reached from &lt;i&gt;_G&lt;/i&gt;. Depending on how you use Lua you may also want to add some code for enumerating objects stored in the registry, and recurse over metatables and function upvalues to make sure that you really count all the objects in the runtime.&lt;br /&gt;&lt;br /&gt;Once you have a function for enumerating all your Lua objects, there are lots of useful things you can do. 
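&lt;br /&gt;&lt;br /&gt;As a minimal sketch of the simplest such use, the snippet below tallies the visited objects by their basic Lua type. It assumes the &lt;i&gt;count_all()&lt;/i&gt; function above is already defined; &lt;i&gt;count_by_kind()&lt;/i&gt; is a name made up for this illustration. Note that &lt;i&gt;count_all()&lt;/i&gt; as written only deduplicates tables, so a userdata that is referenced from several tables is tallied once per reference:&lt;br /&gt;&lt;br /&gt;&lt;pre escaped="true" lang="lua"&gt;-- Hypothetical helper: tally the objects visited by count_all()&lt;br /&gt;-- by their basic Lua type ("table" or "userdata").&lt;br /&gt;function count_by_kind()&lt;br /&gt;	local kinds = {}&lt;br /&gt;	count_all(function(o)&lt;br /&gt;		local k = type(o)&lt;br /&gt;		kinds[k] = (kinds[k] or 0) + 1&lt;br /&gt;	end)&lt;br /&gt;	return kinds&lt;br /&gt;end&lt;br /&gt;&lt;br /&gt;for kind, n in pairs(count_by_kind()) do&lt;br /&gt;	print(kind, n)&lt;br /&gt;end&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;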
When it comes to plugging leaks and reducing memory usage I find one of the most useful things is to count the number of objects of each type:&lt;br /&gt;&lt;br /&gt;&lt;pre escaped="true" lang="lua"&gt;function type_count()&lt;br /&gt;	local counts = {}&lt;br /&gt;	local enumerate = function (o)&lt;br /&gt;		local t = type_name(o)&lt;br /&gt;		counts[t] = (counts[t] or 0) + 1&lt;br /&gt;	end&lt;br /&gt;	count_all(enumerate)&lt;br /&gt;	return counts&lt;br /&gt;end&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Here &lt;i&gt;type_name()&lt;/i&gt; is a function that returns the name of an object's type. This function will depend on what kind of class/object system you use in your Lua runtime. One common approach is to have global class objects that also act as metatables for objects:&lt;br /&gt;&lt;br /&gt;&lt;pre escaped="true" lang="lua"&gt;-- A class&lt;br /&gt;Car = {}&lt;br /&gt;Car.__index = Car&lt;br /&gt;&lt;br /&gt;-- A method&lt;br /&gt;function Car.honk(self)&lt;br /&gt;	print "toot"&lt;br /&gt;end&lt;br /&gt;&lt;br /&gt;-- An object&lt;br /&gt;local my_car = {}&lt;br /&gt;setmetatable(my_car, Car)&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;In this case, the &lt;i&gt;type_name()&lt;/i&gt; function could look something like this:&lt;br /&gt;&lt;br /&gt;&lt;pre escaped="true" lang="lua"&gt;global_type_table = nil&lt;br /&gt;function type_name(o)&lt;br /&gt;	if global_type_table == nil then&lt;br /&gt;		global_type_table = {}&lt;br /&gt;		for k,v in pairs(_G) do&lt;br /&gt;			global_type_table[v] = k&lt;br /&gt;		end&lt;br /&gt;		global_type_table[0] = "table"&lt;br /&gt;	end&lt;br /&gt;	return global_type_table[getmetatable(o) or 0] or "Unknown"&lt;br /&gt;end&lt;/pre&gt;&lt;br /&gt;The object count usually gives you a good idea of where your memory problems lie. For example, if the number of &lt;i&gt;AiPathNode&lt;/i&gt; objects constantly rises, you can conclude that you are somehow leaking those objects. 
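&lt;br /&gt;&lt;br /&gt;One way to spot such a steady rise is to compare two snapshots taken some time apart. The sketch below assumes the &lt;i&gt;type_count()&lt;/i&gt; function above; &lt;i&gt;count_growth()&lt;/i&gt; is a hypothetical helper name, not part of any library:&lt;br /&gt;&lt;br /&gt;&lt;pre escaped="true" lang="lua"&gt;-- Hypothetical helper: return the types whose object count grew&lt;br /&gt;-- between two type_count() snapshots, with the size of the increase.&lt;br /&gt;function count_growth(before, after)&lt;br /&gt;	local growth = {}&lt;br /&gt;	for name, n in pairs(after) do&lt;br /&gt;		local delta = n - (before[name] or 0)&lt;br /&gt;		if delta &amp;gt; 0 then&lt;br /&gt;			growth[name] = delta&lt;br /&gt;		end&lt;br /&gt;	end&lt;br /&gt;	return growth&lt;br /&gt;end&lt;br /&gt;&lt;br /&gt;local before = type_count()&lt;br /&gt;-- ... let the game run for a while, then take a second snapshot ...&lt;br /&gt;local after = type_count()&lt;br /&gt;for name, delta in pairs(count_growth(before, after)) do&lt;br /&gt;	print(name, delta)&lt;br /&gt;end&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;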
If you have 200&amp;nbsp;000 &lt;i&gt;GridCell&lt;/i&gt; objects you should write a smarter grid implementation.&lt;br /&gt;&lt;br /&gt;You can also use this enumeration technique to pinpoint problems further if necessary. For example, if you are hunting for leaks, you can rewrite the &lt;i&gt;count_all()&lt;/i&gt; function so that it keeps track of the chain of keys through which each object was found. In this way, you might see that the &lt;i&gt;AiPathNode&lt;/i&gt; objects can be accessed through paths like:&lt;br /&gt;&lt;br /&gt;&lt;pre escaped="true" lang="lua"&gt;_G.managers.ai_managers.active_paths[2027]&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Then you know that the source of the leak is that paths never get removed from the &lt;i&gt;active_paths&lt;/i&gt; table.&lt;br /&gt;&lt;br /&gt;&lt;h2&gt;2. Garbage collection is taking too long&lt;/h2&gt;&lt;br /&gt;Garbage collection is a very cache unfriendly task that can have a significant performance impact. This is especially frustrating since garbage collection doesn't really &lt;i&gt;do&lt;/i&gt; anything. Well, it lets your gameplay programmers work faster and with fewer bugs, but when you have reached the optimization phase you tend to forget about that and just swear at the slow collector.&lt;br /&gt;&lt;br /&gt;Lua's default garbage collection scheme is not adapted for realtime software and if you just run it straight up you will get lots of disturbing frame rate hitches. 
As has already been mentioned in previous #AltDevBlogADay articles, it is better to use a step size of 0 and just run the garbage collector for a certain number of milliseconds every frame:&lt;br /&gt;&lt;br /&gt;&lt;pre escaped="true" lang="cpp"&gt;OpaqueTimeValue start = time();&lt;br /&gt;while (milliseconds_elapsed_since(start) &amp;lt; milliseconds_to_run)&lt;br /&gt;	lua_gc(L, LUA_GCSTEP, 0);&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Note that you can run this garbage collection on any thread, as long as Lua is not running at the same time, so you might be able to offset some of the cost by running the garbage collection on a background thread while your main thread is doing something non-Lua related.&lt;br /&gt;&lt;br /&gt;How much time should you spend on garbage collection? A tricky question. If you spend too little, the garbage will grow and you will eventually run out of memory. If you spend too much, you are wasting precious milliseconds.&lt;br /&gt;&lt;br /&gt;My preferred solution is to use a feedback mechanism. I dynamically adjust the garbage collection time so that the amount of garbage always stays below 10 % of the total Lua memory. If the garbage goes above that, I increase the collection time. If the garbage goes below, I decrease the collection time. As with all feedback mechanisms, it is a good idea to plot the curves for memory use and garbage collection time as you tweak the feedback parameters. That way you can verify that the system behaves nicely and that the curves settle down in a stable state rather than going into oscillation.&lt;br /&gt;&lt;br /&gt;Choosing the figure 10 % is a balance between memory use and performance. If you choose a higher value, your program will use more memory (because of the increased amount of garbage). On the other hand, you can give the garbage collection a smaller time slice. I've chosen a pretty low number, because on consoles, memory is always precious. 
If you are targeting a platform with more memory, you can go higher.&lt;br /&gt;&lt;br /&gt;Let's compute how much time we need to spend on garbage collection to stay below a certain fraction &lt;i&gt;0&amp;nbsp;&amp;lt;=&amp;nbsp;a&amp;nbsp;&amp;lt;=&amp;nbsp;1&lt;/i&gt; of garbage. Assume that we complete a full garbage collection cycle (scan all Lua memory) in time &lt;i&gt;t&lt;/i&gt;. The amount of garbage generated in that time will be:&lt;br /&gt;&lt;br /&gt;&lt;i&gt;t g&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;Where &lt;i&gt;g&lt;/i&gt; is the garbage/s created by the program. To make sure that we stay below a fraction &lt;i&gt;a&lt;/i&gt; we must have (where &lt;i&gt;m&lt;/i&gt; is the total memory used by the program, including the garbage):&lt;br /&gt;&lt;br /&gt;&lt;i&gt;t g &amp;lt;= a m&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;Assume that we sweep &lt;i&gt;s&lt;/i&gt; bytes/s. Then the time &lt;i&gt;t&lt;/i&gt; required to sweep the entire memory &lt;i&gt;m&lt;/i&gt; will be:&lt;br /&gt;&lt;br /&gt;&lt;i&gt;t = m / s&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;Combining the two equations gives &lt;i&gt;(m / s) g &amp;lt;= a m&lt;/i&gt;. Dividing both sides by &lt;i&gt;m&lt;/i&gt; and rearranging, we get:&lt;br /&gt;&lt;br /&gt;&lt;i&gt;s &amp;gt;= g / a&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;So the amount of garbage collection work we need to do per frame is directly proportional to the amount of garbage / s generated by the program and inversely proportional to the fraction of garbage we are willing to accept. (Note that interestingly, &lt;i&gt;m&lt;/i&gt; cancels out of the equation.)&lt;br /&gt;&lt;br /&gt;So, if we are willing to spend more memory, we can address garbage collection problems by increasing &lt;i&gt;a&lt;/i&gt;. But since &lt;i&gt;a&lt;/i&gt; can never be higher than 1, there are limits to what we can achieve in this way. A better option, that doesn't cost any memory, is to reduce &lt;i&gt;g&lt;/i&gt; -- the amount of garbage generated.&lt;br /&gt;&lt;br /&gt;In my experience, most garbage generation problems are "easy mistakes" from sloppy and thoughtless programming. 
Once you know where the problems are, it is usually not hard to rewrite the code so that garbage generation is avoided. Some useful refactoring techniques are:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Update the fields in an existing table instead of creating a new one.&lt;/li&gt;&lt;li&gt;Return a reference to an object member rather than a copy. Copy only when needed.&lt;/li&gt;&lt;li&gt;Write functions so that they take and return values rather than tables to avoid temporary tables. I.e., &lt;i&gt;make_point(2,3)&lt;/i&gt; rather than &lt;i&gt;make_point({2,3})&lt;/i&gt;.&lt;/li&gt;&lt;li&gt;If you need temporary objects, find a way of reusing them so you don't need to create so many of them.&lt;/li&gt;&lt;li&gt;Avoid excessive string concatenation.&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;Of course, a key requirement for this to work is that your Lua-to-C bindings are written so that they don't generate garbage. Otherwise your poor gameplay programmer has no chance. In my opinion, it should be possible to call any C function in a "garbage free" way (though you may choose to also have a more convenient path that &lt;i&gt;does&lt;/i&gt; generate garbage). For tips on how to write garbage free bindings, see my previous posts on &lt;a href="http://altdevblogaday.com/2011/06/26/lightweight-lua-bindings/"&gt;Lightweight Lua Bindings&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;To reduce garbage generation, you need to be able to pinpoint where in the program the garbage is being generated. Luckily, that is not difficult.&lt;br /&gt;&lt;br /&gt;Once the game has reached a stable state (total Lua memory doesn't grow or shrink) any allocation made can be considered garbage, because it will soon be freed again (otherwise the Lua memory would keep growing). 
So to find the garbage all you have to do is to add some tracing code to &lt;i&gt;lua_Alloc&lt;/i&gt; that you can trigger when you have reached a stable state.&lt;br /&gt;&lt;br /&gt;You can use  &lt;i&gt;lua_getstack()&lt;/i&gt; to get the current Lua stack trace from inside &lt;i&gt;lua_Alloc&lt;/i&gt; and use a &lt;i&gt;HashMap&lt;/i&gt; to count the number of allocations associated with each stack trace. If you then sort this data by the number of allocations it is easy to identify the "hotspots" that are generating the most garbage. A gameplay programmer can go through this list and reduce the amount of garbage generation using the tips above.&lt;br /&gt;&lt;br /&gt;The code may look something like this:&lt;br /&gt;&lt;br /&gt;&lt;pre escaped="true" lang="cpp"&gt;struct TraceEntry {&lt;br /&gt;	TraceEntry() : alloc_count(0), alloc_bytes(0) {}&lt;br /&gt;	String trace;&lt;br /&gt;	unsigned alloc_count;&lt;br /&gt;	unsigned alloc_bytes;&lt;br /&gt;};&lt;br /&gt;HashMap&amp;lt;uint64, TraceEntry&amp;gt; _traces;&lt;br /&gt;&lt;br /&gt;if (_tracing_allocs) {&lt;br /&gt;	lua_Debug stack[5] = {0};&lt;br /&gt;	int count = lua_debugger::stack_dump(L, stack, 5);&lt;br /&gt;	uint64 hash = murmur_hash_64(&amp;amp;stack[0], sizeof(lua_Debug)*count);&lt;br /&gt;	TraceEntry &amp;amp;te = _traces[hash];&lt;br /&gt;	te.alloc_count += 1;&lt;br /&gt;	te.alloc_bytes += (new_size - old_size);&lt;br /&gt;	if (te.trace.empty())&lt;br /&gt;		lua_debugger::stack_dump_to_string(L, te.trace);&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;In my experience, spending a few hours on fixing the worst hot spots indicated by the trace can reduce the garbage collection time by an order of magnitude.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1994130783874175266-1283778140961294373?l=bitsquid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' 
href='http://bitsquid.blogspot.com/feeds/1283778140961294373/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bitsquid.blogspot.com/2011/08/fixing-memory-issues-in-lua.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/1283778140961294373'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/1283778140961294373'/><link rel='alternate' type='text/html' href='http://bitsquid.blogspot.com/2011/08/fixing-memory-issues-in-lua.html' title='Fixing memory issues in Lua'/><author><name>Niklas</name><uri>http://www.blogger.com/profile/10055379994557504977</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1994130783874175266.post-1322411918456066537</id><published>2011-06-26T02:02:00.001+02:00</published><updated>2011-06-26T02:02:49.980+02:00</updated><title type='text'>Lightweight Lua Bindings</title><content type='html'>A scripting language, such as Lua, can bring huge productivity gains to a game project. Quick iterations, immediate code reloads and an in-game console with a &lt;a href="http://en.wikipedia.org/wiki/Read-eval-print_loop"&gt;read-eval-print-loop&lt;/a&gt; are invaluable tools. A less obvious benefit is that introducing a scripting language creates a clear dividing line between "engine" and "gameplay" code with a well defined API between them. This is often good for the structure of the engine, at least if you intend to use it for more than one game.&lt;br /&gt;&lt;br /&gt;The main drawback is of course performance. It is a scary thing to discover late in a project that the game is slow because the script is doing too much. 
Especially since bad script performance cannot always be traced back to bugs or bad algorithms. Sure, you get those as well, but you can also get problems with "overall slowness" that cannot easily be traced back to specific bottlenecks or hot spots. There are two reasons for this. First, the slowness of script code compared to C, which means that everything just takes more time. And second, the fact that gameplay code tends to be "connection" rather than "compute" heavy which means there is less to gain from algorithmic improvements.&lt;br /&gt;&lt;br /&gt;Part of this is a management issue. It is important to monitor the script performance (on the slowest target platform) throughout the production so that measures can be taken early if it looks like it will become a problem. But in this article I will focus on the technical aspects, specifically the C-to-Lua bindings.&lt;br /&gt;&lt;br /&gt;It is important to note that when I am talking about performance in this article I mean performance on current generation consoles, because that is where performance problems occur. PC processors are much more powerful (especially when running virtual machines, which tend to be brutal to the cache). The extra cores on the consoles don't help much with script execution (since scripts are connection heavy, they are hard to multithread). &lt;i&gt;And&lt;/i&gt; the PC can run LuaJIT which &lt;a href="http://luajit.org/performance_x86.html"&gt;changes the game completely&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;This may of course change in future generation consoles. 
If anyone from Sony or Microsoft is reading this, &lt;i&gt;please&lt;/i&gt; add support for JITting to your next generation ventures.&lt;br /&gt;&lt;br /&gt;&lt;h2&gt;Lua bindings&lt;/h2&gt;&lt;br /&gt;Apart from optimizing the Lua interpreter itself, optimizing the bindings between Lua and C is the best way of achieving a general performance improvement, since the bindings are used whenever Lua calls some function in the C code, which in a typical game happens constantly.&lt;br /&gt;&lt;br /&gt;The standard way of binding an object on the C side to Lua is to use a &lt;i&gt;full userdata&lt;/i&gt; object. This is a heap allocated data blob with an associated &lt;i&gt;metatable&lt;/i&gt; that can be used to store the methods of the object. This allows the user to make a call like:&lt;br /&gt;&lt;br /&gt;&lt;pre escaped="true" lang="lua"&gt;game_world:get_camera():set_position(Vector3(0,0,0))&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;In many ways, this is the easiest and most convenient way of using objects in Lua, but it comes with several performance problems:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt; &lt;li&gt;Any time an object is passed from C to Lua, such as the camera in &lt;i&gt;get_camera()&lt;/i&gt; or the vector created by &lt;i&gt;Vector3(0,0,0)&lt;/i&gt;, memory for the object must be allocated on the heap. This can be costly.&lt;/li&gt; &lt;li&gt;All the heap objects must be garbage collected by Lua. Calls such as &lt;i&gt;get_camera()&lt;/i&gt; create temporary objects that must be collected at some later time. The more garbage we create, the more time we need to spend in garbage collection.&lt;/li&gt; &lt;li&gt;Making use of many heap allocated objects can lead to bad cache performance. When the C side wants to use an object from Lua, it must first fetch it from Lua's heap, then (in most cases) extract an object pointer from its data and look up the object in the game heap. 
So each time there is an extra cache miss.&lt;/li&gt; &lt;li&gt;The colon method call syntax &lt;i&gt;world:get_camera()&lt;/i&gt; actually translates to something like (I've simplified this a bit, see the Lua documentation for details) &lt;i&gt;world._meta_table["get_camera"](world)&lt;/i&gt;. I.e., it creates an extra table lookup operation for every call.&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;We can get rid of the first two issues by caching the Lua objects. I.e. instead of creating a new Lua object every time &lt;i&gt;get_camera()&lt;/i&gt; is called, we keep a reference to the object on the Lua side and just look it up and return it every time it is requested. But this has other disadvantages. Managing the cache can be tricky and it creates a lot more objects in the Lua heap, since the heap will now hold every object that has ever been touched by Lua. This makes garbage collection take longer and the heap can grow uncontrollably during the play of a level, depending on which objects the player interacts with. Also, this doesn't solve the issue with objects that are truly temporary, such as &lt;i&gt;Vector3(0,0,0)&lt;/i&gt;.&lt;br /&gt;&lt;br /&gt;A better option is to use what Lua calls &lt;i&gt;light userdata&lt;/i&gt;. A light userdata is essentially just a C pointer stored in Lua, with no additional information. It lives on the Lua stack (i.e. not the heap), does not require any memory allocations, does not participate in garbage collection and does not have an associated metatable. This addresses all our performance problems, but introduces new (not performance-related) issues:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt; &lt;li&gt;Since the objects don't have metatables we cannot use the convenient colon syntax for calling their methods.&lt;/li&gt;  &lt;li&gt;Light user data objects do not carry any type information, they are just raw pointers. 
So on the C side we have no way of telling if we have been called with an object of the right type.&lt;/li&gt;  &lt;li&gt;Lifetime management is trickier since objects do not have destructors and are not garbage collected. How do we manage dangling pointers in Lua?&lt;/li&gt; &lt;/ul&gt;&lt;br /&gt;&lt;h3&gt;Colon syntax&lt;/h3&gt;&lt;br /&gt;With light user data we cannot use the colon syntax to look up methods. Instead we must call global functions and pass in the objects as parameters. But we can still make sure to organize our methods nicely, i.e., put all the functions that operate on &lt;i&gt;World&lt;/i&gt; objects in a table called &lt;i&gt;World&lt;/i&gt;. It might then look something like this:&lt;br /&gt;&lt;br /&gt;&lt;pre escaped="true" lang="lua"&gt;Camera.set_position(World.get_camera(game_world), Vector3(0,0,0))&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;If you are used to the object oriented style this way of writing can feel awkward at first. But in my experience you get accustomed to it quite quickly. It does have some implications which are not purely syntactical though. On the plus side, this style of writing makes it easy to cache the method lookups for better performance:&lt;br /&gt;&lt;br /&gt;&lt;pre escaped="true" lang="lua"&gt;local camera_set_position = Camera.set_position&lt;br /&gt;local world_get_camera = World.get_camera&lt;br /&gt;&lt;br /&gt;camera_set_position(world_get_camera(game_world), Vector3(0,0,0))&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;This transformation is so simple that you can easily write a script that performs it on your entire code base.&lt;br /&gt;&lt;br /&gt;The main drawback is that we are no longer doing dynamic method lookup, we are calling one specific C method. So we can't do virtual inheritance with method overrides. 
To me that is not a big problem because firstly, I think inheritance is vastly overrated as a design concept, and secondly, if you really need virtual calls you can always do the virtual method resolution on the C side and get the benefits while still having a static call in Lua.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Type checking&lt;/h3&gt;&lt;br /&gt;For full userdata we can check the type by looking at the metatable. The Lua library function &lt;i&gt;luaL_checkudata&lt;/i&gt; provides this service. Since light userdata is just a raw pointer to Lua, no corresponding functionality is offered. So we need to provide the type checking ourselves. But how can we know the type of an arbitrary C pointer?&lt;br /&gt;&lt;br /&gt;An important thing to notice is that type checking is only used for debugging. We only need to know if a function has been called with the right arguments or not. So we don't actually need to know the exact type of the pointer, we just need to know if it points to the thing we expect. And since this is only used for bug detection, it doesn't matter if we get a few false positives. And it is fine if the test takes a few cycles since we can strip it from our release builds.&lt;br /&gt;&lt;br /&gt;Since we just need to know "is the object of this type" we can make the test different for each type. So for each type, we can just pick whatever test fits that type best. Some possibilities are:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt; &lt;li&gt;Store a known four byte type marker at the start of the object's memory. To verify the type, just dereference the pointer and check that the first four bytes match the expected marker. 
(This is the method I use most frequently.)&lt;/li&gt;  &lt;li&gt;Keep a hash table of all objects of the specified type and check if it is there.&lt;/li&gt;  &lt;li&gt;For objects that are allocated from a pool, check that the pointer lies within the range of the pool.&lt;/li&gt; &lt;/ul&gt;&lt;br /&gt;&lt;h3&gt;Object lifetimes&lt;/h3&gt;&lt;br /&gt;There are two approaches you can take to ownership of objects in the Lua interface. They can either be Lua owned and destroyed by the garbage collector or they can be owned by the C side and destroyed by explicit function calls. Both approaches have their advantages, but I usually lean towards the latter one. To me it feels more natural that Lua explicitly creates and destroys cameras with &lt;i&gt;World.destroy_camera()&lt;/i&gt; rather than cameras just popping out of existence when the garbage collector feels they are no longer used. Also, since in our engine, Lua is an option, not a requirement, it makes more sense to have the ownership on the C side.&lt;br /&gt;&lt;br /&gt;With this approach you have the problem that Lua can hold "dangling pointers" to C objects, which can lead to nasty bugs. (If you took the other approach, you would have the opposite problem, which is equally nasty.)&lt;br /&gt;&lt;br /&gt;Again, for debugging purposes, we would want to do something similar to what we did with the type information. We would like to know, in debug builds, if the programmer has passed us a pointer to a dead object, so that we can display an error message rather than exhibit undefined behavior.&lt;br /&gt;&lt;br /&gt;This is a trickier issue and I haven't found a clear cut solution, but here are some of the techniques I have used:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt; &lt;li&gt;Clear out the marker field of the object when it is freed. That way if you attempt to use it later you will get a type error. 
Of course, checking this can cause an access violation if the memory has been returned to the system.&lt;/li&gt;  &lt;li&gt;For objects that get created and destroyed a lot, such as particles or sound instances, let Lua manage them by IDs rather than by raw pointers.&lt;/li&gt;  &lt;li&gt;Keep a hash table of all known live objects of the type.&lt;/li&gt;  &lt;li&gt;Let Lua point to the object indirectly through a handle. Use some bits of the pointer to locate the handle and match the rest to a counter in the handle so that you can detect if the handle has been released and repurposed for something else.&lt;/li&gt; &lt;/ul&gt;&lt;br /&gt;&lt;h2&gt;Conclusions&lt;/h2&gt;&lt;br /&gt;Using light instead of full userdata does make things more inconvenient. But as we have seen, there are tricks that help overcome many of these inconveniences.&lt;br /&gt;&lt;br /&gt;We still haven't looked at the truly temporary objects, such as &lt;i&gt;Vector3(0,0,0)&lt;/i&gt;. In my next article I will discuss what can be done about them.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:small"&gt;(This has also been posted to the &lt;a href="http://bitsquid.blogspot.com/"&gt;BitSquid blog&lt;/a&gt;.)&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1994130783874175266-1322411918456066537?l=bitsquid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bitsquid.blogspot.com/feeds/1322411918456066537/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bitsquid.blogspot.com/2011/06/lightweight-lua-bindings.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/1322411918456066537'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/1994130783874175266/posts/default/1322411918456066537'/><link rel='alternate' type='text/html' href='http://bitsquid.blogspot.com/2011/06/lightweight-lua-bindings.html' title='Lightweight Lua Bindings'/><author><name>Niklas</name><uri>http://www.blogger.com/profile/10055379994557504977</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1994130783874175266.post-5541110657305398913</id><published>2011-06-10T08:55:00.000+02:00</published><updated>2011-06-10T08:55:35.914+02:00</updated><title type='text'>Strings Redux</title><content type='html'>Simpler programs are better programs. Today's target: strings. In this post I will show you three ways of improving your code by simplifying your strings.&lt;br /&gt;&lt;br /&gt;&lt;h2&gt;1. Use UTF-8 everywhere&lt;/h2&gt;&lt;br /&gt;When I issue programming tests I always have some question about different string encodings. It is a good way of testing if a candidate can distinguish what data represents from how it is represented. But when I write code I just use UTF-8 everywhere, both in memory and on disk. Why? UTF-8 has many advantages and no serious disadvantages.&lt;br /&gt;&lt;br /&gt;Advantages:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt; &lt;li&gt;Using the same encoding everywhere means there is never any confusion about what encoding a certain string or file should be in. If it is not in UTF-8, then it is &lt;i&gt;wrong&lt;/i&gt;. 
Period.&lt;/li&gt; &lt;li&gt;UTF-8 uses the standard C data types for strings: &lt;i&gt;char *&lt;/i&gt; and &lt;i&gt;char []&lt;/i&gt;.&lt;/li&gt; &lt;li&gt;ASCII strings look like ASCII strings and all functions, parsers, etc that operate on ASCII strings work on UTF-8 strings without modification.&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;The most common disadvantages claimed for UTF-8 are:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt; &lt;li&gt;UTF-8 can waste memory.&lt;/li&gt; &lt;li&gt;Finding the i’th glyph in a UTF-8 string is expensive (O(n) rather than O(1)).&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;There is some truth to the first point. Yes, if your text is in Japanese, UTF-8 probably uses more memory than Shift-JIS. But I don’t think that is a major issue. First, while UTF-8 is worse than other encodings for some languages, it does pretty well on average. Second, strings aren’t a big part of a game’s memory usage anyway (if they are, you are most likely doing something wrong). And third, if you care that much about string memory usage you should probably compress your string data.&lt;br /&gt;&lt;br /&gt;Compression will pretty much nullify any differences in memory usage caused by using different encodings, since the entropy of the underlying data is the same regardless of how it is encoded. (At least in theory, it would be interesting to see someone test it in practice.)&lt;br /&gt;&lt;br /&gt;The second point is true but also moot, since accessing glyphs at random indices in a string is a much rarer operation than you might think. For most string operations: concatenation, parsing, etc you never have to access individual glyphs. You can just use the same implementation as you would use for an ASCII-string and it will work without modification.&lt;br /&gt;&lt;br /&gt;In the few cases where you &lt;i&gt;do&lt;/i&gt; need to convert to glyphs (for example for rendering) you typically do that &lt;i&gt;sequentially&lt;/i&gt;, from the start to the end. 
This is still a fast operation; it is only &lt;i&gt;random access&lt;/i&gt; of glyphs that is significantly slower with UTF-8 than with UTF-32. Another interesting thing to note is that since all continuation bytes in UTF-8 follow the pattern 10xxxxxx you can quickly find the start and end of the next or previous glyph given a &lt;i&gt;char *&lt;/i&gt; to anywhere within a UTF-8 string.&lt;br /&gt;&lt;br /&gt;In fact I can't think of any string operation that requires fast random access to glyphs other than completely contrived examples (given 10000 long strings, find the 1000th glyph in each). I urge my readers to try to come up with something.&lt;br /&gt; &lt;br /&gt;&lt;h2&gt;2. You do not need a string class&lt;/h2&gt;&lt;br /&gt;String classes are highly overrated.&lt;br /&gt;&lt;br /&gt;Generally speaking, code that deals with strings can be divided into two categories: code that looks at static strings (parsers, data compilers, script callbacks, etc) and code that builds dynamic strings (template formatters, debug logging, etc). In a typical game project there is a lot more of the former than the latter. Ironically, string classes don’t do a very good job with &lt;i&gt;either&lt;/i&gt;!&lt;br /&gt;&lt;br /&gt;For code that deals with static strings you should always use &lt;i&gt;const char *&lt;/i&gt; rather than &lt;i&gt;const string &amp;&lt;/i&gt;. The former is more flexible. It allows the caller to store her strings however she likes rather than adhering to some memory model imposed by the string class. 
It also means that if you call the function with a static string it doesn’t get pointlessly converted to a &lt;i&gt;string&lt;/i&gt; object.&lt;br /&gt;&lt;br /&gt;But string classes aren’t very good for dynamic strings either, as anyone who has written something like this can attest to:&lt;br /&gt;&lt;br /&gt;&lt;pre escaped="true" lang="cpp"&gt;string a;&lt;br /&gt;for (int i = 0; i&amp;lt;10000; ++i)&lt;br /&gt;    a += "xxx";&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Depending on how your string class is implemented this can be horribly inefficient, reallocating and copying the string memory for every iteration of the loop. There are various ways of addressing this: reserving memory for the string up front or using some kind of "rope" or "stringstream" class.&lt;br /&gt;&lt;br /&gt;The simpler approach is to just use:&lt;br /&gt;&lt;br /&gt;&lt;pre escaped="true" lang="cpp"&gt;vector&amp;lt;char&amp;gt; a;&lt;br /&gt;for (int i = 0; i&amp;lt;10000; ++i)&lt;br /&gt;    string::append(a, "xxx");&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;We represent the string as a vector of chars and provide a library of functions for performing "common string operations" on that representation.&lt;br /&gt;&lt;br /&gt;The advantage of this over using a regular string class is that it provides a clear distinction between strings that can grow (vector&amp;lt;char&amp;gt;) and strings that can't (char *) and emphasizes what the cost of growing is (amortized linear time). Do you know the cost of growing in your string class?&lt;br /&gt;&lt;br /&gt;&lt;h2&gt;3. You should almost never use strings in your runtime&lt;/h2&gt;&lt;br /&gt;The variable length nature of strings makes them slow, memory consuming and unwieldy (memory for them must be allocated and freed). 
If you use fixed length strings you will either use even more memory or annoy the content creators because they can't make their resource names as descriptive as they would like to.&lt;br /&gt;&lt;br /&gt;For these reasons I think that strings in the runtime should be reserved for two purposes:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt; &lt;li&gt;User interface text&lt;/li&gt; &lt;li&gt;Debugging&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt; In particular, you shouldn't use strings for object/resource/parameter names in the runtime. Instead use string hashes. This lets you use user friendly names (strings) in your tools and fast ints in your runtime. It is also a lot easier to use than enums. Enums require global cooperation to avoid collisions. String hashes just require that you hash into a large enough key space.&lt;br /&gt;&lt;br /&gt;We hash names during our data compile stage into either 32-bit or 64-bit ints depending on the risk of collision. If it is a global object name (such as the name of a texture) we use 64-bit ints. If it is a local name (such as the name of a bone in a character) we use 32-bit ints. Hash collision is considered a compile error. (It hasn't happened yet.)&lt;br /&gt;&lt;br /&gt;Since user interface text should always be localized, all user interface strings are managed by the localizer. The localized text is fetched from the localizer with a string lookup key, such as "menu_file_open" (hashed to a 64-bit int of course).&lt;br /&gt;&lt;br /&gt;This only leaves debugging. We use formatted strings for informative assert messages when something goes wrong. Our profiler and monitoring tools use &lt;a href="http://altdevblogaday.org/2011/05/26/monitoring-your-game/"&gt;interned strings&lt;/a&gt; to identify data. Our game programmers use debug-prints to root out problems. Of course, none of this affects the end user, since the debugging strings are only used in debug builds.&lt;br /&gt;&lt;br /&gt;Hashes can be problematic when debugging. 
If there is an error in the resource 0x3e728af10245bc71 it is not immediately obvious that it is the object &lt;i&gt;vegetation/trees/larch_3.mesh&lt;/i&gt; that is at fault.&lt;br /&gt;&lt;br /&gt;We handle this with a lookup table. When we compile our data we also create a reverse lookup table that converts from a hash value back to the original string that generated it. This table is not loaded by the runtime, but it can be accessed by our tools. So our game console, for instance, uses this table to automatically translate any hash IDs that are printed by the game.&lt;br /&gt;&lt;br /&gt;However, recently I've started to also add small fixed-size debug strings to the resources themselves. Something like this:&lt;br /&gt;&lt;br /&gt;&lt;pre escaped="true" lang="cpp"&gt;HashMap&amp;lt;IdString64, MeshResource *&amp;gt; _meshes;&lt;br /&gt;&lt;br /&gt;struct MeshResource&lt;br /&gt;{&lt;br /&gt; char debug_name[32];&lt;br /&gt; …&lt;br /&gt;};&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;As you can see, all the lookup tables etc, still use the 64-bit hash to identify the resource. But inside the resource is a 32-byte human friendly name (typically, the last 32 characters of the resource name), which is only used for debugging. This doesn't add much to the resource size (most resources are a lot bigger than 32 bytes) but it allows us to quickly identify a resource in the debugger or in a raw memory dump without having to open up a tool to convert hashes back to strings. 
I think the time saved by this is worth those extra bytes.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1994130783874175266-5541110657305398913?l=bitsquid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bitsquid.blogspot.com/feeds/5541110657305398913/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bitsquid.blogspot.com/2011/06/strings-redux.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/5541110657305398913'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/5541110657305398913'/><link rel='alternate' type='text/html' href='http://bitsquid.blogspot.com/2011/06/strings-redux.html' title='Strings Redux'/><author><name>Niklas</name><uri>http://www.blogger.com/profile/10055379994557504977</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1994130783874175266.post-6286372342659081884</id><published>2011-05-26T22:05:00.000+02:00</published><updated>2011-05-26T22:05:04.128+02:00</updated><title type='text'>Monitoring your game</title><content type='html'>Many bugs are easy to fix with debuggers, stack traces and printf-statements. But some are hard to even &lt;i&gt;see&lt;/i&gt; with such tools. I'm thinking of things like frame rate hitches, animation glitches and camera stutters. You can't put a breakpoint on the glitch because what constitutes a glitch is only defined in relation to what happened in the frame before or what will happen in the next frame. 
And even if you are able to break exactly when the glitch occurs, you might not be able to tell what is going on from the call stack.&lt;br /&gt;&lt;br /&gt;In these situations, some way of monitoring and visualizing your game's behavior can be invaluable. Indeed, if we graph the delta time for each frame, the hitches stand out clear as day.&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-7_w66wupPrI/Td6wU5piAEI/AAAAAAAAAFg/LUfgolE_Q6A/s1600/monitor1.png" imageanchor="1" style="margin-left:1em; margin-right:1em"&gt;&lt;img border="0" height="392" width="400" src="http://2.bp.blogspot.com/-7_w66wupPrI/Td6wU5piAEI/AAAAAAAAAFg/LUfgolE_Q6A/s400/monitor1.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;div style="text-align:center;"&gt;&lt;p&gt;&lt;i&gt;Delta-time graph with frame rate drops.&lt;/i&gt;&lt;/p&gt;&lt;/div&gt;&lt;br /&gt;A graph like this opens up many new ways of attacking glitch bugs. You can play the game with the graph displayed and try to see what game actions trigger the glitches. Do they happen when a certain enemy is spawned? When a particular weapon is fired? Another approach is to draw the total frame time together with the time spent in all the different subsystems. This immediately shows you which subsystem is causing the frame rate to spike. You can constrain the problem further by graphing the time spent in narrower and narrower profiler scopes.&lt;br /&gt;&lt;br /&gt;Visualization tools like these can help with many other issues as well. Want to find out where a weird camera stutter comes from? Plot the camera position, the position of its look-at target and any other variables that may influence its behavior to pin down the source of the problem. 
Draw a &lt;a href="http://altdevblogaday.org/2011/05/17/a-birds-eye-view-of-your-memory-map/"&gt;graph representing your memory fragmentation&lt;/a&gt; to find problematic allocations and get an overall feeling for how bad the situation is. Does something look slightly off with the animations? Graph the bone rotations to make sure that you don't have any vibrations or discontinuities. Graph your network usage to make sure you stay below the bandwidth cap.&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/-ZR_BReJuoOU/Td6w1ZjHLsI/AAAAAAAAAFo/UhAVeFktJe8/s1600/monitor2.png" imageanchor="1" style="margin-left:1em; margin-right:1em"&gt;&lt;img border="0" height="393" width="400" src="http://1.bp.blogspot.com/-ZR_BReJuoOU/Td6w1ZjHLsI/AAAAAAAAAFo/UhAVeFktJe8/s400/monitor2.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;div style="text-align:center;"&gt;&lt;p&gt;&lt;i&gt;Rotation of a bone during a jump animation.&lt;/i&gt;&lt;/p&gt;&lt;/div&gt;&lt;br /&gt;When you study your game in this way, you will most likely learn things that surprise you. Games are highly complex systems built by a large number of people over a long period of time. Like all complex systems, they show emergent behavior. You can be quite certain that at least someone has done something that is &lt;i&gt;completely unexpected and totally weird&lt;/i&gt;. You can't hope to discover these things using just a bottom-up approach. There is too much code and too much data. Instead you must study your game as if it was an alien organism. Prod it and see how it reacts. Keep the graphs on screen and make sure that they look sane.&lt;br /&gt;&lt;br /&gt;There are many different kinds of data that can be interesting and many ways of visualizing them: graphs, bars, charts, etc. But in all cases the pattern is pretty much the same. 
We have some data that we record from the game and then we have a visualizer that takes this data and draws it in some interesting way. Schematically, we can represent it like this:&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-G5OsWty0JGs/Td6xCUBspqI/AAAAAAAAAFw/3bh8RTGkjeI/s1600/monitor3.jpeg" imageanchor="1" style="margin-left:1em; margin-right:1em"&gt;&lt;img border="0" height="335" width="400" src="http://2.bp.blogspot.com/-G5OsWty0JGs/Td6xCUBspqI/AAAAAAAAAFw/3bh8RTGkjeI/s400/monitor3.jpeg" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;div style="text-align:center;"&gt;&lt;p&gt;&lt;i&gt;Basic monitoring system schematic.&lt;/i&gt;&lt;/p&gt;&lt;/div&gt;&lt;br /&gt;I will refine this picture shortly, but first let's do a little data-oriented design and ask ourselves how we can best store and process this data.&lt;br /&gt;&lt;br /&gt;If you have read any of my earlier blog posts you will know that I'm a fan of big dumb contiguous memory buffers and data structures that look like "file formats for memory". And this approach works perfectly for this problem. We can just store the data as a big block of concatenated structs, where each struct represents some recorded data. 
We begin each record with an enum specifying the type of recorded event and follow that with a variable sized struct with data for that particular event.&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-gxQjLzPdKl4/Td6xZFqujhI/AAAAAAAAAF4/xYBKzUc7kag/s1600/monitor4.jpeg" imageanchor="1" style="margin-left:1em; margin-right:1em"&gt;&lt;img border="0" height="112" width="400" src="http://3.bp.blogspot.com/-gxQjLzPdKl4/Td6xZFqujhI/AAAAAAAAAF4/xYBKzUc7kag/s400/monitor4.jpeg" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;div style="text-align:center;"&gt;&lt;p&gt;&lt;i&gt;Data buffer layout.&lt;/i&gt;&lt;/p&gt;&lt;/div&gt;&lt;br /&gt;The event types might be things such as ENTER_PROFILER_SCOPE, LEAVE_PROFILER_SCOPE, ALLOCATE_MEMORY, FREE_MEMORY, RECORD_GLOBAL_FLOAT, etc.&lt;br /&gt;&lt;br /&gt;RECORD_GLOBAL_FLOAT is the event type used for all kinds of data that we want to draw in graphs. We record the data with calls like these:&lt;br /&gt;&lt;br /&gt;&lt;pre escaped="true" lang="c"&gt;record_global_float("application.delta_time", dt);&lt;br /&gt;record_global_float("application.frame_rate", 1.0f / dt);&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;The corresponding data struct is just:&lt;br /&gt;&lt;br /&gt;&lt;pre escaped="true" lang="c"&gt;struct RecordGlobalFloatEvent {&lt;br /&gt;    const char *name;&lt;br /&gt;    float value;&lt;br /&gt;};&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Note that there is an interesting little trick being used here. When we record the events, we just record the string &lt;i&gt;pointers&lt;/i&gt;, not the complete string data. This saves memory, makes the struct fixed size and gives us faster string compares. This works because &lt;i&gt;record_global_float()&lt;/i&gt; is called with static string data that is always at the same address and kept in memory throughout the lifetime of the application. 
(In the rare case where you want to call &lt;i&gt;record_global_float()&lt;/i&gt; with a dynamic string, you must allocate a copy of that string at some permanent location, i.e. do a form of &lt;a href="http://en.wikipedia.org/wiki/String_interning"&gt;string interning&lt;/a&gt;.)&lt;br /&gt;&lt;br /&gt;Now, let's refine the picture slightly. There is a problem with recording all data to a single memory buffer and that is multithreading. If all threads record their data to the same memory buffer then we need lots of mutex locking to make sure they don't step on each other's toes.&lt;br /&gt;&lt;br /&gt;We might also want to add support for some kind of off-line (i.e., not in-game) visualization. Off-line visualizers can take advantage of the full power of your development PC to implement more powerful visualization algorithms. And since they have near unlimited memory, they can record the entire data history so that you can explore it back and forth after the game session has ended.&lt;br /&gt;&lt;br /&gt;With these refinements our monitoring system now looks like this:&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/-IPrvCW_XxUo/Td6xnyFk1-I/AAAAAAAAAGA/SYLV5aytaZs/s1600/monitor5.jpeg" imageanchor="1" style="margin-left:1em; margin-right:1em"&gt;&lt;img border="0" height="222" width="400" src="http://4.bp.blogspot.com/-IPrvCW_XxUo/Td6xnyFk1-I/AAAAAAAAAGA/SYLV5aytaZs/s400/monitor5.jpeg" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;div style="text-align:center;"&gt;&lt;p&gt;&lt;i&gt;Advanced monitoring system schematic.&lt;/i&gt;&lt;/p&gt;&lt;/div&gt;&lt;br /&gt;Each thread has a small TLS (thread-local-storage) cache with 64 K or so of debug memory where it records its events. 
When the cache gets full or we reach the end of the frame, the thread acquires the lock to the global event buffer and flushes its data there.&lt;br /&gt;&lt;br /&gt;The active on-line visualizers process the events in the buffer and visualize them. Simultaneously, we send the data over TCP so that it can be processed by any off-line visualizers. In the process we consume the buffer data and the buffer can be filled with new data from the threads.&lt;br /&gt;&lt;br /&gt;(We allocate all the buffers we use on a special debug heap, so that we separate the allocations which we only do for debugging purposes from the allocations done by the main game.)&lt;br /&gt;&lt;br /&gt;Recording float data requires just a few lines of code.&lt;br /&gt;&lt;br /&gt;&lt;pre escaped="true" lang="c"&gt;enum { RECORD_GLOBAL_FLOAT_EVENT = 17 };&lt;br /&gt;enum { THREAD_BUFFER_SIZE = 64*1024 };&lt;br /&gt;__thread char *_thread_buffer;&lt;br /&gt;__thread unsigned _thread_buffer_count;&lt;br /&gt;&lt;br /&gt;inline void record_global_float(const char *name, float value)&lt;br /&gt;{&lt;br /&gt;     /* 4-byte event type followed by the event struct */&lt;br /&gt;     const unsigned size = 4 + sizeof(struct RecordGlobalFloatEvent);&lt;br /&gt;     if (_thread_buffer_count + size &amp;gt; THREAD_BUFFER_SIZE)&lt;br /&gt;         flush_thread_buffer();&lt;br /&gt;     &lt;br /&gt;     char *p = _thread_buffer + _thread_buffer_count;&lt;br /&gt;     *(unsigned *)p = RECORD_GLOBAL_FLOAT_EVENT;&lt;br /&gt;     struct RecordGlobalFloatEvent *e = (struct RecordGlobalFloatEvent *)(p + 4);&lt;br /&gt;     e-&amp;gt;name = name;&lt;br /&gt;     e-&amp;gt;value = value;&lt;br /&gt;     _thread_buffer_count += size;&lt;br /&gt;}&lt;/pre&gt;&lt;br /&gt;When you have the data, writing the graph visualizer is not much work. Just save the data over a couple of frames and plot it using a line drawer.&lt;br /&gt;&lt;br /&gt;In the BitSquid engine, we also expose all the data recording functions to Lua scripting. 
This makes it possible to dynamically create graphs for all kinds of data while the game is running.&lt;br /&gt;&lt;br /&gt;As an example of this, a couple of days ago a game programmer suspected that some problematic behavior was caused by a low update frequency in the mouse driver. We quickly bashed out a couple of lines in the game console to produce a graph of the mouse data and could immediately confirm that this indeed was the case:&lt;br /&gt;&lt;br /&gt;&lt;pre escaped="true" lang="lua"&gt;Core.Debug.add_updator(&lt;br /&gt;  function ()&lt;br /&gt;    Profiler.record_statistics("mouse", Mouse.axis(0))&lt;br /&gt;  end &lt;br /&gt;)&lt;br /&gt;graph make mousegraph&lt;br /&gt;graph add_vector3 mousegraph mouse&lt;br /&gt;graph range mousegraph -20 20&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-9ymQHAIhIXw/Td6yKPGyZBI/AAAAAAAAAGI/tMG0EnvUC0U/s1600/monitor6.png" imageanchor="1" style="margin-left:1em; margin-right:1em"&gt;&lt;img border="0" height="400" width="399" src="http://3.bp.blogspot.com/-9ymQHAIhIXw/Td6yKPGyZBI/AAAAAAAAAGI/tMG0EnvUC0U/s400/monitor6.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;div style="text-align:center;"&gt;&lt;p&gt;&lt;i&gt;Graph of mouse input showing frames with no input.&lt;/i&gt;&lt;/p&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1994130783874175266-6286372342659081884?l=bitsquid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bitsquid.blogspot.com/feeds/6286372342659081884/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bitsquid.blogspot.com/2011/05/monitoring-your-game.html#comment-form' title='7 Comments'/><link rel='edit' type='application/atom+xml' 
href='http://www.blogger.com/feeds/1994130783874175266/posts/default/6286372342659081884'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/6286372342659081884'/><link rel='alternate' type='text/html' href='http://bitsquid.blogspot.com/2011/05/monitoring-your-game.html' title='Monitoring your game'/><author><name>Niklas</name><uri>http://www.blogger.com/profile/10055379994557504977</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/-7_w66wupPrI/Td6wU5piAEI/AAAAAAAAAFg/LUfgolE_Q6A/s72-c/monitor1.png' height='72' width='72'/><thr:total>7</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1994130783874175266.post-8370991487891578137</id><published>2011-05-06T09:56:00.002+02:00</published><updated>2011-05-06T10:48:44.688+02:00</updated><title type='text'>Flow -- Data-Oriented Implementation of a Visual Scripting Language</title><content type='html'>Presentation made at Stockholm Game Developer Forum, 5 May 2011:&lt;br /&gt;&lt;br /&gt;&lt;iframe frameborder='0' style='width:460px;height:375px;' src='http://public.iwork.com/embed/?d=Flow.key&amp;a=p1373296943&amp;h=768&amp;w=1024&amp;sw=458'&gt;&lt;/iframe&gt;&lt;br /&gt;&lt;br /&gt;Download:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;a href="http://bitsquid.se/presentations/flow.key"&gt;Keynote&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://bitsquid.se/presentations/flow.pdf"&gt;PDF&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://bitsquid.se/presentations/flow-notes.pdf"&gt;PDF with speaker notes&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;View online fullscreen:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;a href="http://public.iwork.com/document/?d=Flow.key&amp;a=p1373296943"&gt;at 
iwork.com&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1994130783874175266-8370991487891578137?l=bitsquid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bitsquid.blogspot.com/feeds/8370991487891578137/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bitsquid.blogspot.com/2011/05/flow-data-oriented-implementation-of.html#comment-form' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/8370991487891578137'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/8370991487891578137'/><link rel='alternate' type='text/html' href='http://bitsquid.blogspot.com/2011/05/flow-data-oriented-implementation-of.html' title='Flow -- Data-Oriented Implementation of a Visual Scripting Language'/><author><name>Niklas</name><uri>http://www.blogger.com/profile/10055379994557504977</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1994130783874175266.post-8645442190800110177</id><published>2011-04-26T22:47:00.000+02:00</published><updated>2011-04-26T22:47:12.993+02:00</updated><title type='text'>Universal Undo, Copy and Paste</title><content type='html'>Undo, Copy and Paste are the bane of any tools programmer. Especially when they are bolted on to an already existing program. 
But even when they are properly planned from the start, these small (but essential) features can consume a lot of development time and be the source of many bugs.&lt;br /&gt;&lt;br /&gt;Wouldn't it be nice if all that could be eliminated?&lt;br /&gt;&lt;br /&gt;In an &lt;a href="http://altdevblogaday.org/2011/03/27/collaboration-and-merging/"&gt;earlier post&lt;/a&gt; I presented a generic model for storing data: objects-with-properties. Like any model, it consists of a combination of generalizations and restrictions. The generalizations make the model broadly applicable. The restrictions let us reason about it and prevent it from becoming an &lt;a href="http://en.wikipedia.org/wiki/Inner-platform_effect"&gt;"inner platform"&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;To quickly recap, here is the gist of the model:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;The data consists of a set of objects-with-properties.&lt;/li&gt;&lt;li&gt;Each object is identified by a GUID.&lt;/li&gt;&lt;li&gt;Each property is identified by a string.&lt;/li&gt;&lt;li&gt;The property value can be null, a bool, a double, a vector3, a quaternion, a string, a data blob, a GUID or a set of GUIDs.&lt;/li&gt;&lt;li&gt;The data has a root object with GUID 0.&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;We need only five operations to manipulate data stored using this model:&lt;br /&gt;&lt;br /&gt;&lt;dl&gt;&lt;dt&gt;create(guid)&lt;/dt&gt;    &lt;dd&gt;creates the object with the specified GUID&lt;/dd&gt;&lt;dt&gt;destroy(guid)&lt;/dt&gt;    &lt;dd&gt;destroys the object with the specified GUID&lt;/dd&gt;&lt;dt&gt;set_property(guid, key, value)&lt;/dt&gt;    &lt;dd&gt;sets the specified property of the object to the value (set to null to remove the property)&lt;/dd&gt;&lt;dt&gt;add_to_set(guid, key, item_guid)&lt;/dt&gt;    &lt;dd&gt;adds the item to the GUID set property identified by the key&lt;/dd&gt;&lt;dt&gt;remove_from_set(guid, key, item_guid)&lt;/dt&gt;    &lt;dd&gt;removes the item from the GUID set property 
identified by the key&lt;/dd&gt;&lt;/dl&gt;&lt;br /&gt;The interesting thing about this model is that it is generic enough to represent almost any kind of data, yet restricted enough to make it possible to define and perform a variety of interesting operations on the data. For example, in the previous post we saw that it was possible to define a property-based merge operation on the data (which for content files is much more useful than the line-based merge used by most version control systems).&lt;br /&gt;&lt;br /&gt;Other operations that are easy to perform on this data are:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;referential integrity checks (check that all GUIDs used exist in the database)&lt;/li&gt;&lt;li&gt;checks for "dangling" objects&lt;/li&gt;&lt;li&gt;object replacement (replace all references to an object's GUID with references to another object)&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;And, of course, the topic for the day: Undo, Copy and Paste.&lt;br /&gt;&lt;br /&gt;&lt;h2&gt;Undo&lt;/h2&gt;&lt;br /&gt;To implement undo in this model, note that each of the five mutating operations we can perform on the data has a simple inverse:&lt;br /&gt;&lt;br /&gt;&lt;table&gt;&lt;tr&gt;&lt;th&gt;Operation&lt;/th&gt;   &lt;th&gt;Inverse&lt;/th&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;create(guid)&lt;/td&gt;     &lt;td&gt;destroy(guid)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;destroy(guid)&lt;/td&gt;     &lt;td&gt;create(guid)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;set_property(guid, key, value)&lt;/td&gt;     &lt;td&gt;set_property(guid, key, old_value)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;add_to_set(guid, key, item_guid)&lt;/td&gt;     &lt;td&gt;remove_from_set(guid, key, item_guid)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;remove_from_set(guid, key, item_guid)&lt;/td&gt;     &lt;td&gt;add_to_set(guid, key, item_guid)&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&lt;br /&gt;To implement Undo, all we have to do is to make sure that whenever the user performs one of the mutating 
operations, we save the corresponding inverse operation to a stack. To undo the latest action, we pop the most recent inverse operation from the stack and perform it. (We also save &lt;em&gt;its&lt;/em&gt; inverse operation to a redo stack, so the user can redo it.)&lt;br /&gt;&lt;br /&gt;Since the Undo operation is implemented on the low-level data model, all high-level programs that use it will automatically get "Undo" for free.&lt;br /&gt;&lt;br /&gt;In the high-level program you typically want to group together all the mutations that resulted from a single user action as one "undo item", so the user can undo them with a single operation. You can do that by recording "restore points" in the undo stack whenever your program is idle. To undo an action, you undo all operations up to the last restore point.&lt;br /&gt;&lt;br /&gt;&lt;h2&gt;Copy&lt;/h2&gt;&lt;br /&gt;To copy a set of objects, create a new database that holds just the copied objects. Copy the objects with their keys and values to the new database. Also copy all the objects they reference. (Use a set to remember the GUIDs of the objects you have already copied.)&lt;br /&gt;&lt;br /&gt;In the root object of the new database, store the GUIDs of all the copied objects under some suitable key (for example: "copied-models").&lt;br /&gt;&lt;br /&gt;Then serialize the database copy to the clipboard (using your standard method for serialization).&lt;br /&gt;&lt;br /&gt;&lt;h2&gt;Paste&lt;/h2&gt;&lt;br /&gt;To paste data, first deserialize it from the clipboard to a new temporary database. Then rename all the objects (give them new GUIDs) to make sure they don't collide with existing objects.&lt;br /&gt;&lt;br /&gt;Renaming is simple: just generate a new GUID for every object in the database. Use a dictionary to record the mapping from an object's old GUID to the new GUID. 
Then, using that dictionary, translate all the references in the object properties from the old GUIDs to the new ones.&lt;br /&gt;&lt;br /&gt;Finally, copy the objects from the temporary database to your main database.&lt;br /&gt;&lt;br /&gt;Again, since Copy and Paste were implemented on the underlying data model and don't depend on the high level data (what kind of objects you actually store) you get them for free in all programs that use the data model.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1994130783874175266-8645442190800110177?l=bitsquid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bitsquid.blogspot.com/feeds/8645442190800110177/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bitsquid.blogspot.com/2011/04/universal-undo-copy-and-paste.html#comment-form' title='8 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/8645442190800110177'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/8645442190800110177'/><link rel='alternate' type='text/html' href='http://bitsquid.blogspot.com/2011/04/universal-undo-copy-and-paste.html' title='Universal Undo, Copy and Paste'/><author><name>Niklas</name><uri>http://www.blogger.com/profile/10055379994557504977</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>8</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1994130783874175266.post-2301468077416163249</id><published>2011-04-12T00:48:00.000+02:00</published><updated>2011-04-12T00:48:16.469+02:00</updated><title type='text'>Extreme Bug Hunting</title><content type='html'>Put on your camouflage vest and step out 
onto the hot motherboard plains. Squint against the searing rays of burning processor cycles and feel the warm wind of chassis fans fill the air with anticipation. Today we go bug hunting.&lt;br /&gt;&lt;br /&gt;Our prey: the worst kind. Crashes only in release builds. Only on PS3. At different places every time. With a low reproduction rate. And there are only a few days left until submission. (Aren't there always?)&lt;br /&gt;&lt;br /&gt;What can we do? Luckily, the situation is not as hopeless as it might seem. I recently dealt with a bug of this kind and here are my tips and tricks for bringing down such beasts:&lt;br /&gt;&lt;br /&gt;&lt;h2&gt;Don't Panic!&lt;/h2&gt;&lt;br /&gt;No bug is impossible to fix. The reason you feel that way is that you don't know anything about it. The more you learn, the less scary the bug will seem.&lt;br /&gt;&lt;br /&gt;Instead of focusing on fixing the bug, something you can't possibly do at this point, focus on finding out more about it. Gather information. Take a sheet of paper and write down everything you know and don't know about the bug. Write down ideas of what might be causing the bug as you think of them and cross them out as you eliminate them. Don't get stressed out by the fact that you are not fixing the bug &lt;em&gt;right now&lt;/em&gt;. Instead be confident that everything you learn about the bug takes you one step closer to finding the cause.&lt;br /&gt;&lt;br /&gt;Actually, the very things that make tricky bugs tricky already tell you some things about them:&lt;br /&gt;&lt;br /&gt;&lt;em&gt;Only in release builds.&lt;/em&gt; There can be several reasons for this. It could be that some of the code that is stripped out in release builds protects against the bug. The bug could be timing related, making it disappear in slower debug builds. 
Or the bug could be caused by uninitialized variables.&lt;br /&gt;&lt;br /&gt;&lt;em&gt;Only on PS3.&lt;/em&gt; This indicates that the bug might be in a PS3 specific system.&lt;br /&gt;&lt;br /&gt;&lt;em&gt;Low reproduction rate.&lt;/em&gt; This indicates that the bug depends on something random. Could be uninitialized memory (can contain random data) or a thread timing issue.&lt;br /&gt;&lt;br /&gt;&lt;em&gt;Different call stacks.&lt;/em&gt; This indicates that a bad system is causing failures in multiple other systems. The most likely explanation is that the bad system is overwriting the memory used by the other systems.&lt;br /&gt;&lt;br /&gt;Taken all together, this gives us a pretty decent working hypothesis:&lt;br /&gt;&lt;br /&gt;&lt;em&gt;Timing issues or uninitialized variables are causing a system (possibly a PS3 only system) to overwrite memory that doesn't belong to it.&lt;/em&gt;&lt;br /&gt;&lt;br /&gt;&lt;h2&gt;Get a Stable Repro Case&lt;/h2&gt;&lt;br /&gt;To learn more about the bug, you need to be able to do experiments. I.e., change something and see if the bug is still there or not.&lt;br /&gt;&lt;br /&gt;To do that effectively you need a reliable way of reproducing the bug. Can you isolate the behavior that produces the bug? Can you find a way of getting a better reproduction rate? Can you script what you just did, so that you have a way of reproducing the bug that doesn't require user input?&lt;br /&gt;&lt;br /&gt;Even if you can't find a 100 % reliable repro case, an automated test is still useful. If the bug has a 30 % chance of occurring and you run the test 20 times without seeing the bug you can be pretty certain that it has disappeared. (Assuming the runs are independent, the chance of 20 clean runs in a row with the bug still present is 0.7^20, or about 0.08 %.) 
And if you have a completely automated test process, it should be able to run the tests while you procure a tasty beverage of your choice.&lt;br /&gt;&lt;br /&gt;&lt;h2&gt;Gather Information&lt;/h2&gt;&lt;br /&gt;As already mentioned, the next step is to try to gather as much information about the bug as possible. The more you learn about the bug, the better chance you have of fixing it.&lt;br /&gt;&lt;br /&gt;Just running the same repro case again and again quickly leads to diminishing returns. Instead, try manipulating the system slightly on each attempt and see what happens to the bug. Does it disappear? Does it become more frequent? Does it move to a different place? What does this tell you about the bug? Below are some useful manipulations to try.&lt;br /&gt;&lt;br /&gt;&lt;h2&gt;Turn off System by System&lt;/h2&gt;&lt;br /&gt;Try turning off system by system in the engine until the bug disappears. Disable the sound system. Is the bug still there? Disable rendering. Can you still get the bug? And so on. If you have a modular engine design, it should be easy to turn off individual engine systems.&lt;br /&gt;&lt;br /&gt;When a bug has a random component you can't be certain that a fix that made the bug disappear &lt;em&gt;really&lt;/em&gt; fixed the bug. It might have just masked it. Still, if you don't make any assumptions at all you won't get anywhere. Just as when you solve a difficult crossword puzzle, you may have to make some guesses to get started. So if the bug disappears when you disable a particular system and reappears when you enable it, you can assume as a &lt;em&gt;working hypothesis&lt;/em&gt; that the bug is caused by something in that system. But you should be ready to abandon that hypothesis if you find evidence to the contrary.&lt;br /&gt;&lt;br /&gt;&lt;h2&gt;Search the Version History&lt;/h2&gt;&lt;br /&gt;Was the bug discovered recently? 
Try reverting to an earlier version of the code/data and see if the bug is still there.&lt;br /&gt;&lt;br /&gt;If the bug disappears in an earlier version you can do a binary search of the revisions until you find the point where the bug was introduced. Git even has a cool command for this: &lt;em&gt;git bisect&lt;/em&gt;. When you find the revision that introduced the bug it should be easy to spot the error.&lt;br /&gt;&lt;br /&gt;&lt;h2&gt;Look at the Data&lt;/h2&gt;&lt;br /&gt;When you get a crash because of overwritten memory, look at the data that was written. If you are really lucky, you might recognize it and can make a decent guess of what system it came from.&lt;br /&gt;&lt;br /&gt;&lt;h2&gt;Memory Breakpoint&lt;/h2&gt;&lt;br /&gt;Another lucky break is if it is the same memory location that is being trashed every time you run the program. In that case, you can just place a data breakpoint at that location and get the debugger to break when the memory is being overwritten.&lt;br /&gt;&lt;br /&gt;&lt;h2&gt;Fill Allocated Memory with Bad Data&lt;/h2&gt;&lt;br /&gt;Could the error be caused by uninitialized data? One way of finding out is to fill memory with specific values on malloc() and see if the behavior of the bug changes. This requires that you have implemented your own memory allocators, but you &lt;a href="http://bitsquid.blogspot.com/2010/09/custom-memory-allocation-in-c.html"&gt;should do that anyway&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Try changing malloc() (or whatever function you use to allocate memory) to always memset() the allocated memory to zero. Does the behavior of the bug change? Try a different pattern: 0xffffffff or 0x12345678. Does anything happen?&lt;br /&gt;&lt;br /&gt;&lt;h2&gt;Disable Multi-Threading&lt;/h2&gt;&lt;br /&gt;Could the error be caused by race conditions between execution threads? Try running your systems synchronously instead of asynchronously. Run them all on the same processor. 
Is the bug still there?&lt;br /&gt;&lt;br /&gt;&lt;h2&gt;Clear on Free&lt;/h2&gt;&lt;br /&gt;The two most common causes of random memory overwrites are:&lt;br /&gt;&lt;br /&gt;&lt;ol&gt; &lt;li&gt;Code that writes to a memory address after having called free().&lt;/li&gt; &lt;li&gt;Code that allocates a buffer of a certain size and writes beyond that size (buffer overflow).&lt;/li&gt;&lt;/ol&gt;&lt;br /&gt;Errors of type (1) can sometimes be found by clearing the memory when free() is called. If a system is accessing memory after having called free(), you might trigger an error in that system by clearing out the memory or filling it with a pattern.&lt;br /&gt;&lt;br /&gt;&lt;h2&gt;Canary Values&lt;/h2&gt;&lt;br /&gt;Buffer overflow problems can be detected with something called "canary values" (named after the way canary birds were used to detect gas leaks in mines).&lt;br /&gt;&lt;br /&gt;The idea is that every time you allocate memory, you allocate some extra bytes and fill them with a "canary value", a known pattern, such as 0x12345678. In the call to free() you check that the canary value is still intact. If some code is writing beyond the end of its buffers, it will overwrite the canary value and cause an assert() in the call to free().&lt;br /&gt;&lt;br /&gt;&lt;h2&gt;Memory Verification&lt;/h2&gt;&lt;br /&gt;Many memory allocators have some kind of internal consistency check. For example in dlmalloc you can check that you are able to walk through all allocated memory blocks. If something is trashing the block headers, the consistency check will fail. 
By running the consistency check at regular intervals you can find out when the corruption occurs.&lt;br /&gt;&lt;br /&gt;Once you have a time interval where the memory is okay at the start and corrupted at the end you can do a binary search of that interval by inserting more and more consistency checks until you find the exact point where the headers are overwritten.&lt;br /&gt;&lt;br /&gt;&lt;h2&gt;Change Allocators&lt;/h2&gt;&lt;br /&gt;Sometimes just changing what allocator you use can move the crash to a different place and make it easier to see the real problem. Try switching between dlmalloc, the system allocator and your own allocators (if you have any).&lt;br /&gt;&lt;br /&gt;&lt;h2&gt;Use the Virtual Memory System&lt;/h2&gt;&lt;br /&gt;Using virtual memory allocations is a good way of finding out if memory is being accessed after free(), since access to a page that has been freed results in a page fault.&lt;br /&gt;&lt;br /&gt;If you suspect that the error is in a particular system, you can switch its allocations over to using the virtual memory allocator. Typically, you can't switch the entire engine over to virtual allocations since it has huge overheads. (You must round up all allocations to the page size.)&lt;br /&gt;&lt;br /&gt;&lt;h2&gt;The Bug That Inspired This Article&lt;/h2&gt;&lt;br /&gt;Using these techniques we were able to hunt down a really tricky bug reasonably quickly. We wrote a script that could reproduce the bug with a rate of about 30 %. System shutdown and version history tests indicated that the bug was in the SPU decompression library, a relatively new system. This indication was strengthened by the fact that the bug occurred only on PS3. Switching that system to using the virtual memory allocator gave us a DMA error when the bad write occurred (from an SPU). From that we could immediately see the problem -- a race condition could cause the SPUs to continue DMAing decompressed data even after the destination buffer had been freed. 
With that information, the problem was easily fixed.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1994130783874175266-2301468077416163249?l=bitsquid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bitsquid.blogspot.com/feeds/2301468077416163249/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bitsquid.blogspot.com/2011/04/extreme-bug-hunting.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/2301468077416163249'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/2301468077416163249'/><link rel='alternate' type='text/html' href='http://bitsquid.blogspot.com/2011/04/extreme-bug-hunting.html' title='Extreme Bug Hunting'/><author><name>Niklas</name><uri>http://www.blogger.com/profile/10055379994557504977</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1994130783874175266.post-8185846173146169242</id><published>2011-03-27T23:38:00.000+02:00</published><updated>2011-03-27T23:38:05.366+02:00</updated><title type='text'>Collaboration and Merging</title><content type='html'>(We are looking for a tools programmer.)&lt;br /&gt;&lt;br /&gt;Games are huge collaborative efforts, but usually they are not developed that way. Mostly, assets can only be worked on by one person at a time and need to be locked in version control to prevent conflicting changes. This can be a real time sink, especially for level design, but all assets would benefit from more collaborative workflows. 
As tool developers, it is time we start thinking seriously about how to support that.&lt;br /&gt;&lt;br /&gt;Recently I faced this issue while doing some work on our localization tools. (Localization is interesting in this context because it involves collaboration over long distances -- a game studio in one country and a translation shop in another.) In the process I had a small epiphany: the key to collaboration is merging. When data merges nicely, collaborative work is easy. If you can't merge changes it is really hard to do collaboration well, no matter what methods you use.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Why databases aren't a magic solution&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;A central database can act as backend storage for a collaborative effort. But that, by itself, does not solve all issues of synchronization and collaboration.&lt;br /&gt;&lt;br /&gt;Consider this: if you are going to use a database as your &lt;em&gt;only&lt;/em&gt; synchronization mechanism then all clients will have to run in lockstep with the database. If you change something, you have to verify with the database that the change hasn't been invalidated by something done by somebody else, perform the change as a single transaction and then wait for the database to acknowledge it before continuing. Every time you change something, you will have to wait for this round trip to the database and the responsiveness of your program is now completely at its mercy.&lt;br /&gt;&lt;br /&gt;Web applications have faced this issue for a long time and they all use the same solution. Instead of synchronizing every little change with the database, they gather up their changes and send them to the database asynchronously. This change alone is what has made "web 2.0" applications competitive with desktop software.&lt;br /&gt;&lt;br /&gt;But once you start talking to the database asynchronously, you have already entered "merge territory". 
You send your updates to the server, they arrive at some later point, potentially after changes made by other users. When you get a reply back from the server you may already have made other, potentially conflicting, changes to your local data. Both at the server and in the clients, changes made by different users must be merged.&lt;br /&gt;&lt;br /&gt;So you need merging. But you don't necessarily need a database. If your merges are robust you can just use an ordinary version control system as the backend instead of a database. Or you can work completely disconnected and send your changes as patch files. The technology you use for the backend storage doesn't matter that much; it is the ability to merge that is crucial.&lt;br /&gt;&lt;br /&gt;A merge-based solution has another nice property that you don't get with a "lockstep database": the possibility of keeping a local changeset and only submitting it to others when it is "done". This is of course crucial for code (imagine keeping all your source files in constantly mutating Google Documents). But I think it applies to other assets as well. You don't want half-finished, broken assets all over your levels. An update/commit workflow is useful here as well.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Making assets mergeable&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;If you have tried to merge assets in regular version control systems you will know that they usually don't do so well. The merge tool can mess up the JSON/XML structure, mangle the file in other ways or just plain fail (because of a merge conflict). All of these problems arise because the merge tool treats the data as "source code" -- a line-oriented text document with no additional structure. The reason for this is of course historic: version control systems emerged as a way of managing source code and then grew into other areas.&lt;br /&gt;&lt;br /&gt;The irony of this is that source code is one of the hardest things to merge. 
It has complicated syntax and even more complicated semantics. Source code is so hard to merge that even humans with all their intelligency goodness find it taxing. In contrast, most assets are easy to merge, at least conceptually.&lt;br /&gt;&lt;br /&gt;Take localization, for instance. The localization data is just a bunch of strings with translations for different languages. If one person has made a bunch of German translations, another person has made some Swedish translations and a third person has added some new source strings, we can merge all that without a hitch. The only time when we have any problem at all is if two people have provided different translations for the same string in the same language. We can solve such standoffs by just picking the most recent value. (Optionally, we could notify the user that this happened by highlighting the string in the tool.)&lt;br /&gt;&lt;br /&gt;Many other assets have a similar structure. They can be described as "objects-with-properties". For example, in a level asset the objects are the entities placed in the level and their properties are position, rotation, color, etc. All data that has this structure is easy to merge, because there are essentially just three types of operations you can perform on it: create an object, destroy an object and change a property of an object. All these operations are easy to merge. Again, the only problem is if two different users have changed the same property of the same object.&lt;br /&gt;&lt;br /&gt;So when we try to merge assets using regular merge tools we are doing something rather silly. We are taking something that is conceptually very easy to merge, completely ignoring that and trying to merge it using rather complex algorithms that were designed for something completely different, something that is conceptually very hard to merge. 
Silly, when you think about it.&lt;br /&gt;&lt;br /&gt;The solution to this sad state of affairs is of course to write custom merge tools that take advantage of the fact that assets are very easy to merge. Tools that understand the objects-with-properties model and know how to merge that.&lt;br /&gt;&lt;br /&gt;A first step might be to write a merge program that understands XML or &lt;a href="http://bitsquid.blogspot.com/2010/06/avoiding-content-locks-and-conflicts-3.html"&gt;JSON&lt;/a&gt; files (the program in the link has some performance issues -- I will deal with that in my next available time slot) and can interpret them as objects-with-properties.&lt;br /&gt;&lt;br /&gt;This only goes half the way though, because you will need some kind of extra markup in the file for the tool to understand it as a set of objects-with-properties. For example, you probably need some kind of id field to mark object identity. Otherwise you can't tell if a user has changed some properties of an old object or deleted the old object and created a new one. And that matters when you do the merge.&lt;br /&gt;&lt;br /&gt;Instead of adding this extra markup, which can be a bit fragile, I think it is better to explicitly represent your data as objects-with-properties. &lt;a href="http://bitsquid.blogspot.com/2010/08/new-data-storage-model.html"&gt;I've blogged about this before&lt;/a&gt;, but since then I feel my thoughts on the subject have clarified and I've also had the opportunity to try it out in practice (with the localization tool). 
Such a representation could have the following key elements.&lt;br /&gt;&lt;br /&gt;&lt;ul&gt; &lt;li&gt;The data consists of a set of objects-with-properties.&lt;/li&gt; &lt;li&gt;Each object is identified by a GUID.&lt;/li&gt; &lt;li&gt;Each property is identified by a string.&lt;/li&gt; &lt;li&gt;The property value can be null, a bool, a double, a vector3, a quaternion, a string, a data blob, a GUID or a set of GUIDs.&lt;/li&gt; &lt;li&gt;The data has a root object with GUID 0.&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;We use a GUID to identify the object, since that means the ids of objects created by different users won't collide. GUID values are used to make links between objects. Note that we don't allow arrays, only sets. That is because array operations (move object from 5th place to 3rd place) are hard to merge. Set operations (insert object, remove object) are easy to merge.&lt;br /&gt;&lt;br /&gt;Here is what a change set for creating a player entity in a level might look like using this model. (I have shortened the GUIDs to 2 bytes to make the example more readable.)&lt;br /&gt;&lt;br /&gt;create #f341&lt;br /&gt;change_key #f341 "entity-type" "player"&lt;br /&gt;change_key #f341 "position" vector3(0,0,0)&lt;br /&gt;add_to_set #0000 "entities" #f341&lt;br /&gt;&lt;br /&gt;Note that the root object (which represents the level) has a property "entities" that contains the set of all entities in the level.&lt;br /&gt;&lt;br /&gt;To merge two such change sets, you could just append one to the other. You could even use the change set itself as your data format, if you don't want to use a database backend (that is actually what I did for the localization tool).&lt;br /&gt;&lt;br /&gt;I think most assets can be represented in the objects-with-properties model and it is a rather powerful way of making sure that they are mergeable and collaboration-friendly. 
I will write all the new BitSquid tools with the object-with-properties model in mind and retrofit it into our older tools.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1994130783874175266-8185846173146169242?l=bitsquid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bitsquid.blogspot.com/feeds/8185846173146169242/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bitsquid.blogspot.com/2011/03/collaboration-and-merging.html#comment-form' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/8185846173146169242'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/8185846173146169242'/><link rel='alternate' type='text/html' href='http://bitsquid.blogspot.com/2011/03/collaboration-and-merging.html' title='Collaboration and Merging'/><author><name>Niklas</name><uri>http://www.blogger.com/profile/10055379994557504977</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1994130783874175266.post-1735105195054113981</id><published>2011-03-13T10:49:00.001+01:00</published><updated>2011-03-13T10:57:48.738+01:00</updated><title type='text'>A Tiny Expression Language</title><content type='html'>Putting some of the power of programming into the hands of artists and designers can be a great thing. When they can customize the behavior of an object directly, without making the roundtrip through a programmer, there is a lot more room for experimentation and iteration. 
As a result you get better looking things with more interesting interactions.&lt;br /&gt;&lt;br /&gt;Plus, if the artists do their own damn programming it means less work for me, so everybody wins.&lt;br /&gt;&lt;br /&gt;Of course I don’t expect artists to actually program, but rather to use tools that expose that power, such as shader graphs, &lt;a href="http://bitsquid.blogspot.com/2010/09/visual-scripting-data-oriented-way.html"&gt;visual scripting systems&lt;/a&gt;, or — the topic of this post — expression languages.&lt;br /&gt;&lt;br /&gt;By an expression language I mean a tiny little programming language that can be used to (and only used to) write one-line mathematical expressions, such as:&lt;br /&gt;&lt;br /&gt;sin(t)  + 0.1 * cos(10 * t)&lt;br /&gt;&lt;br /&gt;So  it is a really simple little calculator language. Simpler than &lt;a href="http://en.wikipedia.org/wiki/Lisp_(programming_language)"&gt;Lisp&lt;/a&gt;. Simpler than &lt;a href="http://en.wikipedia.org/wiki/Forth_(programming_language)"&gt;Forth&lt;/a&gt;. (Well maybe not, but simpler than trying to teach artists Lisp or Forth.) This simplicity has two advantages. First, it makes it easier to write and understand the expressions. Second, it makes it possible to compute the expressions efficiently, which is important, because it allows us to use them in more places without worrying too much about the performance or memory costs.&lt;br /&gt;&lt;br /&gt;The expression language can be used to replace static values where we want the artist to be able to specify more unique behaviors. Some examples:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;In the particle system it can be used to script complicated custom particle behaviors that are hard to produce with other types of controllers.&lt;/li&gt;&lt;li&gt;In the animation system it can be used to compute the play speed and blend values of animations based on controller variables. 
&lt;/li&gt;&lt;li&gt;In the physics system it can be used to define custom force fields to achieve special effects, such as tornados, explosions or whirlwinds.&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Computing the Expressions&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;Since the expressions are so simple, usually not more than a few operators, we need to be able to evaluate them with as little overhead as possible. Otherwise, the overhead will dominate the execution cost. This means that we should use a simple design, such as a &lt;a href="http://en.wikipedia.org/wiki/Stack_machine"&gt;stack-based virtual machine&lt;/a&gt;. That may sound complicated, but the concepts are really quite simple. What it means is that we convert our expression to a sequence of operations that push or pop data from a computation stack. So our example from above:&lt;br /&gt;&lt;br /&gt;sin(t)  + 0.1 * cos(10 * t)&lt;br /&gt;&lt;br /&gt;Gets converted into:&lt;br /&gt;&lt;br /&gt;t sin 0.1 10 t * cos * +&lt;br /&gt;&lt;br /&gt;Here &lt;em&gt;t&lt;/em&gt; pushes the value of the variable &lt;em&gt;t&lt;/em&gt; to the stack. &lt;em&gt;sin&lt;/em&gt; pops the top value from the stack, computes its sine and pushes the result to the stack. 0.1 pushes the value 0.1 to the stack. + pops two values from the stack, adds them together and pushes the result to the stack. * works the same way. 
If you go through the operations in the example you will see that they compute the same result as the original expression.&lt;br /&gt;&lt;br /&gt;This way of writing expressions is called &lt;a href="http://en.wikipedia.org/wiki/Reverse_Polish_notation"&gt;Reverse Polish notation&lt;/a&gt; (RPN) or postfix notation and it’s the basis for the programming language &lt;a href="http://en.wikipedia.org/wiki/Forth_(programming_language)"&gt;Forth&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;If we examine the issue, we see that we really just need four types of operations in our byte code:&lt;br /&gt;&lt;br /&gt;&lt;dl&gt;&lt;dt&gt;PUSH_VARIABLE&lt;/dt&gt;&lt;dd&gt;pushes the content of a variable to the stack&lt;/dd&gt;&lt;dt&gt;PUSH_FLOAT&lt;/dt&gt;&lt;dd&gt;pushes a floating point number to the stack&lt;/dd&gt;&lt;dt&gt;COMPUTE_FUNCTION&lt;/dt&gt;&lt;dd&gt;pops the arguments off the stack, computes the result and pushes it to the stack&lt;/dd&gt;&lt;dt&gt;END&lt;/dt&gt;&lt;dd&gt;marks the end of the byte code&lt;/dd&gt; &lt;/dl&gt;&lt;br /&gt;For simplicity I use 32 bits for each bytecode word. The upper 8 bits specify the type of the operation and the lower 24 bits hold the data. For a variable the data is the index of the variable in a variable list. When compiling the bytecode you specify a list of variable names: {“t”, “x”}. And when executing you specify a corresponding list of variable values: {0.5, 20.1}. Similarly, for COMPUTE_FUNCTION, the data is an index into a function table. 
For PUSH_FLOAT we need an extra code word to hold the data, since we want 32 bit floats.&lt;br /&gt;&lt;br /&gt;We can now write the function that runs the virtual machine. It is not much code at all:&lt;br /&gt;&lt;br /&gt;&lt;pre escaped="true" lang="cpp" line="1"&gt;struct Stack&lt;br /&gt;{&lt;br /&gt; float *data;&lt;br /&gt; unsigned size;&lt;br /&gt; unsigned capacity;&lt;br /&gt;}; &lt;br /&gt;&lt;br /&gt;bool run(const unsigned *byte_code, const float *variables, Stack &amp;amp;stack)&lt;br /&gt;{&lt;br /&gt; const unsigned *p = byte_code;&lt;br /&gt; while (true) {&lt;br /&gt;  unsigned bc = *p++;&lt;br /&gt;  unsigned op = (bc &amp;gt;&amp;gt; 24);&lt;br /&gt;  int i = bc &amp;amp; 0xffffff;&lt;br /&gt;  switch (op) {&lt;br /&gt;   case BC_PUSH_FLOAT:&lt;br /&gt;    if (stack.size == stack.capacity) return false;&lt;br /&gt;    stack.data[stack.size++] = unsigned_to_float(*p++);&lt;br /&gt;    break;&lt;br /&gt;   case BC_PUSH_VAR:&lt;br /&gt;    if (stack.size == stack.capacity) return false;&lt;br /&gt;    stack.data[stack.size++] = variables[i];&lt;br /&gt;    break;&lt;br /&gt;   case BC_FUNCTION:&lt;br /&gt;    compute_function((OpCode)i, stack);&lt;br /&gt;    break;&lt;br /&gt;   case BC_END:&lt;br /&gt;    return true;&lt;br /&gt;  }&lt;br /&gt; }&lt;br /&gt;}&lt;/pre&gt;&lt;br /&gt;&lt;strong&gt;Compiling the Byte Code&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;Compiling an expression involves three phases: tokenizing the data to a stream of input symbols, transforming that stream from infix to postfix notation and finally generating the byte code from that.&lt;br /&gt;&lt;br /&gt;Tokenization means matching the identifiers in the expressions against a list of variable names and function names. We can also support constants that get converted to floats directly in the tokenization process. 
That is useful for things like &lt;em&gt;pi&lt;/em&gt;.&lt;br /&gt;&lt;br /&gt;The tokenization process converts our sample expression to something like this:&lt;br /&gt;&lt;br /&gt;{ sin, (, t, ), +, 0.1, *, cos, (, 10, *, t, ) }&lt;br /&gt;&lt;br /&gt;Now we need to convert this to postfix notation. One way would be to write a full blown yacc parser with all that entails, but for simple expressions like these we can get away with something simpler, such as Dijkstra's &lt;a href="http://en.wikipedia.org/wiki/Shunting-yard_algorithm"&gt;Shunting Yard algorithm&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;I actually use an even simpler variant that doesn't support right-associative operators, where I just process the input tokens one by one. If the token is a value or a variable I put it directly in the output. If the token is a function or an operator I push it to a function stack. But before I do that, I pop all functions with higher precedence from the function stack and put them in the output. Precedence takes parenthesis level into account, so a + nested in three parentheses has higher precedence than a * nested in two.&lt;br /&gt;&lt;br /&gt;Let us see how this works for our simple example:&lt;br /&gt;&lt;br /&gt;&lt;table&gt;&lt;tr&gt;&lt;th&gt;Input&lt;/th&gt; &lt;th&gt;Output&lt;/th&gt; &lt;th&gt;Stack&lt;/th&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;sin ( t ) + 0.1 * cos ( 10 * t )&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;/tr&gt;&lt;tr&gt;&lt;td&gt;( t ) + 0.1 * cos ( 10 * t )&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;sin&lt;/td&gt; &lt;/tr&gt;&lt;tr&gt;&lt;td&gt;+ 0.1 * cos ( 10 * t )&lt;/td&gt; &lt;td&gt;t&lt;/td&gt; &lt;td&gt;sin&lt;/td&gt; &lt;/tr&gt;&lt;tr&gt;&lt;td&gt;0.1 * cos ( 10 * t )&lt;/td&gt; &lt;td&gt;t sin&lt;/td&gt; &lt;td&gt;+&lt;/td&gt; &lt;/tr&gt;&lt;tr&gt;&lt;td&gt;* cos ( 10 * t )&lt;/td&gt; &lt;td&gt;t sin 0.1&lt;/td&gt; &lt;td&gt;+&lt;/td&gt; &lt;/tr&gt;&lt;tr&gt;&lt;td&gt;cos ( 10 * t )&lt;/td&gt; &lt;td&gt;t sin 0.1&lt;/td&gt; 
&lt;td&gt;+ *&lt;/td&gt; &lt;/tr&gt;&lt;tr&gt;&lt;td&gt;( 10 * t )&lt;/td&gt; &lt;td&gt;t sin 0.1&lt;/td&gt; &lt;td&gt;+ * cos&lt;/td&gt; &lt;/tr&gt;&lt;tr&gt;&lt;td&gt;* t&lt;/td&gt; &lt;td&gt;t sin 0.1 10&lt;/td&gt; &lt;td&gt;+ * cos&lt;/td&gt; &lt;/tr&gt;&lt;tr&gt;&lt;td&gt;t&lt;/td&gt; &lt;td&gt;t sin 0.1 10&lt;/td&gt; &lt;td&gt;+ * cos (*)&lt;/td&gt; &lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;/td&gt; &lt;td&gt;t sin 0.1 10 t&lt;/td&gt; &lt;td&gt;+ * cos (*)&lt;/td&gt; &lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;/td&gt; &lt;td&gt;t sin 0.1 10 t *&lt;/td&gt; &lt;td&gt;+ * cos&lt;/td&gt; &lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;/td&gt; &lt;td&gt;t sin 0.1 10 t * cos&lt;/td&gt; &lt;td&gt;+ *&lt;/td&gt; &lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;/td&gt; &lt;td&gt;t sin 0.1 10 t * cos *&lt;/td&gt; &lt;td&gt;+&lt;/td&gt; &lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;/td&gt; &lt;td&gt;t sin 0.1 10 t * cos * +&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;/tr&gt;&lt;/table&gt;&lt;br /&gt;&amp;nbsp;&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Constant Folding&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;To further improve efficiency we may want to distinguish the cases where the users have actually written an expression (such as “sin x”) from the cases where they have just written a constant (“0.5”) or a constant valued expression (“2*sin(pi)”). Luckily, constant folding is really easy to do in an RPN expression. &lt;br /&gt;&lt;br /&gt;After tokenizing and RPN conversion, the expression “2 * sin(pi)” has been converted to:&lt;br /&gt;&lt;br /&gt;2 3.14159265 sin *&lt;br /&gt;&lt;br /&gt;We can constant fold a function of arity n if the n arguments that precede it are constants. 
So in the sample above we can constant fold &lt;em&gt;sin&lt;/em&gt; to:&lt;br /&gt;&lt;br /&gt;2 &lt;strong&gt;3.14159265 sin&lt;/strong&gt; *&lt;br /&gt;2 0 *&lt;br /&gt;&lt;br /&gt;Continuing, we can fold *&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;2 0 *&lt;/strong&gt;&lt;br /&gt;0&lt;br /&gt;&lt;br /&gt;If we end up with a constant expression, the byte code will be a single PUSH_FLOAT operation. We can detect that and bypass the expression evaluation altogether for that case.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Source Code&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;If you want to start playing with these things you can start with my &lt;a href="https://bitbucket.org/bitsquid/expression_language/src"&gt;expression language source code&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1994130783874175266-1735105195054113981?l=bitsquid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bitsquid.blogspot.com/feeds/1735105195054113981/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bitsquid.blogspot.com/2011/03/putting-some-of-power-of-programming.html#comment-form' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/1735105195054113981'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/1735105195054113981'/><link rel='alternate' type='text/html' href='http://bitsquid.blogspot.com/2011/03/putting-some-of-power-of-programming.html' title='A Tiny Expression Language'/><author><name>Niklas</name><uri>http://www.blogger.com/profile/10055379994557504977</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1994130783874175266.post-4602516950322770212</id><published>2011-03-08T13:39:00.002+01:00</published><updated>2011-03-08T13:59:00.397+01:00</updated><title type='text'>BitSquid Tech: Benefits of data-driven renderer</title><content type='html'>Here are the slides from the talk I did last Wednesday at GDC in Nvidia's Game Technology Theater:&lt;br /&gt;&lt;br /&gt;&lt;div style="width:510px" id="__ss_7180753"&gt; &lt;strong style="display:block;margin:12px 0 4px"&gt;&lt;a href="http://www.slideshare.net/tobias_persson/bstech-gdc2011" title="BitSquid Tech: Benefits of a data-driven renderer"&gt;BitSquid Tech: Benefits of a data-driven renderer&lt;/a&gt;&lt;/strong&gt; &lt;object id="__sse7180753" width="510" height="426"&gt; &lt;param name="movie" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=bstechgdc2011-110307142040-phpapp01&amp;stripped_title=bstech-gdc2011&amp;userName=tobias_persson" /&gt; &lt;param name="allowFullScreen" value="true"/&gt; &lt;param name="allowScriptAccess" value="always"/&gt; &lt;embed name="__sse7180753" src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=bstechgdc2011-110307142040-phpapp01&amp;stripped_title=bstech-gdc2011&amp;userName=tobias_persson" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="510" height="426"&gt;&lt;/embed&gt; &lt;/object&gt; &lt;div style="padding:5px 0 12px"&gt; &lt;/div&gt; &lt;/div&gt;&lt;br /&gt;&lt;br /&gt;The presentation is also available with synced audio &lt;a href="http://nvidia.fullviewmedia.com/gdc2011/03-bitsquid.html"&gt;here&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1994130783874175266-4602516950322770212?l=bitsquid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' 
href='http://bitsquid.blogspot.com/feeds/4602516950322770212/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bitsquid.blogspot.com/2011/03/bitsquid-tech-benefits-of-data-driven.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/4602516950322770212'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/4602516950322770212'/><link rel='alternate' type='text/html' href='http://bitsquid.blogspot.com/2011/03/bitsquid-tech-benefits-of-data-driven.html' title='BitSquid Tech: Benefits of data-driven renderer'/><author><name>Tobias</name><uri>http://www.blogger.com/profile/16240529312060411542</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1994130783874175266.post-1045386620402268088</id><published>2011-02-25T23:28:00.002+01:00</published><updated>2011-02-28T13:26:33.547+01:00</updated><title type='text'>Managing Decoupling Part 3 - C++ Duck Typing</title><content type='html'>&lt;div style="background-attachment: initial; background-clip: initial; background-color: white; background-image: initial; background-origin: initial; background-position: initial initial; background-repeat: initial initial; font: normal normal normal 13px/19px Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0.6em; padding-left: 0.6em; padding-right: 0.6em; padding-top: 0.6em;"&gt;Some systems need to manipulate objects whose exact nature is not known. 
For example, a particle system has to manipulate particles that sometimes have mass, sometimes a full 3D rotation, sometimes only 2D rotation, etc. (A&amp;nbsp;&lt;span class="Apple-style-span" mce_fixed="1" mce_name="em" mce_style="font-style: italic;" style="font-style: italic;"&gt;good&lt;/span&gt;&amp;nbsp;particle system anyway. A bad particle system could use the same struct for all particles in all effects. And the struct could have some fields called&amp;nbsp;&lt;span class="Apple-style-span" mce_fixed="1" mce_name="em" mce_style="font-style: italic;" style="font-style: italic;"&gt;custom_1&lt;/span&gt;, &lt;span class="Apple-style-span" mce_fixed="1" mce_name="em" mce_style="font-style: italic;" style="font-style: italic;"&gt;custom_2&lt;/span&gt;&amp;nbsp;used for different purposes in different effects. And it would be inefficient, inflexible and messy.)&lt;br /&gt;&lt;br /&gt;Another example is a networking system tasked with synchronizing game objects between clients and servers. A very general such system might want to treat the objects as open JSON-like structs, with arbitrary fields and values:&lt;br /&gt;&lt;pre escaped="true" style="font: normal normal normal 12px/18px Consolas, Monaco, 'Courier New', Courier, monospace;"&gt;&lt;/pre&gt;&lt;pre escaped="true" style="font: normal normal normal 12px/18px Consolas, Monaco, 'Courier New', Courier, monospace;"&gt;{&lt;br /&gt;    "score" : 100,&lt;br /&gt;    "name": "Player 1"&lt;br /&gt;}&lt;/pre&gt;&lt;br /&gt;We want to be able to handle such “general” or “open” objects in C++ in a nice way. Since we care about structure we don’t want the system to be strongly coupled to the layout of the objects it manages. And since we are performance junkies, we would like to do it in a way that doesn’t completely kill performance. 
I.e., we&amp;nbsp;&lt;span class="Apple-style-span" mce_fixed="1" mce_name="em" mce_style="font-style: italic;" style="font-style: italic;"&gt;don’t&lt;/span&gt;&amp;nbsp;want everything to inherit from a base class Object and define our JSON-like objects as:&lt;br /&gt;&lt;pre escaped="true" style="font: normal normal normal 12px/18px Consolas, Monaco, 'Courier New', Courier, monospace;"&gt;&lt;/pre&gt;&lt;pre escaped="true" style="font: normal normal normal 12px/18px Consolas, Monaco, 'Courier New', Courier, monospace;"&gt;typedef std::map&amp;lt;std::string, Object *&amp;gt; OpenStruct;&lt;/pre&gt;&lt;br /&gt;Generally speaking, there are three possible levels of flexibility with which we can work with objects and types in a programming language:&lt;br /&gt;&lt;span class="Apple-style-span" mce_fixed="1" mce_name="strong" mce_style="font-weight: bold;" style="font-weight: bold;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" mce_fixed="1" mce_name="strong" mce_style="font-weight: bold;" style="font-weight: bold;"&gt;1. Exact typing - Only ducks are ducks&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" mce_fixed="1" mce_name="strong" mce_style="font-weight: bold;" style="font-weight: bold;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;div mce_style="padding-left: 30px;" style="padding-left: 30px;"&gt;We require the object to&amp;nbsp;&lt;span class="Apple-style-span" mce_fixed="1" mce_name="em" mce_style="font-style: italic;" style="font-style: italic;"&gt;be&lt;/span&gt;&amp;nbsp;of a specific type. This is the typing method used in C and for classes without inheritance in C++.&lt;/div&gt;&lt;div mce_style="padding-left: 30px;" style="padding-left: 30px;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;span class="Apple-style-span" mce_fixed="1" mce_name="strong" mce_style="font-weight: bold;" style="font-weight: bold;"&gt;2. 
Interface typing - If it says it’s a duck&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" mce_fixed="1" mce_name="strong" mce_style="font-weight: bold;" style="font-weight: bold;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;div mce_style="padding-left: 30px;" style="padding-left: 30px;"&gt;We require the object to inherit from and implement a specific interface type. This is the typing method used by default in Java and C# and in C++ when inheritance and virtual methods are used. It is more flexible than the exact approach, but still introduces a coupling, because it forces the objects we manage to inherit a type defined by us.&lt;/div&gt;&lt;div mce_style="padding-left: 30px;" style="padding-left: 30px;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div mce_style="padding-left: 30px;" style="padding-left: 30px;"&gt;Side rant: My general opinion is that while inheriting&amp;nbsp;&lt;span class="Apple-style-span" mce_fixed="1" mce_name="em" mce_style="font-style: italic;" style="font-style: italic;"&gt;interfaces&lt;/span&gt;&amp;nbsp;(abstract classes) is a valid and useful design tool, inheriting&amp;nbsp;&lt;span class="Apple-style-span" mce_fixed="1" mce_name="em" mce_style="font-style: italic;" style="font-style: italic;"&gt;implementations&lt;/span&gt;&amp;nbsp;is usually little more than a glorified “hack”, a way of patching parent classes by inserting custom code here and there. You almost always get a cleaner design when you build your objects with composition instead of with implementation inheritance.&lt;/div&gt;&lt;div mce_style="padding-left: 30px;" style="padding-left: 30px;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;span class="Apple-style-span" mce_fixed="1" mce_name="strong" mce_style="font-weight: bold;" style="font-weight: bold;"&gt;3. 
Duck typing - If it quacks like a duck&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" mce_fixed="1" mce_name="strong" mce_style="font-weight: bold;" style="font-weight: bold;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;div mce_style="padding-left: 30px;" style="padding-left: 30px;"&gt;We don’t care about the type of the object at all, as long as it has the fields and methods that we need. An example:&lt;/div&gt;&lt;div mce_style="padding-left: 30px;" style="padding-left: 30px;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;pre escaped="true" style="font: normal normal normal 12px/18px Consolas, Monaco, 'Courier New', Courier, monospace;"&gt;      def integrate_position(o, dt):&lt;br /&gt;          o.position = o.position + o.velocity * dt&lt;/pre&gt;&lt;pre escaped="true" style="font: normal normal normal 12px/18px Consolas, Monaco, 'Courier New', Courier, monospace;"&gt;&lt;/pre&gt;&lt;div mce_style="padding-left: 30px;" style="padding-left: 30px;"&gt;This method integrates the position of the object&amp;nbsp;&lt;span class="Apple-style-span" mce_fixed="1" mce_name="em" mce_style="font-style: italic;" style="font-style: italic;"&gt;o&lt;/span&gt;. It doesn’t care what the type of o is, as long as it has a “position” field and a “velocity” field.&lt;/div&gt;&lt;div mce_style="padding-left: 30px;" style="padding-left: 30px;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div mce_style="padding-left: 30px;" style="padding-left: 30px;"&gt;Duck typing is the default in many “scripting” languages such as Ruby, Python, Lua and JavaScript. 
The reflection interface of Java and C# can also be used for duck typing, but unfortunately the code tends to become far less elegant than in the scripting languages:&lt;/div&gt;&lt;div mce_style="padding-left: 30px;" style="padding-left: 30px;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;pre escaped="true" style="font: normal normal normal 12px/18px Consolas, Monaco, 'Courier New', Courier, monospace;"&gt;      o.GetType().GetProperty("Position").SetValue(o, o.GetType().&lt;br /&gt;         GetProperty("Position").GetValue(o, null) + o.GetType().&lt;br /&gt;         GetProperty("Velocity").GetValue(o, null) * dt, null)&lt;/pre&gt;&lt;pre escaped="true" style="font: normal normal normal 12px/18px Consolas, Monaco, 'Courier New', Courier, monospace;"&gt;&lt;/pre&gt;What we want is some way of doing “duck typing” in C++.&lt;br /&gt;&lt;br /&gt;Let’s look at inheritance and virtual functions first, since that is the standard way of “generalizing” code in C++. It is true that you could do general objects using the inheritance mechanism. You would create a class structure looking something like:&lt;br /&gt;&lt;br /&gt;&lt;pre escaped="true" style="font: normal normal normal 12px/18px Consolas, Monaco, 'Courier New', Courier, monospace;"&gt;class Object {...};&lt;br /&gt;class Int : public Object {...};&lt;br /&gt;class Float : public Object{...};&lt;/pre&gt;&lt;pre escaped="true" style="font: normal normal normal 12px/18px Consolas, Monaco, 'Courier New', Courier, monospace;"&gt;&lt;/pre&gt;and then use&amp;nbsp;&lt;span class="Apple-style-span" mce_fixed="1" mce_name="em" mce_style="font-style: italic;" style="font-style: italic;"&gt;dynamic_cast&lt;/span&gt;&amp;nbsp;or perhaps your own hand-rolled RTTI system to determine an object’s class.&lt;br /&gt;But there are a number of drawbacks with this approach. It is quite verbose. This inheritance model requires objects to be treated as pointers so they (probably) have to be heap allocated. 
This makes it tricky to get a good memory layout. And that hurts performance. Also, they are not PODs so we will have to do extra work if we want to move them to a co-processor or save them to disk.&lt;br /&gt;&lt;br /&gt;So I prefer something much simpler. A generic object is just a type enum followed by the data for the object:&lt;br /&gt;&lt;br /&gt;&lt;a href="http://altdevblogaday.org/wp-content/uploads/2011/02/duck_typing_1.png" mce_href="http://altdevblogaday.org/wp-content/uploads/2011/02/duck_typing_1.png"&gt;&lt;img alt="" class="aligncenter size-medium wp-image-1231" height="72" mce_src="http://altdevblogaday.org/wp-content/uploads/2011/02/duck_typing_1-300x72.png" src="http://altdevblogaday.org/wp-content/uploads/2011/02/duck_typing_1-300x72.png" style="border-bottom-width: 0px; border-color: initial; border-left-width: 0px; border-right-width: 0px; border-style: initial; border-top-width: 0px; display: block; margin-left: auto; margin-right: auto;" title="duck_typing_1" width="300" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;To pass the object you just pass its pointer. To make a copy, you make a copy of the memory block. 
You can also write it straight to disk and read it back, send it over the network or to an SPU for off-core processing.&lt;br /&gt;&lt;br /&gt;To extract the data from the object you would do something like:&lt;br /&gt;&lt;pre escaped="true" style="font: normal normal normal 12px/18px Consolas, Monaco, 'Courier New', Courier, monospace;"&gt;&lt;/pre&gt;&lt;pre escaped="true" style="font: normal normal normal 12px/18px Consolas, Monaco, 'Courier New', Courier, monospace;"&gt;unsigned type = *(unsigned *)o;&lt;br /&gt;if (type == FLOAT_TYPE)&lt;br /&gt;    float f = *(float *)(o + 4);&lt;/pre&gt;&lt;br /&gt;You don’t really need that many different object types:&amp;nbsp;&lt;span class="Apple-style-span" mce_fixed="1" mce_name="em" mce_style="font-style: italic;" style="font-style: italic;"&gt;bool&lt;/span&gt;,&amp;nbsp;&lt;span class="Apple-style-span" mce_fixed="1" mce_name="em" mce_style="font-style: italic;" style="font-style: italic;"&gt;int&lt;/span&gt;,&amp;nbsp;&lt;span class="Apple-style-span" mce_fixed="1" mce_name="em" mce_style="font-style: italic;" style="font-style: italic;"&gt;float&lt;/span&gt;,&amp;nbsp;&lt;span class="Apple-style-span" mce_fixed="1" mce_name="em" mce_style="font-style: italic;" style="font-style: italic;"&gt;vector3&lt;/span&gt;,&amp;nbsp;&lt;span class="Apple-style-span" mce_fixed="1" mce_name="em" mce_style="font-style: italic;" style="font-style: italic;"&gt;quaternion&lt;/span&gt;,&amp;nbsp;&lt;span class="Apple-style-span" mce_fixed="1" mce_name="em" mce_style="font-style: italic;" style="font-style: italic;"&gt;string&lt;/span&gt;,&amp;nbsp;&lt;span class="Apple-style-span" mce_fixed="1" mce_name="em" mce_style="font-style: italic;" style="font-style: italic;"&gt;array&lt;/span&gt;&amp;nbsp;and&amp;nbsp;&lt;span class="Apple-style-span" mce_fixed="1" mce_name="em" mce_style="font-style: italic;" style="font-style: italic;"&gt;dictionary&lt;/span&gt;&amp;nbsp;are usually enough. 
You can build more complicated types as aggregates of those, just as you do in JSON.&lt;br /&gt;&lt;br /&gt;For a dictionary object we just store the name/key and type of each object:&lt;br /&gt;&lt;br /&gt;&lt;a href="http://altdevblogaday.org/wp-content/uploads/2011/02/duck_typing_2.png" mce_href="http://altdevblogaday.org/wp-content/uploads/2011/02/duck_typing_2.png"&gt;&lt;img alt="" class="aligncenter size-large wp-image-1232" height="97" mce_src="http://altdevblogaday.org/wp-content/uploads/2011/02/duck_typing_2-1024x138.png" src="http://altdevblogaday.org/wp-content/uploads/2011/02/duck_typing_2-1024x138.png" style="border-bottom-width: 0px; border-color: initial; border-left-width: 0px; border-right-width: 0px; border-style: initial; border-top-width: 0px; display: block; margin-left: auto; margin-right: auto;" title="duck_typing_2" width="725" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;I tend to use a four byte value for the name/key and not care if it is an integer, float or a 32-bit string hash. As long as the data is queried with the same key that it was stored with, the right value will be returned. I only use this method for small structs, so the probability for a hash collision is close to zero and can be handled by “manual resolution”.&lt;br /&gt;&lt;br /&gt;If we have many objects with the same “dictionary type” (i.e. 
the same set of fields, just different values) it makes sense to break out the definition of the type from the data itself to save space:&lt;br /&gt;&lt;br /&gt;&lt;a href="http://altdevblogaday.org/wp-content/uploads/2011/02/duck_typing_3.png" mce_href="http://altdevblogaday.org/wp-content/uploads/2011/02/duck_typing_3.png"&gt;&lt;img alt="" class="aligncenter size-large wp-image-1233" height="220" mce_src="http://altdevblogaday.org/wp-content/uploads/2011/02/duck_typing_3-1024x312.png" src="http://altdevblogaday.org/wp-content/uploads/2011/02/duck_typing_3-1024x312.png" style="border-bottom-width: 0px; border-color: initial; border-left-width: 0px; border-right-width: 0px; border-style: initial; border-top-width: 0px; display: block; margin-left: auto; margin-right: auto;" title="duck_typing_3" width="725" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Here the&amp;nbsp;&lt;span class="Apple-style-span" mce_fixed="1" mce_name="em" mce_style="font-style: italic;" style="font-style: italic;"&gt;offset&lt;/span&gt;&amp;nbsp;field stores the offset of each field in the data block. 
Now we can efficiently store an array of such data objects with just one copy of the dictionary type information:&lt;br /&gt;&lt;br /&gt;&lt;a href="http://altdevblogaday.org/wp-content/uploads/2011/02/duck_typing_4.png" mce_href="http://altdevblogaday.org/wp-content/uploads/2011/02/duck_typing_4.png"&gt;&lt;img alt="" class="aligncenter size-large wp-image-1234" height="114" mce_src="http://altdevblogaday.org/wp-content/uploads/2011/02/duck_typing_4-1024x162.png" src="http://altdevblogaday.org/wp-content/uploads/2011/02/duck_typing_4-1024x162.png" style="border-bottom-width: 0px; border-color: initial; border-left-width: 0px; border-right-width: 0px; border-style: initial; border-top-width: 0px; display: block; margin-left: auto; margin-right: auto;" title="duck_typing_4" width="725" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Note that the storage space (and thereby the cache and memory performance) is exactly the same as if we were using an array of regular C structs, even though we are using a completely open free form JSON-like struct. And extracting or changing data just requires a little pointer arithmetic and a cast.&lt;br /&gt;&lt;br /&gt;This would be a good way of storing particles in a particle system. (Note: This is an array-of-structures approach, you can of course also use duck typing with a structure-of-arrays approach. I leave that as an exercise to the reader.)&lt;br /&gt;&lt;br /&gt;If you are a graphics programmer all of this should look pretty familiar. The “dictionary type description” is very much like a “vertex data description” and the “dictionary data” is awfully similar to “vertex data”. This should come as no big surprise. Vertex data is generic flexible data that needs to be processed fast in parallel on in-order processing units. 
It is not strange that with the same design criteria we end up with a similar solution.&lt;br /&gt;&lt;span class="Apple-style-span" mce_fixed="1" mce_name="strong" mce_style="font-weight: bold;" style="font-weight: bold;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" mce_fixed="1" mce_name="strong" mce_style="font-weight: bold;" style="font-weight: bold;"&gt;Moral and musings&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;It is OK to manipulate blocks of raw memory! Pointer arithmetic does not destroy your program! Type casts are not “dirty”! Let your freak flag fly!&lt;br /&gt;&lt;br /&gt;Data-oriented design and object-oriented design are not polar opposites. As this example shows, a data-oriented design can in a sense be “more object-oriented” than a standard C++ virtual function design, i.e., more similar to how objects work in high level languages such as Ruby and Lua.&lt;br /&gt;&lt;br /&gt;On the other hand, data-oriented design and inheritance&amp;nbsp;&lt;span class="Apple-style-span" mce_fixed="1" mce_name="em" mce_style="font-style: italic;" style="font-style: italic;"&gt;are&lt;/span&gt;&amp;nbsp;enemies. Because designs based on base class pointers and virtual functions want objects to live individually allocated on the heap. Which means you cannot control the memory layout. Which is what DOD is all about. (Yes, you can probably do clever tricks with custom allocators and patching of vtables for moving or deserializing objects, but why bother, DOD is simpler.)&lt;br /&gt;&lt;br /&gt;You could also store function pointers in these open structs. Then you would have something very similar to Ruby/Lua objects. This could probably be used for something great. 
This is left as an exercise to the reader.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1994130783874175266-1045386620402268088?l=bitsquid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bitsquid.blogspot.com/feeds/1045386620402268088/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bitsquid.blogspot.com/2011/02/some-systems-need-to-manipulate-objects.html#comment-form' title='7 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/1045386620402268088'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/1045386620402268088'/><link rel='alternate' type='text/html' href='http://bitsquid.blogspot.com/2011/02/some-systems-need-to-manipulate-objects.html' title='Managing Decoupling Part 3 - C++ Duck Typing'/><author><name>Niklas</name><uri>http://www.blogger.com/profile/10055379994557504977</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>7</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1994130783874175266.post-7698091096731550649</id><published>2011-02-11T07:22:00.002+01:00</published><updated>2011-02-11T07:28:10.458+01:00</updated><title type='text'>Managing Coupling Part 2 — Polling, Callbacks and Events</title><content type='html'>&lt;span class="Apple-style-span" style="border-collapse: collapse; font-family: arial, sans-serif; font-size: 13px;"&gt;&lt;/span&gt;&lt;br /&gt;&lt;div&gt;In my last post, I talked a bit about the importance of decoupling and how one of the fundamental challenges in system design is to keep systems decoupled while still allowing the necessary interactions to take 
place.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;This time I will look at one specific such challenge: when a low level system needs to notify a high level system that something has happened. For example, the animation system may want to notify the gameplay system that the character’s foot has touched the ground, so that a footstep sound can be played.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;(Note that the reverse is not a problem. The high level system knows about the low level system and can call it directly. But the low level system shouldn’t know or care about the high level system.)&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;There are three common techniques for handling such notifications: polling, callbacks and events.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;Polling&lt;/b&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;A polling system calls some function every frame to check if the event it is interested in has occurred. Has the file been downloaded yet? What about now? Are we there yet?&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Polling is often considered “ugly” or “inefficient”. And indeed, in the desktop world, polling is very impolite, since it means busy-waiting and tying up 100 % of the CPU in doing nothing.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;But in game development the situation is completely different. We are already doing a ton of stuff every 33 ms (or half a ton of stuff every 17 ms). As long as we don’t poll a huge amount of objects, polling won’t have any impact on the framerate.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;And code that uses polling is often easier to write and ends up better designed than code that uses callbacks or events. 
For example, it is much easier to just check if the A key is pressed inside the character controller, than to write a callback that gets notified if A is pressed and somehow forward that information to the character controller.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;So, in my opinion, you should actually prefer to use polling whenever possible (i.e., when you don’t have to monitor a huge number of objects).&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Some areas where polling works well are: file downloads, server browsing, game saving, controller input, etc.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;An area less suited for polling is physics collisions, since there are N*N possible collisions that you would have to poll for. (You could argue that rather than polling for a collision between two&amp;nbsp;&lt;i&gt;specific&lt;/i&gt;&amp;nbsp;objects, you could poll for a collision between&amp;nbsp;&lt;i&gt;any&lt;/i&gt;&amp;nbsp;two objects. My reply would be that in that case you are no longer strictly polling; you are in fact using a rudimentary event system.)&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;Callbacks&lt;/b&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;In a callback solution, the low level system stores a list of high level functions to call when certain events occur.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;An important question when it comes to callbacks is if the callback should be called immediately when the event occurs, or if it should be queued up and scheduled for execution later in the frame.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I much prefer the latter approach. If you do callbacks immediately, you not only trash your instruction and data caches, you also prevent multithreading (unless you use locks everywhere to prevent the callbacks from stepping on each other). 
And you open yourself up to the nasty bug where a callback through a chain of events ends up destroying the very objects you are looping over.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;It is much better to queue up all callbacks and only execute them when the high level system asks for it (with an&amp;nbsp;&lt;span style="font-family: 'courier new', monospace;"&gt;execute_callbacks()&lt;/span&gt;&amp;nbsp;call). That way you always know when the callbacks occur. Side effects can be minimized and the code flow is clearer. Also, with this approach there is no problem with generating callbacks on the SPU and merging the queue with other callback queues later.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The only thing you need to worry about with delayed callbacks is that the objects that the callback refers to might have been destroyed between the time when the callback was generated and the time when it was actually called. But this is neatly handled by using the ID reference system that I talked about in the previous post. Using that technique, the callback can always determine if the objects still exist.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Note that the callback system outlined here has some similarities with the polling system — in that the callbacks only happen when we explicitly poll for them.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;It is not self-evident how to represent a callback in C++. You might be tempted to use a member function pointer. Don’t. The casting and typing rules make it near impossible to use them for any kind of generic callback mechanism. Also, don’t use an “observer pattern”, where the callback must be some object that inherits from an AnimationEventObserver class and overrides handle_animation_event(). 
That just leads to tons of typing and unnecessary heap allocation.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;There is an interesting article about fast and efficient C++ delegates at&amp;nbsp;&lt;a href="http://www.codeproject.com/KB/cpp/FastDelegate.aspx" style="color: #0000cc;" target="_blank"&gt;http://www.codeproject.com/KB/&lt;wbr&gt;&lt;/wbr&gt;cpp/FastDelegate.aspx&lt;/a&gt;. It looks solid, but personally I’m not comfortable with making something that requires so many platform specific tricks one of the core mechanisms of my engine.&amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;So instead I use regular C function pointers for callbacks. This means that if I want to call a member function, I have to make a little static function that calls the member function. That is a bit annoying, but better than the alternatives.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;(Isn’t it interesting that when you try to design a clean and flexible C++ API it often ends up as pure C.)&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;When you use C callbacks you typically also want to pass some data to them. The typical approach in the C world is to use a&amp;nbsp;&lt;span style="font-family: 'courier new', monospace;"&gt;void *&lt;/span&gt;&amp;nbsp;to “user data” that is passed to the callback function. I actually prefer a slightly different approach. 
Since I sometimes want to pass more data than a single&amp;nbsp;&lt;span style="font-family: 'courier new', monospace;"&gt;void *&lt;/span&gt;, I use something like this:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="font-family: 'courier new', monospace;"&gt;struct Callback16&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="font-family: 'courier new', monospace;"&gt;{&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="font-family: 'courier new', monospace;"&gt;&lt;span class="Apple-style-span" style="white-space: pre-wrap;"&gt;&amp;nbsp;   &lt;/span&gt;void (*f)(void);&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="font-family: 'courier new', monospace;"&gt;&lt;span class="Apple-style-span" style="white-space: pre-wrap;"&gt;&amp;nbsp;   &lt;/span&gt;char data[12];&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="font-family: 'courier new', monospace;"&gt;};&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;There isn’t a huge number of callbacks, so using 16 bytes instead of 8 to store them doesn’t matter. 
You could go to Callback32 if you want the option to store even more data.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;When calling the callback, I cast the function pointer to the appropriate type and pass a pointer to its data as the first parameter.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="font-family: 'courier new', monospace;"&gt;typedef void (*AnimationEventCallback)(void *, unsigned);&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="font-family: 'courier new', monospace;"&gt;AnimationEventCallback f = (AnimationEventCallback)callback.f;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="font-family: 'courier new', monospace;"&gt;f(callback.data, event_id);&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I’m not worried about casting the function pointer back and forth between a generic type and a specific one or about casting the data in and out of a raw buffer. Type safety is nice, but there is an awful lot of power in juggling blocks of raw memory. And you don’t have to worry that much about someone casting the data to the wrong type, because doing so will 99% of the time cause a huge spectacular crash, and the error will be fixed immediately.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;Events&lt;/b&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Event systems are in many ways similar to callback systems. The only difference is that instead of storing a direct pointer to a callback function, they store an event enum. 
The high level system that polls the events decides what action to take for each enum.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;In my opinion, callbacks work better when you want to listen to specific notifications: “Tell me when this sound has finished playing.” Events work better when you process them in bulk: “Check all collision notifications to see if the forces involved are strong enough to break the objects.” But much of it is a matter of taste.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;For storing the event queues (or callback queues) I just use a raw buffer (&lt;span style="font-family: 'courier new', monospace;"&gt;Vector&amp;lt;char&amp;gt;&lt;/span&gt;&amp;nbsp;or&amp;nbsp;&lt;span style="font-family: 'courier new', monospace;"&gt;char[FIXED_SIZE]&lt;/span&gt;) where I concatenate all events and their data:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="font-family: 'courier new', monospace;"&gt;[event_1_enum] [event_1_data] [event_2_enum] [event_2_data] …&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The high level system just steps through this buffer, processing each event in turn. Note that event queues like this are easy to move, copy, merge and transfer between cores. (Again, the power of raw data buffers.)&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;In this design there is only a single high level system that polls the events of a particular low level system. It understands what all the events mean, what data they use and knows how to act on them. The sole purpose of the event system (it is not even much of a “system”, just a stream of data) is to pass notifications from the low level to the high.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;This is in my opinion exactly what an event system should be. It should not be a magic global switchboard that dispatches events from all over the code to whoever wants to listen to them. 
Because that would be horrid!&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1994130783874175266-7698091096731550649?l=bitsquid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bitsquid.blogspot.com/feeds/7698091096731550649/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bitsquid.blogspot.com/2011/02/managing-decoupling-part-2-polling.html#comment-form' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/7698091096731550649'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/7698091096731550649'/><link rel='alternate' type='text/html' href='http://bitsquid.blogspot.com/2011/02/managing-decoupling-part-2-polling.html' title='Managing Coupling Part 2 — Polling, Callbacks and Events'/><author><name>Niklas</name><uri>http://www.blogger.com/profile/10055379994557504977</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1994130783874175266.post-420607131797039630</id><published>2011-01-30T22:43:00.000+01:00</published><updated>2011-01-30T22:43:40.264+01:00</updated><title type='text'>Managing Coupling</title><content type='html'>(This post has also been posted to&amp;nbsp;&lt;a href="http://altdevblogaday.com/"&gt;http://altdevblogaday.com/&lt;/a&gt;.)&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: dejarip-1, dejarip-2, sans-serif; font-size: 16px; line-height: 22px;"&gt;&lt;/span&gt;&lt;br /&gt;&lt;div&gt;The only way of staying sane while writing a large complex software system is to regard it as a collection of 
smaller, simpler systems. And this is only possible if the systems are properly decoupled.&lt;/div&gt;&lt;div style="margin-bottom: 0.8em; margin-left: 0px; margin-right: 0px; margin-top: 0.4em; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"&gt;&lt;/div&gt;&lt;div&gt;Ideally, each system should be completely isolated. The effect system should be the only system manipulating effects and it shouldn’t do anything else. It should have its own&amp;nbsp;&lt;span style="font-family: 'courier new', monospace;"&gt;update()&lt;/span&gt;&amp;nbsp;call just for updating effects. No other system should care how the effects are stored in memory or what parts of the update happen on the CPU, SPU or GPU. A new programmer wanting to understand the system should only have to look at the files in the&amp;nbsp;&lt;span style="font-family: 'courier new', monospace;"&gt;effect_system&lt;/span&gt;&amp;nbsp;directory. It should be possible to optimize, rewrite or drop the entire system without affecting any other code.&lt;/div&gt;&lt;div style="margin-bottom: 0.8em; margin-left: 0px; margin-right: 0px; margin-top: 0.4em; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"&gt;&lt;/div&gt;&lt;div&gt;Of course, complete isolation is not possible. If anything interesting is going to happen, different systems will at some point have to talk to one another, whether we like it or not.&lt;/div&gt;&lt;div style="margin-bottom: 0.8em; margin-left: 0px; margin-right: 0px; margin-top: 0.4em; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"&gt;&lt;/div&gt;&lt;div&gt;The main challenge in keeping an engine “healthy” is to keep the systems as decoupled as possible while still allowing the necessary interactions to take place. If a system is properly decoupled, adding features is simple. Want a wind effect in your particle system? Just write it. It’s just code. It shouldn’t take more than a day. 
But if you are working in a tightly coupled project, such seemingly simple changes can stretch out into nightmarish day-long debugging marathons.&lt;/div&gt;&lt;div style="margin-bottom: 0.8em; margin-left: 0px; margin-right: 0px; margin-top: 0.4em; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"&gt;&lt;/div&gt;&lt;div&gt;If you ever get the feeling that you would prefer to test an idea out in a simple toy project rather than in “the real engine”, that’s a clear sign that you have too much coupling.&lt;/div&gt;&lt;div style="margin-bottom: 0.8em; margin-left: 0px; margin-right: 0px; margin-top: 0.4em; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"&gt;&lt;/div&gt;&lt;div&gt;Sometimes, engines start out decoupled, but then as deadlines approach and features are requested that don’t fit the well-designed APIs, programmers get tempted to open back doors between systems and introduce couplings that shouldn’t really be there. Slowly, through this “coupling creep” the quality of the code deteriorates and the engine becomes less and less pleasant to work with.&lt;/div&gt;&lt;div style="margin-bottom: 0.8em; margin-left: 0px; margin-right: 0px; margin-top: 0.4em; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"&gt;&lt;/div&gt;&lt;div&gt;Still, programmers cannot lock themselves in their ivory towers. “That feature doesn’t fit my API,” is never an acceptable answer to give a budding artist. Instead, we need to find ways of handling the challenges of coupling without destroying our engines. Here are four quick ideas to begin with:&lt;/div&gt;&lt;div style="margin-bottom: 0.8em; margin-left: 0px; margin-right: 0px; margin-top: 0.4em; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"&gt;&lt;/div&gt;&lt;div&gt;&lt;strong&gt;1. 
Be wary of “frameworks”.&lt;/strong&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0.8em; margin-left: 0px; margin-right: 0px; margin-top: 0.4em; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"&gt;&lt;/div&gt;&lt;div&gt;By a “framework” I mean any kind of system that requires all your other code to conform to a specific world view. For example, a scripting system that requires you to add a specific set of macro tags to all your class declarations.&lt;/div&gt;&lt;div style="margin-bottom: 0.8em; margin-left: 0px; margin-right: 0px; margin-top: 0.4em; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"&gt;&lt;/div&gt;&lt;div&gt;Other common culprits are:&lt;/div&gt;&lt;div&gt;&lt;ul&gt;&lt;li style="list-style-type: disc; margin-top: 5px;"&gt;Root classes that every object must inherit from&lt;/li&gt;&lt;li style="list-style-type: disc; margin-top: 5px;"&gt;RTTI/reflection systems&lt;/li&gt;&lt;li style="list-style-type: disc; margin-top: 5px;"&gt;Serialization systems&lt;/li&gt;&lt;li style="list-style-type: disc; margin-top: 5px;"&gt;Reference counting systems&lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;&lt;div&gt;Such global systems introduce a coupling across the entire engine. They rudely enforce certain design choices on all subsystems, design choices which might not be appropriate for them. Sometimes the consequences are serious. A badly thought out reference system may prevent subsystems from multithreading. A less than stellar serialization system can make linear loading impossible.&lt;/div&gt;&lt;div style="margin-bottom: 0.8em; margin-left: 0px; margin-right: 0px; margin-top: 0.4em; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"&gt;&lt;/div&gt;&lt;div&gt;Often, the motivation given for such global systems is that they increase maintainability. With a global serialization system, we just have to make changes at a single place. 
So refactoring is much easier, it is claimed.&lt;/div&gt;&lt;div style="margin-bottom: 0.8em; margin-left: 0px; margin-right: 0px; margin-top: 0.4em; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"&gt;&lt;/div&gt;&lt;div&gt;But in practice, the reverse is often true. After a while, the global system has infested so much of the code base that making any significant change to it is virtually impossible. There are just too many things that would have to be changed, all at the same time.&lt;/div&gt;&lt;div style="margin-bottom: 0.8em; margin-left: 0px; margin-right: 0px; margin-top: 0.4em; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"&gt;&lt;/div&gt;&lt;div&gt;You would be much better off if each system just defined its own&amp;nbsp;&lt;span style="font-family: 'courier new', monospace;"&gt;save()&lt;/span&gt;&amp;nbsp;and&amp;nbsp;&lt;span style="font-family: 'courier new', monospace;"&gt;load()&lt;/span&gt;&amp;nbsp;functions.&lt;/div&gt;&lt;div style="margin-bottom: 0.8em; margin-left: 0px; margin-right: 0px; margin-top: 0.4em; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"&gt;&lt;/div&gt;&lt;div&gt;&lt;strong&gt;2. Use high level systems to mediate between low level systems.&lt;/strong&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0.8em; margin-left: 0px; margin-right: 0px; margin-top: 0.4em; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"&gt;&lt;/div&gt;&lt;div&gt;Instead of directly coupling low level systems, use a high level system to shuffle data between them. For example, handling footstep sounds might involve the animation system, the sound system and the material system. 
But none of these systems should know about the others.&lt;/div&gt;&lt;div style="margin-bottom: 0.8em; margin-left: 0px; margin-right: 0px; margin-top: 0.4em; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"&gt;&lt;/div&gt;&lt;div&gt;So instead of directly coupling them, let the gameplay system handle their interactions. Since the gameplay system knows about all three systems, it can poll the animation system for events defined in the animation data, sample the ground material from the material system and then ask the sound system to play the appropriate sound.&lt;/div&gt;&lt;div style="margin-bottom: 0.8em; margin-left: 0px; margin-right: 0px; margin-top: 0.4em; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"&gt;&lt;/div&gt;&lt;div&gt;Make sure that you have a clear separation between this messy gameplay layer, that can poke around in all other systems, and your clean engine code that is isolated and decoupled. Otherwise there is always a risk that the mess propagates downwards and infects your clean systems.&lt;/div&gt;&lt;div style="margin-bottom: 0.8em; margin-left: 0px; margin-right: 0px; margin-top: 0.4em; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"&gt;&lt;/div&gt;&lt;div&gt;In the BitSquid Tech we put the messy stuff either in Lua or in Flow (our visual scripting tool, similar to Unreal’s Kismet). The language barrier acts as a firewall, preventing the spread of the messiness.&lt;/div&gt;&lt;div style="margin-bottom: 0.8em; margin-left: 0px; margin-right: 0px; margin-top: 0.4em; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"&gt;&lt;/div&gt;&lt;div&gt;&lt;strong&gt;3. 
Duplicating code is sometimes OK!&lt;/strong&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0.8em; margin-left: 0px; margin-right: 0px; margin-top: 0.4em; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"&gt;&lt;/div&gt;&lt;div&gt;Avoiding duplicated code is one of the fundamentals of software design. Entities should not be needlessly multiplied. But there are instances when you are better off breaking this rule.&lt;/div&gt;&lt;div style="margin-bottom: 0.8em; margin-left: 0px; margin-right: 0px; margin-top: 0.4em; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"&gt;&lt;/div&gt;&lt;div&gt;I’m not advocating copy-paste programming or writing complicated algorithms twice. I’m saying that sometimes people can get a little overzealous with their code reuse. Code sharing has a price that is not always recognized, in that it increases system coupling. Sometimes a little judiciously applied code duplication can be a better solution.&lt;/div&gt;&lt;div style="margin-bottom: 0.8em; margin-left: 0px; margin-right: 0px; margin-top: 0.4em; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"&gt;&lt;/div&gt;&lt;div&gt;A typical example is the&amp;nbsp;&lt;span style="font-family: 'courier new', monospace;"&gt;String&lt;/span&gt;&amp;nbsp;class (or&amp;nbsp;&lt;span style="font-family: 'courier new', monospace;"&gt;std::string&lt;/span&gt;&amp;nbsp;if you are thusly inclined). In some projects you see the&amp;nbsp;&lt;span style="font-family: 'courier new', monospace;"&gt;String&lt;/span&gt;&amp;nbsp;class used almost everywhere. If something is a string, it should use the&amp;nbsp;&lt;span style="font-family: 'courier new', monospace;"&gt;String&lt;/span&gt;&amp;nbsp;class, the reasoning seems to be. 
But many systems that handle strings do not need all the features that you find in your typical&amp;nbsp;&lt;span style="font-family: 'courier new', monospace;"&gt;String&lt;/span&gt;&amp;nbsp;class: locales,&amp;nbsp;&lt;span style="font-family: 'courier new', monospace;"&gt;find_first_of()&lt;/span&gt;, etc. They are fine with just a&amp;nbsp;&lt;span style="font-family: 'courier new', monospace;"&gt;const char *&lt;/span&gt;,&amp;nbsp;&lt;span style="font-family: 'courier new', monospace;"&gt;strcmp()&lt;/span&gt;&amp;nbsp;and maybe one custom-written (potentially duplicated) three-line function. So why not use that? The code will be much simpler and easier to move to SPUs.&lt;/div&gt;&lt;div style="margin-bottom: 0.8em; margin-left: 0px; margin-right: 0px; margin-top: 0.4em; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"&gt;&lt;/div&gt;&lt;div&gt;Another culprit is&amp;nbsp;&lt;span style="font-family: 'courier new', monospace;"&gt;FixedArray&amp;lt;int, 5&amp;gt; a&lt;/span&gt;. Sure, if you write&amp;nbsp;&lt;span style="font-family: 'courier new', monospace;"&gt;int a[5]&lt;/span&gt;&amp;nbsp;instead, you will have to duplicate the code for bounds checking if you want that. But your code can be understood and compiled without&amp;nbsp;&lt;span style="font-family: 'courier new', monospace;"&gt;fixed_array.h&lt;/span&gt;&amp;nbsp;and template instantiation.&lt;/div&gt;&lt;div style="margin-bottom: 0.8em; margin-left: 0px; margin-right: 0px; margin-top: 0.4em; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"&gt;&lt;/div&gt;&lt;div&gt;And if you have any method that takes a&amp;nbsp;&lt;span style="font-family: 'courier new', monospace;"&gt;const Vector&amp;lt;T&amp;gt; &amp;amp;v&lt;/span&gt;&amp;nbsp;as an argument, you should probably take&amp;nbsp;&lt;span style="font-family: 'courier new', monospace;"&gt;const T *begin, const T *end&lt;/span&gt;&amp;nbsp;instead. 
Now you don’t need the&amp;nbsp;&lt;span style="font-family: 'courier new', monospace;"&gt;vector.h&lt;/span&gt;&amp;nbsp;header, and the caller is not forced to use a particular&amp;nbsp;&lt;span style="font-family: 'courier new', monospace;"&gt;Vector&lt;/span&gt;&amp;nbsp;class for storage.&lt;/div&gt;&lt;div style="margin-bottom: 0.8em; margin-left: 0px; margin-right: 0px; margin-top: 0.4em; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"&gt;&lt;/div&gt;&lt;div&gt;A final example: I just wrote a patching tool that manipulates our bundles (aka pak-files). That tool duplicates the code for parsing the bundle headers, which is already in the engine. Why? Well, the tool is written in C# and the engine in C++, but in this case that is kind of beside the point. The point is that sharing that code would have been a significant effort.&lt;/div&gt;&lt;div style="margin-bottom: 0.8em; margin-left: 0px; margin-right: 0px; margin-top: 0.4em; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"&gt;&lt;/div&gt;&lt;div&gt;First, it would have had to be broken out into a separate library, together with the related parts of the engine. Then, since the tool requires some functionality that the engine doesn’t (to parse bundles with foreign endianness) I would have to add a special function for the tool, and probably a&amp;nbsp;&lt;span style="font-family: 'courier new', monospace;"&gt;#define TOOL_COMPILE&lt;/span&gt;&amp;nbsp;since I don’t want that function in the regular builds. This means I need a special build configuration for the tool. And the engine code would forever be dirtied with the&amp;nbsp;&lt;span style="font-family: 'courier new', monospace;"&gt;TOOL_COMPILE&lt;/span&gt;&amp;nbsp;flag. 
And I wouldn’t be able to rearrange the engine code as I wanted in the future, since that might break the tool compile.&lt;/div&gt;&lt;div style="margin-bottom: 0.8em; margin-left: 0px; margin-right: 0px; margin-top: 0.4em; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"&gt;&lt;/div&gt;&lt;div&gt;In contrast, rewriting the code for parsing the headers was only 10 minutes of work. It just reads a vector of string hashes. It’s not rocket science. Sure, if I ever decide to change the bundle format, I might have to spend another 10 minutes rewriting that code. I think I can live with that.&lt;/div&gt;&lt;div style="margin-bottom: 0.8em; margin-left: 0px; margin-right: 0px; margin-top: 0.4em; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"&gt;&lt;/div&gt;&lt;div&gt;Writing code is not the problem. The messy, complicated couplings that prevent you from writing code are the problem.&lt;/div&gt;&lt;div style="margin-bottom: 0.8em; margin-left: 0px; margin-right: 0px; margin-top: 0.4em; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"&gt;&lt;/div&gt;&lt;div&gt;&lt;strong&gt;4. Use IDs to refer to external objects.&lt;/strong&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0.8em; margin-left: 0px; margin-right: 0px; margin-top: 0.4em; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"&gt;&lt;/div&gt;&lt;div&gt;At some point one of your systems will have to refer to objects belonging to another system. For example, the gameplay layer may have to move an effect around or change its parameters.&lt;/div&gt;&lt;div style="margin-bottom: 0.8em; margin-left: 0px; margin-right: 0px; margin-top: 0.4em; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"&gt;&lt;/div&gt;&lt;div&gt;I find that the most decoupled way of doing that is by using an ID. 
Let’s consider the alternatives.&lt;/div&gt;&lt;div style="margin-bottom: 0.8em; margin-left: 0px; margin-right: 0px; margin-top: 0.4em; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="font-family: 'courier new', monospace;"&gt;Effect *, shared_ptr&amp;lt;Effect&amp;gt;&lt;/span&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0.8em; margin-left: 0px; margin-right: 0px; margin-top: 0.4em; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"&gt;&lt;/div&gt;&lt;blockquote class="webkit-indent-blockquote" style="border-bottom-style: none; border-color: initial; border-left-color: rgb(221, 221, 221) !important; border-left-style: none; border-left-width: 4px !important; border-right-style: none; border-top-style: none; border-width: initial; margin-bottom: 0px; margin-left: 40px; margin-right: 0px; margin-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"&gt;&lt;div&gt;A direct pointer is no good, because it will become invalid if the target object is deleted and the effect system should have full control over when and how its objects are deleted. 
A standard&amp;nbsp;&lt;span style="font-family: 'courier new', monospace;"&gt;shared_ptr&lt;/span&gt;&amp;nbsp;won’t work for the same reason: it puts the lifetime of&amp;nbsp;&lt;span style="font-family: 'courier new', monospace;"&gt;Effect&lt;/span&gt;&amp;nbsp;objects out of the control of the effect system.&lt;/div&gt;&lt;/blockquote&gt;&lt;div style="margin-bottom: 0.8em; margin-left: 0px; margin-right: 0px; margin-top: 0.4em; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="font-family: 'courier new', monospace;"&gt;weak_ptr&amp;lt;Effect&amp;gt;, handle&amp;lt;Effect&amp;gt;&lt;/span&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0.8em; margin-left: 0px; margin-right: 0px; margin-top: 0.4em; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"&gt;&lt;/div&gt;&lt;blockquote class="webkit-indent-blockquote" style="border-bottom-style: none; border-color: initial; border-left-color: rgb(221, 221, 221) !important; border-left-style: none; border-left-width: 4px !important; border-right-style: none; border-top-style: none; border-width: initial; margin-bottom: 0px; margin-left: 40px; margin-right: 0px; margin-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"&gt;&lt;div&gt;By this I mean some kind of reference-counted, indirect pointer to the object. This is better, but still too strongly coupled for my taste. The indirect pointer will be accessed both by the external system (for dereferencing and changing the reference count) and by the effect system (for deleting the&amp;nbsp;&lt;span style="font-family: 'courier new', monospace;"&gt;Effect&lt;/span&gt;&amp;nbsp;object or moving it in memory). 
This has the potential for creating threading problems.&lt;/div&gt;&lt;/blockquote&gt;&lt;div style="margin-bottom: 0.8em; margin-left: 0px; margin-right: 0px; margin-top: 0.4em; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"&gt;&lt;/div&gt;&lt;blockquote class="webkit-indent-blockquote" style="border-bottom-style: none; border-color: initial; border-left-color: rgb(221, 221, 221) !important; border-left-style: none; border-left-width: 4px !important; border-right-style: none; border-top-style: none; border-width: initial; margin-bottom: 0px; margin-left: 40px; margin-right: 0px; margin-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"&gt;&lt;div&gt;Also, this construct kind of implies that external systems can dereference and use the&amp;nbsp;&lt;span style="font-family: 'courier new', monospace;"&gt;Effect&lt;/span&gt;&amp;nbsp;whenever they want to. Perhaps the effect system only allows that when its&amp;nbsp;&lt;span style="font-family: 'courier new', monospace;"&gt;update()&lt;/span&gt;&amp;nbsp;loop is not running and wants to&amp;nbsp;&lt;span style="font-family: 'courier new', monospace;"&gt;assert()&lt;/span&gt;&amp;nbsp;that. Or perhaps the effect system doesn’t want to allow direct access to its objects at all, but instead wants to double buffer all changes.&lt;/div&gt;&lt;/blockquote&gt;&lt;div style="margin-bottom: 0.8em; margin-left: 0px; margin-right: 0px; margin-top: 0.4em; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"&gt;&lt;/div&gt;&lt;div&gt;So, in order to allow the effect system to freely reorganize its data and processing in any way it likes, I use IDs to identify objects externally. The IDs are just integers, each uniquely identifying an object, that the user can throw away when she is done with them. 
They don’t have to be “released” like a&amp;nbsp;&lt;span style="font-family: 'courier new', monospace;"&gt;weak_ptr&lt;/span&gt;, which removes a point of interaction between the systems. It also means that the IDs are PODs. We can copy and move them freely in memory, juggle them in Lua and DMA them back-and-forth to our heart’s content. All of this would be a lot more complicated if we had to keep reference counts.&lt;/div&gt;&lt;div style="margin-bottom: 0.8em; margin-left: 0px; margin-right: 0px; margin-top: 0.4em; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"&gt;&lt;/div&gt;&lt;div&gt;In the system we need a fast way of mapping IDs back to objects. Note that&amp;nbsp;&lt;span style="font-family: 'courier new', monospace;"&gt;std::map&amp;lt;unsigned, Object *&amp;gt;&lt;/span&gt;&amp;nbsp;is not a fast way! But there are a number of possibilities. The simplest is to just use a fixed-size array with object pointers:&lt;/div&gt;&lt;div style="margin-bottom: 0.8em; margin-left: 0px; margin-right: 0px; margin-top: 0.4em; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="font-family: 'courier new', monospace;"&gt;Object *lookup[MAX_OBJECTS];&lt;/span&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0.8em; margin-left: 0px; margin-right: 0px; margin-top: 0.4em; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"&gt;&lt;/div&gt;&lt;div&gt;If your system has a maximum of 4096 objects, use 12 bits of a 32-bit ID to store an index into this array and the remaining 20 bits as a unique identifier (i.e., to detect the case when the original object has been deleted and a new object has been created at the same index). 
If you need lots of objects, you can go to a 64 bit ID.&lt;/div&gt;&lt;div style="margin-bottom: 0.8em; margin-left: 0px; margin-right: 0px; margin-top: 0.4em; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0.8em; margin-left: 0px; margin-right: 0px; margin-top: 0.4em; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"&gt;&lt;/div&gt;&lt;div&gt;That's it for today, but this post really just scratches the surface of decoupling. There are a lot of other interesting techniques to look at, such as events, callbacks and “duck typing”. Maybe something for a future entry...&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1994130783874175266-420607131797039630?l=bitsquid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bitsquid.blogspot.com/feeds/420607131797039630/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bitsquid.blogspot.com/2011/01/managing-coupling.html#comment-form' title='14 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/420607131797039630'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/420607131797039630'/><link rel='alternate' type='text/html' href='http://bitsquid.blogspot.com/2011/01/managing-coupling.html' title='Managing Coupling'/><author><name>Niklas</name><uri>http://www.blogger.com/profile/10055379994557504977</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>14</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1994130783874175266.post-1122681008519885805</id><published>2010-12-16T19:40:00.000+01:00</published><updated>2010-12-16T19:40:07.652+01:00</updated><title type='text'>BitSquid C++ Coding Style</title><content type='html'>The BitSquid Coding Style Guidelines:&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://www.bitsquid.se/files/coding_style.html"&gt;http://www.bitsquid.se/files/coding_style.html&lt;/a&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1994130783874175266-1122681008519885805?l=bitsquid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bitsquid.blogspot.com/feeds/1122681008519885805/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bitsquid.blogspot.com/2010/12/bitsquid-c-coding-style.html#comment-form' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/1122681008519885805'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/1122681008519885805'/><link rel='alternate' type='text/html' href='http://bitsquid.blogspot.com/2010/12/bitsquid-c-coding-style.html' title='BitSquid C++ Coding Style'/><author><name>Niklas</name><uri>http://www.blogger.com/profile/10055379994557504977</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1994130783874175266.post-4537564456726666646</id><published>2010-10-20T23:44:00.000+02:00</published><updated>2010-10-20T23:44:39.175+02:00</updated><title 
type='text'>A* is Overrated</title><content type='html'>&lt;br /&gt;Open any textbook on AI and you are sure to find a description of the A*-algorithm. Why? Because it is a provably correct solution to a well-defined, narrow problem. And that just makes it so... scientific and... teachable. Universities teach Computer Science, because nobody knows how to teach Computer Artistry or Computer Craftsmanship. Plus science is important, everybody knows that.&lt;br /&gt;&lt;br /&gt;But I digress.&lt;br /&gt;&lt;br /&gt;A* is a useful algorithm. You should know it. But when we are talking about AI navigation, A* is just an imperfect solution to a tiny and not very difficult part of the problem. Here are three reasons why you shouldn't give A* so much credit:&lt;br /&gt;&lt;br /&gt;&lt;b&gt;1. Pathfinding is often not that important.&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;In many games, the average enemy is only on-screen a couple of seconds before it gets killed. The player doesn't even have a chance to see if it is following a path or not. Make sure it does something interesting that the player can notice instead.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;2. A* is not necessarily the best solution to path finding problems.&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;A* finds the shortest path, but that is usually not that important. As long as the path "looks reasonable" and is "short enough" it works fine. This opens the floor for a lot of other algorithms.&lt;br /&gt;&lt;br /&gt;For long-distance searches you get the best performance by using a hierarchical search structure. A good design of the hierarchy structure is more important for performance than the algorithm you use to search at each level.&lt;br /&gt;&lt;br /&gt;Path-finding usually works on a time scale of seconds, which means that we can allow up to 30 frames of latency in answering search queries. To distribute the work evenly, we should use a search algorithm that runs incrementally. 
And it should of course parallelize to make the most of modern hardware.&lt;br /&gt;&lt;br /&gt;So what we really want is an incremental, parallelizable, hierarchical algorithm to find a shortish path, not cookie-cutter A*.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;3. Even if we use A*, there are a lot of other navigational issues that are more important and harder to solve than path finding.&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Implementation, for starters. A* only reaches its theoretical performance when the data structures are implemented efficiently. You can still find A* implementations where the open nodes are kept in a sorted list rather than a heap. Ouch. Some implementation details are non-trivial. How do you keep track of visited nodes? A hashtable? (Extra memory and CPU.) A flag in the nodes themselves? (Meaning you can only run one search at a time over the nodes.)&lt;br /&gt;&lt;br /&gt;How is the graph created? Hand-edited in the editor? How much work is it to redo it every time the level changes? Automatic? What do you do when the automatic generation fails? Can you modify it? Will these modifications stick if the level is changed and the graph is regenerated? Can you give certain nodes special properties when the graph is auto-generated?&lt;br /&gt;&lt;br /&gt;How do you handle dynamic worlds where paths can be blocked and new paths can be opened? Can you update the graph dynamically in an efficient way? What happens to running queries? How do past queries get invalidated when their path is no longer valid?&lt;br /&gt;&lt;br /&gt;Once you have a path, how do you do local navigation? I.e. how do you follow the path smoothly without colliding with other game agents or dynamic objects? Do AI units collide against the graph or against real physics? What happens if the two don't match up? 
How do you match animations to the path-following movement?&lt;br /&gt;&lt;br /&gt;In my opinion, local navigation is a lot more important to the impression a game AI makes than path finding. Nobody will care that much if an AI doesn't follow the 100 % best path towards the target. To err is human. But everybody will notice if the AI gets stuck running against a wall because its local navigation system failed.&lt;br /&gt;&lt;br /&gt;In the next blog post I will talk a bit about how local navigation is implemented in the BitSquid engine.&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1994130783874175266-4537564456726666646?l=bitsquid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bitsquid.blogspot.com/feeds/4537564456726666646/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bitsquid.blogspot.com/2010/10/is-overrated.html#comment-form' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/4537564456726666646'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/4537564456726666646'/><link rel='alternate' type='text/html' href='http://bitsquid.blogspot.com/2010/10/is-overrated.html' title='A* is Overrated'/><author><name>Niklas</name><uri>http://www.blogger.com/profile/10055379994557504977</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1994130783874175266.post-3306448225892415955</id><published>2010-10-18T09:54:00.001+02:00</published><updated>2010-10-19T11:01:55.845+02:00</updated><title type='text'>Time Step Smoothing</title><content 
type='html'>Today I'm going to argue for smoothing your update delta time, i.e. to apply a simple low-pass filter to it before using it to update the game state:&lt;br /&gt;&lt;br /&gt;&lt;pre class="brush: cpp"&gt;float dt = elapsed_time();&lt;br /&gt;dt = filter(dt);&lt;br /&gt;update_game(dt);&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;This assumes of course that you are using a variable time step. A fixed time step is constant by definition and doesn't need smoothing.&lt;br /&gt;&lt;br /&gt;Some people will say that you should always use a fixed time step, and there is a lot of merit to that. A stable frame rate always looks better and with a fixed time step the gameplay code behaves more deterministically and is easier to test. You can avoid a whole class of bugs that "only appear when the frame rate drops below 20 Hz".&lt;br /&gt;&lt;br /&gt;But there are situations where using a fixed frame rate might not be possible. For example, if your game is dynamic and player-driven, the player can always create situations where the frame rate will drop below 30 fps (for example, by gathering all the physics objects in the same place or aggroing all the enemies on the level). If you are using a fixed time step, this will force the game into slow motion, which may not be acceptable.&lt;br /&gt;&lt;br /&gt;On the PC you have to deal with a wide range of hardware. From the über-gamer that wants her million dollar rig to run the game in 250 fps (even though the monitor only refreshes at 90 Hz) to the poor min spec user who hopes that he can at least get the game to boot and chug along at 15 fps. It is hard to satisfy both customers without using a variable time step.&lt;br /&gt;&lt;br /&gt;The BitSquid engine does not dictate a time step policy; that is entirely up to the game. You can use fixed, variable, smoothed or whatever else fits your game.&lt;br /&gt;&lt;br /&gt;So, why am I arguing for smoothing variable time steps? 
A basic update loop without time smoothing looks something like this:&lt;br /&gt;&lt;br /&gt;&lt;pre class="brush: cpp"&gt;float dt = elapsed_time();&lt;br /&gt;update_game(dt);&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Each frame, we measure the time that has elapsed and use that to advance the game timer. &lt;br /&gt;&lt;br /&gt;This seems straightforward enough. Every frame we advance the game clock by the time that has elapsed since the last frame. Even if the time step oscillates you would expect this to result in objects moving in a smooth and natural manner:&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/_D6mTIm8lbTo/TLsmHew-t2I/AAAAAAAAAE0/oP3JA9f9TZ0/s1600/time_step_1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://4.bp.blogspot.com/_D6mTIm8lbTo/TLsmHew-t2I/AAAAAAAAAE0/oP3JA9f9TZ0/s1600/time_step_1.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;Here the dots represent the times (x-axis) at which we sample an object's position (y-axis). As you can see, the samples result in a straight line for objects moving at a constant velocity, even though the time steps vary. All seems well.&lt;br /&gt;&lt;br /&gt;Except this graph doesn't tell the entire truth. The time shown in this graph is the time when we start simulating a frame, not when that frame is drawn on screen. For each frame we simulate, there is an amount of latency from the point when we start simulating the frame until it is drawn. In fact, the variation in that latency is what accounts for the varying elapsed times that we measure. 
(Well, not entirely: there can be GPU latency that doesn't show up in the delta time, but for the sake of simplicity, let's assume that the latency of the current frame is the delta time we measure in the next frame.)&lt;br /&gt;&lt;br /&gt;If we take the latency into account and offset each time value in the graph above by the latency (the elapsed time of the next frame), we get a truer representation of what the player actually sees in the game:&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/_D6mTIm8lbTo/TLsmT-uQuNI/AAAAAAAAAFA/Tf7t7ZKUvEE/s1600/time_step_2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="278" src="http://3.bp.blogspot.com/_D6mTIm8lbTo/TLsmT-uQuNI/AAAAAAAAAFA/Tf7t7ZKUvEE/s320/time_step_2.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;Oops, the line is no longer straight and there is a good chance that the user will notice this as jerky motion.&lt;br /&gt;&lt;br /&gt;If we could predict perfectly what the latency of the next frame would be, we could use that rather than the elapsed time of the last frame to advance our timers. This would compensate perfectly for the latency and the user would again see a straight line.&lt;br /&gt;&lt;br /&gt;Unfortunately, perfectly predicting how long the next frame will take to simulate and render is impossible. But we can make a guess and that is one of the goals of our time step filter:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;To make a better guess of how long it will take to update and draw the next frame.&lt;/li&gt;&lt;/ul&gt;When we are using the standard update loop, we are guessing that the next frame will take as long to draw as the current frame. This is a decent guess, but we can do better. For example, if we calculate the mean delta time of the last few frames and use that instead, we can reduce the average error by about 30 %. 
(Says my quick and dirty guesstimate calculation. If you want a real figure you have to presume a delta time probability distribution and do the math. Because I'm too lazy to.)&lt;br /&gt;&lt;br /&gt;Here is the same graph again, but here we have smoothed the time step that we use to evaluate the object's position.&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/_D6mTIm8lbTo/TLsmlyZ0S3I/AAAAAAAAAFE/D1KIFcvA_B4/s1600/time_step_3.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://3.bp.blogspot.com/_D6mTIm8lbTo/TLsmlyZ0S3I/AAAAAAAAAFE/D1KIFcvA_B4/s1600/time_step_3.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;As you can see, we get a straighter line than we got by using the "raw" variable time steps.&lt;br /&gt;&lt;br /&gt;Can we do anything to improve the prediction even further? Yes, if we know more about the data we are predicting. For example, on the PC, you can get occasional "glitch" frames with very long update times if you get interrupted by other processes. If you are using a straight mean, any time you encounter such a glitch frame you will predict that the next few frames will also take very long to update. But the glitch frame was an anomaly. 
You will get a better predictor by ignoring such outliers when you calculate the mean.&lt;br /&gt;&lt;br /&gt;Another pattern in the frame rate that is quite common is a zig-zag pattern where the game oscillates between two different frame rates:&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/_D6mTIm8lbTo/TLv2bIDfUSI/AAAAAAAAAFQ/ldhWW-pnsZs/s1600/time_step_4.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="306" src="http://1.bp.blogspot.com/_D6mTIm8lbTo/TLv2bIDfUSI/AAAAAAAAAFQ/ldhWW-pnsZs/s320/time_step_4.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;You could make a predictor for this. The predictor could detect if there is a strong correlation between the delta times for even and odd frames and if so, it could use only the even/odd frames to calculate the mean.&lt;br /&gt;&lt;br /&gt;But I don't recommend doing that. For two reasons. First, the zig-zag pattern is usually a result of a bad setup in the engine (or, if you are unfortunate, in the driver). It is better to fix that problem and get rid of the oscillations than to work around them. Second, heavily oscillating time steps, which you would get if you tried to follow the zig-zag pattern, tend to make gameplay code behave badly.&lt;br /&gt;&lt;br /&gt;So this gives us a second goal of time step filtering:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Keep the time step reasonably stable from frame to frame.&lt;/li&gt;&lt;/ul&gt;Why do I say that oscillating time steps make gameplay code behave badly? Certainly, at any good studio, the gameplay code is written to be time step independent. 
Whenever an object is moved, the length of the time step is taken into account, so that it moves with the same speed regardless of the frame rate:&lt;br /&gt;&lt;br /&gt;&lt;pre class="brush: cpp"&gt;s = s + v*dt&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Yes, it is easy to make the update code for a single object frame-rate independent in this manner. But once you start to have complex interactions between multiple objects, it gets a lot more difficult.&lt;br /&gt;&lt;br /&gt;An example: Suppose you want a camera to follow behind a moving object, but you don't want it to be completely stiff; you want some smoothness to it, like maybe lerping it to the right position, or using a damped spring. Now try to write that in a completely time step independent manner. I.e., an oscillation in the time step should not cause the camera to oscillate with respect to the object it is following.&lt;br /&gt;&lt;br /&gt;Not so easy.&lt;br /&gt;&lt;br /&gt;Gameplay code faces many similar issues. With 20+ gameplay programmers banging away at the code base you can be quite certain that there are several places where oscillations in the time step lead to bad-looking oscillating visuals in the game. And the best way to prevent that, in my opinion, is to smooth the time step so that it doesn't oscillate.&lt;br /&gt;&lt;br /&gt;In summary, this is our default method for time step smoothing in the BitSquid engine (though as I said above, you are of course free to use whatever time step you want):&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Keep a history of the time step for the last 11 frames.&lt;/li&gt;&lt;li&gt;Throw away the outliers, the two highest and the two lowest values.&lt;/li&gt;&lt;li&gt;Calculate the mean of the remaining 7 values.&lt;/li&gt;&lt;li&gt;Lerp from the time step for the last frame to the calculated mean (adding more smoothness).&lt;/li&gt;&lt;/ul&gt;&lt;div&gt;Here is an example of the filter in action.&amp;nbsp;The yellow line in this graph shows the raw time step. 
The red line is the smoothed time step.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/_D6mTIm8lbTo/TLv7TxzkinI/AAAAAAAAAFU/FIM7dD_x6Qk/s1600/smooth.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="150" src="http://3.bp.blogspot.com/_D6mTIm8lbTo/TLv7TxzkinI/AAAAAAAAAFU/FIM7dD_x6Qk/s400/smooth.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;One last thing to mention. This calculation can cause your game clock to drift from the world clock. If you need them to be synchronized (for network play for example) you need to keep track of your "time debt" -- i.e., how far your clock has drifted from the world clock and "pay off" that debt over a number of frames by increasing or decreasing your time step, until you are back at a synchronized state.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1994130783874175266-3306448225892415955?l=bitsquid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bitsquid.blogspot.com/feeds/3306448225892415955/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bitsquid.blogspot.com/2010/10/time-step-smoothing.html#comment-form' title='10 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/3306448225892415955'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/3306448225892415955'/><link rel='alternate' type='text/html' href='http://bitsquid.blogspot.com/2010/10/time-step-smoothing.html' title='Time Step 
Smoothing'/><author><name>Niklas</name><uri>http://www.blogger.com/profile/10055379994557504977</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_D6mTIm8lbTo/TLsmHew-t2I/AAAAAAAAAE0/oP3JA9f9TZ0/s72-c/time_step_1.png' height='72' width='72'/><thr:total>10</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1994130783874175266.post-7775227446272883940</id><published>2010-10-05T17:08:00.000+02:00</published><updated>2010-10-05T17:08:19.021+02:00</updated><title type='text'>The Dependency Checker</title><content type='html'>Maintaining referential integrity between resources in a big game project can be challenging:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Someone might accidentally delete an entity that is used somewhere in a level.&lt;/li&gt;&lt;li&gt;Someone might change a texture to improve the look of one object, without knowing that that texture is shared by two other objects.&lt;/li&gt;&lt;li&gt;A resource may have a misspelled name, but no one dares change it, because it is used in too many places.&lt;/li&gt;&lt;li&gt;There may be "dead" resources in the project, that aren't used anywhere, but no one knows how to find them, so that they can be deleted.&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;To help with these issues, I've created a new tool for the BitSquid tool chain, the &lt;i&gt;Dependency Checker&lt;/i&gt;. The Dependency Checker understands all BitSquid file formats and knows how they can refer to other resources. By parsing the source tree it is thus able to create the complete dependency graph of a project.&lt;br /&gt;&lt;br /&gt;This isn't as complicated as it sounds, because we don't have that many different file formats, they are all based on SJSON and they use a standardized way of referring to other resources (type, name). 
The entire code for parsing and understanding all the different file formats is just 500 lines long.&lt;br /&gt;&lt;br /&gt;Once we have the dependency graph, we can do lots of interesting things with it. The first is to find all missing and dangling resources:&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/_D6mTIm8lbTo/TKs4DxfMDzI/AAAAAAAAAEg/2Lf-0UTIcls/s1600/dep1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://1.bp.blogspot.com/_D6mTIm8lbTo/TKs4DxfMDzI/AAAAAAAAAEg/2Lf-0UTIcls/s1600/dep1.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;Missing resources are resources that are referred to somewhere, but that don't actually exist in the source tree. Dangling resources are existing resources that aren't referred to anywhere.&lt;br /&gt;&lt;br /&gt;We can click on any resource in the list (or any other resource in the project) to see its dependencies.&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/_D6mTIm8lbTo/TKs408kqWWI/AAAAAAAAAEk/7hvG5y_chaM/s1600/dep2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://2.bp.blogspot.com/_D6mTIm8lbTo/TKs408kqWWI/AAAAAAAAAEk/7hvG5y_chaM/s1600/dep2.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;But the really interesting thing is the ability to patch dependencies. The Dependency Checker does not only know how to parse dependencies; it also knows how to modify them. (Yes, that is included in the 500 lines of code.) 
That means that it can &lt;i&gt;replace&lt;/i&gt;, &lt;i&gt;move &lt;/i&gt;and &lt;i&gt;copy &lt;/i&gt;resources.&lt;br /&gt;&lt;br /&gt;For example, we can &lt;i&gt;replace &lt;/i&gt;the missing font texture with something else.&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/_D6mTIm8lbTo/TKs6K8fNcUI/AAAAAAAAAEo/2CFvVcvo6GQ/s1600/dep3.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://2.bp.blogspot.com/_D6mTIm8lbTo/TKs6K8fNcUI/AAAAAAAAAEo/2CFvVcvo6GQ/s1600/dep3.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;All files that used the font texture will be patched up to refer to the new resource.&lt;br /&gt;&lt;i&gt;&lt;br /&gt;&lt;/i&gt;&lt;br /&gt;&lt;i&gt;Move&lt;/i&gt;&amp;nbsp;is useful when we have given something a bad name and just want to clean up our resources a bit:&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/_D6mTIm8lbTo/TKs7G-GrBeI/AAAAAAAAAEs/yD8yqGtAkqg/s1600/dep4.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://2.bp.blogspot.com/_D6mTIm8lbTo/TKs7G-GrBeI/AAAAAAAAAEs/yD8yqGtAkqg/s1600/dep4.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;i&gt;Copy&lt;/i&gt;&amp;nbsp;can be used to quickly clone resources. But since the dependency checker lets you decide which (if any) references you want to redirect to the new copy, it can also be used as a quick way of splitting resources. 
For example, if you decide that you want to use a different &lt;i&gt;key&lt;/i&gt;&amp;nbsp;entity on the &lt;i&gt;himalaya&lt;/i&gt;&amp;nbsp;level, you can make a copy of the &lt;i&gt;key &lt;/i&gt;entity for just that level.&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/_D6mTIm8lbTo/TKs9CjcyDJI/AAAAAAAAAEw/nhOZpuZy39M/s1600/dep5.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://2.bp.blogspot.com/_D6mTIm8lbTo/TKs9CjcyDJI/AAAAAAAAAEw/nhOZpuZy39M/s1600/dep5.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1994130783874175266-7775227446272883940?l=bitsquid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bitsquid.blogspot.com/feeds/7775227446272883940/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bitsquid.blogspot.com/2010/10/dependency-checker.html#comment-form' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/7775227446272883940'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/7775227446272883940'/><link rel='alternate' type='text/html' href='http://bitsquid.blogspot.com/2010/10/dependency-checker.html' title='The Dependency Checker'/><author><name>Niklas</name><uri>http://www.blogger.com/profile/10055379994557504977</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_D6mTIm8lbTo/TKs4DxfMDzI/AAAAAAAAAEg/2Lf-0UTIcls/s72-c/dep1.png' height='72' 
width='72'/><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1994130783874175266.post-1636490414986157000</id><published>2010-10-01T14:35:00.000+02:00</published><updated>2010-10-01T14:35:01.223+02:00</updated><title type='text'>Static Hash Values</title><content type='html'>We use 32-bit string hashes instead of strings in many places to save memory and improve performance. (When there is a risk of collision we use 64-bit hashes instead.)&lt;br /&gt; &lt;br /&gt; At a number of places in the code we want to check these hashes against predefined values. For example, we may want to check if a certain object is the "root_point". With a straightforward implementation, you get code that looks like this:&lt;pre class="brush:cpp"&gt;&lt;br /&gt;const char *root_point_str = "root_point";&lt;br /&gt;static unsigned root_point_id = murmur_hash(root_point_str, &lt;br /&gt;    strlen(root_point_str), 0);&lt;br /&gt;if (object.name() == root_point_id)&lt;br /&gt;    ...&lt;br /&gt;&lt;/pre&gt;We use a static variable to avoid having to hash the string more than once, but this is still pretty inefficient. There is the extra application data and the computation of the hash the first time the function is run. On subsequent invocations there is still the check to see if the static variable has been initialized.&lt;br /&gt; &lt;br /&gt; It would be a lot more efficient if we could precompute the hashes somehow to avoid that cost at runtime. I can see three ways:&lt;ul&gt;&lt;li&gt; We could run a code generation pass in a pre-build step that generates the hash values and patches the code with them.&lt;/li&gt;&lt;li&gt; We could use the preprocessor to generate the values.&lt;/li&gt;&lt;li&gt; We could compute the values offline and hard-code them in the code.&lt;/li&gt;&lt;/ul&gt;I'm not too fond of code generation. 
It is nice in theory, but to me it always seems kind of messy the way it interacts with the build system, the debugger, etc.&lt;br /&gt; &lt;br /&gt; Rewriting the murmur hash algorithm in the preprocessor requires me to bring out some serious preprocessor-fu. But it is fun. It is almost like functional programming:&lt;script type="syntaxhighlighter" class="brush: cpp"&gt;&lt;![CDATA[#define MURMUR_M 0x5bd1e995u
#define KEY_TRANSFORM(k) (((unsigned(k)*MURMUR_M)^((unsigned(k)*MURMUR_M)&gt;&gt;24u))*MURMUR_M)
#define FINAL_MIX_SUB(h) (((h) ^ ((h) &gt;&gt; 13)) * MURMUR_M)
#define FINAL_MIX(h) (FINAL_MIX_SUB(h) ^ (FINAL_MIX_SUB(h) &gt;&gt; 15))
#define HFUN_0(seed) (seed)
#define HFUN_1(seed, k1) ((HFUN_0(seed)*MURMUR_M)^KEY_TRANSFORM(k1))
#define HFUN_2(seed, k1, k2) ((HFUN_1(seed, k1)*MURMUR_M)^KEY_TRANSFORM(k2))
#define HFUN_3(seed, k1, k2, k3) ((HFUN_2(seed, k1, k2)*MURMUR_M)^KEY_TRANSFORM(k3))
#define HFUN_4(seed, k1, k2, k3, k4) ((HFUN_3(seed, k1, k2, k3)*MURMUR_M)^KEY_TRANSFORM(k4))
#define LASTBYTES_0(h) (h)
#define LASTBYTES_1(h, c1) ((h^c1)*MURMUR_M)
#define LASTBYTES_2(h, c1, c2) ((h^(c2&lt;&lt;8)^c1)*MURMUR_M)
#define LASTBYTES_3(h, c1, c2, c3) ((h^(c3&lt;&lt;16)^(c2&lt;&lt;8)^c1)*MURMUR_M)
#define PACKCHARS(c1, c2, c3, c4) (c1 + (c2 &lt;&lt; 8) + (c3 &lt;&lt; 16) + (c4 &lt;&lt; 24))
#define HASH_STR_1(c1) FINAL_MIX(LASTBYTES_1(HFUN_0(1), c1))
#define HASH_STR_2(c1,c2) FINAL_MIX(LASTBYTES_2(HFUN_0(2), c1, c2))
#define HASH_STR_3(c1,c2,c3) FINAL_MIX(LASTBYTES_3(HFUN_0(3), c1, c2, c3))
#define HASH_STR_4(c1,c2,c3,c4) FINAL_MIX(LASTBYTES_0(HFUN_1(4, PACKCHARS(c1,c2,c3,c4))))
#define HASH_STR_5(c1,c2,c3,c4,c5) FINAL_MIX(LASTBYTES_1(HFUN_1(5, PACKCHARS(c1,c2,c3,c4)),c5))
#define HASH_STR_6(c1,c2,c3,c4,c5,c6) FINAL_MIX(LASTBYTES_2(HFUN_1(6, PACKCHARS(c1,c2,c3,c4)),c5,c6))
#define HASH_STR_7(c1,c2,c3,c4,c5,c6,c7) FINAL_MIX(LASTBYTES_3(HFUN_1(7, PACKCHARS(c1,c2,c3,c4)),c5,c6,c7))
#define HASH_STR_8(c1,c2,c3,c4,c5,c6,c7,c8) FINAL_MIX(LASTBYTES_0(HFUN_2(8, PACKCHARS(c1,c2,c3,c4), PACKCHARS(c5,c6,c7,c8))))
#define HASH_STR_9(c1,c2,c3,c4,c5,c6,c7,c8,c9) FINAL_MIX(LASTBYTES_1(HFUN_2(9, PACKCHARS(c1,c2,c3,c4), PACKCHARS(c5,c6,c7,c8)),c9))
#define HASH_STR_10(c1,c2,c3,c4,c5,c6,c7,c8,c9,c10) FINAL_MIX(LASTBYTES_2(HFUN_2(10, PACKCHARS(c1,c2,c3,c4), PACKCHARS(c5,c6,c7,c8)),c9,c10))
#define HASH_STR_11(c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11) FINAL_MIX(LASTBYTES_3(HFUN_2(11, PACKCHARS(c1,c2,c3,c4), PACKCHARS(c5,c6,c7,c8)),c9,c10,c11))
#define HASH_STR_12(c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11,c12) FINAL_MIX(LASTBYTES_0(HFUN_3(12, PACKCHARS(c1,c2,c3,c4), PACKCHARS(c5,c6,c7,c8), PACKCHARS(c9,c10,c11,c12))))
#define HASH_STR_13(c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11,c12,c13) FINAL_MIX(LASTBYTES_1(HFUN_3(13, PACKCHARS(c1,c2,c3,c4), PACKCHARS(c5,c6,c7,c8), PACKCHARS(c9,c10,c11,c12)),c13))
#define HASH_STR_14(c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11,c12,c13,c14) FINAL_MIX(LASTBYTES_2(HFUN_3(14, PACKCHARS(c1,c2,c3,c4), PACKCHARS(c5,c6,c7,c8), PACKCHARS(c9,c10,c11,c12)),c13,c14))
#define HASH_STR_15(c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11,c12,c13,c14,c15) FINAL_MIX(LASTBYTES_3(HFUN_3(15, PACKCHARS(c1,c2,c3,c4), PACKCHARS(c5,c6,c7,c8), PACKCHARS(c9,c10,c11,c12)),c13,c14,c15))]]&gt;&lt;/script&gt;With these lovely macros in place, we can now write:&lt;pre class="brush:cpp"&gt;&lt;br /&gt;if (object.name() == HASH_STR_10('r','o','o','t','_','p','o','i','n','t'))&lt;br /&gt;    ...&lt;br /&gt;&lt;/pre&gt;Having completed this task I feel a bit empty. 
That is certainly a lot of macro code for an end result that still is kind of meh.&lt;br /&gt; &lt;br /&gt; I disregarded hard-coding the values to begin with because no one wants to look at code like this:&lt;pre class="brush:cpp"&gt;&lt;br /&gt;if (object.name() == 0x5e43bd96)&lt;br /&gt;    ...&lt;br /&gt;&lt;/pre&gt;Even dressed up in comments, it is still kind of scary:&lt;pre class="brush:cpp"&gt;&lt;br /&gt;unsigned root_point_id = 0x5e43bd96; // hash of "root_point"&lt;br /&gt;if (object.name() == root_point_id)&lt;br /&gt;    ...&lt;br /&gt;&lt;/pre&gt;What if someone types in the wrong value? What if we decide to change the hash algorithm at some later point? Scary. But maybe we can ameliorate those fears:&lt;pre class="brush:cpp"&gt;&lt;br /&gt;#ifdef _DEBUG&lt;br /&gt;    // In debug builds, verify the hard-coded value against the real hash.&lt;br /&gt;    inline unsigned static_hash(const char *s, unsigned value) {&lt;br /&gt;        assert( murmur_hash(s, strlen(s), 0) == value );&lt;br /&gt;        return value;&lt;br /&gt;    }&lt;br /&gt;#else&lt;br /&gt;    #define static_hash(s,v) (v)&lt;br /&gt;#endif&lt;br /&gt;&lt;br /&gt;...&lt;br /&gt;&lt;br /&gt;if (object.name() == static_hash("root_point", 0x5e43bd96))&lt;br /&gt;    ...&lt;br /&gt;&lt;/pre&gt;That looks better and is completely safe. If something goes wrong, the assert will trigger in the debug builds.&lt;br /&gt; &lt;br /&gt; I think I like this better than the preprocessor solution. 
It will make the debug builds run a bit slower, but that's what debug builds are for, right?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1994130783874175266-1636490414986157000?l=bitsquid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bitsquid.blogspot.com/feeds/1636490414986157000/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bitsquid.blogspot.com/2010/10/static-hash-values.html#comment-form' title='15 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/1636490414986157000'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/1636490414986157000'/><link rel='alternate' type='text/html' href='http://bitsquid.blogspot.com/2010/10/static-hash-values.html' title='Static Hash Values'/><author><name>Niklas</name><uri>http://www.blogger.com/profile/10055379994557504977</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>15</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1994130783874175266.post-7399280474045495907</id><published>2010-09-30T17:31:00.000+02:00</published><updated>2010-09-30T17:31:02.066+02:00</updated><title type='text'>BitBucket for BitSquid</title><content type='html'>&lt;span class="Apple-style-span" style="font-family: Arial; font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-size: 13px;"&gt;I have made some improvements to the Json Merge tool. 
It can now display diffs and merges visually:&lt;/span&gt;&lt;/span&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: Arial; font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-size: 13px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/_D6mTIm8lbTo/TKSrjqy5hII/AAAAAAAAAEc/Tl6kYn4qnPI/s1600/jsondiff.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://1.bp.blogspot.com/_D6mTIm8lbTo/TKSrjqy5hII/AAAAAAAAAEc/Tl6kYn4qnPI/s1600/jsondiff.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: Arial; font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-size: 13px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: Arial; font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-size: 13px;"&gt;To make the distribution of our public tools easier I've uploaded them as bitbucket repositories. 
You can find them at:&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: Arial; font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-size: 13px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: Arial; font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-size: 13px;"&gt;&lt;a href="http://bitbucket.org/bitsquid/"&gt;http://bitbucket.org/bitsquid/&lt;/a&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: Arial;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: Arial;"&gt;Currently there are three projects available: our Json Merger, our Distance Field Font Generator and our Motion Builder Exporter.&lt;/span&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1994130783874175266-7399280474045495907?l=bitsquid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bitsquid.blogspot.com/feeds/7399280474045495907/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bitsquid.blogspot.com/2010/09/bitbucket-for-bitsquid.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/7399280474045495907'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/7399280474045495907'/><link rel='alternate' type='text/html' href='http://bitsquid.blogspot.com/2010/09/bitbucket-for-bitsquid.html' title='BitBucket for BitSquid'/><author><name>Niklas</name><uri>http://www.blogger.com/profile/10055379994557504977</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' 
width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_D6mTIm8lbTo/TKSrjqy5hII/AAAAAAAAAEc/Tl6kYn4qnPI/s72-c/jsondiff.png' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1994130783874175266.post-4404416081390261652</id><published>2010-09-21T10:41:00.000+02:00</published><updated>2010-09-21T10:41:38.222+02:00</updated><title type='text'>Custom Memory Allocation in C++</title><content type='html'>For console development, memory is a very precious resource. You want good locality of reference and as little fragmentation as possible. You also want to be able to track the amount of memory used by different subsystems and eliminate memory leaks. To do that, you want to write your own custom memory allocators. But the standard ways of doing that in C++ leave a lot to be desired.&lt;br /&gt;&lt;br /&gt;You can override global new and replace it with something else. This way you can get some basic memory tracking, but you still have to use the same allocation strategy for all allocations, which is far from ideal. Some systems work better with memory pools. Some can use simple frame allocation (i.e., pointer bump allocation). You really want each system to be able to have its own custom allocators.&lt;br /&gt;&lt;br /&gt;The other option in C++ is to override &lt;i&gt;new &lt;/i&gt;on a per-class basis. This has always seemed kind of strange to me. Pretty much the only thing you can use it for is object pools. Global, per-class object pools. If you want one pool per thread, or one pool per streaming chunk -- you run into problems.&lt;br /&gt;&lt;br /&gt;Then you have the STL solution, where containers are templated on their allocator, so containers that use different allocators have different types. It also has fun things such as &lt;i&gt;rebind()&lt;/i&gt;. 
But the weirdest thing is that all instances of the allocator class must be equivalent. So you must put all your data in static variables. And if you want to create two separate memory pools you have to have two different allocator classes.&lt;br /&gt;&lt;br /&gt;I must admit that every time I run into something in STL that seems completely bonkers I secretly suspect that I have missed something. Because obviously STL has been created by some really clever people who have thought long and hard about these things. But I just don't understand the idea behind the design of the custom allocator interface at all. Can anyone explain it to me? Does anyone use it? Find it practical? Sane?&lt;br /&gt;&lt;br /&gt;If it weren't for the allocator interface I could almost use STL. Almost. There is also the pretty inefficient &lt;i&gt;map&lt;/i&gt; implementation. And the fact that &lt;i&gt;deque&lt;/i&gt; is not a simple ring buffer, but some horrible beast. And that many containers allocate memory even if they are empty... So my own version of everything it is. Boring, but what's a poor gal gonna do?&lt;br /&gt;&lt;br /&gt;Back to allocators. In conclusion, all the standard C++ ways of implementing custom allocators are (to me) strange and strangely useless. So what do I do instead? I use an abstract allocator interface and implement it with a bunch of concrete classes that allocate memory in different ways:&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;pre class="brush: cpp"&gt;class Allocator&lt;br /&gt;{&lt;br /&gt;public:&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;virtual void *allocate(size_t size, size_t align) = 0;&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;virtual void deallocate(void *p) = 0;&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;virtual size_t allocated_size(void *p) = 0;&lt;br /&gt;};&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;I think this is about as sane as an allocator API can get. 
One possible point of contention is the &lt;i&gt;allocated_size()&lt;/i&gt; method. Some allocators (e.g., the frame allocator) do not automatically know the sizes of their individual allocations, and would have to use extra memory to store them. However, being able to answer questions about allocation sizes is very useful for memory tracking, so I require all allocators to provide that information, even if it means that a frame allocator will have to use a little extra memory to store it.&lt;br /&gt;&lt;br /&gt;I use an abstract interface with virtual functions, because I don't want to template my classes on the allocator type. I like my allocators to be actual objects that I can create more than one of, thank you very much. Memory allocation is expensive anyway, so I don't care about the cost of a virtual function call.&lt;br /&gt;&lt;br /&gt;In the BitSquid engine, you can &lt;b&gt;only&lt;/b&gt; allocate memory through an &lt;i&gt;Allocator &lt;/i&gt;object. If you call &lt;i&gt;malloc &lt;/i&gt;or &lt;i&gt;new &lt;/i&gt;the engine will &lt;i&gt;assert(false)&lt;/i&gt;.&lt;br /&gt;&lt;br /&gt;Also, in the BitSquid engine all allocators keep track of the total number of allocations they have made, and the total size of those allocations. The numbers are decreased on &lt;i&gt;deallocate()&lt;/i&gt;. In the allocator destructor we &lt;i&gt;assert(_size == 0 &amp;amp;&amp;amp; _allocations == 0)&lt;/i&gt; and when we shut down the application we tear down all allocators properly. So we know that we don't have any memory leaks in the engine. At least not along any code path that has ever been run.&lt;br /&gt;&lt;br /&gt;Since everything must be allocated through an &lt;i&gt;Allocator&lt;/i&gt;, all our collection classes (and a bunch of other low-level classes) take an &lt;i&gt;Allocator &amp;amp;&lt;/i&gt; in the constructor and use that for all their allocations. 
Higher level classes either create their own allocator or use one of the globals, such as &lt;i&gt;memory_globals::default_allocator()&lt;/i&gt;.&lt;br /&gt;&lt;br /&gt;With this interface set, we can implement a number of different allocators. A&amp;nbsp;&lt;i&gt;HeapAllocator &lt;/i&gt;that&amp;nbsp;allocates from a heap. A&amp;nbsp;&lt;i&gt;PoolAllocator &lt;/i&gt;that uses an object pool. A&amp;nbsp;&lt;i&gt;FrameAllocator &lt;/i&gt;that pointer bumps. A&amp;nbsp;&lt;i&gt;PageAllocator &lt;/i&gt;that&amp;nbsp;allocates raw virtual memory. And so on.&lt;br /&gt;&lt;br /&gt;Most of the allocators are set up to use a backing allocator to allocate large chunks of memory which they then chop up into smaller pieces. The backing allocator is also an &lt;i&gt;Allocator&lt;/i&gt;. So a pool allocator could use either the heap or the virtual memory to back up its allocations.&lt;br /&gt;&lt;br /&gt;We use proxy allocators for memory tracking. For example, the sound system uses:&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;pre class="brush: cpp"&gt;ProxyAllocator("sound", memory_globals::default_allocator());&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;which forwards all allocations to the default allocator, but keeps track of how much memory has been allocated by the sound system, so that we can display it in nice memory overviews.&lt;br /&gt;&lt;br /&gt;If we have a hairy memory leak in some system, we can add a &lt;i&gt;TraceAllocator&lt;/i&gt;, another proxy allocator which records a stack trace for each allocation. Though, truth be told, we haven't actually had to use that much. 
Since our &lt;i&gt;assert &lt;/i&gt;triggers as soon as a memory leak is introduced, and the &lt;i&gt;ProxyAllocator &lt;/i&gt;tells us in which subsystem the leak occurred, we usually find leaks quickly.&lt;br /&gt;&lt;br /&gt;To create and destroy objects using our allocators, we have to use placement new and friends:&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;pre class="brush: cpp"&gt;void *memory = allocator.allocate( sizeof(MyClass), alignof(MyClass) );&lt;br /&gt;MyClass *m = new (memory) MyClass(10);&lt;br /&gt;&lt;br /&gt;if (m) {&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;m-&amp;gt;~MyClass();&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;allocator.deallocate(m);&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;My eyes! The pain! You certainly don't want to type or read that a lot. Thanks, C++, for making my code so pretty. I've tried to make it less hurtful with some template functions in the allocator class:&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;pre class="brush: cpp"&gt;class Allocator&lt;br /&gt;{&lt;br /&gt;public:&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;// allocate(), deallocate() and allocated_size() as declared earlier&lt;br /&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;template&amp;nbsp;&amp;lt;class T, class P1&amp;gt; T *make_new(const P1 &amp;amp;p1) {return new (allocate(sizeof(T), alignof(T))) T(p1);}&lt;br /&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;template&amp;nbsp;&amp;lt;class T&amp;gt; void make_delete(T *p) {&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;if (p) {&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;p-&amp;gt;~T();&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;deallocate(p);&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;}&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;}&lt;br /&gt;};&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;Add a bunch of other templates for constructors that take a different number of arguments that can be const or non-const and now you can at least write:&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;pre class="brush: cpp"&gt;MyClass *m = 
allocator.make_new&amp;lt;MyClass&amp;gt;(10);&lt;br /&gt;&lt;br /&gt;allocator.make_delete(m);&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;That's not too bad.&lt;br /&gt;&lt;br /&gt;One last interesting thing to talk about. Since we use the allocators to assert on memory leaks, we really want to make sure that we set them up and tear them down in a correct, deterministic order. Since we are not allowed to allocate anything without using allocators, this raises an interesting chicken-and-egg problem: who allocates the allocators? How does the first allocator get allocated?&lt;br /&gt;&lt;br /&gt;The first allocator could be &lt;i&gt;static&lt;/i&gt;, but I want deterministic creation and destruction. I don't want the allocator to be destroyed by some random &lt;i&gt;_exit()&lt;/i&gt;&amp;nbsp;callback god knows when.&lt;br /&gt;&lt;br /&gt;The solution -- use a chunk of raw memory and &lt;i&gt;new&lt;/i&gt;&amp;nbsp;the first allocator into that:&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;pre class="brush: cpp"&gt;&lt;br /&gt;char _buffer[BUFFER_SIZE];&lt;br /&gt;&lt;br /&gt;HeapAllocator *_static_heap = 0;&lt;br /&gt;PageAllocator *_page_allocator = 0;&lt;br /&gt;HeapAllocator *_heap_allocator = 0;&lt;br /&gt;&lt;br /&gt;void init()&lt;br /&gt;{&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;_static_heap = new (_buffer)&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;HeapAllocator(NULL, _buffer + sizeof(HeapAllocator), BUFFER_SIZE - sizeof(HeapAllocator));&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;_page_allocator = _static_heap-&amp;gt;make_new&amp;lt;PageAllocator&amp;gt;("page_allocator");&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;_heap_allocator = _static_heap-&amp;gt;make_new&amp;lt;HeapAllocator&amp;gt;("heap_allocator", *_page_allocator);&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;...&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;void shutdown()&lt;br /&gt;{&lt;br 
/&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;...&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;_static_heap-&amp;gt;make_delete(_heap_allocator);&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;_heap_allocator = 0;&lt;br /&gt;&amp;nbsp;&amp;nbsp; &lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;_static_heap-&amp;gt;make_delete(_page_allocator);&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;_page_allocator = 0;&lt;br /&gt;&amp;nbsp;&amp;nbsp; &lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;_static_heap-&amp;gt;~HeapAllocator();&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;_static_heap = 0;&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;Note how this works. &lt;i&gt;_buffer&lt;/i&gt;&amp;nbsp;is initialized statically, but since that doesn't call any constructors or destructors, we are fine with that. Then we placement new&amp;nbsp;a &lt;i&gt;HeapAllocator &lt;/i&gt;at the start of that buffer. That heap allocator is a static heap allocator that uses a predefined memory block to create its heap in. And the memory block that it uses is the rest of the &lt;i&gt;_buffer&lt;/i&gt;&amp;nbsp;-- whatever remains after &lt;i&gt;_static_heap&lt;/i&gt;&amp;nbsp;has been placed at the beginning.&lt;br /&gt;&lt;br /&gt;Now we have our bootstrap allocator, and we can go on creating all the other allocators, using the bootstrap allocator to create them.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1994130783874175266-4404416081390261652?l=bitsquid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bitsquid.blogspot.com/feeds/4404416081390261652/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bitsquid.blogspot.com/2010/09/custom-memory-allocation-in-c.html#comment-form' title='32 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/4404416081390261652'/><link 
rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/4404416081390261652'/><link rel='alternate' type='text/html' href='http://bitsquid.blogspot.com/2010/09/custom-memory-allocation-in-c.html' title='Custom Memory Allocation in C++'/><author><name>Niklas</name><uri>http://www.blogger.com/profile/10055379994557504977</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>32</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1994130783874175266.post-4199957923655765433</id><published>2010-09-17T02:14:00.001+02:00</published><updated>2010-09-17T02:25:27.646+02:00</updated><title type='text'>Visual Scripting the Data-Oriented Way</title><content type='html'>The BitSquid engine has two separate scripting systems. The first is Lua. The entire engine is exposed to Lua in such a way that you can (and are encouraged to) write an entire game in Lua without touching the C++ code at all.&lt;br /&gt;&lt;br /&gt;The second system, which is the focus of this post, is a visual scripting system that allows artists to add behaviors to levels and entities. We call this system &lt;i&gt;Flow&lt;/i&gt;.&lt;br /&gt;&lt;br /&gt;Lua and Flow have different roles and complement each other. Lua allows gameplay programmers to quickly test designs and iterate over gameplay mechanics. Flow lets artists enrich their objects by adding effects, destruction sequences, interactivity, etc. When the artists can do this themselves, without the help of a programmer, they can iterate much faster and the quality is raised.&lt;br /&gt;&lt;br /&gt;Flow uses a pretty standard setup with nodes representing actions and events.&amp;nbsp;&amp;nbsp;Black links control how events cascade through the graph causing actions to be taken. 
Blue links represent variables that are fed into the flow nodes:&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/_D6mTIm8lbTo/TJKgWbA3gFI/AAAAAAAAAEM/CdlsqHKdvyU/s1600/snip.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="267" src="http://1.bp.blogspot.com/_D6mTIm8lbTo/TJKgWbA3gFI/AAAAAAAAAEM/CdlsqHKdvyU/s400/snip.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;In this graph, the green nodes represent events and the red node represents an action. The yellow node is an action implemented in Lua. The gameplay programmers can set up such actions on a per-project basis and make them available to the artists.&lt;br /&gt;&lt;br /&gt;Each unit (= entity) type can have its own flow graph, specifying that unit's behavior, e.g. an explosion sequence for a barrel. There is also a separate flow graph for the entire level that the level designers can use to set up the level logic.&lt;br /&gt;&lt;br /&gt;Since &lt;i&gt;Flow&lt;/i&gt; will be used a lot I wanted to implement it as efficiently as possible, both in terms of memory and performance. This means that I &lt;i&gt;don't&lt;/i&gt;&amp;nbsp;use a standard object-oriented design where each node is a separate heap-allocated object that inherits from an abstract &lt;i&gt;Node&lt;/i&gt;&amp;nbsp;class with a virtual &lt;i&gt;do_action()&lt;/i&gt; method. That way lies heap allocation and pointer chasing madness.&lt;br /&gt;&lt;br /&gt;Sure, we might use such a design in the &lt;i&gt;editor&lt;/i&gt;, where we don't care about performance and where the representation needs to be easy to use, modifiable, stable to version changes, etc. In the &lt;i&gt;runtime&lt;/i&gt;, where the graph is static and compiled for a particular platform, we can do a lot better.&lt;br /&gt;&lt;br /&gt;(This is why you should keep your editor (source) data format completely separate from your in-game data format. 
They have very different requirements. One needs to be dynamic, multi-platform and able to handle file format version changes. The other is static, compiled for a specific platform and does not have to care about versioning -- since we can just recompile from source if we change the format. If you don't already -- just use JSON for all your source data, and binary blobs for your in-game data, it's the only sane option.)&lt;br /&gt;&lt;br /&gt;The runtime data doesn't even have to be a graph at all. There are at least two other options:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;We could convert the entire graph into a Lua program, representing the nodes as little snippets or functions of Lua code. This would be a very flexible approach that would be easy to extend. However, I don't like it because it would introduce significant overhead in both CPU and memory usage.&lt;/li&gt;&lt;li&gt;We could "unroll" the graph. For each possible external input event we could trace how it flows through the graph and generate a list of triggered actions as a sort of "bytecode". The runtime data would then just be a bytecode snippet for each external event. This has the potential of being very fast but can be complicated by nodes with state that may cause branching or looping. Also, it could potentially use a lot of extra memory if there are many shared code paths.&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;So I've decided to use a graph. But not a stupid object-oriented graph. 
A data-oriented graph, where we put the entire graph in a single blob of cohesive memory:&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/_D6mTIm8lbTo/TJKpZIKbKsI/AAAAAAAAAEU/LmgbrqgTyWA/s1600/Data-orientedflow.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="110" src="http://4.bp.blogspot.com/_D6mTIm8lbTo/TJKpZIKbKsI/AAAAAAAAAEU/LmgbrqgTyWA/s400/Data-orientedflow.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;Oh, these blobs, how I love them. They are really not that complicated. I just concatenate all the node data into a single memory chunk. The data for each node begins with a node type identifier that lets me&amp;nbsp;&lt;i&gt;switch&lt;/i&gt;&amp;nbsp;on the node type and do the appropriate action for each node. (Yes, I'll take that over your virtual calls any day.) Pointers to nodes are stored as offsets within the blob. To follow a pointer, just add the offset to the blob's start address, cast the pointer to something useful and presto!&lt;br /&gt;&lt;br /&gt;Yes, it is ok to cast pointers. You can do it. You don't have to feel bad about it. You know that there is a &lt;i&gt;struct&amp;nbsp;ParticleEffectNode&lt;/i&gt;&amp;nbsp;at that address. Just cast the pointer and get it over with.&lt;br /&gt;&lt;br /&gt;The many nice things about blobs with offsets deserve to be repeated:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;We can allocate the entire structure with a single memory allocation. Efficient! And we know exactly how much memory it will use.&lt;/li&gt;&lt;li&gt;We can make mem-copies of the blob without having to do any pointer patching, because there are no pointers, only offsets.&lt;/li&gt;&lt;li&gt;We can even DMA these copies over to a SPU. Everything is still valid.&lt;/li&gt;&lt;li&gt;We can write the data to disk and read it back with a single call. 
No pointer patching or other fixups needed.&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;In fact, doesn't this whole blob thing look suspiciously like a file format? Yes! That is exactly what it is. A file format for memory. And it shouldn't be surprising. The speed difference between RAM and CPU means that memory is the new disk!&lt;br /&gt;&lt;br /&gt;If you are unsure about how to do data-oriented design, thinking "file formats for memory" is not a bad place to start.&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/_D6mTIm8lbTo/TJKpZIKbKsI/AAAAAAAAAEU/LmgbrqgTyWA/s1600/Data-orientedflow.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="110" src="http://4.bp.blogspot.com/_D6mTIm8lbTo/TJKpZIKbKsI/AAAAAAAAAEU/LmgbrqgTyWA/s400/Data-orientedflow.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;Note that there is a separate blob for dynamic data. It is used for the nodes' state data, such as which actor collided with us for a collision event (the blue data fields in the flow graph). We build that blob in the same way. 
When we compile the graph, any node that needs dynamic data reserves a little space in the dynamic blob and stores the offset to that space from the start of the dynamic block.&lt;br /&gt;&lt;br /&gt;When we clone a new entity, we allocate a new dynamic data block and memcopy in the template data from the dynamic data block that is stored in the compiled file.&lt;br /&gt;&lt;br /&gt;Sharing of dynamic data (the blue links in the graph) is implemented by just letting nodes point to the same offset in the dynamic data blob.&lt;br /&gt;&lt;br /&gt;Since we are discussing performance, I should perhaps also say something about multithreading.&amp;nbsp;How do we parallelize the execution of flow graphs?&lt;br /&gt;&lt;br /&gt;The short answer: we don't.&lt;br /&gt;&lt;br /&gt;The flow graph is a high level system that talks to a lot of other high level systems. A flow graph may trigger an effect, play a sound, start an animation, disable a light, etc, etc. At the same time there isn't really any heavy CPU processing going on in the flow graph itself. In fact it doesn't really do anything other than calling out to other systems. 
Multithreading this wouldn't gain a lot (since the compute cost is low to begin with) and would add significant costs (since all the external calls would have to be synchronized in some way).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1994130783874175266-4199957923655765433?l=bitsquid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bitsquid.blogspot.com/feeds/4199957923655765433/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bitsquid.blogspot.com/2010/09/visual-scripting-data-oriented-way.html#comment-form' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/4199957923655765433'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/4199957923655765433'/><link rel='alternate' type='text/html' href='http://bitsquid.blogspot.com/2010/09/visual-scripting-data-oriented-way.html' title='Visual Scripting the Data-Oriented Way'/><author><name>Niklas</name><uri>http://www.blogger.com/profile/10055379994557504977</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_D6mTIm8lbTo/TJKgWbA3gFI/AAAAAAAAAEM/CdlsqHKdvyU/s72-c/snip.png' height='72' width='72'/><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1994130783874175266.post-2360987107463685167</id><published>2010-09-08T14:29:00.000+02:00</published><updated>2010-09-08T14:29:33.659+02:00</updated><title type='text'>Code Share: Motion Builder Exporter</title><content type='html'>Writing exporters is boring. 
It mostly involves navigating huge&amp;nbsp;unwieldy and poorly documented APIs.&lt;br /&gt;&lt;br /&gt;I'd like to spend as little of my time as possible writing exporters, and I'm guessing most other developers feel the same. So, in that spirit I'm sharing the code of the simple Motion Builder exporter I just wrote:&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.bitsquid.se/files/motionbuilder_bsi_exporter.txt"&gt;BitSquid Python Motion Builder Exporter&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Feel free to cannibalize the code for your own exporter. Hopefully you can save a little time and spend it doing something more interesting than writing exporters.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1994130783874175266-2360987107463685167?l=bitsquid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bitsquid.blogspot.com/feeds/2360987107463685167/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bitsquid.blogspot.com/2010/09/code-share-motion-builder-exporter.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/2360987107463685167'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/2360987107463685167'/><link rel='alternate' type='text/html' href='http://bitsquid.blogspot.com/2010/09/code-share-motion-builder-exporter.html' title='Code Share: Motion Builder Exporter'/><author><name>Niklas</name><uri>http://www.blogger.com/profile/10055379994557504977</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1994130783874175266.post-6417388530396450223</id><published>2010-08-25T17:57:00.003+02:00</published><updated>2010-08-25T18:05:55.867+02:00</updated><title type='text'>BitSquid's Dual Mode GUIs</title><content type='html'>The BitSquid engine uses a dual mode GUI system. That is, the GUI system can be run both in retained and immediate mode. For GUIs with lots of static data, retained mode can be used for increased efficiency. For smaller or more dynamic GUIs it is simpler to work in immediate mode.&lt;br /&gt;&lt;br /&gt;The retained mode and the immediate mode use the same API and the same implementation with just a simple flag that controls the mode. To see how that is possible, it is easiest to begin by looking at how our GUIs are rendered.&lt;br /&gt;&lt;br /&gt;Despite their simplicity, ordinary 2D GUIs can be quite taxing to a renderer. The reason is that they often contain many small individual objects. You can easily have a GUI with 500 little icons, text strings, radar blips, etc. If you render them as individual objects your batch count will go through the roof. The key to efficient GUI rendering is thus to batch together similar objects into larger buffers to get fewer draw calls.&lt;br /&gt;&lt;br /&gt;In the BitSquid engine, the GUI batching works like this: When the main thread wants to render a GUI object it generates three pieces of data and sends them to the renderer.&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;An &lt;i&gt;id&lt;/i&gt;&amp;nbsp;that uniquely identifies the object.&lt;/li&gt;&lt;li&gt;A &lt;i&gt;batch key &lt;/i&gt;consisting of &lt;i&gt;(gui layer, material)&lt;/i&gt;. 
Objects in the same layer with the same material can be batched together.&lt;/li&gt;&lt;li&gt;The &lt;i&gt;vertex data&lt;/i&gt;&amp;nbsp;(positions, normals, vertex colors, uv-coordinates) for the object to be rendered.&lt;/li&gt;&lt;/ul&gt;&lt;div&gt;The renderer finds an existing batch with a matching &lt;i&gt;batch key&lt;/i&gt;&amp;nbsp;and appends the vertex data to the vertex buffer of that batch. If no matching batch exists a new batch is created.&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/_D6mTIm8lbTo/THUylNJnVlI/AAAAAAAAAEE/usJLdh8WAWY/s1600/BitSquidGui+(1).png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="275" src="http://3.bp.blogspot.com/_D6mTIm8lbTo/THUylNJnVlI/AAAAAAAAAEE/usJLdh8WAWY/s640/BitSquidGui+(1).png" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div&gt;When it is time to render, the renderer just renders all its batches with their corresponding data.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The main thread can modify an object by resending the same id with a new batch key and new vertex data. The renderer will delete the old data from the batch buffers and insert the new data. The main thread can also send an id to the renderer and request the object to be deleted. 
The renderer will delete the object's vertex data from the batch buffers.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;To higher level systems, the GUI exposes an interface that looks something like this:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;create_text(pos, text, font, color) : id&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;update_text(id, pos, text, font, color)&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;destroy_text(id)&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;i&gt;create_text()&lt;/i&gt;&amp;nbsp;creates a new id, generates the vertex data for the text object and sends it to the renderer. &lt;i&gt;update_text()&lt;/i&gt;&amp;nbsp;generates new vertex data and sends it to the renderer to replace the old data. &lt;i&gt;destroy_text()&lt;/i&gt;&amp;nbsp;tells the renderer to delete the vertex data corresponding to the object.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;What is interesting about this API is that there are no separate &lt;i&gt;move()&lt;/i&gt;, &lt;i&gt;set_text()&lt;/i&gt;, &lt;i&gt;set_font()&lt;/i&gt;&amp;nbsp;and &lt;i&gt;set_color() &lt;/i&gt;functions. If you want to change the text object, you have to provide all the necessary data to the &lt;i&gt;update_text()&lt;/i&gt;&amp;nbsp;function. This means that&amp;nbsp;&lt;i&gt;update_text()&lt;/i&gt;&amp;nbsp;has all the data required to generate the object's data from scratch, so we don't have to retain any information about the objects in the main thread. The only data that is retained anywhere are the batch vertex buffers kept by the renderer. 
In this way we save memory, reduce the number of functions in the API and make the implementation a lot simpler. It also becomes easy to add new object types to the GUI: you just have to write a function that generates the batch key and the vertex data for the object.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;You could argue that the API makes things more complicated for the user, since she now has to supply all the parameters to configure the text even if she just wants to change one of them (the color, for instance). In my experience, that is usually not a problem. Typically, the user already has all the needed data stored somewhere and can just pass it to the &lt;i&gt;update()&lt;/i&gt;&amp;nbsp;function. For instance, the text to be displayed might be stored in a &lt;i&gt;player_name&lt;/i&gt;&amp;nbsp;variable. Retaining the data in the GUI would just mean that the data would be stored in two different places and add the burden of keeping them synchronized.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;With all this in place, it is easy to see how we can support both retained mode and immediate mode in the same implementation.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;In retained mode everything works as described above. The user calls &lt;i&gt;create() &lt;/i&gt;to create an object, &lt;i&gt;update()&lt;/i&gt;&amp;nbsp;to modify it and &lt;i&gt;destroy()&lt;/i&gt;&amp;nbsp;to destroy it.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;In immediate mode, only the &lt;i&gt;create() &lt;/i&gt;function is used and the renderer is set to clear its batches every frame. 
Thus, any object drawn with &lt;i&gt;create()&lt;/i&gt;&amp;nbsp;will be drawn for exactly one frame and then get cleared by the renderer.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1994130783874175266-6417388530396450223?l=bitsquid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bitsquid.blogspot.com/feeds/6417388530396450223/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bitsquid.blogspot.com/2010/08/bitsquids-dual-mode-guis.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/6417388530396450223'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/6417388530396450223'/><link rel='alternate' type='text/html' href='http://bitsquid.blogspot.com/2010/08/bitsquids-dual-mode-guis.html' title='BitSquid&apos;s Dual Mode GUIs'/><author><name>Niklas</name><uri>http://www.blogger.com/profile/10055379994557504977</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_D6mTIm8lbTo/THUylNJnVlI/AAAAAAAAAEE/usJLdh8WAWY/s72-c/BitSquidGui+(1).png' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1994130783874175266.post-7418721968779895789</id><published>2010-08-18T19:43:00.000+02:00</published><updated>2010-08-18T19:43:07.487+02:00</updated><title type='text'>A new data storage model</title><content type='html'>&lt;span class="Apple-style-span" style="font-family: Arial; font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-size: 
13px;"&gt;In my head, I am toying with an idea of a new data storage model that combines the flexibility and simplicity of JSON with the multi-user friendliness of a traditional database.&lt;/span&gt;&lt;/span&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: Arial; font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-size: 13px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: Arial; font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-size: 13px;"&gt;The basic idea is as follows:&lt;/span&gt;&lt;/span&gt;&lt;div&gt;&lt;div&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;&lt;span class="Apple-style-span" style="font-family: Arial; font-size: 13px;"&gt;A database is a collection of objects.&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span class="Apple-style-span" style="font-family: Arial; font-size: 13px;"&gt;Each object is identified by a GUID.&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span class="Apple-style-span" style="font-family: Arial; font-size: 13px;"&gt;An object consists of a set of key-value pairs.&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span class="Apple-style-span" style="font-family: Arial; font-size: 13px;"&gt;The keys are always strings.&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span class="Apple-style-span" style="font-family: Arial; font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-size: 13px;"&gt;A value can be one of:&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;&lt;ul&gt;&lt;li&gt;&lt;span class="Apple-style-span" style="font-family: Arial; font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-size: 13px;"&gt;null (this is the same as the key not existing)&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span class="Apple-style-span" style="font-family: Arial; font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-size: 13px;"&gt;true/false&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span class="Apple-style-span" 
style="font-family: Arial; font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-size: 13px;"&gt;a number&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span class="Apple-style-span" style="font-family: Arial; font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-size: 13px;"&gt;a string&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span class="Apple-style-span" style="font-family: Arial; font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-size: 13px;"&gt;a generic data blob (texture, vertex data, etc)&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span class="Apple-style-span" style="font-family: Arial; font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-size: 13px;"&gt;a reference to another object (GUID)&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span class="Apple-style-span" style="font-family: Arial; font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-size: 13px;"&gt;a set of references to other objects (GUIDs)&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;li&gt;&lt;span class="Apple-style-span" style="font-family: Arial; font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-size: 13px;"&gt;A special object with GUID 0000-00000000-0000 acts as the root object of the database.&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: Arial; font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-size: 13px;"&gt;This simple setup has many nice properties.&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: Arial; font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-size: 13px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: Arial; font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-size: 13px;"&gt;It is easy to map objects back and forth 
between this storage representation and an in-memory representation in C++, C#, Lua, etc.&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: Arial; font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-size: 13px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: Arial; font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-size: 13px;"&gt;We can perform referential integrity checks on the GUIDs to easily locate "dangling pointers" or "garbage objects".&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: Arial; font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-size: 13px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: Arial; font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-size: 13px;"&gt;We can add new fields to objects and still be "backwards compatible" with old code.&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: Arial; font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-size: 13px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: Arial; font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-size: 13px;"&gt;It is easy to write batch scripts that operate on the database. 
For example, we can look up the key "textures" in the root object to find all texture objects and then loop over them and examine their "height" and "width" to find any non-power-of-two textures.&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: Arial; font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-size: 13px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: Arial; font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-size: 13px;"&gt;All modifications to the data can be represented by a small set of operations:&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: Arial; font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-size: 13px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-size: 13px;"&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;create(guid)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: 13px;"&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;destroy(guid)&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: 13px;"&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;change_key(guid, key, value)&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: 13px;"&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;add_to_set(guid, key, 
object_guid)&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: 13px;"&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;remove_from_set(guid, key, object_guid)&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: Arial; font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-size: 13px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: Arial; font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-size: 13px;"&gt;These operations can also be used to represent a &lt;i&gt;diff &lt;/i&gt;between two different versions of the database.&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: Arial; font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-size: 13px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: Arial; font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-size: 13px;"&gt;A user of the database can have a list of such operations that represents her local changes to the data (for testing purposes, etc). She can then commit all or some of these local changes to the central database. The database can thus be used in both online and offline mode. 
Versioning and branching systems can be built on this without too much effort.&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: Arial; font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-size: 13px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: Arial; font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-size: 13px;"&gt;Merge conflicts are almost entirely eliminated in this system. The only possible conflict is when two users have changed the same key of the same object to two different values. In that case we resolve the conflict by letting the later change overwrite the value of the earlier one.&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: Arial; font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-size: 13px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: Arial; font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-size: 13px;"&gt;Note that this model only supports sets, not arrays. The reason is that array reordering operations are tricky to merge and in most cases the order of objects does not matter. 
In the few cases where order really does matter, you can use a key in the objects to specify the sort order.&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1994130783874175266-7418721968779895789?l=bitsquid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bitsquid.blogspot.com/feeds/7418721968779895789/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bitsquid.blogspot.com/2010/08/new-data-storage-model.html#comment-form' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/7418721968779895789'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/7418721968779895789'/><link rel='alternate' type='text/html' href='http://bitsquid.blogspot.com/2010/08/new-data-storage-model.html' title='A new data storage model'/><author><name>Niklas</name><uri>http://www.blogger.com/profile/10055379994557504977</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1994130783874175266.post-4549633097477074069</id><published>2010-06-03T18:18:00.000+02:00</published><updated>2010-06-03T18:18:49.588+02:00</updated><title type='text'>Avoiding Content Locks and Conflicts -- 3-way Json Merge</title><content type='html'>Locking content files in a CVS is annoying, doesn't scale well and prevents multiple people from working on different parts of the same level (unless you split the level in many small files which have to be locked individually -- which is even more annoying).&lt;br /&gt;&lt;br /&gt;But having content conflicts is no fun either. 
A level designer wants to work in the level editor, not manage strange content conflicts in barely understandable XML-files. The level designer should never have to mess with WinMerging the engine's file formats.&lt;br /&gt;&lt;br /&gt;And conflicts shouldn't be necessary. Most content conflicts are not &lt;i&gt;actual &lt;span class="Apple-style-span" style="font-style: normal;"&gt;conflicts&lt;/span&gt;&lt;/i&gt;. It is not that often that two people have moved the exact same object or changed the exact same settings parameter. Rather, the conflicts occur because a line-based merge tool tries to merge hierarchical data (XML or JSON) and messes up the structure.&lt;br /&gt;&lt;br /&gt;In those rare cases when there is an actual conflict, the content people don't want to resolve it in WinMerge. If two level designers have moved the same object, we don't really help them address the issue by bringing up a dialog box with a ton of XML mumbo-jumbo. Instead, it is much better to just pick one of the two locations and go ahead with merging the file. Then, the level designers can fix any problems that might have occurred in the level editor -- the right tool for the job.&lt;br /&gt;&lt;br /&gt;At BitSquid we use JSON for all our content files (actually, a slightly simplified version of JSON that we call SJSON). So to get rid of our conflict issues, I have written a 3-way merger that understands the structure of JSON files and resolves any remaining &lt;i&gt;actual &lt;/i&gt;conflicts by always picking the right-hand branch.&lt;br /&gt;&lt;br /&gt;If we disregard arrays for the moment, merging JSON files is quite simple. A diff between two JSON files can be expressed as a list of &lt;i&gt;object[key] = value&lt;/i&gt;&amp;nbsp;operations. Deleting a key is represented by changing its value to &lt;i&gt;null&lt;/i&gt;. Adding a key is represented by changing a &lt;i&gt;null&lt;/i&gt;&amp;nbsp;value to something else. Merging these operations is simple. 
We only have trouble when the same key in the same object is changed to two different values, but then we just pick one of the values, as explained above.&lt;br /&gt;&lt;br /&gt;Arrays are trickier because without context, it is impossible to tell what a change to an array means semantically. If the array [1, 2, 3] is changed to [1, 2, 4], is that a single operation that changed the last value from 3 to 4? Or is it two operations, deleting the 3 from the array and inserting 4? How we interpret it will affect the result of our 3-way merges. For example, the 3-way merge of [1, 2, 3], [1, 2, 4] and [1, 2, 5] can give either the result [1, 2, 5] or [1, 2, 4, 5].&lt;br /&gt;&lt;br /&gt;I have resolved this by adding extra information to the arrays in our source files. Most of our arrays are arrays of objects. For such arrays, I require that the objects have an "id"-field with a GUID that uniquely identifies the object. With such an id in our array [ {x = 1, id = a}, {x = 2, id = b}, {x = 3, id = c} ] it becomes possible to distinguish between updating an existing value&amp;nbsp;[ {x = 1, id = a}, {x = 2, id = b}, {x = 4, id = c} ] and removing + adding a value&amp;nbsp;[ {x = 1, id = a}, {x = 2, id = b}, {x = 4, id = d} ].&lt;br /&gt;&lt;br /&gt;The 3-way merge algorithm I'm using applies some heuristics to guess array transformations even when no id-field is present, but the recommendation is to always add id-fields to array elements to get perfect merges.&lt;br /&gt;&lt;br /&gt;You can download my 3-way Json merger &lt;a href="http://www.bitsquid.se/files/json_merge.7z"&gt;here&lt;/a&gt;.&amp;nbsp;I just wrote it today, so it hasn't received much testing yet.&amp;nbsp;But it is public domain software, so feel free to fix the bugs and do whatever else you like with it.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1994130783874175266-4549633097477074069?l=bitsquid.blogspot.com' alt='' 
/&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bitsquid.blogspot.com/feeds/4549633097477074069/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bitsquid.blogspot.com/2010/06/avoiding-content-locks-and-conflicts-3.html#comment-form' title='7 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/4549633097477074069'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/4549633097477074069'/><link rel='alternate' type='text/html' href='http://bitsquid.blogspot.com/2010/06/avoiding-content-locks-and-conflicts-3.html' title='Avoiding Content Locks and Conflicts -- 3-way Json Merge'/><author><name>Niklas</name><uri>http://www.blogger.com/profile/10055379994557504977</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>7</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1994130783874175266.post-4598840594206426131</id><published>2010-05-28T09:17:00.000+02:00</published><updated>2010-05-28T09:17:41.568+02:00</updated><title type='text'>Practical Examples in Data Oriented Design</title><content type='html'>Here are the slides from my talk at the Sthlm Game Developer Forum yesterday:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;a href="https://docs.google.com/present/view?id=0AYqySQy4JUK1ZGNzNnZmNWpfMzJkaG5yM3pjZA&amp;amp;hl=en"&gt;Practical Examples in Data Oriented Design&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1994130783874175266-4598840594206426131?l=bitsquid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' 
href='http://bitsquid.blogspot.com/feeds/4598840594206426131/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bitsquid.blogspot.com/2010/05/practical-examples-in-data-oriented.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/4598840594206426131'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/4598840594206426131'/><link rel='alternate' type='text/html' href='http://bitsquid.blogspot.com/2010/05/practical-examples-in-data-oriented.html' title='Practical Examples in Data Oriented Design'/><author><name>Niklas</name><uri>http://www.blogger.com/profile/10055379994557504977</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1994130783874175266.post-9148636837880288441</id><published>2010-04-23T11:57:00.002+02:00</published><updated>2010-04-23T12:00:33.880+02:00</updated><title type='text'>Our Tool Architecture</title><content type='html'>The BitSquid tool architecture is based on two main design principles:&lt;br /&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;Tools should use the "real" engine for visualization.&lt;/li&gt;&lt;li&gt;Tools should not be directly linked or otherwise strongly coupled to the engine.&lt;/li&gt;&lt;/ul&gt;&lt;div&gt;Using the real engine for visualization means that everything will look and behave exactly the same in the tools as it does in-game. 
It also saves us the work of having to write a completely separate "tool visualizer" as well as the nightmare of trying to keep it in&amp;nbsp;sync&amp;nbsp;with changes to the engine.&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;By decoupling the tools from the engine we achieve freedom and flexibility, both in the design of the tools and in the design of the engine. The tools can be written in any language (C#, Ruby, Java, Lisp, Lua, Python, C++, etc), using any methodology and design philosophy. The engine can be optimized and the runtime data formats changed without affecting the tools.&lt;br /&gt;&lt;br /&gt;What we envision is a Unix-like environment with a plethora of special purpose tools (particle editor, animation editor, level editor, material editor, profiler, lua debugger, etc) rather than a single monolithic Mega-Editor. We want it to be easy for our licensees to supplement our standard tool set with their own in-house tools, custom written to fit the requirements of their particular games. For example, a top-down 2D game may have a custom written &lt;i&gt;tile editor&lt;/i&gt;. Another&amp;nbsp;programmer may want to hack together a simple batch script that drops a MIP-step from all vegetation textures.&lt;br /&gt;&lt;br /&gt;At first glance, our two design goals may appear conflicting. How can we make our tools use the engine for all visualization without strongly coupling the tools to the engine? 
Our solution is shown in the image below:&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/_D6mTIm8lbTo/S9FUk-lbK_I/AAAAAAAAADs/kN4fzGR5174/s1600/BitSquidtoolarchitecture.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="266" src="http://2.bp.blogspot.com/_D6mTIm8lbTo/S9FUk-lbK_I/AAAAAAAAADs/kN4fzGR5174/s400/BitSquidtoolarchitecture.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;Note that there is no direct linkage between the tool and the engine. The tool only talks to the engine through the network. All messages on the network connection are simple JSON structs, such as:&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;{&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;"type" : "message",&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;"level" : "info",&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;"system" : "D3DRenderDevice",&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;"message" : "Resizing swap chain: 1626 1051"&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 
'Courier New', Courier, monospace;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;}&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;This applies to all tools. When the lua debugger wants to set a breakpoint, it sends a message to the engine with the lua file and line number. When the breakpoint is hit, the engine sends a message back. (So you can easily swap in your own lua debugger integrated with your favorite editor by simply receiving and sending these messages.) When the engine has gathered a bunch of profiling data, it sends a profiler message. Et cetera.&lt;br /&gt;&lt;br /&gt;For visualization, the tool creates a window where it wants the engine to render and sends the window handle to the engine. The engine then creates a swap chain for that window and renders into it.&lt;br /&gt;&lt;br /&gt;(In the future we may also add support for a VNC-like mode where we instead let the engine send the content of the frame buffer over the network. This would allow the tools to work directly against consoles, letting the artists see, directly in their editors, how everything will look on the lead platform.)&lt;br /&gt;&lt;br /&gt;A tool typically boots the engine in a special mode where it runs a custom lua script designed to collaborate with that particular tool. 
For example, the particle editor boots the engine with &lt;i&gt;particle_editor_slave.lua&lt;/i&gt;, which sets up a default scene for viewing particle effects with a camera, skydome, lights, etc.&amp;nbsp;The tool then sends script commands over the network connection that tell the engine what to do, for example to display a particular effect:&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;{&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;type = "script",&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;script = "ParticleEditorSlave:test_effect('fx/grenade/explosion')"&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;}&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;These commands are handled by the slave script. The slave script can also send messages back if the tool is requesting information.&lt;br /&gt;&lt;br /&gt;The slave scripts are usually quite simple. The particle editor slave script is just 120 lines of lua code.&lt;br /&gt;&lt;br /&gt;To make the tools independent of the engine data formats, we have separated the data into human-readable, extensible and backwards compatible &lt;i&gt;generic data&lt;/i&gt;&amp;nbsp;and fast, efficient, platform specific &lt;i&gt;runtime data&lt;/i&gt;. The tools always work with the generic data, which is pretty much all in JSON (exceptions are textures and WAVs). 
Thus, they never need to care about how the engine represents its runtime data, and the engine is free to change and optimize the runtime format however it likes.&lt;br /&gt;&lt;br /&gt;When the tool has changed some data and wants to see the change in-engine, it launches the &lt;i&gt;data compiler&lt;/i&gt;&amp;nbsp;to generate the runtime data. (The data compiler is in fact just the regular Win32 engine started with a &lt;i&gt;-compile&lt;/i&gt; flag, so the engine and the data compiler are always in&amp;nbsp;sync. Any change of the runtime formats triggers a recompile.) The data compiler only recompiles the data that has actually changed.&lt;br /&gt;&lt;br /&gt;When the compile is done, the tool sends a network message to the engine, telling it to reload the changed data file, at which point you will see the changes in-game. All this happens nearly instantaneously, allowing very quick tweaking of content and gameplay (by reloading lua files).&lt;br /&gt;&lt;br /&gt;This system has worked out really well for us. The decoupling has allowed for fast development of both the tools and the engine. 
Today we have about ten different tools that use this system and we have been able to make many optimizations to the engine and the runtime formats without affecting the tools or the generic data.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1994130783874175266-9148636837880288441?l=bitsquid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bitsquid.blogspot.com/feeds/9148636837880288441/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bitsquid.blogspot.com/2010/04/our-tool-architecture.html#comment-form' title='13 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/9148636837880288441'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/9148636837880288441'/><link rel='alternate' type='text/html' href='http://bitsquid.blogspot.com/2010/04/our-tool-architecture.html' title='Our Tool Architecture'/><author><name>Niklas</name><uri>http://www.blogger.com/profile/10055379994557504977</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_D6mTIm8lbTo/S9FUk-lbK_I/AAAAAAAAADs/kN4fzGR5174/s72-c/BitSquidtoolarchitecture.png' height='72' width='72'/><thr:total>13</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1994130783874175266.post-373915833167946022</id><published>2010-04-09T16:17:00.000+02:00</published><updated>2010-04-09T16:17:28.813+02:00</updated><title type='text'>Distance Field Based Rendering of AngelCode Fonts</title><content type='html'>This morning, we added support for distance field based font rendering to the BitSquid 
engine (from Valve's paper&amp;nbsp;&lt;a href="http://www.valvesoftware.com/publications/2007/SIGGRAPH2007_AlphaTestedMagnification.pdf"&gt;http://www.valvesoftware.com/publications/2007/SIGGRAPH2007_AlphaTestedMagnification.pdf&lt;/a&gt;). An example is shown below:&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/_D6mTIm8lbTo/S78XlV7oPoI/AAAAAAAAADc/i8KDUccPlNU/s1600/distance_field_font.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="378" src="http://3.bp.blogspot.com/_D6mTIm8lbTo/S78XlV7oPoI/AAAAAAAAADc/i8KDUccPlNU/s400/distance_field_font.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;br /&gt;&lt;/div&gt;The top row shows the original font; below that is the original font rendered with alpha test. The third row is a distance field representation of the font and the final row shows the distance field representation rendered with alpha test. Note that the distance field version gives better quality in the diagonal lines.&lt;br /&gt;&lt;br /&gt;(Note: The last row looks thicker than the second row, because it was generated from a large font size and scaled down, while the second row was generated from a small font size. Because of TrueType font hinting at small sizes, the result is different. The last row gives a truer representation of the "actual" thickness of the font.)&lt;br /&gt;&lt;br /&gt;A quick Google search didn't show any good tools for generating distance field font maps, so I decided to write my own. 
We use the excellent AngelCode Bitmap Font Generator (&lt;a href="http://www.angelcode.com/products/bmfont/"&gt;http://www.angelcode.com/products/bmfont/&lt;/a&gt;) to generate our font maps, so I decided to make a tool that works with the files generated by AngelCode:&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/_D6mTIm8lbTo/S78vakjCNHI/AAAAAAAAADk/RoEgrzVPCP4/s1600/angelcode_font_converter.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="348" src="http://2.bp.blogspot.com/_D6mTIm8lbTo/S78vakjCNHI/AAAAAAAAADk/RoEgrzVPCP4/s400/angelcode_font_converter.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;br /&gt;&lt;/div&gt;The tool takes a high resolution AngelCode &lt;i&gt;.fnt&lt;/i&gt; file as input. It scales it down by the specified scale factor and converts it to a distance field. The spread specifies how many pixels the distance field should extend outside the character outline before it clamps to zero. (It is useful if you want to add things such as glow effects to the font rendering.) After the conversion, the tool outputs new scaled down &lt;i&gt;.tga&lt;/i&gt; images of the fonts and a new &lt;i&gt;.fnt&lt;/i&gt; file with all measurements converted to work with the scaled down textures.&lt;br /&gt;&lt;br /&gt;So to use it, you first generate a font bitmap and &lt;i&gt;.fnt&lt;/i&gt; file using AngelCode at 8 x the font size and 8 x the texture size you want in the final image. (Make sure to add &lt;i&gt;8 x spread&lt;/i&gt; pixels of padding around the characters or else the distance fields will bleed into each other.) Then you run the tool to convert it to a distance field texture.&lt;br /&gt;&lt;br /&gt;The tool is a bit limited -- it only works with monochrome uncompressed &lt;i&gt;.tga&lt;/i&gt; files. 
It only reads and writes the XML version of the AngelCode font format. The distance field generation isn't particularly clever or fast. But I thought I should share it anyway since I couldn't find any other tools for generating distance field based font maps. Modifying it to support more formats shouldn't be much work.&lt;br /&gt;&lt;br /&gt;Grab a binary version here:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;a href="http://www.bitsquid.se/files/distance_field.exe.7z"&gt;distance_field.exe.7z&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;Or the C# project files here:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;a href="http://www.bitsquid.se/files/distance_field.7z"&gt;distance_field.7z&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;Feel free to do whatever you want with it!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1994130783874175266-373915833167946022?l=bitsquid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bitsquid.blogspot.com/feeds/373915833167946022/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bitsquid.blogspot.com/2010/04/distance-field-based-rendering-of.html#comment-form' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/373915833167946022'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/373915833167946022'/><link rel='alternate' type='text/html' href='http://bitsquid.blogspot.com/2010/04/distance-field-based-rendering-of.html' title='Distance Field Based Rendering of AngelCode Fonts'/><author><name>Niklas</name><uri>http://www.blogger.com/profile/10055379994557504977</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_D6mTIm8lbTo/S78XlV7oPoI/AAAAAAAAADc/i8KDUccPlNU/s72-c/distance_field_font.png' height='72' width='72'/><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1994130783874175266.post-4252046351274042270</id><published>2010-03-25T11:09:00.001+01:00</published><updated>2010-03-25T11:16:43.331+01:00</updated><title type='text'>Task Management -- A Practical Example</title><content type='html'>I've spent the last couple of days rewriting the task manager in the BitSquid engine. Task management is an important topic in our glorious multicore future, but it is hard to find good practical information about it. GDC was also a bit of a disappointment in this regard. So I thought I should share some of my thoughts and experiences.&lt;br /&gt;&lt;br /&gt;The previous iteration of our task scheduler was based on Vista Thread Pools and mainly supported data parallelism. (Though we still had a degree of task parallelism from running two main threads -- an update thread and a render thread -- which both posted batches of jobs to the task manager.)&lt;br /&gt;&lt;br /&gt;For the rewrite, I had a number of goals:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Move away from Vista Thread Pools. We want complete control over our job threads.&lt;/li&gt;&lt;li&gt;Minimize context switching. This is a guessing game on Windows, since the OS will do what the OS will do, but minimizing oversubscription of threads should help.&lt;/li&gt;&lt;li&gt;Make a system that can run completely task based. I.e., everything in the system is run as a task and there are no explicit &lt;i&gt;wait()&lt;/i&gt; calls. Instead the entire code flow is controlled by task dependencies. Such a design allows us to exploit all possibilities for parallelism in the code, which leads to maximum core utilization. 
&lt;/li&gt;&lt;li&gt;Still be "backwards compatible" with a system that uses one or more "main threads" that &lt;i&gt;wait()&lt;/i&gt; for data parallel jobs to complete, so that we can move incrementally to a more and more task based code flow.&lt;/li&gt;&lt;li&gt;Support tasks that run on external processors, such as SPUs or GPUs.&lt;/li&gt;&lt;li&gt;Support hierarchical decomposition of tasks.&lt;/li&gt;&lt;/ul&gt;&lt;div&gt;By hierarchical decomposition I mean that it should be possible to analyze the system in terms of tasks and subtasks, so that, at a higher level, we can regard the animation system as a single task that runs in parallel to other system tasks:&lt;/div&gt;&lt;div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/_D6mTIm8lbTo/S6sjhNu0e-I/AAAAAAAAACk/Lgc4Oi05FL4/s1600/hierarchy1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="172" src="http://2.bp.blogspot.com/_D6mTIm8lbTo/S6sjhNu0e-I/AAAAAAAAACk/Lgc4Oi05FL4/s400/hierarchy1.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;But then we can zoom in on the animation task and see that it is in fact composed of a number of subtasks, which in turn parallelize:&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/_D6mTIm8lbTo/S6sjoquh6JI/AAAAAAAAACs/eTthcGJj4jw/s1600/hierarchy2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="171" src="http://1.bp.blogspot.com/_D6mTIm8lbTo/S6sjoquh6JI/AAAAAAAAACs/eTthcGJj4jw/s400/hierarchy2.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Hierarchical decomposition makes it possible to analyze systems and subsystems at different levels of abstraction rather than having to keep the entire task dependency graph in our heads. 
This is good because my head just isn't big enough.&lt;br /&gt;&lt;br /&gt;A task in the new implementation is a simple data structure:&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/_D6mTIm8lbTo/S6smCXVjGmI/AAAAAAAAAC0/1RAOc0LxfcI/s1600/task.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://2.bp.blogspot.com/_D6mTIm8lbTo/S6smCXVjGmI/AAAAAAAAAC0/1RAOc0LxfcI/s320/task.png" /&gt;&lt;/a&gt;&lt;/div&gt;Here &lt;b&gt;work&lt;/b&gt; is a work item to be performed on an SPU, CPU or GPU. &lt;b&gt;affinity&lt;/b&gt; can be set for items that must be performed on particular threads.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;parent&lt;/b&gt; specifies child/parent relationships between tasks. A task can have any number of children/subtasks. A task is considered &lt;i&gt;completed&lt;/i&gt; when its work has been executed and all its children have completed. In practice this is implemented by the &lt;b&gt;open_work_items&lt;/b&gt;&amp;nbsp;counter. The counter is initially set to the number of child tasks + 1 (for the task's own work item).&amp;nbsp;When a task completes, it decrements the &lt;b&gt;open_work_items&lt;/b&gt; count of its parent, and when that count reaches zero, the parent is completed.&lt;br /&gt;&lt;br /&gt;I do not explicitly track completed tasks. Instead I keep a list of all open (i.e. not completed) tasks. Any task that is not in the open list is considered completed. Note that the open list is separate from the queue of work items that need to be performed. Items are removed from the queue when they are scheduled to a worker thread and removed from the open list when they have completed.&lt;br /&gt;&lt;br /&gt;The &lt;b&gt;dependency&lt;/b&gt; field specifies a task that the task depends on. The task is not allowed to start until its dependency task has completed. Note that a task can only have a single dependency. 
The reason for this is that I wanted the task structure to be a simple POD type and not include any arrays or other external memory references.&lt;br /&gt;&lt;br /&gt;Having a single dependency is not a limitation, because if we want to depend on more than one task we can just introduce an anonymous task with no work item that has all the tasks we want to depend on as children. That task will complete when all its children have completed, so depending on that task gives us the dependencies we want.&lt;br /&gt;&lt;br /&gt;The &lt;b&gt;priority&lt;/b&gt; field specifies the importance of the task. When several tasks are available, we will pick the one with the highest priority. I will discuss this a bit more in a minute.&lt;br /&gt;&lt;br /&gt;The &lt;i&gt;Task Manager&lt;/i&gt; has a number of threads for processing tasks. Some of these are "main threads" that are created by other parts of the system and registered with the thread manager (in our case, an &lt;i&gt;update&lt;/i&gt; thread and a &lt;i&gt;render&lt;/i&gt; thread). The rest are worker threads created internally by the task manager. The number of worker threads is:&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;i&gt;worker_thread_count = number_of_cores - main_thread_count&lt;/i&gt;&lt;/div&gt;&lt;br /&gt;The total number of threads managed by the task manager thus equals the number of cores in the system, so we have no over- or undersubscription.&lt;br /&gt;&lt;br /&gt;The worker threads are in a constant loop where they check the task manager for work items to perform. If a work item is available, they perform it and then notify the task manager of its completion. If no work items are available, they sleep and are woken by the task manager when new work items become available.&lt;br /&gt;&lt;br /&gt;The main threads run their normal serial code path. As part of that code path, they can create tasks and subtasks that get queued with the task manager. 
They can also &lt;i&gt;wait()&lt;/i&gt; for tasks to complete. When a thread waits for a task it doesn't go idle. Instead it loops and helps the task manager with completing tasks. Only when there are no more tasks in the queue does the thread sleep. It wakes up again when there are more tasks to perform or when the task it originally waited for has completed.&lt;br /&gt;&lt;br /&gt;The main threads can also process tasks while waiting for other events by calling a special function in the task manager, &lt;i&gt;do_work_while_waiting_for(Event &amp;amp;)&lt;/i&gt;. For example, the update thread calls this to wait for the frame synchronization event from the render thread.&lt;br /&gt;&lt;br /&gt;This means that all task manager threads are either running their serial code paths or processing jobs -- as long as there are jobs to perform and they don't get preempted by the OS. So as long as we have lots of jobs and few sync points, we should get close to 100 % core utilization.&lt;br /&gt;&lt;br /&gt;This design also allows us to freely mix serial code with a completely task based approach. 
We can start out with a serial main loop (with data parallelization in the &lt;i&gt;update()&lt;/i&gt;&amp;nbsp;functions):&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;void World::update()&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;{&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;span class="Apple-style-span" style="white-space: pre;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&amp;nbsp; &lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;_animation-&amp;gt;update();&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;span class="Apple-style-span" style="white-space: pre;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&amp;nbsp; &lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;_scene_graph-&amp;gt;update();&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;span class="Apple-style-span" style="white-space: pre;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&amp;nbsp; &lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;_gui-&amp;gt;update();&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;span class="Apple-style-span" style="white-space: pre;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&amp;nbsp; &lt;/span&gt;&lt;/span&gt;&lt;span 
class="Apple-style-span" style="font-size: small;"&gt;render();&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;span class="Apple-style-span" style="white-space: pre;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&amp;nbsp; &lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;_sound-&amp;gt;update();&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;}&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;And gradually convert it to fully braided parallelism (this code corresponds to the task graph shown above):&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;void&amp;nbsp;World::update()&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;{&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&amp;nbsp;&amp;nbsp;TaskId animation = _tasks-&amp;gt;add( animation_task(_animation) );&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&amp;nbsp;&amp;nbsp;TaskId scene_graph = _tasks-&amp;gt;add( scene_graph_task(_scene_graph) );&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;span class="Apple-style-span" style="font-size: 
small;"&gt;&amp;nbsp;&amp;nbsp;_tasks-&amp;gt;depends_on(scene_graph, animation);&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&amp;nbsp;&amp;nbsp;TaskId gui = _tasks-&amp;gt;add( gui_task(_gui) );&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&amp;nbsp;&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&amp;nbsp;&amp;nbsp;TaskId gui_scene = _tasks-&amp;gt;add_empty();&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&amp;nbsp;&amp;nbsp;_tasks-&amp;gt;add_child(gui_scene, scene_graph);&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&amp;nbsp;&amp;nbsp;_tasks-&amp;gt;add_child(gui_scene, gui);&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&amp;nbsp;&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&amp;nbsp;&amp;nbsp;TaskId render = _tasks-&amp;gt;add( render_task(this) );&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;span class="Apple-style-span" 
style="font-size: small;"&gt;&amp;nbsp;&amp;nbsp;_tasks-&amp;gt;depends_on(render, gui_scene);&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&amp;nbsp;&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&amp;nbsp;&amp;nbsp;TaskId sound = _tasks-&amp;gt;add( sound_update_task(_sound) );&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&amp;nbsp;&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&amp;nbsp;&amp;nbsp;TaskId done = _tasks-&amp;gt;add_empty();&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&amp;nbsp;&amp;nbsp;_tasks-&amp;gt;add_child(done, render);&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&amp;nbsp;&amp;nbsp;_tasks-&amp;gt;add_child(done, sound);&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&amp;nbsp;&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;span class="Apple-style-span" style="font-size: 
small;"&gt;&amp;nbsp;&amp;nbsp;_tasks-&amp;gt;wait(done);&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;}&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Note that tasks, subtasks and dependencies are created dynamically as part of the execution of serial code or other tasks. I believe this "immediate mode" approach is more flexible and easier to work with than some sort of "retained" or "static" task graph building.&lt;br /&gt;&lt;br /&gt;A screenshot from our profiler shows this in action for a scene with 1000 animated characters with state machines:&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/_D6mTIm8lbTo/S6ssDX4NcGI/AAAAAAAAAC8/L9IwV7byzek/s1600/commented_profiler.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="313" src="http://4.bp.blogspot.com/_D6mTIm8lbTo/S6ssDX4NcGI/AAAAAAAAAC8/L9IwV7byzek/s640/commented_profiler.png" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;Notice how the main and render threads help with processing tasks while they are waiting for tasks to be completed.&lt;br /&gt;&lt;br /&gt;Once we have a task graph we want to make sure that our scheduler runs it as fast as possible. Theoretically, we would do this by finding the critical path of the graph and making sure that tasks along the critical path are prioritized over other tasks. It's the classical &lt;i&gt;task scheduling problem&lt;/i&gt;.&lt;br /&gt;&lt;br /&gt;In a game, the critical path can vary a lot over different scenes. Some scenes are render bound, others are CPU bound. 
Of the CPU bound scenes, some may be bound by script, others by animation, etc.&lt;br /&gt;&lt;br /&gt;To achieve maximum performance in all situations we would have to dynamically determine the critical path and prioritize the tasks accordingly. This is certainly feasible, but I am a bit wary of dynamically reconfiguring the priorities in this way, because it makes the engine harder to profile, debug and reason about. Instead I have chosen a simpler solution for now. Each job is given a priority and the highest priority jobs are performed first. The priorities are not fixed by the engine but configured per-game to match its typical performance loads.&lt;br /&gt;&lt;br /&gt;This seems like a reasonable first approach. When we have more actual game performance data it would be interesting to compare this with the performance of a completely dynamic scheduler.&lt;br /&gt;&lt;br /&gt;In the current implementation, all tasks are posted to and fetched from a global task queue. There are no per-thread task queues and thus no task stealing. At our current level of task granularity (heavy jobs are split into a maximum of &lt;i&gt;5 * thread_count&lt;/i&gt; tasks) the global task queue should not be a bottleneck. And a finer task granularity won't improve core utilization. When we start to have &amp;gt;32 cores the impact of the global queue may start to become significant, but until then I'd rather keep the system as simple as possible.&lt;br /&gt;&lt;br /&gt;OS context switching still hits us occasionally in this system. 
For example, one of the animation blending tasks in the profiler screenshot takes longer than it should:&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/_D6mTIm8lbTo/S6ssDX4NcGI/AAAAAAAAAC8/L9IwV7byzek/s1600/commented_profiler.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="314" src="http://4.bp.blogspot.com/_D6mTIm8lbTo/S6ssDX4NcGI/AAAAAAAAAC8/L9IwV7byzek/s640/commented_profiler.png" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;I have an idea for minimizing the impact of such context switches that I may try out in the future. If a task is purely functional (idempotent), then it doesn't matter how many times we run the task. So if we detect a situation where a large part of the system is waiting for a task on the critical path (that has been switched out by the OS), we can allocate other threads to run the same task. As soon as &lt;i&gt;any&lt;/i&gt;&amp;nbsp;of the threads has completed the task we can continue.&lt;br /&gt;&lt;br /&gt;I haven't implemented this because it complicates the model by introducing two different completion states for tasks: one where &lt;i&gt;some&lt;/i&gt; thread has completed the task (and dependent jobs can run) and another where &lt;i&gt;all&lt;/i&gt; threads that took on the task have completed it (and buffers allocated for the task can be freed). 
Also, context switching is mainly a problem on PC which isn't our most CPU constrained platform anyway.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1994130783874175266-4252046351274042270?l=bitsquid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bitsquid.blogspot.com/feeds/4252046351274042270/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bitsquid.blogspot.com/2010/03/task-management-practical-example.html#comment-form' title='24 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/4252046351274042270'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/4252046351274042270'/><link rel='alternate' type='text/html' href='http://bitsquid.blogspot.com/2010/03/task-management-practical-example.html' title='Task Management -- A Practical Example'/><author><name>Niklas</name><uri>http://www.blogger.com/profile/10055379994557504977</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_D6mTIm8lbTo/S6sjhNu0e-I/AAAAAAAAACk/Lgc4Oi05FL4/s72-c/hierarchy1.png' height='72' width='72'/><thr:total>24</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1994130783874175266.post-1917076289077024275</id><published>2010-02-12T11:58:00.000+01:00</published><updated>2010-02-12T11:58:13.907+01:00</updated><title type='text'>The Blob and I</title><content type='html'>Having resource data in a single binary blob has many advantages over keeping it in a collection of scattered objects:&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: 
center;"&gt;&lt;a href="http://1.bp.blogspot.com/_D6mTIm8lbTo/S3Ul9ZhgzJI/AAAAAAAAABA/e1Zjbyf_XG0/s1600-h/blob.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="176" src="http://1.bp.blogspot.com/_D6mTIm8lbTo/S3Ul9ZhgzJI/AAAAAAAAABA/e1Zjbyf_XG0/s400/blob.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;ul&gt;&lt;li&gt;&lt;i&gt;Shorter load times.&lt;/i&gt; We can just stream the entire blob from disk to memory.&lt;/li&gt;&lt;li&gt;&lt;i&gt;Cache friendly.&lt;/i&gt; Related objects are at close locations in memory.&lt;/li&gt;&lt;li&gt;&lt;i&gt;DMA friendly.&lt;/i&gt; An entire blob can easily be transferred to a co-processor.&lt;/li&gt;&lt;/ul&gt;&lt;div&gt;In past engines I've used placement &lt;i&gt;new&lt;/i&gt; and pointer patching to initialize C++ objects from a loaded blob. To save a resource with this system all the objects are allocated after each other in memory, then their pointers are converted to local pointers (offsets from the start of the blob). Finally all the allocated data is written raw to disk.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;When loading, first the raw data blob is loaded from disk. Then placement &lt;i&gt;new&lt;/i&gt;&amp;nbsp;is used with a special constructor to create the root object at the start of the blob. The constructor takes care of pointer-patching, converting the offsets back to pointers. 
Let's look at an example:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;class A&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;{&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;span class="Apple-style-span" style="white-space: pre;"&gt;&amp;nbsp; &lt;/span&gt;int _x;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;span class="Apple-style-span" style="white-space: pre;"&gt;&amp;nbsp; &lt;/span&gt;B *_b;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;public:&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;span class="Apple-style-span" style="white-space: pre;"&gt;&amp;nbsp; &lt;/span&gt;A(int x, B *b) : _x(x), _b(b) {}&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;span class="Apple-style-span" style="white-space: pre;"&gt;&amp;nbsp; &lt;/span&gt;A(char* base) {&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;span class="Apple-style-span" style="white-space: pre;"&gt;&amp;nbsp;   &lt;/span&gt;_b = (B*)( (char *)_b + (base - (char *)0) );&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;span class="Apple-style-span" style="white-space: pre;"&gt;&amp;nbsp;   &lt;/span&gt;new 
(_b) B(base);&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;span class="Apple-style-span" style="white-space: pre;"&gt;&amp;nbsp; &lt;/span&gt;}&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;span class="Apple-style-span" style="white-space: pre;"&gt;&amp;nbsp; &lt;/span&gt;...&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;};&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;...&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;A *a = new (blob) A(blob);&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Note that the constructor does not initialize &lt;i&gt;_x&lt;/i&gt;. &lt;i&gt;a&lt;/i&gt; is placement &lt;i&gt;new&lt;/i&gt;:ed into an area that already contains an &lt;i&gt;A&lt;/i&gt; object with the right value for &lt;i&gt;_x&lt;/i&gt; (the saved value). By not initializing &lt;i&gt;_x&lt;/i&gt; we make sure that it keeps its saved value. The constructor does three things:&lt;/div&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;Initializes the vtable pointer of &lt;i&gt;a&lt;/i&gt;. This is done "behind the scenes" by C++ when we call &lt;i&gt;new&lt;/i&gt;. 
It is necessary for us to be able to use &lt;i&gt;a&lt;/i&gt; as an &lt;i&gt;A&lt;/i&gt; object, since the vtable pointer of &lt;i&gt;A&lt;/i&gt; saved in the file during data compilation will typically not match the vtable pointer of &lt;i&gt;A&lt;/i&gt; in the runtime.&lt;/li&gt;&lt;li&gt;Pointer patches &lt;i&gt;_b&lt;/i&gt;, converting it from an offset from the blob base to its actual memory location.&lt;/li&gt;&lt;li&gt;Placement &lt;i&gt;new&lt;/i&gt;:s &lt;i&gt;B&lt;/i&gt; into place so that &lt;i&gt;B&lt;/i&gt; also gets the correct vtable, patched pointers, etc. Of course &lt;i&gt;B&lt;/i&gt;'s constructor may in turn create other objects.&lt;/li&gt;&lt;/ul&gt;&lt;div&gt;Like many "clever" C++ constructs this solution gives a smug sense of satisfaction. Imagine that we are able to do this using our knowledge of vtables, placement &lt;i&gt;new&lt;/i&gt;, etc. Truly, we are Gods that walk the earth!&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Of course it doesn't stay this simple. For the solution to be complete it must also be able to handle base class pointers (call a different &lt;i&gt;new&lt;/i&gt; based on the "real" derived class of the object, which must be stored somewhere), arrays and collection classes (we can't use &lt;i&gt;std::vector&lt;/i&gt;, etc because they don't fit into our clever little scheme).&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Lately, I've really come to dislike these kinds of C++ "framework" solutions that require that every single class in a project conform to a particular world view (implement a special constructor, a special &lt;i&gt;save()&lt;/i&gt; function, etc). It tends to make the code very coupled and rigid. God forbid you ever had to change anything in the serialization system, because now the entire &lt;i&gt;WORLD&lt;/i&gt; depends on it. The special little placement constructors creep in everywhere and pollute a lot of classes that don't really want to care about serialization. 
This makes the entire code base complicated and ugly.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Also, it should be noted that naively "blobbing" a collection of scattered objects by just concatenating them in memory does not necessarily lead to optimal memory access patterns. If the memory access order does not match the serialization order there can still be a lot of jumping around in memory. The serialization order with this kind of solution tends to be depth-first and can be tricky to change. (Since the entire &lt;i&gt;WORLD&lt;/i&gt; depends on the serialization system!)&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;In the BitSquid engine I use a much simpler approach to resource blobs. The BitSquid engine is &lt;i&gt;data-centric&lt;/i&gt;&amp;nbsp;rather than &lt;i&gt;class-centric&lt;/i&gt;. The data design is done first -- laid out in simple structs, optimized for the typical access patterns and DMA transfers. Then functions are defined that operate on the data. Classes are used to organize higher level systems, not in the low level processing intensive systems or resource definitions. Inheritance is very rarely used. (Virtual function calls are always cache unfriendly since they resolve to different code locations for each object. It is better to keep objects sorted by type and then you don't really need virtual calls.)&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I believe this "old-school" C-like approach not only gives better performance, but also in many cases a better design. A looser coupling between data and processing makes it easier to modify things and move them around. And deep, bad inheritance structures are the main source of unnecessary coupling in C++ programs.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Since the resource data is just simple structs, not classes with virtual functions, we can just write it to disk and read it back as we please. 
We don't need to initialize any vtable pointers, so we don't need to call &lt;i&gt;new&lt;/i&gt; on the data.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;The problem with pointer patching is solved in the simplest way possible -- I don't use pointers in the resource data. Instead, I just use offsets all the time, both in memory and on disk.&amp;nbsp;For example, the resource data for our particle systems looks something like this (simplified):&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/_D6mTIm8lbTo/S3UqZ2pZlVI/AAAAAAAAABI/qNpF8B_0ARA/s1600-h/resource_layout.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="102" src="http://4.bp.blogspot.com/_D6mTIm8lbTo/S3UqZ2pZlVI/AAAAAAAAABI/qNpF8B_0ARA/s400/resource_layout.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Yes, having offsets in the resource data instead of pointers means that I occasionally need to do a pointer add to find the memory location of an object. I'm sure someone will balk at this "unnecessary" computation, but I can't see it having any significant performance impact whatsoever. (If you have to do it a lot, then you are jumping around in memory a lot and then &lt;i&gt;that&lt;/i&gt; is the main source of your performance problem.)&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The advantage is that since I'm only storing offsets I don't need to do any pointer patching at all. I can move the data around in memory as I like, make copies of it, concatenate it to other blobs to make bigger blobs, save it to disk and read it back with a single operation and no need for pre- or post-processing. There is no complicated "serialization framework". 
No system in the engine needs to care about how any other system stores or reads its data.&lt;br /&gt;&lt;br /&gt;As in many other cases, the data-centric approach gives a solution that is simpler, faster, more flexible and more modular.&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1994130783874175266-1917076289077024275?l=bitsquid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bitsquid.blogspot.com/feeds/1917076289077024275/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bitsquid.blogspot.com/2010/02/blob-and-i.html#comment-form' title='14 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/1917076289077024275'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/1917076289077024275'/><link rel='alternate' type='text/html' href='http://bitsquid.blogspot.com/2010/02/blob-and-i.html' title='The Blob and I'/><author><name>Niklas</name><uri>http://www.blogger.com/profile/10055379994557504977</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_D6mTIm8lbTo/S3Ul9ZhgzJI/AAAAAAAAABA/e1Zjbyf_XG0/s72-c/blob.png' height='72' width='72'/><thr:total>14</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1994130783874175266.post-2223199635203174756</id><published>2010-01-19T18:33:00.000+01:00</published><updated>2010-01-19T18:33:00.679+01:00</updated><title type='text'>Content Repositories and Databases</title><content type='html'>I've been toying with the idea of replacing game content repositories (Perforce, 
Subversion) with something else. After all, nobody really likes content repositories -- they are slow, non-intuitive, give rise to merge problems, etc.&amp;nbsp;Version control systems were primarily designed for code, not for content, and that shows. So what could replace them? One option is to use a central database. There are a number of superficial advantages to that approach:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Simpler -- no need to update or check-in.&lt;/li&gt;&lt;li&gt;Changes are immediately visible to everyone.&lt;/li&gt;&lt;li&gt;No merge issues.&lt;/li&gt;&lt;li&gt;Collaborative editing (several designers working on the same level) is possible.&lt;/li&gt;&lt;/ul&gt;&lt;div&gt;But we would lose all the nice features of version control:&lt;/div&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;Accountability, history tracking and reversion.&lt;/li&gt;&lt;li&gt;Branching and tagging.&lt;/li&gt;&lt;li&gt;Having local, uncommitted changes in a working copy.&lt;/li&gt;&lt;/ul&gt;&lt;div&gt;How necessary are those features? I would say that they are essential. But I also have a small nagging doubt that maybe this opinion is just the result of my own prejudices as a programmer. After all, people in many industries do lots of serious collaborative work using databases without branching, reversion or working copies. Still, I'm not ready to take the plunge and give up on version control features. (Though if anyone has tried it, I would certainly like to hear about it.)&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Having those features by necessity implies some of the complexities&amp;nbsp;associated&amp;nbsp;with version control. For example, if we want a local working copy we need some explicit check-in/update mechanism. 
If we don't need a local copy we can just make the editor do &lt;i&gt;svn update, svn commit&lt;/i&gt;&amp;nbsp;on each change and the repository will be as "immediate" as a database.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Collaborative editing depends more on how the editor is implemented than on the storage backend. Regardless of&amp;nbsp;whether&amp;nbsp;we are using a database or a repository the editor will at some point have to fetch and display the changes made by other users as well as submit the changes made by the local user. With a repository backend, &lt;i&gt;svn update &lt;/i&gt;and &lt;i&gt;svn commit&lt;/i&gt;&amp;nbsp;could be used for that purpose.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The only issue then is to avoid merge conflicts as much as possible, since they force the user to interact with the &lt;i&gt;svn update&lt;/i&gt;&amp;nbsp;command and ruin the collaborative editing experience. Fortunately, that should be relatively easy. At BitSquid, we store most of our data in JSON-like structures. With a JSON-aware 3-way-merger, conflicts will only arise if the same field in the same JSON-object is changed, which should happen rarely.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;So, no great new way of storing content. Instead I just have to write a 3-way JSON-merger to protect the content people from merge conflicts. 
And then start working on the collaborative level editor...&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1994130783874175266-2223199635203174756?l=bitsquid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bitsquid.blogspot.com/feeds/2223199635203174756/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bitsquid.blogspot.com/2010/01/content-repositories-and-databases.html#comment-form' title='9 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/2223199635203174756'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/2223199635203174756'/><link rel='alternate' type='text/html' href='http://bitsquid.blogspot.com/2010/01/content-repositories-and-databases.html' title='Content Repositories and Databases'/><author><name>Niklas</name><uri>http://www.blogger.com/profile/10055379994557504977</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>9</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1994130783874175266.post-2031990049229035750</id><published>2009-12-11T10:43:00.000+01:00</published><updated>2009-12-11T10:43:42.966+01:00</updated><title type='text'>Events</title><content type='html'>An event system can be both useful and dangerous. 
Useful, because it allows you to create loose couplings between systems in the engine (an &lt;i&gt;animation&lt;/i&gt;&amp;nbsp;foot step generates a &lt;i&gt;sound&lt;/i&gt;), which makes a more modular design possible and prevents different systems from polluting each other's interfaces.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Dangerous, because the loose coupling can sometimes hide the logical flow of the application and make it harder to understand, by obliterating call stacks and adding confusing layers of indirection. This is especially true the more "features" are added to the event system. For example, a typical nightmare event system could consist of:&lt;/div&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;A global EventDispatcher singleton where everyone can post events, and everyone can listen to events, provided they (multiply) inherit from the EventPublisher and EventSubscriber interface classes.&lt;/li&gt;&lt;li&gt;Multiple listeners per event with a priority order and an option for a listener to say that it has fully processed an event and that it shouldn't be sent to the other listeners.&lt;/li&gt;&lt;li&gt;An option for posting delayed events, that should be delivered "in the future".&lt;/li&gt;&lt;li&gt;The possibility to block all events of a certain type during the processing of an event.&lt;/li&gt;&lt;li&gt;Additional horrors...&lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;&lt;div&gt;So much is wrong here: Global objects with too much responsibility that everything needs to tie into. Forcing all classes into a heavy-handed inheritance structure (no, I don't want all my objects to inherit EventPublisher, EventDispatcher, Serializable, GameObject, etc). Strange control flow affecting commands providing spooky "action at a distance" (who blocked my event this time?).&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Instead, I believe that the key to a successful event system is to make it as simple and straightforward as possible. 
You really don't need the "advanced" and "powerful" features. Such complex functionality should be implemented in high-level C or script code, where it can be properly examined, debugged, analyzed, etc. Not in a low level event manager.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Note also that &lt;i&gt;callbacks/delegates&lt;/i&gt;&amp;nbsp;cannot completely replace events. While an event will probably generate some kind of callback as the final stage of its processing, we also need to be able to represent the event as an encapsulated data object. That is the only way to store it in a list for example. It is also the only way to pass it from one processing thread to another, which is crucial for a multithreaded engine.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;So, with this background, let's look at how events are treated in the BitSquid engine. In the BitSquid engine an event is just a struct:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;code&gt;&lt;/code&gt;&lt;/div&gt;&lt;code&gt;&lt;div&gt;struct CollisionEvent&lt;/div&gt;&lt;div&gt;{&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;Actor *actors[2];&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;Vector3 where;&lt;/div&gt;&lt;div&gt;};&lt;/div&gt;&lt;/code&gt;&lt;div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;An &lt;i&gt;event stream&amp;nbsp;&lt;/i&gt;is a blob of binary data consisting of concatenated event structs. Each event struct in the blob is preceded by a header that specifies the event type (an integer uniquely identifying the event) and the size of the event struct:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;code&gt;&lt;/code&gt;&lt;/div&gt;&lt;code&gt;&lt;div&gt;[header 1][event 1][header 2][event 2] ... 
[header n][event n]&lt;/div&gt;&lt;/code&gt;&lt;div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Since the size of each event is included, an &lt;i&gt;event consumer&lt;/i&gt; that processes an event stream can simply skip over the events it doesn't understand or isn't interested in.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;There is no global event dispatcher in the engine (globals are bad). Instead each system that can generate events produces its own event stream. So, each frame the physics system (for instance) generates a stream of physics events. A higher level system can extract the event stream and consume the events, taking appropriate actions for each event.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;For example, the world manager connects physics events to script callbacks. It consumes the event list from the physics subsystem. For each event, it checks if the involved entity has a script callback mapped for the event type. If it has, the world manager converts the event struct to a Lua table and calls the callback. Otherwise, the event is skipped.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;In this way we get the full flexibility and loose coupling of an event system without any of the drawbacks of traditional heavy-weight event systems. The system is completely modular (no global queues or dispatchers) and thread friendly (each thread can produce its own event stream and events can be posted to different threads for processing). 
It is also very fast, since event streams are just cache-friendly blobs of data that are processed linearly.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1994130783874175266-2031990049229035750?l=bitsquid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bitsquid.blogspot.com/feeds/2031990049229035750/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bitsquid.blogspot.com/2009/12/events.html#comment-form' title='7 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/2031990049229035750'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/2031990049229035750'/><link rel='alternate' type='text/html' href='http://bitsquid.blogspot.com/2009/12/events.html' title='Events'/><author><name>Niklas</name><uri>http://www.blogger.com/profile/10055379994557504977</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>7</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1994130783874175266.post-263747545961142963</id><published>2009-11-20T10:54:00.001+01:00</published><updated>2009-11-20T10:55:54.990+01:00</updated><title type='text'>The BitSquid low level animation system</title><content type='html'>In the BitSquid engine we distinguish between the low level and the high level animation system. The low level system has a simple task: given animation data, find the bone poses at a time &lt;i&gt;t&lt;/i&gt;. 
The high level system is responsible for blending animations, state machines, IK, etc.&lt;br /&gt;&lt;br /&gt;Evaluation of animation data is a memory intensive task, so maximizing performance means:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Touch as little memory as possible (i.e., compress the animations as much as possible)&lt;/li&gt;&lt;li&gt;Touch memory in a cache friendly way (i.e., linearly)&lt;/li&gt;&lt;/ul&gt;&lt;div&gt;In the BitSquid engine we do animation compression by curve fitting and data quantization.&lt;br /&gt;&lt;br /&gt;There are a lot of different possible ways to do curve fitting. Since we are curve fitting for compression it doesn't really matter what method we use as long as (a) we can keep the error below a specified threshold, (b) the curve representation is small (good compression rate), (c) the curve is reasonably smooth and (d) it does not take too long to evaluate.&lt;br /&gt;&lt;br /&gt;In the BitSquid engine we currently use a Hermite spline with implicitly computed derivatives. I.e., we represent the curve with time and data points: (t_1, D_1), (t_2, D_2), ..., (t_n, D_n) and evaluate the curve at the time T in the interval t_i ... t_i+1, with t = (T - t_i) / (t_i+1 - t_i) by&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/_D6mTIm8lbTo/SwZnkNRCGwI/AAAAAAAAAA0/sGsng4QfTlg/s1600/CodeCogsEqn.gif" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://3.bp.blogspot.com/_D6mTIm8lbTo/SwZnkNRCGwI/AAAAAAAAAA0/sGsng4QfTlg/s640/CodeCogsEqn.gif" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;This formulation gives pretty good compression rates, but I haven't investigated all the possible alternatives (there are a lot!). It is possible that you could achieve better rates with some other curve. 
An advantage of this formulation is that it only uses the original data points of the curve and scaling constants in the range 0-1, which makes it easy to understand &amp;nbsp;the effects of quantization.&lt;br /&gt;&lt;br /&gt;To do the curve fitting we just check the error in all curve intervals, find the interval D_i D_i+1 with the largest error and split it in half by introducing a new data point at (t_i + t_i+1)/2. We repeat this until the error in all intervals is below a specified threshold value. Again, it is possible that more careful selection of split points could give slightly better compression rates, but we haven't bothered. Note also that we can support curve&amp;nbsp;discontinuities&amp;nbsp;by just inserting two different data points for the same time point.&lt;br /&gt;&lt;br /&gt;Animation compression can be done either in local space or in global space. The advantage of keeping the animations in global space is that there is no error propagation through the bone hierarchy, which means that you can use larger error thresholds when compressing the animations. On the other hand, the movement of a bone in global space is typically more complicated. (For a closed fist on a moving arm, the fingers will have no movement in local space, but a lot of movement in global space.) Since a more complicated movement is harder to compress, it might be that the global representation is more expensive, even though you can use a higher threshold. (I haven't actually tried this and compared - so much to do, so little time.)&lt;br /&gt;&lt;br /&gt;Also, if you are going to do any animation blending you will probably want to translate back to local space anyhow (unless you blend in global space). 
For this reason, the BitSquid engine does the compression in local space.&lt;br /&gt;&lt;br /&gt;For Vector3 quantization we use 16 bits per component and the range -10 m to 10 m which gives a resolution of 0.3 mm.&lt;br /&gt;&lt;br /&gt;For quaternions we use 2 bits to store the index of the largest component, then 10 bits each to store the value of the remaining three components. We use the knowledge that 1 = x^2 + y^2 + z^2 + w^2 to restore the largest component, so we don't actually have to store its value. Since we don't store the largest component we know that the remaining ones must be in the range (-1/sqrt(2), 1/sqrt(2)) (otherwise, one of them would be largest). So we use the 10 bits to quantize a value in that range, giving us a precision of 0.0014.&lt;br /&gt;&lt;br /&gt;So, to summarize, that gives us 48 bits per Vector3 curve point and 32 bits per quaternion curve point, plus 16 bits for the time stamp. Now the only thing remaining is to package all these curve points for all the bones in a cache friendly way. 
This will be the topic of another blog post, since this one is already long enough.&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1994130783874175266-263747545961142963?l=bitsquid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bitsquid.blogspot.com/feeds/263747545961142963/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bitsquid.blogspot.com/2009/11/bitsquid-low-level-animation-system.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/263747545961142963'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/263747545961142963'/><link rel='alternate' type='text/html' href='http://bitsquid.blogspot.com/2009/11/bitsquid-low-level-animation-system.html' title='The BitSquid low level animation system'/><author><name>Niklas</name><uri>http://www.blogger.com/profile/10055379994557504977</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_D6mTIm8lbTo/SwZnkNRCGwI/AAAAAAAAAA0/sGsng4QfTlg/s72-c/CodeCogsEqn.gif' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1994130783874175266.post-9201024779793183167</id><published>2009-10-23T19:09:00.001+02:00</published><updated>2009-10-26T09:31:44.986+01:00</updated><title type='text'>Picking a scripting language</title><content type='html'>&lt;span style="font-family: Arial; font-size: small;"&gt;&lt;span style="font-size: 13px;"&gt;&lt;span style="font-family: 'Times New Roman';"&gt;&lt;span style="font-size: 
medium;"&gt;We are planning to make the BitSquid engine largely scripting language agnostic. We will expose a generic scripting interface from the engine and it should be relatively easy to bind that to whatever scripting language you desire.&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Still, we have to pick some language to use for our own internal projects and recommend to others. I'm currently considering three candidates:&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;b&gt;C/C++&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Use regular C/C++ for scripting.&lt;/li&gt;&lt;li&gt;Run it dynamically either by recompiling and relinking DLLs or by running an x86 interpreter in the game engine and loading compiled libs directly.&lt;/li&gt;&lt;li&gt;&lt;span style="color: #274e13;"&gt;+ Static typing&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="color: #274e13;"&gt;+ Syntax checking &amp;amp; compiling can be done with an ordinary compiler&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="color: #274e13;"&gt;+ When releasing the game we can compile to machine code and get full native speed&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="color: red;"&gt;- C is not that nice for scripting&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="color: red;"&gt;- Huge performance differences between "fully compiled" and "interactive" code make it difficult for the gameplay programmers to do performance estimates.&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;div&gt;&lt;b&gt;Lua&lt;/b&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;Lua has the same feature set as Python and Ruby, but is smaller, more elegant and faster.&lt;/li&gt;&lt;li&gt;Other scripting languages such as Squirrel and AngelScript offer reference counting and static typing, but are not as well known / used&lt;/li&gt;&lt;li&gt;&lt;span style="color: #274e13;"&gt;+ Dynamic, elegant, small&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span 
style="color: #274e13;"&gt;+ Something of a standard as a game scripting language&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="color: #274e13;"&gt;+ LuaJIT is very fast&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="color: red;"&gt;- Non-native objects are forced to live on the heap&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="color: red;"&gt;- Garbage collection can be costly for a realtime app&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="color: red;"&gt;- Speed can be an issue compared to native code&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="color: red;"&gt;- Cannot use LuaJIT on consoles&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;div&gt;&lt;b&gt;Mono&lt;/b&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;Use the Mono runtime and write scripts in C#, Boo, etc.&lt;/li&gt;&lt;li&gt;&lt;span style="color: #274e13;"&gt;+ Static typing&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="color: #274e13;"&gt;+ Popular, fast&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="color: red;"&gt;- Huge, scary runtime&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="color: red;"&gt;- Garbage collection&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="color: red;"&gt;- Requires license to run on console&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="color: red;"&gt;- Can probably not JIT on console&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;&lt;/div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1994130783874175266-9201024779793183167?l=bitsquid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bitsquid.blogspot.com/feeds/9201024779793183167/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bitsquid.blogspot.com/2009/10/picking-scripting-language.html#comment-form' title='4 Comments'/><link rel='edit' type='application/atom+xml' 
href='http://www.blogger.com/feeds/1994130783874175266/posts/default/9201024779793183167'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/9201024779793183167'/><link rel='alternate' type='text/html' href='http://bitsquid.blogspot.com/2009/10/picking-scripting-language.html' title='Picking a scripting language'/><author><name>Niklas</name><uri>http://www.blogger.com/profile/10055379994557504977</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1994130783874175266.post-107956214303119623</id><published>2009-10-23T17:15:00.000+02:00</published><updated>2009-10-23T17:15:02.070+02:00</updated><title type='text'>First profiler screenshot</title><content type='html'>&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/_D6mTIm8lbTo/SuHFlPzENiI/AAAAAAAAAAM/cGj9ZMcV62M/s1600-h/profiler.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://4.bp.blogspot.com/_D6mTIm8lbTo/SuHFlPzENiI/AAAAAAAAAAM/cGj9ZMcV62M/s640/profiler.jpg" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;We now have the BitSquid thread profiler up and running. The profiler is a C# application that receives profiler events from the engine over a TCP pipe.&lt;br /&gt;&lt;br /&gt;The screen shot above shows a screen capture from a test scene with 1 000 individually animated 90-bone characters running on a four core machine. The black horizontal lines are the threads. The bars are profiler scopes. 
Multiple bars below each other represent nested scopes (so &lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;Application::update&lt;/span&gt; is calling &lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;MyGame::update&lt;/span&gt; for instance). Color represents the core that the scope started running on (we do not detect core switches within scopes).&lt;br /&gt;&lt;br /&gt;In the screen shot above, you can see &lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;AnimationPlayer::update&lt;/span&gt; starting up 10 &lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;animation_player_kernel&lt;/span&gt; jobs to evaluate the animations. Similarly &lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;SceneGraphManager::update&lt;/span&gt; runs five parallel jobs to update the scene graph. &lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;SceneGraphAnimators &lt;/span&gt;only copies the animation data from the animation output into the scene graphs. But even this takes some time, since we are copying 90 000 matrices.&lt;br /&gt;&lt;br /&gt;(Of course, if we were to make a crowd of 1 000 people in a game we would use clever instancing rather than run 1 000 animation and scene graph evaluations. 
This workload was just used to test the threading.)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1994130783874175266-107956214303119623?l=bitsquid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bitsquid.blogspot.com/feeds/107956214303119623/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bitsquid.blogspot.com/2009/10/first-profiler-screenshot.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/107956214303119623'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/107956214303119623'/><link rel='alternate' type='text/html' href='http://bitsquid.blogspot.com/2009/10/first-profiler-screenshot.html' title='First profiler screenshot'/><author><name>Niklas</name><uri>http://www.blogger.com/profile/10055379994557504977</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_D6mTIm8lbTo/SuHFlPzENiI/AAAAAAAAAAM/cGj9ZMcV62M/s72-c/profiler.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1994130783874175266.post-1769581464135765089</id><published>2009-10-14T23:00:00.010+02:00</published><updated>2009-10-15T09:45:25.736+02:00</updated><title type='text'>Parallel rendering</title><content type='html'>&lt;div&gt;&lt;div style="font: normal normal normal 12px/normal Helvetica; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-align: left;"&gt;I've spent the last week designing and implementing the low-level parts of the renderer used in our new engine. 
One of the key design principles of the engine is to go as wide / parallel as possible whenever possible. To be able to do that in a clean and efficient way a good data streaming model with minimal pointer chasing is key.&lt;br /&gt;&lt;/div&gt;&lt;div style="font: normal normal normal 12px/normal Helvetica; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; min-height: 14px;"&gt;&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="font: normal normal normal 12px/normal Helvetica; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;With the rendering I've tackled that by splitting the batch processing in three passes: batch gathering, merge-n-sort and display list building.&lt;br /&gt;&lt;/div&gt;&lt;div style="font: normal normal normal 12px/normal Helvetica; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; min-height: 14px;"&gt;&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="font: normal normal normal 12px/normal Helvetica; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;In the batch gathering pass we walk over the visible objects (objects that have survived visibility culling) and let them queue their draw calls to a RenderContext. A RenderContext is a platform independent package stream that holds all data needed for draw calls (and other render jobs/events/state changes etc). This step is easily divided into any number of jobs, by letting each job have its own RenderContext.&lt;br /&gt;&lt;/div&gt;&lt;div style="font: normal normal normal 12px/normal Helvetica; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; min-height: 14px;"&gt;&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="font: normal normal normal 12px/normal Helvetica; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;After the batch gathering is done we have all data needed to draw the scene in &lt;i&gt;n&lt;/i&gt; number of RenderContexts. 
The purpose of the merge-n-sort step is to take those RenderContexts and merge them into one while at the same time sorting all batches into the desired order (with respect to "layers", minimizing state changes, depth sorting etc).&lt;br /&gt;&lt;/div&gt;&lt;div style="font: normal normal normal 12px/normal Helvetica; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; min-height: 14px;"&gt;&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="font: normal normal normal 12px/normal Helvetica; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;We now have one sorted package stream containing all the draw calls that we can send off to the rendering back-end. At this point we can again go wide and build the display list in parallel. Here's a small sketch illustrating the data flow:&lt;br /&gt;&lt;/div&gt;&lt;div style="font: normal normal normal 12px/normal Helvetica; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/_79Mk3_H7bBw/StY8veXsdGI/AAAAAAAAAAY/SWRHI_12bNk/s1600-h/renderer_flow.png" onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}"&gt;&lt;img alt="" border="0" id="BLOGGER_PHOTO_ID_5392564390126711906" src="http://2.bp.blogspot.com/_79Mk3_H7bBw/StY8veXsdGI/AAAAAAAAAAY/SWRHI_12bNk/s400/renderer_flow.png" style="cursor: hand; cursor: pointer; height: 107px; width: 400px;" /&gt;&lt;/a&gt;&lt;a href="http://2.bp.blogspot.com/_79Mk3_H7bBw/StY8veXsdGI/AAAAAAAAAAY/SWRHI_12bNk/s1600-h/renderer_flow.png" onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}"&gt;&lt;br /&gt;&lt;/a&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="font: 12.0px Helvetica; margin: 0.0px 0.0px 0.0px 0.0px;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="font: 12.0px Helvetica; margin: 0.0px 0.0px 0.0px 0.0px;"&gt;Red sections belong to the platform independent renderer. 
Blue sections belong to the rendering back-end (in this illustration D3D11).&lt;br /&gt;&lt;/div&gt;&lt;div style="font: 12.0px Helvetica; margin: 0.0px 0.0px 0.0px 0.0px;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1994130783874175266-1769581464135765089?l=bitsquid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bitsquid.blogspot.com/feeds/1769581464135765089/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bitsquid.blogspot.com/2009/10/parallel-rendering.html#comment-form' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/1769581464135765089'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/1769581464135765089'/><link rel='alternate' type='text/html' href='http://bitsquid.blogspot.com/2009/10/parallel-rendering.html' title='Parallel rendering'/><author><name>Tobias</name><uri>http://www.blogger.com/profile/16240529312060411542</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_79Mk3_H7bBw/StY8veXsdGI/AAAAAAAAAAY/SWRHI_12bNk/s72-c/renderer_flow.png' height='72' width='72'/><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1994130783874175266.post-832556016572414563</id><published>2009-10-13T17:32:00.000+02:00</published><updated>2009-10-13T17:32:50.248+02:00</updated><title type='text'>Simplified JSON notation</title><content type='html'>JSON is human-editable, but not necessarily human-friendly. 
A typical JSON configuration file:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;span style="background-color: white;"&gt;{&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;span style="background-color: white;"&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;"ip" : "127.0.0.1",&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;span style="background-color: white;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; "port" : 666&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;span style="background-color: white;"&gt;}&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;A more Lua-inspired syntax is friendlier:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;ip = "127.0.0.1"&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;port = 666&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;This syntax corresponds 1-1 with regular JSON syntax and can be trivially converted back and forth with the following rules:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Assume an object definition at the root level (no need to surround entire file with { } ).&lt;/li&gt;&lt;li&gt;Commas are optional&lt;/li&gt;&lt;li&gt;Quotes around object keys are optional if the keys are valid identifiers&lt;/li&gt;&lt;li&gt;Replace : with =&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;On the other hand, all syntax wars are pointless and will only send us into an early grave.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1994130783874175266-832556016572414563?l=bitsquid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bitsquid.blogspot.com/feeds/832556016572414563/comments/default' title='Post Comments'/><link rel='replies' 
type='text/html' href='http://bitsquid.blogspot.com/2009/10/simplified-json-notation.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/832556016572414563'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/832556016572414563'/><link rel='alternate' type='text/html' href='http://bitsquid.blogspot.com/2009/10/simplified-json-notation.html' title='Simplified JSON notation'/><author><name>Niklas</name><uri>http://www.blogger.com/profile/10055379994557504977</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1994130783874175266.post-5920377066839021046</id><published>2009-10-13T17:22:00.000+02:00</published><updated>2009-10-13T17:22:35.425+02:00</updated><title type='text'>Multithreaded gameplay</title><content type='html'>&lt;span style="font-family: Arial; font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-size: 13px;"&gt;How do we multithread gameplay without driving gameplay programmers insane?&lt;/span&gt;&lt;/span&gt;&lt;div&gt;&lt;span style="font-family: Arial; font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-size: 13px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="font-family: Arial; font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-size: 13px;"&gt;My current idea is:&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-family: Arial; font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-size: 13px;"&gt;Do all gameplay processing as events reacting to stuff (such as &lt;span style="font-family: 'Courier New', Courier, 
monospace;"&gt;collide_with_pickup_object&lt;/span&gt;), not through a generic &lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;update()&lt;/span&gt;&lt;span style="font-family: inherit;"&gt; c&lt;/span&gt;all.&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-family: Arial; font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-size: 13px;"&gt;Each event concerns a number of entities (e.g., &lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;[player, ammo_pack]&lt;/span&gt;). The processing function for an event is allowed to touch the entities it concerns freely, but not any other entities.&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-family: Arial; font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-size: 13px;"&gt;Each frame, consider all events. Treat two entities as related if they appear in the same event, and take the transitive closure of that relation to get an equivalence relation. The corresponding equivalence classes then define "islands" of entities that can be processed safely on separate cores.&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-family: Arial; font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-size: 13px;"&gt;Assign each island to a core, process the events for that island one by one on the core.&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-family: Arial; font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-size: 13px;"&gt;Provide a thread-safe interface to any global entities that the event processors may need to touch for effect spawning, sound play, etc. 
(Preferably through a queue so that the global entities don't have to be touched directly from the event processors.)&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;span style="font-family: Arial; font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-size: 13px;"&gt;Some concerns:&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-family: Arial; font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-size: 13px;"&gt;Will the islands become "too big"? I.e., if almost everything interacts with the player, there is a risk that everything ends up in a single big "player island".&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-family: Arial; font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-size: 13px;"&gt;Will it be reasonable for gameplay programmers to write code that follows these restrictions?&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="font-family: Arial; font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-size: 13px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1994130783874175266-5920377066839021046?l=bitsquid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bitsquid.blogspot.com/feeds/5920377066839021046/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bitsquid.blogspot.com/2009/10/multithreaded-gameplay.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/5920377066839021046'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/5920377066839021046'/><link rel='alternate' type='text/html' 
href='http://bitsquid.blogspot.com/2009/10/multithreaded-gameplay.html' title='Multithreaded gameplay'/><author><name>Niklas</name><uri>http://www.blogger.com/profile/10055379994557504977</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1994130783874175266.post-830219707540824231</id><published>2009-10-02T10:30:00.009+02:00</published><updated>2009-10-06T13:27:35.624+02:00</updated><title type='text'>Two way serialization function</title><content type='html'>A trick to avoid having to keep the serialization code for input and output in sync is to use the same code for both input and output:&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;code&gt;struct Object {&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt; &lt;/span&gt;template &amp;lt;class STREAM&amp;gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt; &lt;/span&gt;STREAM &amp;amp; serialize(STREAM &amp;amp; stream) {&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt;  &lt;/span&gt;return stream &amp;amp; a &amp;amp; b &amp;amp; c;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt; &lt;/span&gt;}&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt; &lt;/span&gt;int a, b, c;&lt;/div&gt;&lt;div&gt;};&lt;/code&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Here we have used &lt;code&gt;&amp;amp;&lt;/code&gt; as our serialization operator. 
We could use any operator we like.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;We then just implement the operator to do the right thing for our input and output streams:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;code&gt;template &lt; &gt; InputArchive &amp;amp; operator &amp;amp;(InputArchive &amp;amp;a, int &amp;amp;v) {&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt; &lt;/span&gt;a.read(&amp;amp;v, sizeof(v));&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt; &lt;/span&gt;return a;&lt;/div&gt;&lt;div&gt;}&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;template &lt; &gt; OutputArchive &amp;amp; operator &amp;amp; (OutputArchive &amp;amp;a, int &amp;amp;v) {&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt; &lt;/span&gt;a.write(&amp;amp;v, sizeof(v));&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt; &lt;/span&gt;return a;&lt;/div&gt;&lt;div&gt;}&lt;/code&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;These are both template specializations of a generic streaming template.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;code&gt;template &amp;lt;class STREAM, class T&amp;gt;&lt;/div&gt;&lt;div&gt;STREAM &amp;amp; operator &amp;amp;(STREAM &amp;amp; stream, T &amp;amp; t) {&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt; &lt;/span&gt;return t.serialize(stream);&lt;/div&gt;&lt;div&gt;}&lt;/code&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Now we can stream all kinds of types either by implementing &lt;code&gt;serialize&lt;/code&gt; in the type or by defining a template specialization of &lt;code&gt;operator &amp;amp;&lt;/code&gt; for that type.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' 
src='https://blogger.googleusercontent.com/tracker/1994130783874175266-830219707540824231?l=bitsquid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bitsquid.blogspot.com/feeds/830219707540824231/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bitsquid.blogspot.com/2009/10/two-way-serialization-function.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/830219707540824231'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/830219707540824231'/><link rel='alternate' type='text/html' href='http://bitsquid.blogspot.com/2009/10/two-way-serialization-function.html' title='Two way serialization function'/><author><name>Niklas</name><uri>http://www.blogger.com/profile/10055379994557504977</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1994130783874175266.post-4561992295793100334</id><published>2009-09-30T18:05:00.022+02:00</published><updated>2009-10-13T17:36:06.401+02:00</updated><title type='text'>Simple perfect murmur hashing</title><content type='html'>A simple way of finding a perfect (collision free) murmur hash for a set of keys &lt;span style="font-style: italic;"&gt;S&lt;/span&gt; is to simply iterate over the seed values until we find one that doesn't produce any collisions:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;seed := 0&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;while true&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp; 
&amp;nbsp;H[i] := murmur_hash(S[i], seed) for all i&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;return seed if no_duplicates(H)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;seed := seed + 1&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;As long as the size of the key set &lt;span style="font-style: italic;"&gt;S&lt;/span&gt; is not much bigger than the square root of the output range of the hash function, the algorithm above will terminate quickly. For example, for a 32-bit hash this algorithm works well for sets up to about 65 000 elements. (In fact we can go up to 100 000 elements and still find a good seed by just making a couple of extra iterations.)&lt;br /&gt;&lt;br /&gt;With a perfect hash function we only need to compare the hash values to determine if two keys are equal; we never have to compare (or even store) the original keys themselves. We just have to store the 32-bit seed and the hash values. 
This saves both memory and processing time.&lt;br /&gt;&lt;br /&gt;In the BitSquid engine this simple perfect hashing scheme is used to generate 32-bit resource IDs from resource names and types.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1994130783874175266-4561992295793100334?l=bitsquid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bitsquid.blogspot.com/feeds/4561992295793100334/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bitsquid.blogspot.com/2009/09/simple-perfect-murmur-hashing.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/4561992295793100334'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/4561992295793100334'/><link rel='alternate' type='text/html' href='http://bitsquid.blogspot.com/2009/09/simple-perfect-murmur-hashing.html' title='Simple perfect murmur hashing'/><author><name>Niklas</name><uri>http://www.blogger.com/profile/10055379994557504977</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1994130783874175266.post-2684285747337706373</id><published>2009-09-30T18:05:00.021+02:00</published><updated>2009-10-06T18:00:07.537+02:00</updated><title type='text'>JSON configuration data</title><content type='html'>&lt;div style="text-align: left;"&gt;The BitSquid engine will use JSON as an intermediate format for all generic configuration data.&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;br /&gt;JSON is better than a custom binary format because:&lt;/div&gt;&lt;div style="text-align: 
left;"&gt;&lt;ul&gt;&lt;li&gt;The data can be inspected and debugged manually.&lt;/li&gt;&lt;li&gt;There are lots of editors.&lt;/li&gt;&lt;li&gt;Changes merge nicer in SVN.&lt;/li&gt;&lt;li&gt;The data is platform independent.&lt;/li&gt;&lt;li&gt;As long as you are just adding data fields, the data is both backward and forward compatible.&lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;JSON files are slower to parse than binary files, but that doesn't matter because it is only an &lt;i&gt;intermediate &lt;/i&gt;format. They are bigger, but not that much bigger, and again it doesn't matter because it is only an intermediate format. We will generate efficient binary data for the runtime.&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;JSON is better than XML because:&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;ul&gt;&lt;li&gt;It is a lot simpler and easier to parse.&lt;/li&gt;&lt;li&gt;It maps directly to native data structures.&lt;/li&gt;&lt;li&gt;It is typed, meaning you can understand (more of) it without needing a DTD.&lt;/li&gt;&lt;li&gt;It is more "normalized". 
(In XML you have to choose whether to put information in attributes or in text nodes.)&lt;/li&gt;&lt;/ul&gt;XML is good for marking up text, but not so good for describing data.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1994130783874175266-2684285747337706373?l=bitsquid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bitsquid.blogspot.com/feeds/2684285747337706373/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bitsquid.blogspot.com/2009/09/json-configuration-data.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/2684285747337706373'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/2684285747337706373'/><link rel='alternate' type='text/html' href='http://bitsquid.blogspot.com/2009/09/json-configuration-data.html' title='JSON configuration data'/><author><name>Niklas</name><uri>http://www.blogger.com/profile/10055379994557504977</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1994130783874175266.post-5873537660870842074</id><published>2009-09-30T13:05:00.000+02:00</published><updated>2009-09-30T13:17:08.880+02:00</updated><title type='text'>Welcome to the BitSquid blog</title><content type='html'>This blog will collect rants, ideas and random thoughts about the development of the BitSquid game engine.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;See: &lt;a href="http://www.bitsquid.se"&gt;http://www.bitsquid.se&lt;/a&gt; for more information.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' 
height='1' src='https://blogger.googleusercontent.com/tracker/1994130783874175266-5873537660870842074?l=bitsquid.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bitsquid.blogspot.com/feeds/5873537660870842074/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://bitsquid.blogspot.com/2009/09/welcome-to-bitsquid-blog.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/5873537660870842074'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1994130783874175266/posts/default/5873537660870842074'/><link rel='alternate' type='text/html' href='http://bitsquid.blogspot.com/2009/09/welcome-to-bitsquid-blog.html' title='Welcome to the BitSquid blog'/><author><name>Niklas</name><uri>http://www.blogger.com/profile/10055379994557504977</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry></feed>
