Wednesday, September 30, 2009

Simple perfect murmur hashing

A simple way of finding a perfect (collision-free) murmur hash for a set of keys S is to iterate over the seed values until we find one that doesn't produce any collisions:

seed := 0
while true
    H[i] := murmur_hash(S[i], seed) for all i
    return seed if no_duplicates(H)
    seed := seed + 1
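
The pseudocode above can be sketched in runnable Python. This is only an illustration: murmur3_32 below is a compact pure-Python version of the 32-bit MurmurHash3, which may not be the Murmur variant the engine uses, and find_perfect_seed and the sample resource names are made up for the example. The keys are assumed to be distinct byte strings.

```python
def murmur3_32(data: bytes, seed: int = 0) -> int:
    """32-bit MurmurHash3 (x86 variant) of data, returned as an unsigned int."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    mask = 0xFFFFFFFF
    h = seed & mask
    n = len(data)
    body_end = n & ~3  # length rounded down to a multiple of 4

    def mix_k(k: int) -> int:
        k = (k * c1) & mask
        k = ((k << 15) | (k >> 17)) & mask  # rotate left by 15
        return (k * c2) & mask

    # Process the input four bytes at a time (little-endian).
    for i in range(0, body_end, 4):
        h ^= mix_k(int.from_bytes(data[i:i + 4], "little"))
        h = ((h << 13) | (h >> 19)) & mask  # rotate left by 13
        h = (h * 5 + 0xE6546B64) & mask

    # Mix in the remaining 1-3 bytes, if any.
    tail = data[body_end:]
    if tail:
        h ^= mix_k(int.from_bytes(tail, "little"))

    # Finalization: force all bits of the hash to avalanche.
    h ^= n & mask
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & mask
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & mask
    h ^= h >> 16
    return h


def find_perfect_seed(keys, max_seed=1 << 32):
    """Return the first seed for which all keys hash to distinct values."""
    for seed in range(max_seed):
        hashes = {murmur3_32(key, seed) for key in keys}
        if len(hashes) == len(keys):  # no duplicates
            return seed
    raise ValueError("no collision-free seed found")


# Hypothetical resource names, used only to exercise the search.
keys = [b"units/soldier.unit", b"units/soldier.texture", b"levels/harbor.level"]
seed = find_perfect_seed(keys)
ids = [murmur3_32(key, seed) for key in keys]
```

Building a set per seed keeps the sketch short. For large key sets, sorting the hash values and checking neighbours would do the same job with less memory overhead.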

As long as the key set S is not much larger than the square root of the hash function's output range, the algorithm above terminates quickly. For example, with a 32-bit hash it works well for sets of up to about 65 000 elements. (In fact we can go up to 100 000 elements and still find a good seed with just a couple of extra iterations.)
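
The square root rule is the familiar birthday bound. As a rough check, assuming the hash behaves like a uniformly random function and that different seeds give independent results (both are idealizations, not measured properties of Murmur), the chance that a single seed is collision-free for n keys is approximately exp(-n(n-1)/(2N)), where N is the number of possible hash values. The function name below is invented for this sketch.

```python
import math


def p_collision_free(n: int, bits: int = 32) -> float:
    """Approximate probability that n random hashes of the given width are all distinct."""
    buckets = 2 ** bits
    return math.exp(-n * (n - 1) / (2 * buckets))


for n in (10_000, 65_000, 100_000):
    p = p_collision_free(n)
    # 1 / p is the expected number of seeds tried if each seed is an independent attempt.
    print(f"{n:>7} keys: P(seed works) ~ {p:.2f}, expected seeds tried ~ {1 / p:.1f}")
```

Under these assumptions a single seed works about 60% of the time for 65 000 keys and about 30% of the time for 100 000 keys, so on average three or so seeds have to be tried in the larger case.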

With a perfect hash function we only need to compare the hash values to determine if two keys are equal. We never have to compare (or even store) the original keys themselves. We just have to store the 32-bit seed and the hash values. This saves both memory and processing time.
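
A minimal sketch of such a table, assuming a stand-in hash: seeded_hash32 below uses BLAKE2b from Python's hashlib truncated to 32 bits purely for brevity, not Murmur, and build_table and lookup are hypothetical helpers rather than engine code.

```python
import hashlib


def seeded_hash32(key: bytes, seed: int) -> int:
    """Stand-in for a seeded 32-bit Murmur hash (BLAKE2b truncated to 4 bytes)."""
    digest = hashlib.blake2b(key, digest_size=4, salt=seed.to_bytes(4, "little")).digest()
    return int.from_bytes(digest, "little")


def find_seed(keys) -> int:
    """Try seeds 0, 1, 2, ... until every key gets a distinct hash."""
    seed = 0
    while len({seeded_hash32(k, seed) for k in keys}) != len(keys):
        seed += 1
    return seed


def build_table(items):
    """Return (seed, table); the table stores 32-bit hashes only, never the keys."""
    seed = find_seed(list(items))
    table = {seeded_hash32(k, seed): v for k, v in items.items()}
    return seed, table


def lookup(seed, table, key: bytes):
    """Look a key up by hash alone; only meaningful for keys in the original set."""
    return table[seeded_hash32(key, seed)]


seed, table = build_table({b"soldier.unit": "unit data", b"harbor.level": "level data"})
```

The guarantee only covers keys in the original set S. A key outside S can still hash to a stored value, so this works best when every key that will ever be looked up is known when the seed is chosen.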

In the BitSquid engine this simple perfect hashing scheme is used to generate 32-bit resource IDs from resource names and types.

JSON configuration data

The BitSquid engine will use JSON as an intermediate format for all generic configuration data.

JSON is better than a custom binary format because:
  • The data can be inspected and debugged manually.
  • There are lots of editors for it.
  • Changes merge more cleanly in SVN.
  • The data is platform independent.
  • As long as you are just adding data fields, the data is both backward and forward compatible.

JSON files are slower to parse than binary files, but that doesn't matter because it is only an intermediate format. They are bigger, but not that much bigger, and again it doesn't matter because it is only an intermediate format. We will generate efficient binary data for the runtime.
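
As a small illustration of the intermediate-to-runtime step, here is a sketch that compiles a JSON settings file into a packed binary record. The field names, the record layout and the compile_settings helper are all invented for this example and say nothing about the engine's actual formats.

```python
import json
import struct

# Hypothetical intermediate data. The "comment" field is unknown to the compiler below.
SOURCE = '{"width": 1280, "height": 720, "fullscreen": true, "comment": "not compiled"}'

# Hypothetical runtime layout: two little-endian uint16 values and one bool.
RECORD = struct.Struct("<HH?")


def compile_settings(text: str) -> bytes:
    """Parse JSON text and pack the fields this compiler knows about."""
    data = json.loads(text)
    return RECORD.pack(
        data["width"],
        data["height"],
        data.get("fullscreen", False),  # missing field falls back to a default
    )


blob = compile_settings(SOURCE)
width, height, fullscreen = RECORD.unpack(blob)
```

Fields the compiler does not know about are simply ignored, and fields missing from older files get defaults, which is the backward and forward compatibility mentioned in the list above.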

JSON is better than XML because:
  • It is a lot simpler and easier to parse.
  • It maps directly to native data structures.
  • It is typed, meaning you can understand (more of) it without needing a DTD.
  • It is more "normalized". (In XML you have to choose whether to put information in attributes or in text nodes.)

XML is good for marking up text, but not so good for describing data.
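
To make the "maps directly to native data structures" and "typed" points concrete, here is how a small JSON document comes out of Python's standard json module. The document contents are made up for the example.

```python
import json

text = '{"name": "rock", "lod_distances": [10.0, 50.5], "cast_shadows": true, "parent": null}'
doc = json.loads(text)

# Each JSON value arrives as the matching native type, with no schema needed.
name = doc["name"]                    # JSON string  -> str
lod_distances = doc["lod_distances"]  # JSON array   -> list of float
cast_shadows = doc["cast_shadows"]    # JSON boolean -> bool
parent = doc["parent"]                # JSON null    -> None
```

With XML the same data would come back as element and attribute strings, and something else (a DTD or hand-written code) would have to say which of them are numbers, booleans or lists.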

Welcome to the BitSquid blog

This blog will collect rants, ideas and random thoughts about the development of the BitSquid game engine.

See: http://www.bitsquid.se for more information.