Wednesday, September 30, 2009

Simple perfect murmur hashing

A simple way of finding a perfect (collision free) murmur hash for a set of keys S is to simply iterate over the seed values until we find one that doesn't produce any collisions:

seed := 0
while true
    H[i] := murmur_hash(S[i], seed) for all i
    return seed if no_duplicates(H)
    seed := seed + 1

As long as the size of the key set S is not much bigger than the square root of the output range of the hash function, the algorithm above will terminate quickly. For example, for a 32 bit hash this algorithm works well for sets up to about 65 000 elements. (In fact we can go up to 100 000 elements and still find a good seed by just making a couple of extra iterations.)

With a perfect hash function we only need to compare the hash values to dermine if two keys are equal, we never have to compare (or even store) the original keys themselves. We just have to store the 32-bit seed and the hash values. This saves both memory and processing time.

In the BitSquid engine this simple perfect hashing scheme is used to generate 32-bit resource IDs from resource names and types.

10 comments:

  1. Sorry for being late in the discussion but it means IDs are computed in a preprocessing stage?

    ReplyDelete
  2. Yes, you would have to know the set of all possible key values in the preprocessing stage.

    ReplyDelete
  3. What if I just get all keys and map them to numbers from 0 to N one by one with increment? Would it cause problems?

    ReplyDelete
  4. In that case you would have to store the "map" somehow. How would you do that? In a hash table? Then you are using a non-perfect hashing scheme to get to your perfect hashing scheme, which means you have removed all advantages of it.

    ReplyDelete
  5. Nice blog, I really enjoy reading your articles !

    Unfortunately I am not sure to understand the purpose of those 32 bits resource IDs ...
    It speeds up comparisons for sure but do you really often compare resources in your game ? Have you practical examples in mind ?

    Do you plan to use those IDs for indexing ? In that case, either you need a 4 billion entry table or you use a smaller set of bits from the key. By doing the last option, you loose the benefits of perfect hashing and collisions come back. Even if it deals faster with collisions (complete hash as key), I admit.

    Otherwise, if comparisons are the only purpose of those IDs, iLych solution (above reply) may work too since we generate them at preprocessing stage. We do not need to store a map, we just want to give a unique ID to each resource and forget all about the rest (e.g. using alphanumeric or timestamp order as mapping rule). Then adding hash to resource data. Or maybe you only need the seed and processes hashes at runtime ?

    Sorry for bothering you with so much questions but I have the feeling it could help me in the future.

    ReplyDelete
  6. Yes, most often we use the IDs for indexing. As you say, we still have to deal with collision in the smaller set of bits, but the data structures become a lot smaller and simpler when you don't have to store a lot of arbitrary sized strings.

    But actually, we don't use perfect hashing any more... we just hash into a large enough keyspace that collisions are so unlikely that we don't have to worry about them.

    ReplyDelete
  7. MEJAQQ: AGEN JUDI POKER DOMINOQQ BANDARQ ONLINE TERBESAR DI ASIA

    Yang Merupakan Agen Judi Poker DominoQQ BandarQ Online Terbesar di Asia Hadir Untuk Anda Semua Dengan Games dan Bonus Yang Menarik!

    Bonus yang Kami Berikan di MEJAQQ :
    * Bonus CASHBACK 0,5% 2 KALI (Dibagikan setiap hari MINGGU dan RABU)
    * Bonus REFERRAL 10% + 10% SEUMUR HIDUP (Total kemenangan REFERRAL anda)
    * Minimal Deposit 20,000
    * Minimal Withdraw 20,000
    * 7 Bank Lokal (BCA, BNI, MANDIRI, BRI, DANAMON, CIMB NIAGA, PERMATA)
    * Deposit via E-Money, Pulsa TELKOMSEL dan XL (TANPA POTONGAN)
    * 100% Member vs Member
    * 11 Game Dalam 1 Akun
    * Pelayanan Bank dan Livechat 24 jam
    * Tersedia Dalam Aplikasi Android atau IOS.

    Mau dapet duit tanpa kerja? Bisa banget!
    Caranya? Buruan Kunjungi Sekarang Juga ^.^

    Info Lebih Lanjut :
    WA 1 : +85515620767
    WA 2 : +855977507271

    Kunjungi situs kami di :
    MEJAQQ
    BandarQ Online
    Agen BandarQ Online

    ReplyDelete
  8. Saat ini togel deposit pulsa 10rb tanpa potongan besar sudah bisa dimainkan di agen Majalah4D bersama agen togel online anda bisa melakukan togel deposit pulsa 10rb hanya di Agen Togel Toto 4D hanya dengan menggunakan 1 user id saja.


    Togel Online
    Togel Toto 4D
    Togel Toto 4D
    Togel Toto 4D
    Sabung Ayam Online
    Link Sv388
    Sabung ayam online
    Aplikasi sv388
    Slot Online Gacor
    Slot Gacor

    ReplyDelete