## Monday, June 22, 2015

### Allocation Adventures 2: Arrays of Arrays

Last week's post ended with a puzzle: How can we allocate an array of dynamically growing and shrinking things in an efficient and data-oriented way? I.e. using contiguous memory buffers and as few allocations as possible.

The example in that post was kind of complicated, and I don't want to get lost in the details, so let's look at a simpler version of the same fundamental problem.

Suppose we want to create a `TagComponent` that allows us to store a number of `unsigned` tags for an entity.

These tags will be hashes of strings such as `"player"`, `"enemy"`, `"consumable"`, `"container"`, etc., and the `TagComponent` will have some sort of efficient lookup structure that allows us to quickly find all entities with a particular tag.

But to keep things simple, let's ignore that for now. For now we will just consider how to store these lists of tags for all our entities. I.e. we want to find an alternative to:

``std::vector< std::vector < unsigned> > data;``

that doesn't store every list in a separate memory allocation.

## Fixed size

If we can get away with it, we can get rid of the "array of arrays" by setting a hard limit on the number of items we can store per entity. In that case, the data structure becomes simply:

``````
enum {MAX_TAGS = 8};
struct Tags
{
    unsigned n;
    unsigned tags[MAX_TAGS];
};
Array<Tags> data;
``````

Now all the data is contained in a single buffer, the data buffer for `Array<Tags>`.
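As a minimal sketch of how such a fixed-size record might be used: the helper names `add_tag`, `remove_tag` and `has_tag` are made up for illustration, and `std::vector` stands in for the `Array` class, which is not shown in this post.

```cpp
#include <vector>

enum { MAX_TAGS = 8 };

struct Tags
{
    unsigned n = 0;            // number of slots in use
    unsigned tags[MAX_TAGS];   // only tags[0..n) hold valid values
};

// Appends `tag`. Returns false when the hard limit has been reached.
bool add_tag(Tags &t, unsigned tag)
{
    if (t.n == MAX_TAGS)
        return false;
    t.tags[t.n++] = tag;
    return true;
}

// Removes `tag` by moving the last tag into its slot (order is not preserved).
bool remove_tag(Tags &t, unsigned tag)
{
    for (unsigned i = 0; i < t.n; ++i) {
        if (t.tags[i] == tag) {
            t.tags[i] = t.tags[--t.n];
            return true;
        }
    }
    return false;
}

// Linear search over the (at most MAX_TAGS) tags of one entity.
bool has_tag(const Tags &t, unsigned tag)
{
    for (unsigned i = 0; i < t.n; ++i)
        if (t.tags[i] == tag)
            return true;
    return false;
}
```

With `std::vector<Tags> data(num_entities)`, every entity's tags live at a fixed offset in one buffer, and the only failure mode is `add_tag()` returning `false` when the limit is hit.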

Sometimes the hard limit is inherent in the problem itself. For example, in a 2D grid a cell can have at most four neighbors.

Sometimes the limit is a widely accepted compromise between cost and quality. For example, when skinning meshes it is usually considered OK to limit the number of bone influences per vertex to four.

Sometimes there is no sensible limit inherent to the problem itself, but for the particular project that we are working on we can agree to a limit and then design the game with that limit in mind. For example we may know that there will never be more than two players, never more than three lights affecting an object, never more than four tags needed for an entity, etc.

This of course requires that we are writing, or at least configuring, the engine for a particular project. If we are writing a general engine to be used for a lot of games it is hard to set such limits without artificially constraining what those games will be able to do.

Also, since the fixed size must be set to the maximum array size, every entity that uses fewer entries than the maximum will waste some space. If we need a high maximum this can be a significant problem and it might make sense to go with a dynamic solution even though there is an upper limit.

So while the fixed size approach can be good in some circumstances, it doesn't work in every situation.

## Linked list

Instead of using arrays, we can put the tags for a particular entity in a linked list:

``````
struct Tag
{
    unsigned tag;
    Tag *next;
};
Array<Tag *> data;
``````

Using a linked list may seem like a very bad choice at first. A linked list can give us a cache miss for every `next` pointer we follow. This would give us even worse performance than we would get with `vector < vector < unsigned > >`.

But the nodes in the linked list do not necessarily have to be allocated individually on the heap. We can do something similar to what we did in the last post: allocate the nodes in a buffer and refer to them using offsets rather than pointers:

``````
struct Node
{
    unsigned tag;
    unsigned next;
};
Array<Node> nodes;
``````

With this approach we only have a single allocation -- the buffer for the array that contains all the tag nodes -- and we can follow the indexes in the `next` field to walk the list.

Side note: Previously I have always used `UINT_MAX` to mark a nil value for an `unsigned`. So in the struct above, I would have used `UINT_MAX` for the `next` value to indicate the end of the list. But recently, I've switched to using `0` instead. I think it is nice to be able to `memset()` a buffer to `0` to reset all values. I think it is nice that I can just use `if (next)` to check if the value is valid. It is also nice that the invalid value will continue to be `0` even if I later decide to change the type to `int` or `uint16_t`. It does mean that I can't use the `nodes[0]` entry, since that is reserved for the `nil` value, but I think the increased simplicity is worth it.
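To make the index-based list concrete, here is a rough sketch that uses `0` as the end-of-list marker and reserves `nodes[0]` as the nil entry. The `TagLists` wrapper, the per-entity `heads` array and the helper names are assumptions made for the example, and `std::vector` again stands in for `Array`.

```cpp
#include <vector>

struct Node
{
    unsigned tag;
    unsigned next;   // index of the next node in `nodes`; 0 means end of list
};

struct TagLists
{
    std::vector<Node> nodes;      // nodes[0] is reserved as the nil node
    std::vector<unsigned> heads;  // heads[entity] is the first node index, or 0

    explicit TagLists(unsigned num_entities)
        : nodes(1, Node{0, 0}), heads(num_entities, 0) {}
};

// Pushes `tag` onto the front of the entity's list.
void add_tag(TagLists &t, unsigned entity, unsigned tag)
{
    unsigned index = (unsigned)t.nodes.size();
    t.nodes.push_back(Node{tag, t.heads[entity]});
    t.heads[entity] = index;
}

// Walks the chain by following `next` indices until the nil value is reached.
bool has_tag(const TagLists &t, unsigned entity, unsigned tag)
{
    for (unsigned i = t.heads[entity]; i; i = t.nodes[i].next)
        if (t.nodes[i].tag == tag)
            return true;
    return false;
}
```

Because the links are indices rather than pointers, the `nodes` buffer can be reallocated when it grows without invalidating any list.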

Using a single buffer rather than separate allocations gives us much better cache locality, but the `next` references can still jump around randomly in that buffer. So we can still get cache misses. If the buffer is large, this can be as bad as using freely allocated nodes.

Another thing to note is that we are wasting a significant amount of memory. Only half of the memory is used for storing tags; the other half goes to the `next` indices.

We can try to address both these problems by making the nodes a little bigger:

``````
enum {MAX_TAGS_PER_NODE = 8};
struct Node
{
    unsigned n;
    unsigned tags[MAX_TAGS_PER_NODE];
    unsigned next;
};
Array<Node> nodes;
``````

This is just as before, except we have more than one tag per node. This gives better cache performance because we can now process eight tags at a time before we have to follow a next pointer and jump to a different memory location. Memory use can also be better. If the nodes are full, we are using 80 % of the memory for actual tags, rather than 50 % as we had before.

However, if the nodes are not full we could be wasting even more memory than before. If entities have three tags on average, then we are only using 30 % of the memory to store tags.

We can balance cache performance and memory use by changing `MAX_TAGS_PER_NODE`. Increasing it gives better cache coherence, because we can process more tags before we need to jump to a different memory location. However, increasing it also means more wasted memory. It is probably good to set the size so that "most entities" fit into a single node, but a few special ones (players and enemies maybe) need more.
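A possible way to add tags to such chunked lists is sketched below. It assumes the `0`-as-nil convention (so `nodes[0]` is reserved), a caller-owned `head` index per entity, and `std::vector` in place of `Array`; the function names are invented for the example.

```cpp
#include <vector>

enum { MAX_TAGS_PER_NODE = 8 };

struct Node
{
    unsigned n;                        // number of tags stored in this node
    unsigned tags[MAX_TAGS_PER_NODE];
    unsigned next;                     // 0 means end of list
};

// Adds `tag` to the list starting at `head`. A new node is only allocated
// when the list is empty or its first node is already full.
// `nodes` must already contain the reserved nil node at index 0.
void add_tag(std::vector<Node> &nodes, unsigned &head, unsigned tag)
{
    if (head == 0 || nodes[head].n == MAX_TAGS_PER_NODE) {
        Node fresh = {};
        fresh.next = head;
        nodes.push_back(fresh);
        head = (unsigned)nodes.size() - 1;
    }
    Node &node = nodes[head];
    node.tags[node.n++] = tag;
}

// Sums the per-node counts along the chain.
unsigned count_tags(const std::vector<Node> &nodes, unsigned head)
{
    unsigned count = 0;
    for (unsigned i = head; i; i = nodes[i].next)
        count += nodes[i].n;
    return count;
}
```

In this sketch an entity with ten tags occupies two nodes: a full one holding eight tags and a newer one at the head of the chain holding two.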

One interesting thing to note about the cache misses is that we can get rid of them by sorting the nodes. If we sort them so that the nodes in the same `next` chain always appear directly after one another in the array, then walking the list will access the data linearly in memory, just as if we were accessing an array:

``````
--------------------------------------------------
|  A1 --|--> A2 --|--> A3 |  B  |  C1 --|--> C2  |
--------------------------------------------------
``````

Note that a complete ordering is not required; it is enough if the linked nodes end up together. Single nodes, such as the `B` node above, could go anywhere.
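To illustrate the target layout, here is a sketch of the simplest way to reach it: a one-shot rebuild that copies every chain into consecutive slots. This is a full O(n) pass over all nodes, not an incremental sort, and the `heads` array and function name are assumptions made for the example (single-tag nodes and `0` as nil are used to keep it short).

```cpp
#include <vector>

struct Node
{
    unsigned tag;
    unsigned next;   // 0 means end of list
};

// Rebuilds `nodes` so that every chain occupies consecutive slots.
// heads[e] is the first node index for entity e (0 means an empty list).
// Nodes that are not reachable from any head are dropped.
void make_chains_contiguous(std::vector<Node> &nodes, std::vector<unsigned> &heads)
{
    std::vector<Node> sorted;
    sorted.reserve(nodes.size());
    sorted.push_back(Node{0, 0});   // keep index 0 as the nil node

    for (unsigned &head : heads) {
        unsigned old_index = head;
        head = old_index ? (unsigned)sorted.size() : 0;
        while (old_index) {
            Node n = nodes[old_index];
            old_index = n.next;
            // The successor, if there is one, lands in the very next slot.
            n.next = old_index ? (unsigned)sorted.size() + 1 : 0;
            sorted.push_back(n);
        }
    }
    nodes.swap(sorted);
}
```

After the pass, following `next` from any head just steps through adjacent array entries.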

Since these are dynamic lists where items will be added and removed all the time, we can't really do a full `O(n log n)` sort every time something changes. That would be too expensive. But we could sort the list "incrementally". Every time the list is accessed, we do a little bit of sorting work. As long as the rate of mutation is low compared to the rate of access, which you would expect in most circumstances, our sorting should be able to keep up with the mutations and keep the list "mostly sorted".

You would need a sorting algorithm that can be run incrementally and that works well with already sorted data. Two-way bubble sort perhaps? I haven't thought too deeply about this, because I haven't implemented this method in practice.

## Custom memory allocator

Another option is to write a custom memory allocator to divide the bigger buffer up into smaller parts for memory allocations.

You might think that this is a much too complex solution, but a custom memory allocator doesn't necessarily need to be a complex thing. In fact, both the fixed size and linked list approaches described above could be said to be using a very simple kind of custom memory allocator: one that just allocates fixed blocks from an array. Such an allocator does not need many lines of code.
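As a rough idea of how little code such a fixed-block allocator needs, here is a sketch that hands out blocks from an array and recycles freed blocks through a free list threaded through the blocks themselves. The `Block` layout, `BlockPool` and the function names are hypothetical, and index `0` is reserved as nil as before.

```cpp
#include <cassert>
#include <vector>

struct Block
{
    unsigned data[8];
    unsigned next_free;   // only meaningful while the block is on the free list
};

struct BlockPool
{
    std::vector<Block> blocks;   // blocks[0] is reserved as the nil block
    unsigned first_free = 0;     // head of the free list; 0 means it is empty

    BlockPool() : blocks(1) {}
};

// Returns the index of a usable block, reusing a freed one when possible.
unsigned allocate(BlockPool &pool)
{
    if (pool.first_free) {
        unsigned index = pool.first_free;
        pool.first_free = pool.blocks[index].next_free;
        return index;
    }
    pool.blocks.push_back(Block());
    return (unsigned)pool.blocks.size() - 1;
}

// Puts the block back on the free list. The buffer itself never shrinks.
void release(BlockPool &pool, unsigned index)
{
    assert(index != 0 && index < pool.blocks.size());
    pool.blocks[index].next_free = pool.first_free;
    pool.first_free = index;
}
```

Both operations are constant time, apart from the occasional growth of the backing buffer.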

Another criticism against this approach is that if we are writing our own custom memory allocator, aren't we just duplicating the work that `malloc()` or `new` already does? What's the point of first complaining a lot about how problematic the use of `malloc()` can be and then going on to write our very own (and probably worse) implementation of `malloc()`?

The answer is that `malloc()` is a generic allocator that has to do well in a lot of different situations. If we have more detailed knowledge of how the allocator is used, we can write an allocator that is both simpler and performs better. For example, as seen above, when we know the allocations are fixed size we can make a very fast and simple allocator. System software typically uses such allocators (check out the slab allocator for instance) rather than relying on `malloc()`.

In addition, we also get the benefit that I talked about in the previous post. Having all of a system's allocations in a single place (rather than mixed up with all other `malloc()` allocations) makes it much easier to reason about them and optimize them.

As I said above, the key to making something better than `malloc()` is to make use of the specific knowledge we have about the allocation patterns of our system. So what is special about our `vector < vector < unsigned > >` case?

1. There are no external pointers to the data.

All the pointers are managed by the `TagComponent` itself and never visible outside that component.

This means that we can "move around" memory blocks as we like, as long as the `TagComponent` keeps track of and updates its data structures with the new locations. So we don't have to worry (that much) about fragmentation, because when we need to, we can always move things around in order to defrag the memory.

I'm sure you can build something interesting based on that, but I actually want to explore another property:

2. Memory use always grows by a factor of two.

If you look at the implementation of `std::vector` or a similar class (since STL code tends to be pretty unreadable) you will see that the memory allocated always grows by a factor of two. (Some implementations may use 1.5 or something else, but usually it is 2. The exact figure doesn't matter that much.)

The `vector` class keeps track of two counters:

• `size` which stores the number of items in the `vector` and
• `capacity` which stores how many items the `vector` has room for, i.e. how much memory has been allocated.

If you try to push an item when `size == capacity`, more memory is needed. So what typically happens is that the `vector` allocates twice as much memory as was previously used (`capacity *= 2`) and then you can continue to push items.
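The growth policy can be sketched with a stripped-down array of `unsigned`. This is an illustration of the doubling scheme only, not how any particular `std::vector` implementation is written; `UnsignedArray` and its functions are made-up names, and allocation failure is not handled.

```cpp
#include <cstdlib>
#include <cstring>

struct UnsignedArray
{
    unsigned *data = nullptr;
    unsigned size = 0;       // number of items stored
    unsigned capacity = 0;   // number of items there is room for
};

void push_back(UnsignedArray &a, unsigned value)
{
    if (a.size == a.capacity) {
        // Out of room: allocate twice as much and move the old items over.
        unsigned new_capacity = a.capacity ? a.capacity * 2 : 1;
        unsigned *new_data = (unsigned *)std::malloc(new_capacity * sizeof(unsigned));
        // Note: a real implementation would check new_data for NULL here.
        if (a.size)
            std::memcpy(new_data, a.data, a.size * sizeof(unsigned));
        std::free(a.data);
        a.data = new_data;
        a.capacity = new_capacity;
    }
    a.data[a.size++] = value;
}

void destroy(UnsignedArray &a)
{
    std::free(a.data);
    a = UnsignedArray();
}
```

Pushing five items into an empty array walks the capacity through 1, 2, 4 and 8, so every buffer this array ever requests has a power-of-two size.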

This post is already getting pretty long, but if you haven't thought about it before you may wonder why the `vector` grows like this. Why doesn't it grow by one item at a time, or perhaps 16 items at a time?

The reason is that we want `push_back()` to be a cheap operation -- O(1) using computational complexity notation. When we reallocate the vector buffer, we have to move all the existing elements from the old place to the new place. This will take O(n) time. Here, n is the number of elements in the vector.

If we allocate one item at a time, then we need to reallocate every time we push, and since reallocation takes O(n), that means push will also take O(n). Not good.

If we allocate 16 items at a time, then we need to allocate every 16th time we push, which means that push on average takes O(n)/16, which by the great laws of O(n) notation is still O(n). Oops!

But if we allocate 2*n items when we allocate, then we only need to reallocate after we have pushed n more items, which means that push on average takes O(n)/n. And O(n)/n is O(1), which is exactly what we wanted.

Note that it is just on average that push is O(1). Every n pushes, you will encounter a push that takes O(n) time. For this reason, push is said to run in amortized constant time. If you have really big vectors, that can cause an unacceptable hitch and in that case you may want to use something other than a `vector` to store the data.
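The difference between the three growth policies discussed above can be checked by simply counting how many element moves the reallocations cause. The following is a small simulation under the assumption that every reallocation moves all existing elements; the function names are invented for the example.

```cpp
#include <cstddef>

// Counts the element moves caused by reallocation when pushing `n` items
// one at a time. `grow(capacity)` returns the capacity after a reallocation.
template <typename Grow>
std::size_t copies_for_pushes(std::size_t n, Grow grow)
{
    std::size_t size = 0, capacity = 0, copies = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (size == capacity) {
            copies += size;              // move every existing element
            capacity = grow(capacity);
        }
        ++size;
    }
    return copies;
}

inline std::size_t grow_by_one(std::size_t c)     { return c + 1; }
inline std::size_t grow_by_sixteen(std::size_t c) { return c + 16; }
inline std::size_t grow_double(std::size_t c)     { return c ? c * 2 : 1; }
```

For 1024 pushes this gives 523776 moves when growing by one, 32256 when growing by 16, and 1023 when doubling. The first two totals grow quadratically with the number of pushes, while the doubling total stays below the number of pushes, which is what makes the average cost per push constant.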

Anyways, back to our regular programming.

The fact that our data (and indeed, any kind of dynamic data that uses the `vector` storage model) grows by powers of two is actually really interesting. Because it just so happens that there is an allocator that is very good at allocating blocks at sizes that are powers of two. It is called the buddy allocator and we will take a deeper look at it in the next post.

1. While reading this and the previous post a question came to my mind...
Do you design your engine with an assumption that components are an internal concept, or do you allow game-specific components?
DragonComponent, or just "beast" tag in TagComponent and "dragon.fire_power" in DataComponent?

2. The system is extensible so games will be able to create their own C components.

But there will also be "flexible" components, like DataComponent, ScriptComponent and FlowComponent that you can use to implement dynamic game-specific behaviour without having to write your own component on the C side.

1. Pure C, or did something eat two pluses?

2. Our plugin interface is C based (C++ does not have a stable ABI), but of course the C callbacks in your DLL could call out to C++ functions.

3. I know that probably you can't use modern C++ in your work, but you could check out my `multivector` class, that implements a dynamic struct of arrays, providing vector-like programming interface. I'm interested in your opinions.

https://github.com/cubuspl42/Nete/blob/cf654bfc88875b97f1e31ffcf7a6987cbdc1e663/tests/nete_tl_tests.cpp

1. I'm not a huge fan of modern C++. It's such an extremely complicated language. I find it hard to read, hard to debug and the compile errors are obtuse.

For me, when I try to achieve a specific low-level effect, all these C++ abstractions get in the way. I find it hard to know if the compiler is really optimizing everything properly the way I expect without checking the assembly output. Something is wrong if you find yourself reading assembly code when you are programming in a supposedly high-level language.

It's too easy to miss something and get bitten by it. For example, we had a significant performance hit because we used `vector` for temporary buffers, and the chars were being initialized on every resize.

To me, being C like and using direct pointer manipulations is simpler, faster and less error prone.

2. Thank you very much for your reply. I agree with you, abstractions often get in the programmer's way, and the code of classes like `std::vector` (or my `multivector`) is relatively complicated, due to its nature. On the other hand, code like this:

char *buffer = allocate(capacity * (sizeof(unsigned) + sizeof(DataType) + sizeof(Value)));
keys = (unsigned *)buffer;
types = (DataType *)(keys + capacity);
values = (Value *)(types + capacity);

is quite readable... but isn't it bug-prone? Every class that needs struct-of-arrays storage needs to write this again and again. And how does this code handle alignment? What if you need dynamic expansion? If it all adds up, wouldn't it justify writing a universal template class for that? It could have an option for not initializing trivial types, too. Aren't consoles' outdated compilers the real reason for not writing such classes?

3. Yes, it is bug-prone boilerplate code. It sucks. It does not make me happy.

But at least it is transparent. If there is a bug I can look at it and quite quickly figure out what is going on. The performance characteristics are plain to see. If it is slow I can see why and fix it. To me readability and simplicity are the most important things, because it makes the code easy to work with. And code is never static, there are always new bugs, new features, new performance improvements, new hardware characteristics, new compilers, new things to push.

Today you have to add initialization skipping to your class. Tomorrow maybe you have to add serialization/deserialization, or "reusing" slots so you can delete entries without changing the indices of other entries... or something else.

I don't believe in creating "perfect" library classes that cover every use case but are unreadable by anyone other than C++ experts. I've seen horrorshows like Singleton classes that have six boolean "traits" for things like is it thread-safe or not, etc. Congratulations you have taken a very simple thing (singleton) and made a complex monstrosity out of it. Instead, I believe in things that are "hackable".

My dream language would be C with a small & simple template engine on top so you could avoid boilerplate like this and extend the language freely. But sadly, that dream is not C++.
