Tuesday, September 21, 2010

Custom Memory Allocation in C++

For console development, memory is a very precious resource. You want good locality of reference and as little fragmentation as possible. You also want to be able to track the amount of memory used by different subsystems and eliminate memory leaks. To do that, you want to write your own custom memory allocators. But the standard ways of doing that in C++ leave a lot to be desired.

You can override global new and replace it with something else. This way you can get some basic memory tracking, but you still have to use the same allocation strategy for all allocations, which is far from ideal. Some systems work better with memory pools. Some can use simple frame allocation (i.e., pointer bump allocation).  You really want each system to be able to have its own custom allocators.

The other option in C++ is to override new on a per class basis. This has always seemed kind of strange to me. Pretty much the only thing you can use it for is object pools. Global, per-class object pools. If you want one pool per thread, or one pool per streaming chunk -- you run into problems.

Then you have the STL solution, where containers are templated on their allocator, so containers that use different allocators have different types. It also has fun things such as rebind(). But the weirdest thing is that all instances of the allocator class must be equivalent. So you must put all your data in static variables. And if you want to create two separate memory pools you have to have two different allocator classes.

I must admit that every time I run into something in STL that seems completely bonkers I secretly suspect that I have missed something. Because obviously STL has been created by some really clever people who have thought long and hard about these things. But I just don't understand the idea behind the design of the custom allocator interface at all. Can anyone explain it to me? Does anyone use it? Find it practical? Sane?

If it weren't for the allocator interface I could almost use STL. Almost. There is also the pretty inefficient map implementation. And the fact that deque is not a simple ring buffer, but some horrible beast. And that many containers allocate memory even if they are empty... So my own version of everything it is. Boring, but what's a poor gal gonna do?

Back to allocators. In conclusion, all the standard C++ ways of implementing custom allocators are (to me) strange and strangely useless. So what do I do instead? I use an abstract allocator interface and implement it with a bunch of concrete classes that allocate  memory in different ways:


class Allocator
{
public:
    virtual ~Allocator() {}
    virtual void *allocate(size_t size, size_t align) = 0;
    virtual void deallocate(void *p) = 0;
    virtual size_t allocated_size(void *p) = 0;
};


I think this is about as sane as an allocator API can get. One possible point of contention is the allocated_size() method. Some allocators (e.g., the frame allocator) do not automatically know the sizes of their individual allocations, and would have to use extra memory to store them. However, being able to answer questions about allocation sizes is very useful for memory tracking, so I require all allocators to provide that information, even if it means that a frame allocator will have to use a little extra memory to store it.

I use an abstract interface with virtual functions, because I don't want to template my classes on the allocator type. I like my allocators to be actual objects that I can create more than one of, thank you very much. Memory allocation is expensive anyway, so I don't care about the cost of a virtual function call.

In the BitSquid engine, you can only allocate memory through an Allocator object. If you call malloc or new the engine will assert(false).

Also, in the BitSquid engine all allocators keep track of the total number of allocations they have made, and the total size of those allocations. The numbers are decreased on deallocate(). In the allocator destructor we assert(_size == 0 && _allocations == 0) and when we shut down the application we tear down all allocators properly. So we know that we don't have any memory leaks in the engine. At least not along any code path that has ever been run.
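
Here is a minimal sketch of what that bookkeeping could look like. It is not the BitSquid implementation: MallocAllocator, its Header layout, and the total_allocated() / allocation_count() accessors are names made up for illustration, and alignment handling is simplified to whatever malloc() guarantees.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

class Allocator
{
public:
    virtual ~Allocator() {}
    virtual void *allocate(size_t size, size_t align) = 0;
    virtual void deallocate(void *p) = 0;
    virtual size_t allocated_size(void *p) = 0;
};

// Hypothetical example allocator: forwards to malloc() and keeps a small
// header in front of each block so it can answer allocated_size().
class MallocAllocator : public Allocator
{
    // Padded to max_align_t so the pointer after the header stays aligned.
    union Header { size_t size; std::max_align_t pad; };

    size_t _size = 0;         // total bytes currently allocated
    size_t _allocations = 0;  // number of live allocations

public:
    ~MallocAllocator() override
    {
        // Fires if anything allocated here was never deallocated.
        assert(_size == 0 && _allocations == 0);
    }

    void *allocate(size_t size, size_t align) override
    {
        // Simplification: only alignments that malloc() already provides.
        assert(align <= alignof(std::max_align_t));
        Header *h = static_cast<Header *>(std::malloc(sizeof(Header) + size));
        if (!h) return nullptr;
        h->size = size;
        _size += size;
        ++_allocations;
        return h + 1;
    }

    void deallocate(void *p) override
    {
        if (!p) return;
        Header *h = static_cast<Header *>(p) - 1;
        _size -= h->size;
        --_allocations;
        std::free(h);
    }

    size_t allocated_size(void *p) override
    {
        return (static_cast<Header *>(p) - 1)->size;
    }

    size_t total_allocated() const { return _size; }
    size_t allocation_count() const { return _allocations; }
};
```

The counters go up in allocate() and down in deallocate(), so a non-zero value in the destructor means some allocation was never returned.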

Since everything must be allocated through an Allocator, all our collection classes (and a bunch of other low-level classes) take an Allocator & in the constructor and use that for all their allocations. Higher level classes either create their own allocator or use one of the globals, such as memory_globals::default_allocator().

With this interface set, we can implement a number of different allocators. A HeapAllocator that allocates from a heap. A PoolAllocator that uses an object pool. A FrameAllocator that pointer bumps. A PageAllocator that allocates raw virtual memory. And so on.

Most of the allocators are set up to use a backing allocator to allocate large chunks of memory which they then chop up into smaller pieces. The backing allocator is also an Allocator. So a pool allocator could use either the heap or the virtual memory to back up its allocations.
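
As a rough illustration of the backing allocator idea, here is a sketch of a frame allocator that takes one chunk from another Allocator and bumps a pointer through it. The class layout, the MallocBacking stand-in, and the reset() / used() helpers are assumptions for this example, not engine code. It stores a size_t header in front of each block so that allocated_size() can be answered, which is the extra memory cost mentioned earlier.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

class Allocator
{
public:
    virtual ~Allocator() {}
    virtual void *allocate(size_t size, size_t align) = 0;
    virtual void deallocate(void *p) = 0;
    virtual size_t allocated_size(void *p) = 0;
};

// Hypothetical stand-in for a heap or page allocator used as backing.
class MallocBacking : public Allocator
{
public:
    void *allocate(size_t size, size_t) override { return std::malloc(size); }
    void deallocate(void *p) override { std::free(p); }
    size_t allocated_size(void *) override { return 0; } // not tracked in this sketch
};

// Hypothetical frame allocator: one chunk from the backing allocator,
// handed out piece by piece by bumping an offset.
class FrameAllocator : public Allocator
{
    Allocator &_backing;
    char *_begin;
    size_t _capacity;
    size_t _used = 0;

public:
    FrameAllocator(Allocator &backing, size_t capacity)
        : _backing(backing),
          _begin(static_cast<char *>(backing.allocate(capacity, alignof(std::max_align_t)))),
          _capacity(capacity) {}

    ~FrameAllocator() override { _backing.deallocate(_begin); }

    void *allocate(size_t size, size_t align) override
    {
        // The size header in front of the block must itself be aligned.
        if (align < alignof(size_t)) align = alignof(size_t);
        std::uintptr_t base = reinterpret_cast<std::uintptr_t>(_begin);
        // Leave room for the header, then round up to the alignment.
        std::uintptr_t cur = base + _used + sizeof(size_t);
        std::uintptr_t aligned = (cur + align - 1) & ~(std::uintptr_t(align) - 1);
        size_t new_used = static_cast<size_t>(aligned - base) + size;
        if (!_begin || new_used > _capacity) return nullptr; // chunk exhausted
        reinterpret_cast<size_t *>(aligned)[-1] = size;
        _used = new_used;
        return reinterpret_cast<void *>(aligned);
    }

    void deallocate(void *) override {} // individual frees are no-ops

    size_t allocated_size(void *p) override { return static_cast<size_t *>(p)[-1]; }

    void reset() { _used = 0; } // release everything in one go
    size_t used() const { return _used; }
};
```

Because the backing store is just an Allocator &, the same FrameAllocator could sit on top of a heap, a page allocator, or a proxy without any changes.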

We use proxy allocators for memory tracking. For example, the sound system uses:


ProxyAllocator("sound", memory_globals::default_allocator());


which forwards all allocations to the default allocator, but keeps track of how much memory has been allocated by the sound system, so that we can display it in nice memory overviews.
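
A proxy like this could be sketched roughly as follows. This is an illustration under assumptions, not the engine's ProxyAllocator: the member names, the total_allocated() accessor, and the SimpleHeap stand-in for the default allocator are all invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

class Allocator
{
public:
    virtual ~Allocator() {}
    virtual void *allocate(size_t size, size_t align) = 0;
    virtual void deallocate(void *p) = 0;
    virtual size_t allocated_size(void *p) = 0;
};

// Hypothetical stand-in for memory_globals::default_allocator(): malloc()
// plus a size header so that allocated_size() can be answered.
class SimpleHeap : public Allocator
{
    union Header { size_t size; std::max_align_t pad; };

public:
    void *allocate(size_t size, size_t) override
    {
        Header *h = static_cast<Header *>(std::malloc(sizeof(Header) + size));
        if (!h) return nullptr;
        h->size = size;
        return h + 1;
    }
    void deallocate(void *p) override
    {
        if (p) std::free(static_cast<Header *>(p) - 1);
    }
    size_t allocated_size(void *p) override
    {
        return (static_cast<Header *>(p) - 1)->size;
    }
};

// Sketch of a tracking proxy: every call is forwarded to the backing
// allocator, and the proxy only keeps per-subsystem totals.
class ProxyAllocator : public Allocator
{
    const char *_name;
    Allocator &_backing;
    size_t _size = 0;
    size_t _allocations = 0;

public:
    ProxyAllocator(const char *name, Allocator &backing)
        : _name(name), _backing(backing) {}

    ~ProxyAllocator() override { assert(_size == 0 && _allocations == 0); }

    void *allocate(size_t size, size_t align) override
    {
        void *p = _backing.allocate(size, align);
        if (p) {
            _size += _backing.allocated_size(p);
            ++_allocations;
        }
        return p;
    }

    void deallocate(void *p) override
    {
        if (!p) return;
        // Ask the backing allocator for the size before the block is freed.
        _size -= _backing.allocated_size(p);
        --_allocations;
        _backing.deallocate(p);
    }

    size_t allocated_size(void *p) override { return _backing.allocated_size(p); }

    const char *name() const { return _name; }
    size_t total_allocated() const { return _size; }
};
```

Note that the proxy relies on allocated_size() from the backing allocator, which is one reason the interface requires every allocator to implement it.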

If we have a hairy memory leak in some system, we can add a TraceAllocator, another proxy allocator which records a stack trace for each allocation. Though, truth be told, we haven't actually had to use that much. Since our assert triggers as soon as a memory leak is introduced, and the ProxyAllocator tells us in which subsystem the leak occurred, we usually find them quickly.

To create and destroy objects using our allocators, we have to use placement new and friends:


void *memory = allocator.allocate( sizeof(MyClass), alignof(MyClass) );
MyClass *m = new (memory) MyClass(10);

if (m) {
    m->~MyClass();
    allocator.deallocate(m);
}


My eyes! The pain! You certainly don't want to type or read that a lot. Thanks C++ for making my code so pretty. I've tried to make it less hurtful with some template functions in the allocator class:


class Allocator
{
public:
    ...

    template <class T, class P1> T *make_new(const P1 &p1) {return new (allocate(sizeof(T), alignof(T))) T(p1);}

    template <class T> void make_delete(T *p) {
        if (p) {
            p->~T();
            deallocate(p);
        }
    }
};


Add a bunch of other templates for constructors that take a different number of arguments that can be const or non-const and now you can at least write:


MyClass *m = allocator.make_new<MyClass>(10);

allocator.make_delete(m);


That's not too bad.
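
For what it's worth, with a compiler that supports C++11 variadic templates the whole family of overloads could probably be collapsed into a single template using perfect forwarding. The following is a sketch under that assumption and is not what the engine does; MallocAllocator and MyClass exist here only to make the example complete.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <utility>

class Allocator
{
public:
    virtual ~Allocator() {}
    virtual void *allocate(size_t size, size_t align) = 0;
    virtual void deallocate(void *p) = 0;
    virtual size_t allocated_size(void *p) = 0;

    // One template covers any number of const or non-const arguments.
    template <class T, class... Args>
    T *make_new(Args &&... args)
    {
        void *memory = allocate(sizeof(T), alignof(T));
        if (!memory) return nullptr;
        return new (memory) T(std::forward<Args>(args)...);
    }

    template <class T>
    void make_delete(T *p)
    {
        if (p) {
            p->~T();
            deallocate(p);
        }
    }
};

// Hypothetical trivial allocator, just enough to exercise make_new.
class MallocAllocator : public Allocator
{
public:
    void *allocate(size_t size, size_t) override { return std::malloc(size); }
    void deallocate(void *p) override { std::free(p); }
    size_t allocated_size(void *) override { return 0; } // not tracked here
};

// Example type taking one value and one non-const reference argument.
struct MyClass
{
    int value;
    int &counter;
    MyClass(int v, int &c) : value(v), counter(c) { ++counter; }
    ~MyClass() { --counter; }
};
```

With this, allocator.make_new<MyClass>(10, live) forwards the literal as an rvalue and live as a non-const reference, which the fixed-arity templates can only do by enumerating every const and non-const combination.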

One last interesting thing to talk about. Since we use the allocators to assert on memory leaks, we really want to make sure that we set them up and tear them down in a correct, deterministic order. Since we are not allowed to allocate anything without using allocators, this raises an interesting chicken-and-egg problem: who allocates the allocators? How does the first allocator get allocated?

The first allocator could be static, but I want deterministic creation and destruction. I don't want the allocator to be destroyed by some random _exit() callback god knows when.

The solution -- use a chunk of raw memory and new the first allocator into that:


char _buffer[BUFFER_SIZE];

HeapAllocator *_static_heap = 0;
PageAllocator *_page_allocator = 0;
HeapAllocator *_heap_allocator = 0;

void init()
{
    _static_heap = new (_buffer)
        HeapAllocator(NULL, _buffer + sizeof(HeapAllocator), BUFFER_SIZE - sizeof(HeapAllocator));
           
    _page_allocator = _static_heap->make_new<PageAllocator>("page_allocator");
    _heap_allocator = _static_heap->make_new<HeapAllocator>("heap_allocator", *_page_allocator);
    ...
}

void shutdown()
{
    ...
    _static_heap->make_delete(_heap_allocator);
    _heap_allocator = 0;
   
    _static_heap->make_delete(_page_allocator);
    _page_allocator = 0;
   
    _static_heap->~HeapAllocator();
    _static_heap = 0;
}


Note how this works. _buffer is initialized statically, but since that doesn't call any constructors or destructors, we are fine with that. Then we placement new a HeapAllocator at the start of that buffer. That heap allocator is a static heap allocator that uses a predefined memory block to create its heap in. And the memory block that it uses is the rest of the _buffer -- whatever remains after _static_heap has been placed in the beginning.

Now we have our bootstrap allocator, and we can go on creating all the other allocators, using the bootstrap allocator to create them.

69 comments:

  1. STL allocators were probably never designed for custom allocation techniques (e.g. pools), but rather to abstract the memory model (Stepanov mentions allowing persistent memory models). They seem to mainly have been pushed by external parties, and Stepanov himself says they are pretty flawed semantically (can't do find(a.begin(), a.end(), b[1]); where a and b are vectors with different allocators for example).

    http://www.sgi.com/tech/stl/drdobbs-interview.html
    http://www.stlport.org/resources/StepanovUSA.html

  2. Read the paper:

    http://www.google.com/search?q=reconsidering+custom+memory+allocation

  3. malte: Ah, I see, it was mostly a way to support Win16 and segmented memory. Makes sense. And then people started using them for memory pools and what not because "they were there".

  4. lionet: Of course, one of the allocators in the system should be DougLeaAllocator (in fact, that is exactly what our HeapAllocator is). And if you are using some other allocator because you think it is faster / less fragmented / etc than the HeapAllocator you should performance test. Optimizations should always be based on real world data.

  5. Thanks for another great post, Niklas. Was thinking about memory allocators for my engine/game and now I have a solution.

  6. Nice post! Some thoughts/questions:

    1 - If your template methods (make_new, make_delete, etc) do not get inlined you will end up with extra functions for each object you have, which appears to be a waste of memory.

    2 - Why don't you change your make_delete to receive a double pointer **T, and enforce the *T = 0 inside of the method?

    3 - Using your TraceAllocator are you able to trace the file and line number of memory leaks?

  7. 1 - But wouldn't it waste even more memory if it did get inlined? The code has to be there one way or another. It's not a memory cost that I really worry about.

    2 - You could do that if you like to always have your pointers nulled after delete.

    3 - Yes, by storing and keeping track of a stack trace for each allocation and then looking those traces up in the PDB.

  8. Actually STL was not very well thought through, it was a research project that was standardized in a very short amount of time. The ideas and principles of generic programming are very well thought through and very sane, STL is not :) Nor the C++ features that "supports" GP.

    http://en.wikipedia.org/wiki/Standard_Template_Library#History

  9. Greetings, a noob question: how do you handle deallocation of data? Do you just memset them to NULL
    and defrag the rest of the data (reallocate everything)? Or is data never deleted and you work with an index scheme?

  10. The deallocation scheme is up to the allocator. The heap allocator does normal heap deallocation, etc.

    None of the allocators I've talked about in the article are handle based and do automatic defragmentation. You could add such an allocator to the system, but it is kind of a separate topic.

  11. Great post! It's helped me with a few details I was fuzzy on.

    I have a couple of questions about your allocation tracking though. You mentioned above that you record the stack trace for each allocation. It seems you've decided not to use the __FILE__ and __LINE__ macros. I guess that would mean you'd have to wrap every allocation call with a macro to do it that way.

    How are you recording the stack trace? Writing to an external file? It would seem that this sort of tracking would be a big hit on performance and isn't enabled during regular development, no?

  12. You are right, I don't use __FILE__ and __LINE__. The main reason is that they don't give enough information in many cases. For example, all allocations in the Vector class would show up as vector.inl:72 or something like that, which doesn't give any information about what system is actually leaking memory.

    I record the information in memory (as pointers, so it is just 4 bytes for each stack trace entry) using a special debug allocator. All debug allocations (profiler data, stack traces, etc) go through that allocator, so I always know how much memory is used for debugging and how much is used by the "real game" -- another advantage of the allocator system.

    It is a hit on performance, so it is not enabled during regular development. When I use it, I usually enable it just for the particular allocator that I know is leaking, for example the "sound" allocator, if the shutdown test has shown that that allocator is leaking. That way the game runs at nearly full speed, since only a small percent of the total allocations are traced.

  13. What if the concrete allocator must take some additional params? For example, the stack allocator may take an additional argument saying which side to allocate from (when double ended). Then the code will need to know which allocator it uses.

  14. In that case you could just create two different allocators that allocate from each end of the stack.

  15. This is a really nice post and it has been very helpful!

    I just have a real noob question: You have disallowed the use of new and malloc() so how do you get your chunks of raw memory?

    If you use byte arrays as above, how do you check that the allocation was successful and that you have not run out of memory? I mean there is no way for the program to report an error in allocating a static array, right?

  16. I get memory directly from the OS. For example on Windows I use VirtualAlloc.

  17. Great post!! Just wondering how do you handle the main mem mapping for the RSX?

  18. Thanks for the great post, very interesting read. Couple of questions:

    1. How do you control internal STL allocations? You said you avoided creating custom STL allocators, but how do you ensure that any memory dynamically allocated inside the STL (Grow() etc) go through your allocators?

    2. Do you have any recommended books/links on the types of allocators you've mentioned here (HeapAllocator, FrameAllocator, PageAllocator, PoolAllocator)?

  19. @Jack

    1. We don't use STL. We use our own collection classes. They are quite similar to the STL classes, but they take a reference to an allocator interface in their constructors. They use that interface for all their memory allocations.

    2. I've picked up information here and there. You should be able to find some pointers by just googling "memory allocations". Some more detailed information:

    HeapAllocator - An allocator that allocates varied sized blocks and keeps an in-place linked list of free blocks. Look at dlmalloc, it is pretty much the standard allocator.

    FrameAllocator - Also called "arena allocator" or "pointer bump allocator". An allocator that doesn't deallocate individual blocks but releases all its memory in one go. That means it has a super simple internal state (just a pointer to the next free byte and the remaining bytes of free memory).

    PageAllocator - The virtual memory allocator provided by the system. Allocates memory as a number of "pages". The page size is OS dependent (typically something like 1K -- 64K). You don't want to use it for small allocations, but it can be used as a "backing allocator" for any of the other allocators. Google "virtual memory" and you should find lots of information.

    PoolAllocator - An allocator that has pools for allocations of certain sizes. For example, one pool may only allocate 16 byte blocks. The pool can be packed tight in memory, you don't need headers and footers for the blocks, since they are always the same size. This saves memory and reduces fragmentation and allocation time.

  20. Thanks for the info Niklas.

    Been powering through this blog for the last few days, some great posts and resources. Thanks for sharing!

  21. Very interesting. Will definitely be using a similar approach on my next project.

  22. Nice article, I used it to build my own Allocator but I run into trouble with allocating arrays of objects. I made a helper method to my Allocator like the following:

    template <class T> T *make_new_array(size_t length) {
    return new (allocate(sizeof(T) * length, alignof(T))) T[length];
    }

    The problem with that is: If T has a destructor, placement new will write 'length' into the first 4 bytes and return the allocated memory address offset by 4 bytes. This means it actually writes beyond the allocated buffer while initializing the objects. Furthermore I can't use the returned pointer to deallocate the memory again, since it is not the original one returned by the allocate method.
    How did you solve this? Manually calling regular placement new on each element?
    Thanks

  23. Like Alexander solved the Gordian knot. I just don't use new[] and delete[]. I use a Vector<> instead.

    Or sometimes (when I don't care about constructors, which is most of the time) a plain C array with the size kept externally, i.e.:

    int len = 100;
    int *a = (int *)allocator.allocate(sizeof(int) * len, alignof(int));

  24. @Serge

    For the RSX mapped memory we first allocate a couple of MB from the PageAllocator for that region and map it with cellGcmMapMainMemory().

    Then we create a HeapAllocator to service allocations of that memory. (Our heap allocator constructor can take a pointer and size to a memory area and will then construct the heap in that region.)

  25. If you are using a templated "new" function you will run into the template forwarding problem at some point.

    http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2002/n1385.htm

    There's no nice way around it. In the past I've restricted parameters to value, const reference or pointers to work around it.

    C++11 will fix it.

  26. True. We use both const and non-const argument references so we run into the 2^N combinatorial explosion. We have unrolled that to up to three arguments... if you need more than that you have to write placement new by hand, rather than using make_new.

  27. I'm experimenting with this allocator strategy, rolling my own container and such.
    I'm concerned by the global override of operator new/delete. While I understand that it's needed to ensure that everything is under control, by doing so we also forbid the use of any external library (or 99% of them) in their original state.
    E.g. I use UnitTest++ and it does allocate using new.
    Do you really not use anything external?

    I humbly thank you for this blog and the time spent for knowledge sharing.

  28. No, we don't use anything external that uses new and delete. We use some external stuff, but we only use stuff that lets us customize the memory allocation. That's the way it should be, I think. Any third party library intended for high performance use should let you supply your own memory allocators.

  29. Wow, I'm impressed ...
    Still a long way to go, it seems ^^
    Just for my curiosity, did you write a unit test library or did you pass on that ?

  30. Very nice approach overall, Niklas!
    But how have you implemented your HeapAllocator?
    Are you simply using malloc()/free() internally or have you simply copied the full source from dlmalloc()? I'm asking because I don't see a reason to reinvent the wheel rather than simply using malloc()/free() in this allocator.

  31. We are using dlmalloc. Using that instead of standard malloc() free() means we have more insight into and control over how the heap allocator grabs system memory. So we can make sure that it "plays nice" with our other allocators and we can write functions that given a pointer can tell us if it was allocated by the heap or not.

    It also allows us to have multiple heaps. For example, we have a special heap just for debug allocations (such as profiler data) so that we can easily ignore them when we check if we are within the memory budget.

  32. Thanks for this great explanation!
    I think I will also try to wrap the dlmalloc() through template heap layers.
    It's really nice to have such control over your memory budget!

  33. I am still confused about how you handle arrays of objects. You say you use Vector<> (I assume that this is your own custom container) How does this allocate the necessary memory as well as call the constructors for each of the objects in the array?

    And thank you so much for posting this!

  34. It is no great mystery, memory is allocated with (for example):

    data = (T *)_allocator->allocate( sizeof(T) * capacity, alignof(T));

    And then in-place new is used to call the constructors.

    for (unsigned i=0; i<size; ++i)
    new (data + i) T();

  35. Are your allocators thread safe? If the answer is yes, what is your preferred method to achieve that?

  36. It is up to each class implementing the Allocator interface. Most of our allocators are thread-safe, because that makes them easier and safer to use, but we have some special allocators that are faster but not thread-safe.

    It is implemented with critical sections protecting the access to the shared data.

  37. Nice article! I noticed your heap_allocator is using page_allocator as its backend. My question is how page_allocator is implemented. Does it allocate a big chunk at the beginning and give memory to heap_allocator when requested, and only VirtualFree the chunk at the end? Also, did you redirect dlmalloc's MMAP to the page_allocator?

  38. No, the page allocator allocates pages as requested and returns them when done. It doesn't grab a big chunk at startup. It tries to be friendly to other processes/allocators that might be using the VM system at the same time.

    Yes for heaps that can grow, mmap() is redirected to the page allocator. (Actually, any allocator that supports the Allocator interface can be used as backing allocator.)

  39. This is a very interesting article. Will definitely have to look into it further. Thanks for sharing.

    There is one thing i am thinking of adding to your suggestions though; adding an overload of the new operator that takes an allocator as a parameter.

    void* operator new (std::size_t size, Allocator & a) throw (std::bad_alloc);
    void operator delete (void* ptr, Allocator & a) throw ();

    With assert of global new and proper handling of delete, i still get all of the benefits of allocator usage along with all of the syntactic sugar the new and delete keywords provide. Also alleviates the need for multiple template parameters make_new uses as the syntax is the same as it would have been except for:

    HeapAllocator my_allocator;

    MyClass* mc = new (my_allocator) MyClass();

    Still enforces the allocator usage, and i can hide it internally if desired. Even if it wasn't hidden, someone using it would still incur correct behavior.

    Still toying with the idea though. Since i allow placement new, doesn't seem too bad to allow this for my own internal usage, but will have to see if i will keep with that line of thought.

    Now i just need to go research more about the actual implementations of the various types of allocators.

  40. Yes, it has its advantages. The drawback is that you can't do the same for delete (no way of passing parameters), so you end up with different syntax for new and delete.

  41. Why no realloc() support for Allocator?

  42. I think realloc() is kind of weird. It is a non-deterministic optimization. If you are really lucky and there happens to be some free space to the right of the memory buffer, then realloc() can be faster than alloc + copy + delete. But you can't really do anything to make sure that that happens.

    So you could easily add realloc support (with a default implementation of alloc + copy + delete for allocators that don't have a faster path). But to me it isn't that useful. I much prefer deterministic optimizations.

    Replies
    1. I get the determinism argument, but I'm not sure it is strong enough to outweigh the benefits of a successful realloc(). Also, I would not provide a default realloc in terms of alloc+copy+del but rather return null from realloc(), so that a+c+d would be an explicit next step and it is always obvious whether the optimization happened or not.

    2. If I need a buffer that can grow without reallocating memory I use some other technique, such as a linked chain of fixed size buffers (perhaps 8K each) that I merge as a final step. That way I am sure I don't have to copy the data unnecessarily, regardless of what the memory system decides to do.

    3. But isn't that what Lea's allocator is doing already? I can see your point tho: you want to have full control always. Two more questions: 1) Do you use HeapCreate/Alloc on win to have different OS heaps for different systems/resources? 2) How do you handle dtor calling with allocators such as FrameAllocator where memory might be "freed" without client's explicit free() call? And sorry for not saying it with first comment: great read! :)

    4. 1) No, I don't use OS heaps. All heap allocation is done by dlmalloc. I use the OS page allocator (VirtualAlloc) as a backing allocator for dlmalloc.

      2) If you "delete" an object in a FrameAllocator, the destructor will be called, but memory will not be freed (all memory is freed when the FrameAllocator is destroyed). When you destroy the FrameAllocator, all memory is released, but no destructors are called... So if you want destructors to be called you must call delete (the same as if you use any other allocator). With the frame allocator though, you can choose to *not* call delete, if you know that the destructor doesn't do anything useful. That should not be the normal code path though, it should be reserved for "special optimizations".

  43. Hi Niklas, I see your heap allocator can take a pointer to a memory area and construct the heap in that region, you also said your heap allocator uses dlmalloc, how you managed to tell dlmalloc what memory region it should use?

    Replies
    1. I use create_mspace_with_base()

  44. Since you just did: http://www.altdevblogaday.com/2012/05/05/embracing-dynamism

    what is your allocation strategy for Lua states?

    Replies
    1. We use a special allocator for Lua, so that we can track all Lua allocations (and optimize for Lua allocation patterns if needed).

      We use lightuserdata for most objects exported from C to Lua to minimize the load on the garbage collector.

      We use an incremental garbage collector where we spend a certain amount of time every frame to collect. We dynamically adapt the time we spend collecting to the amount of garbage we generate so that the system settles down in a "steady state" with a fixed garbage collection time per frame and a fixed garbage memory overhead.

  45. Could you give more details about how to get the deallocated size from the ptr?

    Replies
    1. It is up to the allocator to track the size of the allocated areas. Each allocator does it differently.

      For example, dlmalloc adds some extra memory before each allocated memory block where it stores a header with that information.

      A slot based allocator typically allocates slots one "chunk" at a time. In the chunk header it can then store information about the size of the slots in the chunk. If chunks are always say 8K big and aligned at 8K you can round down a slot pointer to the nearest 8K aligned address to get to the chunk header from the slot pointer, and then read the slot size from there.

  46. I have been googling around some about data alignment, and I'm curious about how you generally handle data alignment in the Bitsquid engine since it is multiplatform. Do you use alignas and alignof from the C++11 standard, or do you use some compiler specific methods?

    Also I'm interested in how you handle it in your allocators, if it's not too much to ask of you.

  47. We use compiler specific attributes right now, such as __align.

    As seen above, our default allocate function takes a size and an alignment

    void *allocate(size_t size, size_t align);

    The alignment must be a power-of-two and <= size. The different allocators handle alignment as necessary. The simplest method is to just allocate extra memory and then offset the pointer to make sure that it has the right alignment. But most of our allocators are smarter than that. When they request a page from the system, and chop it up for allocation, they make sure that the blocks fall on reasonable alignment boundaries.

  48. Thanks for the post, it's always interesting to see how others do it!

    I have one question regarding the deletion of instances using the templated make_delete(T *p) function, though.

    If people use multiple inheritance and don't delete instances using a pointer-to-base, the pointer handed to make_delete() will be wrong because the derived-to-base-offset hasn't been accounted for.

    I see two possibilities for handling this:
    1) Disallow multiple inheritance
    2) Tell users that they have to properly cast their pointers to base-pointers in cases of MI first

    So, how do you handle this?

  49. Thank you very much for this interesting post.

    I very much like the approach of orthogonalizing various aspects of the memory subsystem like linearization, stack traces, leak tracking and memory corruption.

    However, I'm wondering:

    1) How would you deal with private destructors?
    2) Wouldn't it help compile-times to replace the make_new template functions with a single variadic macro?

    I'm thinking of something along these lines:

    #define make_new( allocator, T, ... )\
    new ( (allocator)->allocate( sizeof(T), __alignof(T) ) ) T( __VA_ARGS__ )

    What's your opinion?

  50. @molecularmusings True. We don't use multiple inheritance so we don't run into this problem.

  51. @smocoder

    1) This doesn't work with private destructors, but regular new and delete wouldn't work in that case either. If you have a private destructor you typically have some other way of creating and destroying objects than using new/delete. In that case you would just make sure that that system worked with the allocator model.

    2) Actually we have pretty much exactly that macro in our code base. And we are transitioning from using the templates to using the macro to improve compile-times and reduce object size, just as you suggest. Good point!

    Replies
    1. We use a macro-based approach in the Molecule Engine as well. It doesn't use variadic macros however, and looks somewhat similar to this:

      #define ME_NEW(type, arena) \
          new ((arena)->Allocate(sizeof(type), ME_ALIGN_OF(type))) type

      The lone "type" at the end ensures that you can just open another set of parentheses for providing constructor arguments, like so:

      TestClass* tc = ME_NEW(TestClass, arena)(1, 2, 3);
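A compilable sketch of this macro style, for illustration only. `MallocArena`, its `Free` method, `me_delete`, and the definition of `ME_ALIGN_OF` as `alignof` are assumptions made here; the actual Molecule Engine arena interface is not shown in the comment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical stand-in arena. It ignores the alignment argument and
// relies on malloc's default alignment, which is enough for this demo.
struct MallocArena {
    int live = 0;
    void *Allocate(std::size_t size, std::size_t /*align*/) {
        void *p = std::malloc(size);
        assert(p != nullptr);
        ++live;
        return p;
    }
    void Free(void *p) { --live; std::free(p); }
};

#define ME_ALIGN_OF(type) alignof(type)

// The trailing "type" lets the caller append constructor arguments.
#define ME_NEW(type, arena) \
    new ((arena)->Allocate(sizeof(type), ME_ALIGN_OF(type))) type

// Matching destroy step: run the destructor, then return the memory.
template <class T, class Arena>
void me_delete(T *p, Arena *arena) {
    if (p) { p->~T(); arena->Free(p); }
}

struct TestClass {
    int sum;
    TestClass(int a, int b, int c) : sum(a + b + c) {}
};
```

Usage follows the comment above: `TestClass *tc = ME_NEW(TestClass, &arena)(1, 2, 3);` followed later by `me_delete(tc, &arena);`.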

  52. Hi there, great article! I just had a quick question about the allocators you use and how they are used with your container classes (in this article and the Foundation example you created).

    I have been playing around with making my own heap-style allocator which allocates memory from a global pool/blob (just like in this example). I currently use a std::vector to track the allocations that happen (a struct containing the start and end address of each allocation). I use this to find and remove allocations, and to detect where there are gaps in the memory blob so that new allocations can be placed there if they fit. I realised I would like to stop using std::vector and create my own Vector class in a similar style to the one you created in the Foundation example code, but I hit a problem. The allocator needs a dynamic, resizing array to track allocations, but the dynamic resizing array itself needs an allocator when it is created, so I have a sort of chicken-and-egg scenario. I could be completely misunderstanding, but from the example outlined above I assumed that you would not call new/malloc or delete/free at all (I'm not quite sure what dlmalloc is, I'm afraid). I guess what I am trying to ask is: how do you track/store the allocations that happen in your base allocator? I suppose I could use some sort of scratch or temp allocator to hold the vector inside the heap allocator, but that seemed sort of messy and I was hoping there was a nicer solution. I thought I'd ask you in case I've got things horribly wrong and am barking up the wrong tree. I hope you understand what I'm prattling on about :)

    Thanks!


    Tom

    Replies
    1. I tend to not use traditional data structures (vectors, etc) in the implementation of the allocators, just because of this reason. It becomes tricky to keep track of what is happening if the process of allocating memory triggers a resize which triggers another memory allocation.

      So instead I use other techniques, such as chaining together blocks of memory in implicit linked lists, or having "header blocks" in memory regions allocated from the backing allocator that store information about the number and type of allocations. Perhaps also a "master header" block with global information, etc.
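One common instance of this idea is a fixed-size block pool whose free list lives inside the free blocks themselves, so no separate container is needed. The sketch below is a generic illustration, not the implementation described in the reply; `FixedBlockPool` is a hypothetical name, and `malloc` stands in for the backing allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

class FixedBlockPool {
    // A free block stores the link to the next free block in its own bytes.
    struct FreeNode { FreeNode *next; };

public:
    FixedBlockPool(std::size_t block_size, std::size_t block_count) {
        // Round up so every block can hold a FreeNode and stays pointer-aligned.
        const std::size_t a = sizeof(FreeNode);
        _block_size = ((block_size < a ? a : block_size) + a - 1) / a * a;
        _memory = static_cast<char *>(std::malloc(_block_size * block_count));
        assert(_memory != nullptr);
        // Thread all blocks onto the free list.
        for (std::size_t i = 0; i < block_count; ++i)
            _free = new (_memory + i * _block_size) FreeNode{_free};
    }
    ~FixedBlockPool() { std::free(_memory); }
    FixedBlockPool(const FixedBlockPool &) = delete;
    FixedBlockPool &operator=(const FixedBlockPool &) = delete;

    // Pop a block from the free list; returns nullptr when the pool is empty.
    void *allocate() {
        if (!_free) return nullptr;
        FreeNode *n = _free;
        _free = n->next;
        return n;
    }

    // Push the block back; its bytes are reused to hold the link.
    void deallocate(void *p) {
        if (p) _free = new (p) FreeNode{_free};
    }

private:
    std::size_t _block_size = 0;
    char *_memory = nullptr;
    FreeNode *_free = nullptr;
};
```

Because the bookkeeping lives in memory the pool already owns, allocating never triggers another allocation, which avoids the resize-inside-allocate problem described above.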

      Dlmalloc can be found here http://g.oswego.edu/dl/html/malloc.html. You can see how it implements some of these techniques. You can find similar stuff in other memory allocators.

      There are some other situations when the chicken-and-egg problem can crop up, such as when creating the initial global allocators. I don't want to use static variables, since I want to have a clear initialization and destruction order. So instead I placement-new them into a statically allocated memory buffer (as you can see in memory.cpp in the foundation project).
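A rough sketch of the placement-new approach, assuming a single global allocator. The namespace `globals_sketch`, the two-method `Allocator` interface, and `MallocAllocator` are simplified assumptions for illustration; see memory.cpp in the foundation project for the real code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

class Allocator {
public:
    virtual ~Allocator() {}
    virtual void *allocate(std::size_t size) = 0;
    virtual void deallocate(void *p) = 0;
};

class MallocAllocator : public Allocator {
public:
    // Destruction is a convenient place to check for leaks.
    ~MallocAllocator() override { assert(_live == 0); }
    void *allocate(std::size_t size) override { ++_live; return std::malloc(size); }
    void deallocate(void *p) override { if (p) { --_live; std::free(p); } }
private:
    int _live = 0;
};

namespace globals_sketch {
    // Raw storage only: nothing is constructed at static-init time.
    alignas(MallocAllocator) unsigned char buffer[sizeof(MallocAllocator)];
    MallocAllocator *instance = nullptr;

    // Construct the allocator at a point the program chooses.
    void init() {
        assert(instance == nullptr);
        instance = new (buffer) MallocAllocator();
    }

    Allocator &default_allocator() {
        assert(instance != nullptr);
        return *instance;
    }

    // Destroy it explicitly, again at a point the program chooses.
    void shutdown() {
        assert(instance != nullptr);
        instance->~MallocAllocator();
        instance = nullptr;
    }
}
```

Calling `init()` at the top of `main` and `shutdown()` at the bottom gives a fixed construction and destruction order that does not depend on static initialization order across translation units.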

    2. Hi Niklas,

      Thank you very much for the response, that makes a lot of sense.

      As a temporary solution (while I couldn't think of anything better) I used a simple linear allocator internal to the heap allocator, which used a member variable char buffer in the heap allocator as its memory. I set it to what seemed like an appropriate size and stuck an assert in to catch if it ever grew too large. The implicit linked list solution sounds like a nice solution, and I will definitely check out dlmalloc too.
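The temporary solution described here might look roughly like the following. This is a guess at the shape of such a class, not Tom's code; `InlineLinearAllocator` and its members are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Bump allocator over a fixed buffer that is a member of the object,
// so it needs no backing allocator of its own.
template <std::size_t Capacity>
class InlineLinearAllocator {
public:
    void *allocate(std::size_t size, std::size_t align = alignof(std::max_align_t)) {
        // Offsets are relative to a max_align_t-aligned buffer, so any
        // power-of-two alignment up to that limit is honoured.
        assert(align != 0 && (align & (align - 1)) == 0);
        assert(align <= alignof(std::max_align_t));
        std::size_t offset = (_used + align - 1) & ~(align - 1);
        assert(offset + size <= Capacity && "internal buffer exhausted");
        if (offset + size > Capacity) return nullptr;  // release-build fallback
        _used = offset + size;
        return _buffer + offset;
    }

    // Individual frees are not supported; everything is released at once.
    void reset() { _used = 0; }

    std::size_t used() const { return _used; }

private:
    alignas(std::max_align_t) unsigned char _buffer[Capacity];
    std::size_t _used = 0;
};
```

The fixed capacity is the main limitation: the assert catches overflow in debug builds, but the size still has to be chosen up front.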

      Thank you again, the Foundation project has been incredibly interesting to poke around in, both in terms of the coding style and techniques.

      Cheers!


      Tom :)

  53. This comment has been removed by the author.

  54. Hi Niklas,

    Just to note that I reference this in my blog post here. I talk about applying this kind of allocator setup in the specific context of a custom STL style vector class, and also the addition of realloc() support.

    Feedback welcome!

    Thomas

  55. Hi there, I tried to understand a few concepts that you've shared here.

    You have a PageAllocator (which in turn uses VirtualAlloc, mmap etc) as the top level allocator. How do you handle allocated_size() in this allocator? If you store it directly in the page you may need to allocate another page just to store this information. This way a request of 4KB would need to be satisfied by allocating 8KB of memory in 2 pages, just to store its size and to align the space accordingly. Do you have a separate hash table inside the allocator just for the bookkeeping? Does PageAllocator even implement the Allocator interface?

    As for dlmalloc, it uses only 2 global callbacks for allocating its memory. How can you give it an allocator to allocate from? Did you modify it and pass your own callbacks and an Allocator pointer on creation, so it can allocate from it? By default, dlmalloc also coalesces adjacent segments and frees them in one go (in one call to munmap()). How do you handle this behavior in your allocators?

  56. On Windows you can use VirtualQuery() to get the size. But not all platforms provide a similar interface. On other platforms we have a hash table in the allocator for storing the size of page allocations. (As an optimization, we only store allocations > 1 page; if the page is not in the table, the allocation size is assumed to be 1 page.)
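The size table with the single-page optimization could be sketched as follows. `PageSizeTable` is a hypothetical name, and `std::unordered_map` is used only to keep the example short; it allocates from the global heap, so a real page allocator would more likely use a table backed by its own memory.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

class PageSizeTable {
public:
    explicit PageSizeTable(std::size_t page_size) : _page_size(page_size) {}

    // Only allocations larger than one page are stored.
    void record(void *p, std::size_t size) {
        if (size > _page_size) _sizes[p] = size;
    }

    // A pointer that is not in the table is assumed to be a single page.
    std::size_t allocated_size(void *p) const {
        auto it = _sizes.find(p);
        return it == _sizes.end() ? _page_size : it->second;
    }

    // Call when the pages are returned to the system.
    void forget(void *p) { _sizes.erase(p); }

private:
    std::size_t _page_size;
    std::unordered_map<void *, std::size_t> _sizes;
};
```

Since single-page allocations never touch the table, the common case costs one failed lookup and no storage.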

    Yes, we have modified dlmalloc so that it uses a PageAllocator instead of mmap to allocate system memory.

    Replies
    1. Thanks for the reply!

      What about the coalesced pages in dlmalloc? Did you turn them off as well, or did you make PageAllocator support that?

    2. As far as I remember, I haven't done either. I use the mspace interface and provide my own versions of mmap and munmap to dlmalloc.
