When I dove into memory management and tried to optimize the different data structures that we use in the engine, I found that memory management is really important and makes a huge difference in performance. Then I discovered the String.h C library with many raw memory functions that would allow me to micro-manage the memory myself through a data structure. This would allow the CPU to more easily predict what part of the memory to cache when iterating over many elements in memory. Another benefit of managing raw byte offsets is that you can in a very trivial way make it threadsafe. That gave me more opportunities to thread the other systems in the engine easier.
It is a very trivial solution, ask the system for x bytes of memory and it will return a start address followed with enough space to cover the byte size you asked for. But you probably don't want to work in bytes, more so interact and store different data structures or native types. Every native
type or data structure has a byte size, ints are 4 bytes large, doubles are 8 bytes large and the list goes on.
Let's say you want to store 16 ints in the memory, each int is 4 bytes in size. To calculate the size of how much memory we need to allocate to store 16 ints we just need to take the number of items and multiply it with the size of the item. So in our case 16 * 4 which equals 64 bytes.
Then we ask the system for 64 bytes of memory and it will return a pointer that points to the start of this memory block.
Now we have a block of memory 64 bytes big, the memory is nulled or just garbage. The next step in the sequence is to allocate our individual elements. Then we need a way to keep track of where we currently are in the memory from the pointer that marks the start of our memory block. We can call this variable "Current byte offset". We also need a safeguard to make sure we don't exceed the bounds of the memory block, in our case we don't write more than 64 bytes.
This one is called "Max byte offset" and are equal to the start added with our total size (64 bytes). You can see an illustration of this in the image below this paragraph.
Creation Of An Element
We have all the pieces in place to start storing data in the memory block. The process of just adding an element into the memory block is very simple.
Step one, use the max size variable and check if the current element count + 1 is more or equal to the max size of the memory block. If it exceeds the max size return null or assert.
Step two, calculate the byte offset for the new element we want to store. You get the byte offset by multiplying the number of items currently stored in the memory block with the size of the element (in bytes).
Step three, call the overloaded "new" operator and in the function arguments, you want to tell it at which address it should allocate the element. Using the byte offset we calculated in step two we can just add it to our start pointer and use it as the argument for the "new" function.
As the last step, we just increment the element count by one and return the pointer to the element we just allocated in the memory block.
Removing Of An Element
Removing an element is a bit more tricky for several reasons. For one, we do not want fragmented memory. Fragmented memory forces the CPU to jump further in the memory and will cause the CPU to cache nulled memory. There is also the problem that if we remove an element further back in the memory block we have no idea of what addresses of the memory block are deleted elements and still active elements.
There are plenty of solutions to this problem, we could use the same system as the computer does, almost like a double linked list with memory addresses. Perhaps using a restful system to keep track of what is marked as dirty memory and not.
I did not have the time to implement a complex system for the memory pool, so I went with a more simple solution which is a queue system. The queue stores the address of the object on removal.
So in a more step by step explanation, decrease the element count, add the memory address to the queue, call the destructor of the element we are removing, as the last step we use memset to clear all the data located at the memory address of the item we are removing.
While this won't fix the issue at hand. The issue lies when we are calling the Create function. It will exclusively increment the counter and add an element to the next index. The fix to the issue is easy when we have implemented a queue. You just need to check "Is the queue empty?". If not then use the first value of the queue as the new address of the element we are assembling, then pop the front object.
Making It Threadsafe
One of the goals with the data structure was to be able to write to the same part of the memory from numerous threads simultaneously. This sounds like a complex problem but it is not. We just have integers as counters for our elements in the memory block right now.
Let's make them atomic instead. The atomic part of the variable will ensure the value will be identical across the threads and synced between them as well.
We do need to change one more thing when writing to the memory queue because the queue is not threadsafe. Therefore we need a mutex to use as a lock around the queue. With these few changes, we now have a threadsafe memory pool.
Now we have a very capable and optimized data structure that can store and remove data. I added some more features like being able to iterate over elements in the memory block, storing a copy of an element. I won't cover it in this post but I will publish the entire data structure on Github.
The use case for this kind of data structure is in any data management-intensive application. When you need to write to a list from multiple threads and then iterate over the elements added.
There are a few things I could do to make it a more versatile data structure, adding the support for it to grow if the element count exceeds the max element count, writing a custom memory queue that is by default threadsafe. But in its current state, it is still a very capable data structure that fills its purpose very well. I would not have been able to thread as many systems as I have without this data structure.
You can find the full memory pool here.