Add ResizeBuffer: automatically growing buffer #54
Conversation
| """ | ||
| ResizeBuffer{StorageType} | ||
|
|
||
| This is a simple bump allocator that could be used to store a fixed amount of memory of type | ||
| `StorageType`, so long as `::StorageType` supports `pointer`, and `sizeof`. | ||
|
|
||
| Do not manually manipulate the fields of a `ResizeBuffer` that is in use. | ||
| """ |
This isn't a very accurate description, right? This thing allows overflow and will adaptively resize.
Sorry, I still have to actually fix the docs; this is mostly just copied from the AllocBuffer. Will address this, though!
```julia
buf::Ptr{Cvoid}
buf_len::UInt

offset::UInt
max_offset::UInt

overflow::Vector{Ptr{Cvoid}}

function ResizeBuffer(max_size::Int = default_max_size; finalize::Bool = true)
    buf = malloc(max_size)
    buf_len = max_size
    overflow = Ptr{Cvoid}[]
    resizebuf = new(buf, buf_len, UInt(0), UInt(0), overflow)
    finalize && finalizer(free, resizebuf)
    return resizebuf
```
Is there a reason to use malloc/free here rather than just storing a Vector{Memory{UInt8}} and then taking the pointers from those Memorys?
I'm okay with using malloc/free here but just curious how conscious a decision it was.
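For concreteness, the `Memory`-backed alternative might look roughly like this (just a sketch, assuming Julia ≥ 1.11 for `Memory`; `MemResizeBuffer` and the exact field layout are illustrative, not from this PR):

```julia
# Sketch: let the GC own the storage instead of using malloc/free.
mutable struct MemResizeBuffer
    buf::Memory{UInt8}               # GC-owned backing storage
    offset::UInt
    max_offset::UInt
    overflow::Vector{Memory{UInt8}}  # roots overflow chunks so their pointers stay valid
end

MemResizeBuffer(max_size::Int) =
    MemResizeBuffer(Memory{UInt8}(undef, max_size), UInt(0), UInt(0), Memory{UInt8}[])

function alloc_ptr!(b::MemResizeBuffer, sz::Int)::Ptr{Cvoid}
    old_offset = b.offset
    b.offset += sz
    if b.offset <= length(b.buf)
        # valid as long as `b` (and hence `b.buf`) is kept rooted by the caller
        return pointer(b.buf) + old_offset
    else
        mem = Memory{UInt8}(undef, sz)
        push!(b.overflow, mem)  # keep it reachable until the buffer is reset or collected
        return pointer(mem)
    end
end
```

One upside of this variant would be that no finalizer is needed; the trade-off is that the storage now participates in GC accounting, which touches on the point below.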
I have to admit that it was not that conscious of a decision. I guess I was trying to follow the style of the SlabBuffer, and also because I'm not entirely sure what is and isn't allowed within the scope of this package (I seem to vaguely recall that at some point not using the Julia GC was important for static compilation etc., but I might be wrong here?).
Thinking about it more properly though, I would argue that it might be beneficial to keep it as malloc and free: since we already have the infrastructure of escape analysis etc. anyway, it might be nice to avoid possibly triggering the Julia GC, for example in multithreaded environments. This is however just an idea; I have no data to back this up or to substantiate whether this is relevant at all.
```julia
function alloc_ptr!(b::ResizeBuffer, sz::Int)::Ptr{Cvoid}
    old_offset = b.offset
    b.offset += sz
    b.max_offset = max(b.max_offset, b.offset)

    # grow the buffer - only available if empty
    if iszero(old_offset) & (b.max_offset > b.buf_len)
        free(b.buf)
        b.buf = malloc(b.max_offset)
        b.buf_len = b.max_offset
    end

    if b.offset ≤ b.buf_len # use the buffer if there is enough space
        ptr = b.buf + old_offset
    else # manually allocate if not
        ptr = malloc(sz)
        push!(b.overflow, ptr)
    end

    return ptr
end
```
You may get better performance here if you basically hide all the special cases behind a @noinline function barrier, and then just have the "happy path", where you only need to do `ptr = b.buf + old_offset`, in the main part of this function.
Non-inlined function barriers tend to cost around 10 ns or so, but they make it so that all the stuff behind the barrier doesn't need to be considered by the optimizer, potentially leading to better-optimized function bodies, and the branch predictor can learn that it usually won't need to care about that branch.
It really depends though; it might be worth checking some benchmarks to see if it makes any difference.
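For concreteness, that restructuring might look roughly like this (a sketch only; `alloc_ptr_slow!` is a hypothetical name, and the intent is to preserve the behavior of the version above):

```julia
function alloc_ptr!(b::ResizeBuffer, sz::Int)::Ptr{Cvoid}
    old_offset = b.offset
    b.offset += sz
    b.max_offset = max(b.max_offset, b.offset)
    # happy path: since max_offset ≥ offset always holds, this single check
    # guarantees the request fits and no resize is pending
    b.max_offset ≤ b.buf_len && return b.buf + old_offset
    return alloc_ptr_slow!(b, sz, old_offset)
end

# everything rare (growing, overflow bookkeeping) lives behind the barrier
@noinline function alloc_ptr_slow!(b::ResizeBuffer, sz::Int, old_offset::UInt)::Ptr{Cvoid}
    # grow the buffer - only available if empty
    if iszero(old_offset) && (b.max_offset > b.buf_len)
        free(b.buf)
        b.buf = malloc(b.max_offset)
        b.buf_len = b.max_offset
    end
    if b.offset ≤ b.buf_len  # the (possibly grown) buffer has room
        return b.buf + old_offset
    else                     # manually allocate if not
        ptr = malloc(sz)
        push!(b.overflow, ptr)
        return ptr
    end
end
```

Note that once `max_offset` exceeds `buf_len` mid-cycle, later allocations in the same cycle also go through the slow path even if they would still fit; the returned pointers are the same, so this only shifts where the branch cost is paid.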
I'll see if I can come up with some tests (but probably not before next week).
Note that my use-case (tensor contractions) typically is not really bound by the actual speed of the allocations (the operations themselves easily take ~seconds); rather, it is important not to constantly trigger the GC because memory fills up, which then stops all threads. Nevertheless, I agree it is nice and worth investigating whether this allocation can be made faster, so I do appreciate the suggestion!
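For reference, a micro-benchmark along these lines could serve as a starting point (a sketch only; it assumes `alloc_ptr!` and `reset_buffer!` are reachable under these names in a build of the package that includes this PR):

```julia
using BenchmarkTools
using Bumper  # assumed to provide ResizeBuffer via this PR

buf = ResizeBuffer(2^20)

# Time a single small allocation on a freshly reset buffer; evals=1 keeps each
# measurement on the happy path rather than accumulating offsets across calls.
@benchmark Bumper.alloc_ptr!($buf, 64) evals = 1 setup = (Bumper.reset_buffer!($buf))
```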
This is a port of some work I did for TensorOperations.jl.
The idea is that in a workflow that repeatedly performs the same task, which might allocate large objects but where it is a priori difficult to foresee how much memory is needed, this approach attempts to strike a balance between user-friendliness, performance, and the number of allocations.
The approach is simple: we keep a buffer alive like `AllocBuffer`, but instead of erroring whenever we oversubscribe the buffer, we manually allocate extra objects (more like the `SlabBuffer`). Additionally, we keep a counter that tracks the maximal size that would have been reached if the buffer were sufficiently large, and after resetting the buffer, the next allocation will trigger a resize to account for that.
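To illustrate the intended workflow, usage might look something like the following (a sketch that assumes `ResizeBuffer` plugs into the existing `@no_escape`/`@alloc` machinery the same way `AllocBuffer` does; the sizes are arbitrary):

```julia
using Bumper

buf = ResizeBuffer(2^20)  # initial guess of 1 MiB

for _ in 1:100
    @no_escape buf begin
        # If these allocations oversubscribe `buf`, the excess is served from
        # manually malloc'd overflow chunks; after the block resets the buffer,
        # the next allocation grows it to the recorded high-water mark.
        A = @alloc(Float64, 256, 256)
        B = @alloc(Float64, 256, 256)
        # ... do work with A and B ...
    end
end
```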
Note that this still needs some work, but it could already do with a review.
A warning: the tests have been AI-generated, and I still have to find the time to review them more closely myself!
Additionally, I still have to take a look at the docs and make sure these are updated as well.