-
Notifications
You must be signed in to change notification settings - Fork 3k
Improvements to the array module #10578
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
CT Test Results 3 files 130 suites 1h 22m 44s ⏱️ For more details on these failures, see this check. Results for commit 0daddab. ♻️ This comment has been updated with latest results. To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass. See the TESTING and DEVELOPMENT HowTo guides for details about how to run test locally. Artifacts
// Erlang/OTP Github Action Bot |
|
Your commiting-BEAM-file check is drunk. |
|
Yes, it detected that one of your contributions contains |
fa44367 to
d460655
Compare
The diff included changes that had happened on the target branch (master), not just the ones in the contribution. I rebased and pushed a fix that I think should work for the future. |
d460655 to
68c7f2e
Compare
68c7f2e to
f1b794f
Compare
|
I wait with this until the other (discussed changes) are incorporated. |
Measurements gives that a bucket size of 16 seems to be the best size nowadays. Measurements where done on size 8, 10, 16 and 32. The power of 2 size also enables us to use bit operations instead of rem/div, that gave even more performance even if the div/rem operations where optimized. Further optimizations might be to remove the bit-size storage in the nodes, and only keep that on the top-level array.
Remove unused tests in source code.
415894b to
8bb8b6d
Compare
| end, | ||
| #array{size = Size, max = M, default = Default, elements = E}. | ||
| E = find_max(Size - 1, ?SHIFT), | ||
| C = ?NEW_CACHE(Default), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder to what extent it would be worth it to try and optimise the operations when the default is unchanged to return literals
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't think I understand?
Unchanged the default is 'undefined', which should be an immediate, can't be better than that, can it?
But even if it the user set default to an literal, wouldn't that just be a pointer to literal area, what do you have in mind?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean to have the macros like ?NEW_CACHE or others that dynamically create tuples for values to instead return a literal tuple if Default == ?DEFAULT - avoiding some of the allocations
Keep the size information in meta-data only, and forward recursively. Reduces memory and removes one bit shift on recursions. Fixed coverage and removed some dead code.
afbd032 to
fa06f5b
Compare
| -record(array, {size :: non_neg_integer(), %% number of defined entries | ||
| max :: non_neg_integer(), %% maximum number of entries | ||
| %% in current tree | ||
| fix :: boolean(), %% not automatically growing | ||
| default, %% the default value (usually 'undefined') | ||
| elements :: elements(_) %% the tuple tree | ||
| cache :: cache(), %% cached leaf tuple | ||
| cache_index :: non_neg_integer(), | ||
| elements :: elements(_), %% the tuple tree | ||
| bits :: integer() %% in bits | ||
| }). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to store some fields together to save on garbage when the array is updated - the old implementation had just 4 top-level fields, this has 7, so every update operation creates at least 3 words more of garbage
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have thought about it, but most fields are needed regularly enough that you don't want to follow an extra pointer to access them. But a linear type analysis allowing in-place updates would be nice...
|
|
|
The |
And added more regression tests.
552db0d to
cbf14e5
Compare
cbf14e5 to
0daddab
Compare
Takes care of a couple of old TODO notes: adds write caching, which speeds up sequential writes by upwards of 300%, and adds pruning of the data structure when the array shrinks, so that the unused parts can be GC:d.