libexpr: add builtins.filterAttrs with tombstone optimization
#15100
Conversation
src/libexpr/primops.cc
Outdated
    for (auto & i : *args[1]->attrs()) {
        Value * vName = Value::toPtr(state.symbols[i.name]);
        Value * callArgs[] = {vName, i.value};
        Value res;
        state.callFunction(*args[0], callArgs, res, noPos);
        if (state.forceBool(
                res, pos, "while evaluating the return value of the filtering function passed to builtins.filterAttrs"))
            attrs.insert(i.name, i.value);
    }
FWIW, with the layering optimisation we have an opportunity to avoid copying anything at all. I did have a PoC branch that added tombstone values in case the attrset we are removing stuff from is much larger than the number of attributes that get filtered out. This way we'd easily avoid any copying on the hot path when nothing actually gets filtered out (I think that's exactly how it's used in the successful case in callPackage).
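To make the layering idea concrete, here is a minimal, hedged sketch with simplified types (not the actual `Bindings`/`Attr` machinery from libexpr): an upper layer shadows an attribute with a tombstone entry, so lookups treat it as deleted without copying or mutating the layer below.

```cpp
#include <map>
#include <optional>
#include <string>
#include <vector>

// Simplified stand-in for a layered attrset: each layer maps names to values,
// and std::nullopt plays the role of a tombstone ("attribute deleted").
using Layer = std::map<std::string, std::optional<int>>;

struct LayeredSet
{
    std::vector<Layer> layers; // layers.back() is the topmost layer

    // Lookup walks layers top-down; a tombstone hides anything below it.
    const int * get(const std::string & name) const
    {
        for (auto it = layers.rbegin(); it != layers.rend(); ++it) {
            auto found = it->find(name);
            if (found == it->end())
                continue;           // not in this layer, keep looking
            if (!found->second)
                return nullptr;     // tombstone: attribute was deleted
            return &*found->second; // real value
        }
        return nullptr;             // absent everywhere
    }
};
```

Deleting an attribute then costs one small top layer instead of a copy of the (possibly large) base layer.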
Yeah, that checks out. How POC is the POC tombstone branch?
Hm, I think I've implemented enough for the k-way merge iterator to skip nullptr values (so upper attrset "layers" can remove attrs from the resulting bindings without actually modifying the lower ones).
I think something like: loop over the bindings and collect the ones that should be removed, until we reach a sensible count at which we actually have to allocate. The primary question is what sort of heuristic to apply: is removing 8 attrs out of 128 enough to trigger this optimisation?
I can look into digging up that branch and putting it up as an alternative.
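As a point of reference for that heuristic discussion, a minimal sketch under stated assumptions; the function name, the 8-element minimum, and the 25% removal cutoff are all illustrative, not values taken from the PoC branch.

```cpp
#include <cstddef>

// Hypothetical heuristic: only layer tombstones on top when the set is large
// enough and few enough attributes are being removed; otherwise a plain copy
// of the surviving attributes is likely cheaper overall.
bool shouldUseTombstones(std::size_t total, std::size_t removed)
{
    if (total < 8)
        return false;            // tiny sets: copying is trivially cheap
    if (removed == 0)
        return false;            // nothing removed: reuse the set as-is
    return removed * 4 <= total; // e.g. remove at most ~25% via tombstones
}
```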
We can be data-driven here. As you point out, it's used in callPackage -- which is used heavily, everywhere.
I added tracing to lib.filterAttrs and evaluated NixOS tests:
Results
| Metric | nixosTests.simple | nixosTests.gnome |
|---|---|---|
| Total calls | 2,910 | 3,882 |
| Empty sets (0 → 0) | 1,461 (50%) | 2,146 (55%) |
| Platform cmp (117 → 112) | 692 (24%) | 696 (18%) |
| Tiny (1-5 elements) | 490 (17%) | 705 (18%) |
| Large (>100 elements) | 697 (24%) | 704 (18%) |
What's being filtered
| Sizes | What it is |
|---|---|
| 768 → 706 | Kernel config options (8139TOO_8129, ACCESSIBILITY...) |
| 289 → 232 | Font configuration |
| 217 → 217 | NixOS users/groups (unchanged - no-op) |
| 117 → 112 | Platform/system config (lib.systems.equals) - 696 calls |
| 0 → 0 | Empty sets from module system - ~50% of all calls |
Key Sources
- `lib.systems.equals` (`lib/systems/default.nix:46`) - 696 calls comparing platforms by filtering out 5 function-valued attributes from 117-element system configs. Same filtering repeated hundreds of times on the same data.
- `lib/types.nix:873` (module system) - filters merged attribute definitions. Generates most of the `0 → 0` empty-set calls.
- `lib/customisation.nix:290` (`callPackageWith`) - checks for missing required arguments.
Thoughts
~55% of calls filter empty sets, ~24% repeat the same platform filtering. The C++ builtin helps (especially with the early empty-set bailout), but there are also nixpkgs-specific changes possible:
- Cache filtered results in `lib.systems.elaborate`
- Avoid `filterAttrs` in module system hot paths when the input is known to be empty
Force-pushed 95d9e27 to c0f71cc
    push(cursor);

    // If this is a tombstone (nullptr value), skip it and all shadowed entries
    if (current->value == nullptr) {
I think the invariant should be that the iterator never points to a tombstone. Then no changes are needed to downstream.
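A hedged sketch of that invariant on a flat array (not the real k-way merge iterator): both construction and increment skip tombstones, so downstream code never observes a nullptr value.

```cpp
struct Attr
{
    const char * name;
    const void * value; // nullptr means "tombstone" (attribute deleted)
};

// Minimal forward iterator over an Attr array that maintains the invariant
// "never points at a tombstone": construction and ++ both skip nullptr values.
struct AttrIterator
{
    const Attr * cur;
    const Attr * end;

    AttrIterator(const Attr * begin, const Attr * end_) : cur(begin), end(end_)
    {
        skipTombstones();
    }

    void skipTombstones()
    {
        while (cur != end && cur->value == nullptr)
            ++cur;
    }

    AttrIterator & operator++()
    {
        ++cur;
        skipTombstones();
        return *this;
    }

    const Attr & operator*() const { return *cur; }
    bool operator!=(const AttrIterator & other) const { return cur != other.cur; }
};
```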
Force-pushed c0f71cc to 33cee85
Marking as draft while iterating.
src/libexpr/primops.cc
Outdated
    for (const auto & i : input) {
        Value * vName = Value::toPtr(state.symbols[i.name]);
        Value * callArgs[] = {vName, i.value};
        Value res;
        state.callFunction(*args[0], callArgs, res, noPos);
        bool keep = state.forceBool(
            res, pos, "while evaluating the return value of the filtering function passed to builtins.filterAttrs");
        filterResults.emplace_back(i.name, keep);
        if (keep)
            ++numKept;
    }
Looks like std::views::filter?
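For context on the suggestion, `std::views::filter` from C++20 `<ranges>` expresses a keep/drop pass as a lazy view; a standalone illustration on plain ints, not the evaluator's `Value`/`Attr` types:

```cpp
#include <iostream>
#include <ranges>
#include <vector>

int main()
{
    std::vector<int> values{1, 2, 3, 4, 5, 6};

    // Lazily yields only the elements for which the predicate returns true,
    // without materializing an intermediate container.
    for (int v : values | std::views::filter([](int x) { return x % 2 == 0; }))
        std::cout << v << '\n'; // prints 2, 4, 6
}
```

In the loop above the predicate calls back into the evaluator and its result is also counted, so whether a ranges pipeline is a net win there is a separate question.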
Also, should we instead count the stuff that gets erased? Not sure about the tradeoffs here, but my intuition is that the use cases tend to erase (though this needs empirical evidence from nixpkgs, I suppose).
src/libexpr/primops.cc
Outdated
    // First pass: evaluate filter for all attributes and collect results
    boost::container::small_vector<std::pair<Symbol, bool>, 64> filterResults;
    filterResults.reserve(input.size());
    size_t numKept = 0;
Seems inevitable if we want to avoid over-allocation.
    if (attr.value == nullptr) {
        if (base.get(attr.name))
            ++tombstonesShadowingBase;
        else
            ++tombstonesNotShadowing;
    }
Ditto, the iterator should never point to a tombstone.
The iterator doesn't. These are the internal iterators over attrs, which is the FAM above.
src/libexpr/primops.cc
Outdated
    if (input.empty()) {
        v = *args[1];
        return;
    }
This does change the laziness of the function -- if attrs is empty, then the function isn't forced!
    @@ -364,8 +383,12 @@ public:
        const Bindings * currentChunk = this;
        while (currentChunk) {
            const Attr * maybeAttr = getInChunk(*currentChunk);
getInChunk is just a binary search that finds an Attr by name, regardless of whether the value is nullptr.
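For readers unfamiliar with that helper, a rough sketch of what a per-chunk, name-keyed binary search looks like; the `findInChunk` name and the integer symbol ids are illustrative, not the actual libexpr code.

```cpp
#include <algorithm>
#include <cstdint>
#include <span>

struct Attr
{
    std::uint32_t name; // symbol id; each chunk is sorted by this
    const void * value; // may be nullptr (tombstone), still returned as found
};

// Binary search by name within one chunk. Like the getInChunk described above,
// it does not care whether the value is a tombstone; callers decide what that means.
const Attr * findInChunk(std::span<const Attr> chunk, std::uint32_t name)
{
    auto it = std::lower_bound(
        chunk.begin(), chunk.end(), name,
        [](const Attr & a, std::uint32_t n) { return a.name < n; });
    if (it != chunk.end() && it->name == name)
        return &*it;
    return nullptr; // absent from this chunk (try the next layer/chunk)
}
```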
    @@ -484,6 +512,8 @@ private:
        auto attrs = std::span(bindings->attrs, bindings->numAttrs);
This iterator is over the FAM in the class above, which contains tombstones.
Force-pushed aba11ff to b3e7061
FWIW, I'm seeing performance regressions with the tombstone approach that I don't see with the original approach when I'm testing against Nixpkgs. The original approach has no regressions I can find with the package and the NixOS test evals. What's in this branch right now is a consistent 3-5% slower, though it does use less memory.
Force-pushed b3e7061 to 52a7103
This adds support for tombstone values (nullptr) in the Bindings layer system, enabling attribute deletion through layering rather than copying.
Changes:
- Bindings::get() treats nullptr values as "attribute deleted"
- Bindings::iterator skips tombstone entries during iteration
- BindingsBuilder::finishSizeIfNecessary() properly accounts for tombstones when calculating the size of layered bindings
This infrastructure enables efficient filterAttrs implementation by layering tombstones on top of the original attrset instead of copying all kept attributes.
The tombstone infrastructure was adding ~5% overhead to general evaluation even when no tombstones were being used, due to extra checks in hot paths. Profiling showed BindingsBuilder::push_back went from 0.09% to 1.77% (20x!) due to the tombstone count tracking on every attribute insertion.
This commit:
- Packs hasTombstonesInChain into the numLayers field (bit 31) to avoid increasing the Bindings struct size (stays at 24 bytes vs 32 bytes unpacked)
- Adds a hasTombstonesInLayer flag to BindingsBuilder (set only by insertTombstone)
- Adds an insertTombstone() method for explicit tombstone insertion
- Keeps push_back() completely clean - no tombstone checks
- Fast path in finishSizeIfNecessary() when no tombstones exist anywhere
The flag is propagated through layers: true if the base has tombstones OR if this layer adds tombstones via insertTombstone(). Normal evaluation (no filterAttrs) has ~4% overhead from the tombstone checks in get() and iterator.
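A hedged sketch of the bit-packing described above, packing a boolean into bit 31 of a 32-bit counter so the struct does not grow; the field and method names mirror the commit message but are not the actual libexpr definitions.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative: a flag packed into bit 31 of a 32-bit counter, so adding the
// flag does not enlarge the containing struct.
struct PackedLayerInfo
{
    static constexpr std::uint32_t tombstoneBit = 1u << 31;
    std::uint32_t packed = 0; // bit 31: hasTombstonesInChain, bits 0..30: numLayers

    std::uint32_t numLayers() const { return packed & ~tombstoneBit; }
    bool hasTombstonesInChain() const { return packed & tombstoneBit; }

    void setNumLayers(std::uint32_t n)
    {
        assert(n < tombstoneBit); // must fit in 31 bits
        packed = (packed & tombstoneBit) | n;
    }

    void markTombstonesInChain() { packed |= tombstoneBit; }
};
```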
Rewrite builtins.removeAttrs to use the tombstone layering system introduced in the previous commits. This avoids copying all remaining attributes when removing from large attribute sets.
Changes:
- Add fast paths for common cases:
  - Empty removal list: return the original attrset unchanged
  - Empty source attrset: return emptyBindings
  - No matching attributes: return the original attrset unchanged
- For attrsets >= 8 elements with available layer capacity:
  - Layer tombstones on top instead of copying all non-removed attrs
  - Only insert tombstones for attributes that actually exist
- Fall back to copy-based set_difference for small attrsets or when the layer list is full (max 8 layers)
Performance results (nixpkgs firefox/chromium evaluation):
- Memory: 3-5% reduction in set allocation bytes
- CPU: neutral (within measurement noise)
- Sets created: 1.4-1.8% fewer allocations
The tombstone approach trades a small per-lookup cost for significant memory savings when removeAttrs is called on large attribute sets, which is common in nixpkgs (e.g., removing meta, passthru from mkDerivation).
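A minimal sketch of the decision structure this commit message describes for removeAttrs; the enum and function are illustrative only, assuming the thresholds named above (>= 8 elements, max 8 layers), not the actual primop code.

```cpp
#include <cstddef>

enum class RemoveStrategy
{
    ReturnOriginal,  // nothing to remove, or no names match
    ReturnEmpty,     // source attrset is empty
    LayerTombstones, // big set with layer capacity left: shadow removed attrs
    CopySurvivors,   // small set or layer list full: copy the kept attrs
};

// Illustrative: mirrors the fast paths listed above, not the real primop.
RemoveStrategy chooseRemoveStrategy(
    std::size_t sourceSize, std::size_t matchingRemovals,
    std::size_t usedLayers, std::size_t maxLayers /* e.g. 8 */)
{
    if (sourceSize == 0)
        return RemoveStrategy::ReturnEmpty;
    if (matchingRemovals == 0)
        return RemoveStrategy::ReturnOriginal;
    if (sourceSize >= 8 && usedLayers < maxLayers)
        return RemoveStrategy::LayerTombstones;
    return RemoveStrategy::CopySurvivors;
}
```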
Force-pushed 52a7103 to 11f3da7
I think I found a way through the thicket. I'm still seeing eval times (measured by the ci/eval machinery in Nixpkgs) being consistently 2.5% higher, though. So tombstones might be a memory win but not a win on eval time.
Title changed: builtins.filterAttrs → builtins.filterAttrs with tombstone optimization
Please add these as test cases for laziness:

    nix-repl> builtins.attrNames (lib.filterAttrs (name: value: name == "a") { a = abort "a"; b = abort "b"; c = abort "c"; })
    [ "a" ]

    nix-repl> lib.filterAttrs (abort "f") { }
    { }
Add a new builtin function filterAttrs that filters attribute sets based
on a predicate function. The predicate receives both the attribute name
and value, returning true for attributes to keep.
Implementation uses tombstones for large attrsets (>= 8 elements) when
layer capacity is available. Instead of copying all matching attributes,
it layers tombstones for non-matching attributes on top of the source.
This is beneficial when most attributes pass the filter (common case).
Fast paths:
- Empty attrset: return unchanged
- All attributes pass: return unchanged
- No attributes pass: return emptyBindings
Falls back to copy-based approach for small attrsets or when the
layer list is full.
Example usage:
builtins.filterAttrs (name: value: value > 5) { a = 3; b = 6; c = 10; }
=> { b = 6; c = 10; }
Force-pushed 11f3da7 to 972868c
FWIW, the k-way merge has some low-hanging fruit to improve CPU time, like avoiding heap churn for long runs of non-intersecting symbols (should be very common with the by-name structure).
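The "avoid heap churn" idea can be illustrated on a generic k-way merge; this is a standalone sketch over ints, not the Bindings iterator. After popping the winning cursor, keep draining it while its next key is still smaller than every other run's head, instead of doing a heap pop/push per element.

```cpp
#include <queue>
#include <vector>

struct Cursor
{
    const int * pos;
    const int * end;
};

struct CursorGreater
{
    bool operator()(const Cursor & a, const Cursor & b) const { return *a.pos > *b.pos; }
};

// k-way merge of sorted runs. After taking the front of the winning cursor,
// keep draining that same cursor as long as its next key is still smaller
// than the best key of every other run (the heap top), avoiding a pop/push
// round trip per element for long non-intersecting runs.
std::vector<int> kWayMerge(const std::vector<std::vector<int>> & runs)
{
    std::priority_queue<Cursor, std::vector<Cursor>, CursorGreater> heap;
    for (const auto & r : runs)
        if (!r.empty())
            heap.push({r.data(), r.data() + r.size()});

    std::vector<int> out;
    while (!heap.empty()) {
        Cursor cur = heap.top();
        heap.pop();
        // Drain this run while it stays strictly ahead of all other runs.
        do {
            out.push_back(*cur.pos++);
        } while (cur.pos != cur.end && (heap.empty() || *cur.pos < *heap.top().pos));
        if (cur.pos != cur.end)
            heap.push(cur);
    }
    return out;
}
```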
Added the lazy test cases. Now I'm really going to bed instead of obsessing about why there's a 2.5% difference.
roberth left a comment
Thanks!
    @@ -0,0 +1,2 @@
    # Test that filterAttrs doesn't call the function when the attrset is empty
It could evaluate the function without calling it, and that's basically the mistake this test is supposed to catch.
    -# Test that filterAttrs doesn't call the function when the attrset is empty
    +# Test that filterAttrs doesn't evaluate the function when the attrset is empty
Off-topic: for the second time I read this code and freak out because GitHub shows it after the == "a" test case.
Maybe we should do .nix and .nix.exp so that the list starts with the code and then the result; more natural.
Summary
This PR adds `builtins.filterAttrs` and introduces a tombstone system for the Bindings layer that enables memory-efficient attribute deletion through layering rather than copying.
Changes
- Tombstone infrastructure - Bindings now support tombstone values (nullptr) that mark attributes as deleted in layered bindings. This enables efficient removal operations where we layer deletions on top rather than copying all remaining attributes.
- `removeAttrs` optimization - For large attrsets (≥ 8 elements), `removeAttrs` now layers tombstones on top of the source instead of copying all non-removed attributes.
- `builtins.filterAttrs` - New builtin that filters attribute sets based on a predicate. Uses tombstones for large attrsets when most attributes pass the filter.
Example

    builtins.filterAttrs (name: value: value > 5) { a = 3; b = 6; c = 10; }
    => { b = 6; c = 10; }
Performance
Tested with `lib.filterAttrs` updated to use `builtins.filterAttrs or <fallback>`, comparing Nix 2.33.1 (baseline) vs 2.33.1+filterAttrs (this PR).
filterAttrs microbenchmark
100 iterations of `lib.filterAttrs` on a 10,000-attribute set.
NixOS test evaluations
Measured against `nixosTests.simple` and `nixosTests.firefox`.
nixpkgs ci/eval (5 chunks of 5000 attributes)
Summary
The tombstone infrastructure adds overhead to general attribute operations. The wins manifest when `filterAttrs` is used heavily.
Implementation details
- Tombstones are nullptr values in the `Attr` array
- `Bindings::get()` returns nullptr when it finds a tombstone (attribute deleted)
- `Bindings::iterator` skips tombstones during iteration (k-way merge)
- `hasTombstonesInChain` flag enables a fast path when no tombstones exist (bit-packed into the `numLayers` field to avoid a struct size increase)
Acknowledgments
Inspired by DeterminateSystems#291 - thanks to the original authors. This implementation extends the concept with tombstone-based memory optimization hinted at by @xokdvium.