Introduce nine new builtins that provide native access to files from a git repository during Nix evaluation, replacing the need to shell out to git via derivations:
- builtins.unsafeTectonixInternalManifest: Get zone path -> zone ID mapping
- builtins.unsafeTectonixInternalManifestInverted: Get zone ID -> zone path mapping
- builtins.unsafeTectonixInternalTreeSha: Get tree SHA for a world path
- builtins.unsafeTectonixInternalTree: Fetch a tree by SHA as a store path
- builtins.unsafeTectonixInternalFile: Read file contents from world repository
- builtins.unsafeTectonixInternalZoneSrc: Get zone source as store path
- builtins.unsafeTectonixInternalDir: List directory contents
- builtins.unsafeTectonixInternalSparseCheckoutRoots: Get zone IDs in sparse checkout
- builtins.unsafeTectonixInternalDirtyZones: Get zone path -> dirty status mapping
New CLI flags:
- --tectonix-git-dir: Path to git directory
- --tectonix-sha: Git commit SHA to use
- --tectonix-checkout-path: Optional checkout for source-available mode
Features:
- Lazy initialization of git resources
- Tree SHA caching at each path level
- Source-available mode for local development (prefers checkout files)
- Zone dirty detection: computes dirty status for all sparse-checked-out zones by walking dirty files and matching against zone paths
- ODB-only mode: bypasses full repository validation to support repos with unsupported extensions (e.g., reftables)
- Sparse checkout awareness: reads .git/info/sparse-checkout-roots to determine which zones are checked out (supports gitdir worktrees)
Also fixes GitSourceAccessor::readDirectory to return file types instead of always returning unknown.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
When lazy-trees is enabled, zone sources are mounted lazily using
GitSourceAccessor and only copied to the store when used as derivation
inputs (devirtualized).
Key changes:
- Add tectonixZoneCache_ for tree SHA -> store path deduplication
- Add getZoneStorePath() orchestrating lazy vs eager zone fetching
- Add mountZoneByTreeSha() for lazy mounting clean zones
- Add getZoneFromCheckout() as extension point for dirty zones (eager)
- Add getOrMountWorldRoot() for read-only world access
New builtins:
- worldZone: Returns { outPath, root, treeSha, zonePath, dirty }
  - outPath: string with context for derivation src
  - root: path for reading files without devirtualization
- worldRoot: Returns path to world root for read-only eval access
Simplify __unsafeTectonixInternalZoneSrc to use getZoneStorePath().
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Remove builtins now superseded by lazy-trees zone access:
- __unsafeTectonixInternalFile: use builtins.readFile with __unsafeTectonixInternalRoot
- worldZoneFile: use builtins.readFile (zone.root + "/path")
- __unsafeTectonixInternalDir: use builtins.readDir (zone.root + "/path")
Rename to consistent internal naming:
- worldZone → __unsafeTectonixInternalZone
- worldRoot → __unsafeTectonixInternalRoot
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove __unsafeTectonixInternalRoot (and getOrMountWorldRoot, worldRootStorePath_)
- Add validation to __unsafeTectonixInternalZone to ensure the path exists in the manifest (is an exact zone root, not a prefix or subdir)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
With lazy-trees enabled, dirty zones are now mounted lazily using makeFSSourceAccessor rooted at the zone directory in the checkout. This avoids copying the entire zone to the store until it's actually used as a derivation input.
- Add tectonixCheckoutZoneCache_ for caching mounted checkout zones
- Update getZoneFromCheckout to mount lazily when lazy-trees enabled
- Cache by zone path for the duration of evaluation
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This commit implements support for internal zones - zones that exist within other zones, providing encapsulation and modularity. Internal zones are hidden from their host zone's lazy-tree source (the `_internal` directory is filtered out), addressable as first-class zones via extended paths like `//a/b/c/_internal/d/e`, and recursively nestable (internal zones can have their own `_internal` with more zones). The implementation uses a "peel" operation to parse zone paths, recursive resolution for tree SHA computation, a `ZoneFilteringAccessor` that hides `_internal` directories at every level, and updated dirty zone detection that maps dirty files to the most specific zone. All accessor creation paths (lazy mount, eager copy, checkout mount) now wrap accessors with the zone filter to ensure hermetic isolation.
# Nested Zones Design and Implementation Plan
This document describes the design for **nested (internal) zones** - zones that exist within other zones, providing encapsulation and modularity.
## Overview
Internal zones are:
- **Hidden** from their host zone's lazy-tree source (as if `_internal` doesn't exist)
- **Addressable** as first-class zones via extended paths like `//a/b/c/_internal/d/e`
- **Recursively nestable** - internal zones can have their own `_internal` with more zones
### Constraints
The `_internal` directory must contain precisely:
1. A `manifest.json`
2. Zone directories
3. No other files
Internal zones are only readable from:
- The enclosing zone
- Co-internal cousins within that enclosing zone
---
## Zone Path Algebra
### Grammar
```
zone_path ::= top_level | internal
top_level ::= "//" segments
internal ::= zone_path "/_internal/" segments
segments ::= name ("/" name)*
```
This grammar reveals the key insight: **an internal zone path is recursive** - the host of an internal zone can itself be an internal zone.
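A minimal sketch of a validator for this grammar can make the recursion concrete. The helper names `splitSlash` and `isValidZonePath` are hypothetical and not part of the PR. The sketch also assumes that a `name` is any non-empty segment other than `_internal`, which the grammar itself does not state.

```cpp
#include <cassert>
#include <string_view>
#include <vector>

// Split `s` on '/', keeping empty pieces so that malformed input such as
// "a//b" can be detected by the caller.
static std::vector<std::string_view> splitSlash(std::string_view s) {
    std::vector<std::string_view> parts;
    size_t start = 0;
    while (true) {
        size_t pos = s.find('/', start);
        if (pos == std::string_view::npos) {
            parts.push_back(s.substr(start));
            return parts;
        }
        parts.push_back(s.substr(start, pos - start));
        start = pos + 1;
    }
}

// Hypothetical helper: true if `path` matches the zone_path grammar.
// Assumption: a `name` is any non-empty segment other than "_internal".
bool isValidZonePath(std::string_view path) {
    if (path.substr(0, 2) != "//") return false;
    bool expectName = true; // a name must follow "//" and every "_internal"
    for (auto part : splitSlash(path.substr(2))) {
        if (part.empty()) return false;
        if (part == "_internal") {
            if (expectName) return false; // "_internal" needs a host before it
            expectName = true;
        } else {
            expectName = false;
        }
    }
    return !expectName; // reject a bare "//" or a trailing "_internal"
}
```

Under these assumptions, a path with consecutive `_internal` segments is rejected up front, before any manifest is consulted.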
### Examples
| Path | Host | Internal Path |
|------|------|---------------|
| `//areas/tools/tec` | (root manifest) | — |
| `//areas/tools/tec/_internal/helpers` | `//areas/tools/tec` | `helpers` |
| `//areas/tools/tec/_internal/a/b/_internal/c` | `//areas/tools/tec/_internal/a/b` | `c` |
### The "Peel" Operation
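Every zone path can be **peeled** by one layer: the split happens at the last `/_internal/`, yielding the host path (if any) and the path to look up in the host's manifest: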
```cpp
struct PeeledZonePath {
    std::optional<std::string> hostPath;  // nullopt for top-level
    std::string localPath;                // The path to look up in manifest
};

PeeledZonePath peel(std::string_view path) {
    auto pos = path.rfind("/_internal/");
    if (pos == std::string_view::npos) {
        return {.hostPath = std::nullopt, .localPath = std::string(path)};
    }
    return {
        .hostPath = std::string(path.substr(0, pos)),
        .localPath = std::string(path.substr(pos + 11))  // skip "/_internal/" (11 characters)
    };
}
```
This is elegant because:
- `peel("//a/b/c")` → `{nullopt, "//a/b/c"}` — top-level
- `peel("//a/b/_internal/c")` → `{"//a/b", "c"}` — one level of nesting
- `peel("//a/_internal/b/_internal/c")` → `{"//a/_internal/b", "c"}` — recursive host
---
## Resolution Algorithm
```
resolveZone(path):
    peeled = peel(path)

    if peeled.hostPath is null:
        # Top-level zone: use root manifest
        manifest = readRootManifest()
        assert peeled.localPath in manifest
        treeSha = computeTreeShaFromWorldRoot(peeled.localPath)
        return Zone(path, treeSha, manifest[peeled.localPath].id)

    # Internal zone: resolve host first (recursive!)
    hostZone = resolveZone(peeled.hostPath)

    # Read host's internal manifest
    internalManifest = readFile(hostZone.treeSha, "_internal/manifest.json")
    assert peeled.localPath in internalManifest

    # Compute tree SHA relative to host
    treeSha = getSubtreeSha(hostZone.treeSha, "_internal/" + peeled.localPath)
    return Zone(path, treeSha, internalManifest[peeled.localPath].id)
```
The beauty: **one algorithm handles arbitrary nesting depth** through recursion.
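The recursion can be sketched in self-contained C++ against an in-memory model. `FakeWorld`, `exampleWorld`, `resolveZoneId`, and `isZone` are hypothetical names introduced only for illustration. The sketch covers manifest validation only and leaves out tree SHA computation, which needs a real repository.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <stdexcept>
#include <string>
#include <string_view>

using Manifest = std::map<std::string, std::string>; // zone path -> zone ID

// Hypothetical in-memory stand-in for the repository: the root manifest plus
// each host zone's `_internal/manifest.json`, keyed by the host's full path.
struct FakeWorld {
    Manifest root;
    std::map<std::string, Manifest> internal;
};

struct Peeled {
    std::optional<std::string> hostPath;
    std::string localPath;
};

Peeled peel(std::string_view path) {
    constexpr std::string_view marker = "/_internal/";
    auto pos = path.rfind(marker);
    if (pos == std::string_view::npos)
        return {std::nullopt, std::string(path)};
    return {std::string(path.substr(0, pos)),
            std::string(path.substr(pos + marker.size()))};
}

// Returns the zone ID for `path`, or throws if any layer is not declared.
std::string resolveZoneId(const FakeWorld & world, std::string_view path) {
    auto peeled = peel(path);
    if (!peeled.hostPath) {
        auto it = world.root.find(peeled.localPath);
        if (it == world.root.end())
            throw std::runtime_error("not a zone: " + peeled.localPath);
        return it->second;
    }
    // Validate the host first; this recursion handles any nesting depth.
    resolveZoneId(world, *peeled.hostPath);
    auto host = world.internal.find(*peeled.hostPath);
    if (host == world.internal.end())
        throw std::runtime_error("no internal manifest: " + *peeled.hostPath);
    auto it = host->second.find(peeled.localPath);
    if (it == host->second.end())
        throw std::runtime_error("not an internal zone: " + peeled.localPath);
    return it->second;
}

// Non-throwing convenience wrapper around resolveZoneId.
bool isZone(const FakeWorld & world, std::string_view path) {
    try { resolveZoneId(world, path); return true; }
    catch (const std::runtime_error &) { return false; }
}

// Sample data using the zone IDs from the manifest examples in this document.
FakeWorld exampleWorld() {
    FakeWorld w;
    w.root["//areas/tools/tec"] = "W-123456";
    w.root["//areas/platform/core"] = "W-789abc";
    w.internal["//areas/tools/tec"]["helpers"] = "W-def000";
    w.internal["//areas/tools/tec"]["deeply/nested/thing"] = "W-def002";
    return w;
}
```

In this model a host without an internal manifest, or a local path missing from it, fails at the layer where the problem occurs, which keeps the error message specific to that layer.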
---
## Source Filtering: The Disappearing `_internal`
Every zone's source accessor must filter out `_internal` directories **at every level**:
```cpp
class ZoneFilteringAccessor : public FilteringSourceAccessor {
    bool isAllowed(const CanonPath & path) override {
        // Check each path component
        for (auto it = path.begin(); it != path.end(); ++it) {
            if (*it == "_internal")
                return false;
        }
        return true;
    }
};
```
This means:
- `//a/b/c` sees everything EXCEPT any `_internal` subdirectories
- `//a/b/c/_internal/d` sees everything EXCEPT any `_internal` subdirectories within it
- Each zone is hermetically sealed from its internal zones
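The filtering rule matches whole path components, not substrings. A standalone sketch of the same predicate over plain strings, using the hypothetical name `isVisibleInZone` and no Nix types, shows the difference:

```cpp
#include <cassert>
#include <string_view>

// True if no component of the zone-relative `path` is exactly "_internal".
// Names that merely contain the text, such as "my_internal", stay visible.
bool isVisibleInZone(std::string_view path) {
    size_t start = 0;
    while (start <= path.size()) {
        size_t end = path.find('/', start);
        if (end == std::string_view::npos) end = path.size();
        if (path.substr(start, end - start) == "_internal")
            return false;
        start = end + 1;
    }
    return true;
}
```

A substring check would wrongly hide files such as `my_internal/x` or `_internal_notes.md`, so comparing component by component is the safer reading of the rule.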
---
## Manifest Structure
**Root manifest** (`//.meta/manifest.json`):
```json
{
  "//areas/tools/tec": {"id": "W-123456"},
  "//areas/platform/core": {"id": "W-789abc"}
}
```
**Internal manifest** (`//areas/tools/tec/_internal/manifest.json`):
```json
{
  "helpers": {"id": "W-def000"},
  "test-utils": {"id": "W-def001"},
  "deeply/nested/thing": {"id": "W-def002"}
}
```
Note: Internal manifest paths are **relative** (no `//` prefix).
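Because internal manifest keys are relative, the full zone path is recovered by joining the host path and the key with `/_internal/`. A small sketch, where `internalZonePath` is a hypothetical helper name:

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Build the full path of an internal zone from its host's full path and the
// relative key found in the host's `_internal/manifest.json`.
std::string internalZonePath(std::string_view hostPath, std::string_view key) {
    std::string full(hostPath);
    full += "/_internal/";
    full += key;
    return full;
}
```

Applying the helper twice yields a doubly nested path such as `//a/_internal/b/_internal/c`, which is the inverse of peeling that path twice.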
---
## Implementation Plan
### Phase 1: Zone Path Parsing Infrastructure
**File: `src/libexpr/primops/tectonix.cc`**
```cpp
namespace {

struct PeeledZonePath {
    std::optional<std::string> hostPath;
    std::string localPath;

    bool isInternal() const { return hostPath.has_value(); }
};

PeeledZonePath peelZonePath(std::string_view path) {
    auto pos = path.rfind("/_internal/");
    if (pos == std::string_view::npos) {
        return {.hostPath = std::nullopt, .localPath = std::string(path)};
    }
    return {
        .hostPath = std::string(path.substr(0, pos)),
        .localPath = std::string(path.substr(pos + 11))
    };
}

} // anonymous namespace
```
### Phase 2: Internal Manifest Reading
**Add to `src/libexpr/primops/tectonix.cc`:**
```cpp
static std::optional<nlohmann::json> readInternalManifest(
    EvalState & state,
    const Hash & hostTreeSha)
{
    auto repo = state.getWorldRepo();
    GitAccessorOptions opts{.exportIgnore = false, .smudgeLfs = false};
    auto accessor = repo->getAccessor(hostTreeSha, opts, "host");

    auto manifestPath = CanonPath("_internal/manifest.json");
    if (!accessor->pathExists(manifestPath))
        return std::nullopt;

    return nlohmann::json::parse(accessor->readFile(manifestPath));
}
```
### Phase 3: Recursive Tree SHA Computation
**Modify `EvalState::getWorldTreeSha` in `src/libexpr/eval.cc`:**
```cpp
Hash EvalState::getWorldTreeSha(std::string_view zonePath) const
{
    // Note: peelZonePath() must be visible in this translation unit. Phase 1
    // defines it in an anonymous namespace in tectonix.cc, so it would need
    // to move to a shared header (or be duplicated) to be callable here.
    auto peeled = peelZonePath(zonePath);

    if (!peeled.isInternal()) {
        // Existing top-level logic (unchanged)
        return computeTreeShaFromWorldRoot(peeled.localPath);
    }

    // Internal zone: recursive resolution
    auto hostTreeSha = getWorldTreeSha(*peeled.hostPath);
    auto repo = getWorldRepo();

    // Navigate: hostTree -> _internal -> localPath
    auto internalTreeSha = repo->getSubtreeSha(hostTreeSha, "_internal");

    // Walk through localPath segments
    for (auto & segment : tokenizeString<std::vector<std::string>>(peeled.localPath, "/")) {
        internalTreeSha = repo->getSubtreeSha(internalTreeSha, segment);
    }

    return internalTreeSha;
}
```
### Phase 4: Zone Filtering Accessor
**Add to `src/libfetchers/filtering-source-accessor.cc` or inline:**
```cpp
class ZoneFilteringAccessor : public FilteringSourceAccessor {
public:
    ZoneFilteringAccessor(ref<SourceAccessor> next)
        : FilteringSourceAccessor(std::move(next), makeNotAllowedError) {}

private:
    // Returns the error itself; the function is what gets passed to the base
    // class as the MakeNotAllowedError callback.
    static RestrictedPathError makeNotAllowedError(const CanonPath & path) {
        return RestrictedPathError(
            fmt("'%s' is hidden (inside _internal)", path));
    }

    bool isAllowed(const CanonPath & path) override {
        // Reject the path if any component is exactly "_internal"
        for (auto it = path.begin(); it != path.end(); ++it) {
            if (*it == "_internal")
                return false;
        }
        return true;
    }
};
```
### Phase 5: Updated Zone Resolution
**Modify `prim_unsafeTectonixInternalZone` in `src/libexpr/primops/tectonix.cc`:**
```cpp
static void prim_unsafeTectonixInternalZone(EvalState & state, const PosIdx pos, Value ** args, Value & v)
{
    auto zonePath = state.forceStringNoCtx(*args[0], pos, "...");
    auto peeled = peelZonePath(zonePath);

    // Validate zone exists in appropriate manifest
    if (!peeled.isInternal()) {
        // Top-level: check root manifest (existing logic)
        auto manifest = readRootManifest(state, pos);
        if (!manifest.contains(std::string(zonePath)))
            state.error<EvalError>("'%s' is not a zone", zonePath).atPos(pos).debugThrow();
    } else {
        // Internal: resolve host, check its internal manifest
        auto hostTreeSha = state.getWorldTreeSha(*peeled.hostPath);
        auto internalManifest = readInternalManifest(state, hostTreeSha);
        if (!internalManifest)
            state.error<EvalError>("zone '%s' has no internal manifest", *peeled.hostPath)
                .atPos(pos).debugThrow();
        if (!internalManifest->contains(peeled.localPath))
            state.error<EvalError>("'%s' is not an internal zone of '%s'",
                peeled.localPath, *peeled.hostPath).atPos(pos).debugThrow();
    }

    // Get tree SHA (handles recursion internally)
    auto treeSha = state.getWorldTreeSha(zonePath);

    // ... rest of existing logic, but wrap accessor with ZoneFilteringAccessor
}
```
### Phase 6: Updated `mountZoneByTreeSha`
**Modify in `src/libexpr/eval.cc`:**
```cpp
StorePath EvalState::mountZoneByTreeSha(const Hash & treeSha, std::string_view zonePath)
{
    // ... existing cache check ...

    auto repo = getWorldRepo();
    GitAccessorOptions opts{.exportIgnore = true, .smudgeLfs = false};
    auto rawAccessor = repo->getAccessor(treeSha, opts, "zone");

    // NEW: Wrap with _internal filter
    auto accessor = make_ref<ZoneFilteringAccessor>(rawAccessor);

    // ... rest of existing logic ...
}
```
### Phase 7: Dirty Zone Detection for Internal Zones
**Modify `getTectonixDirtyZones` in `src/libexpr/eval.cc`:**
This is trickier because we need to:
1. Detect dirty files in the checkout
2. Map each dirty file to the most specific zone that contains it (including internal zones)
For example, a dirty file at `a/b/_internal/c/foo.nix` means zone `//a/b/_internal/c` is dirty.
```cpp
// When processing dirty files, check if path contains _internal
// and attribute dirtiness to the correct internal zone
for (auto & dirtyFile : dirtyFiles) {
    auto zonePath = findEnclosingZone(dirtyFile, allManifests);
    dirtyZones[zonePath] = true;
}
```
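One possible shape for `findEnclosingZone` is a longest-prefix match over the known zone paths. The signature below is an assumption: it takes a flat list of full zone paths rather than the `allManifests` argument used above, and `exampleZones` is a hypothetical fixture.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: returns the most specific zone containing `file`.
// `file` is relative to the world root (no leading "//"); `zones` holds full
// zone paths, top-level and internal alike.
std::optional<std::string> findEnclosingZone(
    std::string_view file, const std::vector<std::string> & zones)
{
    std::string full = "//" + std::string(file);
    std::optional<std::string> best;
    for (const auto & zone : zones) {
        // The zone must be a prefix that ends on a component boundary.
        bool inside = full.size() > zone.size()
            && full.compare(0, zone.size(), zone) == 0
            && full[zone.size()] == '/';
        if (inside && (!best || zone.size() > best->size()))
            best = zone; // keep the longest, i.e. most specific, match
    }
    return best;
}

// Sample zone list: a host zone and one internal zone inside it.
std::vector<std::string> exampleZones() {
    return {"//a/b", "//a/b/_internal/c"};
}
```

With this matching rule, a dirty file inside an internal zone marks only that internal zone as dirty, so the host stays clean unless one of its own files changed. Files directly under `_internal`, such as the internal manifest, fall through to the host zone in this sketch; whether that is the desired behavior is left open here.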
---
## Summary of Changes
| Component | Change |
|-----------|--------|
| Zone path parsing | Add `peelZonePath()` function |
| Tree SHA computation | Recursive resolution for internal zones |
| Manifest lookup | Support internal manifests relative to host zones |
| Source accessor | Filter `_internal` at all levels |
| Zone validation | Check appropriate manifest (root vs internal) |
| Dirty detection | Attribute dirty files to correct zone level |
---
## Design Elegance
The elegance comes from:
1. **One grammar** for all zone paths
2. **One algorithm** (peel + recurse) for all resolution depths
3. **One filter** (`_internal` everywhere) for all source access
4. **Relative paths** in internal manifests (no duplication of host path)
---
## Edge Cases
### Zone path with consecutive `_internal`
`//a/_internal/_internal/b` — This shouldn't happen by design (manifest would declare `_internal/b`, not `_internal`). Should error gracefully.
### Missing internal manifest
Error clearly: "Zone X does not have an internal manifest"
### Zone references itself
Not possible with the manifest structure.
### Circular internal zones
Not possible — each `_internal` is strictly nested deeper.
### Dirty zone detection for internal zones
Need to check if the internal zone's files are dirty. The host zone being dirty doesn't mean the internal zone is dirty.
---
## Future Considerations: Access Control
The design mentions that internal zones are "only readable from the zone that encloses them or their co-internal cousins." This access control could be enforced at:
1. **Nix expression level** — The code that uses these builtins enforces who can call them
2. **Builtin level** — Add a "caller zone" context and validate access
This is deferred to a future phase.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
## Design Prompt
❯ We are going to explore this possibility: what if we had nested zones?
Nested zones will always be a subdirectory at the root of a zone named '_internal'. That directory can contain any number of zones at any arbitrary nesting levels. Things in
those zones can themselves have internal nested zones but only under another '_internal' node. Internal/nested zones will not be declared in the top-level manifest, because
they are only readable from the zone that encloses them or their co-internal cousins within that enclosing zone.
So. I want to explore this: Let's simply pretend _internal doesn't exist when we expose our lazy-trees. We will, elsewhere, assert that _internal contains precisely:
...but from the perspective of a zone, these do not exist in the lazy-tree of the source.
HOWEVER. When we ask for a reference to a zone, and it is, for example, //a/b/c/_internal/d/e, we must behave quite differently!
We look up the zone //a/b/c, and then resolve its _internal/manifest.json. Then, from there, presuming it contains d/e, we know that d/e is a zone.
Let's ultrathink the crap out of this and come up with a very elegant description of this design and an implementation plan to integrate it here into tecnix.