| 1 | +# ELF PT_NOTE VirtAddr Architecture Decision |
| 2 | + |
| 3 | +**Date:** 2026-01-15 |
| 4 | +**Status:** Implemented |
| 5 | +**Decision:** Split ELF PT_NOTE injection into two approaches based on use case |
| 6 | + |
| 7 | +## Context |
| 8 | + |
| 9 | +PT_NOTE segments injected into ELF binaries are read back through one of two fundamentally different mechanisms: |
| 10 | + |
| 11 | +1. **File-based reading** - Decompression stub reads notes from file offset |
| 12 | +2. **Memory-based reading** - Node.js SEA uses `dl_iterate_phdr()` to find notes in memory |
| 13 | + |
| 14 | +These different mechanisms require different VirtAddr settings in the PT_NOTE segment. |
| 15 | + |
| 16 | +## Decision |
| 17 | + |
| 18 | +### Approach 1: Raw Notes with VirtAddr=0 (SMOL Stubs) |
| 19 | + |
| 20 | +**Use Case:** Binary compression with SMOL stubs (binpress, smol_repack) |
| 21 | + |
| 22 | +**Method:** Raw binary manipulation (`write_with_raw_notes()`) |
| 23 | + |
| 24 | +**VirtAddr:** 0 (non-loadable) |
| 25 | + |
| 26 | +**Why:** |
| 27 | +- Static glibc binaries require PHT at original offset (typically 64) |
| 28 | +- LIEF's `binary->write()` restructures the entire binary, moving PHT |
| 29 | +- Moving PHT causes SIGSEGV in static glibc (reads from `base + e_phoff`) |
| 30 | +- Decompression stub reads notes from file offset, not memory address |
| 31 | +- VirtAddr=0 means the kernel won't map the segment into memory (saves address space) |
| 32 | + |
| 33 | +**Implementation:** |
| 34 | +```cpp |
| 35 | +// packages/binpress/src/stub_elf_compress_lief.cpp |
| 36 | +elf_note_utils::write_with_raw_note(stub_path, output_path, note_name, note_data); |
| 37 | + |
| 38 | +// packages/bin-infra/src/stub_smol_repack_lief.cpp |
| 39 | +elf_note_utils::write_with_raw_note(stub_path, output_path, note_name, note_data); |
| 40 | +``` |
| 41 | + |
| 42 | +**Critical Constraint:** PHT must stay at original offset for static glibc compatibility |
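
For reference, the bytes that a raw note payload consists of can be sketched as follows. This is a minimal illustration of the generic ELF note layout (three 32-bit words, then a NUL-terminated name and a descriptor), not the code in `write_with_raw_notes()`. The function name `build_note`, the caller-supplied `type`, and the 4-byte padding are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Round n up to the next multiple of 4 (assumed note alignment for this sketch).
static std::size_t align4(std::size_t n) { return (n + 3) & ~static_cast<std::size_t>(3); }

// Append a 32-bit value in little-endian byte order.
static void put_u32(std::vector<std::uint8_t>& out, std::uint32_t v) {
    for (int i = 0; i < 4; ++i) out.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
}

// Serialize one note: n_namesz, n_descsz, n_type, then the padded name and descriptor.
// build_note is a hypothetical helper, not part of elf_note_utils.
std::vector<std::uint8_t> build_note(const std::string& name,
                                     const std::vector<std::uint8_t>& desc,
                                     std::uint32_t type) {
    assert(name.size() < 0xffffffffu && desc.size() <= 0xffffffffu);
    const std::size_t namesz = name.size() + 1;  // n_namesz counts the trailing NUL
    std::vector<std::uint8_t> out;
    put_u32(out, static_cast<std::uint32_t>(namesz));
    put_u32(out, static_cast<std::uint32_t>(desc.size()));
    put_u32(out, type);
    out.insert(out.end(), name.begin(), name.end());
    out.resize(12 + align4(namesz), 0);  // NUL terminator plus padding
    out.insert(out.end(), desc.begin(), desc.end());
    out.resize(12 + align4(namesz) + align4(desc.size()), 0);  // descriptor padding
    return out;
}
```

A blob built this way would be appended to the file and described by a PT_NOTE entry whose `p_offset` points at it and whose `p_vaddr` is 0.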
| 43 | + |
| 44 | +### Approach 2: LIEF Notes with Proper VirtAddr (Node.js SEA) |
| 45 | + |
| 46 | +**Use Case:** Node.js Single Executable Application (binject --sea, --vfs) |
| 47 | + |
| 48 | +**Method:** LIEF high-level Note API (`write_with_notes()`) |
| 49 | + |
| 50 | +**VirtAddr:** ≠ 0 (loadable, assigned by LIEF) |
| 51 | + |
| 52 | +**Why:** |
| 53 | +- Node.js uses `dl_iterate_phdr()` to discover resources at runtime |
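| 54 | +- `dl_iterate_phdr()` passes each loaded object's program headers to a callback; note data is then read from memory at `load_address + p_vaddr`, so it must be mapped |
| 55 | +- VirtAddr=0 segments are not mapped, so their contents cannot be reached through `dl_iterate_phdr()` |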
| 56 | +- Node.js expects notes discoverable via postject's mechanism |
| 57 | +- Dynamic binaries can tolerate PHT relocation (dynamic linker handles it) |
| 58 | + |
| 59 | +**Implementation:** |
| 60 | +```cpp |
| 61 | +// packages/binject/src/elf_inject_lief.cpp |
| 62 | +elf_note_utils::replace_or_add(binary.get(), section_name, note_data); |
| 63 | +elf_note_utils::write_with_notes(binary.get(), tmpfile); |
| 64 | +``` |
| 65 | + |
| 66 | +**Critical Requirement:** Notes must be mapped into memory for `dl_iterate_phdr()` discovery |
| 67 | + |
| 68 | +## Comparison |
| 69 | + |
| 70 | +| Aspect | SMOL Stubs (VirtAddr=0) | Node.js SEA (VirtAddr≠0) | |
| 71 | +|--------|-------------------------|--------------------------| |
| 72 | +| **Primary Use** | Binary compression | JavaScript SEA execution | |
| 73 | +| **Target Binaries** | Static glibc stubs | Dynamic Node.js binaries | |
| 74 | +| **Reading Method** | File I/O (offset-based) | Memory mapping (`dl_iterate_phdr()`) | |
| 75 | +| **VirtAddr** | 0 (non-loadable) | ≠ 0 (loadable) | |
| 76 | +| **PHT Preservation** | Critical (must stay at offset 64) | Not critical (dynamic linker) | |
| 77 | +| **Implementation** | Raw binary manipulation | LIEF high-level API | |
| 78 | +| **Mapped to Memory** | No | Yes | |
| 79 | +| **LIEF Restructuring** | Avoided (breaks static glibc) | Acceptable (dynamic linking) | |
| 80 | + |
| 81 | +## Technical Details |
| 82 | + |
| 83 | +### Why PHT Relocation Breaks Static Glibc |
| 84 | + |
| 85 | +Static glibc binaries have hardcoded assumptions: |
| 86 | +1. PHT is at offset 64 (ELF64 header size) |
| 87 | +2. Code reads from `base + e_phoff` in memory |
| 88 | +3. If PHT moves, `e_phoff` points to wrong memory location |
| 89 | +4. PLT/GOT resolution fails → SIGSEGV |
| 90 | + |
| 91 | +LIEF's `binary->write()`: |
| 92 | +- Creates new PT_LOAD segments for added content |
| 93 | +- Reorganizes segment layout |
| 94 | +- Moves PHT to accommodate new segments |
| 95 | +- Updates `e_phoff` in header |
| 96 | +- But static code already has old offset hardcoded |
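
Whether a rewrite moved the PHT can be checked before the binary is ever run. The sketch below compares `e_phoff` in the original stub and in the rewritten output, assuming little-endian ELF64 images already loaded into memory. The helper names `elf64_phoff` and `pht_offset_preserved` are hypothetical and do not exist in the packages listed in this document.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Read an unsigned little-endian integer of `width` bytes starting at `off`.
static std::uint64_t read_le(const std::vector<std::uint8_t>& buf, std::size_t off, std::size_t width) {
    assert(width <= 8 && off + width <= buf.size());
    std::uint64_t v = 0;
    for (std::size_t i = 0; i < width; ++i) v |= static_cast<std::uint64_t>(buf[off + i]) << (8 * i);
    return v;
}

// Store e_phoff in `phoff` and return true if `image` starts with a little-endian ELF64 header.
bool elf64_phoff(const std::vector<std::uint8_t>& image, std::uint64_t& phoff) {
    static const std::uint8_t magic[4] = {0x7f, 'E', 'L', 'F'};
    if (image.size() < 64) return false;  // an ELF64 header is 64 bytes
    for (int i = 0; i < 4; ++i) {
        if (image[i] != magic[i]) return false;
    }
    if (image[4] != 2 || image[5] != 1) return false;  // ELFCLASS64, ELFDATA2LSB
    phoff = read_le(image, 32, 8);  // e_phoff sits at byte 32 of an ELF64 header
    return true;
}

// True when the rewritten image keeps the PHT at the offset the original used.
bool pht_offset_preserved(const std::vector<std::uint8_t>& original,
                          const std::vector<std::uint8_t>& rewritten) {
    std::uint64_t before = 0;
    std::uint64_t after = 0;
    return elf64_phoff(original, before) && elf64_phoff(rewritten, after) && before == after;
}
```

A check of this kind after each write would flag a relocated PHT at build time, before it shows up as a runtime SIGSEGV.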
| 97 | + |
| 98 | +### Why VirtAddr=0 Works for SMOL |
| 99 | + |
| 100 | +Decompression stub flow: |
| 101 | +1. Open compressed binary as file |
| 102 | +2. Read ELF header, find PHT at `e_phoff` |
| 103 | +3. Iterate PT_NOTE entries in PHT |
| 104 | +4. Match note name (e.g., "pressed_data") |
| 105 | +5. Read from file at `p_offset` (NOT `p_vaddr`) |
| 106 | +6. Decompress and execute |
| 107 | + |
| 108 | +Key insight: File I/O uses `p_offset`, not `p_vaddr` |
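
The flow above can be sketched as a lookup over a file image. This is an illustration under stated assumptions, not the decompression stub's actual code: it assumes a little-endian ELF64 file already read into a buffer and 4-byte note padding. The function name `find_note_in_file` is hypothetical. Note that `p_vaddr` is never consulted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Read an unsigned little-endian integer of `width` bytes starting at `off`.
static std::uint64_t read_le(const Bytes& b, std::uint64_t off, std::size_t width) {
    assert(width <= 8 && off + width <= b.size());
    std::uint64_t v = 0;
    for (std::size_t i = 0; i < width; ++i) v |= static_cast<std::uint64_t>(b[off + i]) << (8 * i);
    return v;
}

static std::uint64_t align4(std::uint64_t n) { return (n + 3) & ~static_cast<std::uint64_t>(3); }

// Search every PT_NOTE segment for a note named `wanted`, using file offsets only.
// On success the descriptor bytes are copied into `desc`.
bool find_note_in_file(const Bytes& file, const std::string& wanted, Bytes& desc) {
    if (file.size() < 64 || file[0] != 0x7f || file[1] != 'E' || file[2] != 'L' || file[3] != 'F')
        return false;
    const std::uint64_t phoff = read_le(file, 32, 8);      // e_phoff
    const std::uint64_t phentsize = read_le(file, 54, 2);  // e_phentsize
    const std::uint64_t phnum = read_le(file, 56, 2);      // e_phnum
    for (std::uint64_t i = 0; i < phnum; ++i) {
        const std::uint64_t ph = phoff + i * phentsize;
        if (ph > file.size() || file.size() - ph < 56) return false;  // truncated PHT
        if (read_le(file, ph, 4) != 4) continue;                      // PT_NOTE == 4
        std::uint64_t pos = read_le(file, ph + 8, 8);                 // p_offset
        const std::uint64_t filesz = read_le(file, ph + 32, 8);       // p_filesz
        if (pos > file.size() || filesz > file.size() - pos) return false;
        const std::uint64_t end = pos + filesz;
        while (end - pos >= 12) {  // room for one note header
            const std::uint64_t namesz = read_le(file, pos, 4);
            const std::uint64_t descsz = read_le(file, pos + 4, 4);
            const std::uint64_t name_off = pos + 12;
            const std::uint64_t desc_off = name_off + align4(namesz);
            if (desc_off > end || descsz > end - desc_off) break;  // malformed note
            if (namesz == wanted.size() + 1 &&
                std::memcmp(file.data() + name_off, wanted.data(), wanted.size()) == 0 &&
                file[name_off + wanted.size()] == 0) {
                desc.assign(file.begin() + desc_off, file.begin() + desc_off + descsz);
                return true;
            }
            pos = desc_off + align4(descsz);
        }
    }
    return false;
}
```

Because only `p_offset` and `p_filesz` are read, the lookup gives the same result whether the segment's VirtAddr is 0 or not.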
| 109 | + |
| 110 | +### Why VirtAddr≠0 Required for SEA |
| 111 | + |
| 112 | +Node.js resource discovery: |
| 113 | +1. Call `dl_iterate_phdr()` at startup |
| 114 | +2. The C library (not the kernel) invokes the callback with the program headers of each loaded object |
| 115 | +3. VirtAddr=0 notes have no mapped contents, so nothing valid can be read for them |
| 116 | +4. Search returned segments for PT_NOTE |
| 117 | +5. Match note name (e.g., "NODE_SEA_BLOB") |
| 118 | +6. Read from memory at `p_vaddr + load_address` |
| 119 | + |
| 120 | +Key insight: notes are read from mapped memory at `p_vaddr + load_address`, so only VirtAddr≠0 notes are discoverable via `dl_iterate_phdr()` |
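
The condition a note must meet to be readable at `p_vaddr + load_address` can be written as a small predicate. In a real program the headers would come from the `dlpi_phdr` and `dlpi_phnum` fields passed to the `dl_iterate_phdr()` callback. Here a simplified, hypothetical `Segment` struct stands in for them so that the check runs on any platform. The name `note_reachable_in_memory` is likewise an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the program header fields that matter here.
struct Segment {
    std::uint32_t type;    // p_type
    std::uint64_t offset;  // p_offset
    std::uint64_t vaddr;   // p_vaddr
    std::uint64_t filesz;  // p_filesz
};

constexpr std::uint32_t kPtLoad = 1;  // PT_LOAD
constexpr std::uint32_t kPtNote = 4;  // PT_NOTE

// True when the note's file bytes lie inside some PT_LOAD segment and its p_vaddr
// is the address those bytes are mapped at, so load_address + p_vaddr is valid.
bool note_reachable_in_memory(const Segment& note, const std::vector<Segment>& phdrs) {
    assert(note.type == kPtNote);  // caller passes a PT_NOTE header
    for (const Segment& load : phdrs) {
        if (load.type != kPtLoad || note.offset < load.offset) continue;
        const std::uint64_t delta = note.offset - load.offset;
        const bool inside = delta <= load.filesz && note.filesz <= load.filesz - delta;
        if (inside && note.vaddr == load.vaddr + delta) return true;
    }
    return false;
}
```

A note appended to the file with VirtAddr=0 fails this check against every PT_LOAD, which is why the SEA path relies on LIEF to assign a real address.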
| 121 | + |
| 122 | +## Implementation Files |
| 123 | + |
| 124 | +### SMOL Stub Implementation (VirtAddr=0) |
| 125 | + |
| 126 | +**Compression:** |
| 127 | +- `packages/binpress/src/stub_elf_compress_lief.cpp` - Initial compression |
| 128 | +- `packages/bin-infra/src/stub_smol_repack_lief.cpp` - Repack/update operations |
| 129 | + |
| 130 | +**Core Utilities:** |
| 131 | +- `packages/bin-infra/src/elf_note_utils.hpp::write_with_raw_notes()` - Raw injection |
| 132 | +- `packages/bin-infra/src/elf_note_utils.hpp::write_with_raw_note()` - Single note wrapper |
| 133 | + |
| 134 | +### Node.js SEA Implementation (VirtAddr≠0) |
| 135 | + |
| 136 | +**Injection:** |
| 137 | +- `packages/binject/src/elf_inject_lief.cpp::binject_elf_lief()` - Single injection |
| 138 | +- `packages/binject/src/elf_inject_lief.cpp::binject_elf_lief_batch()` - Batch injection |
| 139 | + |
| 140 | +**Core Utilities:** |
| 141 | +- `packages/bin-infra/src/elf_note_utils.hpp::write_with_notes()` - LIEF with config |
| 142 | +- `packages/bin-infra/src/elf_note_utils.hpp::replace_or_add()` - Note management |
| 143 | +- `packages/bin-infra/src/elf_note_utils.hpp::create_and_add()` - Note creation |
| 144 | + |
| 145 | +## Verification |
| 146 | + |
| 147 | +### SMOL Stub Verification |
| 148 | + |
| 149 | +Static glibc binary should: |
| 150 | +1. ✅ Run without SIGSEGV |
| 151 | +2. ✅ PHT at original offset (typically 64) |
| 152 | +3. ✅ PT_NOTE with VirtAddr=0 |
| 153 | +4. ✅ Decompression stub finds note by name |
| 154 | +5. ✅ Extraction succeeds |
| 155 | + |
| 156 | +```bash |
| 157 | +# Check PHT offset |
| 158 | +readelf -h compressed-binary | grep "Start of program headers" |
| 159 | +# Should show: 64 (bytes into file) |
| 160 | + |
| 161 | +# Check PT_NOTE VirtAddr |
| 162 | +readelf -l compressed-binary | grep -A 5 "NOTE" |
| 163 | +# Should show: VirtAddr 0x0000000000000000 |
| 164 | +``` |
| 165 | + |
| 166 | +### Node.js SEA Verification |
| 167 | + |
| 168 | +Dynamic Node.js binary should: |
| 169 | +1. ✅ Run without SIGSEGV |
| 170 | +2. ✅ PT_NOTE with VirtAddr≠0 |
| 171 | +3. ✅ NODE_SEA_FUSE flipped from :0 to :1 |
| 172 | +4. ✅ `dl_iterate_phdr()` discovers note |
| 173 | +5. ✅ SEA JavaScript executes |
| 174 | + |
| 175 | +```bash |
| 176 | +# Check PT_NOTE VirtAddr |
| 177 | +readelf -l node-sea-binary | grep -A 5 "NOTE" |
| 178 | +# Should show: VirtAddr 0xNNNNNNNNNNNNNNNN (non-zero) |
| 179 | + |
| 180 | +# Check fuse state |
| 181 | +strings node-sea-binary | grep NODE_SEA_FUSE |
| 182 | +# Should show: NODE_SEA_FUSE_fce680ab2cc467b6e072b8b5df1996b2:1 |
| 183 | +``` |
| 184 | + |
| 185 | +## Consequences |
| 186 | + |
| 187 | +### Positive |
| 188 | + |
| 189 | +1. **Correct Behavior:** Each use case gets the appropriate implementation |
| 190 | +2. **Static Glibc Safety:** PHT preservation prevents segfaults |
| 191 | +3. **Node.js Compatibility:** Proper VirtAddr enables resource discovery |
| 192 | +4. **Clear Separation:** Code explicitly documents which approach to use |
| 193 | +5. **Maintainable:** Easy to understand why two approaches exist |
| 194 | + |
| 195 | +### Negative |
| 196 | + |
| 197 | +1. **Complexity:** Two code paths to maintain instead of one |
| 198 | +2. **Documentation:** Requires clear explanation of when to use each |
| 199 | +3. **Testing:** Must test both approaches independently |
| 200 | + |
| 201 | +### Mitigations |
| 202 | + |
| 203 | +- Comprehensive documentation (this file) |
| 204 | +- Clear comments in code explaining rationale |
| 205 | +- Shared utilities where possible (note format, deduplication) |
| 206 | +- Integration tests for both use cases (see Testing section below) |
| 207 | + |
| 208 | +## Testing |
| 209 | + |
| 210 | +### SMOL Stub Tests (VirtAddr=0) |
| 211 | + |
| 212 | +**Unit Test: PT_NOTE Replacement** |
| 213 | +- **Location:** `packages/binpress/test/elf-ptnote-repack.test.mjs` |
| 214 | +- **Purpose:** Validates PT_NOTE segment replacement (not appending) during binary repacking |
| 215 | +- **What it tests:** |
| 216 | + - PT_NOTE segments are properly replaced in update mode |
| 217 | + - Section names follow correct format (`.note.pressed_data`) |
| 218 | + - Multiple sequential updates don't accumulate PT_NOTE segments |
| 219 | + - Binary structure remains valid after repacking |
| 220 | + - Compressed binaries remain executable |
| 221 | +- **Platform:** Linux only (ELF native platform) |
| 222 | +- **Run with:** `pnpm test` in `packages/binpress` |
| 223 | + |
| 224 | +**Integration Test: Compression Round-Trip** |
| 225 | +- **Location:** `packages/binpress/test/compression-roundtrip.test.mjs` |
| 226 | +- **Purpose:** End-to-end validation of compression/decompression workflow |
| 227 | +- **What it tests:** |
| 228 | + - Compress binary with binpress (uses `write_with_raw_note()`) |
| 229 | + - Execute compressed binary (decompression stub reads PT_NOTE from file offset) |
| 230 | + - Verify decompressed binary matches original functionality |
| 231 | + - Validate LZFSE compression metadata and magic markers |
| 232 | + - Test multiple compression cycles and large binaries |
| 233 | +- **Platform:** All platforms (Linux, macOS, Windows) |
| 234 | +- **Run with:** `pnpm test` in `packages/binpress` |
| 235 | +- **Critical validation:** Compressed binaries execute without SIGSEGV, proving VirtAddr=0 notes work correctly with file-based reading |
| 236 | + |
| 237 | +### SEA Tests (VirtAddr≠0) |
| 238 | + |
| 239 | +**Regression Test: write_with_notes() PT_NOTE Handling** |
| 240 | +- **Location:** `packages/bin-infra/test/test-write-with-notes.sh` |
| 241 | +- **Purpose:** Prevent regression of the notes=false bug (commit 271e9c5a) |
| 242 | +- **What it tests:** |
| 243 | + - PT_NOTE segments properly preserved in both writes (double-write pattern) |
| 244 | + - ALLOC flags correctly removed from sections with VirtAddr=0 |
| 245 | + - Produced binaries execute without SIGSEGV (exit code 139) |
| 246 | +- **Platform:** Linux (full validation with readelf), macOS (execution test only) |
| 247 | +- **Run with:** `pnpm test` in `packages/bin-infra` or `bash test/test-write-with-notes.sh` |
| 248 | +- **Historical context:** This test catches the bug where `notes=false` in the second write corrupted the Program Header Table |
| 249 | + |
| 250 | +**Integration Test: LIEF Section Injection** |
| 251 | +- **Location:** `packages/binject/test/test-lief-integration.sh` |
| 252 | +- **Purpose:** Validates LIEF can inject multiple sections into the same segment |
| 253 | +- **What it tests:** |
| 254 | + - Single section injection with LIEF |
| 255 | + - Multiple section injection into same segment |
| 256 | + - Data integrity after injection |
| 257 | + - Segment structure correctness |
| 258 | +- **Platform:** macOS only (uses otool for verification) |
| 259 | +- **Run with:** `pnpm test` in `packages/binject` |
| 260 | +- **Note:** Tests macOS-specific Mach-O injection; Linux ELF injection tested via SEA execution in firewall E2E tests |
| 261 | + |
| 262 | +### End-to-End Tests |
| 263 | + |
| 264 | +**Firewall Integration Tests** |
| 265 | +- **Location:** `packages/firewall/test/*.integration.test.ts` |
| 266 | +- **Purpose:** Real-world validation of Node.js SEA binaries with VirtAddr≠0 notes |
| 267 | +- **What it tests:** |
| 268 | + - Binaries built with binject execute Node.js code from embedded resources |
| 269 | + - `dl_iterate_phdr()` discovers PT_NOTE segments in memory |
| 270 | + - NODE_SEA_FUSE properly flipped from :0 to :1 |
| 271 | + - SEA resources accessible at runtime |
| 272 | +- **Platform:** All platforms (Linux, macOS, Windows) |
| 273 | +- **Run with:** `pnpm test` in `packages/firewall` |
| 274 | +- **Critical validation:** If PT_NOTE segments had VirtAddr=0, these tests would fail because the notes could not be found via `dl_iterate_phdr()` |
| 275 | + |
| 276 | +### Test Coverage Summary |
| 277 | + |
| 278 | +| Code Path | What's Tested | Test Location | Platforms | |
| 279 | +|-----------|---------------|---------------|-----------| |
| 280 | +| **write_with_raw_note()** | VirtAddr=0 for SMOL stubs | binpress compression-roundtrip | All | |
| 281 | +| **write_with_raw_note()** | PT_NOTE replacement in repack | binpress elf-ptnote-repack | Linux | |
| 282 | +| **write_with_notes()** | VirtAddr≠0 for SEA | bin-infra test-write-with-notes | Linux, macOS | |
| 283 | +| **write_with_notes()** | Double-write pattern | bin-infra test-write-with-notes | Linux, macOS | |
| 284 | +| **binject SEA** | End-to-end SEA execution | firewall integration tests | All | |
| 285 | +| **LIEF injection** | Multi-section support | binject test-lief-integration | macOS | |
| 286 | + |
| 287 | +### Running All Tests |
| 288 | + |
| 289 | +```bash |
| 290 | +# SMOL stub tests (VirtAddr=0) |
| 291 | +cd packages/binpress |
| 292 | +pnpm test |
| 293 | + |
| 294 | +# SEA regression test (VirtAddr≠0) |
| 295 | +cd packages/bin-infra |
| 296 | +pnpm test |
| 297 | + |
| 298 | +# LIEF integration test (macOS only) |
| 299 | +cd packages/binject |
| 300 | +pnpm test |
| 301 | + |
| 302 | +# End-to-end firewall tests |
| 303 | +cd packages/firewall |
| 304 | +pnpm test |
| 305 | +``` |
| 306 | + |
| 307 | +## Future Considerations |
| 308 | + |
| 309 | +### If Supporting Big-Endian Architectures |
| 310 | + |
| 311 | +Would need to: |
| 312 | +1. Byte-swap all header reads (`phoff`, `phnum`, etc.) |
| 313 | +2. Byte-swap all note structure fields |
| 314 | +3. Add endianness detection and conversion utilities |
| 315 | +4. Test on actual big-endian hardware (PowerPC, s390x) |
| 316 | + |
| 317 | +Currently rejected because: |
| 318 | +- Target platforms (x86-64, ARM64) are all little-endian |
| 319 | +- Byte-swapping adds complexity and overhead |
| 320 | +- Big-endian usage is rare and declining |
| 321 | + |
| 322 | +### If Supporting 32-bit ELF |
| 323 | + |
| 324 | +Would need to: |
| 325 | +1. Duplicate all pointer arithmetic for 32-bit offsets |
| 326 | +2. Handle both ELF32 and ELF64 header layouts |
| 327 | +3. Adjust size calculations (4-byte vs 8-byte fields) |
| 328 | +4. Test on 32-bit systems |
| 329 | + |
| 330 | +Currently rejected because: |
| 331 | +- Modern systems are 64-bit |
| 332 | +- 32-bit Node.js not supported |
| 333 | +- Static glibc stubs built as 64-bit |
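
Both rejected cases are visible in the first bytes of the file, so unsupported inputs could be refused up front rather than misparsed. The following is a minimal sketch of such a guard based on the standard `e_ident` fields. The names `ElfSupport`, `classify_elf`, and `describe` are hypothetical and not taken from the existing utilities.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum class ElfSupport { Supported, NotElf, Not64Bit, NotLittleEndian };

// Classify an image by its e_ident bytes: magic, EI_CLASS (byte 4), EI_DATA (byte 5).
ElfSupport classify_elf(const std::vector<std::uint8_t>& image) {
    if (image.size() < 16 || image[0] != 0x7f || image[1] != 'E' ||
        image[2] != 'L' || image[3] != 'F') {
        return ElfSupport::NotElf;
    }
    if (image[4] != 2) return ElfSupport::Not64Bit;         // ELFCLASS64 == 2
    if (image[5] != 1) return ElfSupport::NotLittleEndian;  // ELFDATA2LSB == 1
    return ElfSupport::Supported;
}

// Human-readable reason, suitable for an error message.
const char* describe(ElfSupport s) {
    switch (s) {
        case ElfSupport::Supported:       return "supported";
        case ElfSupport::NotElf:          return "not an ELF file";
        case ElfSupport::Not64Bit:        return "32-bit ELF is not supported";
        case ElfSupport::NotLittleEndian: return "big-endian ELF is not supported";
    }
    assert(false && "unhandled ElfSupport value");
    return "unknown";
}
```

A guard like this keeps the raw offset arithmetic valid under its stated assumptions and turns an unsupported input into an explicit error.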
| 334 | + |
| 335 | +## References |
| 336 | + |
| 337 | +- **ELF Specification:** [System V ABI, Chapter 5](https://refspecs.linuxfoundation.org/elf/elf.pdf) |
| 338 | +- **postject Implementation:** [nodejs/postject on GitHub](https://github.com/nodejs/postject) |
| 339 | +- **Static Glibc Issue:** `.claude/fix-elf-ptnote-virtaddr.md` |
| 340 | +- **dl_iterate_phdr Manual:** `man 3 dl_iterate_phdr` |
| 341 | +- **Original Plan:** `.claude/elf-section-vs-note-plan.md` |
| 342 | + |
| 343 | +## Related Documents |
| 344 | + |
| 345 | +- `.claude/elf-ptnote-fix.md` - Initial PT_NOTE implementation |
| 346 | +- `.claude/fix-elf-ptnote-virtaddr.md` - PHT relocation problems |
| 347 | +- `.claude/elf-section-vs-note-plan.md` - Section vs Note comparison |