Commit a9a020c

Initial commit
0 parents  commit a9a020c

553 files changed

+94852
-0
lines changed

Lines changed: 347 additions & 0 deletions
@@ -0,0 +1,347 @@
# ELF PT_NOTE VirtAddr Architecture Decision

**Date:** 2026-01-15
**Status:** Implemented
**Decision:** Split ELF PT_NOTE injection into two approaches based on use case

## Context

When injecting PT_NOTE segments into ELF binaries, there are two fundamentally different reading mechanisms that require different approaches:

1. **File-based reading** - Decompression stub reads notes from file offset
2. **Memory-based reading** - Node.js SEA uses `dl_iterate_phdr()` to find notes in memory

These different mechanisms require different VirtAddr settings in the PT_NOTE segment.

## Decision

### Approach 1: Raw Notes with VirtAddr=0 (SMOL Stubs)

**Use Case:** Binary compression with SMOL stubs (binpress, smol_repack)

**Method:** Raw binary manipulation (`write_with_raw_notes()`)

**VirtAddr:** 0 (non-loadable)

**Why:**
- Static glibc binaries require PHT at original offset (typically 64)
- LIEF's `binary->write()` restructures the entire binary, moving PHT
- Moving PHT causes SIGSEGV in static glibc (reads from `base + e_phoff`)
- Decompression stub reads notes from file offset, not memory address
- VirtAddr=0 means the kernel won't map the segment into memory (saves address space)

**Implementation:**
```cpp
// packages/binpress/src/stub_elf_compress_lief.cpp
elf_note_utils::write_with_raw_note(stub_path, output_path, note_name, note_data);

// packages/bin-infra/src/stub_smol_repack_lief.cpp
elf_note_utils::write_with_raw_note(stub_path, output_path, note_name, note_data);
```

**Critical Constraint:** PHT must stay at original offset for static glibc compatibility

### Approach 2: LIEF Notes with Proper VirtAddr (Node.js SEA)

**Use Case:** Node.js Single Executable Application (binject --sea, --vfs)

**Method:** LIEF high-level Note API (`write_with_notes()`)

**VirtAddr:** ≠ 0 (loadable, assigned by LIEF)

**Why:**
- Node.js uses `dl_iterate_phdr()` to discover resources at runtime
- `dl_iterate_phdr()` walks the program headers of objects loaded in memory; note contents are then read at `p_vaddr + load_address`
- VirtAddr=0 segments are not mapped, so their contents cannot be reached this way
- Node.js expects notes discoverable via postject's mechanism
- Dynamic binaries can tolerate PHT relocation (dynamic linker handles it)

**Implementation:**
```cpp
// packages/binject/src/elf_inject_lief.cpp
elf_note_utils::replace_or_add(binary.get(), section_name, note_data);
elf_note_utils::write_with_notes(binary.get(), tmpfile);
```

**Critical Requirement:** Notes must be mapped into memory for `dl_iterate_phdr()` discovery

## Comparison

| Aspect | SMOL Stubs (VirtAddr=0) | Node.js SEA (VirtAddr≠0) |
|--------|-------------------------|--------------------------|
| **Primary Use** | Binary compression | JavaScript SEA execution |
| **Target Binaries** | Static glibc stubs | Dynamic Node.js binaries |
| **Reading Method** | File I/O (offset-based) | Memory mapping (`dl_iterate_phdr()`) |
| **VirtAddr** | 0 (non-loadable) | ≠ 0 (loadable) |
| **PHT Preservation** | Critical (must stay at offset 64) | Not critical (dynamic linker) |
| **Implementation** | Raw binary manipulation | LIEF high-level API |
| **Mapped to Memory** | No | Yes |
| **LIEF Restructuring** | Avoided (breaks static glibc) | Acceptable (dynamic linking) |

## Technical Details

### Why PHT Relocation Breaks Static Glibc

Static glibc binaries have hardcoded assumptions:
1. PHT is at offset 64 (ELF64 header size)
2. Code reads from `base + e_phoff` in memory
3. If the PHT moves, `base + e_phoff` no longer points at the program headers in memory
4. PLT/GOT resolution fails → SIGSEGV

LIEF's `binary->write()`:
- Creates new PT_LOAD segments for added content
- Reorganizes segment layout
- Moves PHT to accommodate new segments
- Updates `e_phoff` in header
- But the static startup code still assumes the old offset
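
A guard along these lines could catch an accidental PHT move before a stub ships. This is a minimal sketch, assuming a little-endian ELF64 file read on a little-endian host; `read_le`, `phoff_of`, and `pht_preserved` are hypothetical names and are not part of `elf_note_utils`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// Hypothetical helper: copy a fixed-width field out of the file image.
// Assumes a little-endian host reading a little-endian file.
template <typename T>
std::optional<T> read_le(const std::vector<uint8_t>& file, size_t offset) {
    if (offset > file.size() || sizeof(T) > file.size() - offset) return std::nullopt;
    T value;
    std::memcpy(&value, file.data() + offset, sizeof(T));
    return value;
}

// e_phoff is the 8-byte field at byte 32 of an ELF64 header.
std::optional<uint64_t> phoff_of(const std::vector<uint8_t>& file) {
    if (file.size() < 4 || std::memcmp(file.data(), "\177ELF", 4) != 0) return std::nullopt;
    return read_le<uint64_t>(file, 32);
}

// True when the rewritten binary keeps the PHT where the original stub had it.
bool pht_preserved(const std::vector<uint8_t>& original,
                   const std::vector<uint8_t>& rewritten) {
    const auto before = phoff_of(original);
    const auto after = phoff_of(rewritten);
    return before && after && *before == *after;
}
```

Calling `pht_preserved(stub_bytes, output_bytes)` after injection and failing the build on `false` would surface the problem earlier than a runtime SIGSEGV.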

### Why VirtAddr=0 Works for SMOL

Decompression stub flow:
1. Open compressed binary as file
2. Read ELF header, find PHT at `e_phoff`
3. Iterate PT_NOTE entries in PHT
4. Match note name (e.g., "pressed_data")
5. Read from file at `p_offset` (NOT `p_vaddr`)
6. Decompress and execute

Key insight: File I/O uses `p_offset`, not `p_vaddr`
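
Steps 2 to 5 of the flow can be sketched as follows. This is an illustration under stated assumptions, not the stub's actual code: it handles little-endian ELF64 only, works on a file already read into memory, and `Ehdr64`, `Phdr64`, and `find_note_by_offset` are hypothetical local names (on Linux, `<elf.h>` provides `Elf64_Ehdr` and `Elf64_Phdr` with the same layout).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <string>
#include <vector>

// Local stand-ins for Elf64_Ehdr / Elf64_Phdr (same field order and sizes).
struct Ehdr64 {
    unsigned char e_ident[16];
    uint16_t e_type, e_machine;
    uint32_t e_version;
    uint64_t e_entry, e_phoff, e_shoff;
    uint32_t e_flags;
    uint16_t e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx;
};
struct Phdr64 {
    uint32_t p_type, p_flags;
    uint64_t p_offset, p_vaddr, p_paddr, p_filesz, p_memsz, p_align;
};
static_assert(sizeof(Ehdr64) == 64 && sizeof(Phdr64) == 56, "unexpected struct padding");

constexpr uint32_t kPtNote = 4;  // PT_NOTE

inline size_t align4(size_t n) { return (n + 3) & ~size_t{3}; }

// Returns the descriptor of the first note named `wanted`.
// Only p_offset and p_filesz are consulted; p_vaddr is never read.
std::optional<std::vector<uint8_t>> find_note_by_offset(
        const std::vector<uint8_t>& file, const std::string& wanted) {
    Ehdr64 eh;
    if (file.size() < sizeof eh) return std::nullopt;
    std::memcpy(&eh, file.data(), sizeof eh);
    if (std::memcmp(eh.e_ident, "\177ELF", 4) != 0) return std::nullopt;
    if (eh.e_phoff > file.size()) return std::nullopt;

    for (size_t i = 0; i < eh.e_phnum; ++i) {
        const size_t at = eh.e_phoff + i * sizeof(Phdr64);
        if (at > file.size() || sizeof(Phdr64) > file.size() - at) return std::nullopt;
        Phdr64 ph;
        std::memcpy(&ph, file.data() + at, sizeof ph);
        if (ph.p_type != kPtNote) continue;
        if (ph.p_offset > file.size() || ph.p_filesz > file.size() - ph.p_offset) continue;

        size_t pos = ph.p_offset;
        const size_t end = ph.p_offset + ph.p_filesz;
        while (end - pos >= 12) {
            uint32_t hdr[3];  // namesz, descsz, type
            std::memcpy(hdr, file.data() + pos, sizeof hdr);
            const size_t name_at = pos + 12;
            const size_t desc_at = name_at + align4(hdr[0]);
            if (desc_at > end || hdr[1] > end - desc_at) break;  // truncated note
            const char* name = reinterpret_cast<const char*>(file.data() + name_at);
            size_t len = 0;
            while (len < hdr[0] && name[len] != '\0') ++len;
            if (wanted == std::string(name, len)) {
                return std::vector<uint8_t>(file.begin() + desc_at,
                                            file.begin() + desc_at + hdr[1]);
            }
            pos = desc_at + align4(hdr[1]);
            if (pos > end) break;
        }
    }
    return std::nullopt;
}
```

Because the lookup never touches `p_vaddr`, the same code returns the payload whether the segment's VirtAddr is 0 or not.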

### Why VirtAddr≠0 Required for SEA

Node.js resource discovery:
1. Call `dl_iterate_phdr()` at startup
2. The C library (not the kernel) reports the program headers of each object loaded in memory
3. PT_NOTE segments with VirtAddr=0 are unusable here (their contents are not mapped)
4. Search the reported headers for PT_NOTE
5. Match note name (e.g., "NODE_SEA_BLOB")
6. Read from memory at `p_vaddr + load_address`

Key insight: notes found through `dl_iterate_phdr()` are read from memory, so they must be mapped (VirtAddr≠0)
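
The mapping requirement can be expressed as a range check over the program headers. The sketch below is illustrative only: `Segment`, `note_is_mapped`, and `note_runtime_address` are hypothetical names, and treating VirtAddr=0 as "not mapped" follows this document's convention rather than any rule in the ELF specification.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr uint32_t kPtLoad = 1;  // PT_LOAD
constexpr uint32_t kPtNote = 4;  // PT_NOTE

// Only the program-header fields this check needs.
struct Segment {
    uint32_t type;
    uint64_t vaddr;
    uint64_t memsz;
};

// True when the note's address range lies inside some PT_LOAD segment,
// i.e. its bytes exist in the process image.
bool note_is_mapped(const Segment& note, const std::vector<Segment>& segments) {
    if (note.type != kPtNote || note.vaddr == 0 || note.memsz == 0) return false;
    for (const Segment& load : segments) {
        if (load.type != kPtLoad) continue;
        if (note.vaddr < load.vaddr || note.memsz > load.memsz) continue;
        if (note.vaddr - load.vaddr <= load.memsz - note.memsz) return true;
    }
    return false;
}

// Step 6 of the flow: runtime address is the load address plus p_vaddr.
uint64_t note_runtime_address(uint64_t load_address, const Segment& note) {
    return load_address + note.vaddr;
}
```

A check like this could run on the output of `write_with_notes()` to confirm that the note LIEF produced is covered by a loadable segment.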

## Implementation Files

### SMOL Stub Implementation (VirtAddr=0)

**Compression:**
- `packages/binpress/src/stub_elf_compress_lief.cpp` - Initial compression
- `packages/bin-infra/src/stub_smol_repack_lief.cpp` - Repack/update operations

**Core Utilities:**
- `packages/bin-infra/src/elf_note_utils.hpp::write_with_raw_notes()` - Raw injection
- `packages/bin-infra/src/elf_note_utils.hpp::write_with_raw_note()` - Single note wrapper

### Node.js SEA Implementation (VirtAddr≠0)

**Injection:**
- `packages/binject/src/elf_inject_lief.cpp::binject_elf_lief()` - Single injection
- `packages/binject/src/elf_inject_lief.cpp::binject_elf_lief_batch()` - Batch injection

**Core Utilities:**
- `packages/bin-infra/src/elf_note_utils.hpp::write_with_notes()` - LIEF with config
- `packages/bin-infra/src/elf_note_utils.hpp::replace_or_add()` - Note management
- `packages/bin-infra/src/elf_note_utils.hpp::create_and_add()` - Note creation

## Verification

### SMOL Stub Verification

Static glibc binary should:
1. ✅ Run without SIGSEGV
2. ✅ PHT at original offset (typically 64)
3. ✅ PT_NOTE with VirtAddr=0
4. ✅ Decompression stub finds note by name
5. ✅ Extraction succeeds

```bash
# Check PHT offset
readelf -h compressed-binary | grep "Start of program headers"
# Should show: 64 (bytes into file)

# Check PT_NOTE VirtAddr
readelf -l compressed-binary | grep -A 5 "NOTE"
# Should show: VirtAddr 0x0000000000000000
```

### Node.js SEA Verification

Dynamic Node.js binary should:
1. ✅ Run without SIGSEGV
2. ✅ PT_NOTE with VirtAddr≠0
3. ✅ NODE_SEA_FUSE flipped from :0 to :1
4. ✅ `dl_iterate_phdr()` discovers note
5. ✅ SEA JavaScript executes

```bash
# Check PT_NOTE VirtAddr
readelf -l node-sea-binary | grep -A 5 "NOTE"
# Should show: VirtAddr 0xNNNNNNNNNNNNNNNN (non-zero)

# Check fuse state
strings node-sea-binary | grep NODE_SEA_FUSE
# Should show: NODE_SEA_FUSE_fce680ab2cc467b6e072b8b5df1996b2:1
```
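
Where shelling out to `strings` is inconvenient (for example inside a C++ test), the fuse check could be done directly on the file bytes. This is a hedged sketch: `fuse_state` is a hypothetical helper, and the sentinel is the one shown in the command output above.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Sentinel copied from the expected `strings` output.
const std::string kFuseSentinel = "NODE_SEA_FUSE_fce680ab2cc467b6e072b8b5df1996b2";

// Returns '0' or '1' for the first sentinel followed by ":<digit>",
// or nullopt when no such marker is present. `image` holds the raw file
// bytes, so it may contain embedded NULs.
std::optional<char> fuse_state(const std::string& image) {
    size_t at = image.find(kFuseSentinel);
    while (at != std::string::npos) {
        const size_t colon = at + kFuseSentinel.size();
        if (colon + 1 < image.size() && image[colon] == ':' &&
            (image[colon + 1] == '0' || image[colon + 1] == '1')) {
            return image[colon + 1];
        }
        at = image.find(kFuseSentinel, at + 1);
    }
    return std::nullopt;
}
```

A result of `'1'` corresponds to check 3 in the list; `'0'` means the binary was never injected.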

## Consequences

### Positive

1. **Correct Behavior:** Each use case gets the appropriate implementation
2. **Static Glibc Safety:** PHT preservation prevents segfaults
3. **Node.js Compatibility:** Proper VirtAddr enables resource discovery
4. **Clear Separation:** Code explicitly documents which approach to use
5. **Maintainable:** Easy to understand why two approaches exist

### Negative

1. **Complexity:** Two code paths to maintain instead of one
2. **Documentation:** Requires clear explanation of when to use each
3. **Testing:** Must test both approaches independently

### Mitigations

- Comprehensive documentation (this file)
- Clear comments in code explaining rationale
- Shared utilities where possible (note format, deduplication)
- Integration tests for both use cases (see Testing section below)

## Testing

### SMOL Stub Tests (VirtAddr=0)

**Unit Test: PT_NOTE Replacement**
- **Location:** `packages/binpress/test/elf-ptnote-repack.test.mjs`
- **Purpose:** Validates PT_NOTE segment replacement (not appending) during binary repacking
- **What it tests:**
  - PT_NOTE segments are properly replaced in update mode
  - Section names follow correct format (`.note.pressed_data`)
  - Multiple sequential updates don't accumulate PT_NOTE segments
  - Binary structure remains valid after repacking
  - Compressed binaries remain executable
- **Platform:** Linux only (ELF native platform)
- **Run with:** `pnpm test` in `packages/binpress`

**Integration Test: Compression Round-Trip**
- **Location:** `packages/binpress/test/compression-roundtrip.test.mjs`
- **Purpose:** End-to-end validation of compression/decompression workflow
- **What it tests:**
  - Compress binary with binpress (uses `write_with_raw_note()`)
  - Execute compressed binary (decompression stub reads PT_NOTE from file offset)
  - Verify decompressed binary matches original functionality
  - Validate LZFSE compression metadata and magic markers
  - Test multiple compression cycles and large binaries
- **Platform:** All platforms (Linux, macOS, Windows)
- **Run with:** `pnpm test` in `packages/binpress`
- **Critical validation:** Compressed binaries execute without SIGSEGV, proving VirtAddr=0 notes work correctly with file-based reading

### SEA Tests (VirtAddr≠0)

**Regression Test: write_with_notes() PT_NOTE Handling**
- **Location:** `packages/bin-infra/test/test-write-with-notes.sh`
- **Purpose:** Prevent regression of the notes=false bug (commit 271e9c5a)
- **What it tests:**
  - PT_NOTE segments properly preserved in both writes (double-write pattern)
  - ALLOC flags correctly removed from sections with VirtAddr=0
  - Produced binaries execute without SIGSEGV (exit code 139)
- **Platform:** Linux (full validation with readelf), macOS (execution test only)
- **Run with:** `pnpm test` in `packages/bin-infra` or `bash test/test-write-with-notes.sh`
- **Historical context:** This test catches the bug where `notes=false` in second write corrupted the Program Header Table

**Integration Test: LIEF Section Injection**
- **Location:** `packages/binject/test/test-lief-integration.sh`
- **Purpose:** Validates LIEF can inject multiple sections into the same segment
- **What it tests:**
  - Single section injection with LIEF
  - Multiple section injection into same segment
  - Data integrity after injection
  - Segment structure correctness
- **Platform:** macOS only (uses otool for verification)
- **Run with:** `pnpm test` in `packages/binject`
- **Note:** Tests macOS-specific Mach-O injection; Linux ELF injection tested via SEA execution in firewall E2E tests

### End-to-End Tests

**Firewall Integration Tests**
- **Location:** `packages/firewall/test/*.integration.test.ts`
- **Purpose:** Real-world validation of Node.js SEA binaries with VirtAddr≠0 notes
- **What it tests:**
  - Binaries built with binject execute Node.js code from embedded resources
  - `dl_iterate_phdr()` discovers PT_NOTE segments in memory
  - NODE_SEA_FUSE properly flipped from :0 to :1
  - SEA resources accessible at runtime
- **Platform:** All platforms (Linux, macOS, Windows)
- **Run with:** `pnpm test` in `packages/firewall`
- **Critical validation:** If the PT_NOTE segments had VirtAddr=0, these tests would fail because the note data could not be located through `dl_iterate_phdr()`

### Test Coverage Summary

| Code Path | What's Tested | Test Location | Platforms |
|-----------|---------------|---------------|-----------|
| **write_with_raw_note()** | VirtAddr=0 for SMOL stubs | binpress compression-roundtrip | All |
| **write_with_raw_note()** | PT_NOTE replacement in repack | binpress elf-ptnote-repack | Linux |
| **write_with_notes()** | VirtAddr≠0 for SEA | bin-infra test-write-with-notes | Linux, macOS |
| **write_with_notes()** | Double-write pattern | bin-infra test-write-with-notes | Linux, macOS |
| **binject SEA** | End-to-end SEA execution | firewall integration tests | All |
| **LIEF injection** | Multi-section support | binject test-lief-integration | macOS |

### Running All Tests

```bash
# Run from the repository root; each subshell leaves the working directory unchanged.

# SMOL stub tests (VirtAddr=0)
(cd packages/binpress && pnpm test)

# SEA regression test (VirtAddr≠0)
(cd packages/bin-infra && pnpm test)

# LIEF integration test (macOS only)
(cd packages/binject && pnpm test)

# End-to-end firewall tests
(cd packages/firewall && pnpm test)
```

## Future Considerations

### If Supporting Big-Endian Architectures

Would need to:
1. Byte-swap all header reads (`phoff`, `phnum`, etc.)
2. Byte-swap all note structure fields
3. Add endianness detection and conversion utilities
4. Test on actual big-endian hardware (PowerPC, s390x)

Currently rejected because:
- Target platforms (x86-64, ARM64) are all little-endian
- Byte-swapping adds complexity and overhead
- Big-endian usage is rare and declining
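
For a sense of scale, the detection and conversion utilities from item 3 might look like the sketch below. Nothing here exists in the codebase: `Endian`, `elf_endianness`, `read_u32`, and `read_u64` are hypothetical names. The `e_ident[5]` values (1 for little-endian, 2 for big-endian) come from the ELF specification.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

enum class Endian { Little, Big, Unknown };

// e_ident[EI_DATA] is byte 5: 1 = ELFDATA2LSB, 2 = ELFDATA2MSB.
Endian elf_endianness(const uint8_t* ident) {
    switch (ident[5]) {
        case 1: return Endian::Little;
        case 2: return Endian::Big;
        default: return Endian::Unknown;
    }
}

// Host-independent read: assemble the value byte by byte.
// Anything other than Endian::Big is decoded as little-endian.
uint32_t read_u32(const uint8_t* p, Endian e) {
    if (e == Endian::Big) {
        return (uint32_t{p[0]} << 24) | (uint32_t{p[1]} << 16) |
               (uint32_t{p[2]} << 8) | uint32_t{p[3]};
    }
    return (uint32_t{p[3]} << 24) | (uint32_t{p[2]} << 16) |
           (uint32_t{p[1]} << 8) | uint32_t{p[0]};
}

// 64-bit fields such as e_phoff are built from two 32-bit halves.
uint64_t read_u64(const uint8_t* p, Endian e) {
    const uint64_t hi = read_u32(e == Endian::Big ? p : p + 4, e);
    const uint64_t lo = read_u32(e == Endian::Big ? p + 4 : p, e);
    return (hi << 32) | lo;
}
```

Every header and note field read in the raw path would have to go through helpers like these, which is the complexity cost cited above.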

### If Supporting 32-bit ELF

Would need to:
1. Duplicate all pointer arithmetic for 32-bit offsets
2. Handle both ELF32 and ELF64 header layouts
3. Adjust size calculations (4-byte vs 8-byte fields)
4. Test on 32-bit systems

Currently rejected because:
- Modern systems are 64-bit
- 32-bit Node.js not supported
- Static glibc stubs built as 64-bit
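
To illustrate items 2 and 3, the sketch below locates the PHT for either class. It is a hypothetical example (`PhtLocation` and `locate_pht` are not existing utilities) and assumes a little-endian file on a little-endian host. The field offsets follow the ELF32 and ELF64 header layouts: `e_phoff` is 4 bytes at offset 28 in ELF32 and 8 bytes at offset 32 in ELF64.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

struct PhtLocation {
    uint64_t offset;      // e_phoff
    uint16_t entry_size;  // e_phentsize
    uint16_t count;       // e_phnum
};

// e_ident[EI_CLASS] is byte 4: 1 = ELFCLASS32, 2 = ELFCLASS64.
std::optional<PhtLocation> locate_pht(const std::vector<uint8_t>& file) {
    if (file.size() < 52 || std::memcmp(file.data(), "\177ELF", 4) != 0) {
        return std::nullopt;
    }
    PhtLocation loc{};
    if (file[4] == 1) {  // ELF32: 52-byte header, 4-byte e_phoff
        uint32_t phoff32 = 0;
        std::memcpy(&phoff32, file.data() + 28, sizeof phoff32);
        loc.offset = phoff32;
        std::memcpy(&loc.entry_size, file.data() + 42, sizeof loc.entry_size);
        std::memcpy(&loc.count, file.data() + 44, sizeof loc.count);
    } else if (file[4] == 2 && file.size() >= 64) {  // ELF64: 64-byte header
        std::memcpy(&loc.offset, file.data() + 32, sizeof loc.offset);
        std::memcpy(&loc.entry_size, file.data() + 54, sizeof loc.entry_size);
        std::memcpy(&loc.count, file.data() + 56, sizeof loc.count);
    } else {
        return std::nullopt;
    }
    return loc;
}
```

Note that for ELF32 the "PHT at original offset" constraint would typically mean offset 52 rather than 64, so every hardcoded 64 in the raw path would need the same treatment.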

## References

- **ELF Specification:** [System V ABI, Chapter 5](https://refspecs.linuxfoundation.org/elf/elf.pdf)
- **postject Implementation:** [nodejs/postject on GitHub](https://github.com/nodejs/postject)
- **Static Glibc Issue:** `.claude/fix-elf-ptnote-virtaddr.md`
- **dl_iterate_phdr Manual:** `man 3 dl_iterate_phdr`
- **Original Plan:** `.claude/elf-section-vs-note-plan.md`

## Related Documents

- `.claude/elf-ptnote-fix.md` - Initial PT_NOTE implementation
- `.claude/fix-elf-ptnote-virtaddr.md` - PHT relocation problems
- `.claude/elf-section-vs-note-plan.md` - Section vs Note comparison
