feat(config): add Import.* for CID Profiles from IPIP-499#11148

Merged
lidel merged 30 commits into master from feat/ipip-499-unixfs-2025 on Feb 4, 2026
@lidel lidel commented Jan 17, 2026

Implements IPIP-499: UnixFS CID Determinism

Depends on:

Closes #11071

Changes

CID Profiles

Apply a profile to pin down import settings for reproducible CIDs:

ipfs config profile apply unixfs-v1-2025

Available profiles:

  • unixfs-v1-2025: modern defaults (CIDv1, sha2-256, raw leaves, 1 MiB chunks)
  • unixfs-v0-2015 (alias: legacy-cid-v0): legacy CIDv0 behavior

Removes deprecated test-cid-v1 and test-cid-v1-wide profiles.

New Config Options

  • Import.UnixFSHAMTDirectorySizeEstimation: HAMT threshold mode (links, block, disabled)
  • Import.UnixFSDAGLayout: DAG layout, balanced or trickle; other layouts may be added in the future

MFS Improvements

ipfs files commands now respect Import.* config:

  • CID version and hash function
  • Chunker settings
  • HAMT sharding thresholds and fanout
  • Raw leaves

Fix: single-block files in CIDv1 directories now produce raw CIDs (matching ipfs add behavior)

New CLI Flags

  • --dereference-symlinks: resolve all symlinks to target content
  • --empty-dirs / -E: include empty directories
  • --hidden / -H: include hidden files

Deprecates --dereference-args (subsumed by --dereference-symlinks).

Tests

  • CID profile determinism at HAMT threshold boundaries
  • Balanced DAG layout verification
  • MFS config integration
  • Symlink handling

implements IPIP-499: add config options for controlling UnixFS DAG
determinism and introduces `unixfs-v1-2025` and `unixfs-v0-2015`
profiles for cross-implementation CID reproducibility.

changes:
- add Import.* fields: HAMTDirectorySizeEstimation, SymlinkMode,
  DAGLayout, IncludeEmptyDirectories, IncludeHidden
- add validation for all Import.* config values
- add unixfs-v1-2025 profile (recommended for new data)
- add unixfs-v0-2015 profile (alias: legacy-cid-v0)
- remove deprecated test-cid-v1 and test-cid-v1-wide profiles
- wire Import.HAMTSizeEstimationMode() to boxo globals
- update go.mod to use boxo with SizeEstimationMode support

ref: https://specs.ipfs.tech/ipips/ipip-0499/
@lidel lidel force-pushed the feat/ipip-499-unixfs-2025 branch 2 times, most recently from bf5578b to d79f7de Compare January 17, 2026 04:55
add CLI flags for controlling file collection behavior during ipfs add:

- `--dereference-symlinks`: recursively resolve symlinks to their target
  content (replaces deprecated --dereference-args which only worked on
  CLI arguments). wired through go-ipfs-cmds to boxo's SerialFileOptions.
- `--empty-dirs` / `-E`: include empty directories (default: true)
- `--hidden` / `-H`: include hidden files (default: false)

these flags are CLI-only and not wired to Import.* config options because
go-ipfs-cmds library handles input file filtering before the directory
tree is passed to kubo. removed unused Import.UnixFSSymlinkMode config
option that was defined but never actually read by the CLI.

also:
- wire --trickle to Import.UnixFSDAGLayout config default
- update go-ipfs-cmds to v0.15.1-0.20260117043932-17687e216294
- add SYMLINK HANDLING section to ipfs add help text
- add CLI tests for all three flags

ref: ipfs/specs#499
lidel added 2 commits January 19, 2026 06:13
add comprehensive test suite for UnixFS CID determinism per IPIP-499:
- verify exact HAMT threshold boundary for both estimation modes:
  - v0-2015 (links): sum(name_len + cid_len) == 262144
  - v1-2025 (block): serialized block size == 262144
- verify HAMT triggers at threshold + 1 byte for both profiles
- add all deterministic CIDs for cross-implementation testing

also wires SizeEstimationMode through CLI/API, allowing the
Import.UnixFSHAMTDirectorySizeEstimation config to take effect.

bumps boxo to ipfs/boxo@6707376 which aligns HAMT threshold with
JS implementation (uses > instead of >=), fixing CID determinism
at the exact 256 KiB boundary.
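
The boundary behavior described above can be sketched as follows. This is a minimal illustration of the `links` estimation mode and the strict `>` comparison, not boxo's implementation: `dirEntry`, `linksEstimate`, and `shouldShard` are hypothetical names, and the `block` mode (serialized block size) is not modeled here. The 34-byte CID length assumes a CIDv0 sha2-256 multihash.

```go
package main

import "fmt"

// hamtThreshold is the 256 KiB boundary (262144 bytes) used in the tests.
const hamtThreshold = 256 * 1024

// dirEntry is a hypothetical stand-in for one directory link.
type dirEntry struct {
	Name   string
	CIDLen int // length of the link's CID in bytes
}

// linksEstimate mirrors the "links" mode formula: sum(name_len + cid_len).
func linksEstimate(entries []dirEntry) int {
	total := 0
	for _, e := range entries {
		total += len(e.Name) + e.CIDLen
	}
	return total
}

// shouldShard uses a strict ">" so a directory of exactly 262144 bytes
// stays a basic directory and threshold + 1 byte switches to HAMT.
func shouldShard(estimatedSize int) bool {
	return estimatedSize > hamtThreshold
}

func main() {
	// 4096 entries of (30-byte name + 34-byte CID) = 4096 * 64 = 262144.
	entries := make([]dirEntry, 0, 4096)
	for i := 0; i < 4096; i++ {
		entries = append(entries, dirEntry{Name: fmt.Sprintf("%030d", i), CIDLen: 34})
	}
	size := linksEstimate(entries)
	fmt.Println(size, shouldShard(size))     // 262144 false
	fmt.Println(size+1, shouldShard(size+1)) // 262145 true
}
```

With `>=` instead of `>`, the first line would report `true`, which is the mismatch with the JS implementation that the boxo bump removes.
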
Previously, resolving symlinks required two flags:
- --dereference-args: resolved symlinks passed as CLI arguments
- --dereference-symlinks: resolved symlinks inside directories

Now --dereference-symlinks handles both cases. Users only need one flag
to fully dereference symlinks when adding files to IPFS.

The deprecated --dereference-args still works for backwards compatibility
but is no longer necessary.
- update boxo to ebdaf07c (nil filter fix, thread-safety docs)
- simplify changelog for IPIP-499 section
- shorten test names, move context to comments
@lidel lidel marked this pull request as ready for review January 20, 2026 02:26
@lidel lidel requested a review from a team as a code owner January 20, 2026 02:26
lidel commented Jan 20, 2026

I may add more tests or improve the code, but it's ready for initial review, to course-correct early.

@gammazero gammazero left a comment

All code looks good, and it looks like all test cases are covered.

lidel and others added 3 commits January 22, 2026 01:25
Co-authored-by: Andrew Gillis <11790789+gammazero@users.noreply.github.com>
add test that confirms kubo uses balanced layout (all leaves at same
depth) rather than balanced-packed (varying depths). creates 45MiB file
to trigger multi-level DAG and walks it to verify leaf depth uniformity.

includes trickle subtest to validate test logic can detect varying depths.

supports CAR export via DAG_LAYOUT_CAR_OUTPUT env var for test vectors.
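
The leaf-depth check that this test performs could look roughly like the sketch below. It assumes a hypothetical in-memory `node` type rather than real IPLD nodes fetched from a DAG service, and `leafDepths` / `uniformLeafDepth` are illustrative names, not helpers from the kubo test suite.

```go
package main

import "fmt"

// node is a hypothetical in-memory DAG node; a node without children is a leaf.
type node struct {
	children []*node
}

// leafDepths walks the DAG depth-first and records every depth at which
// a leaf occurs.
func leafDepths(n *node, depth int, seen map[int]bool) {
	if len(n.children) == 0 {
		seen[depth] = true
		return
	}
	for _, c := range n.children {
		leafDepths(c, depth+1, seen)
	}
}

// uniformLeafDepth reports whether all leaves sit at the same depth,
// which is the property the balanced layout test asserts.
func uniformLeafDepth(root *node) bool {
	seen := make(map[int]bool)
	leafDepths(root, 0, seen)
	return len(seen) == 1
}

func main() {
	leaf := func() *node { return &node{} }
	balanced := &node{children: []*node{
		{children: []*node{leaf(), leaf()}},
		{children: []*node{leaf(), leaf()}},
	}}
	// Uneven shape: one leaf directly under the root plus a deeper subtree.
	uneven := &node{children: []*node{
		leaf(),
		{children: []*node{leaf(), leaf()}},
	}}
	fmt.Println(uniformLeafDepth(balanced)) // true
	fmt.Println(uniformLeafDepth(uneven))   // false
}
```

The uneven tree plays the role of the trickle subtest: it confirms that the check can actually detect leaves at varying depths.
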
lidel added 5 commits January 27, 2026 21:51
switches to ipfs/boxo@6141039

changes since 5cf22196ad0b:
- refactor(unixfs): use arithmetic for exact block size calculation
- refactor(unixfs): unify size tracking and make SizeEstimationMode immutable
- feat(unixfs): optimize SizeEstimationBlock and add mode/mtime tests

also clarifies that directory sharding globals affect both `ipfs add` and MFS.
- add UnixFSDataType() helper to directly check UnixFS type via protobuf
- refactor threshold tests to use exact +1 byte calculations instead of +1 file
- verify directory type directly (ft.TDirectory vs ft.THAMTShard) instead of
  inferring from link count
- clean up helper function signatures by removing unused cidLength parameter
remove duplicate profile threshold tests from add_test.go since they
are fully covered by the data-driven tests in cid_profiles_test.go.

changes:
- improve test names to describe what threshold is being tested
- add inline documentation explaining each test's purpose
- add byte-precise helper IPFSAddDeterministicBytes for threshold tests
- remove ~200 lines of duplicated test code from add_test.go
- keep non-profile tests (pinning, symlinks, hidden files) in add_test.go
…s-2025

# Conflicts:
#	docs/examples/kubo-as-a-library/go.mod
#	docs/examples/kubo-as-a-library/go.sum
#	go.mod
#	go.sum
#	test/dependencies/go.mod
#	test/dependencies/go.sum
@lidel lidel force-pushed the feat/ipip-499-unixfs-2025 branch from 3e4059b to 800cba9 Compare January 28, 2026 00:17
@lidel lidel self-assigned this Jan 30, 2026
lidel commented Jan 31, 2026

Triage note:

  • found an MFS bug during final review while writing missing tests: Import.* settings are not correctly respected by ipfs files (we had no tests for this)
  • will fix next week before starting 0.40 RC1

problem: `ipfs files write` in CIDv1 directories wrapped single-block
files in dag-pb even when raw-leaves was enabled, producing different
CIDs than `ipfs add --raw-leaves` for the same content.

fix: boxo now collapses single-block ProtoNode wrappers (with no
metadata) to RawNode in DagModifier.GetNode(). files with mtime/mode
stay as dag-pb since raw blocks cannot store UnixFS metadata.

also fixes sparse file writes where writing past EOF would lose data
because expandSparse didn't update the internal node pointer.

updates boxo to v0.36.1-0.20260203003133-7884ae23aaff
updates t0250-files-api.sh test hashes to match new behavior
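
The single-block collapse described in the fix above can be modeled with a small sketch. The `block` type and `collapseSingleBlock` are hypothetical and only capture the rule stated in the commit message; how boxo's `DagModifier.GetNode()` actually detects a wrapper is not shown in this PR, so the "one link to a raw child" shape is an assumption.

```go
package main

import "fmt"

// block is a hypothetical model of a file root or leaf.
type block struct {
	Codec       string   // "dag-pb" or "raw"
	Links       []*block // child blocks
	HasMetadata bool     // true when UnixFS mtime or mode is set
}

// collapseSingleBlock returns the raw child when n is a dag-pb wrapper
// around exactly one raw block and carries no metadata. Wrappers with
// mtime/mode are kept, since raw blocks cannot store UnixFS metadata.
func collapseSingleBlock(n *block) *block {
	if n.Codec == "dag-pb" && !n.HasMetadata &&
		len(n.Links) == 1 && n.Links[0].Codec == "raw" {
		return n.Links[0]
	}
	return n
}

func main() {
	leaf := &block{Codec: "raw"}
	plain := &block{Codec: "dag-pb", Links: []*block{leaf}}
	withMtime := &block{Codec: "dag-pb", Links: []*block{leaf}, HasMetadata: true}

	fmt.Println(collapseSingleBlock(plain).Codec)     // raw
	fmt.Println(collapseSingleBlock(withMtime).Codec) // dag-pb
}
```

Under this model, a single-block file written through MFS resolves to the same raw root that `ipfs add --raw-leaves` would produce for the same content.
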
@lidel lidel force-pushed the feat/ipip-499-unixfs-2025 branch from 340b0ad to c2d414f Compare February 3, 2026 00:44
@lidel lidel marked this pull request as ready for review February 3, 2026 15:32
lidel added 4 commits February 3, 2026 18:33
- fix typo in files write help text
- update boxo with CI fixes (gofumpt, race condition in test)
…s-2025

# Conflicts:
#	docs/examples/kubo-as-a-library/go.mod
#	docs/examples/kubo-as-a-library/go.sum
#	go.mod
#	go.sum
#	test/dependencies/go.mod
#	test/dependencies/go.sum
includes binary content types fix: gzip, zip, vnd.ipld.car, vnd.ipld.raw,
vnd.ipfs.ipns-record
includes refactor of maxLinks check in addLinkChild (review feedback).
@lidel lidel force-pushed the feat/ipip-499-unixfs-2025 branch from af565e3 to eca0b5d Compare February 3, 2026 19:21
skip '@helia/mfs - should have the same CID after creating a file' test
until helia implements IPIP-499 (tracking: ipfs/helia#941)

the test fails because kubo now collapses single-block files to raw CIDs
while helia explicitly uses reduceSingleLeafToSelf: false

changes:
- run aegir directly instead of helia-interop binary (binary ignores --grep flags)
- cache node_modules keyed by @helia/interop version from npm registry
- skip npm install on cache hit (matches ipfs-webui caching pattern)
@lidel lidel force-pushed the feat/ipip-499-unixfs-2025 branch from eca0b5d to a018d14 Compare February 3, 2026 20:11
- name: Install @helia/interop
if: steps.helia-cache.outputs.cache-hit != 'true'
run: npm install @helia/interop
# TODO(IPIP-499): Remove --grep --invert workaround once helia implements IPIP-499
@lidel lidel Feb 3, 2026

ℹ️ @achingbrain FYI, I'm skipping that one test for now. Although I wrote that this is blocked until IPIP-499 lands in Helia, it seems Helia already bent over backwards to simulate the buggy behavior from Kubo (reduceSingleLeafToSelf: false). So maybe once 0.40.0-rc1 is tagged, Helia could switch to the updated Kubo, flip the flag, and things will pass again (without waiting for IPIP-499)?

lidel commented Feb 3, 2026

@gammazero small changes since your last review:

All CI checks passing. If no concerns I will merge tomorrow to unblock RC1.

lidel added 3 commits February 4, 2026 05:52
…s-2025

# Conflicts:
#	docs/examples/kubo-as-a-library/go.mod
#	docs/examples/kubo-as-a-library/go.sum
#	go.mod
#	go.sum
#	test/dependencies/go.mod
#	test/dependencies/go.sum
includes latest upstream changes from boxo main
@lidel lidel force-pushed the feat/ipip-499-unixfs-2025 branch from 92e51ed to e69b33f Compare February 4, 2026 05:43
@gammazero gammazero left a comment

All additional changes look good.

switches to boxo@main after merging ipfs/boxo#1088
switches to go-ipfs-cmds@master after merging ipfs/go-ipfs-cmds#315
@lidel lidel force-pushed the feat/ipip-499-unixfs-2025 branch from 7055ac1 to 0284f7b Compare February 4, 2026 20:48
lidel commented Feb 4, 2026

Thanks! Switched to boxo and cmds from their respective master branches.

Moving forward, we will test and gather feedback during 0.40 RC1.

Development

Successfully merging this pull request may close these issues.

Implement modern CID profile from IPIP-499

2 participants