state-prune snapshot resume can drop nodes #3943

@jolestar

Summary

The resume path of rooch db state-prune snapshot can drop child nodes when a run is interrupted, so the final integrity check fails with a missing child node error.

Impact

Snapshots built through a resumed run may be unusable: the integrity check fails even when the source DB is healthy.

Root causes

  • Progress is persisted only every 5 minutes, so children enqueued since the last save are lost if the process dies before the next one.
  • Resume trusts snapshot_progress.json for the worklist and nodes_written without reconciling either against the contents of snapshot.db.
  • Restoring nodes_written from the file masks missing nodes: a crash after children are pushed to the worklist but before the write completes can leave a parent present and its child absent.

Repro (high level)

  1. Run rooch db state-prune snapshot (resume is enabled by default).
  2. Interrupt the run between progress saves (e.g., kill the process after some batches).
  3. Resume. The run completes, but the final integrity check reports a missing child node.

Proposed fix (MVP)

  1. On resume, recompute nodes_written from snapshot.db (the actual count), prefer the DB value over the progress file, and warn on divergence.
  2. Make the frontier durable: persist worklist/batch_buffer much more frequently (every few seconds), or log it transactionally before batch writes.
  3. Safe resume: optionally rebuild the worklist by scanning snapshot.db from the root (enqueueing parents with missing children), or force a restart when progress is stale.
  4. Progress hygiene: if the progress file is older or shorter than the DB, delete or ignore it to avoid resuming from a partial frontier.
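
Items 1 and 3 could look roughly like the sketch below. It is not the rooch implementation: Hash, SnapshotDb, Progress, Reconciled and reconcile are hypothetical names, snapshot.db is modeled as an in-memory map from a node hash to its child hashes, and it assumes a node's children can be read from the stored node itself. The walk starts at the root, enqueues every present node that has at least one missing child, and takes nodes_written from the DB instead of the progress file.

```rust
use std::collections::{HashMap, HashSet};

/// Node hash; a real hash would be 32 bytes, u64 keeps the sketch short.
type Hash = u64;

/// Stand-in for snapshot.db: node hash -> hashes of its children.
type SnapshotDb = HashMap<Hash, Vec<Hash>>;

/// Hypothetical shape of the state restored from snapshot_progress.json.
struct Progress {
    nodes_written: u64,
    worklist: Vec<Hash>,
}

/// Resume state derived from the DB rather than from the progress file.
#[derive(Debug, PartialEq)]
struct Reconciled {
    nodes_written: u64,
    worklist: Vec<Hash>,
    diverged: bool,
}

fn reconcile(root: Hash, db: &SnapshotDb, progress: &Progress) -> Reconciled {
    let mut worklist = Vec::new();
    let mut seen = HashSet::new();
    let mut stack = vec![root];

    while let Some(hash) = stack.pop() {
        if !seen.insert(hash) {
            continue; // already visited
        }
        let Some(children) = db.get(&hash) else {
            // Only the root can reach this branch: children are pushed
            // onto the stack only when they are present in the DB.
            worklist.push(hash);
            continue;
        };
        let mut incomplete = false;
        for &child in children {
            if db.contains_key(&child) {
                stack.push(child);
            } else {
                incomplete = true;
            }
        }
        if incomplete {
            // Parent is present but at least one child is missing.
            worklist.push(hash);
        }
    }
    worklist.sort_unstable();

    // Prefer the DB count over the value stored in the progress file.
    let actual = db.len() as u64;
    let mut saved = progress.worklist.clone();
    saved.sort_unstable();
    let diverged = actual != progress.nodes_written || saved != worklist;
    if diverged {
        eprintln!(
            "progress file diverges from snapshot.db: file says {} nodes, DB has {}",
            progress.nodes_written, actual
        );
    }
    Reconciled { nodes_written: actual, worklist, diverged }
}
```

A full scan costs time proportional to the size of the partial snapshot, which is one reason the issue lists it as optional next to a forced restart.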

Acceptance

  • Kill-and-resume cycles no longer produce missing-child errors.
  • The integrity check passes after resumed runs, and the logged node count matches the actual count in RocksDB.
  • --no-resume behavior is unchanged.
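
The first criterion could be exercised against a toy model before touching the real tool. The sketch below is only an assumption about how such a test might be shaped: Durable, run_until_killed and lost_children are hypothetical names, and the written set and frontier live in one in-memory struct that stands in for state committed in a single atomic write. A "kill" is modeled by returning after a fixed number of nodes, and the invariant checked after every kill is that each child of a written parent is either written or still queued.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

type NodeId = u64;

/// Stand-in for the source DB: node -> children.
type Source = HashMap<NodeId, Vec<NodeId>>;

/// Stand-in for durable snapshot state. Written nodes and the frontier
/// are updated together, modeling one atomic commit per node.
#[derive(Debug)]
struct Durable {
    written: HashSet<NodeId>,
    frontier: VecDeque<NodeId>,
}

impl Durable {
    fn new(root: NodeId) -> Self {
        Durable {
            written: HashSet::new(),
            frontier: VecDeque::from([root]),
        }
    }
}

/// Copies at most `budget` nodes, then returns as if the process had
/// been killed. Returns true once the frontier is empty.
fn run_until_killed(source: &Source, state: &mut Durable, budget: usize) -> bool {
    for _ in 0..budget {
        let Some(id) = state.frontier.pop_front() else { break };
        let children = source.get(&id).cloned().unwrap_or_default();
        // "Commit": the node write and the frontier update land together.
        state.written.insert(id);
        state.frontier.extend(children);
    }
    state.frontier.is_empty()
}

/// (parent, child) pairs where the parent is written but the child is
/// neither written nor queued, i.e. the child would never be copied.
fn lost_children(source: &Source, state: &Durable) -> Vec<(NodeId, NodeId)> {
    let mut lost = Vec::new();
    for &parent in &state.written {
        for &child in source.get(&parent).into_iter().flatten() {
            if !state.written.contains(&child) && !state.frontier.contains(&child) {
                lost.push((parent, child));
            }
        }
    }
    lost
}
```

A test would call run_until_killed in a loop with a small budget, assert that lost_children is empty after every simulated kill, and finally compare the size of the written set with the number of nodes in the source tree.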
