CometBFT DB Analyzer (CDA)

A CLI tool to analyze and extract data from CometBFT databases including state.db, tx_index.db, and more for inspection and debugging purposes.

Features

  • Multi-database support: Extract data from various CometBFT databases
    • state.db: Blockchain state, validators, consensus params, ABCI responses
    • tx_index.db: Transaction index data with height and event indexes
  • Streaming output: Memory-efficient streaming JSON output for large databases
  • Skip-full-dump mode: Generate summary-only analysis to save memory
  • Decode and analyze: Automatically decode protobuf data structures
  • Comprehensive statistics: Generate detailed reports about database contents
  • JSON export: Export data to human-readable JSON format
  • Size analysis: Identify largest keys and size distribution by type
  • Progress reporting: Real-time progress updates every 10,000 keys
  • Modular CLI: Easy to extend with new database extraction commands

Installation

go install github.com/altuslabsxyz/cda@latest

Or clone and build:

git clone https://github.com/altuslabsxyz/cda.git
cd cda
go build

Usage

cda <command> [flags]

Available Commands

  • extract state-db: Extract and analyze data from state.db
  • extract tx-index-db: Extract and analyze data from tx_index.db
  • help: Display help information
  • version: Display version information

Common Flags

  • -d, --db: Path to the database directory (required)
  • -o, --output: Output directory for JSON files (default varies by command)
  • --skip-full-dump: Skip full data dump to save memory (only generate summary)
  • -h, --help: Display help information

Examples

# Display help
cda --help
cda extract --help
cda extract state-db --help
cda extract tx-index-db --help

# Extract state.db from a CometBFT node
cda extract state-db --db ~/.cometbft/data

# Extract with custom output directory
cda extract state-db --db ~/.cometbft/data --output ./analysis_results

# Extract tx_index.db
cda extract tx-index-db --db ~/.cometbft/data

# Extract tx_index.db with custom output
cda extract tx-index-db --db ~/.cometbft/data --output ./tx_analysis

# Quick analysis - summary only (recommended for large production databases)
# Only generates summary.json (typically under 200MB) instead of full dumps (100GB+)
cda extract state-db --db ~/.cometbft/data --skip-full-dump
cda extract tx-index-db --db ~/.cometbft/data --skip-full-dump

# Analyze from a custom chain
cda extract state-db --db ~/.mychain/data --output ./mychain_analysis

Output Files

State.db Extraction

The extract state-db command generates the following files in the output directory:

| File | Description | Typical Size | Notes |
| --- | --- | --- | --- |
| summary.json | Overall statistics and key type distribution | ~10-20MB | Always generated, includes metadata |
| stateKey_dump.json | Current blockchain state (block height, validators, consensus params, etc.) | ~5KB | Single entry with current state |
| validatorsKey_dump.json | Validator set history at each height | ~20-50MB | Size depends on validator count and chain history |
| consensusParamsKey_dump.json | Consensus parameters history | ~50-150MB | Records parameter changes over time |
| abciResponsesKey_dump.json | ABCI responses for each block | ~100-200GB+ | ⚠️ Can be very large! Contains execution results for all blocks |
| lastABCIResponseKey_dump.json | Most recent ABCI response | ~4MB | Latest block execution result |
| offlineStateSyncHeightKey_dump.json | State sync height (if applicable) | ~100B | Only present if state sync was used |
| unknown_dump.json | Any unrecognized keys (e.g., genesisDoc) | ~30KB | Rarely populated |

⚠️ Storage Warning: The abciResponsesKey_dump.json file can be extremely large (100GB+) for chains with long history. Consider using --skip-full-dump to generate only the summary if disk space is limited.

Tx_index.db Extraction

The extract tx-index-db command generates the following files in the output directory:

| File | Description | Typical Size | Notes |
| --- | --- | --- | --- |
| summary.json | Overall statistics, key type distribution, height range, and transaction count | ~100-200KB | Always generated, includes height range and tx count |
| tx_hash_dump.json | Transaction results with execution details, events, gas usage | ~100-200GB+ | ⚠️ Can be very large! Contains full transaction data |
| height_index_dump.json | Height-based transaction indexes | ~50-150MB | Maps heights to transaction hashes |
| event_index_dump.json | Event-based transaction indexes for querying | ~5-10GB | Enables event-based transaction lookup |

⚠️ Storage Warning: The tx_hash_dump.json file can be extremely large (100GB+) depending on transaction volume and chain history. Use --skip-full-dump if you only need summary statistics.

Console Output

The tool displays a summary on the console including:

  • Total number of keys
  • Key type distribution with sizes
  • Top 10 largest keys
  • Key breakdown by type

Example output:

=== State.db Analysis ===
Database path: /Users/user/.aultd/data
Output directory: ./analysis_results

=== SUMMARY ===

Total Keys: 9061

Key Type Distribution:
  abciResponsesKey         :   3018 keys | Total:   10556929 bytes | Avg:   3498.0 bytes
  validatorsKey            :   3020 keys | Total:       6166 bytes | Avg:      2.0 bytes
  consensusParamsKey       :   3019 keys | Total:     147804 bytes | Avg:     49.0 bytes
  stateKey                 :      1 keys | Total:        619 bytes | Avg:    619.0 bytes
  ...
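
The per-type rows above (key count, total bytes, average bytes) can be produced with a small accumulator per key type. The following is a minimal sketch, not the tool's actual code: the `typeStats` type, the `formatLine` helper, and the sample observations are all hypothetical names and values chosen for illustration, and the rows are sorted by name only to keep the output deterministic.

```go
package main

import (
	"fmt"
	"sort"
)

// typeStats accumulates the key count and total value bytes for one key type.
// (Hypothetical type, not taken from the cda source.)
type typeStats struct {
	Count      int
	TotalBytes int
}

// add records one key whose value is size bytes long.
func (s *typeStats) add(size int) {
	s.Count++
	s.TotalBytes += size
}

// avg returns the mean value size, or 0 for an empty bucket.
func (s typeStats) avg() float64 {
	if s.Count == 0 {
		return 0
	}
	return float64(s.TotalBytes) / float64(s.Count)
}

// formatLine renders one row in the style of the console output.
func formatLine(keyType string, s typeStats) string {
	return fmt.Sprintf("  %-25s: %6d keys | Total: %10d bytes | Avg: %8.1f bytes",
		keyType, s.Count, s.TotalBytes, s.avg())
}

func main() {
	// Made-up (key type, value size) observations standing in for a database scan.
	observed := []struct {
		keyType string
		size    int
	}{
		{"stateKey", 619},
		{"validatorsKey", 2},
		{"validatorsKey", 2},
		{"consensusParamsKey", 49},
	}

	stats := map[string]*typeStats{}
	for _, o := range observed {
		if stats[o.keyType] == nil {
			stats[o.keyType] = &typeStats{}
		}
		stats[o.keyType].add(o.size)
	}

	// Sort by name so the report order does not depend on map iteration.
	names := make([]string, 0, len(stats))
	for name := range stats {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		fmt.Println(formatLine(name, *stats[name]))
	}
}
```

Because only two integers are kept per key type, this kind of accumulator stays small no matter how many keys are scanned.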

Key Types

The analyzer recognizes the following key types from CometBFT's state store:

| Key Type | Description | Storage Pattern |
| --- | --- | --- |
| stateKey | Current blockchain state | Single entry |
| validatorsKey:{height} | Validator set at specific height | Per height/checkpoint |
| consensusParamsKey:{height} | Consensus parameters at height | Per height when changed |
| abciResponsesKey:{height} | ABCI responses for block | Per block (if not pruned) |
| lastABCIResponseKey | Most recent ABCI response | Single entry |
| offlineStateSyncHeightKey | State sync height | Single entry |
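
To illustrate how the key patterns listed here can be told apart, the sketch below splits a raw key into its type and optional height. It assumes keys are plain strings of the form `name` or `name:{height}`, as the patterns suggest. The `classifyKey` function and the `"unknown"` label are illustrative names, not the analyzer's real API.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// knownKeyTypes lists the key names the analyzer is documented to recognize.
var knownKeyTypes = map[string]bool{
	"stateKey":                  true,
	"validatorsKey":             true,
	"consensusParamsKey":        true,
	"abciResponsesKey":          true,
	"lastABCIResponseKey":       true,
	"offlineStateSyncHeightKey": true,
}

// classifyKey splits a raw key such as "validatorsKey:42" into its type and
// height. hasHeight is false for keys without a ":{height}" suffix, and
// anything unrecognized is reported as "unknown".
// (Hypothetical helper for illustration only.)
func classifyKey(key string) (keyType string, height int64, hasHeight bool) {
	name, suffix, found := strings.Cut(key, ":")
	if !knownKeyTypes[name] {
		return "unknown", 0, false
	}
	if !found {
		return name, 0, false
	}
	h, err := strconv.ParseInt(suffix, 10, 64)
	if err != nil {
		// A known prefix with a non-numeric suffix does not match any pattern.
		return "unknown", 0, false
	}
	return name, h, true
}

func main() {
	for _, k := range []string{"stateKey", "validatorsKey:42", "genesisDoc"} {
		keyType, height, hasHeight := classifyKey(k)
		fmt.Println(k, keyType, height, hasHeight)
	}
}
```

Keys that fall through to `"unknown"` correspond to what the state.db extraction writes to unknown_dump.json.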

Supported Versions

  • CometBFT v0.38.x
  • Database backend: GoLevelDB

Use Cases

  • Debug state database issues
  • Analyze database size and growth patterns
  • Inspect historical validator sets
  • Review consensus parameter changes
  • Extract block execution results
  • Identify data pruning candidates

Project Structure

cda/
├── main.go                      # Entry point
├── cmd/                         # CLI commands
│   ├── root.go                 # Root command
│   ├── extract.go              # Extract parent command
│   ├── extract_state_db.go     # State.db extraction command
│   └── extract_tx_index.go     # Tx_index.db extraction command
└── analyzer/                    # Analysis logic
    ├── types.go                # Common types and StreamWriter
    ├── state_db.go             # State.db analysis
    ├── tx_index.go             # Tx_index.db analysis
    └── output.go               # Output utilities

Adding New Database Extractors

To add support for a new database:

  1. Create analysis logic in analyzer/ with decode functions for the database key types
  2. Create a new command in cmd/ (e.g., extract_blockstore.go)
  3. Register the command in that file's init() function so it is picked up automatically at startup
  4. Update the README with the new command usage
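
The registration step relies on Go running every `init()` function in a package before `main()`. The sketch below shows that mechanism with only the standard library. The `registry` map, the `register` helper, the `extractor` signature, and the `blockstore` command name are all hypothetical. The real project may wire commands through a CLI framework, in which case the existing files in cmd/ are the pattern to follow.

```go
package main

import (
	"fmt"
	"sort"
)

// extractor is a hypothetical signature for a database extraction command.
type extractor func(dbPath, outDir string) error

// registry maps subcommand names to handlers. It is filled by init() functions.
var registry = map[string]extractor{}

// register adds a handler under name and panics on duplicates so that
// conflicting commands are caught at startup.
func register(name string, fn extractor) {
	if _, dup := registry[name]; dup {
		panic("duplicate extractor: " + name)
	}
	registry[name] = fn
}

// In a real layout this init() would live in its own file, for example
// cmd/extract_blockstore.go. Go runs it automatically before main().
func init() {
	register("blockstore", func(dbPath, outDir string) error {
		fmt.Printf("would analyze %s and write results to %s\n", dbPath, outDir)
		return nil
	})
}

// registeredNames returns the registered command names in sorted order.
func registeredNames() []string {
	names := make([]string, 0, len(registry))
	for name := range registry {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

func main() {
	for _, name := range registeredNames() {
		fmt.Println("extract", name)
	}
}
```

With this pattern, adding a new extractor means adding one file. No central list of commands needs to be edited.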

Example extractors already implemented:

  • analyzer/state_db.go - State database analysis
  • analyzer/tx_index.go - Transaction index database analysis

Notes

General

  • The tool opens the database in read-only mode
  • Make sure the CometBFT node is not running when analyzing the database
  • Uses streaming output, so full dumps are written to disk as keys are read rather than held in memory
  • Progress is reported every 10,000 keys during extraction

Performance & Storage Considerations

  • Large databases may produce very large output files (100GB+)
  • Full extraction of production chains can take significant time (30+ minutes) and disk space
  • Consider using --skip-full-dump in these scenarios:
    • Limited disk space (outputs only summary.json, typically <200MB)
    • Quick database health checks
    • Key distribution analysis without full data export
    • Memory-constrained environments
  • Summary files alone provide comprehensive statistics including:
    • Total key counts by type
    • Size distribution and largest keys
    • Height ranges and transaction counts (for tx_index.db)
    • Database health metrics

License

MIT License

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
