diff --git a/README.md b/README.md
index 67c75e4..a90b2cf 100644
--- a/README.md
+++ b/README.md
@@ -1,16 +1,23 @@
-# pdp-explorer
+# PDP Explorer
+
Data model and UI for exploring the PDP hot storage network

-# Mockup
+## Documentation
+
+Detailed documentation is available in the following file:
+
+- [Documentation](docs/README.md) - System architecture, database schema, and development guide
+
+## Mockup

This is a first draft of what the PDP Explorer will look like

![pdpexplorer](https://github.com/user-attachments/assets/e0595422-fa77-490b-ab57-0c9516ea5d8a)

# Usage

-A few user journeys:
-As a user storing data with PDP I can use the explorer to:
-* Check if my SP has had any faults. And I can check which data in particular was faulted
-* Validate that all of the data added to my proofset is data that I asked to store, not anything else
-* Look at fault rate of SPs in the network when deciding who to store my data with
-* Learn about data that has been removed from my proofset
-
+A few user journeys. As a user storing data with PDP, I can use the explorer to:
+
+- Check whether my SP has had any faults, and which data in particular was faulted
+- Validate that all of the data added to my proof set is data that I asked to store, nothing else
+- Look at the fault rates of SPs in the network when deciding who to store my data with
+- Learn about data that has been removed from my proof set

diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md
new file mode 100644
index 0000000..c81d0b1
--- /dev/null
+++ b/docs/ARCHITECTURE.md
@@ -0,0 +1,172 @@
# PDP Explorer Backend Architecture

## Table of Contents

- [Overview](#overview)
- [System Components](#system-components)
- [Database Schema](#database-schema)
- [Architecture Diagram](#architecture-diagram)
- [Integration Points](#integration-points)
- [Data Flow](#data-flow)
- [Configuration](#configuration)
- [Security Considerations](#security-considerations)
- [Maintenance and Operations](#maintenance-and-operations)
- [Future Considerations](#future-considerations)

## Overview

The PDP Explorer backend is composed of two main components:

1. **Indexer Service**: Responsible for processing blockchain events and maintaining the database state
2. **API Server**: Provides REST endpoints for the frontend to query indexed data

## System Components

### 1. Indexer Service

The indexer is responsible for:

- Processing blockchain events in real time
- Maintaining database consistency during chain reorganizations
- Managing historical data for providers and proof sets

#### Key Components:

- **Block Processor**: Handles block-by-block processing of chain events
- **Event Handlers**: Process specific event types (ProofSetCreated, RootsAdded, etc.)

### 2. API Server

Provides RESTful endpoints for the frontend application.
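All list endpoints share a common response shape: a `data` array plus a pagination `metadata` object. A minimal sketch of such a handler, assuming the Echo framework referenced in the development guide and a hypothetical `ProviderStore` repository interface (the server's real data access layer may differ):

```go
package api

import (
	"context"
	"net/http"
	"strconv"

	"github.com/labstack/echo/v4"
)

// Provider is a trimmed-down stand-in for the full provider model.
type Provider struct {
	ProviderID    string `json:"providerId"`
	TotalDataSize string `json:"totalDataSize"`
}

// ProviderStore is a hypothetical repository interface.
type ProviderStore interface {
	ListProviders(ctx context.Context, offset, limit int) ([]Provider, int, error)
}

// listProviders sketches GET /providers: parse offset/limit, query the
// store, and wrap the rows in the {data, metadata} envelope.
func listProviders(store ProviderStore) echo.HandlerFunc {
	return func(c echo.Context) error {
		offset, _ := strconv.Atoi(c.QueryParam("offset"))
		limit, err := strconv.Atoi(c.QueryParam("limit"))
		if err != nil || limit <= 0 {
			limit = 10 // default page size used across the API
		}
		providers, total, err := store.ListProviders(c.Request().Context(), offset, limit)
		if err != nil {
			return echo.NewHTTPError(http.StatusInternalServerError, err.Error())
		}
		return c.JSON(http.StatusOK, map[string]any{
			"data": providers,
			"metadata": map[string]int{
				"total":  total,
				"offset": offset,
				"limit":  limit,
			},
		})
	}
}
```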
#### Available Endpoints:

- `GET /providers`: List all providers with pagination
- `GET /providers/:providerId`: Get detailed provider information
- `GET /providers/:providerId/proofsets`: List all proof sets for a provider with sorting and pagination
- `GET /providers/:providerId/activities`: Get activity statistics for a provider
- `GET /proofsets`: List all proof sets with sorting and pagination
- `GET /proofsets/:proofSetId`: Get detailed proof set information
- `GET /proofsets/:proofSetId/event-logs`: Get proof set event logs
- `GET /proofsets/:proofSetId/txs`: Get proof set transactions
- `GET /proofsets/:proofSetId/roots`: Get proof set roots
- `GET /network-metrics`: Get network-wide metrics
- `GET /search`: Search for proof sets and providers

## Database Schema

The system uses PostgreSQL with the following key tables:

### Core Tables

1. **blocks**

   - Tracks processed blocks and their finalization status
   - Used for handling chain reorganizations

2. **providers**

   - Stores provider information with version control
   - Tracks historical changes to provider data

3. **proof_sets**

   - Maintains proof set metadata and status
   - Includes version control for ownership changes

4. **roots**

   - Stores root data associated with proof sets
   - Maintains historical versions for chain reorgs

5. **transactions**

   - Records all relevant blockchain transactions
   - Links to related entities (proof sets, providers)

6. **event_logs**

   - Stores blockchain events and their metadata
   - Used for real-time processing and indexing

7. **proofs**

   - Stores individual proof submissions
   - Used for analytics and provider activity tracking

8. **proof_fees**

   - Records fee information for proof submissions
   - Used for economic analysis

9. **fault_records**
   - Tracks provider fault events
   - Used for provider reliability metrics

## Architecture Diagram

![System Architecture Diagram](./assets/pdp-arch.png)

## Integration Points

### Blockchain Integration

- Connects to a Filecoin Eth RPC endpoint for blockchain data
- Handles chain reorganizations gracefully
- Provides a recovery process for historical data

### Frontend Integration

- RESTful API with JSON responses
- Pagination support for large datasets
- Sorting and filtering capabilities
- Real-time data consistency

## Data Flow

1. **Event Processing Flow**

   ```
   Blockchain Event → Filecoin Eth RPC Endpoints → Indexer → Processor → Handlers → Database
   ```

2. **Query Flow**

   ```
   Frontend Request → API Server → Database → JSON Response
   ```
3. **Chain Reorganization Flow**

   ```
   Reorg Detection → Block Reversal → Historical Data Cleanup → State Recovery
   ```

## Configuration

The system is configured through environment variables:

- Database connection settings
- Filecoin Eth RPC endpoint configuration
- Other runtime settings

## Security Considerations

- Database connections use SSL/TLS
- API endpoints follow REST best practices
- Environment variables for sensitive configuration
- No direct blockchain write access from the API server

## Maintenance and Operations

### Database Maintenance

- Regular cleanup of historical data
- Index optimization for query performance
- Monitoring of database size and growth

## Future Considerations

- **Feature Additions**
  - Event processing pipeline optimization
  - Use the Filecoin method group instead of the Eth RPC methods
  - Advanced analytics endpoints
  - Historical data API

diff --git a/docs/DEVELOPMENT.md b/docs/DEVELOPMENT.md
new file mode 100644
index 0000000..45e4f12
--- /dev/null
+++ b/docs/DEVELOPMENT.md
@@ -0,0 +1,135 @@
# Development Guide

## Table of Contents

- [Overview](#overview)
- [Development Environment Setup](#development-environment-setup)
- [Development Workflow](#development-workflow)
- [Code Structure](#code-structure)
- [Additional Resources](#additional-resources)

## Overview

This document provides guidelines and instructions for developers working on the PDP Explorer backend. It covers setup procedures, development workflows, and best practices to ensure consistent and high-quality contributions.

## Development Environment Setup

### Prerequisites

- Go 1.19 or higher
- PostgreSQL 14 or higher
- Docker and Docker Compose (optional, for containerized development)
- Make

### Initial Setup

1. **Clone the repository**:

   ```bash
   git clone https://github.com/FilOzone/pdp-explorer.git
   cd pdp-explorer
   ```

2. **Set up environment variables**:
   Create `.env` files with the following variables at these locations:

   - `backend/indexer/.env`

     ```
     # Database
     DATABASE_URL=postgresql://localhost:5432/pdp

     # RPC provider
     LOTUS_API_ENDPOINT=https://api.calibration.node.glif.io/rpc/v0
     LOTUS_API_KEY= # Your API key (from https://api.node.glif.io)

     # Trigger config
     TRIGGERS_CONFIG=./config/pdp.yaml
     START_BLOCK= # Start block number
     ```

   - `backend/server/.env`

     ```
     # Database
     DATABASE_URL=postgresql://localhost:5432/pdp

     # Server Port
     PORT=3000
     ```

   - `client/.env`

     ```
     VITE_SERVER_URL=http://localhost:3000
     VITE_NETWORK=calibration
     ```

3. **Initialize the database**:
   ```bash
   cd backend/indexer
   make migrate-up
   ```

## Development Workflow

### Running the Indexer

```bash
# From the root of the repository
cd backend/indexer

# Run the indexer in development mode
make dev
```

### Running the API Server

```bash
# From the root of the repository
cd backend/server

# Run the API server in development mode
make dev
```

### Running the Frontend

```bash
# From the root of the repository
cd client

# Run the frontend in development mode
npm run dev
```

## Code Structure

### Indexer Directory Layout

```
backend/indexer/
├── cmd/
│   └── indexer/          # Indexer application entrypoint
├── config/               # Configuration files
├── internal/
│   ├── client/           # RPC client library
│   ├── contract/         # Contract-related code
│   ├── indexer/          # Blockchain indexing logic
│   ├── infrastructure/   # Infrastructure layer
│   │   ├── config/       # Infrastructure configuration
│   │   └── database/     # Database access layer
│   ├── logger/           # Logging package
│   ├── models/           # Data models
│   ├── processor/        # Transaction and event processor
│   │   └── handlers/     # Event and transaction handlers
│   └── types/            # Common types
├── migrations/           # Database migrations
└── scripts/              # Utility SQL scripts
```

## Additional Resources

- [Go Documentation](https://golang.org/doc/)
- [Echo Framework Documentation](https://echo.labstack.com/)
- [PostgreSQL Documentation](https://www.postgresql.org/docs/)
- [Filecoin Documentation](https://docs.filecoin.io/)

diff --git a/docs/INTEGRATION.md b/docs/INTEGRATION.md
new file mode 100644
index 0000000..7fc7f8e
--- /dev/null
+++ b/docs/INTEGRATION.md
@@ -0,0 +1,184 @@
# Integration Architecture

## Overview

The PDP Explorer backend integrates with multiple systems to provide a comprehensive view of PDP data. This document details the key integration points between the indexer, API server, blockchain, and frontend, as well as the data flow between these components.

## Integration Points

### 1. Blockchain Integration

#### Connection Details

The indexer connects to the Filecoin blockchain as follows:

- **Primary Connection**: Filecoin Eth RPC endpoint
- **Protocol**: JSON-RPC over HTTPS
- **Configuration**: Environment variables

```
LOTUS_API_ENDPOINT=https://api.node.glif.io/rpc/v1
LOTUS_API_KEY=your-api-key
```

#### Data Retrieval

The indexer retrieves blockchain data through several API calls:

1. **Chain Height**:

   ```
   eth_blockNumber
   ```

2. **Block Headers**:

   ```
   eth_getBlockByNumber(blockNumber, false)
   ```

3. **Block with Transactions**:

   ```
   eth_getBlockByNumber(blockNumber, true)
   ```

4. **Transaction Receipts**:

   ```
   eth_getTransactionReceipt(transactionHash)
   ```

5. **Message CID**:

   ```
   eth_getMessageCidByTransactionHash(transactionHash)
   ```

#### Reorg Detection

The system detects chain reorganizations by comparing parent hashes:

```
currentBlock.parentHash != storedBlock.hash
```

When a reorg is detected, the system traverses the chain backwards to find the fork point and reprocesses blocks from that point.

Read more about reorg handling in the PDP Explorer [here](./indexer/REORG_HANDLING.md).
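A sketch of that walk-back, assuming hypothetical `BlockReader` (database view) and `ChainClient` (RPC wrapper) interfaces; the indexer's real types and names differ:

```go
package indexer

import (
	"context"
	"errors"
	"fmt"
)

const maxReorgDepth = 1000 // safety bound, see REORG_HANDLING.md

// StoredBlock carries the minimal header fields needed for comparison.
type StoredBlock struct {
	Height     int64
	Hash       string
	ParentHash string
}

// BlockReader is a hypothetical view of the indexer's blocks table.
type BlockReader interface {
	BlockAtHeight(ctx context.Context, height int64) (StoredBlock, error)
}

// ChainClient is a hypothetical wrapper over the Eth RPC endpoint;
// HeaderByNumber corresponds to eth_getBlockByNumber(height, false).
type ChainClient interface {
	HeaderByNumber(ctx context.Context, height int64) (StoredBlock, error)
}

// findForkPoint walks backwards from the mismatching height until the
// on-chain hash matches the stored hash: that height is the fork point,
// and every block above it must be reverted and reprocessed.
func findForkPoint(ctx context.Context, db BlockReader, rpc ChainClient, from int64) (int64, error) {
	for h := from; h > 0 && from-h < maxReorgDepth; h-- {
		onChain, err := rpc.HeaderByNumber(ctx, h)
		if err != nil {
			return 0, fmt.Errorf("fetch header %d: %w", h, err)
		}
		stored, err := db.BlockAtHeight(ctx, h)
		if err != nil {
			return 0, fmt.Errorf("load stored block %d: %w", h, err)
		}
		if onChain.Hash == stored.Hash {
			return h, nil // common ancestor found
		}
	}
	return 0, errors.New("no common ancestor within the maximum reorg depth")
}
```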
### 2. Database Integration

The backend uses PostgreSQL for data persistence:

- **Connection**: Connection pooling for efficient resource utilization
- **Migration**: Version-controlled schema migrations
- **Transactions**: ACID transactions for data consistency during reorgs

### 3. API Server Integration

The API server provides RESTful endpoints for the frontend:

- **Framework**: Gin web framework
- **Routing**: Path-based routing with parameter extraction
- **Response Format**: JSON with a consistent envelope pattern

#### API Routes

Core routes are defined in the OpenAPI specification [here](./server/openapi.yaml).

### 4. Frontend Integration

The API server integrates with the frontend through:

- **CORS**: Cross-Origin Resource Sharing enabled for frontend domains
- **Pagination**: Consistent limit/offset parameters
- **Filtering**: Query parameters for data filtering
- **Sorting**: Support for multiple sort criteria

## Data Flow Architecture

### Complete Data Flow

```
 ┌────────────┐
 │ Blockchain │
 └─────┬──────┘
       │
       ▼
┌──────────┐    ┌───────────┐    ┌─────────┐
│ Indexer  ├───►│ Processor ├───►│ Handler │
└──────────┘    └───────────┘    └────┬────┘
                                      │
                                      ▼
                                ┌──────────┐
                                │ Database │
                                └────┬─────┘
                                     │
                                     ▼
                               ┌────────────┐
                               │ API Server │
                               └─────┬──────┘
                                     │
                                     ▼
                                ┌──────────┐
                                │ Frontend │
                                └──────────┘
```

### Indexer Data Flow

```
┌────────────┐    ┌─────────────┐    ┌─────────────┐
│ Block Data ├───►│ Transaction ├───►│ Transaction │
│            │    │ Handler     │    │ Data        │
└─────┬──────┘    └─────────────┘    └─────────────┘
      │
      │           ┌─────────────┐    ┌─────────────┐
      └──────────►│ Event Log   ├───►│ Event       │
                  │ Handler     │    │ Data        │
                  └─────────────┘    └─────────────┘
```

## API Server Data Flow

```
┌──────────────┐    ┌────────────┐    ┌────────────┐
│ HTTP Request ├───►│ Controller ├───►│ Service    │
└──────────────┘    └────────────┘    └─────┬──────┘
                                            │
                                            ▼
                                      ┌────────────┐
                                      │ Repository │
                                      └─────┬──────┘
                                            │
                                            ▼
                                      ┌────────────┐
                                      │ Database   │
                                      └────────────┘
```

## Future Integration Considerations

1. **GraphQL API**: More flexible querying for complex data relationships

diff --git a/docs/README.md b/docs/README.md
new file mode 100644
index 0000000..6e5da21
--- /dev/null
+++ b/docs/README.md
@@ -0,0 +1,55 @@
# PDP Explorer Documentation

## Overview

This directory contains comprehensive documentation for the PDP Explorer backend system. The documentation is organized by component and covers the architecture, database schema, integration points, and development guidelines.

## Document Structure

### Indexer Documentation

The indexer is responsible for processing blockchain events and maintaining the database state.

- [**Architecture**](ARCHITECTURE.md): Overall system architecture and component design
- [**Database Schema**](db/DATABASE.md): Database schema design, tables, relationships, and optimizations
- [**Processor**](indexer/PROCESSOR.md): Details of the processor component that routes blockchain data to handlers
- [**Reorg Handling**](indexer/REORG_HANDLING.md): Chain reorganization detection and handling
- [**Integration**](INTEGRATION.md): Integration points with the blockchain, database, and frontend
- [**Development Guide**](DEVELOPMENT.md): Setup and workflow for developers

### API Server Documentation

The API server provides REST endpoints for the frontend to query indexed data.
- [**OpenAPI Specification**](server/openapi.yaml): Full API specification in OpenAPI 3.0 format

## Quick Start

For new developers, we recommend reading the documents in the following order:

1. [Architecture Overview](ARCHITECTURE.md) - Start with the high-level architecture
2. [Development Guide](DEVELOPMENT.md) - Set up your development environment
3. [Database Schema](db/DATABASE.md) - Understand the data model
4. [Processor Architecture](indexer/PROCESSOR.md) - Learn how events are processed
5. [API Specification](server/openapi.yaml) - Explore the available API endpoints

## Diagrams

Architectural diagrams are located in the `assets` directory:

- `pdp-arch.png` - Overall system architecture diagram

## Contributing to Documentation

When contributing to the documentation:

1. Maintain consistent formatting and style
2. Update diagrams when the architecture changes
3. Keep code examples up to date with the codebase
4. Add new documents for significant new features

## Related Resources

- [Project Repository](https://github.com/FilOzone/pdp-explorer)
- [Issue Tracker](https://github.com/FilOzone/pdp-explorer/issues)
- [Filecoin Documentation](https://docs.filecoin.io/)

diff --git a/docs/assets/pdp-arch.png b/docs/assets/pdp-arch.png
new file mode 100644
index 0000000..3676368
Binary files /dev/null and b/docs/assets/pdp-arch.png differ

diff --git a/docs/db/DATABASE.md b/docs/db/DATABASE.md
new file mode 100644
index 0000000..ab495cf
--- /dev/null
+++ b/docs/db/DATABASE.md
@@ -0,0 +1,286 @@
# Database Architecture

## Table of Contents

- [Overview](#overview)
- [Schema Design](#schema-design)
- [Performance Improvements](#2-performance-improvements)
- [Database Versioning](#3-database-versioning)
- [Database Management](#database-management)

## Overview

The PDP Explorer uses PostgreSQL as its primary database, with a schema designed for high performance, data integrity, and support for chain reorganizations. This document details the database structure, relationships, and recent optimizations.

## Schema Design

### Core Tables

#### 1. `blocks`

Tracks processed blocks and their finalization status.

| Column       | Type                     | Description                                      |
| ------------ | ------------------------ | ------------------------------------------------ |
| height       | BIGINT                   | Block height (primary key)                       |
| hash         | TEXT                     | Block hash                                       |
| parent_hash  | TEXT                     | Parent block hash                                |
| timestamp    | BIGINT                   | Block timestamp                                  |
| is_processed | BOOLEAN                  | Whether block is processed (default: false)      |
| created_at   | TIMESTAMP WITH TIME ZONE | Creation timestamp (default: CURRENT_TIMESTAMP)  |

**Indices:**

- Index on `height`
- Index on `is_processed`

#### 2. `providers`

Stores provider information with version control.
+ +| Column | Type | Description | +| --------------------- | ------------------------ | --------------------------------------------------- | +| id | BIGSERIAL | Internal ID (primary key) | +| address | TEXT | Provider address (normalized) | +| total_faulted_periods | BIGINT | Total number of faulted periods (default: 0) | +| total_data_size | TEXT | Total size of data | +| proof_set_ids | BIGINT[] | Array of proof set IDs (default: '{}') | +| block_number | BIGINT | Block where this record was created/updated | +| block_hash | TEXT | Hash of the block | +| created_at | TIMESTAMP WITH TIME ZONE | Creation timestamp (default: CURRENT_TIMESTAMP) | +| updated_at | TIMESTAMP WITH TIME ZONE | Last update timestamp (default: CURRENT_TIMESTAMP) | + +**Indices:** + +- `idx_providers_address`: Index on `address` +- `idx_providers_block_number`: Index on `block_number` +- Unique constraint on `(address, block_number)` + +#### 3. `proof_sets` + +Maintains proof set metadata and status. + +| Column | Type | Description | +| --------------------- | ------------------------ | --------------------------------------------------- | +| id | BIGSERIAL | Internal ID (primary key) | +| set_id | BIGINT | On-chain proof set ID | +| owner | TEXT | Current owner address | +| listener_addr | TEXT | Listener address | +| total_faulted_periods | BIGINT | Total number of faulted periods (default: 0) | +| total_data_size | TEXT | Total size of data | +| total_roots | BIGINT | Total number of roots (default: 0) | +| total_proved_roots | BIGINT | Total number of proved roots (default: 0) | +| total_fee_paid | TEXT | Total fee paid | +| last_proven_epoch | BIGINT | Last proven epoch (default: 0) | +| next_challenge_epoch | BIGINT | Next challenge epoch (default: 0) | +| is_active | BOOLEAN | Whether proof set is active (default: true) | +| block_number | BIGINT | Block where this record was created/updated | +| block_hash | TEXT | Hash of the block | +| created_at | TIMESTAMP WITH TIME ZONE | Creation timestamp (default: CURRENT_TIMESTAMP) | +| updated_at | TIMESTAMP WITH TIME ZONE | Last update timestamp (default: CURRENT_TIMESTAMP) | + +**Indices:** + +- `idx_proof_sets_set_id`: Index on `set_id` +- `idx_proof_sets_set_owner`: Index on `owner` +- `idx_proof_sets_block_number`: Index on `block_number` +- Unique constraint on `(set_id, block_number)` + +#### 4. `roots` + +Stores root data associated with proof sets. 
+ +| Column | Type | Description | +| --------------------- | ------------------------ | -------------------------------------------------- | +| id | BIGSERIAL | Internal ID (primary key) | +| root_id | BIGINT | On-chain root ID | +| size | BIGINT | Size of the root data | +| cid | TEXT | Content identifier | +| ipfs_url | TEXT | IPFS URL for this root | +| provider_id | BIGINT | Associated provider ID | +| set_id | BIGINT | Associated proof set ID | +| proved | BOOLEAN | Whether the root has been proved | +| epoch | BIGINT | Epoch number | +| is_fresh | BOOLEAN | Whether the root is fresh | +| block_number | BIGINT | Block where this root was added | +| block_hash | TEXT | Hash of the block | +| created_at | TIMESTAMP WITH TIME ZONE | Creation timestamp (default: CURRENT_TIMESTAMP) | +| updated_at | TIMESTAMP WITH TIME ZONE | Last update timestamp (default: CURRENT_TIMESTAMP) | + +**Indices:** + +- `idx_roots_root_id`: Index on `root_id` +- `idx_roots_set_id`: Index on `set_id` +- `idx_roots_provider_id`: Index on `provider_id` +- `idx_roots_block_number`: Index on `block_number` +- Unique constraint on `(root_id, block_number)` + +#### 5. `transactions` + +Records all relevant blockchain transactions. + +| Column | Type | Description | +| --------------- | ------------------------ | --------------------------------------------------- | +| id | BIGSERIAL | Internal ID (primary key) | +| tx_hash | TEXT | Transaction hash | +| proof_set_id | BIGINT | Associated proof set ID | +| from_address | TEXT | Sender address | +| to_address | TEXT | Recipient address | +| value | TEXT | Transaction value | +| gas_spent | BIGINT | Gas spent in the transaction | +| gas_price | TEXT | Gas price in wei | +| gas_limit | BIGINT | Gas limit | +| status | BOOLEAN | Status of the transaction (success/failure) | +| chain_id | BIGINT | Chain ID | +| nonce | BIGINT | Transaction nonce | +| block_number | BIGINT | Block number where transaction was included | +| block_hash | TEXT | Hash of the block | +| created_at | TIMESTAMP WITH TIME ZONE | Creation timestamp (default: CURRENT_TIMESTAMP) | + +**Indices:** + +- `idx_transactions_tx_hash`: Index on `tx_hash` for quick lookups +- `idx_transactions_proof_set_id`: Index on `proof_set_id` +- `idx_transactions_from_address`: Index on `from_address` +- `idx_transactions_to_address`: Index on `to_address` +- `idx_transactions_block_number`: Index on `block_number` for reorg handling + +#### 6. `event_logs` + +Stores blockchain events and their metadata. 
+ +| Column | Type | Description | +| ---------------- | ------------------------ | ------------------------------------------------- | +| id | BIGSERIAL | Internal ID (primary key) | +| tx_hash | TEXT | Associated transaction hash | +| block_number | BIGINT | Block number where event occurred | +| block_hash | TEXT | Hash of the block | +| log_index | BIGINT | Index of log within the block | +| contract_address | TEXT | Contract address emitting the event | +| event_index | INTEGER | Index of the event | +| event_type | TEXT | Type of event | +| data | JSONB | Non-indexed event data stored as JSONB | +| entity_id | BIGINT | Associated entity ID | +| is_processed | BOOLEAN | Whether the event has been processed | +| is_deletion | BOOLEAN | Whether this is a deletion event for reorgs | +| created_at | TIMESTAMP WITH TIME ZONE | Creation timestamp (default: CURRENT_TIMESTAMP) | + +**Indices:** + +- `idx_event_logs_tx_hash`: Index on `tx_hash` +- `idx_event_logs_block_number`: Index on `block_number` for reorg handling +- `idx_event_logs_contract_address`: Index on `contract_address` +- `idx_event_logs_event_type`: Index on `event_type` +- `idx_event_logs_entity_id`: Index on `entity_id` +- Index on `(tx_hash, log_index)` with uniqueness constraint + +#### 7. `proofs` + +Stores individual proof submissions. + +| Column | Type | Description | +| ---------------- | ------------------------ | -------------------------------------------------- | +| id | BIGSERIAL | Internal ID (primary key) | +| proof_id | BIGINT | On-chain proof ID | +| offset | BIGINT | Proof offset | +| merkle_proof | TEXT | Merkle proof data | +| root_id | BIGINT | Associated root ID | +| provider_id | BIGINT | Associated provider ID | +| set_id | BIGINT | Associated proof set ID | +| tx_hash | TEXT | Transaction hash of the proof submission | +| block_number | BIGINT | Block number where proof was submitted | +| block_hash | TEXT | Hash of the block | +| created_at | TIMESTAMP WITH TIME ZONE | Creation timestamp (default: CURRENT_TIMESTAMP) | +| updated_at | TIMESTAMP WITH TIME ZONE | Last update timestamp (default: CURRENT_TIMESTAMP) | + +**Indices:** + +- `idx_proofs_proof_id`: Index on `proof_id` +- `idx_proofs_set_id`: Index on `set_id` +- `idx_proofs_root_id`: Index on `root_id` +- `idx_proofs_provider_id`: Index on `provider_id` +- `idx_proofs_block_number`: Index on `block_number` for reorg handling +- Unique constraint on `(proof_id, block_number)` + +#### 8. `proof_fees` + +Stores proof fee information. + +| Column | Type | Description | +| ---------------- | ------------------------ | -------------------------------------------------- | +| id | BIGSERIAL | Internal ID (primary key) | +| fee_id | BIGINT | On-chain fee ID | +| fee | TEXT | Proof fee amount | +| block_number | BIGINT | Block number where fee was recorded | +| block_hash | TEXT | Hash of the block | +| created_at | TIMESTAMP WITH TIME ZONE | Creation timestamp (default: CURRENT_TIMESTAMP) | + +**Indices:** + +- `idx_proof_fees_fee_id`: Index on `fee_id` +- `idx_proof_fees_block_number`: Index on `block_number` for reorg handling +- Unique constraint on `(fee_id, block_number)` + +#### 9. `fault_records` + +Stores provider fault information. 
| Column           | Type                     | Description                                         |
| ---------------- | ------------------------ | --------------------------------------------------- |
| id               | BIGSERIAL                | Internal ID (primary key)                           |
| fault_id         | BIGINT                   | On-chain fault ID                                   |
| provider_id      | BIGINT                   | Associated provider ID                              |
| provider_address | TEXT                     | Provider address                                    |
| set_id           | BIGINT                   | Associated proof set ID                             |
| period           | INTEGER                  | Fault period                                        |
| faulted_epoch    | BIGINT                   | Epoch when fault occurred                           |
| fault_type       | TEXT                     | Type of fault                                       |
| block_number     | BIGINT                   | Block number where fault was recorded               |
| block_hash       | TEXT                     | Hash of the block                                   |
| created_at       | TIMESTAMP WITH TIME ZONE | Creation timestamp (default: CURRENT_TIMESTAMP)     |
| updated_at       | TIMESTAMP WITH TIME ZONE | Last update timestamp (default: CURRENT_TIMESTAMP)  |

**Indices:**

- `idx_fault_records_fault_id`: Index on `fault_id`
- `idx_fault_records_provider_id`: Index on `provider_id`
- `idx_fault_records_provider_address`: Index on `provider_address`
- `idx_fault_records_set_id`: Index on `set_id`
- `idx_fault_records_block_number`: Index on `block_number` for reorg handling
- Unique constraint on `(fault_id, block_number)`

### 2. Performance Improvements

1. **Added Composite Indices**

   - Added composite indices for frequently joined queries
   - Improved query performance for filtered searches

2. **Optimized Join Patterns**

   - Structured queries to leverage existing indices
   - Reduced table scan operations

3. **Query Optimization**
   - Implemented pagination for large result sets
   - Added proper sorting indices

### 3. Database Versioning

Implemented block-based versioning for all tables that require reorg support:

```sql
-- Example retrieval pattern: latest version of a proof set as of a block
SELECT * FROM proof_sets
WHERE set_id = ? AND block_number <= ?
ORDER BY block_number DESC LIMIT 1;
```

## Database Management

### Migrations

Database schema changes are managed through migration files located in:

```
backend/indexer/migrations/
```

diff --git a/docs/indexer/PROCESSOR.md b/docs/indexer/PROCESSOR.md
new file mode 100644
index 0000000..a008304
--- /dev/null
+++ b/docs/indexer/PROCESSOR.md
@@ -0,0 +1,133 @@
# Processor Architecture

## Overview

The Processor is a crucial component that sits between the Indexer and Handlers, responsible for efficiently processing blockchain data and routing it to the appropriate handlers. It processes transactions and logs in parallel while maintaining data consistency and error handling.

## Data Flow

```
Indexer -> Processor -> Handlers
   |           |            |
   |           |            └─ Process specific events/txs
   |           └─ Block data processing & routing
   └─ Block data (txs & logs)
```

## Processing Pipeline

### 1. Block Data Reception

- Receives `Transactions` from the Indexer containing:
  - Transactions from the block
  - Event logs from the block for each transaction

### 2. Processing

The processor works through the queued transactions and logs in parallel:

- **Transaction Processing**

  - Each transaction is processed in a separate goroutine
  - A worker pool controls concurrent execution
  - Context cancellation is handled gracefully

- **Log Processing**
  - Each log is processed in a separate goroutine
  - Associated transaction data is available via txMap
  - Same worker pool mechanism as transactions
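A minimal sketch of this worker-pool pattern, with pared-down stand-in types (the processor's real types and handler lookup differ):

```go
package processor

import (
	"context"
	"fmt"
	"sync"
)

// Transaction is a pared-down stand-in for the processor's real type.
type Transaction struct {
	Hash  string
	Input []byte
}

// Processor holds the pool size and a routing callback; the real
// implementation resolves handlers via signature matching (next section).
type Processor struct {
	maxWorkers int
	route      func(ctx context.Context, tx Transaction) error
}

// processTransactions fans transactions out to a bounded pool of
// goroutines and aggregates handler errors through a channel.
func (p *Processor) processTransactions(ctx context.Context, txs []Transaction) error {
	sem := make(chan struct{}, p.maxWorkers) // bounds concurrent workers
	errCh := make(chan error, len(txs))      // sized to the max possible errors
	var wg sync.WaitGroup

	for _, tx := range txs {
		select {
		case <-ctx.Done(): // handle cancellation gracefully
			wg.Wait()
			return ctx.Err()
		case sem <- struct{}{}: // acquire a worker slot
		}
		wg.Add(1)
		go func(tx Transaction) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			if err := p.route(ctx, tx); err != nil {
				errCh <- err // collect, but keep processing other items
			}
		}(tx)
	}
	wg.Wait()
	close(errCh)

	var count int
	var first error
	for err := range errCh {
		if first == nil {
			first = err
		}
		count++
	}
	if count > 0 {
		return fmt.Errorf("%d of %d transactions failed, first error: %w", count, len(txs), first)
	}
	return nil
}
```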
### 3. Signature Matching

#### Transaction Matching

```
Contract Call:
┌───────────────────┐
│ Contract Address  │ ──┐
├───────────────────┤   │     ┌─────────────┐
│ Function Selector │ ──┴───► │ Match Found │ ──► Call Handler
└───────────────────┘         └─────────────┘
```

1. Extracts the function selector from the transaction input
2. Matches against configured contract addresses
3. Compares the function selector with the generated signatures
4. Routes to the registered handler if a match is found

#### Log Matching

```
Event Log:
┌───────────────────┐
│ Contract Address  │ ──┐
├───────────────────┤   │     ┌─────────────┐
│ Topic[0]          │ ──┴───► │ Match Found │ ──► Call Handler
└───────────────────┘         └─────────────┘
```

1. Uses the contract address and the first topic (event signature)
2. Matches against configured event definitions
3. Routes to the registered handler if a match is found

### 4. Handler Execution

#### Transaction Handlers

- Receive transaction data only
- Access to:
  - Transaction hash
  - Input data
  - From/To addresses
  - Value
  - Block information

#### Log Handlers

- Receive both the log and the associated transaction
- Access to:
  - Event data (topics and data)
  - Transaction context
  - Block information

## Error Handling

1. **Error Collection**

   - Uses an error channel to collect errors from handlers
   - Channel sized to match the maximum possible number of errors

2. **Error Aggregation**
   - Collects all errors from processing
   - Returns a combined error if there were any failures
   - Continues processing despite individual failures

## Configuration

The processor is configured via a config file that defines:

1. **Contract Definitions**

   - Addresses to monitor
   - Function definitions to track
   - Event definitions to capture

2. **Handler Mappings**
   - Handler name → handler implementation

## Example Configuration

```yaml
Resources:
  - Name: "PDPVerifier"
    Address: "0x123..."
    Triggers:
      - Type: "event"
        Definition: "ProofSetCreated(uint256 indexed setId, address indexed owner)"
        Handler: "ProofSetCreatedHandler"
      - Type: "function"
        Definition: "proposeProofSetOwner(uint256 setId, address newOwner)"
        Handler: "TransactionHandler"
```

This configuration would generate the appropriate signatures and route matching transactions and logs to their respective handlers.
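To make the matching concrete: in Ethereum tooling, keccak256 of the canonical signature yields the event `topic[0]`, and its first four bytes yield the function selector. A sketch under the assumption that the configured definitions are first stripped to canonical form (`Name(type1,type2)`, no argument names or `indexed` markers); the helper names are illustrative:

```go
package processor

import (
	"encoding/hex"

	"golang.org/x/crypto/sha3"
)

// keccak256 hashes data with the Keccak-256 variant used by Ethereum.
func keccak256(data []byte) []byte {
	h := sha3.NewLegacyKeccak256()
	h.Write(data)
	return h.Sum(nil)
}

// eventTopic0 derives the value compared against a log's first topic,
// e.g. eventTopic0("ProofSetCreated(uint256,address)").
func eventTopic0(canonicalSig string) string {
	return "0x" + hex.EncodeToString(keccak256([]byte(canonicalSig)))
}

// functionSelector derives the 4-byte selector compared against the
// first four bytes of a transaction's input data,
// e.g. functionSelector("proposeProofSetOwner(uint256,address)").
func functionSelector(canonicalSig string) string {
	return "0x" + hex.EncodeToString(keccak256([]byte(canonicalSig))[:4])
}
```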
diff --git a/docs/indexer/REORG_HANDLING.md b/docs/indexer/REORG_HANDLING.md
new file mode 100644
index 0000000..cc43b5e
--- /dev/null
+++ b/docs/indexer/REORG_HANDLING.md
@@ -0,0 +1,143 @@
# Chain Reorganization (Reorg) Handling

## Table of Contents

- [Overview](#overview)
- [Reorg Detection](#reorg-detection)
- [Data Management During Reorg](#data-management-during-reorg)
- [Reorg Processing Steps](#reorg-processing-steps)
- [Concurrency Control](#concurrency-control)
- [Error Handling](#error-handling)
- [Example Scenario](#example-scenario)
- [Best Practices](#best-practices)

## Overview

A chain reorganization (reorg) occurs when the chain switches to a different fork, causing previously processed blocks to become invalid. The PDP Explorer implements a robust reorg detection and handling mechanism to keep its data consistent with the canonical chain.

## Reorg Detection

The system detects reorgs through the following process:

1. **Parent Hash Verification**

   - For each new block, verify that its parent hash matches the hash of the stored parent block
   - If a mismatch is detected, initiate the reorg handling process

2. **Reorg Depth Calculation**
   - The system traverses backwards through blocks until it finds a common ancestor (the fork point)
   - Maximum reorg depth is limited to 1000 blocks for safety
   - Null epochs are handled gracefully during traversal

## Data Management During Reorg

### Block-Number Based Versioning

The system uses block numbers as version markers for data consistency:

1. **Immutable Records**

   - Each state change is recorded with its corresponding block number
   - Historical records are preserved for audit trails

2. **Update Strategy**
   For tables with updatable data (proof_sets, roots, providers):

   a. **Finding Latest State**:

   - When updating a record (e.g., a proof set), first query the latest version by block number
   - Example: `SELECT * FROM proof_sets WHERE set_id = 'xyz' ORDER BY block_number DESC LIMIT 1`

   b. **Update Decision**:

   - If the latest record's block_number matches the block currently being processed:
     - Update the existing row (same-block updates)
     - Example: multiple updates in block 100 modify the same row
   - If the block numbers differ:
     - Insert a new row with the current block number
     - Example: an update at block 120 for data last modified in block 100

   c. **Reorg Safety**:

   - This versioning strategy ensures clean reorgs
   - During a reorg, all rows where block_number >= fork_point can safely be deleted
   - Previous versions remain intact for blocks before the fork point

   Example:

   ```sql
   -- Initial state at block 100
   INSERT INTO proof_sets (set_id, data, block_number) VALUES ('xyz', 'initial', 100);

   -- Update at block 100 (same block)
   UPDATE proof_sets
   SET data = 'updated'
   WHERE set_id = 'xyz' AND block_number = 100;

   -- Update at block 120 (different block)
   INSERT INTO proof_sets (set_id, data, block_number)
   VALUES ('xyz', 'new_data', 120);

   -- During reorg at block 110
   DELETE FROM proof_sets WHERE block_number >= 110;
   -- Record from block 100 remains intact
   ```

### Reorg Processing Steps

1. **Initialization**

   - Lock reorg processing to prevent concurrent reorgs
   - Verify that no overlapping reorgs are in progress
   - Create a context with a 10-minute timeout

2. **Data Cleanup**

   - Begin an atomic transaction
   - Delete all data from the fork point to the current height
   - Affects all tables with block-number-based versioning

3. **Reprocessing**

   - Process blocks from the fork point to the current height in batches
   - Each block's transactions and events are reprocessed
   - New data is inserted with the correct block numbers

4. **Completion**
   - Commit the transaction if successful
   - Release the reorg lock
   - Log completion status

## Concurrency Control

- Mutex-based locking prevents concurrent reorg processing
- Active reorgs are tracked with start/end heights
- Stale reorgs (>10 minutes) are automatically cleaned up
- Overlapping reorg attempts are rejected

## Error Handling

- Context cancellation checks throughout the process
- Transaction rollback on failures
- Detailed error logging for debugging
- Maximum reorg depth enforcement

## Example Scenario

```
Original Chain: A -> B -> C -> D
                      \
New Chain:             -> C' -> D' -> E'

1. System detects a parent hash mismatch at block E'
2. Traverses back to find the fork point (B)
3. Calculates the reorg depth (2 blocks)
4. Deletes data from blocks C and D
5. Processes new blocks C', D', E'
```

## Best Practices

1. Always use block numbers for versioning updatable data
2. Create new records for updates from different blocks
3. Implement atomic transactions for data consistency
4. Monitor reorg frequency and depth for system health
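As a concrete illustration of practices 1 and 2, a sketch of the update-or-insert decision using pgx; the function, the `owner` column choice, and the error handling are simplified stand-ins for the indexer's real database layer:

```go
package database

import (
	"context"
	"errors"

	"github.com/jackc/pgx/v5"
)

// upsertProofSetOwner applies the block-number versioning rule: mutate the
// row in place for a same-block change, otherwise insert a new version row.
func upsertProofSetOwner(ctx context.Context, tx pgx.Tx, setID int64, owner string, block int64, blockHash string) error {
	// Find the latest version of this proof set by block number.
	var latest int64
	err := tx.QueryRow(ctx,
		`SELECT block_number FROM proof_sets
		 WHERE set_id = $1
		 ORDER BY block_number DESC LIMIT 1`, setID).Scan(&latest)
	switch {
	case errors.Is(err, pgx.ErrNoRows):
		latest = -1 // no prior version; fall through to insert
	case err != nil:
		return err
	}

	if latest == block {
		// Same-block update: overwrite the existing version.
		_, err = tx.Exec(ctx,
			`UPDATE proof_sets SET owner = $1, updated_at = NOW()
			 WHERE set_id = $2 AND block_number = $3`, owner, setID, block)
		return err
	}

	// Change originating in a later block: insert a new version.
	_, err = tx.Exec(ctx,
		`INSERT INTO proof_sets (set_id, owner, block_number, block_hash)
		 VALUES ($1, $2, $3, $4)`, setID, owner, block, blockHash)
	return err
}
```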
diff --git a/docs/server/openapi.yaml b/docs/server/openapi.yaml
new file mode 100644
index 0000000..9f8edf9
--- /dev/null
+++ b/docs/server/openapi.yaml
@@ -0,0 +1,672 @@
openapi: 3.0.1
info:
  title: PDP Explorer API
  description: API for exploring Proof of Data Possession (PDP) details for providers and proof sets.
  version: 1.0.0

servers:
  - url: ""
    description: Not Deployed Yet

paths:
  /providers:
    get:
      summary: Get list of all providers
      description: Retrieve a list of all storage providers with basic details.
      parameters:
        - name: offset
          in: query
          required: false
          schema:
            type: number
            default: 0
        - name: limit
          in: query
          required: false
          schema:
            type: number
            default: 10
      responses:
        "200":
          description: List of providers
          content:
            application/json:
              schema:
                type: object
                properties:
                  data:
                    type: array
                    items:
                      $ref: "#/components/schemas/Provider"
                  metadata:
                    $ref: "#/components/schemas/Metadata"
  /providers/{providerId}:
    get:
      summary: Get Provider Details
      description: Retrieve detailed information about a specific provider.
      parameters:
        - name: providerId
          in: path
          required: true
          description: Address of the provider
          schema:
            type: string
      responses:
        "200":
          description: Provider details
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/Provider"
  /providers/{providerId}/proof-sets:
    get:
      summary: Get Proof Sets for a Provider
      parameters:
        - name: providerId
          in: path
          required: true
          description: Address of the provider
          schema:
            type: string
        - name: offset
          in: query
          required: false
          schema:
            type: number
            default: 0
        - name: limit
          in: query
          required: false
          schema:
            type: number
            default: 10
      responses:
        "200":
          description: List of proof sets for a provider
          content:
            application/json:
              schema:
                type: object
                properties:
                  data:
                    type: array
                    items:
                      $ref: "#/components/schemas/ProofSet"
                  metadata:
                    $ref: "#/components/schemas/Metadata"
  /providers/{providerId}/activities:
    get:
      summary: Get Provider Activities
      description: Retrieve information about a specific provider's activities for charting.
      parameters:
        - name: providerId
          in: path
          required: true
          description: Address of the provider
          schema:
            type: string
        - name: type
          in: query
          required: false
          description: Type of activity to retrieve
          schema:
            type: string
            enum: [all, prove_possession, fault_recorded]
            default: all
      responses:
        "200":
          description: Provider activities
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/Activity"

  /proofsets:
    get:
      summary: Get list of all Proof Sets
      description: Retrieve a paginated list of proof sets, sorted by a specified metric such as the number of proofs submitted or data size.
      parameters:
        - name: sortBy
          in: query
          required: false
          description: Metric to sort proof sets by (e.g., `proofsSubmitted`, `size`, `faults`)
          schema:
            type: string
            enum: [proofsSubmitted, size, faults]
            default: proofsSubmitted
        - name: order
          in: query
          required: false
          description: Sort order (ascending or descending)
          schema:
            type: string
            enum: [asc, desc]
            default: desc
        - name: offset
          in: query
          required: false
          schema:
            type: number
            default: 0
        - name: limit
          in: query
          required: false
          schema:
            type: number
            default: 10
      responses:
        "200":
          description: List of all proof sets
          content:
            application/json:
              schema:
                type: object
                properties:
                  data:
                    type: array
                    items:
                      $ref: "#/components/schemas/ProofSet"
                  metadata:
                    $ref: "#/components/schemas/Metadata"
        "400":
          description: Invalid query parameter
          content:
            application/json:
              schema:
                type: object
                properties:
                  error:
                    type: string
                    description: Error message
  /proofsets/{proofSetId}:
    get:
      summary: Get ProofSet Details
      description: Retrieve detailed information about a specific proof set.
      parameters:
        - name: proofSetId
          in: path
          required: true
          description: ID of the proof set
          schema:
            type: string
      responses:
        "200":
          description: ProofSet details
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/ProofSet"
        "400":
          description: Invalid query parameter
          content:
            application/json:
              schema:
                type: object
                properties:
                  error:
                    type: string
                    description: Error message
        "404":
          description: ProofSet not found
          content:
            application/json:
              schema:
                type: object
                properties:
                  error:
                    type: string
                    description: Error message
  /proofsets/{proofSetId}/txs:
    get:
      summary: Get ProofSet Transactions
      description: Retrieve detailed information about proof set transactions.
      parameters:
        - name: proofSetId
          in: path
          required: true
          description: ID of the proof set
          schema:
            type: string
        - name: filter
          in: query
          description: Filter by transaction method
          required: false
          schema:
            type: string
            enum:
              [
                "all",
                "createProofSet",
                "proposeProofSetOwner",
                "claimProofSetOwnership",
                "deleteProofSet",
                "addRoots",
                "scheduleRemovals",
                "provePossession",
                "nextProvingPeriod",
              ]
            default: "all"
        - name: offset
          in: query
          required: false
          schema:
            type: number
            default: 0
        - name: limit
          in: query
          required: false
          schema:
            type: number
            default: 10
      responses:
        "200":
          description: ProofSet transactions
          content:
            application/json:
              schema:
                type: object
                properties:
                  data:
                    type: array
                    items:
                      $ref: "#/components/schemas/Transaction"
                  metadata:
                    $ref: "#/components/schemas/Metadata"
  /proofsets/{proofSetId}/event-logs:
    get:
      summary: Get ProofSet Event Logs
      description: Retrieve detailed information about proof set event logs.
      parameters:
        - name: proofSetId
          in: path
          required: true
          description: ID of the proof set
          schema:
            type: string
        - name: filter
          in: query
          description: Filter by event name
          required: false
          schema:
            type: string
            enum:
              [
                "all",
                "ProofSetCreated",
                "ProofSetOwnerChanged",
                "ProofSetDeleted",
                "ProofSetEmpty",
                "PossessionProven",
                "FaultRecord",
                "NextProvingPeriod",
                "RootsAdded",
                "RootsRemoved",
                "ProofFeePaid",
              ]
            default: "all"
        - name: offset
          in: query
          required: false
          schema:
            type: number
            default: 0
        - name: limit
          in: query
          required: false
          schema:
            type: number
            default: 10
      responses:
        "200":
          description: ProofSet event logs
          content:
            application/json:
              schema:
                type: object
                properties:
                  data:
                    type: array
                    items:
                      $ref: "#/components/schemas/EventLog"
                  metadata:
                    $ref: "#/components/schemas/Metadata"
  /proofsets/{proofSetId}/roots:
    get:
      summary: Get ProofSet Roots
      description: Retrieve detailed information about proof set roots.
      parameters:
        - name: proofSetId
          in: path
          required: true
          description: ID of the proof set
          schema:
            type: string
        - name: orderBy
          in: query
          description: Field to order results by
          required: false
          schema:
            type: string
            enum:
              [
                "root_id",
                "total_periods_faulted",
                "total_proofs_submitted",
                "raw_size",
              ]
            default: "root_id"
        - name: order
          in: query
          description: Sort order
          required: false
          schema:
            type: string
            enum: [asc, desc]
            default: desc
        - name: offset
          in: query
          required: false
          schema:
            type: number
            default: 0
        - name: limit
          in: query
          required: false
          schema:
            type: number
            default: 10
      responses:
        "200":
          description: ProofSet roots
          content:
            application/json:
              schema:
                type: object
                properties:
                  data:
                    type: array
                    items:
                      $ref: "#/components/schemas/Root"
                  metadata:
                    $ref: "#/components/schemas/Metadata"

  /network-metrics:
    get:
      summary: Retrieve network metrics
      description: Returns network-wide metrics
      responses:
        "200":
          description: Network metrics
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/NetworkMetrics"

  /search:
    get:
      summary: Search
      description: Search for providers and proof sets.
+ parameters: + - name: q + in: query + required: true + description: Search query + schema: + type: string + responses: + "200": + description: Search results + content: + application/json: + schema: + type: object + properties: + data: + $ref: "#/components/schemas/SearchResult" + +components: + schemas: + Provider: + type: object + properties: + id: + type: string + description: pg provider id + providerId: + type: string + description: provider's address + totalFaultedPeriods: + type: integer + format: int64 + totalDataSize: + type: string + proofSetIds: + type: array + items: + type: string + blockNumber: + type: integer + blockHash: + type: string + activeProofSets: + type: integer + numRoots: + type: integer + firstSeen: + type: string + format: date-time + lastSeen: + type: string + format: date-time + createdAt: + type: string + format: date-time + updatedAt: + type: string + format: date-time + ProofSet: + type: object + properties: + id: + type: integer + format: int64 + setId: + type: integer + format: int64 + owner: + type: string + listenerAddr: + type: string + totalFaultedPeriods: + type: integer + format: int64 + totalDataSize: + type: string + totalRoots: + type: integer + format: int64 + totalProvedRoots: + type: integer + format: int64 + totalFeePaid: + type: string + lastProvenEpoch: + type: integer + format: int64 + nextChallengeEpoch: + type: integer + format: int64 + isActive: + type: boolean + blockNumber: + type: integer + format: int64 + blockHash: + type: string + createdAt: + type: string + format: date-time + updatedAt: + type: string + format: date-time + Root: + type: object + properties: + rootId: + type: integer + format: int64 + cid: + type: string + size: + type: integer + format: int64 + removed: + type: boolean + totalPeriodsFaulted: + type: integer + format: int64 + totalProofsSubmitted: + type: integer + format: int64 + lastProvenEpoch: + type: integer + format: int64 + lastProvenAt: + type: string + format: date-time + nullable: true + lastFaultedEpoch: + type: integer + format: int64 + lastFaultedAt: + type: string + format: date-time + nullable: true + createdAt: + type: string + format: date-time + Transaction: + type: object + properties: + hash: + type: string + proofSetId: + type: integer + format: int64 + messageId: + type: string + height: + type: integer + format: int64 + fromAddress: + type: string + toAddress: + type: string + value: + type: string + method: + type: string + status: + type: boolean + blockNumber: + type: integer + format: int64 + blockHash: + type: string + createdAt: + type: string + format: date-time + EventLog: + type: object + properties: + setId: + type: integer + format: int64 + address: + type: string + eventName: + type: string + data: + type: string + logIndex: + type: integer + format: int64 + removed: + type: boolean + topics: + type: array + items: + type: string + blockNumber: + type: integer + format: int64 + blockHash: + type: string + transactionHash: + type: string + createdAt: + type: string + format: date-time + Activity: + type: object + properties: + id: + type: string + type: + type: string + timestamp: + type: string + format: date-time + details: + type: string + Metadata: + type: object + properties: + total: + type: integer + offset: + type: integer + limit: + type: integer + SearchResult: + type: object + properties: + results: + type: array + items: + type: object + properties: + type: + type: string + enum: ["provider", "proofset"] + id: + type: string + active_sets: + type: integer + 
data_size: + type: string + NetworkMetrics: + type: object + properties: + totalProofSets: + type: integer + format: int64 + totalProviders: + type: integer + format: int64 + totalDataSize: + type: string + totalPieces: + type: integer + format: int64 + totalProofs: + type: integer + format: int64 + totalFaults: + type: integer + format: int64 + uniqueDataSize: + type: string + uniquePieces: + type: integer + format: int64