Add Prometheus metrics tracking for ClickHouse and RPC operations#257
YaroShkvorets merged 5 commits into main
Conversation
Pull request overview
Adds Prometheus-based observability for key scraper operations (ClickHouse reads/writes and RPC requests), and documents the available metrics for monitoring and alerting.
Changes:
- Introduces new Prometheus histogram metrics for ClickHouse operations and RPC request latency, plus helper tracking functions.
- Instruments `lib/rpc.ts`, `lib/clickhouse.ts`, and `lib/batch-insert.ts` to record success/error latency metrics.
- Updates documentation (README) and adjusts the Biome schema reference for local resolution.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| lib/rpc.ts | Tracks per-attempt RPC latency with success/error labels. |
| lib/prometheus.ts | Adds histogram metrics + tracking helpers; tweaks config info labeling behavior. |
| lib/clickhouse.ts | Records ClickHouse read operation latency on success/error paths. |
| lib/batch-insert.ts | Records ClickHouse write (batch insert) latency on success/error paths. |
| biome.json | Switches $schema to local node_modules path. |
| README.md | Adds a Monitoring section describing metrics and sample queries. |
const rpcRequests = new promClient.Histogram({
  name: 'scraper_rpc_requests',
  help: 'Duration of RPC requests in seconds',
  labelNames: ['method', 'status'],
  buckets: [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 2, 5, 10],
This histogram tracks a duration in seconds but the metric name doesn’t include a unit suffix. Prometheus naming conventions recommend suffixing duration metrics with _seconds (e.g., scraper_rpc_requests_seconds) to avoid ambiguity and improve compatibility with standard dashboards/alerts. Since this is newly introduced, it’s a good time to align the name (and update README queries accordingly).
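A minimal sketch of the suggested rename, assuming the rest of the histogram definition stays as shown above:

```ts
import promClient from 'prom-client';

// Suggested rename only; help text, labels, and buckets are unchanged
// from the snippet above.
const rpcRequests = new promClient.Histogram({
  name: 'scraper_rpc_requests_seconds',
  help: 'Duration of RPC requests in seconds',
  labelNames: ['method', 'status'],
  buckets: [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 2, 5, 10],
});
```

A README example query would then reference the new name, for instance `histogram_quantile(0.95, rate(scraper_rpc_requests_seconds_bucket[5m]))`.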
/**
 * Track a ClickHouse operation
 * @param operationType - Type of operation ('read' or 'write')
 * @param durationSeconds - Duration of the operation in seconds
 */
JSDoc for trackClickHouseOperation is out of sync with the actual function signature: it mentions durationSeconds but the function takes status and startTime and computes the duration internally. Please update the doc (and include status) so callers don’t get misled.
@copilot adjust the JSDoc for the function right here in this PR
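A sketch of the corrected JSDoc, matching the signature shown further down in this diff; the exact wording and the timestamp convention (`Date.now()` vs `performance.now()`) are assumptions:

```ts
/**
 * Track a ClickHouse operation by recording its duration in the
 * ClickHouse operations histogram.
 * @param operationType - Type of operation ('read' or 'write')
 * @param status - Outcome of the operation ('success' or 'error')
 * @param startTime - Timestamp captured when the operation started; the
 *   duration is computed internally from this value
 */
export function trackClickHouseOperation(
  operationType: 'read' | 'write',
  status: 'success' | 'error',
  startTime: number,
): void {
  // ...
}
```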
 * Track an RPC request
 * @param method - RPC method name (e.g., 'eth_getBlockByNumber')
 * @param status - Request status ('success' or 'error')
 * @param durationSeconds - Duration of the request in seconds
 */
JSDoc for trackRpcRequest is out of sync with the actual function signature: it mentions durationSeconds but the function takes startTime and computes the duration internally. Please update the doc to match the parameters and behavior.
export function trackClickHouseOperation(
  operationType: 'read' | 'write',
  status: 'success' | 'error',
  startTime: number,
): void {
lib/prometheus.test.ts covers the server/counters but doesn’t exercise the newly exported histogram helpers (trackClickHouseOperation / trackRpcRequest). Adding a small test that calls them and asserts the corresponding metric series (with expected labels) appears would help prevent regressions in metric names/labels (monitoring API stability).
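A minimal sketch of such a test, assuming Bun's test runner, the parameter order shown in the JSDoc, that the histograms use the metric names from the current snippets, and that they are registered on prom-client's default registry; the import path is also an assumption:

```ts
import { describe, expect, it } from 'bun:test';
import promClient from 'prom-client';
// Import path is an assumption based on the file layout in this PR.
import { trackClickHouseOperation, trackRpcRequest } from './prometheus';

describe('histogram tracking helpers', () => {
  it('records labeled duration series for ClickHouse and RPC operations', async () => {
    const startTime = Date.now(); // assumed timestamp convention for the helpers

    trackClickHouseOperation('read', 'success', startTime);
    trackRpcRequest('eth_getBlockByNumber', 'success', startTime);

    const output = await promClient.register.metrics();

    // Assert the metric families and label values appear, without pinning
    // exact observed values.
    expect(output).toContain('scraper_clickhouse_operations_count');
    expect(output).toContain('operation_type="read"');
    expect(output).toContain('scraper_rpc_requests_count');
    expect(output).toContain('method="eth_getBlockByNumber"');
  });
});
```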
.labels(
  CLICKHOUSE_URL,
  CLICKHOUSE_DATABASE || 'not_set',
  NODE_URL || 'not_set',
)
scraper_config_info label values are sourced from CLICKHOUSE_URL / CLICKHOUSE_DATABASE / NODE_URL constants imported from lib/config.ts, but cli.ts mutates process.env after module import to apply CLI overrides. This means the gauge can report stale values (often not_set) even when the env vars are set via CLI flags. Consider reading from process.env inside startPrometheusServer() when initializing the gauge (mirroring lib/clickhouse.ts / lib/rpc.ts runtime reads) so the metric reflects the effective runtime configuration.
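A rough sketch of the suggested change, assuming `configInfo` is the gauge behind `scraper_config_info` and that `startPrometheusServer()` is where the gauge gets labeled; the variable names and the port parameter are illustrative:

```ts
export function startPrometheusServer(port: number): void {
  // Read the effective configuration at server start, after cli.ts has
  // applied any CLI overrides to process.env, rather than using values
  // captured from lib/config.ts at module import time.
  const clickhouseUrl = process.env.CLICKHOUSE_URL || 'not_set';
  const clickhouseDatabase = process.env.CLICKHOUSE_DATABASE || 'not_set';
  const nodeUrl = process.env.NODE_URL || 'not_set';

  configInfo.labels(clickhouseUrl, clickhouseDatabase, nodeUrl).set(1);

  // ...start the HTTP server that exposes /metrics (omitted)...
}
```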
const clickhouseOperations = new promClient.Histogram({
  name: 'scraper_clickhouse_operations',
  help: 'Duration of ClickHouse operations in seconds',
  labelNames: ['operation_type', 'status'],
  buckets: [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 2, 5, 10],
This histogram tracks a duration in seconds but the metric name doesn’t include a unit suffix. Prometheus naming conventions recommend suffixing duration metrics with _seconds (e.g., scraper_clickhouse_operations_seconds) to avoid ambiguity and improve compatibility with standard dashboards/alerts. Since this is newly introduced, it’s a good time to align the name (and update README queries accordingly).
README.md (outdated)
**Configuration Info**
- `scraper_config_info` (gauge) - Configuration metadata
  - Labels: `clickhouse_url`, `clickhouse_database`, `node_url`
Documenting scraper_config_info with labels clickhouse_url, clickhouse_database, and node_url reflects that full configuration URLs are being exposed as Prometheus metric labels, which can leak database credentials or RPC API keys via the /metrics endpoint. Anyone who can reach the metrics port (by default bound to all interfaces) could scrape these labels and reuse embedded passwords or tokens to access your ClickHouse instance or RPC provider. To avoid leaking secrets, avoid emitting full URLs or credentials as metric labels; instead, expose only non-sensitive, sanitized identifiers (for example, a redacted host or a boolean flag indicating whether a config value is set).
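One possible shape for that sanitization, sketched with a hypothetical `sanitizeHost` helper that is not part of the existing code:

```ts
// Hypothetical helper: reduce a configured URL to a non-sensitive label value.
function sanitizeHost(rawUrl: string | undefined): string {
  if (!rawUrl) return 'not_set';
  try {
    // Keep only the hostname; this discards embedded credentials, ports,
    // paths, and query strings that may carry passwords or API keys.
    return new URL(rawUrl).hostname;
  } catch {
    return 'invalid_url';
  }
}

// Example labeling for scraper_config_info using sanitized values only:
// configInfo.labels(sanitizeHost(CLICKHOUSE_URL), CLICKHOUSE_DATABASE || 'not_set', sanitizeHost(NODE_URL)).set(1);
```

The follow-up commits in this PR take a similar route, replacing the full node URL with a sanitized `node_host` label.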
@YaroShkvorets I've opened a new pull request, #258, to work on those changes. Once the pull request is ready, I'll request review from you.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@copilot open a new pull request to apply changes based on the comments in this thread
@YaroShkvorets I've opened a new pull request, #259, to work on those changes. Once the pull request is ready, I'll request review from you.
* Initial plan
* Address PR review comments: fix metric naming, JSDoc, security, and add tests
  Co-authored-by: YaroShkvorets <29608734+YaroShkvorets@users.noreply.github.com>
* Improve JSDoc clarity and simplify fallback pattern
  Co-authored-by: YaroShkvorets <29608734+YaroShkvorets@users.noreply.github.com>
* Use sanitized node_host instead of boolean node_url_configured
  Co-authored-by: YaroShkvorets <29608734+YaroShkvorets@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: YaroShkvorets <29608734+YaroShkvorets@users.noreply.github.com>
@copilot tests fail
@YaroShkvorets I've opened a new pull request, #260, to work on those changes. Once the pull request is ready, I'll request review from you.
@copilot your PR was garbage so I closed it. 2 tests that you added are failing because you are probably setting up the environment wrong. It has nothing to do with running concurrently - all tests are running prometheus server on different ports. Fix tests without changing the tested functions and make sure |
@YaroShkvorets I've opened a new pull request, #261, to work on those changes. Once the pull request is ready, I'll request review from you.
* Initial plan
* Fix Prometheus test failures by cleaning up server state on error
  Co-authored-by: YaroShkvorets <29608734+YaroShkvorets@users.noreply.github.com>
* use bun sleep
* lint

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: YaroShkvorets <29608734+YaroShkvorets@users.noreply.github.com>
Co-authored-by: YaroShkvorets <shkvorets@gmail.com>
This pull request introduces comprehensive Prometheus monitoring for the TRON ERC-20 token scraper, adds detailed documentation for available metrics, and refines metric tracking for ClickHouse and RPC operations. It also updates some configuration and code structure for improved observability and maintainability.
Monitoring and Metrics Enhancements:
- Updates `README.md` to include a new Monitoring section describing all available metrics, their labels, and example Prometheus queries.

Codebase and Configuration Updates:
- Instruments `lib/batch-insert.ts`, `lib/clickhouse.ts`, and `lib/rpc.ts` to record ClickHouse and RPC operation metrics, using new helper functions (see the sketch below).
- Updates the `biome.json` schema path to use a local node_modules reference for better compatibility.
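As a rough illustration of the instrumentation pattern only — not the actual code in `lib/rpc.ts`; `makeRpcCall`, `timedRpcCall`, the import path, and the timestamp convention are assumptions:

```ts
import { trackRpcRequest } from './prometheus'; // import path is an assumption

// Hypothetical underlying request function, stubbed for illustration.
declare function makeRpcCall(method: string, params: unknown[]): Promise<unknown>;

// Wraps a single RPC attempt and records its latency with a success/error label.
async function timedRpcCall(method: string, params: unknown[]): Promise<unknown> {
  const startTime = Date.now(); // assumed timestamp convention for trackRpcRequest
  try {
    const result = await makeRpcCall(method, params);
    trackRpcRequest(method, 'success', startTime);
    return result;
  } catch (error) {
    trackRpcRequest(method, 'error', startTime);
    throw error;
  }
}
```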