Run PRAGMA optimize after committing ingestion to the database #579
This is recommended practice for SQLite databases with short-lived connections, such as the connection used for ingestion: https://sqlite.org/pragma.html#pragma_optimize

Note the callout of "low-quality indexes" on this page: https://sqlite.org/queryplanner-ng.html#howtofix

Due to the nature of both topics and contract IDs, we have a huge variance in the K -> V ratio. Things like XLM have tons of event rows, given that fee events occur on every txn, and "transfer" is obviously a popular topic1 symbol. This explicitly breaks an indexing rule:

> A low-quality index is one where there are **more than 10 or 20 rows in the table that have the same value** for the left-most column of the index.

The solution is running `ANALYZE events;`, but this would take too long and grabs an exclusive lock on the DB, so instead we periodically run `ANALYZE` via the `PRAGMA optimize;` directive, which we execute at the end of every ingestion commit.
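For context, a minimal sketch of the post-commit hook idea in Go, using the standard `database/sql` API rather than the codebase's own helpers (the function name and driver are illustrative, not the PR's actual code):

```go
package db

import (
	"context"
	"database/sql"

	_ "github.com/mattn/go-sqlite3" // any SQLite driver works; this one is assumed for illustration
)

// postCommitOptimize runs the query-planner maintenance pass after an
// ingestion transaction commits. Unlike a full ANALYZE, PRAGMA optimize
// only re-analyzes tables whose statistics SQLite considers stale, so it
// stays cheap and avoids a long-held exclusive lock.
func postCommitOptimize(ctx context.Context, db *sql.DB) error {
	_, err := db.ExecContext(ctx, "PRAGMA optimize;")
	return err
}
```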
Pull request overview
This PR adds query planner optimization after each ledger ingestion by running `PRAGMA optimize` alongside the existing WAL checkpoint operation. It also removes the unused `ledgerEntries` field from the database cache (since Core now manages ledger entries) and raises the log level of successful ledger ingestion from Debug to Info for better visibility.
Changes:
- Added `PRAGMA optimize` to post-commit database operations for improved query planning
- Removed unused `ledgerEntries` cache field and its transactional cache initialization
- Changed ledger ingestion success log from Debug to Info level
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| cmd/stellar-rpc/internal/db/db.go | Added PRAGMA optimize to post-commit hook, removed unused ledgerEntries cache field, and reformatted comments for better readability |
| cmd/stellar-rpc/internal/ingest/service.go | Changed log level from Debug to Info for successful ledger ingestion |
cmd/stellar-rpc/internal/db/db.go (outdated)

```diff
 // TODO: this is sqlite-only, it shouldn't be here
 startTime := time.Now()
-_, err := db.ExecRaw(ctx, "PRAGMA wal_checkpoint(TRUNCATE)")
+_, err := db.ExecRaw(ctx, "PRAGMA wal_checkpoint(TRUNCATE); PRAGMA optimize;")
```
It is unclear what the upper bound on this `PRAGMA` is: obviously, if it exceeds ledger close time, this is hugely problematic. The alternative is to ask DB operators to run `ANALYZE events` regularly, which seems silly.
agreed, this seems like it may introduce transient delays during ingestion, given the single-threaded ingestion loop will block until this returns. It also introduces a lock on the events table that could pause JSON-RPC `getEvents` requests at the same time?
this usage does leverage the lightweight analyze behavior per the `analysis_limit` docs, as long as the SQLite version is >= 3.46.0.
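For completeness, a hypothetical helper (not in the PR) for confirming the linked SQLite version at runtime:

```go
package db

import (
	"context"
	"database/sql"
	"fmt"
)

// sqliteVersion returns the linked SQLite library version, so deployments
// can verify it meets the >= 3.46.0 threshold mentioned above before
// relying on the lightweight analyze behavior.
func sqliteVersion(ctx context.Context, db *sql.DB) (string, error) {
	var version string
	if err := db.QueryRowContext(ctx, "SELECT sqlite_version();").Scan(&version); err != nil {
		return "", fmt.Errorf("querying sqlite version: %w", err)
	}
	return version, nil
}
```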
two things for consideration:

- move this to an async timer loop? Like once every `expected ledger close time + 1s` or something. That ensures an analyze runs close to the ledger ingestion period but avoids impacting the sync ingestion routine; it could still introduce some occasional blocking on RPC query requests to the events table (see the sketch after this list).
- run `optimize` in a follow-on ExecRaw statement? Wrap some logging output around that and add a new metrics key for the duration of `optimize` to get breakout visibility (sketched after the reply below).
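Rough sketch of the first option (all names and the interval here are illustrative, not proposed code):

```go
package ingest

import (
	"context"
	"database/sql"
	"log"
	"time"
)

// runOptimizeLoop runs PRAGMA optimize on its own timer, decoupled from the
// synchronous ingestion loop, so a slow analyze never delays ledger ingestion.
func runOptimizeLoop(ctx context.Context, db *sql.DB, ledgerCloseTime time.Duration) {
	ticker := time.NewTicker(ledgerCloseTime + time.Second)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			start := time.Now()
			if _, err := db.ExecContext(ctx, "PRAGMA optimize;"); err != nil {
				log.Printf("PRAGMA optimize failed: %v", err)
				continue
			}
			// A dedicated metrics key for this duration would give the
			// breakout visibility mentioned in the second option too.
			log.Printf("PRAGMA optimize took %s", time.Since(start))
		}
	}
}
```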
This affects all of ingestion (post-commit for the whole ledger), not just events, so the impact is even bigger if it interferes with reads.
Yeah, your second point is ideal. Before mucking with a timed analyze call, I'm gonna isolate this and run it in the dev cluster to get timing metrics around how long the runs take. If they're not concerning at all (ms), I won't bother with strict timing bounds.
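Roughly what the split statement would look like, modeled on the ExecRaw call shape in the diff above (the interface and logging here are illustrative, not the PR's actual code):

```go
package db

import (
	"context"
	"log"
	"time"
)

// rawExecer mirrors the ExecRaw call shape from the diff; hypothetical.
type rawExecer interface {
	ExecRaw(ctx context.Context, sql string) (any, error)
}

// checkpointAndOptimize issues the checkpoint and the optimize separately
// so the optimize duration can be logged (and exported as its own metric).
func checkpointAndOptimize(ctx context.Context, db rawExecer) error {
	if _, err := db.ExecRaw(ctx, "PRAGMA wal_checkpoint(TRUNCATE)"); err != nil {
		return err
	}
	start := time.Now()
	if _, err := db.ExecRaw(ctx, "PRAGMA optimize;"); err != nil {
		return err
	}
	log.Printf("PRAGMA optimize took %s", time.Since(start))
	return nil
}
```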
What

Ensures that we run query planner optimization at the end of every ledger ingest.

Stray optimization also included: since Core manages ledger entries now, the `ledgerEntries` field of the db cache was entirely unused, so it has been removed.

Why
See details in faa3d51 and this Slack thread.
Known limitations
It is unclear what the upper bound on this `PRAGMA` is: obviously, if it exceeds ledger close time, this is hugely problematic. The alternative is to ask DB operators to run `ANALYZE events` regularly, which seems silly.