perf(core): Deep dive optimizations for hot path #355
Open
- Add internal SQL object cache for string statements
- Optimize SQL.copy to bypass initialization
- Implement micro-cache in SQLProcessor for repeated queries
- Optimize observability idle check
- Streamline parameter processing and result construction
- Remove unnecessary dict() copy in _unpack_parse_cache_entry
- Remove expression.copy() on parse cache store (only copy on retrieve when needed)
- Defer expression.copy() to _apply_ast_transformers when transformers active
- Fast type dispatch (type(x) is dict) vs ABC isinstance checks
- Remove sorted() for dict keys in structural fingerprinting (use insertion order)
- Cache is_idle check in ObservabilityRuntime (lifecycle/observers immutable)
- Use frozenset intersection for parameter char detection in validator
- Optimize ParameterProfile.styles computation for single-style case

Benchmark (10,000 INSERTs):
- Before: ~20x slowdown vs raw sqlite3
- After: ~15.5x slowdown (tuple params), ~18.8x (dict params)
- Function calls reduced: 1.33M → 1.18M (11% fewer)
- isinstance() calls reduced: 280k → 200k (28% fewer)
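The fast type dispatch and frozenset items in the list above can be illustrated with a minimal sketch. The names `PARAM_CHARS`, `classify_params`, and `may_contain_params` are hypothetical and are not taken from the SQLSpec code; the character set is an assumption chosen for illustration only.

```python
from collections.abc import Mapping

# Hypothetical set of characters that can introduce a parameter placeholder.
PARAM_CHARS = frozenset("?:@$%")


def classify_params(params):
    """Classify parameters, trying cheap exact-type checks before ABC checks."""
    kind = type(params)
    # Fast path: identity comparison on the concrete type avoids the
    # __instancecheck__ machinery that isinstance() against an ABC goes through.
    if kind is dict:
        return "dict"
    if kind is tuple or kind is list:
        return "sequence"
    # Slow path: dict subclasses and other Mapping implementations land here.
    if isinstance(params, Mapping):
        return "mapping"
    return "other"


def may_contain_params(sql):
    """Return True if the SQL text contains any placeholder character."""
    # frozenset.intersection accepts any iterable, so the string is scanned once.
    return bool(PARAM_CHARS.intersection(sql))
```

The exact-type check trades generality for speed: a `dict` subclass is no longer caught by the first branch, which is why the `isinstance` fallback stays in place.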
Add benchmark functions to isolate SQLGlot overhead:
- bench_sqlite_sqlglot: Cached SQL (minimal overhead)
- bench_sqlite_sqlglot_copy: expression.copy() per call
- bench_sqlite_sqlglot_nocache: .sql() regeneration per call

These help identify whether overhead comes from SQLGlot parsing/generation vs SQLSpec's own processing.

Key findings:
- SQLGlot cached parsing adds ~0% overhead
- expression.copy() per call: 16x overhead (synthetic)
- SQLSpec actual overhead: distributed across pipeline
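For reference, a raw `sqlite3` baseline of the kind these benchmarks compare against might look like the sketch below. It is not the benchmark code from this PR: `bench_raw_sqlite3` and `slowdown` are hypothetical names, and the table layout is an assumption.

```python
import sqlite3
import time

ROWS = 10_000  # matches the 10,000-INSERT workload quoted in the commit message


def bench_raw_sqlite3(rows=ROWS):
    """Time `rows` single-row INSERTs through the stdlib driver.

    Returns (elapsed_seconds, rows_inserted).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
    start = time.perf_counter()
    for i in range(rows):
        conn.execute("INSERT INTO t (id, name) VALUES (?, ?)", (i, "x"))
    elapsed = time.perf_counter() - start
    count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
    conn.close()
    return elapsed, count


def slowdown(candidate_seconds, baseline_seconds):
    """Slowdown factor of a candidate run relative to the baseline."""
    return candidate_seconds / baseline_seconds
```

A candidate implementation would be timed over the same loop and passed to `slowdown` together with the baseline time to obtain a factor comparable to the "x slowdown" figures reported here.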
- Updated type hints to use the new syntax for union types in driver.py, _async.py, and _common.py.
- Improved readability by formatting long lines and breaking them into multiple lines in driver.py and _common.py.
- Removed unnecessary comments and cleaned up import statements in config.py and typing.py.
- Enhanced exception handling in AsyncMigrationCommands to use async input for user confirmation.
- Refactored logic in CorrelationExtractor to simplify return statements.
- Updated the write_fixture_async function to use AsyncPath for resolving paths asynchronously.
- Improved test readability and consistency in test_sync_adapters.py and test_fast_path.py by formatting long lines.
- Create new sqlspec/driver/_query_cache.py module
- Move CachedQuery namedtuple and QueryCache class
- Rename _QueryCache to QueryCache (now public)
- Rename _FAST_PATH_QUERY_CACHE_SIZE to QC_MAX_SIZE
- Add clear() and __len__() methods to QueryCache
- Update test imports
- Remove unused OrderedDict import from _common.py

Part of driver-arch-cleanup PRD, Chapter 1: qc-extract
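A rough sketch of what a small bounded cache with `clear()` and `__len__()` could look like follows. This is an illustration, not the contents of `_query_cache.py`: the `CachedQuery` field names, the `get`/`set` method names, the LRU eviction policy, and the value of `QC_MAX_SIZE` are all assumptions.

```python
from collections import OrderedDict
from typing import NamedTuple, Optional

QC_MAX_SIZE = 1024  # assumed value; the PR does not state the real size


class CachedQuery(NamedTuple):
    # Illustrative fields; the real namedtuple's fields are not shown in the PR.
    compiled_sql: str
    param_count: int


class QueryCache:
    """Bounded cache that evicts the least recently used entry (assumed policy)."""

    def __init__(self, max_size: int = QC_MAX_SIZE) -> None:
        self._max_size = max_size
        self._entries: "OrderedDict[str, CachedQuery]" = OrderedDict()

    def get(self, sql: str) -> Optional[CachedQuery]:
        entry = self._entries.get(sql)
        if entry is not None:
            # Mark as most recently used.
            self._entries.move_to_end(sql)
        return entry

    def set(self, sql: str, entry: CachedQuery) -> None:
        self._entries[sql] = entry
        self._entries.move_to_end(sql)
        if len(self._entries) > self._max_size:
            # Drop the oldest (least recently used) entry.
            self._entries.popitem(last=False)

    def clear(self) -> None:
        self._entries.clear()

    def __len__(self) -> int:
        return len(self._entries)
```

Exposing `clear()` and `__len__()` makes the cache easy to reset and inspect from tests without reaching into private state.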
Attribute renames:
- _fast_path_binder → _qc_binder
- _fast_path_enabled → _qc_enabled
- _query_cache → _qc

Method renames:
- _update_fast_path_flag → _update_qc_flag
- _fast_rebind → qc_rebind
- _build_fast_statement → qc_build
- _try_cached_compiled → qc_lookup
- _execute_compiled → qc_execute
- _maybe_cache_fast_path → qc_store
- _configure_fast_path_binder → _configure_qc_binder

Test file renamed: test_fast_path.py → test_query_cache.py

Part of driver-arch-cleanup PRD, Chapter 2: qc-rename
…ation

Move eligibility checks and preparation logic from qc_lookup into new qc_prepare method in _common.py. This eliminates ~15 lines of duplicated logic between sync and async implementations.

Before: qc_lookup in both _common.py and _async.py contained identical eligibility checking, cache lookup, rebinding, and statement building.

After: qc_prepare does all preparation work, qc_lookup becomes a thin wrapper that calls qc_prepare then qc_execute.

Chapter 3 of driver-arch-cleanup_20260203 PRD.
c058f9d to
3833499
Move eligibility validation from qc_prepare (hot lookup path) to qc_store (store path, executed once per unique query).

Before: qc_prepare had 6 condition checks including needs_static_script_compilation and many-params guard.

After: qc_prepare has only 2 essential checks:
1. _qc_enabled flag
2. cache lookup + param count match

All detailed validation happens at store time, ensuring only valid queries enter the cache in the first place.

Chapter 4 of driver-arch-cleanup_20260203 PRD.
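The idea of validating at store time so the lookup path stays cheap can be sketched as follows. This is a simplified illustration and not the driver code: `SketchDriver`, the `is_script` argument, `MAX_CACHEABLE_PARAMS`, and the plain-dict cache are assumptions; only the names `qc_prepare`, `qc_store`, `_qc`, and `_qc_enabled` come from the commit messages.

```python
from typing import NamedTuple


class CachedQuery(NamedTuple):
    # Illustrative fields, not the real namedtuple definition.
    compiled_sql: str
    param_count: int


MAX_CACHEABLE_PARAMS = 100  # hypothetical "many-params" guard threshold


class SketchDriver:
    def __init__(self):
        self._qc_enabled = True
        self._qc = {}  # stand-in for the real query cache

    def qc_prepare(self, sql, params):
        """Hot path: only the two essential checks."""
        if not self._qc_enabled:
            return None
        cached = self._qc.get(sql)
        if cached is None or cached.param_count != len(params):
            return None
        return cached.compiled_sql, params

    def qc_store(self, sql, compiled_sql, params, is_script=False):
        """Cold path: detailed eligibility checks run once per unique query."""
        if not self._qc_enabled:
            return False
        if is_script or len(params) > MAX_CACHEABLE_PARAMS:
            return False  # ineligible queries never enter the cache
        self._qc[sql] = CachedQuery(compiled_sql, len(params))
        return True
```

Because ineligible statements are rejected in `qc_store`, anything `qc_prepare` finds in the cache has already passed validation, so the lookup does not need to repeat those checks on every execution.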
This PR implements deep dive optimizations identified in the core-hotpath-opt flow.
Key Changes
- … `SQLProcessor` to bypass dictionary lookups for repeated queries.
- … `prepare_statement`.
- … `SQL.copy` to fast-track parameter updates and streamlined parameter fingerprinting.
- … `is_idle` check to bypass expensive instrumentation overhead when disabled.
- … `ExecutionResult` creation and metadata handling.

Performance Impact
Benchmarks confirm a ~42% reduction in execution time (0.49s -> 0.28s) for the 10k insert workload. The slowdown factor versus raw `sqlite3` improved from 33x to ~18x.