
perf(core): Deep dive optimizations for hot path #355

Open
cofin wants to merge 54 commits into main from feat/performance

@cofin commented Feb 2, 2026

This PR implements deep dive optimizations identified in the core-hotpath-opt flow.

Key Changes

  • Micro-caching: Added a single-slot cache in SQLProcessor to bypass dictionary lookups for repeated queries.
  • String Fast Paths: Implemented internal SQL object caching for raw string statements in prepare_statement.
  • Parameter Optimization: Optimized SQL.copy to fast-track parameter updates and streamlined parameter fingerprinting.
  • Observability: Added an is_idle check that skips the expensive instrumentation path when observability is disabled.
  • Result Construction: Optimized ExecutionResult creation and metadata handling.
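The micro-caching item above can be illustrated with a minimal sketch. It assumes a processor that remembers only the most recent statement and its compiled result, and falls back to a regular dictionary cache on a miss. `MicroCachedProcessor`, `compile_fn`, and `compile_calls` are hypothetical names for illustration and are not taken from SQLProcessor.

```python
from typing import Any, Callable, Optional


class MicroCachedProcessor:
    """Hypothetical stand-in for a SQL processor; not the project's real class."""

    def __init__(self, compile_fn: Callable[[str], Any]) -> None:
        self._compile_fn = compile_fn
        self._cache: dict = {}                 # regular multi-entry cache
        self._last_key: Optional[str] = None   # single-slot micro-cache key
        self._last_value: Any = None           # single-slot micro-cache value
        self.compile_calls = 0                 # counter for this example only

    def compile(self, sql: str) -> Any:
        # Fast path: the same statement as last time, so no hashing or dict lookup.
        if sql == self._last_key:
            return self._last_value
        # Slow path: consult the dictionary cache, compiling on a miss.
        value = self._cache.get(sql)
        if value is None:
            self.compile_calls += 1
            value = self._compile_fn(sql)
            self._cache[sql] = value
        # Remember this statement for the next call.
        self._last_key = sql
        self._last_value = value
        return value
```

A loop that issues the same INSERT 10,000 times would hit the single slot on every call after the first, which is presumably the workload this optimization targets.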

Performance Impact

Benchmarks confirm a ~42% reduction in execution time (0.49s -> 0.28s) for the 10k insert workload. The slowdown factor versus raw sqlite3 improved from 33x to ~18x.

@cofin changed the title from "perf(core): Deep dive optimizations for hot path (~42% faster)" to "perf(core): Deep dive optimizations for hot path" on Feb 3, 2026
cofin added 29 commits February 3, 2026 15:40
- Add internal SQL object cache for string statements
- Optimize SQL.copy to bypass initialization
- Implement micro-cache in SQLProcessor for repeated queries
- Optimize observability idle check
- Streamline parameter processing and result construction
- Remove unnecessary dict() copy in _unpack_parse_cache_entry
- Remove expression.copy() on parse cache store (only copy on retrieve when needed)
- Defer expression.copy() to _apply_ast_transformers when transformers active
- Fast type dispatch (type(x) is dict) vs ABC isinstance checks
- Remove sorted() for dict keys in structural fingerprinting (use insertion order)
- Cache is_idle check in ObservabilityRuntime (lifecycle/observers immutable)
- Use frozenset intersection for parameter char detection in validator
- Optimize ParameterProfile.styles computation for single-style case
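Two of the micro-optimizations listed above, exact-type dispatch and frozenset intersection, can be sketched as follows. The placeholder character set and both function names are assumptions made for illustration; the validator's real character set and API are not shown in this PR.

```python
from collections.abc import Mapping, Sequence

# Assumed set of characters that can introduce a placeholder.
PARAM_CHARS = frozenset("?:@$%")


def may_contain_parameters(sql: str) -> bool:
    """One set intersection instead of several separate `ch in sql` scans."""
    return bool(PARAM_CHARS.intersection(sql))


def classify_parameters(params: object) -> str:
    """Check exact builtin types first, then fall back to ABC checks."""
    kind = type(params)
    # Fast path: identity comparison on the type avoids the ABC machinery.
    if kind is dict:
        return "named"
    if kind is tuple or kind is list:
        return "positional"
    # Slow path: subclasses and other Mapping/Sequence implementations.
    if isinstance(params, Mapping):
        return "named"
    if isinstance(params, Sequence) and not isinstance(params, (str, bytes)):
        return "positional"
    return "scalar"
```

The fast path only matches the exact builtin types, so the `isinstance` fallback is kept to preserve behavior for subclasses.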

Benchmark (10,000 INSERTs):
- Before: ~20x slowdown vs raw sqlite3
- After: ~15.5x slowdown (tuple params), ~18.8x (dict params)
- Function calls reduced: 1.33M → 1.18M (11% fewer)
- isinstance() calls reduced: 280k → 200k (28% fewer)

Add benchmark functions to isolate SQLGlot overhead:
- bench_sqlite_sqlglot: Cached SQL (minimal overhead)
- bench_sqlite_sqlglot_copy: expression.copy() per call
- bench_sqlite_sqlglot_nocache: .sql() regeneration per call

These help identify whether overhead comes from SQLGlot
parsing/generation vs SQLSpec's own processing.

Key findings:
- SQLGlot cached parsing adds ~0% overhead
- expression.copy() per call: 16x overhead (synthetic)
- SQLSpec actual overhead: distributed across pipeline
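A rough sketch of how a slowdown factor against raw sqlite3 might be measured, using only the standard library. It does not reproduce the PR's `bench_sqlite_sqlglot*` functions: `run_benchmark` is a hypothetical helper, and `copy.deepcopy` of a small dictionary stands in for `expression.copy()`, so the numbers it produces are not comparable to the figures above.

```python
import copy
import sqlite3
import time

ROWS = 10_000
INSERT_SQL = "INSERT INTO notes (id, body) VALUES (?, ?)"

# Stand-in for a parsed expression tree; this is NOT a SQLGlot object.
FAKE_TREE = {"insert": {"table": "notes", "columns": ["id", "body"], "values": ["?", "?"]}}


def run_benchmark(per_call_overhead=None, rows=ROWS):
    """Insert `rows` rows one by one; return (elapsed_seconds, row_count)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")
    start = time.perf_counter()
    for i in range(rows):
        if per_call_overhead is not None:
            per_call_overhead()  # simulated per-call library work
        conn.execute(INSERT_SQL, (i, "x"))
    conn.commit()
    elapsed = time.perf_counter() - start
    count = conn.execute("SELECT COUNT(*) FROM notes").fetchone()[0]
    conn.close()
    return elapsed, count


def slowdown_factor(rows=ROWS):
    """Ratio of the copy-per-call run to the raw sqlite3 run."""
    raw_elapsed, _ = run_benchmark(rows=rows)
    copy_elapsed, _ = run_benchmark(lambda: copy.deepcopy(FAKE_TREE), rows=rows)
    return copy_elapsed / raw_elapsed
```

Timings from a harness like this vary from run to run, so repeating each measurement and reporting the minimum or median is usually more reliable than a single pass.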
cofin added 24 commits February 3, 2026 15:40
- Updated type hints to use the new syntax for union types in driver.py, _async.py, and _common.py.
- Improved readability by formatting long lines and breaking them into multiple lines in driver.py and _common.py.
- Removed unnecessary comments and cleaned up import statements in config.py and typing.py.
- Enhanced exception handling in AsyncMigrationCommands to use async input for user confirmation.
- Refactored logic in CorrelationExtractor to simplify return statements.
- Updated the write_fixture_async function to use AsyncPath for resolving paths asynchronously.
- Improved test readability and consistency in test_sync_adapters.py and test_fast_path.py by formatting long lines.
- Create new sqlspec/driver/_query_cache.py module
- Move CachedQuery namedtuple and QueryCache class
- Rename _QueryCache to QueryCache (now public)
- Rename _FAST_PATH_QUERY_CACHE_SIZE to QC_MAX_SIZE
- Add clear() and __len__() methods to QueryCache
- Update test imports
- Remove unused OrderedDict import from _common.py

Part of driver-arch-cleanup PRD, Chapter 1: qc-extract
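For orientation, here is a minimal sketch of what an extracted query cache with `clear()` and `__len__()` could look like. The LRU eviction policy, the `CachedQuery` fields, the `get`/`set` method names, and the `QC_MAX_SIZE` value are all assumptions; the commit message only states that the class, the namedtuple, and the constant exist.

```python
from collections import OrderedDict
from typing import NamedTuple, Optional

QC_MAX_SIZE = 1024  # assumed value; the real limit is not stated in the PR


class CachedQuery(NamedTuple):
    """Assumed fields, for illustration only."""
    compiled_sql: str
    param_count: int


class QueryCache:
    """Small LRU cache keyed by the raw SQL string (assumed design)."""

    def __init__(self, max_size: int = QC_MAX_SIZE) -> None:
        self._entries: "OrderedDict[str, CachedQuery]" = OrderedDict()
        self._max_size = max_size

    def get(self, sql: str) -> Optional[CachedQuery]:
        entry = self._entries.get(sql)
        if entry is not None:
            self._entries.move_to_end(sql)  # mark as most recently used
        return entry

    def set(self, sql: str, entry: CachedQuery) -> None:
        self._entries[sql] = entry
        self._entries.move_to_end(sql)
        if len(self._entries) > self._max_size:
            self._entries.popitem(last=False)  # evict least recently used

    def clear(self) -> None:
        self._entries.clear()

    def __len__(self) -> int:
        return len(self._entries)
```

Exposing `clear()` and `__len__()` makes the cache straightforward to reset and inspect from tests, which fits the "Update test imports" step in the same commit.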

Attribute renames:
- _fast_path_binder → _qc_binder
- _fast_path_enabled → _qc_enabled
- _query_cache → _qc

Method renames:
- _update_fast_path_flag → _update_qc_flag
- _fast_rebind → qc_rebind
- _build_fast_statement → qc_build
- _try_cached_compiled → qc_lookup
- _execute_compiled → qc_execute
- _maybe_cache_fast_path → qc_store
- _configure_fast_path_binder → _configure_qc_binder

Test file renamed: test_fast_path.py → test_query_cache.py

Part of driver-arch-cleanup PRD, Chapter 2: qc-rename
…ation

Move eligibility checks and preparation logic from qc_lookup into new
qc_prepare method in _common.py. This eliminates ~15 lines of duplicated
logic between sync and async implementations.

Before: qc_lookup in both _common.py and _async.py contained identical
eligibility checking, cache lookup, rebinding, and statement building.

After: qc_prepare does all preparation work, qc_lookup becomes a thin
wrapper that calls qc_prepare then qc_execute.

Chapter 3 of driver-arch-cleanup_20260203 PRD.
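The split described above can be sketched like this: preparation lives once in a shared base class, and the sync and async drivers each keep only a thin `qc_lookup` wrapper. The class names, the dictionary-based cache, and the return shapes are hypothetical; only the method names `qc_prepare`, `qc_lookup`, and `qc_execute` come from the commit message.

```python
import asyncio
from typing import Optional


class CommonDriverBase:
    """Hypothetical shared base holding the preparation logic once."""

    def __init__(self) -> None:
        self._qc_enabled = True
        self._qc: dict = {}  # sql -> (compiled_sql, param_count)

    def qc_prepare(self, sql: str, params: tuple) -> Optional[tuple]:
        # Shared by both drivers: cache lookup plus rebinding of parameters.
        if not self._qc_enabled:
            return None
        cached = self._qc.get(sql)
        if cached is None or cached[1] != len(params):
            return None
        return cached[0], tuple(params)


class SyncDriver(CommonDriverBase):
    def qc_execute(self, prepared: tuple) -> tuple:
        return ("sync", *prepared)  # placeholder for real execution

    def qc_lookup(self, sql: str, params: tuple) -> Optional[tuple]:
        prepared = self.qc_prepare(sql, params)
        return None if prepared is None else self.qc_execute(prepared)


class AsyncDriver(CommonDriverBase):
    async def qc_execute(self, prepared: tuple) -> tuple:
        return ("async", *prepared)  # placeholder for real execution

    async def qc_lookup(self, sql: str, params: tuple) -> Optional[tuple]:
        prepared = self.qc_prepare(sql, params)
        return None if prepared is None else await self.qc_execute(prepared)


def run_async_lookup(driver: AsyncDriver, sql: str, params: tuple) -> Optional[tuple]:
    """Convenience helper for calling the async wrapper from sync code."""
    return asyncio.run(driver.qc_lookup(sql, params))
```

Because `qc_prepare` does no I/O, it can stay synchronous and be called unchanged from the async wrapper; only `qc_execute` differs between the two drivers.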

Move eligibility validation from qc_prepare (hot lookup path) to
qc_store (store path, executed once per unique query).

Before: qc_prepare had 6 condition checks including needs_static_script_compilation
and many-params guard.

After: qc_prepare has only 2 essential checks:
1. _qc_enabled flag
2. cache lookup + param count match

All detailed validation happens at store time, ensuring only valid
queries enter the cache in the first place.

Chapter 4 of driver-arch-cleanup_20260203 PRD.
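A minimal sketch of the store-time validation idea, assuming a plain dictionary cache. `QueryCacheDriver`, `MAX_CACHED_PARAMS`, and the keyword argument on `qc_store` are hypothetical; the commit message names only the `needs_static_script_compilation` check and a many-params guard, without giving signatures or thresholds.

```python
from typing import Optional

MAX_CACHED_PARAMS = 100  # hypothetical threshold for the many-params guard


class QueryCacheDriver:
    """Hypothetical driver fragment; not the project's real class."""

    def __init__(self) -> None:
        self._qc_enabled = True
        self._qc: dict = {}  # sql -> expected parameter count

    def qc_store(self, sql: str, params: tuple, *,
                 needs_static_script_compilation: bool = False) -> bool:
        # Store path: runs once per unique query, so detailed checks go here.
        if not self._qc_enabled:
            return False
        if needs_static_script_compilation:
            return False
        if len(params) > MAX_CACHED_PARAMS:
            return False
        self._qc[sql] = len(params)
        return True

    def qc_prepare(self, sql: str, params: tuple) -> Optional[tuple]:
        # Hot path: only the two essential checks remain.
        if not self._qc_enabled:                           # 1. enabled flag
            return None
        expected = self._qc.get(sql)
        if expected is None or expected != len(params):    # 2. lookup + param count
            return None
        return sql, tuple(params)
```

The trade-off is that any eligibility rule depending on per-call state would no longer be re-checked on lookup, so this design relies on eligibility being a property of the query itself.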