Commit e3ca965

feat: vectorize query execution
1 parent b14dbef commit e3ca965

3 files changed: +583 -147 lines changed

specs/vectorized-execution.md

Lines changed: 67 additions & 0 deletions
@@ -0,0 +1,67 @@
# Vectorized Execution Plan

## Goal
Implement vectorized execution to improve query performance by processing documents in batches. This improves CPU cache locality and enables the compiler to use SIMD instructions for filtering and projection.

## Design

### 1. Batch Structure
We introduce a `Batch` struct that holds a chunk of data to be processed together.

```rust
pub struct Batch {
    // Row-oriented batching of LazyDocuments (ExecutionResult)
    pub items: Vec<ExecutionResult>,
}
```

### 2. Iterator Model
Operators produce and consume `Batch` objects.

- `BatchIterator` trait (effectively `Iterator<Item = Batch>`).
- Batch size: Defaults to 4096, but adaptable via hints (e.g. for `LIMIT` queries).
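
The iterator model above can be sketched as follows. This is a minimal illustration, not the committed code: `ExecutionResult` is replaced by a stand-in type alias, the blanket `BatchIterator` impl is one possible reading of "effectively `Iterator<Item = Batch>`", and `effective_batch_size` and `count_rows` are hypothetical helpers introduced only for this example.

```rust
/// Stand-in for the engine's `ExecutionResult`; the real type is not shown in the spec.
pub type ExecutionResult = u64;

pub struct Batch {
    pub items: Vec<ExecutionResult>,
}

/// Default batch size named in the spec.
pub const DEFAULT_BATCH_SIZE: usize = 4096;

/// Marker trait: anything that yields `Batch` values counts as a `BatchIterator`.
pub trait BatchIterator: Iterator<Item = Batch> {}
impl<T: Iterator<Item = Batch>> BatchIterator for T {}

/// Hypothetical helper: shrink the batch size when a LIMIT hint is below the default.
/// The result is clamped to at least 1 so a scan always makes progress.
pub fn effective_batch_size(limit_hint: Option<usize>) -> usize {
    match limit_hint {
        Some(limit) => limit.clamp(1, DEFAULT_BATCH_SIZE),
        None => DEFAULT_BATCH_SIZE,
    }
}

/// Hypothetical consumer: counts rows across every batch of an operator's output.
pub fn count_rows(batches: impl BatchIterator) -> usize {
    batches.map(|batch| batch.items.len()).sum()
}
```

With a blanket impl like this, any existing iterator adapter that yields `Batch` values can be passed where a `BatchIterator` is expected, without a separate trait implementation per operator.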

### 3. Vectorized Operators

#### BatchScanOperator
- Reads from the underlying `MergedIterator`.
- Accumulates `N` items into a `Batch`.
- **Optimization:** Accepts a `batch_size` parameter (derived from `LIMIT` hint) to avoid over-fetching data from storage.
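
A rough sketch of the accumulation step, under stated assumptions: `BatchScan` is a hypothetical name, the source is any Rust iterator standing in for `MergedIterator`, and a plain `Vec` stands in for `Batch` to keep the example self-contained.

```rust
/// Hypothetical scan operator: wraps any row iterator (a stand-in for `MergedIterator`)
/// and groups its output into `Vec`s of at most `batch_size` rows.
pub struct BatchScan<I: Iterator> {
    source: I,
    batch_size: usize,
}

impl<I: Iterator> BatchScan<I> {
    pub fn new(source: I, batch_size: usize) -> Self {
        // A zero batch size would never make progress, so clamp it to 1.
        Self { source, batch_size: batch_size.max(1) }
    }
}

impl<I: Iterator> Iterator for BatchScan<I> {
    type Item = Vec<I::Item>;

    fn next(&mut self) -> Option<Self::Item> {
        let mut items = Vec::with_capacity(self.batch_size);
        // Pull at most `batch_size` rows; stop early if the source runs dry.
        while items.len() < self.batch_size {
            match self.source.next() {
                Some(row) => items.push(row),
                None => break,
            }
        }
        if items.is_empty() { None } else { Some(items) }
    }
}
```

Because the operator never pulls more than `batch_size` rows per call, a small `LIMIT`-derived size keeps the first batch from reading thousands of rows that would be discarded.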

#### BatchFilterOperator (SIMD Target)
- **Input:** `Batch` of `LazyDocument`s.
- **Process:**
  1. Check if the predicate is suitable for vectorization (currently Simple Binary Numeric expressions).
  2. **Column Extraction:** Iterate through the batch and extract the specific field for all documents into a reusable typed buffer (`Vec<f64>`).
     - **Optimization:** Uses `evaluate_to_f64_lazy` (in `src/expression.rs`) to extract values directly from raw JSONB bytes, decoding only the referenced field instead of building `Value` enums or `BTreeMap`s for the whole document.
  3. **SIMD Evaluation:** Perform the comparison loop over the extracted vector.
     - Relies on Rust/LLVM auto-vectorization for tight loops over primitive arrays.
  4. **Selection:** Filter the `Batch` in-place using the computed mask.
- **Fallback:** If the predicate is complex (e.g. `OR`, nested paths, non-numeric), the execution planner falls back to the standard Row-based execution plan (`execute_row_plan`) to avoid a performance regression.
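
The extract, compare, select steps above could look roughly like this. It is a sketch under assumptions, not the operator from the commit: `BatchFilter`, `CmpOp`, and `buf_mask` are hypothetical names, the `extract` closure stands in for `evaluate_to_f64_lazy`, and whether the comparison loop is actually vectorized depends on the compiler and target.

```rust
/// Comparison operators for the sketch (hypothetical).
#[derive(Clone, Copy)]
pub enum CmpOp { Lt, Le, Gt, Ge, Eq }

/// Hypothetical filter that keeps its column buffers between batches.
pub struct BatchFilter {
    op: CmpOp,
    rhs: f64,
    buf_values: Vec<f64>,
    buf_valid: Vec<bool>,
    buf_mask: Vec<bool>,
}

impl BatchFilter {
    pub fn new(op: CmpOp, rhs: f64) -> Self {
        Self { op, rhs, buf_values: Vec::new(), buf_valid: Vec::new(), buf_mask: Vec::new() }
    }

    /// Filters `batch` in place; `extract` stands in for `evaluate_to_f64_lazy`.
    pub fn apply<T>(&mut self, batch: &mut Vec<T>, extract: impl Fn(&T) -> Option<f64>) {
        // Column extraction into reusable buffers (`clear` keeps the capacity).
        self.buf_values.clear();
        self.buf_valid.clear();
        self.buf_mask.clear();
        for row in batch.iter() {
            let value = extract(row);
            self.buf_values.push(value.unwrap_or(0.0));
            self.buf_valid.push(value.is_some());
        }
        // Comparison: the operator is matched once, outside the loop, so each
        // loop body is a single primitive comparison over a contiguous `f64` slice.
        let (rhs, values) = (self.rhs, &self.buf_values);
        match self.op {
            CmpOp::Lt => self.buf_mask.extend(values.iter().map(|&v| v < rhs)),
            CmpOp::Le => self.buf_mask.extend(values.iter().map(|&v| v <= rhs)),
            CmpOp::Gt => self.buf_mask.extend(values.iter().map(|&v| v > rhs)),
            CmpOp::Ge => self.buf_mask.extend(values.iter().map(|&v| v >= rhs)),
            CmpOp::Eq => self.buf_mask.extend(values.iter().map(|&v| v == rhs)),
        }
        // Rows whose field was missing or non-numeric never match.
        for (keep, &valid) in self.buf_mask.iter_mut().zip(&self.buf_valid) {
            *keep &= valid;
        }
        // Selection: `Vec::retain` visits elements once, in order, so the mask lines up.
        let mut mask = self.buf_mask.iter();
        batch.retain(|_| *mask.next().expect("mask has one entry per row"));
    }
}
```

Calling `apply` on successive batches reuses the same three buffers, so after the first batch the filter itself does not need to allocate unless a later batch is larger.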

#### BatchProjectOperator
- Iterate over the `Batch`.
- Apply projection to each item map-style.
- (Currently excluded from automatic vectorization selection for stability, so plans with projections run on the Row-based plan.)

#### BatchLimitOperator / BatchOffsetOperator
- Handle `LIMIT` and `OFFSET` directly on `Batch` streams to avoid switching between batch and row execution.
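
One way to apply `OFFSET` and `LIMIT` without leaving the batch stream is sketched below. The spec names two separate operators; this example folds both into a single hypothetical `BatchOffsetLimit` adapter for brevity, again using `Vec<T>` as a stand-in for `Batch`.

```rust
/// Hypothetical combined OFFSET/LIMIT adapter over a stream of batches.
pub struct BatchOffsetLimit<I> {
    source: I,
    to_skip: usize,
    remaining: usize,
}

impl<I> BatchOffsetLimit<I> {
    pub fn new(source: I, offset: usize, limit: usize) -> Self {
        Self { source, to_skip: offset, remaining: limit }
    }
}

impl<T, I: Iterator<Item = Vec<T>>> Iterator for BatchOffsetLimit<I> {
    type Item = Vec<T>;

    fn next(&mut self) -> Option<Vec<T>> {
        // Once the LIMIT quota is used up, stop pulling from the source entirely.
        while self.remaining > 0 {
            let mut batch = self.source.next()?;
            // OFFSET: drop rows from the front until the offset is consumed.
            let skip = self.to_skip.min(batch.len());
            batch.drain(..skip);
            self.to_skip -= skip;
            // LIMIT: cut the batch short when it would exceed the quota.
            batch.truncate(self.remaining);
            self.remaining -= batch.len();
            if !batch.is_empty() {
                return Some(batch);
            }
        }
        None
    }
}
```

Batches that fall entirely inside the offset are discarded whole, and the adapter stops requesting upstream batches as soon as the limit is satisfied.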

### 4. Execution Strategy (`execute_plan`)

The `execute_plan` function now chooses between Vectorized and Row-based execution:

1. **Check Vectorizability:** Analyzes the logical plan. If the plan consists of Scan and Simple Numeric Filters (and optionally Limit/Offset), it qualifies for vectorization.
2. **Vectorized Path (`execute_batch_plan`):**
   - Constructs a pipeline of `Batch*` operators.
   - Propagates `LIMIT` values as hints to `BatchScanOperator`.
   - **Disable Pushdown:** Deliberately avoids pushing the predicate down to the storage engine (`db.scan`) for these simple cases. This forces the data into `BatchFilterOperator`, where the SIMD-friendly loop and allocation-free extraction can outperform the storage engine's row-by-row check.
3. **Row-based Path (`execute_row_plan`):**
   - Used for complex queries (logical operators, projections, complex paths).
   - Utilizes standard `ScanOperator`, `FilterOperator`, etc.
   - Leverages full predicate pushdown to `MergedIterator`.
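
The vectorizability check described above might be expressed as a recursive walk over the plan. The `Plan` and `Predicate` enums here are simplified, hypothetical stand-ins (the engine's real plan types are not shown in this commit), and `is_vectorizable` is an illustrative name.

```rust
/// Hypothetical, simplified logical plan; the engine's real plan type is not shown.
pub enum Plan {
    Scan,
    Filter { input: Box<Plan>, predicate: Predicate },
    Project { input: Box<Plan> },
    Limit { input: Box<Plan>, n: usize },
    Offset { input: Box<Plan>, n: usize },
}

/// Hypothetical predicate shapes: one simple numeric comparison, one complex form.
pub enum Predicate {
    /// `field <op> number`, the "Simple Binary Numeric" case.
    NumericCompare { field: String, rhs: f64 },
    /// Logical operators send the query to the row-based path.
    Or(Box<Predicate>, Box<Predicate>),
}

/// Returns true when the plan is only Scan, simple numeric Filters, Limit, and Offset.
pub fn is_vectorizable(plan: &Plan) -> bool {
    match plan {
        Plan::Scan => true,
        Plan::Filter { input, predicate } => {
            matches!(predicate, Predicate::NumericCompare { .. }) && is_vectorizable(input)
        }
        Plan::Limit { input, .. } | Plan::Offset { input, .. } => is_vectorizable(input),
        // Projections are currently routed to the row-based plan.
        Plan::Project { .. } => false,
    }
}
```

A dispatcher would call the batch path when this returns `true` and the row path otherwise, so any plan node the check does not recognize as simple falls through to the existing behavior.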

## Optimization Strategy
- **Memory Reuse:** `BatchFilterOperator` reuses `buf_values` and `buf_valid` vectors between batches to avoid allocation churn.
- **Allocation-Free Extraction:** `evaluate_to_f64_lazy` decodes only the referenced field, bypassing `Value` creation for the rest of the document.
- **Adaptive Batching:** `BatchScanOperator` scales batch size based on query limits.
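
The memory-reuse point rests on a property of `Vec`: `clear` resets the length but keeps the allocation. A small illustration, with `refill` as a hypothetical helper rather than anything from the commit:

```rust
/// Hypothetical helper: refill a reusable column buffer from one batch of values.
pub fn refill(buf: &mut Vec<f64>, batch: &[f64]) {
    // `clear` sets the length to 0 but leaves the capacity untouched.
    buf.clear();
    // No reallocation happens as long as `batch.len()` fits in the existing capacity.
    buf.extend_from_slice(batch);
}
```

After the first full-size batch, later batches of equal or smaller size write into the same allocation, which is what keeps a per-batch filter loop free of allocator calls.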

src/expression.rs

Lines changed: 33 additions & 1 deletion
@@ -70,6 +70,38 @@ pub enum ScalarFunction {

 // Lazy Evaluator

+pub fn evaluate_to_f64_lazy<'a>(expr: &Expression<'a>, doc: &LazyDocument) -> Option<f64> {
+    match expr {
+        Expression::FieldReference(parts, _) => {
+            let raw_root = RawJsonb::new(&doc.raw);
+            if let Ok(Some(doc_owned)) = raw_root.get_by_index(1) {
+                if let Some(field_bytes) = get_path_lazy(doc_owned, parts) {
+                    if let Ok(val) = jsonb_schema::from_slice(&field_bytes) {
+                        match val {
+                            jsonb_schema::Value::Number(n) => get_f64_from_number(&n),
+                            _ => None,
+                        }
+                    } else {
+                        None
+                    }
+                } else {
+                    None
+                }
+            } else {
+                None
+            }
+        }
+        Expression::Literal(val) => match val {
+            Value::Number(n) => get_f64_from_number(n),
+            _ => None,
+        },
+        _ => match evaluate_expression_lazy(expr, doc) {
+            Value::Number(n) => get_f64_from_number(&n),
+            _ => None,
+        },
+    }
+}
+
 pub fn evaluate_expression_lazy<'a>(expr: &Expression<'a>, doc: &LazyDocument) -> Value {
     match expr {
         Expression::FieldReference(parts, _) => {
@@ -214,7 +246,7 @@ pub fn evaluate_expression<'a>(expr: &Expression<'a>, doc: &Value) -> Value {
     }
 }

-fn get_f64_from_number(n: &Number) -> Option<f64> {
+pub fn get_f64_from_number(n: &Number) -> Option<f64> {
     match n {
         Number::Int64(i) => Some(*i as f64),
         Number::UInt64(u) => Some(*u as f64),
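
As a side note on the `FieldReference` arm of `evaluate_to_f64_lazy`: every failure branch returns `None`, so the nested `if let` ladder could be flattened with the `?` operator. The sketch below shows the same control flow on stand-in helpers; `root`, `lookup`, `decode`, `Decoded`, and the `k=v;k=v` text format are all hypothetical and only mimic the return shapes of `get_by_index`, `get_path_lazy`, and `jsonb_schema::from_slice` as they appear in the diff.

```rust
/// Stand-in for a decoded JSONB value.
pub enum Decoded {
    Number(f64),
    Text(String),
}

/// Stand-in for `RawJsonb::get_by_index`: `Ok(None)` when the document is empty.
fn root(doc: &str) -> Result<Option<&str>, String> {
    Ok(if doc.is_empty() { None } else { Some(doc) })
}

/// Stand-in for `get_path_lazy`: finds `key` in a `k=v;k=v` string.
fn lookup<'d>(doc: &'d str, key: &str) -> Option<&'d str> {
    doc.split(';')
        .filter_map(|pair| pair.split_once('='))
        .find(|(k, _)| *k == key)
        .map(|(_, v)| v)
}

/// Stand-in for `jsonb_schema::from_slice`.
fn decode(raw: &str) -> Result<Decoded, String> {
    if raw.is_empty() {
        return Err("empty value".to_string());
    }
    Ok(match raw.parse::<f64>() {
        Ok(n) => Decoded::Number(n),
        Err(_) => Decoded::Text(raw.to_string()),
    })
}

/// Same shape as the `FieldReference` arm, written with `?` instead of nested `if let`.
pub fn field_to_f64(doc: &str, key: &str) -> Option<f64> {
    let doc_root = root(doc).ok()??; // `Err(_)` or `Ok(None)` both become `None`
    let raw = lookup(doc_root, key)?; // missing field becomes `None`
    match decode(raw).ok()? { // decode failure becomes `None`
        Decoded::Number(n) => Some(n),
        Decoded::Text(_) => None, // non-numeric values are rejected
    }
}
```

The behavior is unchanged: each early return maps to one of the `else { None }` branches in the committed version.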
