feat: optimize CASE WHEN for divide-by-zero protection pattern#19994
feat: optimize CASE WHEN for divide-by-zero protection pattern#19994CuteChuanChuan wants to merge 6 commits intoapache:mainfrom
Conversation
267d8df to
334785f
Compare
|
run benchmark tpcds |
|
🤖 |
|
🤖: Benchmark completed Details
|
|
Hi @andygrove , |
|
run benchmark tpcds |
|
Also FYI @pepijnve |
|
🤖 |
|
It might be useful to add a microbenchmark to https://github.com/apache/datafusion/blob/main/datafusion/physical-expr/benches/case_when.rs so we can compare before/after. I don't think any of the existing ones will cover this particular pattern. The TCP-DS queries that contain this pattern (based on |
|
🤖: Benchmark completed Details
|
- Adds a specialization for the common pattern: CASE WHEN y > 0 THEN x / y ELSE NULL END - Add EvalMethod::DivideByZeroProtection variant - Add pattern detection in find_best_eval_method() - Implement safe_divide using Arrow kernels - Handle CastExpr wrapping on divisor
334785f to
6479884
Compare
|
Microbenchmark result:
|
|
Hi @pepijnve , thanks for pointing out the need for a microbenchmark. I compared |
| } | ||
| } | ||
|
|
||
| fn benchmark_divide_by_zero_protection(c: &mut Criterion, batch_size: usize) { |
There was a problem hiding this comment.
Could you please pull this additional benchmark code code into its own PR we could then use our benchmarking scripts to compare performance of this PR with main?
Thank you 🙏
| }, | ||
| ); | ||
|
|
||
| group.bench_function( |
There was a problem hiding this comment.
I'm trying to spot the difference between this benchmark and the one just above it, they look identical to me. Yet the measurements seem to produce similar values. What causes that?
There was a problem hiding this comment.
Hi @pepijnve ,
The key difference is in which column is used in the WHEN condition:
- First benchmark (
DivideByZeroProtection): checks divisor_col > 0, and since divisor_col is also the divisor in numerator / divisor_col, this matches the pattern and triggers the optimization. - Second benchmark (
ExpressionOrExpression): checks divisor_copy_col > 0, but the division uses divisor_col. Since the checked column doesn't match the divisor, the optimization is not triggered and it falls back to ExpressionOrExpression.
I'll also add a comment in the code to make this distinction clearer.
There was a problem hiding this comment.
Ah of course. That's really subtle. A comment for future readers is indeed a good idea.
There was a problem hiding this comment.
A followup question I had, but I'm happy to measure that myself, is what's causing the performance gap? Might be interesting to see if there's something we could do to improve ExprOrExpr for this particular pattern.
There was a problem hiding this comment.
Hi @pepijnve,
I think the performance gap might come from execution model differences. Please correct me if I am wrong.
ExprOrExpr: evaluate condition → build selection → execute on selected rows → merge
DivideByZeroProtection: fully vectorized Arrow kernels
The vectorized path avoids selection/branching overhead, which is why it wins when most rows need computation (low zero density). However, ExprOrExpr's short-circuit helps when many rows are filtered (high zero density).
Although I don't have concrete ideas about potential ExprOrExpr improvements for this pattern yet, I'd be happy to explore further if you have any suggestions!
There was a problem hiding this comment.
I've been studying the results a bit. I'll add some extra comments on the relevant bits of code.
There was a problem hiding this comment.
There is another PR here from @pepijnve to improve some cases
|
Benchmark was added in #20076 |
## Which issue does this PR close? Related to #19994 - This PR extracts the benchmark code to allow performance comparison. ## Rationale for this change As pointed out by @alamb in #19994, this separates the microbenchmark code so that the benchmarking scripts can compare the optimization PR against main with the benchmark already in place. ## What changes are included in this PR? Adds a microbenchmark for the divide-by-zero protection pattern in `case_when.rs`: - Benchmarks with varying percentages of zeros (0%, 10%, 50%, 90%) - Compares `DivideByZeroProtection` pattern (where checked column matches divisor) vs `ExpressionOrExpression` fallback (where they don't match) ## Are these changes tested? benchmark code only. ## Are there any user-facing changes? No.
|
I merged up to get the new benchmarks so I could run them on our runner |
|
run benchmark case_when |
This comment was marked as outdated.
This comment was marked as outdated.
|
run benchmark case_when |
|
🤖 |
|
🤖: Benchmark completed Details
|
|
🤖 |
|
🤖: Benchmark completed Details
|
Interesting results. Significantly faster when there are no zeroes, but as the percentage increases the balance tips and 90% zeroes is actually quite a bit slower. |
| use arrow::compute::kernels::numeric::div; | ||
|
|
||
| let zero = ScalarValue::new_zero(divisor.data_type())?.to_scalar()?; | ||
| let zero_mask = eq(divisor, &zero)?; |
There was a problem hiding this comment.
I think you need to make sure you use the condition from the when expression here in order to get the correct result. I added these SLTs and they were not all passing.
query I
SELECT CASE WHEN d != 0 THEN n / d ELSE NULL END FROM (VALUES (1, 1), (1, 0), (1, -1)) t(n,d)
----
1
NULL
-1
query I
SELECT CASE WHEN d > 0 THEN n / d ELSE NULL END FROM (VALUES (1, 1), (1, 0), (1, -1)) t(n,d)
----
1
NULL
NULL
query I
SELECT CASE WHEN d < 0 THEN n / d ELSE NULL END FROM (VALUES (1, 1), (1, 0), (1, -1)) t(n,d)
----
NULL
NULL
-1
|
In the microbenchmark, I think it might be preferable to use I had a closer look at what was going on in what I thought was the The changes I made for Here the Here From 50% on it looks like the number of divisions starts to outweigh the filtering work. Copying half the data to then only perform half the number of divisions ends up being beneficial. That would also explain the 90% results since you're doing much fewer divisions then. |
@pepijnve |
|
run benchmark case_when |
|
🤖 |
|
running with updated benchmarks |
|
run benchmark case_when |
|
🤖: Benchmark completed Details
|
|
🤖 |
|
🤖: Benchmark completed Details
|
|
Divide by zero protection is looking pretty good: For some reason the CI is now failing on this PR When #20097 is merged, then we can run the benchmarks again and see how much additional boost this special case gives us |
That's due to #19994 (comment) |
…0097) ## Which issue does this PR close? - Related to #11570 ## Rationale for this change While reviewing #19994 it became clear the optimised `ExpressionOrExpression` code path was not being used when the case expression has no `else` branch or has `else null`. In those situations the general evaluation strategies could end up being used. This PR refines the `ExpressionOrExpression` implementation to also handle `else null` expressions. ## What changes are included in this PR? Use `ExpressionOrExpression` for expressions of the form `CASE WHEN x THEN y [ELSE NULL]` ## Are these changes tested? Covered by existing SLTs ## Are there any user-facing changes? No
4319634 to
0a8f085
Compare
|
run benchmark case_when |
This comment was marked as outdated.
This comment was marked as outdated.
|
🤖 |
|
🤖: Benchmark completed Details
|
|
It seems that effect of Divide by zero protection is not significant.🤣 |
Which issue does this PR close?
Rationale for this change
The
CaseExprimplementation is expensive. A common usage pattern (particularly in TPC-DS benchmarks) is to protect against divide-by-zero:This entire expression can be replaced with a simpler divide operation that returns NULL when the divisor is zero, avoiding the overhead of full CASE evaluation.
What changes are included in this PR?
Are these changes tested?
Yes, added two new tests:
Are there any user-facing changes?
No. This is an internal optimization that produces the same results but with better performance.