fix: incorrect results when using NOT physical expression in physical… #20138

evangelisilva · 2026-02-03T17:03:58Z

Which issue does this PR close?

Closes #19264.

Rationale for this change

The analyze function in physical-expr was incorrectly returning None (implying an empty set/infeasibility) when analyzing NOT(Eq) expressions (e.g., NOT(a = 0.0)).
This was caused by propagate_comparison in cp_solver.rs returning Ok(None) for Operator::Eq when the parent interval is FALSE. Ok(None) in cp_solver is interpreted as "Infeasible", but NOT(a=b) is a feasible condition (it effectively creates two disjoint intervals, which the current Interval implementation cannot represent as a single interval, but it is certainly not empty).

What changes are included in this PR?

Modified propagate_comparison in cp_solver.rs. When handling Operator::Eq with a FALSE parent (meaning we are looking for "Not Equal"), it now returns the original intervals (identity) instead of None. This treats the constraint as valid/feasible but properly reflects that we cannot refine the bounds further with single-interval arithmetic.
Added a new regression test analyze_not_eq_around in datafusion/physical-expr/src/analysis.rs to verify that NOT(a = 0.0) now returns a valid interval encompassing the domain instead of None.

Are these changes tested?

Yes, tested locally.

Run command: cargo test -p datafusion-physical-expr --lib analysis::tests
Verification: The new test analyze_not_eq_around passes, and existing tests in analysis.rs also pass.

Are there any user-facing changes?

No, this is a bug fix in the physical expression analysis logic.

…_expr::analyze Fixes apache#19264

alamb

Thank you @evangelisilva

@pepijnve or @berkaysynnada -- I think you might be familiar with this code. Do you have time to help review this PR?

pepijnve · 2026-02-03T21:31:50Z

@alamb I'm not super familiar with this code, but looking at the review I think I can see a counter example that produces an incorrect response with this change. I will try to write that to confirm my hypothesis.

pepijnve · 2026-02-03T21:40:08Z

datafusion/physical-expr/src/intervals/cp_solver.rs

            Operator::Eq => {
                // TODO: Propagation is not possible until we support interval sets.
-                Ok(None)
+                Ok(Some((left_child.clone(), right_child.clone())))


I think this is correct, but only if the left and right child intervals do not intersect. So maybe this should be

match left_child.intersect(right_child)? { None => Ok(Some((left_child.clone(), right_child.clone()))), // TODO: Propagation is not possible until we support interval sets. Some(_) => Ok(None) }

instead?

In the test case

#[test] fn test_propagate_eq_false() -> Result<()> { let left = Interval::make(Some(-10_i64), Some(10_i64))?; let right = Interval::make(Some(0_i64), Some(0_i64))?; let outcome = propagate_comparison(&Operator::Eq, &Interval::FALSE, &left, &right)?; assert_eq!( None, outcome ); Ok(()) }

for instance, the result for the left child should be the intervals [-10, 0[ and ]0, 10], but as the TODO suggests that can't currently be represented. The proposed fix would return [-10, 10] for the left child, but that is not correct since a value of 0 for the left child would result in true, not false.

Based on @berkaysynnada's answer I see I misinterpreted the contract of propagate_comparison. I had assumed it needed to provide a correct stricter constraint if it could or None if it couldn't. If not restricting the child constraints is valid, which makes perfect sense, then indeed there's just the situation where we can know for sure that (left == right) == true, i.e. two equal point intervals, that we should return None. In all other cases we can simply retain the existing constraints.

berkaysynnada · 2026-02-03T22:18:59Z

The fix seems correct to me. To elaborate, assume we have 2 child intervals; A and B. Propagation of TRUE over Eq is done like that:

We prune A such: "choose a point in A. If that point cannot be satisfied by any point from B, then that point is pruned."
and for B, similarly: "choose a point in B. If that point cannot be satisfied by any point from A, then that point is pruned."

So, what I think now is, we don't actually require interval sets for this problem, only except this case:

A is a set which strictly contains B, and B is a single-point value. In that case, we need to split the interval A into 2 parts at the intersection point (or vice versa for B and A). However, leaving the fix as is is not a big problem. We will only not be capable of a tiny optimization.

But to not introduce another bug, the only case which should keep as infeasible is, having 2 same single-point values. In that case, having an equality as false is impossible.

cc @ozankabak if I missed anything

evangelisilva · 2026-02-03T22:25:10Z

@pepijnve @berkaysynnada Thanks for the review! I agree, that makes sense.

I will update the PR to handle the specific case where both sides are single points (constants). In that case, if left == right, we know Not(Eq) is impossible, so we can correctly return Infeasible (None).

For other interval cases, I'll keep the safe over-approximation (returning the original intervals) until interval sets are supported.

fix: incorrect results when using NOT physical expression in physical…

93d2a55

…_expr::analyze Fixes apache#19264

github-actions bot added the physical-expr Changes to the physical-expr crates label Feb 3, 2026

Merge branch 'main' into fix/19264-not-expr-analysis

f5fe792

alamb reviewed Feb 3, 2026

View reviewed changes

pepijnve reviewed Feb 3, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: incorrect results when using NOT physical expression in physical… #20138

fix: incorrect results when using NOT physical expression in physical… #20138

Uh oh!

evangelisilva commented Feb 3, 2026

Uh oh!

alamb left a comment

Uh oh!

pepijnve commented Feb 3, 2026

Uh oh!

pepijnve Feb 3, 2026

Uh oh!

pepijnve Feb 4, 2026

Uh oh!

berkaysynnada commented Feb 3, 2026 •

edited

Loading

Uh oh!

evangelisilva commented Feb 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

fix: incorrect results when using NOT physical expression in physical… #20138

Are you sure you want to change the base?

fix: incorrect results when using NOT physical expression in physical… #20138

Uh oh!

Conversation

evangelisilva commented Feb 3, 2026

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

Uh oh!

alamb left a comment

Choose a reason for hiding this comment

Uh oh!

pepijnve commented Feb 3, 2026

Uh oh!

pepijnve Feb 3, 2026

Choose a reason for hiding this comment

Uh oh!

pepijnve Feb 4, 2026

Choose a reason for hiding this comment

Uh oh!

berkaysynnada commented Feb 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

evangelisilva commented Feb 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

berkaysynnada commented Feb 3, 2026 •

edited

Loading