Skip to content

Conversation

@evangelisilva
Copy link

Which issue does this PR close?

Closes #19264.

Rationale for this change

The analyze function in physical-expr was incorrectly returning None (implying an empty set/infeasibility) when analyzing NOT(Eq) expressions (e.g., NOT(a = 0.0)).
This was caused by propagate_comparison in cp_solver.rs returning Ok(None) for Operator::Eq when the parent interval is FALSE. Ok(None) in cp_solver is interpreted as "Infeasible", but NOT(a=b) is a feasible condition (it effectively creates two disjoint intervals, which the current Interval implementation cannot represent as a single interval, but it is certainly not empty).

What changes are included in this PR?

  • Modified propagate_comparison in cp_solver.rs. When handling Operator::Eq with a FALSE parent (meaning we are looking for "Not Equal"), it now returns the original intervals (identity) instead of None. This treats the constraint as valid/feasible but properly reflects that we cannot refine the bounds further with single-interval arithmetic.
  • Added a new regression test analyze_not_eq_around in datafusion/physical-expr/src/analysis.rs to verify that NOT(a = 0.0) now returns a valid interval encompassing the domain instead of None.

Are these changes tested?

Yes, tested locally.

  • Run command: cargo test -p datafusion-physical-expr --lib analysis::tests
  • Verification: The new test analyze_not_eq_around passes, and existing tests in analysis.rs also pass.

Are there any user-facing changes?

No, this is a bug fix in the physical expression analysis logic.

@github-actions github-actions bot added the physical-expr Changes to the physical-expr crates label Feb 3, 2026
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @evangelisilva

@pepijnve or @berkaysynnada -- I think you might be familiar with this code. Do you have time to help review this PR?

@pepijnve
Copy link
Contributor

pepijnve commented Feb 3, 2026

@alamb I'm not super familiar with this code, but looking at the review I think I can see a counter example that produces an incorrect response with this change. I will try to write that to confirm my hypothesis.

Operator::Eq => {
// TODO: Propagation is not possible until we support interval sets.
Ok(None)
Ok(Some((left_child.clone(), right_child.clone())))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is correct, but only if the left and right child intervals do not intersect. So maybe this should be

match left_child.intersect(right_child)? {
    None => Ok(Some((left_child.clone(), right_child.clone()))),
    // TODO: Propagation is not possible until we support interval sets.
    Some(_) => Ok(None)
}

instead?

In the test case

#[test]
fn test_propagate_eq_false() -> Result<()> {
    let left = Interval::make(Some(-10_i64), Some(10_i64))?;
    let right = Interval::make(Some(0_i64), Some(0_i64))?;
    let outcome = propagate_comparison(&Operator::Eq, &Interval::FALSE, &left, &right)?;
    assert_eq!(
        None,
        outcome
    );
    Ok(())
}

for instance, the result for the left child should be the intervals [-10, 0[ and ]0, 10], but as the TODO suggests that can't currently be represented. The proposed fix would return [-10, 10] for the left child, but that is not correct since a value of 0 for the left child would result in true, not false.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on @berkaysynnada's answer I see I misinterpreted the contract of propagate_comparison. I had assumed it needed to provide a correct stricter constraint if it could or None if it couldn't. If not restricting the child constraints is valid, which makes perfect sense, then indeed there's just the situation where we can know for sure that (left == right) == true, i.e. two equal point intervals, that we should return None. In all other cases we can simply retain the existing constraints.

@berkaysynnada
Copy link
Contributor

berkaysynnada commented Feb 3, 2026

The fix seems correct to me. To elaborate, assume we have 2 child intervals; A and B. Propagation of TRUE over Eq is done like that:

We prune A such: "choose a point in A. If that point cannot be satisfied by any point from B, then that point is pruned."
and for B, similarly: "choose a point in B. If that point cannot be satisfied by any point from A, then that point is pruned."

So, what I think now is, we don't actually require interval sets for this problem, only except this case:

A is a set which strictly contains B, and B is a single-point value. In that case, we need to split the interval A into 2 parts at the intersection point (or vice versa for B and A). However, leaving the fix as is is not a big problem. We will only not be capable of a tiny optimization.

But to not introduce another bug, the only case which should keep as infeasible is, having 2 same single-point values. In that case, having an equality as false is impossible.

cc @ozankabak if I missed anything

@evangelisilva
Copy link
Author

@pepijnve @berkaysynnada Thanks for the review! I agree, that makes sense.

I will update the PR to handle the specific case where both sides are single points (constants). In that case, if left == right, we know Not(Eq) is impossible, so we can correctly return Infeasible (None).

For other interval cases, I'll keep the safe over-approximation (returning the original intervals) until interval sets are supported.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

physical-expr Changes to the physical-expr crates

Projects

None yet

Development

Successfully merging this pull request may close these issues.

incorrect results when using NOT physical expression in physical_expr::analyze

4 participants