-
Notifications
You must be signed in to change notification settings - Fork 1.9k
fix: incorrect results when using NOT physical expression in physical… #20138
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
fix: incorrect results when using NOT physical expression in physical… #20138
Conversation
…_expr::analyze Fixes apache#19264
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @evangelisilva
@pepijnve or @berkaysynnada -- I think you might be familiar with this code. Do you have time to help review this PR?
|
@alamb I'm not super familiar with this code, but looking at the review I think I can see a counter example that produces an incorrect response with this change. I will try to write that to confirm my hypothesis. |
| Operator::Eq => { | ||
| // TODO: Propagation is not possible until we support interval sets. | ||
| Ok(None) | ||
| Ok(Some((left_child.clone(), right_child.clone()))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is correct, but only if the left and right child intervals do not intersect. So maybe this should be
match left_child.intersect(right_child)? {
None => Ok(Some((left_child.clone(), right_child.clone()))),
// TODO: Propagation is not possible until we support interval sets.
Some(_) => Ok(None)
}
instead?
In the test case
#[test]
fn test_propagate_eq_false() -> Result<()> {
let left = Interval::make(Some(-10_i64), Some(10_i64))?;
let right = Interval::make(Some(0_i64), Some(0_i64))?;
let outcome = propagate_comparison(&Operator::Eq, &Interval::FALSE, &left, &right)?;
assert_eq!(
None,
outcome
);
Ok(())
}
for instance, the result for the left child should be the intervals [-10, 0[ and ]0, 10], but as the TODO suggests that can't currently be represented. The proposed fix would return [-10, 10] for the left child, but that is not correct since a value of 0 for the left child would result in true, not false.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on @berkaysynnada's answer I see I misinterpreted the contract of propagate_comparison. I had assumed it needed to provide a correct stricter constraint if it could or None if it couldn't. If not restricting the child constraints is valid, which makes perfect sense, then indeed there's just the situation where we can know for sure that (left == right) == true, i.e. two equal point intervals, that we should return None. In all other cases we can simply retain the existing constraints.
|
The fix seems correct to me. To elaborate, assume we have 2 child intervals; A and B. Propagation of TRUE over Eq is done like that: We prune A such: "choose a point in A. If that point cannot be satisfied by any point from B, then that point is pruned." So, what I think now is, we don't actually require interval sets for this problem, only except this case: A is a set which strictly contains B, and B is a single-point value. In that case, we need to split the interval A into 2 parts at the intersection point (or vice versa for B and A). However, leaving the fix as is is not a big problem. We will only not be capable of a tiny optimization. But to not introduce another bug, the only case which should keep as infeasible is, having 2 same single-point values. In that case, having an equality as false is impossible. cc @ozankabak if I missed anything |
|
@pepijnve @berkaysynnada Thanks for the review! I agree, that makes sense. I will update the PR to handle the specific case where both sides are single points (constants). In that case, if For other interval cases, I'll keep the safe over-approximation (returning the original intervals) until interval sets are supported. |
Which issue does this PR close?
Closes #19264.
Rationale for this change
The
analyzefunction inphysical-exprwas incorrectly returningNone(implying an empty set/infeasibility) when analyzingNOT(Eq)expressions (e.g.,NOT(a = 0.0)).This was caused by
propagate_comparisonincp_solver.rsreturningOk(None)forOperator::Eqwhen the parent interval isFALSE.Ok(None)incp_solveris interpreted as "Infeasible", butNOT(a=b)is a feasible condition (it effectively creates two disjoint intervals, which the currentIntervalimplementation cannot represent as a single interval, but it is certainly not empty).What changes are included in this PR?
propagate_comparisonincp_solver.rs. When handlingOperator::Eqwith aFALSEparent (meaning we are looking for "Not Equal"), it now returns the original intervals (identity) instead ofNone. This treats the constraint as valid/feasible but properly reflects that we cannot refine the bounds further with single-interval arithmetic.analyze_not_eq_aroundindatafusion/physical-expr/src/analysis.rsto verify thatNOT(a = 0.0)now returns a valid interval encompassing the domain instead ofNone.Are these changes tested?
Yes, tested locally.
cargo test -p datafusion-physical-expr --lib analysis::testsanalyze_not_eq_aroundpasses, and existing tests inanalysis.rsalso pass.Are there any user-facing changes?
No, this is a bug fix in the physical expression analysis logic.