-
Notifications
You must be signed in to change notification settings - Fork 1.9k
feat: impl handle_child_pushdown_result for UnionExec
#20145
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| OptimizationTest: | ||
| input: | ||
| - FilterExec: a@0 = foo | ||
| - UnionExec | ||
| - DataSourceExec: file_groups={1 group: [[test.parquet]]}, projection=[a, b, c], file_type=test, pushdown_supported=false | ||
| - DataSourceExec: file_groups={1 group: [[test.parquet]]}, projection=[a, b, c], file_type=test, pushdown_supported=false | ||
| output: | ||
| Ok: | ||
| - UnionExec | ||
| - FilterExec: a@0 = foo | ||
| - DataSourceExec: file_groups={1 group: [[test.parquet]]}, projection=[a, b, c], file_type=test, pushdown_supported=false | ||
| - FilterExec: a@0 = foo | ||
| - DataSourceExec: file_groups={1 group: [[test.parquet]]}, projection=[a, b, c], file_type=test, pushdown_supported=false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think even for this case, pushdown filter do not have some bad effect.
| OptimizationTest: | ||
| input: | ||
| - FilterExec: a@0 = foo | ||
| - UnionExec | ||
| - DataSourceExec: file_groups={1 group: [[test.parquet]]}, projection=[a, b, c], file_type=test, pushdown_supported=true | ||
| - DataSourceExec: file_groups={1 group: [[test.parquet]]}, projection=[a, b, c], file_type=test, pushdown_supported=false | ||
| output: | ||
| Ok: | ||
| - UnionExec | ||
| - DataSourceExec: file_groups={1 group: [[test.parquet]]}, projection=[a, b, c], file_type=test, pushdown_supported=true, predicate=a@0 = foo | ||
| - FilterExec: a@0 = foo | ||
| - DataSourceExec: file_groups={1 group: [[test.parquet]]}, projection=[a, b, c], file_type=test, pushdown_supported=false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is main purpose for this pr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one really seems like it should be reproducible from an SLT test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let me try, maybe one memory and one parquet file can reproduce
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // For non-Pre phase, use default behavior | ||
| if !matches!(phase, FilterPushdownPhase::Pre) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does the Pre / Post phase need different behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm thinking the purpose for this pr can only happen in the pre phase, post phase is for dynamic filter, seem like not related, so i keep the default behavior
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. The intuition here is to let the creator of the filter decide what to do with it.
I don't like that it makes assumptions about the implementation / what the creators of the filter want to do, but I don't see a better way to handle this.
I don't think forcing creation of the FilterExec would be good at least as things currently stand.
But we should add a comment explaining this.
| return Ok(FilterPushdownPropagation::if_all(child_pushdown_result)); | ||
| } | ||
|
|
||
| // Collect unsupported filters for each child |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In principle we could make UnionExec transparent similar to CoalesceBatchesExec.
Help me understand why we are adding the FilterExec here.
My guess is that you are trying to address the case of Child2 supporting pushdown but Child1 not supporting it.
Without this specialized implementation we would get:
FilterExec
UnionExec
Child1
Child2
i.e. no changes to the plan, not incorrect but we are applying filters to the output of Child2 that are unnecessary (it already applied these filters)
With this logic we get:
UnionExec
FilterExec
Child1
Child2
Which skips re-applying filters to the output of Child2.
Is this interpretation correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, thank you for such good explain👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add something along these lines as a comment justifying the added complexity?
|
Thanks for your reviews @adriangb , i add the slt test case for reproduce. |
Which issue does this PR close?
Closes #20144
Rationale for this change
see #20144
What changes are included in this PR?
This PR impl
handle_child_pushdown_resultforUnionExec, for any case, the filter will always pushdown to UnionExecAre these changes tested?
yes, add two test cases
Are there any user-facing changes?