@haohuaijin
Contributor

Which issue does this PR close?

Closes #20144

Rationale for this change

see #20144

What changes are included in this PR?

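This PR implements handle_child_pushdown_result for UnionExec. In every case, the filter is now pushed down through UnionExec: children that support pushdown absorb it, and a FilterExec is inserted above each child that does not.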
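A minimal sketch of the per-child bookkeeping this implies, written as a self-contained toy model rather than DataFusion's actual API (the ChildResult enum and the unsupported_per_child function are hypothetical). Given each child's per-filter pushdown result, it collects the filters that a child could not absorb, i.e. the ones that need a FilterExec placed on top of that child.

```rust
/// Toy per-filter result reported by one child of the union (hypothetical,
/// not DataFusion's PushedDown type).
#[derive(Debug, Clone, Copy, PartialEq)]
enum ChildResult {
    Supported,
    Unsupported,
}

/// For each child, collect the filters it could NOT absorb.
/// `child_results[c][f]` is child c's result for filter f.
fn unsupported_per_child<'a>(
    filters: &[&'a str],
    child_results: &[Vec<ChildResult>],
) -> Vec<Vec<&'a str>> {
    child_results
        .iter()
        .map(|results| {
            filters
                .iter()
                .zip(results)
                .filter(|(_, r)| **r == ChildResult::Unsupported)
                .map(|(f, _)| *f)
                .collect()
        })
        .collect()
}

fn main() {
    let filters = ["a@0 = foo"];
    // Mixed case from the tests: child 0 supports pushdown, child 1 does not.
    let results = vec![vec![ChildResult::Supported], vec![ChildResult::Unsupported]];
    let per_child = unsupported_per_child(&filters, &results);
    println!("{:?}", per_child); // [[], ["a@0 = foo"]]
}
```

Only the second child ends up with a filter to wrap, which matches the shape of the expected plan in the mixed-support test.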

Are these changes tested?

Yes, two test cases were added.

Are there any user-facing changes?

@github-actions github-actions bot added core Core DataFusion crate physical-plan Changes to the physical-plan crate labels Feb 4, 2026
Comment on lines +1855 to +1867
OptimizationTest:
  input:
    - FilterExec: a@0 = foo
    -   UnionExec
    -     DataSourceExec: file_groups={1 group: [[test.parquet]]}, projection=[a, b, c], file_type=test, pushdown_supported=false
    -     DataSourceExec: file_groups={1 group: [[test.parquet]]}, projection=[a, b, c], file_type=test, pushdown_supported=false
  output:
    Ok:
      - UnionExec
      -   FilterExec: a@0 = foo
      -     DataSourceExec: file_groups={1 group: [[test.parquet]]}, projection=[a, b, c], file_type=test, pushdown_supported=false
      -   FilterExec: a@0 = foo
      -     DataSourceExec: file_groups={1 group: [[test.parquet]]}, projection=[a, b, c], file_type=test, pushdown_supported=false
Contributor Author

I think that even in this case, pushing the filter down has no negative effect.
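The reason there is no negative effect is that a filter distributes over a union: filtering the union of the inputs yields the same rows as the union of the filtered inputs. A small sketch, assuming UNION ALL semantics and modeling each child as the list of values of a single column a (union_all and filter_rows are hypothetical helpers):

```rust
/// UNION ALL modeled as concatenation of the children's rows.
fn union_all(children: &[Vec<&'static str>]) -> Vec<&'static str> {
    children.concat()
}

/// Keep only the rows for which the predicate holds, preserving order.
fn filter_rows(rows: &[&'static str], pred: impl Fn(&str) -> bool) -> Vec<&'static str> {
    rows.iter().copied().filter(|r| pred(*r)).collect()
}

fn main() {
    let child1 = vec!["foo", "bar"];
    let child2 = vec!["baz", "foo"];
    let is_foo = |a: &str| a == "foo";

    // FilterExec above UnionExec.
    let above = filter_rows(&union_all(&[child1.clone(), child2.clone()]), is_foo);
    // UnionExec above one FilterExec per child.
    let below = union_all(&[filter_rows(&child1, is_foo), filter_rows(&child2, is_foo)]);

    assert_eq!(above, below);
    println!("{:?}", below); // ["foo", "foo"]
}
```

Either way, each row is evaluated by the predicate exactly once; the pushed-down form just spreads that work across one filter per child.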

Comment on lines +1825 to +1836
OptimizationTest:
  input:
    - FilterExec: a@0 = foo
    -   UnionExec
    -     DataSourceExec: file_groups={1 group: [[test.parquet]]}, projection=[a, b, c], file_type=test, pushdown_supported=true
    -     DataSourceExec: file_groups={1 group: [[test.parquet]]}, projection=[a, b, c], file_type=test, pushdown_supported=false
  output:
    Ok:
      - UnionExec
      -   DataSourceExec: file_groups={1 group: [[test.parquet]]}, projection=[a, b, c], file_type=test, pushdown_supported=true, predicate=a@0 = foo
      -   FilterExec: a@0 = foo
      -     DataSourceExec: file_groups={1 group: [[test.parquet]]}, projection=[a, b, c], file_type=test, pushdown_supported=false
Contributor Author

@haohuaijin Feb 4, 2026

This is the main purpose of this PR.

Contributor

This one really seems like it should be reproducible with an SLT test.

Contributor Author

@haohuaijin Feb 4, 2026

Let me try. A union of one in-memory table and one Parquet file may reproduce it.

Comment on lines +386 to +387
// For non-Pre phase, use default behavior
if !matches!(phase, FilterPushdownPhase::Pre) {
Contributor

Why do the Pre and Post phases need different behavior?

Contributor Author

@haohuaijin Feb 4, 2026

I think the case this PR addresses can only occur in the Pre phase. The Post phase is for dynamic filters, which seem unrelated, so I kept the default behavior there.

Contributor

Agreed. The intuition here is to let the creator of the filter decide what to do with it.
I don't like that it makes assumptions about the implementation / what the creators of the filter want to do, but I don't see a better way to handle this.
I don't think forcing creation of the FilterExec would be good, at least as things currently stand.

But we should add a comment explaining this.
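A toy model of the phase gating discussed in this thread, assuming FilterPushdownPropagation::if_all reports a parent filter as supported only when every child supports it. Phase, Support, if_all and union_result below are hypothetical stand-ins, not DataFusion types.

```rust
/// Which pushdown pass is running (toy stand-in for FilterPushdownPhase).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Phase {
    Pre,
    Post,
}

/// Per-filter result reported to the parent (toy stand-in).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Support {
    Supported,
    Unsupported,
}

/// Default rule (assumed): a parent filter counts as supported only if every
/// child supports it. `child_results[c][f]` is child c's result for filter f.
fn if_all(child_results: &[Vec<Support>], n_filters: usize) -> Vec<Support> {
    (0..n_filters)
        .map(|f| {
            if child_results.iter().all(|c| c[f] == Support::Supported) {
                Support::Supported
            } else {
                Support::Unsupported
            }
        })
        .collect()
}

/// What the union reports to its parent in each phase.
fn union_result(phase: Phase, child_results: &[Vec<Support>], n_filters: usize) -> Vec<Support> {
    match phase {
        // Post phase (dynamic filters): keep the default so the creator of
        // the filter decides what to do with it; no FilterExec is forced.
        Phase::Post => if_all(child_results, n_filters),
        // Pre phase: unsupported filters get a per-child FilterExec, so the
        // union can report every filter as handled.
        Phase::Pre => vec![Support::Supported; n_filters],
    }
}

fn main() {
    // Mixed case: child 0 supports the single filter, child 1 does not.
    let results = vec![vec![Support::Supported], vec![Support::Unsupported]];
    println!("{:?}", union_result(Phase::Post, &results, 1)); // [Unsupported]
    println!("{:?}", union_result(Phase::Pre, &results, 1)); // [Supported]
}
```

In this model, the Post phase leaves the filter above the union whenever any child rejects it, which is exactly the "let the creator decide" behavior.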

return Ok(FilterPushdownPropagation::if_all(child_pushdown_result));
}

// Collect unsupported filters for each child
Contributor

In principle, we could make UnionExec transparent, similar to CoalesceBatchesExec.

Help me understand why we are adding the FilterExec here.
My guess is that you are trying to address the case of Child2 supporting pushdown but Child1 not supporting it.
Without this specialized implementation we would get:

FilterExec
  UnionExec
     Child1
     Child2

i.e. no changes to the plan. That is not incorrect, but we apply the filters to the output of Child2 unnecessarily (Child2 has already applied them).

With this logic we get:

UnionExec
  FilterExec
    Child1
  Child2

Which skips re-applying filters to the output of Child2.

Is this interpretation correct?
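The rewrite described above can be sketched on a toy plan tree. Plan, push_filter_through_union and render are hypothetical names used only for illustration; the real change operates on ExecutionPlan nodes via handle_child_pushdown_result. A child that supports pushdown absorbs the predicate, and any other child gets its own FilterExec.

```rust
/// Toy physical plan (hypothetical, not DataFusion's ExecutionPlan).
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Filter { pred: String, input: Box<Plan> },
    Union(Vec<Plan>),
    Source { name: String, pushdown: bool, predicate: Option<String> },
}

/// Rewrite Filter(Union(children)) into Union(children'): a source that
/// supports pushdown absorbs the predicate, any other child is wrapped in
/// its own Filter. Other plan shapes are returned unchanged.
fn push_filter_through_union(plan: Plan) -> Plan {
    match plan {
        Plan::Filter { pred, input } => match *input {
            Plan::Union(children) => Plan::Union(
                children
                    .into_iter()
                    .map(|child| match child {
                        Plan::Source { name, pushdown: true, predicate: None } => Plan::Source {
                            name,
                            pushdown: true,
                            predicate: Some(pred.clone()),
                        },
                        other => Plan::Filter { pred: pred.clone(), input: Box::new(other) },
                    })
                    .collect(),
            ),
            other => Plan::Filter { pred, input: Box::new(other) },
        },
        other => other,
    }
}

/// Render one node per line, indenting children like EXPLAIN output.
fn render(plan: &Plan, depth: usize, out: &mut String) {
    let indent = "  ".repeat(depth);
    match plan {
        Plan::Filter { pred, input } => {
            out.push_str(&format!("{indent}FilterExec: {pred}\n"));
            render(input, depth + 1, out);
        }
        Plan::Union(children) => {
            out.push_str(&format!("{indent}UnionExec\n"));
            for child in children {
                render(child, depth + 1, out);
            }
        }
        Plan::Source { name, predicate, .. } => match predicate {
            Some(p) => out.push_str(&format!("{indent}{name}, predicate={p}\n")),
            None => out.push_str(&format!("{indent}{name}\n")),
        },
    }
}

fn main() {
    let before = Plan::Filter {
        pred: "a@0 = foo".to_string(),
        input: Box::new(Plan::Union(vec![
            Plan::Source { name: "Child1".into(), pushdown: false, predicate: None },
            Plan::Source { name: "Child2".into(), pushdown: true, predicate: None },
        ])),
    };
    let mut text = String::new();
    render(&push_filter_through_union(before), 0, &mut text);
    print!("{text}");
    // UnionExec
    //   FilterExec: a@0 = foo
    //     Child1
    //   Child2, predicate=a@0 = foo
}
```

Child2 keeps the predicate it already applies and gets no extra FilterExec, which is the saving the reviewer describes.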

Contributor Author

Correct, thank you for such a good explanation 👍

Contributor

Can you add something along these lines as a comment justifying the added complexity?

@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Feb 4, 2026
@haohuaijin haohuaijin requested a review from adriangb February 4, 2026 15:51
@haohuaijin
Contributor Author

haohuaijin commented Feb 4, 2026

Thanks for your reviews, @adriangb. I added an SLT test case that reproduces the issue.


Labels

core Core DataFusion crate physical-plan Changes to the physical-plan crate sqllogictest SQL Logic Tests (.slt)

Development

Successfully merging this pull request may close these issues.

Pushdown filter through UnionExec when some child support pushdown and some child do not support
