Description
What would you like to happen?
This PR enabled column pruning support in Managed Iceberg IO.
As users, we can now use the "keep" / "drop" properties to define which columns to fetch from Iceberg.
Is it possible to specify a nested column in this property and fetch only that nested field from Iceberg, rather than the entire struct? If not, could this feature be enabled in Managed Iceberg IO in Apache Beam?
For example:
- keep: ["colA.colB", "colE.colC"]
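To pin down the semantics being requested, here is a minimal sketch. It assumes rows are modeled as plain nested Java maps, not Beam's Row or Iceberg's Record types. The class NestedKeepSketch and its keep method are hypothetical illustrations and are not part of Beam or Iceberg. Each dotted path keeps only the branch of the struct that leads to the named field. Sibling fields in that struct are dropped.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: not Beam or Iceberg code. Rows are modeled as
// nested maps, where a nested map stands in for a struct column.
public class NestedKeepSketch {

  // Returns a new map holding only the fields named by the dotted paths.
  public static Map<String, Object> keep(Map<String, Object> row, List<String> paths) {
    Map<String, Object> out = new LinkedHashMap<>();
    for (String path : paths) {
      copyPath(row, out, path.split("\\."), 0);
    }
    return out;
  }

  @SuppressWarnings("unchecked")
  private static void copyPath(
      Map<String, Object> src, Map<String, Object> dst, String[] parts, int i) {
    String name = parts[i];
    if (!src.containsKey(name)) {
      throw new IllegalArgumentException("unknown field: " + String.join(".", parts));
    }
    Object value = src.get(name);
    if (i == parts.length - 1) {
      // Last segment: keep the whole value (a scalar or an entire sub-struct).
      dst.put(name, value);
      return;
    }
    if (!(value instanceof Map)) {
      throw new IllegalArgumentException(
          "not a struct: " + String.join(".", Arrays.copyOf(parts, i + 1)));
    }
    if (dst.get(name) == value) {
      // A shorter path already kept this entire struct; nothing to narrow.
      return;
    }
    // Intermediate struct: create (or reuse) a pruned copy so that several
    // paths under the same parent, e.g. "a.b" and "a.c", are merged.
    Map<String, Object> child =
        (Map<String, Object>) dst.computeIfAbsent(name, k -> new LinkedHashMap<String, Object>());
    copyPath((Map<String, Object>) value, child, parts, i + 1);
  }

  public static void main(String[] args) {
    Map<String, Object> row = Map.of(
        "colA", Map.of("colB", 1, "colX", 2),
        "colE", Map.of("colC", "c", "colY", "y"),
        "colZ", 3);
    // Mirrors the requested config: keep: ["colA.colB", "colE.colC"]
    System.out.println(keep(row, List.of("colA.colB", "colE.colC")));
    // prints {colA={colB=1}, colE={colC=c}}
  }
}
```

The sketch descends only through struct-like maps. List and map column types would need their own rules for what a path segment means. A real implementation in Managed Iceberg IO would have to define those rules.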
When I tried a "keep" list containing a nested path, I got the following error:
26/02/03 11:34:23 ERROR IcebergIO: Error reading from Iceberg tables: Invalid source configuration: 'keep' specifies unknown field(s): [data.entry.field1.field2]
java.lang.IllegalArgumentException: Invalid source configuration: 'keep' specifies unknown field(s): [data.entry.field1.field2]
    at org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.base.Preconditions.checkArgument(Preconditions.java:143)
    at org.apache.beam.sdk.io.iceberg.IcebergScanConfig.validate(IcebergScanConfig.java:332)
    at org.apache.beam.sdk.io.iceberg.IcebergIO$ReadRows.expand(IcebergIO.java:608)
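The error text suggests that each "keep" entry is checked as a single top-level field name, so a dotted path can never match. The sketch below is hypothetical and is not IcebergScanConfig's actual code. It models a table schema as nested maps, and for illustration it assumes that data.entry.field1 is a plain struct. It contrasts a flat top-level check with a path-aware check that resolves each segment in turn.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Beam code: a schema is modeled as nested maps where a
// value is either a type name (a leaf field) or another map (a struct field).
public class KeepValidationSketch {

  // Flat check: every entry must be a top-level field name.
  public static List<String> unknownTopLevel(Map<String, Object> schema, List<String> keep) {
    List<String> unknown = new ArrayList<>();
    for (String name : keep) {
      if (!schema.containsKey(name)) {
        unknown.add(name);
      }
    }
    return unknown;
  }

  // Path-aware check: every entry is a dotted path walked through struct fields.
  @SuppressWarnings("unchecked")
  public static List<String> unknownNested(Map<String, Object> schema, List<String> keep) {
    List<String> unknown = new ArrayList<>();
    for (String path : keep) {
      Object current = schema;
      for (String part : path.split("\\.")) {
        if (current instanceof Map && ((Map<String, Object>) current).containsKey(part)) {
          current = ((Map<String, Object>) current).get(part);
        } else {
          // Segment missing, or the previous segment was not a struct.
          current = null;
          break;
        }
      }
      if (current == null) {
        unknown.add(path);
      }
    }
    return unknown;
  }

  public static void main(String[] args) {
    Map<String, Object> schema = Map.of(
        "id", "long",
        "data", Map.of("entry", Map.of("field1", Map.of("field2", "string", "field3", "int"))));
    List<String> keep = List.of("id", "data.entry.field1.field2");
    // prints "Flat:   unknown field(s): [data.entry.field1.field2]"
    System.out.println("Flat:   unknown field(s): " + unknownTopLevel(schema, keep));
    // prints "Nested: unknown field(s): []"
    System.out.println("Nested: unknown field(s): " + unknownNested(schema, keep));
  }
}
```

With a path-aware check, only paths whose segments fail to resolve are reported. The flat check rejects any dotted path outright.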
Issue Priority
Priority: 2 (default / most feature requests should be filed as P2)
Issue Components
- Component: Python SDK
- Component: Java SDK
- Component: Go SDK
- Component: Typescript SDK
- Component: IO connector
- Component: Beam YAML
- Component: Beam examples
- Component: Beam playground
- Component: Beam katas
- Component: Website
- Component: Infrastructure
- Component: Spark Runner
- Component: Flink Runner
- Component: Samza Runner
- Component: Twister2 Runner
- Component: Hazelcast Jet Runner
- Component: Google Cloud Dataflow Runner