[Feature Request]: Enable Nested Column support in Column Pruning in Iceberg IO #37486

@barunkumaracharya

What would you like to happen?

This PR enabled column pruning support in Managed Iceberg IO.
As users, we can now use the "keep" / "drop" properties to choose which columns to fetch from Iceberg.
Is it possible to specify a nested column in these properties, so that only the nested field is fetched from Iceberg rather than the entire struct? If not, can we enable that in Managed Iceberg IO in Apache Beam?

Something like this:

  • keep: ["colA.colB", "colE.colC"]
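To make the request concrete, here is a rough sketch of the two configurations as plain Python dicts. The "table" key, the table name "db.events", and the column names are placeholders assumed for illustration. The nested form is the behaviour being requested, not something the connector is known to accept today.

```python
# Sketch only: configs written as plain dicts, not tied to a pipeline.
# "db.events" and every column name below are hypothetical placeholders.

# Top-level pruning, as described above: keep whole columns.
top_level_config = {
    "table": "db.events",
    "keep": ["colA", "colE"],
}

# Requested: dotted paths that select a single field inside a struct.
nested_config = {
    "table": "db.events",
    "keep": ["colA.colB", "colE.colC"],
}


def top_level_names(paths):
    """Return the sorted top-level column names that the dotted paths start from."""
    return sorted({path.split(".", 1)[0] for path in paths})
```

With top-level pruning, the closest available option is to keep the parent structs (`colA`, `colE`) and read every field inside them.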

When I tried the above with a nested path, I got the following error:
26/02/03 11:34:23 ERROR IcebergIO: Error reading from Iceberg tables: Invalid source configuration: 'keep' specifies unknown field(s): [data.entry.field1.field2]
java.lang.IllegalArgumentException: Invalid source configuration: 'keep' specifies unknown field(s): [data.entry.field1.field2]
    at org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.base.Preconditions.checkArgument(Preconditions.java:143)
    at org.apache.beam.sdk.io.iceberg.IcebergScanConfig.validate(IcebergScanConfig.java:332)
    at org.apache.beam.sdk.io.iceberg.IcebergIO$ReadRows.expand(IcebergIO.java:608)
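The message suggests that entries in 'keep' are matched only against top-level column names, although I have not confirmed this in the source. A minimal sketch of that kind of check, next to a nested-aware alternative, might look like the following. It is plain Python over a dict-based stand-in for a schema, not the actual `IcebergScanConfig.validate` code, and all function names and the schema itself are hypothetical.

```python
# Hypothetical schema: nested dicts stand in for struct types,
# strings stand in for primitive types.
SCHEMA = {
    "id": "long",
    "data": {"entry": {"field1": {"field2": "string", "other": "int"}}},
}


def unknown_top_level(schema, keep):
    """Entries in `keep` that are not top-level column names."""
    return [name for name in keep if name not in schema]


def resolve_path(schema, dotted):
    """Walk a dotted path through nested structs; return the node found, or None."""
    node = schema
    for part in dotted.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def unknown_nested(schema, keep):
    """Entries in `keep` that cannot be resolved as a dotted path."""
    return [name for name in keep if resolve_path(schema, name) is None]
```

Under the top-level check, `data.entry.field1.field2` is reported as unknown even though the path exists in the schema. The nested-aware check accepts it and still rejects paths that do not resolve.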

Issue Priority

Priority: 2 (default / most feature requests should be filed as P2)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam YAML
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Infrastructure
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
