Avoid always adding columns as optional on schema evolution by kumarpritam863 · Pull Request #15205 · apache/iceberg

kumarpritam863 · 2026-02-01T10:59:22Z

Problem

When performing schema evolution (adding a new field), the code currently calls addColumn() → addInternalColumn(), which always marks the new column as OPTIONAL (i.e. nullable = true), regardless of what the Connect schema says.

This behaviour is inconsistent with table creation:

- On initial table creation, we correctly respect schema.isOptional() (and the related configuration) to decide whether a column should be REQUIRED or OPTIONAL.
- On schema evolution, we ignore that and force the new column to be OPTIONAL.

As a result:

- If you create the table directly with schema v2 (which already contains the new field), the column is created with the correct nullability.
- If you create the table with v1 and later evolve to v2, the same column is added as OPTIONAL, so the two tables end up with different physical schemas.
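The divergence described above can be reproduced with a small self-contained sketch. It models only the behaviour stated in this description. Every name in it (`NullabilityDriftSketch`, `createTable`, `evolveBeforeFix`, the `id` and `email` fields) is hypothetical and is not taken from the Iceberg or Kafka Connect code, and the configuration mentioned above is left out for brevity.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-ins for illustration; these are not Iceberg or Kafka Connect types.
class NullabilityDriftSketch {
  enum Nullability { REQUIRED, OPTIONAL }

  // Creation path: each column follows the source field's optional flag.
  static Map<String, Nullability> createTable(Map<String, Boolean> sourceFields) {
    Map<String, Nullability> columns = new LinkedHashMap<>();
    sourceFields.forEach((name, optional) ->
        columns.put(name, optional ? Nullability.OPTIONAL : Nullability.REQUIRED));
    return columns;
  }

  // Evolution path as described in the Problem section:
  // any column missing from the table is added as OPTIONAL.
  static void evolveBeforeFix(
      Map<String, Nullability> columns, Map<String, Boolean> sourceFields) {
    sourceFields.keySet().forEach(name -> columns.putIfAbsent(name, Nullability.OPTIONAL));
  }

  // Source schema v1: field name -> isOptional.
  static Map<String, Boolean> v1() {
    Map<String, Boolean> fields = new LinkedHashMap<>();
    fields.put("id", false);
    return fields;
  }

  // Source schema v2 adds a required "email" field.
  static Map<String, Boolean> v2() {
    Map<String, Boolean> fields = v1();
    fields.put("email", false);
    return fields;
  }

  public static void main(String[] args) {
    Map<String, Nullability> direct = createTable(v2());
    Map<String, Nullability> evolved = createTable(v1());
    evolveBeforeFix(evolved, v2());
    System.out.println("created with v2:  " + direct);
    System.out.println("v1 evolved to v2: " + evolved);
  }
}
```

In this model the two tables differ only in the nullability of `email`, which is the mismatch the PR targets.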

Solution

This PR aligns schema-evolution nullability handling with the logic used during initial table creation.

Both cases now use the same code path, which respects ConnectSchema.isOptional() and the configured default for required fields. This eliminates the dual behaviour and guarantees that the resulting Parquet files have identical column nullability, whether the field was present at creation time or added later via schema evolution.
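A minimal sketch of the idea, assuming a single decision function shared by creation and evolution. The names `SharedNullabilitySketch`, `resolveOptional`, and `applySchema` are hypothetical, and the `forceOptional` flag is only an assumed stand-in for the configured default mentioned above; the real connector's configuration may work differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; not the connector's actual classes or configuration keys.
class SharedNullabilitySketch {
  // Single decision point for nullability.
  // forceOptional is an assumed stand-in for the configured default.
  static boolean resolveOptional(boolean sourceOptional, boolean forceOptional) {
    return forceOptional || sourceOptional;
  }

  // Handles both creation (empty table) and evolution (existing table):
  // only columns missing from the table are added, all through resolveOptional.
  // Maps are column name -> isOptional.
  static Map<String, Boolean> applySchema(
      Map<String, Boolean> tableColumns,
      Map<String, Boolean> sourceFields,
      boolean forceOptional) {
    Map<String, Boolean> result = new LinkedHashMap<>(tableColumns);
    sourceFields.forEach((name, optional) ->
        result.putIfAbsent(name, resolveOptional(optional, forceOptional)));
    return result;
  }
}
```

Because creation is just evolution from an empty table in this model, creating with v2 directly and creating with v1 then evolving to v2 produce the same column map.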

Result

- Consistent Parquet schema across creation and evolution paths
- New columns respect the nullability declared in the Connect schema
- No more surprise OPTIONAL columns when the source schema says the field is required

Copilot

Pull request overview

This PR fixes an inconsistency in schema evolution where new columns were always added as optional, regardless of the Connect schema's nullability specification. The fix ensures that column nullability is handled consistently whether fields are present at table creation or added later via schema evolution.

Changes:

- Modified SchemaUpdate.AddColumn to track nullability via a new isOptional boolean field
- Updated SchemaUtils.commitSchemaUpdates() to call either addColumn() or addRequiredColumn() based on the field's optionality
- Added comprehensive test coverage for optional and required column handling in both flat and nested structures
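The first two changes can be pictured with a small sketch: a pending add-column record that carries an `isOptional` flag, and a commit step that dispatches on it. The `ColumnAdder` interface is a hypothetical stand-in that only mirrors the shape of the `addColumn()` and `addRequiredColumn()` calls named above. The `AddColumn` class here is a simplified assumption, not the PR's `SchemaUpdate.AddColumn`, and types are plain strings for brevity.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the dispatch; not the code from this PR.
class CommitDispatchSketch {
  // Assumed subset of a schema-update API with separate optional/required add methods.
  interface ColumnAdder {
    void addColumn(String parent, String name, String type);
    void addRequiredColumn(String parent, String name, String type);
  }

  // Pending change that remembers the nullability of the new column.
  static final class AddColumn {
    final String parent; // null for a top-level column
    final String name;
    final String type;
    final boolean isOptional;

    AddColumn(String parent, String name, String type, boolean isOptional) {
      this.parent = parent;
      this.name = name;
      this.type = type;
      this.isOptional = isOptional;
    }
  }

  // Applies each pending change through the method that matches its optionality.
  static void commit(List<AddColumn> pending, ColumnAdder adder) {
    for (AddColumn change : pending) {
      if (change.isOptional) {
        adder.addColumn(change.parent, change.name, change.type);
      } else {
        adder.addRequiredColumn(change.parent, change.name, change.type);
      }
    }
  }

  // Test double that records which method was called for which column.
  static final class RecordingAdder implements ColumnAdder {
    final List<String> calls = new ArrayList<>();

    private static String fullName(String parent, String name) {
      return parent == null ? name : parent + "." + name;
    }

    @Override
    public void addColumn(String parent, String name, String type) {
      calls.add("addColumn:" + fullName(parent, name));
    }

    @Override
    public void addRequiredColumn(String parent, String name, String type) {
      calls.add("addRequiredColumn:" + fullName(parent, name));
    }
  }
}
```

One point worth verifying against the real API: Iceberg's `UpdateSchema` treats adding a required column without a default as an incompatible change, so the actual commit path may also need `allowIncompatibleChanges()`.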

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| SchemaUpdate.java | Added `isOptional` field and parameter to `AddColumn` class to track nullability |
| SchemaUtils.java | Modified commit logic to respect column optionality when applying schema updates |
| RecordConverter.java | Updated `addColumn()` calls to pass nullability information based on Connect schema |
| TestSchemaUpdate.java | Added test coverage for required column handling |
| TestSchemaUtils.java | Updated existing tests and added new tests for required column scenarios |
| TestRecordConverter.java | Added comprehensive tests for various column optionality scenarios |

Pritam Kumar Mishra and others added 13 commits August 9, 2025 11:27

added metadat and data path in case of dynamic routing

c0a2665

spotless

67619ec

Revert "spotless"

6b15ae4

This reverts commit 67619ec.

Revert "added metadat and data path in case of dynamic routing"

8398e4c

This reverts commit c0a2665.

Merge branch 'apache:main' into main

fbf52a9

Merge branch 'apache:main' into main

c92ec66

Merge branch 'apache:main' into main

9392a6d

Merge branch 'apache:main' into main

ecd8b55

Merge branch 'apache:main' into main

5e76e04

Merge branch 'apache:main' into main

a1ec7e6

Merge branch 'apache:main' into main

4eaf70b

Merge branch 'apache:main' into main

1508513

Add option to add required column

f081752

github-actions bot added the KAFKACONNECT label Feb 1, 2026

manuzhang requested a review from Copilot February 1, 2026 13:29

Copilot AI reviewed Feb 1, 2026

kumarpritam863 wants to merge 13 commits into apache:main from kumarpritam863:take_column_optional_value_from_schema
