Avoid always adding columns as optional on schema evolution#15205
Open
kumarpritam863 wants to merge 13 commits intoapache:mainfrom
Open
Avoid always adding columns as optional on schema evolution#15205kumarpritam863 wants to merge 13 commits intoapache:mainfrom
kumarpritam863 wants to merge 13 commits intoapache:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR fixes an inconsistency in schema evolution where new columns were always added as optional, regardless of the Connect schema's nullability specification. The fix ensures that column nullability is handled consistently whether fields are present at table creation or added later via schema evolution.
Changes:
- Modified
SchemaUpdate.AddColumnto track nullability via a newisOptionalboolean field - Updated
SchemaUtils.commitSchemaUpdates()to call eitheraddColumn()oraddRequiredColumn()based on the field's optionality - Added comprehensive test coverage for optional and required column handling in both flat and nested structures
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| SchemaUpdate.java | Added isOptional field and parameter to AddColumn class to track nullability |
| SchemaUtils.java | Modified commit logic to respect column optionality when applying schema updates |
| RecordConverter.java | Updated addColumn() calls to pass nullability information based on Connect schema |
| TestSchemaUpdate.java | Added test coverage for required column handling |
| TestSchemaUtils.java | Updated existing tests and added new tests for required column scenarios |
| TestRecordConverter.java | Added comprehensive tests for various column optionality scenarios |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
When performing schema evolution (adding a new field), the code currently calls
addColumn()→addInternalColumn(), which always marks the new column asOPTIONAL(i.e. nullable = true), regardless of what the Connect schema says.This behaviour is inconsistent with table creation:
schema.isOptional()(and the related configuration) to decide whether a column should beREQUIREDorOPTIONAL.OPTIONAL.As a result:
OPTIONAL→ different physical schema.Solution
This PR aligns schema-evolution nullability handling with the logic used during initial table creation.
Now the same code path (respecting
ConnectSchema.isOptional()and the configured default for required fields) is used in both cases, eliminating the dual behaviour and guaranteeing that the resulting Parquet files have identical column nullability no matter whether the field was present at creation time or added later via schema evolution.Result
OPTIONALcolumns when the source schema says the field is required