Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 5 additions & 0 deletions core/src/main/java/org/apache/iceberg/DataFiles.java
Original file line number Diff line number Diff line change
Expand Up @@ -321,6 +321,11 @@ public Builder withSortOrder(SortOrder newSortOrder) {
return this;
}

public Builder withSortOrderId(int newSortOrderId) {
this.sortOrderId = newSortOrderId;
return this;
Comment on lines +325 to +326
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need assertions if the withSortOrder is setting some different sort order id ?

worth adding tests for someone who accidentally use both the APIs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a similar situation with org.apache.iceberg.DataFiles.Builder#withRecordCount vs org.apache.iceberg.DataFiles.Builder#withMetrics. I kept it simple since we don't try to validate those either.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if a caller uses both withSortOrder and withSortOrderId and the IDs differ, could that mislead planning or optimizations that rely on sortOrderId?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The expectation is that with either API, the input should be the correct one. It would be a bug in the caller if either API were given the wrong input.
If there is still concern about this, then may be I can mark org.apache.iceberg.DataFiles.Builder#withSortOrder as deprecated to encourage callers to just use withSortOrderId ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need assertions if the withSortOrder is setting some different sort order id ?

worth adding tests for someone who accidentally use both the APIs

This is not different than calling withSortOrder twice. The second call wins. That's also how builders works typically.

}

public Builder withFirstRowId(Long nextRowId) {
this.firstRowId = nextRowId;
return this;
Expand Down
11 changes: 11 additions & 0 deletions core/src/test/java/org/apache/iceberg/ScanTestBase.java
Original file line number Diff line number Diff line change
Expand Up @@ -238,6 +238,17 @@ public void testDataFileSorted() throws Exception {
.build())
.commit();

table
.newFastAppend()
.appendFile(
DataFiles.builder(PartitionSpec.unpartitioned())
.withPath("/path/to/data/b.parquet")
.withFileSizeInBytes(10)
.withRecordCount(1)
.withSortOrderId(1)
.build())
.commit();

TableScan scan = table.newScan();
try (CloseableIterable<FileScanTask> tasks = scan.planFiles()) {
for (FileScanTask fileScanTask : tasks) {
Expand Down