Skip to content

Conversation

@ianton-ru
Copy link

@ianton-ru ianton-ru commented Feb 3, 2026

Changelog category (leave one):

  • Backward Incompatible Change

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Remove non-upstream protocol versions

Documentation entry for user-facing changes

Solved #1236

Read optimization using Iceberg metadata was introduced in #1019. This optimization required to send metadata with columns information on replicas for cluster requests.

In first iteration in Antalya 25.6 this was made by change file path on JSON with file path and additional metadata.

In Antalya 25.8 upstream adds protocol version for cluster functions to give ability to send additional information on replicas. This was used in frontport #1069, new version with column metadata was added, and metadata are sent as protocol field.

But this change was not accepted by upstream, and upstream added other features with the same protocol version. This produces incompatibility between Antalya and upstream versions, and make a incompatibility with new Antalya versions in future after rebase on newer code.

This PR removes protocol version and returns using of JSON to send column metadata. This is weird, but must fix version compatibility.

Combination with Antalya as initiator and upstream as replicas still not working, but just due to feature. Feature must be turned off with setting allow_experimental_iceberg_read_optimization=0 if need to use this configuration by some reason.

  • Documentation is written (mandatory for new features)

@ianton-ru
Copy link
Author

@codex review

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fcdf3f78f9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines 198 to 199
if (column.second.hyperrectangle.has_value())
column_info->set("range", column.second.hyperrectangle.value().serialize());

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid embedding binary range blobs in JSON

DataFileMetaInfo::toJson stores Range::serialize() directly as a JSON string, but Range::serialize writes binary writeFieldBinary data (see Range.cpp), which can include NUL/control bytes for many field types. When Poco::JSON::Stringifier emits this, those bytes may be truncated or produce invalid JSON, and DataFileMetaInfo(Poco::JSON::Object::Ptr) will then fail to reconstruct ranges (or log warnings and drop them). This affects distributed reads with Iceberg metadata where any column has a non‑trivial range, so consider encoding the binary blob (e.g., Base64) or switching to a text-safe representation before putting it in JSON.

Useful? React with 👍 / 👎.

@github-actions
Copy link

github-actions bot commented Feb 3, 2026

Workflow [PR], commit [d61af62]

@ianton-ru
Copy link
Author

@codex review

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8a4dfad2aa

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines 184 to 191
if (iceberg_read_optimization_enabled)
{
auto file_meta_info = object_info->getFileMetaInfo();
if (file_meta_info.has_value())
{
auto file_path = send_over_whole_archive ? file_identifier : object_info->getPath();
object_info->command.setFilePath(file_path);
object_info->command.setFileMetaInfo(file_meta_info.value());

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Use real file path, not identifier, when sending JSON command

When Iceberg optimization is enabled, the command JSON uses file_identifier as file_path whenever send_over_whole_archive is true. For non-archive files file_identifier comes from PathWithMetadata::getIdentifier(), which prefers absolute_path and can append file_bucket_info for hashing. That identifier is not a valid object key for reads, so replicas will reconstruct relative_path from it and attempt to read the wrong path (e.g., absolute URI or bucket-suffixed key), causing file-not-found or misrouted reads. This only shows up when file_meta_info is present, but that’s exactly the Iceberg optimization case. Use the actual object path (getPath() or getPathOrPathToArchiveIfArchive() for archives) instead of the identifier when populating the command.

Useful? React with 👍 / 👎.

@ianton-ru
Copy link
Author

@codex review

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

auto object = std::make_shared<ObjectInfo>(path);
object->data_lake_metadata = data_lake_metadata;
object->file_meta_info = file_meta_info;

P2 Badge Preserve metadata parsed from command path

With the protocol field removed, replicas now have to recover file_meta_info from the JSON embedded in path (via PathWithMetadata's constructor). However getObjectInfo() overwrites that parsed value unconditionally with the response's file_meta_info, which is always empty after deserialize() no longer fills it. In the JSON-command case (Iceberg optimization enabled), this clears the metadata and disables column-pruning on the receiver. Consider only assigning object->file_meta_info when the response field is set, or skipping this assignment entirely when the path is a command JSON.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@ianton-ru
Copy link
Author

@codex review

@chatgpt-codex-connector
Copy link

Codex Review: Didn't find any major issues. 🚀

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants