Remove non-upstream protocol versions #1360
base: antalya-25.8
Conversation
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: fcdf3f78f9
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
if (column.second.hyperrectangle.has_value())
    column_info->set("range", column.second.hyperrectangle.value().serialize());
Avoid embedding binary range blobs in JSON
DataFileMetaInfo::toJson stores Range::serialize() directly as a JSON string, but Range::serialize writes binary writeFieldBinary data (see Range.cpp), which can include NUL/control bytes for many field types. When Poco::JSON::Stringifier emits this, those bytes may be truncated or produce invalid JSON, and DataFileMetaInfo(Poco::JSON::Object::Ptr) will then fail to reconstruct ranges (or log warnings and drop them). This affects distributed reads with Iceberg metadata where any column has a non‑trivial range, so consider encoding the binary blob (e.g., Base64) or switching to a text-safe representation before putting it in JSON.
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8a4dfad2aa
if (iceberg_read_optimization_enabled)
{
    auto file_meta_info = object_info->getFileMetaInfo();
    if (file_meta_info.has_value())
    {
        auto file_path = send_over_whole_archive ? file_identifier : object_info->getPath();
        object_info->command.setFilePath(file_path);
        object_info->command.setFileMetaInfo(file_meta_info.value());
Use real file path, not identifier, when sending JSON command
When Iceberg optimization is enabled, the command JSON uses file_identifier as file_path whenever send_over_whole_archive is true. For non-archive files file_identifier comes from PathWithMetadata::getIdentifier(), which prefers absolute_path and can append file_bucket_info for hashing. That identifier is not a valid object key for reads, so replicas will reconstruct relative_path from it and attempt to read the wrong path (e.g., absolute URI or bucket-suffixed key), causing file-not-found or misrouted reads. This only shows up when file_meta_info is present, but that’s exactly the Iceberg optimization case. Use the actual object path (getPath() or getPathOrPathToArchiveIfArchive() for archives) instead of the identifier when populating the command.
@codex review
💡 Codex Review
ClickHouse/src/Interpreters/ClusterFunctionReadTask.cpp
Lines 59 to 61 in f9ab0c6
auto object = std::make_shared<ObjectInfo>(path);
object->data_lake_metadata = data_lake_metadata;
object->file_meta_info = file_meta_info;
With the protocol field removed, replicas now have to recover file_meta_info from the JSON embedded in path (via PathWithMetadata's constructor). However getObjectInfo() overwrites that parsed value unconditionally with the response's file_meta_info, which is always empty after deserialize() no longer fills it. In the JSON-command case (Iceberg optimization enabled), this clears the metadata and disables column-pruning on the receiver. Consider only assigning object->file_meta_info when the response field is set, or skipping this assignment entirely when the path is a command JSON.
@codex review
Codex Review: Didn't find any major issues. 🚀
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Remove non-upstream protocol versions
Documentation entry for user-facing changes
Solves #1236
Read optimization using Iceberg metadata was introduced in #1019. This optimization requires sending metadata with column information to replicas for cluster requests.
In the first iteration (Antalya 25.6), this was done by replacing the file path with a JSON object containing the file path and the additional metadata.
In Antalya 25.8, upstream added a protocol version for cluster functions, making it possible to send additional information to replicas. This was used in frontport #1069: a new protocol version with column metadata was added, and the metadata was sent as a protocol field.
However, this change was not accepted by upstream, and upstream later added other features under the same protocol version. This creates an incompatibility between Antalya and upstream versions, and would cause further incompatibilities with future Antalya versions after a rebase onto newer code.
This PR removes the protocol version and returns to using JSON to send the column metadata. This is awkward, but it restores version compatibility.
The combination of Antalya as initiator with upstream replicas still does not work, but only because upstream lacks the feature itself. If that configuration is needed for some reason, the feature must be disabled with the setting allow_experimental_iceberg_read_optimization = 0.
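Purely as an illustration of the mechanism described above (a file path replaced by a JSON command carrying metadata), such a command might look something like the fragment below. The field names and layout are invented for this example and are not taken from the PR:

```json
{
  "file_path": "data/part-00000.parquet",
  "file_meta_info": {
    "columns": [
      { "name": "id", "range": "<Base64-encoded serialized range>" }
    ]
  }
}
```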