Add support for Azure Blob Storage and ADLS Gen2 in Hive connector by mehradpk · Pull Request #1 · nishithakbhaskaran/presto-oss-fix

mehradpk · 2025-05-12T16:26:17Z

Description

Introduce support for Azure storage backends including Azure Blob Storage (using the wasb:// scheme) and Azure Data Lake Storage Gen2 (using the abfs:// scheme) in the Hive connector.

Key changes:

Added HiveAzureConfigurationInitializer to inject relevant Azure configurations into Hadoop Configuration
Introduced HiveAzureConfig to allow catalog-level configuration of Azure properties
Updated HdfsConfigurationInitializer and HiveConnectorFactory to delegate Azure-specific config setup
Registered configuration initializer in Hive module

Supports shared key and OAuth2-based authentication.
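
For reviewers unfamiliar with the Hadoop side, here is a minimal sketch of the kind of key/value pairs such an initializer might inject. It is not the code in this PR: `AzureConfigSketch` and its methods are hypothetical, a plain `Map` stands in for Hadoop's `Configuration`, and the property keys follow the naming used by the hadoop-azure module as I understand it. The catalog-level property names exposed by `HiveAzureConfig` are not shown here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for an Azure configuration initializer.
// A Map replaces org.apache.hadoop.conf.Configuration to keep the sketch self-contained.
final class AzureConfigSketch {
    private AzureConfigSketch() {}

    // Shared-key authentication for Azure Blob Storage (wasb[s]://).
    static Map<String, String> wasbSharedKey(String account, String accessKey) {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("fs.azure.account.key." + account + ".blob.core.windows.net", accessKey);
        return conf;
    }

    // Shared-key authentication for ADLS Gen2 (abfs[s]://).
    static Map<String, String> abfsSharedKey(String account, String accessKey) {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("fs.azure.account.key." + account + ".dfs.core.windows.net", accessKey);
        return conf;
    }

    // OAuth2 client-credentials authentication for ADLS Gen2 (abfs[s]://).
    static Map<String, String> abfsOAuth(String account, String clientId, String clientSecret, String tokenEndpoint) {
        String suffix = "." + account + ".dfs.core.windows.net";
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("fs.azure.account.auth.type" + suffix, "OAuth");
        conf.put("fs.azure.account.oauth.provider.type" + suffix,
                "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider");
        conf.put("fs.azure.account.oauth2.client.id" + suffix, clientId);
        conf.put("fs.azure.account.oauth2.client.secret" + suffix, clientSecret);
        conf.put("fs.azure.account.oauth2.client.endpoint" + suffix, tokenEndpoint);
        return conf;
    }

    public static void main(String[] args) {
        // Placeholder values only; real deployments would read these from catalog properties.
        abfsOAuth("myaccount", "client-id", "client-secret", "https://login.example/token")
                .forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

Keeping the two authentication modes in separate methods mirrors the description above, where shared key and OAuth2 are alternatives rather than settings that are combined.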

Motivation and Context

Several enterprise data lake workloads are hosted on Azure storage platforms. This change allows Presto to directly query data from Azure Blob Storage and ADLS Gen2, bringing Azure compatibility in line with other cloud storage systems like Amazon S3 and Google Cloud Storage.

Impact

Adds support for Azure cloud storage within the Hive connector
Introduces new configuration properties under HiveAzureConfig

No breaking changes to existing Hive catalogs or connectors

Test Plan

Tested via the Hive connector.

Contributor checklist

Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
Documented new properties (with its default value), SQL syntax, functions, or other functionality.
If release notes are required, they follow the release notes guidelines.
Adequate tests were added if applicable.
CI passed.

== RELEASE NOTES ==

Hive Connector Changes
* Add support for Azure Blob Storage and Azure Data Lake Storage Gen2 in the Hive connector
* Supports wasb[s]:// and abfs[s]:// URI schemes
* Allows shared key and OAuth2 authentication for Azure storage

imjalpreet

@mehradpk Thank you for the PR, can you raise a draft PR from your branch in OSS as well? I want to see if there are any test failures.

@nishithakbhaskaran can you take a first pass at reviewing this?

nishithakbhaskaran · 2025-05-13T11:49:00Z

@mehradpk Changes look good.
@imjalpreet Just checking: do we require extra tests for this, or are the current tests enough?

github-actions · 2025-10-06T06:38:23Z

Codenotify: Notifying subscribers in CODENOTIFY files for diff cb0c461...7b41678.

Notify	File(s)
@aditi-pandit	presto-parser/src/main/antlr4/com/facebook/presto/sql/parser/SqlBase.g4
@elharo	presto-parser/src/main/antlr4/com/facebook/presto/sql/parser/SqlBase.g4
@kaikalur	presto-parser/src/main/antlr4/com/facebook/presto/sql/parser/SqlBase.g4
@rschlussel	presto-parser/src/main/antlr4/com/facebook/presto/sql/parser/SqlBase.g4

Summary: `finishCpu += operator.getFinishCpu().roundTo(NANOSECONDS);` `getFinishCpu` has an underlying issue that causes `finishCpu` to overflow, producing a negative result. This causes an exception while reporting the query completion event, so the stats are not reported. Fix it by setting `finishCpu` to the max value when it overflows.

# Release Note

```
== NO RELEASE NOTE ==
```
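
As an illustration of the clamping idea described in this commit message, the following is a minimal sketch and not the actual patch. `FinishCpuSketch` and `saturatingAdd` are hypothetical names, and the sketch assumes both operands are non-negative nanosecond counts, so only positive overflow can occur.

```java
// Hypothetical helper showing one way to clamp an accumulating counter.
final class FinishCpuSketch {
    private FinishCpuSketch() {}

    // Adds delta to total, returning Long.MAX_VALUE instead of wrapping to a negative value.
    // Assumes both arguments are non-negative.
    static long saturatingAdd(long total, long delta) {
        try {
            return Math.addExact(total, delta);
        }
        catch (ArithmeticException overflow) {
            return Long.MAX_VALUE;
        }
    }

    public static void main(String[] args) {
        System.out.println(saturatingAdd(2, 3));
        System.out.println(saturatingAdd(Long.MAX_VALUE - 1, 5) == Long.MAX_VALUE);
    }
}
```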

…mit metadata for query event listeners (prestodb#26331)

Summary: Currently, the `Input` and `Output` query metadata classes retain two sources of connector-specific information that can be useful for reporting via an `EventListener`:

```
Optional<Object> connectorInfo;
String serializedCommitOutput;
```

* `connectorInfo` can be cast back to the correct type in an `EventListener` implementation, allowing rich access to the underlying data.
* `serializedCommitOutput`, however, is serialized in a given format by the `ConnectorCommitHandle` implementation, which makes it difficult to correctly represent the reporting requirements in an `EventListener` (which may need correlation with data in the `connectorInfo` result).

For example, `HiveCommitHandle` retains the `lastDataCommitTime` for each partition in a simple array associated with the table name, where the partition names are retained in the `HiveInputInfo` instance carried through in `connectorInfo`. For these times to be mapped back to individual partitions, the entries must be in the exact same order as the entries in `HiveInputInfo`.

This change simply replaces the `serializedCommitOutput` property with an `Optional<Object>` instance, providing parity with `connectorInfo` and allowing `EventListener` implementations to cast the commit handle back to the correct type for richer access to the underlying data.

Differential Revision: D84382446

## Release Notes

```
== RELEASE NOTES ==

SPI Changes
* Replaces the ``String serializedCommitOutput`` argument with ``Optional<Object> commitOutput`` in the ``com.facebook.presto.spi.eventlistener.QueryInputMetadata`` and ``com.facebook.presto.spi.eventlistener.QueryOutputMetadata`` constructors
* Adds ``getCommitOutputForRead()`` and ``getCommitOutputForWrite()`` methods to ``ConnectorCommitHandle``, and deprecates the existing ``getSerializedCommitOutputForRead()`` and ``getSerializedCommitOutputForWrite()`` methods
```
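
To illustrate what casting the commit output back to a connector type could look like inside an event listener, here is a minimal sketch. `ExampleCommitHandle` is a hypothetical class invented for this example (it is not `HiveCommitHandle`), and the sketch does not use the Presto SPI; it only shows the `Optional<Object>` cast pattern.

```java
import java.util.List;
import java.util.Optional;

final class CommitOutputSketch {
    private CommitOutputSketch() {}

    // Hypothetical connector-specific commit handle; not a real Presto class.
    static final class ExampleCommitHandle {
        final List<Long> lastDataCommitTimes;

        ExampleCommitHandle(List<Long> lastDataCommitTimes) {
            this.lastDataCommitTimes = lastDataCommitTimes;
        }
    }

    // Casts the opaque commit output back to the connector type when it is one we recognize.
    static Optional<ExampleCommitHandle> asExampleHandle(Optional<Object> commitOutput) {
        return commitOutput
                .filter(ExampleCommitHandle.class::isInstance)
                .map(ExampleCommitHandle.class::cast);
    }

    public static void main(String[] args) {
        Optional<Object> commitOutput = Optional.of(new ExampleCommitHandle(List.of(100L, 200L)));
        asExampleHandle(commitOutput)
                .ifPresent(handle -> System.out.println(handle.lastDataCommitTimes));
    }
}
```

Filtering with `isInstance` before casting lets a listener ignore commit outputs from connectors it does not know about instead of failing with a `ClassCastException`.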

…restodb#26557)

Summary: Remove the uninitialized bytes in binaryData, so we can reduce the binary response size.

Differential Revision: D85720910

### RELEASE NOTES ###

```
== RELEASE NOTES ==

General Changes
* Replace the Java standard base64 encoder with BaseEncoding from Guava
```

…nicode escapes (prestodb#26443) Summary: Modified `ExpressionFormatter.formatStringLiteral()` to preserve common whitespace characters (newlines, tabs, carriage returns) in their literal form rather than converting them to Unicode escape sequences (e.g., `\000A` for newline). This change improves SQL standard compliance and fixes issues with embedded code (like Python UDF) and regex patterns that require proper whitespace handling. Differential Revision: D85380265 ``` == NO RELEASE NOTE == ```

## Description Current code will try to add a round robin local exchange below the merge join node, which will break the sorted property of the input. In this PR, we fixed it. ## Motivation and Context Bug fix ## Impact Bug fix ## Test Plan Unit test ## Contributor checklist - [ ] Please make sure your submission complies with our [contributing guide](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md), in particular [code style](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#code-style) and [commit standards](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#commit-standards). - [ ] PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced. - [ ] Documented new properties (with its default value), SQL syntax, functions, or other functionality. - [ ] If release notes are required, they follow the [release notes guidelines](https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines). - [ ] Adequate tests were added if applicable. - [ ] CI passed. - [ ] If adding new dependencies, verified they have an [OpenSSF Scorecard](https://securityscorecards.dev/#the-checks) score of 5.0 or higher (or obtained explicit TSC approval for lower scores). ## Release Notes Please follow [release notes guidelines](https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines) and fill in the release notes below. ``` == NO RELEASE NOTE == ```

…stodb#26403)

## Summary

This PR introduces sorted exchange functionality to Presto, enabling efficient sort-merge joins by allowing data to be sorted during shuffle operations rather than requiring separate sort steps. This optimization eliminates redundant sorting, reduces memory pressure, and improves query performance for distributed joins and aggregations that require sorted inputs.

## Motivation

Currently, when Presto needs to perform a sort-merge join in a distributed query, it must:

1. Shuffle data across workers (ExchangeNode)
2. Explicitly sort the shuffled data (SortNode)

This approach is inefficient because sorting happens as a separate operation after data movement. By pushing the sort operation into the exchange itself, we can sort data during the shuffle, eliminating the redundant SortNode and improving overall query performance.

## High-Level Changes

1. Core Infrastructure (3161b24)
   - Add `orderingScheme` field to `PlanFragment` class (Java)
   - Add `outputOrderingScheme` field to C++ PlanFragment protocol
   - Implement JSON serialization/deserialization for C++ integration
   - Update `PrestoToVeloxQueryPlan.cpp` to consume ordering scheme and convert to sorting keys
   - Update all `PlanFragment` constructor call sites to support the new field
2. Planner Support (130b14f)
   - Extend `ExchangeNode` to support SORTED partition type
   - Update `BasePlanFragmenter` to populate and propagate orderingScheme between fragments
   - Add `PlanFragmenterUtils` support for sorted exchanges
   - Enhance `PlanPrinter` to display sorted exchange information in EXPLAIN output
3. Optimizer Rule (6951cab)
   - Introduce SortedExchangeRule optimizer that identifies and transforms Sort→Exchange patterns
   - Add `sorted_exchange_enabled` session property (experimental, default: false)
   - Add `optimizer.experimental.sorted-exchange-enabled` configuration property
   - Integrate into optimizer pipeline alongside existing join optimizers
   - Only applies to REMOTE REPARTITION exchanges
   - Validates ordering variables are available in exchange output
4. Spark Integration (960bc93)
   - Update `AbstractPrestoSparkQueryExecution` to handle sorted exchanges
   - Add `MutablePartitionIdOrdering` class to track partition ordering in Spark
   - Update `PrestoSparkRddFactory` to preserve sort order during shuffles
   - Enable Spark-based queries to leverage sorted exchanges

## Plan Transformation Example

Before:

```
SortNode(orderBy: [a, b])
└─ ExchangeNode(type: REPARTITION, scope: REMOTE)
```

After:

```
ExchangeNode(type: REPARTITION, scope: REMOTE, orderingScheme: [a, b])
```

## Configuration

The feature is controlled by:

- Session property: enable_sorted_exchanges (experimental, default: false)
- Config property: experimental.optimizer.sorted-exchange-enabled

## Testing

- Added TestSortedExchangeRule with test cases covering various scenarios

## Performance Benefits

- Reduced sorting overhead: Eliminates redundant SortNode operations
- Lower memory usage: Avoids buffering data for explicit sorting

## Backward Compatibility

- Feature is disabled by default (experimental flag)
- All existing queries continue to work without modification
- No breaking changes to public APIs
- Graceful degradation when feature is disabled

## Release Notes

Please follow [release notes guidelines](https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines) and fill in the release notes below.

```
== RELEASE NOTES ==

General Changes
* Add experimental support for sorted exchanges to improve sort-merge join performance. When enabled via the `sorted_exchange_enabled` session property or `experimental.optimizer.sorted-exchange-enabled` configuration property, the query planner will push sort operations into exchange nodes, eliminating redundant sorting steps and reducing memory usage for distributed queries with sort-merge joins. This feature is disabled by default.
```
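
The Sort→Exchange rewrite described in this commit can be sketched roughly as follows. This is a simplified illustration, not the `SortedExchangeRule` implementation: the node classes here are hypothetical stand-ins for Presto's plan nodes, ordering is modeled as a list of column names, and the check that ordering variables are available in the exchange output is omitted.

```java
import java.util.List;
import java.util.Optional;

final class SortedExchangeSketch {
    private SortedExchangeSketch() {}

    // Hypothetical, minimal plan node model.
    interface PlanNode {}

    static final class ExchangeNode implements PlanNode {
        final String type;
        final String scope;
        final Optional<List<String>> orderingScheme;

        ExchangeNode(String type, String scope, Optional<List<String>> orderingScheme) {
            this.type = type;
            this.scope = scope;
            this.orderingScheme = orderingScheme;
        }
    }

    static final class SortNode implements PlanNode {
        final List<String> orderBy;
        final PlanNode source;

        SortNode(List<String> orderBy, PlanNode source) {
            this.orderBy = orderBy;
            this.source = source;
        }
    }

    // Rewrites Sort -> (REMOTE REPARTITION Exchange) into a single exchange carrying the ordering.
    // Any other shape, or a disabled feature flag, leaves the plan unchanged.
    static PlanNode pushSortIntoExchange(PlanNode node, boolean enabled) {
        if (!enabled || !(node instanceof SortNode)) {
            return node;
        }
        SortNode sort = (SortNode) node;
        if (!(sort.source instanceof ExchangeNode)) {
            return node;
        }
        ExchangeNode exchange = (ExchangeNode) sort.source;
        boolean remoteRepartition = exchange.type.equals("REPARTITION") && exchange.scope.equals("REMOTE");
        if (!remoteRepartition || exchange.orderingScheme.isPresent()) {
            return node;
        }
        return new ExchangeNode(exchange.type, exchange.scope, Optional.of(sort.orderBy));
    }

    public static void main(String[] args) {
        PlanNode plan = new SortNode(List.of("a", "b"), new ExchangeNode("REPARTITION", "REMOTE", Optional.empty()));
        ExchangeNode rewritten = (ExchangeNode) pushSortIntoExchange(plan, true);
        System.out.println(rewritten.orderingScheme);
    }
}
```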

Summary: Impl sort key for LocalShuffleWriter Differential Revision: D86322593

…restodb#27009)

Bumps [lodash](https://github.com/lodash/lodash) from 4.17.21 to 4.17.23.

CVE-2025-13465

Commits:

- [`dec55b7`](https://github.com/lodash/lodash/commit/dec55b7a3b382da075e2eac90089b4cd00a26cbb) Bump main to v4.17.23 ([#6088](https://redirect.github.com/lodash/lodash/issues/6088))
- [`19c9251`](https://github.com/lodash/lodash/commit/19c9251b3631d7cf220b43bc757eb33f1084f117) fix: setCacheHas JSDoc return type should be boolean ([#6071](https://redirect.github.com/lodash/lodash/issues/6071))
- [`b5e6729`](https://github.com/lodash/lodash/commit/b5e672995ae26929d111a6e94589f8d03fb8e578) jsdoc: Add -0 and BigInt zeros to _.compact falsey values list ([#6062](https://redirect.github.com/lodash/lodash/issues/6062))
- [`edadd45`](https://github.com/lodash/lodash/commit/edadd452146f7e4bad4ea684e955708931d84d81) Prevent prototype pollution on baseUnset function
- [`4879a7a`](https://github.com/lodash/lodash/commit/4879a7a7d0a4494b0e83c7fa21bcc9fc6e7f1a6d) doc: fix autoLink function, conversion of source links ([#6056](https://redirect.github.com/lodash/lodash/issues/6056))
- [`9648f69`](https://github.com/lodash/lodash/commit/9648f692b0fc7c2f6a7a763d754377200126c2e8) chore: remove `yarn.lock` file ([#6053](https://redirect.github.com/lodash/lodash/issues/6053))
- [`dfa407d`](https://github.com/lodash/lodash/commit/dfa407db0bf5b200f2c7a9e4f06830ceaf074be9) ci: remove legacy configuration files ([#6052](https://redirect.github.com/lodash/lodash/issues/6052))
- [`156e196`](https://github.com/lodash/lodash/commit/156e1965ae78b121a88f81178ab81632304e8d64) feat: add renovate setup ([#6039](https://redirect.github.com/lodash/lodash/issues/6039))
- [`933e106`](https://github.com/lodash/lodash/commit/933e1061b8c344d3fc742cdc400175d5ffc99bce) ci: add pipeline for Bun ([#6023](https://redirect.github.com/lodash/lodash/issues/6023))
- [`072a807`](https://github.com/lodash/lodash/commit/072a807ff7ad8ffc7c1d2c3097266e815d138e20) docs: update links related to Open JS Foundation ([#5968](https://redirect.github.com/lodash/lodash/issues/5968))
- Additional commits viewable in [compare view](https://github.com/lodash/lodash/compare/4.17.21...4.17.23)

```
== RELEASE NOTES ==

Security Changes
* Upgrade lodash from 4.17.21 to 4.17.23 to address `CVE-2025-13465 <https://github.com/advisories/GHSA-xxjr-mmjv-4gpg>`_.
```

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

…#26718)

## Description

Added new documentation explaining how to use the Presto C++ engine. The documentation provides step-by-step instructions for configuring and running the Presto C++ worker.

## Motivation and Context

There was no consolidated or beginner-friendly documentation for Presto C++ in the open-source project. Users often had difficulty understanding how to build and run the C++ worker, what dependencies were required, and how it integrates with a Presto coordinator.

## Impact

There is no performance impact.

## Release Notes

```
== NO RELEASE NOTE ==
```

## Description

Upgrade postgresql to version 42.7.9.

## Motivation and Context

Using a more recent version helps avoid potential vulnerabilities and ensures we aren't relying on outdated or unsupported code.

## Release Notes

```
== NO RELEASE NOTE ==
```

…ons to sidecar for expression optimization (prestodb#27043) ## Description Avoid sending aggregate and window functions to sidecar for expression optimization. ## Motivation and Context Encountered while investigating prestodb#26920. The bug reported in the issue is different but the general idea is we should avoid sending aggregate and window functions to sidecar as they cannot be constant folded. The failing queries in the issue are added as test cases. ## Impact No impact. ## Test Plan Unit tests, CI. ``` == NO RELEASE NOTE == ```

## Description

This PR adds subfield pushdown optimization for the `cardinality()` function in Presto. When enabled, this optimization allows the query engine to skip reading map keys/values or array elements when only the cardinality (count) of these collections is needed.

This PR contains coordinator-side changes only; the corresponding worker-side changes will be added separately to the C++ worker. Since this feature is not yet fully tested end-to-end with the worker, the session property is disabled by default.

Additionally, this implementation takes a conservative approach to subfield pushdown for cardinality: if a column already has other subfields being accessed (e.g., `features['key']`), we skip adding the structure-only subfield for cardinality to avoid potential correctness issues.

Key Changes:

1. New StructureOnly PathElement (Subfield.java): Introduced a new path element type, represented as `[$]`, that indicates only the structural metadata (size/count) is needed, not the actual content
2. SubfieldTokenizer Update: Added parsing support for the `$` subscript pattern in subfield paths
3. FunctionResolution: Added isCardinalityFunction() method to identify cardinality function calls
4. PushdownSubfields Optimizer: Extended the subfield extraction logic to recognize cardinality() calls on maps and arrays, generating `[$]` subfield hints that downstream readers can use to skip content
5. Session/Config Properties: Added pushdown_subfields_for_cardinality configuration option (disabled by default)

## Motivation and Context

When queries only need to know the size of a map or array (e.g., `SELECT cardinality(features) FROM table` or `WHERE cardinality(tags) > 10`), there is no need to read the keys, the values, or both. This optimization helps reduce shuffles and improve query performance.

## Impact

- Performance: Reduces I/O and deserialization overhead for queries using cardinality() on maps/arrays
- Backward Compatible: Feature is disabled by default via optimizer.pushdown-subfield-for-cardinality config
- No Breaking Changes: Existing behavior is preserved when the feature is disabled
- Added a new session property `pushdown-subfield-for-cardinality`

## Test Plan

Added comprehensive unit tests in TestHiveLogicalPlanner.java covering:

- Simple cardinality pushdown for MAP: verifies cardinality(x) generates x[$] subfield
- Cardinality pushdown for ARRAY: verifies array cardinality generates correct subfield
- Cardinality in WHERE clause: tests WHERE cardinality(features) > 10
- Cardinality in aggregation: tests AVG(cardinality(data))
- Multiple cardinalities: tests multiple cardinality calls in same query
- Cardinality with complex expressions: tests cardinality(tags) * 2
- Cardinality on nested structures: tests transform(arr_of_maps, m -> cardinality(m))
- Cardinality combined with subscript access: verifies that when both cardinality(features) and features['key'] are used, the specific subscript takes precedence (avoiding redundant structure-only reads)

## Contributor checklist

- [x] Please make sure your submission complies with our [contributing guide](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md), in particular [code style](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#code-style) and [commit standards](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#commit-standards).
- [x] PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
- [x] Documented new properties (with its default value), SQL syntax, functions, or other functionality.
- [ ] If release notes are required, they follow the [release notes guidelines](https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines).
- [x] Adequate tests were added if applicable.
- [ ] CI passed.
- [ ] If adding new dependencies, verified they have an [OpenSSF Scorecard](https://securityscorecards.dev/#the-checks) score of 5.0 or higher (or obtained explicit TSC approval for lower scores).

## Release Notes

```
== NO RELEASE NOTE ==
```
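
The conservative rule described in this commit (emit a structure-only `[$]` subfield for `cardinality()` only when no other subfield of the same column is accessed) can be sketched as below. This is an illustration under simplifying assumptions: subfields are modeled as plain strings, and `CardinalitySubfieldSketch` and `requiredSubfields` are hypothetical names rather than the PushdownSubfields code.

```java
import java.util.LinkedHashSet;
import java.util.Set;

final class CardinalitySubfieldSketch {
    private CardinalitySubfieldSketch() {}

    // Returns the subfields to request for a column.
    // The structure-only subfield column[$] is added only when cardinality() is used
    // and no other subfield of the column is already being accessed.
    static Set<String> requiredSubfields(String column, boolean cardinalityUsed, Set<String> otherSubfields) {
        Set<String> result = new LinkedHashSet<>(otherSubfields);
        if (cardinalityUsed && otherSubfields.isEmpty()) {
            result.add(column + "[$]");
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(requiredSubfields("features", true, Set.of()));
        System.out.println(requiredSubfields("features", true, Set.of("features[\"key\"]")));
    }
}
```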

…restodb#27044)

## Description

For remote functions, sometimes we want to limit the concurrency to avoid throttling the remote service. In this PR, I added session properties to set the number of tasks for a remote projection, so the plan will look like:

scan -> remote exchange (with specified number of tasks) -> remote project node -> remote exchange -> output

The remote project will run in a separate stage. There are two session properties:

- `remote_function_fixed_parallelism_task_count` specifies how many tasks to use
- `remote_function_names_for_fixed_parallelism` specifies the pattern of remote function names to match

## Motivation and Context

As in description

## Impact

To control the number of tasks for a remote project node

## Test Plan

Unit tests and local end-to-end test

## Release Notes

```
== RELEASE NOTES ==

General Changes
* Add options to control the number of tasks for remote project node
```

## Summary by Sourcery

Add configurable fixed-parallelism support for remote function projections and wire it through planning, partitioning, and session properties.

New Features:
- Introduce session and config properties to control fixed parallelism for selected remote functions via regex-matched names and an optional task count.
- Extend exchange planning to insert bounded round-robin remote exchanges around qualifying remote project nodes based on the configured properties.

Enhancements:
- Augment system partitioning handles and exchange nodes to carry an optional partition count for fixed distributions and honor it when selecting nodes.

Tests:
- Add planner and configuration tests covering regex matching behavior for remote-function fixed parallelism and property mappings for the new optimizer settings.
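
A rough sketch of how a name pattern and a task count might combine to decide whether a remote projection gets fixed parallelism. This is not the planner code from the PR: `RemoteFunctionParallelismSketch` and `fixedTaskCount` are hypothetical, and whether the real implementation requires a full match or a partial match of the function name is an assumption here (full match).

```java
import java.util.OptionalInt;
import java.util.regex.Pattern;

final class RemoteFunctionParallelismSketch {
    private RemoteFunctionParallelismSketch() {}

    // Returns the fixed task count when the function name fully matches the pattern.
    // An empty pattern or a non-positive task count is treated as "feature not configured".
    static OptionalInt fixedTaskCount(String functionName, String namePattern, int taskCount) {
        if (namePattern.isEmpty() || taskCount <= 0) {
            return OptionalInt.empty();
        }
        return Pattern.matches(namePattern, functionName)
                ? OptionalInt.of(taskCount)
                : OptionalInt.empty();
    }

    public static void main(String[] args) {
        // Function names and pattern are made up for illustration.
        System.out.println(fixedTaskCount("remote.ml.score", "remote\\.ml\\..*", 4));
        System.out.println(fixedTaskCount("abs", "remote\\.ml\\..*", 4));
    }
}
```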

## Description

The Provisio plugin dumps all the native plugins under `native-plugin/` and not `native-plugins/`.

## Motivation and Context

See the attached screenshot:

<img width="416" height="249" alt="Screenshot 2026-01-29 at 10 23 38 AM" src="https://github.com/user-attachments/assets/8edf2856-d61d-4b0c-8d27-20712c0ad044" />

## Impact

No user impact

## Test Plan

Docs-only change

## Release Notes

```
== NO RELEASE NOTE ==
```

…talog (prestodb#26958)

…restodb#27050)

## Description

Due to Iceberg issue apache/iceberg#15128, using a binary type as a partition column may cause incorrect calculation of partition bounds in the generated manifest files when deleting data files. This can lead to incorrect results in subsequent queries. Therefore, we temporarily disable metadata deletion and thorough filter pushdown for varbinary columns. This restriction can be lifted once the Iceberg issue is resolved.

## Motivation and Context

Fix the bug that occurs when varbinary columns are used as partition columns in Iceberg.

## Impact

This change is not visible to users.

## Test Plan

- Newly added test case in `IcebergDistributedTestBase.testPartitionedByVarbinaryType` through `@DataProvider`, which would explicitly fail without this fix.

## Release Notes

```
== NO RELEASE NOTE ==
```

## Summary by Sourcery

Guard Iceberg plan optimization from enforcing metadata constraints on VARBINARY-partitioned columns and strengthen test coverage for varbinary partitioning behavior.

Bug Fixes:
- Avoid pushing down column constraints into Iceberg partition specs for VARBINARY columns to prevent incorrect metadata-based deletions and query results when varbinary is used as a partition key.

Tests:
- Extend the varbinary partitioning integration test to cover multiple insert value orderings and updated expected partition counts via a TestNG data provider.

…in AddLocalExchanges (prestodb#26960)

We observed that using the parent preference in AddLocalExchanges can limit parallelism when the partition column of the parent preference has low cardinality. In a setup where a query is allowed to use many cores, limiting the parallelism significantly affects query latency. More details can be found in prestodb#26961.

This PR makes three changes:
* Introduces a new feature config `localExchangeParentPreferenceStrategy` with three values: ALWAYS, NEVER, and AUTOMATIC. The default value is ALWAYS (the current behavior).
* Makes AddLocalExchanges use the parent preference according to `localExchangeParentPreferenceStrategy`:
  - ALWAYS: the parent preference is always used.
  - NEVER: the parent preference is never used.
  - AUTOMATIC: the parent preference is used only when the estimated cardinality is larger than the task concurrency. If estimated stats are not available, the parent preference is not used.
  - The estimated stats are calculated only when `localExchangeParentPreferenceStrategy` is AUTOMATIC.
* Adds unit tests for the new config and the local-exchange change.

## Release Notes
```
== NO RELEASE NOTE ==
```

## Summary by Sourcery
Introduce a configurable strategy for using parent preferences in AddLocalExchanges and make local exchange partitioning for aggregations cost-aware based on estimated cardinality and task concurrency.

New Features:
- Add a local_exchange_parent_preference_strategy session/feature config to control how local exchanges use parent partitioning preferences with options ALWAYS, NEVER, and AUTOMATIC.

Enhancements:
- Update AddLocalExchanges to optionally use stats-based decisions when applying parent partitioning preferences for aggregation local exchanges, leveraging the existing stats calculator.
- Wire the stats calculator into AddLocalExchanges through PlanOptimizers to enable precomputation of plan statistics when the AUTOMATIC strategy is selected.

Tests:
- Add planner tests validating local exchange behavior under ALWAYS, NEVER, and AUTOMATIC parent preference strategies and different task concurrency settings.
- Extend FeaturesConfig tests to cover default and explicit mappings for the new local_exchange_parent_preference_strategy config.
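The three-way decision described in this commit message can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from prestodb#26960: the class `ParentPreferenceSketch`, the method `useParentPreference`, and its parameters are hypothetical names. Only the strategy values ALWAYS, NEVER, and AUTOMATIC and the comparison against task concurrency come from the description.

```java
import java.util.OptionalDouble;

// Hypothetical sketch of the strategy decision; not the actual AddLocalExchanges code.
final class ParentPreferenceSketch
{
    enum Strategy { ALWAYS, NEVER, AUTOMATIC }

    private ParentPreferenceSketch() {}

    // Returns true when the local exchange should honor the parent's partitioning preference.
    static boolean useParentPreference(Strategy strategy, OptionalDouble estimatedCardinality, int taskConcurrency)
    {
        switch (strategy) {
            case ALWAYS:
                return true;
            case NEVER:
                return false;
            case AUTOMATIC:
                // Without an estimate the parent preference is not used;
                // with one, it is used only when cardinality exceeds task concurrency.
                return estimatedCardinality.isPresent()
                        && estimatedCardinality.getAsDouble() > taskConcurrency;
            default:
                throw new IllegalArgumentException("Unknown strategy: " + strategy);
        }
    }
}
```

With a task concurrency of 16, an estimated cardinality of 4 under AUTOMATIC would skip the parent preference, so the exchange is free to spread work across all 16 drivers rather than at most 4.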

## Description
Remove unused code in the `presto-hive-metastore` module.

## Motivation and Context
Remove unused code in the `presto-hive-metastore` module.

## Impact
Maintenance

## Test Plan
None

## Release Notes
```
== NO RELEASE NOTE ==
```

## Summary by Sourcery
Enhancements:
- Clean up the in-memory caching Hive metastore by removing an unused method for invalidating stale partitions.

## Description
Previously, the Iceberg connector link did not point to a valid page. This change fixes the issue by mapping the link to the Iceberg connector documentation page.

## Motivation and Context
The previous documentation link for the Iceberg connector was invalid, which could confuse users trying to navigate to the connector documentation. This change ensures the link points to the correct and valid page.

## Impact
Documentation-only change. No public API, user-facing behavior, or performance impact.

## Test Plan
Verified the updated link points to the correct Iceberg connector documentation page.

## Contributor checklist
- [x] Please make sure your submission complies with our [contributing guide](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md), in particular [code style](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#code-style) and [commit standards](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#commit-standards).
- [x] PR description addresses the issue accurately and concisely.
- [ ] Documented new properties (with its default value), SQL syntax, functions, or other functionality.
- [ ] Adequate tests were added if applicable.
- [x] CI passed.
- [ ] If adding new dependencies, verified they have an [OpenSSF Scorecard](https://securityscorecards.dev/#the-checks) score of 5.0 or higher (or obtained explicit TSC approval for lower scores).

## Release Notes
```
== NO RELEASE NOTE ==
```

…er writer (prestodb#26989)

## Description
For INSERT/CTAS operations on Iceberg tables with a large number of partitions, the partition count per writer can far exceed 100. In such cases, we may want the operation to succeed rather than fail fast, for example when the data volume is known to be small, or when we are willing to trade speed for lower memory usage by reducing `parquet_writer_block_size` or `orc_optimized_writer_max_stripe_size`.

Currently, the only way to configure this limit is through the connector property `iceberg.max-partitions-per-writer`, which requires a cluster restart to take effect and applies globally to all SQL statements and sessions.

This PR introduces the corresponding Iceberg connector session property `max_partitions_per_writer` to set the maximum number of partitions per writer. This is a lighter and more flexible approach: adjustments take effect immediately without a restart.

## Motivation and Context
Provide per-session or even per-statement configuration to adjust insert behavior and avoid failures.

## Impact
Users can now set the maximum number of partitions per writer via the `SET SESSION` statement.

## Test Plan
- Newly added test cases show the effect of the session property in CTAS/INSERT statements.

## Release Notes
```
== NO RELEASE NOTE ==
```
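The relationship between the catalog-level limit and the session-level override can be illustrated with a small sketch. Everything here is an assumption made for illustration: `PartitionLimitSketch`, its constructor parameters, and the exception type are hypothetical and do not reflect the Iceberg connector's actual classes. The sketch only shows a session value taking precedence over a catalog default, and a writer failing once it would open more partitions than the effective limit.

```java
import java.util.HashSet;
import java.util.OptionalInt;
import java.util.Set;

// Hypothetical sketch: a per-writer partition limit with a session-level override.
final class PartitionLimitSketch
{
    private final int maxPartitionsPerWriter;
    private final Set<String> openPartitions = new HashSet<>();

    PartitionLimitSketch(int catalogDefault, OptionalInt sessionOverride)
    {
        // A session value, when present, wins over the catalog-level default.
        this.maxPartitionsPerWriter = sessionOverride.orElse(catalogDefault);
    }

    // Records a write to the given partition, failing if a new partition would exceed the limit.
    void write(String partition)
    {
        if (!openPartitions.contains(partition) && openPartitions.size() >= maxPartitionsPerWriter) {
            throw new IllegalStateException("Exceeded limit of " + maxPartitionsPerWriter + " partitions per writer");
        }
        openPartitions.add(partition);
    }

    int openPartitionCount()
    {
        return openPartitions.size();
    }
}
```

Raising the limit for one session lets a single wide INSERT succeed without changing the behavior of other sessions on the cluster.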

```
== NO RELEASE NOTE ==
```

## Summary by Sourcery
Chores:
- Update the Velox submodule reference used by presto-native-execution to the latest desired commit.

---------

Co-authored-by: Ping Liu <lpingbj@cn.ibm.com>
Co-authored-by: Christian Zentgraf <czentgr@us.ibm.com>

…riter (prestodb#27054)

Summary: Session property to control the file size for Presto writers

Differential Revision: D91361183

## Summary by Sourcery
New Features:
- Introduce a NATIVE_MAX_TARGET_FILE_SIZE session property to control when writers roll over to a new output file based on size.

### Release Notes
```
== RELEASE NOTES ==

Prestissimo (Native Execution) Changes
* Add ``native_max_target_file_size`` session property to control the maximum target file size for writers. When a file exceeds this size during writing, the writer will close the current file and start writing to a new file.
```
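The rollover behavior in the release note can be sketched as below. This is an illustration only: the real writer lives in the native (Velox) engine, and `RollingWriterSketch` with its methods is a hypothetical stand-in that tracks byte counts instead of writing files.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of size-based file rollover; it tracks sizes rather than writing data.
final class RollingWriterSketch
{
    private final long maxTargetFileSize;
    private final List<Long> closedFileSizes = new ArrayList<>();
    private long currentFileSize;

    RollingWriterSketch(long maxTargetFileSize)
    {
        this.maxTargetFileSize = maxTargetFileSize;
    }

    // Appends a batch; once the current file exceeds the target size it is closed and a new one begins.
    void append(long bytes)
    {
        currentFileSize += bytes;
        if (currentFileSize > maxTargetFileSize) {
            closedFileSizes.add(currentFileSize);
            currentFileSize = 0;
        }
    }

    List<Long> closedFileSizes()
    {
        return closedFileSizes;
    }

    long currentFileSize()
    {
        return currentFileSize;
    }
}
```

Because the check happens after a batch is appended, a closed file can be somewhat larger than the target, which matches the "exceeds this size" wording of the release note.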

```
== NO RELEASE NOTE ==
```

## Summary by Sourcery
Chores:
- Update the Velox submodule reference used by presto-native-execution.

Summary: Fix unnecessary copies in the Presto HTTP module:
- Use std::move() for shared_ptr, SSLContextPtr, and callback assignments
- Use const reference for path variable to avoid copy from getPath()

These changes eliminate unnecessary copy operations and improve performance.

```
== NO RELEASE NOTE ==
```

## Summary by Sourcery
Address performance-related cleanups in the Presto HTTP client and server by eliminating unnecessary copies of objects and strings.

Enhancements:
- Move HTTP client and server callbacks, timers, and context objects instead of copying to avoid redundant allocations and ownership transfers.
- Bind the HTTP request path as a const reference rather than copying the string when dispatching request handlers.

…TRY() (prestodb#26976)

Add a session property to control whether the TRY() function can catch errors from remote function execution. This allows users to enable error catching for remote functions on a per-session basis.

Changes:
- Add TRY_CATCH_REMOTE_FUNCTION_ERRORS constant to SystemSessionProperties
- Add isTryCatchRemoteFunctionErrors() to FeaturesConfig with default false
- Add isTryCatchRemoteFunctionErrorsEnabled() getter for session access
- Add unit test for the new config property

```
== NO RELEASE NOTE ==
```

…restodb#27067)

## Description
The new session property would allow Partitioned Output Velox operators to flush (return) data eagerly, as soon as it arrives. This would match the default Presto Java behavior of returning results eagerly to the caller while the query is still running (scanning).

## Motivation and Context
For "needle in a haystack" type queries running in various UIs, this early return functionality is crucial.

## Test Plan
Existing session property test. Ran the custom build in a Prestissimo cluster to ensure the session property changes query behavior accordingly.

```
== NO RELEASE NOTE ==
```

## Summary by Sourcery
Add a native session property to control eager flushing behavior of partitioned output operators.

New Features:
- Introduce the native_partitioned_output_eager_flush session property to enable eager flushing of PartitionedOutput operator rows in native execution.

Documentation:
- Document the native_partitioned_output_eager_flush session property in the Presto native session properties reference.

Tests:
- Extend session property mapping tests to cover the new native_partitioned_output_eager_flush property.

prestodb#27059)

Summary: The MV query optimizer fails to rewrite queries when the specified table name differs between the MV definition and the incoming query (e.g. `base_table` vs `schema.base_table`). This fix resolves table references to schema-qualified names, ensuring consistent table matching regardless of how the table was specified.

Reviewed By: zation99

Differential Revision: D91699496

## Summary by Sourcery
Ensure materialized view query optimization consistently matches base tables regardless of schema qualification in table names.

Bug Fixes:
- Fix materialized view rewrites failing when base tables are referenced with different schema qualifications between the MV definition and the incoming query.

Tests:
- Add coverage to verify materialized view query optimization works when base tables are referenced both with and without schema-qualified names in various query shapes.

## Release Notes
```
== RELEASE NOTES ==

General Changes
* Fix MV query optimizer by correctly resolving table references to schema-qualified names.
```
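The idea of normalizing both table references before comparing them can be sketched as follows. This is a deliberately simplified illustration, not the optimizer's code: `TableNameResolutionSketch`, `qualify`, and `sameTable` are hypothetical names, and the sketch ignores catalogs and quoted identifiers, which a real resolver would need to handle.

```java
import java.util.Objects;

// Hypothetical sketch: resolve table references to schema-qualified names before matching.
final class TableNameResolutionSketch
{
    private TableNameResolutionSketch() {}

    // Prefixes an unqualified table name with the session's default schema.
    static String qualify(String tableReference, String defaultSchema)
    {
        Objects.requireNonNull(tableReference, "tableReference is null");
        Objects.requireNonNull(defaultSchema, "defaultSchema is null");
        return tableReference.contains(".") ? tableReference : defaultSchema + "." + tableReference;
    }

    // Two references match when their schema-qualified forms are equal (case-insensitively).
    static boolean sameTable(String first, String second, String defaultSchema)
    {
        return qualify(first, defaultSchema).equalsIgnoreCase(qualify(second, defaultSchema));
    }
}
```

Comparing the raw strings `base_table` and `schema.base_table` would fail; comparing their qualified forms succeeds when the default schema is `schema`.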

…odb#26905)

Summary: Ported the IpPrefix and IpAddress tests in https://github.com/prestodb/presto/blob/master/presto-main-base/src/test/java/com/facebook/presto/operator/scalar/TestIpPrefixFunctions.java to run with the Presto Native engine in presto-native-tests.

This is a continuation of the work to refactor scalar function tests from `presto-main-base` to `presto-main-tests` from this PR: prestodb#26013

Also moved IpPrefixType and IpAddressType into `presto-common` from `presto-main-base` due to some dependency cycles that appeared after refactoring.

== NO RELEASE NOTE ==

## Description
Fix for prestodb#26685
Fix for prestodb#26808

```
== NO RELEASE NOTE ==
```

## Summary by Sourcery
Add configurable shard count for the async data cache and wire it through server initialization.

New Features:
- Introduce a new system config option to control the number of async cache shards with a default value.
- Expose the async cache shard count to the async data cache options during server initialization.

Tests:
- Add unit tests covering default and custom values for the async cache shard count system config.

…6951)

## Description
Fixes Velox to Presto `IN` expression conversion. When the `IN-list` is constant, the Velox expression representation uses a constant expression with an array vector to store the list (see conversion [here](https://github.com/prestodb/presto/blob/4e91f155d0f4704325552fac3807da0efdba6a35/presto-native-execution/presto_cpp/main/types/PrestoToVeloxExpr.cpp#L780)). The Presto `IN` expression expects the values from a constant `IN-list` to be distinct arguments to the `SpecialFormExpression`. The `VeloxToPrestoExpr` is modified accordingly.

## Motivation and Context
Resolves prestodb#26921.

## Impact
Fixes bug with `IN` expression in native expression optimizer.

## Test Plan
Added e2e test.

```
== NO RELEASE NOTE ==
```

## Summary by Sourcery
Fix Velox-to-Presto conversion of IN expressions to correctly construct Presto special form arguments and add coverage for the native expression optimizer.

Bug Fixes:
- Correct Velox IN expression conversion when the IN-list is represented as a constant array so Presto receives individual arguments instead of a single array-typed constant.

Tests:
- Add an end-to-end test ensuring IN expressions are handled correctly by the native expression optimizer in the sidecar plugin test suite.
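The shape of the fix can be sketched in a few lines. The actual change is in the C++ `VeloxToPrestoExpr` converter; the Java below is only an illustration, with expressions modeled as plain strings and with `InListFlatteningSketch` and `toInArguments` as hypothetical names. It assumes the argument order is the tested value first, followed by one argument per list element.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: expand a constant IN-list into distinct special-form arguments.
final class InListFlatteningSketch
{
    private InListFlatteningSketch() {}

    // Returns [value, element1, element2, ...] instead of [value, arrayConstant].
    static List<String> toInArguments(String valueExpression, List<String> constantInList)
    {
        List<String> arguments = new ArrayList<>(constantInList.size() + 1);
        arguments.add(valueExpression);
        arguments.addAll(constantInList);
        return arguments;
    }
}
```

For `x IN (1, 2, 3)`, the converted expression would carry four arguments (`x`, `1`, `2`, `3`) rather than two (`x` and a single array constant).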

…restodb#26978)

## Description
Velox now supports the `KHyperLogLog` type (ref: facebookincubator/velox@1165703). Adds support for this type to the `NativeTypeManager`. Also adds `KHyperLogLog` to `StandardTypes` in `presto-common` to avoid a dependency on `presto-main-base` in `presto-native-sidecar-plugin`.

## Motivation and Context
Fix test failure uncovered in `presto-native-tests`. Required for prestodb#23671.

## Impact
Queries with `KHyperLogLog` won't fail on sidecar-enabled Presto C++ deployments.

## Test Plan
Added e2e test.

```
== NO RELEASE NOTE ==
```

Introduce support for Azure storage backends including Azure Blob Storage (using the wasbs:// scheme) and Azure Data Lake Storage Gen2 (using the abfss:// scheme) in the Hive connector.

Key changes:
- Added HiveAzureConfigurationInitializer to inject relevant Azure configurations into Hadoop Configuration
- Introduced HiveAzureConfig to allow catalog-level configuration of Azure properties
- Updated HdfsConfigurationInitializer and HiveConnectorFactory to delegate Azure-specific config setup
- Registered configuration initializer in Hive module

Supports shared key and OAuth2-based authentication.
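To make the "inject Azure configurations into Hadoop Configuration" step concrete, here is a minimal sketch of the kind of key/value pairs involved. It is not the PR's `HiveAzureConfigurationInitializer`: the class `AzureConfigSketch` and its methods are hypothetical, a plain `Map` stands in for Hadoop's `Configuration` so the example has no dependencies, and which keys the PR actually sets is not shown on this page. The property names are the ones documented for the hadoop-azure module.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: Hadoop properties for Azure shared-key and OAuth2 access.
// A Map stands in for org.apache.hadoop.conf.Configuration.
final class AzureConfigSketch
{
    private AzureConfigSketch() {}

    // Shared key authentication for both the ABFS (dfs) and WASB (blob) endpoints.
    static Map<String, String> sharedKeyProperties(String storageAccount, String accessKey)
    {
        Map<String, String> properties = new LinkedHashMap<>();
        properties.put("fs.azure.account.key." + storageAccount + ".dfs.core.windows.net", accessKey);
        properties.put("fs.azure.account.key." + storageAccount + ".blob.core.windows.net", accessKey);
        return properties;
    }

    // OAuth2 client-credentials authentication for ABFS.
    static Map<String, String> oauthProperties(String clientId, String clientSecret, String tokenEndpoint)
    {
        Map<String, String> properties = new LinkedHashMap<>();
        properties.put("fs.azure.account.auth.type", "OAuth");
        properties.put("fs.azure.account.oauth.provider.type",
                "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider");
        properties.put("fs.azure.account.oauth2.client.endpoint", tokenEndpoint);
        properties.put("fs.azure.account.oauth2.client.id", clientId);
        properties.put("fs.azure.account.oauth2.client.secret", clientSecret);
        return properties;
    }
}
```

An initializer along these lines would copy each entry into the Hadoop configuration with `Configuration.set(key, value)` before the file system for a `wasbs://` or `abfss://` path is created.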

imjalpreet reviewed May 13, 2025

imjalpreet force-pushed the hadoop-upgrade-3.4.1 branch from 49d3ff3 to 61bd31a on May 13, 2025 10:54

mehradpk force-pushed the adls_support branch from 1feea7e to 87ac275 on May 14, 2025 08:48

imjalpreet force-pushed the hadoop-upgrade-3.4.1 branch 4 times, most recently from e0009df to 908fc4e on May 22, 2025 09:36

mehradpk force-pushed the adls_support branch from 87ac275 to 64eff1c on May 29, 2025 14:21

imjalpreet force-pushed the hadoop-upgrade-3.4.1 branch 2 times, most recently from 9a3edd5 to 75ff57c on July 8, 2025 22:29

imjalpreet force-pushed the hadoop-upgrade-3.4.1 branch from 1eb9af2 to 444a8f4 on July 28, 2025 11:51

imjalpreet force-pushed the hadoop-upgrade-3.4.1 branch from 444a8f4 to 17b66a1 on August 21, 2025 20:50

imjalpreet force-pushed the hadoop-upgrade-3.4.1 branch 4 times, most recently from 79b9fb2 to cb0c461 on September 16, 2025 20:49

mehradpk force-pushed the adls_support branch from 64eff1c to b221385 on September 22, 2025 04:05

mehradpk force-pushed the adls_support branch from b221385 to 46c87ca on October 6, 2025 06:37

mehradpk force-pushed the adls_support branch 2 times, most recently from f98eb88 to 9025dd1 on October 30, 2025 07:14

mehradpk force-pushed the adls_support branch from 9025dd1 to a65a447 on November 10, 2025 20:21

han-yan01 and others added 7 commits November 12, 2025 10:07

feat: Impl sort key for LocalShuffleWriter (prestodb#26547)

6b4c406

Summary: Impl sort key for LocalShuffleWriter Differential Revision: D86322593

dependabot bot and others added 2 commits January 27, 2026 10:09

mehradpk force-pushed the adls_support branch from 92776b9 to 971afcd on January 27, 2026 21:25

bibith4 and others added 26 commits January 28, 2026 11:51

chore: Upgrade io.opentelemetry version to 1.58.0 (prestodb#26644)

1a67c8e

feat(plugin-iceberg): Add support for materialized views with Hive catalog (prestodb#26958)

32207b9

chore(ci): Advance velox (prestodb#27069)

4d1c6d0

```
== NO RELEASE NOTE ==
```

## Summary by Sourcery
Chores:
- Update the Velox submodule reference used by presto-native-execution.

feat(ci): Expand CVE reporting to med and low (prestodb#27081)

c3b3eeb

mehradpk force-pushed the adls_support branch from 971afcd to 7b41678 on February 5, 2026 09:13

Add support for Azure Blob Storage and ADLS Gen2 in Hive connector #1

mehradpk wants to merge 565 commits into nishithakbhaskaran:hadoop-upgrade-3.4.1 from mehradpk:adls_support

mehradpk commented May 12, 2025

imjalpreet left a comment

nishithakbhaskaran commented May 13, 2025

github-actions bot commented Oct 6, 2025

20 participants
