Add support for Azure Blob Storage and ADLS Gen2 in Hive connector #1

Open
mehradpk wants to merge 565 commits into nishithakbhaskaran:hadoop-upgrade-3.4.1 from mehradpk:adls_support

@mehradpk

Description

Introduce support for Azure storage backends including Azure Blob Storage (using the wasb:// scheme) and Azure Data Lake Storage Gen2 (using the abfs:// scheme) in the Hive connector.

Key changes:

  • Added HiveAzureConfigurationInitializer to inject relevant Azure configurations into Hadoop Configuration
  • Introduced HiveAzureConfig to allow catalog-level configuration of Azure properties
  • Updated HdfsConfigurationInitializer and HiveConnectorFactory to delegate Azure-specific config setup
  • Registered configuration initializer in Hive module

Supports shared key and OAuth2-based authentication.
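
To make the two authentication modes concrete, here is a minimal sketch of the Hadoop properties an initializer of this kind would typically need to set. The class and method names (`AzureConfSketch`, `sharedKey`, `oauthClientCredentials`) are hypothetical and are not taken from this PR; the `fs.azure.*` keys are the ones documented for the `hadoop-azure` module. The actual property names exposed by `HiveAzureConfig` may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, for illustration only; not code from this PR.
final class AzureConfSketch {
    private AzureConfSketch() {}

    // Shared-key access: the same account key registered for both endpoints.
    static Map<String, String> sharedKey(String account, String accessKey) {
        Map<String, String> conf = new LinkedHashMap<>();
        // Read by the wasb[s]:// file system (Azure Blob Storage)
        conf.put("fs.azure.account.key." + account + ".blob.core.windows.net", accessKey);
        // Read by the abfs[s]:// file system (ADLS Gen2)
        conf.put("fs.azure.account.key." + account + ".dfs.core.windows.net", accessKey);
        return conf;
    }

    // OAuth2 client-credentials access for abfs[s]://.
    static Map<String, String> oauthClientCredentials(String endpoint, String clientId, String clientSecret) {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("fs.azure.account.auth.type", "OAuth");
        conf.put("fs.azure.account.oauth.provider.type",
                "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider");
        conf.put("fs.azure.account.oauth2.client.endpoint", endpoint);
        conf.put("fs.azure.account.oauth2.client.id", clientId);
        conf.put("fs.azure.account.oauth2.client.secret", clientSecret);
        return conf;
    }
}
```

In a real initializer each entry would be copied into the Hadoop `Configuration` with `Configuration.set(key, value)`.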

Motivation and Context

Several enterprise data lake workloads are hosted on Azure storage platforms. This change allows Presto to directly query data from Azure Blob Storage and ADLS Gen2, bringing Azure compatibility in line with other cloud storage systems like Amazon S3 and Google Cloud Storage.

Impact

  • Adds support for Azure cloud storage within the Hive connector
  • Introduces new configuration properties under HiveAzureConfig

No breaking changes to existing Hive catalogs or connectors

Test Plan

Tested through the Hive connector.

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

== RELEASE NOTES ==

Hive Connector Changes
* Add support for Azure Blob Storage and Azure Data Lake Storage Gen2 in the Hive connector
* Supports wasb[s]:// and abfs[s]:// URI schemes
* Allows shared key and OAuth2 authentication for Azure storage


@imjalpreet imjalpreet left a comment

@mehradpk Thank you for the PR, can you raise a draft PR from your branch in OSS as well? I want to see if there are any test failures.

@nishithakbhaskaran can you take a first pass at reviewing this?

@imjalpreet imjalpreet force-pushed the hadoop-upgrade-3.4.1 branch from 49d3ff3 to 61bd31a Compare May 13, 2025 10:54
@nishithakbhaskaran
Owner

@mehradpk Changes look good.
@imjalpreet Just checking: do we require extra tests for this, or are the current tests enough?

@imjalpreet imjalpreet force-pushed the hadoop-upgrade-3.4.1 branch 4 times, most recently from e0009df to 908fc4e Compare May 22, 2025 09:36
@imjalpreet imjalpreet force-pushed the hadoop-upgrade-3.4.1 branch 2 times, most recently from 9a3edd5 to 75ff57c Compare July 8, 2025 22:29
@imjalpreet imjalpreet force-pushed the hadoop-upgrade-3.4.1 branch from 1eb9af2 to 444a8f4 Compare July 28, 2025 11:51
@imjalpreet imjalpreet force-pushed the hadoop-upgrade-3.4.1 branch from 444a8f4 to 17b66a1 Compare August 21, 2025 20:50
@imjalpreet imjalpreet force-pushed the hadoop-upgrade-3.4.1 branch 4 times, most recently from 79b9fb2 to cb0c461 Compare September 16, 2025 20:49
@github-actions

github-actions bot commented Oct 6, 2025

Codenotify: Notifying subscribers in CODENOTIFY files for diff cb0c461...7b41678.

Notify File(s)
@aditi-pandit presto-parser/src/main/antlr4/com/facebook/presto/sql/parser/SqlBase.g4
@elharo presto-parser/src/main/antlr4/com/facebook/presto/sql/parser/SqlBase.g4
@kaikalur presto-parser/src/main/antlr4/com/facebook/presto/sql/parser/SqlBase.g4
@rschlussel presto-parser/src/main/antlr4/com/facebook/presto/sql/parser/SqlBase.g4

@mehradpk mehradpk force-pushed the adls_support branch 2 times, most recently from f98eb88 to 9025dd1 Compare October 30, 2025 07:14
han-yan01 and others added 7 commits November 12, 2025 10:07
Summary:
`finishCpu += operator.getFinishCpu().roundTo(NANOSECONDS);`

`getFinishCpu` has an underlying issue that causes `finishCpu` to overflow,
producing a negative value. This triggers an exception while reporting the
query completion event, so the stats are not reported.
Fix it by setting `finishCpu` to the max value when it overflows.
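
A minimal sketch of the clamping idea, assuming the running total is never negative. The helper name `addClamped` is hypothetical, and the actual change in this commit may be structured differently.

```java
// Hypothetical helper, for illustration only; not code from this commit.
final class SaturatingCpuSketch {
    private SaturatingCpuSketch() {}

    // Adds deltaNanos to a non-negative total, clamping to Long.MAX_VALUE on overflow.
    static long addClamped(long total, long deltaNanos) {
        // A negative delta means the upstream value has already wrapped around.
        if (deltaNanos < 0) {
            return Long.MAX_VALUE;
        }
        long sum = total + deltaNanos;
        // Two non-negative longs that sum to a negative number have overflowed.
        return sum < 0 ? Long.MAX_VALUE : sum;
    }
}
```

Clamping keeps the reported value non-negative, so the completion event can still be emitted even though the exact CPU time is lost.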


# Release Note
```
== NO RELEASE NOTE ==
```
…mit metadata for query event listeners (prestodb#26331)

Summary:
Currently, the `Input` and `Output` query metadata classes retain two
sources of connector-specific information that can be useful for
reporting via an `EventListener`:
```
Optional<Object> connectorInfo;
String serializedCommitOutput;
```
* `connectorInfo` can be cast back to the correct type in an
`EventListener` implementation, allowing rich access to the underlying
data
* `serializedCommitOutput` however, is serialized in a given format by
the `ConnectorCommitHandle` implementation, which makes it difficult to
correctly represent the reporting requirements in an EventListener
(which may need correlation with data in the `connectorInfo` result).

For example, `HiveCommitHandle` retains the lastDataCommitTime for each
partition in a simple array associated with the table name, where the
partition names are retained in the `HiveInputInfo` instance carried
through in connectorInfo. For these times to be mapped back to
individual partitions, the entries must be in the exact same order as
the entries in HiveInputInfo.

This change simply replaces the `serializedCommitOutput` property with
an `Optional<Object>` instance, providing parity with the
`connectorInfo`, and allowing `EventListener` implementations to cast
the commit handle back to the correct type for richer access to the
underlying data.
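
The cast an `EventListener` would perform can be sketched as follows. `PartitionCommitTimes` is a hypothetical stand-in for a connector-specific commit payload and is not a Presto class; only the `Optional<Object>` shape comes from the description above.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical types, for illustration only; not Presto SPI classes.
final class CommitOutputSketch {
    private CommitOutputSketch() {}

    // Stand-in for a connector-specific commit payload.
    static final class PartitionCommitTimes {
        final List<Long> lastDataCommitTimes;

        PartitionCommitTimes(List<Long> lastDataCommitTimes) {
            this.lastDataCommitTimes = lastDataCommitTimes;
        }
    }

    // Returns the payload if it has the expected type, otherwise empty.
    static Optional<PartitionCommitTimes> asCommitTimes(Optional<Object> commitOutput) {
        return commitOutput
                .filter(PartitionCommitTimes.class::isInstance)
                .map(PartitionCommitTimes.class::cast);
    }
}
```

Checking the type before casting lets one listener run against several connectors and simply skip commit outputs it does not understand.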

Differential Revision: D84382446

## Release Notes
```
== RELEASE NOTES ==

SPI Changes
* Replaces the ``String serializedCommitOutput`` argument with ``Optional<Object> commitOutput`` in the ``com.facebook.presto.spi.eventlistener.QueryInputMetadata`` and ``com.facebook.presto.spi.eventlistener.QueryOutputMetadata`` constructors
* Adds ``getCommitOutputForRead()`` and ``getCommitOutputForWrite()`` methods to ``ConnectorCommitHandle``, and deprecates the existing ``getSerializedCommitOutputForRead()`` and ``getSerializedCommitOutputForWrite()`` methods
```
…restodb#26557)

Summary:
Remove the uninitialized bytes in binaryData, so we can reduce the
binary response size.

Differential Revision: D85720910
### RELEASE NOTES ###
```
== RELEASE NOTES ==
General Changes
* Replace the Java standard Base64 encoder with BaseEncoding from Guava
```
…nicode escapes (prestodb#26443)

Summary:
Modified `ExpressionFormatter.formatStringLiteral()` to preserve common
whitespace characters (newlines, tabs, carriage returns) in their
literal form rather than converting them to Unicode escape sequences
(e.g., `\000A` for newline). This change improves SQL standard
compliance and fixes issues with embedded code (like Python UDF) and
regex patterns that require proper whitespace handling.
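
The behavior described above can be sketched roughly as below. This is a simplified, self-contained approximation written for illustration; it is not the actual `ExpressionFormatter` code, and the class name `StringLiteralSketch` is hypothetical.

```java
// Simplified illustration; not the actual ExpressionFormatter implementation.
final class StringLiteralSketch {
    private StringLiteralSketch() {}

    static String formatStringLiteral(String s) {
        // Fall back to a U&'...' literal only if some character still needs escaping.
        boolean needsUnicode = s.codePoints().anyMatch(cp -> !isPlain(cp));
        StringBuilder out = new StringBuilder(needsUnicode ? "U&'" : "'");
        s.codePoints().forEach(cp -> {
            if (cp == '\'') {
                out.append("''"); // a quote is escaped by doubling it
            }
            else if (cp == '\\' && needsUnicode) {
                out.append("\\\\"); // backslash is the escape character inside U&'...'
            }
            else if (isPlain(cp)) {
                out.appendCodePoint(cp); // printable ASCII, newline, tab, CR stay literal
            }
            else if (cp <= 0xFFFF) {
                out.append(String.format("\\%04X", cp));
            }
            else {
                out.append(String.format("\\+%06X", cp));
            }
        });
        return out.append('\'').toString();
    }

    // Characters emitted as-is: printable ASCII plus the preserved whitespace.
    private static boolean isPlain(int cp) {
        return (cp >= 0x20 && cp <= 0x7E) || cp == '\n' || cp == '\t' || cp == '\r';
    }
}
```

With this rule a multi-line Python UDF body round-trips through the formatter unchanged, while other control or non-ASCII characters are still escaped.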

Differential Revision: D85380265


```
== NO RELEASE NOTE ==
```
## Description
The current code tries to add a round-robin local exchange below the
merge join node, which breaks the sort order of the input. This PR
fixes that.

## Motivation and Context
Bug fix

## Impact
Bug fix

## Test Plan
Unit test

## Release Notes
Please follow [release notes
guidelines](https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines)
and fill in the release notes below.


```
== NO RELEASE NOTE ==
```
…stodb#26403)

## Summary

This PR introduces sorted exchange functionality to Presto, enabling
efficient sort-merge joins by allowing data to be sorted
during shuffle operations rather than requiring separate sort steps.
This optimization eliminates redundant sorting, reduces
memory pressure, and improves query performance for distributed joins
and aggregations that require sorted inputs.

## Motivation

Currently, when Presto needs to perform a sort-merge join in a
distributed query, it must:
  1. Shuffle data across workers (ExchangeNode)
  2. Explicitly sort the shuffled data (SortNode)

This approach is inefficient because sorting happens as a separate
operation after data movement. By pushing the sort operation into the
exchange itself, we can sort data during the shuffle, eliminating the
redundant SortNode and improving overall query performance.

## High-Level Changes

1. Core Infrastructure (3161b24)

   - Add `orderingScheme` field to `PlanFragment` class (Java)
   - Add `outputOrderingScheme` field to C++ PlanFragment protocol
   - Implement JSON serialization/deserialization for C++ integration
   - Update `PrestoToVeloxQueryPlan.cpp` to consume ordering scheme and
     convert to sorting keys
   - Update all `PlanFragment` constructor call sites to support the new
     field

2. Planner Support (130b14f)

   - Extend `ExchangeNode` to support SORTED partition type
   - Update `BasePlanFragmenter` to populate and propagate orderingScheme
     between fragments
   - Add `PlanFragmenterUtils` support for sorted exchanges
   - Enhance `PlanPrinter` to display sorted exchange information in
     EXPLAIN output

3. Optimizer Rule (6951cab)

   - Introduce SortedExchangeRule optimizer that identifies and transforms
     Sort→Exchange patterns
   - Add `sorted_exchange_enabled` session property (experimental, default:
     false)
   - Add `optimizer.experimental.sorted-exchange-enabled` configuration
     property
   - Integrate into optimizer pipeline alongside existing join optimizers
   - Only applies to REMOTE REPARTITION exchanges
   - Validates ordering variables are available in exchange output

4. Spark Integration (960bc93)

   - Update `AbstractPrestoSparkQueryExecution` to handle sorted exchanges
   - Add `MutablePartitionIdOrdering` class to track partition ordering in
     Spark
   - Update `PrestoSparkRddFactory` to preserve sort order during shuffles
   - Enable Spark-based queries to leverage sorted exchanges

  ## Plan Transformation Example

  Before:
```
  SortNode(orderBy: [a, b])
    └─ ExchangeNode(type: REPARTITION, scope: REMOTE)
```
  After:
```
  ExchangeNode(type: REPARTITION, scope: REMOTE, orderingScheme: [a, b])
```
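
The rewrite above can be sketched on a toy plan model. The node classes here (`Sort`, `Exchange`) are hypothetical simplifications rather than Presto's planner classes; the two guards mirror the conditions listed for the rule (remote repartition exchanges only, ordering variables present in the exchange output).

```java
import java.util.List;

// Toy plan model, for illustration only; not Presto's planner classes.
final class SortedExchangeSketch {
    private SortedExchangeSketch() {}

    interface PlanNode {}

    static final class Exchange implements PlanNode {
        final boolean remoteRepartition;
        final List<String> outputs;
        final List<String> orderingScheme; // empty means the exchange is unordered

        Exchange(boolean remoteRepartition, List<String> outputs, List<String> orderingScheme) {
            this.remoteRepartition = remoteRepartition;
            this.outputs = outputs;
            this.orderingScheme = orderingScheme;
        }
    }

    static final class Sort implements PlanNode {
        final PlanNode source;
        final List<String> orderBy;

        Sort(PlanNode source, List<String> orderBy) {
            this.source = source;
            this.orderBy = orderBy;
        }
    }

    // Collapses Sort(Exchange) into a sorted Exchange when the guards hold.
    static PlanNode rewrite(PlanNode node) {
        if (!(node instanceof Sort) || !(((Sort) node).source instanceof Exchange)) {
            return node;
        }
        Sort sort = (Sort) node;
        Exchange exchange = (Exchange) sort.source;
        boolean applicable = exchange.remoteRepartition
                && exchange.orderingScheme.isEmpty()
                && exchange.outputs.containsAll(sort.orderBy);
        if (!applicable) {
            return node; // leave the explicit sort in place
        }
        return new Exchange(true, exchange.outputs, sort.orderBy);
    }
}
```

When a guard fails the original node is returned unchanged, which matches the note that existing queries keep working when the rule does not apply.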
## Configuration

The feature is controlled by:
- Session property: `sorted_exchange_enabled` (experimental, default:
  false)
- Config property: `experimental.optimizer.sorted-exchange-enabled`

## Testing
- Added `TestSortedExchangeRule` with test cases covering various scenarios

## Performance Benefits

  - Reduced sorting overhead: Eliminates redundant SortNode operations
  - Lower memory usage: Avoids buffering data for explicit sorting

## Backward Compatibility

  - Feature is disabled by default (experimental flag)
  - All existing queries continue to work without modification
  - No breaking changes to public APIs
  - Graceful degradation when feature is disabled
## Release Notes

```
== RELEASE NOTES ==

General Changes
  * Add experimental support for sorted exchanges to improve sort-merge join performance. When enabled via the
  `sorted_exchange_enabled` session property or `experimental.optimizer.sorted-exchange-enabled` configuration property, the query
  planner will push sort operations into exchange nodes, eliminating redundant sorting steps and reducing memory usage for
  distributed queries with sort-merge joins. This feature is disabled by default.
```
Summary: Impl sort key for LocalShuffleWriter

Differential Revision: D86322593
dependabot bot and others added 2 commits January 27, 2026 10:09
…restodb#27009)

Bumps [lodash](https://github.com/lodash/lodash) from 4.17.21 to
4.17.23. CVE-2025-13465
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/lodash/lodash/commit/dec55b7a3b382da075e2eac90089b4cd00a26cbb"><code>dec55b7</code></a>
Bump main to v4.17.23 (<a
href="https://redirect.github.com/lodash/lodash/issues/6088">#6088</a>)</li>
<li><a
href="https://github.com/lodash/lodash/commit/19c9251b3631d7cf220b43bc757eb33f1084f117"><code>19c9251</code></a>
fix: setCacheHas JSDoc return type should be boolean (<a
href="https://redirect.github.com/lodash/lodash/issues/6071">#6071</a>)</li>
<li><a
href="https://github.com/lodash/lodash/commit/b5e672995ae26929d111a6e94589f8d03fb8e578"><code>b5e6729</code></a>
jsdoc: Add -0 and BigInt zeros to _.compact falsey values list (<a
href="https://redirect.github.com/lodash/lodash/issues/6062">#6062</a>)</li>
<li><a
href="https://github.com/lodash/lodash/commit/edadd452146f7e4bad4ea684e955708931d84d81"><code>edadd45</code></a>
Prevent prototype pollution on baseUnset function</li>
<li><a
href="https://github.com/lodash/lodash/commit/4879a7a7d0a4494b0e83c7fa21bcc9fc6e7f1a6d"><code>4879a7a</code></a>
doc: fix autoLink function, conversion of source links (<a
href="https://redirect.github.com/lodash/lodash/issues/6056">#6056</a>)</li>
<li><a
href="https://github.com/lodash/lodash/commit/9648f692b0fc7c2f6a7a763d754377200126c2e8"><code>9648f69</code></a>
chore: remove <code>yarn.lock</code> file (<a
href="https://redirect.github.com/lodash/lodash/issues/6053">#6053</a>)</li>
<li><a
href="https://github.com/lodash/lodash/commit/dfa407db0bf5b200f2c7a9e4f06830ceaf074be9"><code>dfa407d</code></a>
ci: remove legacy configuration files (<a
href="https://redirect.github.com/lodash/lodash/issues/6052">#6052</a>)</li>
<li><a
href="https://github.com/lodash/lodash/commit/156e1965ae78b121a88f81178ab81632304e8d64"><code>156e196</code></a>
feat: add renovate setup (<a
href="https://redirect.github.com/lodash/lodash/issues/6039">#6039</a>)</li>
<li><a
href="https://github.com/lodash/lodash/commit/933e1061b8c344d3fc742cdc400175d5ffc99bce"><code>933e106</code></a>
ci: add pipeline for Bun (<a
href="https://redirect.github.com/lodash/lodash/issues/6023">#6023</a>)</li>
<li><a
href="https://github.com/lodash/lodash/commit/072a807ff7ad8ffc7c1d2c3097266e815d138e20"><code>072a807</code></a>
docs: update links related to Open JS Foundation (<a
href="https://redirect.github.com/lodash/lodash/issues/5968">#5968</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/lodash/lodash/compare/4.17.21...4.17.23">compare
view</a></li>
</ul>
</details>

```
== RELEASE NOTES ==

Security Changes
* Upgrade lodash from 4.17.21 to 4.17.23 to address `CVE-2025-13465 <https://github.com/advisories/GHSA-xxjr-mmjv-4gpg>`_.

```

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…#26718)

## Description
Added new documentation explaining how to use the Presto C++ engine.
The documentation provides step-by-step instructions for configuring
and running the Presto C++ worker.

## Motivation and Context
There was no consolidated or beginner-friendly documentation for Presto
C++ in the open-source project.
Users often had difficulty understanding how to build and run the C++
worker, what dependencies were required, and how it integrates with a
Presto coordinator.

## Impact
There is no performance impact.

## Release Notes

```
== NO RELEASE NOTE ==
```
bibith4 and others added 26 commits January 28, 2026 11:51
## Description
Upgrade postgresql to version 42.7.9

## Motivation and Context
Using a more recent version helps avoid potential vulnerabilities and
ensures we aren't relying on outdated or unsupported code.

## Release Notes

```
== NO RELEASE NOTE ==
```
…ons to sidecar for expression optimization (prestodb#27043)

## Description
Avoid sending aggregate and window functions to sidecar for expression
optimization.

## Motivation and Context
Encountered while investigating
prestodb#26920.
The bug reported in the issue is different but the general idea is we
should avoid sending aggregate and window functions to sidecar as they
cannot be constant folded.
The failing queries in the issue are added as test cases.

## Impact
No impact.

## Test Plan
Unit tests, CI.

```
== NO RELEASE NOTE ==
```
## Description
This PR adds subfield pushdown optimization for the `cardinality()`
function in Presto. When enabled, this optimization allows the query
engine to skip reading map keys/values or array elements when only the
cardinality (count) of these collections is needed.

This PR contains coordinator-side changes only; the corresponding
worker-side changes will be added separately to the C++ worker. Since
this feature is not yet fully tested end-to-end with the worker, the
session property is disabled by default.

Additionally, this implementation takes a conservative approach to
subfield pushdown for cardinality: if a column already has other
subfields being accessed (e.g., `features['key']`), we skip adding the
structure-only subfield for cardinality to avoid potential correctness
issues.

Key Changes:
1. New StructureOnly PathElement (Subfield.java): Introduced a new path
element type represented as [$] that indicates only the structural
metadata (size/count) is needed, not the actual content
2. SubfieldTokenizer Update: Added parsing support for the $ subscript
pattern in subfield paths
3. FunctionResolution: Added isCardinalityFunction() method to identify
cardinality function calls
4. PushdownSubfields Optimizer: Extended the subfield extraction logic
to recognize cardinality() calls on maps and arrays, generating [$]
subfield hints that downstream readers can use to skip content
5. Session/Config Properties: Added pushdown_subfields_for_cardinality
configuration option (disabled by default)

## Motivation and Context
When a query only needs the size of a map or array (e.g., `SELECT
cardinality(features) FROM table` or `WHERE cardinality(tags) > 10`),
there's no need to read the keys, the values, or both. This optimization
helps reduce shuffles and improve query performance.

## Impact
- Performance: Reduces I/O and deserialization overhead for queries
using cardinality() on maps/arrays
- Backward Compatible: Feature is disabled by default via
optimizer.pushdown-subfield-for-cardinality config
- No Breaking Changes: Existing behavior is preserved when the feature
is disabled
- Added a new session property `pushdown-subfield-for-cardinality`

## Test Plan
Added comprehensive unit tests in TestHiveLogicalPlanner.java covering:
- Simple cardinality pushdown for MAP - Verifies cardinality(x)
generates x[$] subfield
- Cardinality pushdown for ARRAY - Verifies array cardinality generates
correct subfield
- Cardinality in WHERE clause - Tests WHERE cardinality(features) > 10
- Cardinality in aggregation - Tests AVG(cardinality(data))
- Multiple cardinalities - Tests multiple cardinality calls in same
query
- Cardinality with complex expressions - Tests cardinality(tags) * 2
- Cardinality on nested structures - Tests transform(arr_of_maps, m ->
cardinality(m))
- Cardinality combined with subscript access - Verifies that when both
cardinality(features) and features['key'] are used, the specific
subscript takes precedence (avoiding redundant structure-only reads)

## Contributor checklist

- [x] Please make sure your submission complies with our [contributing
guide](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md),
in particular [code
style](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#code-style)
and [commit
standards](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#commit-standards).
- [x] PR description addresses the issue accurately and concisely. If
the change is non-trivial, a GitHub Issue is referenced.
- [x] Documented new properties (with its default value), SQL syntax,
functions, or other functionality.
- [ ] If release notes are required, they follow the [release notes
guidelines](https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines).
- [x] Adequate tests were added if applicable.
- [ ] CI passed.
- [ ] If adding new dependencies, verified they have an [OpenSSF
Scorecard](https://securityscorecards.dev/#the-checks) score of 5.0 or
higher (or obtained explicit TSC approval for lower scores).

## Release Notes

```
== NO RELEASE NOTE ==
```
…restodb#27044)

## Description
For remote functions, sometimes we want to limit the concurrency to
avoid throttling the remote service.
In this PR, I added session properties to set the number of tasks for a
remote projection, so the plan will be like:

scan -> remote exchange (with specified number of tasks) -> remote
project node -> remote exchange -> output

The remote project will run in a separate stage.

There are two session properties:
- `remote_function_fixed_parallelism_task_count` specifies how many tasks
  to use
- `remote_function_names_for_fixed_parallelism` specifies the pattern of
  remote function names to match

## Motivation and Context
As in description

## Impact
To control the number of tasks for a remote project node

## Test Plan
unit tests and local end to end test

## Release Notes

```
== RELEASE NOTES ==

General Changes
* Add options to control the number of tasks for remote project node
```

## Summary by Sourcery

Add configurable fixed-parallelism support for remote function
projections and wire it through planning, partitioning, and session
properties.

New Features:
- Introduce session and config properties to control fixed parallelism
for selected remote functions via regex-matched names and an optional
task count.
- Extend exchange planning to insert bounded round-robin remote
exchanges around qualifying remote project nodes based on the configured
properties.

Enhancements:
- Augment system partitioning handles and exchange nodes to carry an
optional partition count for fixed distributions and honor it when
selecting nodes.

Tests:
- Add planner and configuration tests covering regex matching behavior
for remote-function fixed parallelism and property mappings for the new
optimizer settings.
## Description
The Provisio plugin dumps all the native plugins under `native-plugin/`
and not `native-plugins/`.

## Motivation and Context
See the attached screenshot:
<img width="416" height="249" alt="Screenshot 2026-01-29 at 10 23 38 AM"
src="https://github.com/user-attachments/assets/8edf2856-d61d-4b0c-8d27-20712c0ad044"
/>

## Impact
No user impact

## Test Plan
Docs only change

## Release Notes

```
== NO RELEASE NOTE ==
```
…restodb#27050)

## Description

Due to Iceberg issue apache/iceberg#15128,
using a binary type as a partition column may cause incorrect
calculation of partition bounds in the generated manifest files when
deleting data files. This can lead to incorrect results in subsequent
queries.

Therefore, we temporarily disable metadata deletion and thorough filter
pushdown for varbinary columns. This restriction can be lifted
once the Iceberg issue is resolved.

## Motivation and Context

Fix a bug that occurs when varbinary columns are used as partition columns in Iceberg.

## Impact

This change is not visible to users.

## Test Plan

- Newly added test case in
`IcebergDistributedTestBase.testPartitionedByVarbinaryType` through
`@DataProvider`, which would explicitly fail without this fix.

## Release Notes

```
== NO RELEASE NOTE ==
```

## Summary by Sourcery

Guard Iceberg plan optimization from enforcing metadata constraints on
VARBINARY-partitioned columns and strengthen test coverage for varbinary
partitioning behavior.

Bug Fixes:
- Avoid pushing down column constraints into Iceberg partition specs for
VARBINARY columns to prevent incorrect metadata-based deletions and
query results when varbinary is used as a partition key.

Tests:
- Extend the varbinary partitioning integration test to cover multiple
insert value orderings and updated expected partition counts via a
TestNG data provider.
…in AddLocalExchanges (prestodb#26960)

We observed that the use of parent preference in AddLocalExchanges can
limit parallelism when the cardinality of the partition column of parent
preference is low. In a setup where a query is allowed to use many
cores, limiting the parallelism significantly affects query latency.
More details can be found in
prestodb#26961.

This PR makes three changes:
* This PR introduces a new feature config
`localExchangeParentPreferenceStrategy` that has three values: ALWAYS,
NEVER, and AUTOMATIC. The default value is ALWAYS (i.e., current
behavior).
* This PR makes AddLocalExchanges use the parent preference according to
the localExchangeParentPreferenceStrategy. When
localExchangeParentPreferenceStrategy is ALWAYS, it always uses the parent
preference. When localExchangeParentPreferenceStrategy is NEVER, it
never uses the parent preference. When
localExchangeParentPreferenceStrategy is AUTOMATIC, it uses the parent
preference only when the estimated cardinality is larger than the task
concurrency. (If estimated stats are not available, the parent preference is
not used.)
- Notice that the estimated stats are only calculated when
localExchangeParentPreferenceStrategy is AUTOMATIC.
* This PR adds unit tests of the new config and the change to
local-exchange.
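
The three strategies listed above can be summarized as a single decision function. This is a rough sketch under stated assumptions: `ParentPreferenceSketch` and `useParentPreference` are hypothetical names, and the real AddLocalExchanges code obtains cardinality from the stats calculator rather than from a plain `OptionalDouble`.

```java
import java.util.OptionalDouble;

// Hypothetical illustration; names are not the actual Presto classes.
final class ParentPreferenceSketch {
    enum Strategy { ALWAYS, NEVER, AUTOMATIC }

    // Returns true when the local exchange should honor the parent's partitioning preference.
    static boolean useParentPreference(
            Strategy strategy,
            OptionalDouble estimatedCardinality,
            int taskConcurrency) {
        switch (strategy) {
            case ALWAYS:
                return true;
            case NEVER:
                return false;
            case AUTOMATIC:
                // Without an estimate, the parent preference is not used.
                return estimatedCardinality.isPresent()
                        && estimatedCardinality.getAsDouble() > taskConcurrency;
            default:
                throw new IllegalArgumentException("Unknown strategy: " + strategy);
        }
    }
}
```

For example, under AUTOMATIC with a task concurrency of 16, an estimated cardinality of 4 would skip the parent preference, so the exchange is free to spread work across all 16 drivers.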


## Description
<!---Describe your changes in detail-->

## Motivation and Context
<!---Why is this change required? What problem does it solve?-->
<!---If it fixes an open issue, please link to the issue here.-->

## Impact
<!---Describe any public API or user-facing feature change or any
performance impact-->

## Test Plan
<!---Please fill in how you tested your change-->

## Contributor checklist

- [ ] Please make sure your submission complies with our [contributing
guide](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md),
in particular [code
style](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#code-style)
and [commit
standards](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#commit-standards).
- [ ] PR description addresses the issue accurately and concisely. If
the change is non-trivial, a GitHub Issue is referenced.
- [ ] Documented new properties (with its default value), SQL syntax,
functions, or other functionality.
- [ ] If release notes are required, they follow the [release notes
guidelines](https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines).
- [ ] Adequate tests were added if applicable.
- [ ] CI passed.
- [ ] If adding new dependencies, verified they have an [OpenSSF
Scorecard](https://securityscorecards.dev/#the-checks) score of 5.0 or
higher (or obtained explicit TSC approval for lower scores).

## Release Notes
Please follow [release notes
guidelines](https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines)
and fill in the release notes below.

```
== NO RELEASE NOTE ==
```

## Summary by Sourcery

Introduce a configurable strategy for using parent preferences in
AddLocalExchanges and make local exchange partitioning for aggregations
cost-aware based on estimated cardinality and task concurrency.

New Features:
- Add a local_exchange_parent_preference_strategy session/feature config
to control how local exchanges use parent partitioning preferences with
options ALWAYS, NEVER, and AUTOMATIC.

Enhancements:
- Update AddLocalExchanges to optionally use stats-based decisions when
applying parent partitioning preferences for aggregation local
exchanges, leveraging the existing stats calculator.
- Wire the stats calculator into AddLocalExchanges through
PlanOptimizers to enable precomputation of plan statistics when the
AUTOMATIC strategy is selected.

Tests:
- Add planner tests validating local exchange behavior under ALWAYS,
NEVER, and AUTOMATIC parent preference strategies and different task
concurrency settings.
- Extend FeaturesConfig tests to cover default and explicit mappings for
the new local_exchange_parent_preference_strategy config.
## Description
Remove unused code in `presto-hive-metastore` module

## Motivation and Context
Remove unused code in `presto-hive-metastore` module 

## Impact
Maintenance

## Test Plan
None

## Contributor checklist

- [ ] Please make sure your submission complies with our [contributing
guide](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md),
in particular [code
style](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#code-style)
and [commit
standards](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#commit-standards).
- [ ] PR description addresses the issue accurately and concisely. If
the change is non-trivial, a GitHub Issue is referenced.
- [ ] Documented new properties (with its default value), SQL syntax,
functions, or other functionality.
- [ ] If release notes are required, they follow the [release notes
guidelines](https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines).
- [ ] Adequate tests were added if applicable.
- [ ] CI passed.
- [ ] If adding new dependencies, verified they have an [OpenSSF
Scorecard](https://securityscorecards.dev/#the-checks) score of 5.0 or
higher (or obtained explicit TSC approval for lower scores).

## Release Notes
Please follow [release notes
guidelines](https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines)
and fill in the release notes below.

```
== NO RELEASE NOTE ==
```

## Summary by Sourcery

Enhancements:
- Clean up the in-memory caching Hive metastore by removing an unused
method for invalidating stale partitions.
## Description
Previously, the Iceberg connector link did not point to a valid page.
This change fixes the issue by correctly mapping it to the Iceberg
connector documentation page.

## Motivation and Context
The previous documentation link for the Iceberg connector was invalid,
which could confuse users trying to navigate to the correct connector
documentation. This change ensures the link points to the correct and
valid page.

## Impact
Documentation-only change. No public API, user-facing behavior, or
performance impact.

## Test Plan
Verified the updated link points to the correct Iceberg connector
documentation page.

## Contributor checklist

- [x] Please make sure your submission complies with our [contributing
guide](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md),
in particular [code
style](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#code-style)
and [commit
standards](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#commit-standards).
- [x] PR description addresses the issue accurately and concisely.
- [ ] Documented new properties (with its default value), SQL syntax,
functions, or other functionality.
- [ ] Adequate tests were added if applicable.
- [x] CI passed.
- [ ] If adding new dependencies, verified they have an [OpenSSF
Scorecard](https://securityscorecards.dev/#the-checks) score of 5.0 or
higher (or obtained explicit TSC approval for lower scores).

## Release Notes
```
 == NO RELEASE NOTE == 
```
…er writer (prestodb#26989)

## Description

For INSERT/CTAS operations on Iceberg tables with a large number of
partitions, the partition count per writer can far exceed 100. In such
cases, we may want the operation to succeed rather than fail fast—for
example, when the data volume is known to be small; or when we are
willing to trade off speed for lower memory usage by reducing the
configuration values of `parquet_writer_block_size` or
`orc_optimized_writer_max_stripe_size`.

Currently, the only way to configure this limit is through the connector
property `iceberg.max-partitions-per-writer`, which requires a cluster
restart to take effect and applies globally to all SQLs and sessions.

This PR introduces the corresponding Iceberg connector session property
`max_partitions_per_writer` to set the max partitions per writer. This
provides a much lighter and more flexible approach, allowing adjustments
to take effect immediately without a restart.
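
The relationship between the connector property and the session property can be sketched as a simple override: the session value, when set, wins over the catalog-level value. The class and method names below are hypothetical, and the error message is illustrative rather than the connector's actual text.

```java
import java.util.OptionalInt;

// Hypothetical illustration of session-over-catalog resolution.
final class MaxPartitionsPerWriterSketch {
    // The catalog value comes from iceberg.max-partitions-per-writer;
    // the session value comes from max_partitions_per_writer when set.
    static int effectiveLimit(int catalogValue, OptionalInt sessionValue) {
        return sessionValue.orElse(catalogValue);
    }

    // Fails fast once a writer would hold more open partitions than allowed.
    static void checkOpenPartitions(int openPartitions, int limit) {
        if (openPartitions > limit) {
            throw new IllegalStateException(
                    "Writer has " + openPartitions + " open partitions, limit is " + limit);
        }
    }
}
```

In practice the session value would be supplied with a statement such as `SET SESSION iceberg.max_partitions_per_writer = 500`, where `iceberg` is an assumed catalog name.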

## Motivation and Context

Provide per-session or even per-statement configuration to adjust insert
behavior and avoid failures.

## Impact

Users can now set the max limit of partitions per writer via the `SET
SESSION` statement.

## Test Plan

- Newly added test cases to show the effect of the session property in
CTAS/INSERT statement.

## Contributor checklist

- [ ] Please make sure your submission complies with our [contributing
guide](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md),
in particular [code
style](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#code-style)
and [commit
standards](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#commit-standards).
- [ ] PR description addresses the issue accurately and concisely. If
the change is non-trivial, a GitHub Issue is referenced.
- [ ] Documented new properties (with its default value), SQL syntax,
functions, or other functionality.
- [ ] If release notes are required, they follow the [release notes
guidelines](https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines).
- [ ] Adequate tests were added if applicable.
- [ ] CI passed.
- [ ] If adding new dependencies, verified they have an [OpenSSF
Scorecard](https://securityscorecards.dev/#the-checks) score of 5.0 or
higher (or obtained explicit TSC approval for lower scores).

## Release Notes

```
== NO RELEASE NOTE ==
```
```
== NO RELEASE NOTE ==
```

## Summary by Sourcery

Chores:
- Update the Velox submodule reference used by presto-native-execution
to the latest desired commit.

---------

Co-authored-by: Ping Liu <lpingbj@cn.ibm.com>
Co-authored-by: Christian Zentgraf <czentgr@us.ibm.com>
…riter (prestodb#27054)

Summary: Session property to control the file size for presto writers

Differential Revision: D91361183

## Summary by Sourcery

New Features:
- Introduce a NATIVE_MAX_TARGET_FILE_SIZE session property to control
when writers roll over to a new output file based on size.


### Release Notes

```
== RELEASE NOTES ==

Prestissimo (Native Execution) Changes
* Add ``native_max_target_file_size`` session property to control the maximum target file size for writers. When a file exceeds this size during writing, the writer will close the current file and start writing to a new file.
```
```
== NO RELEASE NOTE ==
```

## Summary by Sourcery

Chores:
- Update the Velox submodule reference used by presto-native-execution.
Summary:
Fix unnecessary copies in the Presto HTTP module:
- Use std::move() for shared_ptr, SSLContextPtr, and callback
assignments
- Use const reference for path variable to avoid copy from getPath()

These changes eliminate unnecessary copy operations and improve
performance.

```
== NO RELEASE NOTE ==
```

## Summary by Sourcery

Address performance-related cleanups in the Presto HTTP client and
server by eliminating unnecessary copies of objects and strings.

Enhancements:
- Move HTTP client and server callbacks, timers, and context objects
instead of copying to avoid redundant allocations and ownership
transfers.
- Bind the HTTP request path as a const reference rather than copying
the string when dispatching request handlers.
…TRY() (prestodb#26976)

Add a session property to control whether the TRY() function can catch
errors from remote function execution. This allows users to enable error
catching for remote functions on a per-session basis.

Changes:
- Add TRY_CATCH_REMOTE_FUNCTION_ERRORS constant to
SystemSessionProperties
- Add isTryCatchRemoteFunctionErrors() to FeaturesConfig with default
false
- Add isTryCatchRemoteFunctionErrorsEnabled() getter for session access
- Add unit test for the new config property

```
== NO RELEASE NOTE ==
```
…restodb#27067)

## Description
The new session property allows Partitioned Output Velox operators
to flush (return) data eagerly, as soon as it arrives.
This matches the default Presto Java behavior of returning results
eagerly to the caller while the query is still running (scanning).

## Motivation and Context
For "needle in a haystack" queries running in various UIs, this
early-return functionality is crucial.

## Test Plan
Existing session property test.
Ran the custom build in a Prestissimo cluster to ensure session property
changes query behavior accordingly.

```
== NO RELEASE NOTE ==
```

## Summary by Sourcery

Add a native session property to control eager flushing behavior of
partitioned output operators.

New Features:
- Introduce the native_partitioned_output_eager_flush session property
to enable eager flushing of PartitionedOutput operator rows in native
execution.

Documentation:
- Document the native_partitioned_output_eager_flush session property in
the Presto native session properties reference.

Tests:
- Extend session property mapping tests to cover the new
native_partitioned_output_eager_flush property.
prestodb#27059)

Summary:
MV query optimizer fails to rewrite queries when the specified table
name differs between the MV definition and the incoming query (ex:
`base_table` vs `schema.base_table`).

This fix resolves table references to schema-qualified names, ensuring
consistent table matching regardless of how the table was specified.
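
The normalization idea can be sketched as follows, assuming the session's default schema is known. `TableNameSketch` and `qualify` are hypothetical names, and the sketch ignores catalogs and quoted identifiers, which the real optimizer has to handle.

```java
// Hypothetical illustration; not the actual MV optimizer code.
final class TableNameSketch {
    // Resolves "table" or "schema.table" to a schema-qualified "schema.table",
    // falling back to the session's default schema for unqualified names.
    static String qualify(String reference, String defaultSchema) {
        return reference.contains(".") ? reference : defaultSchema + "." + reference;
    }

    // Two references match when their schema-qualified forms are equal.
    static boolean sameTable(String left, String right, String defaultSchema) {
        return qualify(left, defaultSchema).equals(qualify(right, defaultSchema));
    }
}
```

Comparing the qualified forms means `base_table` in the MV definition and `schema.base_table` in the incoming query are treated as the same table when `schema` is the default schema.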

Reviewed By: zation99

Differential Revision: D91699496

## Summary by Sourcery

Ensure materialized view query optimization consistently matches base
tables regardless of schema qualification in table names.

Bug Fixes:
- Fix materialized view rewrites failing when base tables are referenced
with different schema qualifications between the MV definition and the
incoming query.

Tests:
- Add coverage to verify materialized view query optimization works when
base tables are referenced both with and without schema-qualified names
in various query shapes.

## Release Notes
```
== RELEASE NOTES ==

General Changes
* Fix MV query optimizer by correctly resolving table references to schema-qualified names.
```
…odb#26905)

Summary:
Ported the IpPrefix and IpAddress tests in
https://github.com/prestodb/presto/blob/master/presto-main-base/src/test/java/com/facebook/presto/operator/scalar/TestIpPrefixFunctions.java
to run with Presto Native engine in presto-native-tests.

This is a continuation of the work to refactor scalar function tests
from `presto-main-base` to `presto-main-tests` from this PR:
prestodb#26013

Also moved IpPrefixType and IpAddressType into `presto-common` from
`presto-main-base` due to some dependency cycles that appeared after
refactoring.

== NO RELEASE NOTE ==
## Description
Fix for prestodb#26685
Fix for prestodb#26808

## Motivation and Context
<!---Why is this change required? What problem does it solve?-->
<!---If it fixes an open issue, please link to the issue here.-->

## Impact
<!---Describe any public API or user-facing feature change or any
performance impact-->

## Test Plan
<!---Please fill in how you tested your change-->

## Contributor checklist

- [ ] Please make sure your submission complies with our [contributing
guide](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md),
in particular [code
style](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#code-style)
and [commit
standards](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#commit-standards).
- [ ] PR description addresses the issue accurately and concisely. If
the change is non-trivial, a GitHub Issue is referenced.
- [ ] Documented new properties (with its default value), SQL syntax,
functions, or other functionality.
- [ ] If release notes are required, they follow the [release notes
guidelines](https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines).
- [ ] Adequate tests were added if applicable.
- [ ] CI passed.
- [ ] If adding new dependencies, verified they have an [OpenSSF
Scorecard](https://securityscorecards.dev/#the-checks) score of 5.0 or
higher (or obtained explicit TSC approval for lower scores).

## Release Notes
Please follow [release notes
guidelines](https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines)
and fill in the release notes below.

```
== RELEASE NOTES ==

General Changes
* ... 
* ... 

Hive Connector Changes
* ... 
* ... 
```

If release note is NOT required, use:

```
== NO RELEASE NOTE ==
```
```
== NO RELEASE NOTE ==
```

## Summary by Sourcery

Add configurable shard count for the async data cache and wire it
through server initialization.

New Features:
- Introduce a new system config option to control the number of async
cache shards with a default value.
- Expose the async cache shard count to the async data cache options
during server initialization.

Tests:
- Add unit tests covering default and custom values for the async cache
shard count system config.
…6951)

## Description
Fixes Velox to Presto `IN` expression conversion. When the `IN-list` is
constant, the Velox expression representation uses a constant expression
with an array vector to store the list (see conversion
[here](https://github.com/prestodb/presto/blob/4e91f155d0f4704325552fac3807da0efdba6a35/presto-native-execution/presto_cpp/main/types/PrestoToVeloxExpr.cpp#L780)).
The Presto `IN` expression expects the values from constant `IN-list` to
be distinct arguments to the `SpecialFormExpression`. The
`VeloxToPrestoExpr` is modified accordingly.
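
The shape of the conversion can be illustrated in a few lines. This sketch uses plain Java collections in place of Velox vectors and Presto row expressions; `InListSketch` and `toSpecialFormArguments` are hypothetical names, and the actual change lives in the C++ `VeloxToPrestoExpr` code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of flattening a constant IN-list.
final class InListSketch {
    // Velox side: one tested value plus a single constant holding the whole list.
    // Presto side: the tested value followed by each list element as its own argument.
    static List<Object> toSpecialFormArguments(Object value, List<?> constantInList) {
        List<Object> arguments = new ArrayList<>(constantInList.size() + 1);
        arguments.add(value);
        arguments.addAll(constantInList);
        return arguments;
    }
}
```

For `x IN (1, 2, 3)`, the result has four arguments (`x`, `1`, `2`, `3`) rather than two (`x` and an array-typed constant).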

## Motivation and Context
Resolves prestodb#26921.

## Impact
Fixes bug with `IN` expression in native expression optimizer.

## Test Plan
Added e2e test.


```
== NO RELEASE NOTE ==
```

## Summary by Sourcery

Fix Velox-to-Presto conversion of IN expressions to correctly construct
Presto special form arguments and add coverage for the native expression
optimizer.

Bug Fixes:
- Correct Velox IN expression conversion when the IN-list is represented
as a constant array so Presto receives individual arguments instead of a
single array-typed constant.

Tests:
- Add an end-to-end test ensuring IN expressions are handled correctly
by the native expression optimizer in the sidecar plugin test suite.
…restodb#26978)

## Description
Velox now supports `KHyperLogLog` type (ref:
facebookincubator/velox@1165703).
Adds support for this type to the `NativeTypeManager`. Also adds
`KHyperLogLog` to `StandardTypes` in `presto-common` to avoid a
dependency on `presto-main-base` in `presto-native-sidecar-plugin`.

## Motivation and Context
Fix test failure uncovered in `presto-native-tests`. Required for
prestodb#23671.

## Impact
Queries with `KHyperLogLog` won't fail on sidecar enabled Presto C++
deployments.

## Test Plan
Added e2e test.


```
== NO RELEASE NOTE ==
```
Introduce support for Azure storage backends including Azure Blob Storage
(using the wasbs:// scheme) and Azure Data Lake Storage Gen2 (using the abfss://
scheme) in the Hive connector.

Key changes:
- Added HiveAzureConfigurationInitializer to inject relevant Azure configurations
  into Hadoop Configuration
- Introduced HiveAzureConfig to allow catalog-level configuration of Azure properties
- Updated HdfsConfigurationInitializer and HiveConnectorFactory to delegate Azure-specific
  config setup
- Registered configuration initializer in Hive module

Supports shared key and OAuth2-based authentication.
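
As a rough picture of what such an initializer has to produce, the sketch below builds the Hadoop property keys documented by the `hadoop-azure` module for shared-key and OAuth2 client-credentials access. The class and method names are hypothetical, a plain `Map` stands in for the Hadoop `Configuration`, and whether `HiveAzureConfigurationInitializer` sets exactly these keys is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; the real initializer writes into a Hadoop Configuration.
final class AzureHadoopPropertiesSketch {
    // Shared key for Azure Blob Storage (WASB): one key per storage account.
    static Map<String, String> wasbSharedKey(String account, String accessKey) {
        Map<String, String> properties = new LinkedHashMap<>();
        properties.put("fs.azure.account.key." + account + ".blob.core.windows.net", accessKey);
        return properties;
    }

    // Shared key for ADLS Gen2 (ABFS): same idea, different endpoint suffix.
    static Map<String, String> abfsSharedKey(String account, String accessKey) {
        Map<String, String> properties = new LinkedHashMap<>();
        properties.put("fs.azure.account.key." + account + ".dfs.core.windows.net", accessKey);
        return properties;
    }

    // OAuth2 client-credentials flow for ABFS.
    static Map<String, String> abfsOAuth(String clientId, String clientSecret, String tokenEndpoint) {
        Map<String, String> properties = new LinkedHashMap<>();
        properties.put("fs.azure.account.auth.type", "OAuth");
        properties.put("fs.azure.account.oauth.provider.type",
                "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider");
        properties.put("fs.azure.account.oauth2.client.id", clientId);
        properties.put("fs.azure.account.oauth2.client.secret", clientSecret);
        properties.put("fs.azure.account.oauth2.client.endpoint", tokenEndpoint);
        return properties;
    }
}
```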