update to develop with empty list view cherry pick#15

Open
asubiotto wants to merge 132 commits into develop from ps-update

@asubiotto
Member

No description provided.

AdamGS and others added 30 commits January 6, 2026 18:48
vortex-data#5874)

This PR adds an additional pruning check on every poll of the File
stream, so if the underlying dynamic expression is updated, the stream
will stop.

Signed-off-by: Adam Gutglick <adam@spiraldb.com>
Closes: vortex-data#5742

Also adds documentation everywhere and refactors a bunch of stuff to be
more ergonomic since it's now "more" user facing.

Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
vortex-data#5765)

Fixes vortex-data#5759

This PR maps string_views to strings and binary_views to binary so that
`to_substrait` will no longer raise ArrowNotImplementedError when
constructing the substrait.

Vortex supports expressions over views and Arrow compute doesn't, but
[to_substrait](https://arrow.apache.org/docs/python/generated/pyarrow.dataset.Expression.html#pyarrow.dataset.Expression.to_substrait)
raises ArrowNotImplementedError based on Arrow compute kernels...
regardless of the backend.

---------

Signed-off-by: Paul Timmins <paul@iqmo.com>
Co-authored-by: Dan King <daniel.zidan.king@gmail.com>
We're currently pulling a full desktop dev setup for the Windows CI, so
this is an attempt to make it faster. It seems to save just over a
minute per CI run.

Signed-off-by: Adam Gutglick <adam@spiraldb.com>
This seems to mostly affect ratatui (and lance, which we ignore).

Signed-off-by: Adam Gutglick <adam@spiraldb.com>
Removing a label requires `contents: write` permissions, which we don't
want to grant to forks.

Signed-off-by: Adam Gutglick <adam@spiraldb.com>
Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>
Makes it easy to test against DuckDB either from a source commit, or
from a released version using pre-compiled libraries for releases.

With a little help from Claude.

---------

Signed-off-by: Nicholas Gates <nick@nickgates.com>
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
Taken from the
[docs](https://docs.github.com/en/actions/reference/workflows-and-actions/workflow-commands#adding-a-job-summary).
Also, only try to comment on the PR if it's not coming from a fork.

Signed-off-by: Adam Gutglick <adam@spiraldb.com>
Signed-off-by: Adam Gutglick <adam@spiraldb.com>
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
This PR tries to reduce `time_elapsed_opening` for Vortex scans
(observed vs Parquet).

Changes:
- `vortex-file`: shrink default footer initial read from 1MiB to
`MAX_POSTSCRIPT_SIZE + EOF_SIZE` (~64KiB) and add a regression test.
- `vortex-scan`: make `ScanBuilder::into_stream()` lazy (defer
`prepare()` / split registration until first poll) and add a unit test
to ensure stream construction has no split-planning side effects.
- `vortex-datafusion`: expose the footer initial read size as a format
option (`footer_initial_read_size_bytes`) and plumb it into
`VortexOpenOptions::with_initial_read_size`.

Notes:
- Scan planning errors now surface on first poll instead of during
`into_stream()` construction.
- If the footer/schema/layout don’t fit in the initial window,
`read_footer` will issue additional reads as before.
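
The lazy-stream behaviour described above can be sketched as follows. This is a minimal illustration and not the `ScanBuilder` code: `LazyScan`, its `prepare` closure and the `u32` "splits" are hypothetical stand-ins, and a plain `Iterator` stands in for an async stream. Construction does no planning work, and a planning error surfaces on the first poll.

```rust
/// Hypothetical lazy scan: `prepare` stands in for split planning and is
/// only run the first time the iterator is polled.
pub struct LazyScan<F>
where
    F: FnOnce() -> Result<Vec<u32>, String>,
{
    prepare: Option<F>,
    splits: std::vec::IntoIter<u32>,
}

impl<F> LazyScan<F>
where
    F: FnOnce() -> Result<Vec<u32>, String>,
{
    /// Construction has no side effects: `prepare` is stored, not called.
    pub fn new(prepare: F) -> Self {
        Self {
            prepare: Some(prepare),
            splits: Vec::new().into_iter(),
        }
    }
}

impl<F> Iterator for LazyScan<F>
where
    F: FnOnce() -> Result<Vec<u32>, String>,
{
    type Item = Result<u32, String>;

    fn next(&mut self) -> Option<Self::Item> {
        // First poll: run the deferred planning step exactly once.
        if let Some(prepare) = self.prepare.take() {
            match prepare() {
                Ok(splits) => self.splits = splits.into_iter(),
                // Planning errors surface here, not in `new`.
                Err(e) => return Some(Err(e)),
            }
        }
        self.splits.next().map(Ok)
    }
}
```

A unit test in the spirit of the one mentioned above would count calls to `prepare` and assert that the count is still zero right after `LazyScan::new`.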

Tests:
- `cargo +nightly fmt --all --check`
- `cargo clippy -p vortex-datafusion --all-targets --all-features -- -D
warnings`
- `cargo test --locked -p vortex-file -p vortex-scan -p vortex-io`
- `cargo test --locked -p vortex-datafusion`

Related: vortex-data#4677

---------

Signed-off-by: godnight10061 <godnight10061@users.noreply.github.com>
Co-authored-by: godnight10061 <godnight10061@users.noreply.github.com>
…nge column names (vortex-data#5881)

This started failing after the DF 51 upgrade, where we added additional
cast logic to support reordering.

---------

Signed-off-by: Robert Kruszewski <github@robertk.io>
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
Signed-off-by: Adam Gutglick <adam@spiraldb.com>
…5795)

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
Previously we only checked that options were the correct type, and not
that the vtable itself was the same. We had multiple expressions in
SpiralDB that use `Options = FieldName` and they falsely matched the
downcast.
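
A minimal sketch of the failure mode, assuming a simplified expression type. `Expr`, `vtable_id` and both accessor names are hypothetical and not the Vortex API. The point is that two different expressions can share an options type, so the type check alone cannot tell them apart.

```rust
use std::any::Any;

/// Hypothetical expression: an identifier for its vtable plus type-erased options.
pub struct Expr {
    vtable_id: &'static str,
    options: Box<dyn Any>,
}

impl Expr {
    pub fn new<O: Any>(vtable_id: &'static str, options: O) -> Self {
        Self {
            vtable_id,
            options: Box::new(options),
        }
    }

    /// Old behaviour: matches ANY expression whose options have type `O`.
    pub fn options_by_type_only<O: Any>(&self) -> Option<&O> {
        self.options.downcast_ref::<O>()
    }

    /// Fixed behaviour: the vtable must match before the options are downcast.
    pub fn options_for<O: Any>(&self, vtable_id: &str) -> Option<&O> {
        if self.vtable_id != vtable_id {
            return None;
        }
        self.options.downcast_ref::<O>()
    }
}
```

With two expressions that both use `String` options, `options_by_type_only::<String>()` succeeds on either one, while `options_for::<String>("get_item")` only succeeds on the expression that was built with the `"get_item"` id.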

---------

Signed-off-by: Nicholas Gates <nick@nickgates.com>
Signed-off-by: Nicholas Gates <nick@nickgates.com>
This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| [prost](https://redirect.github.com/tokio-rs/prost) |
workspace.dependencies | patch | `0.14.1` → `0.14.3` |
| [prost-build](https://redirect.github.com/tokio-rs/prost) |
workspace.dependencies | patch | `0.14.1` → `0.14.3` |
| [prost-types](https://redirect.github.com/tokio-rs/prost) |
workspace.dependencies | patch | `0.14.1` → `0.14.3` |

---

### Release Notes

<details>
<summary>tokio-rs/prost (prost)</summary>

###
[`v0.14.3`](https://redirect.github.com/tokio-rs/prost/releases/tag/v0.14.3)

[Compare
Source](https://redirect.github.com/tokio-rs/prost/compare/v0.14.2...v0.14.3)

##### What's Changed

- fix some forgotten prost import paths by
[@&#8203;GlenDC](https://redirect.github.com/GlenDC) in
[#&#8203;1385](https://redirect.github.com/tokio-rs/prost/pull/1385)
- build(deps): bump actions/upload-artifact from 5 to 6 by
[@&#8203;dependabot](https://redirect.github.com/dependabot)\[bot] in
[#&#8203;1381](https://redirect.github.com/tokio-rs/prost/pull/1381)
- build(deps): update pulldown-cmark-to-cmark requirement from 21 to 22
by [@&#8203;dependabot](https://redirect.github.com/dependabot)\[bot] in
[#&#8203;1384](https://redirect.github.com/tokio-rs/prost/pull/1384)
- Bugfix: default Name implementation produces invalid URLs with empty
packages. by
[@&#8203;aaronjeline](https://redirect.github.com/aaronjeline) in
[#&#8203;1386](https://redirect.github.com/tokio-rs/prost/pull/1386)
- fix: Add back `DecodeError::new` by
[@&#8203;caspermeijn](https://redirect.github.com/caspermeijn) in
[#&#8203;1382](https://redirect.github.com/tokio-rs/prost/pull/1382)
- chore: remove protobuf submodule and leverage cmake for it by
[@&#8203;LucioFranco](https://redirect.github.com/LucioFranco) in
[#&#8203;1389](https://redirect.github.com/tokio-rs/prost/pull/1389)

##### New Contributors

- [@&#8203;GlenDC](https://redirect.github.com/GlenDC) made their first
contribution in
[#&#8203;1385](https://redirect.github.com/tokio-rs/prost/pull/1385)
- [@&#8203;aaronjeline](https://redirect.github.com/aaronjeline) made
their first contribution in
[#&#8203;1386](https://redirect.github.com/tokio-rs/prost/pull/1386)

**Full Changelog**:
<tokio-rs/prost@v0.14.2...v0.14.3>

###
[`v0.14.2`](https://redirect.github.com/tokio-rs/prost/blob/HEAD/CHANGELOG.md#Prost-version-0142)

[Compare
Source](https://redirect.github.com/tokio-rs/prost/compare/v0.14.1...v0.14.2)

*PROST!* is a [Protocol Buffers](https://protobuf.dev/) implementation
for the [Rust Language](https://www.rust-lang.org/). `prost` generates
simple, idiomatic Rust code from `proto2` and `proto3` files.

#### ⚠️ Heads-up

- Increase MSRV to 1.82
([#&#8203;1356](https://redirect.github.com/tokio-rs/prost/issues/1356))
- Update maintenance status to Passively Maintained
([#&#8203;1359](https://redirect.github.com/tokio-rs/prost/issues/1359))

  This excerpt is from the readme:

> The current maintainer is not contributing new features and doesn't
have the time to review new features. Bug fixes and small improvements
are welcome. Feel free to contribute small and easily reviewable PRs.
  >
> Bug fixes are still important, and security fixes will be released as
soon as possible. Contact the `#prost` channel in [Tokio
discord](https://discord.gg/tokio) if you feel a bug or security fix is
not getting enough attention.
  >
> The maintainer expects the official `protobuf` project to release
their rust library soon and expects it to be as fully featured as the
C++ library. See their [source
code](https://redirect.github.com/protocolbuffers/protobuf/tree/main/rust)
and [crate](https://crates.io/crates/protobuf/4.33.1-release) for more
information.

#### 🚀 Features

- Configure prost path via `prost_build::Config` or `#[(prost(prost_path
= "::prost")]`
([#&#8203;1274](https://redirect.github.com/tokio-rs/prost/issues/1274))
- Support for deprecated enum and oneof fields
([#&#8203;1316](https://redirect.github.com/tokio-rs/prost/issues/1316))

#### 🐛 Bug Fixes

- *(prost-build)* Resolve OneOf type name conflict with embedded message
([#&#8203;1294](https://redirect.github.com/tokio-rs/prost/issues/1294))
- *(prost-build)* Avoid OneOf type collision with enums and keyword
names
([#&#8203;1341](https://redirect.github.com/tokio-rs/prost/issues/1341))

#### 💼 Dependencies

- Use `trait Error` from core
([#&#8203;1179](https://redirect.github.com/tokio-rs/prost/issues/1179))
- *(deps)* Update protobuf to v25.8
([#&#8203;1323](https://redirect.github.com/tokio-rs/prost/issues/1323))
- *(deps)* Update criterion requirement from 0.6 to 0.7
([#&#8203;1308](https://redirect.github.com/tokio-rs/prost/issues/1308))
- *(deps)* Update petgraph to 0.8
([#&#8203;1327](https://redirect.github.com/tokio-rs/prost/issues/1327))
- *(deps)* Bump actions/upload-artifact from 4 to 5
([#&#8203;1351](https://redirect.github.com/tokio-rs/prost/issues/1351))
- *(deps)* Bump actions/checkout from 5 to 6
([#&#8203;1370](https://redirect.github.com/tokio-rs/prost/issues/1370))
- Bump actions/checkout to v5
([#&#8203;1312](https://redirect.github.com/tokio-rs/prost/issues/1312))
- Update clippy to version 1.87
([#&#8203;1292](https://redirect.github.com/tokio-rs/prost/issues/1292))
- Replace once\_cell dependency by std lib
([#&#8203;1119](https://redirect.github.com/tokio-rs/prost/issues/1119))

#### 📚 Documentation

- Update outdated link is test documentation
([#&#8203;1289](https://redirect.github.com/tokio-rs/prost/issues/1289))
- Describe use of encoding module
([#&#8203;1322](https://redirect.github.com/tokio-rs/prost/issues/1322))
- Update the readme MSRV to the actual number
([#&#8203;1331](https://redirect.github.com/tokio-rs/prost/issues/1331))
- Update URLs after manual review
([#&#8203;1336](https://redirect.github.com/tokio-rs/prost/issues/1336))
- Answer why fields are wrapped in option
([#&#8203;1358](https://redirect.github.com/tokio-rs/prost/issues/1358))

#### 🎨 Styling

- Add spaces to derive arguments in generated code
([#&#8203;1290](https://redirect.github.com/tokio-rs/prost/issues/1290))
- Use variables directly in the `format!` string
([#&#8203;1293](https://redirect.github.com/tokio-rs/prost/issues/1293))
- Remove unneeded lint allow statements
([#&#8203;1326](https://redirect.github.com/tokio-rs/prost/issues/1326))
- Remove allocation in tests
([#&#8203;1332](https://redirect.github.com/tokio-rs/prost/issues/1332))
- Simplify DecodeError description to an enum
([#&#8203;1330](https://redirect.github.com/tokio-rs/prost/issues/1330))
- Use variables directly in the `format!` string
([#&#8203;1361](https://redirect.github.com/tokio-rs/prost/issues/1361))
- Fix typo in prost/src/encoding.rs
([#&#8203;1369](https://redirect.github.com/tokio-rs/prost/issues/1369))

#### 🧪 Testing

- Rename package of `ident_conversion`
([#&#8203;1291](https://redirect.github.com/tokio-rs/prost/issues/1291))
- Add test for split buffer varint decoding
([#&#8203;1321](https://redirect.github.com/tokio-rs/prost/issues/1321))
- Add descriptive reason of test failure
([#&#8203;1320](https://redirect.github.com/tokio-rs/prost/issues/1320))
- Additionally test `decode_varint_slice` with roundtrips
([#&#8203;1325](https://redirect.github.com/tokio-rs/prost/issues/1325))
- *(result\_struct)* Move tests to separate module
([#&#8203;1333](https://redirect.github.com/tokio-rs/prost/issues/1333))
- *(proto3\_presence)* Move test to separate module
([#&#8203;1334](https://redirect.github.com/tokio-rs/prost/issues/1334))
- *(result\_enum)* Move test to separate module
([#&#8203;1342](https://redirect.github.com/tokio-rs/prost/issues/1342))
- *(option\_enum)* Move test to separate module
([#&#8203;1344](https://redirect.github.com/tokio-rs/prost/issues/1344))
- *(option\_struct)* Move tests to separate module
([#&#8203;1345](https://redirect.github.com/tokio-rs/prost/issues/1345))
- *(message\_encoding)* Roundtrip `Coumpound`
([#&#8203;1365](https://redirect.github.com/tokio-rs/prost/issues/1365))
- *(no\_unused\_results)* Add roundtrip for Test message
([#&#8203;1364](https://redirect.github.com/tokio-rs/prost/issues/1364))
- *(ServiceGenerator)* Verify the content of all generated files
([#&#8203;1357](https://redirect.github.com/tokio-rs/prost/issues/1357))
- *(derive\_copy)* Allow dead code
([#&#8203;1362](https://redirect.github.com/tokio-rs/prost/issues/1362))
- Always choose macOS 14
([#&#8203;1324](https://redirect.github.com/tokio-rs/prost/issues/1324))

#### ⚙️ Miscellaneous Tasks

- Replace duplicate README by a symlink
([#&#8203;1303](https://redirect.github.com/tokio-rs/prost/issues/1303))
- Add `cargo-semver-checks`
([#&#8203;1337](https://redirect.github.com/tokio-rs/prost/issues/1337))

</details>

---

### Configuration

📅 **Schedule**: Branch creation - Between 12:00 AM and 03:59 AM, only on
Monday ( * 0-3 * * 1 ) (UTC), Automerge - At any time (no schedule
defined).

🚦 **Automerge**: Enabled.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the
rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about these
updates again.

---

- [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check
this box

---

This PR was generated by [Mend Renovate](https://mend.io/renovate/).
View the [repository job
log](https://developer.mend.io/github/vortex-data/vortex).


Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
…rtex-data#5895)

Use `execute_canonical` instead of `execute` (likely no one will need to
fix anything, since the API is very new).

---------

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
<img width="1384" height="241" alt="image"
src="https://github.com/user-attachments/assets/a1a9c4d0-ba14-4872-a8e6-a59893015095"
/>

looks weird?

---------

Signed-off-by: blaginin <dima@spiraldb.com>
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
Signed-off-by: Nicholas Gates <nick@nickgates.com>
…5915)

This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| CodSpeedHQ/action | action | digest | `dbda711` → `99c1668` |

---

### Configuration

📅 **Schedule**: Branch creation - Between 12:00 AM and 03:59 AM, only on
Monday ( * 0-3 * * 1 ) (UTC), Automerge - At any time (no schedule
defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you
are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the
rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update
again.

---

- [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check
this box

---

This PR was generated by [Mend Renovate](https://mend.io/renovate/).
View the [repository job
log](https://developer.mend.io/github/vortex-data/vortex).


Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
joseph-isaacs and others added 30 commits January 19, 2026 20:43
…in VTable (vortex-data#6045)

Internal Array API break, just use execute

---------

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
As part of moving towards an explicit execution model, arrays now all
return "Validity" which represents the logical validity of the array.
scalar_at and min/max operations then work as normal over this validity
data structure.

Follow ups:
- [ ] Remove ValidityVTable, moving `validity` function over to the main
Array vtable.

---------

Signed-off-by: Nicholas Gates <nick@nickgates.com>
Minimize the tokio build check to the minimal set of crates that needs
building.

Signed-off-by: Adam Gutglick <adam@spiraldb.com>
vortex-data#5976)

This PR fixes how a file's or a scan's `DType` is mapped to Arrow
types in cases where the table has non-logical types that don't
round-trip between the type systems.

This change both simplifies how we push projection expressions and
allows us to save on unnecessary work converting arrays between various
logically-equivalent types.

---------

Signed-off-by: Alfonso Subiotto Marques <alfonso.subiotto@polarsignals.com>
Signed-off-by: Adam Gutglick <adam@spiraldb.com>
Co-authored-by: Alfonso Subiotto Marques <alfonso.subiotto@polarsignals.com>
…6018)

This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| CodSpeedHQ/action | action | digest | `99c1668` → `94b8856` |

---

### Configuration

📅 **Schedule**: Branch creation - Between 12:00 AM and 03:59 AM, only on
Monday ( * 0-3 * * 1 ) (UTC), Automerge - At any time (no schedule
defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you
are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the
rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update
again.

---

- [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check
this box

---

This PR was generated by [Mend Renovate](https://mend.io/renovate/).
View the [repository job
log](https://developer.mend.io/github/vortex-data/vortex).


Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
We don't need the cache suffix explicitly anymore, as we use the job id as
part of the cache key, and it serves the same purpose to the same effect
in all the places we added it.

Signed-off-by: Adam Gutglick <adam@spiraldb.com>
Having started to implement ExpressionArray::deserialize, I realised the
best thing to do is pass a session to the deserialize function.

If we do that.... then there's no reason for Array VTables to hold
state. They're just ZSTs.

So this should avoid a whole bunch of heap allocations.

We should follow up with other vtables and eventually converge on a
pattern.

---------

Signed-off-by: Nicholas Gates <nick@nickgates.com>
It doesn't seem to make much of a difference in the overall coverage,
and just takes a long time to actually run.

---------

Signed-off-by: Adam Gutglick <adam@spiraldb.com>
Saw it somewhere. It should save on cache space, and Windows has been
taking up a lot of cache space.

Signed-off-by: Adam Gutglick <adam@spiraldb.com>
1. It's already enabled by `vortex-array`, which basically everything
will pull.
2. It's unnecessarily avoided in `vortex-scalar`, which even more things
pull.
3. We really don't need three different variants here, surely one is
enough.
4. This has benefits in CI, just reducing the overall size of the
internal dependency matrix.

Signed-off-by: Adam Gutglick <adam@spiraldb.com>
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
Follow up for vortex-data#6056, the cache uses available env vars when it runs, so
the variable must be available for both setup and post run steps.

Signed-off-by: Adam Gutglick <adam@spiraldb.com>
As part of this, FoR kernel execution is now async. 
GPU benchmarks show unchanged performance.

Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>
Signed-off-by: Adam Gutglick <adam@spiraldb.com>
Signed-off-by: Robert Kruszewski <github@robertk.io>
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
Fixes a potential bug where decoding large FSST and VarBin arrays
results in an invalid VarBinViewArray.

Currently, when decoding produces one large buffer, we generate a new VBV
with that single buffer plus some views built against it. This breaks
if the buffer is > 2GiB, though.

This PR splits out a separate `build_views` function that takes a
`max_buffer_len` parameter and as it generates views, it splits
(zero-copy) the underlying buffer into segments of no more than
`max_buffer_len`.
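
The splitting idea can be sketched as below. This is an illustration under stated assumptions and not the actual `build_views` from this PR: the signature, `ViewRef`, and the choice to take only value lengths are hypothetical. Values are assumed to lie back to back in one large buffer, and each view records which segment it lives in and its offset within that segment, so no offset ever exceeds `max_buffer_len`.

```rust
/// Hypothetical view: where one value lives after the buffer is segmented.
#[derive(Debug, PartialEq, Eq)]
pub struct ViewRef {
    pub buffer_index: u32,
    pub offset: u32,
    pub len: u32,
}

/// Walk the value lengths in order and cut a new segment whenever the next
/// value would push the current segment past `max_buffer_len`.
/// Returns the views and the (start, end) byte range of every segment in the
/// original buffer; the ranges can be sliced without copying.
pub fn build_views(lens: &[u32], max_buffer_len: u32) -> (Vec<ViewRef>, Vec<(u64, u64)>) {
    let mut views = Vec::with_capacity(lens.len());
    let mut segments: Vec<(u64, u64)> = Vec::new();
    let mut seg_start: u64 = 0; // start of the current segment in the big buffer
    let mut seg_len: u32 = 0; // bytes already assigned to the current segment

    for &len in lens {
        assert!(len <= max_buffer_len, "a single value must fit in one segment");
        // Close the current segment if this value does not fit in what is left.
        if seg_len > 0 && len > max_buffer_len - seg_len {
            segments.push((seg_start, seg_start + seg_len as u64));
            seg_start += seg_len as u64;
            seg_len = 0;
        }
        views.push(ViewRef {
            buffer_index: segments.len() as u32,
            offset: seg_len,
            len,
        });
        seg_len += len;
    }
    // Flush the trailing, partially filled segment.
    if seg_len > 0 {
        segments.push((seg_start, seg_start + seg_len as u64));
    }
    (views, segments)
}
```

For three 4-byte values and `max_buffer_len = 8`, the first two values share segment 0 and the third starts segment 1 at offset 0.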

---------

Signed-off-by: Andrew Duffy <andrew@a10y.dev>
…-data#6072)

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>
Signed-off-by: Nicholas Gates <nick@nickgates.com>
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
Co-authored-by: Nicholas Gates <nick@nickgates.com>
Experimenting showed that just using sccache is generally better or
roughly equal, at least for our current setup with runs-on backed
storage.
I've audited all of our CI setup. In the few places where we run on
GitHub runners, sccache is wasteful, so I disabled it there. I also
found a few places where `rust-cache` should still be useful.

---------

Signed-off-by: Adam Gutglick <adam@spiraldb.com>
)

This PR allows using non object-store based readers when working with
the Vortex `FileSource`, like when users want custom caching,
observability or anything else.

---------

Signed-off-by: Adam Gutglick <adam@spiraldb.com>
Pandas released a new version that broke our docs build for Python
reasons; this pulls a specific version instead.

---------

Signed-off-by: Adam Gutglick <adam@spiraldb.com>
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
Signed-off-by: Nicholas Gates <nick@nickgates.com>
…Rule (vortex-data#6090)

The previous code was assuming that target fields were in the same order
as source fields and all target fields existing in the source schema
were the first k fields. This would cause incorrect query results.
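
A minimal sketch of the safer approach, assuming schemas are plain lists of field names. `map_target_to_source` is a hypothetical helper and not the code changed in this commit. Each target field is resolved by name, so neither the order nor a "first k fields" layout is assumed.

```rust
/// For every target field, find its index in the source schema by name.
/// `None` means the target field does not exist in the source and has to be
/// filled some other way (for example with nulls).
pub fn map_target_to_source(source: &[&str], target: &[&str]) -> Vec<Option<usize>> {
    target
        .iter()
        .map(|name| source.iter().position(|s| s == name))
        .collect()
}
```

With source fields `a, b, c` and target fields `c, x, a`, a positional mapping would read `a` where `c` was asked for; the name-based mapping yields indices 2, none, 0.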

Signed-off-by: Alfonso Subiotto Marques <alfonso.subiotto@polarsignals.com>
The code previously caused an underflow because it assumed a non-empty sizes
array.
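
A generic sketch of this class of bug, not the code touched by this commit. Both helper names are hypothetical. Computing `sizes.len() - 1` on an empty slice underflows `usize`, while `checked_sub` and `last()` make the empty case explicit.

```rust
/// Index of the last entry of `sizes`, or `None` when the slice is empty.
/// Writing `sizes.len() - 1` instead panics in debug builds and wraps to
/// `usize::MAX` in release builds when `sizes` is empty.
pub fn last_index(sizes: &[u32]) -> Option<usize> {
    sizes.len().checked_sub(1)
}

/// Hypothetical helper: size of the last list, treating "no lists" as 0.
pub fn last_size_or_zero(sizes: &[u32]) -> u32 {
    sizes.last().copied().unwrap_or(0)
}
```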

Signed-off-by: Alfonso Subiotto Marques <alfonso.subiotto@polarsignals.com>
Previously when filtering a list array, we could create a mask longer than the
sliced elements range, causing a "Filter mask length mismatch" panic.

Signed-off-by: Alfonso Subiotto Marques <alfonso.subiotto@polarsignals.com>
Signed-off-by: Alfonso Subiotto Marques <alfonso.subiotto@polarsignals.com>