vortex-data#5874) This PR adds an additional pruning check on every poll of the File stream, so if the underlying dynamic expression is updated, the stream will stop. Signed-off-by: Adam Gutglick <adam@spiraldb.com>
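A synchronous sketch of the per-poll pruning check, using an `Iterator` in place of the async file stream. `PrunableStream`, the shared `AtomicI64` threshold (standing in for a dynamic filter `value > threshold`), and the file-level `file_max` statistic are all assumptions for illustration. The real stream and dynamic expression types are different.

```rust
use std::sync::atomic::{AtomicI64, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for a file stream over values that all satisfy
/// `value <= file_max` (a file-level statistic).
pub struct PrunableStream<I> {
    inner: I,
    /// Dynamic filter `value > threshold`; another task may raise the
    /// threshold while this stream is being consumed.
    threshold: Arc<AtomicI64>,
    file_max: i64,
    done: bool,
}

impl<I: Iterator<Item = i64>> PrunableStream<I> {
    pub fn new(inner: I, threshold: Arc<AtomicI64>, file_max: i64) -> Self {
        Self { inner, threshold, file_max, done: false }
    }
}

impl<I: Iterator<Item = i64>> Iterator for PrunableStream<I> {
    type Item = i64;

    fn next(&mut self) -> Option<i64> {
        if self.done {
            return None;
        }
        // Re-check pruning on *every* poll: if no value in the file can pass
        // the current filter, stop the stream instead of reading more data.
        if self.file_max <= self.threshold.load(Ordering::Acquire) {
            self.done = true;
            return None;
        }
        self.inner.next()
    }
}
```

Checking on every poll, rather than once at open time, is what lets an update to the dynamic expression take effect mid-scan.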
Closes: vortex-data#5742 Also adds documentation everywhere and refactors a bunch of stuff to be more ergonomic, since it's now "more" user-facing. Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
…5866) Signed-off-by: cancaicai <2356672992@qq.com>
vortex-data#5765) Fixes vortex-data#5759 This PR maps string_views to strings and binary_views to binary so that `to_substrait` will no longer raise ArrowNotImplementedError when constructing the substrait. Vortex supports expressions over views and Arrow compute doesn't, but [to_substrait](https://arrow.apache.org/docs/python/generated/pyarrow.dataset.Expression.html#pyarrow.dataset.Expression.to_substrait) raises ArrowNotImplementedError based on Arrow compute kernels... regardless of the backend. --------- Signed-off-by: Paul Timmins <paul@iqmo.com> Co-authored-by: Dan King <daniel.zidan.king@gmail.com>
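The view-to-non-view mapping amounts to a simple type rewrite. This Rust sketch uses a hypothetical, cut-down `ArrowType` enum rather than any real Arrow crate type. The PR itself targets the PyArrow `to_substrait` path, and whether it also rewrites nested types is not stated, so nesting is left out here.

```rust
/// Hypothetical subset of Arrow logical types, for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub enum ArrowType {
    Int64,
    Utf8,
    Utf8View,
    Binary,
    BinaryView,
}

/// Replace view types with their non-view equivalents so that a consumer that
/// only understands `Utf8`/`Binary` accepts the schema; other types pass through.
pub fn without_views(ty: &ArrowType) -> ArrowType {
    match ty {
        ArrowType::Utf8View => ArrowType::Utf8,
        ArrowType::BinaryView => ArrowType::Binary,
        other => other.clone(),
    }
}
```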
We're currently pulling a full desktop dev setup for the Windows CI, so this is an attempt to make it faster. It seems to save just over a minute per CI run. Signed-off-by: Adam Gutglick <adam@spiraldb.com>
…#5880) Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
This seems to mostly affect ratatui (and lance, which we ignore). Signed-off-by: Adam Gutglick <adam@spiraldb.com>
Removing a label requires `contents: write` permissions, which we don't want to grant to forks. Signed-off-by: Adam Gutglick <adam@spiraldb.com>
Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>
Makes it easy to test against DuckDB either from a source commit, or from a released version using pre-compiled libraries for releases. With a little help from Claude. --------- Signed-off-by: Nicholas Gates <nick@nickgates.com>
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
Taken from the [docs](https://docs.github.com/en/actions/reference/workflows-and-actions/workflow-commands#adding-a-job-summary). Also, only try to comment on the PR if it's not coming from a fork. Signed-off-by: Adam Gutglick <adam@spiraldb.com>
Signed-off-by: Adam Gutglick <adam@spiraldb.com>
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
This PR tries to reduce `time_elapsed_opening` for Vortex scans (observed vs Parquet).

Changes:
- `vortex-file`: shrink default footer initial read from 1MiB to `MAX_POSTSCRIPT_SIZE + EOF_SIZE` (~64KiB) and add a regression test.
- `vortex-scan`: make `ScanBuilder::into_stream()` lazy (defer `prepare()` / split registration until first poll) and add a unit test to ensure stream construction has no split-planning side effects.
- `vortex-datafusion`: expose the footer initial read size as a format option (`footer_initial_read_size_bytes`) and plumb it into `VortexOpenOptions::with_initial_read_size`.

Notes:
- Scan planning errors now surface on first poll instead of during `into_stream()` construction.
- If the footer/schema/layout don’t fit in the initial window, `read_footer` will issue additional reads as before.

Tests:
- `cargo +nightly fmt --all --check`
- `cargo clippy -p vortex-datafusion --all-targets --all-features -- -D warnings`
- `cargo test --locked -p vortex-file -p vortex-scan -p vortex-io`
- `cargo test --locked -p vortex-datafusion`

Related: vortex-data#4677

--------- Signed-off-by: godnight10061 <godnight10061@users.noreply.github.com> Co-authored-by: godnight10061 <godnight10061@users.noreply.github.com>
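The footer change above relies on one speculative read of a fixed-size window at the end of the file. A follow-up read is issued only when the footer turns out to be larger than that window. The sketch below shows that range arithmetic. The function names are hypothetical, and the way the real `read_footer` decodes the footer length from the postscript is not shown in the commit.

```rust
use std::ops::Range;

/// Byte range of the first, speculative read from the end of a file.
/// A fixed-size tail window means small footers arrive in a single I/O.
pub fn initial_tail_range(file_size: u64, initial_read_size: u64) -> Range<u64> {
    file_size.saturating_sub(initial_read_size)..file_size
}

/// Once the total footer length is known (hypothetically, from parsing the
/// fetched tail), return the extra range still to be read, or `None` if the
/// footer already fits inside the bytes fetched so far.
pub fn extra_footer_range(file_size: u64, fetched: &Range<u64>, footer_len: u64) -> Option<Range<u64>> {
    let footer_start = file_size.saturating_sub(footer_len);
    if footer_start >= fetched.start {
        None
    } else {
        Some(footer_start..fetched.start)
    }
}
```

Shrinking the window trades a larger first read for the occasional second round trip on files with large footers.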
…nge column names (vortex-data#5881) This started failing after the DF 51 upgrade, where we added additional cast logic to support reordering. --------- Signed-off-by: Robert Kruszewski <github@robertk.io>
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
Signed-off-by: Adam Gutglick <adam@spiraldb.com>
…5795) Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
Previously we only checked that options were the correct type, and not that the vtable itself was the same. We had multiple expressions in SpiralDB that use `Options = FieldName` and they falsely matched the downcast. --------- Signed-off-by: Nicholas Gates <nick@nickgates.com>
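The difference between the two checks can be sketched with `std::any`. `Expr`, the marker vtable types, and both method names are hypothetical. The point is that two vtables sharing an options type (here `FieldName`) can only be told apart once the vtable identity is compared as well.

```rust
use std::any::{Any, TypeId};

// Hypothetical vtable marker types (zero-sized).
pub struct GetField;
pub struct Select;

#[derive(Debug, PartialEq)]
pub struct FieldName(pub String);

/// Hypothetical expression: vtable identity plus type-erased options.
pub struct Expr {
    vtable_id: TypeId,
    options: Box<dyn Any>,
}

impl Expr {
    pub fn new<V: 'static, O: 'static>(options: O) -> Self {
        Self { vtable_id: TypeId::of::<V>(), options: Box::new(options) }
    }

    /// Old-style check: only the options type is compared, so any vtable
    /// using the same options type falsely matches.
    pub fn options_only<O: 'static>(&self) -> Option<&O> {
        self.options.downcast_ref::<O>()
    }

    /// Fixed check: the vtable must match *and* the options must downcast.
    pub fn as_opt<V: 'static, O: 'static>(&self) -> Option<&O> {
        if self.vtable_id != TypeId::of::<V>() {
            return None;
        }
        self.options.downcast_ref::<O>()
    }
}
```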
Signed-off-by: Nicholas Gates <nick@nickgates.com>
This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| [prost](https://redirect.github.com/tokio-rs/prost) | workspace.dependencies | patch | `0.14.1` → `0.14.3` |
| [prost-build](https://redirect.github.com/tokio-rs/prost) | workspace.dependencies | patch | `0.14.1` → `0.14.3` |
| [prost-types](https://redirect.github.com/tokio-rs/prost) | workspace.dependencies | patch | `0.14.1` → `0.14.3` |

---

### Release Notes

<details>
<summary>tokio-rs/prost (prost)</summary>

### [`v0.14.3`](https://redirect.github.com/tokio-rs/prost/releases/tag/v0.14.3)

[Compare Source](https://redirect.github.com/tokio-rs/prost/compare/v0.14.2...v0.14.3)

##### What's Changed

- fix some forgotten prost import paths by [@GlenDC](https://redirect.github.com/GlenDC) in [#1385](https://redirect.github.com/tokio-rs/prost/pull/1385)
- build(deps): bump actions/upload-artifact from 5 to 6 by [@dependabot](https://redirect.github.com/dependabot)\[bot] in [#1381](https://redirect.github.com/tokio-rs/prost/pull/1381)
- build(deps): update pulldown-cmark-to-cmark requirement from 21 to 22 by [@dependabot](https://redirect.github.com/dependabot)\[bot] in [#1384](https://redirect.github.com/tokio-rs/prost/pull/1384)
- Bugfix: default Name implementation produces invalid URLs with empty packages. by [@aaronjeline](https://redirect.github.com/aaronjeline) in [#1386](https://redirect.github.com/tokio-rs/prost/pull/1386)
- fix: Add back `DecodeError::new` by [@caspermeijn](https://redirect.github.com/caspermeijn) in [#1382](https://redirect.github.com/tokio-rs/prost/pull/1382)
- chore: remove protobuf submodule and leverage cmake for it by [@LucioFranco](https://redirect.github.com/LucioFranco) in [#1389](https://redirect.github.com/tokio-rs/prost/pull/1389)

##### New Contributors

- [@GlenDC](https://redirect.github.com/GlenDC) made their first contribution in [#1385](https://redirect.github.com/tokio-rs/prost/pull/1385)
- [@aaronjeline](https://redirect.github.com/aaronjeline) made their first contribution in [#1386](https://redirect.github.com/tokio-rs/prost/pull/1386)

**Full Changelog**: <tokio-rs/prost@v0.14.2...v0.14.3>

### [`v0.14.2`](https://redirect.github.com/tokio-rs/prost/blob/HEAD/CHANGELOG.md#Prost-version-0142)

[Compare Source](https://redirect.github.com/tokio-rs/prost/compare/v0.14.1...v0.14.2)

*PROST!* is a [Protocol Buffers](https://protobuf.dev/) implementation for the [Rust Language](https://www.rust-lang.org/). `prost` generates simple, idiomatic Rust code from `proto2` and `proto3` files.

#### ⚠️ Heads-up

- Increase MSRV to 1.82 ([#1356](https://redirect.github.com/tokio-rs/prost/issues/1356))
- Update maintenance status to Passively Maintained ([#1359](https://redirect.github.com/tokio-rs/prost/issues/1359))

This excerpt is from the readme:

> The current maintainer is not contributing new features and doesn't have the time to review new features. Bug fixes and small improvements are welcome. Feel free to contribute small and easily reviewable PRs.
>
> Bug fixes are still important, and security fixes will be released as soon as possible. Contact the `#prost` channel in [Tokio discord](https://discord.gg/tokio) if you feel a bug or security fix is not getting enough attention.
>
> The maintainer expects the official `protobuf` project to release their rust library soon and expects it to be as fully featured as the C++ library. See their [source code](https://redirect.github.com/protocolbuffers/protobuf/tree/main/rust) and [crate](https://crates.io/crates/protobuf/4.33.1-release) for more information.

#### 🚀 Features

- Configure prost path via `prost_build::Config` or `#[(prost(prost_path = "::prost")]` ([#1274](https://redirect.github.com/tokio-rs/prost/issues/1274))
- Support for deprecated enum and oneof fields ([#1316](https://redirect.github.com/tokio-rs/prost/issues/1316))

#### 🐛 Bug Fixes

- *(prost-build)* Resolve OneOf type name conflict with embedded message ([#1294](https://redirect.github.com/tokio-rs/prost/issues/1294))
- *(prost-build)* Avoid OneOf type collision with enums and keyword names ([#1341](https://redirect.github.com/tokio-rs/prost/issues/1341))

#### 💼 Dependencies

- Use `trait Error` from core ([#1179](https://redirect.github.com/tokio-rs/prost/issues/1179))
- *(deps)* Update protobuf to v25.8 ([#1323](https://redirect.github.com/tokio-rs/prost/issues/1323))
- *(deps)* Update criterion requirement from 0.6 to 0.7 ([#1308](https://redirect.github.com/tokio-rs/prost/issues/1308))
- *(deps)* Update petgraph to 0.8 ([#1327](https://redirect.github.com/tokio-rs/prost/issues/1327))
- *(deps)* Bump actions/upload-artifact from 4 to 5 ([#1351](https://redirect.github.com/tokio-rs/prost/issues/1351))
- *(deps)* Bump actions/checkout from 5 to 6 ([#1370](https://redirect.github.com/tokio-rs/prost/issues/1370))
- Bump actions/checkout to v5 ([#1312](https://redirect.github.com/tokio-rs/prost/issues/1312))
- Update clippy to version 1.87 ([#1292](https://redirect.github.com/tokio-rs/prost/issues/1292))
- Replace once\_cell dependency by std lib ([#1119](https://redirect.github.com/tokio-rs/prost/issues/1119))

#### 📚 Documentation

- Update outdated link is test documentation ([#1289](https://redirect.github.com/tokio-rs/prost/issues/1289))
- Describe use of encoding module ([#1322](https://redirect.github.com/tokio-rs/prost/issues/1322))
- Update the readme MSRV to the actual number ([#1331](https://redirect.github.com/tokio-rs/prost/issues/1331))
- Update URLs after manual review ([#1336](https://redirect.github.com/tokio-rs/prost/issues/1336))
- Answer why fields are wrapped in option ([#1358](https://redirect.github.com/tokio-rs/prost/issues/1358))

#### 🎨 Styling

- Add spaces to derive arguments in generated code ([#1290](https://redirect.github.com/tokio-rs/prost/issues/1290))
- Use variables directly in the `format!` string ([#1293](https://redirect.github.com/tokio-rs/prost/issues/1293))
- Remove unneeded lint allow statements ([#1326](https://redirect.github.com/tokio-rs/prost/issues/1326))
- Remove allocation in tests ([#1332](https://redirect.github.com/tokio-rs/prost/issues/1332))
- Simplify DecodeError description to an enum ([#1330](https://redirect.github.com/tokio-rs/prost/issues/1330))
- Use variables directly in the `format!` string ([#1361](https://redirect.github.com/tokio-rs/prost/issues/1361))
- Fix typo in prost/src/encoding.rs ([#1369](https://redirect.github.com/tokio-rs/prost/issues/1369))

#### 🧪 Testing

- Rename package of `ident_conversion` ([#1291](https://redirect.github.com/tokio-rs/prost/issues/1291))
- Add test for split buffer varint decoding ([#1321](https://redirect.github.com/tokio-rs/prost/issues/1321))
- Add descriptive reason of test failure ([#1320](https://redirect.github.com/tokio-rs/prost/issues/1320))
- Additionally test `decode_varint_slice` with roundtrips ([#1325](https://redirect.github.com/tokio-rs/prost/issues/1325))
- *(result\_struct)* Move tests to separate module ([#1333](https://redirect.github.com/tokio-rs/prost/issues/1333))
- *(proto3\_presence)* Move test to separate module ([#1334](https://redirect.github.com/tokio-rs/prost/issues/1334))
- *(result\_enum)* Move test to separate module ([#1342](https://redirect.github.com/tokio-rs/prost/issues/1342))
- *(option\_enum)* Move test to separate module ([#1344](https://redirect.github.com/tokio-rs/prost/issues/1344))
- *(option\_struct)* Move tests to separate module ([#1345](https://redirect.github.com/tokio-rs/prost/issues/1345))
- *(message\_encoding)* Roundtrip `Coumpound` ([#1365](https://redirect.github.com/tokio-rs/prost/issues/1365))
- *(no\_unused\_results)* Add roundtrip for Test message ([#1364](https://redirect.github.com/tokio-rs/prost/issues/1364))
- *(ServiceGenerator)* Verify the content of all generated files ([#1357](https://redirect.github.com/tokio-rs/prost/issues/1357))
- *(derive\_copy)* Allow dead code ([#1362](https://redirect.github.com/tokio-rs/prost/issues/1362))
- Always choose macOS 14 ([#1324](https://redirect.github.com/tokio-rs/prost/issues/1324))

#### ⚙️ Miscellaneous Tasks

- Replace duplicate README by a symlink ([#1303](https://redirect.github.com/tokio-rs/prost/issues/1303))
- Add `cargo-semver-checks` ([#1337](https://redirect.github.com/tokio-rs/prost/issues/1337))

</details>

---

### Configuration

📅 **Schedule**: Branch creation - Between 12:00 AM and 03:59 AM, only on Monday ( * 0-3 * * 1 ) (UTC), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Enabled.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about these updates again.

---

This PR was generated by [Mend Renovate](https://mend.io/renovate/). View the [repository job log](https://developer.mend.io/github/vortex-data/vortex).

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
…rtex-data#5895) Use `execute_canonical` instead of `execute` (likely no one will need to fix anything, since the API is very new). --------- Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
[screenshot] Looks weird? --------- Signed-off-by: blaginin <dima@spiraldb.com>
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
Signed-off-by: Nicholas Gates <nick@nickgates.com>
…ortex-data#5928) Resolves vortex-data#5927 Signed-off-by: Daniel King <dan@spiraldb.com>
…rtex-data#5905) Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
…5915) This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| CodSpeedHQ/action | action | digest | `dbda711` → `99c1668` |

---

### Configuration

📅 **Schedule**: Branch creation - Between 12:00 AM and 03:59 AM, only on Monday ( * 0-3 * * 1 ) (UTC), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---

This PR was generated by [Mend Renovate](https://mend.io/renovate/). View the [repository job log](https://developer.mend.io/github/vortex-data/vortex).

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
…in VTable (vortex-data#6045) Internal Array API break; just use `execute`. --------- Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
As part of moving towards an explicit execution model, arrays now all return a `Validity`, which represents the logical validity of the array. `scalar_at` and min/max operations then work as normal over this validity data structure. Follow-ups: - [ ] Remove ValidityVTable, moving the `validity` function over to the main Array vtable. --------- Signed-off-by: Nicholas Gates <nick@nickgates.com>
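A minimal sketch of what a logical validity value buys: once every array can report one, `scalar_at` and `max` can be written once against it. The `Validity` variants and the free functions below are illustrative assumptions. They operate on plain `i32` slices rather than Vortex arrays.

```rust
/// Hypothetical logical validity of an array.
#[derive(Debug, Clone, PartialEq)]
pub enum Validity {
    NonNullable,
    AllValid,
    AllInvalid,
    /// Per-element validity bitmap (`true` = valid).
    Array(Vec<bool>),
}

impl Validity {
    pub fn is_valid(&self, index: usize) -> bool {
        match self {
            Validity::NonNullable | Validity::AllValid => true,
            Validity::AllInvalid => false,
            Validity::Array(bits) => bits[index],
        }
    }
}

/// `scalar_at`: consult validity first, then read the value (`None` = null).
pub fn scalar_at(values: &[i32], validity: &Validity, index: usize) -> Option<i32> {
    validity.is_valid(index).then(|| values[index])
}

/// `max` over non-null elements, driven entirely by the validity value.
pub fn max(values: &[i32], validity: &Validity) -> Option<i32> {
    (0..values.len())
        .filter(|&i| validity.is_valid(i))
        .map(|i| values[i])
        .max()
}
```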
Restrict the tokio build check to the minimal set of crates that need building. Signed-off-by: Adam Gutglick <adam@spiraldb.com>
vortex-data#5976) This PR fixes how a file's or a scan's `DType` is mapped to Arrow types in cases where the table has non-logical types that don't round-trip between the type systems. This change both simplifies how we push projection expressions and allows us to save on unnecessary work converting arrays between various logically-equivalent types. --------- Signed-off-by: Alfonso Subiotto Marques <alfonso.subiotto@polarsignals.com> Signed-off-by: Adam Gutglick <adam@spiraldb.com> Co-authored-by: Alfonso Subiotto Marques <alfonso.subiotto@polarsignals.com>
…6018) This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| CodSpeedHQ/action | action | digest | `99c1668` → `94b8856` |

---

### Configuration

📅 **Schedule**: Branch creation - Between 12:00 AM and 03:59 AM, only on Monday ( * 0-3 * * 1 ) (UTC), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---

This PR was generated by [Mend Renovate](https://mend.io/renovate/). View the [repository job log](https://developer.mend.io/github/vortex-data/vortex).

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
We don't need the cache suffix explicitly anymore, as we use the job ID as part of the cache key, which serves the same purpose in all the places we added it. Signed-off-by: Adam Gutglick <adam@spiraldb.com>
Having started to implement ExpressionArray::deserialize, I realised the best thing to do is pass a session to the deserialize function. If we do that, then there's no reason for Array VTables to hold state: they're just ZSTs. So this should avoid a whole bunch of heap allocations. We should follow up with other vtables and eventually converge on a pattern. --------- Signed-off-by: Nicholas Gates <nick@nickgates.com>
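A sketch of the pattern described above. When `deserialize` receives the session, the vtable carries no state, so it can be a zero-sized type referenced from a `static` with no heap allocation. `Session`, `ArrayVTable`, `ExpressionVTable`, and the `"vortex.expression"` id are hypothetical names for illustration, not Vortex's actual definitions. The "deserialization" is a toy byte copy.

```rust
use std::collections::HashSet;

/// Hypothetical session holding context that vtables previously had to own.
pub struct Session {
    pub registered: HashSet<String>,
}

/// Vtable methods receive the session instead of reading state from `self`.
pub trait ArrayVTable {
    fn id(&self) -> &'static str;
    fn deserialize(&self, bytes: &[u8], session: &Session) -> Result<Vec<u8>, String>;
}

/// Stateless vtable: a zero-sized type.
pub struct ExpressionVTable;

impl ArrayVTable for ExpressionVTable {
    fn id(&self) -> &'static str {
        "vortex.expression"
    }

    fn deserialize(&self, bytes: &[u8], session: &Session) -> Result<Vec<u8>, String> {
        // Context comes from the session; the byte copy stands in for real decoding.
        if session.registered.contains(self.id()) {
            Ok(bytes.to_vec())
        } else {
            Err(format!("{} is not registered in this session", self.id()))
        }
    }
}

/// A ZST vtable can be shared as a `'static` trait object without boxing.
pub static EXPRESSION_VTABLE: &(dyn ArrayVTable + Sync) = &ExpressionVTable;
```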
It doesn't seem to make much of a difference in the overall coverage, and just takes a long time to actually run. --------- Signed-off-by: Adam Gutglick <adam@spiraldb.com>
Saw this somewhere; it should save on cache space, and Windows has been taking up a lot of cache space. Signed-off-by: Adam Gutglick <adam@spiraldb.com>
1. It's already enabled by `vortex-array`, which basically everything will pull.
2. It's unnecessarily avoided in `vortex-scalar`, which even more things pull.
3. We really don't need three different variants here; surely one is enough.
4. This has benefits in CI, just reducing the overall size of the internal dependency matrix.

Signed-off-by: Adam Gutglick <adam@spiraldb.com>
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
Follow-up to vortex-data#6056: the cache uses the available env vars when it runs, so the variable must be available for both the setup and post-run steps. Signed-off-by: Adam Gutglick <adam@spiraldb.com>
As part of this, FoR kernel execution is now async. GPU benchmarks show unchanged performance. Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>
Signed-off-by: Adam Gutglick <adam@spiraldb.com>
…a#6067) fix vortex-data#6065 Signed-off-by: Robert Kruszewski <github@robertk.io>
Signed-off-by: Robert Kruszewski <github@robertk.io>
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
Fixes a potential bug where decoding large FSST and VarBin arrays results in an invalid VarBinViewArray. Currently, when decoding produces a large buffer, we generate a new VBV with that single buffer plus views built against it, which breaks if the buffer is larger than 2 GiB. This PR splits out a separate `build_views` function that takes a `max_buffer_len` parameter; as it generates views, it splits (zero-copy) the underlying buffer into segments of no more than `max_buffer_len`. --------- Signed-off-by: Andrew Duffy <andrew@a10y.dev>
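The splitting strategy can be sketched as a greedy pass over VarBin-style offsets. A new segment starts whenever the next value would push the current one past `max_buffer_len`, and each view is emitted relative to its segment. Only `build_views` and `max_buffer_len` come from the PR description. The `View` struct, the signature, and the offset-based input are assumptions. The 2 GiB limit presumably comes from view offsets being 32-bit signed integers, as in the Arrow view layout.

```rust
use std::ops::Range;

/// Hypothetical view: which segment a value lives in, and where within it.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct View {
    pub buffer_index: u32,
    pub offset: u32,
    pub len: u32,
}

/// Build views over one large buffer described by monotone `offsets`
/// (value i spans `offsets[i]..offsets[i + 1]`). Returns segments as byte
/// ranges into the original buffer, so callers can slice it without copying.
/// Assumes every individual value is at most `max_buffer_len` bytes.
pub fn build_views(offsets: &[usize], max_buffer_len: usize) -> (Vec<Range<usize>>, Vec<View>) {
    let mut segments = Vec::new();
    let mut views = Vec::with_capacity(offsets.len().saturating_sub(1));
    let Some(&first) = offsets.first() else {
        return (segments, views);
    };
    let mut seg_start = first;

    for w in offsets.windows(2) {
        let (start, end) = (w[0], w[1]);
        assert!(end - start <= max_buffer_len, "value larger than max_buffer_len");
        // Close the current segment if this value would overflow it.
        if end - seg_start > max_buffer_len {
            segments.push(seg_start..start);
            seg_start = start;
        }
        views.push(View {
            buffer_index: segments.len() as u32, // index of the open segment
            offset: (start - seg_start) as u32,
            len: (end - start) as u32,
        });
    }
    // Close the final segment.
    segments.push(seg_start..*offsets.last().unwrap());
    (segments, views)
}
```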
…-data#6072) Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>
Signed-off-by: Nicholas Gates <nick@nickgates.com> Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk> Co-authored-by: Nicholas Gates <nick@nickgates.com>
Experimenting showed that just using sccache is generally better or roughly equal, at least for our current setup with runs-on backed storage. I've audited all of our CI setup and found a few places where we run on GitHub runners, where using sccache is wasteful, so I disabled it there. I also found a few places where `rust-cache` should still be useful. --------- Signed-off-by: Adam Gutglick <adam@spiraldb.com>
Pandas released a new version that broke our docs build for Python reasons; this pins a specific version. --------- Signed-off-by: Adam Gutglick <adam@spiraldb.com>
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
Signed-off-by: Nicholas Gates <nick@nickgates.com>
…Rule (vortex-data#6090) The previous code assumed that target fields were in the same order as source fields, and that all target fields existing in the source schema were the first k fields. This could cause incorrect query results. Signed-off-by: Alfonso Subiotto Marques <alfonso.subiotto@polarsignals.com>
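Mapping fields by name rather than by position avoids both assumptions called out above. Here is a minimal sketch with schemas reduced to lists of field names. That reduction is an assumption, since the real rule works on full schemas, and `map_target_to_source` is a hypothetical helper.

```rust
use std::collections::HashMap;

/// For each target field, find its index in the source schema by name
/// (`None` if the source lacks it). No ordering between the two is assumed.
pub fn map_target_to_source(source: &[&str], target: &[&str]) -> Vec<Option<usize>> {
    let index: HashMap<&str, usize> = source
        .iter()
        .enumerate()
        .map(|(i, name)| (*name, i))
        .collect();
    target.iter().map(|name| index.get(name).copied()).collect()
}
```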
The code previously caused an underflow because it assumed a non-empty sizes array. Signed-off-by: Alfonso Subiotto Marques <alfonso.subiotto@polarsignals.com>
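One way to avoid this class of underflow is to make the empty case explicit with `checked_sub`. The commit does not name the affected function, so `last_end` and its list-view-style `offsets`/`sizes` inputs are hypothetical.

```rust
/// End (exclusive) of the last element in a layout described by parallel
/// `offsets` and `sizes`, or `None` when there are no elements.
pub fn last_end(offsets: &[u64], sizes: &[u64]) -> Option<u64> {
    // `sizes.len() - 1` would overflow on an empty array: a panic in debug
    // builds, or a wrap to usize::MAX (then an out-of-bounds index) in release.
    let last = sizes.len().checked_sub(1)?;
    Some(offsets[last] + sizes[last])
}
```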
Previously, when filtering a list array, we could create a mask longer than the sliced elements range, causing a "Filter mask length mismatch" panic. Signed-off-by: Alfonso Subiotto Marques <alfonso.subiotto@polarsignals.com>
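The fix amounts to sizing the element mask to the sliced elements range rather than to the full child array. This sketch assumes the list offsets have already been sliced, so they need not start at zero. `element_mask` is a hypothetical helper.

```rust
/// Expand a per-list `row_mask` into a per-element mask for a sliced list
/// array. The mask covers only `offsets[0]..offsets[last]`, so its length
/// always equals the sliced elements range.
pub fn element_mask(offsets: &[usize], row_mask: &[bool]) -> Vec<bool> {
    assert_eq!(offsets.len(), row_mask.len() + 1, "expected one more offset than lists");
    let base = offsets[0];
    let end = offsets[offsets.len() - 1];
    let mut mask = vec![false; end - base];
    for (i, &keep) in row_mask.iter().enumerate() {
        if keep {
            // Rebase absolute offsets to positions within the sliced range.
            mask[offsets[i] - base..offsets[i + 1] - base].fill(true);
        }
    }
    mask
}
```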
Signed-off-by: Alfonso Subiotto Marques <alfonso.subiotto@polarsignals.com>