
Main 5.0.6 rebase sai fixes#2251

Open
adelapena wants to merge 1569 commits into mck/main-5.0.6-rebase from
main-5.0.6-rebase-sai-fixes

Conversation

@adelapena

What is the issue

Multiple SAI tests are failing, likely due to rebase issues.

What does this PR fix and why was it fixed

Fix SAI tests failing due to rebase issues.

djatnieks and others added 30 commits February 27, 2026 18:53
…nTest (#1664)

### What is the issue
`PendingAntiCompactionTest.testRetriesTimeout` is flaky on `main`,
failing with a timeout reported in the junit test task.
riptano/cndb#13524

### What does this PR fix and why was it fixed
`PendingAntiCompactionTest.testRetriesTimeout`, and a couple of other
tests, call `Future.get` without any timeout. This change adds a
30-second timeout to prevent an indefinite wait for the future, and
logs the failure in the failing test rather than in the junit test task.

 (Rebase of commit c11b523)
### What is the issue
This is a follow up test for
riptano/cndb#13696.

### What does this PR fix and why was it fixed
The fix was already merged, but this confirms that without the fix,
collections are affected by the NoSuchElementException bug.

 (Rebase of commit 88f08e2)
Relates to riptano/cndb#13766

Brings in several improvements.

CNDB PR: riptano/cndb#13789

 (Rebase of commit 14cbbc6)
### What is the issue
Didn't create an issue as this is just a new test.

### What does this PR fix and why was it fixed
We implicitly rely on this and are exposing this as a feature to our
end users, so we want to confirm that the stop words are the expected
stop words.

 (Rebase of commit 27ac572)
Fixes: riptano/cndb#13718

We had adopted this version of lucene
datastax/lucene@5ea8bb4
in order to support our custom modifications of HNSW on top of lucene.
We now use https://github.com/datastax/jvector for vector search and no
longer need a custom build.

I propose we use 9.8.0 since that is the closest release to the one we
have been using.

CNDB test pr: riptano/cndb#13757

 (Rebase of commit 3d5d7d6)
This introduces a `PrefetchingRebufferer` that, when enabled, prefetches
(reads from the underlying reader) a configurable number of chunks in
advance (so it only works on top of rebufferer factories that work with
chunks, i.e. not uncompressed mmap).

Prefetching must first be enabled globally by setting
`-Dcassandra.read_prefetching_size_kb` to the desired value. With that,
and assuming the disk access mode allows it (meaning it's not
uncompressed mmap), prefetching will be applied to reads that are
"clearly" sequential. As of this patch, this mostly means the sstable
"scanners" and `SortedStringTableCursor`, so compaction, SAI index
building and tools that read sstables fully (scrub, verifier) will
benefit from it.

Since rebufferers are synchronous, prefetching is done through a fixed
thread pool. The number of threads in that pool can be set with
`-Dcassandra.read_prefetching_threads` (but defaults to the number of
"processors").

The `-Dcassandra.read_prefetching_window` property can also be set to
define how often prefetching is re-triggered. By default, prefetching is
triggered when less than half of the `-Dcassandra.read_prefetching_size_kb`
value remains prefetched.
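The trigger rule above can be sketched as follows. This is an illustrative model, not the actual `PrefetchingRebufferer` code; the field and method names are made up for the example:

```java
// Sketch of the re-trigger rule described above (hypothetical names):
// with read_prefetching_size_kb bytes as the target, prefetching is
// re-triggered once the amount still buffered ahead of the reader
// drops below the window (half the target by default).
class PrefetchWindow
{
    final long targetBytes;  // from -Dcassandra.read_prefetching_size_kb
    final long windowBytes;  // default: half the target

    PrefetchWindow(long targetBytes)
    {
        this.targetBytes = targetBytes;
        this.windowBytes = targetBytes / 2;
    }

    // True when fewer than windowBytes remain prefetched ahead.
    boolean shouldPrefetch(long prefetchedAheadBytes)
    {
        return prefetchedAheadBytes < windowBytes;
    }

    // How many chunks to request to get back up to the target.
    long chunksToFetch(long prefetchedAheadBytes, long chunkSizeBytes)
    {
        long missing = targetBytes - prefetchedAheadBytes;
        return (missing + chunkSizeBytes - 1) / chunkSizeBytes; // ceiling division
    }
}
```

This also makes the chunk-size point below concrete: refilling a 128kb target takes 2 chunks at a 64kb chunk size but 32 at 4kb.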

See riptano/cndb#13639 for additional
motivation.

I'll note that this patch is adapted from the similar DSE behavior.
However, there is a fair bit of modification, since the DSE version
relied on the asynchronous nature of rebufferers there (it also had a
few behaviors I didn't fully understand and didn't dig into, as they
felt a bit specific to DSE).

One point worth mentioning is that the DSE version relied on the
ability of the underlying channel to batch reads when multiple chunks
are prefetched. That could have advantages and we could try to add it
back over time, but it's a lot harder to see how to do that in the
non-async world of C*. Typically, in DSE, since all the prefetches were
initiated on the current thread (without blocking it), building the
"batch" was relatively easy, but that doesn't translate to this patch.
Not to mention that the underlying channels currently have no batching
options, and those would have to be added (in a way that trickles down
to the channel implementation within CNDB). Anyway, as of this patch,
chunks are prefetched in parallel but with no batching optimization.
That does mean the `window` parameter is probably not as useful as in
DSE, but I've kept it nonetheless, as it doesn't add much complexity.

In the context of compaction in CNDB though, I think this means it
would be ideal to use a relatively large "chunk size" for the readers:
we read the full file anyway, so a large-ish chunk size wastes nothing,
and it would provide batching for prefetching "for free" (in the sense
that if we want to prefetch, say, 128kb in advance, we only need 2
chunks at a 64kb chunk size, but a lot more at a 4kb chunk size).

---------

Co-authored-by: Branimir Lambov <branimir.lambov@datastax.com>

 (Rebase of commit c57974a)
…intervals in Interval.AsymetricOrdering are Comparable, and the SSTableReader mock in BaseCompactionStrategyTest violated the Comparator general contract, raising "IllegalArgumentException: Comparison method violates its general contract!". The root cause was that the objects compared in Interval.minOrdering were not the same object by '==', but i1.data.compareTo(i2.data) returned 0, violating the comparator contract. Copilot's analysis was that Mockito mocks do not have a meaningful compareTo implementation, so their comparison is not stable or meaningful.

This fix changes BaseCompactionStrategyTest.mockSSTable so that getId returns unique values and provides an explicit comparator that compares using getId.
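The fix pattern can be illustrated with a minimal sketch. These are not the actual test classes; `SSTableStub` and its `id` field stand in for the mocked SSTableReader and its unique `getId` value:

```java
import java.util.Comparator;

// Illustration of the fix described above (simplified, hypothetical
// names): give each mock a unique, stable key and compare on that key
// explicitly, so the comparator never reports two distinct instances
// as equal while other code distinguishes them by identity.
class StableOrdering
{
    static class SSTableStub
    {
        final int id; // unique per instance, like the unique getId values

        SSTableStub(int id)
        {
            this.id = id;
        }
    }

    // The explicit comparator: total, stable, and consistent with
    // instance identity because ids are unique.
    static final Comparator<SSTableStub> BY_ID =
        Comparator.comparingInt(s -> s.id);
}
```

Comparing on a unique key restores the contract: `compare` returns 0 only when both sides really are the same logical object.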

 (Rebase of commit 390a5ce)
…acy_<version>_tuple" sstables. (#1758)

Fix LegacySStableTest to only perform tuple checks on versions "ba" and
higher because we are unable to generate tuple sstables in earlier
versions.

 (Rebase of commit d57b1c7)
…DRA-20189, but due to rebase conflicts the implementation was overwritten by CC SAI code.

 (Rebase of commit ce68642)
…ASSANDRA-20100, but due to rebase conflicts the implementation was overwritten by CC SAI code.

 (Rebase of commit 655714f)
…essions, changing default value of StorageAttachedIndexOptions.prioritize_over_legacy_index to 'true'.

The prioritize_over_legacy_index was introduced in 5.0.4 by CASSANDRA-20334 using a default of 'false'

 (Rebase of commit d5896b9)
…1707)

### What is the issue
Failing `EarlyOpenCachingTest` due to incorrect assignment of data file
length.

### What does this PR fix and why was it fixed
Passes the correct lengths to the early open sstable construction. This
lets `SimpleSSTableScanner` correctly finish iteration.

This was already fixed in CC (as part of CNDB-9104).

 (Rebase of commit 6e759fa)
…ild heap (#1693)

### What is the issue
Fixes: riptano/cndb#13689

### What does this PR fix and why was it fixed
This PR utilizes the NodeQueue::pushMany method to decrease the time
complexity required to build the NodeQueue from `O(n log(n))` to `O(n)`.
This is likely only significant for sufficiently large hybrid queries.
For example, we have seen cases of the search producing 400k rows, which
means that we do 400k insertions into these NodeQueue objects.
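The complexity difference comes from bottom-up heap construction. The sketch below is a generic illustration of that technique, not the actual `NodeQueue::pushMany` implementation: sift-down over all internal nodes builds a heap in O(n), whereas n individual pushes cost O(n log n):

```java
// Generic bottom-up heapify sketch (not the jvector NodeQueue code):
// building the heap in one pass is O(n), versus O(n log n) for
// inserting the n elements one at a time.
class HeapBuild
{
    // Build a min-heap in place, O(n) total work.
    static void heapify(int[] a)
    {
        for (int i = a.length / 2 - 1; i >= 0; i--)
            siftDown(a, i, a.length);
    }

    // Restore the heap property below index i, O(log n).
    static void siftDown(int[] a, int i, int n)
    {
        while (true)
        {
            int l = 2 * i + 1, r = l + 1, smallest = i;
            if (l < n && a[l] < a[smallest]) smallest = l;
            if (r < n && a[r] < a[smallest]) smallest = r;
            if (smallest == i) return;
            int tmp = a[i]; a[i] = a[smallest]; a[smallest] = tmp;
            i = smallest;
        }
    }

    // Pop the minimum from a heap of logical size n.
    static int popMin(int[] a, int n)
    {
        int min = a[0];
        a[0] = a[n - 1];
        siftDown(a, 0, n - 1);
        return min;
    }
}
```

For the 400k-row case above, that is 400k sift-downs in one linear pass instead of 400k logarithmic insertions.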

 (Rebase of commit 0f7e07b)
Fixes riptano/cndb#13822

CNDB test pr riptano/cndb#13826

SAI’s DiskANN/JVector engine currently **always searches with pruning
enabled**, trading recall for latency. That is fine for most workloads,
but:

* **Threshold / bounded ANN queries** can lose matches when pruning
exits early.
* Performance‑testing users need an easy way to turn pruning off to
measure the recall/latency curve.

This patch introduces a per‑query **`use_pruning`** option so users and
operators can choose the trade‑off that suits them.

---

```cql
WITH ann_options = {'use_pruning': true|false}
```

*When omitted we fall back to the node level default (see below).*

* Cluster‑wide default is controlled by the JVM system property:
  ```
  -Dcassandra.sai.jvector.use_pruning_default=<true|false>
  ```
  exposed as `V3OnDiskFormat.JVECTOR_USE_PRUNING_DEFAULT`.
* The property defaults to `true`, preserving existing behaviour.

* Value must be the literal `true` or `false` (case‑insensitive).
* Unknown ANN option keys continue to raise `InvalidRequestException`.

* `Orderer` computes the value of the `usePruning` option by using the
`use_pruning` value if it is not null, or the JVM default if it is null,
and passes it down to **all** `graph.search()` calls.
* Threshold / bounded ANN queries always pass `use_pruning = false`
because correctness > latency for those paths (this is a net new change,
but it's very minor and might not have any impact on those queries
depending on the jvector implementation)
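The fallback logic described in the first bullet reduces to a null check. This is a sketch with illustrative names (only the system property name comes from the text above, via `V3OnDiskFormat.JVECTOR_USE_PRUNING_DEFAULT`):

```java
// Sketch of the per-query/default resolution described above
// (hypothetical class and method names): a non-null per-query
// use_pruning value wins; otherwise the JVM-level default applies.
class PruningOption
{
    // Property defaults to "true", preserving existing behaviour.
    static final boolean USE_PRUNING_DEFAULT =
        Boolean.parseBoolean(
            System.getProperty("cassandra.sai.jvector.use_pruning_default", "true"));

    static boolean resolveUsePruning(Boolean queryOption)
    {
        return queryOption != null ? queryOption : USE_PRUNING_DEFAULT;
    }
}
```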

* We added one flag bit to `ANNOptions` serialization; older nodes
ignore unknown bits, so mixed‑version clusters are fine (though they do
throw an exception for unknown settings)

* Parsing, validation and transport round‑trips (`ANNOptionsTest`).
* Distributed smoke (`ANNOptionsDistributedTest`).
* Recall regression for pruning vs no‑pruning
(`VectorSiftSmallTest.ensureDisablingPruningIncreasesRecall`).

 (Rebase of commit 23be2e7)
…1705)

### What is the issue
Fixes: riptano/cndb#13625

### What does this PR fix and why was it fixed
When `rerank_k` is non-positive, we will skip reranking where possible.

A minor note: this change is not strictly backwards compatible because
of the original implementation. However, it will fail quickly if the
coordinator accepts the value but the writer does not, and since this is
only a query-time parameter, I propose that we move forward and accept
the net-new value that was previously disallowed.

 (Rebase of commit 717971d)
…ation (#1709)

Fixes: riptano/cndb#13847

We already make use of the `Token#getLongValue` method in SAI, which
only works for the Murmur partitioner, so I propose that we add a
symmetric method that allows us to create tokens without converting a
long to a byte buffer back to a long.

 (Rebase of commit 68a92da)
The number of rows stored in an SSTable's SAI index was incorrect for
analyzed indexes and for indexes on collections, as it was counting
posting lists for each term. In non-analyzed, non-collection indexes
there is one term per row, while in analyzed indexes there can be many
terms per row. This led to an incorrect row count.

Fixes counting number of rows stored in SSTable SAI index.

Minor fixes of code warnings in the affected files.

 (Rebase of commit 29ef34e)
The test for KD-tree posting list was meant to run a query that
performs skips and then verifies if the skip metric has been
bumped up.

Unfortunately there were several problems with it:
* the query didn't do skips because the clause intersection got
  optimized out by the SAI optimizer
* the test for the metric didn't inspect the metric value, but only
  checked if the metric was updated; so essentially it boiled down
  to checking if the index was used
* the metric value incorrectly recorded queries with no skips, because
  the histogram was configured to not allow zeroes.

In addition, whether the index is used depends on the version of the
optimizer, which made the test behave differently between the DC and EC
index versions. Now that we disable the optimizer for this test,
that difference is gone.

Fixes riptano/cndb#13830

 (Rebase of commit 17b4ad7)
The new request failure reason should be used when the on-disk format of an
index in a replica doesn't support a requested feature.

Also, rename org.apache.cassandra.index.sai.disk.format.Version.latest() to Version.current(),
and add separate Version.CURRENT and Version.LATEST constants.

 (Rebase of commit 523435a)
…clustering columns in descending order - port CASSANDRA-20295

 (Rebase of commit f10abed)
lesnik2u and others added 22 commits February 27, 2026 19:00
…llback (#2124)

riptano/cndb#15990

- What: Eliminates unnecessary array allocations in
PathUtils.atomicMoveWithFallback() by pre-allocating CopyOption[] arrays
as static constants.

- Why: JFR profiling showed this method was a memory allocation hotspot
during RemoteFileCache preloading. Each call to Files.move() with
varargs parameters created new arrays, causing excessive GC pressure.
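The pattern is to hoist the varargs arrays into constants. This is a sketch of the idea, not the actual `PathUtils` fields (the constant names are illustrative):

```java
import java.nio.file.CopyOption;
import java.nio.file.StandardCopyOption;

// Sketch of the allocation fix described above (hypothetical field
// names): each call like Files.move(src, dst, StandardCopyOption.ATOMIC_MOVE)
// silently allocates a fresh CopyOption[1] for the varargs parameter.
// Pre-allocating the arrays once lets every call reuse them.
class CopyOptions
{
    static final CopyOption[] ATOMIC_MOVE =
        { StandardCopyOption.ATOMIC_MOVE };

    static final CopyOption[] REPLACE_EXISTING =
        { StandardCopyOption.REPLACE_EXISTING };

    // Callers then pass the shared array explicitly, e.g.
    // Files.move(src, dst, ATOMIC_MOVE), with zero per-call allocation.
}
```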

 (Rebase of commit e054ecf)
…ting lists (#2112)

Fixes: riptano/cndb#15919
Test PR: riptano/cndb#15949

In the original implementation for
#820, we introduced the
`PrimaryKeyMapIterator` to iterate all primary keys in an sstable and
then do an anti-join on the result of an equality query. That design
works, but requires additional, unnecessary disk reads to fetch the
primary keys.

There are two possible solutions:

1. We can use row ids (either sstable or segment) to do the complement
of the resulting posting lists. This will be the most performant, since
it avoids object allocations. The main issue with this solution is that
it is much more complicated to implement and had unaddressed edge cases.
2. We can use the `primaryKeyFromRowId` method that takes primary key
bounds and then uses a row id when rows are from the same sstable. This
will be worse than solution 1 because it creates an object per key and
requires comparing sstable ids before comparing sstable row ids, but it
is a significant improvement over the current solution, which hits disk
to load the primary key.

When testing on my local machine and reviewing the JMH benchmarks, I
can see that the current solution is about 16x worse than the minimum
solution (2) and 32x worse than the optimal solution (1). Given that
the benchmarks in question are highly specific to this use case, I do
not think we have sufficient motivation to introduce the exceedingly
complex solution (1).

Note that the ideal form of solution 1, which would have much less
complexity, is to convert the posting lists into a single iterator of
sstable row ids and then take their complement.
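The complement idea can be sketched on plain arrays. This is an illustration of the technique only, not the actual SAI posting-list classes:

```java
// Sketch (illustrative, not the actual SAI code) of complementing a
// sorted posting list of row ids: emit every row id in [0, rowCount)
// that the postings skip, in a single merge-style pass.
class Complement
{
    static int[] complement(int[] sortedPostings, int rowCount)
    {
        int[] out = new int[rowCount - sortedPostings.length];
        int o = 0, p = 0;
        for (int rowId = 0; rowId < rowCount; rowId++)
        {
            if (p < sortedPostings.length && sortedPostings[p] == rowId)
                p++;              // row matched the equality query: skip it
            else
                out[o++] = rowId; // row absent from postings: keep it
        }
        return out;
    }
}
```

In the real index the same pass would be expressed as a lazy iterator over sstable row ids rather than a materialized array.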

 (Rebase of commit 9dd4c28)
…ility on all coordinators. Rename ambiguous waitForIndexqueryable() to waitForIndexQueryableOnFirstNode() and add coordinator-aware alternatives (#2127)

 (Rebase of commit 5d94ed4)

ninjafix – CNDB-16177: CNDB-15968: Fix flaky SAI tests by ensuring index queryability on all coordinators. Rename ambiguous waitForIndexqueryable() to waitForIndexQueryableOnFirstNode() and add coordinator-aw…

squash to a74ea99 / 5d94ed4
Adds coordinator CAS read/write latencies

 (Rebase of commit c32e5af)
…#2176)

### What is the issue

After CNDB-16249, hints are written with the storage-compatible
messaging version (e.g., VERSION_40), but dispatch was still using the
peer's negotiated version (VERSION_DS_20). This mismatch forced hints to
be decoded and re-encoded on dispatch, which fails when the tenant is
unassigned because tables are unknown.

### What does this PR fix and why was it fixed

Cap the dispatch version to the storage compatibility mode's version so
hints can use the encoded path (no deserialization) when the versions
match.

 (Rebase of commit 7d8f28b)
…compatibility mode (#2179)

### What is the issue

Fix version mismatch error when dispatching encoded hints in storage
compatibility mode

### What does this PR fix and why was it fixed

Remove version checks in HintMessage.Encoded serialization that
prevented dispatching hints written at storage compatibility version
(e.g., 12) when the peer connection uses a different messaging version
(e.g., 110).

UUID and VInt serialization are version-independent, and hint bytes are
written verbatim, so the version check was unnecessarily restrictive.

 (Rebase of commit c36fbcf)
ULID-based SSTable ID generation can fail with an NPE when generating a new ID. The root cause is that the underlying ULID generator can return an empty Optional when the clock is moved backwards to before the previously generated ID, or in certain rare overflow conditions when timestamps collide. If it's our first time through the generation loop, we prematurely exit with a null newVal.

Top of the error stack:
```
java.lang.NullPointerException
	at org.apache.cassandra.utils.TimeUUID.approximateFromULID(TimeUUID.java:58)
	at org.apache.cassandra.io.sstable.ULIDBasedSSTableId.<init>(ULIDBasedSSTableId.java:52)
	at org.apache.cassandra.io.sstable.ULIDBasedSSTableId$Builder.lambda$generator$0(ULIDBasedSSTableId.java:129)
```
This can cause a flush to fail.

Continue looping until newVal gets a value. The loop can spin until the corrected time catches up to the time of the most recently used ULID generation ID. This should be a short duration in a healthy cluster without large time corrections from sync.
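The retry shape can be sketched generically. This is not the actual `ULIDBasedSSTableId` code; the generator is abstracted as a `Supplier<Optional<Long>>` for illustration:

```java
import java.util.Optional;
import java.util.function.Supplier;

// Sketch of the retry described above (hypothetical names): keep
// looping until the generator yields a value, instead of exiting the
// first iteration with a null result.
class RetryUntilPresent
{
    static long nextId(Supplier<Optional<Long>> generator)
    {
        Optional<Long> candidate;
        do
        {
            // An empty Optional models the clock moving backwards or a
            // timestamp-collision overflow; spin until it resolves.
            candidate = generator.get();
        }
        while (candidate.isEmpty());
        return candidate.get();
    }
}
```

In a healthy cluster the loop exits quickly, since it only spins until the corrected clock catches back up to the last generated ID's timestamp.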

Tests are added in ULIDBasedSSTableIdGeneratorTest
A package-protected constructor is introduced for ULIDBasedSSTableIdGeneratorTest.testGeneratorRetryOnEmptyOptional()

Cassandra Applicability:
 upstream doesn't have ULIDBasedSSTableId (and won't because CASSANDRA-17048).

 (Rebase of commit e9afd3a)
…ly (#2186)

### What is the issue

FSReadError extends IOError, so it was not caught by the catch block in
filterCommitLogFiles(). This caused replayer failures when a remote
commit log file was listed but no longer accessible.

### What does this PR fix and why was it fixed

Only NoSuchFileException is handled gracefully; other IOErrors (such as
corruption) are re-thrown to fail the replay as intended.

 (Rebase of commit a51cdcc)
…out (#2208)

The test was still flaky because individual page fetches could exceed
the 1s range timeout (30ms artificial delay + actual processing).
Increased the range timeout to 10s and added flush() for more
predictable reads.

 (Rebase of commit bed57b7)
…2212)

This test is obsolete after #16313, remove it.

 (Rebase of commit 1ffcfa6)
…d check (#2213)

The NoSpamLogger clock override was checking that accesses came from
a single expected thread. This failed when internal threads accessed it
during truncate in cleanupRepairTables().

Removed the thread check since the purpose is deterministic time, not
thread enforcement.

 (Rebase of commit 94e3546)
…ild (#2211)

Brings main-5.0 in line with main using the referenced SAI View in
QueryView

 (Rebase of commit b951f8e)
PartitionAwarePrimaryKeyMap implements an overcomplicated `ceiling`
method calling `exactRowIdOrInvertedCeiling`.

This commit simplifies PartitionAwarePrimaryKeyMap.ceiling to use the
corresponding correct method from the reader directly.
This can be seen as a follow-up to
https://github.com/datastax/cassandra/pull/1096/files#diff-c5011580ab9b0d99d9e504570c4cccb152221d3dbe62c8a956e83fce9070b380

Rebase notes:
- commit 48b55fb was dropped because it only contains lint/refactor changes and should instead be upstreamed

 (Rebase of commit e5c5d07)
The rationale behind this upgrade is to remediate CVE-2025-67735.

BoringSSL has been upgraded as well, for consistency with 4.1.130.Final.

CNDB PR: riptano/cndb#16481

 (Rebase of commit 2e251a1)
The current version of `lz4-java` (1.8.0) has known vulnerabilities,
namely
[CVE-2025-12183](https://www.cve.org/CVERecord?id=CVE-2025-12183)
[CVE-2025-66566](https://www.cve.org/CVERecord?id=CVE-2025-66566)

This patch updates the lz4-java library from 1.8.0 to 1.10.2.
As of lz4-java 1.8.1, the artifact coordinates changed from `org.lz4` to
`at.yawk.lz4`, so this patch also updates the artifact coordinates.

 (Rebase of commit af4abfb)
@adelapena adelapena self-assigned this Mar 2, 2026
@github-actions

github-actions bot commented Mar 2, 2026

Checklist before you submit for review

  • This PR adheres to the Definition of Done
  • Make sure there is a PR in the CNDB project updating the Converged Cassandra version
  • Use NoSpamLogger for log lines that may appear frequently in the logs
  • Verify test results on Butler
  • Test coverage for new/modified code is > 80%
  • Proper code formatting
  • Proper title for each commit starting with the project-issue number, like CNDB-1234
  • Each commit has a meaningful description
  • Each commit is not very long and contains related changes
  • Renames, moves and reformatting are in distinct commits
  • All new files should contain the DataStax copyright header instead of the Apache License one

@michaelsembwever

CNDB-16593 → https://github.com/riptano/cndb/issues/16593

@michaelsembwever

michaelsembwever commented Mar 2, 2026

cherry-picked into mck/main-5.0.6-rebase (and testing there…)

@adelapena , feel free to add commits if/as they come

@sonarqubecloud

sonarqubecloud bot commented Mar 2, 2026

@cassci-bot

❌ Build ds-cassandra-pr-gate/PR-2251 rejected by Butler


15 regressions found
See build details here


Found 15 new test failures

Test Explanation Runs Upstream
junit.framework.TestSuite.org.apache.cassandra.cql3.CompactionOutOfSpaceTest-compression_jdk11 NEW 🔴🔴 0 / 0
replace_address_test.TestReplaceAddress.test_revive_endpoint (offheap-bti) NEW 🔴🔴 0 / 0
o.a.c.db.commitlog.CommitLogReaderTest.testSyncMarkerChecksumReadFailed_ignoreReplayErrorsDisabled (compression) NEW 🔴🔴 0 / 0
o.a.c.db.commitlog.CommitLogReaderTest.testSyncMarkerChecksumReadFailed_ignoreReplayErrorsDisabled_commitFailurePolicyIgnore (compression) NEW 🔴🔴 0 / 0
o.a.c.distributed.test.DropUDTWithRestartTest.loadCommitLogAndSSTablesWithDroppedColumnTestCC50 NEW 🔴🔴 0 / 0
o.a.c.distributed.test.DropUDTWithRestartTest.loadCommitLogAndSSTablesWithDroppedColumnTestCassandra41 NEW 🔴🔴 0 / 0
o.a.c.distributed.test.DropUDTWithRestartTest.loadCommitLogAndSSTablesWithDroppedColumnTestCassandra5 NEW 🔴🔴 0 / 0
o.a.c.distributed.test.SSTableLoaderEncryptionOptionsTest.bulkLoaderSuccessfullyStreamsOverSsl NEW 🔴🔴 0 / 0
o.a.c.distributed.test.repair.ForceRepairTest.terminated successfully () NEW 🔴 0 / 0
o.a.c.index.CustomIndexTest.notifyIndexesOfFullyExpiredSSTablesDuringCompaction (compression) NEW 🔴🔴 0 / 0
o.a.c.index.IndexNameTest.testQuotedAndLongColumnNames[CREATE CUSTOM INDEX %s ON %s(%s) USING 'StorageAttachedIndex'] (compression) NEW 🔴🔴 0 / 0
o.a.c.index.sai.cql.StaticColumnIndexTest.testTupleAndBlobFiltering (compression) NEW 🔵🔴 0 / 0
o.a.c.index.sai.cql.VectorCompaction100dTest.testOneToManyCompaction[dc false] () NEW 🔴🔴 0 / 0
o.a.c.index.sai.cql.VectorKeyRestrictedOnPartitionTest.partitionRestrictedTest[eb false] (compression) NEW 🔵🔴 0 / 0
o.a.c.tools.SSTablePartitionsTest.testSingleSSTable (compression) NEW 🔴🔴 0 / 0

No known test failures found

