
[Enhancement] Support sparse segment idx in shared-data storage metadata #68996

Merged
xiangguangyxg merged 1 commit into StarRocks:main from xiangguangyxg:segment_id
Feb 13, 2026

@xiangguangyxg xiangguangyxg commented Feb 6, 2026

Why I'm doing:

Previously, the rssid (rowset-segment-id) was always computed as
rowset_id + segment_index, which implicitly assumed segment IDs are
contiguous (0, 1, 2, ...). This assumption prevents future optimizations
such as segment-level partial compaction or segment reuse across rowsets,
where segment IDs within a rowset may become sparse or non-contiguous.

What I'm doing:

Add an explicit segment_idx field to SegmentMetadataPB in the protobuf
definition, and update all lake storage code paths to use it instead of
deriving segment ID from the positional index.
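The fallback behaviour described above can be sketched as follows. This is a minimal illustration, not the code in meta_file.h/cpp: `SegmentMeta` and `RowsetMeta` are hypothetical stand-ins for the protobuf messages, and the helper signatures are assumptions, since the real ones are not shown in this description.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for SegmentMetadataPB; only the new field is modelled.
struct SegmentMeta {
    std::optional<uint32_t> segment_idx;  // absent in metadata written before this change
};

// Hypothetical stand-in for a rowset's metadata.
struct RowsetMeta {
    uint32_t id = 0;
    std::vector<SegmentMeta> segments;
};

// Use the explicit segment_idx when present; otherwise fall back to the
// segment's position in the list, which matches the old contiguous scheme.
uint32_t get_segment_idx(const RowsetMeta& rowset, size_t pos) {
    assert(pos < rowset.segments.size());
    return rowset.segments[pos].segment_idx.value_or(static_cast<uint32_t>(pos));
}

// rssid = rowset id + segment idx (explicit or positional).
uint32_t get_rssid(const RowsetMeta& rowset, size_t pos) {
    return rowset.id + get_segment_idx(rowset, pos);
}
```

With this shape, a rowset whose segments carry no `segment_idx` yields the same rssids as `rowset_id + segment_index`, while a rowset holding only segments 0 and 2 still reports rssid `id + 2` for its second entry instead of `id + 1`.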

Key changes:

  • Add optional uint32 segment_idx = 4 to SegmentMetadataPB, with a
    backward-compatible fallback to the positional segment index when the
    field is absent.
  • Introduce helper functions (get_segment_idx, get_max_segment_idx,
    get_rowset_id_step, get_rssid) in meta_file.h/cpp to centralize
    segment ID and rssid computation.
  • Update all rssid computation sites: primary index rebuild, persistent
    index loading, compaction publish, delta writer, schema change, spark
    load, tablet reshard, and meta reader.
  • Change next_rowset_id advancement from std::max(1, segments_size())
    to get_rowset_id_step(), which accounts for sparse segment IDs.
  • Update batch merge logic in txn_log_applier and meta_file to remap
    segment IDs into the merged rowset's local ID space.
  • Replace range-based delvec/dcg deletion in apply_opcompaction with
    set-based lookup using unordered_set for correctness with sparse IDs.
  • Replace std::iota in compaction conflict checker with explicit
    get_rssid loop.
  • Add comprehensive unit tests covering sparse segment ID scenarios.
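Two of the changes above, the rowset ID step and the set-based deletion, can be illustrated with a short sketch. It is written under assumptions: the structs are hypothetical stand-ins for the protobuf messages, `rowset_id_step` assumes the step is `max(1, max segment idx + 1)`, and `drop_rowset_entries` models delvec/dcg entries as a plain list of rssid keys. The actual implementation may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <unordered_set>
#include <vector>

// Hypothetical stand-ins for the protobuf messages.
struct SegmentMeta {
    std::optional<uint32_t> segment_idx;
};
struct RowsetMeta {
    uint32_t id = 0;
    std::vector<SegmentMeta> segments;
};

// Explicit idx when present, positional index otherwise.
uint32_t segment_idx_at(const RowsetMeta& r, size_t pos) {
    return r.segments[pos].segment_idx.value_or(static_cast<uint32_t>(pos));
}

// Assumed rule: reserve IDs up to the largest segment idx, and at least one
// ID even for an empty rowset (mirroring the old std::max(1, segments_size())).
uint32_t rowset_id_step(const RowsetMeta& r) {
    uint32_t step = 1;
    for (size_t pos = 0; pos < r.segments.size(); ++pos) {
        step = std::max(step, segment_idx_at(r, pos) + 1);
    }
    return step;
}

// Set-based removal: drop exactly the rssids the rowset owns. A range derived
// from the segment count, [id, id + segments.size()), would miss sparse ones.
std::vector<uint32_t> drop_rowset_entries(const std::vector<uint32_t>& keys,
                                          const RowsetMeta& r) {
    std::unordered_set<uint32_t> owned;
    for (size_t pos = 0; pos < r.segments.size(); ++pos) {
        owned.insert(r.id + segment_idx_at(r, pos));
    }
    std::vector<uint32_t> kept;
    for (uint32_t key : keys) {
        if (owned.count(key) == 0) kept.push_back(key);
    }
    return kept;
}
```

For a rowset with id 10 and segment idx values 0 and 5, the step under this assumption is 6 rather than 2, so the next rowset starts at 16 and cannot collide with rssid 15. A count-based range [10, 12) would also leave the entry for rssid 15 behind, which is the case the set lookup avoids.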

Fixes #64986

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
    • This pr needs auto generate documentation
  • This is a backport pr

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto-backported to the target branch
    • 4.1
    • 4.0
    • 3.5
    • 3.4

@xiangguangyxg xiangguangyxg requested review from a team as code owners February 6, 2026 07:17
@github-actions github-actions bot added the 4.1 label Feb 6, 2026
@wanpengfei-git wanpengfei-git requested a review from a team February 6, 2026 07:18
@CelerData-Reviewer

@codex review

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0a84a0610c

srlch previously approved these changes Feb 9, 2026
srlch previously approved these changes Feb 12, 2026
Signed-off-by: xiangguangyxg <xiangguangyxg@gmail.com>
@xiangguangyxg xiangguangyxg changed the title from "[Enhancement] Support sparse segment id in shared-data storage metadata" to "[Enhancement] Support sparse segment idx in shared-data storage metadata" Feb 12, 2026
@github-actions

[Java-Extensions Incremental Coverage Report]

pass : 0 / 0 (0%)

@github-actions

[FE Incremental Coverage Report]

pass : 0 / 0 (0%)

@github-actions

[BE Incremental Coverage Report]

pass : 162 / 171 (94.74%)

file detail

| path | covered_line | new_line | coverage | not_covered_line_detail |
|---|---|---|---|---|
| 🔵 be/src/storage/lake/spark_load.cpp | 0 | 2 | 00.00% | [116, 124] |
| 🔵 be/src/storage/lake/rowset.cpp | 16 | 22 | 72.73% | [156, 632, 642, 643, 645, 653] |
| 🔵 be/src/storage/lake/update_manager.cpp | 46 | 47 | 97.87% | [1661] |
| 🔵 be/src/storage/lake/meta_file.cpp | 53 | 53 | 100.00% | [] |
| 🔵 be/src/storage/lake/column_mode_partial_update_handler.cpp | 2 | 2 | 100.00% | [] |
| 🔵 be/src/storage/lake/lake_persistent_index.cpp | 11 | 11 | 100.00% | [] |
| 🔵 be/src/storage/lake/compaction_task.cpp | 2 | 2 | 100.00% | [] |
| 🔵 be/src/storage/lake/tablet_reshard.cpp | 2 | 2 | 100.00% | [] |
| 🔵 be/src/storage/lake/lake_primary_index.cpp | 2 | 2 | 100.00% | [] |
| 🔵 be/src/storage/lake/txn_log_applier.cpp | 18 | 18 | 100.00% | [] |
| 🔵 be/src/storage/lake_meta_reader.cpp | 1 | 1 | 100.00% | [] |
| 🔵 be/src/storage/lake/delta_writer.cpp | 2 | 2 | 100.00% | [] |
| 🔵 be/src/storage/lake/schema_change.cpp | 6 | 6 | 100.00% | [] |
| 🔵 be/src/storage/lake/meta_file.h | 1 | 1 | 100.00% | [] |

@xiangguangyxg xiangguangyxg merged commit 4223cfd into StarRocks:main Feb 13, 2026
72 of 78 checks passed
@xiangguangyxg xiangguangyxg deleted the segment_id branch February 13, 2026 02:00
@github-actions

@Mergifyio backport branch-4.1

@github-actions github-actions bot removed the 4.1 label Feb 13, 2026
@mergify

mergify bot commented Feb 13, 2026

backport branch-4.1

✅ Backports have been created

Details

mergify bot pushed a commit that referenced this pull request Feb 13, 2026
…ata (#68996)

Signed-off-by: xiangguangyxg <xiangguangyxg@gmail.com>
(cherry picked from commit 4223cfd)
HangyuanLiu pushed a commit to HangyuanLiu/starrocks that referenced this pull request Feb 13, 2026
…ata (StarRocks#68996)

Signed-off-by: xiangguangyxg <xiangguangyxg@gmail.com>
wanpengfei-git pushed a commit that referenced this pull request Feb 13, 2026
…ata (backport #68996) (#69236)

Signed-off-by: xiangguangyxg <xiangguangyxg@gmail.com>
Co-authored-by: xiangguangyxg <110401425+xiangguangyxg@users.noreply.github.com>

Development

Successfully merging this pull request may close these issues.

New multi-tenant data management

5 participants