ct: hook cloud topics recovery into whole cluster restore #29515

Merged
andrwng merged 8 commits into redpanda-data:dev from andrwng:ct-wcr
Feb 6, 2026

Conversation

@andrwng
Contributor

@andrwng andrwng commented Feb 4, 2026

This hooks cloud topics restore into the Whole Cluster Restore machinery. At a high level, cloud topics restore entails two steps:

  1. restore the L1 metastore
  2. restore the cloud topics, pointing the restored topics at the restored metastore

In restoring the metastore, we restore each domain, and then update the metastore topic config so its remote_label will be that of the original metastore. This sets the metastore up to continue uploading its metastore manifest to the original cluster's location, which will be a nice property to have for discoverability of the metastore (e.g. for read replicas). This updated remote label also serves as an indicator that restore is complete and that subsequent attempts to restore this metastore will be no-ops.

In restoring the cloud topics, all the original cloud topics' properties are preserved, but with the recovery property set. There is some additional work that needs to be done to make the restored cloud topics operational (e.g. on restore, they should query the metastore to bootstrap their CTP STMs), but that is left for later work.
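
The two steps above could be sketched roughly as follows. This is a simplified, hypothetical model for illustration only; none of these type or function names are the real Redpanda types:

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified model of the two restore steps described above;
// none of these names are the real Redpanda types.
struct metastore_state {
    std::vector<std::string> domains; // restored L1 domains
    std::string remote_label;         // label under which manifests upload
    // A non-empty remote label doubles as the "restore already done" marker,
    // making repeated restore attempts no-ops.
    bool restore_complete() const { return !remote_label.empty(); }
};

struct topic_config {
    std::string name;
    std::string remote_label; // points at the restored metastore's label
    bool recovery = false;    // recovery property set on restored topics
};

// Step 1: restore each domain, then adopt the original cluster's remote
// label so the metastore keeps uploading its manifest to the original
// location (useful for discoverability, e.g. read replicas).
metastore_state restore_metastore(
  std::vector<std::string> domains, std::string original_label) {
    metastore_state ms;
    ms.domains = std::move(domains);
    ms.remote_label = std::move(original_label);
    return ms;
}

// Step 2: recreate each cloud topic with its original properties plus the
// recovery flag, pointing its remote label at the restored metastore.
std::vector<topic_config> restore_cloud_topics(
  const std::vector<std::string>& names, const metastore_state& ms) {
    std::vector<topic_config> out;
    for (const auto& n : names) {
        out.push_back({n, ms.remote_label, /*recovery=*/true});
    }
    return out;
}
```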

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v25.3.x
  • v25.2.x
  • v25.1.x

Release Notes

  • None

Copilot AI review requested due to automatic review settings February 4, 2026 09:29
Contributor

Copilot AI left a comment


Pull request overview

This PR hooks cloud topics recovery into the whole cluster restore process. It introduces new recovery stages for cloud topics metastore and cloud topic data, allowing cloud topics to be restored from controller snapshots during cluster recovery.

Changes:

  • Added two new recovery stages (recovered_cloud_topics_metastore and recovered_cloud_topic_data) for restoring cloud topics
  • Modified the topic creation flow to use the metastore's remote label for cloud topics
  • Implemented recovery actions for cloud topics metastore and cloud topic data
  • Added comprehensive tests for cloud topics cluster recovery scenarios

Reviewed changes

Copilot reviewed 26 out of 26 changed files in this pull request and generated 2 comments.

File Description
src/v/redpanda/application_start.cc Passes cloud topics state to controller start for recovery
src/v/kafka/server/handlers/alter_configs.cc Prevents remote_label from being modified via AlterConfigs
src/v/cluster/types.h Adds remote_label property and new recovery stages
src/v/cluster/types.cc Adds string representation for new recovery stages
src/v/cluster/topics_frontend.cc Updates topic creation to use metastore's remote label for cloud topics
src/v/cluster/topic_table.cc Adds incremental update handling for remote_label
src/v/cluster/partition_manager.cc Skips log download for cloud topics during recovery
src/v/cluster/controller.h Adds cloud topics state parameter to controller start
src/v/cluster/controller.cc Passes cloud topics state to recovery backend
src/v/cluster/cluster_recovery_reconciler.h Adds cloud topics fields to recovery actions
src/v/cluster/cluster_recovery_reconciler.cc Implements recovery action generation for cloud topics
src/v/cluster/cloud_metadata/tests/controller_snapshot_reconciliation_test.cc Tests for metastore and cloud topic reconciliation
src/v/cluster/cloud_metadata/tests/cluster_metadata_utils.h Helper functions for cloud topic properties
src/v/cluster/cloud_metadata/tests/BUILD Adds dependencies for test helpers
src/v/cluster/cloud_metadata/cluster_recovery_backend.h Adds cloud topics state member
src/v/cluster/cloud_metadata/cluster_recovery_backend.cc Implements metastore and cloud topic recovery actions
src/v/cluster/BUILD Adds cloud topics metastore dependencies
src/v/cloud_topics/tests/cluster_recovery_test.cc End-to-end test for cloud topics cluster recovery
src/v/cloud_topics/tests/BUILD Adds cluster recovery test target
src/v/cloud_topics/level_one/metastore/tests/replicated_metastore_test.cc Tests restore with correct partition count
src/v/cloud_topics/level_one/metastore/tests/BUILD Adds test dependencies
src/v/cloud_topics/level_one/metastore/replicated_metastore.cc Updates restore to handle missing manifest and partition count
src/v/cloud_topics/level_one/metastore/leader_router.h Adds num_partitions parameter to ensure_topic_exists
src/v/cloud_topics/level_one/metastore/leader_router.cc Passes num_partitions through to domain supervisor
src/v/cloud_topics/level_one/domain/domain_supervisor.h Adds num_partitions parameter
src/v/cloud_topics/level_one/domain/domain_supervisor.cc Creates metastore topic with specified partition count

// Restore on the metastore with our desired remote label. This will
// create the underlying topic if it doesn't already exist. Retry on
// transport errors since leadership may be unstable during recovery.
auto* metastore = _ct_state->local().get_l1_metastore();

Copilot AI Feb 4, 2026


Potential null pointer dereference. While there's a check for !_ct_state earlier, there's no validation that get_l1_metastore() returns a non-null pointer before using it.

Suggested change:

    auto* metastore = _ct_state->local().get_l1_metastore();
    if (metastore == nullptr) {
        vlog(
          clusterlog.error,
          "Cloud topics metastore not available for recovery");
        co_return cluster::errc::invalid_configuration_update;
    }

@andrwng andrwng force-pushed the ct-wcr branch 3 times, most recently from 4de461f to eef4196 on February 4, 2026 10:24
@andrwng andrwng requested review from Lazin, dotnwat and rockwotj February 4, 2026 17:23
auto last_included_term = last_segment->archiver_term;

// TODO: implement a recovery primitive for cloud topics
if (!ntp_cfg.cloud_topic_enabled()) {
Contributor


Can you remove this change? It interferes with the implementation of the CTP recovery.

Contributor Author

@andrwng andrwng Feb 4, 2026


Is it tricky to rebase over the conflict? Would different formatting make the conflict easier to deal with?

This is required for my tests, as without it, WCR will fail in trying to autocreate the topics.

Contributor

@rockwotj rockwotj left a comment


I'm not super familiar with WCR, so will let another person review, but from my naive take it LGTM.

I can do more reading later in the week to get familiar enough to approve.

@andrwng
Contributor Author

andrwng commented Feb 5, 2026

Just rebased

Lazin
Lazin previously approved these changes Feb 5, 2026
Contributor

@Lazin Lazin left a comment


LGTM except the last mile (partition manager recovery) but I'll revert that later

case recovered_acls:
return 5;
case recovered_remote_topic_data:
case recovered_cloud_topics_metastore:
Contributor


the order has to be changed because subsequent steps need all topics to be recovered already?

Contributor Author


Are you suggesting that I change the order, or are you asking why it's important that it's ordered that way?

The order is important for a few reasons:

  • it's generally going to be less error-prone to do metastore recovery before we've restored all the other (e.g. tiered storage) topics
  • we want to restore the cloud topics before e.g. getting to the stage where we truncate consumer group offsets

tp_ns);
continue;
}
if (tp_config.is_cloud_topic()) {
Contributor


for cloud topics the redpanda.remote.recovery topic property is not needed

the flow of the L0 recovery:

  • the reconciler recovers metastore
  • then it reads starting points for all CTPs
  • then it invokes method of the topics_frontend that replicates the starting points
  • the starting points are added to the topics table
  • CTPs are created normally (no recovery prop), and the partitions will be seeded with the right starting points from the metastore

I think here we just need to add the config to the list without adding the recovery property, or I guess I can change it in my PR later.
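
The seeding step in the flow above could look roughly like this. This is an illustrative sketch only; the types and the `seed_partitions` helper are made up for the example, not the real Redpanda reconciler API:

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Illustrative sketch of the L0 flow above (not the real Redpanda types):
// once the metastore is restored, read a starting point per cloud topic
// partition and seed the newly created partitions with it, so no recovery
// property is needed on the topics themselves.
struct starting_point {
    int64_t offset = 0;
    int64_t term = 0;
};

// Keyed by (topic, partition); stands in for what the reconciler would read
// back from the restored metastore.
using metastore_view = std::map<std::pair<std::string, int>, starting_point>;

// Partitions absent from the metastore start from scratch (offset 0, term 0).
std::vector<starting_point> seed_partitions(
  const metastore_view& ms, const std::string& topic, int n_partitions) {
    std::vector<starting_point> seeds;
    for (int p = 0; p < n_partitions; ++p) {
        auto it = ms.find({topic, p});
        seeds.push_back(it == ms.end() ? starting_point{} : it->second);
    }
    return seeds;
}
```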

Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that makes sense. We can either do that in the cluster_recovery_backend, which has access to the metastore when getting to the recovered_cloud_topic_data stage, or, as you suggest, plumb that into the topics_frontend. I suspect doing it in the recovery backend will be more straightforward.

@andrwng
Contributor Author

andrwng commented Feb 5, 2026

Force push:

  • handled some uncaught exceptions I found in looping the cluster_recovery test
  • tweaked add_node() in cluster_recovery_test to use the test configs that are in dev

@andrwng andrwng requested a review from Lazin February 5, 2026 23:14
There were some uncaught exceptions in opening the replicated database.
This wraps them in an appropriate error type. Along the way, I updated
some other exception handling to use the new wrapper.
In restoring the metastore topic during WCR, one of the steps to
complete recovery is to change the metastore topic's remote label to be
that of the original metastore topic. In this way, the location of the
metastore manifest will be consistent between the original cluster and
the restored cluster (a useful property for read replicas, which will
need to locate the metastore).
It'll generally be more robust if the restored topic is created with the
number of partitions from the metastore manifest, versus the current
behavior of rejecting the restore if there's a partition count mismatch.

To that end, allow ensure_topic_exists() to take in a parameter for the
number of partitions.
- We previously rejected the restore if the number of domains didn't
  match the default number of partitions. Instead, we just create the
  same number of partitions as there are expected domains.
- We previously rejected the restore if there is no metastore manifest
  in the cloud. Instead, treat this as success, and opt to create the
  metastore topic if needed.
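
Taken together, the two relaxed rules above amount to something like the following. The signature and names here are illustrative, not the real `ensure_topic_exists()` interface:

```cpp
#include <optional>

// Sketch of the relaxed rules above (illustrative signature, not the real
// one): take the partition count from the manifest's domain count when a
// manifest exists; when no manifest is found in the cloud, treat that as
// success and fall back to the configured default so the metastore topic
// can be created fresh.
int choose_partition_count(
  std::optional<int> manifest_domain_count, int default_partition_count) {
    if (manifest_domain_count) {
        // Create as many partitions as there are expected domains instead
        // of rejecting the restore on a count mismatch.
        return *manifest_domain_count;
    }
    return default_partition_count;
}
```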
This allows us to rely on the invariant that a topic's remote_label
points to the same remote label that the metastore's manifest will be.
This is a nice property to have for read replicas: given a topic
manifest, we'll be able to find the appropriate metastore manifest in
the object storage.
The current ordering of actions is somewhat brittle since action order
is tied to stage enum value. This makes it difficult to extend with new
stages because stages are serialized in the controller log, and thus
it's unsafe to add new stage enums with values that aren't at the end of
the enum space.

This commit takes a stab at making this a bit better by defining a
non-persisted ordering of stages that is decoupled from enum value. My
intent is to add more stages for cloud topics restore that will not be
the last stage actioned.
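
The decoupling described above could be sketched like this. The stage names and table are made up to mirror the idea, not the real enum or ordering:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Illustrative sketch of decoupling execution order from persisted enum
// values (stage names here mirror the idea, not the real enum). Enum
// values are serialized in the controller log, so new stages must take
// the next free value; the ordering table below is never persisted and
// can slot a new stage anywhere in the sequence.
enum class stage : uint8_t {
    recovered_acls = 0,
    recovered_remote_topic_data = 1,
    // Added later, so it gets the next enum value...
    recovered_cloud_topics_metastore = 2,
};

// ...but executes before remote topic data: index = enum value,
// element = position in the execution sequence.
constexpr std::array<int, 3> stage_order{
  /*recovered_acls=*/0,
  /*recovered_remote_topic_data=*/2,
  /*recovered_cloud_topics_metastore=*/1,
};

constexpr int order_of(stage s) {
    return stage_order[static_cast<std::size_t>(s)];
}

constexpr bool runs_before(stage a, stage b) {
    return order_of(a) < order_of(b);
}
```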
This adds a couple of stages that will be used for cloud topics
recovery:
- metastore recovery: the metastore will be restored such that the new
  cluster's metastore topic will host the appropriate domain databases
- topic recovery: each cloud topic will query the metastore to learn
  about the offsets/terms at which to start each partition, and then
  initialize their CTP STMs with those starting points

These stages are only performed if there are cloud topics that need to
be restored.
This implements the body of the cluster restore stages for cloud
topics, by restoring the metastore, and then calling create topics on
all the desired cloud topics.

One of the invariants that we're trying to uphold is that the metastore
topic will have the same remote_label after restoring as the original
(this has the nice property of being able to point at the metastore
from a topic manifest, e.g. for read replicas).

To that end, restoring the metastore looks like:
1. creating the metastore topic
2. calling restore
3. upon success, updating the metastore topic's remote_label to be what
   it was in the dead cluster

After this happens, we go ahead and create all the cloud topics, which
is only partially implemented in this commit: we create the topics, but
additional logic is needed to get the restored topics to bootstrap their
ctp_stms from the metastore. That is left as separate work.
@vbotbuildovich
Collaborator

Retry command for Build#80268

please wait until all jobs are finished before running the slash command

/ci-repeat 1
skip-redpanda-build
skip-units
skip-rebase
tests/rptest/tests/random_node_operations_smoke_test.py::RedpandaNodeOperationsSmokeTest.test_node_ops_smoke_test@{"cloud_storage_type":1,"mixed_versions":false}

@vbotbuildovich
Collaborator

CI test results

test results on build#80268
  • RedpandaNodeOperationsSmokeTest.test_node_ops_smoke_test {"cloud_storage_type": 1, "mixed_versions": false} (integration) — FLAKY, 7/11 passed. Test FAILS after retries. Significant increase in flaky rate (baseline=0.0116, p0=0.0002, reject_threshold=0.0100). Job: https://buildkite.com/redpanda/redpanda/builds/80268#019c3030-cdd1-4dfc-8ac2-1da471fe89ce History: https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=RedpandaNodeOperationsSmokeTest&test_method=test_node_ops_smoke_test
  • AdminSecurityReportTest.test_security_report (integration) — FLAKY, 10/11 passed. Test PASSES after retries. No significant increase in flaky rate (baseline=0.0000, p0=1.0000, reject_threshold=0.0100; adj_baseline=0.1000, p1=0.3487, trust_threshold=0.5000). Job: https://buildkite.com/redpanda/redpanda/builds/80268#019c302d-cb6e-4ebc-88d1-cca3550d2122 History: https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=AdminSecurityReportTest&test_method=test_security_report

@andrwng andrwng enabled auto-merge February 6, 2026 06:44
@andrwng andrwng merged commit bb190ca into redpanda-data:dev Feb 6, 2026
19 checks passed