Add support for externally managed members#1214

Open
CaptainIRS wants to merge 6 commits into gardener:master from CaptainIRS:externally-managed-members

Conversation

@CaptainIRS
Member

@CaptainIRS CaptainIRS commented Nov 14, 2025

How to categorize this PR?
/area control-plane
/area high-availability
/kind enhancement
/kind api-change

What this PR does / why we need it:
This pull request introduces a new mode of operation in etcd-druid to support etcd clusters whose members are managed by an external actor, rather than by etcd-druid via the StatefulSet controller. This is primarily to enable use cases like Gardener's self-hosted shoot clusters (GEP-28), where a tool like gardenadm is responsible for deploying and managing etcd members as static pods on the control plane nodes.

To enable this, a new field, spec.externallyManagedMemberAddresses, is introduced in the Etcd API. When this field is populated with a list of member IP addresses, etcd-druid's behavior changes as follows:

  • Decoupled Pod Management: etcd-druid no longer manages the lifecycle of etcd pods. It creates the StatefulSet with replicas=0 to serve as a template but does not create any pods itself.
    • The template now sets POD_NAME to <etcd-name>-<ip-address>, so that the external tool can use the template to start etcd pods together with the etcd-backup-restore sidecar.
  • IP-based Communication: All etcd configuration is generated to use the provided IP addresses for peer communication. This removes the dependency on Kubernetes DNS and headless services.
    • The initial-cluster, initial-advertise-peer-urls, and advertise-client-urls settings in the ConfigMap are populated using the IPs from spec.externallyManagedMemberAddresses.
    • Member identities are based on their IP addresses (e.g., etcd-main-192.168.0.1) instead of StatefulSet ordinals.
  • Disabled Component Creation: The creation of several components that are tied to pod lifecycle management is disabled:
    • Client and Peer Services.
    • PodDisruptionBudget.
  • Continued Management Role: While pod management is delegated, etcd-druid continues to provide value by:
    • Generating and reconciling the etcd ConfigMap.
    • Providing status reporting and facilitating maintenance operations for the externally managed cluster, which keeps all etcd-backup-restore functionality available (full/delta snapshots, compaction, defragmentation, etc.).
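
As a rough illustration of the IP-based configuration described above, the following sketch derives member names and an initial-cluster value from a list of member IPs. It is not the code from this PR: the helper names `memberName` and `initialCluster`, the `https` scheme, and the peer port 2380 (etcd's default) are assumptions made for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// memberName derives a member identity from the Etcd resource name and the
// member IP, following the <etcd-name>-<ip-address> pattern from the PR
// description. The function name itself is hypothetical.
func memberName(etcdName, ip string) string {
	return fmt.Sprintf("%s-%s", etcdName, ip)
}

// initialCluster builds an etcd initial-cluster value in which every peer URL
// uses the member IP directly instead of a DNS name. The scheme and peerPort
// parameters are assumptions for this sketch.
func initialCluster(etcdName string, ips []string, scheme string, peerPort int) string {
	entries := make([]string, 0, len(ips))
	for _, ip := range ips {
		entries = append(entries,
			fmt.Sprintf("%s=%s://%s:%d", memberName(etcdName, ip), scheme, ip, peerPort))
	}
	return strings.Join(entries, ",")
}

func main() {
	ips := []string{"192.168.0.1", "192.168.0.2", "192.168.0.3"}
	fmt.Println(initialCluster("etcd-main", ips, "https", 2380))
}
```

Because each entry is keyed by the IP-derived member name, the resulting value does not depend on StatefulSet ordinals or on a headless service being resolvable.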

This approach provides a generic mechanism to integrate etcd-druid with diverse deployment strategies beyond the default StatefulSet-based model, making it more flexible.

This work supersedes the previous approach in PR #1117, opting for a more explicit API field (spec.externallyManagedMemberAddresses) over an annotation, along with design changes that overcome the limitations of the previous approach.

Which issue(s) this PR fixes:
Part of #1071
Supersedes #1117

Special notes for your reviewer:
The core logic is triggered by the presence of the spec.externallyManagedMemberAddresses field. Please pay close attention to the validation rules for this new field, as they are crucial for preventing unsupported transitions:

  • The field must contain a list of valid IPv4 address strings.
  • If specified, its length must be equal to spec.replicas.
  • The field can be specified during the creation of a new Etcd resource.
  • Mode Conversion is Not Supported:
    • The field cannot be added to an existing cluster, whether running or hibernated, that is currently managed by etcd-druid.
    • The field cannot be removed from a running, externally managed cluster to convert it back to an etcd-druid-managed one.
    • Externally managed clusters cannot be hibernated, i.e., they cannot be scaled in to zero replicas.
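
The rules above can be sketched as two small checks. This is only an approximation of the validation in this PR, which may be implemented differently (for example as CRD validation rules). The function names are hypothetical, and the transition check is simplified to "the mode must not change on update".

```go
package main

import (
	"errors"
	"fmt"
	"net/netip"
)

// validateAddresses checks the rules for the address list on its own:
// every entry must be an IPv4 address and the list length must equal
// spec.replicas. An empty list means the field is not set.
func validateAddresses(addrs []string, replicas int) error {
	if len(addrs) == 0 {
		return nil
	}
	// Since the list is non-empty, this also rejects replicas == 0,
	// i.e. hibernating an externally managed cluster.
	if len(addrs) != replicas {
		return fmt.Errorf("got %d addresses, want %d (spec.replicas)", len(addrs), replicas)
	}
	for _, a := range addrs {
		ip, err := netip.ParseAddr(a)
		if err != nil || !ip.Is4() {
			return fmt.Errorf("%q is not a valid IPv4 address", a)
		}
	}
	return nil
}

// validateModeTransition rejects updates that switch between the
// etcd-druid-managed mode and the externally managed mode.
func validateModeTransition(oldAddrs, newAddrs []string) error {
	if (len(oldAddrs) > 0) != (len(newAddrs) > 0) {
		return errors.New("switching between managed and externally managed mode is not supported")
	}
	return nil
}

func main() {
	fmt.Println(validateAddresses([]string{"192.168.0.1", "192.168.0.2", "192.168.0.3"}, 3))
	fmt.Println(validateAddresses([]string{"192.168.0.1"}, 3))
	fmt.Println(validateModeTransition(nil, []string{"192.168.0.1"}))
}
```

On create only the first check applies. On update both would run, with the old and new values of the field passed to the second.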

When externallyManagedMemberAddresses is set, etcd-druid effectively transitions from a pod lifecycle manager to a configuration and maintenance provider for an externally managed cluster.
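
One way to picture this switch is as a plan of which components get reconciled in each mode. The sketch below is illustrative only: the `componentPlan` type and `planComponents` function are hypothetical, and it covers only the components named in this description.

```go
package main

import "fmt"

// componentPlan lists the components named in the PR description and whether
// they would be reconciled. The type is hypothetical.
type componentPlan struct {
	ConfigMap           bool
	StatefulSet         bool
	StatefulSetReplicas int
	ClientService       bool
	PeerService         bool
	PodDisruptionBudget bool
}

// planComponents returns the plan for a given replica count and list of
// externally managed member addresses.
func planComponents(replicas int, externalAddrs []string) componentPlan {
	if len(externalAddrs) > 0 {
		// Externally managed: the StatefulSet only serves as a template
		// (replicas=0); Services and the PodDisruptionBudget are skipped.
		return componentPlan{ConfigMap: true, StatefulSet: true, StatefulSetReplicas: 0}
	}
	// Default mode: etcd-druid manages the pods through the StatefulSet.
	return componentPlan{
		ConfigMap:           true,
		StatefulSet:         true,
		StatefulSetReplicas: replicas,
		ClientService:       true,
		PeerService:         true,
		PodDisruptionBudget: true,
	}
}

func main() {
	fmt.Printf("%+v\n", planComponents(3, []string{"192.168.0.1", "192.168.0.2", "192.168.0.3"}))
	fmt.Printf("%+v\n", planComponents(3, nil))
}
```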

Release note:

A new field `spec.externallyManagedMemberAddresses` has been added to the `Etcd` API. When this field is specified with a list of IP addresses, `etcd-druid` enters a new operational mode where it does not manage the etcd member pods directly. In this mode, `etcd-druid` will not create pods, services, or `PodDisruptionBudget`s. Instead, it will generate the etcd `ConfigMap` with a configuration tailored for the provided member IP addresses, enabling peer communication without relying on Kubernetes services.

This feature allows external actors, such as `gardenadm`, to manage the lifecycle of etcd members (e.g., as static pods) while still leveraging `etcd-druid` for configuration generation, status reporting, and other management tasks.

@gardener-prow

gardener-prow bot commented Nov 14, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@gardener-robot

@CaptainIRS Labels area/todo, kind/todo do not exist.

@gardener-robot gardener-robot added needs/review Needs review size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. needs/second-opinion Needs second review by someone else labels Nov 14, 2025
@CaptainIRS CaptainIRS force-pushed the externally-managed-members branch 2 times, most recently from 8eca409 to 6f7f97e Compare November 18, 2025 08:39
@CaptainIRS CaptainIRS self-assigned this Nov 24, 2025
@CaptainIRS CaptainIRS force-pushed the externally-managed-members branch 2 times, most recently from 9215661 to 6ac8046 Compare November 26, 2025 05:50
@CaptainIRS
Member Author

/test all

@gardener-robot gardener-robot added area/control-plane Control plane related area/high-availability High availability related kind/api-change API change with impact on API users kind/enhancement Enhancement, improvement, extension labels Dec 1, 2025
@CaptainIRS CaptainIRS force-pushed the externally-managed-members branch 3 times, most recently from f7d4681 to 75c63f4 Compare December 15, 2025 08:44
@gardener gardener deleted a comment from gardener-prow bot Dec 15, 2025
@CaptainIRS CaptainIRS force-pushed the externally-managed-members branch 2 times, most recently from 847faff to 4f3f9fb Compare December 18, 2025 10:09
@CaptainIRS CaptainIRS force-pushed the externally-managed-members branch from 4f3f9fb to b1961fb Compare December 31, 2025 06:45
@gardener-github-actions gardener-github-actions bot added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Dec 31, 2025
@github-actions github-actions bot added needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Dec 31, 2025
@CaptainIRS CaptainIRS force-pushed the externally-managed-members branch from b1961fb to 23d8f36 Compare January 12, 2026 06:24
@gardener-github-actions gardener-github-actions bot added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Jan 12, 2026
@github-actions github-actions bot removed the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Jan 12, 2026
@CaptainIRS CaptainIRS marked this pull request as ready for review January 12, 2026 06:30
@CaptainIRS CaptainIRS requested a review from a team as a code owner January 12, 2026 06:30
@CaptainIRS CaptainIRS removed their assignment Jan 13, 2026
@gardener-prow gardener-prow bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 22, 2026
@rfranzke rfranzke added reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) ok-to-test Indicates a non-member PR verified by an org member that is safe to test. labels Feb 2, 2026
@github-actions github-actions bot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. and removed ok-to-test Indicates a non-member PR verified by an org member that is safe to test. labels Feb 2, 2026
@gardener-prow gardener-prow bot added the cla: yes Indicates the PR's author has signed the cla-assistant.io CLA. label Feb 5, 2026
@CaptainIRS CaptainIRS force-pushed the externally-managed-members branch from 23d8f36 to 40f34d5 Compare February 9, 2026 12:30
@gardener-prow gardener-prow bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 9, 2026
@gardener-prow

gardener-prow bot commented Feb 9, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign unmarshall for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@gardener-prow gardener-prow bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Feb 9, 2026
@gardener-github-actions gardener-github-actions bot added the ok-to-test Indicates a non-member PR verified by an org member that is safe to test. label Feb 9, 2026
@github-actions github-actions bot removed the ok-to-test Indicates a non-member PR verified by an org member that is safe to test. label Feb 9, 2026
@gardener-github-actions gardener-github-actions bot added the ok-to-test Indicates a non-member PR verified by an org member that is safe to test. label Feb 9, 2026
@github-actions github-actions bot removed the ok-to-test Indicates a non-member PR verified by an org member that is safe to test. label Feb 9, 2026
@CaptainIRS CaptainIRS force-pushed the externally-managed-members branch from 614fe13 to 33e15d3 Compare February 9, 2026 12:45
@gardener-github-actions gardener-github-actions bot added the ok-to-test Indicates a non-member PR verified by an org member that is safe to test. label Feb 9, 2026
@github-actions github-actions bot removed the ok-to-test Indicates a non-member PR verified by an org member that is safe to test. label Feb 9, 2026
@CaptainIRS
Member Author

/retest

@Shreyas-s14
Member

/test pull-etcd-druid-integration

@gardener-prow

gardener-prow bot commented Feb 13, 2026

@CaptainIRS: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-etcd-druid-integration | 33e15d3 | link | true | /test pull-etcd-druid-integration |

Full PR test history. Your PR dashboard. Command help for this repository.
Please help us cut down on flakes by linking this test failure to an open flake report or filing a new flake report if you can't find an existing one. Also see our testing guideline for how to avoid and hunt flakes.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Labels

  • area/control-plane: Control plane related
  • area/high-availability: High availability related
  • cla: yes: Indicates the PR's author has signed the cla-assistant.io CLA.
  • kind/api-change: API change with impact on API users
  • kind/enhancement: Enhancement, improvement, extension
  • needs/ok-to-test: Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD)
  • needs/review: Needs review
  • needs/second-opinion: Needs second review by someone else
  • needs-ok-to-test: Indicates a PR that requires an org member to verify it is safe to test.
  • reviewed/ok-to-test: Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD)
  • size/XXL: Denotes a PR that changes 1000+ lines, ignoring generated files.

4 participants