Zero-downtime major upgrades (blue-green deployment)#1430

Open
vitabaks wants to merge 109 commits into master from pg-upgrade-logical

@vitabaks (Owner) commented Jan 19, 2026

Issue: #537

This PR implements a blue-green deployment approach with the ability to switch traffic to the new cluster with near-zero downtime.

New playbook: pg_upgrade_logical.yml

[Image: blue-green]

Additionally:

  • Implemented automatic configuration of pg_hba rules in the source cluster when deploying a standby cluster.
    • New variable: patroni_standby_cluster_auto_hba (default: true)
    • Note: this is a temporary rule and may be overwritten by future source cluster reconfiguration. It is recommended to explicitly configure standby cluster hosts in postgresql_pg_hba on the source cluster.
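
As a sketch of the recommendation in the note above, the standby cluster hosts could be listed explicitly in the source cluster's variables. The entry layout (type, database, user, address, method), the `replicator` user, and the addresses are assumptions for illustration and are not taken from this PR.

```yaml
# Source cluster variables (illustrative values only).
patroni_standby_cluster_auto_hba: true  # default per this PR; the auto-added rule is temporary

postgresql_pg_hba:
  # Hypothetical standby cluster hosts (documentation address range); replace with real ones.
  - { type: "host", database: "replication", user: "replicator", address: "192.0.2.11/32", method: "scram-sha-256" }
  - { type: "host", database: "replication", user: "replicator", address: "192.0.2.12/32", method: "scram-sha-256" }
```

With explicit entries like these, the rules would presumably survive a later reconfiguration of the source cluster, which the temporary rule may not.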

vitabaks self-assigned this Jan 19, 2026
vitabaks added the "automation" label (Automation functionality using Ansible) Jan 19, 2026
vitabaks marked this pull request as draft January 19, 2026 18:09
vitabaks added the "new feature" label (New functionality) Jan 20, 2026
Update cloud_resources tasks to account for Patroni standby clusters when deciding SSH key handling.
Pass ca_path (from patroni_restapi_cafile) and set status_code: 200 on the Patroni REST API HTTP calls in pg_upgrade_logical.yml and the upgrade pre_checks task. This ensures the configured TLS CA is used when contacting Patroni and that responses are explicitly validated as HTTP 200, without changing registration or changed_when behavior.
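
A minimal sketch of what such a REST API call might look like after this change. The task name, the `/patroni` endpoint, the `patroni_restapi_port` variable, the use of `inventory_hostname` as the host, and the `patroni_status` register name are assumptions; only `ca_path`, `patroni_restapi_cafile`, and `status_code: 200` come from the description above.

```yaml
- name: "[Pre-Check] Query the Patroni REST API"  # hypothetical task name
  ansible.builtin.uri:
    # Endpoint, host, and port variable are assumptions for illustration.
    url: "https://{{ inventory_hostname }}:{{ patroni_restapi_port }}/patroni"
    method: GET
    # Validate the server certificate against the configured CA, if one is set.
    ca_path: "{{ patroni_restapi_cafile | default(omit) }}"
    # Any response other than HTTP 200 fails the task.
    status_code: 200
    return_content: true
  register: patroni_status  # hypothetical register name
  changed_when: false
```
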
Replace the single-line psql invocation with an explicit {{ pg_old_bindir }}/psql call that connects over the PostgreSQL Unix socket (-h {{ postgresql_unix_socket_dir }}, plus -p and -U), keeping changed_when: false. Remove the previous psql_command variable and the PGPASSWORD environment variable, and set failed_when: false so that the initial access check does not fail the play. After reloading the PostgreSQL configuration (pg_ctl reload), re-test socket access. Simplify the conditional to check for a non-zero return code (socket_access_result.rc != 0) instead of parsing stderr for 'no pg_hba.conf entry'.
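
The flow described above could be sketched roughly as follows. This is not the PR's actual task list: the task names, the `postgresql_port`, `patroni_superuser_username`, and `pg_old_datadir` variables, the `become` settings, the query, and the `socket_access_retest` register name are assumptions. The step that adjusts pg_hba.conf between the first check and the reload is not shown.

```yaml
- name: "[Pre-Check] Test access via the Unix socket"  # hypothetical task name
  become: true
  become_user: postgres
  ansible.builtin.command: >-
    {{ pg_old_bindir }}/psql
    -h {{ postgresql_unix_socket_dir }}
    -p {{ postgresql_port }}
    -U {{ patroni_superuser_username }}
    -d postgres -tAXc "select 1"
  register: socket_access_result
  changed_when: false
  failed_when: false  # a failed first attempt is expected and handled below

- name: "[Pre-Check] Reload the PostgreSQL configuration"
  become: true
  become_user: postgres
  ansible.builtin.command: "{{ pg_old_bindir }}/pg_ctl reload -D {{ pg_old_datadir }}"
  when: socket_access_result.rc != 0

- name: "[Pre-Check] Re-test access via the Unix socket"
  become: true
  become_user: postgres
  ansible.builtin.command: >-
    {{ pg_old_bindir }}/psql
    -h {{ postgresql_unix_socket_dir }}
    -p {{ postgresql_port }}
    -U {{ patroni_superuser_username }}
    -d postgres -tAXc "select 1"
  # Separate register name so a skipped re-test does not overwrite the first result.
  register: socket_access_retest
  changed_when: false
  when: socket_access_result.rc != 0
```

In this sketch the re-test is allowed to fail the play, on the assumption that access should work once the configuration has been reloaded.
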
Normalize Ansible task names for set_fact across playbooks and upgrade role tasks to use a consistent phrasing (e.g. "[Prepare] Set variable: ..." instead of "[Prepare] Set the variable: ..." and similar small label tweaks). Files changed: automation/playbooks/pg_upgrade_logical.yml, automation/roles/upgrade/tasks/pre_checks.yml, automation/roles/upgrade/tasks/upgrade_check.yml. No functional logic was modified; only task name strings changed, for consistency and readability.
For the Patroni password pre-checks (superuser, replication, restapi), the delegate host is now groups[group_name][0] instead of groups[source_group][0]. The vars block now defines group_name, which falls back to the previous source_group logic or computes 'master'/'source_cluster' as before.
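
A rough sketch of the delegation pattern described above. The task itself is a placeholder, and the fallback expression for `group_name` is a guess at the described behavior, not the PR's actual logic.

```yaml
- name: "[Pre-Check] Patroni password check (placeholder)"  # hypothetical task
  ansible.builtin.debug:
    msg: "The check would run on {{ groups[group_name][0] }}"
  delegate_to: "{{ groups[group_name][0] }}"
  run_once: true
  vars:
    # Assumed fallback: use source_group when defined, otherwise pick
    # 'source_cluster' if that inventory group exists, else 'master'.
    group_name: "{{ source_group | default('source_cluster' if 'source_cluster' in groups else 'master') }}"
```
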
Replace shorthand role name `upgrade` with fully qualified Galaxy role `vitabaks.autobase.upgrade` in include_role calls within automation/playbooks/pg_upgrade_logical.yml (stop_target_primary, publication, recovery_target, subscription).
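
For illustration, one of these calls might look like the following after the change. The task name is hypothetical, and passing `publication` via `tasks_from` is an assumption about how the listed entry points are invoked.

```yaml
- name: "[Publication] Include the upgrade role"  # hypothetical task name
  ansible.builtin.include_role:
    name: vitabaks.autobase.upgrade  # fully qualified name instead of `upgrade`
    tasks_from: publication          # assumed entry point
```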
