Zero-downtime major upgrades (blue-green deployment) #1430
Open
Update cloud_resources tasks to account for Patroni standby clusters when deciding SSH key handling.
Pass ca_path (from patroni_restapi_cafile) and set status_code: 200 on the Patroni REST API HTTP calls in pg_upgrade_logical.yml and the upgrade pre_checks task. This ensures the configured TLS CA is used when contacting Patroni and that responses are explicitly validated as HTTP 200, without changing registration or changed_when behavior.
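As an illustration of the change described above, a REST API check with an explicit CA and status validation might look roughly like the sketch below. The task name, the `/health` endpoint, the port 8008, and the registered variable name are assumptions for illustration only; `patroni_restapi_cafile` is the variable named in the commit message.

```yaml
# Sketch only: task name, URL, and register name are hypothetical.
- name: "[Pre-Check] Query the Patroni REST API"
  ansible.builtin.uri:
    url: "https://{{ inventory_hostname }}:8008/health"  # assumed endpoint and port
    method: GET
    return_content: true
    # Use the configured CA bundle; omit the option when the variable is unset or empty.
    ca_path: "{{ patroni_restapi_cafile | default(omit, true) }}"
    # Treat anything other than HTTP 200 as a failure.
    status_code: 200
  register: patroni_health_result  # hypothetical name
  changed_when: false
```

Stating `status_code: 200` explicitly documents the expectation in the task itself, even though 200 is also the default for the `uri` module.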
Replace the single-line psql invocation with an explicit {{ pg_old_bindir }}/psql call that connects over the PostgreSQL unix socket (using -h {{ postgresql_unix_socket_dir }}, -p and -U), keeping changed_when: false. Remove the previous psql_command var and the PGPASSWORD environment variable, and set failed_when: false so the initial access check does not fail the play. After reloading the PostgreSQL config (pg_ctl reload), re-test socket access, and simplify the conditional to check for a non-zero return code (socket_access_result.rc != 0) instead of parsing stderr for 'no pg_hba.conf entry'.
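The test, reload, and re-test flow described above could be sketched as follows. This is not the PR's actual code: the task names, `postgresql_port`, `patroni_superuser_username`, `pg_old_datadir`, the `select 1` query, and the second register name are assumptions, and the step that adjusts pg_hba.conf before the reload is omitted.

```yaml
# Sketch only: variables other than pg_old_bindir and
# postgresql_unix_socket_dir are assumed names.
- name: "[Pre-Check] Test PostgreSQL access over the unix socket"
  ansible.builtin.command: >-
    {{ pg_old_bindir }}/psql
    -h {{ postgresql_unix_socket_dir }}
    -p {{ postgresql_port }}
    -U {{ patroni_superuser_username }}
    -d postgres
    -tAXc "select 1"
  register: socket_access_result
  changed_when: false
  failed_when: false  # a failed first attempt is handled by the tasks below

- name: "[Pre-Check] Reload the PostgreSQL configuration"
  ansible.builtin.command: "{{ pg_old_bindir }}/pg_ctl reload -D {{ pg_old_datadir }}"
  when: socket_access_result.rc != 0

- name: "[Pre-Check] Re-test PostgreSQL access over the unix socket"
  ansible.builtin.command: >-
    {{ pg_old_bindir }}/psql
    -h {{ postgresql_unix_socket_dir }}
    -p {{ postgresql_port }}
    -U {{ patroni_superuser_username }}
    -d postgres
    -tAXc "select 1"
  register: socket_access_retest_result  # hypothetical name
  changed_when: false
  when: socket_access_result.rc != 0
```

Checking `rc != 0` avoids depending on the exact wording of the psql error message, which can vary with locale and PostgreSQL version.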
Normalize Ansible task names for set_fact across playbooks and upgrade role tasks to use a consistent phrasing (e.g. "[Prepare] Set variable: ..." instead of "[Prepare] Set the variable: ..." and similar small label tweaks). Files changed: automation/playbooks/pg_upgrade_logical.yml, automation/roles/upgrade/tasks/pre_checks.yml, automation/roles/upgrade/tasks/upgrade_check.yml. No functional logic was modified—only task name strings for consistency and readability.
Delegate host selection now uses groups[group_name][0] instead of groups[source_group][0] for Patroni password pre-checks (superuser, replication, restapi). The vars block was updated to define group_name, falling back to the previous source_group logic or computing 'master'/'source_cluster' as before.
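A minimal sketch of delegating to the first host of a group chosen through a task-level variable is shown below. The task itself is a placeholder `debug` call, and the `default('master')` fallback is a simplification of the fallback logic mentioned in the commit message, not the PR's actual expression.

```yaml
# Sketch only: the real tasks verify the superuser, replication,
# and REST API passwords; debug stands in for them here.
- name: "[Pre-Check] Show the host used for Patroni password checks"
  ansible.builtin.debug:
    msg: "Checks are delegated to {{ groups[group_name][0] }}"
  delegate_to: "{{ groups[group_name][0] }}"
  run_once: true
  vars:
    # Assumed fallback: use source_group when defined, otherwise 'master'.
    group_name: "{{ source_group | default('master') }}"
```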
Replace shorthand role name `upgrade` with fully qualified Galaxy role `vitabaks.autobase.upgrade` in include_role calls within automation/playbooks/pg_upgrade_logical.yml (stop_target_primary, publication, recovery_target, subscription).
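For example, an `include_role` call using the fully qualified name might look like the sketch below. The task name is hypothetical, and using `tasks_from` to select `stop_target_primary` is an assumption about how the listed entry points are referenced.

```yaml
# Sketch only: task name and tasks_from usage are assumptions.
- name: "[Switchover] Stop the target primary"
  ansible.builtin.include_role:
    name: vitabaks.autobase.upgrade  # fully qualified collection role name
    tasks_from: stop_target_primary
```

The fully qualified form resolves the role from the installed collection, so it does not depend on a local `roles/` directory being on the role search path.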
Issue: #537
This PR implements a blue-green deployment approach with the ability to switch traffic to the new cluster with near-zero downtime.
New playbook:
pg_upgrade_logical.yml

Additionally:
patroni_standby_cluster_auto_hba (default: true)