Fix pg_cron fast shutdown hang with sync replication #414
Open
CyberDem0n wants to merge 1 commit into citusdata:main from
Fast shutdown may hang indefinitely when the `synchronous_standby_names` requirement cannot be satisfied due to an insufficient number of synchronous replicas. In this situation, pg_cron can block waiting for a synchronous replication acknowledgment.

Example:

```
postgres -D testdb --shared_preload_libraries=pg_cron --synchronous_standby_names=foobar
 \_ postgres: io worker 0
 \_ postgres: io worker 1
 \_ postgres: io worker 2
 \_ postgres: checkpointer
 \_ postgres: pg_cron launcher waiting for 0/A2DDC88
```

gdb:

```
(gdb) bt
#0  0x00007f7b2a5b3e5a in epoll_wait (epfd=5, events=0x56096e95dc08, maxevents=1, timeout=timeout@entry=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
#1  0x0000560942c65aa6 in WaitEventSetWaitBlock (set=set@entry=0x56096e95dba0, cur_timeout=cur_timeout@entry=-1, occurred_events=occurred_events@entry=0x7fff16aa23d0, nevents=nevents@entry=1) at waiteventset.c:1191
#2  0x0000560942c664b5 in WaitEventSetWait (set=0x56096e95dba0, timeout=timeout@entry=-1, occurred_events=occurred_events@entry=0x7fff16aa23d0, nevents=nevents@entry=1, wait_event_info=wait_event_info@entry=134217780) at waiteventset.c:1139
#3  0x0000560942c5884a in WaitLatch (latch=<optimized out>, wakeEvents=wakeEvents@entry=17, timeout=timeout@entry=-1, wait_event_info=wait_event_info@entry=134217780) at latch.c:196
#4  0x0000560942c0f6c4 in SyncRepWaitForLSN (lsn=170777736, commit=commit@entry=true) at syncrep.c:388
#5  0x00005609428d87cd in RecordTransactionCommit () at xact.c:1557
#6  0x00005609428d88f2 in CommitTransaction () at xact.c:2365
#7  0x00005609428d9831 in CommitTransactionCommandInternal () at xact.c:3202
#8  0x00005609428d9bbb in CommitTransactionCommand () at xact.c:3163
#9  0x00007f7b2b4b3b19 in MarkPendingRunsAsFailed () at src/job_metadata.c:1456
#10 0x00007f7b2b4b66a4 in PgCronLauncherMain (arg=<optimized out>) at src/pg_cron.c:588
#11 0x0000560942bc1798 in BackgroundWorkerMain (startup_data=<optimized out>, startup_data_len=<optimized out>) at bgworker.c:879
#12 0x0000560942bc3a4b in postmaster_child_launch (child_type=child_type@entry=B_BG_WORKER, child_slot=238, startup_data=startup_data@entry=0x56096e9f67b0, startup_data_len=startup_data_len@entry=1472, client_sock=client_sock@entry=0x0) at launch_backend.c:290
#13 0x0000560942bc5bf2 in StartBackgroundWorker (rw=rw@entry=0x56096e9f67b0) at postmaster.c:4164
#14 0x0000560942bc5e43 in maybe_start_bgworkers () at postmaster.c:4330
#15 0x0000560942bc6be3 in LaunchMissingBackgroundProcesses () at postmaster.c:3404
#16 0x0000560942bc89f9 in ServerLoop () at postmaster.c:1717
#17 0x0000560942bc9e08 in PostmasterMain (argc=argc@entry=5, argv=argv@entry=0x56096e95d2e0) at postmaster.c:1400
#18 0x0000560942acfc06 in main (argc=5, argv=0x56096e95d2e0) at main.c:227
```

This happens because pg_cron installs a custom `SIGTERM` handler that does not set `ProcDiePending`, causing `SyncRepWaitForLSN()` to never exit its wait loop.

Fix this by switching to the standard `SIGTERM` handler (`die()`). Additionally, remove the custom `SIGHUP` handler and rely on `SignalHandlerForConfigReload()` instead.
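The mechanism described above can be illustrated with a small standalone toy model. This is a sketch under stated assumptions, not PostgreSQL or pg_cron code: `ProcDiePending` here is a local stand-in for the real global, while `got_sigterm`, `custom_sigterm_handler`, `standard_die_handler`, `wait_for_sync_ack`, and `shutdown_unblocks_wait` are hypothetical names invented for the illustration. The bounded loop stands in for a wait that would otherwise block forever.

```c
#include <signal.h>
#include <stdbool.h>

/* Toy stand-in for PostgreSQL's ProcDiePending flag (not the real global). */
static volatile sig_atomic_t ProcDiePending = 0;
/* Hypothetical private flag that a custom SIGTERM handler might set instead. */
static volatile sig_atomic_t got_sigterm = 0;

/* Models a custom handler: records the signal but never sets ProcDiePending. */
static void custom_sigterm_handler(int signo)
{
    (void) signo;
    got_sigterm = 1;
}

/* Models the standard handler's relevant effect: it sets ProcDiePending. */
static void standard_die_handler(int signo)
{
    (void) signo;
    ProcDiePending = 1;
}

/*
 * Models the sync-rep wait loop: the only exit condition checked here is
 * ProcDiePending. Returns true if the loop was left because termination was
 * requested, false if max_wakeups ran out (a stand-in for "waits forever").
 */
static bool wait_for_sync_ack(int max_wakeups)
{
    for (int i = 0; i < max_wakeups; i++)
    {
        if (ProcDiePending)
            return true;
        /* The real code would sleep on a latch here; the toy just loops. */
    }
    return false;
}

/* Installs the given handler, delivers SIGTERM, then runs the wait loop. */
static bool shutdown_unblocks_wait(void (*handler)(int))
{
    bool unblocked;

    ProcDiePending = 0;
    got_sigterm = 0;
    signal(SIGTERM, handler);
    raise(SIGTERM);             /* handler runs before raise() returns */
    unblocked = wait_for_sync_ack(1000);
    signal(SIGTERM, SIG_DFL);
    return unblocked;
}
```

In this model, `shutdown_unblocks_wait(custom_sigterm_handler)` returns `false` because the loop never sees the flag it checks, while `shutdown_unblocks_wait(standard_die_handler)` returns `true`. That mirrors why the PR switches to the standard handler rather than teaching the wait loop about a private flag.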
sfc-gh-mslot (Collaborator) approved these changes on Jan 21, 2026 and left a comment:
Makes a lot of sense.