discovery: fix gossiper shutdown deadlock#10540
Roasbeef merged 2 commits into lightningnetwork:master
Conversation
Summary of Changes: Hello @morehouse, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request addresses a critical deadlock issue within the gossiper component, specifically during its shutdown sequence. The fix is a targeted adjustment to an internal error channel's buffering capacity, preventing goroutines from blocking indefinitely when multiple errors are generated. This improves the robustness and reliability of the gossiper's shutdown process, ensuring cleaner termination even under specific error conditions.
Activity
Code Review
This pull request addresses a potential deadlock in the gossiper during shutdown. The fix is simple and effective: increasing the buffer size of an error channel to prevent blocking when multiple errors are sent without a reader. The change is well-commented, explaining the reason for the fix and noting it as a temporary solution with a TODO for a future redesign. A comprehensive regression test has been added that accurately reproduces the deadlock scenario, ensuring the fix is effective and preventing future regressions. The test is clear and well-documented. Overall, this is a solid contribution that improves the stability of the gossiper.
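For readers unfamiliar with how such a deadlock is usually caught in tests, the sketch below shows the general shape of a shutdown-deadlock regression test: run the shutdown in a goroutine and fail if it does not return within a deadline. This is a generic, hedged illustration, not the actual test added in this PR; `requireStopsInTime` is a hypothetical helper name.

```go
// Intended to live in a _test.go file; generic sketch only.
package gossipsketch

import (
	"testing"
	"time"
)

// requireStopsInTime is a hypothetical helper illustrating the usual
// pattern: run the component's Stop in a goroutine and fail the test if
// it does not return before the deadline, so a shutdown deadlock surfaces
// as a test failure instead of a hung test binary.
func requireStopsInTime(t *testing.T, stop func(), timeout time.Duration) {
	t.Helper()

	done := make(chan struct{})
	go func() {
		stop()
		close(done)
	}()

	select {
	case <-done:
		// Shutdown completed in time; no deadlock.
	case <-time.After(timeout):
		t.Fatal("shutdown deadlocked: Stop did not return before the deadline")
	}
}
```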
🟠 PR Severity: HIGH
🟠 High (1 file)
🟢 Tests (1 file)
Analysis: This PR fixes a shutdown deadlock in the gossiper component. Why HIGH severity:
Review considerations:
To override, add a `severity-override-{critical,high,medium,low}` label.
Thank you for the fix; please make sure the linter passes. Please also add an entry to the LND 21.0 release notes.
Force-pushed from 5d1d71c to 3b7b8b0 (Compare)
Linter fixed and release note added.
When processing a remote network announcement, it is possible for two error messages to be sent back on the errChan. Since Brontide doesn't actually read from errChan, and since errChan only buffered one error message, the sending goroutine would deadlock forever. This would only become apparent when the gossiper attempted to shut down and got hung up. For now, we can fix this simply by buffering up to two error messages on errChan. There is an existing TODO to restructure this logic entirely to use the actor model, and we can do a more thorough fix as part of that work. This bug was discovered while doing full node fuzz testing and was triggered by sending a specific channel_announcement message and then shutting down LND.
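To make the failure mode concrete, here is a minimal, self-contained Go sketch of the pattern the commit message describes (plain Go, not lnd code): a sender pushes two errors into a channel that nobody reads, so with a capacity of one the second send blocks forever, while a capacity of two lets both sends complete.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// sendErrors mimics the described situation: two error messages may be
// sent on errChan even though the caller never reads from it.
func sendErrors(errChan chan error, done chan struct{}) {
	errChan <- errors.New("first validation error")
	errChan <- errors.New("second validation error") // blocks if cap(errChan) < 2
	close(done)
}

func main() {
	for _, capacity := range []int{1, 2} {
		errChan := make(chan error, capacity)
		done := make(chan struct{})
		go sendErrors(errChan, done)

		select {
		case <-done:
			fmt.Printf("cap=%d: both sends completed\n", capacity)
		case <-time.After(200 * time.Millisecond):
			// With cap=1 the sender is stuck on the second send,
			// which is what hung the gossiper's shutdown.
			fmt.Printf("cap=%d: sender blocked (deadlock)\n", capacity)
		}
	}
}
```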
Force-pushed from 3b7b8b0 to 5fa025e (Compare)
Rebased.
//
// TODO(ziggie): Redesign this once the actor model pattern becomes
// available. See https://github.com/lightningnetwork/lnd/pull/9820.
errChan := make(chan error, 2)
Just wanted to share a few thoughts here. Even if the actor model pattern is not being used, it doesn’t feel intuitive to me that a single announcement message from a peer would result in two errors being sent on the error channel. Once we encounter the first error, it seems reasonable to return immediately.
If the current focus is just on fixing the issue rather than revisiting the flow, that’s totally fine. I mainly wanted to understand the reasoning behind emitting a second error here (especially since a blocked slot in the gossiper seems to have a much larger impact than a deadlock during shutdown).
This PR should only focus on solving the problem that was found during fuzzing. However, I think we can revisit this area in the future to make sure the design follows proper design principles.
Generally I consider the pattern of sending errors over a channel in a function like this an anti-pattern. It's caused many bugs (like this) in the past, and can also make the control flow a bit harder to track as well.
> Once we encounter the first error, it seems reasonable to return immediately.
Agreed. fn.Future can work nicely here for that, as it will just return that value instead of messing around with channels here.
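As a rough illustration of the suggestion above, the sketch below shows a future-style shape in plain Go: the worker completes a single result exactly once and the caller awaits it, so there is no buffered error channel to size correctly. The `future` type here is a hypothetical stand-in and does not reflect the actual API of lnd's `fn` package.

```go
package main

import (
	"errors"
	"fmt"
)

// future holds a single result that is produced exactly once and can be
// awaited by the caller, sidestepping buffered error channels entirely.
// Hypothetical illustration only, not lnd's fn.Future.
type future[T any] struct {
	ch chan T
}

func newFuture[T any]() *future[T] {
	return &future[T]{ch: make(chan T, 1)}
}

func (f *future[T]) complete(v T) { f.ch <- v }
func (f *future[T]) await() T     { return <-f.ch }

// processAnnouncement returns on the first error it hits, so at most one
// value is ever delivered to the future.
func processAnnouncement(valid bool) *future[error] {
	f := newFuture[error]()
	go func() {
		if !valid {
			f.complete(errors.New("announcement failed validation"))
			return
		}
		f.complete(nil)
	}()
	return f
}

func main() {
	fmt.Println("result:", processAnnouncement(false).await())
}
```

Returning on the first error, as suggested above, means at most one value is ever produced, which is exactly the contract a single-slot future encodes.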
Created backport PR for v0.20.x-branch.
Please cherry-pick the changes locally and resolve any conflicts:

git fetch origin backport-10540-to-v0.20.x-branch
git worktree add --checkout .worktree/backport-10540-to-v0.20.x-branch backport-10540-to-v0.20.x-branch
cd .worktree/backport-10540-to-v0.20.x-branch
git reset --hard HEAD^
git cherry-pick -x 5fa025e9e7a180de0a81750757fcc20eb37b28a9
git push --force-with-lease
[v0.20.x-branch] Backport #10540: discovery: fix gossiper shutdown deadlock