fix: validate connection health and handle missing slots in cluster refresh #944

billshen99 · 2026-01-29T09:14:54Z

Summary

Fixes critical bug where broken connections were blindly reused during cluster
topology refresh, causing persistent "wrong shard/slot" errors requiring pod restarts.

Root Cause

During _refresh(), connections were reused without health validation. Network
issues could break connections, which would persist in error state indefinitely
across refresh cycles.

Changes

Critical, Connection health validation and reuse:
Validate connection health with Error() before reusing

The cluster client now checks the health of existing connections before reusing them during a topology refresh. Connections reporting errors are replaced with new ones, preventing broken connections from being reused.

Slot mapping resilience:

When the cluster topology information is incomplete (e.g., missing slots), the client preserves old slot-to-connection mappings for slots not covered by the new topology, ensuring no slot is left without a connection.

Replica slot fallback improvements:
Add replica-to-master fallback for missing replica slots

In scenarios where replica slot information is missing, the client now falls back to using the master connection for those slots, ensuring continued availability. [1] [2] [3]

Impact

Broken connections now auto-detected and replaced within seconds
No more manual pod restarts needed for this issue
Graceful degradation during incomplete topology updates
This pull request improves the reliability of the Redis cluster client by ensuring that unhealthy (broken) connections are not reused after a topology refresh, and by making slot mapping more robust when the cluster topology is incomplete. It also adds comprehensive tests to validate these behaviors.

Testing enhancements:

Added tests to verify that broken connections are not reused after a refresh, and that all slots are properly updated to use healthy connections, ensuring robust failover and recovery.

…efresh

jit-ci · 2026-01-29T09:15:00Z

Hi, I’m Jit, a friendly security platform designed to help developers build secure applications from day zero with an MVS (Minimal viable security) mindset.

In case there are security findings, they will be communicated to you as a comment inside the PR.

Hope you’ll enjoy using Jit.

Questions? Comments? Want to learn more? Get in touch with us.

rueian · 2026-01-30T05:05:22Z

cluster.go

-			conns[addr] = fresh
+			// Validate connection health before reusing
+			// If connection has error, it will be replaced with a new one
+			if cc.conn.Error() == nil {


mux handles reconnections internally, so no Error() check at the cluster client is okay.

rueian · 2026-01-30T05:05:58Z

cluster.go

+	// Preserve old slot mappings for any slots not covered by the new topology
+	// This prevents nil connections when CLUSTER SLOTS returns incomplete data
+	for i := 0; i < 16384; i++ {


We should not do this. CLUSTER SLOTS should be the only truth.

fix: validate connection health and handle missing slots in cluster r…

200ef61

…efresh

billshen99 mentioned this pull request Jan 29, 2026

Elevated MOVED errors while doing SETEX operations #937

Open

rueian reviewed Jan 30, 2026

View reviewed changes

rueian closed this Feb 9, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: validate connection health and handle missing slots in cluster refresh #944

fix: validate connection health and handle missing slots in cluster refresh #944

billshen99 commented Jan 29, 2026

Uh oh!

jit-ci bot commented Jan 29, 2026

Uh oh!

rueian Jan 30, 2026

Uh oh!

rueian Jan 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

fix: validate connection health and handle missing slots in cluster refresh #944

fix: validate connection health and handle missing slots in cluster refresh #944

Conversation

billshen99 commented Jan 29, 2026

Summary

Root Cause

Changes

Impact

Uh oh!

jit-ci bot commented Jan 29, 2026

Uh oh!

rueian Jan 30, 2026

Choose a reason for hiding this comment

Uh oh!

rueian Jan 30, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants