Skip to content

Conversation

Copy link

Copilot AI commented Feb 5, 2026

When Sentinel triggers a +switch-master event, Filter() closes connections but leaves them in idleConns forever. The connections have state CLOSED but remain in pool structures, creating zombie connections.

Changes

internal/pool/pool.go

  • Modified Filter() to call removeConn() after closeConn() to properly remove closed connections from idleConns slice and conns map

internal/pool/pool_test.go

  • Added test simulating Sentinel failover with Filter() closing a connection, verifying it's removed from pool

internal/pool/main_test.go

  • Added dummyConnWithAddr helper to create connections with unique addresses for testing

Root Cause

// Before: closeConn() sets state to CLOSED but doesn't remove from pool
func (p *ConnPool) Filter(fn func(*Conn) bool) error {
    for _, cn := range p.conns {
        if fn(cn) {
            p.closeConn(cn)  // Sets state=CLOSED, but cn stays in p.conns and p.idleConns
        }
    }
}

With LIFO ordering, closed connections in the middle of idleConns are never encountered during popIdle(), causing them to accumulate indefinitely.

Original prompt

This section details on the original issue you should resolve

<issue_title>Zombie connection in ConnPool.idleConns</issue_title>
<issue_description>Connections closed on +switch-master notification from Sentinel remains in ConnPool.idleConns forever

Expected Behavior

Idle connections with state "CLOSED" should be cleaned up from ConnPool.idleConns and new connections should be reestablished.

Current Behavior

Idle connections remains in ConnPool.idleConns forever.
Currently we use v9.17.2.

Possible Solution

On v9.16.0 it works as expected.
First, on +switch-master event for every connection in pool p.closeConn(cn) is called. The difference between versions is a new line cn.stateMachine.Transition(StateClosed) in Conn.Close() method, which changes the state to "CLOSED".
Second, ConnPool.popIdle() behavior was changed.
In v9.16.0 the following code returns cn, which later is being tested in p.isHealthyConn(cn, now). As a result closed connection is being removed from pool.


		if cn.CompareAndSwapUsed(false, true) {
			if cn.IsUsable() {
				p.idleConnsLen.Add(-1)
				break
			}
			cn.SetUsed(false)
		}

In v9.17.2 the code above was replaced with the following:


		// Hot path optimization: try IDLE → IN_USE or CREATED → IN_USE transition
		// Using inline TryAcquire() method for better performance (avoids pointer dereference)
		if cn.TryAcquire() {
			// Successfully acquired the connection
			p.idleConnsLen.Add(-1)
			break
		}

cn.TryAcquire() will give True if and only if the stated is in "IDLE" or "CREATED". We have state "CLOSED" here, so as a result we have cn=nil. There are no remaining available connection in pool, so we go through the branch of code where for every upcoming commands a new connection is being created.

I believe we need to return closed cn from popIdle method to be cleared in isHealthyConn as before, or remove from pool here.

Steps to Reproduce

  1. Run redis in sentinel mode
  2. Create a Failover Client with poolSize=1

        client = goredis.NewFailoverClient(&goredis.FailoverOptions{
		MasterName:      sentinelName,
		SentinelAddrs:   strings.Split(sentinels, ","),
		PoolSize:        1,
		DisableIdentity: true,
	})

  1. Send at least one command to fill the pool
  2. Trigger switch master
  3. To easier catch the bug we can use multi-exec
	
	set, err := client.Set(ctx, "testRadishKeyMulti2", "5", 0).Result()
	assert.NoError(t, err)
	assert.Equal(t, "OK", set)

	multi, err := client.Do(ctx, "MULTI").Result()
	assert.NoError(t, err)
	assert.Equal(t, "OK", multi.(string))

	incr, err := client.Do(ctx, "INCR", "testRadishKeyMulti2").Result()
	assert.NoError(t, err)
	assert.Equal(t, "QUEUED", incr.(string))

	decr, err := client.Do(ctx, "DECR", "testRadishKeyMulti2").Result()
	assert.NoError(t, err)
	assert.Equal(t, "QUEUED", decr.(string))

	exec, err := client.Do(ctx, "EXEC").Result()
	assert.NoError(t, err)
	execArray, ok := exec.([]any)
	assert.True(t, ok)
	assert.EqualValues(t, int64(6), execArray[0])
	assert.EqualValues(t, int64(5), execArray[1])

Context (Environment)

We moved from v9.16.0 due to that maintenance bug

Detailed Description

Possible Implementation

</issue_description>

<agent_instructions>idleCons in the pool should include only idle connections, no closed connections.</agent_instructions>

Comments on the Issue (you are @copilot in this section)

@ndyakov @alexey-s-aksenov thank you for reporting this. There is a release `9.17.3` have you checked if it addresses this? @ndyakov @alexey-s-aksenov noted, we will look into it, but probably won't be able to fix in the next version as we are late in the preparation process. Will release a patch afterward when we are able to resolve this one. Cc @ofekshenawa for visibility

✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.

@jit-ci
Copy link

jit-ci bot commented Feb 5, 2026

Hi, I’m Jit, a friendly security platform designed to help developers build secure applications from day zero with an MVS (Minimal viable security) mindset.

In case there are security findings, they will be communicated to you as a comment inside the PR.

Hope you’ll enjoy using Jit.

Questions? Comments? Want to learn more? Get in touch with us.

Copilot AI and others added 2 commits February 5, 2026 12:37
- Modified Filter() to call removeConn() after closeConn() to properly clean up closed connections from idleConns
- Added test to verify closed connections are removed from the pool
- Added dummyConnWithAddr helper type for testing with unique connection addresses

Co-authored-by: ndyakov <1547186+ndyakov@users.noreply.github.com>
Co-authored-by: ndyakov <1547186+ndyakov@users.noreply.github.com>
Copilot AI changed the title [WIP] Fix zombie connection in ConnPool.idleConns Fix zombie connections in ConnPool after Sentinel failover Feb 5, 2026
Copilot AI requested a review from ndyakov February 5, 2026 12:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Zombie connection in ConnPool.idleConns

2 participants