Skip to content

fix: serverIPListComponent not close when invoke Client close#350

Open
Epiphany523 wants to merge 5 commits intoapolloconfig:developfrom
Epiphany523:fix_not_close_serverIPListComponent
Open

fix: serverIPListComponent not close when invoke Client close#350
Epiphany523 wants to merge 5 commits intoapolloconfig:developfrom
Epiphany523:fix_not_close_serverIPListComponent

Conversation

@Epiphany523
Copy link

problem1 desc

serverIPListComponent not close when invoke Client close

client, err := agollo.StartWithConfig(func() (*config.AppConfig, error) {
		return appConfig, nil
	})
defer client.Close()

problem2 desc

ConfigComponent.stopCh should init when ConfigComponent struct create, not ConfigComponent.Start

test

{"level":"DEBUG","time":"2026-01-19 19:24:42.752","caller":"/home/aaa/open-source/agollo/component/log/log.go:73","msg":"init notifySyncConfigServices finished"}
{"level":"DEBUG","time":"2026-01-19 19:24:42.752","caller":"/home/aaa/open-source/agollo/component/log/log.go:73","msg":"get all server info:[{\"appName\":\"apollo-configservice\",\"instanceId\":\"apollo-configservice:http://apollo.demo:8080\",\"homepageUrl\":\"http://apollo.demo:8080\"}]"}
{"level":"DEBUG","time":"2026-01-19 19:24:42.752","caller":"/home/aaa/open-source/agollo/component/log/log.go:73","msg":"ConfigComponent started"}
{"level":"DEBUG","time":"2026-01-19 19:24:42.752","caller":"/home/aaa/open-source/agollo/component/log/log.go:73","msg":"syncServerIpListComponent started"}
{"level":"INFO","time":"2026-01-19 19:24:42.752","caller":"/home/aaa/open-source/agollo/component/log/log.go:78","msg":"agollo start finished ! "}
{"level":"DEBUG","time":"2026-01-19 19:24:42.752","caller":"/home/aaa/open-source/agollo/component/log/log.go:73","msg":"ConfigComponent stopped"}
{"level":"DEBUG","time":"2026-01-19 19:24:42.752","caller":"/home/aaa/open-source/agollo/component/log/log.go:73","msg":"syncServerIpListComponent stopped"}

@coderabbitai
Copy link

coderabbitai bot commented Jan 19, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

  • 🔍 Trigger a full review
✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@mergify
Copy link
Contributor

mergify bot commented Jan 19, 2026

感谢您提出Pull Request,我会尽快Review。我会在1-2日内进行查看或者回复,如果遇到节假日可能会处理较慢,敬请谅解。

Copy link
Member

@nobodyiam nobodyiam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

感谢你的贡献!这个 PR 修复了一个重要的资源泄漏问题 👍

✅ 优点

  1. 问题定位准确: 确实修复了 serverIPListComponent 和 ConfigComponent 的资源泄漏问题
  2. 设计改进: 统一使用 components 数组管理所有组件,代码更清晰
  3. 符合最佳实践:
    • stopCh 在构造函数中初始化,避免并发问题
    • 使用 chan struct{} 替代 chan interface{},更符合 Go 惯例
  4. 日志完善: 添加了组件启动和停止的 debug 日志

⚠️ 需要改进的地方

1. ConfigComponent 缺少 Stop() 方法

component/notify/componet_notify.go 中 ConfigComponent 实现了 AbsComponent 接口,但缺少 Stop() 方法的实现。建议添加:

func (c *ConfigComponent) Stop() {
    if c.stopCh != nil {
        close(c.stopCh)
    }
}

2. 并发安全问题

Stop() 方法中关闭 channel 可能存在重复关闭的风险。建议使用 sync.Once 保护:

type SyncServerIPListComponent struct {
    appConfig func() config.AppConfig
    stopCh    chan struct{}
    stopOnce  sync.Once
}

func (s *SyncServerIPListComponent) Stop() {
    s.stopOnce.Do(func() {
        if s.stopCh != nil {
            close(s.stopCh)
        }
    })
}

ConfigComponent 同样需要此保护。

3. 废弃方法处理

SetAppConfigSetCache 方法已经不再使用,建议:

  • 标记为 Deprecated 并在注释中说明使用 NewConfigComponent 替代
  • 或者直接删除(如果确认没有外部调用)
// SetAppConfig Deprecated: use NewConfigComponent instead
func (c *ConfigComponent) SetAppConfig(appConfigFunc func() config.AppConfig) {
    c.appConfigFunc = appConfigFunc
}

4. 测试覆盖

建议添加测试用例验证:

  • Client.Close() 能正确停止所有组件
  • 重复调用 Close() 不会 panic
  • 组件停止后 goroutine 正确退出

5. 代码注释格式

部分注释格式不一致,建议统一:

// InitSyncServerIPList 初始化同步服务器信息列表

🔍 其他建议

  1. 错误处理: component.StartRefreshConfig 如果 panic,可能导致程序崩溃,建议添加 recover
  2. 优雅关闭: 可以考虑添加超时机制,避免 Stop() 无限等待

总结

这是一个很好的 bug 修复,解决了实际的资源泄漏问题。主要需要:

  • 必须添加: ConfigComponent.Stop() 方法
  • 强烈建议: 添加并发安全保护(sync.Once)
  • 建议: 添加测试用例

修复第 1、2 点后可以合并。

@Epiphany523
Copy link
Author

本次的修改变更

  1. ConfigComponent 已经存在了Stop(), 同时新增了sync.Once保护,避免重复关闭时出现panic
func (c *ConfigComponent) Stop() {
	c.stopOnce.Do(func() {
		if c.stopCh != nil {
			close(c.stopCh)
		}
	})
}
  1. SyncServerIPListComponent.Stop() 使用 sync.Once保护
func (s *SyncServerIPListComponent) Stop() {
	s.stopOnce.Do(func() {
		if s.stopCh != nil {
			close(s.stopCh)
		}
	})
}
  1. component.StartRefreshConfig 新增 recover 处理

  2. 新增了测试用例验证组件的启动和关闭,验证Stop()方法的重复执行

  3. 移除掉 SetAppConfig 和 SetCache 方法和其测试用例

远景

合并后,是否可以评估出一个修复该问题的版本的时间

@nobodyiam
Copy link
Member

感谢更新!对照上次建议,以下已完成:

  • ConfigComponent / SyncServerIPListComponent 的 Stop 增加 sync.Once
  • StartRefreshConfig 增加 recover
  • 新增 Close/Stop 相关测试
  • 组件初始化改为 NewConfigComponent

新增需要关注的问题:

  1. 兼容性回归InitSyncServerIPList 从“内部启动 goroutine”变为“仅返回组件”。如果外部仍按旧方式调用且忽略返回值,会导致 server list 不再刷新(静默回归)。建议保留旧行为并返回组件,或新增 New... 构造函数并保留旧 API。
  2. panic 日志格式recover 日志用 %s,panic 不是 string/error 时会丢信息,建议改 %v(可选加堆栈)。

关于 SetAppConfig/SetCache
不建议直接删除对外导出方法。更稳妥是保留并标记 Deprecated(提示使用 NewConfigComponent),避免破坏外部依赖;可在后续主版本再移除。

关于发布时间
合并后会纳入后续版本发布计划,但发布时间需要根据整体变更规模和验证情况再定。

若修复第 1 点和第 2 点,我倾向于可以合并。

@Epiphany523
Copy link
Author

本次修改变更

兼容性回归:

  1. InitSyncServerIPList 保留并标记为Deprecated,同时新增NewSyncServerIPListComponent供调用者使用

  2. SetAppConfig/SetCache 保留并标记 Deprecated

panic 日志格式

修改成%v格式输出recover信息

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request fixes a critical bug where the SyncServerIPListComponent was not properly stopped when Client.Close() was invoked. The PR refactors the component lifecycle management to ensure all components can be properly shut down.

Changes:

  • Added a Stop() method to the AbsComponent interface and implemented proper shutdown logic for both ConfigComponent and SyncServerIPListComponent
  • Refactored client to track all components in a slice and stop them all when Close() is called
  • Introduced constructor functions (NewConfigComponent, NewSyncServerIPListComponent) to ensure components are properly initialized with stop channels

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
client.go Changed from storing single configComponent to components slice; added appendComponent() method; updated Close() to stop all components
component/common.go Added Stop() method to AbsComponent interface; added panic recovery in StartRefreshConfig
component/notify/componet_notify.go Added stopCh, stopOnce fields; implemented Stop() method; added NewConfigComponent constructor; deprecated SetAppConfig and SetCache
component/serverlist/sync.go Added stopCh, stopOnce fields; implemented Stop() method; added NewSyncServerIPListComponent constructor; deprecated InitSyncServerIPList
component/notify/componet_notify_test.go Added test for Stop() method covering nil channel and duplicate stop scenarios
component/serverlist/sync_test.go Added test for Stop() method covering nil channel and duplicate stop scenarios
client_test.go Added test for Close() method verifying all components are stopped
component/common_test.go Updated testComponent to implement new Stop() method

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 65 to 80
func (s *SyncServerIPListComponent) Start() {
SyncServerIPList(s.appConfig)
log.Debug("syncServerIpList started")
log.Debug("syncServerIpListComponent started")

t2 := time.NewTimer(refreshIPListInterval)
for {
select {
case <-s.stopCh:
log.Debug("syncServerIpListComponent stopped")
return
case <-t2.C:
SyncServerIPList(s.appConfig)
t2.Reset(refreshIPListInterval)
}
}
}
Copy link

Copilot AI Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Timer created in the Start method is not properly cleaned up when the component stops. When the stopCh receives a signal and the function returns, the timer t2 will continue to exist and consume resources. You should call t2.Stop() before returning to release the timer resources and prevent goroutine leaks.

Copilot uses AI. Check for mistakes.
Comment on lines 297 to 301
func (c *internalClient) Close() {
c.configComponent.Stop()
for _, comp := range c.components {
comp.Stop()
}
}
Copy link

Copilot AI Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Close method has a potential race condition. Multiple goroutines could call Close concurrently, and even though each component's Stop method is protected by sync.Once, the iteration over c.components is not thread-safe. If one goroutine is iterating through the components slice while another goroutine calls appendComponent, this could lead to undefined behavior. Consider adding a mutex to protect access to c.components or ensuring Close is only called once using a sync.Once pattern at the client level.

Copilot uses AI. Check for mistakes.
@@ -269,6 +269,10 @@ func (c *internalClient) getConfigValue(key string) interface{} {
return value
}

Copy link

Copilot AI Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The appendComponent method has a potential race condition. It modifies the c.components slice without synchronization, which could lead to data races if called concurrently with Close or with other appendComponent calls. While in the current implementation all appendComponent calls happen during initialization before components start, this creates a fragile design. Consider adding mutex protection or documenting that this method must only be called during initialization.

Suggested change
// appendComponent adds a component to the internal list.
// It is not safe for concurrent use and must only be called during client
// initialization, before any components are started or Close may be invoked.

Copilot uses AI. Check for mistakes.
func StartRefreshConfig(component AbsComponent) {
defer func() {
if err := recover(); err != nil {
log.Errorf("StartRefreshConfig component start failed,recover panic: %v", err)
Copy link

Copilot AI Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing space after comma in the log message. The message should be "StartRefreshConfig component start failed, recover panic: %v" instead of "StartRefreshConfig component start failed,recover panic: %v" for consistency with other error messages in the codebase. Looking at other log.Errorf calls in the codebase (e.g., component/common_test.go:152, component/serverlist/sync.go:129), there is a space after commas in error messages.

Suggested change
log.Errorf("StartRefreshConfig component start failed,recover panic: %v", err)
log.Errorf("StartRefreshConfig component start failed, recover panic: %v", err)

Copilot uses AI. Check for mistakes.
client_test.go Outdated
Assert(t, tc.status, Equal(0))
Assert(t, tc2.status, Equal(0))
c.Close()
c.Close() // replicate Close
Copy link

Copilot AI Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment uses "replicate" when it should be "duplicate" or "repeated". The word "replicate" means to reproduce or make a copy, while "duplicate" means to do something again. In this context, the test is calling Close twice to verify idempotency, so "duplicate Close" or "repeated Close" would be more appropriate.

Suggested change
c.Close() // replicate Close
c.Close() // duplicate Close

Copilot uses AI. Check for mistakes.
Comment on lines 64 to 82
@@ -66,14 +75,17 @@ loop:
}
t2.Reset(longPollInterval)
case <-c.stopCh:
break loop
log.Debug("ConfigComponent stopped")
return
}
}
}
Copy link

Copilot AI Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Timer created in the Start method is not properly cleaned up when the component stops. When the stopCh receives a signal and the function returns, the timer t2 will continue to exist and consume resources. You should call t2.Stop() before returning to release the timer resources and prevent goroutine leaks.

Copilot uses AI. Check for mistakes.
@nobodyiam
Copy link
Member

感谢补充修改。这里有两个可能影响兼容性的点,请结合 agollo 的实际使用场景判断一下:

  1. AbsComponent 是导出接口,这次新增 Stop() 可能会让下游实现方编译失败。**实际场景里是否有人实现该接口?**如果可能存在,建议考虑兼容方案(比如新增接口、或把它非导出化)。
  2. 旧用法 &ConfigComponent{} + SetAppConfig/SetCache + Start/Stop 现在不会初始化 stopChStop() 变成空操作,可能导致 goroutine 泄漏。**实际场景里是否有人还在用旧用法?**如果可能存在,建议在 setter 或 Start 里补 stopCh 初始化,或明确不再支持。

@Epiphany523
Copy link
Author

对于兼容性,本次修改的个人理解:(结合 agollo 的实际使用场景)

用户使用场景

用户使用agollo.Start() 或者 agollo.StartWithConfig(loadAppConfig func()) 获取client,然后调用client的导出方法进行获取apollo的配置信息

对于AbsComponent定位

  1. AbsComponent主要用于组件任务的抽象,比如: ConfigComponent 定时拉取apollo配置并缓存、SyncServerIPListComponent 定时更新IP列表,这些组件都是服务于agollo
  2. agollo也没有为用户提供AbsComponent实现者注入到agollo中的钩子
  3. 虽然AbsComponent是导出接口,但用户自定义实现该接口概率几乎为零,无应用场景

旧用法 &ConfigComponent{} + SetAppConfig/SetCache + Start/Stop

旧用法虽然会暴露给用户,但是用户没有使用场景

本次变更

NewConfigComponent的返回类型改成component.AbsComponent

@nobodyiam
Copy link
Member

nobodyiam commented Feb 7, 2026

感谢这轮更新,Close()/Stop() 生命周期整体已经更完整了 👍

这里我建议本 PR 内再补一个点(偏 must-fix):

  • 现在组件新增了 stop 提前退出路径后,Start() 里的 timer 建议同步做显式清理,保证资源释放语义完整:
    • component/notify/componet_notify.go:66
    • component/serverlist/sync.go:69
  • 建议两处都改为:t2 := time.NewTimer(...); defer t2.Stop()
  • timer 虽然是历史就存在的,但以前没有 stop 退出分支;现在既然支持停止,建议在退出时一并释放 timer 资源,避免留下不必要的运行时对象。

另外,在前述讨论里我看到两个兼容性问题:

  1. AbsComponent 作为导出接口新增了 Stop()component/common.go:23),可能对潜在外部实现方产生编译影响;
  2. 旧用法 &ConfigComponent{} + SetAppConfig/SetCache + Start/Stop 现在不会初始化 stopChStop() 可能变成空操作(component/notify/componet_notify.go:52component/notify/componet_notify.go:60component/notify/componet_notify.go:65)。

这两点虽然是问题,但是否需要在本 PR 内处理,还是要结合 agollo 的实际使用场景和兼容策略来判断。@zouyx 能否也帮忙确认下这两点是否 OK?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants