Skip to content

fix: Add timeout to response thread join in BrokerRequestResponse#127

Open
mo-radwan1 wants to merge 3 commits intomainfrom
mradwan/fix-response-thread-join-timeout
Open

fix: Add timeout to response thread join in BrokerRequestResponse#127
mo-radwan1 wants to merge 3 commits intomainfrom
mradwan/fix-response-thread-join-timeout

Conversation

@mo-radwan1
Copy link
Collaborator

🧩 Complexity Level: 🟢 Low

⏱️ Estimated Review Time: 5 minutes

What is the purpose of this change?

Fix a process hang issue during cleanup when the response thread in BrokerRequestResponse.stop_component() doesn't exit cleanly.

The response_thread.join() call had no timeout, which could cause the process to hang indefinitely during component cleanup. This was observed during test fixture teardown in downstream projects, requiring external timeouts to kill the test process.

How is this accomplished?

  • Added a 2.0 second timeout to response_thread.join(timeout=2.0)
  • Added a warning log message if the thread doesn't exit within the timeout
  • The thread is already created with daemon=True, so it will be cleaned up on process exit even if it doesn't respond to the stop signal

The 2.0 second timeout is chosen to be longer than the 1.0 second receive_message timeout in handle_responses(), giving the thread enough time to notice the stop signal and exit gracefully.

Anything reviewers should focus on/be aware of?

  • This is a minimal, low-risk change that follows the existing pattern in the codebase (e.g., solace_ai_connector.py:698 uses thread.join(timeout=1.0))
  • The response thread is already a daemon thread, so this change just prevents the join() call from blocking indefinitely
  • No functional behavior changes - only affects cleanup timing

The stop_component() method was calling response_thread.join() without
a timeout, which could cause the process to hang indefinitely during
cleanup if the response thread doesn't exit cleanly.

This was causing test processes to hang during fixture teardown,
requiring external timeouts to kill the process.

Changes:
- Added 2.0 second timeout to response_thread.join()
- Added warning log if thread doesn't exit within timeout
- Thread is already a daemon thread, so it will be cleaned up on
  process exit if it doesn't respond to the stop signal
Adds tests to verify:
- Thread.join(timeout) works correctly for normal exits
- Thread.join(timeout) returns after timeout for slow threads
- stop_component uses join with 2.0s timeout
- Warning is logged when thread doesn't exit in time
- Daemon threads don't prevent process exit
Add 2-second timeouts to thread.join() calls in multiple locations:

- TimerManager.stop(): timer thread join
- CacheService.stop(): expiry thread join
- ComponentBase.run(): monitoring and connection status thread joins
- Flow.wait_for_threads(): flow thread joins

All these threads are daemon threads, so they won't prevent process
exit if the join times out. Added warning logs for TimerManager and
CacheService when threads don't exit within the timeout.

This completes the fix started in the previous commit that added
timeout to BrokerRequestResponse.stop_component().
@sonarqube-solacecloud
Copy link

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants