feat: add unanswered Slack mention detection to heartbeat#186
feat: add unanswered Slack mention detection to heartbeat#186vltbaudbot wants to merge 2 commits intomodem-dev:mainfrom
Conversation
Adds periodic checking for Slack app_mention events that didn't receive a reply, helping catch messages lost during agent restarts. Changes: - Add checkUnansweredMentions() function that scans bridge log for recent app_mention events (last hour, older than 5 min) - Add hasRepliedToThread() helper that checks bridge logs and control-agent session logs for evidence of replies - Integrate check into main heartbeat flow (runs every 10 min by default) - Add HEARTBEAT_CHECK_UNANSWERED_MENTIONS env var (enabled by default) - Add UNANSWERED_MENTION_THRESHOLD_MS constant (5 min grace period) - Update heartbeat config output to show new settings This addresses the issue where messages forwarded to dead sockets during restarts are lost because the bridge acks them to Slack before confirming agent receipt. The heartbeat will now alert the control-agent about unanswered mentions so it can follow up. Fixes dropped messages during agent restart windows.
The initial implementation checked for any occurrence of thread_ts in session logs, which would match both inbound mentions AND outbound replies. This led to false positives. Improvements: 1. Add support for slack-reply-log.jsonl (most reliable source) 2. Check for thread_ts in outbound context specifically (JSON keys in curl commands or send_to_session calls) 3. Use proper session subdirectory path 4. Add multiple JSON formatting variations to catch escaped quotes 5. Add documentation for future reaction-based checking This makes the check more precise - only counting actual replies, not just seeing the mention.
Greptile SummaryAdded periodic unanswered Slack mention detection to the heartbeat extension to catch messages lost during agent restarts. The implementation scans bridge logs for Key changes:
Issues found:
Confidence Score: 4/5
Important Files Changed
Last reviewed commit: ed2c48d |
| * HEARTBEAT_INTERVAL_MS — interval between heartbeats (default: 600000 = 10 min) | ||
| * HEARTBEAT_ENABLED — set to "0" or "false" to disable (default: enabled) | ||
| * HEARTBEAT_EXPECTED_SESSIONS — comma-separated session aliases to check (default: "sentry-agent") | ||
| * HEARTBEAT_CHECK_UNANSWERED_MENTIONS — set to "1" or "true" to enable (default: enabled) |
There was a problem hiding this comment.
Comment is misleading - should say "enabled by default, set to 0/false/no to disable"
| * HEARTBEAT_CHECK_UNANSWERED_MENTIONS — set to "1" or "true" to enable (default: enabled) | |
| * HEARTBEAT_CHECK_UNANSWERED_MENTIONS — enabled by default, set to "0" or "false" to disable |
Prompt To Fix With AI
This is a comment left during a code review.
Path: pi/extensions/heartbeat.ts
Line: 18
Comment:
Comment is misleading - should say "enabled by default, set to 0/false/no to disable"
```suggestion
* HEARTBEAT_CHECK_UNANSWERED_MENTIONS — enabled by default, set to "0" or "false" to disable
```
How can I resolve this? If you propose a fix, please make it concise.| try { | ||
| const content = readFileSync(join(controlAgentSessionDir, file), "utf-8"); | ||
| // Look for evidence of a reply: the thread_ts appearing in a curl /send command | ||
| // or in a send_to_session message. The thread_ts in an outbound context |
There was a problem hiding this comment.
Simple string search could have false positives if threadTs appears in error messages or logs without being an actual reply. Consider checking for more specific patterns like the thread being mentioned in a tool call or message send context.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: pi/extensions/heartbeat.ts
Line: 411
Comment:
Simple string search could have false positives if `threadTs` appears in error messages or logs without being an actual reply. Consider checking for more specific patterns like the thread being mentioned in a tool call or message send context.
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.| const worktreeResults = checkWorktrees(); | ||
| const stuckTodoResults = checkStuckTodos(); | ||
| const unansweredMentionResults = isUnansweredMentionsCheckEnabled() | ||
| ? checkUnansweredMentions() | ||
| : []; | ||
|
|
||
| const allResults: CheckResult[] = [ |
There was a problem hiding this comment.
Bug: The hasRepliedToThread() function incorrectly identifies inbound messages as replies by using a broad string search for thread_ts in session logs, leading to false negatives.
Severity: HIGH
Suggested Fix
Refine the search logic in hasRepliedToThread() to differentiate between inbound and outbound messages. Instead of a simple string search, parse the session log entries and only check for thread_ts within the context of an outbound action, such as a send_to_session tool call or a curl command to the Slack API. This ensures that only actual replies are counted.
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.
Location: pi/extensions/heartbeat.ts#L413-L416
Potential issue: The `hasRepliedToThread()` function checks for replies by searching for
the string `"thread_ts":"<threadTs>"` anywhere in the session log file. However, inbound
Slack messages are logged with their metadata, which includes the `thread_ts`. If the
agent processes an inbound mention and logs it but then crashes before sending a reply,
the function will find the `thread_ts` in the logged inbound message and incorrectly
conclude that a reply was sent. This defeats the purpose of the unanswered mention
detection feature, as lost messages will go undetected.
Did we get this right? 👍 / 👎 to inform future reviews.
Summary
Adds periodic checking for Slack app_mention events that didn't receive a reply, helping catch messages lost during agent restarts.
Problem
During control-agent restarts, messages forwarded to dead sockets are lost forever because the Slack bridge acks them before confirming agent receipt. This creates a window where user messages can be silently dropped.
Example incident: On 2026-02-27, 3 messages were lost during a 45-minute restart window (05:03-05:48 UTC).
Solution
This PR adds an
unanswered mentioncheck to the heartbeat extension that:Changes
checkUnansweredMentions()functionhasRepliedToThread()helper for reply detectionHEARTBEAT_CHECK_UNANSWERED_MENTIONSenv var (enabled by default)Testing
Tested locally during incident recovery - successfully detected the 3 lost messages from the restart.
Configuration
Follow-up Work
This is a detection/monitoring solution. Long-term reliability improvements would require changes to the bridge architecture (two-phase ack, persistent queue, etc) and potentially the broker service itself.