@EivMeyer
Collaborator

Problem

Zombie zeroshot processes remained after heroshot shipped PRs, blocking new agent spawns.

Root Cause

When CLUSTER_COMPLETE triggers orchestrator.stop(), the cluster state changes to 'stopped', but no SIGTERM is sent to the daemon process. The cleanup handlers registered via process.on('SIGTERM') therefore never run, and the process stays alive.
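
The failure mode can be reproduced in isolation. The sketch below is illustrative only: `fakeProcess`, `cluster`, and the `orchestrator` object are hypothetical stand-ins, not the project's actual objects. It shows that a state change on its own never invokes a handler that is registered for a signal event.

```javascript
const { EventEmitter } = require('node:events');

// Hypothetical stand-in for the daemon's `process` object, so the sketch
// can run without sending real signals.
const fakeProcess = new EventEmitter();

// Hypothetical stand-in for the orchestrator: stop() only updates state.
const cluster = { state: 'running' };
const orchestrator = {
  stop() {
    cluster.state = 'stopped'; // no SIGTERM is emitted here
  },
};

// Cleanup is wired to SIGTERM only, mirroring process.on('SIGTERM', ...).
let cleanupRan = false;
fakeProcess.on('SIGTERM', () => {
  cleanupRan = true;
});

orchestrator.stop();

// The state has changed, but the SIGTERM handler was never invoked.
console.log(cluster.state, cleanupRan);
```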

Fix

Added polling in setupDaemonCleanup() that detects when the cluster state changes to 'stopped' or 'completed' and then exits the process.
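
A minimal sketch of this kind of polling, under stated assumptions: the names `shouldExit`, `startStatePolling`, `getState`, `exit`, and `intervalMs`, as well as the one-second default interval, are hypothetical and are not taken from the actual change in setupDaemonCleanup().

```javascript
// Cluster states after which the daemon has nothing left to do.
const TERMINAL_STATES = new Set(['stopped', 'completed']);

// Returns true when the given cluster state means the process should exit.
function shouldExit(state) {
  return TERMINAL_STATES.has(state);
}

// Polls getState() every intervalMs milliseconds and calls exit(0) once
// a terminal state is observed. Returns the timer so callers can cancel it.
// getState and exit are injected (assumed names) to keep the sketch testable.
function startStatePolling(getState, exit, intervalMs = 1000) {
  const timer = setInterval(() => {
    if (shouldExit(getState())) {
      clearInterval(timer); // stop polling before exiting
      exit(0);
    }
  }, intervalMs);
  return timer;
}
```

In a daemon this might be wired up as `startStatePolling(() => cluster.state, process.exit)`, where `cluster.state` stands for however the real code reads the cluster state. Polling trades a small delay (at most one interval) for not having to change orchestrator.stop() itself.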

Testing

This fix was prompted by a heroshot run in which 3 zombie processes from already-shipped items (#1159, #1168) were blocking new spawns.

tomdps and others added 2 commits January 27, 2026 10:01
Release dev to main

---------

Co-authored-by: Eivind Meyer <eiv.meyer@gmail.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Eivind Meyer <eivind.meyer@ksat.no>
Co-authored-by: Michael Eichelbeck <141341133+mkceichelbeck@users.noreply.github.com>
Co-authored-by: Michael Eichelbeck <michael.eichelbeck.ext@wtsde.onmicrosoft.de>
CLUSTER_COMPLETE triggers orchestrator.stop() which sets state to 'stopped',
but no SIGTERM was sent to trigger the cleanup handlers.

Added polling in setupDaemonCleanup to detect state change and exit.

Fixes zombie zeroshot processes in heroshot runs.