-
Notifications
You must be signed in to change notification settings - Fork 749
More Durable Shell Bug Fixes #2822
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
sawka
commented
Feb 4, 2026
- Lots of work on ResyncController, issues with stale data, and connection switching
- Add+Fix termination messages for shell controller
- Prune done/detached jobs on a timer
- Remove circular dependencies from wcore, use blockclose event in both jobcontroller and blockcontroller instead of direct calls
- Fix concurrency bugs with job termination
- Send object update events when modifying the block datastructure in jobcontroller
…ecially when updating connection
…ng blocks in jobcontroller
WalkthroughThis pull request introduces durability configuration tracking for block shells, automatic job lifecycle management, and refactors signal/process utilities into a centralized package. On the backend, it adds block controller initialization during server startup, implements block-close event handling and automatic job pruning, extends the durable shell controller with a connection name field and forced-restart capability, and consolidates Unix-specific signal and process utilities into a new Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes 🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Important Action Needed: IP Allowlist UpdateIf your organization protects your Git platform with IP whitelisting, please add the new CodeRabbit IP address to your allowlist:
Reviews will stop working after February 8, 2026 if the new IP is not added to your allowlist. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
frontend/app/view/term/term-model.ts (1)
1143-1163:⚠️ Potential issue | 🟡 MinorMissing
fireAndForgetwrapper for async menu click handlers.The
restartSessionWithDurabilitymethod is async, but the click handlers at lines 1150 and 1160 call it directly without wrapping infireAndForget(). This is inconsistent with other async click handlers in this file (e.g., lines 316, 700, 1045) and could result in unhandled promise rejections.🔧 Proposed fix
{ label: "Restart Session in Standard Mode", - click: () => this.restartSessionWithDurability(false), + click: () => fireAndForget(() => this.restartSessionWithDurability(false)), },{ label: "Restart Session in Durable Mode", - click: () => this.restartSessionWithDurability(true), + click: () => fireAndForget(() => this.restartSessionWithDurability(true)), },
🧹 Nitpick comments (6)
pkg/util/utilfn/utilfn.go (1)
592-604: Clarify: duplicates froms2are preserved in the result.The function name implies set semantics, but if
s2contains duplicate elements, they will appear multiple times in the result. This differs fromCombineStrArrays(which deduplicates) and may surprise callers expecting true set behavior.If set semantics are intended, consider deduplicating:
♻️ Optional fix to deduplicate results
func StrSetIntersection(s1 []string, s2 []string) []string { set := make(map[string]bool) for _, s := range s1 { set[s] = true } var rtn []string for _, s := range s2 { - if set[s] { + if set[s] { + delete(set, s) // ensure each element appears only once rtn = append(rtn, s) } } return rtn }If preserving
s2's duplicates is intentional (e.g., for counting or ordering purposes), consider renaming to clarify behavior (e.g.,StrFilterBySet).pkg/blockcontroller/shellcontroller.go (1)
600-615: Consider avoiding duplicate exit notifications for cmd blocks.
The pty read loop already appends “process finished …” for cmd blocks; this new muted “[command exited …]” can show twice. If that’s not desired, consider gating one of them.pkg/blockcontroller/blockcontroller.go (2)
122-129: Log subscription failures to avoid silent BlockClose misses.
Right now, a subscription error would fail silently. Consider logging the error.Proposed change
func InitBlockController() { rpcClient := wshclient.GetBareRpcClient() rpcClient.EventListener.On(wps.Event_BlockClose, handleBlockCloseEvent) - wshclient.EventSubCommand(rpcClient, wps.SubscriptionRequest{ + if err := wshclient.EventSubCommand(rpcClient, wps.SubscriptionRequest{ Event: wps.Event_BlockClose, AllScopes: true, - }, nil) + }, nil); err != nil { + log.Printf("[blockcontroller] error subscribing to blockclose: %v", err) + } }
153-167: Avoid destroying controllers when conn name is still unknown.
WhenShellProcConnNameis empty (e.g., before the proc starts), the current check can trigger unnecessary destroy/recreate cycles. Consider guarding on a non-empty runtime conn name.Proposed change
if existing != nil { existingStatus := existing.GetRuntimeStatus() - if existingStatus.ShellProcConnName != connName { + if existingStatus.ShellProcConnName != "" && existingStatus.ShellProcConnName != connName { log.Printf("stopping blockcontroller %s due to conn change (from %q to %q)\n", blockId, existingStatus.ShellProcConnName, connName) DestroyBlockController(blockId) time.Sleep(100 * time.Millisecond) existing = nil } }pkg/jobcontroller/jobcontroller.go (2)
290-332: Re-validate job state before delete to avoid race window.
A job could reattach between scan and delete; a quick re-check avoids accidental deletes.Proposed change
for _, jobId := range jobsToDelete { - err := DeleteJob(ctx, jobId) + job, err := wstore.DBGet[*waveobj.Job](ctx, jobId) + if err != nil || job == nil { + if err != nil { + log.Printf("[jobpruner] error reloading job %s: %v", jobId, err) + } + continue + } + if job.JobManagerStatus != JobManagerStatus_Done || job.AttachedBlockId != "" { + continue + } + err = DeleteJob(ctx, jobId) if err != nil { log.Printf("[jobpruner] error deleting job %s: %v", jobId, err) } }
394-419: Consider offloading BlockClose termination to avoid blocking the event loop.
Termination can be slow; running it asynchronously keeps the event handler responsive.Proposed change
func handleBlockCloseEvent(event *wps.WaveEvent) { - ctx, cancelFn := context.WithTimeout(context.Background(), 5*time.Second) - defer cancelFn() blockId, ok := event.Data.(string) if !ok { log.Printf("[blockclose] invalid event data type") return } - jobIds, err := wstore.WithTxRtn(ctx, func(tx *wstore.TxWrap) ([]string, error) { - query := `SELECT oid FROM db_job WHERE json_extract(data, '$.attachedblockid') = ?` - jobIds := tx.SelectStrings(query, blockId) - return jobIds, nil - }) - if err != nil { - log.Printf("[block:%s] error looking up jobids: %v", blockId, err) - return - } - if len(jobIds) == 0 { - return - } - - for _, jobId := range jobIds { - TerminateAndDetachJob(ctx, jobId) - } + go func(blockId string) { + ctx, cancelFn := context.WithTimeout(context.Background(), 5*time.Second) + defer cancelFn() + jobIds, err := wstore.WithTxRtn(ctx, func(tx *wstore.TxWrap) ([]string, error) { + query := `SELECT oid FROM db_job WHERE json_extract(data, '$.attachedblockid') = ?` + jobIds := tx.SelectStrings(query, blockId) + return jobIds, nil + }) + if err != nil { + log.Printf("[block:%s] error looking up jobids: %v", blockId, err) + return + } + for _, jobId := range jobIds { + TerminateAndDetachJob(ctx, jobId) + } + }(blockId) }