
bug(exec): macOS Cloud Storage volumes (Dropbox/iCloud/Google Drive) cause EINTR failures and retry loops #84

@drpedapati

Summary

On macOS, workspaces located on cloud-synced volumes (Dropbox, iCloud Drive, Google Drive, OneDrive) can cause shell commands to fail with `EINTR` (Interrupted system call) errors. When this happens, the agent may retry the failing command repeatedly, causing long turn times and no useful output.

This issue does not affect Linux or Windows, where cloud sync tools use different mechanisms.

Root Cause

macOS implements cloud storage through the File Provider framework. The sync client's File Provider extension supplies content on demand, and the system represents cloud items as placeholders that are downloaded ("hydrated") when first accessed. When a shell command runs with its working directory set to a cloud-backed path:

  1. Bash calls `getcwd()` during shell initialization
  2. The File Provider may be hydrating (downloading) directory metadata
  3. A system call made while hydration is in progress can fail with `EINTR`
  4. Unlike glibc on Linux, the macOS libc does not automatically retry `getcwd()` when it fails with `EINTR`
  5. Bash fails with: `shell-init: error retrieving current directory: getcwd: cannot access parent directories: Interrupted system call`

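The agent sees this as a tool error and reasonably retries, but retrying rarely helps: whether a call is interrupted depends on the File Provider's hydration state, which the agent cannot control, so each retry can fail the same way while adding latency.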
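To check whether a given workspace triggers this failure, a small probe can start Bash in that directory and look for the `shell-init` message from step 5. The sketch below is Python; `probe_workspace` and its return shape are hypothetical, not part of sciClaw. Because the failure is intermittent, a clean run does not prove the workspace is safe.

```python
import subprocess

# Prefix of the error Bash prints when getcwd() fails during shell initialization.
SHELL_INIT_MARKER = "shell-init: error retrieving current directory"


def probe_workspace(path, timeout_s=10.0):
    """Hypothetical probe: start Bash in `path` and report whether init failed."""
    try:
        proc = subprocess.run(
            ["bash", "-c", "pwd"],
            cwd=path,
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return {"ok": False, "reason": "timeout"}
    except OSError as exc:
        # chdir() into `path` (or launching bash) can fail before the shell starts.
        return {"ok": False, "reason": f"spawn: {exc}"}
    if SHELL_INIT_MARKER in proc.stderr:
        return {"ok": False, "reason": "getcwd", "stderr": proc.stderr.strip()}
    return {"ok": proc.returncode == 0, "reason": "", "cwd": proc.stdout.strip()}
```

Running it a few times against the same cloud-backed directory gives a rough sense of how often initialization fails there.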

Affected Paths

Any workspace under macOS cloud-managed directories:

| Provider | Typical path |
|---|---|
| Dropbox | `~/Library/CloudStorage/Dropbox*/...` |
| iCloud Drive | `~/Library/Mobile Documents/com~apple~CloudDocs/...` |
| Google Drive | `~/Library/CloudStorage/GoogleDrive-*/...` |
| OneDrive | `~/Library/CloudStorage/OneDrive-*/...` |
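A hypothetical `is_cloud_storage_path` helper could warn users before they point a workspace at one of these locations. This sketch assumes the two roots behind the paths above (`~/Library/CloudStorage` and `~/Library/Mobile Documents`) cover the providers of interest. It compares paths lexically, so it never touches the possibly unhydrated filesystem, and it does not follow symlinks into cloud folders.

```python
import posixpath
from pathlib import PurePosixPath

# Assumed cloud-storage roots, relative to the home directory; extend as needed.
CLOUD_ROOTS = ("Library/CloudStorage", "Library/Mobile Documents")


def is_cloud_storage_path(path, home):
    """Hypothetical check: True if `path` lies under a cloud-storage root in `home`.

    Purely lexical: no filesystem access (so it cannot stall on hydration),
    and symlinks are not followed.
    """
    if path == "~" or path.startswith("~/"):
        path = home + path[1:]
    target = PurePosixPath(posixpath.normpath(path))
    for root in CLOUD_ROOTS:
        base = PurePosixPath(home) / root
        if target == base or base in target.parents:
            return True
    return False
```

For example, `is_cloud_storage_path("~/Library/CloudStorage/Dropbox/my-project", "/Users/alice")` is true, while the local `~/sciclaw/my-project` layout recommended below is not flagged.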

Symptoms

  • `list_dir` times out at 1500ms
  • `exec` commands take 20-50+ seconds then fail
  • Stderr contains: `Interrupted system call` or `getcwd: cannot access parent directories`
  • Agent loops through 10+ iterations retrying filesystem operations
  • User sees long delays with no response

Workarounds for Users

Option 1: Use a local workspace (Recommended)

Move or create your workspace on local storage:

```
# Instead of:
workspace: ~/Library/CloudStorage/Dropbox/my-project

# Use:
workspace: ~/sciclaw/my-project
```

You can still keep files synced by using symlinks or manual sync.
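For the manual-sync route, a one-way copy from the local workspace into the cloud folder is enough for many projects. This is a minimal sketch, not a sciClaw feature: `mirror_to_cloud` is a hypothetical name, it copies only files that are new or have a newer modification time, and it does not propagate deletions.

```python
import os
import shutil


def mirror_to_cloud(local_root, cloud_root):
    """Hypothetical one-way sync: copy new or updated files into `cloud_root`.

    Deletions are not propagated. Returns the relative paths that were copied.
    """
    copied = []
    for dirpath, _dirnames, filenames in os.walk(local_root):
        rel_dir = os.path.relpath(dirpath, local_root)
        dest_dir = os.path.normpath(os.path.join(cloud_root, rel_dir))
        os.makedirs(dest_dir, exist_ok=True)
        for name in sorted(filenames):
            src = os.path.join(dirpath, name)
            dst = os.path.join(dest_dir, name)
            # Skip files whose cloud copy is at least as new as the local one.
            if os.path.exists(dst) and os.path.getmtime(dst) >= os.path.getmtime(src):
                continue
            shutil.copy2(src, dst)  # copy2 also copies the modification time
            copied.append(os.path.normpath(os.path.join(rel_dir, name)))
    return copied
```

Because `copy2` preserves modification times, a second run with no local changes copies nothing.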

Option 2: Pre-hydrate the directory

Open the cloud folder in Finder before using sciClaw. This prompts macOS to materialize the folder's contents, which reduces (but does not eliminate) EINTR occurrences.
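The same pre-hydration can be scripted by enumerating the tree before starting a session. This hypothetical `prehydrate` sketch assumes that listing each directory and reading each entry's metadata is enough for the provider to materialize the directory listings, which is the part step 2 of the root cause depends on. It does not download file contents, and like the Finder approach it only lowers the chance of `EINTR`.

```python
import os


def prehydrate(root):
    """Hypothetical helper: list every directory under `root` and stat each file.

    Reads metadata only, not file contents. Returns (directories_visited, files_seen).
    """
    dirs_visited = 0
    files_seen = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        dirs_visited += 1
        for name in filenames:
            os.stat(os.path.join(dirpath, name))
            files_seen += 1
    return dirs_visited, files_seen
```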

Option 3: Use Linux for production

Deploy sciClaw on a Linux server, where Dropbox and other cloud sync clients typically run as daemons that keep real local copies of files. This avoids the macOS File Provider issue entirely.

Mitigations Implemented

1. Filesystem tool timeouts (v0.1.66-dev)

`read_file` and `list_dir` now have a 1500ms timeout with fail-open behavior. This prevents indefinite hangs but doesn't fix the underlying EINTR issue.
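One way to get this behavior is to run the filesystem call on a worker thread and stop waiting after the deadline. The sketch below illustrates that pattern and is not sciClaw's implementation; `call_with_timeout`, the `list_dir` wrapper, and the returned dictionary shape are assumptions. A call stuck in the kernel cannot be cancelled from Python, so its worker thread stays busy until the call returns; with `max_workers=4`, four stuck calls would exhaust the pool.

```python
import concurrent.futures
import os

# Shared worker pool: a blocked call keeps its worker busy, but the caller
# stops waiting for it.
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def call_with_timeout(timeout_s, fn, *args):
    """Run fn(*args) on the pool; return (True, result), or (False, None) on timeout."""
    future = _POOL.submit(fn, *args)
    try:
        return True, future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        return False, None


def list_dir(path, timeout_ms=1500):
    """Hypothetical list_dir that returns an error result instead of hanging."""
    ok, entries = call_with_timeout(timeout_ms / 1000, os.listdir, path)
    if not ok:
        return {"ok": False, "error": f"list_dir timed out after {timeout_ms}ms: {path}"}
    return {"ok": True, "entries": sorted(entries)}
```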

2. Exec diagnostics (v0.1.66-dev)

Stage-level timing in `exec` logs helps identify when the stall is in process wait (EINTR) vs. path validation:

```
tool.exec: cwd_validate_ms=0 guard_ms=0 start_ms=4 wait_ms=26900
```
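Reading these lines by hand gets tedious across many turns. A hypothetical parser for the `key_ms=N` fields can report which stage dominated. In the example line above, `wait` accounts for nearly all of the roughly 27 seconds, which points at process wait rather than path validation. The function names here are illustrative, not part of sciClaw.

```python
def parse_exec_timing(line):
    """Parse `<stage>_ms=<int>` fields from a tool.exec log line into {stage: ms}."""
    timings = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        # Skip tokens such as the "tool.exec:" prefix that are not timing fields.
        if sep and key.endswith("_ms") and value.isdigit():
            timings[key[: -len("_ms")]] = int(value)
    return timings


def slowest_stage(line):
    """Return the stage with the largest duration, or "" if none were found."""
    timings = parse_exec_timing(line)
    return max(timings, key=timings.get) if timings else ""
```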

Proposed Additional Fixes

  1. EINTR error classification: Detect `Interrupted system call` in exec stderr and classify as a terminal (non-retryable) filesystem error

  2. Circuit breaker: After 3 consecutive tool failures with filesystem errors, inject a system message telling the LLM to stop retrying

  3. Iteration cap: Set a default `max_tool_iterations` (e.g., 25) to bound worst-case turn duration

  4. User-friendly fallback: When filesystem degradation is detected, return a clear message:

    "This workspace is on a cloud-synced volume that is currently unavailable. Please try again or use a local workspace."
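The four proposals compose naturally. The Python sketch below shows one possible arrangement. The marker strings and the user-facing message come from this issue; `is_terminal_fs_error`, `FsCircuitBreaker`, `run_turn`, and the `next_tool_call` callback are hypothetical names, not sciClaw APIs. For brevity, it ends the turn with the user-facing message when the breaker trips, instead of injecting a system message for the LLM as item 2 proposes.

```python
from dataclasses import dataclass

# Stderr fragments that indicate the cloud-storage failure described in this issue.
FS_ERROR_MARKERS = (
    "Interrupted system call",
    "getcwd: cannot access parent directories",
)

FALLBACK_MESSAGE = (
    "This workspace is on a cloud-synced volume that is currently unavailable. "
    "Please try again or use a local workspace."
)


def is_terminal_fs_error(stderr):
    """Item 1: classify an exec failure as a non-retryable filesystem error."""
    return any(marker in stderr for marker in FS_ERROR_MARKERS)


@dataclass
class FsCircuitBreaker:
    """Item 2 (hypothetical): trip after `threshold` consecutive filesystem failures."""

    threshold: int = 3
    consecutive: int = 0

    def record(self, ok, stderr=""):
        # Any success or non-filesystem failure breaks the streak.
        if ok or not is_terminal_fs_error(stderr):
            self.consecutive = 0
        else:
            self.consecutive += 1
        return self.consecutive >= self.threshold


def run_turn(next_tool_call, max_tool_iterations=25):
    """Items 3 and 4: bound the turn and return the fallback message on a trip.

    `next_tool_call` is a hypothetical callback that runs the model's next tool
    call and returns (done, ok, stderr).
    """
    breaker = FsCircuitBreaker()
    for iteration in range(1, max_tool_iterations + 1):
        done, ok, stderr = next_tool_call()
        if done:
            return "completed", iteration
        if breaker.record(ok, stderr):
            return FALLBACK_MESSAGE, iteration
    return "iteration cap reached", max_tool_iterations
```

Because the breaker resets on any success or non-filesystem failure, it fires only on consecutive cloud-storage errors, while the iteration cap still bounds every other kind of loop.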

Why Linux Doesn't Have This Problem

On Linux, cloud sync tools (Dropbox, rclone, etc.) typically:

  • Run as a userspace daemon
  • Sync files to real local storage
  • Don't use kernel-level virtual filesystems

The files are ordinary local files, so `getcwd()` does not wait on cloud hydration and does not fail with `EINTR` for this reason.

Definition of Done

  • EINTR errors classified as terminal (non-retryable)
  • Circuit breaker prevents infinite retry loops
  • Default iteration cap configured
  • User receives actionable error message
  • Integration test verifies fail-fast behavior
  • Documentation updated with cloud storage guidance

Labels: `bug` (Something isn't working), `priority:high` (High priority)