Description
Summary
On macOS, workspaces located on cloud-synced volumes (Dropbox, iCloud Drive, Google Drive, OneDrive) can cause shell commands to fail with `EINTR` (Interrupted system call) errors. When this happens, the agent may retry the failing command repeatedly, causing long turn times and no useful output.
This issue does not affect Linux or Windows, where cloud sync tools use different mechanisms.
Root Cause
macOS implements cloud storage through File Provider extensions, which present cloud files through a virtual filesystem and fetch their contents on demand. When a shell command runs with its working directory set to a cloud-backed path:
- Bash calls `getcwd()` during shell initialization
- The File Provider may be hydrating (downloading) directory metadata
- The blocked `getcwd()` call is interrupted during hydration and fails with errno `EINTR` (`EINTR` is an error code, not a signal)
- Unlike Linux's glibc, macOS does not automatically retry `getcwd()` on `EINTR`
- Bash fails with: `shell-init: error retrieving current directory: getcwd: cannot access parent directories: Interrupted system call`
The agent sees this as a tool error and reasonably retries. Retrying rarely helps, though: the failure depends on the File Provider's hydration state, so each attempt is non-deterministic and can fail the same way.
Affected Paths
Any workspace under macOS cloud-managed directories:
| Provider | Typical Path |
|---|---|
| Dropbox | `~/Library/CloudStorage/Dropbox-*/...` |
| iCloud Drive | ` |
| Google Drive | `~/Library/CloudStorage/GoogleDrive-*/...` |
| OneDrive | `~/Library/CloudStorage/OneDrive-*/...` |
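A workspace path can be checked against the `~/Library/CloudStorage` prefixes in the table before any command runs. The following is a minimal Python sketch, not sciClaw's implementation: `cloud_provider_for` and its `home` parameter are hypothetical names, the check is purely lexical (it does not resolve symlinks), and iCloud Drive is left out because its path is not given in the table.

```python
from pathlib import PurePosixPath
from typing import Optional

# Provider folder prefixes taken from the table above. iCloud Drive is
# omitted because its path is not listed there.
PROVIDER_PREFIXES = ("Dropbox", "GoogleDrive", "OneDrive")


def cloud_provider_for(path: str, home: str) -> Optional[str]:
    """Return the provider name if `path` is under <home>/Library/CloudStorage.

    Both arguments are assumed to be absolute, already-expanded paths.
    """
    root = PurePosixPath(home) / "Library" / "CloudStorage"
    try:
        relative = PurePosixPath(path).relative_to(root)
    except ValueError:
        return None  # not under the CloudStorage root
    if not relative.parts:
        return None  # the root itself, no provider folder
    folder = relative.parts[0]
    for prefix in PROVIDER_PREFIXES:
        # Matches "Dropbox" as well as "Dropbox-<account>".
        if folder == prefix or folder.startswith(prefix + "-"):
            return prefix
    return None
```

A caller could use a non-`None` result to warn the user up front instead of waiting for the first failed command.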
Symptoms
- `list_dir` times out at 1500ms
- `exec` commands take 20-50+ seconds then fail
- Stderr contains: `Interrupted system call` or `getcwd: cannot access parent directories`
- Agent loops through 10+ iterations retrying filesystem operations
- User sees long delays with no response
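The stderr signatures listed above are specific enough to match with plain substring checks. This is a minimal Python sketch under that assumption; `is_cloud_fs_error` is a hypothetical helper name, not an existing sciClaw function.

```python
# Substrings taken from the stderr symptoms listed above.
EINTR_SIGNATURES = (
    "Interrupted system call",
    "getcwd: cannot access parent directories",
)


def is_cloud_fs_error(stderr: str) -> bool:
    """Return True if stderr contains one of the known EINTR signatures."""
    return any(signature in stderr for signature in EINTR_SIGNATURES)
```

Matching on message text is brittle across locales and shell versions, so a real classifier would probably also want the exit status or errno where available.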
Workarounds for Users
Option 1: Use a local workspace (Recommended)
Move or create your workspace on local storage:
```bash
# Instead of:
workspace: ~/Library/CloudStorage/Dropbox/my-project
# Use:
workspace: ~/sciclaw/my-project
```
You can still keep files synced by using symlinks or manual sync.
Option 2: Pre-hydrate the directory
Open the cloud folder in Finder before using sciClaw. This forces macOS to download the files, reducing (but not eliminating) EINTR occurrences.
Option 3: Use Linux for production
Deploy sciClaw on a Linux server, where Dropbox and similar cloud sync tools use a traditional daemon-based approach with real local files. This avoids the macOS File Provider issue entirely.
Mitigations Implemented
1. Filesystem tool timeouts (v0.1.66-dev)
`read_file` and `list_dir` now have a 1500ms timeout with fail-open behavior. This prevents indefinite hangs but doesn't fix the underlying EINTR issue.
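The timeout pattern can be sketched as follows. This is an illustration in Python, not the actual sciClaw code: `call_with_timeout` is a hypothetical helper, and it assumes "fail-open" means returning a fallback value together with a timed-out flag rather than raising.

```python
import threading
from typing import Any, Callable, Tuple


def call_with_timeout(fn: Callable[[], Any], timeout_s: float,
                      fallback: Any) -> Tuple[Any, bool]:
    """Run fn() in a daemon thread; return (result, timed_out).

    On timeout the fallback is returned and the worker thread is abandoned.
    It may stay blocked in the filesystem call, but because it is a daemon
    thread it does not keep the process alive.
    """
    outcome = {}

    def worker() -> None:
        try:
            outcome["value"] = fn()
        except Exception as exc:  # passed back to the caller below
            outcome["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_s)
    if thread.is_alive():
        return fallback, True  # fail open: do not block the caller
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"], False
```

With the 1500ms limit described above, a directory listing would be wrapped as `call_with_timeout(lambda: os.listdir(path), 1.5, [])` (this needs `import os`).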
2. Exec diagnostics (v0.1.66-dev)
Stage-level timing in `exec` logs helps identify when the stall is in process wait (EINTR) vs. path validation:
```
tool.exec: cwd_validate_ms=0 guard_ms=0 start_ms=4 wait_ms=26900
```
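When scanning many such log lines, the stalled stage can be picked out mechanically. The sketch below assumes the `name_ms=value` format shown in the log line above; `parse_exec_timing` and `slowest_stage` are hypothetical helper names.

```python
import re
from typing import Dict, Tuple

# Matches fields such as "wait_ms=26900".
STAGE_RE = re.compile(r"(\w+_ms)=(\d+)")


def parse_exec_timing(line: str) -> Dict[str, int]:
    """Extract every *_ms field from a tool.exec log line."""
    return {name: int(value) for name, value in STAGE_RE.findall(line)}


def slowest_stage(line: str) -> Tuple[str, int]:
    """Return the (stage, milliseconds) pair with the largest duration."""
    timings = parse_exec_timing(line)
    if not timings:
        raise ValueError("no *_ms fields found in log line")
    name = max(timings, key=timings.get)
    return name, timings[name]
```

For the line above, `slowest_stage` reports `wait_ms`, which points at the process wait rather than path validation.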
Proposed Additional Fixes
- EINTR error classification: Detect `Interrupted system call` in exec stderr and classify as a terminal (non-retryable) filesystem error
- Circuit breaker: After 3 consecutive tool failures with filesystem errors, inject a system message telling the LLM to stop retrying
- Iteration cap: Set a default `max_tool_iterations` (e.g., 25) to bound worst-case turn duration
- User-friendly fallback: When filesystem degradation is detected, return a clear message: "This workspace is on a cloud-synced volume that is currently unavailable. Please try again or use a local workspace."
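The circuit breaker and iteration cap could share one small piece of loop state. This is a minimal Python sketch of the proposal, not an implementation that exists today: `ToolLoopGuard` and the returned action strings are hypothetical names, the thresholds are the values suggested above, and it assumes that any success or non-filesystem failure resets the failure streak.

```python
from dataclasses import dataclass

CONTINUE = "continue"
STOP_RETRYING = "stop_retrying"   # circuit breaker tripped
ITERATION_CAP = "iteration_cap"   # max_tool_iterations reached


@dataclass
class ToolLoopGuard:
    failure_threshold: int = 3      # consecutive filesystem failures allowed
    max_tool_iterations: int = 25   # upper bound on tool calls per turn
    consecutive_fs_failures: int = 0
    iterations: int = 0

    def record(self, ok: bool, filesystem_error: bool = False) -> str:
        """Record one tool call and return what the agent loop should do next."""
        self.iterations += 1
        if not ok and filesystem_error:
            self.consecutive_fs_failures += 1
        else:
            # Assumption: a success or an unrelated failure resets the streak.
            self.consecutive_fs_failures = 0
        if self.consecutive_fs_failures >= self.failure_threshold:
            return STOP_RETRYING
        if self.iterations >= self.max_tool_iterations:
            return ITERATION_CAP
        return CONTINUE
```

On `STOP_RETRYING` the loop would inject the system message and return the user-facing fallback text. On `ITERATION_CAP` it would end the turn regardless of the error type.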
Why Linux Doesn't Have This Problem
On Linux, cloud sync tools (Dropbox, rclone, etc.) typically:
- Run as a userspace daemon
- Sync files to real local storage
- Don't use kernel-level virtual filesystems
The files are normal local files, so `getcwd()` is not interrupted by on-demand hydration and does not fail with `EINTR`.
References
- Apple File Provider documentation
- EINTR and system call interruption
- Bash source: `shell.c` — `getcwd()` failure handling
Definition of Done
- EINTR errors classified as terminal (non-retryable)
- Circuit breaker prevents infinite retry loops
- Default iteration cap configured
- User receives actionable error message
- Integration test verifies fail-fast behavior
- Documentation updated with cloud storage guidance