feat(reconnection): add slow probe mode after max attempts by Milofax · Pull Request #94 · chris-schra/mcp-funnel

Milofax · 2026-01-16T21:29:18Z

Summary

After max reconnection attempts are reached, the proxy now enters "slow probe mode" instead of giving up
Periodically attempts to reconnect every 60 seconds (circuit breaker pattern best practice)
Proper cleanup on shutdown and on successful reconnection
Fixed: Transport cache is cleared before each reconnect attempt to ensure fresh transport creation

Motivation

When actively developing local MCP servers (e.g., custom integrations, Docker-based services), frequent restarts are common during development. Currently, when a backend MCP server restarts, mcp-funnel reaches max reconnection attempts and gives up completely - requiring users to restart their entire Claude Code / mcp-funnel session just to reconnect.

This is especially frustrating when:

Developing and testing MCP servers locally
Running MCP servers in Docker containers that restart on code changes
Managing multiple MCP servers where one might temporarily be down

Problem

When a backend MCP server restarts (e.g., Docker container restart), the proxy eventually reaches max reconnection attempts and gives up completely. Users then need to restart the entire mcp-funnel process to reconnect.

Solution

Implements the "slow probe" phase of the circuit breaker pattern:

After exponential backoff exhausts max attempts, enter slow probe mode
Attempt reconnection every 60 seconds
Clear transport cache before each attempt (critical fix - closed transports were being reused)
On success, stop probing and resume normal operation
Emit events for monitoring (server.slow_probe_started, recovery logs)

This follows industry best practices (AWS, Microsoft) for resilient service connections - never give up entirely, just back off to a slower probe interval.
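
The steps above can be sketched roughly as follows. This is a minimal illustration, not the code in server-connection-manager.ts: the `SlowProbeScheduler` class, its method names, and the `reconnect` / `clearCache` callbacks are hypothetical stand-ins. Only the 60-second interval and the "clear cache, try, stop on success" order come from the description.

```typescript
// Hypothetical sketch: SlowProbeScheduler is NOT the PR's actual class.
type Reconnect = () => Promise<boolean>; // assumed: resolves true once the server is back

const SLOW_PROBE_INTERVAL_MS = 60_000; // 60 seconds, as stated in the PR

class SlowProbeScheduler {
  private readonly timers = new Map<string, ReturnType<typeof setInterval>>();
  private readonly intervalMs: number;

  constructor(intervalMs: number = SLOW_PROBE_INTERVAL_MS) {
    this.intervalMs = intervalMs;
  }

  // Intended to be called once exponential backoff has used up its attempts.
  start(serverName: string, reconnect: Reconnect, clearCache: () => void): void {
    if (this.timers.has(serverName)) return; // never run two timers per server
    const timer = setInterval(() => {
      void this.probeOnce(serverName, reconnect, clearCache);
    }, this.intervalMs);
    this.timers.set(serverName, timer);
  }

  // One probe: drop cached transports, try to reconnect, stop probing on success.
  async probeOnce(
    serverName: string,
    reconnect: Reconnect,
    clearCache: () => void,
  ): Promise<boolean> {
    try {
      clearCache();
      const ok = await reconnect();
      if (ok) this.stop(serverName);
      return ok;
    } catch {
      return false; // a failed probe leaves the timer running
    }
  }

  stop(serverName: string): void {
    const timer = this.timers.get(serverName);
    if (timer !== undefined) {
      clearInterval(timer);
      this.timers.delete(serverName);
    }
  }

  // Shutdown path: cancel every outstanding probe timer.
  stopAll(): void {
    this.timers.forEach((timer) => clearInterval(timer));
    this.timers.clear();
  }

  isProbing(serverName: string): boolean {
    return this.timers.has(serverName);
  }
}
```

Keeping one timer per server name in a Map makes both the duplicate-start guard and shutdown cleanup a single lookup or a single loop.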

Changes

server-connection-manager.ts:

Add slowProbeTimers Map and SLOW_PROBE_INTERVAL_MS constant (60s)
Add startSlowProbeMode() and stopSlowProbeMode() private methods
Modify onMaxAttemptsReached callback to call startSlowProbeMode instead of just deleting the manager
Update shutdown() to clean up slow probe timers
Update connectToSingleServer() to stop slow probe mode on successful connection
Call clearTransportCache() before each reconnection attempt

transport-cache.ts:

Add removeCachedTransport() helper function
Add invalidateCachedTransport() helper function

transport/index.ts:

Export new cache management functions
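
To illustrate why the cache has to be cleared before reconnecting, here is a small self-contained sketch. The function names match those listed above, but their signatures and behavior are assumptions made for illustration, as are the `Transport` interface and `getOrCreateTransport`; the real transport-cache.ts may differ.

```typescript
// Assumed shape of a transport; the real type in mcp-funnel may differ.
interface Transport {
  id: number;
  closed: boolean;
  close(): void;
}

let nextId = 1;

// Hypothetical factory standing in for real transport creation.
function createTransport(): Transport {
  const transport: Transport = {
    id: nextId++,
    closed: false,
    close() {
      transport.closed = true;
    },
  };
  return transport;
}

const transportCache = new Map<string, Transport>();

// Returns the cached transport for a key, creating one only on a cache miss.
// Note that it does not check `closed`, which is how a stale entry gets reused.
function getOrCreateTransport(key: string): Transport {
  let transport = transportCache.get(key);
  if (transport === undefined) {
    transport = createTransport();
    transportCache.set(key, transport);
  }
  return transport;
}

// Assumed semantics: drop the entry without touching the transport.
function removeCachedTransport(key: string): boolean {
  return transportCache.delete(key);
}

// Assumed semantics: close the transport if still open, then drop the entry.
function invalidateCachedTransport(key: string): void {
  const transport = transportCache.get(key);
  if (transport !== undefined) {
    if (!transport.closed) transport.close();
    transportCache.delete(key);
  }
}

// Drop every entry so the next lookup builds a fresh transport.
function clearTransportCache(): void {
  transportCache.clear();
}
```

With this shape, a lookup after the backend restarts hands back the same closed object until the cache is cleared, which matches the failure described in the PR.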

Test plan

Unit tests for slow probe mode start/stop (9 tests passing)
E2E test: Graphiti Docker container restart → automatic recovery via slow probe ✅

🤖 Generated with Claude Code

The README.md files were creating a loop:
- /README.md → packages/mcp/README.md
- /packages/mcp/README.md → ../../README.md

This caused npm to fail with ELOOP when spawning MCP servers via npx.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Instead of giving up when max reconnection attempts are reached, the proxy now enters "slow probe mode", periodically attempting to reconnect every 60 seconds. This follows the circuit breaker pattern best practice.

Changes:
- Add slowProbeTimers Map and SLOW_PROBE_INTERVAL_MS constant
- Add startSlowProbeMode() and stopSlowProbeMode() methods
- Modify onMaxAttemptsReached to enter slow probe mode
- Update shutdown() to clean up slow probe timers
- Stop slow probe mode on successful connection

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Tests cover:
- Slow probe mode lifecycle (start, stop, prevent duplicates)
- Reconnection attempts every 60 seconds
- Successful reconnection stops probing
- Shutdown cleanup of all timers
- Manual disconnect request handling

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

The root cause of failed reconnections was transport caching: closed transports were being reused instead of creating fresh ones.

Changes:
- Add clearTransportCache() call before each slow probe reconnect attempt
- Add invalidateCachedTransport() and removeCachedTransport() helpers
- Export cache management functions from transport index
- Update tests to mock clearTransportCache

This fix was verified with E2E testing: Graphiti container restart now successfully auto-reconnects via slow probe mode.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Problem: Claude Code starts new MCP-Funnel processes for each session but doesn't always send clean shutdown signals. This leads to zombie processes accumulating (10+ processes, 1.2GB RAM observed).

Solution: Implement PID-file based singleton pattern:
- Check ~/.mcp-funnel.pid on startup
- Kill stale process if still running (SIGTERM then SIGKILL)
- Write current PID to file
- Clean up PID file on shutdown

This ensures only one MCP-Funnel instance runs at a time.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Milofax · 2026-01-26T14:18:21Z

Additional Fix: Singleton Enforcement (Commit `74049f3`)

Added PID-file based singleton pattern to prevent zombie processes.

Problem

Claude Code starts new MCP-Funnel processes per session but doesn't always send clean shutdown signals. This leads to zombie processes accumulating (observed: 10+ processes, ~1.2GB RAM).

Solution

Check ~/.mcp-funnel.pid on startup
Kill stale process if still running
Write current PID to file
Clean up on shutdown

Background

This is a known MCP ecosystem issue: typescript-sdk#208

The VS Code/LSP approach passes host PID for periodic liveness checks. Since Claude Code doesn't support this, PID-file singleton is a practical workaround.
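
The four steps under Solution could look roughly like the sketch below. It is an assumption-laden illustration, not the code from commit 74049f3: `claimPidFile`, `releasePidFile`, `readPidFile`, and `pidIsRunning` are hypothetical names, and the SIGKILL escalation mentioned in the commit message is left out for brevity. Only the file path `~/.mcp-funnel.pid` comes from the PR.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Path taken from the PR description.
const DEFAULT_PID_FILE = path.join(os.homedir(), ".mcp-funnel.pid");

// Signal 0 performs an existence check without delivering a signal.
function pidIsRunning(pid: number): boolean {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM means the process exists but belongs to another user.
    return (err as { code?: string }).code === "EPERM";
  }
}

// Returns the PID stored in the file, or undefined if missing or malformed.
function readPidFile(pidFile: string): number | undefined {
  try {
    const pid = Number.parseInt(fs.readFileSync(pidFile, "utf8").trim(), 10);
    return Number.isInteger(pid) && pid > 0 ? pid : undefined;
  } catch {
    return undefined;
  }
}

// Startup: ask a still-running previous instance to exit, then record our PID.
// Returns the PID that was signalled, if any.
function claimPidFile(pidFile: string = DEFAULT_PID_FILE): number | undefined {
  const previous = readPidFile(pidFile);
  let signalled: number | undefined;
  if (previous !== undefined && previous !== process.pid && pidIsRunning(previous)) {
    try {
      process.kill(previous, "SIGTERM");
      signalled = previous;
    } catch {
      // The process exited between the check and the signal; nothing to do.
    }
  }
  fs.writeFileSync(pidFile, String(process.pid));
  return signalled;
}

// Shutdown: remove the file only if it still names this process.
function releasePidFile(pidFile: string = DEFAULT_PID_FILE): void {
  if (readPidFile(pidFile) === process.pid) {
    fs.rmSync(pidFile, { force: true });
  }
}
```

Checking ownership in `releasePidFile` avoids deleting a file that a newer instance has already claimed.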

The previous singleton approach killed ALL other mcp-funnel processes, breaking multi-terminal setups where each Claude Code session needs its own mcp-funnel.

New approach: Only kill processes whose parent (Claude Code) no longer exists. This cleans up true zombies while keeping active sessions alive.
- Add isOrphanedProcess() to check if parent PID exists
- Rename enforceSingleton() to cleanupOrphanedProcesses()
- Log when skipping processes with active parents

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
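
A parent-existence check of this kind might be sketched as below. The name `isOrphanedProcess` comes from the commit message, but its signature, the `ps`-based parent lookup, and the treatment of PID 1 as "reparented to init" are all assumptions for illustration (POSIX systems only); the actual implementation is not shown in this PR text.

```typescript
import { execFileSync } from "node:child_process";

// Signal 0 checks for existence without delivering a signal.
function processExists(pid: number): boolean {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM means the process exists but we may not signal it.
    return (err as { code?: string }).code === "EPERM";
  }
}

// Assumed lookup: ask `ps` for the parent PID (works on Linux and macOS).
function getParentPid(pid: number): number | undefined {
  try {
    const out = execFileSync("ps", ["-o", "ppid=", "-p", String(pid)], {
      encoding: "utf8",
    });
    const ppid = Number.parseInt(out.trim(), 10);
    return Number.isNaN(ppid) ? undefined : ppid;
  } catch {
    return undefined; // process not found or ps unavailable
  }
}

// A process counts as orphaned when its parent is gone. On most POSIX systems
// an orphan is reparented to PID 1, so that case is treated as orphaned too.
// The lookup functions are injectable so the decision logic can be tested.
function isOrphanedProcess(
  pid: number,
  lookupParent: (pid: number) => number | undefined = getParentPid,
  exists: (pid: number) => boolean = processExists,
): boolean {
  const ppid = lookupParent(pid);
  if (ppid === undefined) return false; // unknown parent: leave the process alone
  return ppid === 1 || !exists(ppid);
}
```

Returning `false` when the parent cannot be determined errs on the side of keeping a possibly active session alive.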

chris-schra · 2026-03-02T21:18:52Z

thanks! Sorry, was totally busy - still interested in merging this?

Milofax · 2026-03-04T21:34:11Z

Yes, absolutely still interested! Ready to merge whenever you are. Let me know if you need any changes.

Milofax and others added 6 commits January 4, 2026 09:27

Merge remote-tracking branch 'upstream/develop' into develop

c9baa04

Milofax wants to merge 7 commits into chris-schra:develop from Milofax:feature/auto-reconnect-probe-mode
