-
Notifications
You must be signed in to change notification settings - Fork 45
Closed
Description
Bug Description
File: pdd/sync_determine_operation.py
Line: 183
Type: Resource Leak
The SyncLock.acquire() method opens a file without guaranteeing it will be closed if certain exceptions occur, leading to file descriptor leaks in long-running processes.
Location
# pdd/sync_determine_operation.py:183-200
def acquire(self):
try:
# ... setup code ...
# Line 183: File opened, FD assigned
self.fd = open(self.lock_file, 'w') # ← PROBLEM: No guaranteed cleanup
# Lines 185-194: Operations that can raise various exceptions
if HAS_FCNTL:
fcntl.flock(self.fd.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
elif HAS_MSVCRT:
msvcrt.locking(self.fd.fileno(), msvcrt.LK_NBLCK, 1)
self.fd.write(str(self.current_pid))
self.fd.flush()
except (IOError, OSError) as e: # ← Only catches IOError and OSError!
if self.fd:
self.fd.close()
self.fd = None
raise TimeoutError(f"Failed to acquire lock: {e}")Root Cause
The file is opened at line 183, but the exception handler only catches (IOError, OSError). If any other exception occurs (e.g., KeyboardInterrupt, RuntimeError, MemoryError), the file descriptor is never closed.
Impact
When Does This Leak Occur?
- User presses Ctrl+C during lock acquisition
- Unexpected exceptions (not IOError/OSError) during fcntl.flock()
- System interrupts during the lock acquisition process
- Out of memory or other system errors
Consequences
- File descriptors accumulate over time in long-running processes
- Eventually causes:
OSError: [Errno 24] Too many open files - Process crashes requiring manual restart
- Affects:
- PDD server running continuously
- Automated scripts with repeated sync operations
- CI/CD pipelines
- Users frequently interrupting operations
Proof of Concept
Automated Test Results
Created isolated test in bug_verification_workspace/simple_leak_test.py:
$ python3 simple_leak_test.py
======================================================================
SIMPLE FILE HANDLE LEAK TEST
======================================================================
Process PID: 9908
1. Initial state (no locks):
Open .lock files: 0
2. Normal lock acquisition and release:
✓ Lock acquired
Open .lock files: 1
✓ Lock released
Open .lock files after release: 0
3. Interrupted lock acquisition (simulating Ctrl+C):
Acquiring lock (will be interrupted)...
✓ KeyboardInterrupt occurred (as expected)
Open .lock files after interrupt: 1
🔴 LEAK DETECTED!
The following .lock files are still open:
Python 9908 9w REG .../test_interrupt_python.lock
======================================================================
RESULT: Bug confirmed - file handle leak exists
======================================================================
Manual Reproduction
import fcntl
from sync_determine_operation import SyncLock
# Simulate interrupt during lock acquisition
original_flock = fcntl.flock
fcntl.flock = lambda fd, op: (_ for _ in ()).throw(KeyboardInterrupt())
lock = SyncLock("test", "python")
try:
lock.acquire() # File opened at line 183, but never closed
except KeyboardInterrupt:
pass
# Check with: lsof -p <PID> | grep .lock
# The .lock file will still be open!Recommended Fix
Option 1: Try-Finally Block (Minimal Change)
def acquire(self):
self.lock_file.parent.mkdir(parents=True, exist_ok=True)
try:
# ... re-entrancy checks ...
self.lock_file.touch()
self.fd = open(self.lock_file, 'w')
try:
# Critical section - must close file if anything fails
if HAS_FCNTL:
fcntl.flock(self.fd.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
elif HAS_MSVCRT:
msvcrt.locking(self.fd.fileno(), msvcrt.LK_NBLCK, 1)
self.fd.write(str(self.current_pid))
self.fd.flush()
except:
# Close file on ANY exception (not just IOError/OSError)
if self.fd:
self.fd.close()
self.fd = None
raise
except (IOError, OSError) as e:
if self.fd:
self.fd.close()
self.fd = None
raise TimeoutError(f"Failed to acquire lock: {e}")Option 2: Context Manager (More Pythonic)
Use context manager for initial file operations, then duplicate FD:
def acquire(self):
# ... setup ...
self.lock_file.touch()
with open(self.lock_file, 'w') as temp_fd:
if HAS_FCNTL:
fcntl.flock(temp_fd.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
elif HAS_MSVCRT:
msvcrt.locking(temp_fd.fileno(), msvcrt.LK_NBLCK, 1)
temp_fd.write(str(self.current_pid))
temp_fd.flush()
# Duplicate FD to keep lock alive after context exits
self.fd = os.fdopen(os.dup(temp_fd.fileno()), 'w')Option 3: Broader Exception Handling
Change exception handler to catch all exceptions:
except Exception as e: # Instead of (IOError, OSError)
if self.fd:
self.fd.close()
self.fd = None
if isinstance(e, (IOError, OSError)):
raise TimeoutError(f"Failed to acquire lock: {e}")
raiseRelated Issues
This is similar to issues that occur when resources are not properly managed in exception scenarios:
- File handles left open
- Locks not released
- Memory not freed
Environment
- Confirmed on: macOS (Darwin)
- Detection method:
lsofshowing leaked file descriptors
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
No labels