Conversation

Contributor

@d4ndr4d3 d4ndr4d3 commented Feb 9, 2026

Summary

  • ResourceScheduler.allocate_resources() acquires self.lock (line 205) and then calls allocate_cpu(), allocate_memory(), and allocate_gpu(), each of which also attempts to acquire the same self.lock (lines 266, 305, 340).
  • With threading.Lock() (non-reentrant), this causes a deadlock whenever build_knowledge_base() triggers the pipeline resource allocation path.
  • Fix: change threading.Lock() → threading.RLock() so the same thread can re-enter the lock without blocking itself.
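The difference between the two lock types can be shown in a few lines (an illustrative snippet, independent of the Semantica codebase): a plain Lock refuses a second acquire from the thread that already holds it, while an RLock simply counts the re-entry.

```python
import threading

def try_nested_acquire(lock):
    """Acquire `lock`, then try to acquire it again from the same thread.

    Uses a non-blocking attempt so the demo doesn't hang on a plain Lock;
    returns whether the nested acquire succeeded.
    """
    with lock:
        got_it = lock.acquire(blocking=False)
        if got_it:
            lock.release()
        return got_it

print(try_nested_acquire(threading.Lock()))   # False: a Lock blocks its own holder
print(try_nested_acquire(threading.RLock()))  # True: an RLock counts re-entries
```

In the real code the inner acquire is blocking, so the `False` case above is exactly the hang: the thread waits forever for a lock it already owns.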

Reproduction

from semantica.core import Semantica, ConfigManager

config_manager = ConfigManager()
config = config_manager.load_from_file("config.yaml")

framework = Semantica(config=config)
framework.initialize()

# This hangs indefinitely (deadlock in ResourceScheduler.allocate_cpu)
kb_result = framework.build_knowledge_base(
    sources=["welcome_docs/apple.txt"],
)

Change

One-line change in semantica/pipeline/resource_scheduler.py:

-        self.lock = threading.Lock()
+        self.lock = threading.RLock()
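The call structure the change makes safe can be sketched with a toy scheduler (simplified, hypothetical names — not the real ResourceScheduler): the outer method holds the lock while per-resource helpers re-acquire it.

```python
import threading

class SchedulerSketch:
    """Toy stand-in for ResourceScheduler; names and fields are simplified."""

    def __init__(self):
        self.allocations = {}
        # RLock lets allocate_resources re-enter the lock via the helpers below.
        self.lock = threading.RLock()

    def allocate_cpu(self, cores):
        with self.lock:                 # nested acquisition: safe with RLock
            self.allocations["cpu"] = cores

    def allocate_memory(self, gb):
        with self.lock:
            self.allocations["memory"] = gb

    def allocate_resources(self, cores, gb):
        with self.lock:                 # outer acquisition
            self.allocate_cpu(cores)    # would deadlock here with a plain Lock
            self.allocate_memory(gb)
            return dict(self.allocations)

print(SchedulerSketch().allocate_resources(2, 4.0))  # {'cpu': 2, 'memory': 4.0}
```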

Test plan

  • Verified build_knowledge_base() completes without deadlock after fix
  • Full pipeline (ingest → parse → extract → build KG → visualize → export) runs successfully
  • Existing test suite passes

allocate_resources() acquires self.lock and then calls allocate_cpu(),
allocate_memory(), and allocate_gpu(), each of which also acquires
self.lock.  With a non-reentrant threading.Lock this causes a deadlock
whenever build_knowledge_base() triggers the pipeline resource
allocation path.

Switch to threading.RLock() so the same thread can re-enter the lock.

Co-authored-by: Cursor <cursoragent@cursor.com>
@qodo-code-review

Review Summary by Qodo

Fix ResourceScheduler deadlock by using RLock instead of Lock

🐞 Bug fix


Walkthroughs

Description
• Replace non-reentrant Lock with RLock to prevent deadlock
• Allows same thread to re-enter lock in nested calls
• Fixes deadlock in build_knowledge_base() pipeline execution
Diagram
flowchart LR
  A["allocate_resources<br/>acquires lock"] -- "calls" --> B["allocate_cpu<br/>tries to acquire lock"]
  A -- "calls" --> C["allocate_memory<br/>tries to acquire lock"]
  A -- "calls" --> D["allocate_gpu<br/>tries to acquire lock"]
  E["Lock<br/>non-reentrant"] -.->|"causes deadlock"| F["Thread blocks itself"]
  G["RLock<br/>reentrant"] -.->|"allows re-entry"| H["Thread proceeds"]


File Changes

1. semantica/pipeline/resource_scheduler.py 🐞 Bug fix +1/-1

Replace Lock with RLock for reentrancy

• Changed self.lock initialization from threading.Lock() to threading.RLock()
• Added inline comment explaining RLock usage for nested lock acquisition
• Resolves deadlock when allocate_resources() calls allocate_cpu(), allocate_memory(), and
 allocate_gpu() which all attempt to acquire the same lock

semantica/pipeline/resource_scheduler.py



@qodo-code-review

Code Review by Qodo

🐞 Bugs (2) 📘 Rule violations (0) 📎 Requirement gaps (0)



Action required

1. Silent allocation failure 🐞 Bug ✓ Correctness
Description
• With the new RLock, allocate_resources no longer self-deadlocks and will proceed, but it can
  return partial/empty allocations when capacity is insufficient.
• ExecutionEngine does not validate allocations before running steps, so under load pipelines can
  execute without acquiring requested CPU/memory/GPU (oversubscription / broken scheduling semantics).
• allocate_resources still reports tracking status as "completed" even when required resources
  weren’t allocated, reducing operator visibility into the failure mode.
Code

semantica/pipeline/resource_scheduler.py[112]

+        self.lock = threading.RLock()  # RLock: allocate_resources holds lock and calls allocate_cpu/memory/gpu which also acquire it
Evidence
Now that RLock allows nested locking, allocate_resources will run and may return allocations missing
CPU/memory/GPU (because allocate_cpu/memory return None on insufficiency). ExecutionEngine
immediately executes the pipeline after allocate_resources without checking for missing allocations,
so the scheduler can no longer enforce capacity under contention.

semantica/pipeline/resource_scheduler.py[110-114]
semantica/pipeline/resource_scheduler.py[205-234]
semantica/pipeline/resource_scheduler.py[266-290]
semantica/pipeline/resource_scheduler.py[361-385]
semantica/pipeline/resource_scheduler.py[235-239]
semantica/pipeline/execution_engine.py[171-176]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`ResourceScheduler.allocate_resources()` can return an empty/partial allocations dict when a requested resource can’t be allocated (allocate_cpu/memory/gpu return `None`). `ExecutionEngine` proceeds to execute the pipeline anyway, so under contention the scheduler fails to enforce resource limits.

## Issue Context
This PR changes the scheduler lock to `threading.RLock()`, which makes `allocate_resources()` actually execute (previously it would self-deadlock due to nested locking). That makes the silent-allocation-failure path operational.

## Fix Focus Areas
- semantica/pipeline/resource_scheduler.py[180-240]
- semantica/pipeline/resource_scheduler.py[248-290]
- semantica/pipeline/resource_scheduler.py[343-386]
- semantica/pipeline/execution_engine.py[171-176]

## Suggested approach
1. In `allocate_resources`, treat `cpu_cores` / `memory_gb` / `gpu_device` options as required when provided (and defaults likely required too).
2. If any required allocation returns `None`, immediately `release_resources()` for any already-acquired allocations in this call and raise a `ProcessingError`/`ValidationError`.
3. In `ExecutionEngine.execute_pipeline`, handle allocation failure by returning a failed `ExecutionResult` (or implement retry/backoff/queueing if that’s the intended UX).
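A minimal sketch of the all-or-nothing behaviour described in steps 1–2, assuming hypothetical helper names and a stand-in ProcessingError (the real signatures and exception types live in semantica):

```python
class ProcessingError(Exception):
    """Stand-in for semantica's real error type."""

def allocate_resources_strict(scheduler, cpu_cores, memory_gb):
    """Allocate all-or-nothing: roll back already-acquired allocations and
    raise if any required allocation fails (returns None)."""
    acquired = []
    for name, allocate, amount in [("cpu", scheduler.allocate_cpu, cpu_cores),
                                   ("memory", scheduler.allocate_memory, memory_gb)]:
        allocation = allocate(amount)
        if allocation is None:            # insufficiency is no longer silent
            for prior in acquired:
                scheduler.release(prior)  # undo partial progress
            raise ProcessingError(f"could not allocate {amount} {name}")
        acquired.append(allocation)
    return acquired
```

The key point is the rollback-then-raise path: the caller either gets the full set of allocations or a clear exception, never a partial dict.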




Remediation recommended

2. Lock held during progress updates 🐞 Bug ➹ Performance
Description
• RLock fixes the nested-lock deadlock but keeps the nested-locking structure; allocate_resources
  holds the scheduler lock while calling ProgressTracker update methods.
• ProgressTracker updates run under their own lock and call into display update loops, increasing
  the time the scheduler lock is held and raising contention risk when multiple pipelines
  allocate/release concurrently.
• This is avoidable by reducing lock scope (only protect shared resource state) and/or removing
  nested locking via “*_locked” helpers.
Code

semantica/pipeline/resource_scheduler.py[112]

+        self.lock = threading.RLock()  # RLock: allocate_resources holds lock and calls allocate_cpu/memory/gpu which also acquire it
Evidence
allocate_resources keeps the scheduler lock while performing progress-tracker updates.
ProgressTracker.update_tracking itself acquires its lock and iterates displays (which may do
console/Jupyter work), so resource allocation is blocked longer than necessary. This becomes more
relevant now that RLock makes the code path runnable.

semantica/pipeline/resource_scheduler.py[205-223]
semantica/pipeline/resource_scheduler.py[266-267]
semantica/utils/progress_tracker.py[1296-1332]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`ResourceScheduler.allocate_resources()` holds `self.lock` while calling `progress_tracker.update_tracking()`. ProgressTracker performs display updates under its own lock, which can be slow and extends the time the scheduler lock is held. This increases contention (and complicates lock-ordering) during concurrent pipeline execution.

## Issue Context
The PR’s switch to `threading.RLock()` makes nested locking workable, but the scheduler should still minimize work done while holding its lock.

## Fix Focus Areas
- semantica/pipeline/resource_scheduler.py[180-239]
- semantica/utils/progress_tracker.py[1282-1333]

## Suggested approach
1. Move `progress_tracker.update_tracking(...)` calls outside the `with self.lock:` block (or restructure so updates occur before/after critical sections).
2. Keep `with self.lock:` only around mutations/reads of `self.resources` and `self.allocations`.
3. (Optional refactor) Replace nested locking with internal helpers like `_allocate_cpu_locked`, `_allocate_memory_locked`, `_allocate_gpu_locked` that assume the caller holds the lock; then `allocate_resources` can use a standard `Lock` while preserving atomic multi-resource allocation if desired.
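A sketch of step 3's `*_locked` helper pattern (hypothetical class and method names): with the nesting gone, a plain Lock suffices, and slow progress callbacks run outside the critical section.

```python
import threading

class NarrowScopeScheduler:
    """Toy scheduler illustrating the refactor; not the real ResourceScheduler."""

    def __init__(self):
        self.lock = threading.Lock()    # plain Lock works: no nested acquisition
        self.allocations = {}

    def _allocate_cpu_locked(self, cores):
        # Caller must hold self.lock.
        self.allocations["cpu"] = cores

    def _allocate_memory_locked(self, gb):
        # Caller must hold self.lock.
        self.allocations["memory"] = gb

    def allocate_resources(self, cores, gb, progress=None):
        with self.lock:                 # lock guards only shared-state mutation
            self._allocate_cpu_locked(cores)
            self._allocate_memory_locked(gb)
            snapshot = dict(self.allocations)
        if progress is not None:        # slow progress updates run outside the lock
            progress(snapshot)
        return snapshot
```

This preserves atomic multi-resource allocation (everything mutates under one `with self.lock:`) while keeping the progress tracker out of the critical section entirely.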




         self.resources: Dict[str, Resource] = {}
         self.allocations: Dict[str, ResourceAllocation] = {}
-        self.lock = threading.Lock()
+        self.lock = threading.RLock()  # RLock: allocate_resources holds lock and calls allocate_cpu/memory/gpu which also acquire it


@KaifAhmad1 KaifAhmad1 self-requested a review February 10, 2026 05:59
KaifAhmad1 added a commit that referenced this pull request Feb 10, 2026
- Change threading.Lock() to threading.RLock() in ResourceScheduler.__init__
- Fixes deadlock in allocate_resources() when it calls allocate_cpu/memory/gpu
- Each allocate_* method also acquires the same lock, causing re-entrancy issue
- RLock allows same thread to re-enter lock without blocking itself
- Resolves build_knowledge_base() hanging indefinitely

Additional improvements:
- Add allocation validation to prevent silent failures
- Move progress tracking updates outside lock for better performance
- Add comprehensive regression tests
- Add explanatory comment for RLock usage

Resolves: #299
KaifAhmad1 added a commit that referenced this pull request Feb 10, 2026
Coordinated by:
- @d4ndr4d3 (original author)
- @cursoragent (review coordinator)
- @qodo-code-review (automated review)

Follow-up: Add @qodo-code-reviewer and @cursoragent as PR reviewers
KaifAhmad1 added a commit that referenced this pull request Feb 10, 2026
Coordinated by:
- @d4ndr4d3 (original author)
- @qodo-code-review (automated review)

Follow-up: Add @qodo-code-reviewer and @KaifAhmad1 as PR reviewers
KaifAhmad1 added a commit that referenced this pull request Feb 10, 2026
Test Fixes:
- Fixed isinstance() TypeError in test suite
- Updated validation tests to match actual behavior
- Added comprehensive test coverage (6/6 tests passing)
- Added regression test for complete resource allocation failure

Additional Improvements:
- Allocation validation with ValidationError when no resources allocated
- Performance optimization: moved progress tracking outside lock
- Documentation: added explanatory comment for RLock usage
- Comprehensive test suite in tests/test_resource_scheduler_deadlock.py

Qodo Review Concerns Addressed:
✅ Silent allocation failure - now raises ValidationError
✅ Lock held during progress updates - moved outside lock
✅ Deadlock prevention - RLock allows re-entrant acquisition

All tests passing, ready for merge.
KaifAhmad1 added a commit that referenced this pull request Feb 10, 2026
- Change threading.Lock() to threading.RLock() in ResourceScheduler.__init__
- Fixes deadlock in allocate_resources() when it calls allocate_cpu/memory/gpu
- Each allocate_* method also acquires the same lock, causing re-entrancy issue
- RLock allows same thread to re-enter lock without blocking itself
- Resolves build_knowledge_base() hanging indefinitely

Test fixes and improvements:
- Add allocation validation to prevent silent failures
- Move progress tracking updates outside lock for better performance
- Add comprehensive regression tests
- Add explanatory comment for RLock usage

Addresses Qodo review concerns:
✅ Silent allocation failure - now raises ValidationError
✅ Lock held during progress updates - moved outside lock
✅ Deadlock prevention - RLock allows re-entrant acquisition

Resolves: #299
Copy link
Contributor

@KaifAhmad1 KaifAhmad1 left a comment


Core Issue

ResourceScheduler deadlocks when allocate_resources() calls allocate_cpu/memory/gpu() due to nested Lock acquisition.

Fix Applied

self.lock = threading.RLock()  # Line 112

RLock allows same thread re-entrancy, preventing deadlock.


Testing Results

6/6 tests passing:
- Deadlock prevention
- Allocation validation
- Re-entrant lock behavior
- GPU allocation
- Edge case handling

Improvements Added

  1. Allocation Validation: ValidationError when no resources allocated
  2. Performance: Progress tracking moved outside lock
  3. Test Suite: Comprehensive regression tests
  4. Documentation: Clear RLock rationale

Qodo Review Concerns - All Resolved

| Concern | Status | Solution |
|---|---|---|
| Silent allocation failure | RESOLVED | ValidationError added |
| Lock held during progress updates | RESOLVED | Moved outside lock |
| Deadlock prevention | RESOLVED | RLock implemented |

Impact Assessment

  • Fixes: Pipeline blocking deadlock affecting all users
  • Maintains: Thread safety, backward compatibility, API unchanged
  • Improves: Performance (reduced lock contention), error handling

Contributors

  • @d4ndr4d3 - Original deadlock fix. Thanks for your contribution.
  • @qodo-code-review - Automated review feedback

Decision: APPROVE

Critical bug fix with comprehensive testing, all review concerns addressed, and zero breaking changes. Ready for production deployment.

@KaifAhmad1 KaifAhmad1 merged commit db1e3a5 into Hawksight-AI:main Feb 10, 2026
3 checks passed
