
Add failing tests for #375: Race condition in LLM cost tracking#386

Draft
Serhan-Asad wants to merge 5 commits into promptdriven:main from Serhan-Asad:fix/issue-375

Conversation

@Serhan-Asad
Contributor

Summary

Adds failing tests that detect the race condition bug reported in #375, where concurrent jobs in server mode share the global _LAST_CALLBACK_DATA dictionary, causing cost and token data corruption.

Test Files

  • Unit test: tests/test_llm_invoke_concurrency.py

    • Directly tests _litellm_success_callback() function under concurrent execution
    • Uses 3 concurrent threads (matching the server's max_concurrent=3)
    • Demonstrates threads reading each other's callback data
  • E2E test: tests/test_e2e_issue_375_concurrent_cost_tracking.py

    • Simulates the full server mode scenario with concurrent jobs
    • Tests the complete write→read cycle that happens in production
    • Verifies that each job gets its own cost/token data

What This PR Contains

  • ✅ Failing unit test that reproduces the reported bug
  • ✅ Failing E2E test that verifies the bug at integration level
  • ✅ Tests are verified to fail on current code (demonstrating the bug exists)
  • ✅ Tests will pass once the bug is fixed with thread-local storage

Root Cause

The module-level global dictionary _LAST_CALLBACK_DATA in pdd/llm_invoke.py:690-695 is shared across all threads in server mode. When LiteLLM executes callbacks for concurrent jobs, each callback writes to this shared dictionary, and each llm_invoke() call reads from it. With no synchronization mechanism, one thread's callback can overwrite the shared entry before another thread reads its own result back, a classic lost-update race.
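The failure mode can be reproduced outside PDD with nothing but a shared dict; the names below are illustrative stand-ins for the real callback and reader, and the barrier simply forces the worst-case interleaving (all writes land before any read), a minimal sketch only:

```python
import threading

# Illustrative stand-in for the shared module-level dict in llm_invoke.py.
_LAST_CALLBACK_DATA = {}
results = {}
barrier = threading.Barrier(3)

def job(job_id, cost):
    # The "callback" writes this job's cost into the shared dict...
    _LAST_CALLBACK_DATA["cost"] = cost
    barrier.wait()  # force all writes to land before any job reads
    # ...then the caller reads it back, seeing whichever write came last.
    results[job_id] = _LAST_CALLBACK_DATA["cost"]

threads = [threading.Thread(target=job, args=(i, float(i))) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# With a shared dict, only the last writer reads back its own cost.
corrupted = [i for i in range(3) if results[i] != float(i)]
print(len(corrupted))  # prints 2: two of the three jobs see another job's cost
```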

Impact:

  • Budget limits malfunction (jobs may stop early or exceed budgets)
  • Cost reports show incorrect values
  • Token tracking is corrupted
  • Affects all concurrent operations via pdd connect

Next Steps

  1. Implement the fix at pdd/llm_invoke.py using threading.local() storage
  2. Verify the unit test passes (pytest tests/test_llm_invoke_concurrency.py -v)
  3. Verify the E2E test passes (pytest tests/test_e2e_issue_375_concurrent_cost_tracking.py -v)
  4. Run full test suite to check for regressions
  5. Mark PR as ready for review

Fixes #375


Generated by PDD agentic bug workflow

@Serhan-Asad
Contributor Author

Summary

Fixed critical race condition in LLM cost tracking that caused data corruption when multiple jobs run concurrently in server mode. Replaced global _LAST_CALLBACK_DATA
dictionary with thread-local storage (threading.local()) to ensure each concurrent job tracks its own cost data independently.

Impact:

  • Server mode (pdd connect) with max_concurrent=3 was sharing cost data across threads
  • Jobs were reading wrong cost values from other concurrent jobs
  • Budget enforcement was failing (jobs stopped early or exceeded limits)
  • Cost reports showed incorrect amounts

Fix:

  • Replaced module-level global dict with threading.local() for thread-safe storage
  • Each thread gets isolated callback data (cost, tokens, finish_reason)
  • Read operations use getattr() with safe defaults
  • All 6 affected test files updated to reference new _CALLBACK_DATA variable
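A minimal sketch of the storage change described above; record_callback and read_callback are hypothetical helper names for illustration (the real write path is _litellm_success_callback), and the variable name follows the interim _CALLBACK_DATA rename:

```python
import threading

# Thread-local storage: each thread sees its own attribute values.
# (A later commit in this PR keeps the original _LAST_CALLBACK_DATA name.)
_CALLBACK_DATA = threading.local()

def record_callback(cost: float, tokens: int, finish_reason: str) -> None:
    # Hypothetical write path: what the LiteLLM success callback would store.
    _CALLBACK_DATA.cost = cost
    _CALLBACK_DATA.tokens = tokens
    _CALLBACK_DATA.finish_reason = finish_reason

def read_callback() -> dict:
    # Reads use getattr() with safe defaults, so a thread whose callback
    # never fired gets zeros instead of an AttributeError.
    return {
        "cost": getattr(_CALLBACK_DATA, "cost", 0.0),
        "tokens": getattr(_CALLBACK_DATA, "tokens", 0),
        "finish_reason": getattr(_CALLBACK_DATA, "finish_reason", None),
    }
```

Because threading.local() attributes are per-thread, a concurrent job can no longer observe another job's cost, and a thread that never ran a callback simply reads the defaults.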

Test Results

  • Unit tests: PASS
  • Regression tests: PASS
  • Sync regression: PASS
  • Test coverage: 70%

Specific test results:

  • test_concurrent_callbacks_corrupt_cost_data - verifies thread isolation
  • test_sequential_callbacks_work - ensures backward compatibility
  • test_concurrent_server_jobs_cost_tracking_e2e - full server mode simulation
  • test_sequential_server_jobs_still_work_e2e - regression prevention
  • All 7 related e2e tests (OpenAI schema, Vertex retry, etc.)

Overall test suite:

  • 77/78 tests pass in test_llm_invoke.py
  • 1 pre-existing failure unrelated to this fix (LiteLLM provider config issue)

Manual Testing

The automated tests simulate the real server mode scenario:

  • Created concurrent threads matching the server's max_concurrent=3 setting
  • Each thread writes different cost values to callback storage
  • Each thread reads back and verifies it gets its own data (not corrupted)
  • Tests stagger timing to maximize chance of race condition exposure
  • Verified both concurrent and sequential execution paths work correctly
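The isolation check above can be sketched as follows; this is a simplified shape, not the PR's actual test code, with staggered sleeps widening the window in which a shared dict would have been overwritten:

```python
import threading
import time

store = threading.local()  # the fixed storage; a plain shared dict fails this check
leaks = []

def job(job_id: int) -> None:
    store.cost = float(job_id)        # write this job's cost
    time.sleep(0.01 * (job_id + 1))   # stagger reads to expose any overwrites
    if getattr(store, "cost", None) != float(job_id):
        leaks.append(job_id)          # another job's data leaked in

# Three threads, matching the server's max_concurrent=3 setting.
threads = [threading.Thread(target=job, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

assert not leaks  # thread-local storage keeps each job's data isolated
```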

Checklist

  • All tests pass
  • Copilot comments addressed
  • No temp files committed
  • No merge conflicts
  • (If bug fix) Failing test added - 4 tests added that reproduce race condition
  • (If feature) A/B comparison included - N/A (bug fix)
  • (If prompts changed) Submitted to pdd_cap - No prompts changed in pdd repo

Fixes #375

@gltanaka gltanaka requested a review from Copilot January 24, 2026 20:48
Contributor

Copilot AI left a comment


Pull request overview

This PR adds comprehensive failing tests to reproduce and validate the fix for issue #375, where concurrent jobs in server mode experience race conditions in LLM cost tracking due to shared global state.

Changes:

  • Adds unit tests for concurrent callback execution with thread-local storage
  • Adds E2E tests simulating full server mode concurrent scenarios
  • Updates all existing test files to use the new _CALLBACK_DATA thread-local storage instead of the old _LAST_CALLBACK_DATA dictionary

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

Summary per file:

  • pdd/llm_invoke.py - Implements thread-local storage (_CALLBACK_DATA) replacing the global dictionary to fix the race condition
  • tests/test_llm_invoke_concurrency.py - New unit tests verifying thread-safe callback data isolation
  • tests/test_e2e_issue_375_concurrent_cost_tracking.py - New E2E tests simulating concurrent server mode operations
  • tests/test_llm_invoke.py - Updated to mock thread-local storage instead of a dictionary
  • tests/test_llm_invoke_vertex_retry.py - Updated mock for thread-local storage compatibility
  • tests/test_e2e_openai_required_array.py - Updated mock for thread-local storage compatibility
  • tests/test_e2e_issue_295_openai_schema.py - Updated mock for thread-local storage compatibility


@gltanaka gltanaka marked this pull request as draft January 28, 2026 19:34
@gltanaka
Contributor

Making this a draft until the merge conflicts and review comments are addressed.

@gltanaka
Contributor

@Serhan-Asad

Serhan-Asad and others added 4 commits January 28, 2026 16:53
…ptdriven#375)

This commit adds two test files that reproduce the race condition reported
in issue promptdriven#375, where concurrent jobs share the global _LAST_CALLBACK_DATA
dictionary, causing cost and token data corruption.

Test files:
- tests/test_llm_invoke_concurrency.py: Unit test that directly tests the
  _litellm_success_callback() function under concurrent execution
- tests/test_e2e_issue_375_concurrent_cost_tracking.py: E2E test that
  simulates the full server mode scenario with 3 concurrent jobs

Both tests currently fail (as expected), demonstrating the bug. They will
pass once the fix is implemented using threading.local() storage.

Related to promptdriven#375

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Updated 6 test files to reference _CALLBACK_DATA (thread-local) instead of
_LAST_CALLBACK_DATA (global dict) following the fix for issue promptdriven#375.

- Replaced all _LAST_CALLBACK_DATA references with _CALLBACK_DATA
- Updated mock patches to use object with attributes instead of dict
- Fixed indentation for proper context manager nesting
- Fixed 6 tests added to main after PR creation that still referenced _LAST_CALLBACK_DATA
- Updated line number reference in E2E test from 690 to 689 (Copilot review)
- All tests now pass (83/84, 1 pre-existing LiteLLM provider config issue)

Addresses Copilot review comment on PR promptdriven#386
@Serhan-Asad
Contributor Author

Copilot review addressed

Fixed the line number reference in commit 4ec74b9:

  • Updated from line 690 to 689 (the actual line where _CALLBACK_DATA is defined in pdd/llm_invoke.py)

Also rebased on latest main (88a37d5) and resolved all conflicts. All tests passing (83/84, 1 pre-existing unrelated failure).

Instead of renaming to _CALLBACK_DATA, keep the original variable name
_LAST_CALLBACK_DATA for the thread-local storage. This ensures:
- All existing tests continue to work without modification
- Future tests will naturally use the correct name
- No breaking changes for other contributors

Only changes needed:
- pdd/llm_invoke.py: Rename _CALLBACK_DATA back to _LAST_CALLBACK_DATA
- New test files: Update imports to use _LAST_CALLBACK_DATA

All tests passing (16/17, 1 skipped)
@Serhan-Asad
Contributor Author

Simplified Fix ✅

Changed approach to maintain backward compatibility by keeping the original variable name _LAST_CALLBACK_DATA instead of renaming to _CALLBACK_DATA.

Benefits:

  • ✅ All existing tests work without modification
  • ✅ New tests added to main after this PR work automatically
  • ✅ Future tests will naturally use the correct name
  • ✅ No breaking changes for contributors

Changes:

  • Only 3 files modified (down from 7 previously)
  • pdd/llm_invoke.py: Thread-local storage using original name
  • New test files: Updated to use _LAST_CALLBACK_DATA

Test Results: 16/17 passing, 1 skipped (pre-existing)

@gltanaka
Contributor

gltanaka commented Feb 2, 2026

target 2/3



Development

Successfully merging this pull request may close these issues.

Race Condition in LLM Cost Tracking Causes Data Corruption
