Add failing tests for #375: Race condition in LLM cost tracking#386
Add failing tests for #375: Race condition in LLM cost tracking#386Serhan-Asad wants to merge 5 commits intopromptdriven:mainfrom
Conversation
SummaryFixed critical race condition in LLM cost tracking that caused data corruption when multiple jobs run concurrently in server mode. Replaced global Impact:
Fix:
Test Results
Specific test results: Overall test suite:
Manual TestingAutomated tests simulate real server mode scenario:
Checklist
Fixes #375 |
There was a problem hiding this comment.
Pull request overview
This PR adds comprehensive failing tests to reproduce and validate the fix for issue #375, where concurrent jobs in server mode experience race conditions in LLM cost tracking due to shared global state.
Changes:
- Adds unit tests for concurrent callback execution with thread-local storage
- Adds E2E tests simulating full server mode concurrent scenarios
- Updates all existing test files to use the new
_CALLBACK_DATAthread-local storage instead of the old_LAST_CALLBACK_DATAdictionary
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| pdd/llm_invoke.py | Implements thread-local storage (_CALLBACK_DATA) replacing global dictionary to fix race condition |
| tests/test_llm_invoke_concurrency.py | New unit tests verifying thread-safe callback data isolation |
| tests/test_e2e_issue_375_concurrent_cost_tracking.py | New E2E tests simulating concurrent server mode operations |
| tests/test_llm_invoke.py | Updated to mock thread-local storage instead of dictionary |
| tests/test_llm_invoke_vertex_retry.py | Updated mock for thread-local storage compatibility |
| tests/test_e2e_openai_required_array.py | Updated mock for thread-local storage compatibility |
| tests/test_e2e_issue_295_openai_schema.py | Updated mock for thread-local storage compatibility |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
making this draft until conflicts and comments are addressed |
…ptdriven#375) This commit adds two test files that reproduce the race condition reported in issue promptdriven#375, where concurrent jobs share the global _LAST_CALLBACK_DATA dictionary, causing cost and token data corruption. Test files: - tests/test_llm_invoke_concurrency.py: Unit test that directly tests the _litellm_success_callback() function under concurrent execution - tests/test_e2e_issue_375_concurrent_cost_tracking.py: E2E test that simulates the full server mode scenario with 3 concurrent jobs Both tests currently fail (as expected), demonstrating the bug. They will pass once the fix is implemented using threading.local() storage. Related to promptdriven#375 Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Updated 6 test files to reference _CALLBACK_DATA (thread-local) instead of _LAST_CALLBACK_DATA (global dict) following the fix for issue promptdriven#375. - Replaced all _LAST_CALLBACK_DATA references with _CALLBACK_DATA - Updated mock patches to use object with attributes instead of dict - Fixed indentation for proper context manager nesting
- Fixed 6 tests added to main after PR creation that still referenced _LAST_CALLBACK_DATA - Updated line number reference in E2E test from 690 to 689 (Copilot review) - All tests now pass (83/84, 1 pre-existing LiteLLM provider config issue) Addresses Copilot review comment on PR promptdriven#386
|
✅ Copilot review addressed Fixed the line number reference in commit 4ec74b9:
Also rebased on latest main (88a37d5) and resolved all conflicts. All tests passing (83/84, 1 pre-existing unrelated failure). |
7a559ca to
4ec74b9
Compare
Instead of renaming to _CALLBACK_DATA, keep the original variable name _LAST_CALLBACK_DATA for the thread-local storage. This ensures: - All existing tests continue to work without modification - Future tests will naturally use the correct name - No breaking changes for other contributors Only changes needed: - pdd/llm_invoke.py: Rename _CALLBACK_DATA back to _LAST_CALLBACK_DATA - New test files: Update imports to use _LAST_CALLBACK_DATA All tests passing (16/17, 1 skipped)
Simplified Fix ✅Changed approach to maintain backward compatibility by keeping the original variable name Benefits:
Changes:
Test Results: 16/17 passing, 1 skipped (pre-existing) |
|
target 2/3 |
Summary
Adds failing tests that detect the race condition bug reported in #375, where concurrent jobs in server mode share the global
_LAST_CALLBACK_DATAdictionary, causing cost and token data corruption.Test Files
Unit test:
tests/test_llm_invoke_concurrency.py_litellm_success_callback()function under concurrent executionmax_concurrent=3)E2E test:
tests/test_e2e_issue_375_concurrent_cost_tracking.pyWhat This PR Contains
Root Cause
The module-level global dictionary
_LAST_CALLBACK_DATAinpdd/llm_invoke.py:690-695is shared across all threads in server mode. When LiteLLM executes callbacks for concurrent jobs, each callback writes to this shared dictionary, and eachllm_invoke()call reads from it. With no synchronization mechanism, threads overwrite each other's data in a classic read-after-write race condition.Impact:
pdd connectNext Steps
pdd/llm_invoke.pyusingthreading.local()storagepytest tests/test_llm_invoke_concurrency.py -v)pytest tests/test_e2e_issue_375_concurrent_cost_tracking.py -v)Fixes #375
Generated by PDD agentic bug workflow