Add failing tests for #452: O(n²) complexity in _scan_risky_placeholders #464
Summary
Adds failing tests that detect the O(n²) performance bug reported in #452.
Test Files
tests/test_preprocess.py (3 new test functions appended)
tests/test_e2e_issue_452_preprocess_performance.py
What This PR Contains
test_scan_risky_placeholders_performance_issue_452: reproduces the O(n²) complexity by measuring execution time on files of increasing size (2k, 4k, 8k lines). Currently fails with a 3.11x slowdown when the file size doubles.
test_scan_risky_placeholders_correctness_large_file_issue_452: ensures line numbers are accurate on large files (5000+ lines).
test_scan_risky_placeholders_edge_cases_issue_452: validates boundary conditions.
End-to-end test of the pdd generate command path. Currently takes 9+ seconds for 5000-line prompts.
All tests have been verified to detect the bug and will pass once the optimization is implemented.
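The doubling measurement described above could be structured roughly as follows. This is a minimal sketch, not the test code in this PR: the helper names (time_call, doubling_ratios, make_prompt) and the synthetic one-placeholder-per-line input are assumptions made for illustration.

```python
import time


def time_call(fn, arg, repeats=3):
    """Return the best-of-N wall-clock time for fn(arg), in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(arg)
        best = min(best, time.perf_counter() - start)
    return best


def doubling_ratios(timings):
    """Ratio of each timing to the one before it.

    When the input size doubles between measurements, a ratio near 2
    suggests linear scaling and a ratio near 4 suggests quadratic scaling.
    """
    return [later / earlier for earlier, later in zip(timings, timings[1:])]


def make_prompt(num_lines):
    """Build a synthetic prompt with one hypothetical placeholder per line."""
    return "\n".join(f"line {i} {{var_{i}}}" for i in range(num_lines))
```

A test could then time the scanner on make_prompt(2000), make_prompt(4000), and make_prompt(8000) and assert that every doubling ratio stays below a chosen threshold. The threshold is a judgment call, since wall-clock timings are noisy on shared CI machines.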
Root Cause
The _scan_risky_placeholders() function at pdd/preprocess.py:101 and :106 contains an O(n²) complexity issue. For every placeholder match found by the regex iterator, the code calls text.count("\n", 0, m.start()) + 1 to compute the line number. This scans from position 0 to the match position for every placeholder, resulting in quadratic scaling: with k placeholders in a text of n characters, the total work is on the order of k × n.
This causes a 100-250x slowdown on large prompt files (5000+ lines), making pdd generate, pdd sync, and all preprocessing operations workflow-breaking for users.
Next Steps
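One possible direction for the optimization, sketched under assumptions: collect the newline offsets once, then binary-search them for each match instead of rescanning the text from position 0. The placeholder regex and both function names below are hypothetical and are not taken from pdd/preprocess.py; only the text.count(...) expression comes from the root cause described above.

```python
import bisect
import re

# Hypothetical placeholder pattern, for illustration only; the real regex
# used by _scan_risky_placeholders() is not shown in this PR description.
PLACEHOLDER_RE = re.compile(r"\{[A-Za-z_][A-Za-z0-9_]*\}")


def line_numbers_quadratic(text):
    """Line number of each match, rescanning from offset 0 for every match."""
    return [
        text.count("\n", 0, m.start()) + 1
        for m in PLACEHOLDER_RE.finditer(text)
    ]


def line_numbers_indexed(text):
    """Same line numbers, using a newline index that is built only once."""
    newline_offsets = [i for i, ch in enumerate(text) if ch == "\n"]
    return [
        # Number of newlines strictly before the match start, plus one.
        bisect.bisect_left(newline_offsets, m.start()) + 1
        for m in PLACEHOLDER_RE.finditer(text)
    ]
```

Building the index costs one pass over the text, and each lookup is logarithmic in the number of lines, so the total cost no longer grows with the product of text length and match count. Both functions should return identical results, which makes the quadratic version usable as a reference in a correctness test.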
Expected Performance Improvement
Fixes #452
Generated by PDD agentic bug workflow (Step 10/11)