Add failing tests for #452: O(n²) complexity in _scan_risky_placeholders #464

prompt-driven-github · 2026-02-04T19:43:34Z

Summary

Adds failing tests that detect the O(n²) performance bug reported in #452.

Test Files

Unit test: tests/test_preprocess.py (3 new test functions appended)
E2E test: tests/test_e2e_issue_452_preprocess_performance.py

What This PR Contains

Failing unit test (test_scan_risky_placeholders_performance_issue_452) that reproduces the O(n²) complexity by measuring execution time on files of increasing size (2k, 4k, 8k lines). Currently fails with 3.11x slowdown when doubling file size.
Passing correctness test (test_scan_risky_placeholders_correctness_large_file_issue_452) that ensures line numbers are accurate on large files (5000+ lines).
Passing edge cases test (test_scan_risky_placeholders_edge_cases_issue_452) that validates boundary conditions.
Failing E2E test that verifies the user-facing performance degradation using the full pdd generate command path. Currently takes 9+ seconds for 5000-line prompts.

All tests are verified to correctly detect the bug and will pass once the optimization is implemented.

Root Cause

The _scan_risky_placeholders() function at pdd/preprocess.py:101 and :106 contains an O(n²) complexity issue. For every placeholder match found by the regex iterator, the code calls text.count("\n", 0, m.start()) + 1 to compute the line number. This scans from position 0 to the match position for every placeholder, resulting in quadratic scaling:

For a 5000-line file with 500 placeholders (avg position ~2500), this performs ~1,250,000 character scans
The same information could be computed with ~5,000 scans using a pre-built line position map and binary search

This causes 100-250x slowdown on large prompt files (5000+ lines), making pdd generate, pdd sync, and all preprocessing operations workflow-breaking for users.

Next Steps

Implement the fix: Pre-compute line position map once (O(n))
Use binary search (bisect_right) for O(log n) line lookups
Verify the unit performance test passes (slowdown <2.5x)
Verify the E2E test passes (processing time <2 seconds)
Run full test suite to check for regressions
Mark PR as ready for review

Expected Performance Improvement

Small files: 5x faster
Medium files: 24x faster
Large files: 48x faster
Very large files: 96-240x faster

Fixes #452

Generated by PDD agentic bug workflow (Step 10/11)

…ceholders - Add unit tests to tests/test_preprocess.py that detect quadratic scaling - Add E2E test that verifies user-facing performance degradation - Performance test fails with 3.11x slowdown (expected <2.5x for linear) - Tests will pass once the O(n) optimization is implemented Related to #452 Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

prompt-driven-github bot mentioned this pull request Feb 4, 2026

Performance: O(n²) complexity in _scan_risky_placeholders causes 100-250x slowdown on large prompts #452

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add failing tests for #452: O(n²) complexity in _scan_risky_placeholders #464

Add failing tests for #452: O(n²) complexity in _scan_risky_placeholders #464

Uh oh!

prompt-driven-github bot commented Feb 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Add failing tests for #452: O(n²) complexity in _scan_risky_placeholders #464

Are you sure you want to change the base?

Add failing tests for #452: O(n²) complexity in _scan_risky_placeholders #464

Uh oh!

Conversation

prompt-driven-github bot commented Feb 4, 2026

Summary

Test Files

What This PR Contains

Root Cause

Next Steps

Expected Performance Improvement

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant