feat(sync): auto-update tests when prompt changes (#203) #328
Enfoirer wants to merge 4 commits into promptdriven:main
25f6ec3 to 57cd218
Pull request overview
This PR implements automatic test regeneration when prompts change by tracking the prompt version that tests were generated from. When code is regenerated from an updated prompt, the system now detects that tests are stale and automatically triggers test regeneration during sync operations.
Changes:
- Added `test_prompt_hash` field to `Fingerprint` dataclass to track which prompt version tests were generated from
- Implemented stale test detection logic that compares `test_prompt_hash` with current prompt hash
- Added automatic `test_prompt_hash` management in fingerprint save operations based on operation type (generate/test/other)
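
A minimal sketch of the first change, assuming a simplified `Fingerprint` with only two hash fields. The real dataclass in `pdd/sync_determine_operation.py` has more fields, and `fingerprint_from_json` is a hypothetical stand-in for `read_fingerprint()`. It only illustrates how a fingerprint file written before this PR could still load, with `test_prompt_hash` defaulting to `None`.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Fingerprint:
    # Simplified for illustration: the real Fingerprint carries more fields.
    prompt_hash: Optional[str] = None
    # Hash of the prompt version the current tests were generated from.
    test_prompt_hash: Optional[str] = None


def fingerprint_from_json(text: str) -> Fingerprint:
    """Hypothetical loader that tolerates files written before the field existed."""
    data = json.loads(text)
    return Fingerprint(
        prompt_hash=data.get("prompt_hash"),
        test_prompt_hash=data.get("test_prompt_hash"),  # None when the key is absent
    )


# An older fingerprint file without the new key, and a newer one that has it.
old_file = json.dumps({"prompt_hash": "abc123"})
new_file = json.dumps(asdict(Fingerprint("abc123", "abc123")))
```

Using `dict.get` keeps the loader backward compatible: a missing key yields `None` instead of raising `KeyError`.
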
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| pdd/sync_determine_operation.py | Added test_prompt_hash field to Fingerprint, updated read_fingerprint() to load it, and added stale test detection in _perform_sync_analysis() |
| pdd/sync_orchestration.py | Updated _save_fingerprint_atomic() to automatically set test_prompt_hash based on operation type |
| pdd/operation_log.py | Updated save_fingerprint() to mirror atomic fingerprint logic for managing test_prompt_hash |
| tests/test_sync_determine_operation.py | Added 8 new tests covering Fingerprint field, read operations, and stale test detection |
| tests/test_sync_orchestration.py | Added 5 new tests for fingerprint atomic operations |
| tests/test_operation_log.py | Added 5 new tests for fingerprint save operations |
45fc5da to 44916cf
Add test_prompt_hash field to Fingerprint to track which prompt version tests were generated from. When the prompt changes and code is regenerated, sync now detects stale tests and triggers test regeneration.
- Add test_prompt_hash field to Fingerprint dataclass
- Update read_fingerprint() to load test_prompt_hash from JSON
- Add stale test detection in _perform_sync_analysis()
- Update _save_operation_fingerprint() to set test_prompt_hash based on operation:
  - generate: sets to None (tests now stale)
  - test: sets to current prompt hash
  - other ops: preserves existing value
- Add 12 unit tests covering the new functionality
- Rename test method for clarity: test_skip_operation -> test_skip_test_operation
- Simplify test class name: TestIssue203SaveFingerprintTestPromptHash -> TestIssue203TestPromptHashManagement
- Consolidate duplicate read_fingerprint imports in _save_fingerprint_atomic
44916cf to 5451719
prompt: test: LLM needed?
promptdriven#349 tests The conflict was in tests/test_sync_determine_operation.py where both issue promptdriven#203 (stale test detection) and issue promptdriven#349 (infinite loop fix) added new test classes at the end of the file. Both sets are retained.
2769a56 to d51ffc2
Step 1 (detect): No LLM; pure hash comparison (`test_prompt_hash` ≠ `current_prompt_hash`).
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
@Enfoirer can you address the GitHub comments?
Rename test_skip_test_operation_preserves_test_prompt_hash_without_atomic_state to test_skip_prefixed_operation_preserves_test_prompt_hash_without_atomic_state to better reflect that it tests skip:-prefixed operations (like skip:test).
Description
Adds automatic detection of stale tests during sync by tracking which prompt version tests were generated from. When the prompt changes and code is regenerated, sync now automatically triggers test regeneration to keep tests in sync with the latest prompt.
Fixes #203
Changes Made
Code Changes
pdd/sync_determine_operation.py:
- Added `test_prompt_hash: Optional[str]` field to `Fingerprint` dataclass to track which prompt version tests were generated from
- Updated `read_fingerprint()` to load `test_prompt_hash` from fingerprint JSON files
- Added stale test detection in `_perform_sync_analysis()`: when `test_prompt_hash != current_prompt_hash`, returns `SyncDecision(operation='test')` with reason "Tests outdated - generated from old prompt version"
pdd/sync_orchestration.py:
- Updated `_save_fingerprint_atomic()` to set `test_prompt_hash` based on operation type:
  - `generate`: sets `test_prompt_hash=None` (code regenerated, tests are now stale)
  - `test`: sets `test_prompt_hash=current_prompt_hash` (tests regenerated, linked to current prompt)
  - Other operations: preserves existing `test_prompt_hash` value
pdd/operation_log.py:
- Updated `save_fingerprint()` to automatically manage `test_prompt_hash` based on operation type
- Ensures commands (`pdd generate`, `pdd test`, etc.) correctly handle `test_prompt_hash`
- Mirrors the logic in `_save_fingerprint_atomic()` for consistency
Test Changes
tests/test_sync_determine_operation.py: Added 8 new tests for issue #203
- TestIssue203FingerprintTestPromptHash:
  - test_fingerprint_has_test_prompt_hash_field
  - test_fingerprint_test_prompt_hash_defaults_to_none
  - test_fingerprint_serialization_includes_test_prompt_hash
- TestIssue203ReadFingerprintTestPromptHash:
  - test_read_fingerprint_with_test_prompt_hash
  - test_read_fingerprint_backward_compat_without_test_prompt_hash
- TestIssue203StaleTestDetection:
  - test_detects_stale_tests_when_test_prompt_hash_differs
  - test_no_stale_test_detection_when_test_prompt_hash_matches
  - test_no_stale_test_detection_when_test_prompt_hash_is_none
tests/test_sync_orchestration.py: Added 5 new tests for issue #203
- TestIssue203SaveOperationFingerprintTestPromptHash:
  - test_generate_operation_sets_test_prompt_hash_to_none
  - test_test_operation_sets_test_prompt_hash_to_current
  - test_fix_operation_preserves_test_prompt_hash
  - test_generate_then_test_workflow
  - test_skip_operation_preserves_test_prompt_hash_without_atomic_state
tests/test_operation_log.py: Added 5 new tests for issue #203
- TestIssue203SaveFingerprintTestPromptHash:
  - test_generate_operation_sets_test_prompt_hash_to_none
  - test_test_operation_sets_test_prompt_hash_to_current
  - test_example_operation_preserves_test_prompt_hash
  - test_fix_operation_preserves_test_prompt_hash
  - test_explicit_test_prompt_hash_overrides_auto_logic
Prompt Changes
Added the prompt for the new feature.
Testing
Test Results
Test Coverage
Note: `sync_determine_operation.py` coverage is lower due to pytest-cov module import timing
Issue #203 Code Coverage Detail
- `Fingerprint.test_prompt_hash` field
- `read_fingerprint()` loading test_prompt_hash
- `_perform_sync_analysis()` stale detection
- `_save_fingerprint_atomic()` test_prompt_hash logic
- `save_fingerprint()` auto test_prompt_hash handling
Testing Environment
Manual Test Results (E2E)
- `test_prompt_hash` equals `prompt_hash` after sync
E2E Test Details:
- `tmp_issue203_e2e`
- `e2e_issue203_python.prompt` to simulate a prompt change
- `python -m pdd.cli --force --local --context default sync e2e_issue203 --target-coverage 0 --max-attempts 1 --budget 5`
- `test_prompt_hash == prompt_hash`
Dev Unit Checklist
Regression Testing
`test_prompt_hash` handled gracefully)
Regression Test Results
Note on Test 5 failure: The `change` command test fails with "Agentic mode requires exactly 1 argument: ISSUE_URL". This is a pre-existing issue unrelated to this PR: the `change` command API was modified but the regression test script was not updated.
Screenshots/Logs
N/A - Behavior verified through unit tests and E2E testing.
Related Issues
Fixes #203
Additional Notes
This fix improves the sync workflow by:
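- `test_prompt_hash` field tracks which prompt version tests were generated from
- Fingerprints without `test_prompt_hash` are handled gracefully (treated as unknown state)
Fingerprint Lifecycle
| Operation | test_prompt_hash Value |
|---|---|
| generate | None (tests now stale) |
| test | current prompt hash |
| fix, verify, etc. | preserved (existing value kept) |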
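
The lifecycle above can be walked through with a short sketch. `next_test_prompt_hash` is a hypothetical helper, not the body of `_save_fingerprint_atomic()` or `save_fingerprint()`, and the hash values `"p1"` and `"p2"` are placeholders. It only shows how the stored value might evolve across a sequence of operations under the rules described in this PR.

```python
from typing import List, Optional, Tuple


def next_test_prompt_hash(
    operation: str,
    current_prompt_hash: str,
    previous: Optional[str],
) -> Optional[str]:
    """Hypothetical update rule for test_prompt_hash when a fingerprint is saved."""
    if operation == "generate":
        return None  # code regenerated, tests are now stale
    if operation == "test":
        return current_prompt_hash  # tests regenerated from the current prompt
    # fix, verify, skip:-prefixed operations, etc. keep the existing value
    return previous


value: Optional[str] = None
history: List[Tuple[str, Optional[str]]] = []
steps = [("generate", "p1"), ("test", "p1"), ("fix", "p1"), ("generate", "p2")]
for op, prompt_hash in steps:
    value = next_test_prompt_hash(op, prompt_hash, value)
    history.append((op, value))
```

After the loop, `history` is `[("generate", None), ("test", "p1"), ("fix", "p1"), ("generate", None)]`. Because the rule matches operation names exactly, a `skip:test` operation falls through to the preserving branch rather than being treated as `test`.
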