feat(sync): auto-update tests when prompt changes (#203) by Enfoirer · Pull Request #328 · promptdriven/pdd

Enfoirer · 2026-01-18T15:33:02Z

Description

Adds automatic detection of stale tests during sync by tracking which prompt version tests were generated from. When the prompt changes and code is regenerated, sync now automatically triggers test regeneration to keep tests in sync with the latest prompt.

Fixes #203

Changes Made

Code Changes

pdd/sync_determine_operation.py:

Added test_prompt_hash: Optional[str] field to Fingerprint dataclass to track which prompt version tests were generated from
Updated read_fingerprint() to load test_prompt_hash from fingerprint JSON files
Added stale test detection logic in _perform_sync_analysis(): when test_prompt_hash != current_prompt_hash, returns SyncDecision(operation='test') with reason "Tests outdated - generated from old prompt version"

pdd/sync_orchestration.py:

Updated _save_fingerprint_atomic() to set test_prompt_hash based on operation type:
- generate: sets test_prompt_hash=None (code regenerated, tests are now stale)
- test: sets test_prompt_hash=current_prompt_hash (tests regenerated, linked to current prompt)
- Other operations: preserves existing test_prompt_hash value
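The per-operation rule above can be written as one small function. The name `next_test_prompt_hash` and its signature are assumptions made for illustration; the PR applies this logic inside `_save_fingerprint_atomic()`.

```python
from typing import Optional


def next_test_prompt_hash(
    operation: str,
    current_prompt_hash: str,
    previous: Optional[str],
) -> Optional[str]:
    """Return the test_prompt_hash to store after `operation` completes."""
    if operation == "generate":
        # Code was regenerated, so the existing tests are now stale.
        return None
    if operation == "test":
        # Tests were regenerated from the current prompt.
        return current_prompt_hash
    # Any other operation (fix, verify, skip:test, ...) keeps the old value.
    return previous
```

Because the comparison is an exact match on the operation name, a `skip:`-prefixed operation such as `skip:test` falls through to the last branch and preserves the previous value.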

pdd/operation_log.py:

Updated save_fingerprint() to automatically manage test_prompt_hash based on operation type
This ensures manual commands (pdd generate, pdd test, etc.) correctly handle test_prompt_hash
Logic mirrors _save_fingerprint_atomic() for consistency
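The test `test_explicit_test_prompt_hash_overrides_auto_logic` suggests that a caller can pass a value that bypasses the automatic rule. One way this could work is sketched below with a sentinel default. How `save_fingerprint()` actually distinguishes "not passed" from an explicit `None` is not shown in this PR description, so `_AUTO` and `resolve_test_prompt_hash` are hypothetical.

```python
from typing import Any, Optional

# Sentinel so that an explicit None can be told apart from "argument not passed".
_AUTO: Any = object()


def resolve_test_prompt_hash(
    operation: str,
    current_prompt_hash: str,
    previous: Optional[str],
    explicit: Any = _AUTO,
) -> Optional[str]:
    # An explicitly supplied value always wins over the automatic rule.
    if explicit is not _AUTO:
        return explicit
    # Automatic rule, mirroring the one used during sync.
    if operation == "generate":
        return None
    if operation == "test":
        return current_prompt_hash
    return previous
```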

Test Changes

tests/test_sync_determine_operation.py: Added 8 new tests for issue #203

TestIssue203FingerprintTestPromptHash:
- test_fingerprint_has_test_prompt_hash_field
- test_fingerprint_test_prompt_hash_defaults_to_none
- test_fingerprint_serialization_includes_test_prompt_hash
TestIssue203ReadFingerprintTestPromptHash:
- test_read_fingerprint_with_test_prompt_hash
- test_read_fingerprint_backward_compat_without_test_prompt_hash
TestIssue203StaleTestDetection:
- test_detects_stale_tests_when_test_prompt_hash_differs
- test_no_stale_test_detection_when_test_prompt_hash_matches
- test_no_stale_test_detection_when_test_prompt_hash_is_none

tests/test_sync_orchestration.py: Added 5 new tests for issue #203

TestIssue203SaveOperationFingerprintTestPromptHash:
- test_generate_operation_sets_test_prompt_hash_to_none
- test_test_operation_sets_test_prompt_hash_to_current
- test_fix_operation_preserves_test_prompt_hash
- test_generate_then_test_workflow
- test_skip_operation_preserves_test_prompt_hash_without_atomic_state

tests/test_operation_log.py: Added 5 new tests for issue #203

TestIssue203SaveFingerprintTestPromptHash:
- test_generate_operation_sets_test_prompt_hash_to_none
- test_test_operation_sets_test_prompt_hash_to_current
- test_example_operation_preserves_test_prompt_hash
- test_fix_operation_preserves_test_prompt_hash
- test_explicit_test_prompt_hash_overrides_auto_logic

Prompt Changes

Added the prompt for the new feature.

Testing

Test Results

tests/test_sync_determine_operation.py: 86 passed
tests/test_sync_orchestration.py:       95 passed
tests/test_operation_log.py:            28 passed
Total: 209 passed in 64.29s

Test Coverage

File	Coverage	Statements	Missed
pdd/operation_log.py	92%	156	13
pdd/sync_orchestration.py	68%	849	268
pdd/sync_determine_operation.py	~13%	689	-

Note:

The code added for issue #203 is 100% covered by the new tests
sync_determine_operation.py coverage is lower due to pytest-cov module import timing
Overall coverage is lower because these files contain many other features unrelated to issue #203

Issue #203 Code Coverage Detail

Component	Status
`Fingerprint.test_prompt_hash` field	✅ Covered
`read_fingerprint()` loading test_prompt_hash	✅ Covered
`_perform_sync_analysis()` stale detection	✅ Covered
`_save_fingerprint_atomic()` test_prompt_hash logic	✅ Covered
`save_fingerprint()` auto test_prompt_hash handling	✅ Covered

Testing Environment

Platform: macOS Darwin 24.6.0
Python: 3.12.12 / 3.13.7
pytest: 9.0.1

Manual Test Results (E2E)

Test Case	Result	Evidence
Prompt change triggers test regeneration	✅	Sync log shows: generate → verify → test → nothing
Fingerprint updated correctly	✅	`test_prompt_hash` equals `prompt_hash` after sync
Cost tracking	✅	~$0.0519 (model: gpt-4o)

E2E Test Details:

Created temp project at tmp_issue203_e2e
Updated e2e_issue203_python.prompt to simulate a prompt change
Ran: python -m pdd.cli --force --local --context default sync e2e_issue203 --target-coverage 0 --max-attempts 1 --budget 5
Result: Sync completed successfully, fingerprint shows test_prompt_hash == prompt_hash

Dev Unit Checklist

Code: Updated implementation (3 files)
Tests: Added comprehensive test coverage (18 new tests)
Prompt: Added the prompt for the new feature
Example: N/A (no new user-facing API)

Regression Testing

All existing sync tests pass (209 tests)
No changes to public API (additive only)
Backward compatible (existing fingerprints without test_prompt_hash handled gracefully)

Regression Test Results

Test	Description	Status
0	--list-contexts and --context	✅
1	generate command	✅
2	example command	✅
3	preprocess command	✅
4	update command	✅
5	change command	❌ (pre-existing issue)
6	crash command	✅
7	verify command	✅
8	test command	✅
9	fix command	✅
10	split command	✅
11	detect command	✅
12	conflicts command	✅
13	trace command	✅
14	bug command	✅
15	auto-deps command	✅
16	Global Options	✅
18-21	Other tests	✅

Note on Test 5 failure: The change command test fails with "Agentic mode requires exactly 1 argument: ISSUE_URL". This is a pre-existing issue unrelated to this PR - the change command API was modified but the regression test script was not updated.

Screenshots/Logs

N/A - Behavior verified through unit tests and E2E testing.

Related Issues

Fixes #203

Additional Notes

This fix improves the sync workflow by:

Automatic stale test detection: Tests are automatically regenerated when the prompt changes
Fingerprint tracking: test_prompt_hash field tracks which prompt version tests were generated from
Backward compatible: Existing fingerprints without test_prompt_hash are handled gracefully (treated as unknown state)
Follows PDD philosophy: Ensures tests stay in sync with prompt (the source of truth)

Fingerprint Lifecycle

Operation	`test_prompt_hash` Value
`generate`	`None` (tests now stale)
`test`	Current prompt hash
`fix`, `verify`, etc.	Preserved from previous

Copilot

Pull request overview

This PR implements automatic test regeneration when prompts change by tracking the prompt version that tests were generated from. When code is regenerated from an updated prompt, the system now detects that tests are stale and automatically triggers test regeneration during sync operations.

Changes:

Added test_prompt_hash field to Fingerprint dataclass to track which prompt version tests were generated from
Implemented stale test detection logic that compares test_prompt_hash with current prompt hash
Added automatic test_prompt_hash management in fingerprint save operations based on operation type (generate/test/other)

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

File	Description
pdd/sync_determine_operation.py	Added `test_prompt_hash` field to Fingerprint, updated `read_fingerprint()` to load it, and added stale test detection in `_perform_sync_analysis()`
pdd/sync_orchestration.py	Updated `_save_fingerprint_atomic()` to automatically set `test_prompt_hash` based on operation type
pdd/operation_log.py	Updated `save_fingerprint()` to mirror atomic fingerprint logic for managing `test_prompt_hash`
tests/test_sync_determine_operation.py	Added 8 new tests covering Fingerprint field, read operations, and stale test detection
tests/test_sync_orchestration.py	Added 5 new tests for fingerprint atomic operations
tests/test_operation_log.py	Added 5 new tests for fingerprint save operations

tests/test_sync_orchestration.py

tests/test_operation_log.py

pdd/sync_orchestration.py

Add test_prompt_hash field to Fingerprint to track which prompt version tests were generated from. When prompt changes and code is regenerated, sync now detects stale tests and triggers test regeneration.

- Add test_prompt_hash field to Fingerprint dataclass
- Update read_fingerprint() to load test_prompt_hash from JSON
- Add stale test detection in _perform_sync_analysis()
- Update _save_operation_fingerprint() to set test_prompt_hash based on operation:
  - generate: sets to None (tests now stale)
  - test: sets to current prompt hash
  - other ops: preserves existing value
- Add 12 unit tests covering the new functionality

- Rename test method for clarity: test_skip_operation -> test_skip_test_operation
- Simplify test class name: TestIssue203SaveFingerprintTestPromptHash -> TestIssue203TestPromptHashManagement
- Consolidate duplicate read_fingerprint imports in _save_fingerprint_atomic

gltanaka · 2026-01-20T01:17:52Z

prompt:
write a python program to print hello and bye

test:
test__
pytest failure as a screen

LLM needed?

determine which tests need to change -> pytest
test changes -> fix
incremental new tests for new feature -> LLM changes needed?

promptdriven#349 tests The conflict was in tests/test_sync_determine_operation.py where both issue promptdriven#203 (stale test detection) and issue promptdriven#349 (infinite loop fix) added new test classes at the end of the file. Both sets are retained.

Enfoirer · 2026-01-26T15:59:38Z

Step 1 (detect): No LLM. Pure hash comparison (test_prompt_hash ≠ current_prompt_hash).
Step 2 (regenerate/fix): LLM needed. cmd_test_main() calls the LLM to regenerate tests; fix_main() calls the LLM if tests fail.
Step 3 (new feature tests): No LLM template changes. The existing generic test generation prompt handles new features automatically, since the feature requirements come from the source prompt.
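To illustrate why Step 1 needs no LLM call, here is a minimal sketch of the detection as a plain hash comparison. It assumes the prompt hash is a SHA-256 digest of the prompt text; the actual hashing scheme used by pdd is not stated in this thread, and `hash_prompt_text` and `needs_test_regeneration` are hypothetical names.

```python
import hashlib
from typing import Optional


def hash_prompt_text(text: str) -> str:
    # Assumption: SHA-256 over the UTF-8 prompt text.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def needs_test_regeneration(test_prompt_hash: Optional[str], prompt_text: str) -> bool:
    # Pure comparison, no model call: stale only when a recorded hash
    # exists and differs from the hash of the current prompt.
    if test_prompt_hash is None:
        return False
    return test_prompt_hash != hash_prompt_text(prompt_text)


# Example: tests were generated from an earlier version of the prompt.
old_hash = hash_prompt_text("write a python program to print hello")
stale = needs_test_regeneration(
    old_hash, "write a python program to print hello and bye"
)
```

Only when `stale` is true would the more expensive Step 2 (LLM-backed test regeneration) need to run.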

Copilot

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

tests/test_sync_orchestration.py

tests/test_sync_determine_operation.py

gltanaka · 2026-01-28T19:35:37Z

@Enfoirer can you address the GitHub comments?

Rename test_skip_test_operation_preserves_test_prompt_hash_without_atomic_state to test_skip_prefixed_operation_preserves_test_prompt_hash_without_atomic_state to better reflect that it tests skip:-prefixed operations (like skip:test).

Enfoirer force-pushed the feature/issue-203-auto-update-tests-sync branch 4 times, most recently from 25f6ec3 to 57cd218 · January 18, 2026 16:30

gltanaka requested a review from Copilot January 18, 2026 19:08

Copilot AI reviewed Jan 18, 2026

Enfoirer force-pushed the feature/issue-203-auto-update-tests-sync branch from 45fc5da to 44916cf · January 19, 2026 15:54

Enfoirer added 2 commits January 19, 2026 23:59

Enfoirer force-pushed the feature/issue-203-auto-update-tests-sync branch from 44916cf to 5451719 · January 19, 2026 16:00

gltanaka marked this pull request as draft January 20, 2026 01:18

jamesdlevine mentioned this pull request Jan 21, 2026

pdd fix modifies prompt files but does not regenerate corresponding code #356

Open

Serhan-Asad mentioned this pull request Jan 22, 2026

Race Condition in LLM Cost Tracking Causes Data Corruption #375

Open

Enfoirer force-pushed the feature/issue-203-auto-update-tests-sync branch from 2769a56 to d51ffc2 · January 26, 2026 15:59

Enfoirer marked this pull request as ready for review January 26, 2026 15:59

Serhan-Asad mentioned this pull request Jan 26, 2026

Bug: File Handle Resource Leak in SyncLock.acquire() #403

Closed

gltanaka requested a review from Copilot January 26, 2026 17:42

Copilot AI reviewed Jan 26, 2026

Enfoirer wants to merge 4 commits into promptdriven:main from Enfoirer:feature/issue-203-auto-update-tests-sync

2 participants
