feat(sync): auto-update tests when prompt changes (#203) #328

Open
Enfoirer wants to merge 4 commits into promptdriven:main from Enfoirer:feature/issue-203-auto-update-tests-sync

Conversation

@Enfoirer (Contributor) commented Jan 18, 2026

Description

Adds automatic detection of stale tests during sync by tracking which prompt version tests were generated from. When the prompt changes and code is regenerated, sync now automatically triggers test regeneration to keep tests in sync with the latest prompt.

Fixes #203

Changes Made

Code Changes

pdd/sync_determine_operation.py:

  • Added test_prompt_hash: Optional[str] field to Fingerprint dataclass to track which prompt version tests were generated from
  • Updated read_fingerprint() to load test_prompt_hash from fingerprint JSON files
  • Added stale test detection logic in _perform_sync_analysis(): when test_prompt_hash != current_prompt_hash, returns SyncDecision(operation='test') with reason "Tests outdated - generated from old prompt version"

pdd/sync_orchestration.py:

  • Updated _save_fingerprint_atomic() to set test_prompt_hash based on operation type:
    • generate: sets test_prompt_hash=None (code regenerated, tests are now stale)
    • test: sets test_prompt_hash=current_prompt_hash (tests regenerated, linked to current prompt)
    • Other operations: preserves existing test_prompt_hash value
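The per-operation rule above fits in one small function. The sketch below is an assumption about its shape, not the body of _save_fingerprint_atomic(); `next_test_prompt_hash` is a hypothetical name used only for illustration.

```python
from typing import Optional


def next_test_prompt_hash(
    operation: str,
    current_prompt_hash: str,
    previous: Optional[str],
) -> Optional[str]:
    """Decide which test_prompt_hash to store after an operation (hypothetical helper)."""
    if operation == "generate":
        # Code was regenerated, so the tests no longer match any known prompt version.
        return None
    if operation == "test":
        # Tests were just regenerated from the current prompt.
        return current_prompt_hash
    # fix, verify, skip:-prefixed operations, etc.: keep whatever was recorded.
    return previous


# Walk the generate -> test -> fix sequence and record the stored value after each step.
history = []
value: Optional[str] = "old-hash"
for op in ("generate", "test", "fix"):
    value = next_test_prompt_hash(op, "new-hash", value)
    history.append((op, value))
```

Because the comparison is an exact string match, an operation name such as skip:test falls through to the preserve branch, which is consistent with the skip-related test listed under Test Changes.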

pdd/operation_log.py:

  • Updated save_fingerprint() to automatically manage test_prompt_hash based on operation type
  • This ensures manual commands (pdd generate, pdd test, etc.) correctly handle test_prompt_hash
  • Logic mirrors _save_fingerprint_atomic() for consistency

Test Changes

tests/test_sync_determine_operation.py: Added 8 new tests for issue #203

  • TestIssue203FingerprintTestPromptHash:
    • test_fingerprint_has_test_prompt_hash_field
    • test_fingerprint_test_prompt_hash_defaults_to_none
    • test_fingerprint_serialization_includes_test_prompt_hash
  • TestIssue203ReadFingerprintTestPromptHash:
    • test_read_fingerprint_with_test_prompt_hash
    • test_read_fingerprint_backward_compat_without_test_prompt_hash
  • TestIssue203StaleTestDetection:
    • test_detects_stale_tests_when_test_prompt_hash_differs
    • test_no_stale_test_detection_when_test_prompt_hash_matches
    • test_no_stale_test_detection_when_test_prompt_hash_is_none

tests/test_sync_orchestration.py: Added 5 new tests for issue #203

  • TestIssue203SaveOperationFingerprintTestPromptHash:
    • test_generate_operation_sets_test_prompt_hash_to_none
    • test_test_operation_sets_test_prompt_hash_to_current
    • test_fix_operation_preserves_test_prompt_hash
    • test_generate_then_test_workflow
    • test_skip_operation_preserves_test_prompt_hash_without_atomic_state

tests/test_operation_log.py: Added 5 new tests for issue #203

  • TestIssue203SaveFingerprintTestPromptHash:
    • test_generate_operation_sets_test_prompt_hash_to_none
    • test_test_operation_sets_test_prompt_hash_to_current
    • test_example_operation_preserves_test_prompt_hash
    • test_fix_operation_preserves_test_prompt_hash
    • test_explicit_test_prompt_hash_overrides_auto_logic

Prompt Changes

Added the prompt for the new feature.

Testing

Test Results

tests/test_sync_determine_operation.py: 86 passed
tests/test_sync_orchestration.py:       95 passed
tests/test_operation_log.py:            28 passed
Total: 209 passed in 64.29s

Test Coverage

| File | Coverage | Statements | Missed |
|---|---|---|---|
| pdd/operation_log.py | 92% | 156 | 13 |
| pdd/sync_orchestration.py | 68% | 849 | 268 |
| pdd/sync_determine_operation.py | ~13% | 689 | - |

Note:

Issue #203 Code Coverage Detail

| Component | Status |
|---|---|
| Fingerprint.test_prompt_hash field | ✅ Covered |
| read_fingerprint() loading test_prompt_hash | ✅ Covered |
| _perform_sync_analysis() stale detection | ✅ Covered |
| _save_fingerprint_atomic() test_prompt_hash logic | ✅ Covered |
| save_fingerprint() auto test_prompt_hash handling | ✅ Covered |

Testing Environment

  • Platform: macOS Darwin 24.6.0
  • Python: 3.12.12 / 3.13.7
  • pytest: 9.0.1

Manual Test Results (E2E)

| Test Case | Result | Evidence |
|---|---|---|
| Prompt change triggers test regeneration | | Sync log shows: generate → verify → test → nothing |
| Fingerprint updated correctly | | test_prompt_hash equals prompt_hash after sync |
| Cost tracking | | ~$0.0519 (model: gpt-4o) |

E2E Test Details:

  • Created temp project at tmp_issue203_e2e
  • Updated e2e_issue203_python.prompt to simulate a prompt change
  • Ran: python -m pdd.cli --force --local --context default sync e2e_issue203 --target-coverage 0 --max-attempts 1 --budget 5
  • Result: Sync completed successfully, fingerprint shows test_prompt_hash == prompt_hash
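The final check in the E2E run (test_prompt_hash == prompt_hash) can be scripted against the fingerprint JSON. The sketch below assumes the two key names match the field names used in this PR; where the fingerprint file lives is not stated here, so the caller supplies the path.

```python
import json
from pathlib import Path


def fingerprint_tests_current(path: Path) -> bool:
    """Return True when the fingerprint records tests generated from the current prompt.

    Assumes the JSON uses the keys "prompt_hash" and "test_prompt_hash".
    """
    data = json.loads(path.read_text(encoding="utf-8"))
    test_hash = data.get("test_prompt_hash")
    # A missing or null test_prompt_hash means the tests' origin is unknown.
    return test_hash is not None and test_hash == data.get("prompt_hash")
```

Running this before and after the sync command above should flip the result from False to True if the fingerprint was updated as described.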

Dev Unit Checklist

  • Code: Updated implementation (3 files)
  • Tests: Added comprehensive test coverage (18 new tests)
  • Prompt: Added the prompt for the new feature
  • Example: N/A (no new user-facing API)

Regression Testing

  • All existing sync tests pass (209 tests)
  • No changes to public API (additive only)
  • Backward compatible (existing fingerprints without test_prompt_hash handled gracefully)

Regression Test Results

| Test | Description | Status |
|---|---|---|
| 0 | --list-contexts and --context | |
| 1 | generate command | |
| 2 | example command | |
| 3 | preprocess command | |
| 4 | update command | |
| 5 | change command | ❌ (pre-existing issue) |
| 6 | crash command | |
| 7 | verify command | |
| 8 | test command | |
| 9 | fix command | |
| 10 | split command | |
| 11 | detect command | |
| 12 | conflicts command | |
| 13 | trace command | |
| 14 | bug command | |
| 15 | auto-deps command | |
| 16 | Global Options | |
| 18-21 | Other tests | |

Note on Test 5 failure: The change command test fails with "Agentic mode requires exactly 1 argument: ISSUE_URL". This is a pre-existing issue unrelated to this PR - the change command API was modified but the regression test script was not updated.

Screenshots/Logs

N/A - Behavior verified through unit tests and E2E testing.

Related Issues

Fixes #203

Additional Notes

This fix improves the sync workflow by:

  1. Automatic stale test detection: Tests are automatically regenerated when prompt changes
  2. Fingerprint tracking: test_prompt_hash field tracks which prompt version tests were generated from
  3. Backward compatible: Existing fingerprints without test_prompt_hash are handled gracefully (treated as unknown state)
  4. Follows PDD philosophy: Ensures tests stay in sync with prompt (the source of truth)

Fingerprint Lifecycle

| Operation | test_prompt_hash Value |
|---|---|
| generate | None (tests now stale) |
| test | Current prompt hash |
| fix, verify, etc. | Preserved from previous |

@Enfoirer Enfoirer force-pushed the feature/issue-203-auto-update-tests-sync branch 4 times, most recently from 25f6ec3 to 57cd218 Compare January 18, 2026 16:30
@gltanaka gltanaka requested a review from Copilot January 18, 2026 19:08

Copilot AI left a comment

Pull request overview

This PR implements automatic test regeneration when prompts change by tracking the prompt version that tests were generated from. When code is regenerated from an updated prompt, the system now detects that tests are stale and automatically triggers test regeneration during sync operations.

Changes:

  • Added test_prompt_hash field to Fingerprint dataclass to track which prompt version tests were generated from
  • Implemented stale test detection logic that compares test_prompt_hash with current prompt hash
  • Added automatic test_prompt_hash management in fingerprint save operations based on operation type (generate/test/other)

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

| File | Description |
|---|---|
| pdd/sync_determine_operation.py | Added test_prompt_hash field to Fingerprint, updated read_fingerprint() to load it, and added stale test detection in _perform_sync_analysis() |
| pdd/sync_orchestration.py | Updated _save_fingerprint_atomic() to automatically set test_prompt_hash based on operation type |
| pdd/operation_log.py | Updated save_fingerprint() to mirror atomic fingerprint logic for managing test_prompt_hash |
| tests/test_sync_determine_operation.py | Added 8 new tests covering Fingerprint field, read operations, and stale test detection |
| tests/test_sync_orchestration.py | Added 5 new tests for fingerprint atomic operations |
| tests/test_operation_log.py | Added 5 new tests for fingerprint save operations |

@Enfoirer Enfoirer force-pushed the feature/issue-203-auto-update-tests-sync branch from 45fc5da to 44916cf Compare January 19, 2026 15:54
Add test_prompt_hash field to Fingerprint to track which prompt version
tests were generated from. When prompt changes and code is regenerated,
sync now detects stale tests and triggers test regeneration.

- Add test_prompt_hash field to Fingerprint dataclass
- Update read_fingerprint() to load test_prompt_hash from JSON
- Add stale test detection in _perform_sync_analysis()
- Update _save_operation_fingerprint() to set test_prompt_hash based on operation:
  - generate: sets to None (tests now stale)
  - test: sets to current prompt hash
  - other ops: preserves existing value
- Add 12 unit tests covering the new functionality
- Rename test method for clarity: test_skip_operation -> test_skip_test_operation
- Simplify test class name: TestIssue203SaveFingerprintTestPromptHash -> TestIssue203TestPromptHashManagement
- Consolidate duplicate read_fingerprint imports in _save_fingerprint_atomic
@Enfoirer Enfoirer force-pushed the feature/issue-203-auto-update-tests-sync branch from 44916cf to 5451719 Compare January 19, 2026 16:00
@gltanaka (Contributor)

prompt:
write a python program to print hello and bye

test:
test__
pytest failure as a screen

LLM needed?

  1. Determine which tests need to change -> pytest
  2. Test changes -> fix
  3. Incremental new tests for a new feature -> LLM changes needed?

promptdriven#349 tests

The conflict was in tests/test_sync_determine_operation.py where both
issue promptdriven#203 (stale test detection) and issue promptdriven#349 (infinite loop fix)
added new test classes at the end of the file. Both sets are retained.
@Enfoirer Enfoirer force-pushed the feature/issue-203-auto-update-tests-sync branch from 2769a56 to d51ffc2 Compare January 26, 2026 15:59
@Enfoirer (Contributor, Author)

Step 1 (detect): No LLM — pure hash comparison (test_prompt_hash ≠ current_prompt_hash).
Step 2 (regenerate/fix): LLM needed — cmd_test_main() calls LLM to regenerate tests; fix_main() calls
LLM if tests fail.
Step 3 (new feature tests): No LLM template changes — the existing generic test generation prompt
handles new features automatically since the feature requirements come from the source prompt.
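A rough sketch of how steps 1 and 2 fit together, with the LLM-backed parts passed in as callables so the control flow runs on its own. Every name here is hypothetical: `regenerate_tests` and `fix_tests` stand in for cmd_test_main() and fix_main(), whose real signatures are not shown in this thread, and SHA-256 is an assumed hash function.

```python
import hashlib
from typing import Callable, List, Optional


def prompt_hash(text: str) -> str:
    """Hash prompt text. SHA-256 is an assumption; the PR does not name the algorithm."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def sync_tests(
    prompt_text: str,
    test_prompt_hash: Optional[str],
    regenerate_tests: Callable[[str], bool],  # stand-in for the LLM test step; returns True if tests pass
    fix_tests: Callable[[], None],            # stand-in for the LLM fix step
) -> List[str]:
    """Return the list of steps taken for one sync pass (illustrative only)."""
    steps: List[str] = []
    # Step 1: pure hash comparison, no LLM call.
    if test_prompt_hash is None or test_prompt_hash == prompt_hash(prompt_text):
        return steps
    steps.append("detect")
    # Step 2: regenerate tests from the current prompt, then fix only if they fail.
    passed = regenerate_tests(prompt_text)
    steps.append("test")
    if not passed:
        fix_tests()
        steps.append("fix")
    return steps
```

With the example prompt from the question above, changing "print hello" to "print hello and bye" changes the hash, so the pass proceeds to the test step; an unchanged prompt returns an empty list without invoking either callable.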

@Enfoirer Enfoirer marked this pull request as ready for review January 26, 2026 15:59
@gltanaka gltanaka requested a review from Copilot January 26, 2026 17:42

Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.


@gltanaka (Contributor)

@Enfoirer can you address the GitHub comments?

Rename test_skip_test_operation_preserves_test_prompt_hash_without_atomic_state
to test_skip_prefixed_operation_preserves_test_prompt_hash_without_atomic_state
to better reflect that it tests skip:-prefixed operations (like skip:test).

Development

Successfully merging this pull request may close these issues.

Automatically update tests based on prompt changes during sync

2 participants