Skip to content

Commit 1870cfe

Browse files
authored
fix: harden backlog refine prompt scaffold and mixed-format parsing (#228)
* fix: harden backlog refine prompt scaffold and parsing * fix: normalize mixed notes parsing and boundary flushing --------- Co-authored-by: Dominikus Nold <djm81@users.noreply.github.com>
1 parent f9f2fcc commit 1870cfe

File tree

10 files changed

+290
-3
lines changed

10 files changed

+290
-3
lines changed

openspec/CHANGE_ORDER.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -60,7 +60,7 @@ All entries in the table below are pending implementation.
6060
|--------|-------|---------------|----------|------------|
6161
| backlog-core | 01 | backlog-core-01-dependency-analysis-commands | [#116](https://github.com/nold-ai/specfact-cli/issues/116) ||
6262
| backlog-core | 02 | backlog-core-02-interactive-issue-creation | [#173](https://github.com/nold-ai/specfact-cli/issues/173) | #116 (optional: #176, #177) |
63-
| backlog-core | 03 | backlog-core-03-refine-writeback-field-splitting | #225 ||
63+
| backlog-core | 03 | backlog-core-03-refine-writeback-field-splitting | #225, #227 ||
6464

6565
### backlog-scrum
6666

openspec/changes/backlog-core-03-refine-writeback-field-splitting/CHANGE_VALIDATION.md

Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -71,6 +71,19 @@ No public CLI signature changes or adapter method signature changes were introdu
7171
- `hatch test -- tests/unit/commands/test_backlog_commands.py -k TestParseRefinementOutputFields -v`: Pass (`5 passed`) after parser fix.
7272
- `hatch test -- tests/unit/adapters/test_ado_backlog_adapter.py tests/unit/adapters/test_github_backlog_adapter.py -v`: Pass (`35 passed`).
7373

74+
## User Report Follow-up Validation (Prompt + Mixed Format Parsing, 2026-02-12)
75+
76+
- `hatch test -- tests/unit/commands/test_backlog_commands.py -k mixed_heading_and_inline_notes -v`: Failing pre-implementation evidence captured.
77+
- `hatch test -- tests/unit/backlog/test_ai_refiner.py -k "expected_output_scaffold or omit_unknown_metadata_fields" -v`: Failing pre-implementation evidence captured.
78+
- `hatch test -- tests/unit/commands/test_backlog_commands.py -k TestParseRefinementOutputFields -v`: Pass (`6 passed`) after parser and prompt fixes.
79+
- `hatch test -- tests/unit/backlog/test_ai_refiner.py -k generate_refinement_prompt -v`: Pass (`5 passed`).
80+
- `hatch test -- tests/unit/adapters/test_ado_backlog_adapter.py tests/unit/adapters/test_github_backlog_adapter.py -v`: Pass (`35 passed`).
81+
82+
## Review Follow-up Validation (Notes Duplication + Internal Heading Preservation, 2026-02-12)
83+
84+
- `hatch test -- tests/unit/commands/test_backlog_commands.py -k "mixed_heading_and_inline_notes_preserves_description_before_notes or label_notes_with_internal_heading_keeps_heading_content" -v`: Failing pre-implementation evidence captured, then pass (`2 passed`) after parser fix.
85+
- `hatch test -- tests/unit/commands/test_backlog_commands.py -k TestParseRefinementOutputFields -v`: Pass (`7 passed`).
86+
7487
## Next Steps
7588

7689
1. Complete implementation and tests per `tasks.md`.

openspec/changes/backlog-core-03-refine-writeback-field-splitting/TDD_EVIDENCE.md

Lines changed: 61 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -80,3 +80,64 @@
8080
- **Summary**:
8181
- `5 passed`
8282
- Includes regression coverage for label-only field blocks without `Description:`.
83+
84+
## User Report Follow-up: Prompt Scaffold + Mixed Format Parsing
85+
86+
### Pre-Implementation Failing Run
87+
88+
- **Timestamp**: 2026-02-12T11:06:00+01:00
89+
- **Command**: `hatch test -- tests/unit/commands/test_backlog_commands.py -k mixed_heading_and_inline_notes -v`
90+
- **Result**: Failed
91+
- **Failure Summary**:
92+
- `test_mixed_heading_and_inline_notes_preserves_description_before_notes` failed.
93+
- Parser dropped pre-notes description narrative and kept only content starting at inline `**Notes**:`.
94+
95+
- **Timestamp**: 2026-02-12T11:06:00+01:00
96+
- **Command**: `hatch test -- tests/unit/backlog/test_ai_refiner.py -k "expected_output_scaffold or omit_unknown_metadata_fields" -v`
97+
- **Result**: Failed
98+
- **Failure Summary**:
99+
- Prompt did not include explicit output scaffold instructions.
100+
- Prompt did not include instruction to omit unknown metadata fields/placeholders.
101+
102+
### Post-Implementation Passing Run
103+
104+
- **Timestamp**: 2026-02-12T11:08:00+01:00
105+
- **Command**: `hatch test -- tests/unit/commands/test_backlog_commands.py -k TestParseRefinementOutputFields -v`
106+
- **Result**: Passed
107+
- **Summary**:
108+
- `6 passed`
109+
- Includes mixed heading + inline notes regression.
110+
111+
- **Timestamp**: 2026-02-12T11:08:00+01:00
112+
- **Command**: `hatch test -- tests/unit/backlog/test_ai_refiner.py -k generate_refinement_prompt -v`
113+
- **Result**: Passed
114+
- **Summary**:
115+
- `5 passed`
116+
- Includes prompt scaffold and metadata omission instruction coverage.
117+
118+
## Review Follow-up: Notes Duplication + Internal Heading Truncation
119+
120+
### Pre-Implementation Failing Run
121+
122+
- **Timestamp**: 2026-02-12T17:47:00+01:00
123+
- **Command**: `hatch test -- tests/unit/commands/test_backlog_commands.py -k "mixed_heading_and_inline_notes_preserves_description_before_notes or label_notes_with_internal_heading_keeps_heading_content" -v`
124+
- **Result**: Failed
125+
- **Failure Summary**:
126+
- `test_mixed_heading_and_inline_notes_preserves_description_before_notes` failed because raw `**Notes**:` markup and notes text were duplicated in `body_markdown`.
127+
- `test_label_notes_with_internal_heading_keeps_heading_content` failed because notes content was truncated at internal `## Risks` heading.
128+
129+
### Post-Implementation Passing Run
130+
131+
- **Timestamp**: 2026-02-12T17:53:00+01:00
132+
- **Command**: `hatch test -- tests/unit/commands/test_backlog_commands.py -k "mixed_heading_and_inline_notes_preserves_description_before_notes or label_notes_with_internal_heading_keeps_heading_content" -v`
133+
- **Result**: Passed
134+
- **Summary**:
135+
- `2 passed`
136+
- Duplicate inline-notes markup removed from description output; internal notes headings preserved.
137+
138+
- **Timestamp**: 2026-02-12T17:54:00+01:00
139+
- **Command**: `hatch test -- tests/unit/commands/test_backlog_commands.py -k TestParseRefinementOutputFields -v`
140+
- **Result**: Passed
141+
- **Summary**:
142+
- `7 passed`
143+
- Full parser regression suite passes including mixed-format and internal-heading cases.

openspec/changes/backlog-core-03-refine-writeback-field-splitting/proposal.md

Lines changed: 14 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -3,13 +3,15 @@
33
## Why
44

55

6+
67
`specfact backlog refine --write` currently applies the raw copilot response as `body_markdown` and does not parse structured refinement output back into canonical fields before adapter writeback. For Azure DevOps this causes `System.Description` to receive a verbatim payload containing labels like `Description`, `Acceptance Criteria`, `Story Points`, `Business Value`, `Priority`, `Area Path`, and provider markers instead of updating separate fields. GitHub can exhibit the same issue when copilot output uses label-style sections instead of markdown headings.
78

89
This breaks the provider-aware contract implied by refinement prompts and produces low-quality remote item updates.
910

1011
## What Changes
1112

1213

14+
1315
- **MODIFY**: Backlog refine write path to parse structured refinement content into canonical fields (`description`, `acceptance_criteria`, `story_points`, `business_value`, `priority`, `work_item_type`) before writeback.
1416
- **MODIFY**: Normalize label-style refinement output to canonical markdown sections so both ADO and GitHub writeback paths behave deterministically.
1517
- **MODIFY**: ADO/GitHub writeback behavior to prefer parsed refined values over stale pre-refinement values when `--write` is used.
@@ -19,12 +21,23 @@ This breaks the provider-aware contract implied by refinement prompts and produc
1921
## Capabilities
2022
- **backlog-refinement**: Provider-aware parsing and canonical field splitting for `specfact backlog refine --write`.
2123

24+
2225
---
2326

2427
## Source Tracking
2528

26-
<!-- source_repo: nold-ai/specfact-cli -->
29+
### Repository: nold-ai/specfact-cli
30+
2731
- **GitHub Issue**: #225
2832
- **Issue URL**: <https://github.com/nold-ai/specfact-cli/issues/225>
2933
- **Last Synced Status**: proposed
3034
- **Sanitized**: false
35+
36+
---
37+
38+
### Repository: nold-ai/specfact-cli
39+
40+
- **GitHub Issue**: #227
41+
- **Issue URL**: <https://github.com/nold-ai/specfact-cli/issues/227>
42+
- **Last Synced Status**: proposed
43+
- **Sanitized**: false

openspec/changes/backlog-core-03-refine-writeback-field-splitting/specs/backlog-refinement/spec.md

Lines changed: 30 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -48,6 +48,36 @@ The system SHALL parse structured refinement output into canonical fields before
4848
- **AND** parser fallback does not keep the entire raw labeled payload as `description`
4949
- **AND** `body_markdown` does not contain prompt labels verbatim.
5050

51+
#### Scenario: Mixed heading and inline label formatting preserves description narrative
52+
53+
- **GIVEN** a refined output that uses heading-style sections such as `## Description` and `## Acceptance Criteria`
54+
- **AND** the `## Description` section contains an inline label like `**Notes**:`
55+
- **WHEN** SpecFact parses the refinement output for writeback
56+
- **THEN** text before the inline label in `## Description` is preserved in `body_markdown`
57+
- **AND** label-capture does not swallow subsequent heading sections.
58+
59+
#### Scenario: Prompt format contract includes canonical scaffold and metadata omission rule
60+
61+
- **GIVEN** SpecFact generates a refinement prompt for IDE Copilot
62+
- **WHEN** prompt text is rendered for backlog refine
63+
- **THEN** it includes an explicit expected output scaffold (ordered canonical sections)
64+
- **AND** it instructs Copilot to omit unknown metadata fields (for example area/iteration path) instead of placeholders like "unspecified" or "provide ...".
65+
66+
#### Scenario: Mixed heading description does not duplicate inline notes block
67+
68+
- **GIVEN** a refined output with `## Description` narrative
69+
- **AND** an inline label block like `**Notes**:` appears inside that description section
70+
- **WHEN** SpecFact parses the refinement output for writeback
71+
- **THEN** description narrative is preserved without raw inline label markup
72+
- **AND** notes content appears only once in normalized `## Notes` output.
73+
74+
#### Scenario: Label-style notes preserves internal non-boundary headings
75+
76+
- **GIVEN** a label-style `Notes:` section includes internal headings such as `## Risks`
77+
- **WHEN** SpecFact parses notes/dependencies label blocks
78+
- **THEN** internal headings that are not canonical section boundaries are preserved as notes content
79+
- **AND** parser does not truncate notes at the internal heading line.
80+
5181
#### Scenario: Refine command orchestration remains behaviorally consistent after decomposition
5282

5383
- **GIVEN** `specfact backlog refine` supports initialization, filtering, export/import, interactive refinement, writeback, and summary flows

openspec/changes/backlog-core-03-refine-writeback-field-splitting/tasks.md

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -27,6 +27,10 @@
2727
- [x] 4.7 Preserve heading-style `## Notes` / `## Dependencies` sections in parsed `body_markdown` for writeback
2828
- [x] 4.8 Match heading-style `## Notes` / `## Dependencies` sections case-insensitively during parser writeback normalization
2929
- [x] 4.9 Prevent label-only refinement output (without `Description:`) from leaking raw prompt labels into fallback description/body fields
30+
- [x] 4.10 Preserve mixed-format `## Description` narrative content when inline label-style markers (for example `**Notes**:`) appear before later heading sections
31+
- [x] 4.11 Strengthen Copilot prompt with explicit output scaffold and "omit unknown metadata fields" rule (no placeholder values)
32+
- [x] 4.12 Strip inline label markers from retained heading-style description content to avoid duplicated `Notes` blocks in normalized body output
33+
- [x] 4.13 Restrict label-block heading flush to canonical section boundaries so internal headings (for example `## Risks`) remain inside notes/dependencies content
3034

3135
## 5. Quality Gates
3236

src/specfact_cli/backlog/ai_refiner.py

Lines changed: 22 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -172,6 +172,28 @@ def generate_refinement_prompt(
172172
9. Provider-aware formatting:
173173
- **GitHub**: Use markdown headings in body (## Section Name)
174174
- **ADO**: Use markdown headings in body (will be mapped to separate ADO fields during writeback)
175+
10. Omit unknown metadata fields instead of placeholders (do not emit values like "unspecified", "no info provided", or "provide area path")
176+
11. Keep `## Description` focused on narrative body content; do not place metadata labels in description text.
177+
178+
Expected Output Scaffold (ordered):
179+
## Work Item Properties / Metadata
180+
- Story Points: <number, omit line if unknown>
181+
- Business Value: <number, omit line if unknown>
182+
- Priority: <number, omit line if unknown>
183+
- Work Item Type: <type, omit line if unknown>
184+
185+
## Description
186+
<main story narrative/body only>
187+
188+
## Acceptance Criteria
189+
- [ ] <criterion>
190+
191+
## Notes
192+
<optional; include only for ambiguity/risk/dependency context>
193+
194+
Metadata scaffold rules:
195+
- The `## Work Item Properties / Metadata` section is optional; include only when at least one metadata value is known.
196+
- Omit unknown metadata fields and do not emit placeholders.
175197
176198
Return ONLY the refined backlog item body content in markdown format. Do not include any explanations or metadata."""
177199
return prompt.strip()

src/specfact_cli/modules/backlog/src/commands.py

Lines changed: 67 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1002,6 +1002,24 @@ def _build_refine_export_content(
10021002
export_content += "7. Include story points, business value, priority, and work item type when available\n"
10031003
export_content += "8. For high-complexity stories, suggest splitting when appropriate\n"
10041004
export_content += "9. Follow provider-aware formatting guidance listed per item\n\n"
1005+
export_content += "**Expected Output Scaffold (ordered):**\n"
1006+
export_content += "```markdown\n"
1007+
export_content += "## Work Item Properties / Metadata\n"
1008+
export_content += "- Story Points: <number, omit line if unknown>\n"
1009+
export_content += "- Business Value: <number, omit line if unknown>\n"
1010+
export_content += "- Priority: <number, omit line if unknown>\n"
1011+
export_content += "- Work Item Type: <type, omit line if unknown>\n\n"
1012+
export_content += "## Description\n"
1013+
export_content += "<main story narrative/body only>\n\n"
1014+
export_content += "## Acceptance Criteria\n"
1015+
export_content += "- [ ] <criterion>\n\n"
1016+
export_content += "## Notes\n"
1017+
export_content += "<optional; include only for ambiguity/risk/dependency context>\n"
1018+
export_content += "```\n\n"
1019+
export_content += (
1020+
"Omit unknown metadata fields and never emit placeholders such as "
1021+
"`(unspecified)`, `no info provided`, or `provide area path`.\n\n"
1022+
)
10051023
export_content += "---\n\n"
10061024
comments_map = comments_by_item_id or {}
10071025
template_map = template_guidance_by_item_id or {}
@@ -1526,13 +1544,26 @@ def _parse_refinement_output_fields(content: str) -> dict[str, Any]:
15261544
if isinstance(value, int):
15271545
parsed[key] = value
15281546

1547+
def _has_heading_section(section_name: str) -> bool:
1548+
return bool(
1549+
re.search(
1550+
rf"^##+\s+{re.escape(section_name)}\s*$",
1551+
normalized,
1552+
re.MULTILINE | re.IGNORECASE,
1553+
)
1554+
)
1555+
15291556
def _extract_heading_section(section_name: str) -> str:
15301557
pattern = rf"^##+\s+{re.escape(section_name)}\s*$\n(.*?)(?=^##|\Z)"
15311558
match = re.search(pattern, normalized, re.MULTILINE | re.DOTALL | re.IGNORECASE)
15321559
if not match:
15331560
return ""
15341561
return match.group(1).strip()
15351562

1563+
heading_description = _extract_heading_section("Description")
1564+
if heading_description and not (parsed.get("description") or "").strip():
1565+
parsed["description"] = heading_description
1566+
15361567
# Then parse label-style blocks; explicit labels override heading heuristics.
15371568
label_aliases = {
15381569
"description": "description",
@@ -1547,11 +1578,24 @@ def _extract_heading_section(section_name: str) -> str:
15471578
"iteration path": "iteration_path",
15481579
"provider": "provider",
15491580
}
1581+
canonical_heading_boundaries = {
1582+
*label_aliases.keys(),
1583+
"work item properties / metadata",
1584+
"work item properties",
1585+
"metadata",
1586+
}
15501587
label_pattern = re.compile(r"^\s*(?:[-*]\s*)?(?:\*\*)?([A-Za-z][A-Za-z0-9 ()/_-]*?)(?:\*\*)?\s*:\s*(.*)\s*$")
15511588
blocks: dict[str, str] = {}
15521589
current_key: str | None = None
15531590
current_lines: list[str] = []
15541591

1592+
def _is_canonical_heading_boundary(line: str) -> bool:
1593+
heading_match = re.match(r"^\s*##+\s+(.+?)\s*$", line)
1594+
if not heading_match:
1595+
return False
1596+
heading_name = re.sub(r"\s+", " ", heading_match.group(1).strip().strip("#")).lower()
1597+
return heading_name in canonical_heading_boundaries
1598+
15551599
def _flush_current() -> None:
15561600
nonlocal current_key, current_lines
15571601
if current_key is None:
@@ -1562,6 +1606,10 @@ def _flush_current() -> None:
15621606
current_lines = []
15631607

15641608
for line in normalized.splitlines():
1609+
# Stop label-style block capture only at canonical section-heading boundaries.
1610+
if current_key is not None and _is_canonical_heading_boundary(line):
1611+
_flush_current()
1612+
continue
15651613
match = label_pattern.match(line)
15661614
if match:
15671615
candidate = re.sub(r"\s+", " ", match.group(1).strip().lower())
@@ -1576,11 +1624,29 @@ def _flush_current() -> None:
15761624
current_lines.append(line.rstrip())
15771625
_flush_current()
15781626

1579-
if blocks and not blocks.get("description"):
1627+
if blocks and not blocks.get("description") and not _has_heading_section("Description"):
15801628
# If label-style blocks are present but no explicit Description block exists,
15811629
# do not keep the heading parser fallback description (it may contain raw labels).
15821630
parsed.pop("description", None)
15831631

1632+
if _has_heading_section("Description") and not blocks.get("description") and parsed.get("description"):
1633+
# In mixed heading output, trim inline label-style suffix blocks from description
1634+
# to avoid duplicating notes/dependencies in normalized body output.
1635+
description_lines: list[str] = []
1636+
for line in str(parsed["description"]).splitlines():
1637+
inline_match = label_pattern.match(line)
1638+
if inline_match:
1639+
candidate = re.sub(r"\s+", " ", inline_match.group(1).strip().lower())
1640+
canonical = label_aliases.get(candidate)
1641+
if canonical and canonical != "description":
1642+
break
1643+
description_lines.append(line.rstrip())
1644+
cleaned_heading_description = "\n".join(description_lines).strip()
1645+
if cleaned_heading_description:
1646+
parsed["description"] = cleaned_heading_description
1647+
else:
1648+
parsed.pop("description", None)
1649+
15841650
if blocks.get("description"):
15851651
parsed["description"] = blocks["description"]
15861652
if blocks.get("acceptance_criteria"):

tests/unit/backlog/test_ai_refiner.py

Lines changed: 20 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -90,6 +90,26 @@ def test_generate_refinement_prompt_mentions_no_comments_when_empty(
9090
prompt = refiner.generate_refinement_prompt(arbitrary_backlog_item, user_story_template, comments=[])
9191
assert "No comments found" in prompt
9292

93+
@beartype
94+
def test_generate_refinement_prompt_includes_expected_output_scaffold(
95+
self, refiner: BacklogAIRefiner, arbitrary_backlog_item: BacklogItem, user_story_template: BacklogTemplate
96+
) -> None:
97+
"""Prompt includes canonical output scaffold for Copilot consistency."""
98+
prompt = refiner.generate_refinement_prompt(arbitrary_backlog_item, user_story_template)
99+
assert "Expected Output Scaffold" in prompt
100+
assert "## Work Item Properties / Metadata" in prompt
101+
assert "## Description" in prompt
102+
assert "## Acceptance Criteria" in prompt
103+
104+
@beartype
105+
def test_generate_refinement_prompt_instructs_to_omit_unknown_metadata_fields(
106+
self, refiner: BacklogAIRefiner, arbitrary_backlog_item: BacklogItem, user_story_template: BacklogTemplate
107+
) -> None:
108+
"""Prompt instructs omitting unknown metadata fields instead of placeholders."""
109+
prompt = refiner.generate_refinement_prompt(arbitrary_backlog_item, user_story_template)
110+
assert "omit unknown metadata fields" in prompt.lower()
111+
assert "do not emit placeholders" in prompt.lower()
112+
93113
@beartype
94114
def test_validate_and_score_complete_refinement(
95115
self, refiner: BacklogAIRefiner, arbitrary_backlog_item: BacklogItem, user_story_template: BacklogTemplate

0 commit comments

Comments
 (0)