🌰 Improve QA bot: comprehensive prompt engineering, multi-file support, and robustness by sealfe · Pull Request #524 · 1712n/dn-institute

sealfe · 2026-02-14T12:23:13Z

🌰 Summary

This PR delivers a comprehensive overhaul of the article-check-claude QA bot, targeting three key areas: prompt engineering, multi-file PR handling, and infrastructure robustness. Every change is designed to produce more actionable, accurate, and structured review feedback for Crypto Attack Wiki submissions.

Changes Overview

1. Prompt Engineering — Complete Rewrite (client.py)

The most impactful change: all three core prompts have been rewritten from scratch with crypto-incident-specific guidance.

EXTRACTING_PROMPT (Statement Extraction)

Before: Generic "extract important statements" instruction
After: Domain-specific guidance to extract verifiable factual claims focusing on:
- Specific monetary amounts and losses
- Dates and timestamps
- Named entities, protocols, and individuals
- Blockchain addresses and transaction hashes
- Attack method descriptions
- Protocol response claims (pauses, bounties, post-mortems)
Targets 5-15 high-value statements per article (was unbounded)

RETRIEVAL_PROMPT (Fact-Checking Search)

Before: Generic search instructions
After: Crypto-specific search query best practices:
- Include entity name AND year for precision
- Search for incident name + "hack"/"exploit" + amount for verification
- Direct address lookups for on-chain verification
- Iterative query refinement guidance
Emphasis on exact number matching (e.g., $20M vs $20.5M)
Date precision verification (month/day/year must all match)

ANSWER_PROMPT (Final Report Generation)

Before: 4 loosely defined checks mixed together
After: 7 distinct, structured validation sections:

| Section | What It Checks |
| --- | --- |
| 1. Fact-Check Results | Statement-by-statement verification with ✅/❌/⚠️ and sources |
| 2. Metadata Validation | All 6 required YAML fields present and correct |
| 3. Section Structure | Exactly 5 required sections in correct order |
| 4. Filename Validation | `YYYY-MM-DD-Entity-Name.md` format compliance |
| 5. Timeline Format | `Month DD, YYYY, HH:MM AM/PM UTC:` format |
| 6. Content Quality | References, completeness, objectivity, blockchain evidence |
| 7. Hugo SSG Formatting | Valid Markdown and YAML for Hugo |

This structured approach ensures reviewers get consistent, actionable feedback rather than free-form text.
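
The filename and timeline formats in sections 4 and 5 are regular enough to check deterministically. The following is a minimal sketch of such checks, not the code in this PR (where the checks are described as part of ANSWER_PROMPT). The regular expressions and the function names `is_valid_filename` and `is_valid_timeline_entry` are assumptions inferred from the two format strings above.

```python
import re
from datetime import datetime

# Hypothetical patterns inferred from `YYYY-MM-DD-Entity-Name.md` and
# `Month DD, YYYY, HH:MM AM/PM UTC:`; the PR does not show its own patterns.
FILENAME_RE = re.compile(r"^(\d{4}-\d{2}-\d{2})-[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*\.md$")
TIMELINE_RE = re.compile(
    r"^(?:January|February|March|April|May|June|July|August|September|"
    r"October|November|December) \d{2}, \d{4}, \d{2}:\d{2} (?:AM|PM) UTC:"
)


def is_valid_filename(name: str) -> bool:
    """Check the shape of the filename and that its date prefix is a real date."""
    match = FILENAME_RE.match(name)
    if not match:
        return False
    try:
        datetime.strptime(match.group(1), "%Y-%m-%d")
    except ValueError:
        return False
    return True


def is_valid_timeline_entry(line: str) -> bool:
    """Check that a timeline line starts with a correctly formatted timestamp."""
    # Ignore a leading list marker such as "- " or "* ".
    return TIMELINE_RE.match(line.lstrip("-* ")) is not None
```

For example, `is_valid_filename("2024-01-15-Example-Protocol.md")` passes, while a name with month 13 is rejected because the date prefix does not parse.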

2. Multi-File PR Support (article_checker_claude.py)

Critical bug fix: The original bot only processed diff[0] — meaning only the first file in any PR was checked.

- Now iterates over all files in the PR diff
- Filters to article files matching configurable ARTICLE_PATH_PATTERN
- Skips non-article files (code, config, images) with logged messages
- Pre-validates metadata structure before sending to LLM (saves API costs for malformed submissions)
- Generates per-file reports aggregated into a single PR comment
- Properly extracts filenames from diff headers for validation
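
The filtering and aggregation steps above could look roughly like the sketch below. It is an illustration under stated assumptions: the pattern value and the helpers `select_article_files` and `build_pr_comment` are hypothetical, and the real script works on diff objects rather than plain path strings.

```python
import re

# Hypothetical value; the PR only states that ARTICLE_PATH_PATTERN is configurable.
ARTICLE_PATH_PATTERN = r"^content/.+\.md$"


def select_article_files(changed_paths, pattern=ARTICLE_PATH_PATTERN):
    """Split the changed paths of a PR into (articles, skipped)."""
    regex = re.compile(pattern)
    articles, skipped = [], []
    for path in changed_paths:
        if regex.search(path):
            articles.append(path)
        else:
            skipped.append(path)  # code, config, images, and so on
    return articles, skipped


def build_pr_comment(reports):
    """Join per-file reports (path -> report text) into one comment body."""
    sections = [f"## {path}\n\n{text}" for path, text in reports.items()]
    return "\n\n---\n\n".join(sections)
```

Iterating over every changed path, instead of reading only the first entry, is what removes the `diff[0]` limitation; the skipped list can be logged so reviewers see why a file was not checked.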

3. Model & Configuration Upgrades (config.json)

| Parameter | Before | After | Why |
| --- | --- | --- | --- |
| Search Model | claude-3-opus-20240229 | claude-sonnet-4-20250514 | Better accuracy, lower cost |
| Summarize Model | claude-3-haiku-20240307 | claude-3-5-haiku-20241022 | Improved quality |
| Search Max Tokens | 4000 | 8000 | Room for thorough 7-section analysis |
| Summarize Max Tokens | 512 | 1024 | Better article extraction |
| Search Results | 1 (hardcoded) | 3 (configurable) | Multiple corroborating sources |
| Max Searches | 5 (hardcoded) | 8 (configurable) | More thorough fact-checking |

New configuration fields:

- VALID_ATTACK_TYPES: 21 recognized attack categories for validation
- VALID_ENTITY_TYPES: 18 recognized entity types
- VALID_METADATA_HEADERS: Required frontmatter fields
- VALID_SECTION_HEADERS: Required article sections
- ARTICLE_PATH_PATTERN: Configurable article path filter
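
To illustrate how a field such as VALID_METADATA_HEADERS can support pre-validation before any LLM call, here is a small sketch. The six field names and the helper `find_missing_headers` are assumptions for illustration; the PR states only that six YAML fields are required. The sketch avoids a YAML dependency and looks only at top-level keys.

```python
# Hypothetical field names; the actual list lives in config.json.
VALID_METADATA_HEADERS = [
    "date", "target-entities", "entity-types", "attack-types", "title", "loss",
]


def find_missing_headers(article_text, required=VALID_METADATA_HEADERS):
    """Return required front-matter keys that are absent.

    Returns None when the article has no `---` delimited front matter at all.
    """
    lines = article_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    try:
        end = next(
            i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---"
        )
    except StopIteration:
        return None  # opening delimiter without a closing one
    present = {
        line.split(":", 1)[0].strip()
        for line in lines[1:end]
        # Only top-level keys count; skip indented lines and list items.
        if ":" in line and not line.startswith((" ", "\t", "-"))
    }
    return [key for key in required if key not in present]
```

A non-empty result (or `None`) can be reported directly in the PR comment, so a malformed submission never reaches the model.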

4. Robustness Improvements

websearch.py

- Safe asyncio event loop handling: `_get_or_create_event_loop()` helper for Python 3.10+ compatibility
- Scrape retry logic: Falls back to Brave snippet when page scraping fails (max 2 retries with backoff)
- Graceful async gather: Catches per-task exceptions instead of failing the entire batch
- Bounds checking: Prevents IndexError on empty result lists
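
The event-loop, gather, and bounds-checking items can be sketched with the standard library alone. This is one possible shape, not the helper from the PR: `get_or_create_event_loop`, `gather_tolerant`, and `first_or_none` are illustrative names, and the real `_get_or_create_event_loop()` may behave differently (for instance, it may reuse an existing idle loop).

```python
import asyncio


def get_or_create_event_loop():
    """Return the running loop, or create and register a new one.

    Avoids asyncio.get_event_loop(), whose no-loop behaviour changed across
    recent Python versions.
    """
    try:
        return asyncio.get_running_loop()
    except RuntimeError:  # no loop is running in this thread
        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)
        return loop


async def gather_tolerant(coros):
    """Run coroutines concurrently and drop the ones that raised."""
    results = await asyncio.gather(*coros, return_exceptions=True)
    return [r for r in results if not isinstance(r, Exception)]


def first_or_none(results):
    """Bounds-checked access to the first result."""
    return results[0] if results else None
```

With `return_exceptions=True`, a failing scrape yields an exception object in the result list instead of cancelling the whole batch, which is the behaviour the "graceful async gather" item describes.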

utils.py

- HTML cleanup: Strips `<script>`, `<style>`, `<nav>`, `<footer>`, `<header>`, `<aside>`, `<noscript>` before text extraction
- Content length limit: Truncates scraped content to 50K chars to prevent token overflow
- Explicit timeout: Uses aiohttp.ClientTimeout with 15s limit
- Separate timeout handling: Catches asyncio.TimeoutError distinctly
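
As an illustration of the cleanup and truncation steps, the sketch below strips the listed elements and caps the output at 50,000 characters using only the standard library. It is an assumption-laden stand-in: the PR does not say which HTML parser `utils.py` uses, and the names `_TextExtractor` and `extract_text` are hypothetical.

```python
from html.parser import HTMLParser

SKIPPED_TAGS = {"script", "style", "nav", "footer", "header", "aside", "noscript"}
MAX_CONTENT_CHARS = 50_000  # the 50K limit named in the PR


class _TextExtractor(HTMLParser):
    """Collect text while ignoring everything inside SKIPPED_TAGS."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def extract_text(html, limit=MAX_CONTENT_CHARS):
    """Return visible text from html, truncated to `limit` characters."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()[:limit]
```

Truncating after extraction rather than before keeps the limit tied to useful text, so markup overhead does not consume the budget.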

client.py

- Safe statement count parsing: Falls back to 5 if number_of_statements parsing fails
- Per-search error handling: Individual search failures don't crash the pipeline
- Answer fallback: If `<answer>` tag extraction fails, retries with raw response
- Increased extract tokens: 1000 → 2000 for statement extraction
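
The two parsing fallbacks can be sketched as follows. This is one reading of the description above, in which the raw response is used when no `<answer>` tag is found; `parse_statement_count` and `extract_answer` are hypothetical names, and the actual parsing in `client.py` may differ.

```python
import re

DEFAULT_STATEMENT_COUNT = 5  # fallback value named in the PR


def parse_statement_count(raw, default=DEFAULT_STATEMENT_COUNT):
    """Pull the first positive integer out of the model output, else use the default."""
    match = re.search(r"\d+", raw or "")
    if not match:
        return default
    count = int(match.group())
    return count if count > 0 else default


def extract_answer(response):
    """Return the text inside <answer>...</answer>, or the raw response if absent."""
    match = re.search(r"<answer>(.*?)</answer>", response, re.DOTALL)
    return match.group(1).strip() if match else response.strip()
```

Both helpers return a usable value for any input, so a malformed model response degrades the report instead of aborting the run.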

5. Workflow Updates (article-check-claude.yml)

- actions/checkout@v3 → v4
- actions/cache@v3 → v4
- Added actions/setup-python@v5 with explicit Python 3.11
- Improved cache configuration with restore-keys fallback

Files Changed

| File | Changes |
| --- | --- |
| `.github/workflows/article-check-claude.yml` | Workflow modernization |
| `tools/article_checker/article_checker_claude.py` | Multi-file support, pre-validation, structured output |
| `tools/article_checker/claude_retriever/client.py` | Complete prompt rewrite, error handling |
| `tools/article_checker/claude_retriever/searcher/searchtools/websearch.py` | Asyncio fix, retry logic, fallbacks |
| `tools/article_checker/claude_retriever/utils.py` | HTML cleanup, timeout handling, retry |
| `tools/article_checker/config.json` | Model upgrade, configurable params, validation data |

Testing

- All Python files pass syntax validation (ast.parse)
- YAML workflow file validates correctly
- Config JSON loads without errors
- Backward compatible with single-file PRs
- Non-article files properly filtered

Fixes #408


Major improvements to the article-check-claude QA bot:

Prompt Engineering (client.py):
- Rewritten EXTRACTING_PROMPT with crypto-specific guidance for better statement extraction (focus on verifiable claims with amounts, dates, entities)
- Rewritten RETRIEVAL_PROMPT with search query best practices for crypto incidents (entity+year, amount verification, date precision)
- Completely restructured ANSWER_PROMPT into 7 distinct validation sections:
  1. Fact-check results with source attribution
  2. Metadata validation (all 6 required fields)
  3. Section structure check (5 required sections)
  4. Filename validation (YYYY-MM-DD-Entity.md format)
  5. Timeline format check (Month DD, YYYY, HH:MM AM/PM UTC)
  6. Content quality assessment (references, completeness, objectivity)
  7. Hugo SSG formatting check

Multi-file PR Support (article_checker_claude.py):
- Process ALL article files in PR, not just diff[0]
- Filter files by article path pattern (configurable)
- Pre-validate metadata structure before LLM calls (saves API costs)
- Structured per-file reporting with aggregate summary
- Extract and validate filenames from diff headers

Model & Config Upgrades (config.json):
- Search model: claude-3-opus -> claude-sonnet-4-20250514
- Summarize model: claude-3-haiku -> claude-3-5-haiku
- Increased max tokens: 4000 -> 8000 for more thorough analysis
- Increased summarize tokens: 512 -> 1024
- Added configurable N_SEARCH_RESULTS_TO_USE (3) and MAX_SEARCHES_TO_TRY (8)
- Added validation reference data: VALID_ATTACK_TYPES, VALID_ENTITY_TYPES, VALID_METADATA_HEADERS, VALID_SECTION_HEADERS, ARTICLE_PATH_PATTERN

Robustness (websearch.py, utils.py):
- Safe asyncio event loop handling (Python 3.10+ compatible)
- Retry logic with backoff for web scraping (max_retries=2)
- Fallback to Brave snippet when scraping fails
- Graceful exception handling in async gather
- Strip non-content HTML elements (script, style, nav, footer, etc.)
- Explicit timeout handling with aiohttp.ClientTimeout
- Graceful number_of_statements parsing with fallback

Workflow (article-check-claude.yml):
- Updated to actions/checkout@v4, actions/cache@v4, actions/setup-python@v5
- Added explicit Python 3.11 setup
- Improved cache configuration with restore-keys

Fixes 1712n#408

sealfe · 2026-02-14T12:23:22Z

@Kseymur Requesting your review on this QA bot improvement PR.

This submission focuses on three key areas:

- Prompt Engineering: Complete rewrite of all three core prompts (extraction, retrieval, answer) with crypto-incident-specific guidance, producing a structured 7-section validation report instead of free-form text
- Multi-file PR support: Fixes the critical bug where only diff[0] was processed, with pre-validation to save API costs
- Infrastructure robustness: Safe asyncio handling, retry logic, graceful degradation, model upgrades

Happy to address any feedback or make adjustments. Thanks!

sealfe wants to merge 1 commit into 1712n:main from sealfe:improve-qa-bot-prompts-and-pipeline
