
🌰 Improve QA bot: comprehensive prompt engineering, multi-file support, and robustness #524

Open
sealfe wants to merge 1 commit into 1712n:main from sealfe:improve-qa-bot-prompts-and-pipeline


sealfe commented Feb 14, 2026

🌰 Summary

This PR delivers a comprehensive overhaul of the article-check-claude QA bot, targeting three key areas: prompt engineering, multi-file PR handling, and infrastructure robustness. Every change is designed to produce more actionable, accurate, and structured review feedback for Crypto Attack Wiki submissions.


Changes Overview

1. Prompt Engineering — Complete Rewrite (client.py)

The most impactful change: all three core prompts have been rewritten from scratch with crypto-incident-specific guidance.

EXTRACTING_PROMPT (Statement Extraction)

  • Before: Generic "extract important statements" instruction
  • After: Domain-specific guidance to extract verifiable factual claims focusing on:
    • Specific monetary amounts and losses
    • Dates and timestamps
    • Named entities, protocols, and individuals
    • Blockchain addresses and transaction hashes
    • Attack method descriptions
    • Protocol response claims (pauses, bounties, post-mortems)
  • Targets 5-15 high-value statements per article (was unbounded)

RETRIEVAL_PROMPT (Fact-Checking Search)

  • Before: Generic search instructions
  • After: Crypto-specific search query best practices:
    • Include entity name AND year for precision
    • Search for incident name + "hack"/"exploit" + amount for verification
    • Direct address lookups for on-chain verification
    • Iterative query refinement guidance
  • Emphasis on exact number matching (e.g., $20M and $20.5M count as different amounts)
  • Date precision verification (month/day/year must all match)
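
These retrieval rules live in the prompt text, so the model applies them rather than Python code. Purely as an illustration of what they ask for, the sketch below builds queries in the recommended shapes and compares dollar amounts exactly after normalizing `K`/`M`/`B` suffixes. `build_queries`, `parse_usd`, and `amounts_match` are hypothetical names that do not appear in client.py, and the amount formats they accept are assumptions.

```python
import re
from decimal import Decimal

# Illustration only: the PR encodes these rules in RETRIEVAL_PROMPT, not in code.
_SUFFIXES = {"": 1, "K": 10**3, "M": 10**6, "B": 10**9}
_AMOUNT_RE = re.compile(r"^\$?\s*(\d[\d,]*(?:\.\d+)?)\s*([KMB]?)$", re.IGNORECASE)


def build_queries(entity, year, amount=None, address=None):
    """Queries shaped like the prompt's advice: entity + year, keyword + amount, address."""
    queries = [f"{entity} {year}", f"{entity} hack {year}", f"{entity} exploit {year}"]
    if amount:
        queries.append(f"{entity} hack {amount}")  # verify the reported loss
    if address:
        queries.append(address)  # direct on-chain lookup
    return queries


def parse_usd(text):
    """Convert '$20.5M' or '$20,500,000' to a Decimal dollar amount."""
    match = _AMOUNT_RE.match(text.strip())
    if not match:
        raise ValueError(f"unrecognized amount: {text!r}")
    number = Decimal(match.group(1).replace(",", ""))
    return number * _SUFFIXES[match.group(2).upper()]


def amounts_match(claimed, sourced):
    """Exact comparison after normalization: $20M and $20.5M do not match."""
    return parse_usd(claimed) == parse_usd(sourced)
```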

ANSWER_PROMPT (Final Report Generation)

  • Before: 4 loosely defined checks mixed together
  • After: 7 distinct, structured validation sections:

| Section | What It Checks |
|---|---|
| 1. Fact-Check Results | Statement-by-statement verification with ✅/❌/⚠️ and sources |
| 2. Metadata Validation | All 6 required YAML fields present and correct |
| 3. Section Structure | Exactly 5 required sections in correct order |
| 4. Filename Validation | `YYYY-MM-DD-Entity-Name.md` format compliance |
| 5. Timeline Format | `**Month DD, YYYY, HH:MM AM/PM UTC:**` format |
| 6. Content Quality | References, completeness, objectivity, blockchain evidence |
| 7. Hugo SSG Formatting | Valid Markdown and YAML for Hugo |

This structured approach ensures reviewers get consistent, actionable feedback rather than free-form text.
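
Sections 4 and 5 of the report check formats that are easy to state precisely. The sketch below expresses them as regular expressions plus `datetime.strptime`; it assumes entity names use only ASCII letters, digits, and hyphens and that timeline entries begin with the bold timestamp. `is_valid_filename` and `is_valid_timeline_entry` are hypothetical helpers: in this PR the checks are described as part of ANSWER_PROMPT, not as standalone code.

```python
import re
from datetime import datetime

# Hypothetical deterministic versions of report sections 4 and 5.
FILENAME_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})-(?P<entity>[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*)\.md$"
)
TIMELINE_RE = re.compile(r"^\*\*(?P<stamp>.+?) UTC:\*\*")


def is_valid_filename(name):
    """True for names like 2024-03-05-Example-Protocol.md with a real calendar date."""
    match = FILENAME_RE.match(name)
    if not match:
        return False
    try:
        datetime.strptime(match.group("date"), "%Y-%m-%d")
    except ValueError:
        return False
    return True


def is_valid_timeline_entry(line):
    """True if the line starts with **Month DD, YYYY, HH:MM AM/PM UTC:**."""
    match = TIMELINE_RE.match(line.strip())
    if not match:
        return False
    try:
        # %B and %p assume an English (C) locale; %I limits hours to 1-12.
        datetime.strptime(match.group("stamp"), "%B %d, %Y, %I:%M %p")
    except ValueError:
        return False
    return True
```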

2. Multi-File PR Support (article_checker_claude.py)

Critical bug fix: The original bot only processed diff[0] — meaning only the first file in any PR was checked.

  • Now iterates over all files in the PR diff
  • Filters to article files matching configurable ARTICLE_PATH_PATTERN
  • Skips non-article files (code, config, images) with logged messages
  • Pre-validates metadata structure before sending to LLM (saves API costs for malformed submissions)
  • Generates per-file reports aggregated into a single PR comment
  • Properly extracts filenames from diff headers for validation
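
A minimal sketch of the multi-file loop, assuming the PR diff is available as plain `git diff` text and that paths contain no spaces. `split_diff_by_file`, `added_text`, and `select_articles` are hypothetical names; the value passed as `article_path_pattern` would be whatever ARTICLE_PATH_PATTERN holds in config.json, and none is assumed here.

```python
import logging
import re

# Matches "diff --git a/<old> b/<new>" headers; assumes no spaces in paths.
DIFF_HEADER_RE = re.compile(r"^diff --git a/(\S+) b/(\S+)$", re.MULTILINE)


def split_diff_by_file(diff_text):
    """Return (path, body) pairs, one per file section of the diff."""
    headers = list(DIFF_HEADER_RE.finditer(diff_text))
    sections = []
    for i, header in enumerate(headers):
        end = headers[i + 1].start() if i + 1 < len(headers) else len(diff_text)
        sections.append((header.group(2), diff_text[header.end():end]))
    return sections


def added_text(body):
    """Collect '+' lines inside hunks: the whole file for a new article,
    only the added lines for an edited one."""
    in_hunk, lines = False, []
    for line in body.splitlines():
        if line.startswith("@@"):
            in_hunk = True
        elif in_hunk and line.startswith("+"):
            lines.append(line[1:])
    return "\n".join(lines)


def select_articles(diff_text, article_path_pattern):
    """Keep every file whose path matches the pattern, not just diff[0]."""
    pattern = re.compile(article_path_pattern)
    articles = []
    for path, body in split_diff_by_file(diff_text):
        if pattern.search(path):
            articles.append((path, added_text(body)))
        else:
            logging.info("Skipping non-article file: %s", path)
    return articles
```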

3. Model & Configuration Upgrades (config.json)

| Parameter | Before | After | Why |
|---|---|---|---|
| Search Model | `claude-3-opus-20240229` | `claude-sonnet-4-20250514` | Better accuracy, lower cost |
| Summarize Model | `claude-3-haiku-20240307` | `claude-3-5-haiku-20241022` | Improved quality |
| Search Max Tokens | 4000 | 8000 | Room for thorough 7-section analysis |
| Summarize Max Tokens | 512 | 1024 | Better article extraction |
| Search Results | 1 (hardcoded) | 3 (configurable) | Multiple corroborating sources |
| Max Searches | 5 (hardcoded) | 8 (configurable) | More thorough fact-checking |
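
One way to read the two now-configurable search settings with the defaults from the table, assuming they sit at the top level of config.json under the key names given in the commit message (N_SEARCH_RESULTS_TO_USE and MAX_SEARCHES_TO_TRY). `load_search_settings` is hypothetical, and its positive-integer check is an added safeguard that the PR does not describe.

```python
import json

# Defaults mirror the "After" column above; the loader itself is a sketch.
SEARCH_DEFAULTS = {"N_SEARCH_RESULTS_TO_USE": 3, "MAX_SEARCHES_TO_TRY": 8}


def load_search_settings(raw_json):
    """Parse config.json text, falling back to defaults for missing keys."""
    config = json.loads(raw_json)
    settings = {key: config.get(key, default) for key, default in SEARCH_DEFAULTS.items()}
    for key, value in settings.items():
        if not isinstance(value, int) or value < 1:
            raise ValueError(f"{key} must be a positive integer, got {value!r}")
    return settings
```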

New configuration fields:

  • VALID_ATTACK_TYPES: 21 recognized attack categories for validation
  • VALID_ENTITY_TYPES: 18 recognized entity types
  • VALID_METADATA_HEADERS: Required frontmatter fields
  • VALID_SECTION_HEADERS: Required article sections
  • ARTICLE_PATH_PATTERN: Configurable article path filter
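
The pre-validation step can be sketched as a cheap structural check that runs before any API call. This version assumes VALID_METADATA_HEADERS and VALID_SECTION_HEADERS are lists in config.json, scans top-level frontmatter keys with a regex instead of a YAML parser, and treats `## ` headings as sections; all of these are assumptions, and `prevalidate` is a hypothetical name.

```python
import re

# Frontmatter delimited by '---' lines, followed by the article body.
FRONTMATTER_RE = re.compile(r"^---\s*\n(.*?)\n---\s*\n(.*)$", re.DOTALL)


def prevalidate(article, config):
    """Return structural problems; an empty list means the article can go to the LLM."""
    match = FRONTMATTER_RE.match(article)
    if not match:
        return ["missing YAML frontmatter"]
    frontmatter, body = match.groups()
    # Top-level keys only: skip indented lines and list items.
    keys = {
        line.split(":", 1)[0].strip()
        for line in frontmatter.splitlines()
        if ":" in line and not line.startswith((" ", "-"))
    }
    problems = [
        f"missing metadata field: {field}"
        for field in config["VALID_METADATA_HEADERS"]
        if field not in keys
    ]
    # Required sections must appear exactly once each, in the configured order.
    headings = [line[3:].strip() for line in body.splitlines() if line.startswith("## ")]
    if headings != list(config["VALID_SECTION_HEADERS"]):
        problems.append(f"expected sections {config['VALID_SECTION_HEADERS']}, found {headings}")
    return problems
```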

4. Robustness Improvements

websearch.py

  • Safe asyncio event loop handling: _get_or_create_event_loop() helper for Python 3.10+ compatibility
  • Scrape retry logic: Falls back to Brave snippet when page scraping fails (max 2 retries with backoff)
  • Graceful async gather: Catches per-task exceptions instead of failing the entire batch
  • Bounds checking: Prevents IndexError on empty result lists
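
A sketch of the scrape fallback, assuming `fetch` is an async callable that returns page text or raises, and each search result is a dict with `url` and `snippet` keys (hypothetical shapes; the real websearch.py types are not shown). It treats `max_retries=2` as two retries after the first attempt, and it leaves out the `_get_or_create_event_loop()` helper.

```python
import asyncio


async def fetch_with_retry(url, fetch, max_retries=2, base_delay=0.5):
    """Call fetch(url), retrying with exponential backoff; re-raise on final failure."""
    for attempt in range(max_retries + 1):
        try:
            return await fetch(url)
        except Exception:
            if attempt == max_retries:
                raise
            await asyncio.sleep(base_delay * 2 ** attempt)


async def scrape_all(results, fetch, base_delay=0.5):
    """Scrape every search result; a page that keeps failing degrades to its snippet."""
    if not results:  # bounds check: nothing to index into
        return []
    outcomes = await asyncio.gather(
        *(fetch_with_retry(r["url"], fetch, base_delay=base_delay) for r in results),
        return_exceptions=True,  # exceptions come back as values instead of aborting
    )
    return [
        r["snippet"] if isinstance(outcome, BaseException) else outcome
        for r, outcome in zip(results, outcomes)
    ]
```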

utils.py

  • HTML cleanup: Strips <script>, <style>, <nav>, <footer>, <header>, <aside>, <noscript> before text extraction
  • Content length limit: Truncates scraped content to 50K chars to prevent token overflow
  • Explicit timeout: Uses aiohttp.ClientTimeout with 15s limit
  • Separate timeout handling: Catches asyncio.TimeoutError distinctly
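
The cleanup step can be approximated with the standard library alone. This sketch uses `html.parser.HTMLParser` as a stand-in for whatever parser utils.py actually uses, skips text inside the seven listed tags, and truncates to 50,000 characters. The aiohttp request itself (with `aiohttp.ClientTimeout(total=15)` and a separate `asyncio.TimeoutError` handler) is omitted so the example runs without third-party packages.

```python
from html.parser import HTMLParser

# Stand-in for the utils.py cleanup: drop non-content elements, keep visible text.
NON_CONTENT_TAGS = {"script", "style", "nav", "footer", "header", "aside", "noscript"}
MAX_CONTENT_CHARS = 50_000


class _VisibleTextParser(HTMLParser):
    """Collects text that is not nested inside any NON_CONTENT_TAGS element."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.skip_depth = 0  # > 0 while inside a stripped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in NON_CONTENT_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in NON_CONTENT_TAGS and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.chunks.append(data.strip())


def extract_text(html, max_chars=MAX_CONTENT_CHARS):
    """Return space-joined visible text, truncated to max_chars characters."""
    parser = _VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)[:max_chars]
```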

client.py

  • Safe statement count parsing: Falls back to 5 if number_of_statements parsing fails
  • Per-search error handling: Individual search failures don't crash the pipeline
  • Answer fallback: If <answer> tag extraction fails, retries with raw response
  • Increased extract tokens: 1000 → 2000 for statement extraction
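
Hypothetical versions of three of the client.py safeguards above. The real function names and signatures are not given in the PR description, and "retries with raw response" is read here as falling back to the raw text when no `<answer>` tag is found.

```python
import logging
import re

ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def parse_statement_count(raw, default=5):
    """Parse number_of_statements, falling back to `default` on bad input."""
    try:
        return int(str(raw).strip())
    except ValueError:
        logging.warning("Could not parse number_of_statements=%r; using %d", raw, default)
        return default


def extract_answer(response):
    """Return the <answer> body, or the raw response if the tag is missing."""
    match = ANSWER_RE.search(response)
    if match:
        return match.group(1).strip()
    logging.warning("No <answer> tag found; falling back to the raw response")
    return response.strip()


def run_searches(queries, search):
    """Run each query independently so one failed search cannot stop the rest."""
    results = []
    for query in queries:
        try:
            results.append(search(query))
        except Exception as exc:
            logging.warning("Search failed for %r: %s", query, exc)
    return results
```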

5. Workflow Updates (article-check-claude.yml)

  • actions/checkout@v3 → v4
  • actions/cache@v3 → v4
  • Added actions/setup-python@v5 with explicit Python 3.11
  • Improved cache configuration with restore-keys fallback

Files Changed

| File | Changes |
|---|---|
| `.github/workflows/article-check-claude.yml` | Workflow modernization |
| `tools/article_checker/article_checker_claude.py` | Multi-file support, pre-validation, structured output |
| `tools/article_checker/claude_retriever/client.py` | Complete prompt rewrite, error handling |
| `tools/article_checker/claude_retriever/searcher/searchtools/websearch.py` | Asyncio fix, retry logic, fallbacks |
| `tools/article_checker/claude_retriever/utils.py` | HTML cleanup, timeout handling, retry |
| `tools/article_checker/config.json` | Model upgrade, configurable params, validation data |

Testing

  • All Python files pass syntax validation (ast.parse)
  • YAML workflow file validates correctly
  • Config JSON loads without errors
  • Backward compatible with single-file PRs
  • Non-article files properly filtered

Fixes #408

🌰

Major improvements to the article-check-claude QA bot:

Prompt Engineering (client.py):
- Rewritten EXTRACTING_PROMPT with crypto-specific guidance for better
  statement extraction (focus on verifiable claims with amounts, dates, entities)
- Rewritten RETRIEVAL_PROMPT with search query best practices for crypto
  incidents (entity+year, amount verification, date precision)
- Completely restructured ANSWER_PROMPT into 7 distinct validation sections:
  1. Fact-check results with source attribution
  2. Metadata validation (all 6 required fields)
  3. Section structure check (5 required sections)
  4. Filename validation (YYYY-MM-DD-Entity.md format)
  5. Timeline format check (Month DD, YYYY, HH:MM AM/PM UTC)
  6. Content quality assessment (references, completeness, objectivity)
  7. Hugo SSG formatting check

Multi-file PR Support (article_checker_claude.py):
- Process ALL article files in PR, not just diff[0]
- Filter files by article path pattern (configurable)
- Pre-validate metadata structure before LLM calls (saves API costs)
- Structured per-file reporting with aggregate summary
- Extract and validate filenames from diff headers

Model & Config Upgrades (config.json):
- Search model: claude-3-opus -> claude-sonnet-4-20250514
- Summarize model: claude-3-haiku -> claude-3-5-haiku
- Increased max tokens: 4000 -> 8000 for more thorough analysis
- Increased summarize tokens: 512 -> 1024
- Added configurable N_SEARCH_RESULTS_TO_USE (3) and MAX_SEARCHES_TO_TRY (8)
- Added validation reference data: VALID_ATTACK_TYPES, VALID_ENTITY_TYPES,
  VALID_METADATA_HEADERS, VALID_SECTION_HEADERS, ARTICLE_PATH_PATTERN

Robustness (websearch.py, utils.py):
- Safe asyncio event loop handling (Python 3.10+ compatible)
- Retry logic with backoff for web scraping (max_retries=2)
- Fallback to Brave snippet when scraping fails
- Graceful exception handling in async gather
- Strip non-content HTML elements (script, style, nav, footer, etc.)
- Explicit timeout handling with aiohttp.ClientTimeout
- Graceful number_of_statements parsing with fallback

Workflow (article-check-claude.yml):
- Updated to actions/checkout@v4, actions/cache@v4, actions/setup-python@v5
- Added explicit Python 3.11 setup
- Improved cache configuration with restore-keys

Fixes 1712n#408

sealfe commented Feb 14, 2026

@Kseymur Requesting your review on this QA bot improvement PR.

This submission focuses on three key areas:

  1. Prompt Engineering — Complete rewrite of all three core prompts (extraction, retrieval, answer) with crypto-incident-specific guidance, producing a structured 7-section validation report instead of free-form text
  2. Multi-file PR support — Fixes the critical bug where only diff[0] was processed, with pre-validation to save API costs
  3. Infrastructure robustness — Safe asyncio handling, retry logic, graceful degradation, model upgrades

Happy to address any feedback or make adjustments. Thanks!



Development

Successfully merging this pull request may close these issues.

Improve QA Bot
