🌰 Improve QA bot: comprehensive prompt engineering, multi-file support, and robustness #524
Open
sealfe wants to merge 1 commit into 1712n:main from
Major improvements to the article-check-claude QA bot:

Prompt Engineering (client.py):
- Rewritten EXTRACTING_PROMPT with crypto-specific guidance for better statement extraction (focus on verifiable claims with amounts, dates, entities)
- Rewritten RETRIEVAL_PROMPT with search query best practices for crypto incidents (entity+year, amount verification, date precision)
- Completely restructured ANSWER_PROMPT into 7 distinct validation sections:
  1. Fact-check results with source attribution
  2. Metadata validation (all 6 required fields)
  3. Section structure check (5 required sections)
  4. Filename validation (YYYY-MM-DD-Entity.md format)
  5. Timeline format check (Month DD, YYYY, HH:MM AM/PM UTC)
  6. Content quality assessment (references, completeness, objectivity)
  7. Hugo SSG formatting check

Multi-file PR Support (article_checker_claude.py):
- Process ALL article files in PR, not just diff[0]
- Filter files by article path pattern (configurable)
- Pre-validate metadata structure before LLM calls (saves API costs)
- Structured per-file reporting with aggregate summary
- Extract and validate filenames from diff headers

Model & Config Upgrades (config.json):
- Search model: claude-3-opus -> claude-sonnet-4-20250514
- Summarize model: claude-3-haiku -> claude-3-5-haiku
- Increased max tokens: 4000 -> 8000 for more thorough analysis
- Increased summarize tokens: 512 -> 1024
- Added configurable N_SEARCH_RESULTS_TO_USE (3) and MAX_SEARCHES_TO_TRY (8)
- Added validation reference data: VALID_ATTACK_TYPES, VALID_ENTITY_TYPES, VALID_METADATA_HEADERS, VALID_SECTION_HEADERS, ARTICLE_PATH_PATTERN

Robustness (websearch.py, utils.py):
- Safe asyncio event loop handling (Python 3.10+ compatible)
- Retry logic with backoff for web scraping (max_retries=2)
- Fallback to Brave snippet when scraping fails
- Graceful exception handling in async gather
- Strip non-content HTML elements (script, style, nav, footer, etc.)
- Explicit timeout handling with aiohttp.ClientTimeout
- Graceful number_of_statements parsing with fallback

Workflow (article-check-claude.yml):
- Updated to actions/checkout@v4, actions/cache@v4, actions/setup-python@v5
- Added explicit Python 3.11 setup
- Improved cache configuration with restore-keys

Fixes 1712n#408
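The multi-file flow described in the commit message (split the PR diff per file, keep only article paths, pre-validate metadata before any LLM call) can be sketched as below. This is an illustration, not the code in `article_checker_claude.py`: every function name is hypothetical, paths are assumed to contain no spaces, and front matter is assumed to be a leading `---` block of `key: value` lines.

```python
import re

# Per-file header of a unified git diff, e.g.
#   diff --git a/content/2022-03-29-Foo.md b/content/2022-03-29-Foo.md
# Assumption: paths contain no spaces.
DIFF_HEADER = re.compile(r"^diff --git a/(\S+) b/(\S+)$", re.MULTILINE)


def split_diff_by_file(diff_text: str) -> dict[str, str]:
    """Split a multi-file unified diff into {new_path: per_file_diff}."""
    headers = list(DIFF_HEADER.finditer(diff_text))
    chunks = {}
    for i, header in enumerate(headers):
        end = headers[i + 1].start() if i + 1 < len(headers) else len(diff_text)
        chunks[header.group(2)] = diff_text[header.start():end]
    return chunks


def added_text(file_diff: str) -> str:
    """Rebuild the added content from '+' lines, skipping lines starting with '+++'."""
    return "\n".join(
        line[1:]
        for line in file_diff.splitlines()
        if line.startswith("+") and not line.startswith("+++")
    )


def select_articles(diff_text: str, path_pattern: str) -> dict[str, str]:
    """Return {path: added_text} for every file whose path matches path_pattern."""
    pattern = re.compile(path_pattern)
    return {
        path: added_text(chunk)
        for path, chunk in split_diff_by_file(diff_text).items()
        if pattern.search(path)
    }


def missing_metadata(article: str, required_fields: list[str]) -> list[str]:
    """List required front-matter keys absent from the leading '---' block.

    Cheap to run, so failing articles can be reported without an LLM call.
    """
    block = re.match(r"\s*---\n(.*?)\n---", article, re.DOTALL)
    if block is None:
        return list(required_fields)
    present = {
        line.split(":", 1)[0].strip()
        for line in block.group(1).splitlines()
        if ":" in line
    }
    return [field for field in required_fields if field not in present]
```

Usage would be along the lines of `select_articles(diff, config["ARTICLE_PATH_PATTERN"])` followed by `missing_metadata(text, config["VALID_METADATA_HEADERS"])` per file, assuming those config entries hold a regex string and a list of key names respectively.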
@Kseymur Requesting your review on this QA bot improvement PR. This submission focuses on three key areas: prompt engineering, multi-file PR handling, and infrastructure robustness.
Happy to address any feedback or make adjustments. Thanks!
🌰 Summary
This PR delivers a comprehensive overhaul of the `article-check-claude` QA bot, targeting three key areas: prompt engineering, multi-file PR handling, and infrastructure robustness. Every change is designed to produce more actionable, accurate, and structured review feedback for Crypto Attack Wiki submissions.

Changes Overview
1. Prompt Engineering: Complete Rewrite (`client.py`)
The most impactful change: all three core prompts have been rewritten from scratch with crypto-incident-specific guidance.
EXTRACTING_PROMPT (Statement Extraction)
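Extracted statements are read back from the model's output, and the commit notes that `number_of_statements` parsing now falls back gracefully. A minimal sketch of that idea, assuming (hypothetically) that the model wraps each claim in `<statement>` tags and reports a count in `<number_of_statements>`; the real tag names and fallback in `client.py` may differ:

```python
import re


def extract_statements(llm_output: str) -> list[str]:
    """Collect non-empty <statement>...</statement> bodies in order."""
    bodies = re.findall(r"<statement>(.*?)</statement>", llm_output, re.DOTALL)
    return [body.strip() for body in bodies if body.strip()]


def parse_statement_count(llm_output: str) -> int:
    """Read <number_of_statements>N</number_of_statements>, tolerating bad output."""
    match = re.search(
        r"<number_of_statements>\s*(\d+)\s*</number_of_statements>", llm_output
    )
    if match:
        return int(match.group(1))
    # Fallback: the tag is missing or not an integer, so count what was emitted.
    return len(extract_statements(llm_output))
```

Falling back to the number of parsed statements keeps the pipeline running when the model writes "two" instead of "2" or omits the tag.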
RETRIEVAL_PROMPT (Fact-Checking Search)
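Fact-checking search depends on fetching the pages behind each search hit. The commit describes retry with backoff (`max_retries=2`), a fallback to the Brave snippet when scraping fails, and graceful exception handling in `asyncio.gather`. A sketch under stated assumptions: `fetch` is injected so the example runs without network access (in the real code it would presumably wrap an `aiohttp` request using `aiohttp.ClientTimeout(total=15)`), `max_retries` is taken to mean retries after the first attempt, and the function names are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable

Fetch = Callable[[str], Awaitable[str]]


async def fetch_with_fallback(
    url: str,
    snippet: str,
    fetch: Fetch,
    max_retries: int = 2,
    base_delay: float = 1.0,
) -> str:
    """Scrape url, retrying with exponential backoff; fall back to the search snippet."""
    for attempt in range(max_retries + 1):
        try:
            text = await fetch(url)
            if text.strip():  # an empty page counts as a failed attempt
                return text
        except Exception:  # timeouts, connection resets, HTTP errors: all retryable here
            pass
        if attempt < max_retries:
            await asyncio.sleep(base_delay * 2**attempt)
    return snippet


async def fetch_all(
    results: list[tuple[str, str]], fetch: Fetch, **retry_options
) -> list[str]:
    """Fetch every (url, snippet) pair concurrently; one failure never sinks the batch."""
    outcomes = await asyncio.gather(
        *(fetch_with_fallback(url, snip, fetch, **retry_options) for url, snip in results),
        return_exceptions=True,
    )
    return [
        snip if isinstance(outcome, BaseException) else outcome
        for outcome, (_, snip) in zip(outcomes, results)
    ]
```

`return_exceptions=True` makes `gather` hand back exceptions as values instead of cancelling the whole batch, so each result can fall back to its own snippet.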
ANSWER_PROMPT (Final Report Generation)
- Filename validation: `YYYY-MM-DD-Entity-Name.md` format compliance
- Timeline format check: `**Month DD, YYYY, HH:MM AM/PM UTC:**` format

This structured approach ensures reviewers get consistent, actionable feedback rather than free-form text.
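The filename (`YYYY-MM-DD-Entity-Name.md`) and timeline (`**Month DD, YYYY, HH:MM AM/PM UTC:**`) formats are regular enough to check deterministically, which could complement the LLM report. A sketch, assuming entity names use only ASCII letters, digits, and hyphens (the wiki's real naming rules may be looser) and that each timeline entry starts its line with the bold timestamp. The helper names are hypothetical, and this version also accepts single-digit days and hours.

```python
import re
from datetime import datetime

FILENAME_RE = re.compile(r"^(\d{4}-\d{2}-\d{2})-[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*\.md$")
TIMELINE_RE = re.compile(r"^\*\*(.+?) UTC:\*\*")


def valid_filename(path: str) -> bool:
    """Check YYYY-MM-DD-Entity-Name.md on the file name, including a real calendar date."""
    match = FILENAME_RE.match(path.rsplit("/", 1)[-1])
    if not match:
        return False
    try:
        datetime.strptime(match.group(1), "%Y-%m-%d")  # rejects e.g. 2022-02-30
    except ValueError:
        return False
    return True


def valid_timeline_entry(line: str) -> bool:
    """Check that a line opens with **Month DD, YYYY, HH:MM AM/PM UTC:**."""
    match = TIMELINE_RE.match(line.strip())
    if not match:
        return False
    try:
        datetime.strptime(match.group(1), "%B %d, %Y, %I:%M %p")
    except ValueError:
        return False
    return True
```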
2. Multi-File PR Support (article_checker_claude.py)
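Critical bug fix: the original bot only processed `diff[0]`, meaning only the first file in any PR was checked. The bot now processes every article file in the PR.
- Article files are selected with the configurable `ARTICLE_PATH_PATTERN`

3. Model & Configuration Upgrades (`config.json`)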
New configuration fields:
- `VALID_ATTACK_TYPES`: 21 recognized attack categories for validation
- `VALID_ENTITY_TYPES`: 18 recognized entity types
- `VALID_METADATA_HEADERS`: required frontmatter fields
- `VALID_SECTION_HEADERS`: required article sections
- `ARTICLE_PATH_PATTERN`: configurable article path filter

4. Robustness Improvements
`websearch.py`
- `_get_or_create_event_loop()` helper for Python 3.10+ compatibility

`utils.py`
- Strips `<script>`, `<style>`, `<nav>`, `<footer>`, `<header>`, `<aside>`, `<noscript>` before text extraction
- `aiohttp.ClientTimeout` with a 15s limit
- `asyncio.TimeoutError` handled distinctly

`client.py`
- Graceful fallback when `number_of_statements` parsing fails
- If `<answer>` tag extraction fails, retries with the raw response

5. Workflow Updates (`article-check-claude.yml`)
- `actions/checkout@v3` → `v4`
- `actions/cache@v3` → `v4`
- Added `actions/setup-python@v5` with explicit Python 3.11
- Cache `restore-keys` fallback

Files Changed
- `.github/workflows/article-check-claude.yml`
- `tools/article_checker/article_checker_claude.py`
- `tools/article_checker/claude_retriever/client.py`
- `tools/article_checker/claude_retriever/searcher/searchtools/websearch.py`
- `tools/article_checker/claude_retriever/utils.py`
- `tools/article_checker/config.json`

Testing
- … (`ast.parse`)

Fixes #408
🌰