Add LLM-based sanity checker to CI for automated scraper PRs #78

@leblancfg

Problem

Automated scraper PRs are created daily, but nothing validates data quality before auto-merge. Bad scraper output (like the 443 xAI models) could go straight to production as long as CI passes.

Proposed Solution

Add an LLM-based sanity checker as a CI step that validates scraped data before allowing auto-merge:

Checks to implement:

  1. Volume checks: Flag if a scraper returns an unusually high or low number of items (e.g., xAI: 443 items vs expected ~5)
  2. Data quality: Validate that items have required fields (dates, context, valid model IDs)
  3. Pattern detection: Detect concatenated model IDs, invalid dates, placeholder values ("N/A", "TBD")
  4. Delta analysis: Compare with previous data to flag suspicious changes (e.g., 10x increase in deprecations)
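
Some of these checks could plausibly run as plain heuristics before (or alongside) the LLM call. The sketch below is only an illustration under assumptions: the field names `model_id` and `shutdown_date`, the 10x `factor` threshold, and the helper names are all hypothetical, and it does not attempt to detect concatenated model IDs.

```python
from datetime import date

# Values treated as "no real data" (assumed list, based on the examples above).
PLACEHOLDERS = {"", "n/a", "tbd"}


def check_volume(count, expected, factor=10):
    """Return a warning string if count is far above or below expected, else None."""
    if expected <= 0:
        return None  # no baseline to compare against
    if count > expected * factor or count * factor < expected:
        return f"count {count} is far from expected ~{expected}"
    return None


def check_item(item):
    """Return a list of issue strings for one scraped item (a dict)."""
    issues = []
    # Hypothetical required fields; the real schema may differ.
    for field in ("model_id", "shutdown_date"):
        value = str(item.get(field, "")).strip()
        if value.lower() in PLACEHOLDERS:
            issues.append(f"missing or placeholder {field}")
    raw_date = str(item.get("shutdown_date", "")).strip()
    try:
        date.fromisoformat(raw_date)
    except ValueError:
        # Placeholders were already reported above; only flag real bad dates here.
        if raw_date.lower() not in PLACEHOLDERS:
            issues.append(f"invalid date: {raw_date!r}")
    return issues
```

For the xAI case, `check_volume(443, 5)` returns a warning because 443 is more than 10 times the expected count, while `check_volume(5, 5)` returns `None`.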

Implementation approach:

  • Add new CI job: sanity-check
  • Use Claude API (Haiku for speed/cost) to analyze scraped data
  • Output structured report with pass/fail + warnings
  • Block auto-merge if critical issues found
  • Allow auto-merge if only warnings (log for review)
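
The block/allow rule could be reduced to an exit code that the sanity-check job returns. This is a minimal sketch, assuming the report is a dict with `pass`, `critical_issues`, and `warnings` keys; the `gate` function name and the log line format are hypothetical.

```python
def gate(report):
    """Map a sanity-check report to (exit_code, log_lines).

    Exit code 1 is meant to fail the CI job and so block auto-merge;
    exit code 0 lets it through while still logging any warnings.
    """
    critical = report.get("critical_issues", [])
    warnings = report.get("warnings", [])
    lines = [f"WARNING: {w}" for w in warnings]
    # Fail closed: a missing or false "pass" blocks the merge even if
    # the critical list happens to be empty.
    if critical or not report.get("pass", False):
        lines += [f"CRITICAL: {c}" for c in critical]
        return 1, lines
    return 0, lines
```

A CI script could print the returned lines and pass the exit code to `sys.exit`, so that a non-zero status fails the job.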

Benefits:

  • Catch data quality issues before they hit production RSS feed
  • Maintain automation while ensuring quality
  • Provide detailed feedback for debugging scraper issues
  • Can evolve checks over time without changing scrapers

Example prompt structure:

Analyze this scraped deprecation data and check for:
1. Unusual item counts per provider
2. Missing required fields
3. Invalid date formats
4. Concatenated or malformed model IDs
5. Suspicious patterns

Data: {json_data}
Previous counts: {historical_counts}

Return JSON with: {pass: bool, critical_issues: [], warnings: []}
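
One way to fill in this prompt and read the reply is sketched below. It leaves out the actual API call and assumes the model's reply text is available as a string; `build_prompt` and `parse_report` are hypothetical helper names. An unparseable or malformed reply is treated as a critical failure here, which is a design assumption rather than something the proposal specifies.

```python
import json

# Literal braces in the JSON shape are doubled so str.format leaves them alone.
PROMPT_TEMPLATE = """Analyze this scraped deprecation data and check for:
1. Unusual item counts per provider
2. Missing required fields
3. Invalid date formats
4. Concatenated or malformed model IDs
5. Suspicious patterns

Data: {json_data}
Previous counts: {historical_counts}

Return JSON with: {{"pass": bool, "critical_issues": [], "warnings": []}}"""


def build_prompt(data, historical_counts):
    """Serialize both inputs as JSON and substitute them into the template."""
    return PROMPT_TEMPLATE.format(
        json_data=json.dumps(data),
        historical_counts=json.dumps(historical_counts),
    )


def _failed(reason):
    return {"pass": False, "critical_issues": [reason], "warnings": []}


def parse_report(reply_text):
    """Extract the JSON report from the reply, failing closed on bad output."""
    try:
        # Tolerate prose around the JSON object by slicing from the first
        # opening brace to the last closing brace.
        start = reply_text.index("{")
        end = reply_text.rindex("}") + 1
        report = json.loads(reply_text[start:end])
    except ValueError:  # raised by index/rindex and by json.loads
        return _failed("unparseable model reply")
    if not isinstance(report, dict) or not isinstance(report.get("pass"), bool):
        return _failed("malformed report")
    report.setdefault("critical_issues", [])
    report.setdefault("warnings", [])
    return report
```

Failing closed means a flaky or truncated model response blocks the merge instead of silently passing it, at the cost of occasional manual re-runs.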

Acceptance Criteria:

  • CI job added that runs LLM sanity check
  • Blocks auto-merge on critical issues
  • Logs warnings but allows merge
  • Uses cost-effective model (Haiku)
  • Completes in <30 seconds
