Skip to content

Reddit monitor truncates cross-reference URLs in quote extraction #1297

@ErikEvenson

Description

@ErikEvenson

Description

The Reddit monitor's quote extraction truncates text at ~80 characters, which clips URLs embedded in Reddit comments. This produces partial, unusable cross-reference URLs in the backfill report (e.g., http://aurora2.pentarch.org/index.php?bo, https://youtu.be/_l).

Evidence

Discussion #1296 backfill report contains multiple truncated URLs across triage pages. Resolving them requires re-fetching the original Reddit comments.

Root Cause

_extract_quote() in matcher.py:168 caps output at 200 chars, but the triage digest table further truncates quotes when formatting markdown table rows. Cross-reference URLs embedded in post/comment text get clipped before the full URL is preserved.

Note: _detect_cross_references() extracts URLs separately and correctly — this only affects the human-readable quote column in triage output.

Suggested Fix

  1. Preserve full URLs detected by _detect_cross_references() in the match result even when quote text is truncated
  2. Or: increase quote length limit for items with detected cross-references
  3. Or: append detected cross-reference URLs as a separate field in triage output rather than relying on the quote column

Impact

Low — cross-references are detected and stored correctly via _detect_cross_references(). This only affects the human-readable triage display where a reviewer might want to click the URL directly.

Related

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions