-
Notifications
You must be signed in to change notification settings - Fork 0
Description
Description
The Reddit monitor's quote extraction truncates text at ~80 characters, which clips URLs embedded in Reddit comments. This produces partial, unusable cross-reference URLs in the backfill report (e.g., http://aurora2.pentarch.org/index.php?bo, https://youtu.be/_l).
Evidence
Discussion #1296 backfill report contains multiple truncated URLs across triage pages. Resolving them requires re-fetching the original Reddit comments.
Root Cause
_extract_quote() in matcher.py:168 caps output at 200 chars, but the triage digest table further truncates quotes when formatting markdown table rows. Cross-reference URLs embedded in post/comment text get clipped before the full URL is preserved.
Note: _detect_cross_references() extracts URLs separately and correctly — this only affects the human-readable quote column in triage output.
Suggested Fix
- Preserve full URLs detected by
_detect_cross_references()in the match result even when quote text is truncated - Or: increase quote length limit for items with detected cross-references
- Or: append detected cross-reference URLs as a separate field in triage output rather than relying on the quote column
Impact
Low — cross-references are detected and stored correctly via _detect_cross_references(). This only affects the human-readable triage display where a reviewer might want to click the URL directly.
Related
- Discussion [Reddit Monitor] Backfill Report — 2026-02-17 #1296 (backfill report)
- bug: Reddit monitor matcher produces false positives on short/empty posts #1294 (false positive fix, committed in 8eda798)