Reddit monitor: issues with many unverified claims attract disproportionate false positives #1298

## Description

Issues whose many unverified claims contain common game keywords (e.g., "ships", "fuel", "population") attract disproportionate match counts in the Reddit monitor. Issue #707 (Civilian Economy, 32 unverified claims) matched 543 posts, 59 of them at high confidence. Nearly all are false positives caused by generic term overlap.

## Root Cause

`token_set_ratio` scores a pair of texts by the tokens they share. An issue with a long body full of common Aurora terms builds a large token pool that overlaps with almost any Aurora-related Reddit post. The #1294 fix (8eda798) addresses the short-post side, but the long-issue side remains: an issue with 32 claims and hundreds of keywords is a magnet for matches.
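To illustrate the effect, here is a minimal sketch using a simplified stand-in for `token_set_ratio` built on `difflib`. It is not the library function the monitor calls, and the issue body and post text are invented examples. It only shows why a short post whose tokens all appear in a long issue body gets a perfect score.

```python
from difflib import SequenceMatcher


def _ratio(a: str, b: str) -> float:
    """Plain similarity ratio on a 0-100 scale."""
    return 100 * SequenceMatcher(None, a, b).ratio()


def token_set_score(a: str, b: str) -> float:
    """Simplified stand-in for a token-set fuzzy score (assumption:
    approximates the usual intersection/remainder comparison)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    common = " ".join(sorted(ta & tb))
    only_a = " ".join(sorted(ta - tb))
    only_b = " ".join(sorted(tb - ta))
    combined_a = f"{common} {only_a}".strip()
    combined_b = f"{common} {only_b}".strip()
    return max(
        _ratio(common, combined_a),
        _ratio(common, combined_b),
        _ratio(combined_a, combined_b),
    )


# Hypothetical long issue body: many generic game terms.
issue_body = (
    "civilian economy ships fuel population wealth trade shipping lines "
    "colony mining minerals freighters infrastructure production"
)
# Hypothetical short post made only of generic terms.
post = "fuel ships population"

score = token_set_score(post, issue_body)
print(score)
```

Because every token in the post is also in the issue body, the intersection equals the whole post and the score is 100, regardless of whether the post has anything to do with the issue.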

## Suggested Approaches

1. **Term specificity weighting (TF-IDF style):** Weight keywords by how unique they are across all issues. "box launcher" is specific; "ships" appears in every issue. Penalize matches driven by common terms.
2. **Issue text length normalization:** Scale the fuzzy score inversely with issue text length, so that longer issue bodies require higher raw scores to qualify.
3. **Max claims threshold:** Issues with >N unverified claims could be split or excluded from automated matching, since they match nearly everything.
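A rough sketch of how the three approaches could look. All function names, parameters, and default values here (`idf_weights`, `specific_overlap`, `required_score`, `eligible_for_matching`, `base`, `ref_len`, `step`, `max_claims`) are hypothetical and not taken from the monitor's code. The issue token sets are invented.

```python
import math
from collections import Counter


def idf_weights(issue_tokens: dict[int, set[str]]) -> dict[str, float]:
    """Approach 1: smoothed IDF. A token present in every issue gets weight 0."""
    n = len(issue_tokens)
    df = Counter(tok for toks in issue_tokens.values() for tok in toks)
    return {tok: math.log((1 + n) / (1 + count)) for tok, count in df.items()}


def specific_overlap(post: set[str], issue: set[str], weights: dict[str, float]) -> float:
    """Sum of IDF weights over shared tokens; near 0 means only generic overlap."""
    return sum(weights.get(tok, 0.0) for tok in post & issue)


def required_score(base: float, n_tokens: int, ref_len: int = 100, step: float = 5.0) -> float:
    """Approach 2: raise the qualifying score as the issue body grows.
    Adds `step` points per doubling of length beyond `ref_len` (assumed values)."""
    growth = math.log2(max(1.0, n_tokens / ref_len))
    return min(100.0, base + step * growth)


def eligible_for_matching(claim_count: int, max_claims: int = 20) -> bool:
    """Approach 3: skip issues above a claim-count cap (assumed cap)."""
    return claim_count <= max_claims


# Invented token sets for three issues.
issues = {
    1: {"ships", "fuel", "box", "launcher"},
    2: {"ships", "fuel", "population"},
    3: {"ships", "population", "wealth"},
}
weights = idf_weights(issues)

generic = specific_overlap({"ships"}, issues[2], weights)
specific = specific_overlap({"box", "launcher", "ships"}, issues[1], weights)
print(generic, specific)
```

With these weights, a match driven only by "ships" contributes nothing, while "box launcher" contributes a clearly positive score. A minimum `specific_overlap` could then gate matches alongside the existing fuzzy score, with `required_score` and `eligible_for_matching` applied per issue.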

## Impact

Medium. False positives waste reviewer time and add noise to issue comment threads. The #1294 fix reduces short-post false positives but doesn't address the long-issue attractors.

## Related

- #1294 (short text false positives, fixed in 8eda798)
- Discussion #1296 (backfill report showing the pattern)
- #707 (primary example: 543 matches, nearly all false positives)
