Barcode detection causes false positives with plain numeric text #802

@shukebeta

Description

The barcode/QR code detection feature (enabled by default with detect-codes=true) can produce false positives when processing images containing plain numeric text. When a false positive occurs, the OCR path is completely skipped, resulting in incorrect text extraction.

Steps to Reproduce

  1. Ensure detect-codes=true in settings (default)
  2. Capture a screenshot containing numeric text that resembles a barcode pattern (e.g., "91385057399027")
  3. NormCap detects it as a barcode and returns only the numbers
  4. OCR is skipped, so the expected full text extraction never happens

Example Log

WARNING - === CODES detection enabled, calling detect_codes() ===
WARNING - === Found 1 raw results, codes=['91385057399027'] ===
WARNING - === Single code detected: '91385057399027' (type=TextType.SINGLE_LINE) ===
WARNING - === detect_codes() returned: DetectionResult(...detector=<TextDetector.BARCODE>) ===

Current Behavior

When detect-codes=true (default), the detection logic runs barcode/QR code detection first and treats any hit as final:

  • If a code is detected (even falsely), OCR is skipped entirely
  • Users get incomplete/incorrect text extraction without realizing why
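
To make the short-circuit concrete, here is a minimal sketch of the control flow as described above. It is not NormCap's actual code: the `Detection` class, the `detect` function, and the injected `detect_codes` / `run_ocr` callables are hypothetical stand-ins, and the OCR sentence in the usage note is made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Detection:
    text: str
    detector: str  # "BARCODE" or "OCR" (hypothetical labels)


def detect(
    image: object,
    detect_codes: Callable[[object], Optional[str]],
    run_ocr: Callable[[object], str],
    codes_enabled: bool = True,
) -> Detection:
    """Hypothetical model of the current flow: a code hit short-circuits OCR."""
    if codes_enabled:
        code = detect_codes(image)
        if code:
            # Any hit, even a false positive, ends detection here.
            return Detection(text=code, detector="BARCODE")
    # OCR only runs when code detection is disabled or found nothing.
    return Detection(text=run_ocr(image), detector="OCR")
```

With a stub code detector that always returns '91385057399027' and a stub OCR function that would return a full sentence, `detect(...)` yields the bare digits and the OCR stub is never called. Passing `codes_enabled=False` returns the OCR text instead.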

Expected Behavior

Potential improvements:

  1. Reduce false positive rate: Configure zxingcpp with stricter thresholds
  2. Confidence scoring: Only skip OCR if barcode confidence is high
  3. Dual detection: Run both code detection and OCR, intelligently choose the better result
  4. User control: Make it easier to disable code detection when not needed
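
As a rough illustration of option 3, the sketch below picks between a decoded code and an OCR result. The function name `choose_result` and the heuristic itself are assumptions for discussion, not existing NormCap behavior: if OCR read the decoded string plus additional characters, the capture is treated as plain text.

```python
import re
from typing import Optional, Tuple


def _strip_ws(text: str) -> str:
    """Remove all whitespace so OCR spacing differences do not matter."""
    return re.sub(r"\s+", "", text)


def choose_result(code_text: Optional[str], ocr_text: str) -> Tuple[str, str]:
    """Hypothetical chooser: return (detector, text) for the preferred result."""
    if not code_text:
        return ("OCR", ocr_text)
    if not ocr_text.strip():
        # OCR found nothing readable, so the code is all there is.
        return ("BARCODE", code_text)

    code_norm = _strip_ws(code_text)
    ocr_norm = _strip_ws(ocr_text)

    # OCR saw the decoded string embedded in more text: likely a false positive.
    if code_norm in ocr_norm and len(ocr_norm) > len(code_norm):
        return ("OCR", ocr_text)
    return ("BARCODE", code_text)
```

This heuristic has an obvious cost: OCR now runs on every capture, and a genuine barcode captured together with surrounding text would be reported as OCR output. It is meant only to show the shape of a "run both, then decide" approach.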

Environment

  • NormCap version: latest (main branch)
  • Platform: Linux
  • Settings: default (detect-codes=true)

Impact

This affects users processing:

  • Mixed text with numbers (especially Asian text with digits)
  • Documents with number sequences
  • Any content where numeric patterns might trigger false barcode detection

Workaround

Manually disable code detection:

# In settings GUI: uncheck "Detect codes"
# Or edit config:
sed -i 's/detect-codes=true/detect-codes=false/' ~/.config/normcap/settings.conf

Related

This issue was discovered while testing PR #801 (smart whitespace stripping for CJK text).
