feat(recognition): add configurable confidence aggregation methods by sneakybatman · Pull Request #2032 · mindee/doctr

sneakybatman · 2025-12-15T13:23:16Z

Summary

This PR adds support for configurable word-level confidence score aggregation methods in text recognition models. Previously, models used either arithmetic mean or minimum for aggregating character-level confidence scores into word-level confidence, with no way for users to customize this behavior.

Motivation

Different use cases may require different confidence aggregation strategies:

Arithmetic mean: Good general-purpose default, balances all character confidences
Geometric mean: More sensitive to low-confidence characters, useful when any single low confidence should significantly lower the word score
Harmonic mean: Even more conservative, heavily penalizes low-confidence characters
Minimum: Most conservative approach, word confidence equals that of the weakest character (good for high-precision requirements)
Maximum: Most optimistic, useful when you want the best-case confidence
Custom callable: Full flexibility for specialized use cases
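
To make the trade-offs above concrete, here is a small sketch that applies the five built-in strategies to one word. The character confidences are made-up values, and the formulas are the textbook definitions of each mean, not code taken from this PR.

```python
import numpy as np

# Hypothetical per-character confidences: four confident characters, one weak one.
char_probs = np.array([0.99, 0.98, 0.50, 0.97, 0.99])

# Textbook definitions of each aggregation strategy.
arithmetic = float(char_probs.mean())
geometric = float(np.exp(np.log(char_probs).mean()))
harmonic = float(char_probs.size / np.sum(1.0 / char_probs))
minimum = float(char_probs.min())
maximum = float(char_probs.max())

for name, value in [
    ("max", maximum),
    ("mean", arithmetic),
    ("geometric", geometric),
    ("harmonic", harmonic),
    ("min", minimum),
]:
    print(f"{name:>10}: {value:.3f}")
```

For any set of positive values the ordering min <= harmonic <= geometric <= arithmetic <= max holds, so the printed scores go from most optimistic to most conservative.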

Changes

Add aggregate_confidence() utility function in core.py with support for 5 built-in methods plus custom callables
Add ConfidenceAggregation type alias for type hints
Add confidence_aggregation parameter to RecognitionPostProcessor base class
Update all PyTorch PostProcessors: PARSeq, ViTSTR, CRNN, SAR, MASTER, VIPTR
Update all TensorFlow PostProcessors: PARSeq, ViTSTR, SAR, MASTER
Update remap_preds() for split crop handling to use configurable aggregation
Add comprehensive unit tests (20 new test cases)
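
For readers who want a mental model of the new utility, the following is a minimal sketch of how a string-or-callable dispatch could look. It is not the code from this PR: the name `aggregate_confidence_sketch`, the `AggregationSketch` alias, and the handling of empty input and zero values are assumptions made for illustration. Only the method names ("mean", "geometric_mean", "harmonic_mean", "min", "max") come from the PR description.

```python
from typing import Callable, Sequence, Union

import numpy as np

# Hypothetical stand-in for the PR's ConfidenceAggregation type alias.
AggregationSketch = Union[str, Callable[[np.ndarray], float]]


def aggregate_confidence_sketch(
    probs: Sequence[float], method: AggregationSketch = "mean"
) -> float:
    """Aggregate character-level confidences into one word-level score."""
    arr = np.asarray(probs, dtype=np.float64)
    if arr.size == 0:
        # Assumption: an empty prediction gets zero confidence.
        return 0.0
    if callable(method):
        # Custom callables receive the raw array and must return a scalar.
        return float(method(arr))
    if method == "mean":
        return float(arr.mean())
    if method == "geometric_mean":
        # Assumption: any zero confidence collapses the score to zero,
        # which also avoids log(0).
        if np.any(arr <= 0):
            return 0.0
        return float(np.exp(np.log(arr).mean()))
    if method == "harmonic_mean":
        # Assumption: same zero handling, which avoids division by zero.
        if np.any(arr <= 0):
            return 0.0
        return float(arr.size / np.sum(1.0 / arr))
    if method == "min":
        return float(arr.min())
    if method == "max":
        return float(arr.max())
    raise ValueError(f"Unknown aggregation method: {method!r}")
```

Checking `callable(method)` before comparing strings keeps the built-in names and user functions on one parameter, which matches the single `confidence_aggregation` argument described above.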

Usage Example

import numpy as np

from doctr.datasets import VOCABS
from doctr.models import recognition
from doctr.models.recognition.parseq.pytorch import PARSeqPostProcessor

# Use default aggregation (model-specific)
model = recognition.parseq(pretrained=True)

# Or customize at the PostProcessor level.
# Any vocab string works here; VOCABS["french"] is used only for illustration.
vocab = VOCABS["french"]

# Use geometric mean for more conservative confidence scores
processor = PARSeqPostProcessor(vocab, confidence_aggregation="geometric_mean")

# Use custom aggregation function
processor = PARSeqPostProcessor(vocab, confidence_aggregation=lambda probs: np.percentile(probs, 25))
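
As a side note on the custom callable, this sketch shows what the 25th-percentile rule from the lambda above returns on a standalone array. The confidences are hypothetical, and the named function `lower_quartile` is only an illustrative equivalent of that lambda.

```python
import numpy as np


def lower_quartile(probs: np.ndarray) -> float:
    """25th percentile of the character confidences, returned as a plain float."""
    return float(np.percentile(probs, 25))


# Hypothetical per-character confidences with one weak character.
char_probs = np.array([0.99, 0.98, 0.50, 0.97, 0.99])

score = lower_quartile(char_probs)
weakest = float(char_probs.min())
print(f"lower quartile: {score:.2f}, minimum: {weakest:.2f}")
```

With five characters the lower quartile lands on the second-smallest value, so a single outlier does not decide the word score the way it does with "min".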

Test plan

All existing tests pass
New unit tests for aggregate_confidence() function cover all 5 methods
Tests verify correct handling of edge cases (empty arrays, single values, zeros)
Tests verify custom callable support
PyTorch postprocessor tests updated and passing
TensorFlow postprocessor tests updated and passing

Add support for configurable word-level confidence score aggregation methods in text recognition models. Users can now choose how to aggregate character-level confidence scores into word-level confidence.

Supported aggregation methods:
- "mean": Arithmetic mean (default for transformer models)
- "geometric_mean": Geometric mean (sensitive to low values)
- "harmonic_mean": Harmonic mean (even more sensitive to low values)
- "min": Minimum confidence (most conservative, default for CTC/attention models)
- "max": Maximum confidence (most optimistic)
- Custom callable: User-defined aggregation function

Changes:
- Add `aggregate_confidence()` utility function in core.py
- Add `confidence_aggregation` parameter to RecognitionPostProcessor
- Update all PyTorch PostProcessors (PARSeq, ViTSTR, CRNN, SAR, MASTER, VIPTR)
- Update all TensorFlow PostProcessors (PARSeq, ViTSTR, SAR, MASTER)
- Update `remap_preds()` for split crop handling
- Add comprehensive unit tests for aggregation methods
- Maintain backward compatibility with sensible defaults per model type

sneakybatman · 2026-01-09T08:29:25Z

@felixdittrich92 anything else needed for this PR to be approved?

felixdittrich92 · 2026-01-12T08:36:56Z

> @felixdittrich92 anything else needed for this PR to be approved?

Hi @sneakybatman 👋,

Sorry for the late reply.
I had some delays last year, but I will be back soon and will check your PR then.
Thanks in advance for opening this and working on doctr 👍

sneakybatman wants to merge 1 commit into mindee:main from sneakybatman:feature/configurable-confidence-aggregation