A robust MRZ extraction and validation engine library designed for real-world KYC and identity verification workflows.

AzwadFawadHasan/OmniMRZ

OmniMRZ — Python MRZ Extraction & Validation Library for Passport OCR and KYC

OmniMRZ is an open-source Python library for Machine Readable Zone (MRZ) extraction, parsing, and ICAO-9303 validation from passport and ID images, built for OCR, KYC, and identity verification systems.

It is a production-grade MRZ extraction and validation engine designed for high-accuracy KYC, identity verification, and document intelligence pipelines.

Unlike simple MRZ readers, OmniMRZ evaluates whether an MRZ is structurally correct, passes its ICAO 9303 check digits, and is logically plausible.

Typical Use Cases

🛂 Passport and ID card OCR pipelines
🏦 KYC / AML identity verification systems
✈️ Border control and immigration preprocessing
📄 Document digitization and archiving
🔐 Authentication and onboarding workflows

⭐ Show Your Support

If OmniMRZ helped you or saved you development time, please consider starring the repository. It helps visibility and motivates continued development.

Why OmniMRZ?

Unlike basic MRZ readers, OmniMRZ provides end-to-end MRZ quality assurance:

  • Combines OCR, structural validation, checksum verification, and logical consistency checks
  • Fully compliant with ICAO 9303
  • Designed for production KYC and identity verification systems
  • Robust against OCR noise and partially corrupted MRZ lines

Features

At a glance

  • MRZ detection and extraction from images
  • Supports TD3 (passport) format
  • Checksum validation (ICAO 9303)
  • Logical and structural validation
  • Clean Python API

Detailed features

🔍 MRZ Extraction

  • PaddleOCR-based MRZ text extraction (robust on mobile & noisy images)
  • Intelligent MRZ line clustering & reconstruction
  • Automatic MRZ type detection (TD1 / TD2 / TD3)
  • OCR noise filtering & MRZ-safe character normalization
  • Works even with partially corrupted or misaligned MRZs

🧱 Structural Validation (ICAO 9303)

  • Exact line-length enforcement
  • Strict MRZ format verification
  • Field-level structural checks
  • Early-exit gating for invalid layouts
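
The layout checks above can be illustrated with a minimal sketch. It is not OmniMRZ's internal code: the function name `detect_mrz_type` and the error codes are hypothetical, chosen for illustration. The line counts and lengths (TD1: 3×30, TD2: 2×36, TD3: 2×44) and the `A-Z`, `0-9`, `<` alphabet come from ICAO 9303.

```python
import re

# Line count and line length for each ICAO 9303 document format.
MRZ_LAYOUTS = {
    "TD1": (3, 30),
    "TD2": (2, 36),
    "TD3": (2, 44),
}

# The MRZ alphabet: uppercase letters, digits and the filler character "<".
MRZ_CHARSET = re.compile(r"[A-Z0-9<]+")


def detect_mrz_type(lines):
    """Return (mrz_type, errors). mrz_type is None when the layout is invalid.

    Error codes are hypothetical names used only in this sketch.
    """
    errors = []
    for number, line in enumerate(lines, start=1):
        if not MRZ_CHARSET.fullmatch(line):
            errors.append(f"LINE{number}_INVALID_CHARACTERS")

    matched = None
    for mrz_type, (count, length) in MRZ_LAYOUTS.items():
        if len(lines) == count and all(len(line) == length for line in lines):
            matched = mrz_type
            break
    if matched is None:
        errors.append("UNKNOWN_LAYOUT")

    # Early-exit gating: any error means later stages should be skipped.
    return (matched if not errors else None), errors
```

A caller would typically stop at this stage when `errors` is non-empty, since checksum positions are only meaningful once the layout is known.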

🔢 Checksum Validation

  • Fully ICAO-9303 compliant checksum algorithm
  • Field-level validation:
      • Document number
      • Date of birth
      • Expiry date
      • Composite checksum
  • OCR-error tolerant digit correction (O→0, S→5, B→8, etc.)
  • Detailed checksum failure diagnostics
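
For readers unfamiliar with the ICAO 9303 check digit, here is a minimal sketch of the calculation for a TD3 second line. The weighting (7, 3, 1 repeating), the character values (digits as-is, A to Z as 10 to 35, `<` as 0) and the field positions follow the standard. The function names, the failure labels, and the exact correction table are assumptions for illustration, not OmniMRZ's API.

```python
WEIGHTS = (7, 3, 1)

# OCR confusions named in the feature list; a hypothetical minimal table.
OCR_DIGIT_FIXES = str.maketrans({"O": "0", "S": "5", "B": "8"})


def char_value(ch):
    """Map one MRZ character to its numeric value per ICAO 9303."""
    if "0" <= ch <= "9":
        return int(ch)
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10
    if ch == "<":
        return 0
    raise ValueError(f"not an MRZ character: {ch!r}")


def check_digit(field):
    """Weighted sum of character values, modulo 10."""
    total = sum(char_value(ch) * WEIGHTS[i % 3] for i, ch in enumerate(field))
    return total % 10


def fix_numeric_field(field):
    """Replace letters commonly misread for digits. Only safe on numeric fields."""
    return field.translate(OCR_DIGIT_FIXES)


def failed_td3_checks(line2):
    """Return the names of the check digits that do not match (TD3 line 2)."""
    checks = {
        "document_number": (line2[0:9], line2[9]),
        "date_of_birth": (line2[13:19], line2[19]),
        "expiry_date": (line2[21:27], line2[27]),
        # Composite covers positions 1-10, 14-20 and 22-43 (1-indexed).
        "composite": (line2[0:10] + line2[13:20] + line2[21:43], line2[43]),
    }
    return [
        name
        for name, (data, expected) in checks.items()
        if str(check_digit(data)) != expected
    ]
```

Applied to the second line of the sample passport shown under Output Example, every check digit matches, which agrees with the `"checksum_validation": "PASS"` result there.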

🧠 Logical & Semantic Validation

  • Expired document detection
  • Future date-of-birth detection
  • Implausible age detection
  • DOB ≥ expiry detection
  • Gender value validation (M, F, X, <)
  • Cross-field consistency signals (issuer vs nationality)
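
The date and gender rules listed above might look like the following sketch. Only `DOCUMENT_EXPIRED` appears in OmniMRZ's sample output; the other error codes, the function name `logical_errors`, and the 120-year age limit are assumptions made for this example. Passing `today` explicitly keeps the check deterministic and easy to test.

```python
from datetime import date

VALID_GENDERS = {"M", "F", "X", "<"}
MAX_PLAUSIBLE_AGE = 120  # assumed threshold, not taken from OmniMRZ


def logical_errors(date_of_birth, expiry_date, gender, today):
    """Return a list of error codes for already-parsed MRZ fields."""
    errors = []

    if expiry_date < today:
        errors.append("DOCUMENT_EXPIRED")

    if date_of_birth > today:
        errors.append("DOB_IN_FUTURE")
    else:
        # Full years elapsed; subtract one if the birthday has not occurred yet.
        had_birthday = (today.month, today.day) >= (
            date_of_birth.month,
            date_of_birth.day,
        )
        age = today.year - date_of_birth.year - (0 if had_birthday else 1)
        if age > MAX_PLAUSIBLE_AGE:
            errors.append("IMPLAUSIBLE_AGE")

    if date_of_birth >= expiry_date:
        errors.append("DOB_NOT_BEFORE_EXPIRY")

    if gender not in VALID_GENDERS:
        errors.append("INVALID_GENDER")

    return errors
```

Note that MRZ dates carry only two-digit years, so a real pipeline has to choose a century before building `date` objects; that step is left out here.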

📤 Output

  • Clean MRZ text
  • Structured JSON
  • Deterministic pass / fail / warning signals
  • Human-readable error messages

Installation

pip install omnimrz

Note: PaddleOCR requires additional system dependencies. Please ensure PaddlePaddle installs correctly on your platform.

pip install paddleocr
pip install paddlepaddle

If that fails, run:

python -m pip install paddlepaddle==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/

Quick Usage

from omnimrz import OmniMRZ

omni = OmniMRZ()
result = omni.process("ukpassport.jpg")

print(result)

Output Example

{
    "extraction": {
        "status": "SUCCESS(extraction of mrz)",
        "line1": "P<GBRPUDARSAN<<HENERT<<<<<<<<<<<<<<<<<<<<<<<",
        "line2": "7077979792GBR9505209M1704224<<<<<<<<<<<<<<00" 
    },
    "structural_validation": {
        "status": "PASS",
        "mrz_type": "TD3",
        "errors": []
    },
    "checksum_validation": {
        "status": "PASS",
        "errors": []
    },
    "parsed_data": {
        "status": "PARSED",
        "data": {
            "document_type": "P",
            "issuing_country": "GBR",
            "surname": "PUDARSAN",
            "given_names": "HENERT",
            "document_number": "707797979",
            "nationality": "GBR",
            "date_of_birth": "1995-05-20",
            "gender": "M",
            "expiry_date": "2017-04-22",
            "personal_number": ""
        }
    },
    "logical_validation": {
        "status": "FAIL",
        "errors": [
            "DOCUMENT_EXPIRED"
        ]
    },
    "screenshot_detection": {
        "status": "PASS",
        "is_screenshot": false,
        "score": 3,
        "confidence": 30.0,
        "reasons": [
            "Low ELA: 0.38",
            "High horizontal edges: 0.51",
            "High sharpness: 2029.58"
        ]
    }
}
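
To see how `parsed_data` relates to the two raw lines, the sketch below slices a TD3 MRZ at the fixed positions defined by ICAO 9303. It is an illustration only: `parse_td3` is a hypothetical helper, not part of the OmniMRZ API, and it leaves dates in their raw `YYMMDD` form rather than guessing the century.

```python
def parse_td3(line1, line2):
    """Split a TD3 MRZ (two 44-character lines) into named fields."""
    # Names occupy positions 6-44 of line 1; "<<" separates surname from given names.
    surname, _, given_names = line1[5:44].partition("<<")
    return {
        "document_type": line1[0:2].rstrip("<"),
        "issuing_country": line1[2:5].rstrip("<"),
        "surname": surname.replace("<", " ").strip(),
        "given_names": given_names.replace("<", " ").strip(),
        "document_number": line2[0:9].rstrip("<"),
        "nationality": line2[10:13].rstrip("<"),
        "date_of_birth": line2[13:19],  # raw YYMMDD
        "gender": line2[20],
        "expiry_date": line2[21:27],  # raw YYMMDD
        "personal_number": line2[28:42].rstrip("<"),
    }


# The sample passport from the output above, rebuilt with explicit padding.
line1 = "P<GBRPUDARSAN<<HENERT".ljust(44, "<")
line2 = "7077979792GBR9505209M1704224" + "<" * 14 + "00"
fields = parse_td3(line1, line2)
```

The check digits at positions 10, 20, 28, 43 and 44 of line 2 are skipped here because they belong to the checksum stage, not to the parsed data.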

Citing OmniMRZ

If you use OmniMRZ in academic research or publications, please consider citing this repository.

Contributing

Contributions are welcome! 🤝

  1. Fork the repository
  2. Create your feature branch:
     git checkout -b feature/amazing-feature
  3. Commit your changes
  4. Push to your branch
  5. Open a Pull Request

Keywords

MRZ extraction, passport OCR, machine readable zone, ICAO 9303, MRZ parser, Python OCR, identity verification, KYC automation, document intelligence, ID card scanning, border control OCR
