Skip to content

Fast, SOC‑ready malicious document scanner that turns suspicious PDFs, DOC(X), XLS(X), and RTFs into IOC‑rich, SIEM‑friendly reports.

License

Notifications You must be signed in to change notification settings

PKHarsimran/IOC-Inspector

Repository files navigation

IOC Inspector 🕵️‍♂️

CI Lint & Type-check
Codecov License: MIT Python GitHub release Security Contributions welcome

Fast, SOC-ready malicious-document scanner — turn suspicious PDFs, DOC(X), XLS(X) & RTFs into IOC-rich, SIEM-friendly reports.


✅ What's New

  • Cross-platform CI with Linux + Windows and Python 3.10/3.11 support
  • Improved parser error handling with custom ParserError
  • Dynamic API key loading for test reliability
  • Coverage-gated CI with >80% unit test coverage
  • Final README polish ✨
  • Concurrent directory scanning with --threads

⚡ Why IOC Inspector?

🔑 Value to Analysts
One-command triage ioc-inspector invoice.docx → instant verdict & Markdown report
Actionable scoring Custom heuristics blend macro flags, auto-exec/API hits, embedded-object metrics and threat-feed look-ups (VirusTotal + AbuseIPDB) into a 0-100 risk score
Analyst-first outputs Markdown for tickets, JSON / CSV for Splunk & Elastic
Runs anywhere Linux • Windows • headless in GitHub Actions
Extensible All logic lives in ioc_inspector_core/ — swap parsers, add feeds, tweak weights

🔍 Feature Matrix

Category What you get
Formats PDF • DOC / DOCX • XLS / XLSX • RTF
Static Analysis Macro dump, deep auto-exec & suspicious-API analysis, obfuscation finder, embedded-object counter
IOC Extraction URLs • Domains • IPs • Base64 blobs • Hidden links
Threat Enrichment VirusTotal • AbuseIPDB
Scoring Engine Heuristic weights + rule modifiers (configurable)
Reporting Markdown, JSON, CSV, JSONL, HTML
                                      |

| Automation | Dir-recursive scan • --threads for concurrency • Quiet/Verbose switches • GitHub Actions workflow |


🚀 Quick Start

# 1 – Clone
$ git clone https://github.com/PKHarsimran/IOC-Inspector.git
$ cd IOC-Inspector

# 2 – Install (Linux/macOS)
$ python -m venv venv && source venv/bin/activate

# 2 – Install (Windows)
> python -m venv venv && venv\Scripts\activate

# 3 – Install requirements
(venv) $ pip install -r requirements.txt

# 4 – Set up API keys
(venv) $ cp .env.example .env
(venv) $ nano .env    # Add your VT_API_KEY & ABUSEIPDB_API_KEY

# 5 – Run
(venv) $ python main.py --file examples/sample_invoice.docx --report
Example Output examples/sample_invoice.docx: score=45 verdict=suspicious See reports/sample_invoice_report.md for full IOC tables.

⚙️ Configuration Highlights (settings.py)

RISK_WEIGHTS = {
    "macro":          25,   # any VBA present
    "autoexec":       15,   # AutoOpen / Document_Open …
    "obfuscation":    20,   # long Base-64 blobs, XOR strings
    "susp_call":       5,   # CreateObject, Shell … (×3 capped at 15)
    "malicious_url":  30,   # VirusTotal consensus
    "malicious_ip":   25,   # AbuseIPDB ≥ confidence cutoff
}

VT_THRESHOLD            = 5    # vendors that must flag URL/IP malicious
ABUSE_CONFIDENCE_CUTOFF = 70   # AbuseIPDB confidence to flag IP
REPORT_FORMATS          = ["markdown", "json"]

🗂️ Repository Layout

ioc-inspector/
├── ioc_inspector_core/         ← all analysis logic
│   ├── __init__.py
│   ├── pdf_parser.py
│   ├── doc_parser.py
│   ├── macro_analyzer.py       ← deep VBA heuristics
│   ├── url_reputation.py
│   ├── abuseipdb_check.py
│   ├── heuristics.py
│   └── report_generator.py
│
├── logger.py
├── main.py
├── settings.py
│
├── examples/
├── reports/        (git-ignored)
├── logs/           (git-ignored)
│
├── tests/
└── requirements.txt

📦 Dependencies at a Glance

Category Package Why it’s needed
Core oletools, pdfminer.six, PyMuPDF, requests, python-dotenv, tldextract Parsing, enrichment, API config
Reporting (builtin) Markdown/JSON/CSV/JSONL/HTML rendering
Optional tabulate, rich, jinja2 Pretty console output, HTML reports

🗺️ How the code flows

flowchart TD
    CLI["CLI (main.py)"] --> DISPATCH["Dispatcher (__init__.analyze)"]

    subgraph "Parsers"
        DISPATCH --> PDF["pdf_parser.py"]
        DISPATCH --> OFFICE["doc_parser.py"]
        OFFICE --> MACRO["macro_analyzer.py"]
    end

    PDF --> ENRICH
    MACRO --> ENRICH
    subgraph "Reputation enrichment"
        ENRICH --> VT["url_reputation.py"]
        ENRICH --> ABIP["abuseipdb_check.py"]
    end

    ENRICH --> SCORE["heuristics.py"]
    SCORE --> REPORT["report_generator.py"]
    SCORE --> LOG["logger.py"]
    REPORT --> OUTPUT["Markdown / JSON"]
Loading

What happens step-by-step

Stage Module Job
CLI main.py Reads flags, builds file list, prints a headline.
Dispatcher ioc_inspector_core/__init__.py Routes each file to the right parser.
Parsers pdf_parser.py & doc_parser.py Extract URLs, IPs, macros, embeds, JavaScript.
Enrichment url_reputation.py, abuseipdb_check.py Query VirusTotal & AbuseIPDB; attach verdicts.
Scoring heuristics.py Apply weights, produce 0-100 risk score & verdict.
Reporting report_generator.py Write Markdown + JSON with IOC tables.
Logging logger.py Console + rotating file breadcrumbs for every stage.

📊 Coverage & Reliability

  • >80% test coverage (enforced in CI)
  • ✅ Coverage badge + reports via Codecov
  • ✅ Works on Linux and Windows runners
  • ✅ CLI smoke test validates API usage and report generation

🛣️ Roadmap to v1.0.0

This outlines the path for taking IOC Inspector from a solid prototype (v0.1.0) to a polished, production-ready v1.0.0 release.


✅ Phase 1: Foundation (v0.1.0 – Done)

  • Static IOC extraction: PDF, DOCX, XLSX, RTF
  • Threat enrichment: VirusTotal + AbuseIPDB
  • Heuristic-based scoring engine
  • Markdown + JSON reporting
  • Command-line interface with flags (--report, --quiet, etc.)
  • Cross-platform CI (Linux + Windows)
  • 80%+ test coverage with CLI smoke tests
  • Final README polish and first release tag

🚧 Phase 2: Stability & Feedback (v0.2.x)

Focus: Hardening the product & improving feedback loop

Technical Improvements

  • JSON schema validation for report output
  • Improve error messaging with file context (e.g., filetype, parser used)
  • Separate reporting logic from CLI to enable more formats

Developer Experience

  • Add make test, make lint, make run shortcuts
  • Add GitHub Discussions or feedback template
  • Incorporate feedback from test users

✨ Phase 3: Export & Integrations (v0.3.x)

Focus: SIEM-friendliness & analyst use

  • CSV export for Splunk or Excel
  • JSONL support for batch pipelines
  • HTML export with embedded styles
  • Normalize field naming for ingestion (e.g. ioc.type, ioc.source)
  • (Optional) Tag known MITRE ATT&CK techniques from enriched IOCs

🚀 Phase 4: Productionization (v0.9.x)

Focus: Distribution & packaging polish

  • Publish to PyPI for pipx install
  • Provide Docker image with CLI entrypoint
  • Build Windows binary via PyInstaller
  • Automate changelogs & releases via GitHub Actions
  • Use SemVer auto-tagging (release-please)

🏁 v1.0.0 Criteria

IOC Inspector will be tagged v1.0.0 when:

  • All supported formats parse reliably with test coverage
  • JSON / Markdown / CSV output is schema-stable
  • Test coverage is >90%
  • CLI is frictionless and documented
  • Docker + PyPI builds work out-of-box
  • Users validate usefulness via feedback

🧩 Post-1.0 Ideas

Optional features to consider post-v1.0:

  • Ntfy/webhook notifications for batch runs
  • Web UI using Streamlit or Flask
  • Threat feed exporter (e.g. to MISP or CSV dump)
  • Language support for French / Spanish SOC teams

💬 Questions? Feedback? File an Issue or start a discussion.


About

Fast, SOC‑ready malicious document scanner that turns suspicious PDFs, DOC(X), XLS(X), and RTFs into IOC‑rich, SIEM‑friendly reports.

Topics

Resources

License

Security policy

Stars

Watchers

Forks

Packages

No packages published

Contributors 3

  •  
  •  
  •