randellconley-admin/eml_extractor

EML Content Extractor

A Python tool to extract content from .eml (email) files and save it as text files.

Features

  • Extract email headers (From, To, Subject, Date, etc.)
  • Extract plain text content from email body
  • Extract HTML content and convert to readable text
  • Handle attachments (list and optionally extract)
  • Support for both single file and batch processing
  • Command-line interface and Python API
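
As an illustration of the header, body, and attachment features listed above, here is a minimal sketch built only on Python's standard `email` package. It is not the tool's actual implementation: the function name `summarize_eml` and the returned dictionary layout are assumptions made for this example.

```python
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser


def summarize_eml(raw: bytes) -> dict:
    """Parse raw .eml bytes into headers, body parts, and attachment names."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)

    # Keep only the common headers that are actually present.
    wanted = ("From", "To", "Subject", "Date")
    headers = {name: str(msg[name]) for name in wanted if msg[name] is not None}

    # get_body() walks multipart containers and returns the best match or None.
    plain = msg.get_body(preferencelist=("plain",))
    html = msg.get_body(preferencelist=("html",))

    return {
        "headers": headers,
        "plain": plain.get_content() if plain is not None else None,
        "html": html.get_content() if html is not None else None,
        # List attachment file names without extracting their payloads.
        "attachments": [part.get_filename() for part in msg.iter_attachments()],
    }


# Build a small message in memory so the sketch runs without a file on disk.
demo = EmailMessage()
demo["From"] = "alice@example.com"
demo["To"] = "bob@example.com"
demo["Subject"] = "Hello"
demo.set_content("Plain body")
demo.add_alternative("<p>HTML body</p>", subtype="html")
demo.add_attachment(b"data", maintype="application",
                    subtype="octet-stream", filename="notes.bin")

summary = summarize_eml(demo.as_bytes())
print(summary["headers"]["Subject"], summary["attachments"])
```

Using `policy.default` gives access to `get_body()` and `iter_attachments()`, which avoids walking the MIME tree by hand.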

Installation

Option 1: Using uv (Recommended)

  1. Install uv if you haven't already:
curl -LsSf https://astral.sh/uv/install.sh | sh
  2. Clone this repository:
git clone <repository-url>
cd eml_extractor
  3. Install dependencies:
uv sync
  4. Run with uv (automatic virtual environment):
uv run python eml_extractor.py input.eml

Or activate the virtual environment manually:

source .venv/bin/activate
python eml_extractor.py input.eml

Option 2: Using pip

  1. Clone this repository:
git clone <repository-url>
cd eml_extractor
  2. Install dependencies:
pip install -r requirements.txt

Usage

Command Line Interface

Extract a single .eml file (content only, no headers):

# Using uv (recommended)
uv run python eml_extractor.py input.eml

# Using regular python
python eml_extractor.py input.eml

Extract with email headers included:

uv run python eml_extractor.py input.eml --include-headers

Extract multiple .eml files from a directory:

uv run python eml_extractor.py /path/to/eml/files/

Specify output directory:

uv run python eml_extractor.py input.eml --output /path/to/output/
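
The commands above accept either a single file or a directory, plus an optional output directory. The sketch below shows one way that path handling could work. The helper names `collect_eml_files` and `output_path_for`, the non-recursive directory scan, and the `<name>.txt` output naming are all assumptions for illustration, not confirmed behavior of `eml_extractor.py`.

```python
import tempfile
from pathlib import Path
from typing import List, Optional


def collect_eml_files(input_path: Path) -> List[Path]:
    """Return the .eml files named by input_path (a single file or a directory)."""
    if input_path.is_dir():
        # Assumption: only the top level of the directory is scanned.
        return sorted(p for p in input_path.iterdir() if p.suffix.lower() == ".eml")
    return [input_path]


def output_path_for(eml_path: Path, output_dir: Optional[Path] = None) -> Path:
    """Map an .eml path to a .txt path, next to the source or inside output_dir."""
    target_dir = output_dir if output_dir is not None else eml_path.parent
    return target_dir / (eml_path.stem + ".txt")


# Demonstrate on a throwaway directory holding one email and one unrelated file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.eml").write_text("Subject: a\n\nbody\n")
    (root / "notes.md").write_text("not an email\n")
    files = collect_eml_files(root)
    targets = [output_path_for(f, root / "out") for f in files]
    print([t.name for t in targets])
```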

Python API

from eml_extractor import EMLExtractor

extractor = EMLExtractor()
content = extractor.extract_from_file('email.eml')
print(content)

Output Format

The extracted content includes:

  • Plain text body
  • HTML body (converted to text)
  • Attachment information
  • Email metadata (headers) - only when --include-headers is used

By default, headers are excluded to focus on the email content. Use --include-headers to include technical email headers in the output.
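
To make the effect of `--include-headers` concrete, here is a rough sketch of how the output text could be assembled with and without a header section. The function `format_output`, the section labels, and the overall layout are hypothetical. The real tool's output format may differ.

```python
from typing import Dict, List, Optional


def format_output(body: str, attachments: List[str],
                  headers: Optional[Dict[str, str]] = None) -> str:
    """Join the extracted pieces into one text document."""
    sections = []
    # Headers appear only when the caller passes them (mirrors --include-headers).
    if headers:
        header_lines = ["{}: {}".format(name, value) for name, value in headers.items()]
        sections.append("HEADERS\n" + "\n".join(header_lines))
    sections.append("BODY\n" + body.strip())
    if attachments:
        sections.append("ATTACHMENTS\n" + "\n".join("- " + name for name in attachments))
    return "\n\n".join(sections) + "\n"


# Default behavior: content only.
print(format_output("Hi Bob,\nsee attached.", ["report.pdf"]))

# Equivalent of --include-headers: the header section comes first.
print(format_output("Hi Bob,\nsee attached.", ["report.pdf"],
                    headers={"From": "alice@example.com", "Subject": "Report"}))
```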

Requirements

  • Python 3.6+
  • beautifulsoup4 (for HTML parsing)
  • chardet (for encoding detection)
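
For readers curious about what "HTML converted to text" involves, the following sketch uses the standard-library `html.parser` module as a dependency-free stand-in. The tool itself lists beautifulsoup4 for this job, so this is not its implementation. The class `TextExtractor` and the function `html_to_text` are names invented for this example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    SKIPPED = {"script", "style"}
    BLOCK = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self._chunks.append("\n")  # block elements start on a new line

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag in self.BLOCK:
            self._chunks.append("\n")

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Trim each line and drop the empty ones left by nested block tags.
        lines = (line.strip() for line in "".join(self._chunks).splitlines())
        return "\n".join(line for line in lines if line)


def html_to_text(html: str) -> str:
    """Convert an HTML fragment to plain text, one block element per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


print(html_to_text("<p>Hello &amp; welcome</p><script>x()</script><p>Bye</p>"))
```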

License

MIT License
