A Python tool to extract content from .eml (email) files and save it as text files.
- Extract email headers (From, To, Subject, Date, etc.)
- Extract plain text content from email body
- Extract HTML content and convert to readable text
- Handle attachments (list and optionally extract)
- Support for both single file and batch processing
- Command-line interface and Python API
With uv (recommended):
- Install uv if you haven't already:
curl -LsSf https://astral.sh/uv/install.sh | sh
- Clone this repository:
git clone <repository-url>
cd eml-extractor
- Install dependencies:
uv sync
- Run with uv (automatic virtual environment):
uv run python eml_extractor.py input.eml
Or activate the virtual environment manually:
source .venv/bin/activate
python eml_extractor.py input.eml

With pip:
- Clone this repository:
git clone <repository-url>
cd eml-extractor
- Install dependencies:
pip install -r requirements.txt

Extract a single .eml file (content only, no headers):
# Using uv (recommended)
uv run python eml_extractor.py input.eml
# Using regular python
python eml_extractor.py input.eml

Extract with email headers included:
uv run python eml_extractor.py input.eml --include-headers

Extract multiple .eml files from a directory:
uv run python eml_extractor.py /path/to/eml/files/

Specify output directory:
uv run python eml_extractor.py input.eml --output /path/to/output/

Python API:
from eml_extractor import EMLExtractor
extractor = EMLExtractor()
content = extractor.extract_from_file('email.eml')
print(content)

The extracted content includes:
- Plain text body
- HTML body (converted to text)
- Attachment information
- Email metadata (headers) - only when --include-headers is used
By default, headers are excluded to focus on the email content. Use --include-headers to include technical email headers in the output.
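
To illustrate the difference between the default output and the --include-headers output, here is a minimal sketch built only on Python's standard `email` package. It is not the tool's actual implementation: the function name `summarize_eml` and its `include_headers` parameter are hypothetical, and the sample message is made up for the demonstration.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def summarize_eml(raw: bytes, include_headers: bool = False) -> str:
    """Hypothetical helper: render the bytes of an .eml file as plain text."""
    msg = message_from_bytes(raw, policy=policy.default)
    parts = []

    # Headers are emitted only on request, mirroring --include-headers.
    if include_headers:
        for name in ("From", "To", "Subject", "Date"):
            if msg[name] is not None:
                parts.append(f"{name}: {msg[name]}")
        parts.append("")

    # Pick the text/plain body part, if the message has one.
    body = msg.get_body(preferencelist=("plain",))
    if body is not None:
        parts.append(body.get_content().strip())

    # List attachment file names without extracting them.
    names = [p.get_filename() or "(unnamed)" for p in msg.iter_attachments()]
    if names:
        parts.append("Attachments: " + ", ".join(names))

    return "\n".join(parts)


# Build a small sample message in memory instead of reading a file.
sample = EmailMessage()
sample["From"] = "alice@example.com"
sample["To"] = "bob@example.com"
sample["Subject"] = "Report"
sample.set_content("Hello Bob, the report is attached.")
sample.add_attachment(b"data", maintype="application",
                      subtype="octet-stream", filename="report.bin")
raw = sample.as_bytes()

content_only = summarize_eml(raw)
with_headers = summarize_eml(raw, include_headers=True)
print(with_headers)
```

Keeping headers behind a flag keeps the default output short and focused on what the sender wrote, which matches the behavior described above.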
- Python 3.6+
- beautifulsoup4 (for HTML parsing)
- chardet (for encoding detection)
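
For readers curious what converting an HTML body to readable text involves, the sketch below shows the general idea. It deliberately uses the standard library's `html.parser` as a stand-in, not beautifulsoup4, and it is not taken from the tool's source; the names `TextCollector` and `html_to_text` are hypothetical.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect visible text, skipping <script> and <style> content."""

    SKIP = {"script", "style"}
    BLOCK = {"p", "div", "br", "li", "tr", "h1", "h2", "h3"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            # Start block-level elements on a new line.
            self.chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML fragment, one block per line."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    joined = "".join(parser.chunks)
    # Collapse runs of whitespace and drop empty lines.
    lines = (" ".join(line.split()) for line in joined.splitlines())
    return "\n".join(line for line in lines if line)


html = ("<html><head><style>p{color:red}</style></head><body>"
        "<p>Hello <b>world</b></p><p>Bye &amp; thanks</p>"
        "<script>x()</script></body></html>")
text = html_to_text(html)
print(text)
```

A dedicated HTML library such as beautifulsoup4 is more tolerant of malformed markup than this hand-rolled collector, which is a likely reason the tool depends on it.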
MIT License