206 changes: 206 additions & 0 deletions docs/EMAIL_SEARCH_PERFORMANCE.md
@@ -0,0 +1,206 @@
# Email Search Performance & Filter Requirements

## Overview

The `list_emails_metadata()` method requires at least one search filter to prevent expensive mailbox scans that can time out on large mailboxes. This document explains the reasoning and best practices.

## Why Filters Are Required

### The Problem: Unfiltered Searches

When calling `list_emails_metadata()` without any filters:

```python
# ❌ BAD: No filters
list_emails_metadata(account_name="Galaxia", page=1, page_size=5)
```

Despite requesting only 5 emails per page, the internal flow is:

```
1. IMAP uid_search("ALL") ← Scans entire mailbox
2. Fetch dates for ALL emails ← Could be 100,000+ emails
3. Sort ALL emails ← Memory intensive
4. Then paginate to get 5 emails ← Finally!
```
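The cost of these steps can be modeled in plain Python. This is a simplified sketch, not the project's code: it assumes the search yields `(uid, date)` pairs, and `paginate_after_search` is a hypothetical name. It shows that every match goes through the sort even though only `page_size` UIDs survive the final slice.

```python
from datetime import datetime


def paginate_after_search(
    matches: list[tuple[int, datetime]], page: int, page_size: int
) -> list[int]:
    """Model of the unfiltered flow: sort every match, then cut out one page."""
    # Steps 2-3: every matching UID is dated and sorted, newest first.
    # This costs O(n log n) in the number of matches, regardless of page_size.
    ordered = sorted(matches, key=lambda m: m[1], reverse=True)
    # Step 4: pagination only slices the already-sorted list.
    start = (page - 1) * page_size
    return [uid for uid, _ in ordered[start:start + page_size]]


# 100,000 fake (uid, date) pairs: all of them are sorted to return just 5.
mailbox = [(uid, datetime(2024, 1, 1 + uid % 28)) for uid in range(1, 100_001)]
first_page = paginate_after_search(mailbox, page=1, page_size=5)
```

A date filter shrinks `matches` before any of this happens, which is why it helps far more than a small `page_size`.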

**Pagination only applies AFTER the expensive operations.** This means:

- On a mailbox with 10,000 emails: seconds of delay
- On a mailbox with 100,000+ emails: minutes or timeout
- On enterprise mailboxes: can hang indefinitely

### IMAP Protocol Limitation

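Base IMAP (IMAP4rev1, RFC 3501) has no "give me the first N emails" query. Retrieving messages takes separate steps:

1. **SEARCH** - Define criteria and get matching UIDs (returns ALL matches)
2. **FETCH** - Get data for specific UIDs
3. **SORT** - Order results server-side (an optional extension, RFC 5256; not every server supports it)

SEARCH has no option to cap the number of matches it returns, so results can't be limited before the search phase; pagination can only be applied afterwards.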
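For illustration, here is the SEARCH step with Python's standard `imaplib` (the project itself may use a different IMAP client; `search_uids` is a hypothetical helper). The server answers `UID SEARCH` with every matching UID in a single response line, so any "first N" limit can only be applied client-side afterwards.

```python
import imaplib


def search_uids(client: imaplib.IMAP4, *criteria: str) -> list[int]:
    """Run UID SEARCH and return every matching UID (the server applies no limit)."""
    # imaplib skips None arguments, so None here means "no CHARSET".
    typ, data = client.uid("SEARCH", None, *criteria)
    if typ != "OK":
        raise imaplib.IMAP4.error(f"UID SEARCH failed: {data!r}")
    # The whole result set arrives as one space-separated line, e.g. b"3 7 9".
    return [int(uid) for uid in data[0].split()]
```

With a real connection, `client` would be an authenticated `imaplib.IMAP4_SSL` instance on which `select("INBOX")` has been called; `search_uids(client, "ALL")` then returns the UID of every message in the mailbox.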

## Best Practices

### ✅ Fast Searches

#### 1. Date Range (Fastest)

```python
# Get last 30 days of emails
from datetime import datetime, timedelta
since = datetime.now() - timedelta(days=30)
result = list_emails_metadata(account_name="Galaxia", since=since)
```

**Why:** Date criteria (`SINCE`/`BEFORE`) are typically cheap for IMAP servers to evaluate, and they limit the result to emails in the requested window.
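A date filter becomes a single search key. Assuming `since` is translated to the standard RFC 3501 `SINCE` key (the `imap_date` helper below is hypothetical), the datetime has to be rendered as IMAP date-text. IMAP dates have day granularity, so the time of day in `datetime.now() - timedelta(days=30)` is ignored by the server.

```python
from datetime import date, datetime, timedelta

# RFC 3501 date-text uses fixed English month names, so avoid the
# locale-dependent strftime("%b").
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def imap_date(d: date) -> str:
    """Format a date (or datetime) as IMAP date-text, e.g. '05-Jan-2026'."""
    # SINCE/BEFORE compare calendar dates only, so any time of day is dropped.
    return f"{d.day:02d}-{MONTHS[d.month - 1]}-{d.year}"


since = datetime.now() - timedelta(days=30)
criteria = ["SINCE", imap_date(since)]  # the search key and its date argument
```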

#### 2. Text Search (Medium Speed)

```python
# Search for specific sender
result = list_emails_metadata(
    account_name="Galaxia",
    from_address="boss@company.com"
)
```

**Why:** Text criteria are evaluated on the server (often with the help of its indices), but a broad term can still match many emails.
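Text filters map to search keys such as `SUBJECT`, `FROM`, and `TO`, each followed by a string argument. A sketch of quoting that argument, assuming it is sent as an RFC 3501 quoted string (`imap_quote` is a hypothetical helper; many IMAP client libraries handle this for you):

```python
def imap_quote(value: str) -> str:
    """Return value as an RFC 3501 quoted string, escaping backslashes and quotes."""
    # Quoted strings cannot carry CR/LF; such values would need an IMAP literal.
    if "\r" in value or "\n" in value:
        raise ValueError("CR/LF is not allowed in an IMAP quoted string")
    # Non-ASCII text would also need CHARSET UTF-8 or a literal (not handled here).
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


criteria = ["FROM", imap_quote("boss@company.com")]
```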

#### 3. Combined Filters (Recommended)

```python
# Search for work emails from last month
from datetime import datetime, timedelta

since = datetime.now() - timedelta(days=30)
result = list_emails_metadata(
    account_name="Galaxia",
    subject="project",
    from_address="team@company.com",
    since=since
)
```

**Why:** All criteria are combined (ANDed) in a single IMAP SEARCH, so the server narrows the result before anything is fetched or sorted.

#### 4. Flag-Based Search

```python
# Get unread emails from the last 7 days
from datetime import datetime, timedelta

since = datetime.now() - timedelta(days=7)
result = list_emails_metadata(
    account_name="Galaxia",
    seen=False,  # Unread emails
    since=since
)
```

**Why:** Flag searches are fast; combining with date range is best.
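Each boolean flag filter corresponds to a pair of standard RFC 3501 search keys. Assuming the filters are mapped this way (the `FLAG_KEYS` table and `flag_criteria` helper are illustrative, not taken from the project), `seen=False` becomes `UNSEEN`, and it is simply appended to the date criteria:

```python
from typing import Optional

# Standard RFC 3501 search keys per flag filter: (key if True, key if False).
FLAG_KEYS = {
    "seen": ("SEEN", "UNSEEN"),
    "flagged": ("FLAGGED", "UNFLAGGED"),
    "answered": ("ANSWERED", "UNANSWERED"),
}


def flag_criteria(**flags: Optional[bool]) -> list[str]:
    """Translate flag filters into search keys; None means 'don't filter'."""
    keys = []
    for name, value in flags.items():
        if value is None:
            continue
        on_key, off_key = FLAG_KEYS[name]
        keys.append(on_key if value else off_key)
    return keys


# Unread emails since a date: juxtaposed keys are ANDed in one SEARCH.
criteria = ["SINCE", "01-Jan-2026", *flag_criteria(seen=False)]
```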

## Performance Comparison

| Query | Mailbox Size | Time |
| ---------------------- | ---------------- | ------------ |
| `SEARCH ALL` | 10,000 emails | ~1 second |
| `SEARCH ALL` | 100,000 emails | ~10+ seconds |
| `SEARCH ALL` | 1,000,000 emails | **TIMEOUT** |
| `SEARCH SINCE <date>` | Any size | ~100ms |
| `SEARCH FROM "sender"` | 100,000 emails | ~500ms |
| `SEARCH SINCE + FROM` | 100,000 emails | ~100ms |

## Error Message Explanation

When no filters are provided:

```
ValueError: At least one filter is required to prevent expensive searches
on large mailboxes. Recommended: combine a date range (since/before) with
optional text filters (subject/from/to).
Example: since=datetime(2026, 1, 1) or subject='work' + since=datetime(2025, 1, 1)
```

This error prevents:

- Silent performance degradation
- Unexplained timeouts
- User frustration with "why is this so slow?"
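The check behind this error can be very small. The sketch below is a hypothetical reimplementation for illustration (the name `require_filter` is an assumption; the filter names come from the list of available filters), not the project's code. Boolean filters must be compared against `None`, because `seen=False` is a meaningful filter rather than "unset".

```python
from datetime import datetime
from typing import Any

# Filter names from the "Available Filters" list.
FILTER_NAMES = ("since", "before", "subject", "from_address",
                "to_address", "seen", "flagged", "answered")


def require_filter(**filters: Any) -> None:
    """Raise ValueError unless at least one search filter is provided."""
    # Compare with None: seen=False ("unread") is a real filter, not "unset".
    if all(filters.get(name) is None for name in FILTER_NAMES):
        raise ValueError(
            "At least one filter is required to prevent expensive searches "
            "on large mailboxes."
        )


require_filter(since=datetime(2026, 1, 1))  # passes
require_filter(seen=False)                  # passes: unread emails
```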

## Available Filters

Any one of these filters satisfies the requirement, but a broad one (for example `before` a recent date, or `seen=True`) can still match most of the mailbox:

- **`since`** (datetime) - Emails on or after this date (fastest)
- **`before`** (datetime) - Emails before this date (fastest)
- **`subject`** (string) - Subject line text search
- **`from_address`** (string) - Sender email address
- **`to_address`** (string) - Recipient email address
- **`seen`** (bool) - `True` for read emails, `False` for unread
- **`flagged`** (bool) - Starred/flagged emails
- **`answered`** (bool) - Emails that have been replied to

## Recommendations

### For Applications

1. **Always provide a date range** - This is the fastest and most predictable
2. **Combine with text filters** - Narrow results further
3. **Handle pagination** - Combine with `page` and `page_size`
4. **Cache results** - Don't re-query immediately
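For "cache results", a small time-based cache keyed by the query arguments is often enough. A minimal sketch, assuming an in-process cache is acceptable (the `TTLCache` class and its `get_or_fetch` method are hypothetical, not part of the project):

```python
import time
from typing import Any, Callable, Hashable


class TTLCache:
    """Tiny time-based cache: reuse a result for `ttl` seconds."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock  # injectable so the cache is easy to test
        self._store: dict[Hashable, tuple[float, Any]] = {}

    def get_or_fetch(self, key: Hashable, fetch: Callable[[], Any]) -> Any:
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # fresh enough: skip the IMAP round trip
        value = fetch()  # expired or missing: query the server again
        self._store[key] = (now, value)
        return value
```

A natural key is the tuple of arguments passed to the search, for example `("Galaxia", since.date(), "project")`, so identical queries within the TTL reuse one server result.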

### For Users

1. **Start with recent emails** - Last 30-90 days is usually sufficient
2. **Use specific searches** - If looking for something, add subject/from filters
3. **Be explicit** - Don't rely on defaults; always specify your intent

## Migration Guide

If you were using unfiltered searches before:

### Before (now raises `ValueError`)

```python
result = list_emails_metadata(account_name="Galaxia")
```

### After

```python
from datetime import datetime, timedelta

# Option 1: Last 30 days
since = datetime.now() - timedelta(days=30)
result = list_emails_metadata(account_name="Galaxia", since=since)

# Option 2: Search for specific sender
result = list_emails_metadata(
    account_name="Galaxia",
    from_address="colleague@company.com"
)

# Option 3: Combine filters (recommended)
since = datetime.now() - timedelta(days=90)
result = list_emails_metadata(
    account_name="Galaxia",
    subject="project",
    since=since
)
```

## FAQ

**Q: Can I see all my emails?**
A: Yes, use a wide date range such as `since=datetime(2000, 1, 1)`. On large mailboxes this may take several seconds or time out, depending on server capacity.

**Q: Why is pagination alone not enough?**
A: IMAP returns every match of a search at once, and the matches are dated and sorted before a page is cut out. Pagination is applied only after that, so it doesn't prevent the initial expensive scan.

**Q: What if my IMAP server is fast?**
A: Even fast servers struggle with "ALL" searches on mailboxes with 100,000+ emails. Date range filters are always safer.

**Q: Can I search my entire mailbox?**
A: Technically yes, but it's not recommended for mailboxes with more than 50,000 emails. Use `since=datetime(2000, 1, 1)` and be patient. A small `page_size` keeps each response small, but every page still pays the cost of the full search.
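To walk through a large result one page at a time, a loop like the one below can be used. It is a sketch under assumptions: `fetch_page` is a hypothetical synchronous wrapper that returns a dict with `total` and `emails` keys (the shape the `search_emails` docstring in this change describes); adapt it to the real return value. Each call still repeats the full search on the server.

```python
from typing import Any, Callable, Iterator


def iter_emails(
    fetch_page: Callable[[int, int], dict[str, Any]], page_size: int = 20
) -> Iterator[Any]:
    """Yield emails page by page until `total` results have been seen."""
    page, seen = 1, 0
    while True:
        result = fetch_page(page, page_size)
        emails = result["emails"]
        if not emails:  # defensive stop if the server returns an empty page
            return
        yield from emails
        seen += len(emails)
        if seen >= result["total"]:
            return
        page += 1
```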

## See Also

- [IMAP RFC 3501](https://tools.ietf.org/html/rfc3501) - IMAP Protocol Specification
- [mcp-email-server Repository](https://github.com/ai-zerolab/mcp-email-server) - Main project
73 changes: 73 additions & 0 deletions mcp_email_server/app.py
@@ -32,6 +32,35 @@ async def list_available_accounts() -> list[AccountAttributes]:
    return [account.masked() for account in settings.get_accounts()]


@mcp.tool(
    description="List all mailboxes (folders) in an email account. Use this to discover available folders like Archive, Sent, Trash, etc."
)
async def list_mailboxes(
    account_name: Annotated[str, Field(description="The name of the email account.")],
) -> list[dict]:
    handler = dispatch_handler(account_name)
    return await handler.list_mailboxes()


@mcp.tool(
    description="Search emails using server-side IMAP search. Fast even with thousands of emails. "
    "Searches in subject, body, and headers by default."
)
async def search_emails(
    account_name: Annotated[str, Field(description="The name of the email account.")],
    query: Annotated[str, Field(description="Text to search for in emails.")],
    mailbox: Annotated[str, Field(default="INBOX", description="Mailbox to search in.")] = "INBOX",
    search_in: Annotated[
        Literal["all", "subject", "body", "from"],
        Field(default="all", description="Where to search: 'all' (headers+body), 'subject', 'body', or 'from'."),
    ] = "all",
    page: Annotated[int, Field(default=1, description="Page number (starting from 1).")] = 1,
    page_size: Annotated[int, Field(default=20, description="Number of results per page.")] = 20,
) -> dict:
    handler = dispatch_handler(account_name)
    return await handler.search_emails(query, mailbox, search_in, page, page_size)


@mcp.tool(description="Add a new email account configuration to the settings.")
async def add_email_account(email: EmailSettings) -> str:
    settings = get_settings()
@@ -196,6 +225,50 @@ async def delete_emails(
    return result


@mcp.tool(description="Mark one or more emails as read or unread. Use list_emails_metadata first to get the email_id.")
async def mark_emails_as_read(
    account_name: Annotated[str, Field(description="The name of the email account.")],
    email_ids: Annotated[
        list[str],
        Field(description="List of email_id to mark (obtained from list_emails_metadata)."),
    ],
    mailbox: Annotated[str, Field(default="INBOX", description="The mailbox containing the emails.")] = "INBOX",
    read: Annotated[bool, Field(default=True, description="True to mark as read, False to mark as unread.")] = True,
) -> str:
    handler = dispatch_handler(account_name)
    success_ids, failed_ids = await handler.mark_emails_as_read(email_ids, mailbox, read)

    status = "read" if read else "unread"
    result = f"Successfully marked {len(success_ids)} email(s) as {status}"
    if failed_ids:
        result += f", failed to mark {len(failed_ids)} email(s): {', '.join(failed_ids)}"
    return result


@mcp.tool(
    description="Move one or more emails to a different mailbox/folder. Common destinations: 'Archive', 'Trash', 'Spam'. Use list_emails_metadata first to get the email_id."
)
async def move_emails(
    account_name: Annotated[str, Field(description="The name of the email account.")],
    email_ids: Annotated[
        list[str],
        Field(description="List of email_id to move (obtained from list_emails_metadata)."),
    ],
    destination_mailbox: Annotated[
        str,
        Field(description="Target mailbox name (e.g., 'Archive', 'Trash', 'Spam', '[Gmail]/All Mail')."),
    ],
    source_mailbox: Annotated[str, Field(default="INBOX", description="Source mailbox.")] = "INBOX",
) -> str:
    handler = dispatch_handler(account_name)
    moved_ids, failed_ids = await handler.move_emails(email_ids, destination_mailbox, source_mailbox)

    result = f"Successfully moved {len(moved_ids)} email(s) to '{destination_mailbox}'"
    if failed_ids:
        result += f", failed to move {len(failed_ids)} email(s): {', '.join(failed_ids)}"
    return result


@mcp.tool(
    description="Download an email attachment and save it to the specified path. This feature must be explicitly enabled in settings (enable_attachment_download=true) due to security considerations.",
)
58 changes: 58 additions & 0 deletions mcp_email_server/emails/__init__.py
@@ -79,12 +79,70 @@ async def send_email(
            references: Space-separated Message-IDs for the thread chain.
        """

    @abc.abstractmethod
    async def list_mailboxes(self) -> list[dict]:
        """
        List all mailboxes (folders) in the email account.

        Returns:
            List of dictionaries with mailbox info (name, flags, delimiter).
        """

    @abc.abstractmethod
    async def search_emails(
        self,
        query: str,
        mailbox: str = "INBOX",
        search_in: str = "all",
        page: int = 1,
        page_size: int = 20,
    ) -> dict:
        """
        Search emails using server-side IMAP SEARCH.

        Args:
            query: Text to search for.
            mailbox: Mailbox to search in (default: "INBOX").
            search_in: Where to search - "all", "subject", "body", "from".
            page: Page number (starting from 1).
            page_size: Number of results per page.

        Returns:
            Dictionary with query, total, page, and emails list.
        """

    @abc.abstractmethod
    async def delete_emails(self, email_ids: list[str], mailbox: str = "INBOX") -> tuple[list[str], list[str]]:
        """
        Delete emails by their IDs. Returns (deleted_ids, failed_ids)
        """

    @abc.abstractmethod
    async def mark_emails_as_read(
        self, email_ids: list[str], mailbox: str = "INBOX", read: bool = True
    ) -> tuple[list[str], list[str]]:
        """
        Mark emails as read or unread. Returns (success_ids, failed_ids)

        Args:
            email_ids: List of email IDs to mark.
            mailbox: The mailbox containing the emails (default: "INBOX").
            read: True to mark as read, False to mark as unread.
        """

    @abc.abstractmethod
    async def move_emails(
        self, email_ids: list[str], destination_mailbox: str, source_mailbox: str = "INBOX"
    ) -> tuple[list[str], list[str]]:
        """
        Move emails to another mailbox. Returns (moved_ids, failed_ids)

        Args:
            email_ids: List of email IDs to move.
            destination_mailbox: Target mailbox name (e.g., "Archive", "Trash").
            source_mailbox: Source mailbox (default: "INBOX").
        """

    @abc.abstractmethod
    async def download_attachment(
        self,