
Perf: Batch fetch emails to reduce IMAP round trips #107

Merged
Wh1isper merged 2 commits into ai-zerolab:main from avarun42:perf/optimize-email-fetching
Jan 19, 2026

@avarun42 (Contributor) commented Jan 19, 2026

For context, the current main branch was completely unusable with my email account before this change: a massive N+1 query pattern (one IMAP round trip per email) ran for many minutes before timing out.

Summary

  • Batch fetch INTERNALDATE for all UIDs in chunks of 5000 (Yahoo compatibility)
  • Sort and paginate in Python
  • Batch fetch full headers only for the requested page
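
The chunking in the first bullet can be sketched as below. This is a minimal illustration, not the PR's code: `chunk_uids` and `date_fetch_commands` are hypothetical names, and the sketch builds raw IMAP `UID FETCH` command strings instead of calling aioimaplib, so it runs without a server.

```python
from typing import Iterator, List

# Chunk size taken from the PR description (Yahoo compatibility); treat it as tunable.
CHUNK_SIZE = 5000


def chunk_uids(uids: List[str], size: int = CHUNK_SIZE) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` UIDs."""
    for start in range(0, len(uids), size):
        yield uids[start:start + size]


def date_fetch_commands(uids: List[str], size: int = CHUNK_SIZE) -> List[str]:
    """Build one hypothetical 'UID FETCH <set> (INTERNALDATE)' command per chunk."""
    return [f"UID FETCH {','.join(chunk)} (INTERNALDATE)" for chunk in chunk_uids(uids, size)]


if __name__ == "__main__":
    all_uids = [str(n) for n in range(1, 25001)]
    print(len(date_fetch_commands(all_uids)))  # 5 commands for 25k UIDs
```

Keeping each command to a bounded number of UIDs is what avoids oversized command lines on servers such as Yahoo, per the bullet above.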

Why INTERNALDATE?

Fetching INTERNALDATE (server receipt time) is 40x faster than fetching and parsing Date headers, and the resulting order differs negligibly (tested on 25k emails: maximum position difference of 20).
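
A sketch of turning one INTERNALDATE fetch response line into a sortable value, assuming the line carries both `UID` and `INTERNALDATE` items (as it does when both are requested). `parse_internaldate_line` is a hypothetical helper; the date format is the RFC 3501 `date-time` form.

```python
import re
from datetime import datetime
from typing import Optional, Tuple

# Hypothetical helper: pull UID and INTERNALDATE out of one FETCH response line.
UID_RE = re.compile(rb"UID (\d+)")
DATE_RE = re.compile(rb'INTERNALDATE "([^"]+)"')


def parse_internaldate_line(line: bytes) -> Optional[Tuple[str, datetime]]:
    uid_match = UID_RE.search(line)
    date_match = DATE_RE.search(line)
    if not (uid_match and date_match):
        return None  # not a FETCH data line (e.g. the tagged OK status line)
    # RFC 3501 date-time: "dd-Mon-yyyy hh:mm:ss +zzzz" (day may be space-padded).
    raw = date_match.group(1).decode("ascii")
    received = datetime.strptime(raw, "%d-%b-%Y %H:%M:%S %z")
    return uid_match.group(1).decode("ascii"), received
```

Because the result is a timezone-aware `datetime`, values from different server offsets compare correctly when sorted.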

Performance

| Scenario | Before | After |
|---|---|---|
| 25k emails, page 1 | 25k fetches (~30 min) | 5 date fetches + 1 header fetch (~2 s) |
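
The fetch counts in the table follow from simple arithmetic: with 5000 UIDs per chunk, 25k emails need ceil(25000 / 5000) = 5 date fetches, plus one header fetch for the page. A small sketch (function names hypothetical) that reproduces the round-trip counts, not the timings:

```python
import math


def round_trips_before(total_emails: int) -> int:
    # Old path described in the PR: one header fetch per email.
    return total_emails


def round_trips_after(total_emails: int, chunk_size: int = 5000) -> int:
    # New path: one INTERNALDATE fetch per chunk, then one header fetch for the page.
    if total_emails == 0:
        return 0  # assumption: nothing to fetch for an empty mailbox
    return math.ceil(total_emails / chunk_size) + 1


print(round_trips_before(25_000), round_trips_after(25_000))  # 25000 6
```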

Test plan

  • Unit tests for _parse_headers, _batch_fetch_dates, _batch_fetch_headers
  • Integration test for sorted pagination behavior

@codecov bot commented Jan 19, 2026

Codecov Report

❌ Patch coverage is 83.69565% with 15 lines in your changes missing coverage. Please review.

| File with missing lines | Patch % | Lines |
|---|---|---|
| mcp_email_server/emails/classic.py | 83.6% | 9 missing and 6 partials ⚠️ |


@Wh1isper self-assigned this Jan 19, 2026
@Wh1isper (Member)

@avarun42 Thanks, I like this PR! I've made some changes to the parallelization. Do you think they're appropriate?

avarun42 and others added 2 commits January 19, 2026 19:45
Previously, get_emails_metadata_stream() fetched headers one-by-one for
ALL emails before sorting and paginating. For a mailbox with 25k emails,
this meant 25k IMAP round trips taking 30+ minutes.

Now uses a batched approach with two fetch phases and an in-memory sort between them:
1. Batch fetch INTERNALDATE for all UIDs (chunked at 5000 for Yahoo)
2. Sort by date in Python, then paginate
3. Batch fetch full headers for the requested page only (typically 10)
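
Steps 2 and 3 hinge on sorting cheap (UID, date) pairs and slicing out one page before any headers are fetched. A minimal sketch, assuming newest-first order, 1-based page numbers, and UID as a tie-breaker for identical dates (all assumptions; `page_of_uids` is a hypothetical name):

```python
from datetime import datetime
from typing import Dict, List


def page_of_uids(dates: Dict[str, datetime], page: int, page_size: int = 10) -> List[str]:
    """Return the UIDs for a 1-based page, newest first (hypothetical helper)."""
    # Sort the (UID -> INTERNALDATE) map in Python; ties fall back to the higher UID first.
    ordered = sorted(dates, key=lambda uid: (dates[uid], int(uid)), reverse=True)
    # Slice out the page; only these UIDs go on to the full-header fetch.
    start = (page - 1) * page_size
    return ordered[start:start + page_size]
```

Sorting 25k small tuples in Python is cheap compared with a single network round trip, which is why the sort can happen client-side.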

Uses INTERNALDATE (server receipt time) instead of the Date header for sorting.
Tested on 25k emails against Date-header sorting: maximum position difference
of 20, average 0.1. Negligible for UX, but 40x faster to fetch.
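
The commit does not say how "position difference" was measured. One plausible reading, assumed here, is the absolute difference between each email's rank in the INTERNALDATE ordering and its rank in the Date-header ordering; the hypothetical helper below computes the maximum and mean of that value, assuming both orderings contain the same UIDs.

```python
from typing import Dict, List, Tuple


def position_differences(order_a: List[str], order_b: List[str]) -> Tuple[int, float]:
    """Max and mean absolute rank difference of each UID between two orderings."""
    rank_b: Dict[str, int] = {uid: i for i, uid in enumerate(order_b)}
    diffs = [abs(i - rank_b[uid]) for i, uid in enumerate(order_a)]
    if not diffs:
        return 0, 0.0  # empty mailbox: no displacement
    return max(diffs), sum(diffs) / len(diffs)
```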

Performance: 25k emails page 1 goes from 30+ min to ~2 seconds.
- Parallelize _batch_fetch_dates using asyncio.gather for better performance
- Improve variable naming (t0/t1/t2 -> descriptive names)
- Restore helpful comments for code readability
- Fix type handling in _batch_fetch_headers (accept both bytes and str)
- Add comprehensive tests for batch methods
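
The `asyncio.gather` change can be sketched as follows. `gather_chunk_dates` and `fake_fetch_chunk` are hypothetical stand-ins, not the PR's `_batch_fetch_dates`; the fake fetch only sleeps and invents dates, so the sketch says nothing about how a real IMAP client schedules concurrent commands on one connection.

```python
import asyncio
from datetime import datetime, timedelta
from typing import Awaitable, Callable, Dict, List

FetchChunk = Callable[[List[str]], Awaitable[Dict[str, datetime]]]


async def gather_chunk_dates(
    uids: List[str], fetch_chunk: FetchChunk, chunk_size: int = 5000
) -> Dict[str, datetime]:
    chunks = [uids[i:i + chunk_size] for i in range(0, len(uids), chunk_size)]
    # Start every chunk fetch concurrently; gather returns results in input order.
    results = await asyncio.gather(*(fetch_chunk(chunk) for chunk in chunks))
    merged: Dict[str, datetime] = {}
    for partial in results:
        merged.update(partial)
    return merged


async def fake_fetch_chunk(chunk: List[str]) -> Dict[str, datetime]:
    # Stand-in for a real IMAP round trip: sleep briefly, then invent one date per UID.
    await asyncio.sleep(0.01)
    base = datetime(2026, 1, 1)
    return {uid: base + timedelta(minutes=int(uid)) for uid in chunk}


if __name__ == "__main__":
    dates = asyncio.run(gather_chunk_dates([str(n) for n in range(1, 12)], fake_fetch_chunk, chunk_size=5))
    print(len(dates))  # 11
```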

Co-Authored-By: Paintress <paintress@arcoer.com>
@Wh1isper force-pushed the perf/optimize-email-fetching branch from b413eb2 to 635d10e January 19, 2026 11:46
@Wh1isper merged commit de08972 into ai-zerolab:main Jan 19, 2026
8 checks passed
jbkjr pushed a commit to jbkjr/mcp-email-server that referenced this pull request Jan 25, 2026
Merged latest upstream changes including:
- verify_ssl option for SMTP connections (PR ai-zerolab#105)
- Batch fetch emails performance optimization (PR ai-zerolab#107)
- Pre-commit updates

Preserved local features:
- Folder management tools
- Label management tools (ProtonMail)
- Mark emails as read/unread
- Improved mailbox parameter descriptions

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
jbkjr pushed a commit to jbkjr/mcp-email-server that referenced this pull request Jan 26, 2026
aioimaplib returns FETCH responses in 3 separate parts:
- i:   b'N FETCH (BODY[HEADER] {size}' - contains BODY[HEADER]
- i+1: bytearray(...)                   - raw header content
- i+2: b' UID N)'                       - contains UID

The original code assumed UID was on the same line as BODY[HEADER], but
aioimaplib separates them. This caused list_emails_metadata to return
empty results when used with actual IMAP servers.
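
A minimal parser for the three-part layout listed above might look like this. `parse_header_fetch` is a hypothetical name and this is not the fix from this commit; it looks for the UID on the opening line first and then on the closing part, and accepts both `bytes` and `str` items.

```python
import re
from email.message import Message
from email.parser import BytesHeaderParser
from typing import Dict, List, Union

Line = Union[bytes, bytearray, str]
UID_RE = re.compile(rb"UID (\d+)")


def _as_bytes(item: Line) -> bytes:
    # Accept both bytes and str response items.
    return item.encode("utf-8") if isinstance(item, str) else bytes(item)


def parse_header_fetch(lines: List[Line]) -> Dict[str, Message]:
    """Map UID -> parsed headers from a BODY[HEADER] fetch split into 3 parts."""
    parsed: Dict[str, Message] = {}
    i = 0
    while i < len(lines):
        opener = _as_bytes(lines[i])
        if b"BODY[HEADER]" in opener and i + 2 < len(lines):
            raw_headers = _as_bytes(lines[i + 1])
            # UID may sit on the opening line or, as described above, on the closing part.
            match = UID_RE.search(opener) or UID_RE.search(_as_bytes(lines[i + 2]))
            if match:
                uid = match.group(1).decode("ascii")
                parsed[uid] = BytesHeaderParser().parsebytes(raw_headers)
            i += 3  # skip the literal and closing part so header text is never re-scanned
        else:
            i += 1
    return parsed
```

Skipping past the literal matters: header text could itself contain strings like `UID 5`, so the UID is only ever read from the framing lines.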

Also fixes test mocks to use correct response format.

Fixes batch fetch regression introduced in PR ai-zerolab#107.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
jbkjr pushed a commit to jbkjr/mcp-email-server that referenced this pull request Jan 26, 2026
Remove TestParseHeaderToMetadata (uses non-existent _parse_header_to_metadata)
and TestGetEmailsStreamWithSort (tests SORT capability from PR ai-zerolab#107) that were
accidentally included during rebase. These test upstream's batch fetch
implementation, not the folder management feature.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2 participants