Perf: Use IMAP SORT and batch fetch for metadata retrieval#100

Closed
jbkjr wants to merge 24 commits into ai-zerolab:main from jbkjr:perf/batch-fetch-metadata

Conversation

@jbkjr
Contributor

@jbkjr jbkjr commented Jan 15, 2026

Summary

Addresses the performance concern raised by @Wh1isper in the review of #98. The previous fix fetched headers for ALL emails before sorting, which could cause performance issues with large mailboxes.

Before: n individual IMAP FETCH calls for n emails, i.e. O(n) round trips
After: 2 batch IMAP calls regardless of mailbox size, i.e. O(1) round trips (the fallback path still transfers one Date header per email)

Changes

  • Add _has_sort_capability() to detect IMAP SORT extension (RFC 5256)
  • Add _batch_fetch_dates() for efficient date-only header fetching
  • Add _batch_fetch_headers() for batch full header fetching
  • Refactor get_emails_metadata_stream() with two paths:
    • SORT path: Use server-side sorting, then batch fetch only the page
    • Fallback path: Batch fetch Date headers, sort client-side, batch fetch page headers
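
As a rough illustration of the two paths listed above (a sketch, not the code from this PR), the page selection can be written as pure functions. The names `has_sort_capability`, `sort_uids_by_date` and `page_slice` are hypothetical standalone helpers, and the inputs (a list of capability strings, a dict of UID to raw Date header) are assumptions about what the IMAP layer hands back.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def has_sort_capability(capabilities):
    """Return True if the CAPABILITY list advertises SORT (RFC 5256)."""
    return any(cap.upper() == "SORT" for cap in capabilities)


def sort_uids_by_date(date_headers, descending=True):
    """Fallback path: order UIDs client-side by their parsed Date header."""
    oldest = datetime.min.replace(tzinfo=timezone.utc)

    def sort_key(uid):
        try:
            parsed = parsedate_to_datetime(date_headers[uid])
        except (TypeError, ValueError):
            return oldest  # unparseable dates sort as the oldest
        if parsed.tzinfo is None:
            parsed = parsed.replace(tzinfo=timezone.utc)
        return parsed

    return sorted(date_headers, key=sort_key, reverse=descending)


def page_slice(uids, page, page_size):
    """Return the UIDs belonging to a 1-based page number."""
    start = (page - 1) * page_size
    return uids[start:start + page_size]


# Fallback path: dates for every UID, then headers for one page only.
dates = {
    "1": "Wed, 14 Jan 2026 10:00:00 +0000",
    "2": "Thu, 15 Jan 2026 09:00:00 +0000",
    "3": "not a date",
}
ordered = sort_uids_by_date(dates)
first_page = page_slice(ordered, page=1, page_size=2)
```

On the SORT path the server returns the UIDs already ordered, so only `page_slice` would be needed before the single batch header fetch.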

Performance Comparison

| Scenario (10,000 emails, page 1) | Before | After |
|---|---|---|
| Network calls | 10,000 | 2 |
| Memory usage | ~20MB (all headers) | ~300KB (dates, fallback path) or ~20KB (page only, SORT path) |

Test plan

  • All 110 existing tests pass
  • Updated test_get_emails_stream to verify batch fetch behavior
  • Manual testing with SORT-capable server (Gmail)
  • Manual testing with non-SORT server (ProtonMail Bridge)

🤖 Generated with Claude Code

Jack Koch and others added 5 commits January 14, 2026 23:43
Add 6 new MCP tools for IMAP folder operations:
- list_folders: List all folders/mailboxes with flags
- move_emails: Move emails between folders (MOVE or COPY+DELETE fallback)
- copy_emails: Copy emails to folder (useful for labels in Proton Mail)
- create_folder: Create new folders
- delete_folder: Delete folders
- rename_folder: Rename folders

This enables full folder management through the MCP interface, with
special consideration for Proton Mail Bridge compatibility.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Addresses performance concern raised in ai-zerolab#98 review. The previous fix
fetched headers for ALL emails before sorting, causing O(n) network
calls for large mailboxes.

Changes:
- Add SORT capability detection (RFC 5256)
- When SORT supported: server-side sorting, fetch only page headers
- Fallback: batch fetch Date headers, sort, fetch page headers
- Reduces network calls from O(n) to O(1) (two batch calls) for any mailbox size

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Address maintainer feedback on PR ai-zerolab#99:

- Add enable_folder_management config flag (disabled by default)
- All folder management tools now require explicit opt-in
- Add MCP_EMAIL_SERVER_ENABLE_FOLDER_MANAGEMENT env var support
- Add comprehensive tests for folder management (30 new tests)
- Update README documentation with new setting

Tests cover:
- Permission checks when disabled (6 tests)
- Tool functionality when enabled (6 tests)
- Handler method tests (6 tests)
- EmailClient IMAP operation tests (6 tests)
- Edge cases (3 tests)
- Config tests (3 tests)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@codecov

codecov bot commented Jan 15, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

Jack Koch and others added 12 commits January 15, 2026 01:02
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Tests for _has_sort_capability helper
- Tests for _parse_date_from_header helper
- Tests for _batch_fetch_dates with success, empty, and error cases
- Tests for _batch_fetch_headers with success, empty, and error cases
- Tests for _parse_header_to_metadata including CC handling
- Tests for SORT path in get_emails_metadata_stream
- Tests for SORT fallback when SORT command fails
- Tests for empty search results
- Tests for ascending order
- Tests for pagination
- Tests for date fetch fallback

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Log exception in _parse_date_from_header instead of silent pass (S110)
- Add noqa: C901 for get_emails_metadata_stream complexity
- Use RuntimeError instead of Exception in test (TRY002)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Different IMAP servers return UID in different positions:
- Some include UID in the FETCH line: b'1 FETCH (UID 1 BODY[...]'
- Others (like Proton Bridge) return UID separately: b' UID 1)'

Updated _batch_fetch_dates and _batch_fetch_headers to handle both
formats by tracking pending UID/data and emitting results when the
pair is complete, regardless of order.
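
The pairing logic described above could look roughly like the sketch below. It is not the code from this commit: `pair_uids_with_payloads` is a hypothetical name, and it assumes a flattened FETCH response in which protocol lines arrive as `bytes` and literal payloads (the header data) arrive as `bytearray`.

```python
import re

UID_PATTERN = re.compile(rb"UID (\d+)")


def pair_uids_with_payloads(response):
    """Pair each literal payload with its UID, whichever arrives first."""
    results = {}
    pending_uid = None
    pending_payload = None
    for item in response:
        if isinstance(item, bytearray):
            # Assumption: literal data (the headers) is delivered as bytearray.
            pending_payload = bytes(item)
        elif isinstance(item, bytes):
            if b"FETCH (" in item:
                # A new message starts, so drop any half-finished pair.
                pending_uid = None
                pending_payload = None
            match = UID_PATTERN.search(item)
            if match:
                pending_uid = match.group(1).decode()
        else:
            continue  # skip None, ints and other non-bytes items
        if pending_uid is not None and pending_payload is not None:
            results[pending_uid] = pending_payload
            pending_uid = None
            pending_payload = None
    return results


response = [
    # UID inside the FETCH line
    b"1 FETCH (UID 7 BODY[HEADER.FIELDS (DATE)] {9}",
    bytearray(b"Date: A\r\n"),
    b")",
    None,
    # UID after the data, as described for Proton Bridge
    b"2 FETCH (BODY[HEADER.FIELDS (DATE)] {9}",
    bytearray(b"Date: B\r\n"),
    b" UID 8)",
]
paired = pair_uids_with_payloads(response)
```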

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Extract _append_header_metadata helper to reduce _batch_fetch_headers
complexity from 11 to under 10 (ruff C901).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add 6 new MCP tools for managing ProtonMail labels:
- list_labels: List all labels (filters Labels/ prefix folders)
- apply_label: Apply label to emails (copy to Labels/X)
- remove_label: Remove label from emails (delete from Labels/X)
- get_email_labels: Get all labels for an email
- create_label: Create new label
- delete_label: Delete label

Labels in ProtonMail Bridge are exposed as IMAP folders under the
Labels/ prefix. These tools provide semantic operations for label
management while using the underlying folder operations.
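
A minimal sketch of the folder-to-label mapping described above, assuming folder names are already available as plain strings. `extract_labels` and `label_mailbox` are hypothetical helpers for illustration, not the tool implementations from this commit.

```python
LABEL_PREFIX = "Labels/"


def extract_labels(folders):
    """Keep only folders under Labels/ and strip the prefix."""
    return [
        name[len(LABEL_PREFIX):]
        for name in folders
        if name.startswith(LABEL_PREFIX) and len(name) > len(LABEL_PREFIX)
    ]


def label_mailbox(label):
    """Return the IMAP folder that backs a label."""
    return LABEL_PREFIX + label


folders = ["INBOX", "Sent", "Folders/Work", "Labels/Urgent", "Labels/Receipts"]
labels = extract_labels(folders)
```

Applying a label then amounts to copying the message into `label_mailbox("Urgent")`, and removing it to deleting the copy from that folder.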

Includes comprehensive tests (26 new tests, all passing).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add three new tri-state filter parameters:
- seen: True=read (SEEN), False=unread (UNSEEN), None=all
- flagged: True=starred (FLAGGED), False=not starred (UNFLAGGED), None=all
- answered: True=replied (ANSWERED), False=not replied (UNANSWERED), None=all

These filters enable compound searches like "unread emails from SenderX
in Labels/Y" by combining mailbox, from_address, and seen parameters.

Also updated mailbox parameter description to document label usage
(e.g., 'Labels/LabelName' for ProtonMail).
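
One way the tri-state parameters could map onto IMAP SEARCH keys is sketched below. `build_search_criteria` is a hypothetical function, not the one added in this commit, and it assumes the sender address contains no double quotes.

```python
def build_search_criteria(from_address=None, seen=None, flagged=None, answered=None):
    """Build an IMAP SEARCH criteria string from optional filters."""
    criteria = []
    if from_address:
        criteria.append(f'FROM "{from_address}"')
    tri_state = (
        (seen, "SEEN", "UNSEEN"),
        (flagged, "FLAGGED", "UNFLAGGED"),
        (answered, "ANSWERED", "UNANSWERED"),
    )
    for value, when_true, when_false in tri_state:
        if value is None:
            continue  # None means "do not filter on this flag"
        criteria.append(when_true if value else when_false)
    return " ".join(criteria) if criteria else "ALL"


# "unread emails from SenderX": select the label mailbox, then search with
unread_from_sender = build_search_criteria(from_address="senderx@example.com", seen=False)
```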

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add clearer documentation for the mailbox parameter across all tools,
explaining standard IMAP folders and provider-specific paths for Gmail
([Gmail]/...) and ProtonMail Bridge (Folders/<name>, Labels/<name>).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@jbkjr jbkjr closed this Jan 15, 2026
@jbkjr jbkjr deleted the perf/batch-fetch-metadata branch January 15, 2026 08:35
Jack Koch and others added 2 commits January 15, 2026 04:03
Add functionality to mark emails as read or unread using IMAP \Seen flag:
- EmailMarkResponse model for operation results
- mark_emails abstract method in EmailHandler
- Implementation in EmailClient and ClassicEmailHandler
- MCP tool exposed via app.py
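
For reference, marking messages read or unread comes down to adding or removing the `\Seen` flag with a STORE command. The helpers below are a hypothetical sketch of how the arguments could be built, not the `mark_emails` implementation itself.

```python
def seen_flag_store_args(mark_as_read):
    """Return the (data item, flag list) pair for a UID STORE call."""
    data_item = "+FLAGS" if mark_as_read else "-FLAGS"
    return data_item, r"(\Seen)"


def uid_set(uids):
    """Join UIDs into a comma-separated IMAP sequence set."""
    return ",".join(str(uid) for uid in uids)


# With the standard library's imaplib this could be used as:
#   conn.uid("STORE", uid_set([4, 9]), *seen_flag_store_args(True))
read_args = seen_flag_store_args(True)
unread_args = seen_flag_store_args(False)
```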

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Clarify that move_emails removes from source folder and apply_label
only tags without removing from INBOX.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@jbkjr jbkjr restored the perf/batch-fetch-metadata branch January 15, 2026 09:21
@jbkjr jbkjr reopened this Jan 15, 2026
Jack Koch and others added 2 commits January 15, 2026 06:23
Cherry-picked test improvements from feature/folder-management:
- Comprehensive tests for folder management edge cases
- Tests for _parse_list_response exception handling
- Tests for EmailClient.delete_emails coverage
- Ruff RUF059 fixes for unused variables

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Cherry-picked test improvements from pr/mark-read-unread:
- Comprehensive tests for mark_emails functionality
- Logout error test for mark_emails coverage

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@Wh1isper
Member

Can you resolve the conflict?

@Wh1isper Wh1isper self-assigned this Jan 19, 2026
- Add test for batch_fetch_dates with UID after data
- Add test for batch_fetch_headers with UID after data
- Add test for bytes without UID match (continue path)

These tests cover the alternate IMAP response format used by
Proton Mail Bridge where UID comes after the data.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Jack Koch and others added 2 commits January 25, 2026 08:34
- Add test for empty To header in _parse_header_to_metadata
- Add tests for non-bytes items (None, int) in _batch_fetch_dates loop
- Add tests for non-bytes items (None, int) in _batch_fetch_headers loop
- Achieves 100% branch coverage on batch fetch optimization code

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Test empty SORT response handling (lines 451-453)
- Test empty page after pagination with SORT (line 463)
- Test empty email_ids after split (lines 494-495)
- Test empty page in fallback path (line 518)
- Test logout error handling (lines 534-535)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@jbkjr jbkjr force-pushed the perf/batch-fetch-metadata branch from a057364 to 813a274 Compare January 25, 2026 13:36
@jbkjr
Contributor Author

jbkjr commented Jan 25, 2026

Merge conflicts resolved - rebased onto main. Ready for review.

@jbkjr
Contributor Author

jbkjr commented Jan 25, 2026

Looks like #107 addresses this with a similar approach using INTERNALDATE (which is probably better than parsing Date headers anyway - faster and more reliable).

One thing from this PR that might still be valuable: the IMAP SORT capability check. When the server supports the SORT extension (RFC 5256), it can return UIDs already sorted, avoiding the need to fetch dates entirely. That's the optimal path when available.
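
To make the comparison concrete, here is a small sketch of client-side ordering by INTERNALDATE. It is not taken from #107; the helper names are hypothetical, and it assumes each FETCH response line carries both the UID and the INTERNALDATE item.

```python
import re
from datetime import datetime

UID_RE = re.compile(rb"UID (\d+)")
INTERNALDATE_RE = re.compile(rb'INTERNALDATE "([^"]+)"')


def parse_internaldates(lines):
    """Map UID to a timezone-aware datetime parsed from INTERNALDATE."""
    dates = {}
    for line in lines:
        uid = UID_RE.search(line)
        date = INTERNALDATE_RE.search(line)
        if uid and date:
            # The day may be space-padded, so strip before parsing.
            text = date.group(1).decode().strip()
            dates[uid.group(1).decode()] = datetime.strptime(
                text, "%d-%b-%Y %H:%M:%S %z"
            )
    return dates


def newest_first(dates):
    """Return UIDs ordered from newest to oldest."""
    return sorted(dates, key=dates.get, reverse=True)


lines = [
    b'1 FETCH (UID 10 INTERNALDATE "14-Jan-2026 23:43:00 +0000")',
    b'2 FETCH (INTERNALDATE "15-Jan-2026 01:02:00 +0100" UID 11)',
    b'3 FETCH (UID 12 INTERNALDATE " 2-Jan-2026 08:00:00 -0500")',
]
ordered_uids = newest_first(parse_internaldates(lines))
```

With the SORT extension this step disappears: in RFC 5256 the `ARRIVAL` sort key orders by internal date and `DATE` by the Date header, so a `UID SORT (REVERSE ARRIVAL) UTF-8 ALL` command would return the UIDs already ordered.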

Happy to close this PR, or I could extract just the SORT optimization as a smaller enhancement to #107 if there's interest.

@jbkjr
Contributor Author

jbkjr commented Jan 26, 2026

Closing as superseded by upstream's PR #107 which implemented a similar batch fetch optimization.

@jbkjr jbkjr closed this Jan 26, 2026