
feat: add batch message processing to Similarity Search API#540

Open
285729101 wants to merge 1 commit into 1712n:main from 285729101:feat/batch-processing

Conversation

@285729101

Summary

Adds a POST /batch endpoint to the Similarity Search API worker for processing multiple messages in a single request.

Methodology

Architecture: Separate Batch Endpoint

A dedicated POST /batch endpoint is introduced rather than modifying the existing POST / endpoint. This preserves backward compatibility for existing clients and leaves any caching configured on the single-message endpoint untouched.

Two-Level Deduplication for Cost Efficiency

The batch implementation reduces both AI and Vectorize costs through deduplication:

  1. Text-level deduplication: Before calling Workers AI, duplicate texts are removed so each unique text is embedded only once. In workloads with repeated messages (e.g., near-duplicate detection against a reference set), this can significantly reduce embedding costs.

  2. Query-level deduplication: Identical (text, namespace) pairs are queried against Vectorize only once. Results are mapped back to all matching entries in the original request, preserving response order.
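The two levels of deduplication described above can be sketched as follows. This is an illustrative outline, not the PR's actual code: `batchScores`, `embedTexts`, and `queryScore` are hypothetical names, and the embedding/query functions are passed in so the sketch stays self-contained.

```typescript
type Message = { text: string; namespace: string };

// Level 1: collapse duplicate texts so each unique text is embedded once.
// Level 2: collapse duplicate (text, namespace) pairs so each is queried once.
// Assumes neither text nor namespace contains a NUL character (used as a key separator).
async function batchScores(
  messages: Message[],
  embedTexts: (texts: string[]) => Promise<number[][]>,
  queryScore: (embedding: number[], namespace: string) => Promise<number>,
): Promise<number[]> {
  // One embedding call covers every unique text in the batch.
  const uniqueTexts = [...new Set(messages.map((m) => m.text))];
  const embeddings = await embedTexts(uniqueTexts);
  const byText = new Map(uniqueTexts.map((t, i) => [t, embeddings[i]]));

  // Query each unique (namespace, text) pair once, in parallel.
  const keys = [...new Set(messages.map((m) => `${m.namespace}\u0000${m.text}`))];
  const scores = await Promise.all(
    keys.map((k) => {
      const [namespace, text] = k.split("\u0000");
      return queryScore(byText.get(text)!, namespace);
    }),
  );
  const byKey = new Map(keys.map((k, i) => [k, scores[i]]));

  // Map deduplicated results back to every entry, preserving request order.
  return messages.map((m) => byKey.get(`${m.namespace}\u0000${m.text}`)!);
}
```

Duplicate entries in the request thus cost one embedding and one query, but still get their own slot in the response.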

Resource Efficiency

| Operation | Single endpoint (N requests) | Batch endpoint (1 request, N messages) |
| --- | --- | --- |
| AI calls | N | 1 |
| Vectorize queries | N | ≤ N (reduced by deduplication) |
| HTTP round-trips | N | 1 |

Cloudflare Workers Constraints

  • Batch size limit: 100 messages per request, matching the @cf/baai/bge-base-en-v1.5 model's maxItems constraint
  • Parallel execution: Vectorize queries run concurrently via Promise.all
  • No new bindings: Uses only existing AI and Vectorize bindings

Code Organization

Shared helper functions (embedTexts, querySimilarity) are extracted so both endpoints use the same embedding and query logic, reducing duplication and maintenance burden.
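A sketch of what the shared helpers might look like. The binding names (`env.AI`, `env.VECTORIZE`), the minimal `Env` interface, and the fallback score of 0 for an empty namespace are assumptions here, not confirmed details of the PR:

```typescript
// Minimal binding shapes assumed for this sketch.
interface Env {
  AI: {
    run(model: string, input: { text: string[] }): Promise<{ data: number[][] }>;
  };
  VECTORIZE: {
    query(
      vector: number[],
      opts: { namespace: string; topK: number },
    ): Promise<{ matches: { score: number }[] }>;
  };
}

// Embed a batch of texts in a single AI call (the model accepts text arrays).
async function embedTexts(env: Env, texts: string[]): Promise<number[][]> {
  const { data } = await env.AI.run("@cf/baai/bge-base-en-v1.5", { text: texts });
  return data;
}

// Return the top-match similarity score for one embedding in one namespace,
// or 0 when the namespace has no vectors (assumed fallback).
async function querySimilarity(
  env: Env,
  embedding: number[],
  namespace: string,
): Promise<number> {
  const { matches } = await env.VECTORIZE.query(embedding, { namespace, topK: 1 });
  return matches[0]?.score ?? 0;
}
```

Both the single-message and batch handlers can then call these helpers, so the embedding model ID and query options live in one place.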

API

Request

POST /batch
X-API-Key: <key>
Content-Type: application/json

{
  "messages": [
    { "text": "first message", "namespace": "ns1" },
    { "text": "second message", "namespace": "ns2" }
  ]
}

Response

{
  "results": [
    { "similarity_score": 0.85 },
    { "similarity_score": 0.42 }
  ]
}
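A hypothetical client helper for calling the endpoint, matching the request and response shapes shown above. The function name, base URL, and minimal error handling are illustrative only:

```typescript
type BatchMessage = { text: string; namespace: string };
type BatchResult = { similarity_score: number };

// POST the batch to /batch and return the per-message scores, in request order.
async function callBatch(
  baseUrl: string,
  apiKey: string,
  messages: BatchMessage[],
): Promise<BatchResult[]> {
  const res = await fetch(`${baseUrl}/batch`, {
    method: "POST",
    headers: { "X-API-Key": apiKey, "Content-Type": "application/json" },
    body: JSON.stringify({ messages }),
  });
  if (!res.ok) throw new Error(`batch request failed: ${res.status}`);
  const { results } = (await res.json()) as { results: BatchResult[] };
  return results;
}
```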

Error Responses

  • 400 — invalid input (not an array, empty, exceeds limit, bad format)
  • 401 — missing or invalid API key
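The 400-level checks can be sketched as a single validation function. `validateBatch` and the exact error strings are illustrative; only the rules themselves (array, non-empty, at most 100 entries, string fields) come from the PR:

```typescript
const MAX_BATCH_SIZE = 100; // matches the embedding model's maxItems limit

type BatchMessage = { text: string; namespace: string };

// Returns an error message for a 400 response, or null when the body is valid.
function validateBatch(body: unknown): string | null {
  const messages = (body as { messages?: unknown })?.messages;
  if (!Array.isArray(messages)) return "messages must be an array";
  if (messages.length === 0) return "messages must not be empty";
  if (messages.length > MAX_BATCH_SIZE)
    return `messages must contain at most ${MAX_BATCH_SIZE} entries`;
  for (const m of messages) {
    const entry = m as Partial<BatchMessage> | null;
    if (typeof entry?.text !== "string" || typeof entry?.namespace !== "string")
      return "each message must have string text and namespace fields";
  }
  return null;
}
```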

Tests

Added comprehensive test coverage:

  • Valid batch request with multiple messages
  • Deduplication behavior (duplicate entries return identical scores)
  • Validation: non-array input, empty array, oversized batch, invalid entry format
  • Authentication on the batch endpoint
  • Updated AI mock to return proper embedding arrays

Fixes #431

🤖 Generated with Claude Code

Adds a POST /batch endpoint for processing multiple messages in a single
request, with two levels of deduplication for cost efficiency:

1. Text deduplication: identical texts are embedded only once via a single
   AI call, reducing Workers AI costs in duplicate-heavy workloads
2. Query deduplication: identical (text, namespace) pairs are queried once
   against Vectorize, with results mapped back to all matching entries

Design decisions:
- Separate /batch endpoint preserves the existing single-message endpoint
  and any caching configured on it
- Single AI embedding call per batch using the model's native array support
- Parallel Vectorize queries via Promise.all for maximum throughput
- Batch size limit of 100 (model's maxItems constraint)
- Shared helper functions (embedTexts, querySimilarity) between endpoints

Includes comprehensive tests for validation, deduplication, auth, and
batch size enforcement.

Fixes 1712n#431

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
