feat: add batch message processing to Similarity Search API by 285729101 · Pull Request #540 · 1712n/dn-institute

285729101 · 2026-02-16T19:13:26Z

Summary

Adds a POST /batch endpoint to the Similarity Search API worker for processing multiple messages in a single request.

Methodology

Architecture: Separate Batch Endpoint

A dedicated POST /batch endpoint is introduced rather than modifying the existing POST / endpoint. This preserves backward compatibility and any caching mechanisms configured on the single-message endpoint.

Two-Level Deduplication for Cost Efficiency

The batch implementation reduces both AI and Vectorize costs through deduplication:

Text-level deduplication: Before calling Workers AI, duplicate texts are removed so each unique text is embedded only once. In workloads with repeated messages (e.g., near-duplicate detection against a reference set), embedding cost then scales with the number of unique texts rather than the number of messages.
Query-level deduplication: Identical (text, namespace) pairs are queried against Vectorize only once. Results are mapped back to all matching entries in the original request, preserving response order.
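
For reviewers who want the two levels spelled out, here is a minimal sketch of how such a plan could be built. It is not the code in this PR: `Message`, `BatchPlan`, and `planBatch` are hypothetical names, and the sketch assumes exact string equality is what counts as a duplicate.

```typescript
// Hypothetical shapes; the PR only documents { text, namespace } entries.
interface Message {
  text: string;
  namespace: string;
}

interface BatchPlan {
  uniqueTexts: string[]; // one entry per distinct text (embedding input)
  uniqueQueries: { textIndex: number; namespace: string }[]; // distinct (text, namespace) pairs
  queryIndexForMessage: number[]; // message position -> index into uniqueQueries
}

function planBatch(messages: Message[]): BatchPlan {
  const textIndexByText = new Map<string, number>();
  const queryIndexByKey = new Map<string, number>();
  const uniqueTexts: string[] = [];
  const uniqueQueries: { textIndex: number; namespace: string }[] = [];
  const queryIndexForMessage: number[] = [];

  for (const { text, namespace } of messages) {
    // Level 1: each distinct text gets exactly one slot in the embedding input.
    let textIndex = textIndexByText.get(text);
    if (textIndex === undefined) {
      textIndex = uniqueTexts.length;
      textIndexByText.set(text, textIndex);
      uniqueTexts.push(text);
    }

    // Level 2: each distinct (text, namespace) pair gets exactly one query.
    // Serializing a tuple avoids key collisions a plain separator could cause.
    const key = JSON.stringify([namespace, textIndex]);
    let queryIndex = queryIndexByKey.get(key);
    if (queryIndex === undefined) {
      queryIndex = uniqueQueries.length;
      queryIndexByKey.set(key, queryIndex);
      uniqueQueries.push({ textIndex, namespace });
    }

    // Remember which query answers this message so response order is preserved.
    queryIndexForMessage.push(queryIndex);
  }

  return { uniqueTexts, uniqueQueries, queryIndexForMessage };
}
```

With this layout, the response for message `i` is the score of `uniqueQueries[queryIndexForMessage[i]]`, so duplicate entries receive identical scores by construction.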

Resource Efficiency

Operation	Single endpoint (N requests)	Batch endpoint (1 request, N messages)
AI calls	N	1
Vectorize queries	N	≤ N (reduced by deduplication)
HTTP round-trips	N	1

Cloudflare Workers Constraints

Batch size limit: 100 messages per request, matching the @cf/baai/bge-base-en-v1.5 model's maxItems constraint
Parallel execution: Vectorize queries run concurrently via Promise.all
No new bindings: Uses only existing AI and Vectorize bindings
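
The interaction of the size limit, the single embedding call, and the concurrent queries can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: `embed` and `query` are injected stand-ins for the shared helpers (their real signatures are not shown in this description), and `scoreUniqueQueries` is a hypothetical name.

```typescript
// Stand-ins for the embedding and similarity helpers; signatures are assumed.
type Embed = (texts: string[]) => Promise<number[][]>;
type Query = (vector: number[], namespace: string) => Promise<number>;

// Limit stated in the PR description (the model's maxItems constraint).
const MAX_BATCH_SIZE = 100;

async function scoreUniqueQueries(
  texts: string[],
  queries: { textIndex: number; namespace: string }[],
  embed: Embed,
  query: Query,
): Promise<number[]> {
  if (texts.length > MAX_BATCH_SIZE) {
    throw new RangeError(`at most ${MAX_BATCH_SIZE} texts per embedding call`);
  }

  // One embedding call for the whole batch.
  const vectors = await embed(texts);

  // All queries start immediately; Promise.all resolves to results in
  // input order, regardless of which query finishes first.
  return Promise.all(
    queries.map((q) => query(vectors[q.textIndex], q.namespace)),
  );
}
```

Note that `Promise.all` rejects as soon as any single query rejects, so in this sketch one failed query fails the whole batch. Whether the PR handles partial failure differently is not stated here.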

Code Organization

Shared helper functions (embedTexts, querySimilarity) are extracted so both endpoints use the same embedding and query logic, reducing duplication and maintenance burden.

API

Request

POST /batch
X-API-Key: <key>
Content-Type: application/json

{
  "messages": [
    { "text": "first message", "namespace": "ns1" },
    { "text": "second message", "namespace": "ns2" }
  ]
}

Response

{
  "results": [
    { "similarity_score": 0.85 },
    { "similarity_score": 0.42 }
  ]
}

Error Responses

400 — invalid input (not an array, empty, exceeds limit, bad format)
401 — missing or invalid API key
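
A minimal sketch of the 400-level checks listed above, assuming the request body has already been parsed as JSON and the API key check (the 401 case) runs separately. `validateBatchBody`, the `Validation` type, and the error strings are hypothetical; the PR's actual messages and any extra rules (such as rejecting empty strings) are not shown in this description.

```typescript
interface Message {
  text: string;
  namespace: string;
}

type Validation =
  | { ok: true; messages: Message[] }
  | { ok: false; status: 400; error: string };

// Limit stated in the PR description.
const MAX_BATCH_SIZE = 100;

const fail = (error: string): Validation => ({ ok: false, status: 400, error });

function validateBatchBody(body: unknown): Validation {
  // Optional chaining tolerates null, undefined, and primitive bodies.
  const messages = (body as { messages?: unknown } | null)?.messages;

  if (!Array.isArray(messages)) return fail("messages must be an array");
  if (messages.length === 0) return fail("messages must not be empty");
  if (messages.length > MAX_BATCH_SIZE) {
    return fail(`messages must contain at most ${MAX_BATCH_SIZE} entries`);
  }

  for (const entry of messages) {
    const e = entry as Record<string, unknown> | null;
    // Assumption: an entry is well formed when both fields are strings.
    if (
      typeof e !== "object" ||
      e === null ||
      typeof e.text !== "string" ||
      typeof e.namespace !== "string"
    ) {
      return fail("each message needs string fields text and namespace");
    }
  }

  return { ok: true, messages: messages as Message[] };
}
```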

Tests

Tests added or updated in this PR:

Valid batch request with multiple messages
Deduplication behavior (duplicate entries return identical scores)
Validation: non-array input, empty array, oversized batch, invalid entry format
Authentication on the batch endpoint
Updated AI mock to return proper embedding arrays

Fixes #431

🤖 Generated with Claude Code

Adds a POST /batch endpoint for processing multiple messages in a single request, with two levels of deduplication for cost efficiency:

1. Text deduplication: identical texts are embedded only once via a single AI call, reducing Workers AI costs in duplicate-heavy workloads
2. Query deduplication: identical (text, namespace) pairs are queried once against Vectorize, with results mapped back to all matching entries

Design decisions:
- Separate /batch endpoint preserves the existing single-message endpoint and any caching configured on it
- Single AI embedding call per batch using the model's native array support
- Parallel Vectorize queries via Promise.all for maximum throughput
- Batch size limit of 100 (model's maxItems constraint)
- Shared helper functions (embedTexts, querySimilarity) between endpoints

Includes comprehensive tests for validation, deduplication, auth, and batch size enforcement.

Fixes 1712n#431

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

285729101 wants to merge 1 commit into 1712n:main from 285729101:feat/batch-processing
