feat: add batch message processing to Similarity Search API #540
Open
285729101 wants to merge 1 commit into 1712n:main from
Adds a POST /batch endpoint for processing multiple messages in a single request, with two levels of deduplication for cost efficiency:

1. Text deduplication: identical texts are embedded only once via a single AI call, reducing Workers AI costs in duplicate-heavy workloads
2. Query deduplication: identical (text, namespace) pairs are queried once against Vectorize, with results mapped back to all matching entries

Design decisions:
- Separate /batch endpoint preserves the existing single-message endpoint and any caching configured on it
- Single AI embedding call per batch using the model's native array support
- Parallel Vectorize queries via Promise.all for maximum throughput
- Batch size limit of 100 (model's maxItems constraint)
- Shared helper functions (embedTexts, querySimilarity) between endpoints

Includes comprehensive tests for validation, deduplication, auth, and batch size enforcement.

Fixes 1712n#431

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
Adds a POST /batch endpoint to the Similarity Search API worker for processing multiple messages in a single request.
Methodology
Architecture: Separate Batch Endpoint
A dedicated POST /batch endpoint is introduced rather than modifying the existing POST / endpoint. This preserves backward compatibility and any caching mechanisms configured on the single-message endpoint.
Two-Level Deduplication for Cost Efficiency
The batch implementation reduces both AI and Vectorize costs through deduplication:
Text-level deduplication: Before calling Workers AI, duplicate texts are removed so each unique text is embedded only once. In workloads with repeated messages (e.g., near-duplicate detection against a reference set), this can significantly reduce embedding costs.
Query-level deduplication: Identical (text, namespace) pairs are queried against Vectorize only once. Results are mapped back to all matching entries in the original request, preserving response order.
Resource Efficiency
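The two deduplication levels described under "Two-Level Deduplication for Cost Efficiency" can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the `BatchMessage` shape, the `planBatch` and `expandResults` names, and the assumption that `namespace` is always a string are all hypothetical.

```typescript
// Hypothetical message shape; the PR's real request type is not shown here.
interface BatchMessage {
  text: string;
  namespace: string;
}

interface DedupPlan {
  uniqueTexts: string[];         // distinct texts, each embedded once
  uniqueQueries: BatchMessage[]; // distinct (text, namespace) pairs, each queried once
  queryIndexOf: number[];        // for each input message, its index into uniqueQueries
}

// Build both deduplicated work lists in a single pass over the input.
function planBatch(messages: BatchMessage[]): DedupPlan {
  const seenTexts = new Set<string>();
  const queryIndex = new Map<string, number>();
  const uniqueTexts: string[] = [];
  const uniqueQueries: BatchMessage[] = [];
  const queryIndexOf: number[] = [];

  for (const { text, namespace } of messages) {
    // Level 1: text-level deduplication.
    if (!seenTexts.has(text)) {
      seenTexts.add(text);
      uniqueTexts.push(text);
    }
    // Level 2: query-level deduplication. Serializing the pair as a JSON
    // array gives an unambiguous composite key.
    const key = JSON.stringify([namespace, text]);
    let qi = queryIndex.get(key);
    if (qi === undefined) {
      qi = uniqueQueries.length;
      queryIndex.set(key, qi);
      uniqueQueries.push({ text, namespace });
    }
    queryIndexOf.push(qi);
  }
  return { uniqueTexts, uniqueQueries, queryIndexOf };
}

// Map one result per unique query back to every original entry, in request order.
function expandResults<T>(plan: DedupPlan, uniqueResults: T[]): T[] {
  return plan.queryIndexOf.map((qi) => uniqueResults[qi]);
}
```

With this shape, the embedding call would receive `uniqueTexts`, the Vectorize lookups would run over `uniqueQueries`, and `expandResults` restores one result per input message.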
Cloudflare Workers Constraints
- Batch size limit of 100, matching the @cf/baai/bge-base-en-v1.5 model's maxItems constraint
- Parallel Vectorize queries via Promise.all
Code Organization
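The batch size limit and the parallel Vectorize queries mentioned in this PR could be combined along these lines. This is only a sketch: `queryAll` and the `QueryFn` type are hypothetical stand-ins, and the real Vectorize lookup is abstracted behind a caller-supplied function.

```typescript
// Limit stated in the PR description (the embedding model's maxItems constraint).
const MAX_BATCH_SIZE = 100;

// Hypothetical stand-in for a single Vectorize lookup.
type QueryFn<Q, R> = (query: Q) => Promise<R>;

async function queryAll<Q, R>(queries: Q[], queryOne: QueryFn<Q, R>): Promise<R[]> {
  if (queries.length > MAX_BATCH_SIZE) {
    throw new RangeError(
      `batch of ${queries.length} exceeds limit of ${MAX_BATCH_SIZE}`
    );
  }
  // All queries are started before any is awaited; Promise.all resolves to
  // results in the same order as the input array, and rejects if any query fails.
  return Promise.all(queries.map((q) => queryOne(q)));
}
```

Because `Promise.all` rejects as soon as one query fails, a caller that wants partial results would need `Promise.allSettled` instead; the PR text does not say which behavior is intended.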
Shared helper functions (embedTexts, querySimilarity) are extracted so both endpoints use the same embedding and query logic, reducing duplication and maintenance burden.
API
Request
Response
{
  "results": [
    { "similarity_score": 0.85 },
    { "similarity_score": 0.42 }
  ]
}
Error Responses
400: invalid input (not an array, empty, exceeds limit, bad format)
401: missing or invalid API key
Tests
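For reference, the 400 and 401 cases listed under "Error Responses" could come from a validation step shaped roughly like the one below. This is a hedged sketch rather than the PR's implementation: `validateBatch`, the `ValidationError` type, the error strings, and the assumption that each entry carries a string `text` field are all hypothetical.

```typescript
const MAX_BATCH_SIZE = 100; // limit stated in the PR description

interface ValidationError {
  status: 400 | 401;
  error: string;
}

// Returns null when the request is acceptable, otherwise the error to send back.
function validateBatch(
  messages: unknown,
  apiKey: string | null,
  expectedKey: string
): ValidationError | null {
  // Plain comparison for brevity; a real worker may prefer a timing-safe check.
  if (apiKey === null || apiKey !== expectedKey) {
    return { status: 401, error: "missing or invalid API key" };
  }
  if (!Array.isArray(messages)) {
    return { status: 400, error: "input must be an array" };
  }
  if (messages.length === 0) {
    return { status: 400, error: "input must not be empty" };
  }
  if (messages.length > MAX_BATCH_SIZE) {
    return { status: 400, error: `input exceeds limit of ${MAX_BATCH_SIZE}` };
  }
  for (const item of messages) {
    const ok =
      typeof item === "object" &&
      item !== null &&
      typeof (item as { text?: unknown }).text === "string";
    if (!ok) {
      return { status: 400, error: "each entry must have a string text field" };
    }
  }
  return null;
}
```

Each branch corresponds to one of the listed failure modes, so a test suite can assert on the status code for one input per branch.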
Added comprehensive test coverage for validation, deduplication, auth, and batch size enforcement.
Fixes #431
🤖 Generated with Claude Code