Context
Deferred from PR #235 — Copilot flagged redundant API calls for duplicate texts, and we replied it's a valid but low-priority future optimization.
Problem
If embed_batch(&["foo", "bar", "foo"]) is called with duplicates and none are cached, miss_texts includes "foo" twice, resulting in a redundant API call for the same content. The cache still returns correct results — it just wastes an HTTP round-trip for the duplicate.
Proposed Solution
Before calling the inner provider, group misses by cache key:
- Build a HashMap<String, Vec<usize>> mapping unique text → list of original indices
- Call the inner provider only for unique texts
- Fan out the returned embeddings back to all original indices
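The steps above can be sketched as follows. This is a minimal illustration, not the crate's actual API: mock_provider and embed_unique are hypothetical names standing in for the inner provider and the cache's miss path.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for the inner provider: returns one fake
// embedding (here, just the text length) per input text.
fn mock_provider(texts: &[&str]) -> Vec<Vec<f32>> {
    texts.iter().map(|t| vec![t.len() as f32]).collect()
}

// Deduplicate miss_texts before the provider call, then fan the
// returned embeddings back out to every original index.
fn embed_unique(miss_texts: &[&str]) -> Vec<Vec<f32>> {
    // unique text -> list of original indices
    let mut index_map: HashMap<&str, Vec<usize>> = HashMap::new();
    let mut unique: Vec<&str> = Vec::new();
    for (i, &text) in miss_texts.iter().enumerate() {
        index_map
            .entry(text)
            .or_insert_with(|| {
                unique.push(text); // first occurrence: remember order
                Vec::new()
            })
            .push(i);
    }

    // Single provider call covering unique texts only.
    let embeddings = mock_provider(&unique);

    // Fan out: copy each unique embedding to all its original slots.
    let mut results = vec![Vec::new(); miss_texts.len()];
    for (emb, text) in embeddings.into_iter().zip(unique.into_iter()) {
        for &i in &index_map[text] {
            results[i] = emb.clone();
        }
    }
    results
}
```

With this shape, embed_unique(&["foo", "bar", "foo"]) issues one provider call for ["foo", "bar"] and still returns three embeddings in the original order.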
Why It Was Deferred
- Cache works correctly as-is — the redundant call is a minor inefficiency
- In practice, callers rarely pass duplicate texts in a single batch
- The dedup + fan-out logic adds complexity for minimal real-world savings
Effort
Medium — need the dedup map, unique-only provider call, and index fan-out.