[Performance]: Parallelize Batch Embedding Generation #51

@Kavirubc

Problem Statement

The EmbedBatch function used by the CLI index command currently generates embeddings one at a time in a serial loop, so each request waits for the previous one to finish. This significantly slows down bulk indexing of large repositories.

Proposed Solution

  • Refactor EmbedBatch in internal/integrations/gemini/embedder.go to issue embedding requests in parallel using goroutines (coordinated with errgroup).
  • Since the Gemini SDK supports concurrent calls, this should speed up chunk processing during bulk indexing.

Context

Identified during bulk indexing of the simili-bot repository.

Metadata

Labels

  • core: Related to core engine
  • enhancement: New feature or request
  • help wanted: Extra attention is needed
  • performance
  • v0.2.0: Target for v0.2.0

Type

No type

Projects

Status

Todo

Milestone

Relationships

None yet

Development

No branches or pull requests