[0.2.0v][Feature] PR indexing with dedicated collection and pr-duplicate CLI #43

@Kavirubc

Overview

Extend the simili index command to optionally index pull requests into a dedicated Qdrant collection, and add a simili pr-duplicate CLI command to detect duplicate PRs against both issues and PRs.

Scope

  • Add --include-prs flag to simili index to index PR metadata (title, description, changed file paths, linked issues) into a separate collection
  • Add QDRANT_PR_COLLECTION / qdrant.pr_collection config option for the dedicated PR collection
  • Add simili pr-duplicate CLI command to query both collections and run LLM duplicate detection
  • processPullRequest worker: fetch PR details + file paths, build metadata text, embed and upsert
  • Dry-run support for PR indexing
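A rough sketch of the "build metadata text" step of the processPullRequest worker, assuming the four fields listed above are flattened into a single string before embedding. The PullRequest struct, the BuildPRMetadataText name, and the field labels are hypothetical and not taken from the reference implementation; fetching from GitHub, embedding, and the Qdrant upsert are deliberately left out.

```go
package main

import (
	"fmt"
	"strings"
)

// PullRequest is a hypothetical container for the PR fields named in the
// Scope list: title, description, changed file paths, linked issues.
type PullRequest struct {
	Number       int
	Title        string
	Body         string
	ChangedFiles []string
	LinkedIssues []int
}

// BuildPRMetadataText flattens a PR into one text blob suitable for
// embedding. Empty sections are skipped so they add no noise to the vector.
func BuildPRMetadataText(pr PullRequest) string {
	var b strings.Builder
	fmt.Fprintf(&b, "Title: %s\n", pr.Title)
	if body := strings.TrimSpace(pr.Body); body != "" {
		fmt.Fprintf(&b, "Description: %s\n", body)
	}
	if len(pr.ChangedFiles) > 0 {
		fmt.Fprintf(&b, "Changed files: %s\n", strings.Join(pr.ChangedFiles, ", "))
	}
	if len(pr.LinkedIssues) > 0 {
		refs := make([]string, len(pr.LinkedIssues))
		for i, n := range pr.LinkedIssues {
			refs[i] = fmt.Sprintf("#%d", n)
		}
		fmt.Fprintf(&b, "Linked issues: %s\n", strings.Join(refs, ", "))
	}
	return strings.TrimRight(b.String(), "\n")
}

func main() {
	// Illustrative values only.
	pr := PullRequest{
		Number:       123,
		Title:        "Index pull requests",
		Body:         "Adds a dedicated PR collection.",
		ChangedFiles: []string{"cmd/index.go", "internal/worker.go"},
		LinkedIssues: []int{43},
	}
	fmt.Println(BuildPRMetadataText(pr))
}
```

Keeping this step a pure function would also make dry-run mode straightforward: the worker could print the generated text and skip the embed and upsert calls.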

Acceptance Criteria

  • simili index --include-prs indexes PRs into the configured PR collection
  • PR collection is created automatically if it does not exist
  • simili pr-duplicate --repo owner/repo --number 123 returns duplicate candidates from both collections
  • Falls back gracefully if no PR collection is configured
  • All existing issue indexing behaviour is unchanged
  • Tests cover the new indexing and query paths
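One possible reading of the fallback and "both collections" criteria, sketched under assumptions: an empty PR collection name means "not configured", so only the issue collection is queried, and results from each collection are merged by similarity score before the LLM duplicate check. Candidate, CollectionsToQuery, and MergeCandidates are hypothetical names; the actual Qdrant search calls are omitted.

```go
package main

import (
	"fmt"
	"sort"
)

// Candidate is a hypothetical search hit from either collection.
type Candidate struct {
	Kind   string // "issue" or "pr"
	Number int
	Score  float64
}

// CollectionsToQuery returns the collections to search. When no PR
// collection is configured (empty string), it falls back to issues only.
func CollectionsToQuery(issueCollection, prCollection string) []string {
	cols := []string{issueCollection}
	if prCollection != "" {
		cols = append(cols, prCollection)
	}
	return cols
}

// MergeCandidates concatenates per-collection results, sorts them by
// descending score, and keeps at most limit entries.
func MergeCandidates(limit int, groups ...[]Candidate) []Candidate {
	var all []Candidate
	for _, g := range groups {
		all = append(all, g...)
	}
	sort.SliceStable(all, func(i, j int) bool { return all[i].Score > all[j].Score })
	if limit >= 0 && len(all) > limit {
		all = all[:limit]
	}
	return all
}

func main() {
	// Illustrative values only.
	fmt.Println(CollectionsToQuery("issues", ""))
	issues := []Candidate{{"issue", 40, 0.71}}
	prs := []Candidate{{"pr", 41, 0.88}, {"pr", 12, 0.35}}
	for _, c := range MergeCandidates(2, issues, prs) {
		fmt.Printf("%s #%d (%.2f)\n", c.Kind, c.Number, c.Score)
	}
}
```

Both helpers are pure functions, so the "no PR collection configured" path and the merge ordering can be covered by unit tests without a running Qdrant instance.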

Notes

Extracted from PR #40. Implementation reference: nick1udwig/simili-bot@index-and-query-prs.
Depends on #42 (OpenAI provider support); alternatively, it can be implemented independently using Gemini only.

Metadata

Assignees

No one assigned

    Labels

    core (Related to core engine), enhancement (New feature or request), triage (Label for incoming issues)

    Projects

    Status

    In Progress

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests
