Release v0.1.5
🎉 Major Features
🔧 Batch CLI Command
Process multiple issues from JSON files against the vector database without making any GitHub writes.
Features:
- Process arrays of issues from JSON files
- Support JSON (detailed) and CSV (stakeholder-friendly) output formats
- Concurrent processing with configurable worker pool
- Override collection, thresholds, and top-k via CLI flags
- Perfect for testing bot logic on historical data
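As a rough illustration of the worker-pool idea above, the following sketch processes an array of issues concurrently and renders the results as CSV. It is not the project's implementation: `analyze_issue`, `run_batch`, `to_csv`, and the `number`/`title`/`is_duplicate` fields are hypothetical names chosen for this example, and the real input schema may differ.

```python
import csv
import io
import json
from concurrent.futures import ThreadPoolExecutor


def analyze_issue(issue: dict) -> dict:
    """Hypothetical stand-in for the per-issue pipeline; makes no network calls."""
    return {"number": issue["number"], "title": issue["title"], "is_duplicate": False}


def run_batch(issues: list, workers: int = 5) -> list:
    # Executor.map runs tasks concurrently but yields results in input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(analyze_issue, issues))


def to_csv(results: list) -> str:
    # Render results as CSV text with a header row.
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["number", "title", "is_duplicate"])
    writer.writeheader()
    writer.writerows(results)
    return buf.getvalue()


# Assumed input shape: a JSON array of issue objects.
issues = json.loads(
    '[{"number": 1, "title": "Crash on start"}, {"number": 2, "title": "Typo in docs"}]'
)
results = run_batch(issues, workers=2)
print(to_csv(results))
```

Because nothing in this loop writes back to GitHub, the same pattern can be replayed over historical data as often as needed.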
Usage:
simili batch --file issues.json --out-file results.csv --format csv --workers 5
Use Cases:
- Test bot logic on historical issues without spamming repos
- Generate analysis reports for stakeholders
- Validate similarity search and duplicate detection accuracy
- Audit quality assessment without requiring write access
🌐 Web UI (simili-web)
Interactive web interface for real-time issue analysis, with a minimal shadcn-style design.
Features:
- Submit issues via web form (title, body, org, repo)
- View similar issues with similarity scores
- See duplicate detection with LLM reasoning
- Get quality scores and improvement suggestions
- Receive label and transfer recommendations
- Single-binary deployment with embedded static files
Usage:
export GEMINI_API_KEY=xxx QDRANT_URL=xxx QDRANT_API_KEY=xxx QDRANT_COLLECTION=xxx
./simili-web
# Open http://localhost:8080
🐛 Bug Fixes
Duplicate Detection Improvements
- Fixed: Related issues incorrectly marked as duplicates
- Root cause: Only titles were passed to the LLM, not full issue bodies
- Solution: Now extracts the full text field from the Qdrant payload and passes it to the LLM
- Rewrote duplicate detection prompt to be more conservative
- Raised confidence threshold to 0.85
- Added DuplicateReason field to capture LLM reasoning
Before: Issue chain #8640 → #8641 → #8642 incorrectly marked as duplicates
After: Issues correctly identified as related but not duplicates
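A minimal sketch of how a confidence threshold like this might gate the duplicate verdict. The 0.85 value comes from these notes; everything else is an assumption for illustration: the `Verdict` type, the `classify` function, and treating the threshold as inclusive are not taken from the project's code.

```python
from dataclasses import dataclass

DUPLICATE_CONFIDENCE_THRESHOLD = 0.85  # value stated in these release notes


@dataclass
class Verdict:
    is_duplicate: bool
    duplicate_reason: str  # illustrative counterpart of the DuplicateReason field


def classify(llm_says_duplicate: bool, confidence: float, reasoning: str) -> Verdict:
    # Accept the LLM's "duplicate" answer only when it is confident enough;
    # otherwise the pair is treated as related but not duplicate.
    # Assumption: the threshold is inclusive (>=).
    is_dup = llm_says_duplicate and confidence >= DUPLICATE_CONFIDENCE_THRESHOLD
    return Verdict(is_duplicate=is_dup, duplicate_reason=reasoning)


print(classify(True, 0.70, "Same subsystem, different failure mode"))
```

Keeping the reasoning alongside the verdict makes a borderline call auditable even when the issue is not flagged.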
Transfer Detection
- Fixed transfer comment message when issue stays in current repo
- Prevented incorrect routing by including the current repo in the candidate list
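The effect of including the current repo among the candidates can be sketched as follows. This is an illustrative model only: `choose_transfer_target`, the score dictionary, and the tie-breaking rule are assumptions, not the project's actual routing code.

```python
from typing import Dict, Optional


def choose_transfer_target(current_repo: str, scores: Dict[str, float]) -> Optional[str]:
    """Return the repo to transfer to, or None if the issue should stay put.

    `scores` maps repo name to a match score and must include `current_repo`,
    so the current repo competes with every other candidate.
    """
    if current_repo not in scores:
        raise ValueError("current repo must be part of the candidate set")
    # Highest score wins; on a tie, prefer the current repo (True sorts above False).
    best = max(scores, key=lambda repo: (scores[repo], repo == current_repo))
    return None if best == current_repo else best


print(choose_transfer_target("org/api", {"org/api": 0.9, "org/web": 0.4}))
print(choose_transfer_target("org/api", {"org/api": 0.3, "org/web": 0.8}))
```

If the current repo were left out of the comparison, the best remaining candidate would always win, even when the issue already sits in the right place.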
✨ Enhancements
Message Templates
- Redesigned all bot messages with professional branding
- Improved clarity and consistency across all interactions
Hybrid Routing
- Implemented hybrid routing with repository documentation learning
- Combines rule-based matching with semantic matching against the vector database (VDB) for better accuracy
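One possible way to combine the two signals is a weighted sum, sketched below. These notes do not say how the project actually merges rule-based and semantic results, so the keyword rules, the `rule_weight` parameter, and the function names are all assumptions made for illustration.

```python
from typing import Dict, List


def rule_score(text: str, keywords: List[str]) -> float:
    """Fraction of a repo's keywords found in the issue text (case-insensitive)."""
    if not keywords:
        return 0.0
    lowered = text.lower()
    return sum(kw.lower() in lowered for kw in keywords) / len(keywords)


def hybrid_scores(
    text: str,
    rules: Dict[str, List[str]],
    semantic: Dict[str, float],
    rule_weight: float = 0.4,
) -> Dict[str, float]:
    # Blend the rule-based score with a precomputed semantic similarity per repo.
    repos = set(rules) | set(semantic)
    return {
        repo: rule_weight * rule_score(text, rules.get(repo, []))
        + (1 - rule_weight) * semantic.get(repo, 0.0)
        for repo in repos
    }


scores = hybrid_scores(
    "Login button crashes the web app",
    rules={"org/web": ["button", "web"], "org/api": ["endpoint"]},
    semantic={"org/web": 0.6, "org/api": 0.7},  # hypothetical similarity values
)
print(max(scores, key=scores.get))
```

In this example the keyword rules outweigh a slightly higher semantic score for the other repo, which is the kind of correction a hybrid scheme is meant to provide.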
🔧 Technical Changes
- Extracted reusable ExecutePipeline() function for batch processing
- Added URL sanitization to prevent XSS vulnerabilities
- Improved error handling for JSON encoding operations
- Fixed parameter shadowing issues
- Updated embedding model to gemini-embedding-001
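For the URL sanitization item, a common approach is to allow only `http` and `https` links before rendering them. The sketch below shows that general technique using Python's standard library; it is not the project's code, and `sanitize_url` and the `"#"` fallback are choices made for this example.

```python
from urllib.parse import urlparse

SAFE_SCHEMES = {"http", "https"}


def sanitize_url(url: str) -> str:
    """Return the URL if it is an absolute http(s) link, otherwise a harmless '#'."""
    cleaned = url.strip()
    parsed = urlparse(cleaned)
    # Reject schemes such as javascript: or data: that can execute script in a browser.
    if parsed.scheme in SAFE_SCHEMES and parsed.netloc:
        return cleaned
    return "#"


print(sanitize_url("https://github.com/org/repo/issues/1"))
print(sanitize_url("javascript:alert(1)"))
```

An allowlist is generally safer than trying to block known-bad schemes, since anything unrecognized is rejected by default.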
📚 Documentation
- Added comprehensive README for simili-web
- Updated main README with batch CLI usage
- Added API documentation and troubleshooting guides
- Included examples for both CLI and web UI
📦 What's Changed
Full Changelog: v0.1.0...v0.1.5