A personal search engine CLI that stores, crawls, and searches links using PostgreSQL with three search modes: BM25 keyword search, vector semantic search, and hybrid search combining both using Reciprocal Rank Fusion (RRF).
- Link Management: Add, list, remove, and organize URLs
- Web Crawling: Extract titles, descriptions, and content from pages using Playwright
- BM25 Full-Text Search: Fast keyword matching with `pg_textsearch`
- Semantic Vector Search: Find conceptually similar content using OpenAI embeddings via `pgvector`
- Hybrid Search: Combine keyword and semantic search with RRF for best results
- Web Interface: Browser-based UI for searching and managing links
- MCP Server: Expose search as tools for LLM integration (Claude, etc.)
- Search Caching: Automatic caching of hybrid search results for performance
- Python 3.12+
- uv (Python package manager)
- PostgreSQL with extensions:
  - `pg_textsearch` (BM25 search)
  - `pgvector` (vector similarity)
  - `pgai` (AI embeddings)
Database: Use TigerData (free tier available), which comes with all required extensions pre-installed and managed OpenAI API keys for embeddings.
git clone <repo-url>
cd search-engine
# Install globally as a uv tool
uv tool install -e .
# Or run without installing
uv run tars --help

playwright install chromium

Create a free database on TigerData, then add your connection string to a .env file:
# Option A: Single connection string
DATABASE_URL=postgresql://user:password@host:5432/dbname
# Option B: Individual variables
PGHOST=localhost
PGPORT=5432
PGDATABASE=tars
PGUSER=postgres
PGPASSWORD=secret

tars db init

This creates:
- `links` table with full-text search columns
- BM25 index for keyword search
- Search cache table for hybrid search results
# Add embedding column and HNSW index
tars db vector init
# Generate embeddings for existing links
tars db vector embed

# Add some links
tars add https://docs.python.org
tars add https://react.dev/learn
tars add https://www.postgresql.org/docs/
# Crawl to extract content
tars crawl
# Search (hybrid by default)
tars search "python web development"
# View all links
tars list

tars add <url> # Add a new link
tars list # List all stored links (paginated)
tars list -n 20 -p 2 # Page 2 with 20 results per page
tars remove <url> # Remove by URL
tars remove "*.example.com" # Remove by glob pattern
tars update <url> # Update timestamp for a link
tars clean-list # Remove duplicate links (CSV mode only)

# Hybrid search (BM25 + vector with RRF) - recommended
tars search "<query>"
tars search "machine learning" --keyword-weight 0.7 --vector-weight 0.3
tars search "python" --min-score 0.01 -n 20
# BM25 keyword search only
tars text_search "<query>"
tars text_search "postgresql tutorial" -n 10 -p 1
# Vector semantic search only
tars vector "<query>"
tars vector "how to build web apps" -n 10tars crawl # Crawl uncrawled links (default)
tars crawl <url> # Crawl a specific URL
tars crawl --all # Re-crawl all links
tars crawl --missing # Only crawl links never crawled
tars crawl --old 7 # Crawl links not crawled in last 7 days
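
The crawler (`src/tars/crawl.py`) drives headless Chromium through Playwright to pull the title, description, and page text. The snippet below is a minimal sketch of that flow, not the project's actual code; the `fetch_page` helper, the selectors, and the returned fields are illustrative assumptions.

```python
# Minimal sketch of crawling one URL with Playwright (requires `playwright install chromium`).
import asyncio

from playwright.async_api import async_playwright

async def fetch_page(url: str) -> dict:
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()
        response = await page.goto(url, wait_until="domcontentloaded")
        meta = page.locator('meta[name="description"]')
        data = {
            "url": url,
            "http_status": response.status if response else None,
            "title": await page.title(),
            # Meta description if present, otherwise None.
            "description": await meta.get_attribute("content") if await meta.count() else None,
            # Visible body text, later used for BM25 search and embeddings.
            "content": await page.inner_text("body"),
        }
        await browser.close()
        return data

if __name__ == "__main__":
    print(asyncio.run(fetch_page("https://docs.python.org"))["title"])
```
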
tars db init # Initialize database schema
tars db migrate # Import links from CSV to database
tars db status # Show database connection status
# Vector embedding management
tars db vector init # Add embedding column and HNSW index
tars db vector embed # Generate embeddings for all pending links
tars db vector embed -n 50 # Generate embeddings for 50 links
tars db vector status # Show embedding status

tars web # Start web server at http://127.0.0.1:8000
tars web --port 3000 # Custom port
tars web --open # Open browser automatically
tars web --reload # Enable auto-reload for development
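
For context, the web command wraps a FastAPI app (`src/tars/web/app.py`) served by uvicorn. The sketch below shows one plausible mapping of these flags onto `uvicorn.run`; the import string `tars.web.app:app` and the helper name are assumptions, not the real entry point.

```python
# Hypothetical sketch of `tars web`: serve the FastAPI app with uvicorn.
import threading
import webbrowser

import uvicorn

def run_web(host: str = "127.0.0.1", port: int = 8000,
            open_browser: bool = False, reload: bool = False) -> None:
    if open_browser:
        # Open the UI shortly after the server starts listening.
        threading.Timer(1.0, webbrowser.open, args=(f"http://{host}:{port}",)).start()
    # With reload=True, uvicorn needs the app as an import string rather than an object.
    uvicorn.run("tars.web.app:app", host=host, port=port, reload=reload)

if __name__ == "__main__":
    run_web(port=3000, open_browser=True)
```
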
# Run as stdio server (for local Claude Code)
tars mcp
# Run as HTTP/SSE server (for remote connections)
tars mcp --sse --port 8000

Add to Claude Code's MCP config (~/.claude/claude_mcp_settings.json):
{
"mcpServers": {
"tars": {
"command": "tars",
"args": ["mcp"]
}
}
}
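
The MCP server (`src/tars/mcp/server.py`) is built on `fastmcp`. As a rough sketch of how a search tool is exposed over MCP, the example below uses a stubbed `hybrid_search`; the tool name, parameters, and return shape are illustrative assumptions rather than the project's actual interface.

```python
# Illustrative FastMCP server exposing hybrid search as a tool for LLM clients.
from dataclasses import dataclass

from fastmcp import FastMCP

mcp = FastMCP("tars")

@dataclass
class Result:
    url: str
    title: str
    score: float

def hybrid_search(query: str, limit: int = 10) -> list[Result]:
    # Stub standing in for the real database-backed hybrid search.
    return [Result(url="https://docs.python.org", title="Python docs", score=0.9)][:limit]

@mcp.tool()
def search(query: str, limit: int = 10) -> list[dict]:
    """Hybrid (BM25 + vector) search over stored links."""
    return [{"url": r.url, "title": r.title, "score": r.score}
            for r in hybrid_search(query, limit)]

if __name__ == "__main__":
    mcp.run()  # stdio transport, matching the Claude Code config above
```
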
- Uses `pg_textsearch` extension with BM25 ranking algorithm
- Best for exact keyword matching
- Fast and efficient for known terms
tars text_search "PostgreSQL performance tuning"

- Uses OpenAI `text-embedding-3-small` via `pgai`
- Great for natural language queries
tars vector "how do databases store data efficiently"- Combines BM25 and vector search using Reciprocal Rank Fusion (RRF)
- Combines BM25 and vector search using Reciprocal Rank Fusion (RRF)
- Best of both worlds: keyword precision + semantic understanding
- Adjustable weights to favor keywords or semantics (see the fusion sketch below)
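
Reciprocal Rank Fusion needs only the ranks from the two result lists, not their raw scores. The function below is a minimal sketch of the weighted variant; the damping constant `k = 60` and the function name are illustrative assumptions, not necessarily what tars uses.

```python
# Weighted Reciprocal Rank Fusion: each list contributes weight / (k + rank) per item.
def rrf_fuse(keyword_ranked: list[str], vector_ranked: list[str],
             keyword_weight: float = 0.5, vector_weight: float = 0.5,
             k: int = 60) -> list[tuple[str, float]]:
    scores: dict[str, float] = {}
    for weight, ranking in ((keyword_weight, keyword_ranked),
                            (vector_weight, vector_ranked)):
        for rank, link_id in enumerate(ranking, start=1):
            # k dampens the dominance of top ranks; items found by both searches accumulate score.
            scores[link_id] = scores.get(link_id, 0.0) + weight / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Toy example: "a" and "b" appear in both lists, so they outrank "c" and "d".
print(rrf_fuse(["a", "b", "c"], ["b", "d", "a"],
               keyword_weight=0.7, vector_weight=0.3))
```
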
# Equal weights (default)
tars search "python machine learning"
# Favor keyword matches
tars search "exact error message" --keyword-weight 0.8 --vector-weight 0.2
# Favor semantic similarity
tars search "feeling anxious" --keyword-weight 0.3 --vector-weight 0.7src/tars/
├── __init__.py # CLI entry point and argument parsing
├── db.py # PostgreSQL operations (CRUD, search, embeddings)
├── crawl.py # Web crawling with Playwright
├── mcp/ # MCP server for LLM integration
│ ├── __init__.py
│ ├── server.py # FastMCP server with tools
│ └── models.py # Pydantic models for MCP responses
└── web/ # Web interface
├── __init__.py
├── app.py # FastAPI application
├── routes/ # API and page routes
└── templates/ # Jinja2 HTML templates
CREATE TABLE links (
id UUID PRIMARY KEY,
url TEXT UNIQUE NOT NULL,
title TEXT,
description TEXT,
content TEXT,
notes TEXT,
tags TEXT[],
hidden BOOLEAN DEFAULT FALSE,
added_at TIMESTAMPTZ,
updated_at TIMESTAMPTZ,
crawled_at TIMESTAMPTZ,
http_status INTEGER,
crawl_error TEXT,
search_text TEXT GENERATED ALWAYS AS (...) STORED,
embedding vector(1536)
);
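
For orientation, `tars db vector init` amounts to adding the pgvector embedding column and an HNSW index over it. The statements below are a hedged sketch run through psycopg; the index name and operator class are assumptions, so treat `src/tars/db.py` as the source of truth.

```python
# Rough sketch of the DDL behind `tars db vector init`: embedding column + HNSW index.
import os

import psycopg

DDL = [
    "CREATE EXTENSION IF NOT EXISTS vector",
    "ALTER TABLE links ADD COLUMN IF NOT EXISTS embedding vector(1536)",
    # HNSW index using cosine distance; 1536 matches OpenAI text-embedding-3-small.
    "CREATE INDEX IF NOT EXISTS links_embedding_hnsw_idx "
    "ON links USING hnsw (embedding vector_cosine_ops)",
]

with psycopg.connect(os.environ["DATABASE_URL"]) as conn:
    for statement in DDL:
        conn.execute(statement)
```
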
Complete setup from scratch:

# 1. Install uv if not already installed
curl -LsSf https://astral.sh/uv/install.sh | sh
# 2. Clone the repository
git clone <repo-url>
cd search-engine
# 3. Install tars globally
uv tool install -e .
# 4. Install browser for crawling
playwright install chromium
# 5. Create database on TigerData (free tier available)
# Sign up at: https://tsdb.co/jm-pgtextsearch
# Create a service and copy the connection string
cat > .env << 'EOF'
DATABASE_URL=postgresql://user:password@host:5432/dbname
EOF
# 6. Initialize database
tars db init
# 7. Initialize vector search
tars db vector init
# 8. Add your first links
tars add https://docs.python.org
tars add https://react.dev
tars add https://www.postgresql.org
# 9. Crawl links to extract content
tars crawl
# 10. Generate embeddings for semantic search
tars db vector embed
# 11. Verify everything works
tars db status
tars db vector status
# 12. Search!
tars search "web development"
tars text_search "python"
tars vector "building modern applications"
# 13. (Optional) Start web interface
tars web --open
# 14. (Optional) Set up MCP for Claude Code
# Add to ~/.claude/claude_mcp_settings.json

| Variable | Description | Default |
|---|---|---|
| `DATABASE_URL` | PostgreSQL connection string | - |
| `PGHOST` | Database host | - |
| `PGPORT` | Database port | 5432 |
| `PGDATABASE` | Database name | - |
| `PGUSER` | Database user | - |
| `PGPASSWORD` | Database password | - |
| `TARS_CACHE_TTL` | Search cache TTL in seconds | 3600 |
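
As an illustration of how these variables might be consumed, the sketch below prefers `DATABASE_URL` and falls back to the individual `PG*` variables, and reads `TARS_CACHE_TTL` as an integer. The helper name, fallback defaults, and precedence are assumptions about the configuration logic, not a copy of it.

```python
# Hypothetical resolution of connection settings; .env is loaded via python-dotenv.
import os

from dotenv import load_dotenv

load_dotenv()  # pull variables from a local .env file into the process environment

def resolve_conninfo() -> str:
    """Prefer DATABASE_URL; otherwise build a conninfo string from PG* variables."""
    if url := os.getenv("DATABASE_URL"):
        return url
    return (
        f"host={os.getenv('PGHOST', 'localhost')} "
        f"port={os.getenv('PGPORT', '5432')} "
        f"dbname={os.getenv('PGDATABASE', 'tars')} "
        f"user={os.getenv('PGUSER', 'postgres')} "
        f"password={os.getenv('PGPASSWORD', '')}"
    )

CACHE_TTL_SECONDS = int(os.getenv("TARS_CACHE_TTL", "3600"))
print(resolve_conninfo(), CACHE_TTL_SECONDS)
```
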
- `rich` - Terminal output formatting
- `psycopg` - PostgreSQL database access
- `playwright` - Web crawling
- `python-dotenv` - Environment configuration
- `fastapi` - Web interface API
- `uvicorn` - ASGI server
- `jinja2` - HTML templating
- `fastmcp` - MCP server framework
MIT