A modular command-line agent that answers user queries by searching the web, scraping and summarizing content, and caching results for fast future retrieval. Supports both OpenAI and local embedding models, with robust vector search and persistent storage using ChromaDB.
## Features

- Query validation using LLMs (OpenAI or local)
- Web scraping with Playwright (Yandex search, robust extraction)
- Summarization of web content using LLMs
- Semantic caching: stores and retrieves answers using vector similarity (ChromaDB or Pinecone); a lookup sketch follows this list
- Embeddings: switch between OpenAI (`text-embedding-3-small`, 1536-dim) and local SentenceTransformers (`all-MiniLM-L6-v2`, 384-dim)
- Persistent storage: ChromaDB backed by a `.chroma` directory
- Diagnostics: scripts to view cache and vector DB contents
- Error handling and clear output formatting
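
A minimal sketch of what the semantic-cache lookup can look like, assuming a ChromaDB collection named `answers` and the local MiniLM model; the function name, collection name, and distance threshold are illustrative, not the repo's actual API:

```python
# Sketch of a semantic-cache lookup; names and threshold are illustrative.
import chromadb
from sentence_transformers import SentenceTransformer

client = chromadb.PersistentClient(path=".chroma")
collection = client.get_or_create_collection("answers")
model = SentenceTransformer("all-MiniLM-L6-v2")

def cached_answer(query: str, max_distance: float = 0.4):
    """Return a stored answer if a semantically similar query exists."""
    embedding = model.encode(query).tolist()
    hits = collection.query(query_embeddings=[embedding], n_results=1)
    if hits["ids"][0] and hits["distances"][0][0] <= max_distance:
        return hits["documents"][0][0]  # close enough: reuse the stored answer
    return None  # cache miss: fall through to scraping + summarization
```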
## Setup

- Clone this repo and `cd` into the directory.
- Install dependencies:

  ```
  pip install -r requirements.txt
  playwright install
  ```

- (Optional) Create a `.env` file with your OpenAI API key if using OpenAI models:

  ```
  OPENAI_API_KEY=sk-...
  ```
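
To confirm the key is picked up, a minimal check, assuming the project reads `.env` via `python-dotenv` (an assumption, not confirmed by the repo):

```python
# Sketch: load the key from .env; assumes python-dotenv is installed.
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current directory
print("key set:", os.getenv("OPENAI_API_KEY") is not None)  # local models need no key
```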
## Usage

Run a query from the command line:

```
python agent.py "your search query here"
```

- The agent will check the cache, validate the query, scrape the web, summarize the results, and store the answer for future use (outlined in the sketch below).
- Cached or semantically similar answers are retrieved instantly.
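
The same flow as an illustrative outline; every helper named here is hypothetical and stands in for whatever the repo actually calls it:

```python
# Illustrative outline of the query pipeline; all helper names are hypothetical.
def answer(query: str) -> str:
    cached = cached_answer(query)     # 1. semantic cache lookup (see sketch above)
    if cached is not None:
        return cached                 # cache hit: instant answer
    if not validate_query(query):     # 2. LLM-based query validation
        raise ValueError("query rejected by validator")
    pages = scrape_web(query)         # 3. Playwright + Yandex search and scraping
    summary = summarize(pages)        # 4. LLM summarization of page content
    store_answer(query, summary)      # 5. persist embedding + answer for reuse
    return summary
```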
## Embedding Models

- Default: `all-MiniLM-L6-v2` (local, 384-dim, fast, good for semantic similarity)
- OpenAI: `text-embedding-3-small` (1536-dim, requires an API key)
- ChromaDB collections are dimension-locked. If you switch embedding models, delete the `.chroma` directory to reset.
- Cosine similarity is preferred for semantic search. ChromaDB uses L2 distance by default, but you can compute cosine similarity manually if needed (see the sketch below).
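
Two ways to get cosine behavior, sketched below: configure the collection for cosine distance at creation time (a standard ChromaDB option), or compute the similarity yourself from raw embeddings. The collection name is illustrative:

```python
import numpy as np
import chromadb

client = chromadb.PersistentClient(path=".chroma")

# Option 1: ask ChromaDB for cosine distance when the collection is created.
collection = client.get_or_create_collection(
    "answers", metadata={"hnsw:space": "cosine"}
)

# Option 2: compute cosine similarity manually.
def cosine_similarity(a, b) -> float:
    a, b = np.asarray(a), np.asarray(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```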
## Technical Notes

- ChromaDB uses approximate nearest neighbor (ANN) indexing for fast vector search. L2 (Euclidean) distance is the default, but cosine similarity is better suited to semantic matching.
- Example: "list places in delhi" vs "show places are delhi" should have high similarity. With OpenAI embeddings + L2, the score was 0.71; with MiniLM + cosine, it was 0.88 (reproduced in the snippet below).
- See `test/readme.md` for a model comparison and more details.
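
The MiniLM + cosine comparison can be reproduced with `sentence-transformers`; the 0.88 figure comes from the repo's tests, so treat the exact number as approximate:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
a = model.encode("list places in delhi", convert_to_tensor=True)
b = model.encode("show places are delhi", convert_to_tensor=True)
print(util.cos_sim(a, b).item())  # expected to be high, ~0.88 per the note above
```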
## Troubleshooting

- ChromaDB dimension error: if you see `Collection expecting embedding with dimension of 1536, got 384`, delete the `.chroma` directory and rerun.
- Yandex tracking URLs: the scraper skips ad/tracking links to avoid timeouts (a filter sketch follows this list).
- API keys: you are not prompted interactively; set them in `.env` if needed.
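
A sketch of the kind of link filter involved; the exact URL patterns the scraper checks are assumptions here, not taken from the repo:

```python
# Hypothetical Yandex ad/redirect markers; the repo's real list may differ.
TRACKING_MARKERS = ("yabs.yandex.", "/clck/", "an.yandex.")

def is_tracking_link(url: str) -> bool:
    """Return True for links that route through ad/click trackers."""
    return any(marker in url for marker in TRACKING_MARKERS)
```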
## Diagnostic Scripts

- `view_chromaDB.py`: view ChromaDB contents and diagnostics
- `view_cache_json.py`: view `cache.json` contents
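
For a quick manual check without the scripts, something like this works, again assuming the collection is named `answers`:

```python
import chromadb

client = chromadb.PersistentClient(path=".chroma")
col = client.get_or_create_collection("answers")
print("stored items:", col.count())
print(col.peek())  # first few ids, documents, and embeddings
```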
See `requirements.txt` for all dependencies. For more technical notes and model accuracy, see `test/readme.md`.