🤖 Private RAG Assistant - Chrome Extension (MV3)

On-the-fly Retrieval-Augmented Generation chatbot for any webpage using Groq LLM + local Ollama embeddings



📋 Table of Contents

  • Overview
  • Features
  • Architecture
  • Screenshots
  • Prerequisites
  • Installation
  • Quick Start
  • Configuration
  • How It Works
  • Development
  • Troubleshooting
  • Status Dashboard
  • Security & Privacy
  • License
  • Contributing
  • Support
  • Performance Metrics
  • Roadmap


🎯 Overview

Private RAG Assistant is a Chrome extension that brings intelligent question-answering capabilities to any webpage. It combines:

  • LLM Backend: Groq API (llama-3.3-70b-versatile) for fast, powerful responses
  • Embeddings: Local Ollama (nomic-embed-text) for semantic understanding
  • Zero Dependencies: Pure vanilla JavaScript, no npm, no external libraries
  • MV3 Compliant: Modern Chrome Extension Manifest V3 architecture
  • Privacy-First: All embeddings computed locally, only API keys stored locally

Extract page content → Chunk intelligently → Generate embeddings locally → Answer questions with context.


✨ Features

  • Automatic Page Indexing: Extracts and processes webpage text on page load
  • Local Embeddings: Uses Ollama for semantic search (no cloud embeddings)
  • Smart Chunking: 500-char chunks with 100-char overlap for optimal context
  • Rate-Limited Queue: Prevents API overload with 80ms delays
  • Real-time Status: Live dashboard showing indexed chunks, queue status, API state
  • Production UI: Modern gradient design in a standard 420 × 600 px popup
  • Local Key Storage: API keys kept in chrome.storage.local, never synced to the cloud
  • Zero Setup: Default API key pre-configured, works out-of-the-box
  • Error Handling: Comprehensive logging for debugging

🏗️ Architecture

┌─────────────────────────────────────────────────────────────┐
│                    Chrome Extension (MV3)                   │
├─────────────────────────────────────────────────────────────┤
│                                                               │
│  ┌──────────────────┐  ┌──────────────────┐                 │
│  │  Content Script  │  │  Popup UI        │                 │
│  │  (content.js)    │  │  (popup.html/js) │                 │
│  │  • Extract text  │  │  • User Q&A      │                 │
│  │  • Send to BG    │  │  • Status board  │                 │
│  └────────┬─────────┘  └────────┬─────────┘                 │
│           │                     │                            │
│           └──────────┬──────────┘                            │
│                      │ MESSAGE PASSING                       │
│                      ▼                                        │
│  ┌──────────────────────────────────────────────────────┐  │
│  │   Background Service Worker (background.js)         │  │
│  │                                                       │  │
│  │  • Page text reception                              │  │
│  │  • Text chunking (500 chars, 100 overlap)           │  │
│  │  • Embedding generation (Ollama local)              │  │
│  │  • Vector store management (in-memory)              │  │
│  │  • RAG context retrieval (cosine similarity)         │  │
│  │  • LLM queries (Groq API)                           │  │
│  │  • State & queue management                         │  │
│  └──────┬──────────────────────────┬────────────────────┘  │
│         │                          │                        │
│         ▼                          ▼                        │
│  ┌──────────────────┐      ┌──────────────────┐            │
│  │  Local Ollama    │      │  Groq API Cloud  │            │
│  │  (Embeddings)    │      │  (LLM Responses) │            │
│  │  Port: 11434     │      │  REST Endpoint   │            │
│  │  Model: nomic    │      │  llama-3.3-70b   │            │
│  └──────────────────┘      └──────────────────┘            │
│                                                               │
└─────────────────────────────────────────────────────────────┘

Data Flow

Webpage Loaded
    ↓
Content Script Extracts Text (max 100KB)
    ↓
Sends PAGE_TEXT message to Background
    ↓
Text Chunked (500 chars, 100 overlap, max 8 chunks)
    ↓
Each Chunk → Ollama API (generate embedding vector)
    ↓
Vectors stored in vectorStore (with original text)
    ↓
User asks question in popup
    ↓
Question embedded via Ollama
    ↓
Cosine similarity search (find top 4 matching chunks)
    ↓
Context + Question sent to Groq LLM
    ↓
Response displayed in popup
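This flow is driven by Chrome's message passing. A minimal sketch of the background router, assuming the PAGE_TEXT message shape above (chunkText and enqueueEmbeddings are illustrative stand-ins, not the extension's actual helper names):

// background.js — sketch of the message router
chrome.runtime.onMessage.addListener((msg, sender, sendResponse) => {
  if (msg.type === "PAGE_TEXT") {
    const chunks = chunkText(msg.text);   // 500-char windows, 100-char overlap
    enqueueEmbeddings(chunks, msg.url);   // rate-limited Ollama calls
    sendResponse({ ok: true, chunks: chunks.length });
  }
  return true; // keep the channel open for an async sendResponse
});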

📸 Screenshots

Screenshot 1: Main Interface. A clean, modern popup with a gradient header and organized sections.

Screenshot 2: Status Dashboard. Real-time monitoring of indexed chunks, page length, API status, and the embedding queue.

Screenshot 3: Question Answering. Ask questions about page content and get instant RAG-powered responses.

Screenshot 4: API Management. Secure API key storage with visual status indicators.


📦 Prerequisites

Required

  • Chrome Browser: Version 88+ (for MV3 support)
  • Groq API Key: Get from console.groq.com
    • Model: llama-3.3-70b-versatile
    • Free tier available

Required (Local)

  • Ollama: Download from ollama.ai
    • Command to run: ollama serve
    • Default port: 11434
    • Model to pull: ollama pull nomic-embed-text

Verify Setup

# Test Ollama is running
curl http://localhost:11434/api/tags

# Test embeddings endpoint
curl -X POST http://localhost:11434/api/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model":"nomic-embed-text","prompt":"test text"}'

🚀 Installation

Step 1: Clone or Download

git clone https://github.com/yourusername/OTF_RAG_Extension.git
cd OTF_RAG_Extension

Step 2: Load in Chrome

  1. Open Chrome and navigate to chrome://extensions
  2. Enable "Developer mode" (toggle in top-right corner)
  3. Click "Load unpacked"
  4. Select the extension folder
  5. Extension appears in your Chrome toolbar

Step 3: Start Ollama

ollama serve

Keep this terminal window running in the background.

Step 4: Configure Groq API Key (Optional)

The extension comes with a default API key pre-configured. To use your own:

  1. Get key from console.groq.com
  2. Open any webpage
  3. Click the extension icon → enter your Groq API key
  4. Click "Save & Verify"

⚡ Quick Start

  1. Start Ollama: ollama serve (in a terminal)
  2. Load Extension: chrome://extensions → Load unpacked
  3. Open Webpage: Visit any website (news, docs, tutorials, etc.)
  4. Click Extension: Icon appears in toolbar
  5. Ask a Question: "What is this page about?" → Get instant answers

That's it! The extension automatically indexes the page and answers based on actual content.


⚙️ Configuration

Background Service Worker (background.js)

Edit the CONFIG object to customize behavior:

const CONFIG = {
  CHUNK_SIZE: 500,              // Characters per chunk
  CHUNK_OVERLAP: 100,           // Overlap between chunks
  MAX_CHUNKS: 8,                // Max chunks to process per page
  TOP_RESULTS: 4,               // Relevant chunks to use in RAG
  EMBEDDING_RATE_LIMIT_MS: 80,  // Delay between embedding requests
  
  GROQ_URL: "https://api.groq.com/openai/v1/chat/completions",
  OLLAMA_URL: "http://localhost:11434/api/embeddings",
  OLLAMA_MODEL: "nomic-embed-text",
  GROQ_MODEL: "llama-3.3-70b-versatile"
};

API Key Storage

Keys are stored in chrome.storage.local (local to your browser profile, never synced):

// Check current key (in DevTools console)
chrome.storage.local.get(null, console.log);

// Set new key
chrome.storage.local.set({
    GROQ_API_KEY: "your_key_here"
});

🔧 How It Works

1. Text Extraction (content.js)

When a page loads, the content script:

  • Extracts document.body.innerText (max 100KB)
  • Sends PAGE_TEXT message with URL and timestamp
  • Includes retry logic for already-open pages

const url = location.href;
const pageText = document.body.innerText.slice(0, 100000); // cap extraction at 100 KB
chrome.runtime.sendMessage({ type: "PAGE_TEXT", text: pageText, url });

2. Chunking (background.js)

Text is split into overlapping chunks for better context:

Text: "The quick brown fox jumps over the lazy dog"
Chunk 1: "The quick brown fox jumps"
Chunk 2: "brown fox jumps over the"
Chunk 3: "jumps over the lazy dog"
  • Window size: 500 characters
  • Overlap: 100 characters
  • Max chunks: 8 (prevents too many API calls)
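A minimal sketch of the sliding-window chunker under those parameters (the actual background.js implementation may differ in detail):

function chunkText(text, size = 500, overlap = 100, maxChunks = 8) {
  const chunks = [];
  const step = size - overlap; // advance 400 characters per window
  for (let start = 0; start < text.length && chunks.length < maxChunks; start += step) {
    chunks.push(text.slice(start, start + size));
  }
  return chunks;
}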

3. Embedding Generation (background.js)

Each chunk is sent to local Ollama:

POST http://localhost:11434/api/embeddings
{
  "model": "nomic-embed-text",
  "prompt": "chunk text here"
}
 { "embedding": [0.123, -0.456, ...768 dimensions] }

Rate-limited queue prevents API overload (80ms between requests).
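A hedged sketch of one queued embedding request (vectorStore and the chunk loop are simplified here; embedText is an illustrative name):

async function embedText(text) {
  const res = await fetch("http://localhost:11434/api/embeddings", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: "nomic-embed-text", prompt: text })
  });
  if (!res.ok) throw new Error(`Ollama response status: ${res.status}`);
  return (await res.json()).embedding;
}

// Drain the queue with the 80 ms rate limit between requests
for (const chunk of chunks) {
  vectorStore.push({ text: chunk, vector: await embedText(chunk) });
  await new Promise(resolve => setTimeout(resolve, 80));
}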

4. RAG Retrieval (background.js)

When user asks a question:

  1. Question is embedded via Ollama
  2. Cosine similarity search finds top-4 matching chunks
  3. Context assembled and sent to Groq LLM (see the sketch below)

In pseudocode:

similarity = cos(questionVector, chunkVector)
top chunks ranked by similarity score
context = "Based on content: [top 4 chunks]..."
response = LLM(question + context)
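A straightforward cosine-similarity ranking sketch (questionVector is the embedded question; vectorStore entries hold each chunk's text and vector, as described above):

function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

const ranked = vectorStore
  .map(entry => ({ ...entry, score: cosineSimilarity(questionVector, entry.vector) }))
  .sort((a, b) => b.score - a.score)
  .slice(0, 4); // TOP_RESULTS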

5. LLM Response (background.js)

Groq generates an answer grounded in the retrieved context:

POST https://api.groq.com/openai/v1/chat/completions
{
  "model": "llama-3.3-70b-versatile",
  "messages": [
    { "role": "system", "content": "You are helpful" },
    { "role": "user", "content": "Context: ... Question: ..." }
  ]
}
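The JSON above omits the auth header; a sketch of the full call, assuming the key from chrome.storage.local (askGroq is an illustrative name, not the extension's actual function):

async function askGroq(apiKey, context, question) {
  const res = await fetch("https://api.groq.com/openai/v1/chat/completions", {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${apiKey}`,
      "Content-Type": "application/json"
    },
    body: JSON.stringify({
      model: "llama-3.3-70b-versatile",
      messages: [
        { role: "system", content: "You are helpful" },
        { role: "user", content: `Context: ${context}\n\nQuestion: ${question}` }
      ]
    })
  });
  const data = await res.json();
  return data.choices[0].message.content;
}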

💻 Development

Project Structure

OTF_RAG_Extension/
├── manifest.json          # Extension configuration (MV3)
├── background.js          # Service Worker (490 lines)
│   ├── State management
│   ├── Text chunking
│   ├── Embedding coordination
│   ├── RAG retrieval
│   ├── LLM queries
│   └── Message handlers
├── content.js             # Content Script (66 lines)
│   ├── Page text extraction
│   ├── Message sending
│   └── Retry logic
├── popup.html             # UI Template (333 lines)
│   ├── Header with logo
│   ├── API key section
│   ├── Question input
│   ├── Response display
│   └── Status dashboard
├── popup.js               # Popup Logic (181 lines)
│   ├── API key management
│   ├── Question handling
│   ├── Status updates
│   └── Event listeners
├── icons/                 # Extension icons
├── public/                # Screenshots for docs
│   ├── 1.png
│   ├── 2.png
│   ├── 3.png
│   └── 4.png
└── README.md              # This file

Debugging

Open Service Worker Console:

  1. chrome://extensions
  2. Click "Inspect views" → "service worker"
  3. View logs with [RAG BG] prefix

Open Popup Console:

  1. Right-click extension icon
  2. Select "Inspect popup"
  3. View logs with [RAG Popup] prefix

Open Content Script Console:

  1. Open DevTools (F12) on any webpage
  2. View logs with [RAG Content] prefix

Logging

All components use prefixed console.log for easy filtering:

console.log("[RAG BG] Message");        // Background
console.log("[RAG Content] Message");   // Content script
console.log("[RAG Popup] Message");     // Popup UI

🐛 Troubleshooting

Extension Not Appearing in Toolbar

  • Reload extension: chrome://extensions → Click reload button
  • Check if MV3 is supported: Chrome 88+
  • Verify manifest.json is valid

Ollama 403 Errors

Symptoms: "Ollama response status: 403"

Solutions:

# Verify Ollama is running
ollama serve

# Verify model is loaded
ollama pull nomic-embed-text

# Test endpoint directly
curl http://localhost:11434/api/embeddings \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{"model":"nomic-embed-text","prompt":"test"}'

No Text Being Indexed

Symptoms: "Chunks: 0" in status

Cause: The content script is injected only into pages loaded after the extension is installed or reloaded; tabs that were already open are never indexed

Solution:

  1. Reload extension in chrome://extensions
  2. Open a fresh webpage (new tab)
  3. Click extension icon
  4. Check status → should show chunks

Groq API Errors

Get API key:

  1. Visit console.groq.com
  2. Sign up (free)
  3. Create API key
  4. Enter in extension popup

Verify key works:

curl https://api.groq.com/openai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"llama-3.3-70b-versatile","messages":[{"role":"user","content":"Hi"}]}'

Empty Responses

  • Check Groq API key is configured (status shows "✓ Configured")
  • Verify Ollama is running and accessible
  • Check page has substantial text content
  • Look for errors in service worker console

📊 Status Dashboard

The popup displays real-time metrics:

Metric        Meaning
Chunks        Number of text chunks indexed from the current page
Page Length   Characters extracted from the webpage
API Key       Groq API key status (✓ Configured or ⚠️ Not set)
Queue         Pending embedding requests in the queue

🔐 Security & Privacy

  • No data sent externally except the question and retrieved page context, which go to the Groq API for LLM queries
  • Embeddings computed locally via Ollama (never sent anywhere)
  • API keys stored only in chrome.storage.local on your machine, never synced
  • No tracking or analytics
  • No user data collected or stored on servers

📝 License

MIT License - Feel free to use, modify, and distribute


🤝 Contributing

Contributions welcome! Areas for enhancement:

  • Persistent vector store (IndexedDB)
  • Multi-tab context sharing
  • Alternative LLM backends
  • Alternative embedding models
  • User preference storage
  • Dark mode toggle
  • Export conversation history

📞 Support

Common Issues

Q: Extension crashes on startup? A: Check browser console for errors, ensure Chrome 88+

Q: Ollama connection refused? A: Run ollama serve in terminal before loading extension

Q: API key not saving? A: Try manually setting in DevTools:

chrome.storage.local.set({GROQ_API_KEY: "your_key"})

Q: Slow responses? A: Reduce MAX_CHUNKS in CONFIG (fewer embeddings per page) or lower EMBEDDING_RATE_LIMIT_MS (shorter delay between embedding requests)


📈 Performance Metrics

Typical performance on modern hardware:

Operation               Time
Page text extraction    ~50 ms
Text chunking           ~10 ms
Single embedding        ~200 ms
8 chunks (queued)       ~2 s
RAG retrieval           ~100 ms
Groq LLM response       ~2-5 s
Total workflow          ~7-10 s

🎯 Roadmap

v0.2.0 (Next)

  • IndexedDB for persistent vectors
  • Multi-tab context aggregation
  • Conversation history export

v0.3.0 (Future)

  • Alternative embedding models
  • Custom LLM model support
  • Settings panel in popup

v1.0.0 (Stable)

  • Chrome Web Store release
  • Full test coverage
  • Performance optimizations

Made with ❤️ for smarter web browsing

Last Updated: January 2026
