# Private RAG Assistant

On-the-fly Retrieval-Augmented Generation (RAG) chatbot for any webpage, using the Groq LLM API plus local Ollama embeddings.
## Table of Contents

- Overview
- Features
- Architecture
- Screenshots
- Prerequisites
- Installation
- Quick Start
- Configuration
- How It Works
- Development
- Troubleshooting
- Project Structure
## Overview

Private RAG Assistant is a Chrome extension that brings intelligent question-answering to any webpage. It combines:
- LLM Backend: Groq API (llama-3.3-70b-versatile) for fast, powerful responses
- Embeddings: Local Ollama (nomic-embed-text) for semantic understanding
- Zero Dependencies: Pure vanilla JavaScript, no npm, no external libraries
- MV3 Compliant: Modern Chrome Extension Manifest V3 architecture
- Privacy-First: All embeddings computed locally, only API keys stored locally
Extract page content → Chunk intelligently → Generate embeddings locally → Answer questions with context.
## Features

- ✅ Automatic Page Indexing: Extracts and processes webpage text on page load
- ✅ Local Embeddings: Uses Ollama for semantic search (no cloud embeddings)
- ✅ Smart Chunking: 500-char chunks with 100-char overlap for optimal context
- ✅ Rate-Limited Queue: Prevents API overload with 80ms delays
- ✅ Real-time Status: Live dashboard showing indexed chunks, queue status, API state
- ✅ Production UI: Modern gradient design with 420px × 600px standard extension size
- ✅ Secure Storage: API keys kept locally in `chrome.storage.local`
- ✅ Zero Setup: Default API key pre-configured, works out-of-the-box
- ✅ Error Handling: Comprehensive logging for debugging
## Architecture

```
┌───────────────────────────────────────────────────────────┐
│                  Chrome Extension (MV3)                   │
├───────────────────────────────────────────────────────────┤
│                                                           │
│  ┌──────────────────┐        ┌──────────────────┐         │
│  │  Content Script  │        │     Popup UI     │         │
│  │   (content.js)   │        │ (popup.html/js)  │         │
│  │ • Extract text   │        │ • User Q&A       │         │
│  │ • Send to BG     │        │ • Status board   │         │
│  └────────┬─────────┘        └────────┬─────────┘         │
│           │                           │                   │
│           └─────────────┬─────────────┘                   │
│                         │ MESSAGE PASSING                 │
│                         ▼                                 │
│  ┌─────────────────────────────────────────────────────┐  │
│  │     Background Service Worker (background.js)       │  │
│  │                                                     │  │
│  │  • Page text reception                              │  │
│  │  • Text chunking (500 chars, 100 overlap)           │  │
│  │  • Embedding generation (local Ollama)              │  │
│  │  • Vector store management (in-memory)              │  │
│  │  • RAG context retrieval (cosine similarity)        │  │
│  │  • LLM queries (Groq API)                           │  │
│  │  • State & queue management                         │  │
│  └─────────┬──────────────────────────┬────────────────┘  │
│            │                          │                   │
│            ▼                          ▼                   │
│  ┌──────────────────┐        ┌──────────────────┐         │
│  │   Local Ollama   │        │  Groq API Cloud  │         │
│  │   (Embeddings)   │        │ (LLM Responses)  │         │
│  │   Port: 11434    │        │  REST Endpoint   │         │
│  │ Model: nomic     │        │  llama-3.3-70b   │         │
│  └──────────────────┘        └──────────────────┘         │
│                                                           │
└───────────────────────────────────────────────────────────┘
```
### Data Flow

```
Webpage loaded
      ↓
Content script extracts text (max 100 KB)
      ↓
Sends PAGE_TEXT message to background
      ↓
Text chunked (500 chars, 100 overlap, max 8 chunks)
      ↓
Each chunk → Ollama API (generate embedding vector)
      ↓
Vectors stored in vectorStore (with original text)
      ↓
User asks question in popup
      ↓
Question embedded via Ollama
      ↓
Cosine similarity search (find top 4 matching chunks)
      ↓
Context + question sent to Groq LLM
      ↓
Response displayed in popup
```
## Screenshots

Screenshots live in `public/`:

- Main popup: clean, modern design with gradient header and organized sections
- Status dashboard: real-time monitoring of chunks indexed, page length, API status, and embedding queue
- Q&A: ask questions about page content and get instant RAG-powered responses
- API key setup: secure key storage with visual status indicators
## Prerequisites

- Chrome Browser: version 88+ (for MV3 support)
- Groq API Key: get one from console.groq.com
  - Model: `llama-3.3-70b-versatile` (free tier available)
- Ollama: download from ollama.ai
  - Run: `ollama serve` (default port: `11434`)
  - Pull the embedding model: `ollama pull nomic-embed-text`

Verify your Ollama setup:

```bash
# Test that Ollama is running
curl http://localhost:11434/api/tags

# Test the embeddings endpoint
curl -X POST http://localhost:11434/api/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model":"nomic-embed-text","prompt":"test text"}'
```

## Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/yourusername/OTF_RAG_Extension.git
   cd OTF_RAG_Extension
   ```

2. Load the extension in Chrome:
   - Open Chrome and navigate to `chrome://extensions`
   - Enable "Developer mode" (toggle in the top-right corner)
   - Click "Load unpacked"
   - Select the extension folder
   - The extension appears in your Chrome toolbar

3. Start Ollama:

   ```bash
   ollama serve
   ```

   Keep this terminal window running in the background.
4. Configure your API key (optional): the extension comes with a default API key pre-configured. To use your own:
   - Get a key from console.groq.com
   - Open any webpage
   - Click the extension icon and enter your Groq API key
   - Click "Save & Verify"
## Quick Start

1. Start Ollama: run `ollama serve` in a terminal
2. Load the extension: `chrome://extensions` → "Load unpacked"
3. Open a webpage: visit any website (news, docs, tutorials, etc.)
4. Click the extension icon in the toolbar
5. Ask a question: "What is this page about?" and get an instant answer

That's it! The extension automatically indexes the page and answers based on its actual content.
## Configuration

Edit the `CONFIG` object to customize behavior:

```js
const CONFIG = {
  CHUNK_SIZE: 500,              // Characters per chunk
  CHUNK_OVERLAP: 100,           // Overlap between chunks
  MAX_CHUNKS: 8,                // Max chunks to process per page
  TOP_RESULTS: 4,               // Relevant chunks to use in RAG
  EMBEDDING_RATE_LIMIT_MS: 80,  // Delay between embedding requests
  GROQ_URL: "https://api.groq.com/openai/v1/chat/completions",
  OLLAMA_URL: "http://localhost:11434/api/embeddings",
  OLLAMA_MODEL: "nomic-embed-text",
  GROQ_MODEL: "llama-3.3-70b-versatile"
};
```

API keys are stored locally in `chrome.storage.local`:
```js
// Check the current key (in the service worker DevTools console)
chrome.storage.local.get(null, console.log);

// Set a new key
chrome.storage.local.set({ GROQ_API_KEY: "your_key_here" });
```

## How It Works

### 1. Page Text Extraction

When a page loads, the content script:
- Extracts `document.body.innerText` (max 100 KB)
- Sends a `PAGE_TEXT` message with the URL and a timestamp
- Includes retry logic for already-open pages

```js
const pageText = document.body.innerText.slice(0, 100000);
chrome.runtime.sendMessage({ type: "PAGE_TEXT", text: pageText, url });
```

### 2. Text Chunking

Text is split into overlapping chunks to preserve context across boundaries:
Text: "The quick brown fox jumps over the lazy dog"
Chunk 1: "The quick brown fox jumps"
Chunk 2: "brown fox jumps over the"
Chunk 3: "jumps over the lazy dog"
- Window size: 500 characters
- Overlap: 100 characters
- Max chunks: 8 (prevents too many API calls)
### 3. Embedding Generation

Each chunk is sent to the local Ollama instance:

```
POST http://localhost:11434/api/embeddings
{
  "model": "nomic-embed-text",
  "prompt": "chunk text here"
}
→ { "embedding": [0.123, -0.456, ...] }   // 768 dimensions
```

A rate-limited queue spaces requests 80 ms apart to prevent overloading the local API.
### 4. RAG Retrieval

When the user asks a question:

1. The question is embedded via Ollama
2. A cosine-similarity search finds the top 4 matching chunks
3. The assembled context and question are sent to the Groq LLM

```
similarity = cos(questionVector, chunkVector)
Top chunks ranked by similarity score
Context  = "Based on content: [top 4 chunks]..."
Response = LLM(question + context)
```
### 5. LLM Response

Groq generates the answer:

```
POST https://api.groq.com/openai/v1/chat/completions
{
  "model": "llama-3.3-70b-versatile",
  "messages": [
    { "role": "system", "content": "You are helpful" },
    { "role": "user", "content": "Context: ... Question: ..." }
  ]
}
```
## Project Structure

```
OTF_RAG_Extension/
├── manifest.json        # Extension configuration (MV3)
├── background.js        # Service worker (490 lines)
│   ├── State management
│   ├── Text chunking
│   ├── Embedding coordination
│   ├── RAG retrieval
│   ├── LLM queries
│   └── Message handlers
├── content.js           # Content script (66 lines)
│   ├── Page text extraction
│   ├── Message sending
│   └── Retry logic
├── popup.html           # UI template (333 lines)
│   ├── Header with logo
│   ├── API key section
│   ├── Question input
│   ├── Response display
│   └── Status dashboard
├── popup.js             # Popup logic (181 lines)
│   ├── API key management
│   ├── Question handling
│   ├── Status updates
│   └── Event listeners
├── icons/               # Extension icons
├── public/              # Screenshots for docs
│   ├── 1.png
│   ├── 2.png
│   ├── 3.png
│   └── 4.png
└── README.md            # This file
```
## Development

### Debugging

Service worker console:

1. Go to `chrome://extensions`
2. Click "Inspect views" → "service worker"
3. View logs with the `[RAG BG]` prefix

Popup console:

1. Right-click the extension icon
2. Select "Inspect popup"
3. View logs with the `[RAG Popup]` prefix

Content script console:

1. Open DevTools (F12) on any webpage
2. View logs with the `[RAG Content]` prefix

All components use prefixed `console.log` calls for easy filtering:

```js
console.log("[RAG BG] Message");      // Background
console.log("[RAG Content] Message"); // Content script
console.log("[RAG Popup] Message");   // Popup UI
```

## Troubleshooting

### Extension not loading

- Reload the extension: `chrome://extensions` → click the reload button
- Check that MV3 is supported: Chrome 88+
- Verify `manifest.json` is valid
Symptoms: "Ollama response status: 403"
Solutions:
# Verify Ollama is running
ollama serve
# Verify model is loaded
ollama pull nomic-embed-text
# Test endpoint directly
curl http://localhost:11434/api/embeddings \
-X POST \
-H "Content-Type: application/json" \
-d '{"model":"nomic-embed-text","prompt":"test"}'Symptoms: "Chunks: 0" in status
Cause: Content script only runs on NEW pages after extension reload
Solution:
- Reload extension in
chrome://extensions - Open a fresh webpage (new tab)
- Click extension icon
- Check status → should show chunks
### Groq API key problems

Get an API key:

1. Visit console.groq.com
2. Sign up (free)
3. Create an API key
4. Enter it in the extension popup

Verify the key works:

```bash
curl https://api.groq.com/openai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"llama-3.3-70b-versatile","messages":[{"role":"user","content":"Hi"}]}'
```

### No response to questions

- Check that the Groq API key is configured (status shows "✓ Configured")
- Verify Ollama is running and accessible
- Check that the page has substantial text content
- Look for errors in the service worker console
## Status Dashboard

The popup displays real-time metrics:
| Metric | Meaning |
|---|---|
| Chunks | Number of text chunks indexed from current page |
| Page Length | Characters extracted from webpage |
| API Key | Groq API key status ("✓ Configured" or not) |
| Queue | Pending embedding requests in queue |
## Privacy

- No data is sent externally except to the Groq API (for LLM queries)
- Embeddings are computed locally via Ollama (never sent anywhere)
- API keys are stored locally in `chrome.storage.local`
- No tracking or analytics
- No user data collected or stored on servers
## License

MIT License. Feel free to use, modify, and distribute.
## Contributing

Contributions welcome! Areas for enhancement:
- Persistent vector store (IndexedDB)
- Multi-tab context sharing
- Alternative LLM backends
- Alternative embedding models
- User preference storage
- Dark mode toggle
- Export conversation history
## FAQ

Q: Extension crashes on startup?
A: Check the browser console for errors and make sure you're on Chrome 88+.

Q: Ollama connection refused?
A: Run `ollama serve` in a terminal before loading the extension.

Q: API key not saving?
A: Try setting it manually in the DevTools console:

```js
chrome.storage.local.set({ GROQ_API_KEY: "your_key" });
```

Q: Slow responses?
A: Reduce `MAX_CHUNKS` in `CONFIG` or increase `EMBEDDING_RATE_LIMIT_MS`.
## Performance

Typical performance on modern hardware:
| Operation | Time |
|---|---|
| Page text extraction | ~50ms |
| Text chunking | ~10ms |
| Single embedding | ~200ms |
| 8 chunks (queued) | ~2s |
| RAG retrieval | ~100ms |
| Groq LLM response | ~2-5s |
| Total workflow | ~7-10s |
## Roadmap

### v0.2.0 (Next)

- IndexedDB for persistent vectors
- Multi-tab context aggregation
- Conversation history export

### v0.3.0 (Future)

- Alternative embedding models
- Custom LLM model support
- Settings panel in popup

### v1.0.0 (Stable)

- Chrome Web Store release
- Full test coverage
- Performance optimizations
Made with ❤️ for smarter web browsing
Last Updated: January 2026