📚 Streamlit RAG Chatbot App with Groq

This project is a Retrieval-Augmented Generation (RAG) pipeline built with Streamlit, LangChain, FAISS, and Groq LLMs. It lets you upload PDF documents, embed and store their contents in a vector database, and query them through a Groq-hosted language model. A minimal code sketch of the ingestion flow appears after the feature list below.

🚀 Features

Upload and parse PDF documents with PyMuPDF (fitz)

Split text using RecursiveCharacterTextSplitter

Generate vector embeddings with HuggingFaceEmbeddings

Store and query embeddings in a FAISS vector database

Ask natural language questions and get context-aware answers via Groq LLMs

Simple, interactive Streamlit web UI
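The ingestion side of this pipeline can look roughly like the following. This is a minimal sketch, not the exact code in app.py; the function name, chunk sizes, and import paths are illustrative assumptions.

```python
# Minimal ingestion sketch: parse a PDF, chunk it, embed the chunks, store them in FAISS.
# Names such as build_vectorstore and the chunk sizes are illustrative, not taken from app.py.
import fitz  # PyMuPDF
from langchain.text_splitter import RecursiveCharacterTextSplitter  # import path may vary by LangChain version
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

def build_vectorstore(pdf_path: str, index_dir: str = "vectorstore") -> FAISS:
    # 1. Parse the PDF into plain text with PyMuPDF
    doc = fitz.open(pdf_path)
    text = "\n".join(page.get_text() for page in doc)

    # 2. Split the text into overlapping chunks
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    chunks = splitter.split_text(text)

    # 3. Embed the chunks and index them in FAISS
    embeddings = HuggingFaceEmbeddings(
        model_name="sentence-transformers/all-MiniLM-L6-v2"
    )
    vectorstore = FAISS.from_texts(chunks, embeddings)

    # 4. Persist the index so the app can reload it later
    vectorstore.save_local(index_dir)
    return vectorstore
```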

๐Ÿ› ๏ธ Tech Stack

Streamlit – UI framework

LangChain – RAG pipeline utilities

FAISS – Vector search

HuggingFace Embeddings – Text embeddings (sentence-transformers models)

Groq – LLM inference

PyMuPDF – PDF parsing

📂 Project Structure

.
├── app.py               # Main Streamlit app
├── requirements.txt     # Dependencies
├── .env                 # API keys (Groq, HuggingFace, etc.)
└── vectorstore/         # FAISS index (auto-created after upload)
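A requirements.txt matching the tech stack above would look roughly like this; treat it as an assumption, since the actual file may pin versions or use slightly different package names.

```text
# Approximate, unpinned dependency list implied by the tech stack; the real requirements.txt may differ.
streamlit
langchain
langchain-community
langchain-groq
faiss-cpu
sentence-transformers
pymupdf
python-dotenv
```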

⚙️ Setup

1. Install the dependencies:

pip install -r requirements.txt

2. Add environment variables

Create a .env file:

GROQ_API_KEY=your_groq_api_key
HF_MODEL=sentence-transformers/all-MiniLM-L6-v2

The app reads these values at startup (see the sketch after these steps).

3. Run the app:

streamlit run app.py
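As a rough illustration, the environment variables above could be loaded with python-dotenv and fed into the embedding and LLM clients like this; the exact wiring in app.py may differ.

```python
# Sketch of reading .env at startup; the actual variable handling in app.py may differ.
import os
from dotenv import load_dotenv
from langchain_community.embeddings import HuggingFaceEmbeddings

load_dotenv()  # pulls GROQ_API_KEY and HF_MODEL from .env into the process environment

hf_model = os.getenv("HF_MODEL", "sentence-transformers/all-MiniLM-L6-v2")
embeddings = HuggingFaceEmbeddings(model_name=hf_model)

# ChatGroq (from langchain_groq) picks up GROQ_API_KEY from the environment,
# so it does not need to be passed explicitly.
```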

🎯 Usage

Open the Streamlit app in your browser

Upload one or more PDF files

Ask questions about the documents

Get AI-powered answers with citations from your PDFs
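Under the hood, the question-answering step can be sketched as a retrieval chain over the stored FAISS index. The model name and chain wiring below are assumptions, not necessarily what app.py uses.

```python
# Sketch of the query flow: retrieve relevant chunks, answer with a Groq LLM,
# and keep the source chunks so answers can cite the PDFs.
from langchain.chains import RetrievalQA
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_groq import ChatGroq

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vectorstore = FAISS.load_local(
    "vectorstore", embeddings, allow_dangerous_deserialization=True
)

llm = ChatGroq(model_name="llama-3.1-8b-instant")  # illustrative choice; any Groq-hosted model works

qa = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=vectorstore.as_retriever(search_kwargs={"k": 4}),
    return_source_documents=True,  # returned chunks back the citations shown in the UI
)

result = qa.invoke({"query": "What does the document say about pricing?"})
print(result["result"])
for doc in result["source_documents"]:
    print("cited chunk metadata:", doc.metadata)
```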

🔮 Future Improvements

Add support for multiple embedding models

Implement persistent vector storage (e.g., Pinecone, Weaviate)

Add document summarization and chat history
