A Streamlit web application that enables users to ask questions about YouTube videos using Retrieval-Augmented Generation (RAG). The app fetches video transcripts, creates embeddings, and uses AI to answer questions based on the video content.
Main Purpose: This project was created to understand and utilize different LangChain components in a practical, real-world application. It demonstrates the integration of various LangChain modules including text splitters, vector stores, embeddings, retrievers, prompts, and chains to build a complete RAG system.
- RAG-powered Q&A: Ask natural language questions about any YouTube video
- Semantic Search: Advanced retrieval using vector embeddings
- Context-aware Answers: AI generates answers based on actual video content
- Multi-language Support: Supports videos with English and Hindi transcripts
- Clean UI: Beautiful Streamlit interface with intuitive design
- Fast Processing: Efficient vector storage and retrieval
- LangChain Integration: Demonstrates practical usage of multiple LangChain components
- Educational: Perfect for learning how to build RAG systems with LangChain
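To make the Semantic Search feature above more concrete, here is a minimal sketch of ranking transcript chunks by vector similarity. It uses cosine similarity over toy 3-dimensional vectors purely for illustration; the app itself delegates this to FAISS, which may use a different metric such as L2 distance. The names `cosine_similarity` and `top_k` are hypothetical helpers, not part of this project.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)


def top_k(query_vec: Sequence[float],
          chunk_vecs: Sequence[Sequence[float]],
          k: int = 2) -> List[Tuple[int, float]]:
    """Rank chunk indices by similarity to the query, best first."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "embeddings" (assumption: real ones have hundreds of dimensions).
chunks = [[1.0, 0.0, 0.0], [0.7, 0.7, 0.0], [0.0, 0.0, 1.0]]
query = [1.0, 0.1, 0.0]
best = top_k(query, chunks, k=2)
print([i for i, _ in best])  # [0, 1]
```

In a real pipeline the vectors would come from the embedding model and the ranking from the vector store; only the idea of "nearest vectors first" carries over.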
- Enter a YouTube URL
- Click "Process Video" to fetch transcript and create embeddings
- Ask any question about the video content
- Get AI-powered answers based on the actual transcript
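The first step above starts from a URL, while transcript libraries such as youtube-transcript-api are keyed by video ID. The following is a small sketch of how the ID could be pulled out of common URL shapes using only the standard library. The function name `extract_video_id` and the set of supported URL shapes are assumptions for illustration; the app's own parsing may differ.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID from common YouTube URL shapes, or None."""
    parsed = urlparse(url.strip())
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        candidate = parsed.path.lstrip("/").split("/")[0]
        return candidate or None
    if host in ("youtube.com", "m.youtube.com"):
        if parsed.path == "/watch":
            ids = parse_qs(parsed.query).get("v")
            return ids[0] if ids else None
        for prefix in ("/embed/", "/shorts/"):
            if parsed.path.startswith(prefix):
                candidate = parsed.path[len(prefix):].split("/")[0]
                return candidate or None
    return None


# "abcdEFGH123" is a made-up placeholder ID, not a real video.
print(extract_video_id("https://www.youtube.com/watch?v=abcdEFGH123&t=10s"))
```

Returning `None` for unrecognized input lets the UI show a friendly "invalid URL" message instead of failing during transcript fetching.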
- Python 3.8 or higher
- Perplexity API key (get it from Perplexity AI)
- Clone the repository
    git clone https://github.com/yourusername/youtube-rag-app.git
    cd youtube-rag-app
- Create a virtual environment
    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
- Install dependencies
    pip install -r requirements.txt
- Set up environment variables
    cp .env.template .env
    # Edit .env and add your PPLX_API_KEY
- Run the application
    streamlit run app.py
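The environment-variable step above is the one most likely to go wrong, so a startup check can help. Below is a rough sketch of parsing `KEY=VALUE` lines and rejecting a missing or placeholder `PPLX_API_KEY`, using only the standard library. The helpers `parse_env_text` and `require_api_key` are hypothetical; projects like this often rely on python-dotenv's `load_dotenv` instead, and whether this app does is an assumption.

```python
from typing import Dict


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def require_api_key(env: Dict[str, str], name: str = "PPLX_API_KEY") -> str:
    """Return the key, or raise if it is missing or still a template placeholder."""
    value = env.get(name, "")
    if not value or value.startswith("your_"):
        raise RuntimeError(f"{name} is not set; edit your .env file first.")
    return value


# Made-up sample content; the key value is a placeholder, not a real credential.
sample = """
# Required
PPLX_API_KEY=pplx-example-123
# Optional
HUGGINGFACE_API_TOKEN=
"""
env = parse_env_text(sample)
print(require_api_key(env))  # pplx-example-123
```

Failing early with a clear message is usually easier to debug than an authentication error surfacing later from the model API.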
Create a .env file with the following variables:
# Required
PPLX_API_KEY=your_perplexity_api_key_here
# Optional
HUGGINGFACE_API_TOKEN=your_huggingface_token_here

youtube-rag-app/
├── app.py                          # Main Streamlit application
├── src/
│   ├── __init__.py
│   └── utils/
│       ├── __init__.py
│       ├── transcript_fetcher.py   # YouTube transcript handling
│       ├── vector_store.py         # Text chunking and embeddings
│       └── rag_chain.py            # RAG chain implementation
├── requirements.txt                # Python dependencies
├── .env.template                   # Environment variables template
├── .gitignore                      # Git ignore rules
└── README.md                       # This file
This project demonstrates the practical implementation of various LangChain components:
- Transcript Fetching: Uses `youtube-transcript-api` to extract video transcripts
- Text Processing: Implements LangChain's `RecursiveCharacterTextSplitter` for optimal text chunking
- Embeddings: Utilizes LangChain's `HuggingFaceEmbeddings` with the `sentence-transformers/all-mpnet-base-v2` model
- Vector Storage: Leverages LangChain's `FAISS` integration for efficient similarity search
- Retrieval: Creates a retriever using LangChain's vector store interface
- Prompt Engineering: Uses LangChain's `PromptTemplate` for structured prompt creation
- Chain Construction: Implements LangChain's `RunnableParallel` and `RunnableLambda` for complex workflows
- Question Answering: Integrates everything into a complete RAG pipeline using LangChain's chain paradigm
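To illustrate the text-processing step in the list above, here is a simplified sketch of the idea behind recursive character splitting: try the coarsest separator first and fall back to finer ones only for pieces that are still too long. This is not LangChain's implementation; it omits features such as `chunk_overlap`, and the function name `recursive_split` and the chunk size used are assumptions for illustration.

```python
from typing import List, Sequence


def recursive_split(text: str, chunk_size: int,
                    separators: Sequence[str] = ("\n\n", "\n", " ")) -> List[str]:
    """Split text into chunks of at most chunk_size characters."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # No separators left: cut at fixed character offsets.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks: List[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep packing pieces into the current chunk
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # A single piece is still too long: recurse with finer separators.
            chunks.extend(recursive_split(piece, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks


transcript = (
    "Intro paragraph.\n\n"
    "Second paragraph is a bit longer than the first one.\n\n"
    "End."
)
for chunk in recursive_split(transcript, chunk_size=30):
    print(repr(chunk))
```

Splitting on paragraph and line boundaries before words tends to keep related sentences together, which generally helps retrieval quality.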
- "What is the main topic of this video?"
- "Can you summarize the key points discussed?"
- "Who are the people mentioned in the video?"
- "What are the important concepts explained?"
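Questions like those above are typically combined with the retrieved transcript chunks into a single prompt before being sent to the model. The sketch below shows that assembly step with plain string formatting. The template wording and the helper names `format_chunks` and `build_prompt` are assumptions for illustration; the app uses LangChain's `PromptTemplate`, and its actual prompt text is not shown in this README.

```python
from typing import List

# Hypothetical template; the real prompt used by the app may differ.
PROMPT = (
    "You are a helpful assistant.\n"
    "Answer ONLY from the transcript context below.\n"
    "If the context is insufficient, say you don't know.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}"
)


def format_chunks(chunks: List[str]) -> str:
    """Join non-empty retrieved chunks into one context block."""
    return "\n\n".join(chunk.strip() for chunk in chunks if chunk.strip())


def build_prompt(question: str, chunks: List[str]) -> str:
    """Fill the template with the retrieved context and the user question."""
    return PROMPT.format(context=format_chunks(chunks), question=question.strip())


retrieved = ["The video introduces vector search.", "  ", "It then demos FAISS."]
print(build_prompt("What is the main topic of this video?", retrieved))
```

Telling the model to answer only from the supplied context is a common way to reduce answers that are not grounded in the transcript.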
- Educational content
- Tutorials and how-to videos
- Lectures and presentations
- Interviews and discussions
- Any video with available transcripts
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
- Fork the repository
- Create a feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- LangChain for the RAG framework
- Streamlit for the web app framework
- Perplexity AI for the language model
- HuggingFace for embeddings models
- youtube-transcript-api for transcript extraction
- Requires videos to have available transcripts
- Transcript availability depends on YouTube's auto-generation or manual upload
- API rate limits may apply based on your Perplexity AI plan
- Processing time depends on video length and transcript size
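Because transcript availability varies per video, it can help to decide up front which transcript language to use and to fail gracefully when none fits. The sketch below picks the first available language from a preference list. The helper `pick_transcript_language` is hypothetical, and the preference order (English before Hindi) is an assumption; the README only states that both languages are supported.

```python
from typing import Dict, Iterable, Optional, Sequence


def pick_transcript_language(available: Iterable[str],
                             preferred: Sequence[str] = ("en", "hi")) -> Optional[str]:
    """Return the first preferred language code that is available, else None."""
    normalized: Dict[str, str] = {}
    for code in available:
        base = code.lower().split("-")[0]   # e.g. "en-US" -> "en"
        normalized.setdefault(base, code)   # remember the first matching variant
    for wanted in preferred:
        if wanted in normalized:
            return normalized[wanted]
    return None


print(pick_transcript_language(["hi", "en-US"]))  # en-US
print(pick_transcript_language(["fr"]))           # None
```

A `None` result can be turned into a clear message in the UI, for example that the video has no English or Hindi transcript.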
- Support for multiple video URLs
- Transcript translation capabilities
- Chat history and conversation context
- Video timestamp references in answers
- Export Q&A sessions
- Support for additional language models
For questions or suggestions, please open an issue on GitHub or contact harshvirani.91@gmail.com.
Made with ❤️ and AI