
🎥 YouTube RAG Q&A App - Learn LangChain through Practice


HarshVirani914/youtube-rag-app


🎥 YouTube RAG Question Answering App

A Streamlit web application that lets users ask questions about YouTube videos using Retrieval-Augmented Generation (RAG). The app fetches a video's transcript, embeds it into a vector store, and uses a large language model (via the Perplexity API) to answer questions grounded in the video's content.

Main Purpose: This project was created to understand and utilize different LangChain components in a practical, real-world application. It demonstrates the integration of various LangChain modules including text splitters, vector stores, embeddings, retrievers, prompts, and chains to build a complete RAG system.


✨ Features

  • 🎯 RAG-powered Q&A: Ask natural-language questions about any YouTube video that has a transcript
  • 🔍 Semantic Search: Retrieves the most relevant transcript chunks by vector-embedding similarity
  • 🧠 Context-aware Answers: Answers are generated from the retrieved video transcript
  • 🌐 Multi-language Support: Works with videos that have English or Hindi transcripts
  • 🎨 Clean UI: Simple, intuitive Streamlit interface
  • ⚡ Fast Processing: Efficient similarity search over a FAISS vector index
  • 🔗 LangChain Integration: Demonstrates practical usage of multiple LangChain components
  • 📚 Educational: A hands-on example of building a RAG system with LangChain

🚀 Demo

  1. Enter a YouTube URL
  2. Click "Process Video" to fetch transcript and create embeddings
  3. Ask any question about the video content
  4. Get AI-powered answers based on the actual transcript
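Step 1 implies that the pasted URL has to be reduced to a video ID before a transcript can be requested, since youtube-transcript-api looks transcripts up by video ID. A minimal sketch of that step using only the standard library follows; `extract_video_id` is a hypothetical helper (not necessarily what `transcript_fetcher.py` does), and the set of URL shapes it accepts is an assumption.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str) -> Optional[str]:
    """Hypothetical helper: return the video ID from common YouTube URL forms, else None."""
    parsed = urlparse(url.strip())
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]

    if host == "youtu.be":
        # Short links carry the ID as the first path segment: youtu.be/<id>
        video_id = parsed.path.lstrip("/").split("/")[0]
    elif host in ("youtube.com", "m.youtube.com"):
        if parsed.path == "/watch":
            # Standard links carry the ID in the ?v= query parameter
            video_id = parse_qs(parsed.query).get("v", [""])[0]
        elif parsed.path.startswith(("/embed/", "/shorts/", "/live/")):
            # Embed, Shorts, and live links carry the ID as the second path segment
            video_id = parsed.path.split("/")[2]
        else:
            return None
    else:
        return None

    # Assumption: IDs are 11 characters drawn from letters, digits, '-' and '_'
    return video_id if re.fullmatch(r"[A-Za-z0-9_-]{11}", video_id) else None


print(extract_video_id("https://www.youtube.com/watch?v=dQw4w9WgXcQ&t=42s"))  # dQw4w9WgXcQ
print(extract_video_id("https://youtu.be/dQw4w9WgXcQ"))  # dQw4w9WgXcQ
print(extract_video_id("https://example.com/watch?v=dQw4w9WgXcQ"))  # None
```

Returning `None` instead of raising lets the Streamlit UI show a friendly "invalid URL" message before any network call is made.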

🛠️ Installation

Prerequisites

  • Python 3.8 or higher
  • Perplexity API key (get it from Perplexity AI)

Setup

  1. Clone the repository

    git clone https://github.com/HarshVirani914/youtube-rag-app.git
    cd youtube-rag-app
  2. Create a virtual environment

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies

    pip install -r requirements.txt
  4. Set up environment variables

    cp .env.template .env
    # Edit .env and add your PPLX_API_KEY
  5. Run the application

    streamlit run app.py

📝 Environment Variables

Create a .env file with the following variables:

# Required
PPLX_API_KEY=your_perplexity_api_key_here

# Optional
HUGGINGFACE_API_TOKEN=your_huggingface_token_here
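How `app.py` loads these variables isn't documented here. Many projects call python-dotenv's `load_dotenv()`; the sketch below instead uses a small standard-library parser so the behaviour is visible. `parse_env`, `load_env`, and `require_api_key` are hypothetical names, and treating a missing `PPLX_API_KEY` as a startup error is an assumption.

```python
import os
from pathlib import Path
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and '#' comments (hypothetical helper)."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes, e.g. KEY="value"
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env(path: str = ".env") -> None:
    """Copy .env entries into os.environ without overriding variables that are already set."""
    env_file = Path(path)
    if env_file.exists():
        for key, value in parse_env(env_file.read_text(encoding="utf-8")).items():
            os.environ.setdefault(key, value)


def require_api_key() -> str:
    """PPLX_API_KEY is required; HUGGINGFACE_API_TOKEN is optional and may be absent."""
    key = os.environ.get("PPLX_API_KEY", "")
    if not key:
        raise RuntimeError("PPLX_API_KEY is missing; add it to your .env file")
    return key
```

At startup the app would call `load_env()` and then `require_api_key()`, while reading the optional token with `os.environ.get("HUGGINGFACE_API_TOKEN")`, which returns `None` when it is not provided. Using `setdefault` mirrors python-dotenv's default of not overriding variables already present in the environment.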

🏗️ Project Structure

youtube-rag-app/
├── app.py                      # Main Streamlit application
├── src/
│   ├── __init__.py
│   └── utils/
│       ├── __init__.py
│       ├── transcript_fetcher.py   # YouTube transcript handling
│       ├── vector_store.py         # Text chunking and embeddings
│       └── rag_chain.py            # RAG chain implementation
├── requirements.txt            # Python dependencies
├── .env.template               # Environment variables template
├── .gitignore                  # Git ignore rules
└── README.md                   # This file

🔧 How It Works

This project demonstrates the practical implementation of various LangChain components:

  1. Transcript Fetching: Uses youtube-transcript-api to extract video transcripts
  2. Text Processing: Uses LangChain's RecursiveCharacterTextSplitter to split the transcript into smaller chunks for embedding
  3. Embeddings: Utilizes LangChain's HuggingFaceEmbeddings with sentence-transformers/all-mpnet-base-v2 model
  4. Vector Storage: Leverages LangChain's FAISS integration for efficient similarity search
  5. Retrieval: Creates a retriever using LangChain's vector store interface
  6. Prompt Engineering: Uses LangChain's PromptTemplate for structured prompt creation
  7. Chain Construction: Uses LangChain's RunnableParallel and RunnableLambda to compose the retrieval, prompt, and model steps into a single chain
  8. Question Answering: Integrates everything into a complete RAG pipeline using LangChain's chain paradigm
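The steps above can be condensed into a dependency-free sketch that shows how data flows through a RAG pipeline. It is deliberately not the app's code: `chunk_words` is a fixed-size word window standing in for RecursiveCharacterTextSplitter, `embed` is a term-count vector standing in for the all-mpnet-base-v2 embeddings, `retrieve` is a pure-Python cosine ranking standing in for the FAISS retriever, and the prompt wording is an assumption. The final call to the Perplexity-hosted LLM is omitted.

```python
import math
import re
from collections import Counter
from typing import List


def chunk_words(text: str, chunk_size: int = 40, overlap: int = 10) -> List[str]:
    """Simplified stand-in for RecursiveCharacterTextSplitter: overlapping word windows."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("need 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):  # last window already reaches the end
            break
    return chunks


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase term counts (the app uses dense sentence-transformer vectors)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: List[str], k: int = 2) -> List[str]:
    """Return the k chunks most similar to the question (the app delegates this to FAISS)."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


# Assumed prompt wording; the real template lives in the app's PromptTemplate
PROMPT = (
    "Answer ONLY from the provided transcript context.\n"
    "If the context is insufficient, say you don't know.\n\n"
    "Context:\n{context}\n\nQuestion: {question}"
)


def build_prompt(question: str, chunks: List[str], k: int = 2) -> str:
    """Retrieve context for the question and fill the prompt template."""
    context = "\n\n".join(retrieve(question, chunks, k))
    return PROMPT.format(context=context, question=question)


transcript = (
    "In this video we build a RAG app. The transcript is split into chunks, "
    "each chunk is embedded, and the chunks most similar to the question are "
    "retrieved and passed to the language model as context."
)
chunks = chunk_words(transcript, chunk_size=12, overlap=3)
print(build_prompt("How are chunks retrieved?", chunks))
```

In LangChain terms this roughly corresponds to a `RunnableParallel` that produces `context` (retriever output formatted by a `RunnableLambda`) and `question`, piped into the `PromptTemplate` and then the LLM. That wiring is a common pattern and is assumed here, not confirmed from `rag_chain.py`.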

🎯 Usage Examples

Example Questions:

  • "What is the main topic of this video?"
  • "Can you summarize the key points discussed?"
  • "Who are the people mentioned in the video?"
  • "What are the important concepts explained?"

Supported Video Types:

  • Educational content
  • Tutorials and how-to videos
  • Lectures and presentations
  • Interviews and discussions
  • Any video with available transcripts

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

Development Setup

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

⚠️ Limitations

  • Requires videos to have available transcripts
  • Transcript availability depends on YouTube's auto-generation or manual upload
  • API rate limits may apply based on your Perplexity AI plan
  • Processing time depends on video length and transcript size

🔮 Future Enhancements

  • Support for multiple video URLs
  • Transcript translation capabilities
  • Chat history and conversation context
  • Video timestamp references in answers
  • Export Q&A sessions
  • Support for additional language models

📧 Contact

For questions or suggestions, please open an issue on GitHub or contact harshvirani.91@gmail.com.


Made with ❤️ and AI
