This application serves as a template for a completely local agentic Retrieval-Augmented Generation (RAG) setup with a clean web interface. It integrates vectorization capabilities using PhiData to organize and retrieve knowledge efficiently from various data sources. The application uses OpenHermes for retrieval and context refinement and Llama 3.2 for response generation, providing a robust framework for knowledge-driven conversational AI.
The application supports the following data sources:
- Local folders
- Drag and drop files
- AWS S3
- Google Drive
- Confluence
## Table of Contents

- Prerequisites
- Installation
- Configuration
- Usage
- Model Workflow
- Vectorization and Knowledge Base
- Customization
- License
- Additional Resources
- Contributing
## Prerequisites

Before installing and running the application, ensure you have the following installed on your system:
- Git
- Docker
- Docker Compose
- Ollama
- Required models, pulled via Ollama:
  - Llama 3.2:

    ```bash
    ollama pull llama3.2
    ```

  - OpenHermes:

    ```bash
    ollama pull openhermes
    ```
## Installation

Follow these steps to install and run the application:

1. Clone the repository to your local machine:

   ```bash
   git clone https://github.com/JuliusPinsker/RAGForge.git
   ```

2. Change your current directory to the cloned repository:

   ```bash
   cd RAGForge
   ```

3. Use Docker Compose to build the Docker images and start the services in detached mode:

   ```bash
   docker compose up --build -d
   ```

   This command will:

   - Clone the PhiData repository.
   - Build the application image using the provided `Dockerfile`.
   - Start the PostgreSQL database and the application services as defined in `docker-compose.yml`.
## Configuration

The embedding model used in this application can be specified in the `app.py` file within the `create_knowledge_base` function. The default embedding model is `llama3.2`, but the retrieval and response generation process leverages both `openhermes` and `llama3.2`:

- OpenHermes handles retrieval and context refinement from the knowledge base.
- Llama 3.2 generates the final responses.

Ensure that both models are pulled using Ollama before running the application.
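For orientation, here is a minimal sketch of what `create_knowledge_base` could look like, assuming PhiData's `OllamaEmbedder`, `PgVector`, and `AgentKnowledge` classes (import paths and signatures vary between PhiData releases; this is a hypothetical reconstruction, not the app's verbatim source):

```python
# Hypothetical sketch of create_knowledge_base in app.py; import paths
# follow recent phidata releases and may differ in other versions.
from phi.embedder.ollama import OllamaEmbedder
from phi.knowledge.agent import AgentKnowledge
from phi.vectordb.pgvector import PgVector

def create_knowledge_base(db_url: str, table_name: str = "embeddings") -> AgentKnowledge:
    vector_db = PgVector(
        db_url=db_url,            # the PostgreSQL service from docker-compose.yml
        table_name=table_name,
        embedder=OllamaEmbedder(model="llama3.2"),  # change the model here to swap embedders
    )
    return AgentKnowledge(vector_db=vector_db)
```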
## Usage

Once the application is running, you can access the web interface at http://localhost:8501.

### Loading Data

- File Browsing: Browse and select files from local directories to load into the knowledge base.
- Drag and Drop: Upload files directly through the web interface (see the sketch below).
- External Connections: Connect and load files from AWS S3, Google Drive, and Confluence.
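Port 8501 is Streamlit's default, so the front end is presumably Streamlit; under that assumption, a drag-and-drop uploader can be as small as this sketch:

```python
# Minimal drag-and-drop uploader sketch, assuming a Streamlit front end.
import streamlit as st

uploaded = st.file_uploader(
    "Drop files here",
    type=["pdf", "txt", "md"],   # the app's supported formats
    accept_multiple_files=True,
)
for f in uploaded or []:
    st.write(f"Queued for vectorization: {f.name}")
```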
### Supported File Types

- PDF (`.pdf`)
- Text files (`.txt`)
- Markdown files (`.md`)
### Chatting with the Agent

- Interact with the Agent: Ask questions and receive responses based on the knowledge base.
- Source Inclusion: Responses include sources for better traceability.
- Chat History: View previous interactions for context (see the sketch below).
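Assuming the same Streamlit front end, a chat loop that keeps history for context might look like the following sketch (the widget structure and state keys are hypothetical, not the app's actual code):

```python
# Hypothetical chat loop with history, using Streamlit's chat primitives.
import streamlit as st

if "history" not in st.session_state:
    st.session_state.history = []            # prior (role, text) turns

for role, text in st.session_state.history:
    st.chat_message(role).write(text)        # replay earlier interactions

if question := st.chat_input("Ask the agent..."):
    answer_text = "..."                      # produced by the OpenHermes + Llama 3.2 pipeline
    st.session_state.history += [("user", question), ("assistant", answer_text)]
```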
## Model Workflow

This application employs OpenHermes and Llama 3.2 in a collaborative workflow to enhance retrieval-augmented generation.

### OpenHermes: Retrieval and Context Refinement

- Task: Fetches relevant documents from the knowledge base.
- Functionality:
  - Performs similarity searches against the vector database.
  - Summarizes and refines retrieved content to provide the most relevant context for queries.
- Specialty: Ensures precise and domain-relevant information is selected for response generation.
### Llama 3.2: Response Generation

- Task: Generates conversational responses based on the context retrieved by OpenHermes.
- Functionality:
  - Processes retrieved context into natural language answers.
  - Ensures the responses are coherent, detailed, and user-friendly.
  - Incorporates sources into responses for traceability.
### Workflow Steps

1. Query Interpretation: The user's query is passed to OpenHermes.
2. Document Retrieval: OpenHermes retrieves and refines relevant documents from the knowledge base.
3. Response Generation: Llama 3.2 generates the final response based on the retrieved context.
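Under the hood this amounts to two sequential model calls. A minimal sketch of the flow using the `ollama` Python client (the prompts, function name, and direct client usage are illustrative assumptions, not the app's verbatim implementation):

```python
# Illustrative two-stage RAG flow; prompts and structure are assumptions.
import ollama

def answer(query: str, documents: list[str]) -> str:
    # Stage 1: OpenHermes condenses retrieved documents into focused context.
    refined = ollama.chat(
        model="openhermes",
        messages=[{
            "role": "user",
            "content": "Summarize the passages below as context for the question.\n"
                       f"Question: {query}\n\n" + "\n---\n".join(documents),
        }],
    )
    context = refined["message"]["content"]

    # Stage 2: Llama 3.2 writes the final answer from the refined context.
    final = ollama.chat(
        model="llama3.2",
        messages=[{
            "role": "user",
            "content": f"Context:\n{context}\n\nQuestion: {query}",
        }],
    )
    return final["message"]["content"]
```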
## Vectorization and Knowledge Base

This application employs vectorization to represent documents and text files as high-dimensional numerical embeddings using PhiData. These embeddings allow efficient semantic search and retrieval of relevant documents based on user queries.
- Vector Database:
  - Utilizes `pgvector`, a PostgreSQL extension for storing and querying vector embeddings.
  - Stores all document embeddings in a structured format for fast retrieval.
- Embedding Model:
  - Embeddings are generated using the `OllamaEmbedder` class from PhiData, which leverages state-of-the-art models like `llama3.2` and `openhermes`.
- Knowledge Base Integration:
  - The application uses `AgentKnowledge`, a knowledge base class that connects to the vector database (see the sketch below).
  - Handles the organization of embeddings, retrieval of documents, and inclusion of relevant sources in the responses.
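As a hypothetical illustration, querying the knowledge base directly could look like this, reusing the `create_knowledge_base` sketch from the Configuration section (the `search` call follows PhiData's `AgentKnowledge` API; the connection URL and query are placeholders):

```python
# Hypothetical direct query against the knowledge base; the db_url
# placeholder should match the PostgreSQL service in docker-compose.yml.
knowledge_base = create_knowledge_base(db_url="postgresql+psycopg://ai:ai@localhost:5532/ai")

documents = knowledge_base.search(query="How is the embedder configured?", num_documents=5)
for doc in documents:
    print(doc.name, doc.content[:80])   # each hit keeps a reference to its source
```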
## Customization

You can customize various settings through the web interface:

### Knowledge Base Settings

- Database URL: Specify the database URL for the knowledge base.
- Embeddings Table Name: Define the table name where embeddings are stored.
- Reload Knowledge Base: Reload the knowledge base if changes are made.
### AWS S3

- Credentials: Provide your AWS Access Key ID and Secret Access Key.
- Bucket Name: Specify the S3 bucket name to load files from (see the sketch below).
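Under the hood, S3 loading boils down to listing and downloading objects; a sketch with `boto3` (bucket name and credentials are placeholders, and the app's actual loader may differ):

```python
# Illustrative S3 loader using boto3; all identifiers are placeholders.
import boto3

s3 = boto3.client(
    "s3",
    aws_access_key_id="YOUR_ACCESS_KEY_ID",
    aws_secret_access_key="YOUR_SECRET_ACCESS_KEY",
)
for obj in s3.list_objects_v2(Bucket="your-bucket-name").get("Contents", []):
    key = obj["Key"]
    if key.endswith((".pdf", ".txt", ".md")):     # supported formats only
        s3.download_file("your-bucket-name", key, f"/tmp/{key.rsplit('/', 1)[-1]}")
```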
### Google Drive

- Credentials JSON: Paste your Google Drive service account credentials in JSON format.
- File Loading: Load supported files directly from your Google Drive (see the sketch below).
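A sketch of what service-account access to Drive looks like with `google-api-python-client` (the credential handling and listing query here are illustrative, not the app's verbatim code):

```python
# Illustrative Drive listing with a service account; requires
# google-api-python-client and google-auth.
import json
from google.oauth2 import service_account
from googleapiclient.discovery import build

credentials_json = open("service_account.json").read()  # or the JSON pasted in the UI
creds = service_account.Credentials.from_service_account_info(
    json.loads(credentials_json),
    scopes=["https://www.googleapis.com/auth/drive.readonly"],
)
drive = build("drive", "v3", credentials=creds)
files = drive.files().list(fields="files(id, name, mimeType)").execute()["files"]
```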
### Confluence

- URL: Enter your Confluence instance URL.
- Username and API Token: Provide your Confluence username and API token.
- Content Loading: Load attachments from Confluence pages (see the sketch below).
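Attachment loading maps naturally onto the `atlassian-python-api` package; a sketch under that assumption (URL, page ID, and credentials are placeholders):

```python
# Illustrative Confluence attachment listing; all values are placeholders.
from atlassian import Confluence

confluence = Confluence(
    url="https://your-domain.atlassian.net",
    username="you@example.com",
    password="YOUR_API_TOKEN",   # the API token goes in the password field
)
attachments = confluence.get_attachments_from_content(page_id="123456")
for att in attachments.get("results", []):
    print(att["title"])
```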
## License

This project is licensed under the Mozilla Public License Version 2.0. See the LICENSE file for details.
## Additional Resources

To learn more about configuring the embedding model and other advanced features, please visit the PhiData documentation on Ollama Embedder.
