
A production-ready, enterprise-grade Agentic RAG ingestion pipeline built with n8n, Supabase (pgvector), and AI embeddings. Implements event-driven orchestration, hybrid RAG for structured and unstructured data, vector similarity search, and multi-tenant architecture to deliver client-isolated, retrieval-ready knowledge bases.


anshwysmcbel2710/agentic-rag-n8n-ingestion-pipeline


๐Ÿท๏ธ Project Title

AGENTIC RAG N8N PIPELINE

A multi-tenant, production-grade Agentic Retrieval-Augmented Generation (RAG) ingestion pipeline built using n8n, Supabase (PostgreSQL + pgvector), and AI embeddings.


🧾 Executive Summary

This project delivers an end-to-end automated document intelligence pipeline that continuously monitors client-specific Google Drive folders, processes documents across multiple formats, converts them into semantic embeddings, and stores them in isolated vector databases for AI-powered retrieval.

The system is designed with enterprise principles:

  • Strict client-level data isolation
  • Event-driven ingestion and reprocessing
  • Hybrid support for unstructured and structured data
  • Scalable, auditable, and production-ready architecture

📑 Table of Contents

  1. 🧩 Project Overview
  2. 🎯 Objectives & Goals
  3. ✅ Acceptance Criteria
  4. 💻 Prerequisites
  5. ⚙️ Installation & Setup
  6. 🔗 API Documentation
  7. 🖥️ UI / Frontend
  8. 🔢 Status Codes
  9. 🚀 Features
  10. 🧱 Tech Stack & Architecture
  11. 🛠️ Workflow & Implementation
  12. 🧪 Testing & Validation
  13. 🔍 Validation Summary
  14. 🧰 Verification Testing Tools & Command Examples
  15. 🧯 Troubleshooting & Debugging
  16. 🔒 Security & Secrets
  17. ☁️ Deployment
  18. ⚡ Quick-Start Cheat Sheet
  19. 🧾 Usage Notes
  20. 🧠 Performance & Optimization
  21. 🌟 Enhancements & Features
  22. 🧩 Maintenance & Future Work
  23. 🏆 Key Achievements
  24. 🧮 High-Level Architecture
  25. 🗂️ Project Structure
  26. 💡 Summary, Closure & Compliance

🧩 Project Overview

The Agentic RAG n8n Pipeline functions as a backend intelligence system that transforms raw enterprise documents into a searchable, AI-ready knowledge base.

It is fully automated and designed to operate continuously with minimal operational overhead.


🎯 Objectives & Goals

  • Automate document ingestion without human intervention
  • Enable semantic search and RAG-based querying
  • Maintain strong tenant isolation per client
  • Support frequent document updates and re-indexing
  • Ensure data traceability and auditability

✅ Acceptance Criteria

  • New files trigger ingestion automatically
  • Updated files reprocess without duplication
  • Vectors are stored only in client-specific tables
  • Semantic queries return relevant results
  • No cross-client data exposure is possible

💻 Prerequisites

  • n8n (self-hosted or cloud)
  • Supabase project with pgvector enabled
  • Google Drive API credentials
  • OpenAI embedding API access
  • PostgreSQL 14 or newer
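The Supabase-side prerequisites can be confirmed from the SQL editor before any client onboarding; the commands below are standard PostgreSQL, not project-specific code:

```sql
-- Enable the pgvector extension (required before any vector(...) columns
-- can be created).
CREATE EXTENSION IF NOT EXISTS vector;

-- Confirm the server meets the PostgreSQL 14+ requirement.
SHOW server_version;
```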

โš™๏ธ Installation & Setup

  1. Clone the repository
  2. Configure environment variables using .env.example
  3. Deploy Supabase and enable pgvector
  4. Execute schema.sql and match-function.sql per client
  5. Import the n8n workflow JSON
  6. Configure Google Drive folder paths

🔗 API Documentation

  • Semantic search is exposed via Supabase RPC
  • Function name pattern: match_[client]_documents
  • Inputs: query embedding, match count, metadata filter
  • Outputs: ranked content chunks with similarity scores
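Assuming a client named "acme" and the inputs listed above, a call from the Supabase SQL editor might look like this (argument order and names are illustrative; the authoritative signature is in database/match-function.sql):

```sql
-- Illustrative call for a hypothetical client "acme".
-- The query embedding must be a 1536-dimension vector produced by the
-- same OpenAI embedding model used at ingestion time.
SELECT id, content, metadata, similarity
FROM match_acme_documents(
    '[0.012, -0.034, 0.056, ...]'::vector(1536),  -- truncated for readability
    5,                                            -- match count
    '{"source": "contracts"}'::jsonb              -- metadata filter
);
```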

๐Ÿ–ฅ๏ธ UI / Frontend

This repository intentionally excludes a frontend.

  • Designed as a backend intelligence layer
  • Consumable by web apps, chatbots, or internal tools
  • UI teams interact via Supabase APIs or custom services

🔢 Status Codes

  • 200 – Successful request
  • 400 – Invalid input
  • 401 – Unauthorized access
  • 500 – Internal system error

🚀 Features

The AGENTIC-RAG-N8N-PIPELINE provides a comprehensive, enterprise-grade feature set designed for scalable, multi-tenant AI document intelligence systems.

Core Functional Features

  • Automated Document Ingestion
    • Event-driven ingestion triggered by Google Drive file creation and updates
    • Zero manual intervention after initial configuration
  • Multi-Format Document Support
    • PDF, Google Docs, Google Sheets
    • Excel (XLSX), CSV, Plain Text
  • Hybrid Data Handling
    • Unstructured text processed for semantic understanding
    • Structured tabular data preserved row-by-row for analytical queries
  • Client-Specific Isolation
    • Dedicated tables and vector stores per client
    • No cross-tenant access at any stage

AI & Retrieval Features

  • High-quality semantic embeddings generation
  • Cosine similarity-based vector search
  • Metadata-filtered retrieval for scoped RAG responses
  • Re-indexing and cleanup on document updates
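Under the hood, pgvector's cosine distance operator <=> drives the similarity ranking; a match function of the kind described here typically reduces to a query like the following (table name hypothetical):

```sql
-- Cosine similarity = 1 - cosine distance (<=> is pgvector's cosine
-- distance operator).
SELECT content,
       1 - (embedding <=> $1) AS similarity
FROM acme_documents
WHERE metadata @> $2           -- optional metadata filter (jsonb containment)
ORDER BY embedding <=> $1      -- nearest neighbours first
LIMIT $3;
```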

🧱 Tech Stack & Architecture

Technology Stack

| Layer | Technology | Purpose |
| --- | --- | --- |
| Workflow Orchestration | n8n | Event-driven automation and control flow |
| Storage | Supabase (PostgreSQL) | Metadata, rows, and vector persistence |
| Vector Engine | pgvector | Semantic similarity search |
| AI Embeddings | OpenAI | Text-to-vector transformation |
| Source System | Google Drive | Document repository |

ASCII Component Architecture

+--------------------+
|  Google Drive      |
|  (Client Folder)   |
+---------+----------+
          |
          v
+--------------------+
| n8n Workflow       |
| (Event Triggers)   |
+---------+----------+
          |
          v
+----------------------------+
| Document Processing Layer  |
| - Extract Text             |
| - Extract Tables           |
+---------+------------------+
          |
          v
+----------------------------+
| Embedding & Chunking       |
| - Text Splitter            |
| - Embedding Generator      |
+---------+------------------+
          |
          v
+----------------------------+
| Supabase (pgvector)        |
| - Client Vector Tables     |
| - Metadata & Rows          |
+----------------------------+

๐Ÿ› ๏ธ Workflow & Implementation

End-to-End Implementation Flow

  1. Client Initialization
    • Create client-specific database tables
    • Create semantic match function
  2. Drive Monitoring
    • File Created and File Updated triggers
    • Loop over affected files
  3. Cleanup Logic
    • Delete old document rows
    • Delete old vectors for reprocessing
  4. Document Download & Routing
    • Download file securely
    • Route by MIME type
  5. Extraction & Aggregation
    • Extract text and tabular data
    • Aggregate and summarize content
  6. Embedding & Storage
    • Generate embeddings
    • Insert into client vector table

🧪 Testing & Validation

| ID | Area | Command / Action | Expected Output | Explanation |
| --- | --- | --- | --- | --- |
| T01 | Ingestion | Upload document | Vectors created | Validates full pipeline |
| T02 | Update | Edit document | Old vectors replaced | Ensures idempotency |

๐Ÿ” Validation Summary

  • All workflow phases executed successfully
  • No duplicate vectors detected
  • Semantic relevance verified

🧰 Verification Testing Tools & Command Examples

  • n8n execution logs
  • Supabase SQL editor
  • REST clients for RPC testing

🧯 Troubleshooting & Debugging

Common Issues & Resolutions

| Issue | Root Cause | Resolution |
| --- | --- | --- |
| No ingestion triggered | Incorrect Drive folder path | Verify trigger configuration and permissions |
| Embeddings not stored | Dimension mismatch | Ensure the embedding model outputs 1536-dimensional vectors |
| Duplicate vectors | Cleanup logic skipped | Verify delete nodes execute before insert |
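For the dimension-mismatch case, pgvector's vector_dims() function makes the check quick in the SQL editor (table name hypothetical):

```sql
-- The stored dimension must match the embedding model's output
-- (1536 for the OpenAI models this pipeline assumes).
SELECT vector_dims(embedding) AS stored_dims
FROM acme_documents
LIMIT 1;
```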

Debugging Strategy

  • Inspect n8n execution logs per node
  • Validate Supabase tables after each phase
  • Test match function manually via SQL editor

🔒 Security & Secrets

  • Secrets managed via environment variables
  • No credentials committed to repository
  • Client isolation enforced by schema design

โ˜๏ธ Deployment

The pipeline backend runs independently of frontend deployment.

  • n8n deployed via Docker, VM, or managed cloud
  • Supabase hosted or self-managed
  • Frontend applications deployed on Vercel consume APIs

Deployment Flow

  1. Deploy Supabase and enable pgvector
  2. Deploy n8n and configure environment variables
  3. Import workflow JSON
  4. Run client onboarding SQL

⚡ Quick-Start Cheat Sheet

  • Create client schema and match function
  • Configure Google Drive folder
  • Upload a document
  • Wait for ingestion completion
  • Query semantic search via RPC

🧾 Usage Notes

  • Use one Drive folder per client
  • Normalize client identifiers before onboarding
  • Re-uploading files safely triggers re-indexing

🧠 Performance & Optimization

  • ivfflat index tuning for large datasets
  • Optimal chunk sizing for retrieval accuracy
  • Batch processing control in n8n
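The ivfflat tuning mentioned above follows pgvector's documented index syntax; lists is the main knob, with rows/1000 a commonly cited starting point for large tables (table name hypothetical):

```sql
-- Approximate-nearest-neighbour index for cosine search.
-- Build after the table has data; tune `lists` to the dataset size.
CREATE INDEX ON acme_documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
```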

🌟 Enhancements & Features

  • Multi-language embedding support
  • Streaming ingestion pipelines
  • Advanced ranking and reranking logic

🧩 Maintenance & Future Work

  • Introduce Row-Level Security (RLS)
  • Add ingestion health monitoring
  • Support additional storage providers
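The proposed RLS hardening would add row-level enforcement on top of the existing table-level isolation; if client rows ever share a table, a sketch could look like this (table name, policy name, and setting key are all hypothetical):

```sql
-- Sketch only: with RLS enabled, rows are hidden by default; the policy
-- then exposes only rows whose client_id matches a per-connection setting.
ALTER TABLE documents ENABLE ROW LEVEL SECURITY;

CREATE POLICY client_isolation ON documents
USING (client_id = current_setting('app.current_client', true));
```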

๐Ÿ† Key Achievements

  • Fully automated Agentic RAG ingestion system
  • Enterprise-grade multi-tenant isolation
  • Scalable and extensible architecture

🧮 High-Level Architecture

[User / App]
     |
     v
[Semantic Query]
     |
     v
[Supabase RPC]
     |
     v
[Vector Search]
     |
     v
[Relevant Chunks]
     |
     v
[LLM / RAG Layer]

๐Ÿ—‚๏ธ Project Structure

AGENTIC-RAG-N8N-PIPELINE/
├── architecture/
│   ├── phase-1-client-db-creation.md
│   ├── phase-2-drive-monitoring.md
│   ├── phase-3-document-processing.md
│   ├── phase-4-data-aggregation.md
│   ├── phase-5-embedding-processing.md
│   └── phase-6-vector-storage.md
├── database/
│   ├── schema.sql
│   └── match-function.sql
├── screenshots/
│   ├── connections/
│   │   ├── 01-google-drive-credentials-and-permissions.png
│   │   ├── 02-supabase-database-connection-settings.png
│   │   ├── 03-postgres-node-configuration.png
│   │   ├── 04-openai-embeddings-node-settings.png
│   │   └── 05-n8n-global-and-workflow-settings.png
│   ├── phase-1-client-db-creation/
│   │   ├── overview.png
│   │   └── workflow-nodes.png
│   ├── phase-2-drive-monitoring/
│   │   ├── cleanup-logic.png
│   │   └── triggers.png
│   ├── phase-3-document-processing/
│   │   ├── extraction-nodes.png
│   │   └── switch-routing.png
│   ├── phase-4-data-aggregation/
│   │   ├── aggregation-flow.png
│   │   └── schema-handling.png
│   ├── phase-5-embedding-processing/
│   │   ├── chunking.png
│   │   └── embeddings.png
│   └── phase-6-vector-storage/
│       └── supabase-insert.png
├── workflows/
│   └── agentic-rag-ingestion.n8n.json
├── .env.example
├── .gitignore
└── README.html

💡 Summary, Closure & Compliance

This project delivers a robust, compliant, and enterprise-ready Agentic RAG ingestion platform. The architecture ensures scalability, security, and correctness while aligning with modern AI, data engineering, and cloud best practices.

It is suitable for production deployment, client demonstrations, and long-term extension as an organizational knowledge intelligence backbone.

End of README
