
A production-ready, enterprise-grade Agentic RAG ingestion pipeline built with n8n, Supabase (pgvector), and AI embeddings. Implements event-driven orchestration, hybrid RAG for structured and unstructured data, vector similarity search, and multi-tenant architecture to deliver client-isolated, retrieval-ready knowledge bases.


anshwysmcbel2710/agentic-rag-n8n-ingestion-pipeline


๐Ÿท๏ธ Project Title

AGENTIC RAG N8N PIPELINE

A multi-tenant, production-grade Agentic Retrieval-Augmented Generation (RAG) ingestion pipeline built using n8n, Supabase (PostgreSQL + pgvector), and AI embeddings.


🧾 Executive Summary

This project delivers an end-to-end automated document intelligence pipeline that continuously monitors client-specific Google Drive folders, processes documents across multiple formats, converts them into semantic embeddings, and stores them in isolated vector databases for AI-powered retrieval.

The system is designed with enterprise principles:

  • Strict client-level data isolation
  • Event-driven ingestion and reprocessing
  • Hybrid support for unstructured and structured data
  • Scalable, auditable, and production-ready architecture

📑 Table of Contents

  1. 🧩 Project Overview
  2. 🎯 Objectives & Goals
  3. ✅ Acceptance Criteria
  4. 💻 Prerequisites
  5. ⚙️ Installation & Setup
  6. 🔗 API Documentation
  7. 🖥️ UI / Frontend
  8. 🔢 Status Codes
  9. 🚀 Features
  10. 🧱 Tech Stack & Architecture
  11. 🛠️ Workflow & Implementation
  12. 🧪 Testing & Validation
  13. 🔍 Validation Summary
  14. 🧰 Verification Testing Tools & Command Examples
  15. 🧯 Troubleshooting & Debugging
  16. 🔒 Security & Secrets
  17. ☁️ Deployment
  18. ⚡ Quick-Start Cheat Sheet
  19. 🧾 Usage Notes
  20. 🧠 Performance & Optimization
  21. 🌟 Enhancements & Features
  22. 🧩 Maintenance & Future Work
  23. 🏆 Key Achievements
  24. 🧮 High-Level Architecture
  25. 🗂️ Project Structure
  26. 💡 Summary, Closure & Compliance

🧩 Project Overview

The Agentic RAG n8n Pipeline functions as a backend intelligence system that transforms raw enterprise documents into a searchable, AI-ready knowledge base.

It is fully automated and designed to operate continuously with minimal operational overhead.


🎯 Objectives & Goals

  • Automate document ingestion without human intervention
  • Enable semantic search and RAG-based querying
  • Maintain strong tenant isolation per client
  • Support frequent document updates and re-indexing
  • Ensure data traceability and auditability

✅ Acceptance Criteria

  • New files trigger ingestion automatically
  • Updated files reprocess without duplication
  • Vectors are stored only in client-specific tables
  • Semantic queries return relevant results
  • No cross-client data exposure is possible

💻 Prerequisites

  • n8n (self-hosted or cloud)
  • Supabase project with pgvector enabled
  • Google Drive API credentials
  • OpenAI embedding API access
  • PostgreSQL 14 or newer
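The Supabase-side prerequisites can be confirmed from the SQL editor before any client onboarding; the commands below are standard PostgreSQL, not project-specific code:

```sql
-- Enable the pgvector extension (required before any vector(...) columns
-- can be created).
CREATE EXTENSION IF NOT EXISTS vector;

-- Confirm the server meets the PostgreSQL 14+ requirement.
SHOW server_version;
```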

โš™๏ธ Installation & Setup

  1. Clone the repository
  2. Configure environment variables using .env.example
  3. Deploy Supabase and enable pgvector
  4. Execute schema.sql and match-function.sql per client
  5. Import the n8n workflow JSON
  6. Configure Google Drive folder paths

🔗 API Documentation

  • Semantic search is exposed via Supabase RPC
  • Function name pattern: match_[client]_documents
  • Inputs: query embedding, match count, metadata filter
  • Outputs: ranked content chunks with similarity scores
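Assuming a client named "acme" and the inputs listed above, a call from the Supabase SQL editor might look like this (argument order and names are illustrative; the authoritative signature is in database/match-function.sql):

```sql
-- Illustrative call for a hypothetical client "acme".
-- The query embedding must be a 1536-dimension vector produced by the
-- same OpenAI embedding model used at ingestion time.
SELECT id, content, metadata, similarity
FROM match_acme_documents(
    '[0.012, -0.034, 0.056, ...]'::vector(1536),  -- truncated for readability
    5,                                            -- match count
    '{"source": "contracts"}'::jsonb              -- metadata filter
);
```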

๐Ÿ–ฅ๏ธ UI / Frontend

This repository intentionally excludes a frontend.

  • Designed as a backend intelligence layer
  • Consumable by web apps, chatbots, or internal tools
  • UI teams interact via Supabase APIs or custom services

🔢 Status Codes

  • 200 – Successful request
  • 400 – Invalid input
  • 401 – Unauthorized access
  • 500 – Internal system error

🚀 Features

The AGENTIC-RAG-N8N-PIPELINE provides a comprehensive, enterprise-grade feature set designed for scalable, multi-tenant AI document intelligence systems.

Core Functional Features

  • Automated Document Ingestion
    • Event-driven ingestion triggered by Google Drive file creation and updates
    • Zero manual intervention after initial configuration
  • Multi-Format Document Support
    • PDF, Google Docs, Google Sheets
    • Excel (XLSX), CSV, Plain Text
  • Hybrid Data Handling
    • Unstructured text processed for semantic understanding
    • Structured tabular data preserved row-by-row for analytical queries
  • Client-Specific Isolation
    • Dedicated tables and vector stores per client
    • No cross-tenant access at any stage

AI & Retrieval Features

  • High-quality semantic embeddings generation
  • Cosine similarity-based vector search
  • Metadata-filtered retrieval for scoped RAG responses
  • Re-indexing and cleanup on document updates
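Under the hood, pgvector's cosine distance operator <=> drives the similarity ranking; a match function of the kind described here typically reduces to a query like the following (table name hypothetical):

```sql
-- Cosine similarity = 1 - cosine distance (<=> is pgvector's cosine
-- distance operator).
SELECT content,
       1 - (embedding <=> $1) AS similarity
FROM acme_documents
WHERE metadata @> $2           -- optional metadata filter (jsonb containment)
ORDER BY embedding <=> $1      -- nearest neighbours first
LIMIT $3;
```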

🧱 Tech Stack & Architecture

Technology Stack

| Layer | Technology | Purpose |
| --- | --- | --- |
| Workflow Orchestration | n8n | Event-driven automation and control flow |
| Storage | Supabase (PostgreSQL) | Metadata, rows, and vector persistence |
| Vector Engine | pgvector | Semantic similarity search |
| AI Embeddings | OpenAI | Text-to-vector transformation |
| Source System | Google Drive | Document repository |

ASCII Component Architecture

+--------------------+
|  Google Drive      |
|  (Client Folder)   |
+---------+----------+
          |
          v
+--------------------+
| n8n Workflow       |
| (Event Triggers)   |
+---------+----------+
          |
          v
+----------------------------+
| Document Processing Layer  |
| - Extract Text             |
| - Extract Tables           |
+---------+------------------+
          |
          v
+----------------------------+
| Embedding & Chunking       |
| - Text Splitter            |
| - Embedding Generator      |
+---------+------------------+
          |
          v
+----------------------------+
| Supabase (pgvector)        |
| - Client Vector Tables     |
| - Metadata & Rows          |
+----------------------------+

๐Ÿ› ๏ธ Workflow & Implementation

End-to-End Implementation Flow

  1. Client Initialization
    • Create client-specific database tables
    • Create semantic match function
  2. Drive Monitoring
    • File Created and File Updated triggers
    • Loop over affected files
  3. Cleanup Logic
    • Delete old document rows
    • Delete old vectors for reprocessing
  4. Document Download & Routing
    • Download file securely
    • Route by MIME type
  5. Extraction & Aggregation
    • Extract text and tabular data
    • Aggregate and summarize content
  6. Embedding & Storage
    • Generate embeddings
    • Insert into client vector table

🧪 Testing & Validation

| ID | Area | Command / Action | Expected Output | Explanation |
| --- | --- | --- | --- | --- |
| T01 | Ingestion | Upload document | Vectors created | Validates full pipeline |
| T02 | Update | Edit document | Old vectors replaced | Ensures idempotency |

๐Ÿ” Validation Summary

  • All workflow phases executed successfully
  • No duplicate vectors detected
  • Semantic relevance verified

🧰 Verification Testing Tools & Command Examples

  • n8n execution logs
  • Supabase SQL editor
  • REST clients for RPC testing

🧯 Troubleshooting & Debugging

Common Issues & Resolutions

| Issue | Root Cause | Resolution |
| --- | --- | --- |
| No ingestion triggered | Incorrect Drive folder path | Verify trigger configuration and permissions |
| Embeddings not stored | Dimension mismatch | Ensure the embedding model outputs 1536-dimensional vectors |
| Duplicate vectors | Cleanup logic skipped | Verify delete nodes execute before insert |
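For the dimension-mismatch case, pgvector's vector_dims() function makes the check quick in the SQL editor (table name hypothetical):

```sql
-- The stored dimension must match the embedding model's output
-- (1536 for the OpenAI models this pipeline assumes).
SELECT vector_dims(embedding) AS stored_dims
FROM acme_documents
LIMIT 1;
```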

Debugging Strategy

  • Inspect n8n execution logs per node
  • Validate Supabase tables after each phase
  • Test match function manually via SQL editor

🔒 Security & Secrets

  • Secrets managed via environment variables
  • No credentials committed to repository
  • Client isolation enforced by schema design

โ˜๏ธ Deployment

The pipeline backend runs independently of frontend deployment.

  • n8n deployed via Docker, VM, or managed cloud
  • Supabase hosted or self-managed
  • Frontend applications deployed on Vercel consume APIs

Deployment Flow

  1. Deploy Supabase and enable pgvector
  2. Deploy n8n and configure environment variables
  3. Import workflow JSON
  4. Run client onboarding SQL

⚡ Quick-Start Cheat Sheet

  • Create client schema and match function
  • Configure Google Drive folder
  • Upload a document
  • Wait for ingestion completion
  • Query semantic search via RPC

🧾 Usage Notes

  • Use one Drive folder per client
  • Normalize client identifiers before onboarding
  • Re-uploading files safely triggers re-indexing

🧠 Performance & Optimization

  • ivfflat index tuning for large datasets
  • Optimal chunk sizing for retrieval accuracy
  • Batch processing control in n8n
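The ivfflat tuning mentioned above follows pgvector's documented index syntax; lists is the main knob, with rows/1000 a commonly cited starting point for large tables (table name hypothetical):

```sql
-- Approximate-nearest-neighbour index for cosine search.
-- Build after the table has data; tune `lists` to the dataset size.
CREATE INDEX ON acme_documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
```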

🌟 Enhancements & Features

  • Multi-language embedding support
  • Streaming ingestion pipelines
  • Advanced ranking and reranking logic

🧩 Maintenance & Future Work

  • Introduce Row-Level Security (RLS)
  • Add ingestion health monitoring
  • Support additional storage providers
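The proposed RLS hardening would add row-level enforcement on top of the existing table-level isolation; if client rows ever share a table, a sketch could look like this (table name, policy name, and setting key are all hypothetical):

```sql
-- Sketch only: with RLS enabled, rows are hidden by default; the policy
-- then exposes only rows whose client_id matches a per-connection setting.
ALTER TABLE documents ENABLE ROW LEVEL SECURITY;

CREATE POLICY client_isolation ON documents
USING (client_id = current_setting('app.current_client', true));
```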

๐Ÿ† Key Achievements

  • Fully automated Agentic RAG ingestion system
  • Enterprise-grade multi-tenant isolation
  • Scalable and extensible architecture

🧮 High-Level Architecture

[User / App]
     |
     v
[Semantic Query]
     |
     v
[Supabase RPC]
     |
     v
[Vector Search]
     |
     v
[Relevant Chunks]
     |
     v
[LLM / RAG Layer]

๐Ÿ—‚๏ธ Project Structure

AGENTIC-RAG-N8N-PIPELINE/
├── architecture/
│   ├── phase-1-client-db-creation.md
│   ├── phase-2-drive-monitoring.md
│   ├── phase-3-document-processing.md
│   ├── phase-4-data-aggregation.md
│   ├── phase-5-embedding-processing.md
│   └── phase-6-vector-storage.md
├── database/
│   ├── schema.sql
│   └── match-function.sql
├── screenshots/
│   ├── connections/
│   │   ├── 01-google-drive-credentials-and-permissions.png
│   │   ├── 02-supabase-database-connection-settings.png
│   │   ├── 03-postgres-node-configuration.png
│   │   ├── 04-openai-embeddings-node-settings.png
│   │   └── 05-n8n-global-and-workflow-settings.png
│   ├── phase-1-client-db-creation/
│   │   ├── overview.png
│   │   └── workflow-nodes.png
│   ├── phase-2-drive-monitoring/
│   │   ├── cleanup-logic.png
│   │   └── triggers.png
│   ├── phase-3-document-processing/
│   │   ├── extraction-nodes.png
│   │   └── switch-routing.png
│   ├── phase-4-data-aggregation/
│   │   ├── aggregation-flow.png
│   │   └── schema-handling.png
│   ├── phase-5-embedding-processing/
│   │   ├── chunking.png
│   │   └── embeddings.png
│   └── phase-6-vector-storage/
│       └── supabase-insert.png
├── workflows/
│   └── agentic-rag-ingestion.n8n.json
├── .env.example
├── .gitignore
└── README.html

💡 Summary, Closure & Compliance

This project delivers a robust, compliant, and enterprise-ready Agentic RAG ingestion platform. The architecture ensures scalability, security, and correctness while aligning with modern AI, data engineering, and cloud best practices.

It is suitable for production deployment, client demonstrations, and long-term extension as an organizational knowledge intelligence backbone.

End of README
