# AGENTIC RAG N8N PIPELINE
A multi-tenant, production-grade Agentic Retrieval-Augmented Generation (RAG) ingestion pipeline built using n8n, Supabase (PostgreSQL + pgvector), and AI embeddings.
This project delivers an end-to-end automated document intelligence pipeline that continuously monitors client-specific Google Drive folders, processes documents across multiple formats, converts them into semantic embeddings, and stores them in isolated vector databases for AI-powered retrieval.
The system is designed with enterprise principles:
- Strict client-level data isolation
- Event-driven ingestion and reprocessing
- Hybrid support for unstructured and structured data
- Scalable, auditable, and production-ready architecture
## Table of Contents

- Project Overview
- Objectives & Goals
- Acceptance Criteria
- Prerequisites
- Installation & Setup
- API Documentation
- UI / Frontend
- Status Codes
- Features
- Tech Stack & Architecture
- Workflow & Implementation
- Testing & Validation
- Validation Summary
- Verification Testing Tools & Commands
- Troubleshooting & Debugging
- Security & Secrets
- Deployment (Vercel)
- Quick-Start Cheat Sheet
- Usage Notes
- Performance & Optimization
- Enhancements & Features
- Maintenance & Future Work
- Key Achievements
- High-Level Architecture
- Project Structure
- How to Demonstrate Live
- Summary, Closure & Compliance
## Project Overview

The Agentic RAG n8n Pipeline functions as a backend intelligence system that transforms raw enterprise documents into a searchable, AI-ready knowledge base.
It is fully automated and designed to operate continuously with minimal operational overhead.
## Objectives & Goals

- Automate document ingestion without human intervention
- Enable semantic search and RAG-based querying
- Maintain strong tenant isolation per client
- Support frequent document updates and re-indexing
- Ensure data traceability and auditability
## Acceptance Criteria

- New files trigger ingestion automatically
- Updated files reprocess without duplication
- Vectors are stored only in client-specific tables
- Semantic queries return relevant results
- No cross-client data exposure is possible
## Prerequisites

- n8n (self-hosted or cloud)
- Supabase project with pgvector enabled
- Google Drive API credentials
- OpenAI embedding API access
- PostgreSQL 14 or newer
## Installation & Setup

- Clone the repository
- Configure environment variables using `.env.example`
- Deploy Supabase and enable pgvector
- Execute `schema.sql` and `match-function.sql` per client (a schema sketch follows this list)
- Import the n8n workflow JSON
- Configure Google Drive folder paths
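The repository's `schema.sql` is the authoritative definition; the following is only a hedged sketch of the kind of per-client objects it provisions, assuming a hypothetical client identifier `acme` and 1536-dimension OpenAI embeddings (table and column names are illustrative).

```sql
-- Illustrative per-client provisioning for a hypothetical client "acme";
-- see the shipped schema.sql for the authoritative version.
create extension if not exists vector;

-- Vector store for semantic chunks
create table if not exists acme_documents (
  id        bigserial primary key,
  content   text,            -- chunked document text
  metadata  jsonb,           -- source file id, name, MIME type, etc.
  embedding vector(1536)     -- OpenAI embedding dimension
);

-- Row store for structured (tabular) data preserved row-by-row
create table if not exists acme_document_rows (
  id         bigserial primary key,
  dataset_id text,           -- reference to the source spreadsheet/CSV
  row_data   jsonb           -- one source row per record
);
```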
## API Documentation

- Semantic search is exposed via Supabase RPC
- Function name pattern: `match_[client]_documents`
- Inputs: query embedding, match count, metadata filter
- Outputs: ranked content chunks with similarity scores (see the example call below)
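A hedged example of calling the match function from the Supabase SQL editor (the same function is reachable over PostgREST at `/rest/v1/rpc/...`). The signature shown, `(query_embedding, match_count, filter)`, follows the common Supabase/pgvector convention and should be checked against the shipped `match-function.sql`; the client name `acme` and the returned `similarity` column are illustrative.

```sql
-- Example call for a hypothetical client "acme".
-- The vector literal is truncated for readability; supply a full
-- 1536-value embedding produced by the same model used at ingestion.
select id, content, metadata, similarity
from match_acme_documents(
  query_embedding := '[0.012, -0.034, ...]'::vector(1536),
  match_count     := 5,
  filter          := '{"file_id": "drive-file-123"}'::jsonb
);
```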
## UI / Frontend

This repository intentionally excludes a frontend.
- Designed as a backend intelligence layer
- Consumable by web apps, chatbots, or internal tools
- UI teams interact via Supabase APIs or custom services
## Status Codes

- 200: Successful request
- 400: Invalid input
- 401: Unauthorized access
- 500: Internal system error
## Features

The AGENTIC-RAG-N8N-PIPELINE provides a comprehensive, enterprise-grade feature set designed for scalable, multi-tenant AI document intelligence systems.

- Automated Document Ingestion
  - Event-driven ingestion triggered by Google Drive file creation and updates
  - Zero manual intervention after initial configuration
- Multi-Format Document Support
  - PDF, Google Docs, Google Sheets
  - Excel (XLSX), CSV, Plain Text
- Hybrid Data Handling
  - Unstructured text processed for semantic understanding
  - Structured tabular data preserved row-by-row for analytical queries
- Client-Specific Isolation
  - Dedicated tables and vector stores per client
  - No cross-tenant access at any stage
- High-quality semantic embedding generation
- Cosine similarity-based vector search (see the example query after this list)
- Metadata-filtered retrieval for scoped RAG responses
- Re-indexing and cleanup on document updates
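Retrieval rests on pgvector's cosine distance operator (`<=>`); a minimal illustration of a metadata-scoped similarity query against the hypothetical `acme_documents` table used in the earlier sketch (the vector literal is truncated for readability):

```sql
-- Cosine similarity = 1 - cosine distance; the jsonb containment operator
-- (@>) scopes results to chunks whose metadata matches the filter.
select
  content,
  metadata,
  1 - (embedding <=> '[0.012, -0.034, ...]'::vector(1536)) as similarity
from acme_documents
where metadata @> '{"mime_type": "application/pdf"}'::jsonb
order by embedding <=> '[0.012, -0.034, ...]'::vector(1536)
limit 5;
```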
## Tech Stack & Architecture

| Layer | Technology | Purpose |
|---|---|---|
| Workflow Orchestration | n8n | Event-driven automation and control flow |
| Storage | Supabase (PostgreSQL) | Metadata, rows, and vector persistence |
| Vector Engine | pgvector | Semantic similarity search |
| AI Embeddings | OpenAI | Text-to-vector transformation |
| Source System | Google Drive | Document repository |
```
+--------------------+
| Google Drive |
| (Client Folder) |
+---------+----------+
|
v
+--------------------+
| n8n Workflow |
| (Event Triggers) |
+---------+----------+
|
v
+----------------------------+
| Document Processing Layer |
| - Extract Text |
| - Extract Tables |
+---------+------------------+
|
v
+----------------------------+
| Embedding & Chunking |
| - Text Splitter |
| - Embedding Generator |
+---------+------------------+
|
v
+----------------------------+
| Supabase (pgvector) |
| - Client Vector Tables |
| - Metadata & Rows |
+----------------------------+
```
## Workflow & Implementation

1. Client Initialization
   - Create client-specific database tables
   - Create the semantic match function (sketched after this list)
2. Drive Monitoring
   - File Created and File Updated triggers
   - Loop over affected files
3. Cleanup Logic
   - Delete old document rows
   - Delete old vectors for reprocessing
4. Document Download & Routing
   - Download file securely
   - Route by MIME type
5. Extraction & Aggregation
   - Extract text and tabular data
   - Aggregate and summarize content
6. Embedding & Storage
   - Generate embeddings
   - Insert into client vector table
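The match function created during Client Initialization is defined in `match-function.sql`; the sketch below only shows the general shape of such a function, modeled on the widely used Supabase/pgvector pattern with a hypothetical client `acme`.

```sql
-- Illustrative shape only; match-function.sql is authoritative.
create or replace function match_acme_documents(
  query_embedding vector(1536),
  match_count     int   default 5,
  filter          jsonb default '{}'
)
returns table (id bigint, content text, metadata jsonb, similarity float)
language sql stable
as $$
  select
    d.id,
    d.content,
    d.metadata,
    1 - (d.embedding <=> query_embedding) as similarity
  from acme_documents d
  where d.metadata @> filter
  order by d.embedding <=> query_embedding
  limit match_count;
$$;
```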
## Testing & Validation

| ID | Area | Command / Action | Expected Output | Explanation |
|---|---|---|---|---|
| T01 | Ingestion | Upload document | Vectors created | Validates full pipeline |
| T02 | Update | Edit document | Old vectors replaced | Ensures idempotency |
## Validation Summary

- All workflow phases executed successfully
- No duplicate vectors detected (see the check below)
- Semantic relevance verified
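One way to back the "no duplicate vectors" claim after an update cycle, assuming chunks store their source file id under a `file_id` metadata key (key and table names are illustrative):

```sql
-- An empty result means no chunk was inserted twice for the same file;
-- rows here indicate the cleanup step did not run before re-insertion.
select
  metadata->>'file_id' as file_id,
  content,
  count(*)             as copies
from acme_documents
group by metadata->>'file_id', content
having count(*) > 1;
```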
## Verification Testing Tools & Commands

- n8n execution logs
- Supabase SQL editor
- REST clients for RPC testing
## Troubleshooting & Debugging

| Issue | Root Cause | Resolution |
|---|---|---|
| No ingestion triggered | Incorrect Drive folder path | Verify trigger configuration and permissions |
| Embeddings not stored | Dimension mismatch | Ensure the vector column and embedding model both use 1536 dimensions |
| Duplicate vectors | Cleanup logic skipped | Verify delete nodes execute before insert |
- Inspect n8n execution logs per node
- Validate Supabase tables after each phase
- Test the match function manually via the SQL editor (see the query below)
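For the dimension-mismatch row in the table above, the stored vectors can be inspected directly; a quick check against the illustrative `acme_documents` table:

```sql
-- Every row should report the dimension the embedding model produces
-- (1536 here); any other value points at a model/column mismatch.
select vector_dims(embedding) as dims, count(*) as row_count
from acme_documents
group by vector_dims(embedding);
```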
## Security & Secrets

- Secrets managed via environment variables
- No credentials committed to repository
- Client isolation enforced by schema design
## Deployment (Vercel)

The pipeline backend runs independently of frontend deployment.
- n8n deployed via Docker, VM, or managed cloud
- Supabase hosted or self-managed
- Frontend applications deployed on Vercel consume APIs
## Quick-Start Cheat Sheet

- Deploy Supabase and enable pgvector (see the snippet after this list)
- Deploy n8n and configure environment variables
- Import workflow JSON
- Run client onboarding SQL
- Create client schema and match function
- Configure Google Drive folder
- Upload a document
- Wait for ingestion completion
- Query semantic search via RPC
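Step 1, "Deploy Supabase and enable pgvector", reduces to enabling the extension once per database, which can be done from the Supabase SQL editor:

```sql
-- Enable pgvector, then confirm it is installed and at which version.
create extension if not exists vector;
select extname, extversion from pg_extension where extname = 'vector';
```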
## Usage Notes

- Use one Drive folder per client
- Normalize client identifiers before onboarding (see the example after this list)
- Re-uploading files safely triggers re-indexing
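Because the client identifier is embedded in table and function names (`match_[client]_documents`), it should be normalized into a valid lowercase SQL identifier before onboarding; one possible normalization, shown for an illustrative raw client name:

```sql
-- "Acme Corp (EU)" -> "acme_corp_eu": lowercase, runs of non-alphanumerics
-- collapsed to underscores, leading/trailing underscores trimmed.
select trim(both '_' from
         lower(regexp_replace('Acme Corp (EU)', '[^A-Za-z0-9]+', '_', 'g'))
       ) as client_id;
```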
## Performance & Optimization

- ivfflat index tuning for large datasets (see the example after this list)
- Optimal chunk sizing for retrieval accuracy
- Batch processing control in n8n
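A hedged example of the ivfflat tuning mentioned above, against the illustrative `acme_documents` table; the `lists` and `probes` values are starting points to benchmark, not recommendations from this repository:

```sql
-- Approximate nearest-neighbour index for cosine distance
-- (a common rule of thumb is lists ~ rows / 1000 for datasets up to ~1M rows).
create index if not exists acme_documents_embedding_idx
  on acme_documents
  using ivfflat (embedding vector_cosine_ops)
  with (lists = 100);

-- Per-session recall/speed trade-off: more probes = better recall, slower queries.
set ivfflat.probes = 10;
```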
## Enhancements & Features

- Multi-language embedding support
- Streaming ingestion pipelines
- Advanced ranking and reranking logic
## Maintenance & Future Work

- Introduce Row-Level Security (RLS) (a policy sketch follows this list)
- Add ingestion health monitoring
- Support additional storage providers
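A minimal sketch of what introducing RLS could look like on a per-client table, assuming a hypothetical `client_id` claim in the Supabase JWT; actual policy design depends on how tenants authenticate and is deliberately left as future work here.

```sql
-- Deny everything by default, then allow reads only when the caller's JWT
-- carries the matching client claim (claim name is hypothetical).
alter table acme_documents enable row level security;

create policy "acme_read_access"
  on acme_documents
  for select
  to authenticated
  using ((auth.jwt() ->> 'client_id') = 'acme');
```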
## Key Achievements

- Fully automated Agentic RAG ingestion system
- Enterprise-grade multi-tenant isolation
- Scalable and extensible architecture
## High-Level Architecture

```
[User / App]
|
v
[Semantic Query]
|
v
[Supabase RPC]
|
v
[Vector Search]
|
v
[Relevant Chunks]
|
v
[LLM / RAG Layer]
```
## Project Structure

```
AGENTIC-RAG-N8N-PIPELINE/
├── architecture/
│   ├── phase-1-client-db-creation.md
│   ├── phase-2-drive-monitoring.md
│   ├── phase-3-document-processing.md
│   ├── phase-4-data-aggregation.md
│   ├── phase-5-embedding-processing.md
│   └── phase-6-vector-storage.md
├── database/
│   ├── schema.sql
│   └── match-function.sql
├── screenshots/
│   ├── connections/
│   │   ├── 01-google-drive-credentials-and-permissions.png
│   │   ├── 02-supabase-database-connection-settings.png
│   │   ├── 03-postgres-node-configuration.png
│   │   ├── 04-openai-embeddings-node-settings.png
│   │   └── 05-n8n-global-and-workflow-settings.png
│   ├── phase-1-client-db-creation/
│   │   ├── overview.png
│   │   └── workflow-nodes.png
│   ├── phase-2-drive-monitoring/
│   │   ├── cleanup-logic.png
│   │   └── triggers.png
│   ├── phase-3-document-processing/
│   │   ├── extraction-nodes.png
│   │   └── switch-routing.png
│   ├── phase-4-data-aggregation/
│   │   ├── aggregation-flow.png
│   │   └── schema-handling.png
│   ├── phase-5-embedding-processing/
│   │   ├── chunking.png
│   │   └── embeddings.png
│   └── phase-6-vector-storage/
│       └── supabase-insert.png
├── workflows/
│   └── agentic-rag-ingestion.n8n.json
├── .env.example
├── .gitignore
└── README.html
```
## Summary, Closure & Compliance

This project delivers a robust, compliant, and enterprise-ready Agentic RAG ingestion platform. The architecture ensures scalability, security, and correctness while aligning with modern AI, data engineering, and cloud best practices.
It is suitable for production deployment, client demonstrations, and long-term extension as an organizational knowledge intelligence backbone.
End of README