siddhant230/federated_rag

Federated-RAG

A federated, encrypted, and private Retrieval-Augmented Generation (RAG) pipeline.

Introduction

Welcome to the Federated RAG app! This project aims to build an encrypted end-to-end RAG flow on top of SyftBox. It is meant to serve as a building block for downstream use cases such as a peer finder, a code-quality checker, or a code/project search tool.

Features

  • Decentralized Architecture: Eliminates dependency on a central server; every participant runs the API locally.
  • Privacy-Preserving Retrieval: Each participant's data remains confidential through homomorphic encryption.
  • Open-source Tools: The entire pipeline is fuelled by open-source models and workflows.
  • Automated Processing: The API automatically aggregates information for every participant on the network.
  • SyftBox Integration: Leverages SyftBox's privacy controls and synchronization capabilities.

How It Works

  1. Expects a public/about_me.json file (containing personal information and links to relevant pages).
  2. Uses a built-in web scraper to retrieve information.
  3. Uses an open-source LLM and embedding model.
  4. Creates an index for each user and stores it in the public folder.
  5. For a given query, builds a multi-query engine through a composed graph.
  6. Provides GUI-based interaction and result consolidation.

Data Flow

Each participant's SyftBox directory structure includes:

SyftBox/Datasites/<your_email>/api_data/fed-rag/

  • public/: Public folder where the API writes the generated responses.
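As a minimal sketch of the layout above, the per-participant paths can be built with `pathlib`. The helper names (`public_dir`, `expected_files`) are hypothetical and not part of the repository, and the sketch assumes that `about_me.json`, `bio.txt`, and `vector_index` all live in the same `public/` folder.

```python
from pathlib import Path

APP_NAME = "fed-rag"  # folder name taken from the path shown above


def public_dir(syftbox_root, email):
    """Return the public/ folder of one participant (hypothetical helper)."""
    return Path(syftbox_root) / "Datasites" / email / "api_data" / APP_NAME / "public"


def expected_files(syftbox_root, email):
    """Map each artifact named in this README to its assumed location."""
    base = public_dir(syftbox_root, email)
    return {
        "about_me": base / "about_me.json",
        "bio": base / "bio.txt",
        "vector_index": base / "vector_index",
    }


if __name__ == "__main__":
    # The root below is only an example location for the SyftBox folder.
    for name, path in expected_files("~/Desktop/SyftBox", "you@example.org").items():
        print(f"{name}: {path}")
```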

API Operation Cycle

Every 10 seconds, the Fed-RAG API performs the following tasks:

  1. Data Scraping: Reads the about_me JSON file from the public/ folder, performs web scraping, and saves the scraped data to public/bio.txt.
  2. Index Creation: Reads public/bio.txt, converts it to an index, and saves it in public/vector_index. The vector embeddings are encrypted and saved alongside it.
  3. Query Engine: If a user query is entered through the GUI, the retrieval and generation module runs:
    1. Embedding Creation: The user query is converted to an embedding vector and encrypted with homomorphic encryption (HE).
    2. Index Aggregation: The stored encrypted indexes are aggregated using their homomorphic properties and loaded into memory.
    3. Retrieval: A top-k search is performed using the encrypted query embedding and the encrypted indexes.
    4. Generation: The top-k results are used to fetch the corresponding text blobs, which are then passed to a local LLM for generation.
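The aggregation and retrieval steps above can be sketched in plain Python. This is an illustration only: it works on unencrypted vectors and uses cosine similarity, whereas the actual pipeline runs the search over homomorphically encrypted embeddings. The function names and the toy two-dimensional vectors are assumptions, not taken from the repository.

```python
import math


def aggregate(indexes):
    """Merge per-participant indexes into one lookup keyed by (participant, chunk_id)."""
    merged = {}
    for participant, index in indexes.items():
        for chunk_id, vector in index.items():
            merged[(participant, chunk_id)] = vector
    return merged


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, merged_index, k=2):
    """Return the keys of the k vectors most similar to the query."""
    scored = [(cosine(query, vec), key) for key, vec in merged_index.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [key for _, key in scored[:k]]


if __name__ == "__main__":
    # Toy data standing in for real embeddings.
    indexes = {
        "alice@example.org": {"bio-0": [1.0, 0.0]},
        "bob@example.org": {"bio-0": [0.0, 1.0], "bio-1": [0.7, 0.7]},
    }
    print(top_k([1.0, 0.1], aggregate(indexes), k=2))
```

The keys returned by `top_k` are what the generation step would use to look up the matching text blobs before prompting the local LLM.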

Getting Started

Prerequisites

  • Operating System: macOS or Linux.
  • SyftBox: Installed and set up on your machine.

Installation

1. Install SyftBox

SyftBox is required to run the Fed-RAG API. Install it by running the following command in your terminal:

curl -LsSf https://syftbox.openmined.org/install.sh | sh

For more information, refer to the SyftBox Documentation.

2. Set Up the Fed-RAG

After installing SyftBox:

Option A: Using Git

Navigate to your SyftBox/apis/ directory and clone the Fed-RAG API repository:

cd ~/Desktop/SyftBox/apis

git clone https://github.com/siddhant230/federated_rag.git

Option B: Download ZIP

  1. Click below to download the ZIP file of the repository: Download Repository
  2. Extract the ZIP file.
  3. Move the extracted fed-rag folder into the SyftBox/apis/ directory.

SyftBox will automatically detect and install the new API during its execution cycle.

Usage

Accessing the Web Interface

  1. Open in Browser: Paste the URL below into your browser's address bar to access the Fed-RAG interface.
http://127.0.0.1:7860

Creating the about_me.json

  1. Copy this template into public/about_me.json. If the file does not exist yet, create it in the public/ folder.
    {
     "info" : "",
     "links":[],
     "resume_path":""
     }
    
  2. Fill in each field; the links list can hold as many entries as you like. For example:
    {
     "info": "Hi, this is xyz. I work on abcd and my cat's name is Mr.CAT",
     "links": ["https://github.com/xyz", "<path to google scholar>", "<path to portfolio website>",
              "<path_to_twitter>",
              "<personal website>"],
     "resume_path": "my_resume.pdf"
     }
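Before dropping the file into public/, it can help to check that it parses as JSON and has the three fields from the template. The validator below is a hedged sketch: `validate_about_me` is a hypothetical helper, not part of Fed-RAG, and the type rules (strings for `info` and `resume_path`, a list of strings for `links`) are inferred from the example above.

```python
import json

# Field names come from the template above; the expected types are an assumption.
REQUIRED_FIELDS = {"info": str, "links": list, "resume_path": str}


def validate_about_me(text):
    """Return a list of problems found in an about_me.json string (empty if none)."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value must be a JSON object"]

    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in data:
            problems.append(f"missing field: {field}")
        elif not isinstance(data[field], expected_type):
            problems.append(f"field {field} must be of type {expected_type.__name__}")

    links = data.get("links")
    if isinstance(links, list):
        for position, link in enumerate(links):
            if not isinstance(link, str):
                problems.append(f"links[{position}] must be a string")
    return problems


if __name__ == "__main__":
    template = '{"info": "", "links": [], "resume_path": ""}'
    print(validate_about_me(template))
```

An unquoted URL inside `links` is reported as invalid JSON, which is a common mistake when filling in the template by hand.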
    

Making a Query

Queries can be made in two ways:

  1. Through the GUI's chat interface.
    • Open the web app, type in your query, and hit Enter.
    • The retrieved results and generated response, along with your system performance, are displayed in the web app.
  2. Through file-level interactions.
    • @irina

Contributing

We welcome contributions to enhance the Fed-RAG API:

  1. Fork the Repository: Click "Fork" on GitHub to create your copy.
  2. Create a Branch: Develop your feature or fix in a new branch.
  3. Submit a Pull Request: Provide a detailed description for review.

License

NA

Acknowledgments

  • SyftBox: For providing the platform enabling decentralized applications.
  • TenSEAL: For the homomorphic encryption.
  • OpenMined Community: Thank you to everyone who has contributed to this project during the 30DaysOfFLCode challenge!
