Introducing a federated, encrypted, and private Retrieval-Augmented Generation (RAG) pipeline.
Welcome to the Federated RAG app! This project aims to build an encrypted end-to-end RAG flow on top of SyftBox. It is meant to serve as a building block for downstream use cases such as a peer finder, a code-quality checker, or a code/project search tool.
- Decentralized Architecture: Eliminates dependency on a central server; every participant runs the API locally.
- Privacy-Preserving Retrieval: Individual data remain confidential through homomorphic encryption.
- Open-source Tools: The entire pipeline is fuelled by open-source models and workflows.
- Automated Processing: The API automatically aggregates info for every participant over the network.
- SyftBox Integration: Leverages SyftBox's privacy controls and synchronization capabilities.
- Expects a public/about_me.json file (containing individual information and links to relevant pages).
- Inbuilt web scraper for information retrieval.
- Uses an open-source LLM and embedding model.
- Creates an index for each user and stores it in the public folder.
- For a given query, builds a multi-query engine through a composed graph.
- GUI-based interaction and result consolidation.
Each participant's SyftBox directory structure includes:
SyftBox/Datasites/<your_email>/api_data/fed-rag/
public/: Public folder where the API writes the generated responses.
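As a rough illustration, the public folder for a participant can be located with a few lines of Python. The helper name `fed_rag_public_dir` and the example email address are hypothetical, and the default SyftBox root (`~/Desktop/SyftBox`) is only an assumption based on the clone instructions in this README; adjust it to wherever SyftBox lives on your machine.

```python
from pathlib import Path


def fed_rag_public_dir(
    email: str,
    syftbox_root: Path = Path.home() / "Desktop" / "SyftBox",  # assumed location
) -> Path:
    """Return the Fed-RAG public folder for one participant (hypothetical helper)."""
    return syftbox_root / "Datasites" / email / "api_data" / "fed-rag" / "public"


if __name__ == "__main__":
    # Example address is a placeholder, not a real datasite.
    print(fed_rag_public_dir("alice@example.com"))
```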
Every 10 seconds, the Fed-RAG API performs the following tasks:
- Data Scraping: Reads the about_me JSON file from the public/ folder, performs web scraping, and saves the scraped data to public/bio.txt.
- Index Creation: public/bio.txt is fetched, converted to an index, and saved in public/vector_index. The vector embeddings are encrypted and saved alongside.
- Query Engine: If a user query is found or entered through the GUI, the retrieval and generation module runs:
  - Embedding Creation: The user query is converted to an embedding vector and encrypted with homomorphic encryption (HE).
  - Index Aggregation: The stored encrypted indexes are aggregated using homomorphic properties and loaded into memory.
  - Retrieval: A top-k search is performed using the encrypted query embedding and the encrypted indexes.
  - Generation: The top-k results are used to fetch the corresponding text blobs, which are then passed to a local LLM for generation.
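The aggregate-then-rank logic of the retrieval steps above can be sketched in plain Python. This sketch works on plaintext vectors only: it deliberately leaves out the homomorphic encryption layer, so it is not the project's actual encrypted implementation. The function names, sample texts, and three-dimensional embeddings are all made up for illustration.

```python
import heapq
from typing import Dict, List, Tuple

Vector = List[float]
Entry = Tuple[str, str, Vector]  # (owner, text blob, embedding)


def aggregate_indexes(indexes: Dict[str, List[Tuple[str, Vector]]]) -> List[Entry]:
    """Flatten per-participant indexes into one combined list."""
    return [
        (owner, text, vec)
        for owner, entries in indexes.items()
        for text, vec in entries
    ]


def dot(a: Vector, b: Vector) -> float:
    """Dot product used as the similarity score (plaintext stand-in)."""
    return sum(x * y for x, y in zip(a, b))


def top_k(query: Vector, combined: List[Entry], k: int = 2) -> List[Tuple[float, str, str]]:
    """Return the k highest-scoring (score, owner, text) triples, best first."""
    scored = [(dot(query, vec), owner, text) for owner, text, vec in combined]
    return heapq.nlargest(k, scored, key=lambda item: item[0])


if __name__ == "__main__":
    # Toy data; real embeddings come from the embedding model.
    indexes = {
        "alice@example.com": [
            ("Alice works on federated learning.", [0.9, 0.1, 0.0]),
            ("Alice likes hiking.", [0.0, 0.2, 0.9]),
        ],
        "bob@example.com": [
            ("Bob maintains a code search tool.", [0.7, 0.6, 0.1]),
        ],
    }
    for score, owner, text in top_k([1.0, 0.0, 0.0], aggregate_indexes(indexes)):
        print(f"{score:.2f}  {owner}  {text}")
```

The text blobs returned by `top_k` are what would be handed to the local LLM as context in the generation step.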
- Operating System: macOS or Linux.
- SyftBox: Installed and set up on your machine.
SyftBox is required to run the Fed-RAG API. Install it by running the following command in your terminal:
curl -LsSf https://syftbox.openmined.org/install.sh | sh
For more information, refer to the SyftBox Documentation.
After installing SyftBox:
• Option A: Using Git
Navigate to your SyftBox/apis/ directory and clone the Fed-RAG API repository:
cd ~/Desktop/SyftBox/apis
git clone https://github.com/siddhant230/federated_rag.git
• Option B: Download ZIP
- Click below to download the ZIP file of the repository: Download Repository
- Extract the ZIP file.
- Move the extracted fed-rag folder into the SyftBox/apis/ directory.
SyftBox will automatically detect and install the new API during its execution cycle.
- Open in Browser: Paste the below URL into your browser’s address bar to access the fed-rag interface.
http://127.0.0.1:7860
- Copy and paste this template into your public/about_me.json; if the file does not exist yet, create a new about_me.json in the public/ folder.
{ "info": "", "links": [], "resume_path": "" }
- Fill in the required information in each field. The links list can hold any number of entries. For example:
{ "info": "Hi, this is xyz. I work on abcd and my cat's name is Mr.CAT", "links": ["https://github.com/xyz", "<path to google scholar>", "<path to portfolio website>", "<path_to_twitter>", "<personal website>"], "resume_path": "my_resume.pdf" }
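If you prefer to create the file from a script, the following minimal sketch writes the template when about_me.json is missing and reports which template fields a file lacks. The helper names `ensure_about_me` and `missing_fields` and the `public` path in the example are hypothetical and not part of the Fed-RAG API; only the three field names come from the template above.

```python
import json
from pathlib import Path

# Field names taken from the about_me.json template in this README.
TEMPLATE = {"info": "", "links": [], "resume_path": ""}


def ensure_about_me(public_dir: Path) -> Path:
    """Create about_me.json from the template if it does not exist yet."""
    public_dir.mkdir(parents=True, exist_ok=True)
    target = public_dir / "about_me.json"
    if not target.exists():
        target.write_text(json.dumps(TEMPLATE, indent=2), encoding="utf-8")
    return target


def missing_fields(path: Path) -> list:
    """Return the template keys that are absent from the given file."""
    data = json.loads(path.read_text(encoding="utf-8"))
    return [key for key in TEMPLATE if key not in data]


if __name__ == "__main__":
    # "public" is a placeholder; point this at your own public/ folder.
    about_me = ensure_about_me(Path("public"))
    print(about_me, "missing:", missing_fields(about_me))
```

An existing about_me.json is never overwritten, so the script is safe to rerun after you have filled in your details.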
Querying can be done in two ways:
- Through the GUI with a chat interface.
  - Open the web app, type in your query, and hit Enter.
  - The retrieved results and generated responses, along with your system performance, are displayed in the web app.
- Using file-level interactions.
We welcome contributions to enhance the Fed-RAG API:
- Fork the Repository: Click "Fork" on GitHub to create your copy.
- Create a Branch: Develop your feature or fix in a new branch.
- Submit a Pull Request: Provide a detailed description for review.
NA
• SyftBox: For providing the platform enabling decentralized applications.
• TenSEAL: For the homomorphic encryption.
• OpenMined Community: Thank you to everyone who has contributed to this project during the 30DaysOfFLCode challenge!

