KAVACH is a powerful, privacy-centric text redaction and sanitization system built to shield sensitive user data from exposure to Large Language Models (LLMs) 🤖.

thechiranjeevvyas/Kavach

🛡️ KAVACH: Privacy-First AI Interaction

"Your Personal Redaction Shield Before Talking to AI"


📖 Project Overview

KAVACH is a powerful, privacy-centric text redaction and sanitization system built to shield sensitive user data from exposure to Large Language Models (LLMs) 🤖.

It acts as a smart intermediary: it identifies and redacts Personally Identifiable Information (PII) in your text so you can interact with AI without compromising your privacy.

🎯 Engineered using a multi-layered AI stack, KAVACH reports over 89% accuracy in PII detection.


✨ Features

  • Intelligent PII Redaction
    Detects and replaces sensitive entities such as names, phone numbers, addresses, and emails.

  • 🇮🇳 Support for Indian PII
    Recognizes Aadhaar, PAN, and other India-specific identifiers.

  • 🧠 Contextual Data Protection
    Uses NER models to identify sensitive data that has no fixed pattern (such as uncommon names or cities).

  • 🏷️ Text Sensitivity Classification
    Labels input as confidential, personal, or public using LLM-based text classification.

  • 🔗 Seamless LLM Integration
    Ensures that only redacted, safe text is passed to AI models.


🧬 How It Works: Layered Protection System

KAVACH uses a 3-layer defense strategy 🧱 for robust privacy:

1️⃣ Pattern-Based Scanning

🛠️ Uses Presidio to scan for and tag PII that follows recognizable formats:

  • Phone → [PHONE_NUMBER]
  • Email → [EMAIL_ADDRESS]
  • Aadhaar, PAN → [ID_NUMBER]
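
As a rough illustration of what this layer does, the sketch below uses plain Python regular expressions instead of Presidio itself. The `PATTERNS` table and the `redact_patterns` helper are hypothetical names, and the expressions are simplified assumptions; Presidio's own recognizers add validation and context scoring that this sketch does not attempt.

```python
import re

# Simplified, illustrative patterns (assumptions, not Presidio's recognizers).
# Order matters: emails and "+"-prefixed phones are replaced before ID numbers
# so that their digits are not mistaken for a 12-digit Aadhaar number.
PATTERNS = [
    ("EMAIL_ADDRESS", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    ("PHONE_NUMBER", re.compile(r"\+\d[\d\s-]{8,14}\d")),
    ("ID_NUMBER", re.compile(r"\b[A-Z]{5}\d{4}[A-Z]\b")),      # PAN: 5 letters, 4 digits, 1 letter
    ("ID_NUMBER", re.compile(r"\b\d{4}\s?\d{4}\s?\d{4}\b")),   # Aadhaar: 12 digits, optional spaces
]


def redact_patterns(text: str) -> str:
    """Replace every pattern match with its [LABEL] placeholder."""
    for label, pattern in PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    return text


sample = (
    "Call +919876543210 or mail vikram.singh@examplebank.com. "
    "PAN ABCDE1234F, Aadhaar 9876 5432 1098."
)
print(redact_patterns(sample))
```

Names, cities, and organizations have no fixed format, so they pass through this layer untouched and are left to the NER layer.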

2️⃣ Contextual Detection (NER)

🤖 Uses ai4bharat/IndicNER from Hugging Face for Named Entity Recognition:

  • Detects names, locations, organizations, and similar entities without relying on patterns.
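
Once a model has located entities, the text still has to be rewritten. The sketch below shows one way to do that, assuming entity dictionaries with `entity_group`, `start`, and `end` keys, which is the shape Hugging Face token-classification pipelines return when an aggregation strategy is set. The `redact_entities` helper, the `PER`/`ORG`/`LOC` label names, and the hand-written entity list are assumptions for illustration, not real IndicNER output.

```python
from typing import Dict, List


def redact_entities(text: str, entities: List[Dict]) -> str:
    """Replace each detected character span with a [LABEL] placeholder."""
    # Go right to left so earlier offsets stay valid after each replacement.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        placeholder = f"[{ent['entity_group']}]"
        text = text[: ent["start"]] + placeholder + text[ent["end"] :]
    return text


# Hand-written entities in the assumed pipeline output shape.
text = "Anjali Sharma visited Apollo Hospital in Delhi."
entities = [
    {"entity_group": "PER", "start": 0, "end": 13},
    {"entity_group": "ORG", "start": 22, "end": 37},
    {"entity_group": "LOC", "start": 41, "end": 46},
]
print(redact_entities(text, entities))
```

Processing spans from the end of the string avoids recomputing offsets after each placeholder changes the text length.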

3️⃣ Sensitivity Classification

🧠 Uses facebook/bart-large-mnli to assign sensitivity tags:

  • Confidential 🛡️
  • Personal 🙋‍♂️
  • Public

Only after these steps is the sanitized text released for AI use.
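
The classification step can be sketched with the Transformers zero-shot pipeline, which returns a dictionary containing parallel `labels` and `scores` lists. The `pick_sensitivity` and `classify` helpers, the `min_score` threshold, and the fallback to the strictest tag are assumptions made for this example; the project may choose its tag differently.

```python
from typing import Dict, List

LABELS = ["confidential", "personal", "public"]


def pick_sensitivity(result: Dict[str, List], min_score: float = 0.5) -> str:
    """Pick the top-scoring tag; fall back to the strictest tag when unsure."""
    best_label, best_score = max(
        zip(result["labels"], result["scores"]), key=lambda pair: pair[1]
    )
    return best_label if best_score >= min_score else "confidential"


def classify(text: str) -> str:
    """Run zero-shot classification (downloads the model on first use)."""
    # Imported lazily because loading facebook/bart-large-mnli is expensive.
    from transformers import pipeline

    classifier = pipeline(
        "zero-shot-classification", model="facebook/bart-large-mnli"
    )
    return pick_sensitivity(classifier(text, candidate_labels=LABELS))
```

Treating a low-confidence result as confidential is a conservative default: a wrong "public" tag is more costly than a wrong "confidential" one.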


💻 Local Setup & Run

Want to try it out? Here's how you can run it locally 🧪:

🔧 Prerequisites

  • Python 3.8+
  • pip (Python package installer)
  • Git

🛠️ Setup Steps

📅 Step 1: Clone the Repository

git clone https://github.com/thechiranjeevvyas/Kavach.git
cd Kavach

🧱 Step 2: Create and Activate a Virtual Environment

Create a virtual environment:

python -m venv venv

Activate it:

  • macOS / Linux:
source venv/bin/activate
  • Windows (Command Prompt):
.\venv\Scripts\activate
  • Windows (PowerShell):
.\venv\Scripts\Activate.ps1

✅ You'll see (venv) in your terminal prompt when it is activated.

📦 Step 3: Install Dependencies

Install required Python packages:

pip install -r requirements.txt

Sample requirements.txt content:

streamlit
presidio-analyzer
python-dotenv
transformers
torch
groq
sentencepiece
accelerate

🔑 Step 4: Set Up Groq API Key

To connect with the LLM securely:

Create a .streamlit folder:

mkdir .streamlit

Inside it, create a file named secrets.toml containing:

GROQ_API_KEY = "your_groq_api_key_here"

Replace the placeholder with your real API key from the Groq Console.
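
Inside a Streamlit app, values from .streamlit/secrets.toml are available through `st.secrets`. The sketch below shows one way the key might be read; the `load_groq_api_key` helper and the fallback to an environment variable of the same name are assumptions for illustration, not necessarily how main2.py does it.

```python
import os


def load_groq_api_key(name: str = "GROQ_API_KEY") -> str:
    """Return the key from Streamlit secrets, else from the environment."""
    try:
        import streamlit as st  # available once requirements.txt is installed

        return st.secrets[name]
    except Exception:
        # Streamlit is missing, or secrets.toml has no such entry:
        # fall back to an environment variable (an assumed convenience).
        key = os.environ.get(name, "")
    if not key:
        raise RuntimeError(
            f"{name} is not set in .streamlit/secrets.toml or the environment"
        )
    return key
```

Keep secrets.toml out of version control so the key is never committed.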

▶️ Step 5: Run the Streamlit App

streamlit run main2.py

The app opens in your browser at: http://localhost:8501


🧪 Testing Your Application

🔹 Comprehensive Input

Urgent internal memo: This document contains highly confidential information. Patient Anjali Sharma (DOB: 15/03/1988) visited Apollo Hospital on 2024-06-20 for follow-up. Her unique patient ID is PX7890123. The physician Dr. Rajesh Kumar (Mobile: +919876543210) noted her AADHAAR number 9876 5432 1098. She works for TechSolutions India. Employee ID E12ABU5678 is assigned to Mr. Vikram Singh, a senior analyst at State Bank of India. His PAN is ABCDE1234F and email is vikram.singh@examplebank.com. We also received a query from a Ministry of Defense official regarding vehicle registration DL01CD1234 for a new project located near the Air Force Station, Hindon. Please ensure all PII and sensitive project details are redacted before sharing any summaries. Contact our legal department at legal@techsolutions.com for further clarification. Voting ID of Ms. Priya Patel: ABC1234567. Passport No. K1234567.

🔹 Short Input: Personal Info

Could you please help me confirm my identity? My full name is Sarah Miller, and my private phone number is +1-202-555-0100. Thanks.

🔹 Short Input: Government Sensitive Info

Please redact details from this secure document. The official government ID for the operation is GVT-SEC-98765, linked to agent E78XYZ4321.

Contributing

  • Fork the repository
  • Create a new branch
  • Commit changes
  • Submit a pull request

For significant changes, open an issue to start a discussion.

