KAVACH is a powerful, privacy-centric text redaction and sanitization system built to shield sensitive user data from exposure to Large Language Models (LLMs).
It acts as a smart intermediary, identifying and redacting Personally Identifiable Information (PII) from your text so you can interact with AI without compromising your privacy.
Engineered using a multi-layered AI stack, KAVACH achieves over 89% accuracy in PII detection, making it a reliable privacy protector in digital communication.

Features:
- Intelligent PII Redaction: Detects and replaces sensitive entities like names, phone numbers, addresses, emails, and more.
- Support for Indian PII: Recognizes Aadhaar, PAN, and other India-specific identifiers.
- Contextual Data Protection: Uses NER models to identify sensitive data even without clear patterns (like uncommon names or cities).
- Text Sensitivity Classification: Labels input as confidential, personal, or public using LLM-based text classification.
- Seamless LLM Integration: Ensures only redacted and safe text is passed to AI models.

KAVACH uses a 3-layer defense strategy for robust privacy:

Utilizes Presidio to scan and tag PII with recognizable formats:
- Phone → [PHONE_NUMBER]
- Email → [EMAIL_ADDRESS]
- Aadhaar, PAN → [ID_NUMBER]
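
The placeholder mapping above can be illustrated with a small standalone sketch. It uses plain regular expressions as a stand-in for Presidio, and every pattern and name in it (`PATTERNS`, `redact_patterns`) is an assumption made for illustration, not KAVACH's or Presidio's actual recognizers.

```python
import re

# Illustrative patterns only (assumptions, not the project's real recognizers).
# Order matters: the phone pattern runs before the Aadhaar pattern so that a
# "+91" mobile number is not mistaken for a 12-digit Aadhaar number.
PATTERNS = [
    ("EMAIL_ADDRESS", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    ("PHONE_NUMBER", re.compile(r"\+91[\s-]?\d{10}\b")),
    ("ID_NUMBER", re.compile(r"\b\d{4}\s?\d{4}\s?\d{4}\b")),  # Aadhaar-style: 12 digits
    ("ID_NUMBER", re.compile(r"\b[A-Z]{5}\d{4}[A-Z]\b")),     # PAN-style: AAAAA9999A
]


def redact_patterns(text):
    """Replace every pattern match with its bracketed placeholder."""
    for label, pattern in PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    return text


sample = (
    "Mobile: +919876543210, Aadhaar 9876 5432 1098, "
    "PAN ABCDE1234F, email vikram.singh@examplebank.com."
)
print(redact_patterns(sample))
# -> Mobile: [PHONE_NUMBER], Aadhaar [ID_NUMBER], PAN [ID_NUMBER], email [EMAIL_ADDRESS].
```

A real deployment would rely on Presidio's recognizers, which also validate checksums and use surrounding context; the sketch only shows the match-and-replace idea.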

Employs ai4bharat/IndicNER from Hugging Face for Named Entity Recognition:
- Detects names, locations, organizations, etc. without pattern reliance.
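
Once an NER model has returned entity spans, redaction reduces to replacing those character ranges. The sketch below assumes spans shaped like the output of a Hugging Face token-classification pipeline with `aggregation_strategy="simple"` (keys `entity_group`, `score`, `start`, `end`). The spans are hand-written rather than produced by IndicNER, and the `redact_entities` helper, the label names, and the 0.5 threshold are all hypothetical.

```python
def redact_entities(text, entities, min_score=0.5):
    """Replace each sufficiently confident entity span with a [LABEL] placeholder."""
    kept = [e for e in entities if e["score"] >= min_score]
    # Work right to left so earlier offsets stay valid after each replacement.
    for ent in sorted(kept, key=lambda e: e["start"], reverse=True):
        text = text[: ent["start"]] + f"[{ent['entity_group']}]" + text[ent["end"]:]
    return text


sample = "Anjali Sharma visited Apollo Hospital."
# Hand-written spans standing in for model output (assumed, not real predictions).
entities = [
    {"entity_group": "PER", "score": 0.98, "start": 0, "end": 13},
    {"entity_group": "ORG", "score": 0.91, "start": 22, "end": 37},
    {"entity_group": "LOC", "score": 0.20, "start": 14, "end": 21},  # below threshold
]
print(redact_entities(sample, entities))
# -> [PER] visited [ORG].
```

Processing spans from the end of the string backwards avoids recomputing offsets after each substitution.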

Leverages facebook/bart-large-mnli to assign sensitivity tags:
- Confidential
- Personal
- Public
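
As a rough sketch of how such tagging could work, the snippet below uses the Transformers zero-shot classification pipeline with the three labels as candidates. The function names and the choice to simply take the highest-scoring label are assumptions; the project's actual prompt, labels, or thresholds may differ. The model import is kept inside the function so the small helper can be used without downloading the model.

```python
LABELS = ["confidential", "personal", "public"]


def top_label(result):
    """Return the highest-scoring label from a zero-shot classification result.

    `result` is expected to have parallel "labels" and "scores" lists, which is
    the shape returned by the Transformers zero-shot pipeline.
    """
    best = max(zip(result["labels"], result["scores"]), key=lambda pair: pair[1])
    return best[0]


def classify_sensitivity(text):
    """Classify text as confidential, personal, or public (downloads the model on first use)."""
    # Imported lazily: requires the `transformers` and `torch` packages.
    from transformers import pipeline

    classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
    return top_label(classifier(text, candidate_labels=LABELS))
```

Calling `classify_sensitivity("My PAN is ABCDE1234F")` would return one of the three labels; which one depends on the model, so no expected output is shown here.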
Only after these steps is the sanitized text released for AI use.

Want to try it out? Here's how you can run it locally. You will need:
- Python 3.8+
- pip (Python package installer)
- Git

Clone the repository and enter the project directory:

    git clone https://github.com/thechiranjeevvyas/Kavach.git
    cd Kavach

Create a virtual environment:

    python -m venv venv

Activate it:
- macOS / Linux:
      source venv/bin/activate
- Windows (Command Prompt):
      .\venv\Scripts\activate
- Windows (PowerShell):
      .\venv\Scripts\Activate.ps1

You'll see (venv) in your terminal prompt when activated.

Install required Python packages:

    pip install -r requirements.txt

The dependencies are:
- streamlit
- presidio-analyzer
- python-dotenv
- transformers
- torch
- groq
- sentencepiece
- accelerate

To connect with the LLM securely:
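
    mkdir .streamlit

Then add your key to a secrets file in that directory (Streamlit's default location is .streamlit/secrets.toml):

    GROQ_API_KEY = "your_groq_api_key_here"

Replace the placeholder with your real API key from the Groq Console.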
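
How the app reads the key is not shown here. As a hedged illustration, the sketch below assumes the key is exposed as a `GROQ_API_KEY` environment variable and fails early if it is missing or still set to the placeholder. The `get_groq_api_key` helper is hypothetical and not part of the project.

```python
import os

PLACEHOLDER = "your_groq_api_key_here"


def get_groq_api_key(env=None):
    """Return the Groq API key, or raise if it is missing or left as the placeholder."""
    env = os.environ if env is None else env
    key = env.get("GROQ_API_KEY", "").strip()
    if not key or key == PLACEHOLDER:
        raise RuntimeError("GROQ_API_KEY is missing or still set to the placeholder value.")
    return key
```

Checking the key at startup gives a clear error message instead of a failed API call later in the session.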

Run the app:

    streamlit run main2.py

It opens in your browser at: http://localhost:8501

Sample inputs to try:

Urgent internal memo: This document contains highly confidential information. Patient Anjali Sharma (DOB: 15/03/1988) visited Apollo Hospital on 2024-06-20 for follow-up. Her unique patient ID is PX7890123. The physician Dr. Rajesh Kumar (Mobile: +919876543210) noted her AADHAAR number 9876 5432 1098. She works for TechSolutions India. Employee ID E12ABU5678 is assigned to Mr. Vikram Singh, a senior analyst at State Bank of India. His PAN is ABCDE1234F and email is vikram.singh@examplebank.com. We also received a query from a Ministry of Defense official regarding vehicle registration DL01CD1234 for a new project located near the Air Force Station, Hindon. Please ensure all PII and sensitive project details are redacted before sharing any summaries. Contact our legal department at legal@techsolutions.com for further clarification. Voting ID of Ms. Priya Patel: ABC1234567. Passport No. K1234567.

Could you please help me confirm my identity? My full name is Sarah Miller, and my private phone number is +1-202-555-0100. Thanks.

Please redact details from this secure document. The official government ID for the operation is GVT-SEC-98765, linked to agent E78XYZ4321.

To contribute:
- Fork the repository
- Create a new branch
- Commit changes
- Submit a pull request
For significant changes, open an issue to start a discussion.