A comprehensive Python application for tracking and analyzing company-level changes in Ministry of Corporate Affairs (MCA) data with AI-powered insights and conversational query capabilities.
The MCA Insights Engine consolidates state-wise MCA data, detects daily company-level changes, enriches company information using public web sources, and provides an intelligent interface for data exploration through AI-powered summaries and conversational queries.
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│  Data Sources   │    │  Data Pipeline  │    │   AI Features   │
│                 │    │                 │    │                 │
│ • Maharashtra   │───▶│ • Integration   │───▶│ • Summary Gen   │
│ • Gujarat       │    │ • Change Detect │    │ • Chat Engine   │
│ • Delhi         │    │ • Web Enrichment│    │                 │
│ • Tamil Nadu    │    │                 │    │                 │
│ • Karnataka     │    │                 │    │                 │
└─────────────────┘    └─────────────────┘    └─────────────────┘
                                │
                                ▼
                       ┌─────────────────┐
                       │ User Interface  │
                       │                 │
                       │ • Streamlit UI  │
                       │ • REST API      │
                       │ • Chat Interface│
                       └─────────────────┘
- Data Integration: Consolidates and normalizes state-wise MCA CSV files
- Change Detection: Tracks daily company-level changes (incorporations, deregistrations, field updates)
- Web Enrichment: Enriches company data using public APIs (ZaubaCorp, MCA API Setu, GST Portal)
- AI-Powered Insights: Generates automated daily summaries and provides a conversational query interface
- Interactive Dashboard: Streamlit-based web interface with search, filters, and visualizations
- REST API: External integration endpoints for third-party applications
- Daily Summary Generation: Automated AI summaries of company changes
- Conversational Chat: Natural language queries about MCA data
- Trend Analysis: Pattern recognition and insights generation
- Python 3.8+
- Required Python packages (see requirements.txt)
- Clone the repository
  git clone <repository-url>
  cd MCA_Insights_Engine
- Install dependencies
  pip install -r requirements.txt
- Set up environment variables (optional)
  # Create .env file for OpenAI API key
  echo "OPENAI_API_KEY=your_api_key_here" > .env
python main.py --mode full
# Data integration only
python main.py --mode data
# Change detection only
python main.py --mode changes
# Web enrichment only
python main.py --mode enrichment --sample-size 100
# Start dashboard
python main.py --mode dashboard
# Start API server
python main.py --mode api
MCA_Insights_Engine/
├── main.py                 # Main orchestration script
├── data_integration.py     # Data consolidation and cleaning
├── change_detection.py     # Change tracking and logging
├── web_enrichment.py       # Web-based data enrichment
├── ai_features.py          # AI summary and chat functionality
├── dashboard.py            # Streamlit web interface
├── api.py                  # REST API endpoints
├── requirements.txt        # Python dependencies
├── README.md               # This file
├── .env                    # Environment variables (optional)
├── mca_insights.log        # Application logs
├── mca_insights.db         # SQLite database
└── data/                   # Data files
    ├── maharashtra.csv
    ├── gujarat.csv
    ├── delhi.csv
    ├── tamil_nadu.csv
    ├── karnataka.csv
    ├── snapshot_day1.csv
    ├── snapshot_day2.csv
    └── snapshot_day3.csv
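The `--mode` and `--sample-size` flags used by the commands above could be wired up roughly as follows. This is only a sketch of how `main.py` might parse its arguments, not its actual code: the default values and the set of steps that `full` runs are assumptions.

```python
import argparse
from typing import List

# Mode names taken from the commands shown in this README.
MODES = ["full", "data", "changes", "enrichment", "dashboard", "api"]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="MCA Insights Engine")
    # Assumption: "full" is the default when no mode is given.
    parser.add_argument("--mode", choices=MODES, default="full",
                        help="which part of the pipeline to run")
    # Assumption: a default of 100, mirroring the example command.
    parser.add_argument("--sample-size", type=int, default=100,
                        help="number of companies to enrich")
    return parser


def steps_for(mode: str) -> List[str]:
    """Return the steps a mode would run (the 'full' ordering is assumed)."""
    pipeline = ["data", "changes", "enrichment"]
    return pipeline if mode == "full" else [mode]


# Parse an explicit argument list instead of sys.argv for illustration.
args = build_parser().parse_args(["--mode", "enrichment", "--sample-size", "50"])
print(steps_for(args.mode), args.sample_size)
```

Using `choices` lets argparse reject an unknown mode with a usage message before any pipeline code runs.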
OPENAI_API_KEY: OpenAI API key for AI features (optional, uses mock data if not provided)
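The fallback to mock data can be pictured as a simple check on the environment. The helper name `ai_backend` below is hypothetical and not taken from `ai_features.py`; loading the `.env` file into the environment is left out.

```python
import os
from typing import Mapping


def ai_backend(env: Mapping[str, str] = os.environ) -> str:
    """Pick "openai" when a non-empty key is set, otherwise "mock"."""
    key = env.get("OPENAI_API_KEY", "").strip()
    return "openai" if key else "mock"


print(ai_backend({}))                            # no key configured
print(ai_backend({"OPENAI_API_KEY": "sk-test"}))  # placeholder key
```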
- Default SQLite database: mca_insights.db
- Tables: companies, company_changes
- Default port: 5000
- CORS enabled for cross-origin requests
- Loads state-wise CSV files
- Standardizes column structures
- Handles missing values and duplicates
- Creates consolidated master dataset
- Stores in SQLite database
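The steps above can be sketched with the standard library alone. This is a minimal illustration, not the code in `data_integration.py`: the header normalization rule, the "later row wins" duplicate policy, and the sample rows (with made-up CINs) are all assumptions.

```python
import csv
import io
import sqlite3
from typing import Dict, Iterable, List

COLUMNS = ["CIN", "Company_Name", "State", "Status", "Authorized_Capital",
           "Paidup_Capital", "Registration_Date", "Industry_Classification"]


def normalize(row: Dict[str, str]) -> Dict[str, str]:
    """Map header variants to canonical names and fill missing values with ''."""
    cleaned = {k.strip().lower().replace(" ", "_"): (v or "").strip()
               for k, v in row.items() if k}
    return {col: cleaned.get(col.lower(), "") for col in COLUMNS}


def consolidate(sources: Iterable[Iterable[Dict[str, str]]]) -> List[Dict[str, str]]:
    """Merge all state files, dropping rows without a CIN and de-duplicating by CIN."""
    master: Dict[str, Dict[str, str]] = {}
    for rows in sources:
        for row in rows:
            record = normalize(row)
            if record["CIN"]:
                master[record["CIN"]] = record  # assumption: later row wins
    return list(master.values())


def store(records: List[Dict[str, str]], conn: sqlite3.Connection) -> None:
    """Write the master dataset into a companies table keyed by CIN."""
    cols = ", ".join(COLUMNS)
    conn.execute(f"CREATE TABLE IF NOT EXISTS companies ({cols}, PRIMARY KEY (CIN))")
    marks = ", ".join("?" for _ in COLUMNS)
    conn.executemany(f"INSERT OR REPLACE INTO companies VALUES ({marks})",
                     [[r[c] for c in COLUMNS] for r in records])
    conn.commit()


# In-memory stand-ins for two state files with differing headers.
maharashtra = io.StringIO("cin,company name,state\n"
                          "U1,Acme Pvt Ltd,Maharashtra\n"
                          "U1,Acme Pvt Ltd,Maharashtra\n")
gujarat = io.StringIO("CIN,Company_Name,State,Status\n"
                      "U2,Beta Ltd,Gujarat,Active\n"
                      ",No CIN Ltd,Gujarat,Active\n")
records = consolidate(csv.DictReader(f) for f in (maharashtra, gujarat))
conn = sqlite3.connect(":memory:")
store(records, conn)
print(conn.execute("SELECT COUNT(*) FROM companies").fetchone()[0])
```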
- Compares daily snapshots
- Identifies new incorporations
- Tracks deregistrations/strike-offs
- Monitors field-level changes
- Generates structured change logs
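A snapshot comparison along these lines can be expressed as set operations on CINs. The sketch below is one possible approach, not the logic in `change_detection.py`. It assumes a CIN that disappears from the newer snapshot counts as a deregistration, while a status change such as Active to Strike Off is reported as a field update. The CINs and date are made up.

```python
from typing import Dict, List

Snapshot = Dict[str, Dict[str, str]]  # CIN -> company record


def detect_changes(old: Snapshot, new: Snapshot, date: str) -> List[Dict[str, str]]:
    """Compare two snapshots and return change-log entries."""
    def entry(cin, rec, change_type, field="", before="", after=""):
        return {"CIN": cin, "Change_Type": change_type, "Field_Changed": field,
                "Old_Value": before, "New_Value": after, "Date": date,
                "Company_Name": rec.get("Company_Name", ""),
                "State": rec.get("State", ""), "Status": rec.get("Status", "")}

    changes = []
    for cin in sorted(new.keys() - old.keys()):   # only in the newer snapshot
        changes.append(entry(cin, new[cin], "New Incorporation"))
    for cin in sorted(old.keys() - new.keys()):   # assumption: vanished CIN
        changes.append(entry(cin, old[cin], "Deregistration"))
    for cin in sorted(old.keys() & new.keys()):   # present in both: diff fields
        for field in sorted(old[cin].keys() | new[cin].keys()):
            before, after = old[cin].get(field, ""), new[cin].get(field, "")
            if before != after:
                changes.append(entry(cin, new[cin], "Field Update", field, before, after))
    return changes


day1 = {"U1": {"Company_Name": "Acme", "State": "Maharashtra", "Status": "Active"},
        "U2": {"Company_Name": "Beta", "State": "Gujarat", "Status": "Active"}}
day2 = {"U1": {"Company_Name": "Acme", "State": "Maharashtra", "Status": "Strike Off"},
        "U3": {"Company_Name": "Gamma", "State": "Delhi", "Status": "Active"}}
changes = detect_changes(day1, day2, "2024-01-02")
for change in changes:
    print(change["CIN"], change["Change_Type"], change["Field_Changed"])
```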
- Samples companies with recent changes
- Enriches from multiple sources:
  - ZaubaCorp (director information)
  - MCA API Setu (company details)
  - GST Portal (tax information)
- Saves enriched data to CSV
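The enrichment flow can be sketched with each source treated as a pluggable callable. No real ZaubaCorp, MCA API Setu, or GST Portal calls are made here: the two stub sources are hypothetical, and the function is merely named after the `enrich_company()` method this README mentions, whose real signature is not shown.

```python
import csv
import io
import random
from typing import Callable, Dict, Iterable, List

Source = Callable[[str], Dict[str, str]]  # CIN -> extra fields


def enrich_company(cin: str, sources: Dict[str, Source]) -> Dict[str, str]:
    """Merge fields from every source; record failures instead of aborting."""
    enriched = {"CIN": cin}
    failed = []
    for name, fetch in sources.items():
        try:
            enriched.update(fetch(cin))
        except Exception:
            failed.append(name)
    enriched["Failed_Sources"] = ";".join(failed)
    return enriched


def enrich_sample(changed_cins: Iterable[str], sources: Dict[str, Source],
                  sample_size: int, seed: int = 0) -> List[Dict[str, str]]:
    """Enrich a reproducible random sample of recently changed companies."""
    unique = sorted(set(changed_cins))
    picked = random.Random(seed).sample(unique, min(sample_size, len(unique)))
    return [enrich_company(cin, sources) for cin in picked]


def to_csv(rows: List[Dict[str, str]]) -> str:
    """Serialize enriched rows to CSV text with a stable column order."""
    fields = sorted({key for row in rows for key in row})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def fake_directors(cin: str) -> Dict[str, str]:   # hypothetical stub source
    return {"Directors": "A. Example"}


def fake_gst(cin: str) -> Dict[str, str]:         # stub that simulates an outage
    raise TimeoutError("simulated outage")


rows = enrich_sample(["U1", "U2", "U1"],
                     {"directors": fake_directors, "gst": fake_gst}, sample_size=5)
print(to_csv(rows))
```

Recording failed sources per row keeps one unavailable provider from discarding data the other sources returned.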
- Generates daily summaries using AI
- Provides conversational query interface
- Supports natural language questions
- Returns structured insights
- Dashboard Overview: Key metrics, charts, and recent changes
- Company Search: Search by CIN or company name
- Change Analysis: Visualizations of change patterns
- AI Chat: Conversational interface for data queries
- Reports: Export options and AI-generated summaries
GET /api/health
GET /api/search_company?q=<search_term>&type=<name|cin>
GET /api/company/<cin>
GET /api/dashboard/stats
GET /api/changes/analysis?days=<days>
POST /api/chat
Content-Type: application/json
{
  "query": "Show new incorporations in Maharashtra"
}
GET /api/companies?page=<page>&per_page=<per_page>&state=<state>&status=<status>
Automatically generates concise daily reports highlighting:
- Total changes (incorporations, deregistrations, updates)
- State-wise breakdown
- Top fields modified
- Key insights and trends
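The numeric parts of such a report (totals, state-wise breakdown, top modified fields) can be computed without any AI model, for example as input to a prompt or as a fallback. The sketch below assumes change-log entries shaped like the change schema in this README; the function name `summarize` and the sample entries are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def summarize(changes: List[Dict[str, str]], top_n: int = 3) -> Dict[str, object]:
    """Aggregate a day's change log into counts by type, state, and field."""
    by_type = Counter(c["Change_Type"] for c in changes)
    by_state = Counter(c["State"] for c in changes)
    fields = Counter(c["Field_Changed"] for c in changes
                     if c["Change_Type"] == "Field Update")
    return {"total": len(changes),
            "by_type": dict(by_type),
            "by_state": dict(by_state),
            "top_fields": fields.most_common(top_n)}


sample_changes = [
    {"Change_Type": "New Incorporation", "State": "Maharashtra", "Field_Changed": ""},
    {"Change_Type": "Field Update", "State": "Gujarat", "Field_Changed": "Status"},
    {"Change_Type": "Field Update", "State": "Gujarat", "Field_Changed": "Status"},
    {"Change_Type": "Field Update", "State": "Delhi", "Field_Changed": "Paidup_Capital"},
]
report = summarize(sample_changes)
print(report)
```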
Supports natural language queries such as:
- "Show new incorporations in Maharashtra"
- "How many companies were struck off last month?"
- "What are the top manufacturing sectors?"
- "List companies with authorized capital above ₹10 lakh"
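When no AI backend is available, queries like these could still be mapped to filters with simple keyword matching. This is a deliberately naive sketch, not the approach in `ai_features.py`: the `parse_query` helper and its patterns are assumptions, and it only recognizes a state and a change type.

```python
import re
from typing import Dict, Optional

STATES = ["Maharashtra", "Gujarat", "Delhi", "Tamil Nadu", "Karnataka"]

# Assumed keyword patterns for the change types named in this README.
INTENTS = {
    "New Incorporation": r"\bincorporat",
    "Deregistration": r"\b(struck off|strike off|deregist)",
}


def parse_query(query: str) -> Dict[str, Optional[str]]:
    """Extract a state and a change type from a free-text question, if present."""
    text = query.lower()
    state = next((s for s in STATES if s.lower() in text), None)
    change_type = next((label for label, pattern in INTENTS.items()
                        if re.search(pattern, text)), None)
    return {"state": state, "change_type": change_type}


print(parse_query("Show new incorporations in Maharashtra"))
print(parse_query("How many companies were struck off last month?"))
```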
# Search for a specific company
GET /api/search_company?q=ANURIUSWELL&type=name
# Get company details
GET /api/company/U24299PN2019PTC181506
# Get dashboard statistics
GET /api/dashboard/stats
# Natural language queries
POST /api/chat
{
"query": "Show me companies in the pharmaceutical sector"
}
POST /api/chat
{
"query": "What's the average authorized capital in Gujarat?"
}
Company record fields:
- CIN: Corporate Identification Number
- Company_Name: Company name
- State: State of registration
- Status: Company status (Active, Strike Off, etc.)
- Authorized_Capital: Authorized capital amount
- Paidup_Capital: Paid-up capital amount
- Registration_Date: Date of incorporation
- Industry_Classification: NIC code classification

Change log fields:
- CIN: Corporate Identification Number
- Change_Type: Type of change (New Incorporation, Deregistration, Field Update)
- Field_Changed: Specific field that changed
- Old_Value: Previous value
- New_Value: New value
- Date: Date of change
- Company_Name: Company name
- State: State
- Status: Current status
- Update web_enrichment.py with new source methods
- Add source configuration in enrich_company() method
- Update data schema if needed

- Modify ai_features.py for new AI capabilities
- Update prompt templates for different query types
- Add new conversational patterns

- Modify dashboard.py for UI changes
- Add new visualizations using Plotly
- Update filters and search functionality
The application uses Python's logging module with:
- File logging: mca_insights.log
- Console output
- Different log levels (INFO, ERROR, DEBUG)
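A setup matching this description (one file handler, one console handler, a configurable level) might look like the following. The logger name `mca_insights` and the message format are assumptions; the project's actual configuration may differ.

```python
import logging


def configure_logging(log_path: str = "mca_insights.log",
                      level: int = logging.INFO) -> logging.Logger:
    """Attach a file handler and a console handler to one named logger."""
    logger = logging.getLogger("mca_insights")  # assumed logger name
    logger.setLevel(level)
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    formatter = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    handlers = (logging.FileHandler(log_path, encoding="utf-8"),
                logging.StreamHandler())
    for handler in handlers:
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```

Calling `configure_logging(level=logging.DEBUG)` once at startup would then send DEBUG, INFO, and ERROR messages to both destinations.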
# Test data integration
python data_integration.py
# Test change detection
python change_detection.py
# Test web enrichment
python web_enrichment.py
# Test AI features
python ai_features.py
Use tools like Postman or curl to test API endpoints:
curl -X GET "http://localhost:5000/api/health"
curl -X GET "http://localhost:5000/api/search_company?q=ANURIUSWELL&type=name"
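The same checks can be scripted from Python using only the standard library. The sketch below assumes the server runs on the default port 5000 and that every endpoint responds with JSON, which this README does not confirm. The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:5000"  # default port from the API configuration


def build_url(path: str, **params) -> str:
    """Join the base URL, a path, and any non-empty query parameters."""
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{BASE_URL}{path}" + (f"?{query}" if query else "")


def get_json(path: str, **params):
    """GET an endpoint and decode the body (assumes a JSON response)."""
    with urlopen(build_url(path, **params), timeout=10) as response:
        return json.load(response)


def chat(query: str):
    """POST a natural language query to /api/chat (assumes a JSON response)."""
    request = Request(build_url("/api/chat"),
                      data=json.dumps({"query": query}).encode("utf-8"),
                      headers={"Content-Type": "application/json"},
                      method="POST")
    with urlopen(request, timeout=10) as response:
        return json.load(response)


# Building URLs needs no running server; get_json() and chat() do.
print(build_url("/api/health"))
print(build_url("/api/search_company", q="ANURIUSWELL", type="name"))
```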
- Database Connection Error
  - Ensure SQLite database exists
  - Check file permissions
- Missing Dependencies
  - Run pip install -r requirements.txt
  - Check Python version compatibility
- API Key Issues
  - Verify OpenAI API key in .env file
  - Application will use mock data if key is missing
- Data Loading Errors
  - Verify CSV files exist in correct location
  - Check file formats and column names
Check mca_insights.log for detailed error information and debugging.
This project is developed as part of an assignment for MCA Insights Engine implementation.
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
For issues and questions:
- Check the troubleshooting section
- Review application logs
- Create an issue in the repository
Note: This is a working proxy implementation demonstrating the intended logic and integration with appropriate data sources. The system is designed to be extensible and can be enhanced with additional features as needed.