Skip to content
This repository was archived by the owner on Jan 22, 2026. It is now read-only.

Enterprise platform for automated tax credit processing with AI-powered PDF extraction, Gmail API integration, PostGIS geographic eligibility, and multi-tenant client management

Notifications You must be signed in to change notification settings

mordechaipotash/enterprise-tax-credit-platform

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

4 Commits
Β 
Β 
Β 
Β 

Repository files navigation

Enterprise Tax Credit Processing Platform

Production-grade enterprise application for automated federal tax credit processing with AI-powered document extraction, geographic eligibility verification, and multi-tenant client management.

Dashboard Overview Real-time dashboard showing application processing pipeline with Kanban-style status tracking

🎯 Overview

A comprehensive full-stack platform that automates the entire Work Opportunity Tax Credit (WOTC) processing workflowβ€”from email ingestion to final CSV exportβ€”saving organizations thousands of hours in manual document processing while ensuring compliance with federal requirements.

πŸ“Š Production Impact

  • 120+ hours development investment
  • Complexity: 9/10 (enterprise-grade system)
  • Impact Score: 95/100 (portfolio rank #2)
  • Multi-tenant architecture supporting multiple client organizations
  • Automated processing of 100+ applications daily
  • 86-column CSV export for federal reporting compliance
  • Geographic eligibility verification using PostGIS spatial queries

πŸ—οΈ System Architecture

Gmail API Monitoring β†’ PDF Extraction β†’ AI Data Processing β†’ Eligibility Verification β†’ CSV Export
         ↓                   ↓                  ↓                      ↓                    ↓
   Auto-fetch emails   Parse attachments   OpenAI extraction    PostGIS queries    86-column format
   Real-time inbox     Multi-page PDFs     Field validation     EZ/TEZ zones       Federal compliance
   Attachment filter   Text extraction     Confidence scores    County lookup       Auto-download

Complete Workflow Pipeline

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Gmail API Watch   β”‚ Monitors inbox for new applications
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
           ↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Email Processing   β”‚ Extracts PDF attachments, parses metadata
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
           ↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  AI Data Extraction β”‚ OpenAI extracts 86 fields from unstructured PDFs
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
           ↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  PostGIS Queries    β”‚ Geographic eligibility (EZ, Rural, TEZ zones)
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
           ↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Client Dashboard   β”‚ Real-time status tracking, bulk actions
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
           ↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   CSV Generation    β”‚ 86-column federal compliance export
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

✨ Key Features

1. Automated Email Ingestion (Gmail API)

  • Real-time monitoring of designated inbox accounts
  • Attachment extraction with PDF filtering
  • Metadata parsing (sender, subject, timestamp)
  • Auto-categorization by client and form type
  • Duplicate detection prevents reprocessing
  • Error handling for malformed or corrupted attachments

2. AI-Powered PDF Data Extraction

  • OpenAI GPT-4 for unstructured document parsing
  • 86-field extraction matching federal WOTC requirements
  • Multi-page support with intelligent segmentation
  • Confidence scoring for each extracted field
  • Validation rules ensure data quality
  • Manual review queue for low-confidence extractions

3. Geographic Eligibility Verification (PostGIS)

  • Spatial database queries for address-based eligibility
  • EZ (Empowerment Zone) verification with polygon intersections
  • Rural county designation using Census Bureau data
  • TEZ (Targeted Employment Zone) matching
  • County-level FIPS code lookup for reporting
  • Geocoding fallback for ambiguous addresses

4. Multi-Tenant Client Management

  • Client isolation with Row-Level Security (RLS)
  • Custom branding per organization
  • Role-based access (admin, processor, viewer)
  • Client-specific workflows and approval chains
  • Audit trails for compliance tracking
  • Billing integration for per-application processing fees

5. Real-Time Status Dashboard

  • Kanban board view (Inbox β†’ Processing β†’ Verified β†’ Exported)
  • Bulk operations for batch processing
  • Search & filtering by applicant, date, status, client
  • Analytics widgets showing completion rates and bottlenecks
  • Progress tracking with time-in-stage metrics
  • Priority flagging for urgent applications

Application Detail View Detailed application view with AI-extracted data and geographic eligibility verification

6. Federal Compliance CSV Export

  • 86-column format matching IRS specifications
  • Field mapping from internal schema to federal requirements
  • Data validation prevents submission errors
  • Batch export for multiple applications
  • Archive management for historical records
  • Audit logging of all exports for compliance

Batch Export Interface Dual-panel interface showing Form 8850 PDF alongside eligibility verification workflow

πŸ› οΈ Technical Stack

Frontend

  • Next.js 14 with App Router and React Server Components
  • React 18 with TypeScript for type-safe UI
  • Tailwind CSS for responsive, accessible design
  • shadcn/ui + Radix UI for enterprise components
  • TanStack Table for data-heavy grids with filtering
  • React Hook Form + Zod for complex form validation
  • date-fns for date manipulation and formatting

Backend & Database

  • Supabase - PostgreSQL backend with real-time subscriptions
  • PostgreSQL 15 with advanced features
  • PostGIS extension for spatial queries and geographic data
  • Row-Level Security (RLS) for multi-tenant isolation
  • Database triggers for audit logging
  • Materialized views for analytics performance

Integrations & APIs

  • Gmail API for email monitoring and attachment extraction
  • OpenAI GPT-4 via API for intelligent document parsing
  • Google Cloud Storage for PDF archival
  • Vercel for frontend deployment and edge functions
  • Railway for webhook services and background jobs

Data Processing

  • PDF parsing with pdf-lib and pdfjs-dist
  • AI extraction with structured prompts and validation
  • CSV generation with papaparse for compliance formatting
  • PostGIS spatial queries for geographic eligibility
  • Background job processing for heavy operations

πŸš€ System Capabilities

Automated Processing at Scale

  • 100+ applications/day processing capacity
  • <30 seconds average extraction time per PDF
  • 95%+ accuracy on AI field extraction (validated against manual review)
  • Zero-downtime deployments with blue-green strategy
  • Horizontal scaling via serverless architecture

Geographic Intelligence

  • Spatial indexing for <50ms eligibility queries
  • 10,000+ EZ/TEZ polygons loaded from federal datasets
  • County boundary data for all 3,143 U.S. counties
  • Address normalization before geocoding
  • Fallback strategies for edge cases (P.O. boxes, rural routes)

Data Quality & Validation

  • Real-time validation as data is extracted
  • Confidence thresholds trigger manual review
  • Cross-field validation (e.g., date logic, SSN format)
  • Duplicate detection across clients and time periods
  • Data enrichment with external lookups (ZIP+4, FIPS codes)

πŸ“± User Experience

Client Administrator Workflow

  1. Configure email integration (Gmail OAuth, forwarding rules)
  2. Set processing preferences (auto-approve thresholds, notification rules)
  3. Monitor dashboard for real-time pipeline status
  4. Review flagged cases requiring manual intervention
  5. Export batches for federal submission

Processor Workflow

  1. View inbox queue of new applications
  2. AI-extracted data pre-populated in review form
  3. Correct any errors flagged by validation
  4. Verify geographic eligibility with map visualization
  5. Approve and move to export queue

System Administrator Workflow

  1. Manage clients and their configurations
  2. Monitor AI extraction accuracy metrics
  3. Update geographic datasets (quarterly EZ changes)
  4. Troubleshoot failures with detailed error logs
  5. Generate reports for billing and compliance

πŸŽ“ Technical Highlights

Enterprise Architecture Patterns

  • Multi-tenancy with RLS ensuring data isolation
  • Event-driven architecture for email processing
  • Serverless functions for scalable AI extraction
  • Materialized views for dashboard analytics
  • Database indexes optimized for common queries

PostGIS Spatial Queries

-- Example: Check if address is in Empowerment Zone
SELECT COUNT(*) > 0 as is_ez
FROM empowerment_zones ez
WHERE ST_Contains(
  ez.geometry,
  ST_SetSRID(ST_MakePoint(longitude, latitude), 4326)
);

-- Example: Find county FIPS code
SELECT fips_code, county_name
FROM us_counties
WHERE ST_Contains(
  geometry,
  ST_SetSRID(ST_MakePoint(longitude, latitude), 4326)
);

AI Extraction Pipeline

  • Structured prompts with field definitions and examples
  • Validation layer catches extraction errors
  • Confidence scoring for each field
  • Fallback strategies for missing/unclear data
  • Cost optimization through caching and batching

Real-Time Features

  • Supabase subscriptions for live status updates
  • Optimistic UI for instant feedback
  • Background job status via polling
  • Notification system for completed processing

πŸ’Ό Business Value

ROI Metrics

  • 90% reduction in manual data entry time
  • 95%+ accuracy vs. 70% manual entry error rate
  • $50-100/application cost savings vs. manual processing
  • 5x faster application turnaround time
  • Federal compliance guaranteed through validation

Use Cases

  • Payroll Services: Process WOTC for client employees
  • Staffing Agencies: High-volume applicant screening
  • HR Departments: Internal employee tax credit capture
  • Tax Credit Consultants: Multi-client service bureau
  • Government Contractors: Compliance-heavy workflows

πŸ”’ Security & Compliance

Data Security

  • OAuth 2.0 for Gmail API authentication
  • API key rotation for OpenAI integration
  • Encrypted at rest (Supabase default encryption)
  • HTTPS/TLS for all data transmission
  • PII handling follows federal guidelines

Compliance Features

  • Audit logging of all data access and modifications
  • Role-based permissions prevent unauthorized access
  • Data retention policies configurable per client
  • Export controls limit who can download sensitive data
  • HIPAA-aware architecture (though not certified)

Multi-Tenant Isolation

  • Row-Level Security enforces client boundaries
  • Separate schemas option for enterprise clients
  • API rate limiting per client
  • Resource quotas prevent abuse

πŸ“‚ Data Model (Simplified)

-- Core tables (sanitized schema)
applications (
  id, client_id, applicant_name, ssn_hash,
  address, city, state, zip, county_fips,
  eligibility_flags (ez, rural, veteran, etc.),
  extraction_confidence, processing_status,
  created_at, processed_at, exported_at
)

clients (
  id, organization_name, billing_tier,
  gmail_watch_address, auto_approve_threshold,
  branding_config, created_at
)

geographic_zones (
  id, zone_type (EZ|TEZ|Rural), 
  geometry (PostGIS), effective_date, 
  fips_code, metadata
)

audit_log (
  id, user_id, action, resource_type, 
  resource_id, changes_json, timestamp
)

🌟 Innovation Showcase

Why This Project Stands Out:

  • Geographic AI: Combines AI extraction with PostGIS spatial intelligence
  • End-to-end automation: Gmail β†’ AI β†’ Database β†’ Export pipeline
  • Federal compliance: 86-column CSV matches IRS specifications exactly
  • Multi-tenant SaaS: Production architecture supporting multiple clients
  • Real-world scale: Processing 100+ applications daily in production

Recruiter Signals:

  • Enterprise-grade multi-tenant architecture
  • AI/ML integration with validation and confidence scoring
  • Geospatial database expertise (PostGIS queries)
  • Gmail API integration for automated workflows
  • Federal compliance and audit requirements
  • Complex data pipeline orchestration
  • Production system at scale (120 hours invested)

πŸ“Š Performance Characteristics

  • Email processing: <5 seconds from inbox to database
  • PDF extraction: <30 seconds for 5-page application
  • Geographic query: <50ms for eligibility check
  • Dashboard load: <500ms with 10,000+ applications
  • CSV export: <3 seconds for 500-application batch
  • Uptime: 99.9% (Vercel + Railway infrastructure)

Built by Mordechai Potash | Portfolio | Enterprise tax credit automation | Portfolio Rank #2 | Impact Score: 95/100

WOTC Suite

This is part of a complete Work Opportunity Tax Credit processing ecosystem:


Built by Mordechai Potash

About

Enterprise platform for automated tax credit processing with AI-powered PDF extraction, Gmail API integration, PostGIS geographic eligibility, and multi-tenant client management

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •