An AI-driven PDF validation tool that compares model-generated answers with PDF content via embeddings and FAISS similarity search, producing automated correctness labels for each Q&A pair.

SrutikNandaniya/GenAI-validation-layer

✨ AI PDF Answer Validation System

🚀 Overview

This project validates AI-generated answers against a financial loan document (PDF).

For every question–answer pair, the system determines whether the answer is:

✅ SUPPORTED: the answer fully matches the PDF

⚠️ PARTIALLY_SUPPORTED: part of the answer matches the PDF and part does not

❌ NOT_SUPPORTED: no relevant match was found in the PDF

The detection uses semantic embeddings, numeric extraction, and similarity search.
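As a rough illustration of how a similarity score and a numeric check could map onto the three labels, here is a minimal sketch. It is not taken from validator.py: the function name `label_answer` and the thresholds 0.75 and 0.45 are assumptions chosen only for the example.

```python
from __future__ import annotations


def label_answer(
    similarity: float,
    answer_numbers: set[float],
    source_numbers: set[float],
    high: float = 0.75,  # hypothetical threshold for a strong semantic match
    low: float = 0.45,   # hypothetical threshold for a weak semantic match
) -> str:
    """Map a similarity score plus numeric agreement to one of the three labels."""
    # Numbers stated in the answer that never appear in the retrieved PDF text.
    missing = answer_numbers - source_numbers
    if similarity >= high and not missing:
        return "SUPPORTED"
    if similarity >= low:
        # Either the match is only moderate, or it is strong but a number disagrees.
        return "PARTIALLY_SUPPORTED"
    return "NOT_SUPPORTED"


print(label_answer(0.82, {1500000.0}, {1500000.0}))
```

In this sketch a strong semantic match is downgraded to PARTIALLY_SUPPORTED whenever the answer quotes a figure that the retrieved text does not contain.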

📁 Project Structure

⚙️ Tech Stack

| Component | Purpose |
| --- | --- |
| Python 3 | Core programming language |
| PyPDF2 | PDF text extraction |
| SentenceTransformers (MiniLM) | Embedding generation |
| FAISS | Fast vector similarity search |
| NumPy | Numerical processing |
| JSON | Input/output formats |

📦 Installation

Install required libraries:

pip install PyPDF2 sentence-transformers faiss-cpu numpy

For FAISS on Windows:

pip install faiss-cpu-windows

▢️ How to Run the Validator Step 1 β€” Navigate to src

cd src

Step 2: Execute the script

python validator.py --pdf ../input-pdfs/axis_loan1.pdf --qa qa_samples.json --out ../validation_results.json

🔍 Argument Meaning

| Argument | Meaning |
| --- | --- |
| --pdf | Path to the source PDF |
| --qa | JSON file containing questions and answers |
| --out | Output file where validation results are saved |
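For readers who want to see how these three flags could be wired up, the following is a minimal `argparse` sketch. It assumes all three flags are required, which the README does not state, and `build_parser` is a hypothetical helper name rather than a function known to exist in validator.py.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the three documented flags (all assumed required)."""
    parser = argparse.ArgumentParser(description="Validate AI answers against a PDF.")
    parser.add_argument("--pdf", required=True, help="Path to the source PDF")
    parser.add_argument("--qa", required=True, help="JSON file containing questions and answers")
    parser.add_argument("--out", required=True, help="Output file for validation results")
    return parser


# Parse the same arguments used in the run command above.
example = build_parser().parse_args(
    ["--pdf", "../input-pdfs/axis_loan1.pdf", "--qa", "qa_samples.json",
     "--out", "../validation_results.json"]
)
print(example.pdf, example.qa, example.out)
```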

📤 Output Format (validation_results.json)

Each entry looks like:

{
  "question": "What is the sanctioned loan amount?",
  "ai_answer": "The sanctioned loan amount is Rs. 15,00,000.",
  "validation_result": "SUPPORTED",
  "confidence_score": 0.82,
  "supporting_text": "[Page X] ... Facility Amount Rupees: 1,500,000 ..."
}
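A quick way to inspect the results is to tally the labels. The sketch below assumes the output file is a JSON array of entries shaped like the one above; that layout is an assumption, and `load_results` and `summarize` are hypothetical helper names.

```python
from __future__ import annotations

import json
from collections import Counter


def load_results(path: str) -> list[dict]:
    """Read the results file, assumed to be a JSON array of entries."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def summarize(results: list[dict]) -> Counter:
    """Count how many entries received each validation label."""
    return Counter(entry["validation_result"] for entry in results)


# Typical use, once the validator has produced its output:
#   counts = summarize(load_results("../validation_results.json"))
#   print(counts)
```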

📸 Screenshots Included

Inside /screenshots, the following proof screenshots are available:

🗂 Project folder structure

🖥 Command-line execution of validator.py

These confirm the application works end-to-end as required.

🧠 How the System Works (Simplified)

1. Extract text from the PDF

2. Break it into meaningful chunks

3. Convert chunks → embeddings (MiniLM)

4. Convert Q&A → embeddings

5. Compare semantic similarity

6. Perform numeric extraction & matching

7. Generate decision label:

  • SUPPORTED

  • PARTIALLY_SUPPORTED

  • NOT_SUPPORTED
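Two of the steps above, chunking and numeric matching, can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code in validator.py: `chunk_words`, `extract_numbers`, the chunk size, and the overlap are all hypothetical. Stripping thousands separators lets an Indian-style figure such as 15,00,000 compare equal to 1,500,000, which is the situation in the sample output. The embedding and FAISS steps are left out because they need the model and index libraries.

```python
from __future__ import annotations

import re

# A digit, then any run of digits or commas, then an optional decimal part.
NUMBER_RE = re.compile(r"\d[\d,]*(?:\.\d+)?")


def chunk_words(text: str, size: int = 120, overlap: int = 30) -> list[str]:
    """Split text into word windows of `size` that overlap by `overlap` words."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, len(words), step)]


def extract_numbers(text: str) -> set[float]:
    """Return the numbers in text, ignoring thousands separators."""
    return {float(match.replace(",", "")) for match in NUMBER_RE.findall(text)}


answer = "The sanctioned loan amount is Rs. 15,00,000."
source = "Facility Amount Rupees: 1,500,000 ..."
# True when every number quoted in the answer also appears in the source text.
print(extract_numbers(answer) <= extract_numbers(source))
```

Comparing sets of normalized values, rather than raw strings, is what makes the two differently grouped amounts agree.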

🎯 Submission Summary

✔ Complete folder structure
✔ Full PDF → Q&A → Validation pipeline
✔ Final output JSON included
✔ Screenshots provided
✔ Easy-to-run instructions documented
