
An extractive Question Answering system built with Hugging Face Transformers and Streamlit. Fine-tuned a DistilBERT model on the SQuAD dataset and deployed it as an interactive web app for real-time QA from user-provided context.


manishwaraprabhu/genai-nlp-llms-question-answering-huggingface


🧠 Building and Deploying a Question Answering System with Hugging Face

📌 Project Overview

This project implements an Extractive Question Answering (QA) System using Hugging Face Transformers and Streamlit. While the QA system is extractive in nature, it is built on a transformer-based Large Language Model (LLM) and is inspired by advances in Natural Language Processing (NLP) and Generative AI (GenAI). A pre-trained DistilBERT model was fine-tuned on the SQuAD dataset to improve accuracy in extracting answers from a given context. The final model was deployed as an interactive Streamlit web application for real-time user interaction.


🚀 Project Workflow

✅ Step 1: Dataset Preparation

  • Loaded and preprocessed the SQuAD dataset for fine-tuning.
  • Extracted context, questions, and answers, ensuring accurate start-end token mappings.
  • Converted the data into Hugging Face Dataset format for training.
  • Saved processed data as train_data.csv and validation_data.csv for reuse.
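The trickiest part of this step is the start-end token mapping: SQuAD stores each answer as a character offset into the context, which must be aligned to token indices before training. A minimal sketch of that alignment, using a plain whitespace tokenizer as a stand-in for the DistilBERT tokenizer's `return_offsets_mapping`:

```python
def char_to_token_span(context, answer_text, answer_start):
    """Map a character-level answer span to (start_token, end_token) indices."""
    # Build (char_start, char_end) offsets for each whitespace-split token.
    offsets, pos = [], 0
    for tok in context.split():
        start = context.index(tok, pos)
        end = start + len(tok)
        offsets.append((start, end))
        pos = end
    answer_end = answer_start + len(answer_text)
    # First token containing the answer's first character...
    start_tok = next(i for i, (s, e) in enumerate(offsets) if s <= answer_start < e)
    # ...and the token containing its last character.
    end_tok = next(i for i, (s, e) in enumerate(offsets) if s < answer_end <= e)
    return start_tok, end_tok

context = "The Amazon rainforest regulates the global climate."
answer = "the global climate"
print(char_to_token_span(context, answer, context.index(answer)))  # (4, 6)
```

With a subword tokenizer the same logic applies, but the offsets come from the tokenizer itself, and spans truncated out of the window are labeled as unanswerable.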

✅ Step 2: Baseline Model Evaluation (Pretrained DistilBERT)

  • Loaded train_data.csv and validation_data.csv for model benchmarking.
  • Used DistilBERT (distilbert-base-cased) for zero-shot QA evaluation.
  • Evaluated performance before fine-tuning to establish baseline metrics.
  • Observed limitations in handling domain-specific or complex queries.
  • Saved the baseline model as baseline_model.
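At inference time, an extractive QA model emits a start logit and an end logit per token, and the predicted answer is the highest-scoring valid span. A toy sketch of that span selection (hand-picked logits, no model download; the max-length cutoff is an assumption):

```python
import math

def best_span(start_logits, end_logits, max_answer_len=15):
    """Pick (start, end) maximizing start_logit + end_logit,
    subject to start <= end and a maximum span length."""
    best, best_score = None, -math.inf
    for s, s_logit in enumerate(start_logits):
        for e in range(s, min(s + max_answer_len, len(end_logits))):
            score = s_logit + end_logits[e]
            if score > best_score:
                best, best_score = (s, e), score
    return best

# Toy logits over 6 tokens: the model is most confident in tokens 2..4.
start_logits = [0.1, 0.3, 2.5, 0.2, 0.1, 0.0]
end_logits   = [0.0, 0.1, 0.4, 0.3, 2.1, 0.2]
print(best_span(start_logits, end_logits))  # (2, 4)
```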

✅ Step 3: Fine-Tuning the DistilBERT Model

  • Loaded the baseline_model and fine-tuned it using the Hugging Face Trainer API.
  • Used AdamW optimizer with learning rate scheduling.
  • Configured batch size, epochs, and gradient accumulation for optimal performance.
  • Trained the model on SQuAD dataset and saved the best version as best_fine_tuned_model.
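The learning-rate scheduling mentioned above is, by Trainer default, linear warmup followed by linear decay to zero. A standalone sketch of that schedule (the 3e-5 base rate and step counts are illustrative, not the project's actual settings):

```python
def linear_schedule_lr(step, base_lr=3e-5, warmup_steps=500, total_steps=5000):
    """Linear warmup to base_lr, then linear decay to zero, mirroring
    the default schedule used by the Hugging Face Trainer."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))

print(linear_schedule_lr(250))   # halfway through warmup -> 1.5e-05
print(linear_schedule_lr(5000))  # end of training -> 0.0
```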

✅ Step 4: Model Evaluation

  • Compared baseline vs fine-tuned model performance.
  • Evaluated using Exact Match (EM) and F1 Score.
  • Achieved significant improvements post fine-tuning:
    • EM: 74.50%
    • F1: 83.07%
  • Generated evaluation_report.json summarizing all key metrics.
  • Visualized improvement using bar charts via matplotlib.
  • Saved incorrect predictions for further error analysis.
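EM and F1 are computed per example after SQuAD-style answer normalization (lowercasing, stripping articles and punctuation), with F1 measured as token overlap between prediction and gold answer. A self-contained sketch of both metrics:

```python
import re
import string
from collections import Counter

def normalize(text):
    """SQuAD-style normalization: lowercase, drop punctuation and articles."""
    text = "".join(ch for ch in text.lower() if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(pred, gold):
    return float(normalize(pred) == normalize(gold))

def f1_score(pred, gold):
    pred_toks, gold_toks = normalize(pred).split(), normalize(gold).split()
    overlap = sum((Counter(pred_toks) & Counter(gold_toks)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred_toks), overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)

print(exact_match("The global climate", "global climate"))       # 1.0
print(f1_score("regulating the climate", "the global climate"))  # 0.5
```

Corpus-level EM and F1 are then the averages of these per-example scores (taking the max over gold answers when SQuAD provides several).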

✅ Step 5: Deployment via Streamlit

  • Developed a Streamlit web app for real-time QA interaction.
  • Integrated best_fine_tuned_model to power the backend inference.
  • Enabled users to input both context and question for on-the-fly answers.
  • Implemented robust error handling for invalid inputs and edge cases.
  • Tested thoroughly with real-world examples for stability and usability.
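The app's actual error handling isn't shown in this README; a hypothetical `validate_inputs` helper sketches the kind of checks a Streamlit form would run before invoking the model (the function name, messages, and length limit are assumptions):

```python
def validate_inputs(context, question, max_context_chars=10_000):
    """Collect user-facing error messages for the QA form; empty list = OK."""
    errors = []
    if not context.strip():
        errors.append("Context cannot be empty.")
    elif len(context) > max_context_chars:
        errors.append(f"Context exceeds {max_context_chars} characters.")
    if not question.strip():
        errors.append("Question cannot be empty.")
    return errors

print(validate_inputs("  ", "What role does the rainforest play?"))
# ['Context cannot be empty.']
```

In the app, these messages would be surfaced via `st.error(...)`, and inference would run only when the list comes back empty.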

📊 Final Results: Performance Comparison

| Model            | Exact Match (EM) | F1 Score |
|------------------|------------------|----------|
| Baseline Model   | 71.75%           | 80.65%   |
| Fine-Tuned Model | 74.50%           | 83.07%   |

✅ Fine-tuning improved EM by ~2.8 percentage points and F1 by ~2.4 percentage points
✅ Real-time QA interaction successfully enabled through the deployed UI


✅ Conclusion

This project demonstrates the complete lifecycle of building an NLP solution, from dataset preparation and model fine-tuning through evaluation to deployment. Using Hugging Face Transformers and PyTorch, we improved a DistilBERT model's performance on SQuAD and deployed it via Streamlit, making it accessible for real-time use cases such as knowledge assistants, intelligent search engines, and chatbot integrations. By leveraging techniques from Generative AI, the project highlights the real-world application of Large Language Models (LLMs) in building intelligent systems capable of natural language understanding and interactive deployment.


πŸ› οΈ Technologies & Tools

  • Hugging Face Transformers
  • PyTorch
  • Hugging Face Datasets
  • Streamlit
  • SQuAD Dataset
  • Matplotlib, Pandas
  • JSON, CSV, Tokenizers

📂 Project Structure

├── data/
│   ├── train_data.csv
│   └── validation_data.csv
├── models/
│   ├── baseline_model/
│   └── best_fine_tuned_model/
├── evaluation/
│   ├── evaluation_report.json
│   └── performance_plots.png
├── app.py               # Streamlit application
├── train.py             # Fine-tuning script
├── requirements.txt
└── README.md

📖 Sample Dataset Entry

{
  "context": "The Amazon rainforest is one of the world's most biodiverse habitats. It plays a critical role in regulating the global climate.",
  "question": "What role does the Amazon rainforest play in the climate?",
  "answer": "regulating the global climate"
}

📚 Learning Outcomes

  • Fine-tuning pre-trained transformer models for extractive QA
  • Understanding tokenization and label alignment for QA
  • Evaluating NLP models using EM and F1 metrics
  • Building and deploying interactive ML apps using Streamlit
  • Exposure to Hugging Face's ecosystem and Trainer API
  • Practical experience with Large Language Models (LLMs) and the foundations of Generative AI (GenAI) through fine-tuning and deploying a transformer-based QA system
