This project implements an Extractive Question Answering (QA) system using Hugging Face Transformers and Streamlit. Although the system is extractive rather than generative, it is built on a transformer-based Large Language Model (LLM) and is inspired by advances in Natural Language Processing (NLP) and Generative AI (GenAI). A pre-trained DistilBERT model was fine-tuned on the SQuAD dataset to improve its accuracy in extracting answers from a given context, and the final model was deployed as an interactive Streamlit web application for real-time use.
- Loaded and preprocessed the SQuAD dataset for fine-tuning.
- Extracted contexts, questions, and answers, ensuring accurate start-end token mappings.
- Converted the data into the Hugging Face `Dataset` format for training.
- Saved the processed data as `train_data.csv` and `validation_data.csv` for reuse.
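The start-end token mapping step can be illustrated with a small helper that converts a character-level answer span into token indices using the offset mapping a fast tokenizer returns (via `return_offsets_mapping=True`). The offsets below are hand-made stand-ins for tokenizer output, so this is a sketch rather than the project's actual preprocessing code:

```python
def to_token_span(offsets, sequence_ids, answer_start, answer_end):
    """Map a character span [answer_start, answer_end) in the context to
    start/end token indices. Only context tokens (sequence id == 1) count;
    special tokens (None) and question tokens (0) are skipped."""
    start_tok = end_tok = None
    for i, (seq_id, (s, e)) in enumerate(zip(sequence_ids, offsets)):
        if seq_id != 1:
            continue
        if s <= answer_start < e:
            start_tok = i
        if s < answer_end <= e:
            end_tok = i
    return start_tok, end_tok

# Hand-made offsets imitating tokenizer(question, context,
# return_offsets_mapping=True): [CLS], two question tokens, [SEP],
# three context tokens, [SEP].
offsets = [(0, 0), (0, 4), (5, 9), (0, 0), (0, 3), (4, 10), (11, 16), (0, 0)]
seq_ids = [None, 0, 0, None, 1, 1, 1, None]
print(to_token_span(offsets, seq_ids, 4, 16))  # → (5, 6)
```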
- Loaded `train_data.csv` and `validation_data.csv` for model benchmarking.
- Used DistilBERT (`distilbert-base-cased`) for zero-shot QA evaluation.
- Evaluated performance before fine-tuning to establish baseline metrics.
- Observed limitations in handling domain-specific or complex queries.
- Saved the baseline model as `baseline_model`.
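Under the hood, the baseline and the fine-tuned model score answers the same way: the QA head emits one start logit and one end logit per token, and the predicted answer is the span maximizing their sum. A minimal, model-independent sketch of that selection step (the logits here are made up for illustration):

```python
def best_span(start_logits, end_logits, max_answer_len=30):
    """Return the (start, end) token pair with the highest combined score,
    subject to start <= end and a maximum answer length."""
    best, best_score = (0, 0), float("-inf")
    for s, s_logit in enumerate(start_logits):
        for e in range(s, min(s + max_answer_len, len(end_logits))):
            score = s_logit + end_logits[e]
            if score > best_score:
                best_score, best = score, (s, e)
    return best

# Toy logits: token 1 is the most likely start, token 2 the most likely end.
print(best_span([0.1, 5.0, 1.0, 0.2], [0.3, 1.0, 4.0, 0.1]))  # → (1, 2)
```

Before fine-tuning, the untuned QA head produces poorly calibrated logits, which is why the baseline struggles on domain-specific queries even though the selection logic is identical.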
- Loaded the `baseline_model` and fine-tuned it using the Hugging Face `Trainer` API.
- Used the AdamW optimizer with learning-rate scheduling.
- Configured batch size, epochs, and gradient accumulation for optimal performance.
- Trained the model on the SQuAD dataset and saved the best version as `best_fine_tuned_model`.
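The learning-rate scheduling mentioned above typically means linear warmup followed by linear decay (what `transformers.get_linear_schedule_with_warmup` implements when paired with AdamW). A self-contained sketch of that schedule; the step counts and base learning rate are illustrative assumptions, not the project's actual hyperparameters:

```python
def linear_warmup_decay(step, warmup_steps, total_steps, base_lr=3e-5):
    """Learning rate at a given optimizer step: ramp linearly from 0 to
    base_lr over warmup_steps, then decay linearly back to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

# Peak learning rate at the end of warmup, zero by the final step.
print(linear_warmup_decay(500, 500, 5000))   # → 3e-05
print(linear_warmup_decay(5000, 500, 5000))  # → 0.0
```

Warmup stabilizes early AdamW updates on a freshly initialized QA head, while the decay phase lets the model settle into a minimum by the end of training.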
- Compared baseline vs fine-tuned model performance.
- Evaluated using Exact Match (EM) and F1 Score.
- Achieved significant improvements post fine-tuning:
- EM: 74.50%
- F1: 83.07%
- Generated `evaluation_report.json` summarizing all key metrics.
- Visualized the improvement using bar charts via `matplotlib`.
- Saved incorrect predictions for further error analysis.
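The Exact Match and F1 metrics follow the standard SQuAD definitions: both normalize the strings (lowercasing, stripping punctuation and articles), EM then checks full equality, while F1 measures token overlap between prediction and gold answer. A self-contained sketch:

```python
import re
import string
from collections import Counter

def normalize(text):
    """SQuAD-style normalization: lowercase, drop punctuation and
    articles, collapse whitespace."""
    text = "".join(ch for ch in text.lower() if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, gold):
    return float(normalize(prediction) == normalize(gold))

def f1(prediction, gold):
    pred_toks, gold_toks = normalize(prediction).split(), normalize(gold).split()
    overlap = sum((Counter(pred_toks) & Counter(gold_toks)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred_toks), overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)

print(exact_match("The global climate", "global climate"))    # → 1.0
print(f1("regulating the global climate", "global climate"))  # ≈ 0.8
```

Corpus-level scores like those in the table below are simply these per-example scores averaged over the validation set (taking the max over gold answers when a question has several).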
- Developed a Streamlit web app for real-time QA interaction.
- Integrated `best_fine_tuned_model` to power the backend inference.
- Enabled users to input both a context and a question for on-the-fly answers.
- Implemented robust error handling for invalid inputs and edge cases.
- Tested thoroughly with real-world examples for stability and usability.
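The error handling above can be factored into a small validation helper that the app calls before running inference; the function name, messages, and length limit here are illustrative assumptions, not the app's actual code:

```python
def validate_inputs(context, question, max_context_chars=5000):
    """Return an error message for invalid QA inputs, or None if valid.
    (Illustrative: the real app's limits and messages may differ.)"""
    context = (context or "").strip()
    question = (question or "").strip()
    if not context:
        return "Please provide a context passage."
    if not question:
        return "Please enter a question."
    if len(context) > max_context_chars:
        return f"Context is too long (limit: {max_context_chars} characters)."
    return None  # inputs are valid; safe to run the QA model

print(validate_inputs("", "What is this?"))  # → Please provide a context passage.
```

In a Streamlit app this check would sit between the input widgets (`st.text_area`) and the model call, surfacing the message via `st.error` instead of crashing on empty or oversized input.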
| Model | Exact Match (EM) | F1 Score |
|---|---|---|
| Baseline Model | 71.75% | 80.65% |
| Fine-Tuned Model | 74.50% | 83.07% |
✅ Fine-tuning improved EM by 2.75 points (71.75% → 74.50%) and F1 by 2.42 points (80.65% → 83.07%)
✅ Real-time QA interaction successfully enabled through the deployed UI
This project demonstrates the complete lifecycle of building an NLP solution: from dataset preparation and model fine-tuning to evaluation and deployment. Using Hugging Face Transformers and PyTorch, we improved a DistilBERT model's performance on SQuAD and deployed it via Streamlit, making it accessible for real-time use cases such as knowledge assistants, intelligent search engines, and chatbot integrations. By leveraging techniques from Generative AI, the project highlights the real-world application of Large Language Models (LLMs) in building intelligent systems capable of natural language understanding and interactive deployment.
- Hugging Face Transformers
- PyTorch
- Hugging Face Datasets
- Streamlit
- SQuAD Dataset
- Matplotlib, Pandas
- JSON, CSV, Tokenizers
## Project Structure
```
├── data/
│   ├── train_data.csv
│   └── validation_data.csv
├── models/
│   ├── baseline_model/
│   └── best_fine_tuned_model/
├── evaluation/
│   ├── evaluation_report.json
│   └── performance_plots.png
├── app.py              # Streamlit Application
├── train.py            # Fine-tuning Script
├── requirements.txt
└── README.md
```
```json
{
  "context": "The Amazon rainforest is one of the world's most biodiverse habitats. It plays a critical role in regulating the global climate.",
  "question": "What role does the Amazon rainforest play in the climate?",
  "answer": "regulating the global climate"
}
```
- Fine-tuning pre-trained transformer models for extractive QA
- Understanding tokenization and label alignment for QA
- Evaluating NLP models using EM and F1 metrics
- Building and deploying interactive ML apps using Streamlit
- Exposure to Hugging Face's ecosystem and Trainer API
- Gained practical experience with Large Language Models (LLMs) and exposure to the foundations of Generative AI (GenAI) by fine-tuning and deploying a transformer-based question answering system.