Finance Specialist AI - LLaMA 3 Fine-tuning Project

This project demonstrates the complete pipeline for fine-tuning a large language model (LLM) on financial domain data. The implementation uses state-of-the-art techniques including QLoRA (Quantized Low-Rank Adaptation) for efficient fine-tuning, enabling training on consumer-grade hardware while maintaining competitive performance.

Dataset

The project uses the Sujet-Finance-Instruct-177k dataset, a comprehensive collection of 177,000 finance-related instruction-response pairs covering:

  • Question & Answer: General financial Q&A
  • Sentiment Analysis: Financial sentiment classification
  • Named Entity Recognition: Financial entity extraction
  • Topic Classification: Financial topic categorization
  • Conversational QA: Multi-turn financial conversations

Dataset Statistics

  • Total Size: ~337MB
  • Training Samples: 34,920 (90%)
  • Test Samples: 3,881 (10%)
  • Task Focus: Question-answering for the financial domain; training uses this Q&A subset (~38.8k of the 177k pairs; loading sketch below)
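
As a minimal loading sketch, assuming the dataset's Hugging Face ID and a task_type column (verify both on the dataset card), the Q&A subset can be selected and split like this:

from datasets import load_dataset

# Dataset ID, column name, and label value are assumptions; check the dataset card
dataset = load_dataset("sujet-ai/Sujet-Finance-Instruct-177k", split="train")
qa_subset = dataset.filter(lambda x: x["task_type"] == "qa")

# 90/10 train/test split, matching the statistics above
splits = qa_subset.train_test_split(test_size=0.1, seed=42)
train_ds, test_ds = splits["train"], splits["test"]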

Architecture

  • Base Model: Meta LLaMA 3-8B
  • Fine-tuning Method: QLoRA (Quantized Low-Rank Adaptation)
  • Quantization: 4-bit quantization using bitsandbytes
  • Attention Mechanism: Flash Attention 2 for memory efficiency
  • Training Framework: Hugging Face Transformers + TRL (Transformer Reinforcement Learning)

Key Libraries

transformers==4.36.2    # Model architecture and tokenization
datasets==2.16.1        # Dataset loading and processing
accelerate==0.26.1      # Distributed training support
bitsandbytes==0.42.0    # Quantization support
peft                    # Parameter-efficient fine-tuning
trl                     # Training utilities for LLMs
torch>=2.2.1            # Deep learning framework
flash-attn              # Optimized attention implementation
wandb                   # Experiment tracking

Implementation Details

1. Data Preprocessing

  • Format Conversion: Transform raw records into a conversational chat format (see the sketch after this list)
  • Chat Template: Custom template optimized for financial Q&A
  • Data Cleaning: Lowercase normalization, prefix removal, standardization
  • Quality Assurance: A consistent system prompt applied across all samples
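
A minimal sketch of this conversion; the column names (user_prompt, answer) and the system prompt text are illustrative, not the notebook's exact values:

SYSTEM_PROMPT = "You are a financial specialist. Answer questions accurately and concisely."

def to_chat_format(sample):
    # Column names are illustrative; adapt to the actual dataset schema
    return {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": sample["user_prompt"].strip()},
            {"role": "assistant", "content": sample["answer"].strip()},
        ]
    }

train_ds = train_ds.map(to_chat_format, remove_columns=train_ds.column_names)
test_ds = test_ds.map(to_chat_format, remove_columns=test_ds.column_names)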

2. Model Configuration

import torch
from peft import LoraConfig
from transformers import BitsAndBytesConfig

# QLoRA Configuration
peft_config = LoraConfig(
    r=16,                         # Rank of the low-rank adaptation matrices
    lora_alpha=8,                 # Scaling parameter
    lora_dropout=0.05,            # Dropout for regularization
    target_modules="all-linear",  # Apply to all linear layers
    task_type="CAUSAL_LM"
)

# 4-bit Quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,   # Nested quantization for extra savings
    bnb_4bit_quant_type="nf4",        # NormalFloat4 data type
    bnb_4bit_compute_dtype=torch.bfloat16
)
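
With these configs defined, loading the quantized base model might look like the sketch below (the Hugging Face model ID is an assumption, and Flash Attention 2 requires a supported GPU):

from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "meta-llama/Meta-Llama-3-8B"  # assumed base model ID

model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=bnb_config,           # 4-bit NF4 quantization from above
    attn_implementation="flash_attention_2",  # Flash Attention 2
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
tokenizer.pad_token = tokenizer.eos_token     # LLaMA tokenizers ship without a pad token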

3. Training Configuration

  • Epochs: 3
  • Batch Size: 1 per device with gradient accumulation (effective batch size: 2)
  • Learning Rate: 2e-4 with a constant schedule
  • Optimizer: AdamW with warmup
  • Precision: Mixed precision (bfloat16)
  • Sequence Length: 512 tokens
  • Gradient Clipping: max norm 0.3 (see the sketch after this list)
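
These hyperparameters map onto Hugging Face TrainingArguments and TRL's SFTTrainer roughly as follows. This is a sketch, not the notebook's exact code: output_dir, warmup_ratio, and logging settings are assumptions, and newer TRL versions move dataset_text_field and max_seq_length into SFTConfig:

from transformers import TrainingArguments
from trl import SFTTrainer

# Render the chat messages to plain text with the tokenizer's chat template
def to_text(sample):
    return {"text": tokenizer.apply_chat_template(sample["messages"], tokenize=False)}

train_text = train_ds.map(to_text)
test_text = test_ds.map(to_text)

args = TrainingArguments(
    output_dir="llama-3-8b-finance-qa",   # assumed output path
    num_train_epochs=3,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=2,        # effective batch size of 2
    learning_rate=2e-4,
    lr_scheduler_type="constant",
    warmup_ratio=0.03,                    # assumed warmup fraction
    optim="adamw_torch",
    bf16=True,                            # bfloat16 mixed precision
    max_grad_norm=0.3,                    # gradient clipping
    gradient_checkpointing=True,
    evaluation_strategy="epoch",          # monitor validation loss each epoch
    logging_steps=10,
    report_to="wandb",
)

trainer = SFTTrainer(
    model=model,
    args=args,
    train_dataset=train_text,
    eval_dataset=test_text,
    dataset_text_field="text",
    peft_config=peft_config,              # SFTTrainer attaches the LoRA adapters
    tokenizer=tokenizer,
    max_seq_length=512,
)
trainer.train()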

4. Memory Optimization Techniques

  • Gradient Checkpointing: Trades extra computation for lower activation memory
  • Parameter Freezing: Only the LoRA adapters are trainable (see the sketch after this list)
  • Flash Attention: Optimized attention computation
  • 4-bit Quantization: Shrinks model weights by ~75% relative to 16-bit
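
Shown explicitly, the freezing and checkpointing steps look like this sketch (SFTTrainer performs the equivalent internally when given a peft_config):

from peft import prepare_model_for_kbit_training, get_peft_model

model = prepare_model_for_kbit_training(model)  # cast layer norms, enable input grads for 4-bit training
model.gradient_checkpointing_enable()           # recompute activations during backward to save memory
model = get_peft_model(model, peft_config)      # freeze base weights, attach trainable LoRA adapters
model.print_trainable_parameters()              # reports the ~33M trainable parameters (~0.4%)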

📈 Training Results

Performance Metrics

  • Training Loss: Progressive decrease across epochs
  • Validation Loss: Monitored for overfitting detection
  • Trainable Parameters: ~33M parameters (0.4% of original model)
  • Memory Usage: Significantly reduced compared to full fine-tuning

Key Observations

  • Overfitting Detected: Training loss kept decreasing while validation loss began to rise
  • Model Convergence: The model fit the training data well
  • Inference Quality: Generated coherent, contextually appropriate responses

Basic Inference

from transformers import pipeline

# Load the fine-tuned model from the Hugging Face Hub
pipe = pipeline("text-generation", model="Marina-C/llama-3-8B-finance-qa")

# Format the question with the chat template (helper defined in the notebook)
question = "Explain the difference between a debit card and a credit card."
formatted_question = question_formatter(question)

# Generate a response (helper defined in the notebook)
outputs, answer = generate_answer(pipe, formatted_question)

Sample Q&A

Question: "How does the yield on a bond relate to its price?"

Answer (summarized): The model produced a comprehensive explanation of the concept, demonstrating understanding of:

  • Bond pricing mechanisms
  • Yield-price inverse relationship
  • Market dynamics
  • Risk factors

License

This project is open-source and available under the MIT License. Please note that the base LLaMA 3 model has its own license terms that must be respected.
