This project demonstrates the complete pipeline for fine-tuning a large language model (LLM) on financial domain data. The implementation uses state-of-the-art techniques including QLoRA (Quantized Low-Rank Adaptation) for efficient fine-tuning, enabling training on consumer-grade hardware while maintaining competitive performance.
The project uses the Sujet-Finance-Instruct-177k dataset, a comprehensive collection of 177,000 finance-related instruction-response pairs covering:
- Question & Answer: General financial Q&A
- Sentiment Analysis: Financial sentiment classification
- Named Entity Recognition: Financial entity extraction
- Topic Classification: Financial topic categorization
- Conversational QA: Multi-turn financial conversations
- Total Size: ~337MB
- Training Samples: 34,920 (90%)
- Test Samples: 3,881 (10%)
- Task Focus: Question answering for the financial domain
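The sample counts above appear to refer to the question-answering subset rather than the full 177k collection. A minimal sketch of loading the dataset and reproducing the 90/10 split is shown below; the Hub dataset ID and the `task_type` column name are assumptions, not verified here:

```python
from datasets import load_dataset

# Assumed dataset ID on the Hugging Face Hub.
dataset = load_dataset("sujet-ai/Sujet-Finance-Instruct-177k", split="train")

# Keep only the question-answering task (column name "task_type" is an assumption).
qa_dataset = dataset.filter(lambda example: example["task_type"] == "qa")

# 90/10 train/test split, matching the sample counts listed above.
splits = qa_dataset.train_test_split(test_size=0.1, seed=42)
train_ds, test_ds = splits["train"], splits["test"]
print(len(train_ds), len(test_ds))
```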
- Base Model: Meta LLaMA 3-8B
- Fine-tuning Method: QLoRA (Quantized Low-Rank Adaptation)
- Quantization: 4-bit quantization using bitsandbytes
- Attention Mechanism: Flash Attention 2 for memory efficiency
- Training Framework: Hugging Face Transformers + TRL (Transformer Reinforcement Learning)
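A sketch of how the quantized base model can be loaded with Flash Attention 2. The checkpoint ID assumes the gated meta-llama/Meta-Llama-3-8B repository, and the 4-bit settings mirror the quantization configuration shown later in this README:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Meta-Llama-3-8B"  # gated; requires accepting the Llama 3 license

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    attn_implementation="flash_attention_2",  # requires the flash-attn package
    torch_dtype=torch.bfloat16,
    device_map="auto",                        # requires accelerate
)
```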
```
transformers==4.36.2   # Model architecture and tokenization
datasets==2.16.1       # Dataset loading and processing
accelerate==0.26.1     # Distributed training support
bitsandbytes==0.42.0   # Quantization support
peft                   # Parameter-efficient fine-tuning
trl                    # Training utilities for LLMs
torch>=2.2.1           # Deep learning framework
flash-attn             # Optimized attention implementation
wandb                  # Experiment tracking
```
- Format Conversion: Transform raw data into conversational format
- Chat Template: Custom template optimized for financial Q&A
- Data Cleaning: Lowercase normalization, prefix removal, standardization
- Quality Assurance: System prompt standardization across all samples
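To make the format conversion concrete, the sketch below maps a raw sample into a single chat-formatted training string. The column names (`user_prompt`, `answer`), the system prompt, and the chat template are placeholders; the project's actual template may differ:

```python
SYSTEM_PROMPT = "You are a helpful financial assistant."  # placeholder, standardized across samples

# The base Llama 3 tokenizer ships without a chat template, so define a simple custom one
# (placeholder; the project's actual template may differ).
tokenizer.chat_template = (
    "{% for message in messages %}"
    "<|{{ message['role'] }}|>\n{{ message['content'] }}\n"
    "{% endfor %}"
)

def to_chat_text(example):
    # Basic cleaning: whitespace trimming and lowercase normalization of the question.
    question = example["user_prompt"].strip().lower()
    answer = example["answer"].strip()
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": question},
        {"role": "assistant", "content": answer},
    ]
    # Render the conversation into a single training string with the chat template.
    return {"text": tokenizer.apply_chat_template(messages, tokenize=False)}

train_ds = train_ds.map(to_chat_text, remove_columns=train_ds.column_names)
test_ds = test_ds.map(to_chat_text, remove_columns=test_ds.column_names)
```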
```python
# QLoRA Configuration
from peft import LoraConfig

peft_config = LoraConfig(
    r=16,                         # Rank of the low-rank adapter matrices
    lora_alpha=8,                 # Scaling parameter
    lora_dropout=0.05,            # Dropout for regularization
    target_modules="all-linear",  # Apply adapters to all linear layers
    task_type="CAUSAL_LM",
)
```
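With the quantized base model loaded (see the loading sketch above and the quantization config below), the LoRA adapters can be attached and the trainable-parameter count verified. A sketch assuming the `model` and `peft_config` objects from the snippets above:

```python
from peft import get_peft_model, prepare_model_for_kbit_training

model = prepare_model_for_kbit_training(model)  # prepare the 4-bit model for training
model = get_peft_model(model, peft_config)      # freeze base weights, add LoRA adapters
model.print_trainable_parameters()              # expect roughly 0.4% of parameters trainable
```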
```python
# 4-bit Quantization
import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # Load base weights in 4-bit precision
    bnb_4bit_use_double_quant=True,         # Double quantization for extra memory savings
    bnb_4bit_quant_type="nf4",              # NormalFloat4 data type
    bnb_4bit_compute_dtype=torch.bfloat16,  # Compute in bfloat16
)
```
- Epochs: 3
- Batch Size: 1 per device with gradient accumulation (effective batch size: 2)
- Learning Rate: 2e-4 with constant schedule
- Optimizer: AdamW with warmup
- Precision: Mixed precision (bfloat16)
- Sequence Length: 512 tokens
- Gradient Clipping: 0.3
- Gradient Checkpointing: Reduces memory at the cost of computation
- Parameter Freezing: Only LoRA adapters are trainable
- Flash Attention: Optimized attention computation
- 4-bit Quantization: Reduces model size by 75%
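Putting the hyperparameters and memory optimizations above together, a rough training sketch using TRL's SFTTrainer. Argument names vary across TRL versions, so treat this as an approximation of the training script rather than an exact reproduction:

```python
from transformers import TrainingArguments
from trl import SFTTrainer

training_args = TrainingArguments(
    output_dir="llama-3-8B-finance-qa",  # hypothetical output directory
    num_train_epochs=3,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=2,       # effective batch size of 2
    learning_rate=2e-4,
    lr_scheduler_type="constant",
    warmup_ratio=0.03,                   # warmup fraction is an assumption
    optim="adamw_torch",
    bf16=True,                           # mixed precision
    max_grad_norm=0.3,                   # gradient clipping
    gradient_checkpointing=True,
    logging_steps=10,
    evaluation_strategy="epoch",
    report_to="wandb",
)

trainer = SFTTrainer(
    model=model,                 # LoRA adapters already attached via get_peft_model above
    args=training_args,
    train_dataset=train_ds,
    eval_dataset=test_ds,
    tokenizer=tokenizer,
    dataset_text_field="text",   # chat-formatted strings from the preprocessing sketch
    max_seq_length=512,
)
trainer.train()
```

Depending on the TRL version, the chat-formatted `text` column may instead need to be supplied via a `formatting_func`.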
- Training Loss: Progressive decrease across epochs
- Validation Loss: Monitored for overfitting detection
- Trainable Parameters: ~33M parameters (0.4% of original model)
- Memory Usage: Significantly reduced compared to full fine-tuning
- Overfitting Detected: Training loss decreased while validation loss increased
- Model Convergence: Achieved good performance on training data
- Inference Quality: Generated coherent, contextually appropriate responses
```python
# Load the fine-tuned model
from transformers import pipeline

pipe = pipeline("text-generation", model="Marina-C/llama-3-8B-finance-qa")

# Format the question (question_formatter and generate_answer are project helpers;
# a possible sketch of both is shown further below)
question = "Explain the difference between a debit card and a credit card."
formatted_question = question_formatter(question)

# Generate a response
outputs, answer = generate_answer(pipe, formatted_question)
```

Question: "How does the yield on a bond relate to its price?"
Answer: The model provides a comprehensive, contextually grounded explanation, demonstrating understanding of:
- Bond pricing mechanisms
- Yield-price inverse relationship
- Market dynamics
- Risk factors
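The `question_formatter` and `generate_answer` helpers used in the usage example are project-specific and not reproduced in this README. A plausible sketch, assuming the fine-tuned checkpoint's tokenizer carries the chat template used during training:

```python
def question_formatter(question: str) -> str:
    # Wrap a raw question in the chat format used during fine-tuning.
    # Uses the pipeline defined above; the system prompt here is a placeholder.
    messages = [
        {"role": "system", "content": "You are a helpful financial assistant."},
        {"role": "user", "content": question},
    ]
    return pipe.tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )

def generate_answer(pipe, prompt: str, max_new_tokens: int = 256):
    # Generate greedily and strip the prompt from the decoded output.
    outputs = pipe(prompt, max_new_tokens=max_new_tokens, do_sample=False)
    full_text = outputs[0]["generated_text"]
    return outputs, full_text[len(prompt):].strip()
```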
- QLoRA: Efficient Finetuning of Quantized LLMs
- LoRA: Low-Rank Adaptation of Large Language Models
- Attention: Attention Is All You Need
This project is open-source and available under the MIT License. Please note that the base LLaMA 3 model has its own license terms that must be respected.