Post-training of LLMs - DeepLearning.AI course
This repository contains Jupyter notebooks and resources accompanying my completion of the DeepLearning.AI Post-training of LLMs course. These materials explore hands-on techniques for post-training large language models (LLMs), including Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Online Reinforcement Learning (RL).
The notebooks here are designed to help you understand and experiment with the core methods that take a pre-trained LLM from “generalist” to “specialist,” making it more useful, reliable, and aligned with human intent.
- Supervised Fine-Tuning (SFT): Training models on curated prompt-response pairs.
- Direct Preference Optimization (DPO): Aligning models using preferred vs. rejected outputs.
- Online RL: Iteratively improving model outputs using reward signals.
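To make the SFT bullet concrete: when training on prompt-response pairs, the loss is typically computed only on the response tokens, so prompt positions are masked with the ignore index (`-100` in PyTorch's cross-entropy). The sketch below illustrates that idea with a toy whitespace tokenizer; `build_sft_example` and the vocab handling are my own illustrative stand-ins, not code from the notebooks.

```python
IGNORE_INDEX = -100  # PyTorch cross-entropy's default ignore index


def build_sft_example(prompt: str, response: str, vocab: dict) -> dict:
    """Turn one prompt-response pair into input ids and masked labels.

    Toy whitespace tokenization; a real pipeline would use the model's
    tokenizer and chat template instead.
    """
    def encode(text: str) -> list:
        return [vocab.setdefault(tok, len(vocab)) for tok in text.split()]

    prompt_ids = encode(prompt)
    response_ids = encode(response)
    # Mask the prompt so the model is only trained to produce the response.
    labels = [IGNORE_INDEX] * len(prompt_ids) + response_ids
    return {"input_ids": prompt_ids + response_ids, "labels": labels}
```

A real SFT run (e.g. with TRL's `SFTTrainer`) handles this masking for you, but seeing it spelled out clarifies what "training on curated prompt-response pairs" actually optimizes.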
The examples use accessible models (like HuggingFaceTB/SmolLM2-135M) and small datasets so you can run the full training process even on modest hardware. If you have a GPU, you can experiment with larger models such as Qwen/Qwen3-0.6B-Base for more advanced results.
- Lesson_3.ipynb: Introduction to SFT and dataset preparation
- Lesson_4.ipynb: Implementing DPO for model alignment
- Lesson_5.ipynb: Full SFT workflow with a small model and dataset (see sample)
- Lesson_6.ipynb: Online RL basics and reward modeling
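At the heart of DPO is a single per-pair loss: push up the policy's log-probability of the preferred response relative to a frozen reference model, and push down the rejected one. As a back-of-the-envelope sketch of that formula (the function name and arguments are mine, not the notebooks'):

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair: -log sigmoid(beta * margin).

    The margin compares how much more the policy (vs. the reference)
    favors the chosen response over the rejected one.
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(x)) rewritten stably as log(1 + exp(-x))
    return math.log1p(math.exp(-margin))
```

When the policy and reference agree exactly, the margin is zero and the loss is log 2; as the policy learns to prefer the chosen response more than the reference does, the loss falls toward zero.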
Each notebook is self-contained and includes comments to guide you through the process.
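For the online-RL lesson, one common recipe (used by GRPO-style trainers) is to sample several completions per prompt, score each with a reward model, and normalize rewards within the group so above-average completions get positive advantage and below-average ones get negative. This helper is an illustrative assumption about that step, not code from the notebooks:

```python
def group_advantages(rewards: list, eps: float = 1e-8) -> list:
    """Normalize rewards within one group of completions for the same
    prompt: advantage = (reward - group mean) / group std.

    Positive advantages reinforce a completion's tokens; negative
    advantages suppress them.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    std = (sum((r - mean) ** 2 for r in rewards) / n) ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]
```

Because advantages are centered per group, the policy gets a learning signal even when the raw reward scale drifts during training.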
1. Clone the repo

   ```shell
   git clone https://github.com/your-username/llm-post-training-course-notebooks.git
   cd llm-post-training-course-notebooks
   ```

2. Set up your environment
   - Recommended: Python 3.12+, Jupyter, and Hugging Face Transformers.
   - Install dependencies (see `pyproject.toml` for details):

     ```shell
     uv venv && uv sync
     ```

3. Run the notebooks
   - Launch Jupyter Notebook and open any notebook to start exploring.
Post-training is what makes LLMs genuinely useful for real-world applications—aligning them with human preferences, improving safety, and customizing for specific business needs. This repo is part of my ongoing commitment to staying at the forefront of AI advancements and bringing practical, cutting-edge solutions to client projects. For more on why this matters, see my write-up: Post-training of LLMs: What, Why, and How.
I’ve completed the DeepLearning.AI course on Post-training of LLMs—see my certificate here.
MIT License. See LICENSE for details.
If you have feedback or want to discuss practical applications of LLM post-training, feel free to reach out!