post-train-llms-dlai

Post-training of LLMs - DeepLearning.AI course

This repository contains Jupyter notebooks and resources from my work through the DeepLearning.AI Post-training of LLMs course. The materials cover hands-on techniques for post-training large language models (LLMs), including Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Online Reinforcement Learning (RL).

About

The notebooks here are designed to help you understand and experiment with the core methods that take a pre-trained LLM from “generalist” to “specialist,” making it more useful, reliable, and aligned with human intent.

Supervised Fine-Tuning (SFT): Training models on curated prompt-response pairs.
Direct Preference Optimization (DPO): Aligning models using preferred vs. rejected outputs.
Online Reinforcement Learning (RL): Iteratively improving model outputs using reward signals.
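
To make the difference between the SFT and DPO data formats concrete, here is a minimal sketch in plain Python. The record field names (`prompt`, `response`, `chosen`, `rejected`) and the log-probability values are illustrative assumptions, not taken from the notebooks. The `dpo_loss` function follows the standard DPO objective for a single preference pair.

```python
import math

# Hypothetical example records; the field names are assumptions for
# illustration and may differ from those used in the notebooks.
sft_example = {
    "prompt": "What is the capital of France?",
    "response": "The capital of France is Paris.",
}
dpo_example = {
    "prompt": "What is the capital of France?",
    "chosen": "The capital of France is Paris.",
    "rejected": "France is a country in Europe.",
}


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair, given summed response log-probabilities."""
    # How much more (or less) likely each response is under the policy
    # than under the frozen reference model.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) is the same as log(1 + exp(-x)).
    return math.log1p(math.exp(-logits))


# Made-up log-probabilities: a policy identical to the reference ...
baseline = dpo_loss(-12.0, -15.0, -12.0, -15.0)
# ... and a policy that now favors the chosen response over the rejected one.
improved = dpo_loss(-10.0, -18.0, -12.0, -15.0)
print(f"baseline={baseline:.4f} improved={improved:.4f}")
```

The loss drops as the policy shifts probability toward the chosen response relative to the reference model, which is the signal DPO trains on. In practice a training library would compute the log-probabilities from the model rather than take them as hand-written numbers.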

The examples use small, openly available models (such as HuggingFaceTB/SmolLM2-135M) and small datasets, so you can run the full training process even on modest hardware. If you have a GPU, you can experiment with larger models such as Qwen/Qwen3-0.6B-Base for better-quality results[1].
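
One way to choose between the two models named above is sketched below. The helper names (`gpu_available`, `pick_model_name`, `load_model_and_tokenizer`) are hypothetical and not part of this repository. The loader assumes `transformers` is installed and that the model can be downloaded from the Hugging Face Hub.

```python
SMALL_MODEL = "HuggingFaceTB/SmolLM2-135M"
LARGER_MODEL = "Qwen/Qwen3-0.6B-Base"


def gpu_available() -> bool:
    """Return True if PyTorch is installed and reports a CUDA device."""
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available()


def pick_model_name(has_gpu: bool) -> str:
    """Use the larger model only when a GPU is available."""
    return LARGER_MODEL if has_gpu else SMALL_MODEL


def load_model_and_tokenizer(model_name: str):
    """Load a causal LM and its tokenizer (needs transformers and network access)."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    return model, tokenizer


model_name = pick_model_name(gpu_available())
print(f"Selected model: {model_name}")
# model, tokenizer = load_model_and_tokenizer(model_name)  # downloads weights
```

The download call is left commented out so the selection logic can be tried without fetching any weights.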

Notebooks

Lesson_3.ipynb: Introduction to SFT and dataset preparation
Lesson_4.ipynb: Implementing DPO for model alignment
Lesson_5.ipynb: Full SFT workflow with a small model and dataset (see sample)[1]
Lesson_6.ipynb: Online RL basics and reward modeling

Each notebook is self-contained and includes comments to guide you through the process.
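
As a rough illustration of the reward-signal idea behind the online RL lesson, the sketch below scores several sampled responses to one prompt and centers the rewards on the group mean. The exact-match reward rule, the sample responses, and the function names are assumptions made for illustration. Mean-centering is a simplified stand-in for the baselines used by real online RL methods, and it is not necessarily what the notebook does.

```python
import re


def extract_final_number(text: str):
    """Return the last number that appears in the text, or None if there is none."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", text)
    return float(matches[-1]) if matches else None


def reward(response: str, ground_truth: float) -> float:
    """Binary reward: 1.0 if the final number matches the ground truth, else 0.0."""
    answer = extract_final_number(response)
    return 1.0 if answer is not None and answer == ground_truth else 0.0


def centered_rewards(rewards):
    """Subtract the group mean so above-average samples get a positive signal."""
    mean = sum(rewards) / len(rewards)
    return [r - mean for r in rewards]


# Hypothetical responses sampled from a model for the prompt "What is 2 + 3?"
samples = ["2 + 3 = 5", "The answer is 6", "I think it is 5", "Not sure"]
rewards = [reward(s, 5.0) for s in samples]
advantages = centered_rewards(rewards)
for sample, r, a in zip(samples, rewards, advantages):
    print(f"{sample!r}: reward={r} advantage={a:+.2f}")
```

In a training loop, responses with positive centered rewards would be made more likely and those with negative ones less likely. The update step itself is left out of this sketch.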

Getting Started

Clone the repo

git clone https://github.com/your-username/llm-post-training-course-notebooks.git
cd llm-post-training-course-notebooks

Set up your environment
- Recommended: Python 3.12+, Jupyter, and Hugging Face Transformers.
- Install dependencies (see pyproject.toml for details):
```
uv venv && uv sync
```
Run the notebooks
- Launch Jupyter Notebook and open any notebook to start exploring.
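
Before launching Jupyter, a quick check like the one below can confirm that the environment matches the recommendations above. This is a minimal sketch: the function name `check_environment` is hypothetical, and the module names checked (`transformers`, `jupyter_core`) are assumptions about what `uv sync` installs, so adjust them to match pyproject.toml.

```python
import importlib.util
import sys


def check_environment(min_python=(3, 12), modules=("transformers", "jupyter_core")):
    """Report whether the Python version and required modules are available."""
    report = {"python_ok": sys.version_info >= min_python}
    for name in modules:
        # find_spec returns None when a top-level module cannot be found.
        report[name] = importlib.util.find_spec(name) is not None
    return report


report = check_environment()
for key, ok in report.items():
    print(f"{key}: {'ok' if ok else 'missing'}")
```

Run it with the project's virtual environment active. Any entry reported as missing suggests the dependency install did not complete.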

Why Post-training?

Post-training is what makes LLMs genuinely useful for real-world applications: it aligns them with human preferences, improves safety, and customizes them for specific business needs. This repo is part of my ongoing commitment to staying at the forefront of AI advancements and bringing practical, cutting-edge solutions to client projects. For more on why this matters, see my write-up: Post-training of LLMs: What, Why, and How.

Certificate

I’ve completed the DeepLearning.AI course on Post-training of LLMs—see my certificate here.

License

MIT License. See LICENSE for details.

If you have feedback or want to discuss practical applications of LLM post-training, feel free to reach out!
