A project for preference alignment of language models using techniques like DPO (Direct Preference Optimization), RLHF (Reinforcement Learning from Human Feedback), and PEPO.
The installation process is simplified for CUDA-enabled systems (Linux/Windows):
- Edit `pyproject.toml` to set the correct CUDA version for your system: update the `url` in the `[tool.uv.index]` section to match your CUDA version:

  ```toml
  [[tool.uv.index]]
  name = "pytorch"
  url = "https://download.pytorch.org/whl/cu126"  # Change cu126 to your CUDA version (e.g., cu118, cu121)
  ```
- Install dependencies:

  ```bash
  uv sync
  ```

  This will create a virtual environment and install all required packages, including PyTorch with the specified CUDA version.
- The `alpaca_eval` library is included in this repository as a git submodule. If you cloned this repo, run:

  ```bash
  git submodule update --init --recursive
  ```
To add new dependencies to the project, use `uv add`:

```bash
uv add package-name
```

This will automatically update `pyproject.toml` and `uv.lock` with the new dependency.
The project uses environment variables for configuration. Create a `.env` file in the project root (`.env` is already in `.gitignore`):

```bash
cp .env.example .env
```

Then edit `.env` and fill in all the values:
```bash
# HuggingFace Token (needs WRITE permissions for pushing models)
# Get from: https://huggingface.co/settings/tokens
HF_TOKEN=your_huggingface_token_here

# Weights & Biases Configuration (for experiment tracking)
WANDB_API_KEY=your_wandb_token_here
WANDB_ENTITY=your_wandb_entity

# HuggingFace Hub Base Directory (custom cache/storage location)
HF_HUB_BASE_DIR=your_hf_hub_base_dir
```

Run scripts using `uv run`:
```bash
uv run scripts/eval.py
```

Or run the training script:

```bash
uv run scripts/train.py
```

For running on SLURM clusters, use the scripts in `scripts/slurm/`:
- `get_interactive.sh`: Allocates an interactive node with 4 GPUs for 12 hours. Use this to get a shell on a compute node for development and testing.
- `*.slurm`: Batch job scripts for training and evaluation (e.g., `train.slurm`, `eval.slurm`).
- `connect_to_node.sh`: Connects to an existing interactive job. Shows a menu of all active jobs with details (job name, node, start time, time remaining) to help you choose which node to connect to. It can also connect to batch jobs.
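For orientation, a batch script in that directory might look roughly like the following. This is a hypothetical sketch: the job name, GPU count, and time limit are placeholders chosen to mirror `get_interactive.sh`, and the actual `train.slurm` in the repo is authoritative.

```shell
#!/bin/bash
#SBATCH --job-name=train        # appears in squeue and in connect_to_node.sh's menu
#SBATCH --gres=gpu:4            # request 4 GPUs, matching get_interactive.sh
#SBATCH --time=12:00:00         # 12-hour wall-clock limit

# Run the training entry point through uv so the project's venv is used.
uv run scripts/train.py
```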
This project uses Hydra for configuration management. Configuration files are stored in the `configs` directory, with `configs/train.yaml` as the default (specified in `scripts/train.py`).
Override any configuration parameter from the command line using dot notation:
```bash
python scripts/train.py hub.push=false L=1 log_level=debug
```

This example:

- Disables pushing models to HuggingFace Hub (`hub.push=false`)
- Sets the number of ensemble networks to 1 (`L=1`)
- Sets the log level to debug (`log_level=debug`)
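Conceptually, each override token is split on `=` and walked down the config tree along the dotted key. A toy illustration of that logic in plain Python (this is not Hydra's actual implementation, and the starting values below are made up):

```python
def apply_override(config: dict, override: str) -> None:
    """Apply one 'a.b.c=value' token to a nested dict, creating levels as needed."""
    dotted_key, _, raw = override.partition("=")
    keys = dotted_key.split(".")
    node = config
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    # Crude coercion of CLI strings into typed values.
    value: object = raw
    if raw.lower() in ("true", "false"):
        value = raw.lower() == "true"
    else:
        try:
            value = int(raw)
        except ValueError:
            pass
    node[keys[-1]] = value

config = {"hub": {"push": True}, "L": 4, "log_level": "info"}
for token in ["hub.push=false", "L=1", "log_level=debug"]:
    apply_override(config, token)
# config == {"hub": {"push": False}, "L": 1, "log_level": "debug"}
```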
For more details, see the Hydra documentation.
The `configs/train.yaml` file uses Hydra's `defaults:` list to compose configurations:

```yaml
defaults:
  - model: smollm
  - dataset: ultrafeedback
  - _self_
```

This loads:

- Model configuration from `configs/model/smollm.yaml` (the directory name matches the field name)
- Dataset configuration from `configs/dataset/ultrafeedback.yaml`
- The `train.yaml` config itself (via `_self_`), which can override the defaults
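The composition above amounts to a deep merge in list order, with `_self_` applied last. A rough sketch with plain dicts standing in for the YAML files (the field values here are invented for illustration, not taken from the repo):

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Recursively merge override into base; later values win."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

# Stand-ins for configs/model/smollm.yaml, configs/dataset/ultrafeedback.yaml,
# and train.yaml itself (_self_); all values are illustrative only.
model = {"model": {"name": "smollm", "num_networks": 4}}
dataset = {"dataset": {"name": "ultrafeedback"}}
train_self = {"model": {"num_networks": 1}, "log_level": "info"}

config = {}
for group in (model, dataset, train_self):  # order of the defaults list
    config = deep_merge(config, group)
# config["model"] == {"name": "smollm", "num_networks": 1}: _self_ overrode the default
```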
For example, `train.yaml` overrides the number of ensemble networks using a variable reference:

```yaml
model:
  num_networks: ${L}
```

The values in the `train.yaml` config file can be overridden by command-line arguments.
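`${L}` is OmegaConf-style interpolation: when the value is accessed, the reference is resolved against the top-level `L` key, which is what the `L=1` command-line override sets. A toy illustration of that resolution step (not OmegaConf itself, and limited to top-level keys):

```python
import re

def resolve(config: dict, value):
    """Resolve a '${key}' reference against top-level config keys."""
    if isinstance(value, str):
        match = re.fullmatch(r"\$\{(\w+)\}", value)
        if match:
            return config[match.group(1)]
    return value

config = {"L": 1, "model": {"num_networks": "${L}"}}
num_networks = resolve(config, config["model"]["num_networks"])
# num_networks == 1
```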
This project uses pre-commit hooks to ensure code quality and consistency.
```bash
# Install development dependencies
uv sync --group dev

# Install pre-commit hooks
uv run pre-commit install
```

**Note for macOS/non-CUDA users:** While you cannot run the training scripts locally without a CUDA device, you can still contribute to the codebase. The pre-commit hooks are configured to use `uvx`, so they run in isolated environments without requiring you to install the project's heavy CUDA dependencies.
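A hook that runs through `uvx` might look like the following. This is a hypothetical excerpt: `ruff` is an assumed example linter, and the repo's actual `.pre-commit-config.yaml` is authoritative.

```yaml
repos:
  - repo: local
    hooks:
      - id: ruff
        name: ruff (via uvx)
        entry: uvx ruff check --fix   # uvx fetches the tool into an isolated env
        language: system
        types: [python]
```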