PEPO

A project for preference alignment of language models, implementing PEPO (Pessimistic Ensemble Preference Optimization) alongside related techniques such as DPO (Direct Preference Optimization) and RLHF (Reinforcement Learning from Human Feedback). The codebase is designed to be modular, efficient, and scalable, supporting ensemble-based methods through parallelized training.

Installation

The installation process is simplified for CUDA-enabled systems (Linux/Windows):

  1. Edit pyproject.toml to set the correct CUDA version for your system. Update the url in the [[tool.uv.index]] section to match your CUDA version:

    [[tool.uv.index]]
    name = "pytorch"
    url = "https://download.pytorch.org/whl/cu126"  # Change cu126 to your CUDA version (e.g., cu118, cu121)
  2. Install dependencies:

    uv sync

    This will create a virtual environment and install all required packages, including PyTorch with the specified CUDA version.

  3. The alpaca_eval library is included as a git submodule. After cloning this repo, initialize it with:

    git submodule update --init --recursive
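
After installation, you can optionally verify that the CUDA-enabled PyTorch build sees your GPU (a quick sanity check, not a required setup step):

uv run python -c "import torch; print(torch.cuda.is_available())"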

Adding Dependencies

To add new dependencies to the project, use uv add:

uv add package-name

This will automatically update pyproject.toml and uv.lock with the new dependency.

Environment Variables

The project uses environment variables for configuration. Create a .env file in the project root (.env is already in .gitignore):

cp .env.example .env

Then edit .env and fill in all the values:

# HuggingFace Token (needs WRITE permissions for pushing models)
# Get from: https://huggingface.co/settings/tokens
HF_TOKEN=your_huggingface_token_here

# Weights & Biases Configuration (for experiment tracking)
WANDB_API_KEY=your_wandb_token_here
WANDB_ENTITY=your_wandb_entity

# HuggingFace Hub Base Directory (custom cache/storage location)
HF_HUB_BASE_DIR=your_hf_hub_base_dir
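
To illustrate how these variables are typically consumed, the sketch below loads the .env file with python-dotenv and reads the values from the process environment. This is a minimal example assuming python-dotenv is used; the repository's actual loading code may differ.

import os

from dotenv import load_dotenv

# Load variables from .env in the project root into the process environment.
load_dotenv()

hf_token = os.environ["HF_TOKEN"]            # required for pushing models to the Hub
wandb_entity = os.getenv("WANDB_ENTITY")     # optional: W&B entity for experiment tracking
hub_base_dir = os.getenv("HF_HUB_BASE_DIR")  # optional: custom cache/storage location

print(f"W&B entity: {wandb_entity}, HF hub dir: {hub_base_dir}")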

Running the Project

Basic Usage

Run scripts using uv run:

uv run scripts/eval.py

Or run the training script:

uv run scripts/train.py

SLURM Scripts

For running on SLURM clusters, use the scripts in scripts/slurm/:

  • get_interactive.sh: Allocates an interactive node with 4 GPUs for 12 hours. Use this to get a shell on a compute node for development and testing.
  • *.slurm: Batch job scripts for training and evaluation (e.g., train.slurm, eval.slurm).
  • connect_to_node.sh: Connects to an existing interactive job. Shows a menu of all active jobs with details (job name, node, start time, time remaining) to help you choose which node to connect to. It can also connect to batch jobs.
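
For example, a batch job is typically submitted with sbatch; the resource requests live inside the .slurm file itself:

sbatch scripts/slurm/train.slurm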

Configuration Management

This project uses Hydra for configuration management. Configuration files are stored in the configs directory, with configs/train.yaml as the default (specified in scripts/train.py).

Overriding Parameters

Override any configuration parameter from the command line using dot notation:

uv run scripts/train.py hub.push=false L=1 log_level=debug

This example:

  • Disables pushing models to HuggingFace Hub (hub.push=false)
  • Sets the number of ensemble networks to 1 (L=1)
  • Sets the log level to debug (log_level=debug)

For more details, see the Hydra documentation.
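
To make the override flow concrete, here is a minimal sketch of a Hydra entry point in the style of scripts/train.py (the config path, config name, and printed fields are assumptions based on the layout described below, not the repository's actual code):

import hydra
from omegaconf import DictConfig, OmegaConf

@hydra.main(version_base=None, config_path="../configs", config_name="train")
def main(cfg: DictConfig) -> None:
    # By this point Hydra has merged train.yaml, its defaults list, and any
    # command-line overrides (e.g., hub.push=false L=1) into a single cfg.
    print(OmegaConf.to_yaml(cfg))  # dump the fully resolved configuration

if __name__ == "__main__":
    main()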

Configuration File Structure

The configs/train.yaml file uses Hydra's defaults: to compose configurations:

defaults:
  - model: smollm
  - dataset: ultrafeedback
  - _self_

This loads:

  • Model configuration from configs/model/smollm.yaml (directory name matches the field name)
  • Dataset configuration from configs/dataset/ultrafeedback.yaml
  • The train.yaml config itself (via _self_), which can override the defaults

For example, train.yaml overrides the number of ensemble networks using a variable reference:

model:
  num_networks: ${L}

Values set in train.yaml can in turn be overridden from the command line, as shown above.
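
For instance, because of the ${L} interpolation above, overriding the top-level L also changes model.num_networks (the value 4 here is just an illustration):

uv run scripts/train.py L=4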

Development

Pre-commit hooks

This project uses pre-commit hooks to ensure code quality and consistency.

# Install development dependencies
uv sync --group dev

# Install pre-commit hooks
uv run pre-commit install
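
Once installed, the hooks run automatically on every commit. To check the entire codebase manually, you can run:

uv run pre-commit run --all-files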

Note for macOS/non-CUDA users: While you cannot run the training scripts locally without a CUDA device, you can still contribute to the codebase. The pre-commit hooks are configured to use uvx, so they will run in isolated environments without requiring you to install the project's heavy CUDA dependencies.
