Let's explore reinforcement learning through implementation, using Python and keeping everything as simple as possible.
I'm assuming you have basic knowledge of Markov Decision Processes (MDPs) and Dynamic Programming (DP). Most RL algorithms can be viewed as attempts to achieve much the same effect as DP, only with less computation and without assuming a perfect model of the environment.
You'll see implementations of the classical reinforcement learning algorithms from Reinforcement Learning: An Introduction (Sutton and Barto), applied to various environments:
- Dynamic Programming (Policy and Value Iteration)
- Monte Carlo Methods (Prediction and Control)
- Temporal Difference (SARSA and Q-Learning)
- Value Function Approximation (DQN, DDQN)
- Policy gradient methods (REINFORCE)
- Actor Critic methods (DDPG, PPO, TRPO, A2C, TD3, SAC, RPO, AMP)
- Model Based methods (Dyna-Q, PETS)
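To make the link between DP and the sample-based methods in the list concrete, here is a minimal sketch that is not taken from this repository. It assumes a hypothetical 5-state chain environment (the names `step`, `value_iteration`, and `q_learning` are made up for illustration) and compares value iteration, which sweeps the known model, with tabular Q-learning, which only sees sampled transitions.

```python
import random

N_STATES = 5      # hypothetical chain: states 0..4, state 4 is terminal
ACTIONS = (0, 1)  # 0 = left, 1 = right
GAMMA = 0.9


def step(state, action):
    """Deterministic chain: reward 1.0 only on entering the terminal state."""
    next_state = max(state - 1, 0) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def value_iteration(tol=1e-10):
    """DP: sweep every (state, action) pair using the known model `step`."""
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    while True:
        delta = 0.0
        for s in range(N_STATES - 1):  # the terminal state keeps value 0
            for a in ACTIONS:
                s2, r, done = step(s, a)
                target = r + (0.0 if done else GAMMA * max(q[s2]))
                delta = max(delta, abs(target - q[s][a]))
                q[s][a] = target
        if delta < tol:
            return q


def q_learning(episodes=500, alpha=0.5, seed=0):
    """TD control: same backup target, but only along sampled transitions."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = rng.choice(ACTIONS)  # uniform random behaviour policy (off-policy)
            s2, r, done = step(s, a)
            target = r + (0.0 if done else GAMMA * max(q[s2]))
            q[s][a] += alpha * (target - q[s][a])
            s = s2
    return q


if __name__ == "__main__":
    q_dp, q_td = value_iteration(), q_learning()
    for s in range(N_STATES - 1):
        print(s, [round(v, 3) for v in q_dp[s]], [round(v, 3) for v in q_td[s]])
```

Both functions use the same one-step backup target; the difference is that value iteration applies it to every state-action pair on each sweep, while Q-learning applies it only to the transitions it happens to visit. On this toy chain the two tables should end up close to each other, with "right" as the greedy action in every non-terminal state.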
- `core/algorithms/monte_carlo`: Blackjack Monte Carlo prediction and control.
- `core/algorithms/tabular`: Dynamic programming and temporal-difference methods for grid worlds.
- `core/env`: The grid world environment and configs.
- `core/utils`: Small numeric helpers.
- `examples`: Runnable scripts demonstrating the algorithms.
- `tests`: Placeholder suite ready for real unit tests.
- `results`: Saved figures produced by the examples.
We intentionally DO NOT list torch, torchvision, or torchaudio in pyproject.toml.
Reason: CUDA wheels for PyTorch are platform- and GPU-sensitive. Installing torch-related packages explicitly from the correct PyTorch index URL avoids resolver issues and version mismatches.
uv.lock will pin only the non-torch Python dependencies.
- Create a virtual environment: `uv venv --python 3.11`
- Install PyTorch (example for CUDA 12.8): `UV_HTTP_TIMEOUT=1000 uv pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128`
- Install project deps: `uv pip install -e .`
- (Optional) Dev tools: `uv pip install -e .[dev]`
- Lint/format/test: `uv run ruff format --check . && uv run ruff check . && uv run pytest`
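Since torch is installed separately from the project dependencies, a quick sanity check after setup can be useful. The snippet below is a small sketch rather than part of this repository; the helper name `torch_status` is hypothetical, and it only reports what it finds without failing when torch is missing.

```python
import importlib.util


def torch_status():
    """Return a short description of the local PyTorch install, if any."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed in this environment"
    import torch  # imported lazily so the check also works without torch

    return (
        f"torch {torch.__version__}, "
        f"CUDA build: {torch.version.cuda}, "  # None for CPU-only wheels
        f"CUDA available: {torch.cuda.is_available()}"
    )


if __name__ == "__main__":
    print(torch_status())
```

Saved to a file, it can be run inside the virtual environment with `uv run python <file>`. If the CUDA build is reported but CUDA is not available, the installed wheel likely does not match the local driver.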