A comprehensive tutorial for High Performance Computing workflows using Singularity/Apptainer containers, SLURM job scheduling, and machine learning workloads.
This workshop covers:
- Container creation and management with Singularity/Apptainer
- SLURM job submission (single and array jobs)
- GPU and CPU workload optimization
- Data transfer and bind mounting
- Environment setup with Conda
Prerequisites:
- Basic Linux command-line knowledge
- Access to an HPC cluster running SLURM
- Singularity/Apptainer installed
- SSH access to the remote cluster
```
HPC_workshop/
├── containers/
│   ├── anaconda_recipe.def   # Conda-based container definition
│   ├── python-ml.def         # Basic Python ML container
│   ├── recipe.def            # CUDA-enabled container
│   └── workshop.yml          # Conda environment specification
├── scripts/
│   ├── bind_script.py        # Test bind mount functionality
│   ├── script.py             # Container environment testing
│   ├── cpu_single_demo.py    # Single CPU job demo
│   ├── cpu_array_demo.py     # CPU array job demo
│   ├── gpu_single_demo.py    # Single GPU job demo
│   └── gpu_array_demo.py     # GPU array job demo
├── slurm/
│   ├── sbatch_cpu_single.sh  # SLURM script for single CPU job
│   ├── sbatch_cpu_array.sh   # SLURM script for CPU array jobs
│   ├── sbatch_gpu_single.sh  # SLURM script for single GPU job
│   └── sbatch_gpu_array.sh   # SLURM script for GPU array jobs
├── data_transfer/
│   ├── copy_code.sh          # Transfer code to cluster
│   ├── copy_data.sh          # Transfer data to cluster
│   ├── copy_image.sh         # Transfer container images
│   └── rsync_ignore.text     # Files to exclude from transfer
├── csv/                      # Sample datasets
│   ├── ai4i2020.csv
│   ├── data.csv
│   └── used_car_price_dataset_extended.csv
├── local/                    # Local files (not transferred)
├── log/                      # Job output logs
└── README                    # This file
```
```bash
# Build basic Python container
sudo singularity build python-ml.sif containers/python-ml.def

# Build conda-based container with full data science stack
sudo singularity build workshop.sif containers/anaconda_recipe.def

# Build GPU-enabled container
sudo singularity build gpu-ml.sif containers/recipe.def

# Test container environment
singularity exec python-ml.sif python scripts/script.py

# Test bind mounting with CSV data
singularity exec --bind ./scripts/csv:/data python-ml.sif python scripts/bind_script.py
```

```bash
# Edit copy_code.sh with your cluster details
./data_transfer/copy_code.sh

# Edit copy_data.sh with your paths
./data_transfer/copy_data.sh

# Edit copy_image.sh with your container path
./data_transfer/copy_image.sh
```

```bash
sbatch slurm/sbatch_cpu_single.sh
sbatch slurm/sbatch_cpu_array.sh
sbatch slurm/sbatch_gpu_single.sh
sbatch slurm/sbatch_gpu_array.sh
```

- `containers/python-ml.def`: Simple container with essential ML libraries:
  - numpy, pandas, scikit-learn, matplotlib
- `containers/anaconda_recipe.def`: Full data science environment using conda:
  - Jupyter Lab, seaborn, scipy, and more
  - Based on `workshop.yml`
  - Includes 200+ packages for comprehensive data science workflows
- `containers/recipe.def`: CUDA-enabled container for GPU workloads:
  - PyTorch with CUDA support
  - NVIDIA runtime base image
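The real definition files live in `containers/`; purely as an illustration of the format, a minimal definition in the spirit of `python-ml.def` might look like the sketch below (the base image and exact package pins are assumptions, not the workshop's actual recipe):

```
Bootstrap: docker
From: python:3.10-slim   # assumed base image, not necessarily the workshop's

%post
    # Install the essential ML libraries on top of the base image
    pip install --no-cache-dir numpy pandas scikit-learn matplotlib

%runscript
    # Forward arguments so the image can be run like a Python interpreter
    exec python "$@"
```

A file of this shape is what `sudo singularity build python-ml.sif containers/python-ml.def` consumes.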
- `cpu_single_demo.py`: Single-threaded CPU-intensive task using `cpu_intensive_task`
- `cpu_array_demo.py`: Parallel CPU tasks with different parameters
- `gpu_single_demo.py`: Single-GPU computation with `check_gpu` and `gpu_computation`
- `gpu_array_demo.py`: Multiple GPU tasks with task-specific matrix sizes
- `script.py`: Environment and package testing with `test_python_info`, `test_environment`, and `test_package_imports`
- `bind_script.py`: Bind mount and CSV file testing with `test_bind_mount` and `test_csv_files`
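The array demos run the same script many times with different parameters. The exact contents of `cpu_array_demo.py` aren't reproduced here, but the usual pattern is to index a parameter grid with `SLURM_ARRAY_TASK_ID`; a minimal sketch (the parameter grid is a made-up example):

```python
import os

# Hypothetical parameter grid; each array task picks one entry.
PARAMS = [{"n_estimators": n} for n in (50, 100, 200, 400)]

def pick_params():
    """Select this task's parameters from SLURM_ARRAY_TASK_ID (1-based)."""
    task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "1"))
    # Wrap around so the grid also works with wider --array ranges.
    return PARAMS[(task_id - 1) % len(PARAMS)]

if __name__ == "__main__":
    print(pick_params())
```

With `#SBATCH --array=1-10`, tasks 1-4 cover the grid once and tasks 5-10 wrap around; the default of `"1"` lets the script also run outside SLURM for local testing.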
```bash
# Single CPU job (sbatch_cpu_single.sh)
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G
#SBATCH --time=01:00:00
```

```bash
# Single GPU job (sbatch_gpu_single.sh)
#SBATCH --gres=gpu:1
#SBATCH --mem=16G
#SBATCH --time=02:00:00
```

```bash
# Array job (sbatch_cpu_array.sh)
#SBATCH --array=1-10
#SBATCH --cpus-per-task=2
```

```bash
# Mount data directory
singularity exec --bind /path/to/data:/data container.sif python script.py

# Multiple bind mounts
singularity exec --bind /data:/data,/results:/output container.sif python analysis.py
```

The workshop includes sample datasets in `csv/`:
- `ai4i2020.csv`: Industrial AI dataset
- `used_car_price_dataset_extended.csv`: Automotive pricing data
- `data.csv`: General dataset for testing
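A script inside the container can confirm that a bind mount worked by checking the mount point and reading the CSVs it finds there. The helper below is an illustration of that idea, not the actual contents of `bind_script.py`:

```python
import csv
from pathlib import Path

def check_csv_dir(mount_point="/data"):
    """Report the CSV files visible under a bind-mounted directory.

    Returns a dict mapping file name -> number of data rows (header excluded).
    """
    root = Path(mount_point)
    if not root.is_dir():
        # Typical failure mode: the --bind flag was forgotten on the host side.
        raise FileNotFoundError(f"{mount_point} not mounted; did you pass --bind?")
    counts = {}
    for path in sorted(root.glob("*.csv")):
        with path.open(newline="") as fh:
            rows = list(csv.reader(fh))
        counts[path.name] = max(len(rows) - 1, 0)
    return counts

if __name__ == "__main__":
    for name, n in check_csv_dir().items():
        print(f"{name}: {n} rows")
```

Run under `singularity exec --bind ./scripts/csv:/data ...`, this would list the sample datasets with their row counts, and fail loudly when the mount is missing.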
```bash
# 1. Build container
sudo singularity build ml-pipeline.sif containers/anaconda_recipe.def

# 2. Transfer to cluster
./data_transfer/copy_code.sh
./data_transfer/copy_image.sh

# 3. Submit job
sbatch slurm/sbatch_gpu_single.sh
```

```bash
# Submit array job for hyperparameter tuning
sbatch --array=1-100 slurm/sbatch_cpu_array.sh
```

```bash
# Test container locally
singularity exec --bind ./scripts/csv:/data workshop.sif python scripts/bind_script.py

# Test GPU functionality
singularity exec --nv workshop.sif python scripts/gpu_single_demo.py
```

For detailed conda environment management, see `conda_workflow.md`, which covers:
- Creating conda environments
- Exporting to YAML
- Building Apptainer images
- Adding new packages in sandbox mode
The `data_transfer` scripts use rsync with optimized settings:
- Compression for faster transfers
- Progress monitoring
- Exclude patterns from `rsync_ignore.text`
- Error handling and logging
- Container build fails: check sudo privileges and available disk space
- Bind mount not working: verify that the host path exists and permissions allow access
- GPU not detected: ensure NVIDIA drivers are installed and the `--nv` flag is used
- SLURM job fails: check resource requests and queue limits
```bash
# Check container
singularity inspect container.sif

# Test interactively
singularity shell container.sif

# Check SLURM status
squeue -u $USER
sacct -j <job_id>

# Test GPU availability
singularity exec --nv container.sif python -c "import torch; print(torch.cuda.is_available())"
```

Job outputs are stored in the `log` directory:
- CPU jobs: `cpu_single_test<job_id>.out`
- GPU jobs: `gpu_single_test<job_id>.out`
- Array jobs: `cpu_array_test<job_id>_<array_id>.out`
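Because the job ID (and, for array jobs, the array index) is embedded in each log file name, a short helper can recover them when sifting through the `log` directory. The regex below is written against the naming scheme listed above and is an illustration, not workshop code:

```python
import re

# Matches e.g. cpu_single_test12345.out and cpu_array_test12345_7.out
LOG_RE = re.compile(r"^(?P<name>\w+?)(?P<job>\d+)(?:_(?P<array>\d+))?\.out$")

def parse_log_name(filename):
    """Return (job_id, array_id) from a log file name; array_id is None for non-array jobs."""
    m = LOG_RE.match(filename)
    if m is None:
        raise ValueError(f"unrecognized log name: {filename}")
    array = m.group("array")
    return int(m.group("job")), int(array) if array is not None else None
```

This makes it easy to, say, feed the recovered job IDs back into `sacct -j` when auditing a batch of runs.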
Resources:
- Complete Conda Workflow Guide: detailed conda environment setup
- SLURM Documentation
- Singularity User Guide

Workshop modules:
- Container Basics: building and testing containers with `python-ml.def`
- Data Management: bind mounts and file transfers using `bind_script.py`
- Job Scheduling: SLURM single and array jobs in `slurm/`
- GPU Computing: CUDA containers and GPU jobs with `gpu_single_demo.py`
- Workflow Optimization: best practices and troubleshooting
The `workshop.yml` includes:
- Core Libraries: numpy, pandas, matplotlib, seaborn, scipy, scikit-learn
- Jupyter Stack: jupyterlab, jupyter, ipython, ipykernel
- Development Tools: python 3.10.18, pip, setuptools
- Scientific Computing: mkl, numexpr, threadpoolctl
- Visualization: plotly, matplotlib-base, qt libraries
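For orientation, a conda environment file with these contents follows the standard `environment.yml` layout; the abbreviated sketch below shows the shape (the channel list and the selection of pins besides `python=3.10.18` are assumptions, not a copy of the workshop's file):

```yaml
name: workshop
channels:
  - conda-forge
  - defaults
dependencies:
  - python=3.10.18
  - numpy
  - pandas
  - scikit-learn
  - matplotlib
  - seaborn
  - scipy
  - jupyterlab
  - pip
```

A file like this is what `anaconda_recipe.def` consumes when building `workshop.sif`, and what `conda env export` produces when updating the environment.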
This workshop material is provided for educational purposes. Please check individual dataset licenses in the csv directory.