R-based analysis for forecasting GPU demand and storage capacity needs using historical SLURM job data and polynomial regression.
oscar-gpu-storage-forecasting/
├── README.md
├── src/
│   └── gpu_needs_forecast.R
└── data/
    ├── slurm_gpu_daily_counts_2021-2023.csv
    └── slurm_gpu_daily_counts_2023-2025.csv
The CSV files in data/ contain daily GPU job metrics extracted from SLURM:
| Column | Description |
|---|---|
| `job_date` | Date of the jobs |
| `gpu_jobs_count` | Number of GPU jobs submitted |
| `total_gpus_requested` | Total GPUs requested across all jobs |
| `avg_gpus_per_job` | Average GPUs requested per job |
| `max_gpus_requested` | Maximum GPUs requested in a single job |
Storage capacity data (2020-2026) is defined directly in the analysis script.
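
As a rough illustration of how the daily CSVs could be loaded, the sketch below reads several files, checks for the columns listed in the table, and combines them into one data frame. The helper name `load_gpu_counts`, the assumed `YYYY-MM-DD` date format, and the de-duplication step are assumptions made for this example and are not taken from `gpu_needs_forecast.R`. The demo rows are made up purely to exercise the function.

```r
# Hypothetical helper (not part of the original script): read the daily
# GPU CSVs and return a single data frame sorted by date.
expected_cols <- c("job_date", "gpu_jobs_count", "total_gpus_requested",
                   "avg_gpus_per_job", "max_gpus_requested")

read_one <- function(path) {
  df <- read.csv(path, stringsAsFactors = FALSE)
  missing_cols <- setdiff(expected_cols, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing columns in ", path, ": ",
         paste(missing_cols, collapse = ", "))
  }
  df[, expected_cols]
}

load_gpu_counts <- function(paths) {
  daily <- do.call(rbind, lapply(paths, read_one))
  # Assumes ISO dates (YYYY-MM-DD); adjust `format` if the export differs.
  daily$job_date <- as.Date(daily$job_date, format = "%Y-%m-%d")
  # Both file names mention 2023, so keep one row per date in case they overlap.
  daily <- daily[!duplicated(daily$job_date), ]
  daily <- daily[order(daily$job_date), ]
  rownames(daily) <- NULL
  daily
}

# Demo with two tiny placeholder files (made-up rows, not real SLURM data).
demo_dir <- tempfile("gpu_demo_")
dir.create(demo_dir)
f1 <- file.path(demo_dir, "part1.csv")
f2 <- file.path(demo_dir, "part2.csv")
write.csv(data.frame(job_date = c("2023-01-01", "2023-01-02"),
                     gpu_jobs_count = c(10, 12),
                     total_gpus_requested = c(20, 30),
                     avg_gpus_per_job = c(2, 2.5),
                     max_gpus_requested = c(4, 8)),
          f1, row.names = FALSE)
write.csv(data.frame(job_date = c("2023-01-02", "2023-01-03"),
                     gpu_jobs_count = c(12, 15),
                     total_gpus_requested = c(30, 45),
                     avg_gpus_per_job = c(2.5, 3),
                     max_gpus_requested = c(8, 8)),
          f2, row.names = FALSE)

daily <- load_gpu_counts(c(f1, f2))
```

For the real data, the call would instead receive the two paths under `data/`, for example built with `file.path()` relative to wherever the repository is checked out.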
- Aggregates daily job counts to yearly median values
- Fits a quadratic polynomial regression model
- Generates predictions for future years with confidence intervals
- Calculates GPU capacity ratios to inform hardware planning
- Compares linear and polynomial regression models
- Evaluates model fit and selects the best model
- Produces predictions with uncertainty estimates
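
The modelling steps listed above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the code in `gpu_needs_forecast.R`: the daily data are synthetic placeholders, the choice of `total_gpus_requested` as the response is an assumption, and adjusted R² and AIC are just one possible pair of comparison metrics. Base R is used instead of `dplyr` only to keep the example free of dependencies.

```r
# Synthetic placeholder data: one row per day, NOT real SLURM counts.
set.seed(42)
dates <- seq(as.Date("2021-01-01"), as.Date("2025-12-31"), by = "day")
year_num <- as.integer(format(dates, "%Y"))
daily <- data.frame(
  job_date = dates,
  total_gpus_requested = rpois(length(dates),
                               lambda = 50 + 15 * (year_num - 2021)^2)
)

# Step 1: collapse daily counts to one median value per year.
daily$year <- as.integer(format(daily$job_date, "%Y"))
yearly <- aggregate(total_gpus_requested ~ year, data = daily, FUN = median)
names(yearly)[2] <- "median_gpus"

# Step 2: fit a linear and a quadratic (degree-2 polynomial) model.
fit_linear <- lm(median_gpus ~ year, data = yearly)
fit_quad   <- lm(median_gpus ~ poly(year, 2), data = yearly)

# Step 3: compare the two fits (higher adjusted R^2 and lower AIC are better).
comparison <- data.frame(
  model  = c("linear", "quadratic"),
  adj_r2 = c(summary(fit_linear)$adj.r.squared,
             summary(fit_quad)$adj.r.squared),
  aic    = c(AIC(fit_linear), AIC(fit_quad))
)

# Step 4: predict future years with 95% confidence intervals.
future <- data.frame(year = 2026:2028)
forecast <- cbind(
  future,
  predict(fit_quad, newdata = future, interval = "confidence", level = 0.95)
)

print(comparison)
print(forecast)  # columns: year, fit, lwr, upr
```

With only a handful of yearly points, the intervals widen quickly outside the observed range. Passing `interval = "prediction"` to `predict()` gives the wider band for a single future observation rather than for the mean trend, which may be the more cautious choice for capacity planning.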
The analysis generates:
- GPU demand plot: Historical yearly medians with regression line and future predictions
- Storage capacity plot: Historical storage growth with trend line and projections
- Residual diagnostics: Model comparison plots for evaluating fit quality
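
For the residual diagnostics, one possible approach is a side-by-side residuals-versus-fitted plot for each candidate model. The sketch below uses base graphics rather than `ggplot2` and `gridExtra` so that it runs without extra packages. The storage figures are hypothetical placeholders with arbitrary units, since the real values live in the analysis script.

```r
# Placeholder storage figures (hypothetical values, arbitrary units).
storage <- data.frame(
  year     = 2020:2026,
  capacity = c(1.0, 1.3, 1.8, 2.6, 3.5, 4.7, 6.1)
)

# Candidate models to compare.
fits <- list(
  linear    = lm(capacity ~ year, data = storage),
  quadratic = lm(capacity ~ poly(year, 2), data = storage)
)

# pdf(NULL) opens a device that writes no file; swap in
# png("residuals.png") to save the figure instead.
pdf(NULL)
old_par <- par(mfrow = c(1, 2))
for (name in names(fits)) {
  plot(fitted(fits[[name]]), resid(fits[[name]]),
       main = paste("Residuals:", name),
       xlab = "Fitted value", ylab = "Residual")
  abline(h = 0, lty = 2)  # reference line at zero
}
par(old_par)
invisible(dev.off())

# Residual sum of squares per model as a numeric companion to the plots.
rss <- sapply(fits, function(m) sum(resid(m)^2))
print(rss)
```

A curved pattern in the linear model's residuals that disappears in the quadratic panel is the usual visual sign that the extra term is capturing real structure.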
- Install the required packages: `install.packages(c("ggplot2", "dplyr", "lubridate", "gridExtra"))`
- Open `src/gpu_needs_forecast.R` in R or RStudio.
- Update file paths: the script contains hardcoded paths that must be changed to match your local setup. Edit the `read.csv()` calls near the top of the script so they point to the CSV files in your `data/` directory.
- Run the script to generate forecasts and visualizations.
Running the analysis produces:
- Model summary statistics printed to the console
- GPU demand forecast visualization
- Storage capacity forecast visualization
- Model comparison metrics and residual plots