Nemotron NVFP4 Loader
=====================
Local inference loaders · diagnostic utilities · PyTorch 2.9 + CUDA 12.9
This repository provides a minimal, diagnostic-ready loader for NVIDIA Nemotron Nano 9B v2 (NVFP4) — a quantized, mixed-precision language model optimized for efficient inference on consumer GPUs.
The included loader performs safe model initialization, mapping, and validation under PyTorch 2.9 and CUDA 12.9, ensuring complete reproducibility without invoking any proprietary inference logic.
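As an illustration of the meta-safe pattern the loader relies on, the sketch below builds a stand-in module on PyTorch's meta device, so no weights are read and no GPU memory is allocated; the layer and its dimensions are placeholders, not the actual Nemotron graph.

```python
# Minimal sketch of meta-safe construction: parameters exist only as
# shape/dtype metadata, so nothing is read from disk or placed on the GPU.
import torch
import torch.nn as nn

with torch.device("meta"):
    # Stand-in module; the real loader builds the full model graph here.
    probe = nn.Linear(4096, 4096, bias=False)

assert probe.weight.is_meta            # no real storage was allocated
print(probe.weight.shape, probe.weight.dtype)
```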
Public-safe design:
Beginning with v2.3 “Stalwart,” the loader uses a TOML-based configuration system and a minimal config.json stub for Hugging Face compatibility while excluding proprietary assets.
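Reading the TOML configuration needs nothing beyond the Python 3.12 standard library. The sketch below shows the general pattern; the key names (model_dir, dtype) are illustrative assumptions, since the actual nemotron.toml schema is defined by the repository.

```python
# Sketch: load the public-safe TOML configuration with the standard library.
# Key names below are illustrative only, not the repository's real schema.
import tomllib
from pathlib import Path

with open("config/nemotron.toml", "rb") as f:    # tomllib requires binary mode
    cfg = tomllib.load(f)

model_dir = Path(cfg.get("model_dir", "model"))  # hypothetical key with fallback
print(model_dir, cfg.get("dtype", "nvfp4"))      # hypothetical key with fallback
```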
| Phase | Description | Status |
|---|---|---|
| 1–6 | Environment setup, loader construction, testing | ✅ Complete |
| 7 | Diagnostic-only reference build (v2.4.2b) | ✅ Stable |
Key file: nemotron_loader.py (v2.4.2b “Reference-Diagnostic”)
Performs meta-safe initialization, mapping verification, and clean termination after validation.
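One of the mapping checks described above can be illustrated against the shard index alone: assuming the standard safetensors index layout (a weight_map of tensor name → shard file), the sketch below confirms that every referenced shard is present on disk.

```python
# Sketch: verify that every shard referenced by the safetensors index exists.
import json
from pathlib import Path

model_dir = Path("model")
index = json.loads((model_dir / "model.safetensors.index.json").read_text())

shards = sorted(set(index["weight_map"].values()))
missing = [s for s in shards if not (model_dir / s).exists()]

print(f"{len(index['weight_map'])} tensors across {len(shards)} shards")
if missing:
    raise FileNotFoundError(f"missing shard files: {missing}")
```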
Requirements:
- Linux Mint 22 / Ubuntu 24 LTS
- Python 3.12 + virtual environment
- PyTorch 2.9 with CUDA 12.9 toolkit
- Hugging Face Transformers ≥ 5.0.0.dev0 (a quick version check is sketched below)
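One way to confirm the toolchain matches this list at runtime, assuming the packaging library (installed alongside transformers) is available:

```python
# Sketch: runtime check that the environment matches the versions listed above.
import sys
import torch
import transformers
from packaging.version import Version  # shipped as a transformers dependency

assert sys.version_info >= (3, 12), "Python 3.12+ expected"
assert Version(torch.__version__) >= Version("2.9"), torch.__version__
assert Version(transformers.__version__) >= Version("5.0.0.dev0"), transformers.__version__

print("torch", torch.__version__, "| CUDA toolkit", torch.version.cuda,
      "| GPU available:", torch.cuda.is_available())
```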
Project goals:
- Maintain a fully reproducible, meta-safe loader for NVFP4 models.
- Provide transparent, reference-grade diagnostics for researchers and hobbyists.
- Ensure compatibility with current PyTorch and Transformers toolchains.
- Preserve simplicity — minimal code, maximum clarity.
repo/
├── nemotron_loader.py # v2.4.2b reference-diagnostic baseline
├── config/
│ └── nemotron.toml # runtime configuration (public-safe)
├── model/
│ ├── model.safetensors.index.json
│ ├── model-00001-of-00010.safetensors
│ └── config.json # minimal HF compatibility stub
├── docs/
│ ├── nemotron_nano_9b-v2-nvfp4_instructions_v1-6_public.md
│ └── release_prep/ # public hygiene & migration docs
└── offload/ # deprecated runtime cache
# 1️⃣ Activate your venv
source ~/.venvs/nemotron/bin/activate
cd <project root>
# 2️⃣ Run the diagnostic loader
python nemotron_loader.py
The loader validates all safetensor shards, confirms key-rename coverage, and terminates once diagnostics complete successfully.
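Key-rename coverage can be checked with nothing more than two sets of names. The rename rule and key names in the sketch below are placeholders; the real mapping lives inside nemotron_loader.py.

```python
# Sketch: every checkpoint key, after the loader's rename rules are applied,
# should exist in the target model's state dict. All names here are hypothetical.
RENAMES = {"model.": "backbone."}      # hypothetical prefix rewrite

def rename(key: str) -> str:
    for old, new in RENAMES.items():
        if key.startswith(old):
            return new + key[len(old):]
    return key

checkpoint_keys = {"model.embed_tokens.weight", "lm_head.weight"}  # e.g. from the safetensors index
model_keys = {"backbone.embed_tokens.weight", "lm_head.weight"}    # e.g. from model.state_dict()

unmapped = {k for k in checkpoint_keys if rename(k) not in model_keys}
print("key-rename coverage OK" if not unmapped else f"unmapped keys: {unmapped}")
```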
- The loader is diagnostic-only — it does not execute inference.
- All configuration parameters are stored in config/nemotron.toml.
- Intended strictly for research and educational use.
- Respect NVIDIA and Hugging Face licensing for associated model weights.
Version 2.4.2b of the Nemotron NVFP4 Loader intentionally terminates after completing initialization, mapping, and validation.
This “reference-diagnostic” design verifies model accessibility, mapping coverage, and device dispatch without invoking any proprietary inference logic.
The loader exits only after confirming all diagnostics pass and the model is in eval() mode, signaling readiness for external callers.
This ensures full reproducibility, meta-safety, and legal compliance while providing a clear upgrade path to future integration releases.
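A rough skeleton of that flow, with a stand-in module instead of the real model (the actual implementation is in nemotron_loader.py):

```python
# Illustrative skeleton of the reference-diagnostic flow: meta-safe build,
# a couple of checks, switch to eval(), then a clean exit.
import sys
import torch
import torch.nn as nn

def run_diagnostics() -> int:
    with torch.device("meta"):
        model = nn.Linear(8, 8)        # stand-in for the real model graph

    checks = {
        "meta_init": all(p.is_meta for p in model.parameters()),
        "cuda_visible": torch.cuda.is_available(),
    }

    model.eval()                       # signal readiness for external callers
    print("diagnostics:", checks)
    return 0 if all(checks.values()) else 1

if __name__ == "__main__":
    sys.exit(run_diagnostics())
```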
Acknowledgements:
- Transformers / Hugging Face for the flexible dispatch framework.
- NVIDIA for releasing the Nemotron Nano series.
- Grok and GPT-5 for technical insights and debugging collaboration.
LJA-TX
Offline AI research & local-inference enthusiast
“Minimalism means preserve context — only add what’s strictly necessary.”