Local inference loaders and diagnostic utilities for NVIDIA Nemotron Nano 9B v2 (NVFP4), built for PyTorch 2.9 + CUDA 12.9.

NVIDIA Nemotron Nano 9B v2 (NVFP4) Integration

Local inference loaders · diagnostic utilities · PyTorch 2.9 + CUDA 12.9


📘 Overview

This repository provides a minimal, diagnostic-ready loader for NVIDIA Nemotron Nano 9B v2 (NVFP4) — a quantized, mixed-precision language model optimized for efficient inference on consumer GPUs.

The included loader performs safe model initialization, mapping, and validation under PyTorch 2.9 and CUDA 12.9, ensuring complete reproducibility without invoking any proprietary inference logic.

Public-safe design:
Beginning with v2.3 “Stalwart,” the loader uses a TOML-based configuration system and a minimal config.json stub for Hugging Face compatibility while excluding proprietary assets.


🧩 Current Status (November 2025)

| Phase | Description | Status |
|-------|-------------|--------|
| 1–6 | Environment setup, loader construction, testing | ✅ Complete |
| 7 | Diagnostic-only reference build (v2.4.2b) | ✅ Stable |

Key file: nemotron_loader.py (v2.4.2b “Reference-Diagnostic”)
Performs meta-safe initialization, mapping verification, and clean termination after validation.


🔧 System Requirements

  • Linux Mint 22 / Ubuntu 24 LTS
  • Python 3.12 + virtual environment
  • PyTorch 2.9 with CUDA 12.9 toolkit
  • Hugging Face Transformers ≥ 5.0.0.dev0
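A quick way to verify the toolchain above is to query installed package versions with the standard library; this sketch only reports what is found and does not enforce the minimums:

```python
import sys
from importlib import metadata

# Packages named in the requirements list above
REQUIRED = ("torch", "transformers")

def report_environment() -> dict[str, str]:
    """Return the interpreter version plus each required package's version, or 'missing'."""
    found = {"python": sys.version.split()[0]}
    for pkg in REQUIRED:
        try:
            found[pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            found[pkg] = "missing"
    return found

print(report_environment())
```

Run inside the project venv, this should show Python 3.12.x, torch 2.9.x, and a transformers 5.0.0.dev0 build.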

🧠 Project Goals

  1. Maintain a fully reproducible, meta-safe loader for NVFP4 models.
  2. Provide transparent, reference-grade diagnostics for researchers and hobbyists.
  3. Ensure compatibility with current PyTorch and Transformers toolchains.
  4. Preserve simplicity — minimal code, maximum clarity.

📂 Repository Structure

repo/
├── nemotron_loader.py              # v2.4.2b reference-diagnostic baseline
├── config/
│   └── nemotron.toml               # runtime configuration (public-safe)
├── model/
│   ├── model.safetensors.index.json
│   ├── model-00001-of-00010.safetensors
│   └── config.json                 # minimal HF compatibility stub
├── docs/
│   ├── nemotron_nano_9b-v2-nvfp4_instructions_v1-6_public.md
│   └── release_prep/               # public hygiene & migration docs
└── offload/                        # deprecated runtime cache
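The model.safetensors.index.json shown above follows the standard Hugging Face sharded-checkpoint layout: a "weight_map" dict from tensor name to shard filename. A minimal coverage check (the tensor names below are made up for the demo) can confirm every referenced shard exists on disk:

```python
import json
import tempfile
from pathlib import Path

def missing_shards(index_path: Path) -> set[str]:
    """Return shard filenames referenced by weight_map but absent next to the index."""
    index = json.loads(index_path.read_text())
    referenced = set(index["weight_map"].values())
    return {s for s in referenced if not (index_path.parent / s).exists()}

# Demo with a tiny synthetic index: one shard present, one deliberately missing
with tempfile.TemporaryDirectory() as d:
    root = Path(d)
    (root / "model-00001-of-00010.safetensors").touch()
    index = {"weight_map": {
        "backbone.embed.weight": "model-00001-of-00010.safetensors",   # hypothetical names
        "backbone.layers.0.weight": "model-00002-of-00010.safetensors",
    }}
    (root / "model.safetensors.index.json").write_text(json.dumps(index))
    result = missing_shards(root / "model.safetensors.index.json")

print(result)  # → {'model-00002-of-00010.safetensors'}
```

This is the kind of pre-flight validation the loader runs before attempting any mapping work.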

▶️ How to Run

# 1️⃣ Activate your venv
source ~/.venvs/nemotron/bin/activate
cd <project root>
# 2️⃣ Run the diagnostic loader
python nemotron_loader.py

The loader validates all safetensor shards, confirms key-rename coverage, and terminates once diagnostics complete successfully.
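The key-rename coverage check mentioned above reduces to a set comparison: every checkpoint key must match some rename rule. The prefix map and key names here are hypothetical stand-ins, not the loader's actual mapping:

```python
def rename_coverage(checkpoint_keys: list[str], rename_map: dict[str, str]) -> list[str]:
    """Return checkpoint keys that no rename rule (old-prefix -> new-prefix) covers."""
    return [
        key for key in checkpoint_keys
        if not any(key.startswith(old) for old in rename_map)
    ]

# Hypothetical checkpoint keys and rename rules
ckpt = [
    "model.embed_tokens.weight",
    "model.layers.0.mixer.in_proj.weight",
    "lm_head.weight",
]
renames = {"model.": "backbone.", "lm_head": "head"}

print(rename_coverage(ckpt, renames))  # → []
```

An empty result means full coverage; any leftover keys would be reported before the loader proceeds to device dispatch.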


⚠️ Notes

  • The loader is diagnostic-only — it does not execute inference.
  • All configuration parameters are stored in /config/nemotron.toml.
  • Intended strictly for research and educational use.
  • Respect NVIDIA and Hugging Face licensing for associated model weights.

🧭 Shipping State Justification

Version 2.4.2b of the Nemotron NVFP4 Loader intentionally terminates after completing initialization, mapping, and validation.
This “reference-diagnostic” design verifies model accessibility, mapping coverage, and device dispatch without invoking any proprietary inference logic.
The loader exits only after confirming all diagnostics pass and the model is in eval() mode, signaling readiness for external callers.
This ensures full reproducibility, meta-safety, and legal compliance while providing a clear upgrade path to future integration releases.
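The diagnose-then-terminate flow described above can be sketched as a table of named checks mapped to a shell-style exit code; the check names are placeholders for the loader's actual shard, mapping, and dispatch validations:

```python
from typing import Callable

def run_diagnostics(checks: dict[str, Callable[[], bool]]) -> int:
    """Run each named check; print failures and return 0 only if all pass."""
    failures = [name for name, check in checks.items() if not check()]
    for name in failures:
        print(f"FAIL: {name}")
    return 1 if failures else 0

# Placeholder checks standing in for the real validations
code = run_diagnostics({
    "shards_present": lambda: True,
    "renames_covered": lambda: True,
    "dispatch_resolved": lambda: True,
})
print(code)  # → 0
```

Exiting with a conventional status code lets external callers script around the diagnostic run without parsing its output.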


🗾 Acknowledgments

  • Transformers / Hugging Face for the flexible dispatch framework.
  • NVIDIA for releasing the Nemotron Nano series.
  • Grok and GPT-5 for technical insights and debugging collaboration.

👤 Maintainer

LJA-TX
Offline AI research & local-inference enthusiast

“Minimalism means preserving context — only add what’s strictly necessary.”
