alexzaitsev/iron-inference

Iron-Inference

LLM inference server built from scratch. C++ model loader + Go gRPC API.

Status: 🚧 In Development (Q1 2026)

What This Is

A learning project that removes the "magic" from LLM inference by building each layer manually:

  1. Foundations — Transformer math in NumPy, validated against the GPT-2 model
  2. C++ Core — GGUF parser + mmap weight loader
  3. Go Server — gRPC API with metrics and security middleware
  4. Production — Docker deployment with KV cache optimization
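The foundations layer (step 1) centers on the core transformer operation: scaled dot-product attention, softmax(QKᵀ/√d_k)·V. A minimal single-head sketch in NumPy is shown below; the function names are illustrative and not taken from this repository:

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max before exponentiating for numerical stability.
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def single_head_attention(q, k, v):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.

    q: (seq_q, d_k), k: (seq_k, d_k), v: (seq_k, d_v)
    returns: (seq_q, d_v)
    """
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)    # (seq_q, seq_k) similarity matrix
    weights = softmax(scores, axis=-1) # each row is a distribution over keys
    return weights @ v                 # weighted average of value vectors
```

Validating a sketch like this against a reference implementation (here, the GPT-2 weights) is what catches shape and scaling bugs early.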

Current Progress

  • Single-head attention implementation
  • Multi-head attention with PyTorch validation
  • GGUF format parser (C++)
  • Memory-mapped weight loading
  • Go gRPC server
  • Observability (Prometheus/Grafana)
  • Security middleware
  • Docker deployment
  • KV cache optimization
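The GGUF items above start with parsing the file's fixed header: a 4-byte magic (`GGUF`), a uint32 version, then uint64 tensor and metadata key-value counts, all little-endian. A rough Python sketch of that first step (the repo's parser is C++; `parse_gguf_header` is a hypothetical helper, not the project's API):

```python
import struct

GGUF_MAGIC = 0x46554747  # the bytes b"GGUF" read as a little-endian uint32

def parse_gguf_header(buf):
    """Read the fixed-size GGUF header from the start of a buffer.

    Layout: uint32 magic, uint32 version, uint64 tensor_count,
    uint64 metadata_kv_count (all little-endian).
    """
    magic, version = struct.unpack_from("<II", buf, 0)
    if magic != GGUF_MAGIC:
        raise ValueError("not a GGUF file")
    tensor_count, kv_count = struct.unpack_from("<QQ", buf, 8)
    return {"version": version, "tensor_count": tensor_count, "kv_count": kv_count}
```

After the header come the metadata key-value pairs and tensor descriptors, which is where an mmap-based loader gets the offsets it needs to map weights without copying them.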

Requirements

  • Python 3.10+ (foundations)
  • CMake 3.20+, CUDA 12.x (C++ core)
  • Go 1.21+ (server)
  • Docker with NVIDIA runtime (deployment)
