LLM inference server built from scratch: a C++ model loader + Go gRPC API.
Status: 🚧 In Development (Q1 2026)
A learning project that removes the "magic" from LLM inference by building each layer manually:
- Foundations → Transformer math in NumPy, validated against GPT-2
- C++ Core → GGUF parser + mmap weight loader
- Go Server → gRPC API with metrics and security middleware
- Production → Docker deployment with KV cache optimization
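The Foundations phase centers on implementing attention by hand. As an illustrative sketch (the function name and shapes here are assumptions, not the project's actual code), single-head scaled dot-product attention in NumPy looks like:

```python
import numpy as np

def attention(q, k, v):
    # Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                         # (T, T) similarities
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)        # rows sum to 1
    return weights @ v                                    # (T, d) weighted values

# Toy inputs: sequence length T=4, head dimension d=8
rng = np.random.default_rng(0)
T, d = 4, 8
q, k, v = rng.normal(size=(3, T, d))
out = attention(q, k, v)
assert out.shape == (T, d)
```

A version like this is easy to validate token-by-token against a PyTorch or GPT-2 reference before moving it into the C++ core.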
- Single-head attention implementation
- Multi-head attention with PyTorch validation
- GGUF format parser (C++)
- Memory-mapped weight loading
- Go gRPC server
- Observability (Prometheus/Grafana)
- Security middleware
- Docker deployment
- KV cache optimization
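The KV cache item above is the key serving optimization: during decoding, each new token's attention reuses the keys and values of all previous tokens instead of recomputing them. A minimal single-head sketch (names and structure are illustrative, not the project's implementation):

```python
import numpy as np

def decode_step(q_new, k_new, v_new, cache):
    # Append this step's key/value to the cache, so each decode step
    # only projects the newest token instead of the whole sequence.
    cache["k"].append(k_new)
    cache["v"].append(v_new)
    K = np.stack(cache["k"])                          # (t, d) keys so far
    V = np.stack(cache["v"])                          # (t, d) values so far
    scores = K @ q_new / np.sqrt(q_new.shape[-1])     # (t,) similarities
    w = np.exp(scores - scores.max())
    w /= w.sum()                                      # softmax over past tokens
    return w @ V                                      # (d,) attended output

# Simulate 5 decode steps with head dimension d=8
rng = np.random.default_rng(0)
d = 8
cache = {"k": [], "v": []}
for _ in range(5):
    q, k, v = rng.normal(size=(3, d))
    out = decode_step(q, k, v, cache)
assert len(cache["k"]) == 5
```

This turns per-step attention cost from O(T·d) recomputation of all projections into a single-row update, at the price of holding the cache in memory for the lifetime of the request.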
- Python 3.10+ (foundations)
- CMake 3.20+, CUDA 12.x (C++ core)
- Go 1.21+ (server)
- Docker with NVIDIA runtime (deployment)