emx-onnx-cgen compiles ONNX models to portable, deterministic C code for deeply embedded systems. The generated code is designed to run without dynamic memory allocation, operating-system services, or external runtimes, making it suitable for safety-critical and resource-constrained targets.
Key characteristics:
- No dynamic memory allocation (`malloc`, `free`, heap usage)
- Static, compile-time known memory layout for parameters, activations, and temporaries
- Deterministic control flow (explicit loops, no hidden dispatch or callbacks)
- No OS dependencies, using only standard C headers (for example, `stdint.h` and `stddef.h`)
- Single-threaded execution model
- Bitwise-stable code generation for reproducible builds
- Readable, auditable C code suitable for certification and code reviews
- Generated C output format spec: `docs/output-format.md`
- Designed for bare-metal and RTOS-based systems
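As a rough illustration of this style, consider the following hand-written sketch of a tiny fully connected layer compiled to allocation-free C. It is not actual emx-onnx-cgen output; the model, names, and shapes are invented for this example.

```c
/* Hand-written sketch of the generated code style: only standard headers,
   weights as static const data, fixed shapes, explicit loop nests,
   no heap, no OS calls, no callbacks. */
#include <stddef.h>
#include <stdint.h>

#define IN_DIM  8
#define OUT_DIM 4

static const float dense_weight[OUT_DIM][IN_DIM] = {{0.0f}}; /* compiled-in parameters */
static const float dense_bias[OUT_DIM]           = {0.0f};   /* compiled-in parameters */

void model_infer(const float input[IN_DIM], float output[OUT_DIM])
{
    for (size_t o = 0; o < OUT_DIM; ++o) {
        float acc = dense_bias[o];
        for (size_t i = 0; i < IN_DIM; ++i) {
            acc += dense_weight[o][i] * input[i];
        }
        output[o] = acc;
    }
}
```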
For PyTorch models, see the related project emx-pytorch-cgen.
- Correctness-first compilation with outputs comparable to ONNX Runtime.
- Deterministic and reproducible C code generation.
- Clean, pass-based compiler architecture (import → normalize → optimize → lower → emit).
- Minimal C runtime with explicit, predictable data movement.
- Aggressive performance optimizations in generated C.
- Implicit runtime dependencies or dynamic loading.
- Training/backpropagation support.
- CLI for ONNX-to-C compilation and verification.
- Deterministic codegen with explicit tensor shapes and loop nests.
- Minimal C runtime templates in `src/emx_onnx_cgen/templates/`.
- ONNX Runtime comparison for end-to-end validation.
- Official ONNX operator coverage tracking.
- Support for a wide range of ONNX operators (see `SUPPORT_OPS.md`).
- Supported data types (interface conventions are sketched after this list):
  - `bfloat16`, `float16`, `float`, `double`
  - `int8`, `uint8`, `int16`, `uint16`, `int32`, `uint32`, `int64`, `uint64`
  - `bool`
  - `string` (fixed-size `'\0'`-terminated C strings; see `docs/output-format.md`)
  - `optional(<tensor type>)` (optional tensors represented via an extra `_Bool <name>_present` flag; see `docs/output-format.md`)
- Optional support for dynamic dimensions using C99 variable-length arrays (VLAs), when the target compiler supports them.
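The following hypothetical prototypes sketch how the string, optional, and dynamic-dimension cases above can surface in a generated interface. The names and shapes are invented for this illustration; `docs/output-format.md` remains the authoritative description.

```c
/* Hypothetical interface sketch; see docs/output-format.md for the actual
   conventions used by the generated code. */
#include <stddef.h>

/* optional(float tensor) input: tensor data plus an extra _Bool <name>_present
   flag; string output: fixed-size, '\0'-terminated C strings. */
void classify(const float features[16], _Bool features_present,
              char label_out[1][32]);

/* Dynamic leading dimension via a C99 VLA parameter, when the target
   compiler supports VLAs. */
void classify_batch(size_t n, const float features[n][16],
                    char labels_out[n][32]);
```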
The generated C code can be embedded directly into bare-metal C firmware or an application in which all model weights and parameters are compiled into the C source.
Typical characteristics:
- No file system or OS required.
- All weights stored as
static constarrays in flash/ROM. - Deterministic memory usage with no runtime allocation.
- Suitable for:
- Microcontrollers
- Safety-critical firmware
- Systems with strict certification requirements
This scenario is enabled via `--large-weight-threshold 0`, forcing all weights to be embedded directly into the generated C code.
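A minimal sketch of how such a fully self-contained model might be driven from bare-metal firmware follows. `model_infer`, `read_sensors`, `act_on`, and the buffer shapes are hypothetical names for this illustration, not the actual generated API.

```c
/* Hypothetical bare-metal integration: the generated translation unit holds
   all weights as static const data, so the firmware only supplies statically
   allocated I/O buffers and calls the entry point from its main loop. */
#include <stddef.h>

void model_infer(const float input[8], float output[4]); /* from generated model.c */
void read_sensors(float input[8]);                       /* application code */
void act_on(const float output[4]);                      /* application code */

static float input_buf[8];
static float output_buf[4];

void firmware_main_loop(void)
{
    for (;;) {
        read_sensors(input_buf);
        model_infer(input_buf, output_buf); /* deterministic, allocation-free */
        act_on(output_buf);
    }
}
```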
The generated C code can be embedded into C or C++ applications where large model weights are stored externally and loaded from a binary file at runtime.
Typical characteristics:
- Code and control logic compiled into the application.
- Large constant tensors packed into a separate `.bin` file.
- Explicit, generated loader functions handle weight initialization.
- Suitable for:
- Embedded Linux or RTOS systems
- Applications with limited flash but available external storage
- Larger models where code size must be minimized
This scenario is enabled automatically once the cumulative weight size exceeds `--large-weight-threshold` (default: 102400 bytes).
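As a sketch of this flow: the application reads the weight blob and hands it to a loader before running inference. `model_load_weights`, `model_infer`, and the file name are hypothetical placeholders invented for this example, not the actual generated loader interface.

```c
/* Hypothetical external-weights flow on an OS-based target. Whether the blob
   must stay resident during inference depends on the generated loader; here
   it is assumed to be needed only at load time. */
#include <stdio.h>
#include <stdlib.h>

int  model_load_weights(const void *blob, size_t size); /* generated (hypothetical) */
void model_infer(const float *input, float *output);    /* generated (hypothetical) */

int run_model(const float *input, float *output)
{
    FILE *f = fopen("model_weights.bin", "rb");
    if (!f) return -1;
    fseek(f, 0, SEEK_END);
    long size = ftell(f);
    fseek(f, 0, SEEK_SET);

    void *blob = (size > 0) ? malloc((size_t)size) : NULL;
    if (!blob || fread(blob, 1, (size_t)size, f) != (size_t)size) {
        fclose(f);
        free(blob);
        return -1;
    }
    fclose(f);

    if (model_load_weights(blob, (size_t)size) != 0) {
        free(blob);
        return -1;
    }
    model_infer(input, output);
    free(blob);
    return 0;
}
```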
In both of the above scenarios, the generated C code can serve as input to emmtrix source-to-source compilation and optimization tools, enabling target-specific optimizations while preserving functional correctness.
Examples of applied transformations include:
- Kernel fusion and loop restructuring
- Memory layout optimization and buffer reuse
- Reduction of internal temporary memory
- Utilization of SIMD / vector instruction sets
- Offloading of large weights to external memory
- Dynamic loading of weights or activations via DMA
This workflow allows a clear separation between:
- Correctness-first, deterministic ONNX lowering, and
- Target-specific performance and memory optimization,
while keeping the generated C code readable, auditable, and traceable.
The generated C code is intentionally structured to make such transformations explicit and analyzable, rather than relying on opaque backend-specific code generation.
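As a schematic, hand-written illustration of what "kernel fusion" and "reduction of internal temporary memory" can mean at the C level (this is not output of the emmtrix tools):

```c
/* Before: two elementwise kernels with a full-size temporary between them. */
void scale_relu_unfused(const float x[1024], const float s[1024], float y[1024])
{
    static float tmp[1024]; /* internal temporary buffer */
    for (int i = 0; i < 1024; ++i) {
        tmp[i] = x[i] * s[i];
    }
    for (int i = 0; i < 1024; ++i) {
        y[i] = tmp[i] > 0.0f ? tmp[i] : 0.0f;
    }
}

/* After fusion: one loop, and the temporary shrinks to a single scalar. */
void scale_relu_fused(const float x[1024], const float s[1024], float y[1024])
{
    for (int i = 0; i < 1024; ++i) {
        float t = x[i] * s[i];
        y[i] = t > 0.0f ? t : 0.0f;
    }
}
```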
Install the package directly from PyPI (recommended):

```
pip install emx-onnx-cgen
```

Required at runtime (both compile and verify):

- `onnx`
- `numpy`
- `jinja2`

Optional for verification and tests:

- `onnxruntime`
- A C compiler (`cc`, `gcc`, `clang`, or via `--cc`)
Compile an ONNX model into a C source file:

```
emx-onnx-cgen compile path/to/model.onnx build/model.c
```

Verify an ONNX model end-to-end against ONNX Runtime (default):

```
emx-onnx-cgen verify path/to/model.onnx
```

emx-onnx-cgen provides two subcommands: `compile` and `verify`.

These options are accepted by both `compile` and `verify`:
- `--model-base-dir`: Base directory for resolving the model path (and related paths).
- `--color`: Colorize CLI output (`auto`, `always`, `never`; default: `auto`).
- `--large-weight-threshold`: Store weights in a binary file once the cumulative byte size exceeds this threshold (default: `102400`; set to `0` to disable).
- `--large-temp-threshold`: Mark temporary buffers larger than this threshold as static (default: `1024`).
- `--fp32-accumulation-strategy`: Accumulation strategy for float32 inputs (`simple` uses float32, `fp64` uses double; default: `fp64`; see the sketch below).
- `--fp16-accumulation-strategy`: Accumulation strategy for float16 inputs (`simple` uses float16, `fp32` uses float; default: `fp32`).
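As a minimal illustration of what the two float32 accumulation strategies mean for generated reductions, consider the following hand-written sketch; the kernel, its name, and its shape are invented and not actual tool output.

```c
/* Hypothetical reduction kernel illustrating --fp32-accumulation-strategy. */

/* "simple": accumulate in float32. */
float dot_simple(const float a[256], const float b[256])
{
    float acc = 0.0f;
    for (int i = 0; i < 256; ++i) {
        acc += a[i] * b[i];
    }
    return acc;
}

/* "fp64" (default): accumulate in double, round back to float32 at the end. */
float dot_fp64(const float a[256], const float b[256])
{
    double acc = 0.0;
    for (int i = 0; i < 256; ++i) {
        acc += (double)a[i] * (double)b[i];
    }
    return (float)acc;
}
```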
```
emx-onnx-cgen compile <model.onnx> <output.c> [options]
```

Options:

- `--model-name`: Override the generated model name (default: output file stem).
- `--emit-testbench`: Emit a JSON-producing `main()` testbench for validation.
- `--emit-data-file`: Emit constant data arrays into a companion `_data` C file.
- `--no-restrict-arrays`: Disable `restrict` qualifiers on generated array parameters (see the sketch below).
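The following hypothetical signatures sketch what the `restrict` option affects; the function names and shapes are illustrative only, not actual generated code.

```c
/* Hypothetical signatures illustrating --no-restrict-arrays. */

/* Default: restrict-qualified array parameters promise the C compiler that
   input and output buffers do not alias, enabling better optimization. */
void add_restrict(const float a[restrict 64], const float b[restrict 64],
                  float out[restrict 64]);

/* With --no-restrict-arrays: plain parameters, usable when callers may pass
   overlapping buffers. */
void add_plain(const float a[64], const float b[64], float out[64]);
```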
```
emx-onnx-cgen verify <model.onnx> [options]
```

Options:

- `--cc`: Explicit C compiler command for building the testbench binary.
- `--max-ulp`: Maximum allowed ULP distance for floating outputs (default: `100`).
- `--atol-eps`: Absolute tolerance as a multiple of machine epsilon for floating outputs (default: `1.0`).
- `--runtime`: Runtime backend for verification (`onnxruntime` or `onnx-reference`; default: `onnxruntime`).
- `--temp-dir-root`: Root directory in which to create a temporary verification directory (default: system temp dir).
- `--temp-dir`: Exact directory to use for temporary verification files (default: create a temporary directory).
- `--keep-temp-dir`: Keep the temporary verification directory instead of deleting it.
How verification works:
- Compile with a testbench: the compiler is invoked with `--emit-testbench`, generating a C program that runs the model and prints inputs/outputs as JSON.
- Build and execute: the testbench is compiled with the selected C compiler (`--cc`, `CC`, or a detected `cc`/`gcc`/`clang`) and executed in a temporary directory.
- Run the runtime backend: the JSON inputs from the testbench are fed to the selected runtime (`onnxruntime` or `onnx-reference`) using the same model. The compiler no longer ships a Python runtime evaluator.
- Compare outputs: floating-point outputs are compared by maximum ULP distance. Differences up to `--atol-eps` × machine epsilon of the evaluated floating-point type are treated as equal; for larger absolute differences, the ULP distance is computed and the maximum is reported (a sketch of this rule follows the list). Non-floating outputs must match exactly. Missing outputs or mismatches are treated as failures.
- ORT-unsupported models: when using `onnxruntime`, if ORT reports `NOT_IMPLEMENTED`, verification is skipped with a warning (exit code 0).
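The following sketch re-implements the comparison rule described above for the float32 case. It is a simplified illustration, not the tool's actual verification code; NaN handling and other edge cases are omitted.

```c
#include <float.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Map a float's bit pattern onto a scale that is monotonic in its value,
   so that adjacent representable floats differ by exactly 1. */
static int64_t ordered_bits(float x)
{
    int32_t b;
    memcpy(&b, &x, sizeof b);
    return (b < 0) ? (int64_t)INT32_MIN - b : (int64_t)b;
}

static int64_t ulp_distance(float a, float b)
{
    int64_t d = ordered_bits(a) - ordered_bits(b);
    return d < 0 ? -d : d;
}

/* Returns 1 if the pair passes under the documented rule, 0 otherwise. */
static int floats_match(float expected, float actual,
                        double atol_eps, int64_t max_ulp)
{
    /* Small absolute differences (up to atol_eps * machine epsilon) are
       treated as equal before any ULP comparison. */
    if (fabs((double)expected - (double)actual) <= atol_eps * (double)FLT_EPSILON) {
        return 1;
    }
    return ulp_distance(expected, actual) <= max_ulp;
}
```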
See `ONNX_SUPPORT.md` for the generated support matrix.
See `SUPPORT_OPS.md` for operator-level support derived from the expectation JSON files.
- emx-pytorch-cgen: A PyTorch-to-C compiler following the same design principles as emx-onnx-cgen, but operating directly on PyTorch models instead of ONNX graphs.
  https://github.com/emmtrix/emx-pytorch-cgen
- onnx2c: An ONNX-to-C code generator with a different design focus and code generation approach.
  https://github.com/kraiskil/onnx2c
This project is maintained by emmtrix.
