detLLM

Deterministic and verifiable LLM inference

Deterministic-mode checks for LLM inference: measure run/batch variance, generate repro packs, and explain why outputs differ.

About

detLLM verifies reproducibility for LLM inference and produces a minimal repro pack when outputs diverge. It measures run-to-run variance and batch-size variance, and reports results with explicit, capability-gated guarantees (only claimed when the backend can actually enforce them).

Quickstart

pip install detllm
detllm check --backend hf --model <model_id> \
  --prompt "Choose one: A or B. Answer with a single letter." \
  --tier 1 --runs 5 --batch-size 1

Note: some shells (like zsh) require quotes when installing extras, e.g. pip install 'detllm[test,hf]'.

Verification

See docs/verification.md for the full local verification procedure and expected outputs.

Tiers

  • Tier 0: artifacts + deterministic diff/report (no equality guarantees)
  • Tier 1: repeatability across runs for a fixed batch size
  • Tier 2: Tier 1 + score/logprob equality (capability-gated)

Tier 1 guarantees repeatability only for a fixed batch size; batch invariance is measured separately.

Tier 2 scores are captured when the backend supports stable score/logprob output. See docs/verification.md for how to verify scores appear in traces.
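
The tier is selected per run. Below is a minimal sketch using the Python API described later in this README, requesting Tier 2; per the note above, the equality guarantee is only claimed when the backend can actually enforce it (the output directory name is just an example):

from detllm import run

# Request Tier 2 (Tier 1 repeatability plus score/logprob equality).
# The claim is capability-gated: if the backend cannot enforce it,
# the report records that instead of asserting equality.
run(
    backend="hf",
    model="distilgpt2",
    prompts=["Choose one: A or B. Answer with a single letter."],
    tier=2,
    out_dir="artifacts/tier2_run",
)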

Artifacts (minimal repro pack)

Each run writes an artifacts/<run_id>/ folder:

  • env.json
  • run_config.json
  • determinism_applied.json
  • trace.jsonl
  • report.json + report.txt
  • diffs/first_divergence.json
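
A minimal sketch of loading a repro pack for inspection (file names as listed above; the JSON schemas are not documented here, so nothing beyond whole-file reads is assumed):

import json
from pathlib import Path

run_dir = Path("artifacts") / "run1"  # any artifacts/<run_id>/ folder

# Human-readable summary and machine-readable report.
print((run_dir / "report.txt").read_text())
report = json.loads((run_dir / "report.json").read_text())

# If outputs diverged, the first differing point is recorded here.
divergence = run_dir / "diffs" / "first_divergence.json"
if divergence.exists():
    print(json.dumps(json.loads(divergence.read_text()), indent=2))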

Python API

from detllm import check, run

# Single run: writes a repro pack (env, config, trace, report) to out_dir.
run(
    backend="hf",
    model="distilgpt2",
    prompts=["Hello"],
    tier=1,
    out_dir="artifacts/run1",
)

# Repeatability check: repeats the same prompts across several runs at a
# fixed batch size and compares the outputs.
report = check(
    backend="hf",
    model="distilgpt2",
    prompts=["Hello"],
    runs=3,
    batch_size=1,
    out_dir="artifacts/check1",
)

print(report.status, report.category)
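
A sketch of gating a CI job on the result; the exact report.status strings are an assumption here, so check report.txt in your output folder for the values your version actually emits:

import sys

# "pass" is a hypothetical status value, used only for illustration.
if report.status != "pass":
    print(f"determinism check failed ({report.category}); "
          "see artifacts/check1/diffs/first_divergence.json")
    sys.exit(1)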

CLI

  • detllm env
  • detllm run
  • detllm check
  • detllm diff
  • detllm report
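
For scripted or CI use, the check can also be driven through the CLI from Python; a sketch using only the flags shown in the Quickstart, and assuming (as is conventional for CLIs) that a non-zero exit code signals failure:

import subprocess

# Tier 1 repeatability check, mirroring the Quickstart invocation.
result = subprocess.run(
    [
        "detllm", "check",
        "--backend", "hf",
        "--model", "distilgpt2",
        "--prompt", "Hello",
        "--tier", "1",
        "--runs", "5",
        "--batch-size", "1",
    ]
)
raise SystemExit(result.returncode)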

Known limitations

  • GPU determinism is conditional; results can change across drivers/kernels.
  • Batch invariance is not guaranteed; it is measured separately.
  • Strict guarantees depend on backend capabilities.
  • Distributed/multiprocess inference is out of scope for now.

Docs

Versioning

See docs/versioning.md for compatibility guarantees.
