
Adding terraform generation to openbench. #343

Open
kishorealliiita wants to merge 1 commit into groq:main from kishorealliiita:feature/adding-benchmarking-for-terraform-generation

Conversation


kishorealliiita commented Jan 31, 2026

Summary

Adds a Terraform Generation benchmark to openbench. The model receives natural-language prompts and must produce Terraform (HCL) for two tasks: a VPC with 3 subnets and 3 EC2 instances, and an S3 bucket with a bucket policy. The scorer extracts .tf code blocks from the model output, runs terraform fmt, terraform init -backend=false, and terraform validate, and awards 1.0 only if init and validate succeed (no plan/apply, no LocalStack). The PR includes the dataset, eval, scorer, config entry, registry import, unit tests for extraction/dataset/scorer, and an integration test for bench eval terraform_generation. A sketch of the extraction step follows.
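A minimal illustration of that extraction step in Python, assuming the helper is a regex over fenced code blocks; the name _TF_FENCE, the accepted fence tags, and the fallback to untagged fences are guesses here, not the PR's actual code:

````python
import re

# Hypothetical pattern: accepts ```terraform, ```tf, ```hcl, or untagged fences.
_TF_FENCE = re.compile(r"```(?:terraform|tf|hcl)?[ \t]*\n(.*?)```", re.DOTALL)

def _extract_tf_blocks(text: str) -> list[str]:
    """Return the contents of Terraform-looking fenced code blocks in text."""
    return [block.strip() for block in _TF_FENCE.findall(text)]
````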

What are you adding?

  • [ ] Bug fix (non-breaking change which fixes an issue)
  • [x] New benchmark/evaluation
  • [ ] New model provider
  • [ ] CLI enhancement
  • [ ] Performance improvement
  • [ ] Documentation update
  • [ ] API/SDK feature
  • [ ] Integration (CI/CD, tools)
  • [ ] Export/import functionality
  • [ ] Code refactoring
  • [ ] Breaking change
  • [ ] Other

Changes Made

  • Dataset (src/openbench/datasets/terraform_generation.py): load_dataset() returns a MemoryDataset of 2 samples (VPC + 3 subnets + 3 EC2; S3 bucket + bucket policy), each with the prompt text, target="pass", and a task_id in metadata (sketched after this list).
  • Scorer (src/openbench/scorers/terraform_generation.py): terraform_generation_scorer() parses .tf code blocks from the last assistant message, writes them to a temp dir, and runs terraform fmt, terraform init -backend=false, and terraform validate; it returns Score(1.0) only if init and validate succeed, otherwise Score(0.0). No plan/apply, no LocalStack (sketched below).
  • Eval (src/openbench/evals/terraform_generation.py): @task terraform_generation() builds a Task from the dataset above, the solver [generate()], and the custom scorer (sketched below).
  • Config (src/openbench/config.py): adds terraform_generation to _BUILTIN_BENCHMARKS with BenchmarkMetadata (name, description, category, tags, module_path, function_name; sketched below).
  • Registry (src/openbench/_registry.py): imports and re-exports terraform_generation from openbench.evals.terraform_generation.
  • Unit tests (tests/test_terraform_generation.py): cover _extract_tf_blocks, dataset loading, and the scorer (no output, no blocks, invalid Terraform, valid minimal Terraform); one is sketched below.
  • Integration test (tests/integration/test_cli.py): test_basic_terraform_generation() runs bench eval terraform_generation --limit 1 with a Groq model.
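A minimal sketch of the dataset module, using inspect_ai's MemoryDataset and Sample. The prompt wording and the task_id values here are placeholders, not the PR's actual strings:

```python
from inspect_ai.dataset import MemoryDataset, Sample

def load_dataset() -> MemoryDataset:
    return MemoryDataset(
        samples=[
            Sample(
                input="Write Terraform (HCL) for a VPC with 3 subnets and 3 EC2 instances.",
                target="pass",
                metadata={"task_id": "vpc_subnets_ec2"},  # placeholder id
            ),
            Sample(
                input="Write Terraform (HCL) for an S3 bucket with a bucket policy.",
                target="pass",
                metadata={"task_id": "s3_bucket_policy"},  # placeholder id
            ),
        ]
    )
```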
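The scorer flow, sketched under the same caveat: the subprocess handling, the single-file main.tf layout, and the error messages are assumptions; only the three terraform commands and the init/validate gating come from the PR description. It reuses the _extract_tf_blocks helper sketched in the Summary:

```python
import subprocess
import tempfile
from pathlib import Path

from inspect_ai.scorer import Score, Target, accuracy, scorer
from inspect_ai.solver import TaskState

@scorer(metrics=[accuracy()])
def terraform_generation_scorer():
    async def score(state: TaskState, target: Target) -> Score:
        # _extract_tf_blocks as sketched earlier in this description
        blocks = _extract_tf_blocks(state.output.completion)
        if not blocks:
            return Score(value=0.0, explanation="no .tf code blocks in output")
        with tempfile.TemporaryDirectory() as tmp:
            # Assumed layout: all extracted blocks concatenated into one file.
            Path(tmp, "main.tf").write_text("\n\n".join(blocks))
            subprocess.run(["terraform", "fmt"], cwd=tmp, capture_output=True)
            for cmd in (["terraform", "init", "-backend=false"],
                        ["terraform", "validate"]):
                result = subprocess.run(cmd, cwd=tmp, capture_output=True, text=True)
                if result.returncode != 0:  # only init/validate gate the score
                    return Score(value=0.0, explanation=result.stderr)
        return Score(value=1.0)

    return score
```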
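The task wiring is the most standard piece and follows inspect_ai conventions directly; this sketch assumes the import paths named in the bullets above:

```python
from inspect_ai import Task, task
from inspect_ai.solver import generate

from openbench.datasets.terraform_generation import load_dataset
from openbench.scorers.terraform_generation import terraform_generation_scorer

@task
def terraform_generation() -> Task:
    return Task(
        dataset=load_dataset(),
        solver=[generate()],
        scorer=terraform_generation_scorer(),
    )
```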
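The config entry and registry wiring might look roughly like this. The BenchmarkMetadata field names come from the bullet above, but every value here (display name, description, category, tags) is a guess:

```python
# src/openbench/config.py -- hypothetical entry in _BUILTIN_BENCHMARKS
"terraform_generation": BenchmarkMetadata(
    name="Terraform Generation",          # guessed display name
    description="Generate Terraform (HCL) from natural-language prompts; "
                "scored by terraform init/validate.",
    category="coding",                    # guessed category
    tags=["terraform", "iac"],            # guessed tags
    module_path="openbench.evals.terraform_generation",
    function_name="terraform_generation",
),

# src/openbench/_registry.py -- re-export
from openbench.evals.terraform_generation import terraform_generation
```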
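And one of the unit tests, sketched for the extraction helper; the test name and the import path of _extract_tf_blocks are assumptions:

````python
from openbench.scorers.terraform_generation import _extract_tf_blocks

def test_extract_tf_blocks_finds_fenced_hcl():
    completion = (
        "Here is the config:\n"
        "```tf\n"
        'resource "aws_s3_bucket" "b" {}\n'
        "```\n"
    )
    assert _extract_tf_blocks(completion) == ['resource "aws_s3_bucket" "b" {}']
````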

Testing

  • I have run the existing test suite (pytest)
  • I have added tests for my changes
  • I have tested with multiple model providers (if applicable)
  • I have run pre-commit hooks (pre-commit run --all-files)

Checklist

  • My code follows the project's style guidelines
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation (if applicable)
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

