ai-benchmark-cli

CLI to benchmark multiple LLMs on a question set and show a live table in the terminal.
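The core idea of the live table can be sketched as collecting one result per (model, question) pair, summarizing per model, and redrawing the terminal view after each update. This is a minimal, hypothetical sketch. The `BenchResult` shape, the `correct` and `latencyMs` fields, and the `summarize`/`render` helpers are assumptions for illustration and are not taken from this project's source.

```typescript
// Hypothetical shape of one benchmark result; the project's real types may differ.
interface BenchResult {
  model: string;
  questionId: string;
  correct: boolean;
  latencyMs: number;
}

interface TableRow {
  model: string;
  correct: number;
  total: number;
  accuracy: number; // fraction in [0, 1]
  avgLatencyMs: number;
}

// Group results by model and compute per-model accuracy and mean latency.
function summarize(results: BenchResult[]): TableRow[] {
  const byModel = new Map<string, BenchResult[]>();
  for (const r of results) {
    const list = byModel.get(r.model) ?? [];
    list.push(r);
    byModel.set(r.model, list);
  }
  const rows: TableRow[] = [];
  for (const [model, list] of byModel) {
    const correct = list.filter((r) => r.correct).length;
    const totalLatency = list.reduce((sum, r) => sum + r.latencyMs, 0);
    rows.push({
      model,
      correct,
      total: list.length,
      accuracy: correct / list.length,
      avgLatencyMs: totalLatency / list.length,
    });
  }
  // Highest accuracy first; ties broken by lower average latency.
  rows.sort((a, b) => b.accuracy - a.accuracy || a.avgLatencyMs - b.avgLatencyMs);
  return rows;
}

// Clear the screen and redraw, so repeated calls look like a live-updating table.
function render(rows: TableRow[]): void {
  console.clear();
  console.table(
    rows.map((r) => ({
      Model: r.model,
      Score: `${r.correct}/${r.total}`,
      "Accuracy %": Math.round(r.accuracy * 100),
      "Avg latency (ms)": Math.round(r.avgLatencyMs),
    })),
  );
}
```

Calling `render(summarize(results))` each time a model finishes a question keeps the view current without waiting for the whole run to end.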

Installation

bun install
export NEBIUS_API_KEY="..."
# or
export OPENROUTER_API_KEY="..."
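Only one of the two keys needs to be exported. A minimal sketch of how a CLI could pick whichever key is present is shown below. The `resolveProvider` helper and its preference for Nebius when both keys are set are hypothetical, not this project's actual behavior.

```typescript
// Hypothetical helper: the README only requires that one of these keys is exported.
type Provider = "nebius" | "openrouter";

interface ProviderConfig {
  provider: Provider;
  apiKey: string;
}

// Takes the environment as a parameter so it can be tested without touching process.env.
function resolveProvider(env: Record<string, string | undefined>): ProviderConfig {
  const nebius = env.NEBIUS_API_KEY?.trim();
  if (nebius) return { provider: "nebius", apiKey: nebius }; // assumed preference order
  const openrouter = env.OPENROUTER_API_KEY?.trim();
  if (openrouter) return { provider: "openrouter", apiKey: openrouter };
  throw new Error("Set NEBIUS_API_KEY or OPENROUTER_API_KEY before running the CLI.");
}
```

Under Bun this would be called as `resolveProvider(process.env)`; failing early with a clear message is friendlier than a 401 from the provider halfway through a run.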

Images

Generated table

Generated web graph

Usage

bun run start [command]

Commands

--run - Run benchmark tests on the existing question set
--create - Interactive question creator to generate a new question set
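The two flags above select mutually exclusive modes. A hypothetical sketch of that dispatch follows; `parseCommand` is an illustrative name, and rejecting both flags at once is an assumption rather than documented behavior.

```typescript
// Hypothetical sketch: map the documented flags to a command.
type Command = "run" | "create";

function parseCommand(argv: string[]): Command {
  const hasRun = argv.includes("--run");
  const hasCreate = argv.includes("--create");
  // Reject both "neither flag" and "both flags".
  if (hasRun === hasCreate) {
    throw new Error("Usage: bun run start [--run | --create]");
  }
  return hasRun ? "run" : "create";
}
```

With `bun run start --run`, Bun forwards `--run` to the script, so `parseCommand(process.argv.slice(2))` would return `"run"`.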

Run the web UI

cd webui

# Start the React web UI
bun run dev
# The web UI is served at http://localhost:3000/app

Examples

# Run benchmarks with existing question set
bun run start --run

# Create a new question set with AI
bun run start --create

Configure

Models: edit the models list in constants/index.ts
Provider: configure .env and set BASE_URL in constants/index.ts to match your provider
Questions: edit or add questions in questions/test.json
Logs: written to logs/<model-name>.log
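As a rough illustration of the settings above, the sketch below shows one possible shape for the configuration constants and for deriving a per-model log path. The export names, the placeholder model IDs, and the choice to replace slashes in model IDs (such as `org/model`) with underscores are all hypothetical; check constants/index.ts for the real names. `https://openrouter.ai/api/v1` is OpenRouter's OpenAI-compatible endpoint; for Nebius, take the base URL from Nebius's own documentation.

```typescript
// Hypothetical shape of constants/index.ts; the real export names may differ.
export const BASE_URL = "https://openrouter.ai/api/v1"; // OpenRouter, OpenAI-compatible

export const models: string[] = [
  // Placeholder IDs: replace with the model IDs your provider lists.
  "example-org/model-a",
  "example-org/model-b",
];

// Hypothetical: build logs/<model-name>.log, replacing path separators so
// an ID like "org/model" yields a single file directly under logs/.
function logPathFor(model: string): string {
  return `logs/${model.replace(/[\/\\]/g, "_")}.log`;
}
```

Sanitizing the name matters because many provider model IDs contain a slash, which would otherwise be interpreted as a subdirectory.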

TO-DOs

Better caching
Image-generation benchmarks (can't test it)
Add options to run the benchmark from available questions and cache those results
Ask for the question path in makeQuestion
Ask for the question path in --run

Kartik-2239/ai-benchmark-cli