A benchmarking tool that lets you benchmark any LLMs on your own data

Kartik-2239/ai-benchmark-cli

ai-benchmark-cli

CLI to benchmark multiple LLMs on a question set and show a live table in the terminal.

Installation

bun install
export NEBIUS_API_KEY="..."
# or
export OPENROUTER_API_KEY="..."

Images

Generated table: cli-table (screenshot)

Generated web graph: web-graph (screenshot)

Usage

bun run start [command]

Commands

  • --run - Run benchmark tests on the existing question set
  • --create - Interactive question creator to generate a new question set
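The two flags above could be dispatched along these lines. This is only a sketch: `parseCommand` and the `Command` type are hypothetical names, and the project's actual entry point may parse arguments differently. Letting `--run` win when both flags are passed is also an assumption.

```typescript
// Hypothetical dispatcher for the documented flags; not taken from the repository.
type Command = "run" | "create";

// Returns the selected command, or null when neither flag is present.
// Assumption: if both flags are passed, --run takes precedence.
function parseCommand(argv: string[]): Command | null {
  if (argv.includes("--run")) return "run";
  if (argv.includes("--create")) return "create";
  return null;
}

// Usage with literal arguments, standing in for the CLI's argument list.
const selected = parseCommand(["--create"]);
console.log(selected ?? "no command given; expected --run or --create");
```

Returning `null` rather than throwing keeps the decision about how to handle a missing flag (print usage, exit, or prompt) with the caller.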

Run the web UI

cd webui

# Start the React web UI
bun run dev
# The web UI is served at http://localhost:3000/app

Examples

# Run benchmarks with existing question set
bun run start --run

# Create a new question set with AI
bun run start --create

Configure

  • Models: edit constants/index.ts (models)
  • Provider: configure .env and constants/index.ts (BASE_URL) according to your provider.
  • Questions: edit/add questions/test.json
  • Logs: written to logs/<model-name>.log
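As a rough illustration of the settings listed above, constants/index.ts might look something like the following. Only the names `models` and `BASE_URL` come from this README. The model identifiers are placeholders, the assumption that `models` is an array of strings is unverified, and `logPathFor` is a hypothetical helper that mirrors the logs/<model-name>.log layout. The base URL shown is OpenRouter's OpenAI-compatible endpoint; use your own provider's URL instead.

```typescript
// Sketch of constants/index.ts; the real file presumably exports these values.
// BASE_URL: OpenRouter's OpenAI-compatible endpoint, shown as one possible provider.
const BASE_URL: string = "https://openrouter.ai/api/v1";

// Placeholder model identifiers (assumption: models are listed as plain strings).
const models: string[] = ["provider/model-a", "provider/model-b"];

// Hypothetical helper: maps a model name to its log file under logs/.
// Assumption: path separators in the name are replaced so each model gets one file.
function logPathFor(model: string): string {
  return `logs/${model.replace(/[\\/]/g, "_")}.log`;
}

for (const model of models) {
  console.log(`${model} -> ${logPathFor(model)} (via ${BASE_URL})`);
}
```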

TO-DOs

  • Better caching
  • Image generation benchmarks (can't test them yet)
  • Add an option to run benchmarks from the available questions and cache from that
  • Ask for the question path in makeQuestion
  • Ask for the question path in --run
