CLI to benchmark multiple LLMs on a question set and show a live table in the terminal.
bun install
export NEBIUS_API_KEY="..."
# or
export OPENROUTER_API_KEY="..."
bun run start [command]
- --run: Run benchmark tests on the existing question set
- --create: Interactive question creator to generate a new question set
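The two commands can be pictured as a small flag dispatch. The sketch below is illustrative only: `parseCommand` and the `help` fallback are hypothetical names, and the project's real entry point may parse its arguments differently.

```typescript
// Hypothetical sketch, not the project's actual entry point.
type Command = "run" | "create" | "help";

// Return the command for the first recognized flag.
// Falling back to "help" when no flag matches is an assumption.
function parseCommand(argv: string[]): Command {
  if (argv.includes("--run")) return "run";
  if (argv.includes("--create")) return "create";
  return "help";
}

// Under Bun or Node, user-supplied arguments start at index 2:
// const command = parseCommand(process.argv.slice(2));
```

Checking `--run` before `--create` is an arbitrary choice here; it only matters if both flags are passed at once.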
cd webui
# Start the React web UI
bun run dev
# Web UI at http://localhost:3000/app
# Run benchmarks with existing question set
bun run start --run
# Create a new question set with AI
bun run start --create
- Models: edit constants/index.ts (models)
- Provider: configure .env and constants/index.ts (BASE_URL) according to your provider.
- Questions: edit/add questions/test.json
- Logs: written to logs/<model-name>.log
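As a rough illustration of how the provider and log settings above might fit together, here is a minimal sketch. `resolveProvider` and `logPathFor` are hypothetical helpers, not taken from the project. Preferring Nebius when both keys are set, and replacing "/" in model names, are assumptions made for the example.

```typescript
// Hypothetical helpers; the real constants/index.ts may look quite different.
type Env = Record<string, string | undefined>;

interface ProviderConfig {
  provider: "nebius" | "openrouter";
  apiKey: string;
}

// Pick a provider based on which API key is set in the environment.
// Assumption: Nebius takes precedence when both keys are present.
function resolveProvider(env: Env): ProviderConfig {
  const nebiusKey = env["NEBIUS_API_KEY"];
  if (nebiusKey) return { provider: "nebius", apiKey: nebiusKey };

  const openRouterKey = env["OPENROUTER_API_KEY"];
  if (openRouterKey) return { provider: "openrouter", apiKey: openRouterKey };

  throw new Error("Set NEBIUS_API_KEY or OPENROUTER_API_KEY before running.");
}

// Build the log path in the documented form logs/<model-name>.log.
// Assumption: "/" in a model ID is replaced so the log stays a single file.
function logPathFor(model: string): string {
  return `logs/${model.replace(/\//g, "_")}.log`;
}
```

Passing the environment in as an argument (for example `resolveProvider(process.env)`) keeps the helper easy to test without touching real credentials.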
- Better caching
- Image generation benchmarks (can't test them)
- Add options to run the benchmark from the available questions and cache from that
- Ask for the question path in makeQuestion
- Ask for the question path in --run

