Releases: vercel-labs/agent-eval

@vercel/agent-eval@0.0.12

06 Feb 21:11
1b009bb

Patch Changes

@vercel/agent-eval-playground@0.0.5

06 Feb 22:39
8999b85

Patch Changes

  • 6159d01 Thanks @allenzhou101! - Run playground in production mode (next start) instead of dev mode (next dev) to fix React version conflicts and "Cannot read properties of null (reading 'useInsertionEffect')" errors when running via npx.

@vercel/agent-eval-playground@0.0.4

06 Feb 22:33

Patch Changes

  • 23e2d43 Thanks @allenzhou101! - Add repository field to package.json to fix npm provenance verification error during publishing.

v0.0.11

05 Feb 18:14
b536715

Patch Changes

v0.0.8

04 Feb 00:11

Version 0.0.8

v0.0.7

03 Feb 20:58

What's Changed

Fixes

  • CLI loads .env.local: CLI now loads .env.local before .env (matching integration test behavior)
  • CLI version from package.json: Version is now read dynamically instead of being hardcoded
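
The load order matters when both files define the same key. As a minimal sketch, assuming the loader keeps the first value it sees for each key (the default behavior of the dotenv package; whether the CLI uses dotenv is not stated here), loading .env.local first lets it take precedence over .env. The names parseEnv and loadInOrder are hypothetical helpers for illustration only.

```typescript
type Env = Record<string, string>;

// Parse simple KEY=VALUE lines; blank lines and # comments are skipped.
// This is a simplified parser for illustration, not a full dotenv parser.
function parseEnv(text: string): Env {
  const env: Env = {};
  for (const line of text.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (trimmed === "" || trimmed.charAt(0) === "#") continue;
    const eq = trimmed.indexOf("=");
    if (eq <= 0) continue; // no key before "=", or no "=" at all
    env[trimmed.slice(0, eq).trim()] = trimmed.slice(eq + 1).trim();
  }
  return env;
}

// Apply sources in order. ASSUMPTION: a key that is already set is never
// overwritten, so earlier sources win over later ones.
function loadInOrder(sources: string[], initial: Env = {}): Env {
  const env: Env = { ...initial };
  for (const text of sources) {
    const parsed = parseEnv(text);
    for (const key of Object.keys(parsed)) {
      if (!(key in env)) env[key] = parsed[key];
    }
  }
  return env;
}

// Usage: contents of .env.local first, then .env.
const envLocal = "API_KEY=local-value";
const envShared = "API_KEY=shared-value\nREGION=us-east";
const merged = loadInOrder([envLocal, envShared]);
```

Under that assumption, merged.API_KEY comes from .env.local, while REGION, which only .env defines, is still picked up.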

Changes

  • Default timeout increased: Changed from 5 minutes to 10 minutes (600s)

v0.0.6

03 Feb 20:15

What's Changed

Fixes

  • [OpenCode] Fix model format: OpenCode now requires vercel/ prefix in model strings (e.g., vercel/minimax/minimax-m2.1)
  • [Docker] Fix file permissions: Added chown after file upload so agents can edit files in the sandbox
  • Timeout enforcement: Added Promise.race at the runner level and an abort signal sent to the agent on timeout, so resources are cleaned up properly
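
The timeout fix above combines two standard mechanisms: Promise.race to stop waiting, and an AbortSignal to tell the running work to stop. The following is a rough sketch of that pattern, not the project's actual runner code; runWithTimeout and its parameters are hypothetical names.

```typescript
// Hypothetical helper illustrating Promise.race plus an abort signal.
async function runWithTimeout<T>(
  task: (signal: AbortSignal) => Promise<T>,
  timeoutMs: number,
): Promise<T> {
  const controller = new AbortController();
  let timer: ReturnType<typeof setTimeout> | undefined;

  // Rejects after timeoutMs and signals the task to abort.
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      controller.abort(); // lets the task release its resources
      reject(new Error(`Timed out after ${timeoutMs} ms`));
    }, timeoutMs);
  });

  try {
    // Whichever settles first decides the outcome.
    return await Promise.race([task(controller.signal), timeout]);
  } finally {
    clearTimeout(timer); // avoid a dangling timer when the task wins
  }
}
```

Promise.race alone only stops the caller from waiting; the task keeps running unless it observes the signal. Passing the signal into the task is what makes cleanup possible.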

New

  • Minimax model support: Added vercel/minimax/minimax-m2.1 to supported models
  • Parallel integration tests: All agents are now tested on both Docker and Vercel sandboxes concurrently
  • [Config] Sandbox backend selection: Moved sandbox backend selection from env var to experiment config (sandbox: 'docker' | 'vercel' | 'auto')
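
To illustrate the config-based selection, here is a sketch of how the three allowed values could be typed and validated. Only the sandbox field and its three values come from the notes above; the ExperimentConfig shape, the model field name, and the isSandboxBackend guard are assumptions made for this example.

```typescript
// The three values listed in the release note.
const SANDBOX_BACKENDS = ["docker", "vercel", "auto"] as const;
type SandboxBackend = (typeof SANDBOX_BACKENDS)[number];

// Hypothetical config shape; the real experiment config may differ.
interface ExperimentConfig {
  model: string;
  sandbox: SandboxBackend;
}

// Type guard for values that arrive untyped (e.g. from JSON).
function isSandboxBackend(value: unknown): value is SandboxBackend {
  return (
    typeof value === "string" &&
    (SANDBOX_BACKENDS as readonly string[]).indexOf(value) !== -1
  );
}

// Example config using the model string format mentioned in this release.
const config: ExperimentConfig = {
  model: "vercel/minimax/minimax-m2.1",
  sandbox: "docker",
};
```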

Documentation

  • Updated README with correct model format and examples

v0.0.5

03 Feb 17:49

What's Changed

  • Added a Docker sandbox as an alternative to Vercel Sandbox
  • Added test:integration:docker and test:integration:vercel scripts