@rhesis-ai

Rhesis AI

Open-source testing & evaluation for LLM & agentic applications.

Rhesis: Collaborative Testing for LLM & Agentic Applications

Website · Docs · Discord · Changelog

More than just evals.
Collaborative agent testing for teams.

Generate tests from requirements, simulate conversation flows, detect adversarial behaviors, evaluate with 60+ metrics, and trace failures with OpenTelemetry. Engineers and domain experts, working together.


About Rhesis AI

Built by developers who needed better LLM testing tools

We built Rhesis because existing LLM testing tools didn't meet our needs for testing agentic applications. If you face the same challenges, contributions are welcome.

Collaborative testing for cross-functional teams

Testing shouldn't be limited to engineers. Legal teams understand compliance requirements. Marketing knows brand guidelines. Domain experts identify edge cases. Rhesis enables everyone to contribute their expertise without writing code.

From requirements to automated test execution

Define requirements in plain language. Rhesis generates test scenarios based on your team's collective knowledge. Execute tests automatically via UI, SDK, or CI/CD. Get detailed results showing exactly how your LLM & agentic applications perform.

Open source with a clear license model

MIT licensed. The enterprise version lives in ee/ folders and remains separate.


Get started

Check out our main repository and documentation to get started.

Quick start options:

  • Cloud (app.rhesis.ai): a managed service; just connect your app
  • Self-hosted: run locally with Docker in 5 minutes
  • Python SDK: integrate directly into your codebase

Made in Potsdam, Germany 🇩🇪

Learn more at rhesis.ai

Pinned

  1. rhesis (Public)

    Open-source testing platform & SDK for LLM and agentic applications. Define what your app should and shouldn't do, and Rhesis generates hundreds of test scenarios, runs them, and shows you where it…

    Python · 274 stars · 17 forks
