test(docs): add end to end evaluation doc tests#2442

Open
BloggerBust wants to merge 9 commits into confident-ai:main from BloggerBust:test/docs-end-to-end-llm-evals

Conversation

@BloggerBust (Contributor) commented Jan 19, 2026

  • add deterministic/offline end-to-end coverage for single-turn + multi-turn evaluate() flows
  • validate EvaluationResult/TestResult shape plus dataset JSON and CSV export schemas
  • add dedicated cache end-to-end tests covering write_cache/use_cache and expected on-disk artifacts
  • add end-to-end tests for evaluate() configs (AsyncConfig, ErrorConfig, DisplayConfig) using deterministic metrics
  • introduce top-level test fixtures (telemetry opt-out, isolated .deepeval dir, settings reset, tracing cleanup) and keep core-only env sandboxing in tests/test_core
  • add CLI smoke test

- end-to-end tests for docs/docs/evaluation-end-to-end-llm-evals.mdx
- add deterministic offline E2E tests covering single-turn and multi-turn flows
- validate EvaluationResult/TestResult shape and dataset JSON/CSV artifact schemas
- add offline fixtures to disable dotenv loading and browser opening
- add networked CLI smoke test gated on OPENAI_API_KEY
- add dedicated GitHub Actions workflow to run docs-based tests
- run DeepEval end-to-end documentation tests in CI with secrets
- support maintainer-only PRs, main branch pushes, and manual dispatch
- temporarily disable Confident docs tests pending fixes
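The deterministic, offline flow described in the bullets above can be sketched without any network calls. Everything below is an illustrative stand-in, not deepeval's real API: the metric class, the result shape, and the export schema are invented for the sketch, and only show the pattern such tests rely on, i.e. a metric with a fixed score so the result shape and the serialized JSON/CSV artifacts can be asserted exactly.

```python
import csv
import io
import json
from dataclasses import dataclass, field, asdict

# Illustrative stand-ins for deepeval's metric/result types (NOT the real classes).
@dataclass
class DeterministicMetric:
    score: float = 1.0          # fixed score -> fully reproducible, offline
    threshold: float = 0.5

    def measure(self, test_case):
        return self.score

    def is_successful(self):
        return self.score >= self.threshold

@dataclass
class TestResult:
    name: str
    success: bool
    metrics: dict = field(default_factory=dict)

def evaluate_offline(cases, metric):
    # Deterministic "evaluate()" sketch: no LLM calls, stable output every run.
    return [
        TestResult(name=c, success=metric.is_successful(),
                   metrics={"deterministic": metric.measure(c)})
        for c in cases
    ]

def export_json(results):
    # Hypothetical JSON artifact schema: a list of result objects.
    return json.dumps([asdict(r) for r in results], indent=2)

def export_csv(results):
    # Hypothetical CSV artifact schema: one row per test case.
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["name", "success", "score"])
    writer.writeheader()
    for r in results:
        writer.writerow({"name": r.name, "success": r.success,
                         "score": r.metrics["deterministic"]})
    return buf.getvalue()

results = evaluate_offline(["case-1", "case-2"], DeterministicMetric(score=0.9))
```

Because the score never varies, a test can assert the full serialized artifact byte-for-byte, which is what makes this kind of coverage viable in CI without secrets.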
greptile-apps bot (Contributor) commented Jan 19, 2026

Skipped: This PR was not opened by one of your configured authors: (tanayvaswani, trevor-cai, kritinv, ...)

vercel bot commented Jan 19, 2026

@BloggerBust is attempting to deploy a commit to the Confident AI Team on Vercel.

A member of the Team first needs to authorize it.

- add deterministic metrics for missing-param and raising error scenarios
- add ErrorConfig tests for skip_on_missing_params and ignore_errors (including precedence)
- add AsyncConfig, CacheConfig, and DisplayConfig behavior/validation coverage
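The precedence behavior these tests exercise can be illustrated with a small stand-in. Note the hedge: the ErrorConfig field names come from the bullets above, but the exception type and the resolution order sketched here are assumptions made for illustration, not deepeval's documented semantics.

```python
from dataclasses import dataclass

class MissingParamsError(Exception):
    """Stand-in for a metric failing because test-case parameters are absent."""

@dataclass
class ErrorConfig:
    # Field names taken from the PR bullets; the semantics below are assumed.
    ignore_errors: bool = False
    skip_on_missing_params: bool = False

def run_metric(metric_fn, config):
    """Assumed precedence: a missing-params failure is handled by
    skip_on_missing_params first; ignore_errors then covers any other error."""
    try:
        return ("ok", metric_fn())
    except MissingParamsError:
        if config.skip_on_missing_params:
            return ("skipped", None)
        if config.ignore_errors:
            return ("ignored", None)
        raise
    except Exception:
        if config.ignore_errors:
            return ("ignored", None)
        raise

def raises_missing():
    raise MissingParamsError("expected_output not set")
```

With both flags enabled, the sketch resolves a missing-params failure as "skipped" rather than "ignored", which is the kind of precedence interaction a dedicated test pins down.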
…ites

- Extract generic evaluate() e2e flows into dedicated test files
- Add cache behavior coverage for write_cache/use_cache and on-disk artifacts
- Add evaluate config coverage for AsyncConfig/ErrorConfig/DisplayConfig
- Introduce top-level test fixtures for telemetry opt-out, settings reset, and tracing cleanup
- Remove the monolithic end-to-end test file and reorganize fixtures between tests/ and tests/test_core/
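The fixture responsibilities listed above (telemetry opt-out, isolated .deepeval directory, settings reset) can be sketched as a stdlib context manager. The DEEPEVAL_TELEMETRY_OPT_OUT variable name is an assumption based on deepeval's documentation; the save-and-restore isolation pattern itself is generic.

```python
import contextlib
import os
import tempfile

@contextlib.contextmanager
def isolated_deepeval_env():
    """Run a test body with telemetry opted out and a throwaway working
    directory, so nothing touches the real .deepeval folder.
    DEEPEVAL_TELEMETRY_OPT_OUT is an assumed env var name."""
    saved_env = dict(os.environ)
    saved_cwd = os.getcwd()
    with tempfile.TemporaryDirectory() as tmp:
        try:
            os.environ["DEEPEVAL_TELEMETRY_OPT_OUT"] = "YES"
            os.chdir(tmp)          # on-disk artifacts land in the temp dir
            yield tmp
        finally:
            os.chdir(saved_cwd)    # settings reset: restore cwd and env
            os.environ.clear()
            os.environ.update(saved_env)
```

In a pytest suite the same idea would live in a conftest.py fixture (as this PR does); the context-manager form is used here only to keep the sketch dependency-free.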
Confident tests can all go under tests/confident, so we can flatten
this test suite
@trevor-cai changed the title from "test(docs): add end-to-end evaluation doc tests" to "test(docs): add component evaluation doc tests" on Jan 19, 2026
@trevor-cai changed the title from "test(docs): add component evaluation doc tests" to "test(docs): add end to end evaluation doc tests" on Jan 19, 2026
@A-Vamshi force-pushed the test/docs-end-to-end-llm-evals branch from 760fe18 to dca7006 on January 20, 2026 at 17:30

2 participants