Replies: 1 comment
-
Hey @zoulou00, we're happy to accept a PR that allows someone to run this benchmark using G-Eval instead. The reason it should be an option is that the squad score you've linked to is much less costly to run.
-
Hello,
Looking at squad_score, it currently does a plain LLM call with a JSON instruction for evaluation:
deepeval/deepeval/scorer/scorer.py, line 435 (commit bdd1f69)
I was wondering: why isn’t G-Eval used here instead of a raw LLM call?
I thought I saw in the DeepEval / Confident AI docs that G-Eval can be more reliable for these kinds of evaluation tasks.
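For reference, here is a minimal sketch of what a G-Eval based version of that check could look like, assuming DeepEval's GEval metric API. The metric name and criteria wording below are illustrative, not the actual squad_score prompt, and a judge model (e.g. the default OpenAI one via OPENAI_API_KEY) needs to be configured:

```python
# Illustrative sketch only: a SQuAD-style correctness check expressed with
# DeepEval's GEval metric instead of a single raw LLM call with a JSON instruction.
# The metric name and criteria text are assumptions, not the library's actual
# squad_score prompt; a judge model must be configured (e.g. OPENAI_API_KEY).
from deepeval.metrics import GEval
from deepeval.test_case import LLMTestCase, LLMTestCaseParams

squad_correctness = GEval(
    name="SQuAD Correctness",
    criteria=(
        "Judge whether the actual output answers the question in the input "
        "correctly, using the expected output as the gold answer."
    ),
    evaluation_params=[
        LLMTestCaseParams.INPUT,
        LLMTestCaseParams.ACTUAL_OUTPUT,
        LLMTestCaseParams.EXPECTED_OUTPUT,
    ],
)

test_case = LLMTestCase(
    input="In what year was the Eiffel Tower completed?",
    actual_output="It was completed in 1889.",
    expected_output="1889",
)

squad_correctness.measure(test_case)
print(squad_correctness.score, squad_correctness.reason)
```

As I understand it, G-Eval expands the criteria into evaluation steps and scores with the judge model's token probabilities, which is presumably why it tends to be more reliable but also costlier than a single JSON-instruction call, matching the cost concern in the reply above.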