Replies: 1 comment
-
Hey @zoulou00, we're happy to accept a PR that allows someone to run this benchmark using G-Eval instead. The reason it should be an option is that the squad score you've linked to is much less costly to run.
-
Hello,
Looking at squad_score, it currently does a plain LLM call with a JSON instruction for evaluation:
deepeval/deepeval/scorer/scorer.py, line 435 (commit bdd1f69)
I was wondering: why isn’t G-Eval used here instead of a raw LLM call?
I thought I saw in the DeepEval / Confident AI docs that G-Eval can be more reliable for these kinds of evaluation tasks.
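For reference, here is a minimal sketch of what a G-Eval based version of that check could look like, assuming DeepEval's GEval metric API. The metric name and criteria wording below are illustrative, not the actual squad_score prompt, and a judge model (e.g. the default OpenAI one via OPENAI_API_KEY) needs to be configured:

```python
# Illustrative sketch only: a SQuAD-style correctness check expressed with
# DeepEval's GEval metric instead of a single raw LLM call with a JSON instruction.
# The metric name and criteria text are assumptions, not the library's actual
# squad_score prompt; a judge model must be configured (e.g. OPENAI_API_KEY).
from deepeval.metrics import GEval
from deepeval.test_case import LLMTestCase, LLMTestCaseParams

squad_correctness = GEval(
    name="SQuAD Correctness",
    criteria=(
        "Judge whether the actual output answers the question in the input "
        "correctly, using the expected output as the gold answer."
    ),
    evaluation_params=[
        LLMTestCaseParams.INPUT,
        LLMTestCaseParams.ACTUAL_OUTPUT,
        LLMTestCaseParams.EXPECTED_OUTPUT,
    ],
)

test_case = LLMTestCase(
    input="In what year was the Eiffel Tower completed?",
    actual_output="It was completed in 1889.",
    expected_output="1889",
)

squad_correctness.measure(test_case)
print(squad_correctness.score, squad_correctness.reason)
```

As I understand it, G-Eval expands the criteria into evaluation steps and scores with the judge model's token probabilities, which is presumably why it tends to be more reliable but also costlier than a single JSON-instruction call, matching the cost concern in the reply above.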