A Streamlit-based application for evaluating and comparing responses generated by Large Language Models (LLMs). This tool supports multiple evaluation methods, including pairwise comparisons, reference-based evaluations, criteria-based evaluations, hallucination detection, and traditional NLP metrics.
Non-LLM Evaluation:
- Compare bot responses against ground truth using traditional NLP metrics like BLEU, ROUGE, BERTScore, and Edit Distance.
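A minimal sketch of these metrics, assuming the `nltk`, `rouge-score`, and `bert-score` packages are installed; the function name `compute_metrics` is illustrative, not the app's actual API:

```python
import nltk
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from rouge_score import rouge_scorer
from bert_score import score as bert_score

def compute_metrics(response: str, ground_truth: str) -> dict:
    ref_tokens, hyp_tokens = ground_truth.split(), response.split()

    # BLEU with smoothing so short strings don't collapse to zero.
    bleu = sentence_bleu(
        [ref_tokens], hyp_tokens,
        smoothing_function=SmoothingFunction().method1,
    )

    # ROUGE-1 and ROUGE-L F-measures.
    scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
    rouge = scorer.score(ground_truth, response)

    # BERTScore F1 (downloads a model on first use).
    _, _, f1 = bert_score([response], [ground_truth], lang="en")

    # Character-level Levenshtein edit distance.
    edit = nltk.edit_distance(response, ground_truth)

    return {
        "bleu": bleu,
        "rouge1_f": rouge["rouge1"].fmeasure,
        "rougeL_f": rouge["rougeL"].fmeasure,
        "bertscore_f1": f1.item(),
        "edit_distance": edit,
    }
```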
Pairwise Comparison:
- Compare two LLM responses head-to-head to determine which is better, with a detailed justification for the verdict.
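A minimal sketch of a pairwise judge using the official `openai` client; the model name, prompt wording, and function name are assumptions rather than the app's exact implementation:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def pairwise_judge(question: str, response_a: str, response_b: str) -> str:
    prompt = (
        "You are an impartial judge. Given the question and two responses, "
        "decide which response is better and explain why.\n\n"
        f"Question: {question}\n\nResponse A: {response_a}\n\n"
        f"Response B: {response_b}\n\n"
        'Answer with "A" or "B" followed by a short justification.'
    )
    result = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return result.choices[0].message.content
```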
Reference-Free Criteria Evaluation:
- Evaluate responses on criteria such as accuracy, coherence, creativity, and relevance without requiring a ground truth.
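A minimal sketch of reference-free criteria scoring with structured JSON output; the criteria list, 1-5 scale, and model name are illustrative assumptions:

```python
import json
from openai import OpenAI

client = OpenAI()
CRITERIA = ["accuracy", "coherence", "creativity", "relevance"]

def score_criteria(question: str, response: str) -> dict:
    prompt = (
        "Rate the response to the question on each criterion from 1 to 5: "
        f"{', '.join(CRITERIA)}. Reply as a JSON object mapping each "
        'criterion to {"score": int, "explanation": str}.\n\n'
        f"Question: {question}\nResponse: {response}"
    )
    result = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
        temperature=0,
    )
    return json.loads(result.choices[0].message.content)
```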
Reference-Based Evaluation:
- Evaluate responses against a reference or ground truth answer with detailed scoring and explanations.
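A minimal sketch of reference-based scoring; the prompt wording and the 1-10 scale are assumptions, not the app's exact rubric:

```python
from openai import OpenAI

client = OpenAI()

def score_against_reference(question: str, response: str, reference: str) -> str:
    prompt = (
        "Compare the response to the reference answer. Give a score from "
        "1 to 10 for how well the response matches the reference, then "
        "explain any gaps.\n\n"
        f"Question: {question}\nReference: {reference}\nResponse: {response}"
    )
    result = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return result.choices[0].message.content
```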
Hallucination Detection:
- Detect hallucinations in LLM-generated responses by comparing them to a provided context.
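A minimal sketch of context-grounded hallucination detection, where the model is asked to flag claims the context does not support; the prompt and output format are illustrative assumptions:

```python
from openai import OpenAI

client = OpenAI()

def detect_hallucinations(context: str, response: str) -> str:
    prompt = (
        "List every factual claim in the response that is not supported "
        "by the context, quoting the claim and explaining why it is "
        "unsupported. If all claims are grounded, reply "
        "'No hallucinations detected.'\n\n"
        f"Context: {context}\n\nResponse: {response}"
    )
    result = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return result.choices[0].message.content
```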
Usage:
- Enter your OpenAI API key in the sidebar to enable LLM-based evaluations.
- Select an evaluation method from the sidebar (see the sketch after these steps):
  - Non-LLM Evaluation
  - Pairwise Comparison
  - Reference-Free Criteria Evaluation
  - Reference-Based Evaluation
  - Hallucination Detection
- Follow the prompts in the main interface to input your data and generate evaluations.
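A minimal sketch of the sidebar wiring in Streamlit; the widget labels and the warning logic are assumptions about how the app is structured:

```python
import streamlit as st

# API key input, masked like a password field.
api_key = st.sidebar.text_input("OpenAI API Key", type="password")

# Evaluation method selector matching the five methods above.
method = st.sidebar.selectbox(
    "Evaluation Method",
    [
        "Non-LLM Evaluation",
        "Pairwise Comparison",
        "Reference-Free Criteria Evaluation",
        "Reference-Based Evaluation",
        "Hallucination Detection",
    ],
)

# Only the non-LLM metrics work without a key.
if method != "Non-LLM Evaluation" and not api_key:
    st.sidebar.warning("Enter an API key to enable LLM-based evaluations.")
```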