This repository contains a complete AI evaluations course built around a Recipe Chatbot. Through 5 progressive homework assignments, you'll learn practical techniques for evaluating and improving AI systems.
1. **Clone & Setup**

   ```bash
   git clone https://github.com/ai-evals-course/recipe-chatbot.git
   cd recipe-chatbot
   uv sync
   source .venv/bin/activate
   ```

2. **Configure Environment**

   ```bash
   cp env.example .env
   # Edit .env to add your model and API keys
   ```

3. **Run the Chatbot**

   ```bash
   uv run uvicorn backend.main:app --reload
   # Open http://127.0.0.1:8000
   ```
Bonus: Using AI-Assisted Coding to Tackle Homework Problems
- **HW1: Basic Prompt Engineering** (`homeworks/hw1/`) - Write system prompts and expand test queries
  - Walkthrough: See the HW2 walkthrough for HW1 content
- **HW2: Error Analysis & Failure Taxonomy** (`homeworks/hw2/`)
- **HW3: LLM-as-Judge Evaluation** (`homeworks/hw3/`) - Automated evaluation using the `judgy` library (see the judge sketch after this list)
  - Interactive Walkthrough:
    - Code: `homeworks/hw3/hw3_walkthrough.ipynb`
    - Video: walkthrough of solution
- **HW4: RAG/Retrieval Evaluation** (`homeworks/hw4/`) - BM25 retrieval system with synthetic query generation
  - Interactive Walkthrough:
    - Code: `homeworks/hw4/hw4_walkthrough.ipynb`
    - Video: walkthrough of solution
- **HW5: Agent Failure Analysis** (`homeworks/hw5/`) - Analyze conversation traces and failure patterns
  - Interactive Walkthrough:
    - Code: `homeworks/hw5/hw5_walkthrough.ipynb`
    - Video: walkthrough of solution
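HW3's automated evaluation is built around the `judgy` library, which corrects the judge's bias against a small set of human labels. The sketch below only illustrates the underlying LLM-as-judge call, using LiteLLM (which the backend already depends on); the prompt, the PASS/FAIL labels, and the parsing are simplified assumptions for illustration, not the course's actual judge.

```python
# Minimal LLM-as-judge sketch using LiteLLM (illustrative only, not the HW3 solution).
# The judge prompt, labels, and parsing below are simplified assumptions.
import litellm

JUDGE_PROMPT = """You are grading a recipe chatbot answer.
Question: {question}
Answer: {answer}
Reply with exactly one word: PASS or FAIL."""

def judge_answer(question: str, answer: str, model: str = "openai/gpt-5-mini") -> bool:
    """Return True if the judge model labels the answer PASS."""
    response = litellm.completion(
        model=model,
        messages=[{
            "role": "user",
            "content": JUDGE_PROMPT.format(question=question, answer=answer),
        }],
    )
    verdict = (response.choices[0].message.content or "").strip().upper()
    return verdict.startswith("PASS")

# Example:
# judge_answer("How do I make pesto?", "Blend basil, pine nuts, parmesan, garlic, and olive oil.")
```

In the homework, raw judge verdicts like these are then combined with human-labeled examples and `judgy` to estimate a bias-corrected success rate rather than trusting the judge's pass rate directly.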
- Backend: FastAPI with LiteLLM (multi-provider LLM support)
- Frontend: Simple chat interface with conversation history
- Annotation Tool: FastHTML-based interface for manual evaluation (`annotation/`)
- Retrieval: BM25-based recipe search (`backend/retrieval.py`); a minimal sketch follows this list
- Query Rewriting: LLM-powered query optimization (`backend/query_rewrite_agent.py`)
- Evaluation Tools: Automated metrics, bias correction, and analysis scripts
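Both the retrieval component and HW4 revolve around BM25 scoring. As a rough illustration of the idea, here is a toy sketch using the `rank_bm25` package; the tiny corpus, the whitespace tokenization, and even the choice of library are assumptions made for brevity. The project's real implementation lives in `backend/retrieval.py`.

```python
# Toy BM25 retrieval sketch (illustrative only; see backend/retrieval.py for the real code).
from rank_bm25 import BM25Okapi

recipes = [
    "Spicy chickpea curry with coconut milk and rice",
    "Classic margherita pizza with fresh basil",
    "Quick vegetable stir fry with tofu and soy sauce",
]

# Whitespace tokenization keeps the example short; a real index would normalize text more carefully.
tokenized = [doc.lower().split() for doc in recipes]
bm25 = BM25Okapi(tokenized)

query = "vegetarian curry".lower().split()
scores = bm25.get_scores(query)

# Rank recipes by BM25 score, best match first.
for score, recipe in sorted(zip(scores, recipes), reverse=True):
    print(f"{score:.2f}  {recipe}")
```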
```
recipe-chatbot/
├── backend/      # FastAPI app & core logic
├── frontend/     # Chat UI (HTML/CSS/JS)
├── homeworks/    # 5 progressive assignments
│   ├── hw1/      # Prompt engineering
│   ├── hw2/      # Error analysis (with walkthrough)
│   ├── hw3/      # LLM-as-Judge (with walkthrough)
│   ├── hw4/      # Retrieval eval (with walkthroughs)
│   └── hw5/      # Agent analysis
├── annotation/   # Manual annotation tools
├── scripts/      # Utility scripts
├── data/         # Datasets and queries
└── results/      # Evaluation outputs
```
Each homework (HW2-HW5) includes a self-contained Jupyter notebook walkthrough:
```bash
cd homeworks/hw2
jupyter notebook hw2_walkthrough.ipynb
```

The walkthroughs use data from `reference_files/` and can be run without any external scripts. Each notebook includes:
- Data loading and exploration
- Step-by-step solution code
- Expected outputs and analysis
- Annotation Interface: Run `python annotation/annotation.py` for manual evaluation
- Bulk Testing: Use `python scripts/bulk_test.py` to test multiple queries
- Trace Analysis: All conversations saved as JSON for analysis (see the sketch after this list)
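To give a feel for what trace analysis looks like, here is a hypothetical sketch that counts assistant turns per saved conversation. The `results/` location and the `{"messages": [...]}` schema are assumptions for illustration and may not match how this repository actually stores its traces.

```python
# Hypothetical trace-analysis sketch: count assistant turns per conversation.
# Assumes traces are JSON files under results/ shaped like {"messages": [{"role": ..., "content": ...}]};
# the real filenames and schema in this repo may differ.
import json
from pathlib import Path

for path in sorted(Path("results").glob("*.json")):
    trace = json.loads(path.read_text())
    assistant_turns = sum(1 for m in trace.get("messages", []) if m.get("role") == "assistant")
    print(f"{path.name}: {assistant_turns} assistant turns")
```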
Configure your `.env` file with:

- `MODEL_NAME`: LLM model for the chatbot (e.g., `openai/gpt-5-chat-latest`, `anthropic/claude-3-sonnet-20240229`)
- `MODEL_NAME_JUDGE`: LLM model for the judge, which can be smaller than the chatbot model (e.g., `openai/gpt-5-mini`, `anthropic/claude-3-haiku-20240307`)
- API keys: `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, etc.
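For example, a minimal `.env` might look like the following; the model choices are just placeholders, and the key value is obviously not real:

```
MODEL_NAME=openai/gpt-5-chat-latest
MODEL_NAME_JUDGE=openai/gpt-5-mini
OPENAI_API_KEY=sk-your-key-here
```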
See LiteLLM docs for supported providers.
This course emphasizes:
- Practical experience over theory
- Systematic evaluation over "vibes"
- Progressive complexity - each homework builds on previous work
- Industry-standard techniques for real-world AI evaluation
