Course: CSE3063 - Object-Oriented Analysis and Design
Term Project: Iteration 2 (Extensibility & Evaluation)
Language: Python 3.11+
Test Suite: 88 passing unit tests
This is the Python implementation of a modular Retrieval-Augmented Generation (RAG) chatbot designed to answer questions about the Marmara University Computer Engineering department (staff, courses, policies).
- AI-Powered Answer Generation: Gemini API integration for natural language responses
- Comprehensive Test Suite: 88 unit tests covering all major components
- Evaluation Framework: Systematic testing with EvalHarness
- Batch Processing: CLI mode for evaluating multiple queries with performance metrics
- Multiple Reranking Strategies: Jaccard similarity, Cosine similarity, and Simple proximity-based
- Production-Ready Logging: JSONL trace logs for debugging and analysis
| Component | Implementation | Description |
|---|---|---|
| Controller | ChatBot | GRASP Controller - orchestrates the full pipeline |
| Intent Detection | RuleIntentDetector | Rule-based keyword matching |
| Query Writing | HeuristicQueryWriter | Stopword filtering & intent boosting |
| Retrieval | KeywordRetriever | TF-based keyword retrieval |
| Reranking | Multiple Strategies | Jaccard, Cosine, Simple proximity |
| Answer Generation | GeminiAnswerAgent | AI-powered contextual responses |
The project is designed to run self-contained from the root directory.
.
├── main.py # Entry Point (Single & Batch Modes)
├── config.yaml # Main Configuration File
├── chunks.json # Document Data Store
├── index.json # Search Index
├── requirements.txt # Python Dependencies
├── env.env # Environment Variables (API Key) - DO NOT COMMIT
├── env.env.template # Template for env.env
├── eval_queries.json # Sample Evaluation Queries
├── CSE3063F25_Grp15_Iter2_7_CLI_output.txt # Persistent Output Log
├── README.md # This file
├── ENV_SETUP_GUIDE.md # Environment Setup Guide
│
├── evaluation/ # Evaluation Results
│ ├── eval_results_*.json # Individual query results
│ └── eval_report_*.json # Aggregate metrics
│
├── logs/ # Execution trace logs (.jsonl)
│ └── run-*.jsonl # Timestamped trace logs
│
├── config/ # Configuration Loading & Data Structures
│ ├── __init__.py
│ ├── app_config.py # Application Configuration Class
│ └── config_loader.py # YAML Config Parser
│
├── entities/ # Domain Objects
│ ├── __init__.py
│ ├── answer.py # Answer Entity
│ ├── chunk.py # Document Chunk
│ ├── context.py # Pipeline Context
│ ├── hit.py # Retrieval Hit
│ ├── intent.py # Intent Enum
│ ├── eval_query.py # Evaluation Query Entity
│ └── eval_result.py # Evaluation Result Entity
│
├── helpers/ # Controllers & Utilities
│ ├── __init__.py
│ ├── output_writer.py # File Output Handler
│ ├── chat_bot.py # Main Pipeline Controller
│ ├── eval_harness.py # Evaluation Framework
│ ├── batch_eval_runner.py # Batch Evaluation Runner
│ ├── reranker_factory.py # Reranker Factory Pattern
│ └── answer_agent_factory.py # Answer Agent Factory Pattern
│
├── service_interfaces/ # Interfaces for Pipeline Stages (Strategy Pattern)
│ ├── __init__.py
│ ├── i_answer_agent.py # Answer Generation Interface
│ ├── i_intent_detector.py # Intent Detection Interface
│ ├── i_query_writer.py # Query Writing Interface
│ ├── i_reranker.py # Reranking Interface
│ └── i_retriever.py # Retrieval Interface
│
├── services/ # Concrete Implementations of Strategies
│ ├── __init__.py
│ ├── heuristic_query_writer.py # Stopword Filtering & Intent Boosting
│ ├── keyword_retriever.py # TF-based Keyword Retrieval
│ ├── rule_intent_detector.py # Rule-based Intent Detection
│ ├── simple_reranker.py # Proximity-based Reranking
│ ├── jaccard_reranker.py # Jaccard Similarity Reranker
│ ├── template_answer_agent.py # Template-based Answer Generation
│ └── gemini_answer_agent.py # Gemini AI Answer Agent
│
├── tests/ # Unit Tests (88 tests)
│ ├── __init__.py
│ ├── test_answer_agent_factory.py # Factory pattern tests (5 tests)
│ ├── test_gemini_answer_agent.py # AI agent tests (23 tests)
│ ├── test_heuristic_query_writer.py # Query writer tests (6 tests)
│ ├── test_jaccard_reranker.py # Jaccard reranker tests (13 tests)
│ ├── test_keyword_retriever.py # Retriever tests (23 tests)
│ ├── test_reranker_factory.py # Reranker factory tests (12 tests)
│ └── test_rule_intent_detector.py # Intent detection tests (6 tests)
│
└── trace/ # Observer Pattern for Logging
├── __init__.py
├── jsonl_trace_sink.py # JSONL File Logger
├── trace_bus.py # Event Publisher
├── trace_event.py # Trace Event Model
└── trace_observer.py # Observer Interface
The application's logic is driven entirely by config.yaml. This fulfills the requirement for "Config-driven strategy selection."
config.yaml
├── strategies/ # Strategy Selection (Class Mapping)
│ ├── intentDetector # "RuleBased"
│ ├── queryWriter # "Heuristic"
│ ├── retriever # "Keyword"
│ ├── reranker # "jaccard" (simple, jaccard, cosine)
│ └── answerAgent # "gemini" (gemini, template)
│
├── parameters/ # Algorithm Tuning & Logic
│ ├── retrieverK # (int) Number of docs to fetch (default: 6)
│ ├── proximityBonus # (int) Score bonus for close terms (default: 5)
│ ├── titleBoost # (int) Score multiplier for titles (default: 3)
│ ├── proximityWindow # (int) Max distance for proximity check (default: 15)
│ └── intentPriority/ # (List) Tie-breaking order
│ ├── StaffLookup
│ ├── Registration
│ ├── PolicyFAQ
│ └── Course
│
├── stopwords/ # (List) Common words to ignore
│ ├── "a", "about", "am", "an", "and"...
│ └── ... (70+ words)
│
└── intentRules/ # (Map) Knowledge Base for Detection & Boosting
├── StaffLookup/ # Keywords for staff queries
│ ├── "professor", "staff", "instructor"
│ ├── "office", "email", "contact"
│ └── ...
├── Registration/ # Keywords for enrollment/admin
│ ├── "enroll", "register", "deadline"
│ └── ...
├── PolicyFAQ/ # Keywords for rules/exams
│ ├── "regulation", "grade", "exam"
│ └── ...
└── Course/ # Keywords for curriculum
├── "credit", "syllabus", "prerequisite"
    └── ...

Prerequisites:
- Python 3.11 or higher installed (`python --version`)
- Required libraries installed (`pip install -r requirements.txt`)
- Google API key set in the `env.env` file (for Gemini AI)
- The files `main.py`, `config.yaml`, `chunks.json`, and `index.json` must be in the same folder.
Install dependencies:

```
pip install -r requirements.txt
```

Create an `env.env` file in the project root:

```
# Copy the template
cp env.env.template env.env

# Edit env.env and add your Google API key:
GOOGLE_API_KEY=your-actual-api-key-here
```

Get your API key from: https://aistudio.google.com/app/apikey
For detailed setup instructions, see ENV_SETUP_GUIDE.md
Run the application with a single question.
```
python main.py --config config.yaml --q "<Your Question Here>"
```

Example:

```
python main.py --config config.yaml --q "Who is Professor Ganiz?"
```

Run batch evaluation on multiple test queries.

```
python main.py --config config.yaml --batch <query_file.json> --k <coverage_k>
```

Example:

```
python main.py --config config.yaml --batch eval_queries.json --k 5
```

Output: Results are saved to the evaluation/ folder.
Querying for a specific professor's details.
```
python main.py --config config.yaml --q "Who is Professor Ganiz?"
```

Expected Output:

```
Intent.StaffLookup
===============================
Professor Murat Can Ganiz is a faculty member in the Computer Engineering department.
Office: M2-123
Email: mganiz@marmara.edu.tr
Research Areas: Machine Learning, Natural Language Processing
SOURCES:
[1] staff.txt:section1:100-250
===============================
```
Querying for specific course prerequisites or credits.
```
python main.py --config config.yaml --q "How many credits does CSE3063 have?"
```

Expected Output:

```
Intent.Course
===============================
CSE3063 (Object-Oriented Analysis and Design) is a 4-credit course.
Prerequisites: CSE2034
Description: This course covers object-oriented programming principles...
SOURCES:
[1] courses.txt:section2:500-750
===============================
```
Running systematic evaluation on multiple test queries.
```
python main.py --config config.yaml --batch eval_queries.json --k 5
```

Expected Output:

```
Running batch evaluation from: eval_queries.json
K value for coverage@k: 5
Loaded 5 evaluation queries
Evaluated: Who is Professor Ganiz?... (Intent: True, Coverage@5: 1.00, Latency: 1245ms)
Evaluated: What is the office of Murat Can Ganiz?... (Intent: True, Coverage@5: 1.00, Latency: 1189ms)
...
================================================================================
EVALUATION REPORT
================================================================================
Total Queries Evaluated: 5
K Value (for coverage@k): 5
--------------------------------------------------------------------------------
INTENT ACCURACY
--------------------------------------------------------------------------------
Accuracy: 100.00% (5/5)
--------------------------------------------------------------------------------
COVERAGE@5
--------------------------------------------------------------------------------
Average: 80.00%
Median: 100.00%
Min: 0.00%
Max: 100.00%
--------------------------------------------------------------------------------
LATENCY (milliseconds)
--------------------------------------------------------------------------------
Average: 1234 ms
Median: 1210 ms
Min: 987 ms
Max: 1456 ms
Results saved to: evaluation/eval_results_20251218-120000.json
Report saved to: evaluation/eval_report_20251218-120000.json
================================================================================
```
The batch evaluation mode expects a JSON file with the following structure:

```json
[
{
"question": "Who is Professor Ganiz?",
"expected_intent": "StaffLookup",
"expected_docs": ["staff"],
"expected_answer": null
},
{
"question": "How many credits does CSE3063 have?",
"expected_intent": "Course",
"expected_docs": ["course_catalog"],
"expected_answer": null
}
]
```

Field descriptions:

- question: The test question to evaluate
- expected_intent: Expected intent classification (StaffLookup, Course, Registration, PolicyFAQ, Unknown)
- expected_docs: List of expected relevant document IDs
- expected_answer: (Optional) Expected answer text for accuracy evaluation
The EvalHarness calculates the following metrics:
Percentage of queries where the detected intent matches the expected intent.
Intent Accuracy = (Correct Intent Classifications) / (Total Queries)
Measures how many of the expected relevant documents appear in the top-k retrieved results.
Coverage@k = (Expected Docs in Top-k) / (Total Expected Docs)
Time taken (in milliseconds) to process the entire RAG pipeline for a query.
- Average, median, min, and max latency are reported
All metrics are also computed per intent type for detailed analysis.
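As an illustration, the two core metrics reduce to a few lines of Python. This is a hedged sketch; the actual EvalHarness in helpers/eval_harness.py may structure its inputs differently.

```python
# Hedged sketch of the two core metrics; the result/field names here
# are assumptions, not copied from helpers/eval_harness.py.

def intent_accuracy(results: list[dict]) -> float:
    """Fraction of queries whose detected intent matches the expected one."""
    correct = sum(1 for r in results
                  if r["detected_intent"] == r["expected_intent"])
    return correct / len(results) if results else 0.0

def coverage_at_k(expected_docs: list[str], retrieved_docs: list[str], k: int) -> float:
    """Share of expected docs found among the top-k retrieved docs."""
    top_k = set(retrieved_docs[:k])
    found = sum(1 for doc in expected_docs if doc in top_k)
    return found / len(expected_docs) if expected_docs else 0.0

# Example: the one expected doc ("staff") appears in the top 5 -> 1.0
print(coverage_at_k(["staff"], ["staff", "courses", "policy"], k=5))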
The system uses Google's Gemini API for natural language answer generation.
Benefits:
- Natural, contextual answers
- Better understanding of complex queries
- Citation integration with source references
Configuration:
Create env.env file in project root with your API key:
GOOGLE_API_KEY=your-api-key-here
The system automatically loads the API key from env.env at startup.
See ENV_SETUP_GUIDE.md for detailed instructions.
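For reference, startup loading with python-dotenv typically looks like the sketch below; the project's actual loading code may differ.

```python
# Rough sketch of startup key loading with python-dotenv; illustrative,
# not the project's exact implementation.
import os
from dotenv import load_dotenv

load_dotenv("env.env")  # read variables from the env.env file
api_key = os.getenv("GOOGLE_API_KEY")
if not api_key:
    raise RuntimeError("GOOGLE_API_KEY is not set; see ENV_SETUP_GUIDE.md")
```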
Choose from different reranking algorithms via config.yaml:
- simple: Proximity-based scoring (default)
- jaccard: Jaccard similarity coefficient (see the sketch below)
- cosine: Cosine similarity with TF-IDF vectors
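For intuition, the core of the Jaccard strategy is set overlap between query terms and chunk terms. A minimal sketch; the real JaccardReranker in services/jaccard_reranker.py may tokenize and normalize differently, and the "terms" field on a hit is an assumed shape.

```python
# Minimal core of Jaccard reranking (illustrative only).

def jaccard_similarity(query_terms: set[str], chunk_terms: set[str]) -> float:
    """|intersection| / |union| of the two term sets."""
    union = query_terms | chunk_terms
    return len(query_terms & chunk_terms) / len(union) if union else 0.0

def rerank(query_terms: set[str], hits: list[dict]) -> list[dict]:
    # Each hit is assumed to carry a "terms" set built from its chunk text.
    return sorted(hits,
                  key=lambda h: jaccard_similarity(query_terms, h["terms"]),
                  reverse=True)
```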
Example configuration:

```yaml
strategies:
  reranker: "jaccard"  # or "simple", "cosine"
```

The EvalHarness provides systematic testing capabilities:
Features:
- Load test queries from JSON
- Run full pipeline for each query
- Calculate performance metrics
- Generate detailed reports
- Export results for analysis
Output Files (saved in evaluation/ folder):
- `evaluation/eval_results_<timestamp>.json`: Individual query results
- `evaluation/eval_report_<timestamp>.json`: Aggregate metrics and statistics
All pipeline stages are logged to JSONL files in the logs/ directory for debugging and analysis.
Ensure all project files are in the same directory.
```
pip install -r requirements.txt
```

Dependencies include:
- `PyYAML>=6.0`: Configuration file parsing
- `google-generativeai>=0.3.0`: Gemini API integration
- `python-dotenv>=0.19.0`: Environment variable management
Important: The system requires a Google API key for the Gemini AI answer agent.
- Visit: https://aistudio.google.com/app/apikey
- Sign in with your Google account
- Click "Create API key" or "Get API key"
- Copy the generated key
- Copy the template file:

  ```
  cp env.env.template env.env
  ```

- Edit `env.env` and add your API key:

  ```
  GOOGLE_API_KEY=your-actual-api-key-here
  ```

- The system will automatically load the key from `env.env` at startup.
For detailed setup instructions, see: ENV_SETUP_GUIDE.md
```
# Verify the Python version (should show 3.11 or higher)
python --version

# Test imports
python -c "import yaml, google.generativeai"

# Run a single query
python main.py --config config.yaml --q "Who is Professor Ganiz?"

# Run batch evaluation
python main.py --config config.yaml --batch eval_queries.json --k 5
```

The RAG pipeline follows a clear 5-stage flow, orchestrated by ChatBot:
```
User Question
      ↓
[1] Intent Detection (RuleIntentDetector)
      ↓ Intent
[2] Query Writing (HeuristicQueryWriter)
      ↓ Search Terms
[3] Retrieval (KeywordRetriever)
      ↓ Top-K Hits
[4] Reranking (Multiple Strategies Available)
      ↓ Ranked Hits
[5] Answer Generation (GeminiAnswerAgent)
      ↓
Final Answer with Citations
```
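Conceptually, the controller reduces to five sequential calls. The sketch below is illustrative: the stage method names are assumptions, and the real ChatBot in helpers/chat_bot.py also publishes trace events at each stage.

```python
# Simplified view of the controller (method names are assumed).

class ChatBot:
    def __init__(self, intent_detector, query_writer, retriever,
                 reranker, answer_agent):
        # All five collaborators are injected as interface implementations.
        self.intent_detector = intent_detector
        self.query_writer = query_writer
        self.retriever = retriever
        self.reranker = reranker
        self.answer_agent = answer_agent

    def answer(self, question: str):
        intent = self.intent_detector.detect(question)       # [1] intent
        terms = self.query_writer.write(question, intent)    # [2] search terms
        hits = self.retriever.retrieve(terms)                # [3] top-k hits
        ranked = self.reranker.rerank(terms, hits)           # [4] ranked hits
        return self.answer_agent.generate(question, ranked)  # [5] final answer
```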
- Strategy Pattern: All pipeline stages implement interfaces (I*) for easy swapping
- Factory Pattern: RerankerFactory and AnswerAgentFactory create instances based on config
- Observer Pattern: TraceBus publishes events to TraceObservers for logging
- Controller Pattern: ChatBot coordinates the pipeline (GRASP)
- Information Expert: Each entity knows its own data and operations (GRASP)
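The Factory side of this design can be pictured as a small registry keyed by the strategy names from config.yaml. A hedged sketch; the constructor signatures and the create() method name are assumptions, not copied from the code.

```python
# Hedged sketch of config-driven instantiation (Factory + Strategy).
from services.simple_reranker import SimpleReranker
from services.jaccard_reranker import JaccardReranker

class RerankerFactory:
    _registry = {
        "simple": SimpleReranker,
        "jaccard": JaccardReranker,
        # "cosine" would map to a cosine-similarity implementation
    }

    @classmethod
    def create(cls, name: str, params: dict):
        if name not in cls._registry:
            raise ValueError(f"Unknown reranker strategy: {name}")
        return cls._registry[name](**params)  # assumed keyword-arg constructors
```

With this shape, ChatBot stays decoupled from concrete rerankers: it only ever sees whatever the factory returns for the configured strategy name.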
Intent Detection:
- Class: `RuleIntentDetector`
- Logic: Keyword matching against `intentRules` in config
- Output: One of `StaffLookup`, `Course`, `Registration`, `PolicyFAQ` (or `Unknown` when no rule matches)
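A minimal sketch of this rule-based matching, assuming intentRules maps intent names to keyword lists and intentPriority breaks ties, as outlined in the config.yaml section above:

```python
# Minimal sketch of rule-based intent detection (illustrative only).

def detect_intent(question: str, intent_rules: dict[str, list[str]],
                  priority: list[str]) -> str:
    words = set(question.lower().split())
    scores = {intent: len(words & set(keywords))
              for intent, keywords in intent_rules.items()}
    best = max(scores.values(), default=0)
    if best == 0:
        return "Unknown"  # no rule matched
    tied = [i for i, s in scores.items() if s == best]
    # Earlier position in intentPriority wins ties.
    return min(tied, key=lambda i: priority.index(i) if i in priority else len(priority))
```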
Query Writing:
- Class: `HeuristicQueryWriter`
- Logic:
  - Remove stopwords from the user question
  - Add intent-specific booster terms
  - Validate input (raises `ValueError` for None parameters)
- Output: List of optimized search terms
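In sketch form (the stopwords and booster terms come from config.yaml; the boosters mapping shown here is an assumed shape):

```python
# Sketch of stopword filtering plus intent boosting (illustrative).

def write_query(question: str, intent: str,
                stopwords: set[str], boosters: dict[str, list[str]]) -> list[str]:
    if question is None or intent is None:
        raise ValueError("question and intent must not be None")
    terms = [w for w in question.lower().split() if w not in stopwords]
    terms.extend(boosters.get(intent, []))  # e.g. "staff" for StaffLookup
    return terms
```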
Retrieval:
- Class: `KeywordRetriever`
- Logic: Term Frequency (TF) lookup in an inverted index
- Parameters: `retrieverK` (number of documents to fetch)
- Output: Top-K document chunks
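A compact sketch of TF lookup, assuming index.json maps each term to a dict of chunk IDs and term frequencies (the actual index layout may differ):

```python
# Sketch of TF scoring over an inverted index (index shape is assumed).
from collections import Counter

def retrieve(terms: list[str], index: dict[str, dict[str, int]],
             k: int) -> list[tuple[str, int]]:
    scores: Counter[str] = Counter()
    for term in terms:
        for chunk_id, tf in index.get(term, {}).items():
            scores[chunk_id] += tf  # accumulate term frequency per chunk
    return scores.most_common(k)    # top-k (chunk_id, score) pairs
```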
Reranking:
- Class: `SimpleReranker`
- Logic:
  - Proximity scoring (terms close together score higher)
  - Title boost (terms in the title get a bonus)
- Parameters: `proximityWindow`, `proximityBonus`, `titleBoost`
- Output: Ranked list of hits with scores
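The scoring idea can be sketched as follows; this is illustrative only, and the real SimpleReranker may weight and combine these factors differently. The defaults mirror the config.yaml parameters.

```python
# Illustrative proximity scoring (not the exact SimpleReranker logic).

def proximity_score(tokens: list[str], title_tokens: list[str], terms: list[str],
                    window: int = 15, bonus: int = 5, title_boost: int = 3) -> float:
    positions = [i for i, tok in enumerate(tokens) if tok in terms]
    score = float(len(positions))
    # Bonus for each consecutive pair of matched terms within the window.
    for a, b in zip(positions, positions[1:]):
        if b - a <= window:
            score += bonus
    # Boost query terms that also appear in the title.
    score += title_boost * sum(1 for t in terms if t in title_tokens)
    return score
```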
Answer Generation:
- Class: `TemplateAnswerAgent`
- Logic: Format the top-ranked chunk into a readable answer with citations
- Output: Formatted answer string
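A rough sketch of the template formatting, using the "SOURCES: [1] file:section:span" citation style visible in the example outputs above; the hit's field names are assumptions.

```python
# Rough template formatting sketch (field names are assumed).

def format_answer(top_hit: dict) -> str:
    citation = f"[1] {top_hit['doc_id']}:{top_hit['section']}:{top_hit['span']}"
    return f"{top_hit['text']}\nSOURCES:\n{citation}"
```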
- Controller: `ChatBot` manages the entire pipeline flow
- Information Expert: Each service knows its own domain (e.g., `KeywordRetriever` knows how to search)
- Low Coupling: Services depend only on interfaces, not concrete classes
- High Cohesion: Each class has a single, well-defined responsibility
- Single Responsibility: Each service handles one pipeline stage
- Open/Closed: New strategies can be added without modifying existing code
- Liskov Substitution: Any implementation of an interface can replace another
- Interface Segregation: Small, focused interfaces (e.g., `IIntentDetector`)
- Dependency Inversion: The high-level orchestrator depends on abstractions (interfaces)
- Strategy Pattern: Pluggable algorithms via service interfaces
- Observer Pattern: Trace logging with `TraceBus` and observers
- Factory Pattern: Configuration-driven service instantiation
Results are appended to CSE3063F25_Grp15_Iter2_7_CLI_output.txt:
```
[2025-11-27 11:14:21] Q: Where is Alkaya?
A: Name: Ali Fuat ALKAYA
   Office: M2-249
...
```
Detailed execution logs are written to logs/run-YYYYMMDD-HHMMSS.jsonl:
{"timestamp": "2025-11-27T08:14:21.992447Z", "stage": "IntentDetector", "input": "where is alkaya", "output": "StaffLookup", "durationMs": 0}
{"timestamp": "2025-11-27T08:14:21.992447Z", "stage": "QueryWriter", "input": "Intent: StaffLookup", "output": "['alkaya', 'staff', ...]", "durationMs": 0}
{"timestamp": "2025-11-27T08:14:21.993447Z", "stage": "Retriever", "input": ["alkaya", "staff", ...], "output": "Hits found: 6", "durationMs": 0}
{"timestamp": "2025-11-27T08:14:21.993447Z", "stage": "Reranker", "input": "Input Hits: 6", "output": "Top Score: 170.0", "durationMs": 0}
{"timestamp": "2025-11-27T08:14:21.993447Z", "stage": "AnswerAgent", "input": "Top Hit: staff", "output": "Name: Ali Fuat ALKAYA...", "durationMs": 0}Each log entry contains:
- `timestamp`: ISO 8601 format
- `stage`: Pipeline stage name
- `input`: Input to the stage
- `output`: Output from the stage
- `durationMs`: Execution time in milliseconds
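The Observer wiring behind these logs can be pictured as below; the class names follow the files in trace/, but the method signatures are assumptions.

```python
# Conceptual Observer wiring (signatures are assumed, not copied).
import json

class TraceBus:
    """Publishes trace events to all subscribed observers."""
    def __init__(self):
        self._observers = []

    def subscribe(self, observer) -> None:
        self._observers.append(observer)

    def publish(self, event: dict) -> None:
        for obs in self._observers:
            obs.on_event(event)

class JsonlTraceSink:
    """Observer that appends each event as one JSON line."""
    def __init__(self, path: str):
        self.path = path

    def on_event(self, event: dict) -> None:
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(event) + "\n")
```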
While maintaining the same architecture and logic, the Python implementation uses language-appropriate idioms:
- Data Structures: Python `dict`, `list`, `set` instead of Java collections
- Enums: Python `enum.Enum` instead of Java enums
- Interfaces: Python ABCs (Abstract Base Classes) instead of Java interfaces
- File I/O: Python's `open()` with context managers instead of Java's BufferedReader/Writer
- Configuration: PyYAML library instead of Jackson
- JSON: Python's built-in `json` module instead of Jackson
- Exceptions: `ValueError`/`TypeError` for validation instead of custom exceptions
- Type Hints: Python type annotations for better code documentation
Problem: ModuleNotFoundError: No module named 'yaml'
```
# Solution: Install dependencies
pip install -r requirements.txt
```

Problem: `FileNotFoundError: config.yaml not found`
```
# Solution: Ensure you're running from the project directory
cd python_version
python main.py --config config.yaml --q "your question"
```

Problem: Tests fail with import errors
```
# Solution: Install pytest and other test dependencies
pip install pytest pytest-cov
```

Problem: Coverage report shows "No data was collected"
```
# Solution: Run tests without the --cov=src flag;
# the project structure doesn't use a 'src' directory
pytest tests/ --cov=. --cov-report=term-missing --cov-report=html
```

Problem: `google.generativeai` errors or API quota exceeded
- Explanation: The Gemini API requires a valid API key and has rate limits
- Solution:
  - Verify your API key is set correctly in `env.env`
  - Check your API quota at https://aistudio.google.com/
  - Consider using the template answer agent instead:

    ```yaml
    strategies:
      answerAgent: "template"
    ```
Problem: Zero duration in logs
- Explanation: Operations are very fast (< 1ms), so they round to 0
- Note: This is expected behavior for the baseline implementation