Course: CSE3063 - Object-Oriented Analysis and Design
Term Project: Iteration 2 (Extensibility & Evaluation)
Language: Python 3.11+
Test Suite: 88 passing unit tests
This is the Python implementation of a modular Retrieval-Augmented Generation (RAG) chatbot designed to answer questions about the Marmara University Computer Engineering department (staff, courses, policies).
- AI-Powered Answer Generation: Gemini API integration for natural language responses
- Comprehensive Test Suite: 88 unit tests covering all major components
- Evaluation Framework: Systematic testing with EvalHarness
- Batch Processing: CLI mode for evaluating multiple queries with performance metrics
- Multiple Reranking Strategies: Jaccard similarity, Cosine similarity, and Simple proximity-based
- Production-Ready Logging: JSONL trace logs for debugging and analysis
| Component | Implementation | Description |
|---|---|---|
| Controller | ChatBot | GRASP Controller - orchestrates the full pipeline |
| Intent Detection | RuleIntentDetector | Rule-based keyword matching |
| Query Writing | HeuristicQueryWriter | Stopword filtering & intent boosting |
| Retrieval | KeywordRetriever | TF-based keyword retrieval |
| Reranking | Multiple Strategies | Jaccard, Cosine, Simple proximity |
| Answer Generation | GeminiAnswerAgent | AI-powered contextual responses |
The project is designed to run self-contained from the root directory.
.
├── main.py # Entry Point (Single & Batch Modes)
├── config.yaml # Main Configuration File
├── chunks.json # Document Data Store
├── index.json # Search Index
├── requirements.txt # Python Dependencies
├── env.env # Environment Variables (API Key) - DO NOT COMMIT
├── env.env.template # Template for env.env
├── eval_queries.json # Sample Evaluation Queries
├── CSE3063F25_Grp15_Iter2_7_CLI_output.txt # Persistent Output Log
├── README.md # This file
├── ENV_SETUP_GUIDE.md # Environment Setup Guide
│
├── evaluation/ # Evaluation Results
│ ├── eval_results_*.json # Individual query results
│ └── eval_report_*.json # Aggregate metrics
│
├── logs/ # Execution trace logs (.jsonl)
│ └── run-*.jsonl # Timestamped trace logs
│
├── config/ # Configuration Loading & Data Structures
│ ├── __init__.py
│ ├── app_config.py # Application Configuration Class
│ └── config_loader.py # YAML Config Parser
│
├── entities/ # Domain Objects
│ ├── __init__.py
│ ├── answer.py # Answer Entity
│ ├── chunk.py # Document Chunk
│ ├── context.py # Pipeline Context
│ ├── hit.py # Retrieval Hit
│ ├── intent.py # Intent Enum
│ ├── eval_query.py # Evaluation Query Entity
│ └── eval_result.py # Evaluation Result Entity
│
├── helpers/ # Controllers & Utilities
│ ├── __init__.py
│ ├── output_writer.py # File Output Handler
│ ├── chat_bot.py # Main Pipeline Controller
│ ├── eval_harness.py # Evaluation Framework
│ ├── batch_eval_runner.py # Batch Evaluation Runner
│ ├── reranker_factory.py # Reranker Factory Pattern
│ └── answer_agent_factory.py # Answer Agent Factory Pattern
│
├── service_interfaces/ # Interfaces for Pipeline Stages (Strategy Pattern)
│ ├── __init__.py
│ ├── i_answer_agent.py # Answer Generation Interface
│ ├── i_intent_detector.py # Intent Detection Interface
│ ├── i_query_writer.py # Query Writing Interface
│ ├── i_reranker.py # Reranking Interface
│ └── i_retriever.py # Retrieval Interface
│
├── services/ # Concrete Implementations of Strategies
│ ├── __init__.py
│ ├── heuristic_query_writer.py # Stopword Filtering & Intent Boosting
│ ├── keyword_retriever.py # TF-based Keyword Retrieval
│ ├── rule_intent_detector.py # Rule-based Intent Detection
│ ├── simple_reranker.py # Proximity-based Reranking
│ ├── jaccard_reranker.py # Jaccard Similarity Reranker
│ ├── template_answer_agent.py # Template-based Answer Generation
│ └── gemini_answer_agent.py # Gemini AI Answer Agent
│
├── tests/ # Unit Tests (88 tests)
│ ├── __init__.py
│ ├── test_answer_agent_factory.py # Factory pattern tests (5 tests)
│ ├── test_gemini_answer_agent.py # AI agent tests (23 tests)
│ ├── test_heuristic_query_writer.py # Query writer tests (6 tests)
│ ├── test_jaccard_reranker.py # Jaccard reranker tests (13 tests)
│ ├── test_keyword_retriever.py # Retriever tests (23 tests)
│ ├── test_reranker_factory.py # Reranker factory tests (12 tests)
│ └── test_rule_intent_detector.py # Intent detection tests (6 tests)
│
└── trace/ # Observer Pattern for Logging
├── __init__.py
├── jsonl_trace_sink.py # JSONL File Logger
├── trace_bus.py # Event Publisher
├── trace_event.py # Trace Event Model
└── trace_observer.py # Observer Interface
The application's logic is driven entirely by config.yaml. This fulfills the requirement for "Config-driven strategy selection."
config.yaml
├── strategies/ # Strategy Selection (Class Mapping)
│ ├── intentDetector # "RuleBased"
│ ├── queryWriter # "Heuristic"
│ ├── retriever # "Keyword"
│ ├── reranker # "jaccard" (simple, jaccard, cosine)
│ └── answerAgent # "gemini" (gemini, template)
│
├── parameters/ # Algorithm Tuning & Logic
│ ├── retrieverK # (int) Number of docs to fetch (default: 6)
│ ├── proximityBonus # (int) Score bonus for close terms (default: 5)
│ ├── titleBoost # (int) Score multiplier for titles (default: 3)
│ ├── proximityWindow # (int) Max distance for proximity check (default: 15)
│ └── intentPriority/ # (List) Tie-breaking order
│ ├── StaffLookup
│ ├── Registration
│ ├── PolicyFAQ
│ └── Course
│
├── stopwords/ # (List) Common words to ignore
│ ├── "a", "about", "am", "an", "and"...
│ └── ... (70+ words)
│
└── intentRules/ # (Map) Knowledge Base for Detection & Boosting
├── StaffLookup/ # Keywords for staff queries
│ ├── "professor", "staff", "instructor"
│ ├── "office", "email", "contact"
│ └── ...
├── Registration/ # Keywords for enrollment/admin
│ ├── "enroll", "register", "deadline"
│ └── ...
├── PolicyFAQ/ # Keywords for rules/exams
│ ├── "regulation", "grade", "exam"
│ └── ...
└── Course/ # Keywords for curriculum
├── "credit", "syllabus", "prerequisite"
    └── ...

Prerequisites:
- Python 3.11 or higher installed (`python --version`)
- Required libraries installed (`pip install -r requirements.txt`)
- Google API key set in the `env.env` file (for Gemini AI)
- The files `main.py`, `config.yaml`, `chunks.json`, and `index.json` must be in the same folder.
Install dependencies:

```
pip install -r requirements.txt
```

Create an `env.env` file in the project root:

```
# Copy the template
cp env.env.template env.env

# Edit env.env and add your Google API key:
GOOGLE_API_KEY=your-actual-api-key-here
```

Get your API key from: https://aistudio.google.com/app/apikey
For detailed setup instructions, see ENV_SETUP_GUIDE.md
Run the application with a single question.
```
python main.py --config config.yaml --q "<Your Question Here>"
```

Example:

```
python main.py --config config.yaml --q "Who is Professor Ganiz?"
```

Run batch evaluation on multiple test queries.

```
python main.py --config config.yaml --batch <query_file.json> --k <coverage_k>
```

Example:

```
python main.py --config config.yaml --batch eval_queries.json --k 5
```

Output: Results are saved to the evaluation/ folder.
Querying for a specific professor's details.
```
python main.py --config config.yaml --q "Who is Professor Ganiz?"
```

Expected Output:

```
Intent.StaffLookup
===============================
Professor Murat Can Ganiz is a faculty member in the Computer Engineering department.
Office: M2-123
Email: mganiz@marmara.edu.tr
Research Areas: Machine Learning, Natural Language Processing
SOURCES:
[1] staff.txt:section1:100-250
===============================
```
Querying for specific course prerequisites or credits.
```
python main.py --config config.yaml --q "How many credits does CSE3063 have?"
```

Expected Output:

```
Intent.Course
===============================
CSE3063 (Object-Oriented Analysis and Design) is a 4-credit course.
Prerequisites: CSE2034
Description: This course covers object-oriented programming principles...
SOURCES:
[1] courses.txt:section2:500-750
===============================
```
Running systematic evaluation on multiple test queries.
```
python main.py --config config.yaml --batch eval_queries.json --k 5
```

Expected Output:

```
Running batch evaluation from: eval_queries.json
K value for coverage@k: 5
Loaded 5 evaluation queries
Evaluated: Who is Professor Ganiz?... (Intent: True, Coverage@5: 1.00, Latency: 1245ms)
Evaluated: What is the office of Murat Can Ganiz?... (Intent: True, Coverage@5: 1.00, Latency: 1189ms)
...
================================================================================
EVALUATION REPORT
================================================================================
Total Queries Evaluated: 5
K Value (for coverage@k): 5
--------------------------------------------------------------------------------
INTENT ACCURACY
--------------------------------------------------------------------------------
Accuracy: 100.00% (5/5)
--------------------------------------------------------------------------------
COVERAGE@5
--------------------------------------------------------------------------------
Average: 80.00%
Median: 100.00%
Min: 0.00%
Max: 100.00%
--------------------------------------------------------------------------------
LATENCY (milliseconds)
--------------------------------------------------------------------------------
Average: 1234 ms
Median: 1210 ms
Min: 987 ms
Max: 1456 ms
Results saved to: evaluation/eval_results_20251218-120000.json
Report saved to: evaluation/eval_report_20251218-120000.json
================================================================================
```
The batch evaluation mode expects a JSON file with the following structure:

```json
[
{
"question": "Who is Professor Ganiz?",
"expected_intent": "StaffLookup",
"expected_docs": ["staff"],
"expected_answer": null
},
{
"question": "How many credits does CSE3063 have?",
"expected_intent": "Course",
"expected_docs": ["course_catalog"],
"expected_answer": null
}
]
```

Field descriptions:

- question: The test question to evaluate
- expected_intent: Expected intent classification (StaffLookup, Course, Registration, PolicyFAQ, Unknown)
- expected_docs: List of expected relevant document IDs
- expected_answer: (Optional) Expected answer text for accuracy evaluation
The EvalHarness calculates the following metrics:
Percentage of queries where the detected intent matches the expected intent.
Intent Accuracy = (Correct Intent Classifications) / (Total Queries)
Measures how many of the expected relevant documents appear in the top-k retrieved results.
Coverage@k = (Expected Docs in Top-k) / (Total Expected Docs)
Time taken (in milliseconds) to process the entire RAG pipeline for a query.
- Average, median, min, and max latency are reported
All metrics are also computed per intent type for detailed analysis.
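As an illustration, the two core metrics reduce to a few lines of Python. This is a hedged sketch; the actual EvalHarness in helpers/eval_harness.py may structure its inputs differently.

```python
# Hedged sketch of the two core metrics; the result/field names here
# are assumptions, not copied from helpers/eval_harness.py.

def intent_accuracy(results: list[dict]) -> float:
    """Fraction of queries whose detected intent matches the expected one."""
    correct = sum(1 for r in results
                  if r["detected_intent"] == r["expected_intent"])
    return correct / len(results) if results else 0.0

def coverage_at_k(expected_docs: list[str], retrieved_docs: list[str], k: int) -> float:
    """Share of expected docs found among the top-k retrieved docs."""
    top_k = set(retrieved_docs[:k])
    found = sum(1 for doc in expected_docs if doc in top_k)
    return found / len(expected_docs) if expected_docs else 0.0

# Example: the one expected doc ("staff") appears in the top 5 -> 1.0
print(coverage_at_k(["staff"], ["staff", "courses", "policy"], k=5))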
The system uses Google's Gemini API for natural language answer generation.
Benefits:
- Natural, contextual answers
- Better understanding of complex queries
- Citation integration with source references
Configuration:
Create env.env file in project root with your API key:
GOOGLE_API_KEY=your-api-key-here
The system automatically loads the API key from env.env at startup.
See ENV_SETUP_GUIDE.md for detailed instructions.
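For reference, startup loading with python-dotenv typically looks like the sketch below; the project's actual loading code may differ.

```python
# Rough sketch of startup key loading with python-dotenv; illustrative,
# not the project's exact implementation.
import os
from dotenv import load_dotenv

load_dotenv("env.env")  # read variables from the env.env file
api_key = os.getenv("GOOGLE_API_KEY")
if not api_key:
    raise RuntimeError("GOOGLE_API_KEY is not set; see ENV_SETUP_GUIDE.md")
```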
Choose from different reranking algorithms via config.yaml:
- simple: Proximity-based scoring (default)
- jaccard: Jaccard similarity coefficient (see the sketch below)
- cosine: Cosine similarity with TF-IDF vectors
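For intuition, the core of the Jaccard strategy is set overlap between query terms and chunk terms. A minimal sketch; the real JaccardReranker in services/jaccard_reranker.py may tokenize and normalize differently, and the "terms" field on a hit is an assumed shape.

```python
# Minimal core of Jaccard reranking (illustrative only).

def jaccard_similarity(query_terms: set[str], chunk_terms: set[str]) -> float:
    """|intersection| / |union| of the two term sets."""
    union = query_terms | chunk_terms
    return len(query_terms & chunk_terms) / len(union) if union else 0.0

def rerank(query_terms: set[str], hits: list[dict]) -> list[dict]:
    # Each hit is assumed to carry a "terms" set built from its chunk text.
    return sorted(hits,
                  key=lambda h: jaccard_similarity(query_terms, h["terms"]),
                  reverse=True)
```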
Example configuration:

```yaml
strategies:
  reranker: "jaccard"  # or "simple", "cosine"
```

The EvalHarness provides systematic testing capabilities:
Features:
- Load test queries from JSON
- Run full pipeline for each query
- Calculate performance metrics
- Generate detailed reports
- Export results for analysis
Output Files (saved in evaluation/ folder):
- `evaluation/eval_results_<timestamp>.json`: Individual query results
- `evaluation/eval_report_<timestamp>.json`: Aggregate metrics and statistics
All pipeline stages are logged to JSONL files in the logs/ directory for debugging and analysis.
Ensure all project files are in the same directory.
```
pip install -r requirements.txt
```

Dependencies include:
- `PyYAML>=6.0`: Configuration file parsing
- `google-generativeai>=0.3.0`: Gemini API integration
- `python-dotenv>=0.19.0`: Environment variable management
Important: The system requires a Google API key for the Gemini AI answer agent.
- Visit: https://aistudio.google.com/app/apikey
- Sign in with your Google account
- Click "Create API key" or "Get API key"
- Copy the generated key
- Copy the template file:

  ```
  cp env.env.template env.env
  ```

- Edit `env.env` and add your API key:

  ```
  GOOGLE_API_KEY=your-actual-api-key-here
  ```

- The system will automatically load the key from `env.env` at startup.
For detailed setup instructions, see: ENV_SETUP_GUIDE.md
```
# Verify the Python version (should show 3.11 or higher)
python --version

# Test imports
python -c "import yaml, google.generativeai"

# Run a single query
python main.py --config config.yaml --q "Who is Professor Ganiz?"

# Run batch evaluation
python main.py --config config.yaml --batch eval_queries.json --k 5
```

The RAG pipeline follows a clear 5-stage flow, orchestrated by ChatBot:
```
User Question
      ↓
[1] Intent Detection (RuleIntentDetector)
      ↓ Intent
[2] Query Writing (HeuristicQueryWriter)
      ↓ Search Terms
[3] Retrieval (KeywordRetriever)
      ↓ Top-K Hits
[4] Reranking (Multiple Strategies Available)
      ↓ Ranked Hits
[5] Answer Generation (GeminiAnswerAgent)
      ↓
Final Answer with Citations
```
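Conceptually, the controller reduces to five sequential calls. The sketch below is illustrative: the stage method names are assumptions, and the real ChatBot in helpers/chat_bot.py also publishes trace events at each stage.

```python
# Simplified view of the controller (method names are assumed).

class ChatBot:
    def __init__(self, intent_detector, query_writer, retriever,
                 reranker, answer_agent):
        # All five collaborators are injected as interface implementations.
        self.intent_detector = intent_detector
        self.query_writer = query_writer
        self.retriever = retriever
        self.reranker = reranker
        self.answer_agent = answer_agent

    def answer(self, question: str):
        intent = self.intent_detector.detect(question)       # [1] intent
        terms = self.query_writer.write(question, intent)    # [2] search terms
        hits = self.retriever.retrieve(terms)                # [3] top-k hits
        ranked = self.reranker.rerank(terms, hits)           # [4] ranked hits
        return self.answer_agent.generate(question, ranked)  # [5] final answer
```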
- Strategy Pattern: All pipeline stages implement interfaces (I*) for easy swapping
- Factory Pattern: RerankerFactory and AnswerAgentFactory create instances based on config
- Observer Pattern: TraceBus publishes events to TraceObservers for logging
- Controller Pattern: ChatBot coordinates the pipeline (GRASP)
- Information Expert: Each entity knows its own data and operations (GRASP)
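The Factory side of this design can be pictured as a small registry keyed by the strategy names from config.yaml. A hedged sketch; the constructor signatures and the create() method name are assumptions, not copied from the code.

```python
# Hedged sketch of config-driven instantiation (Factory + Strategy).
from services.simple_reranker import SimpleReranker
from services.jaccard_reranker import JaccardReranker

class RerankerFactory:
    _registry = {
        "simple": SimpleReranker,
        "jaccard": JaccardReranker,
        # "cosine" would map to a cosine-similarity implementation
    }

    @classmethod
    def create(cls, name: str, params: dict):
        if name not in cls._registry:
            raise ValueError(f"Unknown reranker strategy: {name}")
        return cls._registry[name](**params)  # assumed keyword-arg constructors
```

With this shape, ChatBot stays decoupled from concrete rerankers: it only ever sees whatever the factory returns for the configured strategy name.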
Intent Detection:
- Class: `RuleIntentDetector`
- Logic: Keyword matching against `intentRules` in config
- Output: One of `StaffLookup`, `Course`, `Registration`, `PolicyFAQ` (or `Unknown` when no rule matches)
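A minimal sketch of this rule-based matching, assuming intentRules maps intent names to keyword lists and intentPriority breaks ties, as outlined in the config.yaml section above:

```python
# Minimal sketch of rule-based intent detection (illustrative only).

def detect_intent(question: str, intent_rules: dict[str, list[str]],
                  priority: list[str]) -> str:
    words = set(question.lower().split())
    scores = {intent: len(words & set(keywords))
              for intent, keywords in intent_rules.items()}
    best = max(scores.values(), default=0)
    if best == 0:
        return "Unknown"  # no rule matched
    tied = [i for i, s in scores.items() if s == best]
    # Earlier position in intentPriority wins ties.
    return min(tied, key=lambda i: priority.index(i) if i in priority else len(priority))
```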
Query Writing:
- Class: `HeuristicQueryWriter`
- Logic:
  - Remove stopwords from the user question
  - Add intent-specific booster terms
  - Validate input (raises `ValueError` for None parameters)
- Output: List of optimized search terms
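In sketch form (the stopwords and booster terms come from config.yaml; the boosters mapping shown here is an assumed shape):

```python
# Sketch of stopword filtering plus intent boosting (illustrative).

def write_query(question: str, intent: str,
                stopwords: set[str], boosters: dict[str, list[str]]) -> list[str]:
    if question is None or intent is None:
        raise ValueError("question and intent must not be None")
    terms = [w for w in question.lower().split() if w not in stopwords]
    terms.extend(boosters.get(intent, []))  # e.g. "staff" for StaffLookup
    return terms
```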
Retrieval:
- Class: `KeywordRetriever`
- Logic: Term Frequency (TF) lookup in an inverted index
- Parameters: `retrieverK` (number of documents to fetch)
- Output: Top-K document chunks
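A compact sketch of TF lookup, assuming index.json maps each term to a dict of chunk IDs and term frequencies (the actual index layout may differ):

```python
# Sketch of TF scoring over an inverted index (index shape is assumed).
from collections import Counter

def retrieve(terms: list[str], index: dict[str, dict[str, int]],
             k: int) -> list[tuple[str, int]]:
    scores: Counter[str] = Counter()
    for term in terms:
        for chunk_id, tf in index.get(term, {}).items():
            scores[chunk_id] += tf  # accumulate term frequency per chunk
    return scores.most_common(k)    # top-k (chunk_id, score) pairs
```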
Reranking:
- Class: `SimpleReranker`
- Logic:
  - Proximity scoring (terms close together score higher)
  - Title boost (terms in the title get a bonus)
- Parameters: `proximityWindow`, `proximityBonus`, `titleBoost`
- Output: Ranked list of hits with scores
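The scoring idea can be sketched as follows; this is illustrative only, and the real SimpleReranker may weight and combine these factors differently. The defaults mirror the config.yaml parameters.

```python
# Illustrative proximity scoring (not the exact SimpleReranker logic).

def proximity_score(tokens: list[str], title_tokens: list[str], terms: list[str],
                    window: int = 15, bonus: int = 5, title_boost: int = 3) -> float:
    positions = [i for i, tok in enumerate(tokens) if tok in terms]
    score = float(len(positions))
    # Bonus for each consecutive pair of matched terms within the window.
    for a, b in zip(positions, positions[1:]):
        if b - a <= window:
            score += bonus
    # Boost query terms that also appear in the title.
    score += title_boost * sum(1 for t in terms if t in title_tokens)
    return score
```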
Answer Generation:
- Class: `TemplateAnswerAgent`
- Logic: Format the top-ranked chunk into a readable answer with citations
- Output: Formatted answer string
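A rough sketch of the template formatting, using the "SOURCES: [1] file:section:span" citation style visible in the example outputs above; the hit's field names are assumptions.

```python
# Rough template formatting sketch (field names are assumed).

def format_answer(top_hit: dict) -> str:
    citation = f"[1] {top_hit['doc_id']}:{top_hit['section']}:{top_hit['span']}"
    return f"{top_hit['text']}\nSOURCES:\n{citation}"
```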
- Controller: `ChatBot` manages the entire pipeline flow
- Information Expert: Each service knows its own domain (e.g., `KeywordRetriever` knows how to search)
- Low Coupling: Services depend only on interfaces, not concrete classes
- High Cohesion: Each class has a single, well-defined responsibility
- Single Responsibility: Each service handles one pipeline stage
- Open/Closed: New strategies can be added without modifying existing code
- Liskov Substitution: Any implementation of an interface can replace another
- Interface Segregation: Small, focused interfaces (e.g., `IIntentDetector`)
- Dependency Inversion: The high-level orchestrator depends on abstractions (interfaces)
- Strategy Pattern: Pluggable algorithms via service interfaces
- Observer Pattern: Trace logging with `TraceBus` and observers
- Factory Pattern: Configuration-driven service instantiation
Results are appended to CSE3063F25_Grp15_Iter2_7_CLI_output.txt:
```
[2025-11-27 11:14:21] Q: Where is Alkaya?
A: Name: Ali Fuat ALKAYA
   Office: M2-249
...
```
Detailed execution logs are written to logs/run-YYYYMMDD-HHMMSS.jsonl:
{"timestamp": "2025-11-27T08:14:21.992447Z", "stage": "IntentDetector", "input": "where is alkaya", "output": "StaffLookup", "durationMs": 0}
{"timestamp": "2025-11-27T08:14:21.992447Z", "stage": "QueryWriter", "input": "Intent: StaffLookup", "output": "['alkaya', 'staff', ...]", "durationMs": 0}
{"timestamp": "2025-11-27T08:14:21.993447Z", "stage": "Retriever", "input": ["alkaya", "staff", ...], "output": "Hits found: 6", "durationMs": 0}
{"timestamp": "2025-11-27T08:14:21.993447Z", "stage": "Reranker", "input": "Input Hits: 6", "output": "Top Score: 170.0", "durationMs": 0}
{"timestamp": "2025-11-27T08:14:21.993447Z", "stage": "AnswerAgent", "input": "Top Hit: staff", "output": "Name: Ali Fuat ALKAYA...", "durationMs": 0}Each log entry contains:
- `timestamp`: ISO 8601 format
- `stage`: Pipeline stage name
- `input`: Input to the stage
- `output`: Output from the stage
- `durationMs`: Execution time in milliseconds
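The Observer wiring behind these logs can be pictured as below; the class names follow the files in trace/, but the method signatures are assumptions.

```python
# Conceptual Observer wiring (signatures are assumed, not copied).
import json

class TraceBus:
    """Publishes trace events to all subscribed observers."""
    def __init__(self):
        self._observers = []

    def subscribe(self, observer) -> None:
        self._observers.append(observer)

    def publish(self, event: dict) -> None:
        for obs in self._observers:
            obs.on_event(event)

class JsonlTraceSink:
    """Observer that appends each event as one JSON line."""
    def __init__(self, path: str):
        self.path = path

    def on_event(self, event: dict) -> None:
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(event) + "\n")
```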
While maintaining the same architecture and logic, the Python implementation uses language-appropriate idioms:
- Data Structures: Python `dict`, `list`, `set` instead of Java collections
- Enums: Python `enum.Enum` instead of Java enums
- Interfaces: Python ABCs (Abstract Base Classes) instead of Java interfaces
- File I/O: Python's `open()` with context managers instead of Java's BufferedReader/Writer
- Configuration: PyYAML library instead of Jackson
- JSON: Python's built-in `json` module instead of Jackson
- Exceptions: `ValueError`/`TypeError` for validation instead of custom exceptions
- Type Hints: Python type annotations for better code documentation
Problem: ModuleNotFoundError: No module named 'yaml'
```
# Solution: Install dependencies
pip install -r requirements.txt
```

Problem: `FileNotFoundError: config.yaml not found`
```
# Solution: Ensure you're running from the project directory
cd python_version
python main.py --config config.yaml --q "your question"
```

Problem: Tests fail with import errors
```
# Solution: Install pytest and other test dependencies
pip install pytest pytest-cov
```

Problem: Coverage report shows "No data was collected"
```
# Solution: Run tests without the --cov=src flag;
# the project structure doesn't use a 'src' directory
pytest tests/ --cov=. --cov-report=term-missing --cov-report=html
```

Problem: `google.generativeai` errors or API quota exceeded
- Explanation: The Gemini API requires a valid API key and has rate limits
- Solution:
  - Verify your API key is set correctly in `env.env`
  - Check your API quota at https://aistudio.google.com/
  - Consider using the template answer agent instead:

    ```yaml
    strategies:
      answerAgent: "template"
    ```
Problem: Zero duration in logs
- Explanation: Operations are very fast (< 1ms), so they round to 0
- Note: This is expected behavior for the baseline implementation