bechir23/Agentic-AI-Data-Science-Assistant-

Comparative study of two Agentic AI architectures for automated data science: hidden-tool agents vs transparent code-generating agents. Built with CrewAI, OpenAI GPT-4o, tested on Titanic & House Prices datasets.

What This Project Is About

During this practical work, I explored how AI agents can automate data analysis tasks. I built and tested two different approaches to see which one works better for real-world data science problems. Think of it as having virtual data science assistants that can handle everything from data exploration to model training and report writing.

I used the famous Titanic dataset (predicting passenger survival) and a house pricing dataset to put both systems through their paces. The goal was simple: let the AI agents do the heavy lifting while I evaluate how well they perform and where they struggle.

The Two Systems I Tested

System 1: The Behind-the-Scenes Approach

This system works like a traditional pipeline with four specialized agents working in sequence. Each agent has specific tools at its disposal, but all the Python code runs in the background where you can't see it.

How it works (a minimal sketch of this wiring in CrewAI follows the list):

  • Project Planner: Analyzes the business problem and decides on an approach
  • Data Analyst: Explores the dataset using pandas (you get statistics, but don't see the actual code)
  • Modeler: Trains machine learning models with scikit-learn
  • Report Writer: Puts everything together into a nice LaTeX report
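
To make the wiring concrete, here is a minimal sketch of how one of these agents could be defined in CrewAI with a hidden Python tool. It is an illustration under my own assumptions (the decorator-based tool and the exact role text), not a copy of the project's agents.py.

# Illustrative sketch only; not the project's agents.py.
from crewai import Agent, Crew, Process, Task
from crewai.tools import tool

@tool("explore_dataset")
def explore_dataset(csv_path: str) -> str:
    """Return summary statistics for a CSV file (pandas runs out of sight)."""
    import pandas as pd
    return pd.read_csv(csv_path).describe(include="all").to_string()

data_analyst = Agent(
    role="Data Analyst",
    goal="Explore the dataset and summarize its structure",
    backstory="A careful analyst who reports statistics, never raw code.",
    tools=[explore_dataset],
)

explore_task = Task(
    description="Explore data/titanic.csv and report the key statistics.",
    expected_output="A written summary of the dataset.",
    agent=data_analyst,
)

crew = Crew(agents=[data_analyst], tasks=[explore_task], process=Process.sequential)
# result = crew.kickoff()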

The good parts:

  • Super easy to use - just run it and wait for results
  • Works autonomously without needing much intervention
  • Produces clean, professional reports

The not-so-good parts:

  • You can't see what's happening under the hood
  • If something goes wrong, it's hard to debug
  • Sometimes it makes mistakes (like including ID columns in training) and you won't notice until you check the results carefully

Files to run:

python main_classification.py    # For Titanic survival prediction
python main_regression.py        # For house price prediction

System 2: The Transparent Code Generator

This one takes a completely different approach. Instead of hiding everything, it generates Python code that you can actually read, modify, and reuse. It's like having a coding buddy who writes the analysis for you.

How it works (the execute-and-retry pattern is sketched right after this list):

  • Code Planner: Figures out what code needs to be written
  • Code Generator: Actually writes complete Python scripts
  • Code Executor: Runs the code and checks for errors
  • Results Interpreter: Explains what the results mean
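
The key moving part is the Code Executor: run the generated script, and if it fails, hand the traceback back to the Code Generator for another attempt. A rough sketch of that pattern (the project's tools.py may implement it differently):

# Sketch of the execute-and-retry pattern; not the actual tools.py implementation.
import subprocess
import sys

def execute_snippet(code: str) -> tuple[bool, str]:
    """Run a generated Python script and report success plus captured output."""
    proc = subprocess.run([sys.executable, "-c", code],
                          capture_output=True, text=True)
    if proc.returncode == 0:
        return True, proc.stdout
    # On failure, the traceback is fed back to the Code Generator,
    # which rewrites the script for the next iteration.
    return False, proc.stderr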

The good parts:

  • Total transparency - you see every line of code
  • Can fix itself when it hits errors (I watched it correct 4 mistakes autonomously!)
  • You can extract the code and use it for other projects
  • Great for learning because you see the methodology

The not-so-good parts:

  • Uses more API tokens because it generates longer responses
  • Quality depends on how well you describe what you want
  • Takes longer to run because of the self-correction iterations

Files to run:

python main_code_interpreter.py classification    # For Titanic
python main_code_interpreter.py regression        # For house prices

What I Actually Discovered

The PassengerId Bug

Both systems initially made the same rookie mistake: they included the PassengerId column (just a number from 1 to 891) in the training features. This created fake correlations and inflated the accuracy scores. System 2 made it way easier to spot this bug because I could literally read the code line by line. With System 1, I had to dig through tool outputs to figure out what was happening.
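
For reference, the fix amounts to dropping the identifier before splitting the features. A minimal sketch, assuming the standard Titanic column names:

import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("data/titanic.csv")

# Identifiers carry no signal; keeping PassengerId in the features inflates accuracy.
X = df.drop(columns=["PassengerId", "Survived"])
y = df["Survived"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)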

Self-Correction (Self-healing)

The coolest thing I observed was System 2's ability to debug itself. During one test, it hit four errors in a row:

  1. Syntax error with a broken f-string
  2. Warning about escape sequences
  3. Tried to extract titles from the Name column... after already deleting it
  4. Finally figured out it needed to extract titles BEFORE dropping columns

Each iteration consumed API tokens, but watching an AI agent reason through its mistakes and fix them was genuinely impressive.
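
The ordering it finally converged on looks roughly like this (my reconstruction of the idea, not the generated script verbatim):

import pandas as pd

df = pd.read_csv("data/titanic.csv")

# Extract the honorific while Name still exists (a raw string avoids the escape-sequence warning)...
df["Title"] = df["Name"].str.extract(r" ([A-Za-z]+)\.", expand=False)

# ...and only then drop the free-text columns.
df = df.drop(columns=["Name", "Ticket", "Cabin"])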

API Limits and Costs

I'm using OpenAI GPT-4o for this project; the paid API doesn't impose the strict rate limits that free services do. However, I did initially try Groq's free tier (100k tokens/day) and hit the limit pretty quickly - a single run with the self-correction iterations consumed about 40k tokens!

For production use or if you want to avoid API costs entirely, switching to Ollama with a local model would be the way to go. The code supports all these options through a simple config change in .env.

Quick Start Guide

Prerequisites

You'll need Python 3.12 and an OpenAI API key (I'm using GPT-4o for this project).

Setup

# Navigate to the project directory
cd TP_Agentic_AI

# Create virtual environment
python -m venv .venv
.\.venv\Scripts\Activate.ps1     # Windows PowerShell (on Linux/macOS: source .venv/bin/activate)

# Install dependencies
pip install -r requirements.txt

# Configure your API key
# Edit .env and add your OPENAI_API_KEY
# The project is configured with LLM_MODE=openai by default

Run System 1 (Hidden Tools)

python main_classification.py
# Wait 3-5 minutes, generates outputs/titanic_report.tex

Run System 2 (Visible Code)

python main_code_interpreter.py classification
# Takes longer (5-10 min) but shows all code generation
# Generates outputs/titanic_code_report.tex

Compile Reports to PDF

# Using WSL with pdflatex installed
wsl pdflatex -interaction=nonstopmode outputs/titanic_report.tex

Project Structure

TP_Agentic_AI/
├── agents.py                      # System 1 agents (4 agents with hidden tools)
├── agents_code_interpreter.py     # System 2 agents (code generators)
├── crew_setup.py                  # System 1 task definitions
├── tools.py                       # Python execution tools for both systems
├── llama_llm.py                   # LLM configuration (OpenAI/Groq/Ollama)
├── main_classification.py         # System 1 entry point (Titanic)
├── main_regression.py             # System 1 entry point (House Prices)
├── main_code_interpreter.py       # System 2 entry point (both datasets)
├── data/
│   ├── titanic.csv               # Classification dataset (891 samples)
│   └── house_prices.csv          # Regression dataset (20640 samples)
├── outputs/
│   ├── titanic_report.tex        # System 1 classification report
│   └── titanic_code_report.tex   # System 2 classification report
└── Analysis_Crew_Systems.pdf     # Comparative analysis

LLM Configuration

For this project, I'm using OpenAI GPT-4o as the primary language model. The .env file is configured with:

LLM_MODE=openai
OPENAI_API_KEY=your_key_here

Alternative LLM Options

The system supports multiple LLM providers through llama_llm.py. You can switch by changing LLM_MODE in .env:

Option 1: OpenAI (Current Setup)

LLM_MODE=openai
OPENAI_API_KEY=sk-...
  • Best quality and reliability
  • Costs money but generous rate limits
  • GPT-4o gives excellent results

Option 2: Groq (Free Alternative)

LLM_MODE=groq
GROQ_API_KEY=gsk_...
  • Free tier with 100k tokens/day
  • Fast inference with Llama 3.3 70B
  • Hit rate limits during testing

Option 3: HuggingFace

LLM_MODE=huggingface
HUGGINGFACE_API_KEY=hf_...
  • Access to Llama 3.3 70B Instruct
  • Free tier available
  • Good for experimentation

Option 4: Ollama (Local)

LLM_MODE=ollama
# No API key needed, runs on your machine
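
Under the hood, the switch in llama_llm.py presumably boils down to a dispatch on LLM_MODE. Here is a hedged sketch of what that could look like; the CrewAI LLM wrapper and the model identifiers are my assumptions, not a copy of the actual file:

# Sketch only; llama_llm.py may differ in model names and wrapper.
import os
from crewai import LLM

def get_llm() -> LLM:
    mode = os.getenv("LLM_MODE", "openai")
    if mode == "openai":
        return LLM(model="gpt-4o", api_key=os.getenv("OPENAI_API_KEY"))
    if mode == "groq":
        return LLM(model="groq/llama-3.3-70b-versatile", api_key=os.getenv("GROQ_API_KEY"))
    if mode == "huggingface":
        return LLM(model="huggingface/meta-llama/Llama-3.3-70B-Instruct",
                   api_key=os.getenv("HUGGINGFACE_API_KEY"))
    if mode == "ollama":
        return LLM(model="ollama/llama3", base_url="http://localhost:11434")
    raise ValueError(f"Unsupported LLM_MODE: {mode}")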

Technologies Used

  • CrewAI 1.6.1: Multi-agent orchestration framework
  • OpenAI GPT-4o: Primary LLM used for testing both systems
  • Python 3.12: Core programming language
  • Pandas & Scikit-learn: Data analysis and machine learning
  • LaTeX: Professional report generation

Note: The critical analysis (Analysis_Crew_Systems.pdf) contains a detailed comparison of both systems based on actual test results.
