📊 WEBTraffic Analysis Agent

Agentic GA4 Traffic Analysis with Exact-Format LLM Summaries

An Agentic AI system that automatically analyzes Google Analytics (GA4) traffic data from Excel, computes YOY and Month-over-Month metrics, and generates strictly formatted, non-hallucinated business summaries using Groq LLM.

This project demonstrates safe, production-ready LLM integration by clearly separating deterministic computation from generative reasoning.

🚀 Key Features

✅ Multi-table GA4 Excel parsing
✅ Automatic YOY % and LM % calculations
✅ State-driven agent workflow (INIT → COMPLETE)
✅ LLM-generated summaries in exact predefined format
✅ Anti-hallucination design (LLM reasoning only)
✅ Professional, styled Excel output
✅ Fallback logic if LLM is unavailable

🧠 Why This Project?

GA4 (Google Analytics 4) traffic reports often:

Contain multiple tables in a single Excel sheet
Require manual calculations for YOY and MoM
Depend on human-written summaries, leading to inconsistency

LLMs alone cannot be trusted with numerical accuracy.

👉 This project solves that by:

Using Python for all calculations
Using LLM only for explanation
Enforcing strict output formats
Producing business-ready reports

🏗️ System Architecture

High-Level Flow

Excel Input (GA4 Data)
        │
        ▼
Traffic Analysis Agent (State Machine)
        │
 ┌──────┴────────┐
 │               │
 ▼               ▼
Table Parser   Metrics Engine
 │               │
 └──────┬────────┘
        ▼
LLM Summary Agent (Groq – LLaMA 3.3)
        │
        ▼
Styled Excel Report (Output)

Architectural Principles

Agentic orchestration via state machine
Deterministic math, generative reasoning
Fail-safe design with fallback summaries

🧩 Agent Workflow (State Machine)

State	Description
INIT	Initialize agent and config
LOAD_DATA	Load Excel GA4 data
PARSE_TABLES	Detect and parse multiple tables
CALCULATE_METRICS	Compute YOY %, LM %, totals
GENERATE_SUMMARIES	LLM-based reasoning
CREATE_OUTPUT	Styled Excel report
COMPLETE	Successful execution

📐 Metrics Calculation Logic

Year-over-Year (YOY %)

(Current Year − Previous Year) / Previous Year × 100

Last Month (LM %)

(Current Month − Previous Month) / Previous Month × 100

Special Handling

Totals calculated only till August
YOY applied to Total row
% Change row auto-updated
All numeric values validated before use

🤖 LLM Integration (Groq)

Model: llama-3.3-70b-versatile
Purpose: Narrative reasoning only
Temperature: 0.2
No calculations performed by LLM

Summary Format (Strict)

Reasoning (Aug vs July – 1 sentence)
Year-over-Year comparison
Trend analysis
Important notes

Anti-Hallucination Controls

Explicit data injection into prompt
Strict formatting rules
Low temperature
Python-computed values only

📄 Output Report

The final Excel report includes:

📌 Calculated GA4 tables
📌 LLM summaries placed above each table
📌 Color-coded trends
📌 Business-ready formatting

Ideal for:

Stakeholder reviews
Management presentations
Monthly analytics reporting

🛠️ Tech Stack

Python
Pandas / NumPy
OpenPyXL
Groq API
LLaMA 3.3 (70B)
Excel (GA4 Data)

⚙️ How It Works (Step-by-Step)

1️⃣ Input Ingestion

The agent loads a GA4 Excel file containing multiple traffic tables
No assumptions about table count or placement

2️⃣ Agentic Orchestration (State Machine)

The system progresses through explicit states
Each state has one responsibility
Failures are isolated and debuggable

This avoids tightly coupled scripts and makes the system predictable.

3️⃣ Intelligent Table Parsing

Tables are detected dynamically using "Month" headers
Supports inconsistent Excel formatting
Duplicate columns handled safely

4️⃣ Deterministic Metrics Engine (Python)

All calculations are done before invoking the LLM:

YOY %
Last Month %
Totals till August
% Change rows

✔ Reproducible ✔ Accurate ✔ Excel-equivalent

5️⃣ Controlled LLM Reasoning Layer

LLM receives pre-computed data only
Generates business summaries
Enforced exact format

❌ No math ❌ No data invention

6️⃣ Professional Output Generation

Summaries placed above each table
Styled Excel formatting
Stakeholder-ready output

flowchart TD
    Start([Start]) --> A[📊 GA4 Excel Input<br/>Multiple Traffic Tables]
    
    A --> B{🤖 Traffic Analysis Agent<br/>State Machine}
    
    B -->|State 1| C[📥 LOAD_DATA]
    C -->|pandas.read_excel| C1[Validate Structure]
    C1 --> C2[Load All Sheets]
    
    C2 -->|State 2| D[🔍 PARSE_TABLES]
    D --> D1[Detect Month Headers]
    D1 --> D2[Extract Table Ranges]
    D2 --> D3[Separate Multiple Tables]
    
    D3 -->|State 3| E[📐 CALCULATE_METRICS]
    E --> E1[🔢 YOY % Calculation<br/>Current vs Last Year]
    E --> E2[📊 LM % Calculation<br/>Current vs Last Month]
    E --> E3[➕ Total Rows<br/>Sum + % Change]
    
    E1 & E2 & E3 -->|State 4| F[🧠 GENERATE_SUMMARIES]
    
    F --> G[☁️ Groq API Call]
    G --> G1[LLaMA 3.3 70B Model]
    G1 --> G2[Prompt: Exact Format Instructions]
    
    G2 --> H[📝 Structured Summary]
    H --> H1[💡 Reasoning Section]
    H --> H2[📈 YOY Analysis]
    H --> H3[📉 Trend Observations]
    H --> H4[📌 Key Notes]
    
    H1 & H2 & H3 & H4 -->|State 5| I[📦 CREATE_OUTPUT]
    
    I --> I1[Build Excel Workbook]
    I1 --> I2[Apply Cell Styling]
    I2 --> I3[Format Headers]
    I3 --> I4[Add Summary Sections]
    
    I4 --> J[💾 Styled Excel Output<br/>Professional Report]
    
    J --> K([✅ COMPLETE])
    
    %% Styling
    classDef input fill:#E3F2FD,stroke:#1E88E5,stroke-width:3px
    classDef orchestrator fill:#E8F5E9,stroke:#2E7D32,stroke-width:3px
    classDef processing fill:#FFFDE7,stroke:#F9A825,stroke-width:2px
    classDef metrics fill:#F3E5F5,stroke:#6A1B9A,stroke-width:2px
    classDef llm fill:#FCE4EC,stroke:#C2185B,stroke-width:2px
    classDef output fill:#E0F2F1,stroke:#00695C,stroke-width:2px
    classDef terminal fill:#ECEFF1,stroke:#37474F,stroke-width:3px
    
    class Start,A input
    class B orchestrator
    class C,C1,C2,D,D1,D2,D3 processing
    class E,E1,E2,E3 metrics
    class F,G,G1,G2,H,H1,H2,H3,H4 llm
    class I,I1,I2,I3,I4,J output
    class K terminal

🧠 Why This Works (Design Rationale)

✅ Separation of Concerns

Responsibility	Component
Math & Logic	Python
Orchestration	Agent State Machine
Reasoning	LLM
Presentation	Excel Builder

✅ Anti-Hallucination Architecture

LLMs explain results — they do not compute them.

Data is injected directly
Strict formatting rules
Low temperature
Fallback summaries

✅ Agentic, Not Scripted

Dynamic table detection
Context-aware summaries
Multi-stage decision flow
Failure-tolerant execution

✅ Production-Ready

Deterministic execution
Observable states
Scalable to more tables
Business-aligned output

⚙️ Setup & Usage

1️⃣ Install Dependencies

pip install pandas numpy openpyxl python-dotenv

2️⃣ Set Environment Variable

export GROQ_API_KEY=your_api_key_here

(Windows)

set GROQ_API_KEY=your_api_key_here

3️⃣ Run the Agent

python agentic_main.py

🔐 Fallback Behavior

If no Groq API key is provided:

The agent still runs
Summaries are generated using deterministic fallback logic
No execution failure

📌 What This Project Demonstrates

Real-world Agentic AI design
Safe and responsible LLM usage
Production-grade data automation
Explainable AI workflows
Clean separation of logic and reasoning

📧 Contact

Have questions or suggestions?

📧 Email: harshitwaldia112@gmail.com
🐦 Twitter: @HarshitWaldia
💼 LinkedIn: Harshit Waldia
⚙️ GitHub: @HarshitWaldia

🌟 Star History

If you find this project helpful, please consider giving it a star! ⭐

👨‍💻 Author

Harshit Waldia

Ahaṁ Brahmāsmi | अहं ब्रह्मास्मि

The true self is not the body but an eternal, infinite part of the universe

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
Output		Output
PPTX		PPTX
.env		.env
README.md		README.md
Traffic-status.xlsx		Traffic-status.xlsx
agentic_main.py		agentic_main.py
n_main.py		n_main.py
requirements.txt		requirements.txt

HarshitWaldia/WebTraffic-Analysis-Agent

Folders and files

Latest commit

History

Repository files navigation