Skip to content

An Agentic AI system that automatically analyzes Google Analytics (GA4) traffic data from Excel, computes YOY and Month-over-Month metrics, and generates strictly formatted, non-hallucinated business summaries using Groq LLM.

Notifications You must be signed in to change notification settings

HarshitWaldia/WebTraffic-Analysis-Agent

Folders and files

NameName
Last commit message
Last commit date

Latest commit

ย 

History

14 Commits
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 

Repository files navigation

๐Ÿ“Š WEBTraffic Analysis Agent

Python Groq LangChain LangGraph Gradio dotenv Agentic AI Status


Agentic GA4 Traffic Analysis with Exact-Format LLM Summaries

An Agentic AI system that automatically analyzes Google Analytics (GA4) traffic data from Excel, computes YOY and Month-over-Month metrics, and generates strictly formatted, non-hallucinated business summaries using Groq LLM.

This project demonstrates safe, production-ready LLM integration by clearly separating deterministic computation from generative reasoning.


๐Ÿš€ Key Features

  • โœ… Multi-table GA4 Excel parsing
  • โœ… Automatic YOY % and LM % calculations
  • โœ… State-driven agent workflow (INIT โ†’ COMPLETE)
  • โœ… LLM-generated summaries in exact predefined format
  • โœ… Anti-hallucination design (LLM reasoning only)
  • โœ… Professional, styled Excel output
  • โœ… Fallback logic if LLM is unavailable

๐Ÿง  Why This Project?

GA4 (Google Analytics 4) traffic reports often:

  • Contain multiple tables in a single Excel sheet
  • Require manual calculations for YOY and MoM
  • Depend on human-written summaries, leading to inconsistency

LLMs alone cannot be trusted with numerical accuracy.

๐Ÿ‘‰ This project solves that by:

  • Using Python for all calculations
  • Using LLM only for explanation
  • Enforcing strict output formats
  • Producing business-ready reports

๐Ÿ—๏ธ System Architecture

High-Level Flow

Excel Input (GA4 Data)
        โ”‚
        โ–ผ
Traffic Analysis Agent (State Machine)
        โ”‚
 โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
 โ”‚               โ”‚
 โ–ผ               โ–ผ
Table Parser   Metrics Engine
 โ”‚               โ”‚
 โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
        โ–ผ
LLM Summary Agent (Groq โ€“ LLaMA 3.3)
        โ”‚
        โ–ผ
Styled Excel Report (Output)

Architectural Principles

  • Agentic orchestration via state machine
  • Deterministic math, generative reasoning
  • Fail-safe design with fallback summaries

๐Ÿงฉ Agent Workflow (State Machine)

State Description
INIT Initialize agent and config
LOAD_DATA Load Excel GA4 data
PARSE_TABLES Detect and parse multiple tables
CALCULATE_METRICS Compute YOY %, LM %, totals
GENERATE_SUMMARIES LLM-based reasoning
CREATE_OUTPUT Styled Excel report
COMPLETE Successful execution

๐Ÿ“ Metrics Calculation Logic

Year-over-Year (YOY %)

(Current Year โˆ’ Previous Year) / Previous Year ร— 100

Last Month (LM %)

(Current Month โˆ’ Previous Month) / Previous Month ร— 100

Special Handling

  • Totals calculated only till August
  • YOY applied to Total row
  • % Change row auto-updated
  • All numeric values validated before use

๐Ÿค– LLM Integration (Groq)

  • Model: llama-3.3-70b-versatile
  • Purpose: Narrative reasoning only
  • Temperature: 0.2
  • No calculations performed by LLM

Summary Format (Strict)

  • Reasoning (Aug vs July โ€“ 1 sentence)
  • Year-over-Year comparison
  • Trend analysis
  • Important notes

Anti-Hallucination Controls

  • Explicit data injection into prompt
  • Strict formatting rules
  • Low temperature
  • Python-computed values only

๐Ÿ“„ Output Report

The final Excel report includes:

  • ๐Ÿ“Œ Calculated GA4 tables
  • ๐Ÿ“Œ LLM summaries placed above each table
  • ๐Ÿ“Œ Color-coded trends
  • ๐Ÿ“Œ Business-ready formatting

Ideal for:

  • Stakeholder reviews
  • Management presentations
  • Monthly analytics reporting

๐Ÿ› ๏ธ Tech Stack

  • Python
  • Pandas / NumPy
  • OpenPyXL
  • Groq API
  • LLaMA 3.3 (70B)
  • Excel (GA4 Data)

โš™๏ธ How It Works (Step-by-Step)

1๏ธโƒฃ Input Ingestion

  • The agent loads a GA4 Excel file containing multiple traffic tables
  • No assumptions about table count or placement

2๏ธโƒฃ Agentic Orchestration (State Machine)

  • The system progresses through explicit states
  • Each state has one responsibility
  • Failures are isolated and debuggable

This avoids tightly coupled scripts and makes the system predictable.


3๏ธโƒฃ Intelligent Table Parsing

  • Tables are detected dynamically using "Month" headers
  • Supports inconsistent Excel formatting
  • Duplicate columns handled safely

4๏ธโƒฃ Deterministic Metrics Engine (Python)

All calculations are done before invoking the LLM:

  • YOY %
  • Last Month %
  • Totals till August
  • % Change rows

โœ” Reproducible โœ” Accurate โœ” Excel-equivalent


5๏ธโƒฃ Controlled LLM Reasoning Layer

  • LLM receives pre-computed data only
  • Generates business summaries
  • Enforced exact format

โŒ No math โŒ No data invention


6๏ธโƒฃ Professional Output Generation

  • Summaries placed above each table
  • Styled Excel formatting
  • Stakeholder-ready output

flowchart TD
    Start([Start]) --> A[๐Ÿ“Š GA4 Excel Input<br/>Multiple Traffic Tables]
    
    A --> B{๐Ÿค– Traffic Analysis Agent<br/>State Machine}
    
    B -->|State 1| C[๐Ÿ“ฅ LOAD_DATA]
    C -->|pandas.read_excel| C1[Validate Structure]
    C1 --> C2[Load All Sheets]
    
    C2 -->|State 2| D[๐Ÿ” PARSE_TABLES]
    D --> D1[Detect Month Headers]
    D1 --> D2[Extract Table Ranges]
    D2 --> D3[Separate Multiple Tables]
    
    D3 -->|State 3| E[๐Ÿ“ CALCULATE_METRICS]
    E --> E1[๐Ÿ”ข YOY % Calculation<br/>Current vs Last Year]
    E --> E2[๐Ÿ“Š LM % Calculation<br/>Current vs Last Month]
    E --> E3[โž• Total Rows<br/>Sum + % Change]
    
    E1 & E2 & E3 -->|State 4| F[๐Ÿง  GENERATE_SUMMARIES]
    
    F --> G[โ˜๏ธ Groq API Call]
    G --> G1[LLaMA 3.3 70B Model]
    G1 --> G2[Prompt: Exact Format Instructions]
    
    G2 --> H[๐Ÿ“ Structured Summary]
    H --> H1[๐Ÿ’ก Reasoning Section]
    H --> H2[๐Ÿ“ˆ YOY Analysis]
    H --> H3[๐Ÿ“‰ Trend Observations]
    H --> H4[๐Ÿ“Œ Key Notes]
    
    H1 & H2 & H3 & H4 -->|State 5| I[๐Ÿ“ฆ CREATE_OUTPUT]
    
    I --> I1[Build Excel Workbook]
    I1 --> I2[Apply Cell Styling]
    I2 --> I3[Format Headers]
    I3 --> I4[Add Summary Sections]
    
    I4 --> J[๐Ÿ’พ Styled Excel Output<br/>Professional Report]
    
    J --> K([โœ… COMPLETE])
    
    %% Styling
    classDef input fill:#E3F2FD,stroke:#1E88E5,stroke-width:3px
    classDef orchestrator fill:#E8F5E9,stroke:#2E7D32,stroke-width:3px
    classDef processing fill:#FFFDE7,stroke:#F9A825,stroke-width:2px
    classDef metrics fill:#F3E5F5,stroke:#6A1B9A,stroke-width:2px
    classDef llm fill:#FCE4EC,stroke:#C2185B,stroke-width:2px
    classDef output fill:#E0F2F1,stroke:#00695C,stroke-width:2px
    classDef terminal fill:#ECEFF1,stroke:#37474F,stroke-width:3px
    
    class Start,A input
    class B orchestrator
    class C,C1,C2,D,D1,D2,D3 processing
    class E,E1,E2,E3 metrics
    class F,G,G1,G2,H,H1,H2,H3,H4 llm
    class I,I1,I2,I3,I4,J output
    class K terminal
Loading

๐Ÿง  Why This Works (Design Rationale)

โœ… Separation of Concerns

Responsibility Component
Math & Logic Python
Orchestration Agent State Machine
Reasoning LLM
Presentation Excel Builder

โœ… Anti-Hallucination Architecture

LLMs explain results โ€” they do not compute them.

  • Data is injected directly
  • Strict formatting rules
  • Low temperature
  • Fallback summaries

โœ… Agentic, Not Scripted

  • Dynamic table detection
  • Context-aware summaries
  • Multi-stage decision flow
  • Failure-tolerant execution

โœ… Production-Ready

  • Deterministic execution
  • Observable states
  • Scalable to more tables
  • Business-aligned output

โš™๏ธ Setup & Usage

1๏ธโƒฃ Install Dependencies

pip install pandas numpy openpyxl python-dotenv

2๏ธโƒฃ Set Environment Variable

export GROQ_API_KEY=your_api_key_here

(Windows)

set GROQ_API_KEY=your_api_key_here

3๏ธโƒฃ Run the Agent

python agentic_main.py

๐Ÿ” Fallback Behavior

If no Groq API key is provided:

  • The agent still runs
  • Summaries are generated using deterministic fallback logic
  • No execution failure

๐Ÿ“Œ What This Project Demonstrates

  • Real-world Agentic AI design
  • Safe and responsible LLM usage
  • Production-grade data automation
  • Explainable AI workflows
  • Clean separation of logic and reasoning

๐Ÿ“ง Contact

Have questions or suggestions?


๐ŸŒŸ Star History

If you find this project helpful, please consider giving it a star! โญ


๐Ÿ‘จโ€๐Ÿ’ป Author

Harshit Waldia

Ahaแน Brahmฤsmi | เค…เคนเค‚ เคฌเฅเคฐเคนเฅเคฎเคพเคธเฅเคฎเคฟ

The true self is not the body but an eternal, infinite part of the universe

About

An Agentic AI system that automatically analyzes Google Analytics (GA4) traffic data from Excel, computes YOY and Month-over-Month metrics, and generates strictly formatted, non-hallucinated business summaries using Groq LLM.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages