This project showcases a complete Microsoft Fabric implementation designed to solve common retail data challenges: inconsistent formatting, missing business logic, and slow reporting cycles. By processing over 401,000 transactions, the pipeline delivers a high-fidelity analytics suite tracking $8.76M in revenue with real-time scalability.
I architected the solution using a Medallion (Bronze-Silver-Gold) approach to ensure data governance and a clear "Single Source of Truth".
- Asset:
01_Data_Cleaning_Bronze.ipynb. - Process: Ingested raw CSV data into OneLake.
- Logic: Performed structural cleaning, including standardizing
InvoiceDateformats, handling over 130k missingCustomer_IDrecords, and removing non-product transactions to ensure a clean baseline.
- Asset:
02_Business_Logic_Flagging.ipynb. - Process: Utilized a Hybrid Compute Model (Pandas for agility, Spark for scale).
- Logic: Engineered custom business flags, including Return Transaction detection, High-Value Customer segmenting, and Revenue calculation ().
- Persistence: Converted finalized Pandas DataFrames into Spark Delta Tables to leverage ACID compliance and Fabric Metastore integration.
- Asset:
SQL_Scripts.sql. - Process: Instead of duplicating data, I engineered SQL Views on the Analytics Endpoint.
- Logic: Created complex analytical views to automate Month-over-Month (MoM) Growth and Product Intelligence, allowing for "Schema-on-Read" flexibility.
- Orchestration: Microsoft Fabric Workspace.
- Languages: Python (Pandas/PySpark) & T-SQL.
- Storage: Delta Lake (Parquet).
- Reporting: Power BI with DirectLake Connectivity for sub-second query performance.
- Revenue Optimization: Identified a 336.71% MoM growth surge during peak seasons.
- Market Expansion: Mapped geographical performance showing the UK as the core market ($6.7M) while identifying emerging EU growth bubbles.
- Operational Efficiency: Automated the entire cleaning-to-reporting lifecycle, reducing manual data prep time by an estimated 90%.
├── Notebooks
│ ├── 01_Data_Cleaning_Bronze.ipynb # Sanitization logic
│ └── 02_Business_Logic_Flagging.ipynb # Enrichment logic
├── SQL_Scripts
│ └── View_Definitions.sql # Gold-layer KPI views
├── Reports
│ ├── Retail_Analytics_Final.pbix # Interactive Dashboard
│ └── Screenshots/ # Visual proof of work
└── README.md
- Ingest: Upload raw retail CSV to the
Filessection of a Fabric Lakehouse. - Clean: Run Notebook 01 to generate the
cleaned_retail_datatable. - Enrich: Run Notebook 02 to apply business flags and save the
final_retail_transactionstable. - Visualize: Connect the Power BI
.pbixfile to your Fabric SQL Endpoint.