This repository provides a production-grade SQL Server data warehouse designed to be more than just a data dump — it's an engineered system built to be the single source of truth for enterprise analytics.
The primary objective is to offer a reference architecture that demonstrates best practices in enterprise data warehousing: from ingestion and transformation to orchestration, governance, and operational integrity. Built for scale and designed for clarity, this project can serve both as a blueprint and as a starting point for real-world deployments.
This project is shaped by modern data engineering principles to support long-term agility, robustness, and team collaboration.
- **Scalability & Performance**: The system is built to handle growing data volumes and complex analytical queries with ease. The architecture prioritizes reliability in the early layers and speeds up access in the refined layers through proper indexing, data typing, and aggregation strategies.
- **Maintainability & Collaboration**: A clean directory structure, modular SQL, and separated configuration files ensure the codebase is easy to maintain and built for teamwork.
- **Governance & Data Integrity**:
  - *Medallion Architecture*: Data flows through structured Bronze → Silver → Gold layers, enabling clear lineage and traceability.
  - *Fail-Fast Orchestration*: Pipelines halt on failure to prevent partial loads and preserve data integrity.
- **Security & Config Management**:
  - *Separation of Concerns*: Environment-specific configs (e.g., file paths, credentials) are isolated in `config.local.sql`, which is excluded from version control. This pattern ensures security without compromising reproducibility.
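The fail-fast behavior can be sketched in T-SQL with a `TRY...CATCH` block that re-throws on error. This is only an illustration: the procedure names (`bronze.load_bronze`, etc.) and the `dbo.etl_log` columns are assumptions here, not the repository's actual objects — see `/scripts/` for the real implementations.

```sql
-- Minimal fail-fast orchestration sketch (object names are illustrative).
BEGIN TRY
    EXEC bronze.load_bronze;   -- raw ingestion
    EXEC silver.load_silver;   -- cleansing / conforming
    EXEC gold.load_gold;       -- business-ready aggregates
END TRY
BEGIN CATCH
    -- Log the failure, then re-throw so the pipeline halts
    -- instead of continuing with a partial load.
    INSERT INTO dbo.etl_log (run_time, step, status, message)
    VALUES (SYSUTCDATETIME(), ERROR_PROCEDURE(), 'FAILED', ERROR_MESSAGE());
    THROW;
END CATCH;
```

Re-throwing with `THROW` (rather than swallowing the error) is what guarantees downstream layers never run against a partially loaded upstream layer.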
```
/
├── README.md             → Project overview (this file)
├── .gitignore            → Ignores local-only or sensitive files
├── config.template.sql   → Template for local config setup
│
├── /data/                → Raw CSVs (excluded from Git)
├── /docs/                → Architecture diagrams, glossaries, etc.
│
├── /scripts/
│   ├── 0_setup/          → DB creation, schema init
│   ├── 1_bronze/         → Raw data layer scripts
│   ├── 2_silver/         → Cleaned data layer scripts
│   └── 3_gold/           → Aggregated, business-ready data
│
├── /utils/               → Reusable helper functions & procs
├── /tests/               → Data quality checks & validations
└── /orchestration/       → Master pipeline execution scripts
```
Each data layer (bronze, silver, gold) includes its own README.md.
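To illustrate how the layers relate, a gold-layer object typically aggregates cleaned silver data into a business-ready shape. The view, table, and column names below are hypothetical, not taken from this repository:

```sql
-- Hypothetical gold-layer view: a business-ready monthly aggregate
-- built on top of a cleaned silver table (names are illustrative).
-- CREATE OR ALTER requires SQL Server 2016 SP1 or later.
CREATE OR ALTER VIEW gold.v_monthly_sales AS
SELECT
    DATEFROMPARTS(YEAR(s.order_date), MONTH(s.order_date), 1) AS order_month,
    s.customer_id,
    SUM(s.net_amount) AS total_sales,
    COUNT(*)          AS order_count
FROM silver.sales_orders AS s
GROUP BY
    DATEFROMPARTS(YEAR(s.order_date), MONTH(s.order_date), 1),
    s.customer_id;
```

Keeping aggregation logic in the gold layer, rather than in BI tools, is what makes the refined layer the single source of truth for reporting.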
- Clone the repository.
- Copy `config.template.sql` to `config.local.sql` and edit it with your local paths and secrets.
- Execute the scripts in `/scripts/0_setup/` to create the database, schemas, and helper tables.
- Run your `config.local.sql` script to populate the file list configuration.
- Execute the DDL scripts for each layer (e.g., `ddl_bronze.sql`, `ddl_silver.sql`).
- Execute the procedure scripts for each layer (e.g., `proc_load_bronze.sql`) to create the stored procedures in the database.
- Run the master script located at `/scripts/orchestration/master_run_pipeline.sql` to execute the end-to-end ELT process.
- Monitor the output and check the `dbo.etl_log` table to verify the results of the run.
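A run can be verified by querying the log table directly. The column names below are assumptions for illustration; check the table's DDL in `/scripts/0_setup/` for the actual schema:

```sql
-- Inspect the most recent pipeline steps (column names are assumptions).
SELECT TOP (20)
    run_time, step, status, message
FROM dbo.etl_log
ORDER BY run_time DESC;
```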
- Status: Bronze, Silver & Gold layers complete.
This project demonstrates a deep understanding of scalable data warehousing architecture, ELT pipelines, and engineering best practices. If you're looking for someone to bring clarity and robustness to your data stack, let's connect.
Kaoutar EL Ouahabi
📧 kaoutar.elouahabi.de@gmail.com