Healthcare Data Project

About

End-to-end data engineering project based on Polish healthcare hospitalization data (21M+ records). Fully self-hosted infrastructure with modular architecture and reproducible pipelines.

Project Documentation

Data Engineering Features

Unit tests (dbt)
Data Dictionary (dbt)
Lineage (Airflow, dbt)

All Components & Tools

Infra

Docker-Compose
- Containers
  - Apache Airflow
  - Apache Superset
  - Postgres
  - Oracle
  - Utils (custom container)
Dockerfile (Utils container)
- Content
  - dbt
  - Liquibase
  - SSH
  - DB Clients
Linux Debian

Orchestration

Apache Airflow
- DAG #1, Pipeline
  - SSHOperators
  - BranchOperators
  - PythonOperators
  - etc.

ETL

Extraction & Load, PySpark
Transformation
- dbt
- Postgres
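
A branch step in an Airflow DAG comes down to a plain Python function that returns the `task_id` to follow. The sketch below is a minimal illustration under stated assumptions, not the project's actual DAG code: the task ids `load_to_stage` and `skip_load`, the function name, and the "are there any CSV files?" rule are all hypothetical.

```python
from pathlib import Path

# Hypothetical task ids; the real DAG's task names are not shown in this README.
LOAD_TASK_ID = "load_to_stage"
SKIP_TASK_ID = "skip_load"


def choose_next_task(csv_dir: str) -> str:
    """Return the task_id to run next, depending on whether CSV files are present."""
    has_csv = any(Path(csv_dir).glob("*.csv"))
    return LOAD_TASK_ID if has_csv else SKIP_TASK_ID
```

In a DAG file, a function like this would typically be passed to Airflow's `BranchPythonOperator` through `python_callable`, with `op_kwargs={"csv_dir": ...}` supplying the directory.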

Data Layers

Raw CSV, original flat files
Stage, raw data loaded into Postgres
Core models, dbt transformations
- Bronze, initial cleaning & standardization
- Silver, cleaned & structured models
- Gold, aggregated or business-ready models
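
The layering above can be illustrated with a small, self-contained sketch. It uses SQLite from the Python standard library in place of Postgres and dbt, and every table, view, and column name is a hypothetical stand-in rather than the project's real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Stage: raw rows as loaded from CSV, everything still text (hypothetical columns).
conn.execute("CREATE TABLE stg_hospitalizations (patient_id TEXT, ward TEXT, days TEXT)")
conn.executemany(
    "INSERT INTO stg_hospitalizations VALUES (?, ?, ?)",
    [
        ("1", " cardiology ", "3"),
        ("2", "CARDIOLOGY", "5"),
        ("3", "neurology", ""),
        ("4", "Neurology", "2"),
    ],
)

# Bronze: initial cleaning, here trimming whitespace and lower-casing ward names.
conn.execute("""
    CREATE VIEW bronze_hospitalizations AS
    SELECT patient_id, lower(trim(ward)) AS ward, trim(days) AS days
    FROM stg_hospitalizations
""")

# Silver: typed, structured rows; rows without a length of stay are dropped.
conn.execute("""
    CREATE VIEW silver_hospitalizations AS
    SELECT CAST(patient_id AS INTEGER) AS patient_id,
           ward,
           CAST(days AS INTEGER) AS days
    FROM bronze_hospitalizations
    WHERE days <> ''
""")

# Gold: business-ready aggregate, one row per ward.
conn.execute("""
    CREATE VIEW gold_stays_by_ward AS
    SELECT ward, COUNT(*) AS stays, AVG(days) AS avg_days
    FROM silver_hospitalizations
    GROUP BY ward
""")

gold_rows = conn.execute(
    "SELECT ward, stays, avg_days FROM gold_stays_by_ward ORDER BY ward"
).fetchall()
print(gold_rows)
```

Each layer reads only from the one before it, which is the same dependency shape dbt models express with `ref()`.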

Analysis & Visualisation

File Validation
- Jupyter Notebook & bash
Exploratory Data Analysis
- First EDA, exploring the raw CSV files (Jupyter Notebook, DuckDB, SQL)
- Second EDA, after loading into the DWH, histograms (Jupyter Notebook, pandas, Seaborn)
Dashboard
- Apache Superset
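
As an illustration of what a file validation step might check before loading, here is a minimal sketch using only the Python standard library. The project itself lists Jupyter Notebook and bash for this step, so this is an assumed equivalent, not the actual implementation; the function name, the `;` delimiter, and the specific checks are hypothetical.

```python
import csv
import io


def validate_csv(text: str, expected_header: list[str], delimiter: str = ";") -> list[str]:
    """Return a list of problems found; an empty list means the file passed."""
    problems = []
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader, None)
    if header is None:
        return ["file is empty"]
    if header != expected_header:
        problems.append(f"unexpected header: {header}")
    # Data rows are numbered from 2 because row 1 is the header.
    for row_no, row in enumerate(reader, start=2):
        if len(row) != len(header):
            problems.append(f"row {row_no}: expected {len(header)} fields, got {len(row)}")
    return problems
```

For example, `validate_csv("a;b\n1;2;3\n", ["a", "b"])` reports one problem for row 2, while a well-formed file returns an empty list.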

DWH

Stage: Postgres

CI/CD & Automation

Makefile
Git
- pre-commit & pre-push hooks
Liquibase
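
A Git hook is just an executable that exits non-zero to stop the commit or push. The sketch below shows the kind of check a pre-commit hook could run; the deny-list and function name are hypothetical, and the rules this project's hooks actually enforce are not described here.

```python
import fnmatch

# Hypothetical deny-list: raw data files and private keys.
BLOCKED_PATTERNS = ["*.csv", "*.parquet", "*.pem"]


def blocked_paths(staged: list[str]) -> list[str]:
    """Return the staged paths whose file name matches a blocked pattern."""
    return [
        path
        for path in staged
        if any(fnmatch.fnmatch(path.rsplit("/", 1)[-1], pat) for pat in BLOCKED_PATTERNS)
    ]
```

A hook script would typically feed this the output of `git diff --cached --name-only` and exit with a non-zero status if the returned list is not empty.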

Scripts

SQL scripts
Python
bash

References

Books [ PL ]

Inżynieria danych w praktyce, J. Reis
Zaawansowana Analiza Danych, G. Mount
SQL for Data Analysis, C. Tanimura

Video courses

dbt 1x
Airflow 2x

Others :)
