# pysaprk

Here are 12 public repositories matching this topic...

End-to-end data pipeline transforming Olist e-commerce data through Azure cloud services. Implements medallion architecture (Bronze-Silver-Gold) with multi-source ingestion, Spark-based processing, and OLTP-to-OLAP optimization for analytics-ready datasets.

  • Updated Nov 4, 2025
  • Jupyter Notebook

1. An end-to-end ETL pipeline that ingests data from AWS S3, validates it, applies transformations using PySpark, and loads the final outputs into MySQL fact and dimension tables. 2. Designed a star schema and created optimized data marts using PySpark SQL and Parquet processing. 3. Automated S3 operations (download, upload, archival) and implemented a

  • Updated Dec 10, 2025
  • Python
