pysaprk

Here are 12 public repositories matching this topic:

DHANA5982 / Azure-Powered-Data-Lakehouse-and-ETL-Pipeline

End-to-end data pipeline transforming Olist e-commerce data through Azure cloud services. Implements medallion architecture (Bronze-Silver-Gold) with multi-source ingestion, Spark-based processing, and OLTP-to-OLAP optimization for analytics-ready datasets.

distributed-systems ecommerce apache-spark distributed-computing data-engineering databricks etl-pipeline big-data-processing pysaprk kpi-dashboard azure-synapse-analytics parellel-processing medallion-architecture azure-data-lake-storage-gen2 azure-data-factoty data-pipeline-automation

Updated Nov 4, 2025
Jupyter Notebook
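The medallion (Bronze-Silver-Gold) layering mentioned above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: it uses plain Python lists instead of Spark so the logic runs without a cluster, and the flattened record fields (`order_id`, `customer_state`, `price`) are hypothetical, only loosely modeled on Olist column names. Bronze keeps raw data as ingested, Silver cleans and deduplicates it, and Gold holds an analytics-ready aggregate.

```python
from collections import defaultdict

# Hypothetical raw records as they might land in the Bronze layer:
# strings exactly as ingested, including a duplicate and a malformed row.
RAW_ORDERS = [
    {"order_id": "o1", "customer_state": "SP", "price": "29.90"},
    {"order_id": "o2", "customer_state": "RJ", "price": "15.00"},
    {"order_id": "o1", "customer_state": "SP", "price": "29.90"},  # duplicate
    {"order_id": "o3", "customer_state": "SP", "price": "not-a-number"},  # bad row
]


def to_bronze(raw):
    """Bronze: keep the raw records unchanged (append-only landing zone)."""
    return list(raw)


def to_silver(bronze):
    """Silver: deduplicate on order_id and cast types, dropping unparseable rows."""
    seen = set()
    silver = []
    for row in bronze:
        if row["order_id"] in seen:
            continue
        try:
            price = float(row["price"])
        except ValueError:
            continue  # a real pipeline might quarantine this row instead
        seen.add(row["order_id"])
        silver.append({
            "order_id": row["order_id"],
            "customer_state": row["customer_state"],
            "price": price,
        })
    return silver


def to_gold(silver):
    """Gold: a business-level aggregate, here revenue per customer state."""
    revenue = defaultdict(float)
    for row in silver:
        revenue[row["customer_state"]] += row["price"]
    return dict(revenue)


gold = to_gold(to_silver(to_bronze(RAW_ORDERS)))
print(gold)  # {'SP': 29.9, 'RJ': 15.0}
```

In a Spark-based version, each layer would typically be persisted as its own table in the data lake so downstream jobs can rebuild Gold without re-ingesting Bronze.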

LalitSharma7 / F1-Data-Analysis

A project built on Azure Databricks.

azure databricks pysaprk pyspark-sql

Updated Mar 7, 2023
Python

yhskgo / pyspark_deep_learning

spark pysaprk

Updated Oct 8, 2019
Jupyter Notebook

riju18 / apache-iceberg-kickstart

docker sql s3 python3 minio zeppelin datalake dremio pysaprk apache-iceberg nessie datalakehouse

Updated Jun 30, 2025

miltiadiss / CEID_NE4348-Big-Data-Management-Systems

This project implements a real-time data pipeline with Kafka, Spark, and MongoDB. It generates vehicle data using UXSIM, streams it to a Kafka broker, processes it with Spark, and stores raw and processed data in MongoDB. Queries analyze vehicle counts, speeds, and routes over specified periods.

pymongo kafka-consumer kafka-producer spark-sql pysaprk uxsim

Updated Mar 18, 2025
Python
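The "vehicle counts and speeds over specified periods" queries described above amount to windowed aggregation. The sketch below shows a tumbling (fixed, non-overlapping) window in plain Python rather than the project's Spark/Kafka/MongoDB stack; the reading layout `(timestamp_seconds, vehicle_id, link, speed)` is a hypothetical assumption, not the project's schema. In Spark, a comparable query would usually group by `pyspark.sql.functions.window(...)` together with the link column.

```python
from collections import defaultdict

# Hypothetical vehicle readings: (timestamp in seconds, vehicle id, road link, speed in m/s).
READINGS = [
    (0, "v1", "L1", 10.0),
    (30, "v2", "L1", 14.0),
    (40, "v1", "L1", 12.0),
    (45, "v3", "L2", 8.0),
    (70, "v1", "L1", 12.0),
]


def window_stats(readings, window_seconds):
    """Assign each reading to a tumbling window and return
    {(window_start, link): (distinct_vehicle_count, average_speed)}."""
    vehicles = defaultdict(set)
    speeds = defaultdict(list)
    for ts, vehicle_id, link, speed in readings:
        # Align the timestamp down to the start of its window.
        key = (ts - ts % window_seconds, link)
        vehicles[key].add(vehicle_id)
        speeds[key].append(speed)
    return {
        key: (len(vehicles[key]), sum(speeds[key]) / len(speeds[key]))
        for key in speeds
    }


stats = window_stats(READINGS, window_seconds=60)
print(stats)  # {(0, 'L1'): (2, 12.0), (0, 'L2'): (1, 8.0), (60, 'L1'): (1, 12.0)}
```

Counting distinct vehicle ids, rather than raw readings, avoids double-counting a vehicle that reports several times within one window.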

victorlifan / Sparkify--Pyspark-Big-Data-Project

This project performs data wrangling, analysis, visualization, and machine-learning prediction of user churn for a hypothetical music app, using PySpark.

machine-learning spark data-visualization pysaprk

Updated Mar 22, 2022
Jupyter Notebook
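Churn prediction first needs a per-user label. A minimal sketch, assuming an event log of `(user_id, page)` pairs and assuming churn is marked by a `"Cancellation Confirmation"` page event (both are illustrative assumptions, not taken from the repository): a user is labeled 1 if they ever reached that page, else 0. In PySpark, the same idea is commonly expressed as a `groupBy` on the user column with `max(when(page == ..., 1).otherwise(0))`.

```python
# Hypothetical event log rows: (user_id, page).
EVENTS = [
    ("u1", "NextSong"),
    ("u1", "Cancellation Confirmation"),
    ("u2", "NextSong"),
    ("u2", "Thumbs Up"),
    ("u3", "NextSong"),
]

# Assumed churn marker; adjust to whatever event the real data uses.
CHURN_PAGE = "Cancellation Confirmation"


def label_churn(events, churn_page=CHURN_PAGE):
    """Return {user_id: 1 if the user ever reached churn_page else 0}."""
    labels = {}
    for user_id, page in events:
        # Taking the max keeps the label at 1 once a churn event is seen.
        labels[user_id] = max(labels.get(user_id, 0), int(page == churn_page))
    return labels


print(label_churn(EVENTS))  # {'u1': 1, 'u2': 0, 'u3': 0}
```

These labels would then be joined with per-user features (listening activity, thumbs up/down, etc.) to train a classifier.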

Munanga / Pyspark-Analysis

A sample analysis, done with PySpark on the Databricks platform, of parking violations issued in fiscal year 2017.

python csv big-data spark analysis spark-sql pysaprk

Updated Feb 1, 2020
HTML

SA01 / spark-english-api-tutorial

Contains the code and examples for my article on Medium, which introduces the English SDK for Apache Spark and shows how to combine the power of Apache Spark with large language models (LLMs).

python big-data spark analytics data-engineering spark-sql pysaprk llm generative-ai

Updated Oct 25, 2024
Python

adharangaonkar / ETL-Pipelines

A repository focused on using high-end parallel pipelines to perform ETL across various data sources.

spark etl postgresql aws-ec2 etl-pipeline redshift-cluster pysaprk

Updated Sep 23, 2021
Jupyter Notebook

jpoberhauser / dist_comp_final

NBA shot predictions with PySpark and SparkML

machine-learning pysaprk

Updated Dec 18, 2018
Jupyter Notebook

johngodoi / learning_pyspark

linkedinlearning pysaprk

Updated Jan 21, 2022
Jupyter Notebook

daemon966 / End-to-End-ETL-Pipeline-for-Sales-Analytics

1. This is an end-to-end ETL pipeline that ingests data from AWS S3, validates it, processes transformations using PySpark, and loads the final outputs into MySQL fact and dimension tables.
2. Designed a star schema and created optimized data marts using PySpark SQL and Parquet processing.
3. Automated S3 operations (download, upload, archival) and implemented a

python big-data s3-bucket data-engineering mysql-database big pysaprk

Updated Dec 10, 2025
Python
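Loading "fact and dimension tables" in a star schema means splitting flat records into dimensions (one row per distinct entity, keyed by a surrogate key) and a fact table that stores measures plus foreign keys into those dimensions. The plain-Python sketch below illustrates that split; the input fields (`order_id`, `customer`, `product`, `qty`, `unit_price`) are hypothetical and not taken from the repository, which uses PySpark and MySQL instead.

```python
# Hypothetical flat sales rows, as they might look after validation.
SALES = [
    {"order_id": 1, "customer": "alice", "product": "pen", "qty": 2, "unit_price": 1.5},
    {"order_id": 2, "customer": "bob", "product": "pen", "qty": 1, "unit_price": 1.5},
    {"order_id": 3, "customer": "alice", "product": "ink", "qty": 3, "unit_price": 4.0},
]


def build_dimension(rows, column):
    """Assign a sequential surrogate key to each distinct value of `column`."""
    dim = {}
    for row in rows:
        dim.setdefault(row[column], len(dim) + 1)
    return dim


def build_star_schema(rows):
    """Return (dim_customer, dim_product, fact_sales) built from flat rows."""
    dim_customer = build_dimension(rows, "customer")
    dim_product = build_dimension(rows, "product")
    fact_sales = [
        {
            "order_id": r["order_id"],
            # Foreign keys point into the dimension tables.
            "customer_key": dim_customer[r["customer"]],
            "product_key": dim_product[r["product"]],
            # Measures live in the fact table.
            "qty": r["qty"],
            "amount": r["qty"] * r["unit_price"],
        }
        for r in rows
    ]
    return dim_customer, dim_product, fact_sales


dim_customer, dim_product, fact_sales = build_star_schema(SALES)
print(dim_customer)  # {'alice': 1, 'bob': 2}
print(dim_product)   # {'pen': 1, 'ink': 2}
```

Keeping descriptive attributes in the dimensions and only keys plus numeric measures in the fact table is what lets data marts aggregate facts cheaply and join to dimensions only when needed.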
