This repository contains my learning project from the IBM Python Project for Data Engineering. The goal is to build a complete ETL (Extract–Transform–Load) pipeline that processes a ranking of the world's largest banks by market capitalization as of August 2023.
The pipeline performs:
- Extraction from an HTML page
- Transformation including currency conversion and table standardization
- Loading into a SQLite database
- Logging of each major step
This learning project demonstrates my understanding of foundational Data Engineering concepts: data ingestion, transformation logic, SQL database loading, and operational logging.
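The transform, load, and logging steps can be sketched roughly as follows. This is a minimal illustration, not the actual contents of banks_project.py: the exchange rates, the sample rows, the column names (`Name`, `MC_USD_Billion`), the table name `Largest_banks`, and the helper names are all assumptions made for the example. The extraction step is replaced by a small in-memory table, and an in-memory SQLite database stands in for ../data/Banks.db.

```python
import logging
import sqlite3

import pandas as pd

# Hypothetical USD exchange rates, made up for this sketch.
EXCHANGE_RATES = {"GBP": 0.8, "EUR": 0.93, "INR": 82.95}

logging.basicConfig(level=logging.INFO, format="%(asctime)s : %(message)s")
logger = logging.getLogger("banks_etl")


def log_progress(message):
    """Record one pipeline step with a timestamp."""
    logger.info(message)


def transform(df, rates):
    """Return a copy with one rounded market-cap column per currency."""
    out = df.copy()
    for currency, rate in rates.items():
        out[f"MC_{currency}_Billion"] = (out["MC_USD_Billion"] * rate).round(2)
    return out


def load_to_db(df, conn, table_name):
    """Write the DataFrame to SQLite, replacing any existing table."""
    df.to_sql(table_name, conn, if_exists="replace", index=False)


# Stand-in for the extraction step: placeholder rows, not real rankings.
extracted = pd.DataFrame(
    {"Name": ["Bank A", "Bank B"], "MC_USD_Billion": [100.0, 50.0]}
)
log_progress("Extraction complete")

transformed = transform(extracted, EXCHANGE_RATES)
log_progress("Transformation complete")

conn = sqlite3.connect(":memory:")  # the real run would target a file
load_to_db(transformed, conn, "Largest_banks")
row_count = conn.execute("SELECT COUNT(*) FROM Largest_banks").fetchone()[0]
conn.close()
log_progress(f"Load complete: {row_count} rows")
```

Keeping each stage in its own function means a log entry can be written between stages, so a failed run shows which step was reached.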
- Python 3
- Pandas
- SQLite
- NumPy
- Requests
- BeautifulSoup (bs4)
- CSV
- logging module
data-engineering-learning-project-banks-etl/
│
├── data/ # Raw dataset and output database
├── src/ # ETL Python script
├── logs/ # Log history of the ETL run
├── diagrams/ # Architecture diagram
└── README.md # Project documentation
- Clone the repository:
  git clone https://github.com/<your-username>/data-engineering-learning-project-banks-etl.git
- Navigate to the project:
  cd data-engineering-learning-project-banks-etl/src
- Install the required dependencies:
  pip install -r <dependency_name>
- Run the ETL pipeline:
  python banks_project.py
- Check the output:
  - SQLite database: ../data/Banks.db
  - Logs: ../logs/code_log.txt
The final database ../data/Banks.db contains a fully transformed table of bank market capitalization data with multiple currency conversions.
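One way to inspect the result is a short query with Python's built-in sqlite3 module. The sketch below is an assumption-laden example: the table name `Largest_banks`, the column names, and the helper `preview_table` are hypothetical, and the rows are placeholders. To stay self-contained it builds a small in-memory database; against the real output, the connection would point at ../data/Banks.db instead.

```python
import sqlite3


def preview_table(conn, table_name, limit=5):
    """Return the column names and the first `limit` rows of a table."""
    # Table names cannot be bound as parameters, so the identifier is quoted.
    cursor = conn.execute(f'SELECT * FROM "{table_name}" LIMIT ?', (limit,))
    columns = [description[0] for description in cursor.description]
    return columns, cursor.fetchall()


# In-memory stand-in for ../data/Banks.db with placeholder values.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE "Largest_banks" '
    "(Name TEXT, MC_USD_Billion REAL, MC_EUR_Billion REAL)"
)
conn.executemany(
    'INSERT INTO "Largest_banks" VALUES (?, ?, ?)',
    [("Bank A", 100.0, 93.0), ("Bank B", 50.0, 46.5)],
)

columns, rows = preview_table(conn, "Largest_banks")
conn.close()

print(columns)
for row in rows:
    print(row)
```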
This repository contains my learning project from the IBM Python Project for Data Engineering on Coursera. The goal of this project is to build an ETL (Extract–Transform–Load) pipeline that processes ranking data for the world's largest banks by market capitalization (2023).
The pipeline performs:
- Extraction from an HTML page
- Transformation through exchange-rate conversion and table standardization
- Loading into a SQLite database
- Logging of each important step
This project demonstrates my understanding of basic Data Engineering concepts: data ingestion, data transformation, database loading, and process logging.
- Python 3
- Pandas
- SQLite
- NumPy
- Requests
- BeautifulSoup (bs4)
- CSV
- logging module
data-engineering-learning-project-banks-etl/
│
├── data/ # Raw dataset and output database
├── src/ # ETL Python script
├── logs/ # Log history of the ETL run
├── diagrams/ # Architecture diagram
└── README.md # Project documentation
- Clone the repository:
  git clone https://github.com/<your-username>/data-engineering-learning-project-banks-etl.git
- Navigate to the project:
  cd data-engineering-learning-project-banks-etl/src
- Install the required dependencies:
  pip install -r <dependency_name>
- Run the ETL pipeline:
  python banks_project.py
- Check the output:
  - SQLite database: ../data/Banks.db
  - Logs: ../logs/code_log.txt
The database ../data/Banks.db contains a table that has been cleaned and converted and is ready for further data analysis.
This project is licensed under the MIT License. Feel free to use it and build on it for learning purposes.
Muhammad Rafi Akbar, S.Kom.
Jr. Data Analyst | Aspiring Data Engineer
LinkedIn
GitHub
Personal Web
