Learning project from IBM Data Engineering Certification. A complete Python ETL pipeline that extracts, transforms, and loads bank market cap data into a SQLite database.

murafba/data-engineering-learning-project-banks-etl

📌 ETL Pipeline: Banks by Market Capitalization

Data Engineering Learning Project — Python ETL Pipeline

Learning project from IBM Data Engineering Certification.

ETL Pipeline Learning Project: Bank Rankings by Market Capitalization

Data as of August 2023

Python ETL SQLite IBM Pandas CSV Learning License: MIT

📊 Dataset

Check the dataset here

🇬🇧 English Version

📘 Project Overview

This repository contains my learning project from the IBM Python Project for Data Engineering. The goal of this project is to build a complete ETL (Extract–Transform–Load) pipeline that processes a ranking of the world’s largest banks by market capitalization as of August 2023.

The pipeline performs:

  • Extraction from an HTML page
  • Transformation including currency conversion and table standardization
  • Loading into a SQLite database
  • Logging of each major step
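The extract and transform steps listed above can be sketched roughly as follows. This is a minimal illustration, not the project's script: it uses the standard library's `html.parser` in place of BeautifulSoup and plain dictionaries in place of a Pandas DataFrame so that it runs without extra installs. The sample HTML, the column names, the helper names (`TableRowParser`, `extract`, `transform`), and the exchange rates are all made-up placeholders.

```python
from html.parser import HTMLParser

# Placeholder page fragment standing in for the real source table.
SAMPLE_HTML = """
<table>
  <tr><th>Rank</th><th>Bank name</th><th>Market cap (US$ billion)</th></tr>
  <tr><td>1</td><td>Bank A</td><td>100.0</td></tr>
  <tr><td>2</td><td>Bank B</td><td>250.5</td></tr>
</table>
"""

# Hypothetical rates (units of currency per 1 USD); not project data.
EXCHANGE_RATES = {"GBP": 0.8, "EUR": 0.93, "INR": 82.95}


class TableRowParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr":
            if self._row:  # header rows contain only <th>, so they stay empty
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()


def extract(html):
    """Extract: turn table rows into dicts with a name and a USD market cap."""
    parser = TableRowParser()
    parser.feed(html)
    return [
        {"Name": cells[1], "MC_USD_Billion": float(cells[2])}
        for cells in parser.rows
    ]


def transform(rows, rates):
    """Transform: add one converted market-cap column per target currency."""
    converted = []
    for row in rows:
        new_row = dict(row)  # copy so the extracted rows stay untouched
        for currency, rate in rates.items():
            new_row[f"MC_{currency}_Billion"] = round(row["MC_USD_Billion"] * rate, 2)
        converted.append(new_row)
    return converted


for bank in transform(extract(SAMPLE_HTML), EXCHANGE_RATES):
    print(bank)
```

Keeping `extract` and `transform` as separate functions mirrors the ETL stages, so each one can be logged and tested on its own.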

This learning project demonstrates my understanding of foundational Data Engineering concepts: data ingestion, transformation logic, SQL database loading, and operational logging.
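To make the loading and logging ideas concrete, here is a small sketch that uses only the standard library's `sqlite3` and `logging` modules. The helper names (`make_logger`, `load_rows`), the table name `banks`, the log format, and the sample rows are assumptions for illustration; the actual script may be organized differently, for example by loading a Pandas DataFrame.

```python
import logging
import os
import sqlite3
import tempfile


def make_logger(log_path):
    """Create a logger that appends timestamped lines to log_path."""
    logger = logging.getLogger("etl_sketch")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers when re-run
    handler = logging.FileHandler(log_path)
    handler.setFormatter(logging.Formatter("%(asctime)s : %(message)s"))
    logger.addHandler(handler)
    return logger


def load_rows(conn, table_name, rows):
    """Replace table_name with the given (name, mc_usd_billion) rows."""
    conn.execute(f'DROP TABLE IF EXISTS "{table_name}"')
    conn.execute(f'CREATE TABLE "{table_name}" (Name TEXT, MC_USD_Billion REAL)')
    conn.executemany(f'INSERT INTO "{table_name}" VALUES (?, ?)', rows)
    conn.commit()


# Demo against a temporary log file and an in-memory database,
# so nothing in the repository is touched.
log_path = os.path.join(tempfile.mkdtemp(), "code_log.txt")
logger = make_logger(log_path)
conn = sqlite3.connect(":memory:")

logger.info("Load started")
load_rows(conn, "banks", [("Bank A", 100.0), ("Bank B", 250.5)])
logger.info("Load finished")

print(conn.execute('SELECT COUNT(*) FROM "banks"').fetchone()[0])
```

Logging one line before and after each stage gives a simple audit trail of when a run started, how far it got, and where it stopped if something failed.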

🧱 Project Architecture (ETL Flow)

etl_architecture

⚙️ Technologies Used

  • Python 3
  • Pandas
  • SQLite
  • NumPy
  • Requests
  • BeautifulSoup (bs4)
  • CSV
  • logging module

📁 Repository Structure

data-engineering-learning-project-banks-etl/
│
├── data/        # Raw dataset and output database
├── src/         # ETL Python script
├── logs/        # Log history of the ETL run
├── diagrams/    # Architecture diagram
└── README.md    # Project documentation

▶️ How to Run the Project

  1. Clone the repository: git clone https://github.com/<your-username>/data-engineering-learning-project-banks-etl.git
  2. Navigate to the source folder: cd data-engineering-learning-project-banks-etl/src
  3. Install the required dependencies: pip install <dependency_name> (for example, pip install pandas numpy requests beautifulsoup4)
  4. Run the ETL pipeline: python banks_project.py
  5. Check the output:
    • SQLite database: ../data/Banks.db
    • Logs: ../logs/code_log.txt
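One way to check the database output is to list its tables and row counts from Python. The sketch below reads table names from `sqlite_master`, so it does not assume what the pipeline calls its table. The helper name `summarize_database` is hypothetical, and the path is the one given above, relative to `src/`.

```python
import os
import sqlite3

DB_PATH = "../data/Banks.db"  # path from the steps above, relative to src/


def summarize_database(db_path):
    """Return {table_name: row_count} for every table in the SQLite file."""
    conn = sqlite3.connect(db_path)
    try:
        tables = [
            name
            for (name,) in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        return {
            name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in tables
        }
    finally:
        conn.close()


# sqlite3.connect() creates an empty file when the path is missing,
# so check for the file first instead of connecting blindly.
if os.path.exists(DB_PATH):
    print(summarize_database(DB_PATH))
else:
    print(f"{DB_PATH} not found; run the pipeline first")
```

A table with zero rows, or no tables at all, suggests the load step did not complete; the log file is the next place to look.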

📊 Output

The final database ../data/Banks.db contains a fully transformed table of bank market capitalization data, with the market cap values converted into multiple currencies.


🇮🇩 Indonesian Version

📘 Project Summary

This repository contains my learning project from the IBM Python Project for Data Engineering on Coursera. The goal of the project is to build an ETL (Extract–Transform–Load) pipeline that processes ranking data for the world's largest banks by market capitalization (2023).

The pipeline performs:

  • Extraction from an HTML page
  • Transformation through currency conversion and table standardization
  • Loading into a SQLite database
  • Logging of each important step

This project demonstrates my understanding of basic Data Engineering concepts: data ingestion, data transformation, database loading, and process logging.

🧱 Project Architecture (ETL Flow)

etl_architecture

⚙️ Technologies Used

  • Python 3
  • Pandas
  • SQLite
  • NumPy
  • Requests
  • BeautifulSoup (bs4)
  • CSV
  • logging module

📁 Repository Structure

data-engineering-learning-project-banks-etl/
│
├── data/        # Raw dataset and output database
├── src/         # ETL Python script
├── logs/        # Log history of the ETL run
├── diagrams/    # Architecture diagram
└── README.md    # Project documentation

▶️ How to Run the Project

  1. Clone the repository: git clone https://github.com/<your-username>/data-engineering-learning-project-banks-etl.git
  2. Navigate to the source folder: cd data-engineering-learning-project-banks-etl/src
  3. Install the required dependencies: pip install <dependency_name> (for example, pip install pandas numpy requests beautifulsoup4)
  4. Run the ETL pipeline: python banks_project.py
  5. Check the output:
    • SQLite database: ../data/Banks.db
    • Logs: ../logs/code_log.txt

📊 Output

The database ../data/Banks.db contains a table that has been cleaned and converted, and is ready for further data analysis.


📜 License

This project is released under the MIT License. Feel free to use and extend it for learning purposes.

👨 Author

Muhammad Rafi Akbar, S.Kom.
Jr. Data Analyst | Aspiring Data Engineer
LinkedIn
GitHub
Personal Web
