Skip to content

SQL-based fraud analysis of a 6M+ financial transaction dataset using PostgreSQL. The project follows audit-style workflows, preserves raw data integrity, applies performance indexing, and identifies systemic fraud patterns, balance inconsistencies, and high-risk transaction channels.

Notifications You must be signed in to change notification settings

rajab-bett-analytics/financial-transaction-fraud-analysis-sql

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 

Repository files navigation

Financial Transaction Fraud Analysis (SQL)

Overview

This project analyzes a large-scale financial transaction dataset to identify confirmed fraud patterns and systemic control weaknesses using SQL. The analysis follows professional banking and audit workflows, emphasizing data integrity, reproducibility, and performance-aware design.

All findings are reproducible through documented SQL scripts without storing or exposing sensitive query outputs.


Objectives

  • Analyze confirmed fraudulent transactions
  • Identify high-risk transaction types
  • Detect balance and ledger inconsistencies
  • Surface systemic fraud patterns and control gaps

Dataset

  • Simulated financial transaction data (PaySim-style)
  • Approximately 6.3 million transaction records
  • Each row represents a single system processing step
  • Includes transaction metadata, balances, counterparties, and fraud indicators

Dataset Access

The original dataset file is not included in this repository due to GitHub file size limitations.
The data is based on the PaySim financial transaction simulation dataset and can be obtained externally.

All analysis logic is fully reproducible using the provided SQL scripts.

Technical Approach

  • PostgreSQL used for all analysis
  • Raw data preserved without modification
  • Clean analytical view created for reproducibility
  • Indexes applied to optimize performance on large datasets
  • SQL scripts serve as the authoritative analytical record

Project Structure

financial-transaction-analysis/ ├── data/ │ └── paysim1.csv ├── sql/ │ ├── 01_create_views.sql │ ├── 02_indexes.sql │ ├── 03_analysis.sql │ ├── 04_fraud_analysis.sql │ └── 05_audit_red_flags.sql ├── reports/ │ └── insights.md └── README.md


Key Findings

  • Fraud is highly concentrated in TRANSFER and CASH_OUT transactions
  • Confirmed fraud frequently violates basic balance integrity rules
  • Fraud occurs in repeated high-value bursts rather than isolated events
  • Funds are concentrated into a small number of destination accounts

Skills Demonstrated

  • Advanced SQL querying and aggregation
  • Performance optimization using indexes
  • Fraud and risk analysis
  • Audit-style documentation
  • Large dataset handling and governance awareness

Reproducibility and Data Governance

Query outputs are not stored or exported. All findings are reproducible by executing the documented SQL scripts against the dataset. This approach reflects standard practices in regulated financial and audit environments.


Methodology

  1. Load raw data into PostgreSQL
  2. Create a cleaned analytical view
  3. Build performance indexes
  4. Conduct baseline analysis
  5. Explore fraud patterns and red flags
  6. Document findings and recommendations

Getting Started

  1. Load data/paysim1.csv into PostgreSQL as transactions_raw
  2. Run 01_create_views.sql in pgAdmin
  3. Run index scripts (02_indexes.sql)
  4. Execute analysis scripts in order

Disclaimer

This project uses simulated data for educational and portfolio purposes only.

About

SQL-based fraud analysis of a 6M+ financial transaction dataset using PostgreSQL. The project follows audit-style workflows, preserves raw data integrity, applies performance indexing, and identifies systemic fraud patterns, balance inconsistencies, and high-risk transaction channels.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published