This project analyzes a large-scale financial transaction dataset to identify confirmed fraud patterns and systemic control weaknesses using SQL. The analysis follows professional banking and audit workflows, emphasizing data integrity, reproducibility, and performance-aware design.
All findings are reproducible through documented SQL scripts without storing or exposing sensitive query outputs.
- Analyze confirmed fraudulent transactions
- Identify high-risk transaction types
- Detect balance and ledger inconsistencies
- Surface systemic fraud patterns and control gaps
- Simulated financial transaction data (PaySim-style)
- Approximately 6.3 million transaction records
- Each row represents a single system processing step
- Includes transaction metadata, balances, counterparties, and fraud indicators
The original dataset file is not included in this repository due to GitHub file size limitations.
The data is based on the PaySim financial transaction simulation dataset and can be obtained externally.
All analysis logic is fully reproducible using the provided SQL scripts.
- PostgreSQL used for all analysis
- Raw data preserved without modification
- Clean analytical view created for reproducibility
- Indexes applied to optimize performance on large datasets
- SQL scripts serve as the authoritative analytical record
financial-transaction-analysis/ ├── data/ │ └── paysim1.csv ├── sql/ │ ├── 01_create_views.sql │ ├── 02_indexes.sql │ ├── 03_analysis.sql │ ├── 04_fraud_analysis.sql │ └── 05_audit_red_flags.sql ├── reports/ │ └── insights.md └── README.md
- Fraud is highly concentrated in TRANSFER and CASH_OUT transactions
- Confirmed fraud frequently violates basic balance integrity rules
- Fraud occurs in repeated high-value bursts rather than isolated events
- Funds are concentrated into a small number of destination accounts
- Advanced SQL querying and aggregation
- Performance optimization using indexes
- Fraud and risk analysis
- Audit-style documentation
- Large dataset handling and governance awareness
Query outputs are not stored or exported. All findings are reproducible by executing the documented SQL scripts against the dataset. This approach reflects standard practices in regulated financial and audit environments.
- Load raw data into PostgreSQL
- Create a cleaned analytical view
- Build performance indexes
- Conduct baseline analysis
- Explore fraud patterns and red flags
- Document findings and recommendations
- Load
data/paysim1.csvinto PostgreSQL astransactions_raw - Run
01_create_views.sqlin pgAdmin - Run index scripts (
02_indexes.sql) - Execute analysis scripts in order
This project uses simulated data for educational and portfolio purposes only.