Welcome to the Real Estate Market Analysis project!
This project explores property and customer datasets to uncover trends in:
- Property prices 🏷️
- Customer demographics 👥
- Building types 🏢
- Country & State segmentation 🌍
- Sales patterns over time 📅
The workflow covers data cleaning → merging → analysis → visualization → insights.
| Feature | Description |
|---|---|
| 🔧 Data Cleaning | Handle missing values, fix inconsistent entries, and format dates |
| 🔗 Dataset Merging | Merge the property and customer datasets (267×19) on `customer_id` |
| 📊 Descriptive Statistics | Summary stats by building, state, and country |
| 🧩 Segmentation | Building type segmentation, State Pareto analysis |
| 📈 Visualizations | Age histograms, deal satisfaction by country, Pareto charts, revenue trends, stacked area charts |
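
The cleaning and merging steps in the table above could look roughly like the sketch below. Only `customer_id` comes from this README; the sample rows, the other column names, and the choice of a left join (so that unsold properties stay in the result) are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample data; every column name except customer_id is an assumption.
properties = pd.DataFrame({
    "property_id": [1, 2, 3, 4],
    "building": [1, 2, 4, 4],
    "price": [250000.0, 310000.0, 275000.0, 420000.0],
    "date_sale": ["2005-03-01", "2006-07-15", "", "2007-01-20"],
    "customer_id": [" C001", "C002", None, "C003 "],
})
customers = pd.DataFrame({
    "customer_id": ["C001", "C002", "C003"],
    "country": ["USA", "USA ", "Canada"],
    "state": ["California", "Nevada", None],
})

# Cleaning: trim stray whitespace so variants such as "USA " collapse into one label
# and so the join keys match on both sides.
customers["country"] = customers["country"].str.strip()
properties["customer_id"] = properties["customer_id"].str.strip()

# Cleaning: parse dates; empty or malformed strings become NaT instead of raising.
properties["date_sale"] = pd.to_datetime(properties["date_sale"], errors="coerce")

# Merging: a left join keeps properties without a customer (assumed to be unsold).
merged = properties.merge(customers, on="customer_id", how="left")

print(merged.shape)
print(merged["country"].value_counts(dropna=False))
```

With real data, checking `merged.shape` against the expected 267×19 is a quick way to confirm the join did not drop or duplicate rows.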
- Python 3
- Pandas
- NumPy
- Matplotlib / Seaborn
- Jupyter Notebook
| Finding | Conclusion |
|---|---|
| Building Type Distribution | Building 4 properties are larger, costlier, and have the highest satisfaction |
| Country Breakdown | "USA" appeared in duplicated text variants; consolidated via string cleaning |
| State Pareto | A few states account for the majority of sales; cumulative frequency validates US-only entries |
| Age Analysis | Grouping customers into age bands shows differences in buying patterns |
| Price Bands | Ten price intervals show how sold and unsold properties are distributed |
| Age–Price Relationship | Weak-to-moderate relationship observed via covariance and correlation |
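
As a rough illustration of the price-band and age–price findings above, the sketch below bins prices into ten equal-width intervals, counts sold vs unsold properties per band, and computes covariance and Pearson correlation between age and price. The column names (`price`, `sold`, `age`) and all values are hypothetical, so the numbers it prints say nothing about the real dataset.

```python
import pandas as pd

# Hypothetical data; column names and values are illustrative only.
df = pd.DataFrame({
    "price": [100, 150, 200, 250, 300, 350, 400, 450, 500, 550],  # in $1,000s
    "sold": [1, 1, 0, 1, 1, 0, 1, 0, 1, 0],
    "age": [25, 31, None, 38, 42, None, 47, None, 55, None],
})

# Split the observed price range into 10 equal-width intervals.
df["price_band"] = pd.cut(df["price"], bins=10)

# Count sold (1) vs unsold (0) properties in each band.
band_counts = pd.crosstab(df["price_band"], df["sold"])

# Age-price relationship, restricted to rows where the buyer's age is known.
buyers = df.dropna(subset=["age"])
covariance = buyers["age"].cov(buyers["price"])
correlation = buyers["age"].corr(buyers["price"])  # Pearson by default

print(band_counts)
print(f"cov={covariance:.1f}, corr={correlation:.2f}")
```

Covariance depends on the units of both columns, so the correlation coefficient is the easier of the two to interpret when judging how strong the relationship is.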
Charts produced:
- Deal Satisfaction by Country
- Age Distribution Histogram
- US Segmentation by State Pareto
- Stacked Area Chart V2
- Stacked Area Chart
- Total Revenue per Year
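
The state Pareto chart listed above needs a frequency table sorted in descending order plus a cumulative percentage. A minimal sketch of that table follows; the state labels are made up, and the 80% cut-off is the conventional Pareto threshold, assumed here rather than taken from the project.

```python
import pandas as pd

# Hypothetical state labels for US customers; values are illustrative only.
states = pd.Series(
    ["California"] * 5 + ["Nevada"] * 3 + ["Oregon"] * 1 + ["Arizona"] * 1,
    name="state",
)

# value_counts sorts descending, which is the ordering a Pareto chart needs.
pareto = states.value_counts().to_frame(name="frequency")
pareto["cumulative_pct"] = pareto["frequency"].cumsum() * 100 / pareto["frequency"].sum()

# States needed to reach the assumed 80% cut-off: keep each state whose
# preceding cumulative share is still below 80.
below_cutoff = pareto["cumulative_pct"].shift(fill_value=0) < 80
top_states = pareto[below_cutoff].index.tolist()

print(pareto)
print(top_states)
```

Plotting `frequency` as bars and `cumulative_pct` as a line on a secondary axis gives the usual Pareto layout.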






