Non-Immigrant Visa Analysis and End-to-End H-1B Visa Prediction using dbt (Data Build Tool) and Python
For an interactive look at Non-Immigrant Visa issuances, visit My Tableau Profile.
This repository contains all the work and experiments carried out to build a robust prediction model; the Gateway to Opportunity PDF file contains the details of those experiments. All folders and files outside the project_walkthrough folder are part of the initial experiments conducted to choose a suitable classification method. If you are interested in the actual workflow and final implementation, please refer to the project_walkthrough folder.
The project_walkthrough folder contains the finalized workflow, organized in the following structure:
- directory_structure.txt: Contains a detailed structure of the project.
- perm_23_q4.xlsx: The raw dataset used for data processing and transformation.
- DBT: The dbt configurations and SQL models used for data transformations.
  - analyses: Analysis scripts (currently empty).
  - logs: Logs generated during dbt runs.
  - macros: Reusable macros for data transformations.
  - models: SQL models for staging and transforming the data.
    - staging: Intermediate transformations.
      - binning.sql: Bins the data.
      - education_encoding.sql: Encodes educational information.
      - employer_age.sql: Calculates employer age.
      - feature_selection.sql: Selects relevant features.
    - transforming: Final transformations.
      - convert_yes_no.sql: Converts binary columns from yes/no to 1/0.
      - dim_transformed.sql: Combines the binned and transformed data.
      - target_class.sql: Defines the target class for classification.
  - seeds: Seed files for dbt (currently empty).
  - snapshots: Captures data at a specific point in time (currently empty).
  - target: Compiled models and artifacts generated by dbt runs.
  - tests: Test cases for validating the dbt transformations.
- dbt.log: Logs from dbt runs for debugging and monitoring.
- EDA_transformation.ipynb: Exploratory data analysis and initial transformations.
- prediction.ipynb: Predictive analysis and model evaluation.
- load_xlsx_to_postgres.py: Script to load the raw Excel data into PostgreSQL (a minimal sketch follows this listing).
- state_country_transform.ipynb: Additional transformations for state and country data.
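As a rough illustration of the loading step, here is a minimal sketch of what load_xlsx_to_postgres.py might look like; the connection string, database name, and the perm_raw table name are assumptions, so adjust them to your environment:

```python
# Minimal sketch: load the raw PERM spreadsheet into PostgreSQL so the
# dbt models have a source table to select from. Credentials and table
# name below are placeholders.
import pandas as pd
from sqlalchemy import create_engine

df = pd.read_excel("perm_23_q4.xlsx")  # the raw dataset in this folder

# Requires a Postgres driver such as psycopg2; replace user/password/db.
engine = create_engine("postgresql://user:password@localhost:5432/visa_db")

df.to_sql("perm_raw", engine, if_exists="replace", index=False)
print(f"Loaded {len(df)} rows into perm_raw")
```

Once the table exists, the dbt staging models can select from it.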
The classification model was trained using SMOTE sampling to handle class imbalance and a Random Forest Classifier for prediction, and it was evaluated using precision, recall, and F1-score. Here are the key takeaways (a minimal code sketch of this setup follows the summary below):
| Metric | Class 0 | Class 1 | Accuracy | Macro Avg | Weighted Avg |
|---|---|---|---|---|---|
| Precision | 0.93 | 0.96 | | 0.95 | 0.95 |
| Recall | 0.96 | 0.93 | | 0.95 | 0.95 |
| F1-Score | 0.95 | 0.95 | 0.95 | 0.95 | 0.95 |
| Support | 18514 | 18631 | 37145 | 37145 | 37145 |
Accuracy: 95%
- High Accuracy: The model achieved an accuracy of 95%, indicating that it correctly classified the vast majority of instances.
- Balanced Performance Across Classes: Both classes (0 and 1) demonstrated similarly high performance metrics:
  - Class 0: Precision = 0.93, Recall = 0.96, F1-score = 0.95
  - Class 1: Precision = 0.96, Recall = 0.93, F1-score = 0.95

  This balanced performance suggests that the model does not favor one class over the other, indicating that SMOTE sampling effectively addressed the class imbalance.
- Macro and Weighted Averages: Both the macro average and weighted average of precision, recall, and F1-score are 0.95, confirming consistent performance across classes (see the short computation below).
- Generalization and Robustness: The high accuracy and balanced metrics indicate that the model generalizes well to new data without significant bias towards either class.
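For concreteness, the macro average is the unweighted mean of the per-class scores, while the weighted average weights each class by its support; with nearly equal supports here, the two coincide. Using the precision values from the table:

```python
# Macro vs. weighted average precision, from the report table above.
support_0, support_1 = 18514, 18631
prec_0, prec_1 = 0.93, 0.96

macro_avg = (prec_0 + prec_1) / 2
weighted_avg = (support_0 * prec_0 + support_1 * prec_1) / (support_0 + support_1)

print(f"macro avg:    {macro_avg:.3f}")     # ~0.945, reported as 0.95
print(f"weighted avg: {weighted_avg:.3f}")  # ~0.945, reported as 0.95
```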
In summary, the combination of SMOTE sampling and the Random Forest Classifier resulted in a robust model with high accuracy and balanced classification performance.
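As promised above, here is a minimal sketch of that setup. The file path, column names, and the exact evaluation protocol are assumptions (resampling only the training split is one common choice); prediction.ipynb contains the actual workflow, and its exact numbers may differ.

```python
# Minimal sketch: SMOTE to balance the classes, then a Random Forest.
# Names below (transformed_features.csv, target_class) are assumptions.
import pandas as pd
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

df = pd.read_csv("transformed_features.csv")  # hypothetical export of the dbt output
X = df.drop(columns=["target_class"])
y = df["target_class"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Oversample the minority class in the training data only, so the test
# set keeps the original class distribution.
X_res, y_res = SMOTE(random_state=42).fit_resample(X_train, y_train)

clf = RandomForestClassifier(random_state=42)
clf.fit(X_res, y_res)

# Per-class precision, recall, and F1, as in the table above.
print(classification_report(y_test, clf.predict(X_test)))
```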
- Clone the repository and navigate to the project_walkthrough folder.
- Load the data using the load_xlsx_to_postgres.py script.
- Run the dbt models (e.g. dbt run from inside the DBT folder) to perform the data transformations.
- Open the notebooks for EDA and predictions.
For any questions, further clarifications, improvements, or contributions, feel free to reach out!