This project focuses on building a recommendation system for a clothing and fashion store based on customers' previous purchase history. Two different models have been implemented: an Xception model for image data and a BERT model for textual descriptions.
- articles.csv: Information about articles in the store.
- customers.csv: Information about customers.
- transactions_train.csv: Information about customer transactions.
- Images: Images for every article (product).
- 01.EDA.ipynb: Code for exploratory data analysis.
- 02.Model_Building_Xceptionmodel.ipynb: Implementation of the recommendation system using the Xception model.
- 03.Model_Building_Bertmodel.ipynb: Implementation of the recommendation system using the BERT model.
- Various images generated during EDA for better visualization and understanding.
Summary Statistics:
- Number of unique articles: 105,542
- Number of unique products: 47,224
- Number of unique product types: 132
- Number of unique product groups: 19
- Number of unique physical appearances: 30
- ...
Missing Values:
- About 0.4% of articles have no description, but there are no other missing values.
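The unique-count and missing-value checks above can be reproduced with a few lines of pandas. The column names (`article_id`, `product_code`, `product_type_name`, `detail_desc`) are assumptions based on the usual schema of this dataset; adjust them if your `articles.csv` differs. The tiny in-memory frame only demonstrates the function; in the notebook the input would come from `pd.read_csv("articles.csv")`.

```python
import pandas as pd

def eda_summary(articles: pd.DataFrame) -> dict:
    """Unique counts and the share of articles with no description.

    Column names assume the usual articles.csv schema; adjust to your file.
    """
    return {
        "unique_articles": articles["article_id"].nunique(),
        "unique_products": articles["product_code"].nunique(),
        "unique_product_types": articles["product_type_name"].nunique(),
        "missing_desc_share": articles["detail_desc"].isna().mean(),
    }

# Demo with a tiny frame (real use: articles = pd.read_csv("articles.csv"))
demo = pd.DataFrame({
    "article_id": [1, 2, 3, 4],
    "product_code": [10, 10, 20, 30],
    "product_type_name": ["Trousers", "Trousers", "Dress", "Dress"],
    "detail_desc": ["Slim fit", None, "Midi dress", "A-line"],
})
print(eda_summary(demo))
```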
Volume Analysis:
- Products categorized into groups visualized using bar and pie charts.
- Age distribution of customers.
- Fashion news frequency distribution among customers.
- Visualization of unsold articles.
- Average spending per day per customer.
- ...
Top Performing and Least Performing Products:
- Identification of top 100 products generating the most earnings.
- Identification of worst-performing products (unsold and sold once).
Customer Analysis:
- Distribution of purchased quantity by customers.
- Analysis of purchased quantities based on customer age groups and fashion news frequency.
This section focuses on implementing a recommendation system using the Xception model. Recommendations are based on customer preferences inferred from historical transactions.
- Load articles, customer data, and transaction data.
- Retrieve image paths for articles.
- Extract feature embeddings using the Xception model for images associated with articles in the transaction data.
- Create a dataframe with article IDs and their corresponding image features.
- Extract article embeddings for articles in the dataset.
- Calculate cosine similarity between article embeddings.
Generate recommendations for each customer in the test set and visualize the input images along with the recommended items.
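The embedding-and-similarity steps above can be sketched with NumPy. The Xception feature extraction itself (roughly `tf.keras.applications.Xception(weights="imagenet", include_top=False, pooling="avg")` applied to each article image) appears only as a comment here; the `recommend` helper, the demo IDs, and the random vectors standing in for real image features are illustrative assumptions, not the notebook's exact code.

```python
import numpy as np

# In the notebook, per-article image embeddings come from Xception, roughly:
#   backbone = tf.keras.applications.Xception(
#       weights="imagenet", include_top=False, pooling="avg")
#   embeddings = backbone.predict(article_images)   # shape (n_articles, 2048)

def recommend(article_ids, embeddings, purchased_idx, k=5):
    """Return the k articles most similar (cosine) to a purchased article."""
    # L2-normalise so that dot products equal cosine similarities
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    sims = unit @ unit[purchased_idx]   # cosine similarity to the query article
    sims[purchased_idx] = -np.inf       # exclude the purchased item itself
    top = np.argsort(sims)[::-1][:k]
    return [article_ids[i] for i in top]

# Demo with random vectors standing in for Xception features
rng = np.random.default_rng(0)
emb = rng.normal(size=(6, 8))
emb[3] = emb[0] + 0.01 * rng.normal(size=8)  # article 3 nearly identical to article 0
ids = ["a0", "a1", "a2", "a3", "a4", "a5"]
print(recommend(ids, emb, purchased_idx=0, k=2))
```

Normalising once and using dot products keeps the similarity computation a single matrix-vector product, which scales better than pairwise loops over the full catalogue.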
This Xception-based recommendation system aims to enhance customer satisfaction and increase sales through personalized product suggestions grounded in customers' purchase histories. The combination of image embeddings and cosine similarity drives the recommendation engine.
This part involves implementing a recommendation system using a BERT-based model. Recommendations are based on the textual descriptions of articles.
- Load articles, customer data, and transaction data.
- Extract unique customer IDs from the transactions.
- Load the BERT model and tokenizer.
- Create a database of word embeddings for article descriptions using BERT.
- Merge BERT word embeddings with image data and article information.
- Generate recommendations for a list of article descriptions.
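The text side can be sketched in the same spirit. The real notebook mean-pools BERT token embeddings (shown only as a comment, roughly `model(**tokenizer(desc, return_tensors="pt")).last_hidden_state.mean(dim=1)`); to keep the sketch self-contained, a toy vocabulary of random vectors stands in for BERT, and `embed`, `most_similar`, and the demo catalogue are illustrative assumptions.

```python
import numpy as np

# In the notebook, description embeddings come from BERT, roughly:
#   from transformers import BertTokenizer, BertModel
#   tok = BertTokenizer.from_pretrained("bert-base-uncased")
#   model = BertModel.from_pretrained("bert-base-uncased")
#   out = model(**tok(desc, return_tensors="pt"))
#   vec = out.last_hidden_state.mean(dim=1)   # mean-pool token embeddings
# Here a toy word-vector table mimics that so the sketch runs on its own.

def embed(description, vectors):
    """Mean-pool per-token vectors into one description embedding."""
    tokens = description.lower().split()
    return np.mean([vectors[t] for t in tokens if t in vectors], axis=0)

def most_similar(query, catalogue, vectors):
    """Return the catalogue description closest to the query (cosine)."""
    q = embed(query, vectors)
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max(catalogue, key=lambda d: cos(q, embed(d, vectors)))

rng = np.random.default_rng(1)
vocab = ["striped", "cotton", "shirt", "denim", "jeans", "slim"]
vectors = {w: rng.normal(size=16) for w in vocab}
catalogue = ["striped cotton shirt", "slim denim jeans"]
print(most_similar("cotton shirt", catalogue, vectors))
```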
This BERT model-based recommendation system leverages word embeddings to capture the semantic meaning of article descriptions. The recommendation engine aims to enhance customer satisfaction by providing personalized product suggestions based on textual information. The combination of image and text-based recommendation systems can offer a comprehensive and tailored shopping experience for customers.
├── LICENSE
├── Makefile           <- Makefile with commands like `make data` or `make train`
├── README.md          <- The top-level README for developers using this project.
├── data
│   ├── external       <- Data from third party sources.
│   ├── interim        <- Intermediate data that has been transformed.
│   ├── processed      <- The final, canonical data sets for modeling.
│   └── raw            <- The original, immutable data dump.
│
├── docs               <- A default Sphinx project; see sphinx-doc.org for details
│
├── models             <- Trained and serialized models, model predictions, or model summaries
│
├── notebooks          <- Jupyter notebooks. Naming convention is a number (for ordering),
│                         the creator's initials, and a short `-` delimited description, e.g.
│                         `1.0-jqp-initial-data-exploration`.
│
├── references         <- Data dictionaries, manuals, and all other explanatory materials.
│
├── reports            <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures        <- Generated graphics and figures to be used in reporting
│
├── requirements.txt   <- The requirements file for reproducing the analysis environment, e.g.
│                         generated with `pip freeze > requirements.txt`
│
├── setup.py           <- makes project pip installable (pip install -e .) so src can be imported
├── src                <- Source code for use in this project.
│   ├── __init__.py    <- Makes src a Python module
│   │
│   ├── data           <- Scripts to download or generate data
│   │   └── make_dataset.py
│   │
│   ├── features       <- Scripts to turn raw data into features for modeling
│   │   └── build_features.py
│   │
│   ├── models         <- Scripts to train models and then use trained models to make
│   │   │                 predictions
│   │   ├── predict_model.py
│   │   └── train_model.py
│   │
│   └── visualization  <- Scripts to create exploratory and results oriented visualizations
│       └── visualize.py
│
└── tox.ini            <- tox file with settings for running tox; see tox.readthedocs.io
Project based on the cookiecutter data science project template. #cookiecutterdatascience