This is a book recommendation system built on book rating data from the GoodReads_100k dataset. The dataset contains 100k books.
recommendation_data_cleaning.ipynb cleans the data by removing books with fewer than 50 ratings and users with fewer than 50 ratings. The notebook then uses a TF-IDF vectorizer and cosine similarity to measure the similarity between books. The resulting model is saved as cosine_sim_desc.pkl in the model folder, alongside final_data.csv, which contains the cleaned data (25151 books).
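The TF-IDF and cosine similarity step can be sketched roughly as follows. This is a minimal illustration on toy data, not the notebook's actual code: the `title` and `desc` column names and the sample rows are assumptions made for the example.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy data standing in for the cleaned book table.
# The "title" and "desc" column names are assumptions for this sketch.
books = pd.DataFrame({
    "title": ["Dune", "Foundation", "Emma"],
    "desc": [
        "desert planet empire spice politics",
        "galactic empire collapse science politics",
        "romance matchmaking english village society",
    ],
})

# Turn each description into a TF-IDF weighted term vector.
vectorizer = TfidfVectorizer()
tfidf_matrix = vectorizer.fit_transform(books["desc"])

# Pairwise cosine similarity between all books (n_books x n_books).
cosine_sim = cosine_similarity(tfidf_matrix)
```

Here `cosine_sim[i, j]` is the similarity between book `i` and book `j`. Books whose descriptions share terms score above zero, and books with no terms in common score zero. A matrix like this could then be written to disk with `pickle`, which is presumably how a file such as cosine_sim_desc.pkl is produced.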
app.py runs the web app, which is built with Streamlit.
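For illustration, here is one way an app could turn a precomputed similarity matrix into recommendations. The `recommend` function, its parameters, and the sample titles are hypothetical and not taken from app.py. The sketch only assumes that row `i` of the matrix holds the similarities for the `i`-th title.

```python
import numpy as np

def recommend(title, titles, sim_matrix, top_n=5):
    """Return up to top_n titles most similar to `title`, best match first.

    Hypothetical helper: `titles` is a list whose order matches the
    rows and columns of `sim_matrix`.
    """
    idx = titles.index(title)          # raises ValueError for unknown titles
    scores = sim_matrix[idx]
    order = np.argsort(scores)[::-1]   # indices sorted by descending score
    ranked = [i for i in order if i != idx]  # drop the query book itself
    return [titles[i] for i in ranked[:top_n]]

# Made-up similarity values for three placeholder books.
titles = ["Book A", "Book B", "Book C"]
sim_matrix = np.array([
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.3],
    [0.1, 0.3, 1.0],
])

result = recommend("Book A", titles, sim_matrix, top_n=2)
print(result)  # ['Book B', 'Book C']
```

In a Streamlit app, the returned list could be displayed after the user picks a title from a select box.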
- Python
- Pandas
- NumPy
- scikit-learn
- Streamlit
- Clone the repository
- Install the requirements using `pip install -r requirements.txt`
- Download the dataset from GoodReads_100k and place it in the dataset folder.
- Run recommendation_data_cleaning.ipynb to clean the data and train the model.
- Run app.py using `streamlit run app.py`. It will open the web app in the browser.