This project implements a hybrid Book Recommendation System using the Book-Crossing dataset. It suggests books to users based on their ratings and preferences by combining content-based and collaborative filtering techniques.
- Content-Based Filtering using book metadata (title, author, publisher)
- User-Based Collaborative Filtering based on similar users' preferences
- Item-Based Collaborative Filtering for users with only one rating
- Global Top Picks fallback for users with no ratings
- Source: Book-Crossing Dataset on Kaggle
- Files Used: `Books.csv`, `Ratings.csv`, `Users.csv`
- Data Loading: Downloads and loads the CSV files using `kagglehub` and `pandas`.
- Preprocessing:
  - Fills missing values and removes zero ratings.
  - Merges the `Books`, `Ratings`, and `Users` data.
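
The preprocessing steps above could be sketched roughly as follows. The column names (`ISBN`, `User-ID`, `Book-Rating`, and so on), the `preprocess` helper, and the tiny inline frames are assumptions standing in for the real CSV files; the `kagglehub` download step is left out.

```python
import pandas as pd

# Tiny stand-ins for Books.csv, Ratings.csv, and Users.csv (column names assumed).
books = pd.DataFrame({
    "ISBN": ["111", "222", "333"],
    "Book-Title": ["Dune", "Emma", "Ulysses"],
    "Book-Author": ["Frank Herbert", "Jane Austen", None],
    "Publisher": ["Ace", None, "Penguin"],
})
ratings = pd.DataFrame({
    "User-ID": [1, 1, 2, 3],
    "ISBN": ["111", "222", "111", "333"],
    "Book-Rating": [9, 0, 8, 7],
})
users = pd.DataFrame({"User-ID": [1, 2, 3], "Age": [34.0, None, 51.0]})


def preprocess(books, ratings, users):
    """Fill missing metadata, drop zero ratings, and merge the three tables."""
    books = books.fillna({"Book-Author": "Unknown", "Publisher": "Unknown"})
    # In Book-Crossing a rating of 0 marks an implicit interaction, not a score.
    ratings = ratings[ratings["Book-Rating"] > 0]
    merged = ratings.merge(books, on="ISBN", how="inner")
    return merged.merge(users, on="User-ID", how="left")


data = preprocess(books, ratings, users)
```
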
- Feature Engineering:
  - Combines `Title`, `Author`, and `Publisher`.
  - Applies `TfidfVectorizer` and normalizes vectors.
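
As a rough illustration of the feature engineering step, the sketch below joins the three metadata fields and vectorizes them. The sample books, the `metadata` column name, and the `stop_words` setting are assumptions; the project may normalize the vectors in a separate step rather than relying on the vectorizer's default.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical sample of book metadata.
books = pd.DataFrame({
    "Title": ["Dune", "Dune Messiah", "Emma"],
    "Author": ["Frank Herbert", "Frank Herbert", "Jane Austen"],
    "Publisher": ["Ace", "Ace", "Penguin"],
})

# Join the three metadata fields into one text string per book.
books["metadata"] = books["Title"] + " " + books["Author"] + " " + books["Publisher"]

# TfidfVectorizer applies L2 normalization by default (norm="l2"), so each row
# has unit length and a dot product between two rows is their cosine similarity.
vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(books["metadata"])

similarity = (tfidf @ tfidf.T).toarray()
```

Here `similarity[i, j]` is the cosine similarity between books `i` and `j`; the two Herbert titles share most of their terms, while `Emma` shares none with either.
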
- Recommendation Logic (`recommend_books(user_id, ...)`):
  - Chooses the appropriate method based on how many books the user has rated.
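
The selection rule can be pictured with a small dispatcher. This is only a sketch: `choose_strategies` and the strategy labels are hypothetical names, and how the project blends content-based and user-based results for users with several ratings is not stated here, so the last branch simply assumes both are used.

```python
def choose_strategies(num_rated):
    """Return the strategy labels to run for a user with `num_rated` ratings."""
    if num_rated == 0:
        # New user: nothing to personalize on.
        return ["global_top_picks"]
    if num_rated == 1:
        # A single rating is enough to seed item-based filtering.
        return ["item_based"]
    # Assumption: users with two or more ratings get both remaining methods.
    return ["content_based", "user_based"]
```
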
- Content-Based Filtering:
  - Creates a metadata profile using TF-IDF and scores books by cosine similarity.
  - Recommends books similar to those the user rated highly (≥ 8).
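
A minimal sketch of this content-based step, assuming the user profile is the mean TF-IDF vector of the books rated 8 or higher. The `content_recommend` helper, the sample metadata, and the averaging choice are illustrative assumptions, not necessarily what the project does.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical combined metadata strings (title, author, publisher).
metadata = [
    "Dune Frank Herbert Ace",
    "Dune Messiah Frank Herbert Ace",
    "Emma Jane Austen Penguin",
    "Persuasion Jane Austen Penguin",
]
user_ratings = {0: 9, 2: 5}  # book index -> rating for one hypothetical user

# Dense array for readability; real data would stay sparse.
vectors = TfidfVectorizer().fit_transform(metadata).toarray()


def content_recommend(vectors, user_ratings, min_rating=8, top_n=2):
    """Rank unrated books by cosine similarity to the user's liked books."""
    liked = [i for i, r in user_ratings.items() if r >= min_rating]
    if not liked:
        return []
    # Profile = mean TF-IDF vector of the highly rated books.
    profile = vectors[liked].mean(axis=0)
    # Rows are unit length, so dividing by the profile norm gives cosine similarity.
    scores = vectors @ profile / np.linalg.norm(profile)
    ranked = np.argsort(-scores)
    return [int(i) for i in ranked if int(i) not in user_ratings][:top_n]


recs = content_recommend(vectors, user_ratings)
```

With this toy data the user liked only `Dune`, so `Dune Messiah` (index 1) ranks first.
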
- User-Based Collaborative Filtering:
  - Builds a user-item matrix and finds users with similar rating behavior.
  - Suggests books liked by those similar users.
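
One way to sketch the user-based step is shown below. The use of cosine similarity between rating rows, the zero-filling of missing ratings, the neighbour count `k`, and the `user_based_recommend` helper are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical ratings table.
ratings = pd.DataFrame({
    "user": [1, 1, 2, 2, 2, 3, 3],
    "book": ["A", "B", "A", "B", "C", "A", "D"],
    "rating": [9, 8, 9, 7, 10, 2, 9],
})


def user_based_recommend(ratings, user_id, k=1, top_n=3):
    """Recommend unseen books using the k most similar users."""
    # User-item matrix; unrated cells become 0.
    matrix = ratings.pivot_table(index="user", columns="book", values="rating").fillna(0)
    values = matrix.to_numpy(dtype=float)
    norms = np.linalg.norm(values, axis=1, keepdims=True)
    unit = values / np.where(norms == 0, 1, norms)

    target = matrix.index.get_loc(user_id)
    sims = unit @ unit[target]      # cosine similarity to every user
    sims[target] = -1               # never pick the user as their own neighbour
    neighbours = np.argsort(-sims)[:k]

    # Score each book by similarity-weighted neighbour ratings.
    scores = sims[neighbours] @ values[neighbours]
    unseen = values[target] == 0
    ranked = [matrix.columns[j] for j in np.argsort(-scores)
              if unseen[j] and scores[j] > 0]
    return ranked[:top_n]


recs = user_based_recommend(ratings, user_id=1)
```

In this toy data user 2 rates books `A` and `B` much like user 1 does, so user 2's other book, `C`, is suggested.
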
- Item-Based Collaborative Filtering:
  - Triggered when the user has rated exactly one book.
  - Finds other users who liked the same book and recommends books they also enjoyed.
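
The single-rating case can be sketched as a co-occurrence lookup. Treating a rating of 8 or higher as "liked" mirrors the threshold mentioned for content-based filtering but is an assumption here, as are the `item_based_recommend` helper and the ranking by fan count.

```python
import pandas as pd

# Hypothetical ratings table; user 1 has rated exactly one book.
ratings = pd.DataFrame({
    "user": [1, 2, 2, 3, 3, 4, 4],
    "book": ["A", "A", "B", "A", "B", "A", "C"],
    "rating": [9, 9, 8, 8, 10, 3, 9],
})


def item_based_recommend(ratings, user_id, min_rating=8, top_n=3):
    """Recommend books enjoyed by other users who liked the user's only book."""
    own = ratings[ratings["user"] == user_id]
    if len(own) != 1:
        return []  # this strategy only applies to single-rating users
    seed = own["book"].iloc[0]

    liked = ratings[ratings["rating"] >= min_rating]
    # Other users who liked the seed book.
    fans = liked.loc[(liked["book"] == seed) & (liked["user"] != user_id), "user"]
    # Everything else those users liked.
    candidates = liked[liked["user"].isin(fans) & (liked["book"] != seed)]

    # Rank by how many fans liked each book, then by mean rating.
    stats = candidates.groupby("book")["rating"].agg(["count", "mean"])
    stats = stats.sort_values(["count", "mean"], ascending=False)
    return stats.index.tolist()[:top_n]


recs = item_based_recommend(ratings, user_id=1)
```

User 4 rated the seed book poorly, so that user's favourite (`C`) is not carried over; only `B` is recommended.
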
- Global Top Picks:
  - Used for new users with no prior ratings.
  - Recommends the highest-rated books overall.
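
A fallback of this kind might look like the sketch below. The `global_top_picks` helper and the `min_count` filter, which keeps a book with a single perfect rating from topping the list, are assumptions rather than something the description above specifies.

```python
import pandas as pd

# Hypothetical ratings table.
ratings = pd.DataFrame({
    "book": ["A", "A", "A", "B", "B", "C"],
    "rating": [9, 8, 10, 7, 8, 10],
})


def global_top_picks(ratings, min_count=2, top_n=2):
    """Return the books with the highest mean rating among well-rated titles."""
    stats = ratings.groupby("book")["rating"].agg(["mean", "count"])
    popular = stats[stats["count"] >= min_count]
    return popular.sort_values("mean", ascending=False).head(top_n).index.tolist()


picks = global_top_picks(ratings)
```
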