The Movie Recommendation System is a project designed to suggest movies to users based on their preferences. It uses a precomputed similarity matrix and a dataset of movies to provide personalized recommendations through an interactive web interface built with Streamlit. Additionally, the project includes a Jupyter Notebook for data preprocessing and model creation.
- Personalized Recommendations: Suggests movies similar to the one selected by the user.
- Interactive Interface: Users can select a movie from a dropdown and get recommendations instantly.
- Streamlit Integration: A lightweight and user-friendly web application framework.
- Data Preprocessing Notebook: A detailed Jupyter Notebook for merging, cleaning, and preparing the dataset for recommendations.
- Programming Language: Python
- Libraries:
  - Streamlit
  - Pandas
  - NumPy
  - Scikit-learn
  - NLTK
  - Pickle
- Data: TMDB 5000 Movies and Credits datasets.
- Clone the repository:
git clone https://github.com/your-username/Movie_Recommendation_System.git
- Navigate to the project directory:
cd Movie_Recommendation_System
- Install the required dependencies:
pip install -r requirements.txt
- Ensure the required .pkl files (movie_dict.pkl and similarity.pkl) are present in the project directory. If they are missing, generate them by running the Jupyter Notebook.
- Run the application:
streamlit run 1.py
- Open your browser and navigate to the URL provided by Streamlit (e.g., http://localhost:8501).
- Select a movie from the dropdown menu and click the "Recommend" button to see the recommendations.
- Open the Movie_Recomendation_System.ipynb file in Jupyter Notebook or JupyterLab.
- Follow the steps in the notebook to:
  - Load and preprocess the TMDB datasets.
  - Merge, clean, and transform the data.
  - Create a similarity matrix using cosine similarity.
  - Save the processed data and similarity matrix as .pkl files.
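The last notebook step and the app's startup meet at the two .pkl files. A minimal sketch of that hand-off with Python's pickle module follows. It assumes, hypothetically, that movie_dict.pkl holds a column-oriented dictionary like the one pandas' DataFrame.to_dict() returns by default and that similarity.pkl holds the similarity matrix; save_artifacts and load_artifacts are illustrative names, not functions from the project.

```python
import os
import pickle
import tempfile

ARTIFACTS = ("movie_dict.pkl", "similarity.pkl")


def save_artifacts(movie_dict, similarity, directory):
    """Write the two files the Streamlit app expects into `directory` (hypothetical helper)."""
    for filename, obj in zip(ARTIFACTS, (movie_dict, similarity)):
        with open(os.path.join(directory, filename), "wb") as f:
            pickle.dump(obj, f)


def load_artifacts(directory):
    """Load both files, failing with a hint if the notebook has not been run (hypothetical helper)."""
    loaded = []
    for filename in ARTIFACTS:
        path = os.path.join(directory, filename)
        if not os.path.exists(path):
            raise FileNotFoundError(f"{path} is missing; run the notebook to generate it")
        with open(path, "rb") as f:
            loaded.append(pickle.load(f))
    return tuple(loaded)


# Hypothetical sample data: {column -> {row index -> value}}, plus a 2x2 matrix.
movie_dict = {"title": {0: "Avatar", 1: "Aliens"}}
similarity = [[1.0, 0.6], [0.6, 1.0]]

with tempfile.TemporaryDirectory() as tmp:
    save_artifacts(movie_dict, similarity, tmp)
    loaded_dict, loaded_similarity = load_artifacts(tmp)
    # A directory without the files should raise FileNotFoundError.
    try:
        load_artifacts(os.path.join(tmp, "does-not-exist"))
        missing_error = False
    except FileNotFoundError:
        missing_error = True
```

A NumPy array or a dictionary produced from a DataFrame is pickled the same way as the plain lists used here. Because unpickling can execute arbitrary code, only load .pkl files you generated yourself or obtained from a trusted source.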
Movie_Recommendation_System/
│
├── 1.py # Main Streamlit application file
├── Movie_Recomendation_System.ipynb # Jupyter Notebook for data preprocessing
├── tmdb_5000_movies.csv # Movies dataset
├── tmdb_5000_credits.csv # Credits dataset
├── movie_dict.pkl # Preprocessed movie dataset (dictionary format)
├── similarity.pkl # Precomputed similarity matrix
├── requirements.txt # Python dependencies
└── README.md # Project documentation
- Dataset Loading: The TMDB 5000 Movies and Credits datasets are loaded into Pandas DataFrames.
- Data Cleaning: Unnecessary columns are removed, missing values are handled, and duplicate entries are dropped.
- Feature Engineering:
  - Columns like genres, keywords, cast, and crew are transformed into a unified format.
  - A new tags column is created by combining relevant features.
- Vectorization: The tags column is vectorized using CountVectorizer to create numerical representations.
- Similarity Matrix: Cosine similarity is computed between movie vectors to identify similar movies.
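The feature-engineering, vectorization, and similarity steps above can be sketched without third-party dependencies. The notebook uses scikit-learn's CountVectorizer; this standard-library version reimplements the same bag-of-words and cosine-similarity idea for illustration only. It assumes TMDB-style columns stored as JSON strings of objects with a "name" field, uses three hypothetical sample rows, and tokenizes on whitespace, which is simpler than CountVectorizer's default tokenizer. All function names are hypothetical.

```python
import json
import math
from collections import Counter

# Hypothetical sample rows; TMDB stores these columns as JSON strings.
movies = [
    {"title": "Avatar",
     "genres": '[{"name": "Action"}, {"name": "Science Fiction"}]',
     "keywords": '[{"name": "space war"}]',
     "cast": '[{"name": "Sam Worthington"}]',
     "crew": '[{"name": "James Cameron"}]'},
    {"title": "Aliens",
     "genres": '[{"name": "Action"}, {"name": "Science Fiction"}]',
     "keywords": '[{"name": "alien"}]',
     "cast": '[{"name": "Sigourney Weaver"}]',
     "crew": '[{"name": "James Cameron"}]'},
    {"title": "Notting Hill",
     "genres": '[{"name": "Comedy"}, {"name": "Romance"}]',
     "keywords": '[{"name": "london"}]',
     "cast": '[{"name": "Julia Roberts"}]',
     "crew": '[{"name": "Roger Michell"}]'},
]


def extract_names(json_column):
    """Parse a TMDB-style JSON string and return the 'name' of each entry."""
    return [entry["name"] for entry in json.loads(json_column)]


def build_tags(movie):
    """Combine genres, keywords, cast, and crew into one lowercase tag string."""
    tokens = []
    for column in ("genres", "keywords", "cast", "crew"):
        # Strip inner spaces so "Science Fiction" becomes one token.
        tokens += [name.replace(" ", "") for name in extract_names(movie[column])]
    return " ".join(tokens).lower()


def count_vectorize(documents):
    """Bag-of-words term counts over a sorted vocabulary (whitespace tokens)."""
    counts = [Counter(doc.split()) for doc in documents]
    vocabulary = sorted({term for c in counts for term in c})
    vectors = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, vectors


def cosine_similarity_matrix(vectors):
    """Pairwise dot(a, b) / (|a| * |b|); pairs involving a zero vector stay 0.0."""
    norms = [math.sqrt(sum(x * x for x in v)) for v in vectors]
    n = len(vectors)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if norms[i] and norms[j]:
                dot = sum(a * b for a, b in zip(vectors[i], vectors[j]))
                matrix[i][j] = dot / (norms[i] * norms[j])
    return matrix


tags = [build_tags(m) for m in movies]
vocabulary, vectors = count_vectorize(tags)
similarity = cosine_similarity_matrix(vectors)

print(tags[0])                     # action sciencefiction spacewar samworthington jamescameron
print(round(similarity[0][1], 2))  # 0.6
```

Removing the spaces inside names keeps multi-word entities such as "James Cameron" as a single feature, so two films match on that feature only when they share the whole name rather than a common first name.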
- When a user selects a movie, its index is retrieved from the dataset.
- The similarity scores for the selected movie are sorted to find the top 5 most similar movies.
- The recommendations are displayed in the Streamlit interface.
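The lookup-and-rank logic described above might look like the following sketch. recommend is a hypothetical helper that takes a list of titles and a precomputed similarity matrix (hand-written here for four sample movies). It excludes the selected movie by its index instead of skipping the first sorted entry, which would go wrong if another movie tied the selected one at a score of 1.0.

```python
def recommend(title, titles, similarity, k=5):
    """Return up to k titles ranked by similarity to `title`, excluding itself."""
    index = titles.index(title)  # raises ValueError if the title is not in the dataset
    # Pair each movie index with its score and sort from most to least similar.
    ranked = sorted(enumerate(similarity[index]), key=lambda pair: pair[1], reverse=True)
    return [titles[i] for i, _ in ranked if i != index][:k]


# Hypothetical 4-movie catalogue with a hand-written symmetric similarity matrix.
titles = ["Avatar", "Aliens", "Titanic", "Notting Hill"]
similarity = [
    [1.0, 0.6, 0.3, 0.0],
    [0.6, 1.0, 0.2, 0.0],
    [0.3, 0.2, 1.0, 0.1],
    [0.0, 0.0, 0.1, 1.0],
]

print(recommend("Avatar", titles, similarity, k=2))  # ['Aliens', 'Titanic']
```

With the default k=5 and fewer than six movies, the function simply returns every other title in ranked order; Python's sort is stable, so tied scores keep their original dataset order.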
Contributions are welcome! Please follow these steps:
- Fork the repository.
- Create a new branch:
git checkout -b feature-name
- Commit your changes:
git commit -m "Add feature-name"
- Push to the branch:
git push origin feature-name
- Open a pull request.
This project is licensed under the MIT License.
- Thanks to the open-source community for providing useful libraries and tools.
- TMDB for providing the datasets used in this project.
- Inspiration from popular recommendation systems like Netflix and Amazon.