This project builds a Fake News Detection system in Python using machine learning. The model classifies news articles as "Real" or "Fake" based on their content. Natural Language Processing (NLP) is used for text preprocessing, and two classification algorithms (Logistic Regression and Naive Bayes) are used to build the model.
```
Fake-News-Detection/
├── data/                          # Dataset folder
│   ├── True.csv                   # Real news dataset
│   └── Fake.csv                   # Fake news dataset
├── notebooks/                     # Jupyter Notebooks for development
│   └── fake_news_detection.ipynb
├── models/                        # Saved models
│   └── fake_news_model.pkl
├── app.py                         # Flask/Streamlit app for deployment
└── README.md                      # Project documentation
```
- Clone the repository:
  ```bash
  git clone https://github.com/yourusername/Fake-News-Detection.git
  cd Fake-News-Detection
  ```
- Create a virtual environment and activate it:
  ```bash
  python -m venv fake-news-env
  # On Windows
  fake-news-env\Scripts\activate
  # On macOS/Linux
  source fake-news-env/bin/activate
  ```
- Install the required dependencies:
  ```bash
  pip install -r requirements.txt
  ```
- Run the Jupyter Notebook:
  ```bash
  jupyter notebook
  ```
The dataset used in this project is taken from Kaggle. It contains two files:
- True.csv: Contains real news articles.
- Fake.csv: Contains fake news articles.
You can download the dataset from the following link: https://www.kaggle.com/datasets/clmentbisaillon/fake-and-real-news-dataset
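Because the real and fake articles come in separate files, they have to be labeled and merged before training. The sketch below is one way to do this with pandas. It assumes both CSVs were saved into `data/` as shown in the project structure and that labels are stored as the strings "Real" and "Fake"; the function names `combine_and_label` and `load_dataset` are hypothetical and are not taken from the project notebook.

```python
import pandas as pd


def combine_and_label(true_df: pd.DataFrame, fake_df: pd.DataFrame) -> pd.DataFrame:
    """Tag each article with its class, concatenate both frames, and shuffle."""
    true_df = true_df.assign(label="Real")
    fake_df = fake_df.assign(label="Fake")
    combined = pd.concat([true_df, fake_df], ignore_index=True)
    # Shuffle so real and fake articles are interleaved before any train/test split.
    return combined.sample(frac=1, random_state=42).reset_index(drop=True)


def load_dataset(data_dir: str = "data") -> pd.DataFrame:
    """Read True.csv and Fake.csv from data_dir (assumed layout) into one labeled frame."""
    true_df = pd.read_csv(f"{data_dir}/True.csv")
    fake_df = pd.read_csv(f"{data_dir}/Fake.csv")
    return combine_and_label(true_df, fake_df)
```

After downloading the files, `load_dataset("data")["label"].value_counts()` is a quick way to check how balanced the two classes are.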
The following preprocessing steps were performed on the dataset:
- Converting text to lowercase.
- Removing punctuation and special characters.
- Removing stopwords.
- Tokenization.
- Stemming/Lemmatization.
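The preprocessing steps above can be sketched as a single function. This is a minimal, dependency-free illustration, not the notebook's code: the small `STOPWORDS` set is an assumption standing in for NLTK's much larger English stopword list, and stemming is left as an optional callable so any stemmer or lemmatizer can be plugged in.

```python
import re
from typing import Callable, Optional

# Small illustrative stopword set (assumption); the project lists NLTK,
# whose English stopword corpus is considerably larger.
STOPWORDS = {
    "a", "an", "the", "and", "or", "but", "is", "are", "was", "were", "be",
    "it", "its", "s", "t", "of", "to", "in", "on", "for", "with", "this",
    "that", "by", "as", "at", "from",
}


def preprocess(text: str, stem: Optional[Callable[[str], str]] = None) -> str:
    """Lowercase, strip non-letters, tokenize, drop stopwords, optionally stem."""
    text = text.lower()
    # Replace every character that is not a-z or whitespace (punctuation,
    # digits, special characters) with a space.
    text = re.sub(r"[^a-z\s]", " ", text)
    tokens = text.split()  # simple whitespace tokenization
    tokens = [tok for tok in tokens if tok not in STOPWORDS]
    if stem is not None:
        tokens = [stem(tok) for tok in tokens]
    # Re-join so the result can be fed directly to a text vectorizer.
    return " ".join(tokens)
```

For example, `preprocess("The Senate PASSED the bill!!! It's official.")` returns `"senate passed bill official"`. With NLTK installed, stemming can be added by passing `stem=PorterStemmer().stem` (from `nltk.stem`).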
The following algorithms were used for building the Fake News Detection model:
- Logistic Regression
- Naive Bayes
The text data was vectorized using TF-IDF (Term Frequency-Inverse Document Frequency) to convert text into numerical features.
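One way to combine TF-IDF vectorization with the two classifiers is a scikit-learn `Pipeline` per model, as sketched below. The vectorizer and classifier settings shown (defaults, plus `max_iter=1000` for Logistic Regression) are assumptions, not values taken from the notebook, and `build_models` is a hypothetical helper name.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline


def build_models() -> dict:
    """Return one TF-IDF + classifier pipeline per algorithm listed above."""
    return {
        "logistic_regression": Pipeline([
            ("tfidf", TfidfVectorizer()),
            # Higher max_iter helps the solver converge on large sparse inputs.
            ("clf", LogisticRegression(max_iter=1000)),
        ]),
        "naive_bayes": Pipeline([
            ("tfidf", TfidfVectorizer()),
            # MultinomialNB accepts the non-negative TF-IDF weights directly.
            ("clf", MultinomialNB()),
        ]),
    }
```

Wrapping the vectorizer and classifier together means the vocabulary is learned only from the training texts passed to `fit`, and the whole pipeline can later be saved as one object. With default settings, `TfidfVectorizer` uses a smoothed IDF, ln((1 + n) / (1 + df(t))) + 1, and L2-normalizes each document vector.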
The model performance was evaluated using:
- Accuracy Score
- Confusion Matrix
- Precision, Recall, F1 Score
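All of the metrics listed above are available in `sklearn.metrics`. The helper below is a hypothetical sketch that gathers them for one model's predictions; it assumes string labels "Real" and "Fake" and fixes their order so the confusion matrix rows and columns are predictable.

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix


def evaluate(y_true, y_pred, labels=("Real", "Fake")) -> dict:
    """Compute accuracy, a confusion matrix, and per-class precision/recall/F1."""
    labels = list(labels)
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        # Rows are true classes and columns are predicted classes, in `labels` order.
        "confusion_matrix": confusion_matrix(y_true, y_pred, labels=labels),
        # Dict with per-class precision, recall, F1 and support, plus averages.
        "report": classification_report(y_true, y_pred, labels=labels, output_dict=True),
    }
```

For example, with true labels `["Real", "Real", "Fake", "Fake"]` and predictions `["Real", "Fake", "Fake", "Fake"]`, accuracy is 0.75 and the confusion matrix is `[[1, 1], [0, 2]]`. Since Seaborn is part of the stack, the matrix can be plotted with `sns.heatmap(cm, annot=True, fmt="d")`.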
- Run the Jupyter Notebook to train the model:
  ```bash
  jupyter notebook notebooks/fake_news_detection.ipynb
  ```
- Run the Flask app for deployment:
  ```bash
  python app.py
  ```
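The project tree includes `models/fake_news_model.pkl`, which suggests the trained model is pickled and then loaded by the app. A minimal sketch, assuming the notebook pickles a full fitted scikit-learn pipeline (vectorizer plus classifier) with the standard `pickle` module; `save_model`, `load_model`, and `classify` are hypothetical helper names, not the contents of `app.py`.

```python
import pickle
from pathlib import Path

# Path taken from the project structure.
MODEL_PATH = Path("models") / "fake_news_model.pkl"


def save_model(model, path: Path = MODEL_PATH) -> None:
    """Pickle a fitted pipeline to disk, creating the parent folder if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as fh:
        pickle.dump(model, fh)


def load_model(path: Path = MODEL_PATH):
    """Load a pickled pipeline. Only unpickle files from a trusted source."""
    with path.open("rb") as fh:
        return pickle.load(fh)


def classify(model, article: str) -> str:
    """Return the predicted label (for example "Real" or "Fake") for one article."""
    return str(model.predict([article])[0])
```

Saving the whole pipeline rather than the classifier alone means the app does not need to rebuild the TF-IDF vocabulary: a request handler can call `classify(model, text)` on raw article text.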
The project uses the following tools and libraries:
- Python
- Pandas
- NumPy
- Scikit-learn
- NLTK
- Matplotlib
- Seaborn
- Flask/Streamlit
Possible future improvements:
- Try other machine learning algorithms, such as Random Forest or SVM.
- Perform hyperparameter tuning to improve model performance.
- Deploy the project using Streamlit or Heroku for a user-friendly interface.
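The first two improvements can be combined: scikit-learn's `GridSearchCV` can tune a TF-IDF + linear SVM pipeline with cross-validation. The sketch below is illustrative only; the parameter grid, the `f1_macro` scoring choice, and the `tune_svm` name are assumptions rather than settings from this project.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC


def tune_svm(texts, labels, cv: int = 5) -> GridSearchCV:
    """Grid-search a TF-IDF + linear SVM pipeline and return the fitted search."""
    pipeline = Pipeline([
        ("tfidf", TfidfVectorizer()),
        ("clf", LinearSVC()),
    ])
    # Grid keys follow scikit-learn's "<step name>__<parameter>" convention.
    param_grid = {
        "tfidf__ngram_range": [(1, 1), (1, 2)],
        "clf__C": [0.1, 1.0, 10.0],
    }
    search = GridSearchCV(pipeline, param_grid, cv=cv, scoring="f1_macro")
    # Refits the best configuration on all data once the search finishes.
    search.fit(texts, labels)
    return search
```

After fitting, `search.best_params_` holds the winning settings and `search.best_estimator_` is a ready-to-use pipeline. A Random Forest could be tried the same way by swapping in `RandomForestClassifier` and a grid over its own parameters, such as `clf__n_estimators`.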
This project is licensed under the Apache 2.0 License. Feel free to use and modify the code.