Fake News Detection - Machine Learning in the Fight Against Disinformation
Course: Humanistic AI & Data Science (4th Semester)
Institution: PUC-SP
Professor: Erick Bacconi
Tip
This repository, 2-social-buzz-ai-GBoost-and-LowDefault-Modeling, is part of the main project 1-social-buzz-ai-main. To explore all related materials, analyses, and notebooks, visit the main repository:
- 1-social-buzz-ai-main: part of the Humanistic AI Research & Data Modeling Series, where data meets human insight.
- Access: NLP - Class 1 Repo
- Access Code
- Access True Dataset
- Access Fake Dataset
1. Introduction
- Fake news is false information, spread mainly on social networks, that can cause serious political, social, and public health impacts.
- This study applies Machine Learning (ML) algorithms to detect fake news automatically, offering a technological alternative for addressing the problem. Its objectives are to:
- Test and compare different ML algorithms for detecting fake news.
- Assess the performance of each model in terms of accuracy, sensitivity, and specificity.
- Propose an automated, replicable solution that is useful to society.
3. Methodology
3.1. Dataset
- Data obtained from Kaggle:
- Fake: 23,481 samples
- True: 21,417 samples
- Main columns: title, text, topic, date
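A quick optional check (a minimal sketch, assuming Fake.csv and True.csv sit in the working directory) can confirm the sample counts and columns listed above:
# Minimal sketch: verify the sample counts and inspect the columns of the two files
import pandas as pd

fake = pd.read_csv('Fake.csv')
true = pd.read_csv('True.csv')
print(fake.shape[0], true.shape[0])   # expected: 23481 and 21417
print(fake.columns.tolist())          # inspect the available column names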
3.2. Tools Used
Python + libraries: Pandas, NumPy, Matplotlib, Seaborn, Scikit-learn, NLTK
3.3. Data Processing
import pandas as pd
import numpy as np
import string
import nltk
from nltk.corpus import stopwords
# Load fake and true datasets
fake = pd.read_csv('Fake.csv')
true = pd.read_csv('True.csv')
fake['target'] = 1
true['target'] = 0
# Concatenate and shuffle records
data = pd.concat([fake, true], ignore_index=True)
data = data.sample(frac=1).reset_index(drop=True)
# Remove title and date columns
data.drop(['title', 'date'], axis=1, inplace=True)
# Clean text (lowercase, no punctuation, no stopwords)
nltk.download('stopwords')
stop_words = set(stopwords.words('english'))  # the Kaggle corpus used here is in English
def clean_text(text):
    text = text.lower()
    text = text.translate(str.maketrans('', '', string.punctuation))
    text = " ".join([word for word in text.split() if word not in stop_words])
    return text
data['text'] = data['text'].apply(clean_text)

# Word cloud of the cleaned text
from wordcloud import WordCloud
import matplotlib.pyplot as plt
wc = WordCloud(width=800, height=400, background_color='white').generate(' '.join(data['text']))
plt.figure(figsize=(10, 5))
plt.imshow(wc, interpolation='bilinear')
plt.axis('off')
plt.show()

# TF-IDF vectorization and train/test split
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
X = data['text']
y = data['target']
vectorizer = TfidfVectorizer(max_features=5000)
X_vect = vectorizer.fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(X_vect, y, test_size=0.2, random_state=42)
Five supervised models were trained and evaluated:
| Model | Accuracy | Remarks |
|---|---|---|
| Logistic Regression | 98.92% | High precision, confusion matrix shows low error rate. |
| Decision Tree | 99.6% | Best overall performance and lowest error. |
| Random Forest | 98.74% | Good performance, consistent confusion matrix. |
| Support Vector Machine | 99.5% | Excellent accuracy and precision, robust text model. |
| K-Nearest Neighbors (KNN) | 60.84% | Low performance, high number of false negatives. |
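One way to reproduce such a side-by-side comparison is sketched below, assuming the X_train/X_test split defined earlier; the classifiers use scikit-learn defaults here, which may differ from the exact settings behind the figures above:
# Sketch: train each of the five classifiers and collect its test accuracy
# (assumes X_train, X_test, y_train, y_test from the split above; default hyperparameters)
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

models = {
    'Logistic Regression': LogisticRegression(max_iter=1000),
    'Decision Tree': DecisionTreeClassifier(random_state=42),
    'Random Forest': RandomForestClassifier(random_state=42),
    'SVM': SVC(),
    'KNN': KNeighborsClassifier(),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {accuracy:.2%}")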
Metrics were computed via the confusion matrix (TP, TN, FP, FN), along with specific precision and sensitivity values for each model:
- True Positives (TP): 4,711
- False Positives (FP): 13
- False Negatives (FN): 22
- True Negatives (TN): 4,234
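As a quick check, the headline metrics can be derived directly from these counts (a minimal sketch; fake news is taken as the positive class, consistent with target = 1 above):
# Derive the headline metrics from the confusion-matrix counts listed above
# (fake news is the positive class, target = 1)
TP, FP, FN, TN = 4711, 13, 22, 4234

accuracy = (TP + TN) / (TP + TN + FP + FN)   # overall success rate
precision = TP / (TP + FP)                   # predicted fakes that are actually fake
sensitivity = TP / (TP + FN)                 # actual fakes that are caught (recall)
specificity = TN / (TN + FP)                 # actual real news correctly recognized

print(f"Accuracy: {accuracy:.2%}  Precision: {precision:.2%}  "
      f"Sensitivity: {sensitivity:.2%}  Specificity: {specificity:.2%}")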
Below are examples of models tested and evaluated.
# Logistic Regression
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
lr = LogisticRegression()
lr.fit(X_train, y_train)
y_pred = lr.predict(X_test)
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))

# Decision Tree
from sklearn.tree import DecisionTreeClassifier
dt = DecisionTreeClassifier()
dt.fit(X_train, y_train)
y_pred = dt.predict(X_test)
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))

# Random Forest
from sklearn.ensemble import RandomForestClassifier
rf = RandomForestClassifier()
rf.fit(X_train, y_train)
y_pred = rf.predict(X_test)
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))

# Support Vector Machine
from sklearn.svm import SVC
svm = SVC()
svm.fit(X_train, y_train)
y_pred = svm.predict(X_test)
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))

# K-Nearest Neighbors
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier()
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))

# Confusion-matrix heatmap for the last evaluated model
import seaborn as sns
cm = confusion_matrix(y_test, y_pred)
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.show()
- Accuracy: Overall model success rate.
- Precision: Of the items classified as fake, the proportion that are actually fake.
- Sensitivity: Model's ability to correctly identify actual fake news.
- Specificity: Model's ability to correctly identify actual real news.
| Model | Precision | Sensitivity | Specificity |
|---|---|---|---|
| Logistic Regression | 98% | 99% | 98% |
| Decision Tree | 98.5% | 99% | 99% |
| Random Forest | 98.5% | 99% | 98% |
| SVM | 99% | 99% | 99% |
| KNN | 99% | 57% | 19% |
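The values in this table can be obtained per model with scikit-learn's metric functions; the sketch below assumes the fitted lr, dt, rf, svm, and knn objects from the code above, with fake news as the positive class (specificity is simply the recall of the real-news class):
# Sketch: per-model precision, sensitivity, and specificity
# (assumes the fitted models above; positive class = fake news, target = 1)
from sklearn.metrics import precision_score, recall_score

for name, model in [('Logistic Regression', lr), ('Decision Tree', dt),
                    ('Random Forest', rf), ('SVM', svm), ('KNN', knn)]:
    y_pred = model.predict(X_test)
    precision = precision_score(y_test, y_pred)               # predicted fakes that are fake
    sensitivity = recall_score(y_test, y_pred)                 # actual fakes caught
    specificity = recall_score(y_test, y_pred, pos_label=0)    # actual real news recognized
    print(f"{name}: precision {precision:.1%}, sensitivity {sensitivity:.1%}, specificity {specificity:.1%}")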
- Four out of five models achieved accuracy above 98%.
- KNN performed poorly, mainly due to the high number of false negatives (57% sensitivity).
- Decision Tree and SVM stood out as the most effective models.
- Data processing and feature selection were key to the models' success.
- Study limitations: Difficulty finding standardized datasets (especially in Portuguese) and few systems applied to the Brazilian context.
- Future directions: Test new algorithms (Naive Bayes, Boosting, K-means, Gradient Descent), apply other validation techniques, expand Portuguese-language datasets, include cross-validation (K-fold, leave-one-out; see the sketch below), and develop web applications for public use.
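As a hedged illustration of one of these directions, the sketch below shows how 5-fold cross-validation could be run over the TF-IDF features built earlier (assuming X_vect and y from the vectorization step; this was not part of the original experiments):
# Sketch (not part of the original study): 5-fold cross-validation on the TF-IDF features
from sklearn.model_selection import KFold, cross_val_score
from sklearn.linear_model import LogisticRegression

kfold = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(LogisticRegression(max_iter=1000), X_vect, y, cv=kfold, scoring='accuracy')
print("Accuracy per fold:", scores)
print(f"Mean accuracy: {scores.mean():.2%} (+/- {scores.std():.2%})")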
8. Conclusion
- Machine Learning has proven powerful for fake news detection and is crucial for protecting society from the impact of false information.
- Ongoing research is especially necessary in the Brazilian context.
9. Our Crew:
- 👩🏻🚀 Fabiana ⚡️ Campanari - Shoot me an email
- 👨🏽🚀 Pedro Barrenco
10. QR Code
11. References
Monteiro Bastos & Monteiro de Lima (2023). Fake News Detection Using Decision Tree, Support Vector Machine, and K-Nearest Neighbors Algorithms. Revista de Estudos Multidisciplinares XV Encontro Científico da UNDB.
- For notebook files, detailed tutorials, or enhanced visualizations, please reach out.
- Interested in Python notebooks simulating these dynamics or advanced Humanistic AI models? Just ask!
🛸๋ My Contacts Hub
────────────── 🔭⋆ ──────────────
➣➢➤ Back to Top
Copyright 2025 Mindful-AI-Assistants. Code released under the MIT license.
