Migration-TR: Turkish Migration Discourse Dataset 🐦🇹🇷

Migration Discourses on X.com: Analysis of Public Perceptions and Attitudes Toward Refugees in Turkey Using Natural Language Processing

Evrim Yılmaz Polat* and Evrim Çağın Polat
Department of Sociology, Zonguldak Bülent Ecevit University, Zonguldak, Turkey;
Notrino Research, ODTÜ Teknokent, Ankara, Turkey

Computational Social Sciences Turkey (CSSTR) - Computational Social Sciences Working Group

🎯 Overview

This repository contains the Migration-TR dataset and accompanying AI models for analyzing migration discourse in Turkish social media. Our research analyzes roughly 5.9 million tweets (5,884,624) posted between 2011 and 2022 and collected with the Twitter Academic API, focusing on public perceptions and attitudes toward migrants and refugees in Turkey.

Key Highlights

Component	Details
Dataset	5,884,624 raw tweets → 3,814,679 regular (human-authored) tweets
Time Span	12-year temporal analysis (2011-2022)
Classification	8-class granular model (F1: 0.768)
Bot Detection	XGBoost model (F1: 0.832) for data enrichment
Visualizations	Interactive time-series charts via Plotly

🤖 AI Models

We provide two perception-attitude classification models based on Intergroup Threat Theory:

Model	Schema	Classes	Macro F1	Use Case
Granular	`all-classes`	8	0.768	✅ Recommended for applications
Super-Class	`super-classes`	6	0.801	🧪 Experimental
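
Macro F1 averages the per-class F1 scores with equal weight, so a rare class counts as much as a frequent one. The sketch below shows that computation on a handful of made-up labels; it is an illustration of the metric, not the evaluation code or data behind the scores in the table.

```python
from collections import Counter

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1   # predicted class gains a false positive
            fn[t] += 1   # true class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Toy labels for illustration only (not drawn from the dataset).
y_true = ["Sympathy", "Neutral", "Neutral", "Antipathy: Other"]
y_pred = ["Sympathy", "Neutral", "Antipathy: Other", "Antipathy: Other"]
print(round(macro_f1(y_true, y_pred), 3))  # 0.778
```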

Architecture

Base Model: VRLLab/TurkishBERTweet (pre-trained on 894M Turkish tweets)
Fine-tuning: LoRA adapters
Training Data: 15,000 manually annotated tweets
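
LoRA keeps the pre-trained weights frozen and learns a low-rank update: for a weight matrix of shape d × k it trains two small factors, B (d × r) and A (r × k), and adds their product to the frozen matrix. The sketch below only illustrates why this is cheap to train. The dimensions and rank are assumed example values, not the configuration used for Migration-TR.

```python
def lora_trainable_params(d, k, r):
    """Parameters in the low-rank factors B (d x r) and A (r x k)."""
    return r * (d + k)

def full_params(d, k):
    """Parameters in the full d x k weight matrix."""
    return d * k

# Assumed example values (not taken from the repository).
d, k, r = 768, 768, 8
lora = lora_trainable_params(d, k, r)
full = full_params(d, k)
print(lora, full, f"{lora / full:.2%}")  # 12288 589824 2.08%
```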

Classification Labels

Both models classify tweets into perception-attitude categories based on Intergroup Threat Theory:

Class Labels	Description	Theoretical Basis
Sympathy	Positive attitudes toward migrants	Counter-frame
Neutral	Neutral/informational content	No threat frame
Antipathy: Economic Threat	Fiscal burden, opposition to aid	Realistic threat
Antipathy: Employment Threat	Jobs/wages competition	Realistic threat
Antipathy: Security Threat	Crime, violence, border concerns	Realistic threat
Antipathy: Identity Threat	Cultural imposition, demographic change	Symbolic threat
Antipathy: Political Threat	Naturalization/voting as threat	Realistic threat
Antipathy: Other	Generalized hostility	Generalized threat
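
For downstream analysis it can help to roll predicted labels up to their theoretical basis. A minimal sketch, with the mapping copied from the table above; the exact label strings emitted by the models are an assumption here and may differ from these display names.

```python
from collections import Counter

# Label -> theoretical basis, copied from the table above.
THREAT_BASIS = {
    "Sympathy": "Counter-frame",
    "Neutral": "No threat frame",
    "Antipathy: Economic Threat": "Realistic threat",
    "Antipathy: Employment Threat": "Realistic threat",
    "Antipathy: Security Threat": "Realistic threat",
    "Antipathy: Identity Threat": "Symbolic threat",
    "Antipathy: Political Threat": "Realistic threat",
    "Antipathy: Other": "Generalized threat",
}

def count_by_basis(predicted_labels):
    """Aggregate predicted class labels into their theoretical basis."""
    return Counter(THREAT_BASIS[label] for label in predicted_labels)

# Hypothetical predictions, for illustration only.
sample = [
    "Neutral",
    "Antipathy: Security Threat",
    "Antipathy: Economic Threat",
    "Antipathy: Identity Threat",
]
print(count_by_basis(sample))
```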

Bot Detection Model

Attribute	Value
Architecture	XGBoost (ONNX format)
Features	17 user behavior and profile characteristics
Performance	F1 = 0.832
Purpose	Enrich dataset with bot likelihood scores

📊 Dataset

Attribute	Value
Total Tweets	5,884,624 (raw) → 3,814,679 (regular subset)
Time Period	January 1, 2011 - December 31, 2022
Language	Turkish
Data Source	Twitter Academic API
Processing	Cleaned, deduplicated, enriched with bot/duplicate flags

📋 Complete data schema (26 fields)

Important: Fields marked ❌ Confidential are retained only for internal compliance and are not distributed.

Field	Type	Description	Availability
`created_at`	datetime	Tweet creation timestamp	✅ Available
`tweet_location`	string	Geographic location (if available)	✅ Available
`text`	string	Tweet content (Turkish)	✅ Available
`retweets`	int	Number of retweets	✅ Available
`replies`	int	Number of replies	✅ Available
`likes`	int	Number of likes	✅ Available
`quote_count`	int	Number of quote tweets	✅ Available
`author_id`	string	Anonymized author identifier	❌ Confidential
`username`	string	Author username	❌ Confidential
`name`	string	Author display name	❌ Confidential
`author_pic`	string	Profile picture URL	❌ Confidential
`author_followers`	int	Follower count	❌ Confidential
`author_listed`	int	Listed count	❌ Confidential
`author_following`	int	Following count	❌ Confidential
`author_tweets`	int	Total tweet count	❌ Confidential
`author_protected`	boolean	Protected account status	❌ Confidential
`author_entities`	json	Profile entities	❌ Confidential
`author_description`	string	Profile bio	❌ Confidential
`author_verified`	boolean	Verification status	❌ Confidential
`author_created_at`	datetime	Account creation date	❌ Confidential
`author_withheld`	string	Withheld status	❌ Confidential
`author_location`	string	Author location	❌ Confidential
`is_duplicate`	boolean	Exact duplicate flag	✅ Available
`bot_prob`	float	Bot probability score (0-1)	✅ Available
`is_bot`	boolean	Bot likelihood flag	✅ Available
`all_classes_results`	json	AI model predictions	✅ Available
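
The `is_duplicate`, `is_bot`, and `bot_prob` fields make it possible to filter records before analysis. A minimal sketch, assuming each record is a dict keyed by the field names above. The 0.5 cut-off is an arbitrary illustration, and whether this filter reproduces the regular subset exactly is not stated in this README.

```python
def keep_regular(records, max_bot_prob=0.5):
    """Drop exact duplicates and likely bots.

    max_bot_prob is an assumed cut-off; the threshold behind `is_bot`
    is not documented here.
    """
    return [
        r for r in records
        if not r["is_duplicate"]
        and not r["is_bot"]
        and r["bot_prob"] <= max_bot_prob
    ]

# Hypothetical records, for illustration only.
records = [
    {"text": "a", "is_duplicate": False, "is_bot": False, "bot_prob": 0.03},
    {"text": "b", "is_duplicate": True, "is_bot": False, "bot_prob": 0.10},
    {"text": "c", "is_duplicate": False, "is_bot": True, "bot_prob": 0.91},
]
print([r["text"] for r in keep_regular(records)])  # ['a']
```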

🚀 Quick Start

Perception-Attitude Classification

# Install dependencies
pip install -r requirements.txt

# 8-class Granular model (recommended)
python run_inference.py --text "Mültecilere vatandaşlık verilmesin" --model-type all-classes

# 6-class Super-Class model (experimental)
python run_inference.py --text "Mültecilere vatandaşlık verilmesin" --model-type super-classes

# Run on CPU
python run_inference.py --text "Mültecilere vatandaşlık verilmesin" --device cpu
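
To classify several texts, the commands above can be wrapped from Python. The sketch below uses only the flags shown in this section (`--text`, `--model-type`, `--device`); the helper names are hypothetical, and since the script's output format is not documented here, it returns raw stdout. Each call starts a new process and reloads the model, so this suits small batches only.

```python
import subprocess
import sys

def build_command(text, model_type="all-classes", device=None):
    """Build the argv list for run_inference.py using the flags shown above."""
    cmd = [sys.executable, "run_inference.py",
           "--text", text, "--model-type", model_type]
    if device is not None:
        cmd += ["--device", device]
    return cmd

def classify_many(texts, **kwargs):
    """Run the script once per text and collect its raw stdout."""
    outputs = []
    for text in texts:
        result = subprocess.run(build_command(text, **kwargs),
                                capture_output=True, text=True, check=True)
        outputs.append(result.stdout)
    return outputs

cmd = build_command("Mültecilere vatandaşlık verilmesin", device="cpu")
print(cmd[-2:])  # ['--device', 'cpu']
```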

Bot Detection

python run_bot_detection.py --features example_user_data.json

📈 Interactive Visualizations

Explore temporal dynamics of migration discourse: View Interactive Charts

📊 Pan, zoom, and hover for detailed data points
📅 12-year temporal coverage (2011-2022)
💾 Export charts as PNG
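
A time series of this kind can be derived by bucketing tweets by month on `created_at`. A minimal standard-library sketch, assuming `created_at` values are ISO 8601 strings (the actual serialization is not specified here) and using invented sample rows; it is not the code behind the published charts.

```python
from collections import Counter
from datetime import datetime

def monthly_counts(records):
    """Count tweets per (year, month) from their created_at timestamps."""
    counts = Counter()
    for r in records:
        ts = datetime.fromisoformat(r["created_at"])
        counts[(ts.year, ts.month)] += 1
    return dict(sorted(counts.items()))

# Invented sample rows, for illustration only.
rows = [
    {"created_at": "2015-09-03T10:15:00"},
    {"created_at": "2015-09-20T08:00:00"},
    {"created_at": "2016-03-18T21:30:00"},
]
print(monthly_counts(rows))  # {(2015, 9): 2, (2016, 3): 1}
```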

📂 Repository Structure

Migration-TR/
├── trained_models/
│   ├── perception_attitude_clf_super_classes/   # 6-class model weights
│   ├── perception_attitude_all_classes/         # 8-class model weights
│   └── bot_clf/                                 # Bot detection model
├── docs/                                        # GitHub Pages site
│   ├── index.html                               # Main visualization page
│   └── assets/plots/                            # Interactive Plotly charts
├── run_inference.py                             # Classification inference script
├── run_bot_detection.py                         # Bot detection script
├── example_user_data.json                       # Sample bot detection input
├── requirements.txt                             # Python dependencies
└── DATA_USE_AGREEMENT.md                        # Data use agreement

📝 Data Access

🔐 Access Requirements

Who Can Access:

Academic Researchers at accredited institutions
Graduate Students with supervisor approval
Policy Researchers at recognized organizations
Non-commercial use only - no commercial applications

Not Permitted:

Commercial use or monetization
Surveillance or tracking applications
Attempts to re-identify users
Redistribution of raw tweet text

📋 Access Process

Step 1: Review Data Use Agreement

Read our comprehensive Data Use Agreement carefully.

Step 2: Submit Request

Email your signed DUA to: info@csstr.org

Include:

Your institutional affiliation
Research purpose and methodology
Specific data requirements (which chunks you need, from Chunk-1 to Chunk-11770)
Supervisor information (for students)

Step 3: Approval & Delivery

We review requests within 5 business days
Approved users receive secure download links
Data delivered as password-protected archives
Manual delivery: maximum 500 hydrated objects per recipient per day (non-automated delivery only)
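
The chunk numbering and the daily cap allow some simple planning arithmetic. The sketch below assumes each chunk holds up to 500 tweets. That is an inference from the figures in this README (5,884,624 raw tweets at 500 per chunk gives 11,770 chunks), not something stated explicitly, so confirm chunk sizes with CSSTR before planning around them.

```python
import math

RAW_TWEETS = 5_884_624   # raw tweet count from the dataset table
DAILY_CAP = 500          # max hydrated objects per recipient per day

def chunks_needed(n_tweets, chunk_size=DAILY_CAP):
    """Chunks required to cover n_tweets (chunk_size is an assumption)."""
    return math.ceil(n_tweets / chunk_size)

def delivery_days(n_tweets, daily_cap=DAILY_CAP):
    """Minimum days for one recipient to receive n_tweets under the cap."""
    return math.ceil(n_tweets / daily_cap)

print(chunks_needed(RAW_TWEETS))  # 11770
print(delivery_days(10_000))      # 20
```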

⚠️ Important Disclaimer

Data Delivery Policy: Due to X.com (formerly Twitter) Developer Policy requirements, data is delivered manually under the following conditions:

Maximum 500 hydrated tweets per recipient per day (non-automated delivery via email/SFTP)
Multiple researchers can receive data simultaneously (500 objects per person per day)
Academic use only
No public redistribution of full tweet text allowed
24-hour deletion compliance: CSSTR monitors the X Compliance API and will notify recipients of affected tweets; you must delete or mask those tweets within 24 hours of notification

Legal Framework: This dataset complies with:

X.com Developer Agreement (current version)
Turkish data protection laws
GDPR requirements for research
Academic research ethics standards

📖 Citation

If you use Migration-TR in your research, please cite:

@article{yilmazpolat_migration_2025,
  title={Migration Discourses on X.com: Analysis of Public Perceptions and 
         Attitudes Toward Refugees in Turkey Using Natural 
         Language Processing},
  author={Yılmaz Polat, Evrim and Çağın Polat, Evrim},
  journal={[Under Review]},
  year={2025},
  note={Dataset available at: https://github.com/cssturkiye/Migration-TR}
}

Paper Status: Currently under peer review. Citation will be updated upon acceptance.

🙏 Acknowledgments

Our classification model is built upon TurkishBERTweet by VRLLab (Najafi & Varol, 2024). We thank the authors for making their work available.

Base Model: VRLLab/TurkishBERTweet (894M Turkish tweets, 163M parameters)

🤝 Contact

Evrim Yılmaz Polat, PhD - Corresponding Author
Department of Sociology, Zonguldak Bülent Ecevit University

Evrim Çağın Polat - Co-Author
Notrino Research, ODTÜ Teknokent, Ankara

📧 Email: info@csstr.org
🏛️ Organization: Computational Social Sciences Turkey (CSSTR)

Migration-TR | Advancing Migration Research Through Computational Social Science
