Migration Discourses on X.com: Analysis of Public Perceptions and Attitudes Toward Refugees in Turkey Using Natural Language Processing
Evrim Yılmaz Polat* and Evrim Çağın Polat
Department of Sociology, Zonguldak Bülent Ecevit University, Zonguldak, Turkey;
Notrino Research, ODTÜ Teknokent, Ankara, Turkey
Computational Social Sciences Turkey (CSSTR) - Computational Social Sciences Working Group
This repository contains the Migration-TR dataset and accompanying AI models for analyzing migration discourse on Turkish social media. Our research analyzes approximately 6 million tweets posted between 2011 and 2022 and collected with the Twitter Academic API, focusing on public perceptions of and attitudes toward migrants and refugees in Turkey.
| Component | Details |
|---|---|
| Dataset | 5,884,624 raw tweets → 3,814,679 regular (human-authored) tweets |
| Time Span | 12-year temporal analysis (2011-2022) |
| Classification | 8-class granular model (F1: 0.768) |
| Bot Detection | XGBoost model (F1: 0.832) for data enrichment |
| Visualizations | Interactive time-series charts via Plotly |
We provide two perception-attitude classification models based on Intergroup Threat Theory:
| Model | Schema | Classes | Macro F1 | Use Case |
|---|---|---|---|---|
| Granular | `all-classes` | 8 | 0.768 | ✅ Recommended for applications |
| Super-Class | `super-classes` | 6 | 0.801 | 🧪 Experimental |
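Macro F1, the metric reported for both models, is the unweighted mean of the per-class F1 scores, so rare classes weigh as much as frequent ones. The sketch below shows the computation on made-up labels; the `macro_f1` helper and the toy data are illustrative assumptions, not the authors' evaluation code.

```python
from collections import Counter

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Hypothetical labels, not drawn from the dataset
y_true = ["Sympathy", "Neutral", "Neutral", "Antipathy: Other"]
y_pred = ["Sympathy", "Neutral", "Sympathy", "Antipathy: Other"]
score = macro_f1(y_true, y_pred)
print(round(score, 3))
```

Because every class contributes equally, a single misclassified tweet in a small class can move the score noticeably, which is worth keeping in mind when comparing the 8-class and 6-class figures.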
- Base Model: VRLLab/TurkishBERTweet (pre-trained on 894M Turkish tweets)
- Fine-tuning: LoRA adapters
- Training Data: 15,000 manually annotated tweets
Both models classify tweets into perception-attitude categories based on Intergroup Threat Theory:
| Class Labels | Description | Theoretical Basis |
|---|---|---|
| Sympathy | Positive attitudes toward migrants | Counter-frame |
| Neutral | Neutral/informational content | No threat frame |
| Antipathy: Economic Threat | Fiscal burden, opposition to aid | Realistic threat |
| Antipathy: Employment Threat | Jobs/wages competition | Realistic threat |
| Antipathy: Security Threat | Crime, violence, border concerns | Realistic threat |
| Antipathy: Identity Threat | Cultural imposition, demographic change | Symbolic threat |
| Antipathy: Political Threat | Naturalization/voting as threat | Realistic threat |
| Antipathy: Other | Generalized hostility | Generalized threat |
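For downstream analysis it can help to roll predicted classes up to their theoretical threat type. The sketch below copies the label-to-basis mapping from the table; it assumes the label strings match the table exactly, which may differ from what the models actually emit, and `threat_profile` is a hypothetical helper.

```python
from collections import Counter

# Class labels and theoretical basis, copied from the table above.
# Assumption: model outputs use these exact label strings.
THREAT_TYPE = {
    "Sympathy": "Counter-frame",
    "Neutral": "No threat frame",
    "Antipathy: Economic Threat": "Realistic threat",
    "Antipathy: Employment Threat": "Realistic threat",
    "Antipathy: Security Threat": "Realistic threat",
    "Antipathy: Identity Threat": "Symbolic threat",
    "Antipathy: Political Threat": "Realistic threat",
    "Antipathy: Other": "Generalized threat",
}

def threat_profile(predicted_labels):
    """Count predictions per theoretical threat type."""
    return Counter(THREAT_TYPE[label] for label in predicted_labels)

# Hypothetical predictions for four tweets
profile = threat_profile([
    "Antipathy: Economic Threat",
    "Antipathy: Identity Threat",
    "Antipathy: Security Threat",
    "Neutral",
])
print(profile)
```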
| Attribute | Value |
|---|---|
| Architecture | XGBoost (ONNX format) |
| Features | 17 user behavior and profile characteristics |
| Performance | F1 = 0.832 |
| Purpose | Enrich dataset with bot likelihood scores |
| Attribute | Value |
|---|---|
| Total Tweets | 5,884,624 (raw) → 3,814,679 (regular subset) |
| Time Period | January 1, 2011 - December 31, 2022 |
| Language | Turkish |
| Data Source | Twitter Academic API |
| Processing | Cleaned, deduplicated, enriched with bot/duplicate flags |
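As a quick check on the two totals above, the following sketch computes how many tweets fall outside the regular subset and what share is retained. It uses only the figures stated in this README; which filters produce the regular subset is not spelled out here, so the code makes no assumption about that.

```python
# Totals as stated in the dataset table
raw_tweets = 5_884_624
regular_subset = 3_814_679

removed = raw_tweets - regular_subset
retained_share = regular_subset / raw_tweets

print(f"{removed:,} tweets fall outside the regular subset")
print(f"{retained_share:.1%} of raw tweets are retained")
```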
Complete data schema (26 fields):
Important: Fields marked ❌ Confidential are retained only for internal compliance and are not distributed.
| Field | Type | Description | Availability |
|---|---|---|---|
| `created_at` | datetime | Tweet creation timestamp | ✅ Available |
| `tweet_location` | string | Geographic location (if available) | ✅ Available |
| `text` | string | Tweet content (Turkish) | ✅ Available |
| `retweets` | int | Number of retweets | ✅ Available |
| `replies` | int | Number of replies | ✅ Available |
| `likes` | int | Number of likes | ✅ Available |
| `quote_count` | int | Number of quote tweets | ✅ Available |
| `author_id` | string | Anonymized author identifier | ❌ Confidential |
| `username` | string | Author username | ❌ Confidential |
| `name` | string | Author display name | ❌ Confidential |
| `author_pic` | string | Profile picture URL | ❌ Confidential |
| `author_followers` | int | Follower count | ❌ Confidential |
| `author_listed` | int | Listed count | ❌ Confidential |
| `author_following` | int | Following count | ❌ Confidential |
| `author_tweets` | int | Total tweet count | ❌ Confidential |
| `author_protected` | boolean | Protected account status | ❌ Confidential |
| `author_entities` | json | Profile entities | ❌ Confidential |
| `author_description` | string | Profile bio | ❌ Confidential |
| `author_verified` | boolean | Verification status | ❌ Confidential |
| `author_created_at` | datetime | Account creation date | ❌ Confidential |
| `author_withheld` | string | Withheld status | ❌ Confidential |
| `author_location` | string | Author location | ❌ Confidential |
| `is_duplicate` | boolean | Exact duplicate flag | ✅ Available |
| `bot_prob` | float | Bot probability score (0-1) | ✅ Available |
| `is_bot` | boolean | Bot likelihood flag | ✅ Available |
| `all_classes_results` | json | AI model predictions | ✅ Available |
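The `is_duplicate`, `is_bot`, and `bot_prob` fields make it possible to narrow delivered records to likely human-authored, non-duplicate tweets. The sketch below is one way to do that on records loaded as dictionaries; the `keep_regular` helper and the 0.5 cutoff are assumptions for illustration, since this README does not state the threshold behind `is_bot`.

```python
def keep_regular(records, max_bot_prob=0.5):
    """Keep records that are neither exact duplicates nor likely bots.

    max_bot_prob is an assumed cutoff, not a value documented by the dataset.
    """
    return [
        r for r in records
        if not r["is_duplicate"]
        and not r["is_bot"]
        and r["bot_prob"] < max_bot_prob
    ]

# Hypothetical records containing only the fields used here
sample = [
    {"text": "tweet a", "is_duplicate": False, "is_bot": False, "bot_prob": 0.05},
    {"text": "tweet b", "is_duplicate": True, "is_bot": False, "bot_prob": 0.10},
    {"text": "tweet c", "is_duplicate": False, "is_bot": True, "bot_prob": 0.93},
]
kept = keep_regular(sample)
print([r["text"] for r in kept])
```

Lowering `max_bot_prob` gives a stricter filter at the cost of discarding more genuine accounts, so the cutoff is worth reporting alongside any results.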
# Install dependencies
pip install -r requirements.txt
# 8-class Granular model (recommended)
python run_inference.py --text "Mültecilere vatandaşlık verilmesin" --model-type all-classes
# 6-class Super-Class model (experimental)
python run_inference.py --text "Mültecilere vatandaşlık verilmesin" --model-type super-classes
# Run on CPU
python run_inference.py --text "Mültecilere vatandaşlık verilmesin" --device cpu
# Run bot detection on the sample input file
python run_bot_detection.py --features example_user_data.json

Explore temporal dynamics of migration discourse: View Interactive Charts
- 📊 Pan, zoom, and hover for detailed data points
- 📅 12-year temporal coverage (2011-2022)
- 💾 Export charts as PNG
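A similar temporal view can be approximated offline by counting delivered records per year from the `created_at` field. This is a minimal sketch that assumes timestamps are ISO 8601 strings without a timezone suffix; the actual serialization in the delivered files may differ, and `tweets_per_year` is a hypothetical helper.

```python
from collections import Counter
from datetime import datetime

def tweets_per_year(records):
    """Count records per calendar year of their created_at timestamp."""
    years = (datetime.fromisoformat(r["created_at"]).year for r in records)
    return dict(sorted(Counter(years).items()))

# Hypothetical records; timestamp format is an assumption
sample = [
    {"created_at": "2011-03-15T09:30:00"},
    {"created_at": "2015-09-03T18:05:12"},
    {"created_at": "2015-09-04T07:45:00"},
    {"created_at": "2022-12-31T23:59:59"},
]
counts = tweets_per_year(sample)
print(counts)
```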
Migration-TR/
├── trained_models/
│ ├── perception_attitude_clf_super_classes/ # 6-class model weights
│ ├── perception_attitude_all_classes/ # 8-class model weights
│ └── bot_clf/ # Bot detection model
├── docs/ # GitHub Pages site
│ ├── index.html # Main visualization page
│ └── assets/plots/ # Interactive Plotly charts
├── run_inference.py # Classification inference script
├── run_bot_detection.py # Bot detection script
├── example_user_data.json # Sample bot detection input
├── requirements.txt # Python dependencies
└── DATA_USE_AGREEMENT.md # Data use agreement
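To classify many texts, the `run_inference.py` command line can be driven from Python. The sketch below only builds the argument lists, using the `--text`, `--model-type`, and `--device` flags shown in this README; combining `--model-type` with `--device` in one call is an assumption, and `inference_command` is a hypothetical helper.

```python
import shlex
import sys

def inference_command(text, model_type="all-classes", device=None):
    """Build the argv list for one run_inference.py call."""
    cmd = [sys.executable, "run_inference.py",
           "--text", text, "--model-type", model_type]
    if device is not None:
        cmd += ["--device", device]
    return cmd

texts = ["Mültecilere vatandaşlık verilmesin"]
commands = [inference_command(t, device="cpu") for t in texts]
for cmd in commands:
    # Print a shell-safe form; to execute from the repository root,
    # call subprocess.run(cmd, check=True) instead.
    print(shlex.join(cmd))
```

Passing the command as a list rather than a single string avoids shell quoting problems with Turkish characters and spaces in the tweet text.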
Who Can Access:
- Academic Researchers at accredited institutions
- Graduate Students with supervisor approval
- Policy Researchers at recognized organizations
- Non-commercial use only - no commercial applications
Not Permitted:
- Commercial use or monetization
- Surveillance or tracking applications
- Attempts to re-identify users
- Redistribution of raw tweet text
Read our comprehensive Data Use Agreement carefully.
Email your signed DUA to: info@csstr.org
Include:
- Your institutional affiliation
- Research purpose and methodology
- Specific data requirements (which chunks you need, from Chunk-1 to Chunk-11770)
- Supervisor information (for students)
- We review requests within 5 business days
- Approved users receive secure download links
- Data delivered as password-protected archives
- Manual delivery: maximum 500 hydrated objects per recipient per day (non-automated delivery only)
Data Delivery Policy: To comply with the X.com (formerly Twitter) Developer Policy, data are delivered manually under the following conditions:
- Maximum 500 hydrated tweets per recipient per day (non-automated delivery via email/SFTP)
- Multiple researchers can receive data simultaneously (500 objects per person per day)
- Academic use only
- No public redistribution of full tweet text allowed
- 24-hour deletion compliance: CSSTR monitors the X Compliance API and notifies recipients, who must delete or mask affected tweets within 24 hours
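The daily limit has practical consequences for planning a request. The sketch below assumes each chunk holds up to 500 tweets, which this README does not state outright but which matches the Chunk-1 to Chunk-11770 range for the raw dataset; `delivery_days` is a hypothetical helper.

```python
import math

RAW_TWEETS = 5_884_624   # raw dataset size stated in this README
DAILY_LIMIT = 500        # hydrated tweets per recipient per day

# Assumption: one chunk holds up to DAILY_LIMIT tweets.
total_chunks = math.ceil(RAW_TWEETS / DAILY_LIMIT)

def delivery_days(n_tweets, daily_limit=DAILY_LIMIT):
    """Minimum number of daily deliveries needed to receive n_tweets."""
    return math.ceil(n_tweets / daily_limit)

print(total_chunks)
print(delivery_days(15_000))
```

Under this assumption, a sample the size of the 15,000-tweet training set would take 30 daily deliveries for a single recipient, so requests are best scoped to the chunks a study actually needs.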
Legal Framework: This dataset complies with:
- X.com Developer Agreement (current version)
- Turkish data protection laws
- GDPR requirements for research
- Academic research ethics standards
If you use Migration-TR in your research, please cite:
@article{yilmazpolat_migration_2025,
title={Migration Discourses on X.com: Analysis of Public Perceptions and
Attitudes Toward Refugees in Turkey Using Natural
Language Processing},
author={Yılmaz Polat, Evrim and Çağın Polat, Evrim},
journal={[Under Review]},
year={2025},
note={Dataset available at: https://github.com/cssturkiye/Migration-TR}
}

Paper Status: Currently under peer review. Citation will be updated upon acceptance.
Our classification models are built on TurkishBERTweet by VRLLab (Najafi & Varol, 2024). We thank the authors for making their work available.
- Base Model: VRLLab/TurkishBERTweet (894M Turkish tweets, 163M parameters)
Evrim Yılmaz Polat, PhD - Corresponding Author
Department of Sociology, Zonguldak Bülent Ecevit University
Evrim Çağın Polat - Co-Author
Notrino Research, ODTÜ Teknokent, Ankara
📧 Email: info@csstr.org
🏛️ Organization: Computational Social Sciences Turkey (CSSTR)
Migration-TR | Advancing Migration Research Through Computational Social Science
