Data journalism investigation analyzing 2,321 French media articles on sexual violence (2018-2024). NLP analysis reveals 47.5% contain victim-blaming language despite tripled coverage post-#MeToo. Examines Le Monde, Le Figaro, Libération, 20 Minutes, France Info.

eloiseelle/french-media-sexual-violence-analysis

Framing Violence: Analyzing French Media Coverage of Sexual Violence (2018-2024)

A data journalism investigation analyzing how five major French media outlets covered sexual violence over seven years, revealing persistent problematic patterns despite increased attention post-#MeToo.

Executive Summary

This investigation analyzes 2,321 articles published between January 2018 and December 2024 across five major French news outlets: Le Monde, Le Figaro, Libération, 20 Minutes, and France Info.

Key findings:

  • Coverage of sexual violence nearly tripled between 2018 and 2024 (227 to 639 articles, +181%).
  • 47.5% of articles contain victim-blaming language or euphemistic framing.
  • Political orientation does not significantly affect problematic framing rates.
  • Articles mention convictions far more often than dismissals, although most complaints are dismissed in practice.
  • Elite celebrity cases dominate coverage patterns.

The full methodology, dataset structure, and reproducible code are provided in this repository.

Key Findings

Coverage Explosion

| Year | Articles | Change vs 2018 |
| --- | --- | --- |
| 2018 | 227 | Baseline (MeToo wave) |
| 2020 | 108 | -52% (COVID impact) |
| 2024 | 639 | +181% |

The 2024 surge was driven by actress Judith Godrèche's testimony and renewed attention to the cinema industry.
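
The change figures can be checked directly from the yearly article counts. The snippet below is a minimal sketch of that arithmetic, not code taken from the project notebook; the function name `pct_change_vs_baseline` is an illustrative assumption.

```python
# Article counts per year, copied from the coverage table above.
articles_per_year = {2018: 227, 2020: 108, 2024: 639}


def pct_change_vs_baseline(counts, baseline_year=2018):
    """Return each year's percentage change relative to the baseline year,
    rounded to the nearest whole percent."""
    base = counts[baseline_year]
    return {year: round((n - base) / base * 100) for year, n in counts.items()}


changes = pct_change_vs_baseline(articles_per_year)
growth_factor = articles_per_year[2024] / articles_per_year[2018]

print(changes)                  # percentage change per year vs 2018
print(round(growth_factor, 2))  # how many times larger 2024 is than 2018
```

The growth factor comes out at roughly 2.8, which is why the summary describes coverage as having about tripled.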

The Victim-Blaming Paradox

Surface level: 80.7% of articles use victim-centered framing.
Reality: 47.5% contain victim-blaming language.

This contradiction reveals that even well-intentioned coverage perpetuates harmful narratives through word choice, hedging language, and implicit doubt.

By outlet:

| Outlet | Victim-Blaming % | Political Leaning |
| --- | --- | --- |
| Libération | 52.8% | Left-wing |
| 20 Minutes | 49.2% | Centrist |
| Le Figaro | 46.3% | Right-wing |
| Le Monde | 45.1% | Center-left |
| France Info | 44.8% | Public service |

Notable: The most progressive outlet (Libération) has the highest rate of victim-blaming language.

[Chart 2: Victim-blaming language by outlet]

Sentiment Analysis

Overall average polarity: -0.70 (scale: -1 to +1)

| Year | Polarity | Interpretation |
| --- | --- | --- |
| 2019 | -0.61 | Post-MeToo optimism |
| 2022 | -0.75 | Peak negativity |
| 2024 | -0.68 | Slight recovery |

Coverage grew more negative between 2019 and 2022 and recovered only slightly by 2024, which may reflect growing frustration with slow systemic change.

Justice System Misrepresentation

| Metric | Media Coverage (% of articles) | Reality (% of cases) |
| --- | --- | --- |
| Convictions | 41.6% | ~10% |
| Dismissals | 4.9% | ~80% |
| Dismissal-to-conviction ratio | 0.12 | 8.0 |

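Articles mention convictions about 8.5 times as often as dismissals (41.6% vs 4.9%), while in the cited judicial statistics dismissals are roughly 8 times as common as convictions (~80% vs ~10%). This inversion creates a false impression of justice system effectiveness.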
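
The ratio row in the table above appears to be dismissals divided by convictions; that reading is an inference from the figures, since it is the only one that reproduces both 0.12 and 8.0. A minimal sketch of the arithmetic, with variable names chosen here for illustration:

```python
# Percentages copied from the table above.
media_convictions_pct = 41.6   # share of articles mentioning convictions
media_dismissals_pct = 4.9     # share of articles mentioning dismissals
real_convictions_pct = 10.0    # approximate share of cases ending in conviction
real_dismissals_pct = 80.0     # approximate share of cases dismissed

# Dismissal-to-conviction ratios (assumed definition of the "Ratio" row).
media_ratio = media_dismissals_pct / media_convictions_pct
real_ratio = real_dismissals_pct / real_convictions_pct

# The same comparison read the other way round: convictions per dismissal in coverage.
media_convictions_per_dismissal = media_convictions_pct / media_dismissals_pct

print(round(media_ratio, 2), real_ratio, round(media_convictions_per_dismissal, 1))
```

Note that the media figures count article mentions while the judicial figures count case outcomes, so the two ratios are indicative rather than strictly comparable.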

[Chart 3: The justice illusion]

Types of Violence Covered

| Type | % of Articles |
| --- | --- |
| Rape | 65.7% |
| Sexual Assault | 58.6% |
| Harassment | 45.0% |
| Cyber Violence | 19.7% |
| Incest | 16.1% |
| Child Sexual Abuse | 7.5% |

Institutional Focus

| Institution | Coverage % |
| --- | --- |
| Cinema/Entertainment | 76.0% |
| Family/Domestic | 70.3% |
| Politics | 44.6% |
| Workplace | 36.8% |
| Catholic Church | 23.5% |

Notable gap: The Catholic Church abuse scandal is significantly under-covered relative to its scale.

Positive Evolution: Terminology

"Pédophilie" vs "Pédocriminalité" (problematic vs correct terminology):

| Year | Ratio | Interpretation |
| --- | --- | --- |
| 2018 | 53.0 | Problematic term overwhelmingly dominant |
| 2022 | 0.77 | Correct term dominates |
| 2024 | 0.79 | Progress sustained |

This demonstrates that media can learn and improve when advocacy efforts raise awareness.
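
The README does not spell out how the ratio is computed. The sketch below assumes it is the number of occurrences of "pédophilie" divided by the number of occurrences of "pédocriminalité" in a given year; the function name, the input shape, and the sample sentences are all hypothetical.

```python
import re
import unicodedata
from collections import defaultdict

# Whole-word, case-insensitive patterns for the two terms compared above.
PROBLEMATIC = re.compile(r"\bpédophilie\b", re.IGNORECASE)
PREFERRED = re.compile(r"\bpédocriminalité\b", re.IGNORECASE)


def terminology_ratio_by_year(articles):
    """articles: iterable of (year, text) pairs.

    Returns {year: problematic_count / preferred_count}, or None for a year
    in which the preferred term never appears (the ratio is undefined).
    """
    counts = defaultdict(lambda: [0, 0])
    for year, text in articles:
        # Normalize accents so decomposed characters still match the patterns.
        text = unicodedata.normalize("NFC", text)
        counts[year][0] += len(PROBLEMATIC.findall(text))
        counts[year][1] += len(PREFERRED.findall(text))
    return {
        year: (problematic / preferred if preferred else None)
        for year, (problematic, preferred) in counts.items()
    }


# Hypothetical sample sentences, for illustration only.
sample = [
    (2018, "Une affaire de pédophilie. La pédophilie reste un tabou."),
    (2018, "Le terme pédocriminalité est encore rare."),
    (2022, "La pédocriminalité en ligne. Pédocriminalité et pédophilie."),
]
ratios = terminology_ratio_by_year(sample)
print(ratios)
```

A ratio above 1 means the problematic term is used more often than the preferred one, and a ratio below 1 means the reverse.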

High-Profile Case Dominance

| Person/Case | Mentions | Context |
| --- | --- | --- |
| Harvey Weinstein | 340 | International benchmark |
| Georges Tron | 303 | French politician |
| Richard Berry | 292 | Actor, incest accusation |
| Gabriel Matzneff | 237 | Writer, pedophilia |
| Gérard Depardieu | 177 | Actor, multiple accusations |
| Judith Godrèche | 173 | Actress, 2024 surge |

Coverage heavily favors elite perpetrators over systemic analysis affecting ordinary people.

Methodology

Data Collection

  • Manual URL collection: 2,321 article URLs gathered from five outlets
  • Sources: Le Monde, Le Figaro, Libération, 20 Minutes, France Info
  • Web scraping: Python with newspaper4k and BeautifulSoup
  • Time period: January 2018 – December 2024

Sources Analyzed

  1. Le Monde - Center-left, reference newspaper
  2. Le Figaro - Right-wing, oldest national daily
  3. Libération - Left-wing, progressive
  4. 20 Minutes - Free daily, centrist
  5. France Info - Public broadcaster

Analysis Techniques

  • Sentiment analysis: French-language polarity scoring
  • Keyword detection: Custom dictionaries for:
    • Victim-blaming language (e.g., "allégué," "prétend," "affirme")
    • Euphemisms (e.g., "gestes déplacés," "comportement inapproprié")
    • Correct vs problematic terminology
  • Entity extraction: Named entity recognition for people, institutions
  • Temporal analysis: Trends over time
  • Cross-outlet comparison: Statistical comparison between sources
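
The keyword detection step can be sketched with the standard library alone. The example below uses only the terms quoted in the list above; the project's full dictionaries are not reproduced in this README, and the names `DOUBT_TERMS`, `EUPHEMISMS`, `compile_terms`, and `flag_article` are illustrative assumptions rather than the notebook's actual code.

```python
import re
import unicodedata

# Example terms quoted in this README; the real dictionaries are assumed to be larger.
DOUBT_TERMS = ["allégué", "prétend", "affirme"]
EUPHEMISMS = ["gestes déplacés", "comportement inapproprié"]


def compile_terms(terms):
    """Build one case-insensitive pattern matching any term as a whole word or phrase."""
    alternation = "|".join(re.escape(term) for term in terms)
    return re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)


PATTERNS = {
    "victim_blaming": compile_terms(DOUBT_TERMS),
    "euphemism": compile_terms(EUPHEMISMS),
}


def flag_article(text):
    """Return, per category, the sorted list of distinct dictionary terms found in text."""
    # Normalize accents so decomposed characters still match the dictionary entries.
    text = unicodedata.normalize("NFC", text)
    return {
        label: sorted({match.group(0).lower() for match in pattern.finditer(text)})
        for label, pattern in PATTERNS.items()
    }


flags = flag_article("La plaignante affirme avoir subi des gestes déplacés.")
print(flags)
```

Because the patterns match exact forms only, inflected variants such as "alléguée" are not caught unless they are added to the dictionary. Matching is also blind to context: "affirme" is a common reporting verb, so a hit indicates a candidate for review rather than a confirmed instance of victim-blaming.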

Limitations

  • Paywall impact: Some outlets had partial paywalls affecting article completeness
  • Date extraction: 19.5% of articles lacked extractable publication dates
  • Author attribution: 82% lacked identifiable bylines
  • Sample bias: Limited to five major outlets; regional press not included

Technical Stack

  • Python 3.10+
  • newspaper4k - Article extraction
  • BeautifulSoup - HTML parsing
  • pandas - Data manipulation
  • matplotlib/seaborn - Visualization
  • Jupyter notebooks - Analysis workflow

Repository Structure

french-media-sexual-violence-analysis/
├── README.md
├── LICENSE
├── requirements.txt
├── french_media_coverage_analysis.ipynb   # Complete analysis notebook
├── data/
│   ├── raw/                               # Original curated URLs
│   └── processed/                         # Final analyzed dataset
├── visualizations/                        # Generated charts
└── docs/
    └── methodology.md                     # Detailed methodology

Installation

# Clone the repository
git clone https://github.com/eloiseelle/french-media-sexual-violence-analysis.git
cd french-media-sexual-violence-analysis

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

Usage

# Open the analysis notebook
jupyter notebook french_media_coverage_analysis.ipynb

Key Conclusions

  1. Quantity ≠ Quality: Tripled coverage hasn't eliminated problematic framing
  2. Progressive paradox: Left-leaning outlets aren't immune to victim-blaming
  3. Justice distortion: Media creates false impression of system effectiveness
  4. Elite focus: Celebrity cases overshadow systemic issues
  5. Language evolves: Terminology can improve with sustained advocacy

Recommendations for French Media

  1. Establish editorial guidelines flagging victim-blaming language
  2. Train journalists on trauma-informed reporting
  3. Balance justice coverage by reporting dismissal rates
  4. Diversify sources beyond high-profile cases
  5. Cover systemic failures, not just individual incidents

Ethical Considerations

  • All data sourced from publicly available articles
  • No victim identification information included
  • Analysis focuses on media framing, not case details
  • Project aims to improve media practices, not shame individual journalists

Author

Eloise Bouton
Data Journalist | Senior Journalist (15 years) | Junior Data Scientist

  • Combining journalism expertise with data science skills
  • Focus: Social justice, women's rights, LGBTQ+ issues
  • Works remotely

License

This project is licensed under the MIT License - see LICENSE file for details.

Acknowledgments

  • French media outlets for public access to archives
  • #MeToo movement for catalyzing this conversation
  • Data science bootcamp instructors and peers


Data Availability & Ethics

This repository includes article metadata (publication date, outlet, URL) and processed textual features used for analysis (e.g., keyword frequencies, framing classifications, named entity counts).

Full copyrighted article texts are not redistributed. Raw article content was accessed for research and analysis purposes only and is not publicly shared in this repository.

Automated classifications (e.g., victim-blaming language detection, framing patterns, terminology usage) rely on pattern-based natural language processing methods and may not capture full contextual nuance. To assess reliability, a random sample of 100 articles was manually reviewed to validate automated classifications.
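
The README does not report which agreement measures were used for the manual review. One common way to summarize such a check is accuracy, precision, and recall of the automated flag against the manual label. The sketch below assumes boolean labels per article; the function name and the five sample labels are hypothetical and are not results from the project.

```python
def validation_metrics(manual, automated):
    """Compare automated flags against manual labels (both lists of booleans).

    Returns accuracy, precision, and recall; precision or recall is None when
    its denominator is zero.
    """
    if len(manual) != len(automated) or not manual:
        raise ValueError("inputs must be non-empty and of equal length")

    pairs = list(zip(manual, automated))
    tp = sum(1 for m, a in pairs if m and a)          # flagged and truly problematic
    fp = sum(1 for m, a in pairs if not m and a)      # flagged but judged fine
    fn = sum(1 for m, a in pairs if m and not a)      # missed by the automated pass
    tn = sum(1 for m, a in pairs if not m and not a)  # correctly left unflagged

    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if (tp + fp) else None,
        "recall": tp / (tp + fn) if (tp + fn) else None,
    }


# Hypothetical labels for five articles, for illustration only.
manual_labels = [True, True, False, False, True]
automated_flags = [True, False, False, True, True]
metrics = validation_metrics(manual_labels, automated_flags)
print(metrics)
```

With pattern-based detection, precision is usually the number to watch, since ordinary reporting verbs can trigger flags that a human reviewer would not count as victim-blaming.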

Justice system statistics referenced in this project are sourced from:

Ministère de la Justice, Violences sexuelles et atteintes aux mœurs : les décisions du parquet et de l'instruction,
Infostat Justice n°160, March 2018.

These figures refer to complaints processed by prosecutors and include cases dismissed before trial, primarily for insufficiently characterized offences.


Citation

If you reference this project in research, journalism, or academic work, please cite:

Bouton, Eloise (2025).
French Media Coverage of Sexual Violence (2018–2024): A Computational Analysis of 2,321 Articles.
GitHub repository: https://github.com/eloiseelle/french-media-sexual-violence-analysis


About the Author

Eloise Bouton is a freelance journalist specializing in media analysis and gender issues.

Website: https://eloisebouton.com/

For commissions, collaborations, or data inquiries, please open an issue on this repository or contact directly.

