SSA is a counterfactual explanation approach to assess social bias in hate speech classifiers by stereotypes and counter-stereotypes

franciellevargas/SSA

SSA: A Counterfactual Explanation Approach to Assess Social Bias in Hate Speech Classifiers


SSA (Social Stereotype Bias Analysis) is a counterfactual explanation approach for assessing social bias in hate speech classifiers using stereotypes and counter-stereotypes. It evaluates the extent to which hate speech classifiers reflect social stereotypes by contrasting stereotypical beliefs with their counter-stereotypical counterparts. We empirically measure stereotypical bias by analyzing how classifiers label tuples containing stereotypes versus tuples containing counter-stereotypes. Experimental results show that hate speech classifiers tend to attribute unrealistic or unwarranted offensiveness to social group identifiers (e.g., "women", "gay"), thereby reflecting and reinforcing stereotypical beliefs about minorities.
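The contrast between stereotypes and counter-stereotypes can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the SSA code or data: the `Pair` type, the `bias_gap` function, the example sentences, and the `toy_classifier` stand-in (a keyword rule that mimics a classifier over-reacting to group identifiers) are all hypothetical. The sketch also assumes that a counter-stereotype is built by swapping the group identifier, which may differ from how the SSA tuples were constructed.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class Pair:
    """Hypothetical container for one stereotype / counter-stereotype pair."""
    stereotype: str
    counter_stereotype: str


def toy_classifier(text: str) -> float:
    """Stand-in scorer, NOT a real model.

    Returns a high "offensive" score whenever a group identifier appears,
    imitating a classifier that is biased toward identity terms.
    """
    identity_terms = {"women", "gay"}
    tokens = {t.strip(".,!?").lower() for t in text.split()}
    return 0.9 if tokens & identity_terms else 0.1


def bias_gap(
    pairs: Sequence[Pair],
    classify: Callable[[str], float],
    threshold: float = 0.5,
) -> Tuple[float, float, float]:
    """Return (stereotype flag rate, counter-stereotype flag rate, difference)."""
    if not pairs:
        raise ValueError("at least one pair is required")
    n = len(pairs)
    flagged_s = sum(classify(p.stereotype) >= threshold for p in pairs)
    flagged_c = sum(classify(p.counter_stereotype) >= threshold for p in pairs)
    return flagged_s / n, flagged_c / n, (flagged_s - flagged_c) / n


# Illustrative sentences only; these are not drawn from the SSA data.
PAIRS = [
    Pair("Women are bad at math.", "Men are bad at math."),
    Pair("Gay men are dramatic.", "Straight men are dramatic."),
    Pair("Women belong in the kitchen.", "Men belong in the kitchen."),
]

if __name__ == "__main__":
    rate_s, rate_c, gap = bias_gap(PAIRS, toy_classifier)
    print(f"stereotype flagged: {rate_s:.2f}")
    print(f"counter-stereotype flagged: {rate_c:.2f}")
    print(f"gap: {gap:.2f}")
```

To probe a real model, `toy_classifier` would be replaced by any function that maps a sentence to an offensiveness score. A large gap between the two flag rates suggests that the score is driven by the group identifier rather than by the content of the sentence.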

CITING / BIBTEX

Please cite our paper if you use SSA:

@inproceedings{vargas-etal-2023-socially,
    title = "Socially Responsible Hate Speech Detection: Can Classifiers Reflect Social Stereotypes?",
    author = {Vargas, Francielle  and
      Carvalho, Isabelle  and
      H{\"u}rriyeto{\u{g}}lu, Ali  and
      Pardo, Thiago  and
      Benevenuto, Fabr{\'\i}cio},
    editor = "Mitkov, Ruslan  and
      Angelova, Galia",
    booktitle = "Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing",
    year = "2023",
    address = "Varna, Bulgaria",
    publisher = "INCOMA Ltd., Shoumen, Bulgaria",
    url = "https://aclanthology.org/2023.ranlp-1.126",
    pages = "1187--1196",
}

FUNDING

[Three funder logos]
