SSA (Social Stereotype Bias Analysis) is a counterfactual explanation approach for assessing social bias in hate speech classifiers using stereotypes and counter-stereotypes. It evaluates the extent to which a classifier reflects social stereotypes by contrasting stereotypical beliefs with their counter-stereotypical counterparts. We empirically measure stereotypical bias by analyzing how classifiers label tuples containing stereotypes versus tuples containing counter-stereotypes. Experimental results show that hate speech classifiers tend to attribute unrealistic or unwarranted offensiveness to social group identifiers (e.g., "women", "gay"), thereby reflecting and reinforcing stereotypical beliefs about minorities.
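The comparison described above can be illustrated with a minimal sketch. This is not the SSA implementation: the function `stereotype_gap`, the toy keyword classifier, and the example pairs are all hypothetical and exist only to show the idea of contrasting predictions on a stereotype with predictions on its counter-stereotype. In practice you would replace `toy_classifier` with your own hate speech model and the pairs with real data.

```python
from typing import Callable, Dict, List, Tuple

# Each pair is (stereotype, counter-stereotype). Hypothetical toy data.
Pair = Tuple[str, str]

PAIRS: List[Pair] = [
    ("women are emotional", "men are emotional"),
    ("gay people are dramatic", "straight people are dramatic"),
    ("old people are slow", "young people are slow"),
]

# Assumption: a deliberately biased stand-in classifier that flags any
# text mentioning one of these identity terms, regardless of content.
FLAGGED_IDENTIFIERS = {"women", "gay"}


def toy_classifier(text: str) -> int:
    """Return 1 (offensive) if the text contains a flagged identifier, else 0."""
    tokens = text.lower().split()
    return int(any(tok in FLAGGED_IDENTIFIERS for tok in tokens))


def stereotype_gap(
    classify: Callable[[str], int], pairs: List[Pair]
) -> Dict[str, float]:
    """Compare how often stereotypes and counter-stereotypes are flagged."""
    if not pairs:
        raise ValueError("pairs must not be empty")
    n = len(pairs)
    # Classify both members of every pair once.
    preds = [(classify(s), classify(c)) for s, c in pairs]
    return {
        # Share of stereotype sentences labeled offensive.
        "stereotype_rate": sum(p for p, _ in preds) / n,
        # Share of counter-stereotype sentences labeled offensive.
        "counter_rate": sum(q for _, q in preds) / n,
        # Share of pairs whose two members receive different labels.
        "divergent_rate": sum(p != q for p, q in preds) / n,
    }


if __name__ == "__main__":
    print(stereotype_gap(toy_classifier, PAIRS))
```

A large gap between `stereotype_rate` and `counter_rate` in this sketch would suggest that the classifier reacts to the group identifier itself, not to the content of the sentence.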
Please cite our paper if you use SSA:
@inproceedings{vargas-etal-2023-socially,
title = "Socially Responsible Hate Speech Detection: Can Classifiers Reflect Social Stereotypes?",
author = {Vargas, Francielle and
Carvalho, Isabelle and
H{\"u}rriyeto{\u{g}}lu, Ali and
Pardo, Thiago and
Benevenuto, Fabr{\'\i}cio},
editor = "Mitkov, Ruslan and
Angelova, Galia",
booktitle = "Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing",
year = "2023",
address = "Varna, Bulgaria",
publisher = "INCOMA Ltd., Shoumen, Bulgaria",
url = "https://aclanthology.org/2023.ranlp-1.126",
pages = "1187--1196",
}



