GitHub - franciellevargas/SSA: SSA is a counterfactual explanation approach to assess social bias in hate speech classifiers by stereotypes and counter-stereotypes

SSA: A Counterfactual Explanation Approach to Assess Social Bias in Hate Speech Classifiers

SSA (Social Stereotype Bias Analysis) is a counterfactual explanation approach for assessing social bias in hate speech classifiers using stereotypes and counter-stereotypes. SSA evaluates the extent to which hate speech classifiers reflect social stereotypes by contrasting stereotypical beliefs with their counter-stereotypical counterparts. We empirically measure stereotypical bias by analyzing how classifiers label tuples containing stereotypes versus tuples containing counter-stereotypes. Experimental results show that hate speech classifiers tend to attribute unrealistic or unwarranted offensiveness to social group identifiers (e.g., "women", "gay"), thereby reflecting and reinforcing stereotypical beliefs about minorities.
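The comparison described above can be illustrated with a minimal sketch. It is not the code from this repository: the pair format, the `offensive_rates` helper, and the `toy_classifier` stand-in are all hypothetical and serve only to show the idea of contrasting predictions on stereotypes with predictions on counter-stereotypes.

```python
from typing import Callable, Iterable, Tuple

# Hypothetical format (an assumption): each pair holds a stereotype sentence
# and its counter-stereotype counterpart about the same social group.
Pair = Tuple[str, str]


def offensive_rates(
    classify: Callable[[str], int], pairs: Iterable[Pair]
) -> Tuple[float, float]:
    """Return the share of stereotypes and of counter-stereotypes labeled offensive (1)."""
    pairs = list(pairs)
    if not pairs:
        return 0.0, 0.0
    stereo = sum(classify(s) for s, _ in pairs) / len(pairs)
    counter = sum(classify(c) for _, c in pairs) / len(pairs)
    return stereo, counter


def toy_classifier(text: str) -> int:
    """Stand-in for a real model (an assumption): flags any text that
    mentions a group identifier, regardless of what is said about the group."""
    identifiers = {"women", "gay"}
    return int(any(word in identifiers for word in text.lower().split()))


# Illustrative pairs only; they are not taken from the repository's tuples.
pairs = [
    ("women are bad drivers", "women are good drivers"),
    ("women are too emotional", "women are rational"),
]

stereo_rate, counter_rate = offensive_rates(toy_classifier, pairs)
print(f"stereotypes flagged: {stereo_rate:.2f}")
print(f"counter-stereotypes flagged: {counter_rate:.2f}")
```

In this toy setup, a high rate on the counter-stereotype side suggests the classifier is reacting to the group identifier itself rather than to the content of the sentence. A real evaluation would replace `toy_classifier` with the model under study and the illustrative pairs with actual stereotype and counter-stereotype tuples.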

CITING / BIBTEX

Please cite our paper if you use SSA:

@inproceedings{vargas-etal-2023-socially,
    title = "Socially Responsible Hate Speech Detection: Can Classifiers Reflect Social Stereotypes?",
    author = {Vargas, Francielle  and
      Carvalho, Isabelle  and
      H{\"u}rriyeto{\u{g}}lu, Ali  and
      Pardo, Thiago  and
      Benevenuto, Fabr{\'\i}cio},
    editor = "Mitkov, Ruslan  and
      Angelova, Galia",
    booktitle = "Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing",
    year = "2023",
    address = "Varna, Bulgaria",
    publisher = "INCOMA Ltd., Shoumen, Bulgaria",
    url = "https://aclanthology.org/2023.ranlp-1.126",
    pages = "1187--1196",
}

