copenlu/cultural-name-bias

Studying Name Bias and Cultural Presumptions in Language Models

This repository contains the scripts and resources used for the research paper "Presumed Cultural Identity: How Names Shape LLM Responses". The project investigates how large language models (LLMs) respond to information-seeking questions depending on the cultural identity they presume from the asker's name.

Main Components

  1. Data Preparation:

    • generate-responses-named.py, openai-gen-name-in-prompt.py: Scripts for generating LLM responses to information-seeking questions, based on the person's name.
    • data/: Contains the names data and filtered CANDLE data used in the study.
    • open_ended_qa_cleaned.py: Defines the open-ended questions used for testing name bias.
  2. Bias Evaluation:

    • assert-processing/: Scripts for processing and filtering the CANDLE dataset, and creating the questions.
    • bias-candle-assertion-based.py: Script for calculating bias in LLM responses based on assertions.
    • bias-llm-as-judge.py: Script for calculating bias in LLM responses using the LLM as a judge.
  3. Result Aggregation:

    • aggregated-stats-unorganized/: [Currently unorganized] scripts for aggregating the results of the bias evaluation.
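
As a rough illustration of what an assertion-based bias measure could look like, the sketch below computes, for each culture, the fraction of responses that mention at least one term associated with that culture. This is not the code in bias-candle-assertion-based.py: the function name `culture_mention_rate`, the input shapes, and the case-insensitive substring matching are all assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def culture_mention_rate(
    responses: Iterable[Tuple[str, str]],
    assertions: Dict[str, List[str]],
) -> Dict[str, float]:
    """Hypothetical helper, not part of the repository.

    responses:  (culture associated with the asker's name, response text) pairs.
    assertions: culture -> list of terms taken from cultural assertions.
    Returns, per culture, the share of responses containing at least one
    of that culture's terms (case-insensitive substring match).
    """
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for culture, text in responses:
        totals[culture] += 1
        lowered = text.lower()
        if any(term.lower() in lowered for term in assertions.get(culture, [])):
            hits[culture] += 1
    return {culture: hits[culture] / totals[culture] for culture in totals}


if __name__ == "__main__":
    # Toy data invented for illustration only.
    toy_responses = [
        ("Japan", "You could make sushi tonight."),
        ("Japan", "Try a simple pasta dish."),
        ("Denmark", "How about rye bread with toppings?"),
    ]
    toy_assertions = {"Japan": ["sushi"], "Denmark": ["rye bread"]}
    print(culture_mention_rate(toy_responses, toy_assertions))
```

A higher rate for a culture would suggest that responses to names associated with it lean more often on that culture's assertions. Substring matching is used here only to keep the sketch short.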

Usage

To use the scripts in this repository, install the Python libraries they depend on, including pandas, openai, and tqdm. For the complete list, refer to the project's requirements or the imports in the individual script files.

The main entry points are the generate-responses-named.py and openai-gen-name-in-prompt.py scripts, which generate the LLM responses based on the person's name. The other scripts handle the bias evaluation and result aggregation.
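
To make the idea of putting a name in the prompt concrete, here is a minimal sketch that pairs every name with every question and renders one prompt per pair. The template wording, the function name `build_named_prompts`, and the example names and question are hypothetical. The actual scripts may phrase prompts differently and also handle the model calls, which are omitted here.

```python
from typing import Dict, List

# Hypothetical template; the prompt wording used in the repository may differ.
PROMPT_TEMPLATE = "Hi, my name is {name}. {question}"


def build_named_prompts(names: List[str], questions: List[str]) -> List[Dict[str, str]]:
    """Pair every name with every question and render the prompt text."""
    return [
        {
            "name": name,
            "question": question,
            "prompt": PROMPT_TEMPLATE.format(name=name, question=question),
        }
        for name in names
        for question in questions
    ]


if __name__ == "__main__":
    # Example inputs invented for illustration only.
    for row in build_named_prompts(["Aiko", "Lars"], ["What should I cook for dinner?"]):
        print(row["prompt"])
```

Each rendered prompt would then be sent to the model under test, and the response stored next to the name and question for the later bias evaluation.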

Please note that the repository is currently in an unorganized state, and the scripts may require some cleanup and refactoring to make them more user-friendly. We are working on improving the organization and documentation of the codebase.

Contributing

If you would like to contribute to this project, please feel free to submit a pull request or open an issue. We welcome any suggestions or improvements to the codebase or the research methodology.

License

This project is licensed under the MIT License.
