Studying Name Bias and Cultural Presumptions in Language Models

This repository contains the scripts and resources used for the research paper "Presumed Cultural Identity: How Names Shape LLM Responses". The goal of this project is to investigate how language models (LLMs) respond to information-seeking questions based on the perceived cultural identity of the person asking the question, as inferred from their name.

Main Components

Data Preparation:
- generate-responses-named.py, openai-gen-name-in-prompt.py: Scripts for generating LLM responses to information-seeking questions, based on the person's name.
- data/: Contains the names data and filtered CANDEL data used in the study.
- open_ended_qa_cleaned.py: Defines the open-ended questions used for testing name bias.
Bias Evaluation:
- assert-processing/: Scripts for processing and filtering the CANDLE dataset, and creating the questions.
- bias-candle-assertion-based.py: Script for calculating bias in LLM responses based on assertions.
- bias-llm-as-judge.py: Script for calculating bias in LLM responses using the LLM as a judge.
Result Aggregation:
- aggregated-stats-unorganized/: [Currently unorganized] scripts for aggregating the results of the bias evaluation.

Usage

To use the scripts in this repository, you will need to have the necessary dependencies installed, including Python libraries such as pandas, openai, tqdm, and others. Refer to the project's requirements or the individual script files for the specific dependencies.

The main entry points are the generate-responses-named.py and openai-gen-name-in-prompt.py scripts, which generate the LLM responses based on the person's name. The other scripts handle the bias evaluation and result aggregation.

Please note that the repository is currently in an unorganized state, and the scripts may require some cleanup and refactoring to make them more user-friendly. We are working on improving the organization and documentation of the codebase.

Contributing

If you would like to contribute to this project, please feel free to submit a pull request or open an issue. We welcome any suggestions or improvements to the codebase or the research methodology.

License

This project is licensed under the MIT License.

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
.trunk		.trunk
aggregated-stats-unorganized		aggregated-stats-unorganized
assert-processing/src		assert-processing/src
data		data
img-scripts		img-scripts
notebooks		notebooks
.gitignore		.gitignore
README.md		README.md
bias-candle-assertion-based.py		bias-candle-assertion-based.py
bias-llm-as-judge.py		bias-llm-as-judge.py
bias-oai-llm-as-judge.py		bias-oai-llm-as-judge.py
count-num-names-fb.py		count-num-names-fb.py
country_codes.py		country_codes.py
generate-responses-named.py		generate-responses-named.py
inter-intra-continent-bias.py		inter-intra-continent-bias.py
name-eval-pipeline-withoutnames.py		name-eval-pipeline-withoutnames.py
open_ended_qa_cleaned.py		open_ended_qa_cleaned.py
openai-gen-countries-in-prompt.py		openai-gen-countries-in-prompt.py
openai-gen-name-in-prompt.py		openai-gen-name-in-prompt.py
prompt_updated.py		prompt_updated.py
prompts.py		prompts.py
run_batch-cluster.sh		run_batch-cluster.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Studying Name Bias and Cultural Presumptions in Language Models

Main Components

Usage

Contributing

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors 2

Uh oh!

Languages

copenlu/cultural-name-bias

Folders and files

Latest commit

History

Repository files navigation

Studying Name Bias and Cultural Presumptions in Language Models

Main Components

Usage

Contributing

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 2

Uh oh!

Languages

Packages