Statistical Hypothesis Testing for Auditing Robustness in Language Models

This is the official implementation of Statistical Hypothesis Testing for Auditing Robustness in Language Models (arxiv). Our paper proposes a new method for detecting and quantifying perturbations in large language model (LLM) responses.

Overview

This repository includes the necessary code, prompts, and collected responses to reproduce the experiments and results presented in the NeurIPS workshop SFLLM paper.

Installation

pip install -e .

You should also make sure you have the packages in requirements.txt.

Prerequisites

To use this repository, you will need an API to access the LLM. The simplest way to set this up is to create a Python file named src/dbpa/utils/llm_config.py with the following structure:

def get_llm_config():
    llm_config = dict(
        api_key = api_key,
        api_version = api_version,
        api_endpoint = api_endpoint,
        model_deployment_id = model_deployment_id
    )
    return llm_config

def get_embedding_config():
    ada_config = dict(
        api_key = api_key,
        api_version = api_version,
        api_endpoint = api_endpoint,
        embedding_model_deployment_id = embedding_model_deployment_id
    )
    return ada_config

Replace the placeholders api_key, api_version, api_endpoint, model_deployment_id, embedding_model_deployment_id with your actual configuration values.

Repository Structure

The core component of the repository is the src folder. Under src, there are three subcategories:

data contains the necessary code to generate data for the experiments. For the purpose of this paper, we generate synthetic sentences, and focus on perturbing the immediate neighbors to each word in the sentence.
model contains the core code to calculate distance between the original response and the perturbed response. For this paper, we provide two methods to approximate the distance---JSD and energy distance.
utils contains the code to setup and query LLM + embedding models.

Additionally, exp contains all the sampled LLM responses for the experiments in the paper. All the raw LLM responses are stored under the responses folder, and the processed responses are stored under the scores folder, which calculates the distance between the original response and the perturbed response.

Quick Start - Command Line Interface

For quick text perturbation analysis, you can use our command-line interface:

Basic Usage

# Analyze a simple text perturbation
python src/dbpa/interface.py --text "My age is 45 and I am male. What is my life expectancy?" --change "age is 45" "age is 55"

# Multiple changes at once
python src/dbpa/interface.py --text "I am 30 years old and live in NYC" --change "30 years old" "40 years old" "NYC" "LA"

# Get detailed output with interpretation
python src/dbpa/interface.py --text "Hello world" --change "Hello" "Hi" --verbose

Advanced Usage

# Use Jensen-Shannon divergence instead of energy distance
python src/dbpa/interface.py --text "Test text" --change "Test" "Modified" --method jsd

# Customize distance metric and number of permutations
python src/dbpa/interface.py --text "Sample text" --change "Sample" "Example" --distance l2 --permutations 1000

# Output results in JSON format
python src/dbpa/interface.py --text "Hello world" --change "Hello" "Hi" --output-format json

Using JSON Configuration Files

For complex perturbations, create a JSON file with your changes:

# Create an example configuration file
python src/dbpa/interface.py --create-example

# Use the configuration file
python src/dbpa/interface.py --text "My age is 45 and I am male. What is my life expectancy?" --change-file example_changes.json

Example JSON format:

{
  "age is 45": "age is 55",
  "male": "female", 
  "life expectancy": "retirement age"
}

Command Line Options

Option	Description	Default
`--text`	Original text to analyze	Required
`--change`	Pairs of original and replacement phrases	-
`--change-file`	JSON file containing changes	-
`--method`	Statistical method (`energy` or `jsd`)	`energy`
`--distance`	Distance metric (`cosine`, `l1`, `l2`)	`cosine`
`--permutations`	Number of permutations for testing	`500`
`--output-format`	Output format (`plain` or `json`)	`plain`
`--verbose`	Show detailed output and interpretation	`false`

Running the Full Experiments

Prerequisites

Setup API Configuration: Ensure you can run src/utils/openai_config.py and src/utils/setup_llm.py before proceeding.
Install Dependencies: Make sure all requirements are installed via pip install -e .

Experiment Execution

Full Experiment Pipeline

If you wish to run the entire experiment including LLM generation:

cd exp/SFLLM
python experiment_file.py

This will:

Generate new LLM responses
Calculate perturbation distances
Generate plots and analysis

Note that experiments are correspondingly named to match the paper, specifically Table 2

Programmatic Usage

You can also use the core functionality directly in your Python code:

from dbpa.model.core import quantify_perturbations

# Basic usage
text_orig = 'My age is 45 and I am male. What is my life expectancy?'
change = {'age is 45': 'age is 55'}

statistic, p_value = quantify_perturbations(text_orig, change)
print(f"Statistic: {statistic:.6f}, P-value: {p_value:.6f}")

# Advanced usage with custom parameters
statistic, p_value = quantify_perturbations(
    text_orig, 
    change, 
    method='jsd',  # or 'energy'
    distance='l2',  # for energy method
    num_permutations=1000
)

Contributing

We welcome contributions! Please ensure your code follows the existing style and includes appropriate tests.

Citation

@article{rauba2025statistical,
  title={Statistical Hypothesis Testing for Auditing Robustness in Language Models},
  author={Rauba, Paulius and Wei, Qiyao and van der Schaar, Mihaela},
  journal={arXiv preprint arXiv:2506.07947},
  year={2025}
}

Support

For questions or issues, please open a GitHub issue or contact the authors.

Name		Name	Last commit message	Last commit date
Latest commit History 34 Commits
exps/SFLLM		exps/SFLLM
src/dbpa		src/dbpa
.coveragerc		.coveragerc
.gitignore		.gitignore
.readthedocs.yml		.readthedocs.yml
AUTHORS.rst		AUTHORS.rst
CHANGELOG.rst		CHANGELOG.rst
CONTRIBUTING.rst		CONTRIBUTING.rst
LICENSE.txt		LICENSE.txt
PUBLIC_MODELS_EXAMPLES.md		PUBLIC_MODELS_EXAMPLES.md
README.md		README.md
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt
setup.cfg		setup.cfg
setup.py		setup.py
test_public_models.py		test_public_models.py
tox.ini		tox.ini

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Statistical Hypothesis Testing for Auditing Robustness in Language Models

Overview

Installation

Prerequisites

Repository Structure

Quick Start - Command Line Interface

Basic Usage

Advanced Usage

Using JSON Configuration Files

Command Line Options

Running the Full Experiments

Prerequisites

Experiment Execution

Full Experiment Pipeline

Programmatic Usage

Contributing

Citation

Support

About

Uh oh!

Releases

Packages

Contributors 2

Uh oh!

Languages

License

vanderschaarlab/dbpa

Folders and files

Latest commit

History

Repository files navigation

Statistical Hypothesis Testing for Auditing Robustness in Language Models

Overview

Installation

Prerequisites

Repository Structure

Quick Start - Command Line Interface

Basic Usage

Advanced Usage

Using JSON Configuration Files

Command Line Options

Running the Full Experiments

Prerequisites

Experiment Execution

Full Experiment Pipeline

Programmatic Usage

Contributing

Citation

Support

About

Resources

License

Code of conduct

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Uh oh!

Languages

Packages