Diversity from Human Feedback

This is the official implementation of Diversity from Human Feedback (DivHF).

Abstract

Diversity plays a significant role in many problems, such as ensemble learning, reinforcement learning, and combinatorial optimization. How to define the diversity measure is a longstanding problem. Many methods rely on expert experience to define a proper behavior space and then obtain the diversity measure, which is, however, challenging in many scenarios. In this paper, we propose the problem of learning a behavior space from human feedback and present a general method called Diversity from Human Feedback (DivHF) to solve it. DivHF learns a behavior descriptor consistent with human preference by querying human feedback. The learned behavior descriptor can be combined with any distance measure to define a diversity measure. We demonstrate the effectiveness of DivHF by integrating it with the Quality-Diversity optimization algorithm MAP-Elites and conducting experiments on the QDax suite. The results show that the behavior learned by DivHF is much more consistent with human requirements than the one learned by direct data-driven approaches without human feedback, and makes the final solutions more diverse under human preference. Our contributions include formulating the problem, proposing the DivHF method, and demonstrating its effectiveness through experiments.
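As a rough illustration of how a behavior descriptor can be combined with a distance measure to define a diversity measure, the sketch below computes the mean pairwise distance between the descriptors of a set of solutions. It is not taken from this repository: `toy_descriptor` is a hypothetical stand-in for a learned descriptor, and `mean_pairwise_diversity` and the choice of Euclidean distance are assumptions made for the example.

```python
import math
from itertools import combinations
from typing import Callable, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two descriptors of equal length."""
    return math.dist(a, b)


def mean_pairwise_diversity(
    solutions: Sequence[Vector],
    descriptor: Callable[[Vector], Vector],
    distance: Callable[[Vector, Vector], float] = euclidean,
) -> float:
    """Average distance between the descriptors of all solution pairs."""
    descriptors = [descriptor(s) for s in solutions]
    pairs = list(combinations(descriptors, 2))
    if not pairs:
        # Fewer than two solutions: no pair to compare.
        return 0.0
    return sum(distance(a, b) for a, b in pairs) / len(pairs)


def toy_descriptor(solution: Vector) -> Vector:
    """Hypothetical descriptor: keeps only the first two features.

    A learned descriptor would replace this function.
    """
    return (solution[0], solution[1])


# Two of the three solutions differ only in the feature the descriptor ignores.
population = [(0.0, 0.0, 5.0), (3.0, 4.0, -1.0), (0.0, 0.0, 9.0)]
diversity = mean_pairwise_diversity(population, toy_descriptor)
print(f"mean pairwise diversity: {diversity:.4f}")
```

Because the descriptor decides which features count, two solutions that differ only in ignored features contribute zero diversity. Swapping in a different `distance` function changes the measure without touching the descriptor.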

Requirements

The implementation runs in a conda environment, which can be built with:

bash setup.sh

Running Experiments

Use the following commands to run DivHF:

conda activate divhf
python -m src framework=... domain=... qdbase=... emitter=... seed=... <other configs>

Set the five parameters (framework, domain, qdbase, emitter, and seed) to evaluate different methods on different domains with different seeds. For example, to run DivHF on walker2d_uni, use the following command:

python3 -m src framework=DivHF domain=walker2d_uni qdbase=AURORA emitter=std seed=1 task=offline-training-only task.path=logs/ant_uni/AURORA/std/1 domain.total_evals=100000
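To repeat a run over several seeds, the command lines can be generated programmatically. The sketch below only builds and prints the command strings and does not launch anything. The `build_command` helper and the seed list `[1, 2, 3]` are assumptions made for illustration; the parameter values are the ones from the example above.

```python
import shlex
from typing import Iterable, List


def build_command(
    framework: str,
    domain: str,
    qdbase: str,
    emitter: str,
    seed: int,
    extra: Iterable[str] = (),
) -> str:
    """Assemble one `python -m src` command line from key=value overrides."""
    overrides = [
        f"framework={framework}",
        f"domain={domain}",
        f"qdbase={qdbase}",
        f"emitter={emitter}",
        f"seed={seed}",
    ]
    # shlex.join quotes any argument that would need it in a POSIX shell.
    return shlex.join(["python", "-m", "src", *overrides, *extra])


# Hypothetical sweep: the same configuration with three different seeds.
seeds: List[int] = [1, 2, 3]
commands = [
    build_command("DivHF", "walker2d_uni", "AURORA", "std", seed)
    for seed in seeds
]

for command in commands:
    print(command)
```

Additional overrides, such as the `task=...` settings in the example above, can be passed through `extra` as a list of `key=value` strings.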

Citation

If you find this work useful in your research, please consider citing our paper.

@article{divhf,
    author = {Ren-Jian Wang and Ke Xue and Yu-Tong Wang and Peng Yang and Hao-Bo Fu and Qiang Fu and Chao Qian},
    title = {Diversity from Human Feedback},
    journal = {Frontiers of Computer Science},
    year = {2026},
    volume = {20},
    number = {2},
}

