The MARL environment `predpregrass_base.py` is implemented using PettingZoo, and the agents are trained with Stable-Baselines3 (SB3) PPO. Essentially, this solution demonstrates how SB3 can be adapted for MARL using parallel environments and centralized training. Rewards (for stepping, eating, dying, and reproducing) are aggregated and can be adjusted in the environment configuration file. Stable-Baselines3 is originally designed for single-agent training, which means that in this solution training uses one unified network for both Predators and Prey. See further below how SB3 PPO is used in this centrally trained Predator-Prey-Grass multi-agent setting.
Random policy Predator-Prey-Grass PettingZoo environment
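For illustration, a minimal random-policy rollout of the AEC environment might look like the sketch below. The constructor name `raw_env` and the `render_mode` argument are assumptions about the module's interface; the loop itself is the standard PettingZoo AEC pattern.

```python
# Random-policy rollout sketch using the standard PettingZoo AEC loop (names are assumptions).
from predpregrass_aec import raw_env  # hypothetical constructor name

env = raw_env(render_mode="human")
env.reset(seed=42)

for agent in env.agent_iter():
    observation, reward, termination, truncation, info = env.last()
    if termination or truncation:
        action = None                                 # terminated agents must step with None
    else:
        action = env.action_space(agent).sample()     # random policy
    env.step(action)

env.close()
```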
- The environment is initially implemented as an Agent-Environment-Cycle (AEC) environment using PettingZoo (`predpregrass_aec.py`, which inherits from `predpregrass_base.py`).
- It is wrapped and converted into a Parallel environment using `aec_to_parallel()` inside `trainer.py` (see the sketch after this list).
- This conversion enables multiple agents to take actions simultaneously rather than sequentially.
- SB3 PPO expects a single-agent Gymnasium-style environment.
- The converted parallel environment stacks observations and actions for all agents, making it appear as a single large observation-action space.
- PPO then treats the multi-agent problem as a centralized learning problem, where all agents share one policy.
- The environment is further wrapped using SuperSuit:
  `env = ss.pettingzoo_env_to_vec_env_v1(env)`
  `env = ss.concat_vec_envs_v1(env, num_vec_envs, num_cpus=num_cores, base_class="stable_baselines3")`
- This enables running multiple instances of the environment in parallel, significantly improving training efficiency.
- The training process treats the multi-agent setup as a single centralized policy, where PPO learns from the collective experiences of all agents.
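A minimal end-to-end sketch of this pipeline could look as follows. The constructor name `raw_env`, the policy type, and the hyperparameter values are assumptions for illustration; the actual `trainer.py` may differ.

```python
# Sketch of the centralized-training pipeline described above (names and values are assumptions).
import supersuit as ss
from pettingzoo.utils.conversions import aec_to_parallel
from stable_baselines3 import PPO

from predpregrass_aec import raw_env  # hypothetical constructor for the AEC environment

num_vec_envs, num_cores = 8, 4  # illustrative values

env = raw_env()                                   # AEC environment
env = aec_to_parallel(env)                        # agents act simultaneously
env = ss.pettingzoo_env_to_vec_env_v1(env)        # stack all agents into one vectorized env
env = ss.concat_vec_envs_v1(env, num_vec_envs, num_cpus=num_cores,
                            base_class="stable_baselines3")

# One shared PPO policy learns from the collective experiences of all agents.
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=1_000_000)
model.save("ppo_predpregrass")
```

Because every agent's transitions are fed into the same network, this is centralized training with a single shared policy rather than a set of independent learners.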
Predator-Prey-Grass PettingZoo environment centrally trained using SB3's PPO
Training the single-objective environment `predpregrass_base.py` with the SB3 PPO algorithm is an example of how elaborate behaviors can emerge from simple rules in agent-based models. In the MARL example displayed above, learning agents obtain rewards solely through reproduction; all other reward options are set to zero in the environment configuration (see the sketch after the list below). Despite this relatively sparse reward structure, maximizing these rewards results in elaborate emergent behaviors such as:
- Predators hunting Prey
- Prey finding and eating grass
- Predators hovering around grass to catch Prey
- Prey trying to escape Predators
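A sparse reward configuration of this kind might look like the sketch below. The key names and the reproduction reward value are hypothetical; they only illustrate the idea that every reward except reproduction is set to zero.

```python
# Hypothetical reward configuration: only reproduction is rewarded, all other rewards are zero.
reward_config = {
    "step_reward_predator": 0.0,
    "step_reward_prey": 0.0,
    "catch_reward_predator": 0.0,        # catching/eating prey
    "eat_reward_prey": 0.0,              # eating grass
    "death_reward_predator": 0.0,
    "death_reward_prey": 0.0,
    "reproduction_reward_predator": 10.0,
    "reproduction_reward_prey": 10.0,
}
```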
Moreover, these learned behaviors lead to more complex emergent dynamics at the ecosystem level: over time, the trained agents display a classic Lotka–Volterra pattern.
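For reference, the classic Lotka–Volterra model describes such predator–prey cycling with a pair of coupled differential equations, where $x$ is prey density, $y$ is predator density, and $\alpha, \beta, \gamma, \delta$ are positive interaction parameters:

$$
\frac{dx}{dt} = \alpha x - \beta x y, \qquad
\frac{dy}{dt} = \delta x y - \gamma y
$$

The oscillating Predator and Prey population counts produced by the trained agents resemble the phase-shifted cycles predicted by this model.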



