Implementation of the algorithms introduced in "Multi-Player Bandits Revisited" [1]. This project was done as part of the "Sequential Decision Making" course taught by Émilie Kaufmann.
Warning – This "toy" repository does not aim to collect state-of-the-art multi-player multi-armed bandit (MAB) algorithms! We highly recommend taking a look at the SMPyBandits library if you want to play in a larger sandbox!
The strategies currently implemented are:
- RandTopM
- MCTopM
- Selfish
Simply clone the project:
```bash
git clone https://github.com/JulienNonin/multiplayer-bandits
cd multiplayer-bandits
```
The project runs on Python 3.6+ and requires only NumPy and Matplotlib.
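If needed, both dependencies can be installed with pip (one possible way; use your preferred environment manager):

```bash
pip install numpy matplotlib
```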
```python
import matplotlib.pyplot as plt
import src as mpmab

# Define a MAB bandit environment
bandit = mpmab.BernoulliMAB([0.3, 0.4, 0.5, 0.6, 0.8])
nb_arms = bandit.nb_arms

# Define a strategy with nb_players players
nb_players = 3
strategy = mpmab.strategy(mpmab.PlayerRandTop, nb_arms, nb_players, policy=mpmab.UCB1Policy(0.5))

# Run the strategy on the bandit environment
my_experiment = mpmab.MultiplayerExp(bandit, strategy, time_horizon=200).run()
my_experiment.rewards  # Rewards history
```

In tutorial.ipynb you can find a more complete presentation of the module and its usage. For example, you will be able to produce a nice animation of the history of selections and collisions, like the one below:
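Back in the quickstart snippet, you can get a quick first look at the results by plotting the cumulative collected reward. This is a minimal sketch, assuming `my_experiment.rewards` is array-like with one entry per round (possibly per player); check the notebooks for the module's actual conventions:

```python
import numpy as np
import matplotlib.pyplot as plt

rewards = np.asarray(my_experiment.rewards)
# If rewards are stored per player (e.g. shape (time_horizon, nb_players)),
# sum over players to get the collective reward per round.
if rewards.ndim > 1:
    rewards = rewards.sum(axis=-1)

plt.plot(np.cumsum(rewards))
plt.xlabel("Round t")
plt.ylabel("Cumulative collective reward")
plt.show()
```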
Some experiments are carried out in multi-player-simulation.ipynb to test the strategies on different problems and to reproduce some results of [1] (see the "Illustrations" section for the impatient ones).
As "algorithmically" defined in the original paper [1], it seems that the RandTopM players do not change arm when a collision occurs, but only if the arm currently chosen becomes "sub-optimal". This seems to contradict what the authors state about this algorithm. You can find an implementation of the algorithms as originally stated under PlayerRandTopOld. This strategy suffers linear regret. Therefore our own interpretation of what the strategy should be, can be found in PlayerRandTop.
- Propose a fix to the algorithm RandTopM as introduced in [1]
- Implement RhoRand
- Make the setting and display of experiments more user-friendly
[1] L. Besson & É. Kaufmann, Multi-Player Bandits Revisited. In Proceedings of Algorithmic Learning Theory (ALT), 2018. http://proceedings.mlr.press/v83/besson18a.html


