Implementations of PF-LEX, OM-LEX, and NOM-LEX for the multi-armed bandit with lexicographically ordered objectives (Lex-MAB).
In all implementations:
- `A` is the number of arms and `D` is the number of objectives.
- `means` is an AxD matrix, where `means[a,i]` is the expected reward of arm a in objective i; arm 0 is assumed to be lexicographic optimal.
- `K` is the number of individual runs and `T` is the number of rounds.
- `reg` is a DxKxT matrix, where `reg[i,k,t]` is the regret incurred in objective i, individual run k, and round t.
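As a concrete illustration of these conventions, the sketch below sets up `means` and `reg` with the shapes described above. The numerical values, the random arm choices, and the gap-to-arm-0 regret convention are assumptions for illustration, not taken from the implementations.

```python
import numpy as np

# Illustrative sizes; the shapes follow the conventions described above.
A, D, K, T = 3, 2, 5, 1000
rng = np.random.default_rng(0)

# means[a,i]: expected reward of arm a in objective i; here arm 0 is the
# lexicographic optimal arm (hypothetical numbers).
means = np.array([[0.8, 0.6],
                  [0.5, 0.9],
                  [0.7, 0.4]])

# reg[i,k,t]: regret in objective i, individual run k, round t.  The gap to
# arm 0's mean in each objective is one plausible convention (an assumption);
# it can be negative in lower-priority objectives.
reg = np.zeros((D, K, T))
for k in range(K):
    for t in range(T):
        arm = rng.integers(A)            # stand-in for the algorithm's choice
        reg[:, k, t] = means[0] - means[arm]

print(reg.mean(axis=(1, 2)))             # average regret per objective
```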
In the implementation of PF-LEX:
- `dlt` is the confidence term.
- `eps` is proportional to the suboptimality that the learner aims to tolerate.
- `TT` is the period in which the regrets are recorded.

Note that this script takes an argument called `uid`, which is a unique identifier with which the final results are saved. This is to allow for parallel execution of the script.
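A minimal sketch of how the `uid` argument might be wired up is shown below. Only the names `uid`, `dlt`, `eps`, and `TT` come from the description above; the output file name, the default values, and the use of `argparse`/`numpy.save` are assumptions for illustration.

```python
import argparse
import numpy as np

# Hypothetical command-line handling for the PF-LEX script; only the argument
# name `uid` comes from the description above, the rest is assumed.
parser = argparse.ArgumentParser()
parser.add_argument("uid", type=str, help="unique identifier for saving results")
args = parser.parse_args()

dlt = 0.01   # confidence term (illustrative value)
eps = 0.05   # proportional to the tolerated suboptimality (illustrative value)
TT = 100     # regrets recorded every TT rounds (illustrative value)

# ... run PF-LEX here and fill `reg`, recording regrets every TT rounds ...
reg = np.zeros((2, 1, 1000))  # placeholder with the DxKxT layout described above

# Saving under a uid-specific name lets many copies of the script run in
# parallel without overwriting each other's output (file name is assumed).
np.save(f"pf_lex_reg_{args.uid}.npy", reg)
```

Launching several copies of the script with distinct `uid` values then produces independent result files that can be merged afterwards.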
In the implementation of NOM-LEX:
- `eta` is a 1xD matrix, where `eta[0,i]` is the near lexicographic optimal expected reward in objective i.
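One plausible way `eta` could be obtained from `means` is sketched below; relaxing arm 0's expected rewards by a tolerance `eps` is an assumption for illustration, not necessarily how the implementation defines near lexicographic optimality.

```python
import numpy as np

# Hypothetical means matrix; arm 0 is the lexicographic optimal arm.
means = np.array([[0.8, 0.6],
                  [0.5, 0.9],
                  [0.7, 0.4]])

# Assumed construction: relax the optimal arm's expected rewards by eps to get
# the near lexicographic optimal targets (1xD, as described above).
eps = 0.05
eta = (means[0] - eps).reshape(1, -1)    # eta[0,i] per the convention above
print(eta)                               # [[0.75 0.55]]
```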
In addition to algorithms for Lex-MAB, 'satisficing.py' includes an adapted version of NOM-LEX for the multi-armed bandit with satisficing objectives (Sat-MAB), along with an implementation of Satisficing-In-Mean-Rewards UCL (Reverdy et al., "Satisficing in multi-armed bandit problems," 2017).
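For orientation, the sketch below illustrates one common reading of satisficing objectives, in which an arm incurs regret in an objective only when its expected reward falls below a satisfaction threshold. The threshold values and the exact regret form are assumptions, not the repository's definitions.

```python
import numpy as np

# Hypothetical expected rewards and per-objective satisfaction thresholds.
means = np.array([[0.8, 0.6],
                  [0.5, 0.9],
                  [0.7, 0.4]])
thresholds = np.array([0.7, 0.5])

def sat_regret(arm):
    """Assumed per-objective satisficing regret: the shortfall below the
    threshold, and zero once the threshold is met."""
    return np.maximum(0.0, thresholds - means[arm])

print(sat_regret(1))   # arm 1 meets the objective-1 threshold but not objective 0
```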