Multi-armed bandits
A multi-armed bandit is a sequential decision problem where a learner repeatedly chooses among k actions (arms), observes a stochastic reward for the chosen arm only, and adapts future choices to balance exploration (sampling under-tested arms to learn their value) against exploitation (sampling the...
May 18, 2026
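
The exploration/exploitation trade-off in the definition above can be made concrete with a small simulation. The sketch below uses epsilon-greedy, one common and simple strategy chosen here for illustration only; the text does not prescribe a particular algorithm. The function name run_epsilon_greedy, the Bernoulli (0 or 1) reward model, and the parameters true_means, steps, epsilon, and seed are all assumptions made for this example.

```python
import random


def run_epsilon_greedy(true_means, steps=5000, epsilon=0.1, seed=0):
    """Simulate epsilon-greedy on a k-armed Bernoulli bandit.

    true_means[i] is the (hidden) success probability of arm i; this
    reward model is an assumption made for illustration.
    Returns (estimates, counts, total_reward).
    """
    rng = random.Random(seed)      # local RNG so runs are reproducible
    k = len(true_means)
    counts = [0] * k               # how many times each arm was pulled
    estimates = [0.0] * k          # running mean reward per arm
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(k)
        else:
            # Exploit: pick the arm with the highest current estimate
            # (ties go to the lowest index).
            arm = max(range(k), key=lambda a: estimates[a])

        # Only the chosen arm's reward is observed.
        reward = 1.0 if rng.random() < true_means[arm] else 0.0

        # Incremental update of the sample mean for the chosen arm.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return estimates, counts, total_reward
```

Calling run_epsilon_greedy([0.2, 0.5, 0.8]) returns the learner's per-arm estimates, the number of pulls per arm, and the cumulative reward. With a fixed epsilon the learner never stops exploring, so a constant fraction of pulls goes to arms it already believes are worse; that is the cost this simple strategy pays for continuing to learn.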