Paper
22 April 2022 Frequentist multi-armed bandits for complex reward models
Haoran Chen, Weibing Deng, Keqin Liu, Ting Wu
Author Affiliations +
Proceedings Volume 12163, International Conference on Statistics, Applied Mathematics, and Computing Science (CSAMCS 2021); 121631P (2022) https://doi.org/10.1117/12.2627482
Event: International Conference on Statistics, Applied Mathematics, and Computing Science (CSAMCS 2021), 2021, Nanjing, China
Abstract
In the classical Multi-Armed Bandit (MAB) problem, a player selects one out of a set of arms to play at each time, without knowing the reward models. At each time, the player selects one arm to play, aiming to maximize the total expected reward over 𝑇 times. The regret of an algorithm is the expected total loss after 𝑇 steps compared to the ideal scenario of knowing reward models. When the distributions of the arm reward are heavy-tailed, it is difficult to learn which arm has the best reward. In this paper, we introduce an algorithm based on the idea of Upper Confidence Bound (UCB) and prove that the algorithm achieves a sublinear growth of regret for heavy-tailed reward distributions. Furthermore, we consider MAB with gap periods as a dynamic model requiring that the arm will get into a gap period without offering reward immediately after being played. This model finds a broad application area in Internet advertising. Clearly the player should avoid to choose an arm until it gets out of the gap period. We extend the algorithm framework of Deterministic Sequencing of Exploration and Exploitation (DSEE) to the MAB model with gap periods, with regret reaching a growth of optimal order 𝑂ሺlog 𝑇ሻ for light-tailed distributions and a sublinear growth the for the heavy-tailed.
© (2022) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Haoran Chen, Weibing Deng, Keqin Liu, and Ting Wu "Frequentist multi-armed bandits for complex reward models", Proc. SPIE 12163, International Conference on Statistics, Applied Mathematics, and Computing Science (CSAMCS 2021), 121631P (22 April 2022); https://doi.org/10.1117/12.2627482
Advertisement
Advertisement
RIGHTS & PERMISSIONS
Get copyright permission  Get copyright permission on Copyright Marketplace
KEYWORDS
Statistical modeling

Systems modeling

Internet

Performance modeling

Numerical analysis

Statistical analysis

Back to Top