
Variational Thompson Sampling for Relational Recurrent Bandits

Abstract : In this paper, we introduce a novel non-stationary bandit setting, called the relational recurrent bandit, in which the rewards of arms at successive time steps are interdependent. The aim is to discover temporal and structural dependencies between arms in order to maximize the cumulative collected reward. Two algorithms are proposed: the first directly models temporal dependencies between arms, while the second assumes the existence of hidden states of the system behind the observed rewards. For both approaches, we develop a Variational Thompson Sampling method, which approximates distributions via variational inference and uses the estimated distributions to sample reward expectations at each iteration of the process. Experiments conducted on both synthetic and real data demonstrate the effectiveness of our approaches.
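For readers unfamiliar with the sampling step mentioned in the abstract, the following is a minimal sketch of generic Thompson sampling, not the authors' algorithm. It assumes independent, stationary arms with Gaussian rewards of known noise level, so the posterior over each arm's mean is available in closed form and no variational approximation is needed; the relational and recurrent structure studied in the paper is deliberately left out. The function name `gaussian_thompson_sampling` and all of its parameters are hypothetical and chosen for illustration only.

```python
import numpy as np


def gaussian_thompson_sampling(true_means, n_steps=1000, noise_std=1.0, seed=0):
    """Generic Thompson sampling with independent Gaussian arms (illustrative only).

    Assumptions (not taken from the paper): arms are independent and stationary,
    rewards are Gaussian with known standard deviation `noise_std`, and each
    arm's mean has a N(0, 1) prior, so the posterior is an exact Gaussian.
    """
    rng = np.random.default_rng(seed)
    k = len(true_means)
    post_mean = np.zeros(k)           # posterior mean of each arm's expected reward
    post_prec = np.ones(k)            # posterior precision (1 / variance), prior N(0, 1)
    noise_prec = 1.0 / noise_std**2   # precision of a single reward observation
    pulls = np.zeros(k, dtype=int)
    total_reward = 0.0

    for _ in range(n_steps):
        # Sample one plausible expected reward per arm from its current posterior.
        sampled = rng.normal(post_mean, 1.0 / np.sqrt(post_prec))
        # Play the arm whose sampled expectation is highest.
        arm = int(np.argmax(sampled))
        reward = rng.normal(true_means[arm], noise_std)

        # Conjugate Gaussian update of the played arm's posterior.
        new_prec = post_prec[arm] + noise_prec
        post_mean[arm] = (post_prec[arm] * post_mean[arm] + noise_prec * reward) / new_prec
        post_prec[arm] = new_prec

        pulls[arm] += 1
        total_reward += reward

    return pulls, post_mean, total_reward


if __name__ == "__main__":
    pulls, post_mean, total_reward = gaussian_thompson_sampling([0.0, 0.5, 2.0])
    print("pulls per arm:", pulls)
    print("posterior means:", post_mean)
```

In the setting described in the abstract, the exact posterior used here would be replaced by a distribution estimated through variational inference, and the model would additionally capture dependencies between arms across time steps.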
Document type : Conference papers
Contributor : Sylvain Lamprier
Submitted on : Thursday, March 21, 2019 - 10:13:04 AM
Last modification on : Tuesday, October 18, 2022 - 8:34:05 AM



Sylvain Lamprier, Thibault Gisselbrecht, Patrick Gallinari. Variational Thompson Sampling for Relational Recurrent Bandits. Joint European Conference on Machine Learning and Knowledge Discovery in Databases - ECML/PKDD 2017, Sep 2017, Skopje, Macedonia. pp.405-421, ⟨10.1007/978-3-319-71246-8_25⟩. ⟨hal-02075008⟩


