Variational Thompson Sampling for Relational Recurrent Bandits

Abstract : In this paper, we introduce a novel non-stationary bandit setting, called the relational recurrent bandit, in which the rewards of arms at successive time steps are interdependent. The aim is to discover temporal and structural dependencies between arms in order to maximize the cumulative collected reward. Two algorithms are proposed: the first directly models temporal dependencies between arms, while the second assumes the existence of hidden states of the system behind the observed rewards. For both approaches, we develop a Variational Thompson Sampling method, which approximates distributions via variational inference and uses the estimated distributions to sample reward expectations at each iteration of the process. Experiments conducted on both synthetic and real data demonstrate the effectiveness of our approaches.
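The sample-then-act loop mentioned in the abstract can be illustrated with a minimal sketch. This is not the method of the paper: it assumes a much simpler stationary setting with independent arms, Gaussian rewards of known noise level, and a factorized Gaussian distribution per arm (for which the conjugate update is exact, so no variational optimization is needed). The function name `gaussian_thompson_sampling` and all of its parameters are hypothetical and chosen for illustration only.

```python
import numpy as np

def gaussian_thompson_sampling(true_means, n_steps=2000, noise_std=1.0, seed=0):
    """Toy Thompson Sampling with one Gaussian belief per arm.

    Assumptions (not from the paper): stationary, independent arms,
    Gaussian reward noise with known standard deviation, N(0, 1) prior.
    """
    rng = np.random.default_rng(seed)
    k = len(true_means)
    post_mean = np.zeros(k)        # mean of the belief over each arm's expected reward
    post_prec = np.ones(k)         # precision (1 / variance) of that belief
    noise_prec = 1.0 / noise_std ** 2
    pulls = np.zeros(k, dtype=int)
    total_reward = 0.0

    for _ in range(n_steps):
        # 1. Sample one plausible reward expectation per arm from the current beliefs.
        sampled = rng.normal(post_mean, 1.0 / np.sqrt(post_prec))
        # 2. Play the arm whose sampled expectation is highest.
        arm = int(np.argmax(sampled))
        reward = rng.normal(true_means[arm], noise_std)
        # 3. Update the belief of the played arm (conjugate Normal-Normal update).
        new_prec = post_prec[arm] + noise_prec
        post_mean[arm] = (post_prec[arm] * post_mean[arm] + noise_prec * reward) / new_prec
        post_prec[arm] = new_prec
        pulls[arm] += 1
        total_reward += reward

    return pulls, post_mean, total_reward
```

Calling `gaussian_thompson_sampling([0.0, 0.5, 1.5])` returns the pull counts, the final belief means, and the cumulative reward. In the setting described in the abstract, step 3 would instead be replaced by a variational approximation over a model with temporal dependencies between arms, which this sketch does not attempt to reproduce.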
Document type :
Conference papers

https://hal.sorbonne-universite.fr/hal-02075008
Contributor : Sylvain Lamprier
Submitted on : Thursday, March 21, 2019 - 10:13:04 AM
Last modification on : Saturday, March 23, 2019 - 1:30:33 AM

Citation

Sylvain Lamprier, Thibault Gisselbrecht, Patrick Gallinari. Variational Thompson Sampling for Relational Recurrent Bandits. Joint European Conference on Machine Learning and Knowledge Discovery in Databases - ECML/PKDD 2017, Sep 2017, Skopje, Macedonia. pp.405-421, ⟨10.1007/978-3-319-71246-8_25⟩. ⟨hal-02075008⟩
