Variational Thompson Sampling for Relational Recurrent Bandits - Sorbonne Université
Conference paper, Year: 2017

Variational Thompson Sampling for Relational Recurrent Bandits

Sylvain Lamprier
Thibault Gisselbrecht
  • Role: Author
  • PersonId : 987816

Abstract

In this paper, we introduce a novel non-stationary bandit setting, called the relational recurrent bandit, in which the rewards of arms at successive time steps are interdependent. The aim is to discover temporal and structural dependencies between arms in order to maximize the cumulative collected reward. Two algorithms are proposed: the first directly models temporal dependencies between arms, whereas the second assumes that hidden states of the system underlie the observed rewards. For both approaches, we develop a Variational Thompson Sampling method, which approximates distributions via variational inference and uses the estimated distributions to sample reward expectations at each iteration of the process. Experiments conducted on both synthetic and real data demonstrate the effectiveness of our approaches.
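The sampling loop described in the abstract can be illustrated with a minimal sketch. This is not the paper's relational recurrent model: it assumes independent, stationary arms with Gaussian rewards of known noise level, and the function name `gaussian_thompson_sampling` and all of its parameters are hypothetical. Under these assumptions the exact posterior over each arm's mean is Gaussian, so a Gaussian variational family would recover it exactly and the closed-form update below stands in for the variational inference step.

```python
import math
import random


def gaussian_thompson_sampling(true_means, n_steps=2000, noise_std=1.0, seed=0):
    """Thompson Sampling with a Gaussian belief over each arm's mean reward.

    Illustrative assumptions (not taken from the paper): arms are independent
    and stationary, rewards are Gaussian with known standard deviation
    noise_std, and each arm starts from a N(0, 1) prior.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    post_mean = [0.0] * n_arms   # mean of the belief q(mu_k)
    post_prec = [1.0] * n_arms   # precision (1 / variance) of q(mu_k)
    noise_prec = 1.0 / noise_std ** 2
    pulls = [0] * n_arms

    for _ in range(n_steps):
        # Sample one reward expectation per arm from the current beliefs.
        sampled = [
            rng.gauss(post_mean[k], 1.0 / math.sqrt(post_prec[k]))
            for k in range(n_arms)
        ]
        # Play the arm whose sampled expectation is largest.
        arm = max(range(n_arms), key=lambda k: sampled[k])
        # Observe a noisy reward from the simulated environment.
        reward = rng.gauss(true_means[arm], noise_std)
        # Closed-form Gaussian update for the played arm only.
        new_prec = post_prec[arm] + noise_prec
        post_mean[arm] = (
            post_prec[arm] * post_mean[arm] + noise_prec * reward
        ) / new_prec
        post_prec[arm] = new_prec
        pulls[arm] += 1

    return pulls, post_mean


if __name__ == "__main__":
    pulls, post_mean = gaussian_thompson_sampling([0.0, 0.5, 2.0])
    print("pulls per arm:", pulls)
    print("posterior means:", [round(m, 2) for m in post_mean])
```

In the setting of the paper, the per-arm beliefs would instead be replaced by variational approximations over the parameters that couple rewards across arms and time steps, while the sample-then-select structure of the loop would stay the same.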
No file deposited

Dates and versions

hal-02075008, version 1 (21-03-2019)

Cite

Sylvain Lamprier, Thibault Gisselbrecht, Patrick Gallinari. Variational Thompson Sampling for Relational Recurrent Bandits. Joint European Conference on Machine Learning and Knowledge Discovery in Databases - ECML/PKDD 2017, Sep 2017, Skopje, Macedonia. pp.405-421, ⟨10.1007/978-3-319-71246-8_25⟩. ⟨hal-02075008⟩