Variational Thompson Sampling for Relational Recurrent Bandits

Sylvain Lamprier; Thibault Gisselbrecht; Patrick Gallinari

doi:10.1007/978-3-319-71246-8_25

Conference paper. Year: 2017

Sylvain Lamprier

Role: Author
PersonId : 740402
IdHAL : sylvain-lamprier
ORCID : 0000-0002-2508-922X
IdRef : 142632201

Machine Learning and Information Access

Thibault Gisselbrecht

Role: Author
PersonId : 987816

IRT SystemX

Patrick Gallinari

Role: Author
PersonId : 751615
IdHAL : patrick-gallinari
ORCID : 0000-0001-9060-9001
IdRef : 070709076

Machine Learning and Information Access

Abstract

In this paper, we introduce a novel non-stationary bandit setting, called the relational recurrent bandit, in which the rewards of arms at successive time steps are interdependent. The aim is to discover temporal and structural dependencies between arms in order to maximize the cumulative collected reward. Two algorithms are proposed: the first directly models temporal dependencies between arms, while the second assumes the existence of hidden states of the system behind the observed rewards. For both approaches, we develop a Variational Thompson Sampling method, which approximates distributions via variational inference and uses the estimated distributions to sample reward expectations at each iteration of the process. Experiments conducted on both synthetic and real data demonstrate the effectiveness of our approaches.
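
As a rough illustration of the sampling step mentioned in the abstract (drawing reward expectations from estimated distributions at each iteration), the sketch below runs plain Gaussian Thompson sampling on a stationary bandit. It is not the paper's relational recurrent model and performs no variational inference: each arm simply keeps an independent Gaussian belief, which is assumed here to stand in for the kind of factorized approximation a variational method might produce. All names (`thompson_step`, `gaussian_update`, `run`), the unit noise variance, and the N(0, 1) prior are hypothetical choices made for this example.

```python
import random


def thompson_step(means, variances, rng):
    """Draw one plausible expected reward per arm and return the best arm's index."""
    samples = [rng.gauss(m, v ** 0.5) for m, v in zip(means, variances)]
    return max(range(len(samples)), key=samples.__getitem__)


def gaussian_update(mean, var, reward, noise_var=1.0):
    """Conjugate update of a Gaussian belief after observing one noisy reward."""
    precision = 1.0 / var + 1.0 / noise_var
    new_mean = (mean / var + reward / noise_var) / precision
    return new_mean, 1.0 / precision


def run(true_means, steps=2000, seed=0):
    """Simulate the bandit loop; assumes Gaussian rewards with unit variance."""
    rng = random.Random(seed)
    k = len(true_means)
    means, variances = [0.0] * k, [1.0] * k  # assumed N(0, 1) prior per arm
    pulls = [0] * k
    for _ in range(steps):
        arm = thompson_step(means, variances, rng)
        reward = rng.gauss(true_means[arm], 1.0)
        means[arm], variances[arm] = gaussian_update(
            means[arm], variances[arm], reward
        )
        pulls[arm] += 1
    return pulls, means


pulls, means = run([0.1, 0.5, 0.9])
print(pulls)
```

In the setting described by the abstract, the per-arm beliefs would instead be coupled across arms and time steps, and only the rewards of selected arms would be observed; this sketch ignores both aspects.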

Domains

Artificial Intelligence [cs.AI]

https://hal.sorbonne-universite.fr/hal-02075008

Submitted on: Thursday, March 21, 2019, 10:13:04

Last modified on: Tuesday, April 11, 2023, 15:16:28

Dates and versions

hal-02075008 , version 1 (21-03-2019)

Identifiers

HAL Id : hal-02075008 , version 1
DOI : 10.1007/978-3-319-71246-8_25

Cite

Sylvain Lamprier, Thibault Gisselbrecht, Patrick Gallinari. Variational Thompson Sampling for Relational Recurrent Bandits. Joint European Conference on Machine Learning and Knowledge Discovery in Databases - ECML/PKDD 2017, Sep 2017, Skopje, Macedonia. pp.405-421, ⟨10.1007/978-3-319-71246-8_25⟩. ⟨hal-02075008⟩

Collections

UPMC CNRS LIP6 IRT-SYSTEMX SORBONNE-UNIVERSITE SU-SCIENCES
