Profile-Based Bandit with Unknown Profiles

Sylvain Lamprier; Thibault Gisselbrecht; Patrick Gallinari

Journal article, Journal of Machine Learning Research, Year: 2018


Sylvain Lamprier

Role: Author
PersonId : 740402
IdHAL : sylvain-lamprier
ORCID : 0000-0002-2508-922X
IdRef : 142632201

Machine Learning and Information Access

Thibault Gisselbrecht

Role: Author
PersonId : 1044395

SNIPS

Patrick Gallinari

Role: Author
PersonId : 751615
IdHAL : patrick-gallinari
ORCID : 0000-0001-9060-9001
IdRef : 070709076

Machine Learning and Information Access

Abstract

Stochastic bandits have been widely studied for decades. A wide range of settings has been introduced, some of which incorporate structure between actions. If actions are associated with feature vectors that underlie their usefulness, discovering a mapping parameter between these profiles and the rewards can help the exploration process of bandit strategies. This is the setting studied in this paper, but in our case the action profiles (constant feature vectors) are unknown beforehand. Instead, at each step of the process the agent is only given sample vectors, with means centered on the true profiles, for a subset of actions. In this new bandit instance, policies thus have to deal with a twofold uncertainty: on the profile estimators and on the reward mapping parameters learned so far. We propose a new algorithm, called SampLinUCB, specifically designed for this case. Theoretical convergence guarantees are given for this strategy under various scenarios of profile sample delivery. Finally, experiments are conducted on both artificial data and a task of focused data capture from online social networks. The results obtained demonstrate the relevance of the approach in various settings.
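
The setting described in the abstract can be illustrated with a toy simulation. The following is a minimal sketch under stated assumptions, not the SampLinUCB algorithm from the paper: the additive profile bonus `rho / sqrt(n_i)`, the 50% chance of observing each action's profile sample, the Gaussian noise levels, and all names (`run_sketch`, `alpha`, `rho`) are hypothetical choices made for illustration only. It shows an agent that averages noisy profile samples and adds two exploration bonuses, one for each source of uncertainty.

```python
import numpy as np


def run_sketch(n_actions=5, dim=3, horizon=200, alpha=1.0, rho=1.0, seed=0):
    """Toy linear bandit whose action profiles are only seen through noisy samples."""
    rng = np.random.default_rng(seed)

    # Hidden environment (never shown to the agent): true profiles and mapping parameter.
    true_profiles = rng.uniform(-1.0, 1.0, size=(n_actions, dim))
    beta = rng.uniform(-1.0, 1.0, size=dim)

    # Agent state: running sums of profile samples and ridge-regression statistics.
    sample_sums = np.zeros((n_actions, dim))
    sample_counts = np.zeros(n_actions, dtype=int)
    A = np.eye(dim)    # regularised design matrix
    b = np.zeros(dim)  # reward-weighted sum of estimated profiles
    pulls = np.zeros(n_actions, dtype=int)

    for _ in range(horizon):
        # Assumption: each action's profile sample is delivered with probability 0.5.
        observed = rng.random(n_actions) < 0.5
        noise = rng.normal(0.0, 0.5, size=(n_actions, dim))
        sample_sums[observed] += (true_profiles + noise)[observed]
        sample_counts[observed] += 1

        # Profile estimates: empirical means (zero vector until a first sample arrives).
        estimates = sample_sums / np.maximum(sample_counts, 1)[:, None]

        # Optimistic score with two bonuses (an assumed heuristic form):
        # one for the mapping parameter, one for the profile estimate.
        A_inv = np.linalg.inv(A)
        beta_hat = A_inv @ b
        quad = np.einsum("ij,jk,ik->i", estimates, A_inv, estimates)
        param_bonus = np.sqrt(np.maximum(quad, 0.0))
        profile_bonus = rho / np.sqrt(np.maximum(sample_counts, 1))
        scores = estimates @ beta_hat + alpha * param_bonus + profile_bonus

        chosen = int(np.argmax(scores))
        reward = true_profiles[chosen] @ beta + rng.normal(0.0, 0.1)

        # Update the regression with the estimated (not true) profile of the chosen action.
        x = estimates[chosen]
        A += np.outer(x, x)
        b += reward * x
        pulls[chosen] += 1

    return pulls, estimates, true_profiles
```

Calling `run_sketch()` returns the number of pulls per action, the final profile estimates, and the hidden true profiles, so the estimation error can be inspected directly. The confidence bounds and guarantees derived in the paper are not reproduced here.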

Domains

Artificial Intelligence [cs.AI]


https://hal.sorbonne-universite.fr/hal-02074907

Submitted on: Thursday, March 21, 2019, 09:03:26

Last modified on: Saturday, October 7, 2023, 21:36:22

Dates and versions

hal-02074907, version 1 (21-03-2019)

Identifiers

HAL Id: hal-02074907, version 1

Cite

Sylvain Lamprier, Thibault Gisselbrecht, Patrick Gallinari. Profile-Based Bandit with Unknown Profiles. Journal of Machine Learning Research, 2018, 19 (53), pp.53:1--53:40. ⟨hal-02074907⟩


Collections

CNRS LIP6 SORBONNE-UNIVERSITE SU-SCIENCES

