Linear Bandits in Unknown Environments

Thibault Gisselbrecht; Sylvain Lamprier; Patrick Gallinari

doi:10.1007/978-3-319-46227-1_18

Communication Dans Un Congrès Année : 2016

Linear Bandits in Unknown Environments

(1, 2) , (2) , (2)

1
2

Thibault Gisselbrecht

Fonction : Auteur
PersonId : 987818

IRT SystemX

Machine Learning and Information Access

Sylvain Lamprier

Fonction : Auteur
PersonId : 740402
IdHAL : sylvain-lamprier
ORCID : 0000-0002-2508-922X
IdRef : 142632201

Machine Learning and Information Access

Patrick Gallinari

Fonction : Auteur
PersonId : 751615
IdHAL : patrick-gallinari
ORCID : 0000-0001-9060-9001
IdRef : 070709076

Machine Learning and Information Access

Résumé

In contextual bandit problems, an agent has to choose an action among a bigger set of available ones at each decision step, according to features observed on them. The goal is to define a decision strategy that maximizes the cumulative reward of actions over time. We focus on the specific case where the features of each action correspond to some kind of a constant profile, which can be used to determine its intrinsic utility for the task in concern. If there exists an unknown linear application that allows rewards to be mapped from profiles, this can be leveraged to greatly improve the exploitation-exploration trade-off of stationary stochastic methods like UCB. In this paper, we consider the case where action profiles are unknown beforehand. Instead, the agent only observes sample vectors, with mean equal to the true profiles, for a subset of actions at each decision step. We propose a new algorithm, called SampLinUCB, and derive a finite time high probability upper bound on its regret. We also provide numerical experiments on a task of focused data capture from online social networks.

Domaines

Intelligence artificielle [cs.AI] Apprentissage [cs.LG]

Sylvain Lamprier : Connectez-vous pour contacter le contributeur

https://hal.sorbonne-universite.fr/hal-01357953

Soumis le : mardi 30 août 2016-16:41:25

Dernière modification le : mardi 11 avril 2023-15:16:28

Dates et versions

hal-01357953 , version 1 (30-08-2016)

Identifiants

HAL Id : hal-01357953 , version 1
DOI : 10.1007/978-3-319-46227-1_18

Citer

Thibault Gisselbrecht, Sylvain Lamprier, Patrick Gallinari. Linear Bandits in Unknown Environments. ECML PKDD 2016 - European Conference on Machine Learning and Knowledge Discovery in Databases, Sep 2016, Riva Del Garda, Italy. pp.282-298, ⟨10.1007/978-3-319-46227-1_18⟩. ⟨hal-01357953⟩

Exporter

BibTeX XML-TEI Dublin Core DC Terms EndNote DataCite

Collections

UPMC CNRS LIP6 IRT-SYSTEMX SORBONNE-UNIVERSITE SU-SCIENCES

226 Consultations

0 Téléchargements

Linear Bandits in Unknown Environments

Résumé

Domaines

Dates et versions

Identifiants

Citer

Exporter

Collections

Altmetric

Partager