Linear Bandits in Unknown Environments

Abstract : In contextual bandit problems, an agent has to choose an action among a bigger set of available ones at each decision step, according to features observed on them. The goal is to define a decision strategy that maximizes the cumulative reward of actions over time. We focus on the specific case where the features of each action correspond to some kind of a constant profile, which can be used to determine its intrinsic utility for the task in concern. If there exists an unknown linear application that allows rewards to be mapped from profiles, this can be leveraged to greatly improve the exploitation-exploration trade-off of stationary stochastic methods like UCB. In this paper, we consider the case where action profiles are unknown beforehand. Instead, the agent only observes sample vectors, with mean equal to the true profiles, for a subset of actions at each decision step. We propose a new algorithm, called SampLinUCB, and derive a finite time high probability upper bound on its regret. We also provide numerical experiments on a task of focused data capture from online social networks.
Type de document :
Communication dans un congrès
ECML PKDD 2016 - European Conference on Machine Learning and Knowledge Discovery in Databases, Sep 2016, Riva Del Garda, Italy. Springer, 9852, pp.282-298, 2016, Lecture Notes in Computer Science. 〈10.1007/978-3-319-46227-1_18〉
Liste complète des métadonnées

https://hal.sorbonne-universite.fr/hal-01357953
Contributeur : Sylvain Lamprier <>
Soumis le : mardi 30 août 2016 - 16:41:25
Dernière modification le : jeudi 22 novembre 2018 - 14:33:02

Identifiants

Citation

Thibault Gisselbrecht, Sylvain Lamprier, Patrick Gallinari. Linear Bandits in Unknown Environments. ECML PKDD 2016 - European Conference on Machine Learning and Knowledge Discovery in Databases, Sep 2016, Riva Del Garda, Italy. Springer, 9852, pp.282-298, 2016, Lecture Notes in Computer Science. 〈10.1007/978-3-319-46227-1_18〉. 〈hal-01357953〉

Partager

Métriques

Consultations de la notice

247