Linear Bandits in Unknown Environments - Sorbonne Université
Communication Dans Un Congrès Année : 2016

Linear Bandits in Unknown Environments

Résumé

In contextual bandit problems, an agent has to choose an action among a bigger set of available ones at each decision step, according to features observed on them. The goal is to define a decision strategy that maximizes the cumulative reward of actions over time. We focus on the specific case where the features of each action correspond to some kind of a constant profile, which can be used to determine its intrinsic utility for the task in concern. If there exists an unknown linear application that allows rewards to be mapped from profiles, this can be leveraged to greatly improve the exploitation-exploration trade-off of stationary stochastic methods like UCB. In this paper, we consider the case where action profiles are unknown beforehand. Instead, the agent only observes sample vectors, with mean equal to the true profiles, for a subset of actions at each decision step. We propose a new algorithm, called SampLinUCB, and derive a finite time high probability upper bound on its regret. We also provide numerical experiments on a task of focused data capture from online social networks.
Fichier non déposé

Dates et versions

hal-01357953 , version 1 (30-08-2016)

Identifiants

Citer

Thibault Gisselbrecht, Sylvain Lamprier, Patrick Gallinari. Linear Bandits in Unknown Environments. ECML PKDD 2016 - European Conference on Machine Learning and Knowledge Discovery in Databases, Sep 2016, Riva Del Garda, Italy. pp.282-298, ⟨10.1007/978-3-319-46227-1_18⟩. ⟨hal-01357953⟩
226 Consultations
0 Téléchargements

Altmetric

Partager

More