Contextual Bandits with Hidden Contexts: a Focused Data Capture From Social Media Streams
Abstract
This paper addresses the problem of real-time data capture from social media. Because of various limitations, it is not possible to collect all the data produced by social networks such as Twitter. To gather enough information relevant to a predefined need, it is therefore necessary to focus on a subset of the information sources. In this work, we focus on user-centered data capture and treat each account of a social network as a source that can be followed at each iteration of the capture process. This process aims to maximize the cumulative utility of the captured information for the specified need, and is constrained at each time step by the number of users that can be monitored simultaneously. Selecting a subset of accounts to listen to over time is a sequential decision problem under constraints, which we formalize as a bandit problem with multiple selections. We propose a contextual UCB-like approach that uses the activity of each user during the current step to predict their future behavior. Besides capturing variations in usefulness, taking contexts into account also improves the efficiency of the process by exploiting structure in the search space. However, existing contextual bandit approaches do not fit our setting, where most of the contexts are hidden from the agent. We therefore propose a new algorithm, called HiddenLinUCB, which handles this missing information via variational inference. Experiments show that this approach performs well compared to existing methods on data capture tasks from social networks.
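To make the formalization concrete, the following is a minimal sketch of a contextual UCB rule with multiple selections per step. It is not HiddenLinUCB: it assumes the standard setting in which every user's context is observed at every step, and it does not include the variational inference used for hidden contexts. The class name `SharedLinUCB`, the single parameter vector shared by all users, the exploration weight `alpha`, and the synthetic linear reward in `simulate` are all illustrative assumptions, not details taken from the paper.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mat_vec(M, v):
    return [dot(row, v) for row in M]


class SharedLinUCB:
    """LinUCB-style scorer with one parameter vector shared by all users
    (an assumption of this sketch) and k selections per step."""

    def __init__(self, dim, alpha=1.0):
        self.alpha = alpha  # exploration weight (hypothetical default)
        # Inverse of the ridge design matrix A = I + sum of x x^T.
        self.A_inv = [[1.0 if i == j else 0.0 for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim  # sum of reward * x

    def theta(self):
        # Ridge estimate of the reward parameters.
        return mat_vec(self.A_inv, self.b)

    def select(self, contexts, k):
        """Return the indices of the k users with the highest upper bounds."""
        theta = self.theta()

        def ucb(x):
            width = math.sqrt(max(dot(x, mat_vec(self.A_inv, x)), 0.0))
            return dot(theta, x) + self.alpha * width

        ranked = sorted(range(len(contexts)),
                        key=lambda i: ucb(contexts[i]), reverse=True)
        return ranked[:k]

    def update(self, x, reward):
        """Rank-one (Sherman-Morrison) update after observing one reward."""
        Ax = mat_vec(self.A_inv, x)
        denom = 1.0 + dot(x, Ax)
        d = len(x)
        for i in range(d):
            for j in range(d):
                self.A_inv[i][j] -= Ax[i] * Ax[j] / denom
            self.b[i] += reward * x[i]


def simulate(n_users=50, dim=5, k=5, steps=200, seed=0):
    """Synthetic run: all contexts are visible, rewards are linear plus noise."""
    rng = random.Random(seed)
    theta_true = [rng.gauss(0, 1) for _ in range(dim)]
    bandit = SharedLinUCB(dim)
    total = 0.0
    for _ in range(steps):
        contexts = [[rng.gauss(0, 1) for _ in range(dim)]
                    for _ in range(n_users)]
        for i in bandit.select(contexts, k):  # only k users can be monitored
            reward = dot(theta_true, contexts[i]) + rng.gauss(0, 0.1)
            bandit.update(contexts[i], reward)
            total += reward
    return total, bandit.theta(), theta_true
```

Calling `simulate()` runs the selection loop on synthetic data and returns the cumulative reward together with the estimated and true parameter vectors. In the setting described above, most of the contexts passed to `select` would be unavailable, which is the gap HiddenLinUCB is designed to address.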