Conference paper, Year: 2017

Bridging Computational Neuroscience and Machine Learning on Non-Stationary Multi-Armed Bandits

George Velentzas
  • Role: Author
  • PersonId: 1163519
Costas Tzafestas
  • Role: Author
Mehdi Khamassi
  • Role: Author

Abstract

Fast adaptation to changes in the environment requires both natural and artificial agents to dynamically tune an exploration-exploitation trade-off during learning. This trade-off usually determines a fixed proportion of exploitative choices (i.e. choosing the action that subjectively appears best at a given moment) relative to exploratory choices (i.e. testing other actions that currently appear worse but may turn out promising later). The problem of finding an efficient exploration-exploitation trade-off has been well studied in both the Machine Learning and Computational Neuroscience fields. Rather than using a fixed proportion, non-stationary multi-armed bandit methods in the former have shown that principles such as exploring actions that have not been tested for a long time can lead to performance closer to the optimal regret bound. In parallel, research in the latter has investigated solutions such as progressively increasing exploitation in response to improvements in performance, transiently increasing exploration in response to drops in average performance, or attributing exploration bonuses specifically to actions associated with high uncertainty in order to gain information when performing these actions. In this work, we first try to bridge some of these different methods from the two research fields by rewriting their decision processes with a common formalism. We then show numerical simulations of a hybrid algorithm combining bio-inspired meta-learning, a Kalman filter, and exploration bonuses, compared to several state-of-the-art alternatives on a set of non-stationary stochastic multi-armed bandit tasks. While we find that different methods are appropriate in different scenarios, the hybrid algorithm displays a good combination of advantages from the different methods and outperforms them in the studied scenarios.
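The abstract's core ingredients (a per-arm Kalman filter tracking drifting payoffs, plus an uncertainty-based exploration bonus) can be illustrated with a minimal sketch. This is not the authors' exact hybrid algorithm (which also includes bio-inspired meta-learning); all parameter names and values below (`drift_var`, `obs_noise`, `beta`, the 3-armed setup) are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Non-stationary 3-armed bandit: the true mean payoffs drift by a random walk.
n_arms, n_steps = 3, 2000
true_means = rng.normal(0.0, 1.0, n_arms)

# Per-arm Kalman filter: posterior mean and variance of each arm's payoff.
mu = np.zeros(n_arms)           # posterior means
var = np.full(n_arms, 10.0)     # posterior variances (wide prior)
obs_noise = 1.0                 # assumed observation-noise variance
drift_var = 0.01                # assumed per-step drift variance of the means
beta = 1.0                      # weight of the uncertainty bonus

rewards = []
for t in range(n_steps):
    # Diffusion step: uncertainty about every arm grows over time, so
    # long-untested arms become attractive again (the "test actions not
    # tried for a long time" principle from the abstract).
    var += drift_var

    # Pick the arm maximizing posterior mean + exploration bonus.
    arm = int(np.argmax(mu + beta * np.sqrt(var)))

    # Environment: true means drift, the observed reward is noisy.
    true_means += rng.normal(0.0, np.sqrt(drift_var), n_arms)
    r = true_means[arm] + rng.normal(0.0, np.sqrt(obs_noise))
    rewards.append(r)

    # Kalman update for the chosen arm only.
    k = var[arm] / (var[arm] + obs_noise)   # Kalman gain
    mu[arm] += k * (r - mu[arm])
    var[arm] *= (1.0 - k)
```

Because the posterior variance both shrinks on observation and regrows through the diffusion step, exploration is allocated automatically to arms whose estimates have become stale, rather than at a fixed rate.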
Main file: Velentzas2017_RLDM.pdf (504.48 KB)
Source: files produced by the author(s)

Dates and versions

hal-03775008 , version 1 (12-09-2022)

Identifiers

  • HAL Id : hal-03775008 , version 1

Cite

George Velentzas, Costas Tzafestas, Mehdi Khamassi. Bridging Computational Neuroscience and Machine Learning on Non-Stationary Multi-Armed Bandits. 3rd International Conference on Reinforcement Learning and Decision Making (RLDM) 2017, Jun 2017, Ann Arbor, United States. ⟨hal-03775008⟩