Bridging Computational Neuroscience and Machine Learning on Non-Stationary Multi-Armed Bandits
Abstract
Fast adaptation to changes in the environment requires both natural and artificial agents to dynamically tune the exploration-exploitation trade-off during learning. This trade-off usually determines a fixed proportion of exploitative choices (i.e., choosing the action that currently appears best) relative to exploratory choices (i.e., testing other actions that appear worse for now but may turn out to be promising later). The problem of finding an efficient exploration-exploitation trade-off has been well studied in both the Machine Learning and Computational Neuroscience fields. Rather than using a fixed proportion, non-stationary multi-armed bandit methods in the former have proven that principles such as exploring actions that have not been tested for a long time can lead to performance closer to optimality, with bounded regret. In parallel, research in the latter has investigated solutions such as progressively increasing exploitation in response to improvements in performance, transiently increasing exploration in response to drops in average performance, or attributing exploration bonuses specifically to actions associated with high uncertainty, so as to gain information when performing these actions. In this work, we first try to bridge some of these methods from the two research fields by rewriting their decision processes with a common formalism. We then present numerical simulations of a hybrid algorithm combining bio-inspired meta-learning, a Kalman filter, and exploration bonuses, compared with several state-of-the-art alternatives on a set of non-stationary stochastic multi-armed bandit tasks. While we find that different methods are appropriate in different scenarios, the hybrid algorithm displays a good combination of the advantages of the different methods and outperforms them in the studied scenarios.
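The abstract names three ingredients of the hybrid approach: Kalman-filter tracking of action values, uncertainty-driven exploration bonuses, and meta-learning of the exploration level from performance. The sketch below is not the authors' algorithm; it is a minimal illustration of how these ingredients can be composed on a non-stationary bandit. All numerical values (observation and diffusion noise, bonus weight, the learning rates on the inverse temperature) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

n_arms, n_steps, change_every = 5, 2000, 500   # arm means resampled every 500 steps

# Kalman-filter state per arm (illustrative parameter values)
sigma_obs2 = 1.0             # assumed observation-noise variance
sigma_diff2 = 0.01           # assumed diffusion variance of the drifting payoffs
mu = np.zeros(n_arms)        # posterior mean payoff per arm
var = np.full(n_arms, 10.0)  # posterior variance per arm (high = uncertain)

# meta-learned inverse temperature: exploit more when recent reward improves
beta, avg_fast, avg_slow = 1.0, 0.0, 0.0
bonus_weight = 1.0           # weight of the uncertainty-driven exploration bonus

true_means = rng.normal(0.0, 1.0, n_arms)
total_reward = 0.0

for t in range(n_steps):
    if t > 0 and t % change_every == 0:
        true_means = rng.normal(0.0, 1.0, n_arms)   # abrupt environmental change

    # exploration bonus proportional to posterior uncertainty, softmax choice
    scores = mu + bonus_weight * np.sqrt(var)
    probs = np.exp(beta * (scores - scores.max()))
    probs /= probs.sum()
    a = rng.choice(n_arms, p=probs)

    r = rng.normal(true_means[a], np.sqrt(sigma_obs2))
    total_reward += r

    # Kalman update of the chosen arm; all variances grow by the diffusion term
    var += sigma_diff2
    k = var[a] / (var[a] + sigma_obs2)              # Kalman gain
    mu[a] += k * (r - mu[a])
    var[a] *= (1.0 - k)

    # toy meta-learning rule: increase beta (exploitation) when the fast reward
    # average exceeds the slow one, decrease it (more exploration) otherwise
    avg_fast += 0.1 * (r - avg_fast)
    avg_slow += 0.01 * (r - avg_slow)
    beta = float(np.clip(beta + 0.1 * (avg_fast - avg_slow), 0.1, 10.0))

print(f"average reward per step: {total_reward / n_steps:.3f}")
```

In this toy setup the bonus term keeps long-untested arms attractive (their posterior variance grows through the diffusion term), while the temperature rule transiently re-opens exploration after a change point, when the fast-moving reward average drops below the slow one.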