Conference Paper - Year: 2017

Bio-inspired meta-learning for active exploration during non-stationary multi-armed bandit tasks

George Velentzas
  • Function: Author
Costas Tzafestas
  • Function: Author
Mehdi Khamassi
  • Function: Author

Abstract

Fast adaptation to changes in the environment requires agents (animals, robots and simulated artefacts) to be able to dynamically tune the exploration-exploitation trade-off during learning. This trade-off usually determines a fixed proportion of exploitative choices (i.e. choosing the action that subjectively appears best at a given moment) relative to exploratory choices (i.e. testing other actions that currently appear worse but may turn out to be promising later). Rather than using a fixed proportion, non-stationary multi-armed bandit methods in the field of machine learning have shown that principles such as exploring actions that have not been tested for a long time can lead to near-optimal performance with bounded regret. In parallel, research on active exploration in the fields of robot learning and the computational neuroscience of learning and decision-making has proposed alternative solutions, such as transiently increasing exploration in response to drops in average performance, or attributing exploration bonuses specifically to actions associated with high uncertainty so as to gain information when choosing them. In this work, we compare different methods from machine learning, computational neuroscience and robot learning on a set of non-stationary stochastic multi-armed bandit tasks: abrupt shifts; the best bandit becoming the worst one and vice versa; multiple shifting frequencies. We find that different methods are appropriate in different scenarios. We propose a new hybrid method combining bio-inspired meta-learning, a Kalman filter and exploration bonuses, and show that it outperforms the other methods in these scenarios.
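The abstract names three ingredients: per-arm reward tracking with a Kalman filter, uncertainty-driven exploration bonuses, and bio-inspired meta-learning of the exploration rate. As an illustration only, the following minimal Python sketch shows one common way such ingredients can be combined in a softmax bandit agent. The class name KalmanBanditAgent, the parameter values, and the specific meta-learning rule (raising or lowering the softmax inverse temperature from the difference between short- and long-term reward averages) are assumptions made for this sketch, not the exact algorithm evaluated in the paper.

```python
import numpy as np


class KalmanBanditAgent:
    """Illustrative sketch (not the authors' exact algorithm): per-arm Kalman
    filters track non-stationary rewards, an uncertainty bonus encourages
    directed exploration, and a meta-learned inverse temperature modulates
    the exploration-exploitation trade-off from reward feedback."""

    def __init__(self, n_arms, q_process=0.01, r_obs=0.1,
                 bonus_weight=1.0, meta_lr=0.1):
        self.mu = np.zeros(n_arms)            # posterior mean reward per arm
        self.var = np.ones(n_arms)            # posterior variance per arm
        self.q = q_process                    # assumed random-walk (process) noise
        self.r = r_obs                        # assumed observation noise
        self.bonus_weight = bonus_weight      # weight of the uncertainty bonus
        self.meta_lr = meta_lr                # learning rate of the meta-learner
        self.beta = 1.0                       # softmax inverse temperature
        self.r_short, self.r_long = 0.0, 0.0  # short/long-term reward averages

    def select_action(self, rng):
        # Exploration bonus grows with posterior uncertainty (standard deviation).
        values = self.mu + self.bonus_weight * np.sqrt(self.var)
        logits = self.beta * values
        p = np.exp(logits - logits.max())
        p /= p.sum()
        return rng.choice(len(self.mu), p=p)

    def update(self, arm, reward):
        # Kalman prediction step: all arms diffuse over time (non-stationarity).
        self.var += self.q
        # Kalman correction step for the chosen arm only.
        k = self.var[arm] / (self.var[arm] + self.r)
        self.mu[arm] += k * (reward - self.mu[arm])
        self.var[arm] *= (1.0 - k)
        # Meta-learning of beta: when the short-term reward average drops below
        # the long-term one, decrease beta (explore more), and vice versa.
        self.r_short += 0.3 * (reward - self.r_short)
        self.r_long += 0.05 * (reward - self.r_long)
        self.beta = max(0.1, self.beta + self.meta_lr * (self.r_short - self.r_long))


# Hypothetical usage on a toy non-stationary Gaussian bandit.
rng = np.random.default_rng(0)
true_means = rng.normal(size=5)
agent = KalmanBanditAgent(n_arms=5)
for t in range(1000):
    if t == 500:
        true_means = true_means[::-1].copy()  # abrupt shift: best arm becomes worst
    arm = agent.select_action(rng)
    reward = float(rng.normal(loc=true_means[arm], scale=0.1))
    agent.update(arm, reward)
```

Under these assumptions, the Kalman filter handles gradual drift, the bonus directs exploration toward uncertain arms, and the meta-learned temperature transiently boosts exploration after abrupt performance drops.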
Main file
Velentzas2017_Intellisys.pdf (916.17 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03774989, version 1 (12-09-2022)

Identifiers

HAL Id: hal-03774989
DOI: 10.1109/IntelliSys.2017.8324365

Cite

George Velentzas, Costas Tzafestas, Mehdi Khamassi. Bio-inspired meta-learning for active exploration during non-stationary multi-armed bandit tasks. Intelligent Systems Conference (IntelliSys) 2017, Sep 2017, London, United Kingdom. pp. 661-669, ⟨10.1109/IntelliSys.2017.8324365⟩. ⟨hal-03774989⟩
12 views, 38 downloads
