Conference Paper - Year: 2017

Bio-inspired meta-learning for active exploration during non-stationary multi-armed bandit tasks

George Velentzas
  • Function: Author
Costas Tzafestas
  • Function: Author
Mehdi Khamassi
  • Function: Author

Abstract

Fast adaptation to changes in the environment requires agents (animals, robots and simulated artefacts) to be able to dynamically tune the exploration-exploitation trade-off during learning. This trade-off usually determines a fixed proportion of exploitative choices (i.e. choosing the action that subjectively appears best at a given moment) relative to exploratory choices (i.e. testing other actions that currently appear worse but may turn out to be promising later). Rather than using a fixed proportion, non-stationary multi-armed bandit methods in the field of machine learning have shown that principles such as exploring actions that have not been tested for a long time can lead to near-optimal performance with bounded regret. In parallel, research on active exploration in the fields of robot learning and the computational neuroscience of learning and decision-making has proposed alternative solutions, such as transiently increasing exploration in response to drops in average performance, or attributing exploration bonuses specifically to actions associated with high uncertainty so as to gain information when choosing them. In this work, we compare different methods from machine learning, computational neuroscience and robot learning on a set of non-stationary stochastic multi-armed bandit tasks: abrupt shifts; the best bandit becoming the worst one and vice versa; multiple shifting frequencies. We find that different methods are appropriate in different scenarios. We propose a new hybrid method combining bio-inspired meta-learning, a Kalman filter and exploration bonuses, and show that it outperforms the other methods in these scenarios.
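The abstract names three ingredients: per-arm reward tracking with a Kalman filter, uncertainty-driven exploration bonuses, and bio-inspired meta-learning of the exploration rate. As an illustration only, the following minimal Python sketch shows one common way such ingredients can be combined in a softmax bandit agent. The class name KalmanBanditAgent, the parameter values, and the specific meta-learning rule (raising or lowering the softmax inverse temperature from the difference between short- and long-term reward averages) are assumptions made for this sketch, not the exact algorithm evaluated in the paper.

```python
import numpy as np


class KalmanBanditAgent:
    """Illustrative sketch (not the authors' exact algorithm): per-arm Kalman
    filters track non-stationary rewards, an uncertainty bonus encourages
    directed exploration, and a meta-learned inverse temperature modulates
    the exploration-exploitation trade-off from reward feedback."""

    def __init__(self, n_arms, q_process=0.01, r_obs=0.1,
                 bonus_weight=1.0, meta_lr=0.1):
        self.mu = np.zeros(n_arms)            # posterior mean reward per arm
        self.var = np.ones(n_arms)            # posterior variance per arm
        self.q = q_process                    # assumed random-walk (process) noise
        self.r = r_obs                        # assumed observation noise
        self.bonus_weight = bonus_weight      # weight of the uncertainty bonus
        self.meta_lr = meta_lr                # learning rate of the meta-learner
        self.beta = 1.0                       # softmax inverse temperature
        self.r_short, self.r_long = 0.0, 0.0  # short/long-term reward averages

    def select_action(self, rng):
        # Exploration bonus grows with posterior uncertainty (standard deviation).
        values = self.mu + self.bonus_weight * np.sqrt(self.var)
        logits = self.beta * values
        p = np.exp(logits - logits.max())
        p /= p.sum()
        return rng.choice(len(self.mu), p=p)

    def update(self, arm, reward):
        # Kalman prediction step: all arms diffuse over time (non-stationarity).
        self.var += self.q
        # Kalman correction step for the chosen arm only.
        k = self.var[arm] / (self.var[arm] + self.r)
        self.mu[arm] += k * (reward - self.mu[arm])
        self.var[arm] *= (1.0 - k)
        # Meta-learning of beta: when the short-term reward average drops below
        # the long-term one, decrease beta (explore more), and vice versa.
        self.r_short += 0.3 * (reward - self.r_short)
        self.r_long += 0.05 * (reward - self.r_long)
        self.beta = max(0.1, self.beta + self.meta_lr * (self.r_short - self.r_long))


# Hypothetical usage on a toy non-stationary Gaussian bandit.
rng = np.random.default_rng(0)
true_means = rng.normal(size=5)
agent = KalmanBanditAgent(n_arms=5)
for t in range(1000):
    if t == 500:
        true_means = true_means[::-1].copy()  # abrupt shift: best arm becomes worst
    arm = agent.select_action(rng)
    reward = float(rng.normal(loc=true_means[arm], scale=0.1))
    agent.update(arm, reward)
```

Under these assumptions, the Kalman filter handles gradual drift, the bonus directs exploration toward uncertain arms, and the meta-learned temperature transiently boosts exploration after abrupt performance drops.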
Main file
Velentzas2017_Intellisys.pdf (916.17 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03774989, version 1 (12-09-2022)

Identifiers

HAL Id: hal-03774989
DOI: 10.1109/IntelliSys.2017.8324365

Cite

George Velentzas, Costas Tzafestas, Mehdi Khamassi. Bio-inspired meta-learning for active exploration during non-stationary multi-armed bandit tasks. Intelligent Systems Conference (IntelliSys) 2017, Sep 2017, London, United Kingdom. pp. 661-669, ⟨10.1109/IntelliSys.2017.8324365⟩. ⟨hal-03774989⟩
12 views, 38 downloads
