Learning a Subspace of Policies for Online Adaptation in Reinforcement Learning

Jean-Baptiste Gaya; Laure Soulier; Ludovic Denoyer

Conference paper. Year: 2022


Jean-Baptiste Gaya

Role: Author
PersonId: 1272600
IdRef: 279072759

Meta AI

Machine Learning and Information Access

Laure Soulier

Role: Author
PersonId: 8070
IdHAL: soulierl
ORCID: 0000-0001-9827-7400
IdRef: 189293683

Machine Learning and Information Access

Ludovic Denoyer

Role: Author

Meta AI

Abstract

Deep Reinforcement Learning (RL) is mainly studied in a setting where the training and testing environments are similar. In many practical applications, however, these environments differ. For instance, in control systems, the robot(s) on which a policy is learned might differ from the robot(s) on which the policy will run. Such differences can stem from internal factors (e.g., calibration issues, system attrition, defective modules) or from external changes (e.g., weather conditions). There is therefore a need for RL methods that generalize well to variations of the training conditions. In this article, we consider the simplest yet hard-to-tackle generalization setting, in which the test environment is unknown at train time, forcing the agent to adapt to the system's new dynamics. This online adaptation process can be computationally expensive (e.g., fine-tuning) and cannot rely on meta-RL techniques, since there is only a single training environment. To address this, we propose an approach that learns a subspace of policies within the parameter space. This subspace contains an infinite number of policies that are all trained to solve the training environment while having different parameter values. As a consequence, two policies in that subspace process information differently and exhibit different behaviors when facing variations of the training environment. Our experiments, carried out over a large variety of benchmarks, compare our approach with baselines, including diversity-based methods. In comparison, our approach is simple to tune, does not need any extra component (e.g., a discriminator), and learns policies able to gather a high reward on unseen environments.
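To make the idea of a subspace of policies more concrete, here is a minimal sketch. It rests on assumptions that the abstract does not state: the subspace is taken to be the line segment between two hypothetical anchor parameter vectors, and online adaptation is taken to mean evaluating a handful of points on that segment in the test environment and keeping the best one. The names `interpolate`, `select_policy`, and `toy_return` are illustrative only, and the toy return function stands in for real episode rollouts.

```python
def interpolate(anchor_a, anchor_b, alpha):
    """Return the point alpha * anchor_a + (1 - alpha) * anchor_b of the segment."""
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(anchor_a, anchor_b)]


def select_policy(anchor_a, anchor_b, evaluate, num_candidates=5):
    """Evaluate evenly spaced policies on the segment and keep the best one.

    `evaluate` is a hypothetical callable mapping a parameter vector to the
    return obtained in the (unseen) test environment.
    """
    best_alpha, best_params, best_return = None, None, float("-inf")
    for k in range(num_candidates):
        alpha = k / (num_candidates - 1)
        params = interpolate(anchor_a, anchor_b, alpha)
        episode_return = evaluate(params)
        if episode_return > best_return:
            best_alpha, best_params, best_return = alpha, params, episode_return
    return best_alpha, best_params, best_return


def toy_return(params, target=(0.25, 0.75)):
    """Toy stand-in for a rollout: higher is better, maximal at `target`."""
    return -sum((p - t) ** 2 for p, t in zip(params, target))


# Two hypothetical anchors; in practice these would be trained network weights.
anchor_a = [1.0, 0.0]
anchor_b = [0.0, 1.0]

best_alpha, best_params, best_return = select_policy(anchor_a, anchor_b, toy_return)
print(best_alpha, best_params)
```

In this sketch, adaptation costs only a few evaluation episodes and no gradient step, which matches the motivation given above that fine-tuning at test time can be expensive. How the anchors are trained so that every point of the subspace solves the training environment is not covered here.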

Domains

Machine Learning [cs.LG]

Main file

qskpdghpmpzdyxgjfhykcrppncsbkrfs.pdf (4.33 MB)

Origin: Files produced by the author(s)


https://hal.sorbonne-universite.fr/hal-03826331

Submitted on: Monday, October 24, 2022, 10:28:35

Last modified on: Wednesday, October 30, 2024, 13:28:42

Long-term archiving on: Wednesday, January 25, 2023, 18:23:25

Dates and versions

hal-03826331, version 1 (24-10-2022)

Identifiers

HAL Id: hal-03826331, version 1
arXiv: 2110.05169

Cite

Jean-Baptiste Gaya, Laure Soulier, Ludovic Denoyer. Learning a Subspace of Policies for Online Adaptation in Reinforcement Learning. International Conference on Learning Representations (ICLR) 2022, Apr 2022, Virtual, France. ⟨hal-03826331⟩

Collections

CNRS ISIR SORBONNE-UNIVERSITE SU-SCIENCES ISIR_MLIA
