Conference papers

Transfer learning for direct policy search: a reward shaping approach

Abstract : In the perspective of lifelong learning, a robot may face different but related situations. Being able to exploit the knowledge acquired during a first learning phase may be critical in order to solve more complex tasks. This is the transfer learning problem. This problem is addressed here in the case of direct policy search algorithms. Neither discrete states nor actions are defined a priori. A policy is described by a controller that computes orders to be sent to the motors out of sensor values. Both motor and sensor values can be continuous. The proposed approach relies on population-based direct policy search algorithms, i.e. evolutionary algorithms. It exploits the numerous behaviors that are generated during the search. When learning on the source task, a knowledge base is built. The knowledge base aims at identifying the most salient behavior segments with regard to the considered task. Afterwards, the knowledge base is exploited on a target task with a reward shaping approach: besides its reward on the task, a policy is credited with a reward computed from the knowledge base. The rationale behind this approach is to automatically detect the stepping stones, i.e. the behavior segments that have led to a reward in the source task, before the policy is efficient enough to get the reward on the target task. The approach is tested in simulation with a neuroevolution approach and on ball collecting tasks.
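
The reward shaping idea described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not say how behavior segments are extracted, compared, or weighted, so the sliding window, the Euclidean matching radius, the per-entry weights, and all function names below are assumptions made purely for illustration.

```python
from math import dist

# Hypothetical knowledge base format (an assumption, not from the paper):
# a list of (segment, weight) pairs, where a segment is a flattened window
# of sensor and motor values and the weight reflects its salience.

def segments(trajectory, window):
    """Cut a trajectory (list of sensorimotor vectors) into flattened sliding windows."""
    return [
        tuple(value for step in trajectory[i:i + window] for value in step)
        for i in range(len(trajectory) - window + 1)
    ]

def knowledge_reward(trajectory, kb, window, radius):
    """Sum the weights of knowledge-base entries matched by at least one segment."""
    segs = segments(trajectory, window)
    return sum(
        weight
        for entry, weight in kb
        if any(dist(entry, seg) <= radius for seg in segs)
    )

def shaped_reward(task_reward, trajectory, kb, window=2, radius=0.1, scale=1.0):
    """Task reward plus a scaled bonus computed from the knowledge base."""
    return task_reward + scale * knowledge_reward(trajectory, kb, window, radius)

# Toy data: two sensorimotor values per time step, windows of two steps.
kb = [((0.0, 1.0, 0.0, 1.0), 0.5), ((1.0, 1.0, 1.0, 1.0), 2.0)]
trajectory = [(0.0, 1.0), (0.0, 1.0), (0.5, 0.5)]
bonus_only = shaped_reward(0.0, trajectory, kb)
```

In this toy case the policy earns no task reward yet still receives a bonus, because one of its segments matches a knowledge-base entry. That is the stepping-stone effect the abstract describes: the policy is credited before it is efficient enough to get the reward on the target task.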


https://hal.archives-ouvertes.fr/hal-02987427
Contributor : Stephane Doncieux
Submitted on : Tuesday, November 3, 2020 - 8:49:11 PM
Last modification on : Saturday, November 7, 2020 - 3:34:07 AM

File

2013ACTI2858.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-02987427, version 1

Citation

Stéphane Doncieux. Transfer learning for direct policy search: a reward shaping approach. Proceedings of ICDL-EpiRob conference, 2013, Osaka, Japan. pp.1-6. ⟨hal-02987427⟩
