D. W. Aha, Lazy learning, pp.7-10, 1997.

Y. Akimoto, Y. Nagata, I. Ono, and S. Kobayashi, Bidirectional relation between CMA evolution strategies and natural evolution strategies, International Conference on Parallel Problem Solving from Nature, pp.154-163, 2010.

B. D. Argall, S. Chernova, M. Veloso, and B. Browning, A survey of robot learning from demonstration, Robotics and Autonomous Systems, vol.57, pp.469-483, 2009.

L. Arnold, A. Auger, N. Hansen, and Y. Ollivier, Information-geometric optimization algorithms: A unifying picture via invariance principles, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00601503

K. Arulkumaran, M. P. Deisenroth, M. Brundage, and A. A. Bharath, A brief survey of deep reinforcement learning, 2017.

K. Azizzadenesheli, E. Brunskill, and A. Anandkumar, Efficient exploration through Bayesian deep Q-networks, 2018.

T. Bäck, Evolutionary algorithms in theory and practice: evolution strategies, evolutionary programming, genetic algorithms, 1996.

P. Bacon, J. Harb, and D. Precup, The option-critic architecture, pp.1726-1734, 2017.

L. C. Baird, Reinforcement learning in continuous time: Advantage updating, Proceedings of the International Conference on Neural Networks, 1994.

A. Baranes and P. Oudeyer, Intrinsically motivated goal exploration for active motor learning in robots: A case study, IEEE/RSJ International Conference on Intelligent Robots and Systems, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00541769

A. Baranes and P. Oudeyer, Active learning of inverse models with intrinsically motivated goal exploration in robots, Robotics and Autonomous Systems, vol.61, issue.1, pp.49-73, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00788440

A. F. Baranes, P. Oudeyer, and J. Gottlieb, The effects of task difficulty, novelty and the size of the search space on intrinsically motivated exploration, Frontiers in neuroscience, vol.8, p.317, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01087227

G. Barth-Maron, M. Hoffman, D. Budden, W. Dabney, D. Horgan et al., Distributed distributional deterministic policy gradients, pp.1-16, 2018.

J. Baxter and P. L. Bartlett, Infinite-horizon policy-gradient estimation, Journal of Artificial Intelligence Research, vol.15, pp.319-350, 2001.

M. G. Bellemare, W. Dabney, and R. Munos, A distributional perspective on reinforcement learning, 2017.

S. Bhatnagar, R. S. Sutton, M. Ghavamzadeh, and M. Lee, Incremental natural actor-critic algorithms, Advances in Neural Information Processing Systems, 2007.

L. Bottou, Stochastic gradient descent tricks, Neural networks: Tricks of the trade, pp.421-436, 2012.

E. Brochu, V. M. Cora, and N. De Freitas, A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning, 2010.

R. Calandra, N. Gopalan, A. Seyfarth, J. Peters, and M. P. Deisenroth, Bayesian gait optimization for bipedal locomotion, International Conference on Learning and Intelligent Optimization, pp.274-290, 2014.

K. Chatzilygeroudis, R. Rama, R. Kaushik, D. Goepp, V. Vassiliades et al., Black-box data-efficient policy search for robotics, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01576683

P. Chrabaszcz, I. Loshchilov, and F. Hutter, Back to basics: Benchmarking canonical evolution strategies for playing Atari, 2018.

C. Colas, O. Sigaud, and P. Oudeyer, GEP-PG: Decoupling exploration and exploitation in deep reinforcement learning algorithms, 2018.

E. Conti, V. Madhavan, F. P. Such, J. Lehman, K. O. Stanley et al., Improving exploration in evolution strategies for deep reinforcement learning via a population of novelty-seeking agents, 2017.

G. Cuccu and F. Gomez, When novelty is not enough, European Conference on the Applications of Evolutionary Computation, pp.234-243, 2011.

A. Cully, J. Clune, D. Tarapore, and J. Mouret, Robots that can adapt like animals, Nature, vol.521, issue.7553, pp.503-507, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01158243

A. Cully and Y. Demiris, Quality and diversity optimization: A unifying modular framework, IEEE Transactions on Evolutionary Computation, 2017.

A. de Froissard de Broissia and O. Sigaud, Actor-critic versus direct policy search: a comparison based on sample complexity, 2016.

M. Deisenroth and C. E. Rasmussen, PILCO: A model-based and data-efficient approach to policy search, Proceedings of the 28th International Conference on Machine Learning, pp.465-472, 2011.

M. P. Deisenroth, G. Neumann, and J. Peters, A survey on policy search for robotics, Foundations and Trends in Robotics, vol.2, issue.1-2, pp.1-142, 2013.

S. Doncieux and J. Mouret, Beyond black-box optimization: a review of selective pressures for evolutionary robotics, Evolutionary Intelligence, vol.7, issue.2, pp.71-93, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01150254

Y. Duan, X. Chen, R. Houthooft, J. Schulman, and P. Abbeel, Benchmarking deep reinforcement learning for continuous control, 2016.

D. Floreano, P. Dürr, and C. Mattiussi, Neuroevolution: from architectures to learning, Evolutionary Intelligence, vol.1, issue.1, pp.47-62, 2008.

S. Forestier, Y. Mollard, and P. Oudeyer, Intrinsically motivated goal exploration processes with automatic curriculum learning, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01651233

S. Forestier and P. Oudeyer, Overlapping waves in tool use development: a curiosity-driven computational model, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01384562

M. Fortunato, M. G. Azar, B. Piot, J. Menick, I. Osband et al., Noisy networks for exploration, 2017.

S. Fujimoto, H. van Hoof, and D. Meger, Addressing function approximation error in actor-critic methods, 2018.

T. Gangwani and J. Peng, Genetic policy optimization, 2017.

T. Gangwani and J. Peng, Policy optimization by genetic distillation, 2018.

P. E. Gill, W. Murray, and M. H. Wright, Practical optimization, 1981.

T. Glasmachers, T. Schaul, Y. Sun, D. Wierstra, and J. Schmidhuber, Exponential natural evolution strategies, Proceedings of the 12th Annual Conference on Genetic and Evolutionary Computation, pp.393-400, 2010.

D. E. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning, 1989.

I. Grondman, L. Busoniu, G. A. Lopes, and R. Babuska, A survey of actor-critic reinforcement learning: Standard and natural policy gradients, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol.42, issue.6, pp.1291-1307, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00756747

S. Gu, T. Lillicrap, Z. Ghahramani, R. E. Turner, and S. Levine, Q-Prop: Sample-efficient policy gradient with an off-policy critic, 2016.

S. Gu, T. Lillicrap, Z. Ghahramani, R. E. Turner, B. Schölkopf et al., Interpolated policy gradient: Merging on-policy and off-policy gradient estimation for deep reinforcement learning, 2017.

S. Gu, T. Lillicrap, I. Sutskever, and S. Levine, Continuous deep Q-learning with model-based acceleration, 2016.

T. Haarnoja, A. Zhou, P. Abbeel, and S. Levine, Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor, 2018.

N. Hansen and A. Ostermeier, Completely derandomized self-adaptation in evolution strategies, Evolutionary Computation, vol.9, issue.2, pp.159-195, 2001.

P. Henderson, R. Islam, P. Bachman, J. Pineau, D. Precup et al., Deep reinforcement learning that matters, 2017.

M. Hessel, J. Modayil, H. van Hasselt, T. Schaul, G. Ostrovski et al., Rainbow: Combining improvements in deep reinforcement learning, 2017.

J. Hwangbo, C. Gehring, H. Sommer, R. Siegwart, and J. Buchli, ROCK*: Efficient black-box optimization for policy learning, IEEE-RAS International Conference on Humanoid Robots, IEEE, pp.535-540, 2014.

A. J. Ijspeert, J. Nakanishi, H. Hoffmann, P. Pastor, and S. Schaal, Dynamical movement primitives: learning attractor models for motor behaviors, Neural computation, vol.25, issue.2, pp.328-373, 2013.

R. Islam, P. Henderson, M. Gomrokchi, and D. Precup, Reproducibility of benchmarked deep reinforcement learning tasks for continuous control, Proceedings of the ICML 2017 workshop on Reproducibility in Machine Learning (RML), 2017.

M. Jaderberg, V. Dalibard, S. Osindero, W. M. Czarnecki, J. Donahue et al., Population based training of neural networks, 2017.

M. Jaderberg, V. Mnih, W. M. Czarnecki, T. Schaul, J. Z. Leibo et al., Reinforcement learning with unsupervised auxiliary tasks, 2016.

R. Jonschkowski and O. Brock, State representation learning in robotics: Using prior knowledge about physical interaction, Proceedings of Robotics: Science and Systems, 2014.

K. Kawaguchi, Deep learning without poor local minima, Advances In Neural Information Processing Systems, pp.586-594, 2016.

M. J. Kearns and S. P. Singh, Bias-variance error bounds for temporal difference updates, pp.142-147, 2000.

S. Khadka and K. Tumer, Evolutionary reinforcement learning, 2018.

J. Kober, J. A. Bagnell, and J. Peters, Reinforcement learning in robotics: A survey, The International Journal of Robotics Research, vol.32, issue.11, pp.1238-1274, 2013.

J. Kober and J. Peters, Learning motor primitives for robotics, IEEE International Conference on Robotics and Automation, pp.2112-2118, 2009.

J. R. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection, 1992.

T. D. Kulkarni, K. R. Narasimhan, A. Saeedi, and J. B. Tenenbaum, Hierarchical deep reinforcement learning: Integrating temporal abstraction and intrinsic motivation, 2016.

P. Larrañaga and J. A. Lozano, Estimation of distribution algorithms: A new tool for evolutionary computation, vol.2, 2001.

A. Laversanne-Finot, A. Péré, and P. Oudeyer, Curiosity driven exploration of learned disentangled goal spaces, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01891598

J. Lehman, J. Chen, J. Clune, and K. O. Stanley, ES is more than just a traditional finite-difference approximator, 2017.

J. Lehman and K. O. Stanley, Abandoning objectives: Evolution through the search for novelty alone, Evolutionary Computation, vol.19, issue.2, pp.189-223, 2011.

T. Lesort, N. Díaz-Rodríguez, J. Goudou, and D. Filliat, State representation learning for control: An overview, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01858558

S. Levine and V. Koltun, Guided policy search, Proceedings of the 30th International Conference on Machine Learning, pp.1-9, 2013.

A. Levy, R. Platt, and K. Saenko, Hierarchical reinforcement learning with hindsight, 2018.

T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez et al., Continuous control with deep reinforcement learning, 2015.

D. J. Lizotte, T. Wang, M. H. Bowling, and D. Schuurmans, Automatic gait optimization with Gaussian process regression, IJCAI, vol.7, pp.944-949, 2007.

H. Mania, A. Guy, and B. Recht, Simple random search provides a competitive approach to reinforcement learning, 2018.

R. Martinez-Cantin, K. Tee, and M. McCourt, Policy search using robust Bayesian optimization, Neural Information Processing Systems (NIPS) Workshop on Acting and Interacting in the Real World: Challenges in Robot Learning, 2017.

J. H. Metzen, A. Fabisch, and J. Hansen, Bayesian optimization for contextual policy search, Proceedings of the Second Machine Learning in Planning and Control of Robot Motion Workshop, 2015.

V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. P. Lillicrap et al., Asynchronous methods for deep reinforcement learning, 2016.

V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness et al., Human-level control through deep reinforcement learning, Nature, vol.518, issue.7540, pp.529-533, 2015.

W. H. Montgomery and S. Levine, Guided policy search via approximate mirror descent, Advances in Neural Information Processing Systems, pp.4008-4016, 2016.

O. Nachum, S. Gu, H. Lee, and S. Levine, Data-efficient hierarchical reinforcement learning, 2018.

A. Najar, O. Sigaud, and M. Chetouani, Training a robot with evaluative feedback and unlabeled guidance signals, 25th IEEE International Symposium on Robot and Human Interactive Communication. IEEE, pp.261-266, 2016.

G. Neumann, Variational inference for policy search in changing situations, Proceedings of the 28th International Conference on Machine Learning, pp.817-824, 2011.

B. O'Donoghue, R. Munos, K. Kavukcuoglu, and V. Mnih, Combining policy gradient and Q-learning, 2016.

M. Pelikan, D. E. Goldberg, and E. Cantú-Paz, BOA: The Bayesian optimization algorithm, Proceedings of the 1st Annual Conference on Genetic and Evolutionary Computation, vol.1, pp.525-532, 1999.

A. Péré, S. Forestier, O. Sigaud, and P. Oudeyer, Unsupervised learning of goal spaces for intrinsically motivated goal exploration, International Conference on Learning Representations (ICLR), 2018.
URL : https://hal.archives-ouvertes.fr/hal-01891758

J. Peters, K. Mülling, and Y. Altun, Relative entropy policy search, pp.1607-1612, 2010.

J. Peters and S. Schaal, Natural actor-critic, Neurocomputing, vol.71, issue.7-9, pp.1180-1190, 2008.

J. Peters and S. Schaal, Reinforcement learning of motor skills with policy gradients, Neural networks, vol.21, issue.4, pp.682-697, 2008.

F. P. Such, V. Madhavan, E. Conti, J. Lehman, K. O. Stanley et al., Deep neuroevolution: Genetic algorithms are a competitive alternative for training deep neural networks for reinforcement learning, 2017.

T. Pierrot, N. Perrin, and O. Sigaud, First-order and second-order variants of the gradient descent: a unified framework, 2018.

M. Plappert, R. Houthooft, P. Dhariwal, S. Sidor, R. Y. Chen et al., Parameter space noise for exploration, 2017.

A. Pourchot, N. Perrin, and O. Sigaud, Importance mixing: Improving sample reuse in evolutionary policy search methods, 2018.

A. Pourchot and O. Sigaud, CEM-RL: Combining evolutionary and gradient-based methods for policy search, 2018.

J. K. Pugh, L. Soros, P. A. Szerlip, and K. O. Stanley, Confronting the challenge of quality diversity, Proceedings of the 2015 Annual Conference on Genetic and Evolutionary Computation, pp.967-974, 2015.

A. Raffin, S. Höfer, R. Jonschkowski, O. Brock, and F. Stulp, Unsupervised learning of state representations for multiple tasks, 2016.

L. Rastrigin, The convergence of the random search method in the extremal control of a many parameter system, Automation and Remote Control, vol.24, issue.10, pp.1337-1342, 1963.

M. Riedmiller, R. Hafner, T. Lampe, M. Neunert, J. Degrave et al., Learning by playing: Solving sparse reward tasks from scratch, 2018.

M. Riedmiller, J. Peters, and S. Schaal, Evaluation of policy gradient methods and variants on the cartpole benchmark, IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning (ADPRL), 2008.

R. Rubinstein and D. Kroese, The Cross-Entropy Method: A Unified Approach to Combinatorial Optimization, Monte-Carlo Simulation, and Machine Learning, 2004.

T. Salimans, J. Ho, X. Chen, and I. Sutskever, Evolution strategies as a scalable alternative to reinforcement learning, 2017.

T. Schaul, J. Quan, I. Antonoglou, and D. Silver, Prioritized experience replay, 2015.

J. Schulman, S. Levine, P. Moritz, M. I. Jordan, and P. Abbeel, Trust region policy optimization, 2015.

J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, Proximal policy optimization algorithms, 2017.

F. Sehnke, C. Osendorfer, T. Rückstieß, A. Graves, J. Peters et al., Parameter-exploring policy gradients, Neural Networks, vol.23, issue.4, pp.551-559, 2010.

E. Shelhamer, P. Mahmoudieh, M. Argus, and T. Darrell, Loss is its own reward: Self-supervision for reinforcement learning, 2016.

O. Sigaud and O. Buffet, Markov Decision Processes in Artificial Intelligence, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00432735

D. Silver, G. Lever, N. Heess, T. Degris, D. Wierstra et al., Deterministic policy gradient algorithms, Proceedings of the 30th International Conference in Machine Learning, 2014.
URL : https://hal.archives-ouvertes.fr/hal-00938992

K. O. Stanley and R. Miikkulainen, Efficient evolution of neural network topologies, Proceedings of the 2002 Congress on Evolutionary Computation (CEC'02), vol.2, pp.1757-1762, 2002.

F. Stulp and O. Sigaud, Path integral policy improvement with covariance matrix adaptation, Proceedings of the 29th International Conference on Machine Learning, pp.1-8, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00789391

F. Stulp and O. Sigaud, Policy improvement methods: Between black-box optimization and episodic reinforcement learning, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00738463

F. Stulp and O. Sigaud, Robot skill learning: From reinforcement learning to evolution strategies, Paladyn Journal of Behavioral Robotics, vol.4, issue.1, pp.49-61, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00922132

F. Stulp and O. Sigaud, Many regression algorithms, one unified model: A review, Neural Networks, vol.69, pp.60-79, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01162281

Y. Sun, D. Wierstra, T. Schaul, and J. Schmidhuber, Efficient natural evolution strategies, Proceedings of the 11th Annual Conference on Genetic and Evolutionary Computation, pp.539-546, 2009.

R. S. Sutton, Learning to Predict by the Method of Temporal Differences, Machine Learning, vol.3, pp.9-44, 1988.

R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 1998.

Y. Tang and A. Kucukelbir, Variational deep Q network, 2017.

E. Theodorou, J. Buchli, and S. Schaal, A generalized path integral control approach to reinforcement learning, Journal of Machine Learning Research, vol.11, pp.3137-3181, 2010.

S. Thrun and T. M. Mitchell, Lifelong robot learning, Robotics and autonomous systems, vol.15, issue.1-2, pp.25-46, 1995.

J. Togelius, T. Schaul, D. Wierstra, C. Igel, F. Gomez et al., Ontogenetic and phylogenetic reinforcement learning, Künstliche Intelligenz, vol.23, issue.3, pp.30-33, 2009.

V. Veeriah, J. Oh, and S. Singh, Many-goals reinforcement learning, 2018.

A. S. Vezhnevets, S. Osindero, T. Schaul, N. Heess, M. Jaderberg et al., Feudal networks for hierarchical reinforcement learning, 2017.

J. X. Wang, Z. Kurth-nelson, D. Tirumala, H. Soyer, J. Z. Leibo et al., Learning to reinforcement learn, 2016.

Z. Wang, V. Bapst, N. Heess, V. Mnih, R. Munos et al., Sample efficient actor-critic with experience replay, 2016.

D. Wierstra, T. Schaul, J. Peters, and J. Schmidhuber, Natural evolution strategies, pp.3381-3387, 2008.

J. K. Williams and S. P. Singh, Experimental results on learning stochastic memoryless policies for partially observable Markov decision processes, pp.1073-1080, 1998.

R. J. Williams, Simple statistical gradient-following algorithms for connectionist reinforcement learning, Machine Learning, vol.8, issue.3-4, pp.229-256, 1992.

A. Wilson, A. Fern, and P. Tadepalli, Using trajectory data to improve Bayesian optimization for reinforcement learning, The Journal of Machine Learning Research, vol.15, issue.1, pp.253-282, 2014.

Y. Wu, E. Mansimov, S. Liao, R. Grosse, and J. Ba, Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation, 2017.

Y. Yang and T. M. Hospedales, A unified perspective on multi-domain and multi-task learning, 2014.

X. Zhang, J. Clune, and K. O. Stanley, On the relationship between the OpenAI evolution strategy and stochastic gradient descent, 2017.

M. Zimmer and S. Doncieux, Bootstrapping Q-learning for robotics from neuro-evolution results, IEEE Transactions on Cognitive and Developmental Systems, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01494744