, Lazy learning, pp.7-10, 1997.
Bidirectional relation between CMA evolution strategies and natural evolution strategies, International Conference on Parallel Problem Solving from Nature, pp.154-163, 2010. ,
A survey of robot learning from demonstration, Robotics and Autonomous Systems, vol.57, pp.469-483, 2009. ,
Information-geometric optimization algorithms: A unifying picture via invariance principles, 2011. ,
URL : https://hal.archives-ouvertes.fr/hal-00601503
A brief survey of deep reinforcement learning, 2017. ,
Efficient exploration through Bayesian deep Q-networks, 2018. ,
Evolutionary algorithms in theory and practice: evolution strategies, evolutionary programming, genetic algorithms, 1996. ,
The option-critic architecture, pp.1726-1734, 2017. ,
Reinforcement learning in continuous time: Advantage updating, Proceedings of the International Conference on Neural Networks, 1994. ,
Intrinsically motivated goal exploration for active motor learning in robots: A case study, IEEE/RSJ International Conference on Intelligent Robots and Systems, 2010. ,
URL : https://hal.archives-ouvertes.fr/inria-00541769
Active learning of inverse models with intrinsically motivated goal exploration in robots, Robotics and Autonomous Systems, vol.61, issue.1, pp.49-73, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00788440
The effects of task difficulty, novelty and the size of the search space on intrinsically motivated exploration, Frontiers in neuroscience, vol.8, p.317, 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-01087227
Distributional policy gradient, pp.1-16, 2018. ,
Infinite-horizon policy-gradient estimation, Journal of Artificial Intelligence Research, vol.15, pp.319-350, 2001. ,
A distributional perspective on reinforcement learning, 2017. ,
Incremental natural actor-critic algorithms, Advances in Neural Information Processing Systems, 2007. ,
Stochastic gradient descent tricks, Neural networks: Tricks of the trade, pp.421-436, 2012. ,
A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning, 2010. ,
Bayesian gait optimization for bipedal locomotion, International Conference on Learning and Intelligent Optimization, pp.274-290, 2014. ,
Black-box data-efficient policy search for robotics, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01576683
GEP-PG: Decoupling exploration and exploitation in deep reinforcement learning algorithms, 2018. ,
Improving exploration in evolution strategies for deep reinforcement learning via a population of novelty-seeking agents, 2017. ,
When novelty is not enough, European Conference on the Applications of Evolutionary Computation, pp.234-243, 2011. ,
Robots that can adapt like animals, Nature, vol.521, issue.7553, pp.503-507, 2015. ,
URL : https://hal.archives-ouvertes.fr/hal-01158243
Quality and diversity optimization: A unifying modular framework, IEEE Transactions on Evolutionary Computation, 2017. ,
Actor-critic versus direct policy search: a comparison based on sample complexity, 2016. ,
PILCO: A model-based and data-efficient approach to policy search, Proceedings of the 28th International Conference on Machine Learning, pp.465-472, 2011. ,
A survey on policy search for robotics, Foundations and Trends in Robotics, vol.2, issue.1-2, pp.1-142, 2013. ,
Beyond black-box optimization: a review of selective pressures for evolutionary robotics, Evolutionary Intelligence, vol.7, issue.2, pp.71-93, 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-01150254
Benchmarking deep reinforcement learning for continuous control, 2016. ,
Neuroevolution: from architectures to learning, Evolutionary Intelligence, vol.1, issue.1, pp.47-62, 2008. ,
Intrinsically motivated goal exploration processes with automatic curriculum learning, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01651233
Overlapping waves in tool use development: a curiosity-driven computational model, 2016. ,
URL : https://hal.archives-ouvertes.fr/hal-01384562
, Noisy networks for exploration, 2017.
Addressing function approximation error in actor-critic methods, 2018. ,
, Genetic policy optimization, 2017.
Policy optimization by genetic distillation, 2018. ,
Practical optimization, 1981. ,
Exponential natural evolution strategies, Proceedings of the 12th annual conference on Genetic and evolutionary computation, pp.393-400, 2010. ,
Genetic Algorithms in Search, Optimization, and Machine Learning, 1989. ,
A survey of actor-critic reinforcement learning: Standard and natural policy gradients, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol.42, issue.6, pp.1291-1307, 2012. ,
URL : https://hal.archives-ouvertes.fr/hal-00756747
Q-prop: Sample-efficient policy gradient with an off-policy critic, 2016. ,
Interpolated policy gradient: Merging on-policy and off-policy gradient estimation for deep reinforcement learning, 2017. ,
, Continuous deep Q-learning with model-based acceleration, 2016.
Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor, 2018. ,
Completely derandomized self-adaptation in evolution strategies, Evolutionary Computation, vol.9, issue.2, pp.159-195, 2001. ,
, Deep reinforcement learning that matters, 2017.
, Rainbow: Combining improvements in deep reinforcement learning, 2017.
ROCK*: Efficient black-box optimization for policy learning, IEEE-RAS International Conference on Humanoid Robots, IEEE, pp.535-540, 2014. ,
Dynamical movement primitives: learning attractor models for motor behaviors, Neural computation, vol.25, issue.2, pp.328-373, 2013. ,
Reproducibility of benchmarked deep reinforcement learning tasks for continuous control, Proceedings of the ICML 2017 workshop on Reproducibility in Machine Learning (RML), 2017. ,
, Population based training of neural networks, 2017.
Reinforcement learning with unsupervised auxiliary tasks, 2016. ,
State representation learning in robotics: Using prior knowledge about physical interaction, Proceedings of Robotics, Science and Systems, 2014. ,
Deep learning without poor local minima, Advances In Neural Information Processing Systems, pp.586-594, 2016. ,
Bias-variance error bounds for temporal difference updates, pp.142-147, 2000. ,
Evolutionary reinforcement learning, 2018. ,
Reinforcement learning in robotics: A survey, The International Journal of Robotics Research, vol.32, issue.11, pp.1238-1274, 2013. ,
Learning motor primitives for robotics, IEEE International Conference on Robotics and Automation, pp.2112-2118, 2009. ,
Genetic Programming: On the Programming of Computers by Means of Natural Selection, 1992. ,
Hierarchical deep reinforcement learning: Integrating temporal abstraction and intrinsic motivation, 2016. ,
Estimation of distribution algorithms: A new tool for evolutionary computation, vol.2, 2001. ,
Curiosity driven exploration of learned disentangled goal spaces, 2018. ,
URL : https://hal.archives-ouvertes.fr/hal-01891598
ES is more than just a traditional finite-difference approximator, 2017. ,
Abandoning objectives: Evolution through the search for novelty alone, Evolutionary computation, vol.19, issue.2, pp.189-223, 2011. ,
State representation learning for control: An overview, 2018. ,
URL : https://hal.archives-ouvertes.fr/hal-01858558
Guided policy search, Proceedings of the 30th International Conference on Machine Learning, pp.1-9, 2013. ,
Hierarchical reinforcement learning with hindsight, 2018. ,
, Continuous control with deep reinforcement learning, 2015.
Automatic gait optimization with Gaussian process regression, IJCAI, vol.7, pp.944-949, 2007. ,
Simple random search provides a competitive approach to reinforcement learning, 2018. ,
Policy search using robust Bayesian optimization, Neural Information Processing Systems (NIPS) Workshop on Acting and Interacting in the Real World: Challenges in Robot Learning, 2017. ,
Bayesian optimization for contextual policy search, Proceedings of the Second Machine Learning in Planning and Control of Robot Motion Workshop, 2015. ,
Asynchronous methods for deep reinforcement learning, 2016. ,
Human-level control through deep reinforcement learning, Nature, vol.518, issue.7540, pp.529-533, 2015. ,
Guided policy search via approximate mirror descent, Advances in Neural Information Processing Systems, pp.4008-4016, 2016. ,
Data-efficient hierarchical reinforcement learning, 2018. ,
Training a robot with evaluative feedback and unlabeled guidance signals, 25th IEEE International Symposium on Robot and Human Interactive Communication. IEEE, pp.261-266, 2016. ,
Variational inference for policy search in changing situations, Proceedings of the 28th International Conference on Machine Learning, pp.817-824, 2011. ,
Combining policy gradient and Q-learning, 2016. ,
BOA: The Bayesian optimization algorithm, Proceedings of the 1st Annual Conference on Genetic and Evolutionary Computation, vol.1, pp.525-532, 1999. ,
Unsupervised learning of goal spaces for intrinsically motivated goal exploration, International Conference on Learning Representations (ICLR), 2018. ,
URL : https://hal.archives-ouvertes.fr/hal-01891758
Relative entropy policy search, pp.1607-1612, 2010. ,
Natural actor-critic, Neurocomputing, vol.71, issue.7-9, pp.1180-1190, 2008. ,
Reinforcement learning of motor skills with policy gradients, Neural networks, vol.21, issue.4, pp.682-697, 2008. ,
Deep neuroevolution: Genetic algorithms are a competitive alternative for training deep neural networks for reinforcement learning, 2017. ,
First-order and second-order variants of the gradient descent: a unified framework, 2018. ,
, Parameter space noise for exploration, 2017.
Importance mixing: Improving sample reuse in evolutionary policy search methods, 2018. ,
CEM-RL: Combining evolutionary and gradient-based methods for policy search, 2018. ,
Confronting the challenge of quality diversity, Proceedings of the 2015 Annual Conference on Genetic and Evolutionary Computation, pp.967-974, 2015. ,
Unsupervised learning of state representations for multiple tasks, 2016. ,
The convergence of the random search method in the extremal control of a many parameter system, Automation and Remote Control, vol.24, issue.10, pp.1337-1342, 1963. ,
Learning by playing-solving sparse reward tasks from scratch, 2018. ,
Evaluation of policy gradient methods and variants on the cartpole benchmark, IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning (ADPRL), 2008. ,
The Cross-Entropy Method: A Unified Approach to Combinatorial Optimization, Monte-Carlo Simulation, and Machine Learning, 2004. ,
Evolution strategies as a scalable alternative to reinforcement learning, 2017. ,
, Prioritized experience replay, 2015.
Trust region policy optimization, 2015. ,
, Proximal policy optimization algorithms, 2017.
Parameter-exploring policy gradients, Neural Networks, vol.23, issue.4, pp.551-559, 2010. ,
Loss is its own reward: Self-supervision for reinforcement learning, 2016. ,
Markov Decision Processes in Artificial Intelligence, 2010. ,
URL : https://hal.archives-ouvertes.fr/inria-00432735
Deterministic policy gradient algorithms, Proceedings of the 30th International Conference in Machine Learning, 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-00938992
Efficient evolution of neural network topologies, Proceedings of the 2002 Congress on Evolutionary Computation (CEC'02), vol.2, pp.1757-1762, 2002. ,
Path integral policy improvement with covariance matrix adaptation, Proceedings of the 29th International Conference on Machine Learning, pp.1-8, 2012. ,
URL : https://hal.archives-ouvertes.fr/hal-00789391
Policy improvement methods: Between black-box optimization and episodic reinforcement learning, 2012. ,
URL : https://hal.archives-ouvertes.fr/hal-00738463
Robot skill learning: From reinforcement learning to evolution strategies, Paladyn Journal of Behavioral Robotics, vol.4, issue.1, pp.49-61, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00922132
Many regression algorithms, one unified model: A review, Neural Networks, vol.69, pp.60-79, 2015. ,
URL : https://hal.archives-ouvertes.fr/hal-01162281
Efficient natural evolution strategies, Proceedings of the 11th Annual conference on Genetic and evolutionary computation, pp.539-546, 2009. ,
Learning to Predict by the Method of Temporal Differences, Machine Learning, vol.3, pp.9-44, 1988. ,
Reinforcement Learning: An Introduction, 1998. ,
A generalized path integral control approach to reinforcement learning, Journal of Machine Learning Research, vol.11, pp.3137-3181, 2010. ,
Lifelong robot learning, Robotics and autonomous systems, vol.15, issue.1-2, pp.25-46, 1995. ,
Ontogenetic and phylogenetic reinforcement learning, Künstliche Intelligenz, vol.23, issue.3, pp.30-33, 2009. ,
Many-goals reinforcement learning, 2018. ,
Feudal networks for hierarchical reinforcement learning, 2017. ,
, Learning to reinforcement learn, 2016.
Sample efficient actor-critic with experience replay, 2016. ,
Natural evolution strategies, pp.3381-3387, 2008. ,
Experimental results on learning stochastic memoryless policies for partially observable Markov decision processes, pp.1073-1080, 1998. ,
Simple statistical gradient-following algorithms for connectionist reinforcement learning, Machine Learning, vol.8, issue.3-4, pp.229-256, 1992. ,
Using trajectory data to improve Bayesian optimization for reinforcement learning, The Journal of Machine Learning Research, vol.15, issue.1, pp.253-282, 2014. ,
Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation, 2017. ,
A unified perspective on multi-domain and multi-task learning, 2014. ,
On the relationship between the OpenAI evolution strategy and stochastic gradient descent, 2017. ,
Bootstrapping Q-learning for robotics from neuro-evolution results, IEEE Transactions on Cognitive and Developmental Systems, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01494744