B. D. Argall, S. Chernova, M. Veloso, and B. Browning, A survey of robot learning from demonstration, Robotics and Autonomous Systems, vol.57, issue.5, pp.469-483, 2009.

A. G. Barto, R. S. Sutton, and C. W. Anderson, Neuronlike adaptive elements that can solve difficult learning control problems, IEEE Transactions on Systems, Man, and Cybernetics, vol.SMC-13, issue.5, pp.834-846, 1983.

H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool, Speeded-Up Robust Features (SURF), Computer Vision and Image Understanding, vol.110, issue.3, pp.346-359, 2008.

S. R. Branavan, H. Chen, L. S. Zettlemoyer, and R. Barzilay, Reinforcement learning for mapping instructions to actions, Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1 - ACL-IJCNLP '09, vol.1, pp.82-90, 2009.

S. R. Branavan, L. S. Zettlemoyer, and R. Barzilay, Reading Between the Lines: Learning to Map High-level Instructions to Commands, Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, ACL '10, pp.1268-1277, 2010.

S. Chernova and A. L. Thomaz, Robot Learning from Human Teachers, Synthesis Lectures on Artificial Intelligence and Machine Learning, vol.8, issue.3, pp.1-121, 2014.

J. A. Clouse and P. E. Utgoff, A Teaching Method for Reinforcement Learning, Machine Learning Proceedings 1992, pp.92-101, 1992.

F. Cruz, J. Twiefel, S. Magg, C. Weber, and S. Wermter, Interactive reinforcement learning through speech guidance in a domestic scenario, 2015 International Joint Conference on Neural Networks (IJCNN), pp.1-8, 2015.

S. Doncieux, N. Bredeche, J. Mouret, and A. E. Eiben, Evolutionary Robotics: What, Why, and Where to, Frontiers in Robotics and AI, vol.2, p.4, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01131267

S. Feng, E. Whitman, X. Xinjilefu, and C. G. Atkeson, Optimization based full body control for the atlas robot, 2014 IEEE-RAS International Conference on Humanoid Robots, pp.120-127, 2014.

J. García and F. Fernández, A Comprehensive Survey on Safe Reinforcement Learning, Journal of Machine Learning Research, vol.16, issue.1, pp.1437-1480, 2015.

S. Griffith, K. Subramanian, J. Scholz, C. L. Isbell, and A. Thomaz, Policy Shaping: Integrating Human Feedback with Reinforcement Learning, Advances in Neural Information Processing Systems 26, pp.2625-2633, 2013.

J. Grizou, M. Lopes, and P. Y. Oudeyer, Robot learning simultaneously a task and how to interpret human instructions, 2013 IEEE Third Joint International Conference on Development and Learning and Epigenetic Robotics (ICDL), pp.1-8, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00850703

M. Grześ and D. Kudenko, Online learning of shaping rewards in reinforcement learning, Neural Networks, vol.23, issue.4, pp.541-550, 2010.

M. K. Ho, F. A. Cushman, M. L. Littman, and J. L. Austerweil, People Teach with Rewards and Punishments as Communication not Reinforcements, Proceedings of the 37th Annual Meeting of the Cognitive Science Society, 2018.

C. Isbell, C. R. Shelton, M. Kearns, S. Singh, and P. Stone, A social reinforcement learning agent, Proceedings of the fifth international conference on Autonomous agents - AGENTS '01, pp.377-384, 2001.

W. B. Knox, C. Breazeal, and P. Stone, Learning from feedback on actions past and intended, Proceedings of 7th ACM/IEEE International Conference on Human-Robot Interaction, Late-Breaking Reports Session (HRI 2012), 2012.

W. B. Knox and P. Stone, Interactively shaping agents via human reinforcement, Proceedings of the fifth international conference on Knowledge capture - K-CAP '09, pp.9-16, 2009.

W. B. Knox and P. Stone, Reinforcement learning from human reward: Discounting in episodic tasks, 2012 IEEE RO-MAN: The 21st IEEE International Symposium on Robot and Human Interactive Communication, pp.878-885, 2012.

W. B. Knox and P. Stone, Reinforcement learning from simultaneous human and MDP reward, Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems, vol.1, pp.475-482, 2012.

W. B. Knox, P. Stone, and C. Breazeal, Training a Robot via Human Feedback: A Case Study, Social Robotics, vol.8239, pp.460-470, 2013.

J. Kober, J. A. Bagnell, and J. Peters, Reinforcement learning in robotics: A survey, The International Journal of Robotics Research, vol.32, issue.11, pp.1238-1274, 2013.

G. Konidaris and A. Barto, Autonomous shaping, Proceedings of the 23rd international conference on Machine learning - ICML '06, pp.489-496, 2006.

G. Konidaris and G. Hayes, Estimating Future Reward in Reinforcement Learning Animats using Associative Learning, From Animals to Animats 8, pp.297-304, 2004.

R. Loftin, J. MacGlashan, B. Peng, M. E. Taylor, M. L. Littman et al., A Strategy-aware Technique for Learning Behaviors from Discrete Human Feedback, Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, AAAI'14, pp.937-943, 2014.

R. Loftin, B. Peng, J. MacGlashan, M. L. Littman, M. E. Taylor et al., Learning behaviors via human-delivered discrete feedback: modeling implicit feedback strategies to speed up learning, Autonomous Agents and Multi-Agent Systems, vol.30, issue.1, pp.30-59, 2015.

J. MacGlashan, M. Babeş-Vroman, M. desJardins, M. Littman, S. Muresan et al., Grounding English Commands to Reward Functions, Robotics: Science and Systems XI, 2015.

J. MacGlashan, M. Ho, R. Loftin, B. Peng, G. Wang et al., Interactive learning from policy-dependent human feedback, ICML, 2017.

B. Marthi, Automatic shaping and decomposition of reward functions, Proceedings of the 24th international conference on Machine learning - ICML '07, pp.601-608, 2007.

K. W. Mathewson and P. M. Pilarski, Simultaneous Control and Human Feedback in the Training of a Robotic Agent with Actor-Critic Reinforcement Learning, 2016.

V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness et al., Human-level control through deep reinforcement learning, Nature, vol.518, issue.7540, pp.529-533, 2015.

M. N. Nicolescu and M. J. Mataric, Natural methods for robot task learning, Proceedings of the second international joint conference on Autonomous agents and multiagent systems - AAMAS '03, pp.241-248, 2003.

K. V. Pradyot, S. S. Manimaran, and B. Ravindran, Instructing a Reinforcement Learner, Proceedings of the Twenty-Fifth International Florida Artificial Intelligence Research Society Conference, pp.23-25, 2012.

K. V. Pradyot, S. S. Manimaran, B. Ravindran, and S. Natarajan, Integrating Human Instructions and Reinforcement Learners: An SRL Approach, Proceedings of the UAI workshop on Statistical Relational AI, 2012.

M. Quigley, K. Conley, B. P. Gerkey, J. Faust, T. Foote et al., ROS: an open-source Robot Operating System, ICRA Workshop on Open Source Software, 2009.

M. T. Rosenstein and A. G. Barto, Supervised Actor-Critic Reinforcement Learning, Handbook of Learning and Approximate Dynamic Programming (J. Si, A. Barto, W. Powell et al., eds.), pp.359-380, 2004.

P. E. Rybski, K. Yoon, J. Stolarz, and M. M. Veloso, Interactive robot task training through dialog and demonstration, Proceeding of the ACM/IEEE international conference on Human-robot interaction - HRI '07, pp.49-56, 2007.

O. Sigaud and O. Buffet, Markov Decision Processes in Artificial Intelligence, 2013.
URL : https://hal.archives-ouvertes.fr/inria-00432735

D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre et al., Mastering the game of Go with deep neural networks and tree search, Nature, vol.529, issue.7587, pp.484-489, 2016.

H. B. Suay and S. Chernova, Effect of human guidance and state space size on Interactive Reinforcement Learning, 2011 RO-MAN, pp.1-6, 2011.

R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 1998.

A. C. Tenorio-González, E. F. Morales, and L. Villaseñor-Pineda, Dynamic Reward Shaping: Training a Robot by Voice, Advances in Artificial Intelligence IBERAMIA 2010: 12th Ibero-American Conference on AI, pp.483-492, 2010.

A. L. Thomaz and C. Breazeal, Reinforcement Learning with Human Teachers: Evidence of Feedback and Guidance with Implications for Learning Performance, Proceedings of the 21st National Conference on Artificial Intelligence, vol.1, pp.1000-1005, 2006.

A. L. Thomaz and C. Breazeal, Robot learning via socially guided exploration, 2007 IEEE 6th International Conference on Development and Learning, pp.82-87, 2007.

A. L. Thomaz, G. Hoffman, and C. Breazeal, Reinforcement Learning with Human Teachers: Understanding How People Want to Teach Robots, ROMAN 2006 - The 15th IEEE International Symposium on Robot and Human Interactive Communication, pp.352-357, 2006.

P. E. Utgoff and J. A. Clouse, Ninth Annual Conference on Uncertainty in Artificial Intelligence, Artificial Intelligence in Medicine, vol.5, issue.1, pp.83-84, 1993.

A. Vogel and D. Jurafsky, Learning to Follow Navigational Directions, Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, ACL '10, pp.806-814, 2010.

A. Vollmer, B. Wrede, K. J. Rohlfing, and P. Oudeyer, Pragmatic Frames for Teaching and Learning in Human-Robot Interaction: Review and Challenges, Frontiers in Neurorobotics, vol.10, issue.10, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01376455

C. J. Watkins and P. Dayan, Q-learning, Machine Learning, vol.8, issue.3/4, pp.279-292, 1992.