F. G. Ashby, B. O. Turner, and J. C. Horvitz, Cortical and basal ganglia contributions to habit learning and automaticity, Trends in Cognitive Sciences, vol.14, issue.5, 2010.
DOI : 10.1016/j.tics.2010.02.001
URL : http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2862890

A. Baddeley, Working memory, Science, vol.255, issue.5044, pp.556-559, 1992.
DOI : 10.1126/science.1736359

B. W. Balleine, M. R. Delgado, and O. Hikosaka, The Role of the Dorsal Striatum in Reward and Decision-Making, Journal of Neuroscience, vol.27, issue.31, pp.8161-8165, 2007.
DOI : 10.1523/JNEUROSCI.1554-07.2007

B. W. Balleine and J. P. O'Doherty, Human and Rodent Homologies in Action Control: Corticostriatal Determinants of Goal-Directed and Habitual Action, Neuropsychopharmacology, vol.35, issue.1, pp.48-69, 2010.
DOI : 10.1038/npp.2009.131

M. Botvinick and A. Weinstein, Model-based hierarchical reinforcement learning and human action control, Philosophical Transactions of the Royal Society B: Biological Sciences, vol.369, issue.1655, 2014.
DOI : 10.1098/rstb.2013.0480
URL : http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4186233

A. Brovelli, N. Laksiri, B. Nazarian, M. Meunier, and D. Boussaoud, Understanding the Neural Computations of Arbitrary Visuomotor Learning through fMRI and Associative Learning Theory, Cerebral Cortex, vol.18, issue.7, pp.1485-1495, 2008.
DOI : 10.1093/cercor/bhm198

A. Brovelli, B. Nazarian, M. Meunier, and D. Boussaoud, Differential roles of caudate nucleus and putamen during instrumental learning, NeuroImage, vol.57, issue.4, 2011.
DOI : 10.1016/j.neuroimage.2011.05.059

K. Caluwaerts, M. Staffa, S. N'Guyen, C. Grand, L. Dollé et al., A biologically inspired meta-control navigation system for the Psikharpax rat robot, Bioinspiration & Biomimetics, vol.7, issue.2, pp.1-29, 2012.
DOI : 10.1088/1748-3182/7/2/025009
URL : https://hal.archives-ouvertes.fr/hal-01000945

R. H. S. Carpenter, B. A. J. Reddi, and A. J. Anderson, A simple two-stage model predicts response time distributions, The Journal of Physiology, vol.587, issue.16, pp.4051-4062, 2009.
DOI : 10.1113/jphysiol.2009.173955

R. Chavarriaga, T. Strösslin, D. Sheynikhovich, and W. Gerstner, A Computational Model of Parallel Navigation Systems in Rodents, Neuroinformatics, vol.3, issue.3, pp.223-241, 2005.
DOI : 10.1385/NI:3:3:223

A. Collins and M. Frank, How much of reinforcement learning is working memory, not reinforcement learning? A behavioral, computational, and neurogenetic analysis, European Journal of Neuroscience, vol.35, issue.7, 2012.
DOI : 10.1111/j.1460-9568.2011.07980.x

N. Daw, Trial-by-trial data analysis using computational models, in Decision Making, Affect, and Learning: Attention and Performance XXIII, pp.1-26, 2011.

N. D. Daw, Y. Niv, and P. Dayan, Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control, Nature Neuroscience, vol.8, issue.12, pp.1704-1711, 2005.
DOI : 10.1038/nn1560

K. Deb, S. Agrawal, A. Pratap, and T. Meyarivan, A fast elitist nondominated sorting genetic algorithm for multi-objective optimization: NSGA-II, 2000.

A. Dezfouli and B. W. Balleine, Habits, action sequences and reinforcement learning, Eur. J. Neurosci, vol.35, pp.1036-1051, 2012.

A. Dickinson, Actions and habits: the development of behavioural autonomy, Philos. Trans. R. Soc. Lond. B Biol. Sci, vol.308, 1985.

A. Dickinson and B. Balleine, Actions and responses: the dual psychology of behaviour, Spatial Representation: Problems in Philosophy and Psychology, pp.277-293, 1993.

A. Dickinson and B. Balleine, Motivational control of goal-directed action, Animal Learning & Behavior, vol.22, issue.1, pp.1-18, 1994.
DOI : 10.3758/BF03199951

L. Dollé, D. Sheynikhovich, B. Girard, R. Chavarriaga, A. Guillot et al., Path planning versus cue responding: a bio-inspired model of switching between navigation strategies, Biological Cybernetics, vol.103, issue.4, pp.299-317, 2010.
DOI : 10.1007/s00422-010-0400-z

M. Donoso, A. G. Collins, and E. Koechlin, Foundations of human reasoning in the prefrontal cortex, Science, vol.344, issue.6191, pp.1481-1486, 2014.
DOI : 10.1126/science.1252254

K. Doya, What are the computations of the cerebellum, the basal ganglia and the cerebral cortex?, Neural Netw, pp.961-974, 1999.

A. Emrouznejad and M. Marra, Ordered Weighted Averaging Operators 1988-2014: A Citation-Based Literature Survey, International Journal of Intelligent Systems, vol.37, issue.9, pp.994-1014, 2014.
DOI : 10.1002/int.21673

K. Enomoto and N. Matsumoto, Dopamine neurons learn to encode the long-term value of multiple future rewards, Proceedings of the National Academy of Sciences, vol.108, issue.37, pp.15462-15467, 2011.
DOI : 10.1073/pnas.1014457108

M. Geist, O. Pietquin, and G. Fricout, Kalman Temporal Differences: The deterministic case, 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, pp.185-192, 2009.
DOI : 10.1109/ADPRL.2009.4927543
URL : https://hal.archives-ouvertes.fr/hal-00380870

J. Gläscher, N. Daw, P. Dayan, and J. P. O'Doherty, States versus Rewards: Dissociable Neural Prediction Error Signals Underlying Model-Based and Model-Free Reinforcement Learning, Neuron, vol.66, issue.4, pp.585-595, 2010.
DOI : 10.1016/j.neuron.2010.04.016

A. Graybiel, Habits, Rituals, and the Evaluative Brain, Annual Review of Neuroscience, vol.31, issue.1, pp.359-387, 2008.
DOI : 10.1146/annurev.neuro.29.051605.112851

A. Jauffret, N. Cuperlier, P. Gaussier, and P. Tarroux, From self-assessment to frustration, a small step towards autonomy in robotic navigation, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00862838

M. Keramati, A. Dezfouli, and P. Piray, Speed/Accuracy Trade-Off between the Habitual and the Goal-Directed Processes, PLoS Computational Biology, vol.7, issue.5, 2011.
DOI : 10.1371/journal.pcbi.1002055

M. Khamassi and M. Humphries, Integrating cortico-limbic-basal ganglia architectures for learning model-based and model-free navigation strategies, Front. Behav. Neurosci, 2012.

M. Khamassi, R. Quilodran, P. Enel, P. F. Dominey, and E. Procyk, Behavioral Regulation and the Modulation of Information Coding in the Lateral Prefrontal and Cingulate Cortex, Cerebral Cortex, vol.25, issue.9, pp.3197-3218, 2015.
DOI : 10.1093/cercor/bhu114
URL : https://hal.archives-ouvertes.fr/hal-01219972

S. W. Lee, S. Shimojo, and J. P. O'Doherty, Neural Computations Underlying Arbitration between Model-Based and Model-free Learning, Neuron, vol.81, issue.3, 2014.
DOI : 10.1016/j.neuron.2013.11.028
URL : http://doi.org/10.1016/j.neuron.2013.11.028

F. Lesaint, O. Sigaud, S. B. Flagel, T. E. Robinson, and M. Khamassi, Modelling Individual Differences in the Form of Pavlovian Conditioned Approach Responses: A Dual Learning Systems Approach with Factored Representations, PLoS Computational Biology, vol.10, issue.2, 2014.
DOI : 10.1371/journal.pcbi.1003466

J. Liénard and B. Girard, A biologically constrained model of the whole basal ganglia addressing the paradoxes of connections and selection, Journal of Computational Neuroscience, vol.4, issue.11, pp.445-468, 2014.
DOI : 10.1007/s10827-013-0476-2

R. Morey, A Bayesian hierarchical model for the measurement of working memory capacity, Journal of Mathematical Psychology, vol.55, issue.1, pp.8-24, 2011.
DOI : 10.1016/j.jmp.2010.08.008

J. Mouret and S. Doncieux, Sferes v2: evolvin' in the multi-core world, WCCI 2010 IEEE World Congress on Computational Intelligence, pp.4079-4086, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00687633

J. Mouret and S. Doncieux, Encouraging Behavioral Diversity in Evolutionary Robotics: An Empirical Study, Evolutionary Computation, vol.20, issue.1, pp.91-133, 2012.
DOI : 10.1162/EVCO_a_00048
URL : https://hal.archives-ouvertes.fr/hal-00687609

K. Norwich, Information, Sensation and Perception, 2003.

J. O'Doherty, P. Dayan, J. Schultz, R. Deichmann, K. Friston et al., Dissociable Roles of Ventral and Dorsal Striatum in Instrumental Conditioning, Science, vol.304, issue.5669, pp.452-454, 2004.
DOI : 10.1126/science.1094285

G. Pezzulo, F. Rigoli, and F. Chersi, The Mixed Instrumental Controller: Using Value of Information to Combine Habitual Choice and Mental Simulation, Frontiers in Psychology, vol.4, 2013.
DOI : 10.3389/fpsyg.2013.00092

R. Quilodran, M. Rothé, and E. Procyk, Behavioral Shifts and Action Valuation in the Anterior Cingulate Cortex, Neuron, vol.57, issue.2, 2008.
DOI : 10.1016/j.neuron.2007.11.031
URL : https://hal.archives-ouvertes.fr/inserm-00906686

B. A. Reddi and R. H. Carpenter, The influence of urgency on decision time, Nat. Neurosci, vol.3, pp.827-830, 2000.

A. D. Redish, S. Jensen, and A. Johnson, A unified framework for addiction: vulnerabilities in the decision process, Behav. Brain Sci, vol.31, pp.415-437, 2008.

E. Renaudo, B. Girard, R. Chatila, and M. Khamassi, Design of a control architecture for habit learning in robots, Biomimetic & Biohybrid Systems, Third International Conference, Living Machines 2014, 2014.

E. Renaudo, B. Girard, R. Chatila, and M. Khamassi, Which criteria for autonomously shifting between goal-directed and habitual behaviors in robots?, 2015 Joint IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob), 2015.
DOI : 10.1109/DEVLRN.2015.7346152
URL : https://hal.archives-ouvertes.fr/hal-01312449

R. Rescorla, Associations of multiple outcomes with an instrumental response., Journal of Experimental Psychology: Animal Behavior Processes, vol.17, issue.4, 1991.
DOI : 10.1037/0097-7403.17.4.465

K. Samejima and K. Doya, Multiple Representations of Belief States and Action Values in Corticobasal Ganglia Loops, Annals of the New York Academy of Sciences, vol.20, issue.1, pp.213-228, 2007.
DOI : 10.1038/nrn1884

W. Schultz, P. Dayan, and P. R. Montague, A Neural Substrate of Prediction and Reward, Science, vol.275, issue.5306, 1997.
DOI : 10.1126/science.275.5306.1593

G. Schwarz, Estimating the Dimension of a Model, The Annals of Statistics, vol.6, issue.2, pp.461-464, 1978.
DOI : 10.1214/aos/1176344136

J. E. R. Staddon and D. T. Cerutti, Operant Conditioning, Annual Review of Psychology, vol.54, issue.1, pp.115-144, 2003.
DOI : 10.1146/annurev.psych.54.101601.145124

R. Sutton and A. Barto, Reinforcement Learning: An Introduction, IEEE Transactions on Neural Networks, vol.9, issue.5, 1998.
DOI : 10.1109/TNN.1998.712192

C. Watkins and P. Dayan, Technical Note: Q-learning, Mach. Learn, vol.8, pp.279-292, 1992.
DOI : 10.1007/978-1-4615-3618-5_4

M. A. Wiering and H. van Hasselt.

A. Wierzbicki, On the completeness and constructiveness of parametric characterizations to vector optimization problems, Operations-Research-Spektrum, vol.22, issue.2, pp.73-87, 1986.
DOI : 10.1287/mnsc.22.6.652

S. P. Wise and E. A. Murray, Arbitrary associations between antecedents and actions, Trends in Neurosciences, vol.23, issue.6, pp.271-276, 2000.
DOI : 10.1016/S0166-2236(00)01570-8

R. Yager, Generalized OWA Aggregation Operators, Fuzzy Optimization and Decision Making, vol.3, issue.1, pp.93-107, 2004.
DOI : 10.1023/B:FODM.0000013074.68765.97

H. H. Yin, B. J. Knowlton, and B. W. Balleine, Inactivation of dorsolateral striatum enhances sensitivity to changes in the action-outcome contingency in instrumental conditioning, Behavioural Brain Research, vol.166, issue.2, pp.189-196, 2006.
DOI : 10.1016/j.bbr.2005.07.012

H. H. Yin, S. B. Ostlund, and B. W. Balleine, Reward-guided learning beyond dopamine in the nucleus accumbens: the integrative functions of cortico-basal ganglia networks, Eur. J. Neurosci, vol.28, pp.1437-1448, 2008.

E. Zitzler and L. Thiele, Multiobjective evolutionary algorithms: a comparative case study and the strength Pareto approach, IEEE Trans. Evol. Comput, vol.3, 1999.