A. Baranes and P. Oudeyer, The interaction of maturational constraints and intrinsic motivations in active motor development, 2011 IEEE International Conference on Development and Learning (ICDL), pp.1-8, 2011.
DOI : 10.1109/DEVLRN.2011.6037315
URL : https://hal.archives-ouvertes.fr/hal-00646585

L. W. Barsalou, Perceptual symbol systems, Behavioral and Brain Sciences, vol.22, issue.4, pp.577-660, 1999.
DOI : 10.1017/S0140525X99002149

L. W. Barsalou and J. J. Prinz, Mundane creativity in perceptual symbol systems, Conceptual structures and processes: Emergence, discovery, and change, 1997.
DOI : 10.1037/10227-011

Y. Bengio, Learning Deep Architectures for AI, Foundations and Trends in Machine Learning, vol.2, issue.1, pp.1-127, 2009.
DOI : 10.1561/2200000006

Y. Bengio, A. Courville, and P. Vincent, Representation Learning: A Review and New Perspectives, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.35, issue.8, pp.1798-1828, 2013.
DOI : 10.1109/TPAMI.2013.50

Y. Bengio, P. Simard, and P. Frasconi, Learning long-term dependencies with gradient descent is difficult, IEEE Transactions on Neural Networks, vol.5, issue.2, pp.157-166, 1994.
DOI : 10.1109/72.279181

M. Botvinick and J. Cohen, Rubber hands 'feel' touch that eyes see, Nature, vol.391, issue.6669, p.756, 1998.

R. Calandra, T. Raiko, M. P. Deisenroth, and F. M. Pouzols, Learning Deep Belief Networks from Non-stationary Streams, Artificial Neural Networks and Machine Learning – ICANN 2012, pp.379-386, 2012.
DOI : 10.1007/978-3-642-33266-1_47

L. Cayton, Algorithms for manifold learning, 2005.

C. Ciliberto, S. R. Fanello, M. Santoro, L. Natale, G. Metta et al., On the impact of learning hierarchical representations for visual recognition in robotics, 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp.3759-3764, 2013.
DOI : 10.1109/IROS.2013.6696893

D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, Deep Big Multilayer Perceptrons for Digit Recognition, Neural Networks: Tricks of the Trade, pp.581-598, 2012.

A. R. Damasio, The Brain Binds Entities and Events by Multiregional Activation from Convergence Zones, Neural Computation, vol.1, issue.1, pp.123-132, 1989.
DOI : 10.1162/neco.1989.1.1.123

A. R. Damasio, Time-locked multiregional retroactivation: A systems-level proposal for the neural substrates of recall and recognition, Cognition, vol.33, issue.1-2, pp.25-62, 1989.
DOI : 10.1016/0010-0277(89)90005-X

A. R. Damasio, Category-related recognition defects as a clue to the neural substrates of knowledge, Trends in Neurosciences, vol.13, issue.3, pp.95-98, 1990.
DOI : 10.1016/0166-2236(90)90184-C

V. R. de Sa and D. H. Ballard, Perceptual learning from cross-modal feedback, Psychology of Learning and Motivation, pp.309-351, 1997.

V. R. de Sa and D. H. Ballard, Category Learning Through Multimodality Sensing, Neural Computation, vol.10, issue.5, pp.1097-1117, 1998.

O. Delalleau and Y. Bengio, Shallow versus Deep Sum-Product Networks, NIPS, pp.666-674, 2011.

D. Erhan, P. Manzagol, Y. Bengio, S. Bengio, and P. Vincent, The difficulty of training deep architectures and the effect of unsupervised pretraining, AISTATS, 2009.

A. Falchier, S. Clavagnier, P. Barone, and H. Kennedy, Anatomical evidence of multimodal integration in primate striate cortex, The Journal of Neuroscience, vol.22, issue.13, pp.5749-5759, 2002.

A. Fort, C. Delpuech, J. Pernier, and M. H. Giard, Early auditory-visual interactions in human cortex during nonredundant target identification, Cognitive Brain Research, vol.14, issue.1, pp.20-30, 2002.
DOI : 10.1016/S0926-6410(02)00058-7

E. Freeman, J. Driver, D. Sagi, and L. Zhaoping, Top-Down Modulation of Lateral Interactions in Early Vision, Current Biology, vol.13, issue.11, pp.985-989, 2003.
DOI : 10.1016/S0960-9822(03)00333-6

R. M. French, Catastrophic forgetting in connectionist networks, Encyclopedia of Cognitive Science, 1991.

M. H. Giard and F. Peronnet, Auditory-Visual Integration during Multimodal Object Recognition in Humans: A Behavioral and Electrophysiological Study, Journal of Cognitive Neuroscience, vol.11, issue.5, pp.473-490, 1999.
DOI : 10.1162/089892999563544

X. Glorot and Y. Bengio, Understanding the difficulty of training deep feedforward neural networks, International Conference on Artificial Intelligence and Statistics (AISTATS10), pp.249-256, 2010.

R. Goldstone and L. Barsalou, Reuniting perception and conception, Cognition, vol.65, issue.2-3, pp.231-262, 1998.
DOI : 10.1016/S0010-0277(97)00047-4

R. L. Goldstone and A. T. Hendrickson, Categorical perception, Wiley Interdisciplinary Reviews: Cognitive Science, vol.1, issue.1, pp.69-78, 2010.
DOI : 10.1002/wcs.26

A. Graves, A. Mohamed, and G. E. Hinton, Speech recognition with deep recurrent neural networks, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pp.6645-6649, 2013.
DOI : 10.1109/ICASSP.2013.6638947

M. Hermans and B. Schrauwen, Training and analysing deep recurrent neural networks, Advances in Neural Information Processing Systems 26, pp.190-198, 2013.

G. E. Hinton and R. R. Salakhutdinov, Reducing the Dimensionality of Data with Neural Networks, Science, vol.313, issue.5786, pp.504-507, 2006.
DOI : 10.1126/science.1127647

L. Hubert and P. Arabie, Comparing partitions, Journal of Classification, vol.2, issue.1, pp.193-218, 1985.
DOI : 10.1007/BF01908075

S. Ivaldi, M. Fumagalli, M. Randazzo, F. Nori, G. Metta et al., Computing robot internal/external wrenches by means of inertial, tactile and F/T sensors: Theory and implementation on the iCub, 2011 11th IEEE-RAS International Conference on Humanoid Robots, pp.521-528, 2011.
DOI : 10.1109/Humanoids.2011.6100813

S. Ivaldi, S. M. Nguyen, N. Lyubova, A. Droniou, V. Padois et al., Object Learning Through Active Exploration, IEEE Transactions on Autonomous Mental Development, vol.6, issue.1, pp.1-18, 2013.
DOI : 10.1109/TAMD.2013.2280614
URL : https://hal.archives-ouvertes.fr/hal-00919694

A. Jain, M. Murty, and P. Flynn, Data clustering: a review, ACM Computing Surveys, vol.31, issue.3, pp.264-323, 1999.
DOI : 10.1145/331499.331504

M. Johnsson, C. Balkenius, and G. Hesslow, Associative Self-organizing Map, IJCCI, pp.363-370, 2009.

D. Joyce, L. Richards, A. Cangelosi, and K. Coventry, On the foundations of perceptual symbol systems: Specifying embodied representations via connectionism, The Logic of Cognitive Systems. Proceedings of the Fifth International Conference on Cognitive Modeling, pp.147-152, 2003.

T. Kohonen, Self-organized formation of topologically correct feature maps, Biological Cybernetics, vol.43, issue.1, pp.59-69, 1982.
DOI : 10.1007/BF00337288

O. Kouropteva, O. Okun, and M. Pietikäinen, Incremental locally linear embedding, Pattern Recognition, vol.38, issue.10, pp.1764-1767, 2005.
DOI : 10.1016/j.patcog.2005.04.006

A. Krizhevsky, I. Sutskever, and G. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, Advances in Neural Information Processing Systems, pp.1106-1114, 2012.

S. Lallee and P. Dominey, Multi-modal convergence maps: from body schema and self-representation to mental imagery, Adaptive Behavior, vol.21, issue.4, pp.274-285, 2013.
DOI : 10.1177/1059712313488423

M. H. Law and A. K. Jain, Incremental nonlinear dimensionality reduction by manifold learning, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.28, issue.3, pp.377-391, 2006.
DOI : 10.1109/TPAMI.2006.56

Y. LeCun, K. Kavukcuoglu, and C. Farabet, Convolutional networks and applications in vision, Proceedings of 2010 IEEE International Symposium on Circuits and Systems, pp.253-256, 2010.
DOI : 10.1109/ISCAS.2010.5537907

D. Lee and S. Seung, Algorithms for non-negative matrix factorization, Advances in Neural Information Processing Systems, pp.556-562, 2001.

H. Lee, A. Battle, R. Raina, and A. Y. Ng, Efficient sparse coding algorithms, Advances in Neural Information Processing Systems, 2006.

H. Lee, R. Grosse, R. Ranganath, and A. Y. Ng, Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, pp.609-616, 2009.
DOI : 10.1145/1553374.1553453

M. Lefort, Y. Boniface, and B. Girau, Self-organization of neural maps using a modulated BCM rule within a multimodal architecture, BICS, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00480028

S. Lemaignan, R. Ros, L. Mösenlechner, R. Alami, and M. Beetz, ORO, a knowledge management platform for cognitive architectures in robotics, 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp.3548-3553, 2010.
DOI : 10.1109/IROS.2010.5649547

N. Lyubova and D. Filliat, Developmental approach for interactive object discovery, The 2012 International Joint Conference on Neural Networks (IJCNN), 2012.
DOI : 10.1109/IJCNN.2012.6252606
URL : https://hal.archives-ouvertes.fr/hal-00755298

J. MacQueen, Some methods for classification and analysis of multivariate observations, Berkeley Symposium on Mathematical Statistics and Probability, pp.281-297, 1967.

O. Mangin and P. Oudeyer, Learning semantic components from subsymbolic multimodal perception, 2013 IEEE Third Joint International Conference on Development and Learning and Epigenetic Robotics (ICDL), pp.1-7, 2013.
DOI : 10.1109/DevLrn.2013.6652563
URL : https://hal.archives-ouvertes.fr/hal-00842453

J. Martens, Deep learning via Hessian-free optimization, Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp.735-742, 2010.
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.296.4704

H. McGurk and J. MacDonald, Hearing lips and seeing voices, Nature, vol.264, issue.5588, pp.746-748, 1976.
DOI : 10.1038/264746a0

R. Memisevic and G. E. Hinton, Learning to Represent Spatial Transformations with Factored Higher-Order Boltzmann Machines, Neural Computation, vol.22, issue.6, pp.1473-1492, 2010.
DOI : 10.1162/neco.2010.01-09-953

R. Memisevic, C. Zach, G. Hinton, and M. Pollefeys, Gated Softmax Classification, Advances in Neural Information Processing Systems, pp.1603-1611, 2010.

K. Meyer and A. Damasio, Convergence and divergence in a neural architecture for recognition and memory, Trends in Neurosciences, vol.32, issue.7, pp.376-382, 2009.
DOI : 10.1016/j.tins.2009.04.002

L. Montesano, M. Lopes, A. Bernardino, and J. Santos-Victor, Learning Object Affordances: From Sensory-Motor Coordination to Imitation, IEEE Transactions on Robotics, vol.24, issue.1, pp.15-26, 2008.
DOI : 10.1109/TRO.2007.914848
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.169.7283

A. F. Morse, T. Belpaeme, A. Cangelosi, and L. B. Smith, Thinking with your body: Modelling spatial biases in categorization using a real humanoid robot, Proc. of 2010 annual meeting of the Cognitive Science Society, pp.1362-1368, 2010.

A. F. Morse, J. de Greeff, T. Belpaeme, and A. Cangelosi, Epigenetic Robotics Architecture (ERA), IEEE Transactions on Autonomous Mental Development, vol.2, issue.4, pp.325-339, 2010.
DOI : 10.1109/tamd.2010.2087020
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.697.1550

T. Nakamura, T. Nagai, and N. Iwahashi, Grounding of word meanings in multimodal concepts using LDA, 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp.3943-3948, 2009.
DOI : 10.1109/IROS.2009.5354736

T. Nakamura, T. Nagai, and N. Iwahashi, Bag of multimodal LDA models for concept formation, 2011 IEEE International Conference on Robotics and Automation, pp.6233-6238, 2011.
DOI : 10.1109/ICRA.2011.5980324

H. Narayanan and S. Mitter, Sample complexity of testing the manifold hypothesis, Advances in Neural Information Processing Systems, 2010.

L. Natale, F. Nori, G. Metta, M. Fumagalli, S. Ivaldi et al., The iCub platform: a tool for studying intrinsically motivated learning, Intrinsically Motivated Learning in Natural and Artificial Systems, pp.433-458, 2013.

A. Y. Ng, M. I. Jordan, and Y. Weiss, On spectral clustering: Analysis and an algorithm, Advances in Neural Information Processing Systems, pp.849-856, 2001.

J. Ngiam, A. Khosla, M. Kim, J. Nam, H. Lee et al., Multimodal Deep Learning, International Conference on Machine Learning, pp.689-696, 2011.

S. O'Hara and B. A. Draper, Introduction to the bag of features paradigm for image classification and retrieval, 2011.

B. A. Olshausen and D. J. Field, Sparse coding with an overcomplete basis set: A strategy employed by V1?, Vision Research, vol.37, issue.23, pp.3311-3325, 1997.
DOI : 10.1016/S0042-6989(97)00169-7

L. Pape, F. Gomez, M. Ring, and J. Schmidhuber, Modular deep belief networks that do not forget, The 2011 International Joint Conference on Neural Networks, pp.1191-1198, 2011.
DOI : 10.1109/IJCNN.2011.6033359

A. Papliński and L. Gustafsson, Multimodal FeedForward Self-organizing Maps, Computational Intelligence and Security, pp.81-88, 2005.
DOI : 10.1007/11596448_11

H. Poon and P. Domingos, Sum-product networks: A new deep architecture, 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops), pp.337-346, 2011.
DOI : 10.1109/ICCVW.2011.6130310

S. Reed and H. Lee, Learning Deep Representations via Multiplicative Interactions between Factors of Variation, NIPS Workshop, 2013.

B. Ridge, D. Skocaj, and A. Leonardis, Self-supervised cross-modal online learning of basic object affordances for developmental robotic systems, 2010 IEEE International Conference on Robotics and Automation, pp.5047-5054, 2010.
DOI : 10.1109/ROBOT.2010.5509544

S. Rifai, Y. N. Dauphin, P. Vincent, Y. Bengio, and X. Muller, The Manifold Tangent Classifier, Advances in Neural Information Processing Systems, pp.2294-2302, 2011.

S. Rifai, G. Mesnil, P. Vincent, X. Muller, Y. Bengio et al., Higher Order Contractive Auto-Encoder, ECML/PKDD, pp.645-660, 2011.

S. Rifai, P. Vincent, X. Muller, X. Glorot, and Y. Bengio, Contractive Auto-Encoders: Explicit Invariance During Feature Extraction, Proceedings of the 28th International Conference on Machine Learning, pp.833-840, 2011.

R. R. Salakhutdinov, J. Tenenbaum, and A. Torralba, Learning to Learn with Compound HD Models, Advances in Neural Information Processing Systems, pp.2061-2069, 2011.

A. Salman and K. Chen, Exploring speaker-specific characteristics with deep learning, The 2011 International Joint Conference on Neural Networks, pp.103-110, 2011.
DOI : 10.1109/IJCNN.2011.6033207

A. Schneider, J. Sturm, C. Stachniss, M. Reisert, H. Burkhardt et al., Object identification with tactile sensors using bag-of-features, 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp.243-248, 2009.
DOI : 10.1109/IROS.2009.5354648

L. Smith and M. Gasser, The Development of Embodied Cognition: Six Lessons from Babies, Artificial Life, vol.11, issue.1-2, pp.13-30, 2005.
DOI : 10.1162/1064546053278973

R. Socher, E. H. Huang, J. Pennington, A. Y. Ng, and C. D. Manning, Dynamic pooling and unfolding recursive autoencoders for paraphrase detection, Advances in Neural Information Processing Systems, pp.801-809, 2011.

A. Stuhlsatz, J. Lippel, and T. Zielke, Discriminative feature extraction with Deep Neural Networks, The 2010 International Joint Conference on Neural Networks (IJCNN), pp.1-8, 2010.
DOI : 10.1109/IJCNN.2010.5596329

G. W. Taylor, G. E. Hinton, and S. T. Roweis, Two Distributed-State Models For Generating High-Dimensional Time Series, Journal of Machine Learning Research, vol.12, pp.1025-1068, 2011.

V. Tikhanoff, A. Cangelosi, and G. Metta, Integration of speech and action in humanoid robots: iCub simulation experiments, IEEE Transactions on Autonomous Mental Development, vol.3, issue.1, pp.17-29, 2011.

E. Ugur, E. Sahin, and E. Oztop, Affordance learning from range data for multi-step planning, EpiRob, 2009.

M. Vavrečka and I. Farkaš, A Multimodal Connectionist Architecture for Unsupervised Grounding of Spatial Language, Cognitive Computation, vol.8, issue.8-9, pp.1-12, 2013.
DOI : 10.1007/s12559-013-9212-5

P. Vincent, H. Larochelle, Y. Bengio, and P. Manzagol, Extracting and composing robust features with denoising autoencoders, Proceedings of the 25th international conference on Machine learning, ICML '08, pp.1096-1103, 2008.
DOI : 10.1145/1390156.1390294

M. Waibel, M. Beetz, J. Civera, R. D'Andrea, J. Elfring et al., RoboEarth - A World Wide Web for Robots, IEEE Robotics and Automation Magazine, 2011.

J. Ward, Hierarchical Grouping to Optimize an Objective Function, Journal of the American Statistical Association, vol.58, issue.301, pp.236-244, 1963.
DOI : 10.1080/01621459.1963.10500845

S. Wermter, C. Weber, M. Elshaw, C. Panchev, H. Erwin et al., Towards multimodal neural robot learning, Robotics and Autonomous Systems, vol.47, issue.2-3, pp.171-175, 2004.
DOI : 10.1016/j.robot.2004.03.011

C. Yu and D. H. Ballard, On the integration of grounding language and learning objects, AAAI, pp.488-493, 2004.

H. Zhao, P. C. Yuen, and J. T. Kwok, A novel incremental principal component analysis and its application for face recognition, IEEE Transactions on Systems, Man and Cybernetics, Part B (Cybernetics), vol.36, issue.4, pp.873-886, 2006.
DOI : 10.1109/TSMCB.2006.870645