P. Ekman, E. R. Sorenson, and W. V. Friesen, Pan-cultural elements in facial displays of emotion, Science, vol.164, issue.3875, p.5773719, 1969.

R. E. Jack, O. G. Garrod, H. Yu, R. Caldara, and P. G. Schyns, Facial expressions of emotion are not culturally universal, Proceedings of the National Academy of Sciences, vol.109, pp.7241-7244, 2012.

P. N. Juslin and P. Laukka, Communication of emotions in vocal expression and music performance: Different channels, same code? Psychological Bulletin, vol.129, p.12956543, 2003.

R. F. Murray, Classification images: A review, Journal of Vision, vol.11, issue.5, p.21536726, 2011.

R. Adolphs, L. Nummenmaa, A. Todorov, and J. V. Haxby, Data-driven approaches in the investigation of social perception, Philosophical Transactions of the Royal Society B, vol.371, issue.1693, p.20150367, 2016.

E. de Boer and P. Kuyper, Triggered correlation, IEEE Transactions on Biomedical Engineering, vol.15, issue.3, p.5667803, 1968.

P. Z. Marmarelis and K. I. Naka, White-noise analysis of a neuron chain: an application of the Wiener theory, Science, vol.175, issue.4027, p.5061252, 1972.

J. J. Eggermont, P. Johannesma, and A. Aertsen, Reverse-correlation methods in auditory research, Quarterly Reviews of Biophysics, vol.16, issue.3, p.6366861, 1983.

D. Ringach and R. Shapley, Reverse correlation in neurophysiology, Cognitive Science, vol.28, issue.2, pp.147-166, 2004.

A. Ahumada and J. Lovell, Stimulus features in signal detection, Journal of the Acoustical Society of America, vol.49, issue.6B, pp.1751-1756, 1971.

E. Ponsot, P. Susini, G. Saint Pierre, and S. Meunier, Temporal loudness weights for sounds with increasing and decreasing intensity profiles, The Journal of the Acoustical Society of America, vol.134, issue.4, p.24116537, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00946815

M. C. Mangini and I. Biederman, Making the ineffable explicit: Estimating the information employed for face classifications, Cognitive Science, vol.28, issue.2, pp.209-226, 2004.

F. Gosselin and P. G. Schyns, Bubbles: a technique to reveal the use of information in recognition tasks, Vision Research, vol.41, issue.17, p.11448718, 2001.

R. Dotsch and A. Todorov, Reverse correlating social face perception, Social Psychological and Personality Science, vol.3, issue.5, pp.562-571, 2012.

R. Adolphs, F. Gosselin, T. W. Buchanan, D. Tranel, P. G. Schyns et al., A mechanism for impaired fear recognition after amygdala damage, Nature, vol.433, issue.7021, p.15635411, 2005.

J. H. Venezia, G. Hickok, and V. M. Richards, Auditory "bubbles": Efficient classification of the spectrotemporal modulations essential for speech intelligibility, The Journal of the Acoustical Society of America, vol.140, issue.2, p.27586738, 2016.

M. I. Mandel, S. E. Yoho, and E. W. Healy, Measuring time-frequency importance functions of speech with bubble noise, Journal of the Acoustical Society of America, vol.140, p.27794278, 2016.

L. Varnet, T. Wang, C. Peter, F. Meunier, and M. Hoen, How musical expertise shapes speech perception: evidence from auditory classification images, Scientific Reports, vol.5, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01229543

W. O. Brimijoin, M. A. Akeroyd, E. Tilbury, and B. Porr, The internal representation of vowel spectra investigated using behavioral response-triggered averaging, The Journal of the Acoustical Society of America, vol.133, issue.2, p.23363191, 2013.

V. Isnard, C. Suied, and G. Lemaitre, Auditory bubbles reveal sparse time-frequency cues subserving identification of musical voices and instruments, in: Meeting of the Acoustical Society of America, vol.140, p.3267, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01466181

E. Thoret, P. Depalle, and S. McAdams, Perceptually Salient Regions of the Modulation Power Spectrum for Musical Instrument Identification, Frontiers in Psychology, vol.8, p.28450846, 2017.

F. Gosselin and P. G. Schyns, Superstitious perceptions reveal properties of internal representations, Psychological Science, vol.14, issue.5, p.12930484, 2003.

E. B. Roesch, L. Tamarit, L. Reveret, D. Grandjean, D. Sander et al., FACSGen: A tool to synthesize emotional facial expressions through systematic manipulation of facial action units, Journal of Nonverbal Behavior, vol.35, issue.1, pp.1-16, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00516995

H. Yu, O. G. Garrod, and P. G. Schyns, Perception-driven facial expression synthesis, Computers & Graphics, vol.36, issue.3, pp.152-162, 2012.

M. Rychlowska, R. E. Jack, O. G. Garrod, P. G. Schyns, J. D. Martin et al., Functional smiles: Tools for love, sympathy, and war, Psychological Science, vol.28, p.28741981, 2017.

R. E. Jack, O. G. Garrod, and P. G. Schyns, Dynamic facial expressions of emotion transmit an evolving hierarchy of signals over time, Current Biology, vol.24, issue.2, p.24388852, 2014.

T. Stivers, An overview of the question-response system in American English conversation, Journal of Pragmatics, vol.42, issue.10, pp.2772-2781, 2010.

J. R. Saffran, E. L. Newport, and R. N. Aslin, Word segmentation: The role of distributional cues, Journal of Memory and Language, vol.35, pp.606-621, 1996.

G. Kochanski, E. Grabe, J. Coleman, and B. Rosner, Loudness predicts prominence: Fundamental frequency lends little, The Journal of the Acoustical Society of America, vol.118, issue.2, p.16158659, 2005.

E. Ponsot, J. J. Burred, P. Belin, and J. J. Aucouturier, Cracking the social code of speech prosody using reverse correlation, Proceedings of the National Academy of Sciences, p.201716090, 2018.

E. Ponsot, P. Arias, and J. J. Aucouturier, Uncovering mental representations of smiled speech using reverse correlation, The Journal of the Acoustical Society of America, vol.143, issue.1, p.29390775, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01712385

M. Portnoff, Implementation of the digital phase vocoder using the fast Fourier transform, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol.24, issue.3, pp.243-248, 1976.

M. Dolson, The phase vocoder: A tutorial, Computer Music Journal, vol.10, pp.14-27, 1986.

J. Laroche and M. Dolson, Improved phase vocoder time-scale modification of audio, IEEE Transactions on Speech and Audio Processing, vol.7, issue.3, pp.323-332, 1999.

M. Liuni and A. Roebel, Phase vocoder and beyond, Musica/Tecnologia, vol.7, pp.73-120, 2013.
URL : https://hal.archives-ouvertes.fr/hal-01250848

C. Gussenhoven, The Phonology of Tone and Intonation, Cambridge University Press, 2004.

J. J. Ohala, An ethological perspective on common cross-language utilization of F0 of voice, Phonetica, vol.41, p.6204347, 1984.

R. L. Mitchell and E. D. Ross, Attitudinal prosody: What we know and directions for future study, Neuroscience & Biobehavioral Reviews, vol.37, pp.471-479, 2013.

X. Jiang and M. D. Pell, The sound of confidence and doubt, Speech Communication, vol.88, pp.106-126, 2017.

C. Tang, L. S. Hamilton, and E. F. Chang, Intonational speech prosody encoding in the human auditory cortex, Science, vol.357, p.28839071, 2017.

R. Geluykens, On the myth of rising intonation in polar questions, Journal of Pragmatics, vol.12, issue.4, 1988.

F. Liu and Y. Xu, Question intonation as affected by word stress and focus in English, Proceedings of the 16th International Congress of Phonetic Sciences, Saarbrücken, pp.1189-1192, 2007.

D. J. Povel and P. Essens, Perception of temporal patterns, Music Perception: An Interdisciplinary Journal, vol.2, issue.4, pp.411-440, 1985.

D. J. Levitin and P. R. Cook, Memory for musical tempo: Additional evidence that auditory memory is absolute, Perception & Psychophysics, vol.58, issue.6, pp.927-935, 1996.

K. B. Doelling and D. Poeppel, Cortical entrainment to music and its modulation by expertise, Proceedings of the National Academy of Sciences, vol.112, issue.45, pp.E6233-E6242, 2015.

C. Palmer, Mapping musical thought to musical performance, Journal of Experimental Psychology: Human Perception and Performance, vol.15, p.2525602, 1989.

B. H. Repp, Probing the cognitive representation of musical time: Structural constraints on the perception of timing perturbations, Cognition, vol.44, issue.3, p.1424494, 1992.

H. E. Kragness and L. J. Trainor, Listeners lengthen phrase boundaries in self-paced music, Journal of Experimental Psychology: Human Perception and Performance, vol.42, issue.10, p.27379872, 2016.

R. Brauneis, Copyright and the World's Most Popular Song, J Copyright Soc'y USA, vol.56, p.335, 2008.

L. N. Law and M. Zentner, Assessing musical abilities objectively: Construction and validation of the Profile of Music Perception Skills, PLoS ONE, vol.7, issue.12, p.23285071, 2012.

A. Burgess and B. Colborne, Visual signal detection. IV. Observer inconsistency, JOSA A, vol.5, issue.4, pp.617-627, 1988.

P. Neri, How inherently noisy is human sensory processing?, Psychonomic Bulletin & Review, vol.17, issue.6, pp.802-808, 2010.

A. Penel and C. Drake, Timing variations in music performance: Musical communication, perceptual compensation, and/or motor control?, Perception & Psychophysics, vol.66, issue.4, pp.545-562, 2004.

J. Jiang, X. Liu, X. Wan, and C. Jiang, Perception of Melodic Contour and Intonation in Autism Spectrum Disorder: Evidence From Mandarin Speakers, Journal of Autism and Developmental Disorders, vol.45, p.25636678, 2015.

A. P. Pinheiro, E. Del Re, J. Mezin, P. G. Nestor, A. Rauber et al., Sensory-based and higher-order operations contribute to abnormal emotional prosody processing in schizophrenia: an electrophysiological investigation, Psychological Medicine, vol.43, p.22781212, 2013.

F. Liu, A. D. Patel, A. Fourcin, and L. Stewart, Intonation processing in congenital amusia: discrimination, identification and imitation, Brain, vol.133, p.20418275, 2010.

D. A. Sauter, F. Eisner, P. Ekman, and S. K. Scott, Cross-cultural recognition of basic emotions through nonverbal emotional vocalizations, Proceedings of the National Academy of Sciences, vol.107, issue.6, pp.2408-2412, 2010.

P. Arias, C. Soladie, O. Bouafif, A. Roebel, R. Seguier et al., Realistic transformation of facial and vocal smiles in real-time audiovisual streams, IEEE Transactions on Affective Computing, 2018.

A. Larsen, S. K. Sønderby, H. Larochelle, and O. Winther, Autoencoding beyond pixels using a learned similarity metric, arXiv preprint arXiv:1512.09300, 2015.