N. Obin, MeLos: Analysis and Modelling of Speech Prosody and Speaking Style, 2011.
URL : https://hal.archives-ouvertes.fr/tel-00694687

N. Obin, J. Beliao, C. Veaux, and A. Lacheret, SLAM: Automatic Stylization and Labelling of Speech Melody, Speech Prosody, pp.246-250, 2014.

A. Rosenberg, AuToBI: a tool for automatic ToBI annotation, Interspeech, pp.146-149, 2010.

R. Dall and X. Gonzalvo, JNDSLAM: A SLAM extension for speech synthesis, Speech Prosody 2016, pp.1024-1028, 2016.
DOI : 10.21437/SpeechProsody.2016-210

S. Ronanki, G. E. Henter, Z. Wu, and S. King, A Template-Based Approach for Speech Synthesis Intonation Generation Using LSTMs, Interspeech 2016, pp.2463-2467, 2016.
DOI : 10.21437/Interspeech.2016-96

J. A. Louw, A. Moodley, and A. Govender, The Speect text-to-speech entry for the Blizzard Challenge, Interspeech, 2016.

E. Grabe, G. Kochanski, and J. Coleman, Quantitative modelling of intonational variation, Speech Analysis and Recognition in Technology, Linguistics and Medicine, pp.1-23, 1994.

P. Taylor, Analysis and synthesis of intonation using the Tilt model, The Journal of the Acoustical Society of America, vol.107, issue.3, pp.1697-1714, 2000.
DOI : 10.1121/1.428453

T. Mishra, J. van Santen, and E. Klabbers, Decomposition of Pitch Curves in the General Superpositional Intonation Model, Speech Prosody, 2006.

J. Teutenberg, C. Watson, and P. Riddle, Modelling and synthesising F0 contours with the discrete cosine transform, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, pp.3973-3976, 2008.
DOI : 10.1109/ICASSP.2008.4518524

D. Lolive, N. Barbot, and O. Boëffard, Melodic contour estimation with B-spline models using a MDL criterion, International Conference on Speech and Computer, pp.333-338, 2006.
URL : https://hal.archives-ouvertes.fr/hal-01199087

N. Obin, A. Lacheret, and X. Rodet, Stylization and Trajectory Modelling of Short and Long Term Speech Prosody Variations, Interspeech, pp.2029-2032, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00598144

X. Yin, M. Lei, Y. Qian, F. K. Soong, L. He et al., Modeling F0 trajectories in hierarchically structured deep neural networks, Speech Communication, vol.76, pp.82-92, 2016.
DOI : 10.1016/j.specom.2015.10.007

E. Klabbers and J. P. van Santen, Clustering of foot-based pitch contours in expressive speech, ISCA Speech Synthesis Workshop, pp.73-78, 2004.

O. Migliore and N. Obin, At the interface of speech and music: A study of prosody and musical prosody in popular music, submitted to Speech Prosody, 2018.

D. Sacha, Y. Asano, C. Rohrdantz, F. Hamborg, D. Keim et al., Self Organizing Maps for the Visual Analysis of Pitch Contours, Nordic Conference of Computational Linguistics, pp.181-189, 2015.

Y. Asano, M. Gubian, and D. Sacha, Cutting down on manual pitch contour annotation using data modelling, Speech Prosody 2016, pp.282-286, 2016.
DOI : 10.21437/SpeechProsody.2016-58

M. Gubian, F. Cangemi, and L. Boves, Automatic and Data Driven Pitch Contour Manipulation with Functional Data Analysis, Speech Prosody, pp.181-189, 2010.

G. E. Hinton and R. R. Salakhutdinov, Reducing the Dimensionality of Data with Neural Networks, Science, vol.313, issue.5786, pp.504-507, 2006.
DOI : 10.1126/science.1127647

Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle, Greedy layer-wise training of deep networks, International Conference on Neural Information Processing Systems (NIPS), pp.153-160, 2006.

D. Arpit, Y. Zhou, H. Ngo, and V. Govindaraj, Why regularized auto-encoders learn sparse representation, International Conference on Machine Learning (ICML), 2016.

M. A. Ranzato, Y. Boureau, and Y. LeCun, Sparse feature learning for deep belief networks, International Conference on Neural Information Processing Systems (NIPS), pp.1185-1192, 2007.

N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, Dropout: A simple way to prevent neural networks from overfitting, Journal of Machine Learning Research, vol.15, pp.1929-1958, 2014.

P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P. Manzagol, Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion, Journal of Machine Learning Research, vol.11, pp.3371-3408, 2010.

V. Zue, S. Seneff, and J. Glass, Speech database development at MIT: TIMIT and beyond, Speech Communication, vol.9, issue.4, pp.351-356, 1990.
DOI : 10.1016/0167-6393(90)90010-7

D. E. Rumelhart, G. E. Hinton, and R. J. Williams, Learning representations by back-propagating errors, Nature, vol.323, issue.6088, pp.533-536, 1986.
DOI : 10.1038/323533a0

W. Fisher, Tsylb Syllabification Package, 1996.

N. Obin, F. Lamare, and A. , Syll-O-Matic: An adaptive time-frequency representation for the automatic segmentation of speech into syllables, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 2013.
DOI : 10.1109/ICASSP.2013.6638958
URL : https://hal.archives-ouvertes.fr/hal-00943799

A. Camacho, SWIPE: A Sawtooth Waveform Inspired Pitch Estimator for Speech and Music, 2007.
DOI : 10.1121/1.2951592
URL : http://www.cise.ufl.edu/~acamacho/publications/dissertation.pdf

P. J. Rousseeuw, Silhouettes: A graphical aid to the interpretation and validation of cluster analysis, Journal of Computational and Applied Mathematics, vol.20, pp.53-65, 1987.
DOI : 10.1016/0377-0427(87)90125-7

P. D. Allison, Multiple Regression: A Primer, 1998.