K. R. Scherer, R. Banse, H. G. Wallbott, and T. Goldbeck, Vocal cues in emotion encoding and decoding, Motivation and Emotion, vol.15, pp.123-148, 1991.

P. Taylor, Text-to-Speech Synthesis, 2009.

B. Gerazov and G. Bailly, PySFC - a system for prosody analysis based on the superposition of functional contours prosody model, International Conference on Speech Prosody, pp.774-778, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01821214

J. Teutenberg, C. Watson, and P. Riddle, Modelling and Synthesising F0 contours with the Discrete Cosine Transform, International Conference on Acoustics, Speech, and Signal Processing, pp.3973-3976, 2008.

N. Obin and J. Belião, Sparse coding of pitch contours with deep auto-encoders, International Conference on Speech Prosody, pp.799-803, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01722007

B. Gerazov, G. Bailly, O. Mohammed, and P. N. Garner, A variational prosody model for the decomposition and synthesis of speech prosody, Speech Prosody, 2018.

Z. Luo, J. Chen, T. Takiguchi, and Y. Ariki, Emotional voice conversion using neural networks with arbitrary scales F0 based on wavelet transform, EURASIP Journal on Audio, Speech, and Music Processing, vol.2017, issue.1, 2017.

A. Black, H. Zen, and K. Tokuda, Statistical parametric speech synthesis, International Conference on Acoustics, Speech, and Signal Processing, pp.1229-1232, 2007.

J. Latorre and M. Akamine, Multilevel parametric-base F0 model for speech synthesis, pp.2274-2277, 2008.

N. Obin, A. Lacheret, and X. Rodet, Stylization and Trajectory Modelling of Short and Long Term Speech Prosody Variations, Interspeech, pp.2029-2032, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00598144

N. Obin, MeLos: Analysis and Modelling of Speech Prosody and Speaking Style, 2011.
URL : https://hal.archives-ouvertes.fr/tel-00694687

C. Veaux and X. Rodet, Intonation conversion from neutral to expressive speech, pp.2765-2768, 2011.

X. Yin, M. Lei, Y. Qian, F. K. Soong, L. He et al., Modeling F0 trajectories in hierarchically structured deep neural networks, Speech Communication, vol.76, pp.82-92, 2016.

H. Zen, Statistical parametric speech synthesis: from HMM to LSTM-RNN, 2015.

Y. Fan, Y. Qian, F. Xie, and F. K. Soong, TTS synthesis with bidirectional LSTM based recurrent neural networks, Interspeech, 2014.

H. Zen and H. Sak, Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015.

H. Ming, D. Huang, L. Xie, J. Wu, M. Dong et al., Deep bidirectional LSTM modeling of timbre and prosody for emotional voice conversion, 2016.

R. Li, Z. Wu, H. Meng, and L. Cai, DBLSTM-based multitask learning for pitch transformation in voice conversion, 2016 10th International Symposium on Chinese Spoken Language Processing, 2016.

X. Wang, S. Takaki, and J. Yamagishi, An RNN-Based quantized F0 model with Multi-Tier feedback links for Text-to-Speech synthesis, 2017.

S. Ronanki, G. E. Henter, Z. Wu, and S. King, A Template-Based approach for speech synthesis intonation generation using LSTMs, 2016.

I. Sutskever, O. Vinyals, and Q. V. Le, Sequence to sequence learning with neural networks, Advances in Neural Information Processing Systems (NIPS), 2014.

Y. Wang, R. J. Skerry-Ryan, D. Stanton, Y. Wu, R. J. Weiss et al., Tacotron: A fully end-to-end text-to-speech synthesis model, 2017.

V. Wan, Y. Agiomyrgiannakis, H. Silen, and J. Vít, Google's Next-Generation Real-Time Unit-Selection synthesizer using Sequence-to-Sequence LSTM-Based autoencoders, 2017.

A. van den Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals et al., WaveNet: A generative model for raw audio, 2016.

K. Cho, B. van Merriënboer, C. Gulcehre, F. Bougares, H. Schwenk et al., Learning phrase representations using RNN encoder-decoder for statistical machine translation, Empirical Methods in Natural Language Processing (EMNLP), 2014.
URL : https://hal.archives-ouvertes.fr/hal-01433235

D. Bahdanau, K. Cho, and Y. Bengio, Neural machine translation by jointly learning to align and translate, International Conference on Learning Representations (ICLR), 2014.

C. Raffel, M. Luong, P. J. Liu, R. J. Weiss, and D. Eck, Online and linear-time attention by enforcing monotonic alignments, International Conference on Machine Learning (ICML), 2017.

H. Sak, M. Shannon, K. Rao, and F. Beaufays, Recurrent neural aligner: An encoder-decoder neural network model for sequence to sequence mapping, pp.1298-1302, 2017.

P. Lanchantin, A. Morris, X. Rodet, and C. Veaux, Automatic Phoneme Segmentation with Relaxed Textual Constraints, International Conference on Language Resources and Evaluation, pp.2403-2407, 2008.
URL : https://hal.archives-ouvertes.fr/hal-01161385