Conference papers

SEQUENCE-TO-SEQUENCE MODELLING OF F0 FOR SPEECH EMOTION CONVERSION

Abstract : Voice interfaces are becoming wildly popular and are driving demand for more advanced speech synthesis and voice transformation systems. Current text-to-speech methods produce realistic-sounding voices, but they lack the emotional expressivity that listeners expect, given the context of the interaction and the phrase being spoken. Emotional voice conversion is a research domain concerned with generating expressive speech from neutral synthesised speech or natural human voice. This research investigated the effectiveness of using a sequence-to-sequence (seq2seq) encoder-decoder based model to transform the intonation of a human voice from neutral to expressive speech, with some preliminary introduction of linguistic conditioning. A subjective experiment, in which listeners performed a speech emotion recognition task, demonstrated that the proposed sequence-to-sequence models produce convincing voice emotion transformations. In particular, conditioning the model on the position of the syllable in the phrase significantly improved recognition rates.

https://hal.sorbonne-universite.fr/hal-02018439
Contributor : Nicolas Obin
Submitted on : Wednesday, February 13, 2019 - 8:15:47 PM
Last modification on : Saturday, March 23, 2019 - 1:39:33 AM
Long-term archiving on : Tuesday, May 14, 2019 - 8:30:46 PM

File

Voice_Emotion_Conversion(1).pd...
Files produced by the author(s)

Identifiers

  • HAL Id : hal-02018439, version 1

Citation

Carl Robinson, Nicolas Obin, Axel Roebel. SEQUENCE-TO-SEQUENCE MODELLING OF F0 FOR SPEECH EMOTION CONVERSION. IEEE International Conference on Acoustics, Speech, and Signal Processing, May 2019, Brighton, United Kingdom. ⟨hal-02018439⟩
