SEQUENCE-TO-SEQUENCE MODELLING OF F0 FOR SPEECH EMOTION CONVERSION

Carl Robinson; Nicolas Obin; Axel Roebel

Communication Dans Un Congrès Année : 2019

SEQUENCE-TO-SEQUENCE MODELLING OF F0 FOR SPEECH EMOTION CONVERSION

(1) , (2) , (3)

1
2
3

Carl Robinson

Fonction : Auteur

Sciences et Technologies de la Musique et du Son

Nicolas Obin

Fonction : Auteur
PersonId : 7042
IdHAL : nicolas-obin
ORCID : 0000-0002-5236-5306
IdRef : 157523799

Sciences et Technologies de la Musique et du Son

Axel Roebel

Fonction : Auteur
PersonId : 4527
IdHAL : axel-roebel
ORCID : 0000-0001-6136-4391
IdRef : 227186079

Analyse et synthèse sonores [Paris]

Résumé

Voice interfaces are becoming wildly popular and driving demand for more advanced speech synthesis and voice transformation systems. Current text-to-speech methods produce realistic sounding voices, but they lack the emotional expressivity that listeners expect , given the context of the interaction and the phrase being spoken. Emotional voice conversion is a research domain concerned with generating expressive speech from neutral synthesised speech or natural human voice. This research investigated the effectiveness of using a sequence-to-sequence (seq2seq) encoder-decoder based model to transform the intonation of a human voice from neutral to expressive speech, with some preliminary introduction of linguistic conditioning. A subjective experiment conducted on the task of speech emotion recognition by listeners successfully demonstrated the effectiveness of the proposed sequence-to-sequence models to produce convincing voice emotion transformations. In particular, conditioning the model on the position of the syllable in the phrase significantly improved recognition rates.

Mots clés

Speech emotion conversion intonation Sequence-to-sequence models

Domaines

Traitement du signal et de l'image [eess.SP] Machine Learning [stat.ML]

Fichier principal

Voice_Emotion_Conversion(1).pdf (293.2 Ko)

Origine	Fichiers produits par l'(les) auteur(s)

Nicolas Obin : Connectez-vous pour contacter le contributeur

https://hal.sorbonne-universite.fr/hal-02018439

Soumis le : mercredi 13 février 2019-20:15:47

Dernière modification le : mercredi 30 octobre 2024-13:28:06

Archivage à long terme le : mardi 14 mai 2019-20:30:46

Dates et versions

hal-02018439 , version 1 (13-02-2019)

Identifiants

HAL Id : hal-02018439 , version 1

Citer

Carl Robinson, Nicolas Obin, Axel Roebel. SEQUENCE-TO-SEQUENCE MODELLING OF F0 FOR SPEECH EMOTION CONVERSION. IEEE International Conference on Acoustics, Speech, and Signal Processing, May 2019, Brighton, United Kingdom. ⟨hal-02018439⟩

Exporter

BibTeX XML-TEI Dublin Core DC Terms EndNote DataCite

Collections

CNRS IRCAM STMS SORBONNE-UNIVERSITE SU-SCIENCES

192 Consultations

836 Téléchargements

SEQUENCE-TO-SEQUENCE MODELLING OF F0 FOR SPEECH EMOTION CONVERSION

Résumé

Mots clés

Domaines

Dates et versions

Identifiants

Citer

Exporter

Collections

Partager