On Automatic Voice Casting for Expressive Speech: Speaker Recognition vs. Speech Classification - Sorbonne Université
Conference paper, Year: 2014

Abstract

This paper presents the first large-scale automatic voice casting system, and explores the adaptation of speaker recognition techniques to measure voice similarities. The proposed system is based on the representation of a voice by classes (e.g., age/gender, voice quality, emotion). First, a multi-label system is used to classify speech into classes. Then, the output probabilities for each class are concatenated to form a vector that represents the vocal signature of a speech recording. Finally, a similarity search is performed on the vocal signatures to determine the set of target actors that are the most similar to a speech recording of a source actor. In a subjective experiment conducted in the real context of voice casting for video games, the multi-label system clearly outperforms standard speaker recognition systems. This suggests that speech classes successfully capture the principal directions that are used in the perception of voice similarity.
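
The three steps described in the abstract (per-class probabilities, concatenation into a vocal signature, similarity search) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the function names, the class groupings, the probability values, and the use of cosine similarity as the similarity measure, which the abstract does not specify. The classifier itself is not modelled; its output probabilities are taken as given.

```python
import math


def vocal_signature(class_probabilities):
    """Concatenate the per-category probability lists into one flat vector."""
    signature = []
    for probabilities in class_probabilities.values():
        signature.extend(probabilities)
    return signature


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(source, targets, k=2):
    """Return the k target actors whose signatures are closest to the source."""
    scored = [(name, cosine_similarity(source, sig)) for name, sig in targets.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical classifier outputs for a source actor (values are made up).
source = vocal_signature({
    "age_gender": [0.8, 0.1, 0.1],
    "voice_quality": [0.7, 0.3],
    "emotion": [0.6, 0.4],
})

# Hypothetical signatures for three target actors.
targets = {
    "actor_a": vocal_signature({
        "age_gender": [0.7, 0.2, 0.1],
        "voice_quality": [0.6, 0.4],
        "emotion": [0.5, 0.5],
    }),
    "actor_b": vocal_signature({
        "age_gender": [0.1, 0.1, 0.8],
        "voice_quality": [0.2, 0.8],
        "emotion": [0.3, 0.7],
    }),
    "actor_c": vocal_signature({
        "age_gender": [0.8, 0.1, 0.1],
        "voice_quality": [0.7, 0.3],
        "emotion": [0.6, 0.4],
    }),
}

ranking = most_similar(source, targets, k=2)
```

In this toy data, `actor_c` has the same signature as the source and therefore ranks first, followed by `actor_a`. A real system would replace the hard-coded lists with the outputs of trained classifiers and might use a different distance.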
Main file
ICASSP14_NO_AR_GB.pdf (118.18 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-00943796, version 1 (08-02-2014)
hal-00943796, version 2 (15-02-2014)

Identifiers

  • HAL Id: hal-00943796, version 2

Cite

Nicolas Obin, Axel Roebel, Grégoire Bachman. On Automatic Voice Casting for Expressive Speech: Speaker Recognition vs. Speech Classification. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2014, Florence, Italy. ⟨hal-00943796v2⟩