On Automatic Voice Casting for Expressive Speech: Speaker Recognition vs. Speech Classification
Conference paper, Year: 2014

Abstract

This paper presents the first large-scale automatic voice casting system and explores the adaptation of speaker recognition techniques to measure voice similarity. The proposed system represents a voice by classes (e.g., age/gender, voice quality, emotion). First, a multi-label system classifies speech into these classes. Then, the output probabilities for each class are concatenated into a vector that represents the vocal signature of a speech recording. Finally, a similarity search over the vocal signatures determines the set of target actors most similar to a speech recording of a source actor. In a subjective experiment conducted in the real context of voice casting for video games, the multi-label system clearly outperforms standard speaker recognition systems. This suggests that speech classes successfully capture the principal directions used in the perception of voice similarity.
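
The three steps described in the abstract (classification, concatenation into a vocal signature, similarity search) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration only: the abstract does not state which similarity measure is used, so cosine similarity is swapped in, and the function names, label-set names, actor names, and probability values are all hypothetical stand-ins for the outputs of a trained multi-label classifier.

```python
import math


def vocal_signature(class_probs):
    """Concatenate per-label-set probability vectors into one signature.

    class_probs maps a label-set name (hypothetical, e.g. "emotion") to the
    classifier's output probabilities for that label set.
    """
    signature = []
    # Iterate in sorted key order so every signature uses the same layout.
    for label_set in sorted(class_probs):
        signature.extend(class_probs[label_set])
    return signature


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_targets(source_sig, target_sigs, top_k=2):
    """Return the top_k target actors most similar to the source signature."""
    scored = [
        (name, cosine_similarity(source_sig, sig))
        for name, sig in target_sigs.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Made-up classifier outputs for one source recording and three target actors.
source = vocal_signature({
    "age_gender": [0.8, 0.1, 0.1],
    "emotion": [0.7, 0.3],
    "voice_quality": [0.2, 0.8],
})
targets = {
    "actor_a": vocal_signature({
        "age_gender": [0.7, 0.2, 0.1],
        "emotion": [0.6, 0.4],
        "voice_quality": [0.3, 0.7],
    }),
    "actor_b": vocal_signature({
        "age_gender": [0.1, 0.1, 0.8],
        "emotion": [0.2, 0.8],
        "voice_quality": [0.9, 0.1],
    }),
    "actor_c": vocal_signature({
        "age_gender": [0.8, 0.1, 0.1],
        "emotion": [0.2, 0.8],
        "voice_quality": [0.2, 0.8],
    }),
}

ranking = rank_targets(source, targets)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

With these made-up values, the target whose class probabilities are closest to the source across all label sets ranks first. In a real system the probability vectors would come from the trained classifiers, and the similarity measure would be whatever the paper specifies.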
Main file
ICASSP14_NO_AR_GB.pdf (118.18 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-00943796, version 1 (08-02-2014)
hal-00943796, version 2 (15-02-2014)

Identifiers

  • HAL Id: hal-00943796, version 2

Cite

Nicolas Obin, Axel Roebel, Grégoire Bachman. On Automatic Voice Casting for Expressive Speech: Speaker Recognition vs. Speech Classification. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2014, Florence, Italy. ⟨hal-00943796v2⟩