On Automatic Voice Casting for Expressive Speech: Speaker Recognition vs. Speech Classification

Nicolas Obin; Axel Roebel; Grégoire Bachman

Conference paper, Year: 2014

Nicolas Obin

Role: Author
PersonId : 7042
IdHAL : nicolas-obin
ORCID : 0000-0002-5236-5306
IdRef : 157523799

Sciences et Technologies de la Musique et du Son

Axel Roebel

Role: Author
PersonId : 4527
IdHAL : axel-roebel
ORCID : 0000-0001-6136-4391
IdRef : 227186079

Sciences et Technologies de la Musique et du Son

Grégoire Bachman

Role: Author

ExeQuo

Abstract

This paper presents the first large-scale automatic voice casting system, and explores the adaptation of speaker recognition techniques to measure voice similarities. The proposed system represents a voice by classes (e.g., age/gender, voice quality, emotion). First, a multi-label system classifies speech into classes. Then, the output probabilities for each class are concatenated into a vector that serves as the vocal signature of a speech recording. Finally, a similarity search over the vocal signatures determines the set of target actors most similar to a speech recording of a source actor. In a subjective experiment conducted in the real context of voice casting for video games, the multi-label system clearly outperforms standard speaker recognition systems. This suggests that speech classes successfully capture the principal directions used in the perception of voice similarity.
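The three steps described in the abstract (classification, concatenation of class probabilities, similarity search) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names `vocal_signature` and `rank_by_similarity`, the toy probability values, the actor names, and the use of cosine similarity (the abstract does not say which similarity measure is used). The classifier itself is not shown; its output probabilities are supplied by hand.

```python
import numpy as np

def vocal_signature(class_probabilities):
    """Concatenate the per-category class probabilities into one vector."""
    return np.concatenate([np.asarray(p, dtype=float) for p in class_probabilities])

def rank_by_similarity(source, targets):
    """Return target names ordered from most to least similar to the source.

    Cosine similarity is an assumption; the abstract does not name a measure.
    """
    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    scores = {name: cosine(source, sig) for name, sig in targets.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical classifier outputs for three categories
# (e.g. gender, voice quality, emotion); the values are made up.
source = vocal_signature([[0.9, 0.1], [0.2, 0.8], [0.7, 0.2, 0.1]])
targets = {
    "actor_a": vocal_signature([[0.8, 0.2], [0.3, 0.7], [0.6, 0.3, 0.1]]),
    "actor_b": vocal_signature([[0.1, 0.9], [0.9, 0.1], [0.1, 0.1, 0.8]]),
}
ranking = rank_by_similarity(source, targets)
```

In this toy setup `ranking` lists the target actors from closest to farthest, which corresponds to the shortlist a casting director would review.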

Keywords

speaker recognition; speech classification; voice casting; voice similarity

Domains

Signal and Image Processing [eess.SP]; Sound [cs.SD]; Machine Learning [stat.ML]; Applications [stat.AP]

Main file

ICASSP14_NO_AR_GB.pdf (118.18 KB)

Origin: Files produced by the author(s)

Contributor: Nicolas Obin

https://hal.sorbonne-universite.fr/hal-00943796

Submitted on: Saturday, February 15, 2014, 11:06:05

Last modified on: Friday, March 24, 2023, 14:52:58

Long-term archiving on: Thursday, May 15, 2014, 11:11:46

Dates and versions

hal-00943796 , version 1 (08-02-2014)

hal-00943796 , version 2 (15-02-2014)

Identifiers

HAL Id : hal-00943796 , version 2

Cite

Nicolas Obin, Axel Roebel, Grégoire Bachman. On Automatic Voice Casting for Expressive Speech: Speaker Recognition vs. Speech Classification. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2014, Florence, Italy. ⟨hal-00943796v2⟩

Collections

UPMC CNRS IRCAM STMS SORBONNE-UNIVERSITE SU-SCIENCES

282 views

379 downloads
