Similarity Search of Acted Voices for Automatic Voice Casting

Nicolas Obin; Axel Roebel

doi:10.1109/TASLP.2016.2580302

Journal article. IEEE/ACM Transactions on Audio, Speech and Language Processing. Year: 2016


Nicolas Obin

Role: Author
PersonId : 7042
IdHAL : nicolas-obin
ORCID : 0000-0002-5236-5306
IdRef : 157523799

Analyse et synthèse sonores [Paris]

Axel Roebel

Role: Author
PersonId : 4527
IdHAL : axel-roebel
ORCID : 0000-0001-6136-4391
IdRef : 227186079

Analyse et synthèse sonores [Paris]

Abstract

This paper presents a large-scale similarity search of professionally acted voices for computer-aided voice casting. The proposed voice casting system explores Gaussian mixture model-based acoustic models and multilabel recognition of perceived paralinguistic content (speaker states and speaker traits, e.g., age/gender, voice quality, emotion) for the voice casting of professionally acted voices. First, acoustic models (universal background model, super-vector, i-vector) are constructed to model the acoustic space of voices, from which the similarity between voices can be measured directly in the acoustic space. Second, multiple binary classification of speaker traits and states is added to the acoustic models in order to represent the vocal signature of a voice, which is then used to measure the similarity between voices in the paralinguistic space. Finally, a similarity search is performed in order to determine the set of target actors that are the most similar to the voice of a source actor. In a subjective experiment conducted in the real context of cross-language voice casting, the multilabel scoring system significantly outperforms the acoustic scoring system. This constitutes a proof of concept for the role of perceived paralinguistic categories in the perception of voice similarity.
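
The final similarity-search step can be illustrated with a minimal sketch. It assumes each voice has already been reduced to a fixed-length vector (either an acoustic embedding such as an i-vector, or a vocal signature made of one classifier score per paralinguistic label) and that voices are compared with cosine similarity. The abstract does not specify the similarity measure, so cosine similarity, the function names, the actor names, and all numeric values below are illustrative assumptions rather than the authors' method.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_targets(
    source: Sequence[float],
    targets: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the top_k target voices, most similar to the source first."""
    scored = [(name, cosine_similarity(source, vec)) for name, vec in targets.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical vocal signatures: one score per paralinguistic label.
# Labels and values are invented for illustration only.
source_signature = [0.9, 0.1, 0.7]
target_signatures = {
    "actor_a": [0.8, 0.2, 0.6],
    "actor_b": [0.1, 0.9, 0.2],
    "actor_c": [0.9, 0.1, 0.1],
}

ranking = rank_targets(source_signature, target_signatures, top_k=2)
print(ranking)
```

The same `rank_targets` helper would apply unchanged to acoustic vectors, which is one way to compare the two scoring spaces the abstract contrasts: only the vectors passed in differ.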

Keywords

Multi-label classification; para-linguistics; speaker recognition; speaker traits and states; voice casting; voice similarity

Domains

Signal and Image Processing [eess.SP]; Machine Learning [stat.ML]

Main file

taslp-obin-2580302-proof.pdf (598.18 KB)

Origin: Files produced by the author(s)


https://hal.sorbonne-universite.fr/hal-01464715

Submitted on: Friday, February 10, 2017, 14:11:14

Last modified on: Friday, March 24, 2023, 14:53:03

Long-term archiving on: Thursday, May 11, 2017, 13:54:56

Dates and versions

hal-01464715 , version 1 (10-02-2017)

Identifiers

HAL Id : hal-01464715 , version 1
DOI : 10.1109/TASLP.2016.2580302

Cite

Nicolas Obin, Axel Roebel. Similarity Search of Acted Voices for Automatic Voice Casting. IEEE/ACM Transactions on Audio, Speech and Language Processing, 2016, 24 (9), pp.1642 - 1651. ⟨10.1109/TASLP.2016.2580302⟩. ⟨hal-01464715⟩


Collections

UPMC CNRS IRCAM STMS SORBONNE-UNIVERSITE SU-SCIENCES

450 views

514 downloads
