On the Generalization of Shannon Entropy for Speech Recognition

Nicolas Obin; Marco Liuni

Conference paper. Year: 2012

Nicolas Obin

Role: Author
PersonId: 7042
IdHAL: nicolas-obin
ORCID: 0000-0002-5236-5306
IdRef: 157523799

Sciences et Technologies de la Musique et du Son

Marco Liuni

Role: Author
PersonId: 911040

Institut de Recherche et Coordination Acoustique/Musique

Dipartimento di Matematica "Ulisse Dini"

Abstract

This paper introduces an entropy-based spectral representation as a measure of the degree of noisiness in audio signals, complementary to the standard MFCCs for audio and speech recognition. The proposed representation is based on the Rényi entropy, which is a generalization of the Shannon entropy. In audio signal representation, Rényi entropy presents the advantage of focusing either on the harmonic content (prominent amplitude within a distribution) or on the noise content (equal distribution of amplitudes). The proposed representation outperforms all other noisiness measures - including Shannon and Wiener entropies - in a large-scale classification of vocal effort (whispered-soft/normal/loud-shouted) in the real scenario of multi-language massive role-playing video games. The improvement is around 10% in relative error reduction, and is particularly significant for the recognition of noisy speech - i.e., whispery/breathy speech. This confirms the role of noisiness for speech recognition, and will further be extended to the classification of voice quality for the design of an automatic voice casting system in video games.
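The abstract's claim that Rényi entropy can focus on either the prominent peaks or the flat part of a distribution can be illustrated with the standard definition, H_alpha(p) = log(sum_i p_i^alpha) / (1 - alpha), which tends to the Shannon entropy as alpha approaches 1. The sketch below is only an illustration of that definition, not the feature extraction used in the paper: the function names `normalize` and `renyi_entropy`, the toy spectra, and the choice to normalize raw magnitudes into a probability distribution are all assumptions made for this example.

```python
import math


def normalize(spectrum):
    """Scale a non-negative spectrum so its values sum to 1 (assumed preprocessing)."""
    total = sum(spectrum)
    if total <= 0:
        raise ValueError("spectrum must have positive total energy")
    return [x / total for x in spectrum]


def renyi_entropy(spectrum, alpha):
    """Rényi entropy of order alpha, in nats, of a non-negative spectrum.

    Hypothetical helper: alpha == 1 is handled as the Shannon limit.
    """
    if alpha < 0:
        raise ValueError("alpha must be non-negative")
    # Zero bins contribute nothing and are dropped to avoid log(0).
    p = [x for x in normalize(spectrum) if x > 0]
    if math.isclose(alpha, 1.0):
        return -sum(x * math.log(x) for x in p)
    return math.log(sum(x ** alpha for x in p)) / (1.0 - alpha)


# Toy spectra (made up for illustration): a flat, noise-like spectrum and
# a spectrum dominated by one strong partial.
flat = [1.0] * 8
peaky = [100.0] + [1.0] * 7

for alpha in (0.5, 1.0, 3.0):
    print(alpha, renyi_entropy(flat, alpha), renyi_entropy(peaky, alpha))
```

For the flat spectrum the entropy equals log(8) at every order, while for the peaky spectrum it decreases as alpha grows: large orders weight the dominant bin, small orders weight the low-level bins. This is the sense in which the order acts as a tunable emphasis between harmonic and noise content.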

Keywords

information theory; spectral entropy; speech recognition; expressive speech; voice quality; video games

Domains

Signal and Image Processing [eess.SP]; Theory [stat.TH]; Machine Learning [stat.ML]; Applications [stat.AP]

Main file

SLT12_NO_ML.pdf (1.45 MB)

Origin: Files produced by the author(s)

https://hal.sorbonne-universite.fr/hal-00737653

Submitted on: Tuesday, 2 October 2012, 14:04:39

Last modified on: Friday, 20 December 2024, 11:38:17

Long-term archiving on: Friday, 16 December 2016, 19:31:01

Dates and versions

hal-00737653, version 1 (02-10-2012)

Identifiers

HAL Id: hal-00737653, version 1

Cite

Nicolas Obin, Marco Liuni. On the Generalization of Shannon Entropy for Speech Recognition. IEEE workshop on Spoken Language Technology, Dec 2012, United States. ⟨hal-00737653⟩

Collections

UPMC CNRS IRCAM STMS SORBONNE-UNIVERSITE SU-SCIENCES

300 views

525 downloads
