Shennong: a Python toolbox for audio speech features extraction

We introduce Shennong, a Python toolbox and command-line utility for speech feature extraction. It implements a wide range of well-established, state-of-the-art algorithms, including spectro-temporal filters such as Mel-Frequency Cepstral Filterbanks or Predictive Linear Filters, pre-trained neural networks, pitch estimators, speaker normalization methods, and post-processing algorithms. Shennong is an open-source, easy-to-use, reliable, and extensible framework. Because it is written in Python, it integrates easily with other speech modeling and machine learning tools. It aims to replace or complement several heterogeneous software packages, such as Kaldi or Praat. After describing the Shennong software architecture, its core components, and the implemented algorithms, this paper illustrates its use in three applications: a comparison of the performance of speech features on a phone discrimination task, an analysis of a Vocal Tract Length Normalization model as a function of the duration of speech used for training, and a comparison of pitch estimation algorithms under various noise conditions.

Main file

2112.05555.pdf (518.42 KB)

Origin: Files produced by the author(s)
License: HAL authorization

https://inria.hal.science/hal-03901826

Submitted on: Thursday, December 15, 2022, 16:30:20

Last modified on: Monday, March 30, 2026, 17:35:30

Dates and versions

hal-03901826, version 1 (15-12-2022)

Identifiers

HAL Id: hal-03901826, version 1

Cite

Mathieu Bernard, Maxime Poli, Julien Karadayi, Emmanuel Dupoux. Shennong: a Python toolbox for audio speech features extraction. Behavior Research Methods, 2021. ⟨hal-03901826⟩
