Binaural Localization of Multiple Sound Sources by Non-Negative Tensor Factorization

This paper presents a non-negative factorization of audio signals for the binaural localization of multiple sound sources in realistic and unknown sound environments. Non-negative tensor factorization (NTF) provides a sparse representation of multi-channel audio signals in time, frequency, and space that can be exploited in computational audio scene analysis and robot audition for the separation and localization of sound sources. In the proposed formulation, each sound source is represented by means of spectral dictionaries, temporal activations, and its distribution across channels (here, the left and right ears). This distribution, being frequency-dependent, can be interpreted as an explicit estimate of the Head-Related Transfer Function (HRTF) of a binaural head, which can then be converted into an estimated sound source position. Moreover, the semi-supervised formulation of the non-negative factorization allows prior knowledge to be integrated about the sound sources of interest, whose dictionaries can be learned in advance, whereas the remaining sources are treated as background sound, which remains unknown and is estimated on the fly. The proposed NTF-based method is applied here to the binaural localization of multiple speakers in realistic sound environments.
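To make the structure described in the abstract more concrete, the following is a minimal sketch of a non-negative tensor factorization in which each source has a spectral dictionary, temporal activations, and a frequency-dependent gain per channel. It is not the authors' algorithm: the Euclidean cost, the multiplicative updates, the unsupervised setting (no pre-learned dictionaries), and all names (`ntf_fit`, `Q`, `W`, `H`, `n_sources`, `n_atoms`) are assumptions chosen for illustration.

```python
import numpy as np


def ntf_fit(V, n_sources=2, n_atoms=4, n_iter=100, eps=1e-12, seed=0):
    """Fit V[c, f, t] ~ sum_j Q[c, f, j] * (W[j] @ H[j])[f, t] with all factors >= 0.

    V : non-negative array of shape (channels, frequencies, frames).
    Q : per-source, per-frequency channel gains (the "spatial" factor).
    W : per-source spectral dictionaries, shape (sources, frequencies, atoms).
    H : per-source temporal activations, shape (sources, atoms, frames).
    Illustrative only: Euclidean cost with multiplicative updates (an assumption).
    """
    rng = np.random.default_rng(seed)
    C, F, T = V.shape
    Q = rng.random((C, F, n_sources)) + 0.1
    W = rng.random((n_sources, F, n_atoms)) + 0.1
    H = rng.random((n_sources, n_atoms, T)) + 0.1

    def model(Q, W, H):
        S = np.einsum("jfk,jkt->jft", W, H)          # source spectrograms
        return np.einsum("cfj,jft->cft", Q, S), S    # mixture in each channel

    losses = []
    for _ in range(n_iter):
        # Each update multiplies a factor by (negative / positive) gradient parts,
        # which keeps the factor non-negative.
        V_hat, S = model(Q, W, H)
        Q *= np.einsum("cft,jft->cfj", V, S) / (
            np.einsum("cft,jft->cfj", V_hat, S) + eps)

        V_hat, _ = model(Q, W, H)
        W *= np.einsum("cft,cfj,jkt->jfk", V, Q, H) / (
            np.einsum("cft,cfj,jkt->jfk", V_hat, Q, H) + eps)

        V_hat, _ = model(Q, W, H)
        H *= np.einsum("cft,cfj,jfk->jkt", V, Q, W) / (
            np.einsum("cft,cfj,jfk->jkt", V_hat, Q, W) + eps)

        V_hat, _ = model(Q, W, H)
        losses.append(0.5 * np.sum((V - V_hat) ** 2))
    return Q, W, H, losses


# Toy usage on random non-negative data standing in for a two-channel spectrogram.
rng = np.random.default_rng(1)
V_demo = rng.random((2, 16, 30))
Q, W, H, losses = ntf_fit(V_demo, n_sources=2, n_atoms=3, n_iter=50)

# Per-source, per-frequency level ratio between channel 1 and channel 0, in dB.
# Treating channel 0 as left and channel 1 as right is an assumption.
level_ratio_db = 10.0 * np.log10((Q[1] + 1e-12) / (Q[0] + 1e-12))
```

In this sketch the array `Q` plays the role of the frequency-dependent channel distribution mentioned in the abstract. Mapping such a cue to a source position would additionally require a set of reference HRTFs, which is outside the scope of the example.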

Keywords

robot audition; binaural localization; computational audio scene analysis; non-negative tensor factorization

Domains

Signal and Image Processing [eess.SP]; Machine Learning [stat.ML]

Main file

Binaural_Localization_of_Multiple_Sound.pdf (4 MB)

Origin: Files produced by the author(s)

Contributor: Nicolas Obin

https://hal.sorbonne-universite.fr/hal-01722004

Submitted on: Friday, March 2, 2018, 18:42:53

Last modified on: Thursday, December 12, 2024, 03:43:22

Long-term archiving on: Thursday, May 31, 2018, 20:10:24

Dates and versions

hal-01722004, version 1 (02-03-2018)

Identifiers

HAL Id: hal-01722004, version 1
DOI: 10.1109/TASLP.2018.2806745

Cite

Laurent Benaroya, Nicolas Obin, Marco Liuni, Axel Roebel, Wilson Raumel, et al. Binaural Localization of Multiple Sound Sources by Non-Negative Tensor Factorization. IEEE/ACM Transactions on Audio, Speech and Language Processing, 2018, pp. 1-1. ⟨10.1109/TASLP.2018.2806745⟩. ⟨hal-01722004⟩


Collections

CNRS PARISTECH IRCAM ISIR STMS SORBONNE-UNIVERSITE LTCI SU-SCIENCES ISIR_AMAC INSTITUT-MINES-TELECOM

315 views

469 downloads