ONLINE SPEAKER DIARIZATION OF MEETINGS GUIDED BY SPEECH SEPARATION

Elio Gruttadauria; Mathieu Fontaine; Slim Essid

Conference paper, Year: 2024

Elio Gruttadauria

Role: Author
PersonId: 1274532
IdHAL: elio-gruttadauria
ORCID: 0009-0009-6084-4110

Signal, Statistique et Apprentissage

Département Images, Données, Signal

Mathieu Fontaine

Role: Author
PersonId: 13405
IdHAL: mathieu-fontaine
ORCID: 0000-0002-7657-6271
IdRef: 236886681

Signal, Statistique et Apprentissage

Département Images, Données, Signal

Slim Essid

Role: Author
PersonId: 181234
IdHAL: slimessid
ORCID: 0000-0002-0028-327X
IdRef: 11025130X

Signal, Statistique et Apprentissage

Département Images, Données, Signal

Abstract

Overlapped speech is notoriously problematic for speaker diarization systems. Consequently, speech separation has recently been proposed as a way to improve their performance. Although promising, speech separation models struggle with realistic data because they are trained on simulated mixtures with a fixed number of speakers. In this work, we introduce a new speech separation-guided diarization scheme suited to the online speaker diarization of long meeting recordings with a variable number of speakers, such as those in the AMI corpus. We consider ConvTasNet and DPRNN as alternative separation networks, each with two or three output sources. To obtain the speaker diarization result, voice activity detection is applied to each estimated source. The separation network is first adapted to real data using AMI, and the final model is then fine-tuned end-to-end. The system operates on short segments, and inference is performed by stitching the local predictions together using speaker embeddings and incremental clustering. The results show that our system improves on the state of the art on the AMI headset mix, using no oracle information and under full evaluation (no collar and including overlapped speech). Finally, we show that our system is particularly strong on overlapped speech sections.
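The inference scheme described in the abstract (voice activity detection on each separated source, then stitching of short-segment predictions through speaker embeddings and incremental clustering) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: a frame-energy threshold stands in for the voice activity detector, toy 2-D vectors stand in for learned speaker embeddings, a running-centroid rule with a cosine threshold stands in for the incremental clustering, and the names `frame_energy_vad`, `IncrementalClusterer` and `diarize_chunk` are hypothetical.

```python
import math


def frame_energy_vad(source, frame_len=4, threshold=0.01):
    """Toy VAD (assumption): a frame is active if its mean energy exceeds threshold."""
    flags = []
    for start in range(0, len(source) - frame_len + 1, frame_len):
        frame = source[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        flags.append(energy > threshold)
    return flags


def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


class IncrementalClusterer:
    """Hypothetical online clustering: match to the closest centroid or open a new one."""

    def __init__(self, threshold=0.7):
        self.threshold = threshold  # assumed similarity threshold
        self.centroids = []
        self.counts = []

    def assign(self, embedding):
        best_id, best_sim = None, -1.0
        for i, centroid in enumerate(self.centroids):
            sim = cosine(embedding, centroid)
            if sim > best_sim:
                best_id, best_sim = i, sim
        if best_id is None or best_sim < self.threshold:
            # No sufficiently similar speaker seen so far: create a new one.
            self.centroids.append(list(embedding))
            self.counts.append(1)
            return len(self.centroids) - 1
        # Update the matched centroid as a running mean of its embeddings.
        n = self.counts[best_id]
        self.centroids[best_id] = [
            (c * n + e) / (n + 1) for c, e in zip(self.centroids[best_id], embedding)
        ]
        self.counts[best_id] = n + 1
        return best_id


def diarize_chunk(sources, embeddings, clusterer, frame_len=4, vad_threshold=0.01):
    """Map each active separated source of one chunk to a global speaker id."""
    result = {}
    for source, emb in zip(sources, embeddings):
        activity = frame_energy_vad(source, frame_len, vad_threshold)
        if not any(activity):
            continue  # silent output channel: no speaker to assign
        speaker = clusterer.assign(emb)
        if speaker in result:
            # Two outputs mapped to the same speaker: merge their activity.
            activity = [a or b for a, b in zip(result[speaker], activity)]
        result[speaker] = activity
    return result


# Two consecutive chunks, each with two separated outputs (made-up values).
clusterer = IncrementalClusterer(threshold=0.7)
chunk1 = diarize_chunk(
    sources=[[0.5, -0.5, 0.5, -0.5, 0.0, 0.0, 0.0, 0.0], [0.0] * 8],
    embeddings=[[1.0, 0.0], [0.0, 1.0]],
    clusterer=clusterer,
)
chunk2 = diarize_chunk(
    sources=[[0.0] * 4 + [0.3] * 4, [0.4] * 8],
    embeddings=[[0.0, 1.0], [0.9, 0.1]],
    clusterer=clusterer,
)
print(chunk1)
print(chunk2)
```

In this toy run the second output of the second chunk is matched to the speaker first seen in the first chunk, although it appears on a different output channel. This is the role the abstract assigns to speaker embeddings when stitching local predictions. A real system would also need a rule that prevents two simultaneous outputs from collapsing onto one speaker; the merge used here is only a placeholder.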

Keywords

Speaker diarization, Source separation, Online inference, Overlapped speech, AMI dataset, Speaker embedding

Domains

Machine Learning [cs.LG], Sound [cs.SD], Signal and Image Processing [eess.SP]

Main file

ICASSP_2024_ELIO_GRUTTADAURIA-final.pdf (606.07 KB)

Origin: Files produced by the author(s)

https://hal.science/hal-04419041

Submitted on: Monday, January 29, 2024, 11:06:03

Last modified on: Friday, February 2, 2024, 16:51:37

Dates and versions

hal-04419041, version 1 (29-01-2024)

Identifiers

HAL Id: hal-04419041, version 1

Cite

Elio Gruttadauria, Mathieu Fontaine, Slim Essid. ONLINE SPEAKER DIARIZATION OF MEETINGS GUIDED BY SPEECH SEPARATION. IEEE International Conference on Acoustics, Speech, and Signal Processing, Apr 2024, Seoul, South Korea. ⟨hal-04419041⟩

Collections

INSTITUT-TELECOM LTCI IDS S2A IP_PARIS ANR
