SPLADE: Sparse Lexical and Expansion Model for First Stage Ranking

Thibault Formal; Benjamin Piwowarski; Stéphane Clinchant

doi:10.1145/3404835.3463098

Conference paper. Year: 2021

Thibault Formal

Role: Author
PersonId : 1105732
IdRef : 270784918

Machine Learning and Information Access

Benjamin Piwowarski

Role: Author
PersonId : 9362
IdHAL : benjamin-piwowarski
ORCID : 0000-0001-6792-3262
IdRef : 226846601

Machine Learning and Information Access

Stéphane Clinchant

Role: Author
PersonId : 1105733

Naver Labs Europe [Meylan]

Abstract

In neural Information Retrieval, ongoing research is directed towards improving the first retriever in ranking pipelines. Learning dense embeddings to conduct retrieval using efficient approximate nearest neighbors methods has proven to work well. Meanwhile, there has been a growing interest in learning sparse representations for documents and queries, which could inherit the desirable properties of bag-of-words models, such as the exact matching of terms and the efficiency of inverted indexes. In this work, we present a new first-stage ranker based on explicit sparsity regularization and a log-saturation effect on term weights, leading to highly sparse representations and competitive results with respect to state-of-the-art dense and sparse methods. Our approach is simple and trained end-to-end in a single stage. We also explore the trade-off between effectiveness and efficiency by controlling the contribution of the sparsity regularization.

CCS Concepts: Information systems → Language models.
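The abstract names two ingredients, a log-saturation effect on term weights and an explicit sparsity regularizer, without giving formulas. The sketch below illustrates one way they can be written, assuming the term weight for vocabulary entry j is the sum over input tokens of log(1 + ReLU(logit)) and that the regularizer is a FLOPS-style penalty (the sum over vocabulary terms of the squared mean weight in a batch). The function names and the toy inputs are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import math

def log_saturated_weights(token_logits):
    """Aggregate per-token vocabulary logits into one term-weight vector.

    token_logits: list of rows, one row per input token, one logit per
    vocabulary term (toy stand-in for a masked-language-model head).
    Negative logits are clipped to zero (ReLU), log(1 + x) dampens large
    values, and the results are summed over tokens (assumed pooling).
    """
    vocab_size = len(token_logits[0])
    return [
        sum(math.log1p(max(0.0, row[j])) for row in token_logits)
        for j in range(vocab_size)
    ]

def flops_regularizer(batch_weights):
    """FLOPS-style sparsity penalty (assumed form).

    For each vocabulary term, take the mean weight across the batch,
    square it, and sum over terms. Terms that are active in many
    documents are penalized most, which pushes weights toward zero.
    """
    n = len(batch_weights)
    vocab_size = len(batch_weights[0])
    return sum(
        (sum(w[j] for w in batch_weights) / n) ** 2
        for j in range(vocab_size)
    )

# Toy example: 2 tokens, 3 vocabulary terms.
logits = [[-1.0, 0.0, math.e - 1.0],
          [2.0, -3.0, 0.0]]
weights = log_saturated_weights(logits)

# Toy batch of two already-computed weight vectors over 2 terms.
penalty = flops_regularizer([[1.0, 0.0], [3.0, 0.0]])
```

In such a setup the penalty would be added to the ranking loss with a scalar coefficient; raising that coefficient yields sparser vectors, which is the effectiveness versus efficiency trade-off the abstract refers to.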

Keywords

neural networks; indexing; sparse representations; regularization

Domains

Computer Science [cs]

Main file

3404835.3463098.pdf (1.06 MB)

Origin: Publication funded by an institution

https://hal.sorbonne-universite.fr/hal-03290774

Submitted on: Monday, 19 July 2021, 14:51:04

Last modified on: Saturday, 7 October 2023, 21:36:22

Long-term archiving on: Wednesday, 20 October 2021, 18:54:23

Dates and versions

hal-03290774 , version 1 (19-07-2021)

Identifiers

HAL Id : hal-03290774 , version 1
DOI : 10.1145/3404835.3463098

Cite

Thibault Formal, Benjamin Piwowarski, Stéphane Clinchant. SPLADE: Sparse Lexical and Expansion Model for First Stage Ranking. SIGIR '21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Jul 2021, Virtual Event, Canada. pp.2288-2292, ⟨10.1145/3404835.3463098⟩. ⟨hal-03290774⟩

Collections

CNRS LIP6 SORBONNE-UNIVERSITE SU-SCIENCES
