Conference Papers Year : 2021

SPLADE: Sparse Lexical and Expansion Model for First Stage Ranking

Thibault Formal
Benjamin Piwowarski
Stéphane Clinchant

Abstract

In neural Information Retrieval, ongoing research is directed towards improving the first retriever in ranking pipelines. Learning dense embeddings to conduct retrieval using efficient approximate nearest neighbors methods has proven to work well. Meanwhile, there has been a growing interest in learning sparse representations for documents and queries that could inherit the desirable properties of bag-of-words models, such as the exact matching of terms and the efficiency of inverted indexes. In this work, we present a new first-stage ranker based on explicit sparsity regularization and a log-saturation effect on term weights, leading to highly sparse representations and competitive results with respect to state-of-the-art dense and sparse methods. Our approach is simple and trained end-to-end in a single stage. We also explore the trade-off between effectiveness and efficiency by controlling the contribution of the sparsity regularization.

CCS Concepts: Information systems → Language models.
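The two ingredients named in the abstract, log-saturation of term weights and sparsity regularization, can be illustrated with a minimal sketch. The abstract does not give the formulas, so the forms used here are assumptions to be checked against the PDF: each vocabulary term's weight is taken as the sum over input tokens of log(1 + ReLU(logit)), and the regularizer is taken as the sum over terms of the squared mean weight across a batch (a FLOPS-style penalty). The function names and toy logits are hypothetical, and a real model would obtain the logits from a masked-language-model head rather than from hand-written lists.

```python
import math


def log_saturated_weights(token_logits):
    """Turn per-token vocabulary logits into one weight per vocabulary term.

    Assumed form: w_j = sum_i log(1 + max(0, logit_ij)).
    Negative logits contribute exactly zero, which is what makes the
    resulting vector sparse; the log dampens very large logits.
    """
    vocab_size = len(token_logits[0])
    weights = [0.0] * vocab_size
    for row in token_logits:
        for j, logit in enumerate(row):
            weights[j] += math.log1p(max(0.0, logit))
    return weights


def sparsity_regularizer(batch_weights):
    """Assumed FLOPS-style penalty: sum_j (mean over batch of w_j) ** 2."""
    batch_size = len(batch_weights)
    vocab_size = len(batch_weights[0])
    total = 0.0
    for j in range(vocab_size):
        mean_j = sum(vec[j] for vec in batch_weights) / batch_size
        total += mean_j ** 2
    return total


def sparse_dot(query_weights, doc_weights):
    """Relevance score: dot product restricted to terms active on both sides."""
    return sum(q * d for q, d in zip(query_weights, doc_weights) if q and d)


# Toy data: 4-term vocabulary, 2 document tokens, 1 query token (hypothetical).
doc_logits = [[2.0, -1.0, 0.0, 0.5], [1.0, -3.0, 0.0, -0.2]]
query_logits = [[1.5, -0.5, 0.0, -1.0]]

doc_w = log_saturated_weights(doc_logits)
query_w = log_saturated_weights(query_logits)
score = sparse_dot(query_w, doc_w)
penalty = sparsity_regularizer([doc_w, query_w])
```

In training, a penalty of this kind would be added to the ranking loss with a weighting coefficient; raising that coefficient pushes more term weights to zero, which is the effectiveness and efficiency trade-off the abstract mentions.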
Main file
3404835.3463098.pdf (1.06 MB)
Origin : Publication funded by an institution

Dates and versions

hal-03290774, version 1 (19-07-2021)

Cite

Thibault Formal, Benjamin Piwowarski, Stéphane Clinchant. SPLADE: Sparse Lexical and Expansion Model for First Stage Ranking. SIGIR '21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Jul 2021, Virtual Event, Canada. pp.2288-2292, ⟨10.1145/3404835.3463098⟩. ⟨hal-03290774⟩