Towards Effective and Efficient Sparse Neural Information Retrieval (Sorbonne Université)
Journal article, ACM Transactions on Information Systems, 2024

Towards Effective and Efficient Sparse Neural Information Retrieval

Abstract

Sparse representation learning based on Pre-trained Language Models has seen growing interest in Information Retrieval. Such approaches can take advantage of the proven efficiency of inverted indexes and inherit desirable IR priors such as explicit lexical matching and some degree of interpretability. In this work, we thoroughly develop the framework of sparse representation learning in IR, which unifies term weighting and expansion in a supervised setting. We then build on SPLADE—a sparse expansion-based retriever—and show the extent to which it can benefit from the same training improvements as dense bi-encoders by studying the effect of distillation, hard negative mining, and the Pre-trained Language Model's initialization on its effectiveness, leading to state-of-the-art results in both in- and out-of-domain evaluation settings (SPLADE++). We furthermore propose efficiency improvements, allowing us to reach latency requirements on par with traditional keyword-based approaches (Efficient-SPLADE).
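To illustrate the kind of sparse representation the abstract describes, the following is a toy NumPy sketch of SPLADE-style term weighting and expansion. It assumes a simplified setting: each input token yields a logit vector over the vocabulary (as from an MLM head), and the document representation is obtained by log-saturating the ReLU'd logits and max-pooling over positions. All array shapes and the random logits are illustrative, not the actual model.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, query_len, doc_len = 12, 3, 6

# Hypothetical MLM-head logits: one logit vector over the vocabulary
# per input token position (stand-ins for real model outputs).
query_logits = rng.normal(size=(query_len, vocab_size))
doc_logits = rng.normal(size=(doc_len, vocab_size))

def sparse_rep(logits: np.ndarray) -> np.ndarray:
    # Log-saturated, ReLU'd logits, max-pooled over token positions:
    # nonzero weight on a term absent from the input is "expansion";
    # weight on a term present in the input is "term weighting".
    return np.max(np.log1p(np.maximum(logits, 0.0)), axis=0)

q_vec = sparse_rep(query_logits)  # nonnegative vector over the vocabulary
d_vec = sparse_rep(doc_logits)

# Relevance is a dot product over shared nonzero terms, which is exactly
# the operation an inverted index evaluates efficiently.
score = float(q_vec @ d_vec)
print(q_vec.shape, score)
```

Because the pooled weights are nonnegative and typically sparse (many logits fall below zero and are clipped), each document contributes postings only for its nonzero terms, which is what makes inverted-index retrieval applicable.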


Cite

Thibault Formal, Carlos Lassance, Benjamin Piwowarski, Stéphane Clinchant. Towards Effective and Efficient Sparse Neural Information Retrieval. ACM Transactions on Information Systems, 2024, 42 (5), pp.1-46. ⟨10.1145/3634912⟩. ⟨hal-04787990⟩