Simple Domain Adaptation for Sparse Retrievers

Mathias Vast; Yuxuan Zong; Benjamin Piwowarski; Laure Soulier

doi:10.1007/978-3-031-56063-7_32

Book chapter, Year: 2024

French title: Adaptation de Domaine Simple pour la Recherche Parcimonieuse

Mathias Vast

Role: Corresponding author
PersonId: 1366751

Machine Learning and Information Access

Institut des Systèmes Intelligents et de Robotique

Yuxuan Zong

Role: Author
PersonId: 1366752
ORCID: 0009-0002-0376-1369

Institut des Systèmes Intelligents et de Robotique

Machine Learning and Information Access

Benjamin Piwowarski

Role: Author
PersonId: 9362
IdHAL: benjamin-piwowarski
ORCID: 0000-0001-6792-3262
IdRef: 226846601

Machine Learning and Information Access

Centre National de la Recherche Scientifique

Laure Soulier

Role: Author
PersonId: 8070
IdHAL: soulierl
ORCID: 0000-0001-9827-7400
IdRef: 189293683

Machine Learning and Information Access

Abstract

In Information Retrieval, and more generally in Natural Language Processing, models are adapted to specific domains through fine-tuning. Despite the successes achieved by this method and its versatility, the need for human-curated and labeled data makes it impractical to transfer to new tasks, domains, and/or languages when training data does not exist. Using the model without training (zero-shot) is another option, but it comes at a cost in effectiveness, especially in the case of first-stage retrievers. Numerous research directions have emerged to tackle these issues, most of them in the context of adapting to a task or a language. However, the literature is scarcer for domain (or topic) adaptation. In this paper, we address this issue of cross-topic discrepancy for a sparse first-stage retriever by transposing a method initially designed for language adaptation. By leveraging pre-training on the target data to learn domain-specific knowledge, this technique alleviates the need for annotated data and expands the scope of domain adaptation. We show that even sparse retrievers, despite their relatively good generalization ability, can benefit from our simple domain adaptation method.

Summary (translated from the French abstract): In Information Retrieval (IR), model training relies heavily on the "pre-train then fine-tune" approach. Despite its very good results, this method requires access to a labeled dataset, which complicates its application to new domains or languages, particularly low-resource ones. This article proposes a simple solution for transferring a sparse retrieval model, SPLADE, to domains without labeled data.

Keywords

Pretrained Language Models; Cross-Topic Adaptation; Zero-Shot

Domains

Information Retrieval [cs.IR]

Main file

camera_ready_pdf.pdf (431.19 KB)

Origin: Files produced by the author(s)

Contributor: Mathias Vast

https://hal.sorbonne-universite.fr/hal-04517668

Submitted on: Friday, March 22, 2024, 16:44:09

Last modified on: Wednesday, October 30, 2024, 13:28:03

Dates and versions

hal-04517668, version 1 (22-03-2024)

Identifiers

HAL Id: hal-04517668, version 1
arXiv: 2401.11509
DOI: 10.1007/978-3-031-56063-7_32

Cite

Mathias Vast, Yuxuan Zong, Benjamin Piwowarski, Laure Soulier. Simple Domain Adaptation for Sparse Retrievers. Advances in Information Retrieval, 14610, Springer Nature Switzerland, pp. 403-412, 2024, Lecture Notes in Computer Science, ⟨10.1007/978-3-031-56063-7_32⟩. ⟨hal-04517668⟩

Collections

CNRS ISIR SORBONNE-UNIVERSITE SU-SCIENCES ANR ISIR_MLIA
