Fine-tuning 3D foundation models for geometric object retrieval

Jarne van den Herrewegen; Tom Tourwé; Maks Ovsjanikov; Francis Wyffels

doi:10.1016/j.cag.2024.103993

Journal article, Computers and Graphics, Year: 2024

Jarne van den Herrewegen

Role: Author
PersonId: 1479352
ORCID: 0000-0002-7831-5390

Oqton AI

Universiteit Gent = Ghent University = Université de Gand

Tom Tourwé

Role: Author
PersonId: 1479353

Oqton AI

Maks Ovsjanikov

Role: Author
PersonId: 1209438
ORCID: 0000-0002-5867-4046

Laboratoire d'informatique de l'École polytechnique [Palaiseau]

La Géometrie au Service du Numérique

Francis Wyffels

Role: Author
PersonId: 1358080

Universiteit Gent = Ghent University = Université de Gand

Abstract

Foundation models such as ULIP-2 (Xue et al., 2023) have recently propelled the field of 3D deep learning forward. These models are trained on significantly more data and show superior representation learning capacity in many downstream tasks, such as 3D shape classification and few-shot part segmentation. A particular characteristic of recent 3D foundation models is that they are typically multi-modal, involving image (2D) as well as caption (text) branches. This leads to an intricate interplay that benefits all modalities. At the same time, the nature of the 3D encoders alone in these foundation models is not well understood. Specifically, there is little analysis of either the utility of the pre-trained 3D features provided by these models or their capacity to adapt to new downstream 3D data. Furthermore, existing studies typically focus on label-oriented downstream tasks, such as shape classification, and ignore other critical applications, such as 3D content-based object retrieval.

In this paper, we fill this gap and show, for the first time, how 3D foundation models can be leveraged for strong 3D-to-3D retrieval performance on seven different datasets, on par with state-of-the-art view-based architectures. We evaluate both the pre-trained foundation models and their fine-tuned versions using downstream data. We compare supervised fine-tuning using classification labels against two self-supervised, label-free fine-tuning methods. Importantly, we introduce and describe a methodology for fine-tuning, as we found this to be crucial to making transfer learning from 3D foundation models work in a stable manner.

Keywords

Object retrieval; Deep learning; 3D; Transfer learning; Foundation models; Self-supervised learning

Domains

Computer Vision and Pattern Recognition [cs.CV]

Main file

3DOR2024_3D_foundation_retrieval.pdf (1.38 MB)

Origin	Files produced by the author(s)
License	Attribution

Contributor: Jiong Chen

https://hal.science/hal-04840018

Submitted on: Monday, December 16, 2024, 11:23:53

Last modified on: Thursday, December 19, 2024, 09:33:14

Dates and versions

hal-04840018, version 1 (16-12-2024)

License

Attribution

Identifiers

HAL Id: hal-04840018, version 1
DOI: 10.1016/j.cag.2024.103993

Cite

Jarne van den Herrewegen, Tom Tourwé, Maks Ovsjanikov, Francis Wyffels. Fine-tuning 3D foundation models for geometric object retrieval. Computers and Graphics, 2024, 122, pp.103993. ⟨10.1016/j.cag.2024.103993⟩. ⟨hal-04840018⟩

Collections

X CNRS INRIA LIX X-LIX X-DEP-INFO INRIA2 IP_PARIS ANR GS-COMPUTER-SCIENCE
