Revisiting Multi-Task Learning with ROCK: a Deep Residual Auxiliary Block for Visual Detection

Taylor Mordan; Nicolas Thome; Gilles Henaff; Matthieu Cord

Conference paper, Year: 2018



Taylor Mordan

Role: Author

Machine Learning and Information Access

Thales LAS France

Nicolas Thome

Role: Author
PersonId : 181803
IdHAL : nicolas-thome
ORCID : 0000-0003-4871-3045
IdRef : 12878332X

Centre d'études et de recherche en informatique et communications

Gilles Henaff

Role: Author

Thales LAS France

Matthieu Cord

Role: Author
PersonId : 13617
IdHAL : matthieucord
ORCID : 0000-0002-0627-5844
IdRef : 132968126

Machine Learning and Information Access

Abstract

Multi-Task Learning (MTL) is appealing for deep learning regularization. In this paper, we tackle a specific MTL context denoted as primary MTL, where the ultimate goal is to improve the performance of a given primary task by leveraging several other auxiliary tasks. Our main methodological contribution is to introduce ROCK, a new generic multi-modal fusion block for deep learning tailored to the primary MTL context. The ROCK architecture is based on a residual connection, which makes forward prediction explicitly impacted by the intermediate auxiliary representations. The auxiliary predictor's architecture is also specifically designed for our primary MTL context, incorporating intensive pooling operators to maximize the complementarity of intermediate representations. Extensive experiments on the NYUv2 dataset (object detection with scene classification, depth prediction, and surface normal estimation as auxiliary tasks) validate the relevance of the approach and its superiority to flat MTL approaches. Our method outperforms state-of-the-art object detection models on NYUv2 by a large margin, and is also able to handle large-scale heterogeneous inputs (real and synthetic images) with missing annotation modalities.
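The residual idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the class name `AuxiliaryBranch`, the function `residual_auxiliary_block`, the plain linear layers, and all weights below are hypothetical, and the pooling operators mentioned in the abstract are omitted. It only shows how intermediate auxiliary representations might be decoded and added back to the shared features, so that the primary prediction depends on them explicitly.

```python
# Hypothetical sketch of a residual auxiliary block in plain Python.
# Names, shapes, and weights are illustrative assumptions, not from the paper.


def matvec(weights, vector):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def relu(vector):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in vector]


class AuxiliaryBranch:
    """One auxiliary task: encode shared features, predict, decode back."""

    def __init__(self, encoder, predictor, decoder):
        self.encoder = encoder      # feature_dim -> hidden_dim
        self.predictor = predictor  # hidden_dim -> auxiliary output dim
        self.decoder = decoder      # hidden_dim -> feature_dim

    def forward(self, features):
        hidden = relu(matvec(self.encoder, features))
        # The auxiliary prediction would be supervised by its own loss.
        aux_prediction = matvec(self.predictor, hidden)
        # The correction is what gets merged back into the shared features.
        correction = matvec(self.decoder, hidden)
        return aux_prediction, correction


def residual_auxiliary_block(features, branches):
    """Add every branch's decoded representation to the input features."""
    refined = list(features)
    aux_predictions = []
    for branch in branches:
        aux_prediction, correction = branch.forward(features)
        aux_predictions.append(aux_prediction)
        refined = [r + c for r, c in zip(refined, correction)]
    return refined, aux_predictions


# Toy usage with two made-up auxiliary branches on a 2-dimensional feature.
features = [1.0, -2.0]
depth_branch = AuxiliaryBranch(
    encoder=[[1.0, 0.0], [0.0, 1.0]],
    predictor=[[1.0, 1.0]],
    decoder=[[0.5, 0.0], [0.0, 0.5]],
)
scene_branch = AuxiliaryBranch(
    encoder=[[0.0, -1.0]],
    predictor=[[1.0], [-1.0]],
    decoder=[[1.0], [0.0]],
)
refined, aux_predictions = residual_auxiliary_block(
    features, [depth_branch, scene_branch]
)
```

In this sketch, the primary task head would consume `refined` rather than `features`. A flat MTL baseline, by contrast, would feed `features` directly to every task head with no decoded correction added back.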

Domains

Computer Vision and Pattern Recognition [cs.CV]; Machine Learning [cs.LG]; Artificial Intelligence [cs.AI]

Main file

ROCK.pdf (3.67 MB)

Origin: Files produced by the author(s)


https://hal.sorbonne-universite.fr/hal-01922291

Submitted on: Tuesday, November 27, 2018, 16:56:01

Last modified on: Friday, July 19, 2024, 11:38:04

Dates and versions

hal-01922291 , version 1 (14-11-2018)

hal-01922291 , version 2 (27-11-2018)

hal-01922291 , version 3 (20-12-2018)

Identifiers

HAL Id : hal-01922291 , version 2

Cite

Taylor Mordan, Nicolas Thome, Gilles Henaff, Matthieu Cord. Revisiting Multi-Task Learning with ROCK: a Deep Residual Auxiliary Block for Visual Detection. Neural Information Processing Systems (NIPS), Dec 2018, Montréal, Canada. ⟨hal-01922291v2⟩


