RUBi: Reducing Unimodal Biases for Visual Question Answering

Remi Cadene; Corentin Dancette; Hedi Ben-Younes; Matthieu Cord; Devi Parikh

Conference paper, Year: 2019



Remi Cadene

Role: Author
PersonId : 173554
IdHAL : rcadene
IdRef : 258760966

Machine Learning and Information Access

Corentin Dancette

Role: Author
PersonId : 1066275
IdRef : 270396497

Machine Learning and Information Access

Hedi Ben-Younes

Role: Author

Machine Learning and Information Access

Matthieu Cord

Role: Author
PersonId : 13617
IdHAL : matthieucord
ORCID : 0000-0002-0627-5844
IdRef : 132968126

Machine Learning and Information Access

Devi Parikh

Role: Author
PersonId : 1066276

Facebook AI Research

Georgia Institute of Technology [Atlanta]

Abstract

Visual Question Answering (VQA) is the task of answering questions about an image. Some VQA models exploit unimodal biases to provide the correct answer without using the image information. As a result, they suffer a large drop in performance when evaluated on data outside their training distribution, which makes them unsuitable for real-world settings. We propose RUBi, a new learning strategy to reduce biases in any VQA model. It reduces the importance of the most biased examples, i.e. examples that can be correctly classified without looking at the image. It implicitly forces the VQA model to use both input modalities instead of relying on statistical regularities between the question and the answer. We leverage a question-only model that captures the language biases by identifying when these unwanted regularities are used. This model prevents the base VQA model from learning them by influencing its predictions, which dynamically adjusts the loss to compensate for biases. We validate our contributions by surpassing the current state-of-the-art results on VQA-CP v2, a dataset specifically designed to assess the robustness of VQA models when exposed, at test time, to question biases that differ from those seen during training. Our code is available at github.com/cdancette/rubi.bootstrap.pytorch
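The abstract describes the mechanism only in words. Below is a minimal sketch of the per-example loss, assuming the masking formulation from the full paper as we understand it: the question-only branch's outputs pass through a sigmoid to form a mask, the mask is multiplied element-wise with the base model's answer logits, and training minimizes the cross-entropy of these masked logits plus a separate question-only cross-entropy. The helper names (`sigmoid`, `cross_entropy`, `rubi_losses`) are hypothetical, plain Python stands in for the authors' PyTorch code, and, as a simplification, the same question-only outputs are reused for the question-only loss (a real implementation may add a dedicated classifier on that branch).

```python
import math
from typing import List, Tuple


def sigmoid(x: float) -> float:
    """Numerically stable logistic function mapping any real number into (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def cross_entropy(logits: List[float], target: int) -> float:
    """Softmax cross-entropy for one example, using the log-sum-exp trick."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def rubi_losses(
    vqa_logits: List[float],
    question_only_logits: List[float],
    target: int,
) -> Tuple[float, float, List[float]]:
    """Return (masked-model loss, question-only loss, masked logits).

    Hypothetical helper illustrating the RUBi masking idea; not the authors' code.
    """
    if len(vqa_logits) != len(question_only_logits):
        raise ValueError("both branches must score the same answer vocabulary")
    # Mask in (0, 1) computed from the question-only branch.
    mask = [sigmoid(z) for z in question_only_logits]
    # Answers the question alone favours keep most of their logit;
    # the others are scaled toward zero.
    masked_logits = [a * w for a, w in zip(vqa_logits, mask)]
    loss_qm = cross_entropy(masked_logits, target)
    # Simplification (assumption): reuse the question-only logits for its own loss.
    loss_qo = cross_entropy(question_only_logits, target)
    return loss_qm, loss_qo, masked_logits


if __name__ == "__main__":
    vqa = [2.0, 0.0, 0.0]  # the base model already prefers answer 0
    biased_qm, _, _ = rubi_losses(vqa, [4.0, -4.0, -4.0], target=0)
    neutral_qm, _, _ = rubi_losses(vqa, [0.0, 0.0, 0.0], target=0)
    print(f"biased example: {biased_qm:.3f}, neutral example: {neutral_qm:.3f}")
```

With identical base-model logits, the example whose question alone already points to the correct answer yields a smaller masked loss than the neutral one, so the base model receives a weaker training signal from it. This is how the most biased examples end up down-weighted. The sketch computes no gradients; a framework such as PyTorch would handle backpropagation in practice.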

Domains

Machine Learning [cs.LG]; Computer Vision and Pattern Recognition [cs.CV]

Main file

8371-rubi-reducing-unimodal-biases-for-visual-question-answering.pdf (6.16 MB)

Origin: Files produced by the author(s)


https://hal.sorbonne-universite.fr/hal-02507524

Submitted on: Friday, 13 March 2020, 11:47:11

Last modified on: Saturday, 7 October 2023, 21:36:22

Long-term archiving on: Sunday, 14 June 2020, 13:37:08

Dates and versions

hal-02507524 , version 1 (13-03-2020)

Identifiers

HAL Id : hal-02507524 , version 1

Cite

Remi Cadene, Corentin Dancette, Hedi Ben-Younes, Matthieu Cord, Devi Parikh. RUBi: Reducing Unimodal Biases for Visual Question Answering. Neural Information Processing Systems, Dec 2019, Vancouver, Canada. pp.841-852. ⟨hal-02507524⟩


Collections

CNRS LIP6 SORBONNE-UNIVERSITE SU-SCIENCES

