ImageNet classification with deep convolutional neural networks, Advances in Neural Information Processing Systems, vol.25, pp.1097-1105, 2012.
Efficient estimation of word representations in vector space, 2013.
Skip-thought vectors, Advances in Neural Information Processing Systems, pp.3294-3302, 2015.
Deep visual-semantic alignments for generating image descriptions, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.3128-3137, 2015.
Visual relationship detection with language priors, European Conference on Computer Vision, pp.852-869, 2016.
Visual Dialog, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
GuessWhat?! Visual object discovery through multi-modal dialogue, Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
VQA: Visual Question Answering, International Conference on Computer Vision (ICCV), 2015.
Making the V in VQA matter: Elevating the role of image understanding in Visual Question Answering, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
Don't just assume; look and answer: Overcoming priors for visual question answering, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
An analysis of visual question answering algorithms, IEEE International Conference on Computer Vision (ICCV), 2017.
VizWiz grand challenge: Answering visual questions from blind people, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.3608-3617, 2018.
GQA: a new dataset for compositional question answering over real-world images, 2019.
From recognition to cognition: Visual commonsense reasoning, 2019.
Bottom-up and top-down attention for image captioning and visual question answering, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
Murel: Multimodal Relational Reasoning for Visual Question Answering, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
URL : https://hal.archives-ouvertes.fr/hal-02073649
Block: Bilinear superdiagonal fusion for visual question answering and visual relationship detection, Proceedings of the 33rd Conference on Artificial Intelligence (AAAI), 2019.
URL : https://hal.archives-ouvertes.fr/hal-02073644
Explainable neural computation via stack neural module networks, ECCV, 2018.
Bilinear attention networks, Advances in Neural Information Processing Systems, pp.1564-1574, 2018.
Explainable and explicit visual reasoning over scene graphs, CVPR, 2019.
Chain of Reasoning for Visual Question Answering, Advances in Neural Information Processing Systems, vol.31, pp.275-285, 2018.
Dynamic Fusion with Intra- and Inter-Modality Attention Flow for Visual Question Answering, CVPR, 2019.
CLEVR: A diagnostic dataset for compositional language and elementary visual reasoning, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
Analyzing the behavior of visual question answering models, EMNLP, 2016.
Overcoming language priors in visual question answering with adversarial regularization, Advances in Neural Information Processing Systems, pp.1541-1551, 2018.
Stacked attention networks for image question answering, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.21-29, 2016.
Reporting bias and knowledge acquisition, Proceedings of the 2013 Workshop on Automated Knowledge Base Construction, pp.25-30, 2013.
Being negative but constructively: Lessons learnt from creating better visual question answering datasets, 2018.
Unbiased look at dataset bias, CVPR, 2011.
ConvNets and ImageNet beyond accuracy: Understanding mistakes and uncovering biases, European Conference on Computer Vision (ECCV), 2018.
Right for the Right Reason: Training Agnostic Networks, Lecture Notes in Computer Science, pp.164-174, 2018.
Explicit Bias Discovery in Visual Question Answering Models, CVPR, 2019.
Women also snowboard: Overcoming bias in captioning models, ECCV, 2018.
Men also like shopping: Reducing gender bias amplification using corpus-level constraints, Conference on Empirical Methods in Natural Language Processing, 2017.
Seeing through the human reporting bias: Visual classifiers from noisy human-centric labels, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.2930-2939, 2016.
Robot learning in homes: Improving generalization and reducing dataset bias, Advances in Neural Information Processing Systems, pp.9094-9104, 2018.
Blindfold baselines for embodied QA, 2018.
Shifting the baseline: Single modality performance on visual navigation & QA, NAACL, 2019.
Object hallucination in image captioning, EMNLP, 2018.
Revisiting visual question answering baselines, European Conference on Computer Vision, pp.727-739, 2016.
Hierarchical question-image co-attention for visual question answering, Advances in Neural Information Processing Systems, pp.289-297, 2016.
Mutan: Multimodal tucker fusion for visual question answering, 2017.
URL : https://hal.archives-ouvertes.fr/hal-02073637
Beyond bilinear: Generalized multi-modal factorized high-order pooling for visual question answering, IEEE Transactions on Neural Networks and Learning Systems, 2018.
Human Attention in Visual Question Answering: Do Humans and Deep Networks Look at the Same Regions?, Conference on Empirical Methods in Natural Language Processing (EMNLP), 2016.
Answer them all! Toward universal visual question answering models, 2019.