H. Azizpour, A. Sharif-razavian, J. Sullivan, A. Maki, and S. Carlsson, Factors of transferability for a generic ConvNet representation, IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol.38, pp.1790-1802, 2016.

H. Ben-younes, R. Cadene, M. Cord, and N. Thome, MUTAN: Multimodal tucker fusion for visual question answering, Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2017.
URL : https://hal.archives-ouvertes.fr/hal-02073637

H. Bilen and A. Vedaldi, Integrated perception with recurrent multi-task neural networks, Advances in Neural Information Processing Systems (NIPS), pp.235-243, 2016.

R. Caruana, Multitask learning, Machine Learning, vol.28, pp.41-75, 1997.

L. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. Yuille, DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs, IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2018.
DOI : 10.1109/tpami.2017.2699184
URL : http://arxiv.org/pdf/1606.00915

M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler et al., The Cityscapes dataset for semantic urban scene understanding, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.

J. Dai, Y. Li, K. He, and J. Sun, R-FCN: Object detection via region-based fully convolutional networks, Advances in Neural Information Processing Systems (NIPS), 2016.

T. Dharmasiri, A. Spek, and T. Drummond, Joint prediction of depths, normals and surface curvature from RGB images using CNNs, Proceedings of the International Conference on Intelligent Robots and Systems (IROS), 2017.

A. Droniou and O. Sigaud, Gated autoencoders with tied input weights, Proceedings of the International Conference on Machine Learning (ICML), 2013.
URL : https://hal.archives-ouvertes.fr/hal-00817035

T. Durand, T. Mordan, N. Thome, and M. Cord, WILDCAT: Weakly supervised learning of deep convnets for image classification, pointwise localization and segmentation, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
URL : https://hal.archives-ouvertes.fr/hal-01515640

T. Durand, N. Thome, and M. Cord, MANTRA: Minimum maximum latent structural SVM for image classification and ranking, Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp.2713-2721, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01343784

T. Durand, N. Thome, and M. Cord, WELDON: Weakly supervised learning of deep convolutional neural networks, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.4743-4752, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01343785

D. Eigen and R. Fergus, Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture, Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp.2650-2658, 2015.

H. Fu, M. Gong, C. Wang, and D. Tao, A compromise principle in deep monocular depth estimation, 2017.

A. Fukui, D. H. Park, D. Yang, A. Rohrbach, T. Darrell et al., Multimodal compact bilinear pooling for visual question answering and visual grounding, 2016.
DOI : 10.18653/v1/d16-1044
URL : https://doi.org/10.18653/v1/d16-1044

R. Girshick, Fast R-CNN, Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp.1440-1448, 2015.

R. Girshick, J. Donahue, T. Darrell, and J. Malik, Rich feature hierarchies for accurate object detection and semantic segmentation, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.580-587, 2014.
DOI : 10.1109/cvpr.2014.81
URL : http://arxiv.org/pdf/1311.2524

S. Gupta, P. Arbeláez, R. Girshick, and J. Malik, Inferring 3D object pose in RGB-D images, 2015.

S. Gupta, R. Girshick, P. Arbeláez, and J. Malik, Learning rich features from RGB-D images for object detection and segmentation, Proceedings of the IEEE European Conference on Computer Vision (ECCV), 2014.
DOI : 10.1007/978-3-319-10584-0_23
URL : http://arxiv.org/pdf/1407.5736

C. Hane, L. Ladicky, and M. Pollefeys, Direction matters: Depth estimation with a surface normal classifier, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.381-389, 2015.
DOI : 10.1109/cvpr.2015.7298635

K. He, X. Zhang, S. Ren, and J. Sun, Deep residual learning for image recognition, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
DOI : 10.1109/cvpr.2016.90
URL : http://arxiv.org/pdf/1512.03385

S. Hochreiter and J. Schmidhuber, Long short-term memory, Neural Computation, vol.9, issue.8, pp.1735-1780, 1997.

J. Hoffman, S. Gupta, and T. Darrell, Learning with side information through modality hallucination, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
DOI : 10.1109/cvpr.2016.96

D. Kingma and J. Ba, Adam: A method for stochastic optimization, Proceedings of the International Conference on Learning Representations (ICLR), 2015.

I. Kokkinos, UberNet: Training a universal convolutional neural network for low-, mid-, and high-level vision using diverse datasets and limited memory, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
DOI : 10.1109/cvpr.2017.579
URL : http://arxiv.org/pdf/1609.02132

A. Krizhevsky, I. Sutskever, and G. E. Hinton, ImageNet classification with deep convolutional neural networks, Advances in Neural Information Processing Systems (NIPS), 2012.
DOI : 10.1145/3065386
URL : http://dl.acm.org/ft_gateway.cfm?id=3065386&type=pdf

L. Ladicky, J. Shi, and M. Pollefeys, Pulling things out of perspective, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.89-96, 2014.
DOI : 10.1109/cvpr.2014.19

I. Laina, C. Rupprecht, V. Belagiannis, F. Tombari, and N. Navab, Deeper depth prediction with fully convolutional residual networks, Proceedings of the IEEE International Conference on 3D Vision (3DV), 2016.
DOI : 10.1109/3dv.2016.32
URL : http://arxiv.org/pdf/1606.00373

J. Li, R. Klein, and A. Yao, A two-streamed network for estimating fine-scaled depth maps from single RGB images, Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp.3372-3380, 2017.
DOI : 10.1109/iccv.2017.365
URL : http://arxiv.org/pdf/1607.00730

B. Liu, S. Gould, and D. Koller, Single image depth estimation from predicted semantic labels, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.1253-1260, 2010.
DOI : 10.1109/cvpr.2010.5539823
URL : http://ai.stanford.edu/%7Ekoller/Papers/Liu%2Bal%3ACVPR10.pdf

W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed et al., SSD: Single shot multibox detector, Proceedings of the IEEE European Conference on Computer Vision (ECCV), 2016.
DOI : 10.1007/978-3-319-46448-0_2
URL : http://arxiv.org/pdf/1512.02325

Y. Lu, A. Kumar, S. Zhai, Y. Cheng, T. Javidi et al., Fully-adaptive feature sharing in multi-task networks with applications in person attribute classification, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.1131-1140, 2017.

D. C. Luvizon, D. Picard, and H. Tabia, 2D/3D pose estimation and action recognition using multitask deep learning, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.

E. Meyerson and R. Miikkulainen, Beyond shared hierarchies: Deep multitask learning through soft layer ordering, Proceedings of the International Conference on Learning Representations (ICLR), 2018.

I. Misra, A. Shrivastava, A. Gupta, and M. Hebert, Cross-stitch networks for multi-task learning, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.3994-4003, 2016.
DOI : 10.1109/cvpr.2016.433
URL : http://arxiv.org/pdf/1604.03539

T. Mordan, N. Thome, G. Henaff, and M. Cord, End-to-end learning of latent deformable part-based representations for object detection, International Journal of Computer Vision (IJCV), issue.2, pp.1-21, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01842031

S. Park, K. Hong, and S. Lee, RDFNet: RGB-D multi-level residual feature fusion for indoor semantic segmentation, Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2017.

D. Pechyony and V. Vapnik, On the theory of learning with privileged information, Advances in Neural Information Processing Systems (NIPS), pp.1894-1902, 2010.

Z. Ren and Y. Lee, Cross-domain self-supervised multi-task feature learning using synthetic imagery, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
DOI : 10.1109/cvpr.2018.00086
URL : http://arxiv.org/pdf/1711.09082

O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh et al., ImageNet large scale visual recognition challenge, International Journal of Computer Vision (IJCV), vol.115, issue.3, pp.211-252, 2015.
DOI : 10.1007/s11263-015-0816-y
URL : http://arxiv.org/pdf/1409.0575

V. Sharmanska, N. Quadrianto, and C. Lampert, Learning to rank using privileged information, Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2013.

V. Sharmanska, N. Quadrianto, and C. Lampert, Learning to transfer privileged information, 2014.

Z. Shi and T. Kim, Learning and refining of privileged information-based RNNs for action recognition from depth sequences, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.

N. Silberman, D. Hoiem, P. Kohli, and R. Fergus, Indoor segmentation and support inference from RGBD images, Proceedings of the IEEE European Conference on Computer Vision (ECCV), 2012.

L. Spinello and K. Arras, Leveraging RGB-D data: Adaptive fusion and domain adaptation for object detection, Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pp.4469-4474, 2012.

V. Vapnik and R. Izmailov, Learning using privileged information: Similarity control and knowledge transfer, Journal of Machine Learning Research, vol.16, issue.2, pp.2023-2049, 2015.

V. Vapnik and A. Vashist, A new learning paradigm: Learning using privileged information, Neural Networks, vol.22, issue.5-6, pp.544-557, 2009.

C. Wang and K. Siddiqi, Differential geometry boosts convolutional neural networks for object detection, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2016.

P. Wang, X. Shen, Z. Lin, S. Cohen, B. Price et al., Towards unified depth and semantic prediction from a single image, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.2800-2809, 2015.

Y. Yang and T. Hospedales, Deep multi-task representation learning: A tensor factorisation approach, Proceedings of the International Conference on Learning Representations (ICLR), 2017.

Y. Zhang, S. Song, E. Yumer, M. Savva, J. Lee et al., Physically-based rendering for indoor scene understanding using convolutional neural networks, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition