A. Krizhevsky, I. Sutskever, and G. Hinton, Imagenet classification with deep convolutional neural networks, NIPS, 2012.

K. Simonyan and A. Zisserman, Very Deep Convolutional Networks for Large-Scale Image Recognition, ICLR, 2015.

K. He, X. Zhang, S. Ren, and J. Sun, Deep residual learning for image recognition, CVPR, 2016.

R. Girshick, J. Donahue, T. Darrell, and J. Malik, Region-based convolutional networks for accurate object detection and segmentation, TPAMI, 2016.

R. Girshick, Fast R-CNN, ICCV, 2015.

J. Dai, Y. Li, K. He, and J. Sun, R-FCN: Object detection via region-based fully convolutional networks, NIPS, 2016.

J. Long, E. Shelhamer, and T. Darrell, Fully Convolutional Networks for Semantic Segmentation, CVPR, 2015.

L. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs, ICLR, 2015.

J. Dai, K. He, Y. Li, S. Ren, and J. Sun, Instance-sensitive fully convolutional networks, ECCV, 2016.

H. Azizpour, A. S. Razavian, J. Sullivan, A. Maki, and S. Carlsson, Factors of transferability for a generic convnet representation, 2016.

K. Chatfield, K. Simonyan, A. Vedaldi, and A. Zisserman, Return of the Devil in the Details: Delving Deep into Convolutional Nets, BMVC, 2014.

O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh et al., ImageNet large scale visual recognition challenge, International Journal of Computer Vision, 2015.

M. Oquab, L. Bottou, I. Laptev, and J. Sivic, Learning and Transferring Mid-Level Image Representations using Convolutional Neural Networks, CVPR, 2014.
URL : https://hal.archives-ouvertes.fr/hal-00911179

R. Girshick, J. Donahue, T. Darrell, and J. Malik, Rich feature hierarchies for accurate object detection and semantic segmentation, CVPR, 2014.

M. Blaschko, P. Kumar, and B. Taskar, Tutorial: Visual learning with weak supervision, 2013.

A. Bearman, O. Russakovsky, V. Ferrari, and L. Fei-Fei, What's the Point: Semantic Segmentation with Point Supervision, ECCV, 2016.

M. Oquab, L. Bottou, I. Laptev, and J. Sivic, Is object localization for free? Weakly-supervised learning with convolutional neural networks, CVPR, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01015140

B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba, Learning Deep Features for Discriminative Localization, CVPR, 2016.

G. Papandreou, I. Kokkinos, and P. Savalle, Modeling Local and Global Deformations in Deep Learning: Epitomic Convolution, Multiple Instance Learning, and Sliding Window Detection, CVPR, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01263611

P. O. Pinheiro and R. Collobert, From image-level to pixel-level labeling with convolutional networks, CVPR, 2015.

D. Pathak, E. Shelhamer, J. Long, and T. Darrell, Fully Convolutional Multi-Class Multiple Instance Learning, ICLR (Workshop), 2015.

C. Sun, M. Paluri, R. Collobert, R. Nevatia, and L. Bourdev, ProNet: Learning to Propose Object-Specific Boxes for Cascaded Neural Networks, CVPR, 2016.

C. Yu and T. Joachims, Learning structural svms with latent variables, ICML, 2009.

A. Quattoni, S. Wang, L. Morency, M. Collins, and T. Darrell, Hidden conditional random fields, 2007.

W. Ping, Q. Liu, and A. Ihler, Marginal structured svm with hidden variables, ICML, 2014.

A. G. Schwing, T. Hazan, M. Pollefeys, and R. Urtasun, Efficient Structured Prediction with Latent Variables for General Graphical Models, ICML, 2012.

T. G. Dietterich, R. H. Lathrop, and T. Lozano-Pérez, Solving the multiple instance problem with axis-parallel rectangles, Artif. Intell., 1997.

A. Behl, P. Mohapatra, C. V. Jawahar, and M. P. Kumar, Optimizing average precision using weakly supervised data, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01223977

T. Durand, N. Thome, and M. Cord, MANTRA: Minimum Maximum Latent Structural SVM for Image Classification and Ranking, ICCV, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01343784

T. Durand, N. Thome, and M. Cord, WELDON: Weakly Supervised Learning of Deep Convolutional Neural Networks, CVPR, 2016.

B. Zhou, A. Lapedriza, J. Xiao, A. Torralba, and A. Oliva, Learning Deep Features for Scene Recognition using Places Database, NIPS, 2014.

N. Zhang, M. Paluri, M. Ranzato, T. Darrell, and L. Bourdev, PANDA: Pose Aligned Networks for Deep Attribute Modeling, ECCV, 2014.

K. He, X. Zhang, S. Ren, and J. Sun, Spatial pyramid pooling in deep convolutional networks for visual recognition, ECCV, 2014.

Y. Gong, L. Wang, R. Guo, and S. Lazebnik, Multi-scale orderless pooling of deep convolutional activation features, ECCV, 2014.

R. Arandjelović, P. Gronat, A. Torii, T. Pajdla, and J. Sivic, NetVLAD: CNN architecture for weakly supervised place recognition, CVPR, 2016.

P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan, Object Detection with Discriminatively Trained Part Based Models, 2010.

R. G. Cinbis, J. Verbeek, and C. Schmid, Weakly Supervised Object Localization with Multi-fold Multiple Instance Learning, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01123482

A. Kolesnikov and C. H. Lampert, Seed, expand and constrain: Three principles for weakly-supervised image segmentation, ECCV, 2016.

F. X. Yu, D. Liu, S. Kumar, T. Jebara, and S. Chang, ∝SVM for learning with label proportions, ICML, 2013.

K. Lai, F. X. Yu, M. Chen, and S. Chang, Video event detection by inferring temporal instance labels, CVPR, 2014.

W. Li and N. Vasconcelos, Multiple Instance Learning for Soft Bags via Top Instances, CVPR, 2015.

S. N. Parizi, A. Vedaldi, A. Zisserman, and P. F. Felzenszwalb, Automatic discovery and optimization of parts for image classification, ICLR, 2015.

S. Andrews, I. Tsochantaridis, and T. Hofmann, Support vector machines for multiple-instance learning, NIPS, 2003.

H. Azizpour, M. Arefiyan, S. N. Parizi, and S. Carlsson, Spotlight the negatives: A generalized discriminative latent model, BMVC, 2015.

L. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs, ICLR, 2015.
DOI : 10.1109/tpami.2017.2699184
URL : http://arxiv.org/pdf/1606.00915

P. Krähenbühl and V. Koltun, Efficient inference in fully connected CRFs with Gaussian edge potentials, NIPS, 2011.

S. Zheng, S. Jayasumana, B. Romera-Paredes, V. Vineet, Z. Su et al., Conditional random fields as recurrent neural networks, ICCV, 2015.

L. Chen, A. G. Schwing, A. L. Yuille, and R. Urtasun, Learning deep structured models, ICML, 2015.

S. Wang, S. Fidler, and R. Urtasun, Proximal deep structured models, NIPS, 2016.

D. Belanger and A. McCallum, Structured prediction energy networks, ICML, 2016.

Y. Song, A. G. Schwing, R. S. Zemel, and R. Urtasun, Training deep neural networks via direct loss minimization, ICML, 2016.

K. Miller, M. P. Kumar, B. Packer, D. Goodman, and D. Koller, Max-margin min-entropy models, AISTATS, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00773602

D. Bouchacourt, S. Nowozin, and M. P. Kumar, Entropy-based latent structured output prediction, ICCV, 2015.
DOI : 10.1109/iccv.2015.334
URL : https://hal.archives-ouvertes.fr/hal-01223968

D. Bouchacourt, P. K. Mudigonda, and S. Nowozin, DISCO Nets: Dissimilarity Coefficient Networks, NIPS, 2016.

Y. Yue, T. Finley, F. Radlinski, and T. Joachims, A support vector method for optimizing average precision, SIGIR, 2007.
DOI : 10.1145/1277741.1277790
URL : http://radlinski.org/papers/YueEtAl_SIGIR2007.pdf

G. Papandreou, L. Chen, K. Murphy, and A. L. Yuille, Weakly- and semi-supervised learning of a DCNN for semantic image segmentation, ICCV, 2015.

D. Pathak, P. Krähenbühl, and T. Darrell, Constrained Convolutional Neural Networks for Weakly Supervised Segmentation, ICCV, 2015.
DOI : 10.1109/iccv.2015.209
URL : http://arxiv.org/pdf/1506.03648

A. Quattoni and A. Torralba, Recognizing indoor scenes, CVPR, 2009.
DOI : 10.1109/cvpr.2009.5206537
URL : http://people.csail.mit.edu/torralba/publications/indoor.pdf

P. Mohapatra, C. Jawahar, and M. P. Kumar, Efficient optimization for average precision svm, NIPS, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01069917

M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman, The PASCAL Visual Object Classes Challenge 2007 (VOC2007) Results; The PASCAL Visual Object Classes Challenge 2012 (VOC2012) Results, vol. 61.
DOI : 10.1007/s11263-014-0733-5
URL : https://www.pure.ed.ac.uk/ws/files/20017166/ijcv_voc14.pdf

T. Lin, M. Maire, S. Belongie, J. Hays, P. Perona et al., Microsoft COCO: Common Objects in Context, ECCV, 2014.
DOI : 10.1007/978-3-319-10602-1_48
URL : http://arxiv.org/pdf/1405.0312.pdf

C. Wah, S. Branson, P. Welinder, P. Perona, and S. Belongie, The Caltech-UCSD Birds-200-2011 Dataset, 2011.

L. Li, H. Su, E. P. Xing, and L. Fei-Fei, Object Bank: A High-Level Image Representation for Scene Classification & Semantic Feature Sparsification, NIPS, 2010.

Y. Wei, W. Xia, J. Huang, B. Ni, J. Dong et al., CNN: single-label to multi-label, CoRR, 2014.

G. Sharma and B. Schiele, Scalable nonlinear embeddings for semantic category-based image retrieval, ICCV, 2015.

Z. Wei and M. Hoai, Region Ranking SVM for Image Classification, CVPR, 2016.

P. Kulkarni, F. Jurie, J. Zepeda, P. Pérez, and L. Chevallier, Spleap: Soft pooling of learned parts for image classification, ECCV, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01350562

Y. Gao, O. Beijbom, N. Zhang, and T. Darrell, 2016.

T. Xiao, Y. Xu, K. Yang, J. Zhang, Y. Peng et al., The application of two-level attention models in deep convolutional neural network for fine-grained image classification, CVPR, 2015.

M. Jaderberg, K. Simonyan, and A. Zisserman, Spatial Transformer Networks, NIPS, 2015.

R. Wu, B. Wang, W. Wang, and Y. Yu, Harvesting discriminative meta objects with deep cnn features for scene classification, ICCV, 2015.

M. Simon and E. Rodner, Neural activation constellations: Unsupervised part model discovery with convolutional networks, ICCV, 2015.

S. Huang, Z. Xu, D. Tao, and Y. Zhang, Part-stacked CNN for fine-grained visual categorization, CVPR, 2016.

C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed et al., Going deeper with convolutions, CVPR, 2015.

S. Xie, R. Girshick, P. Dollár, Z. Tu, and K. He, Aggregated residual transformations for deep neural networks, 2016.

C. Szegedy, S. Ioffe, and V. Vanhoucke, Inception-v4, Inception-ResNet and the impact of residual connections on learning, 2016.

G. Gkioxari, R. Girshick, and J. Malik, Contextual action recognition with R*CNN, CVPR, 2015.

A. J. Bency, H. Kwon, H. Lee, S. Karthikeyan, and B. S. Manjunath, Weakly supervised localization using deep feature maps, ECCV, 2016.

B. Hariharan, P. Arbelaez, L. D. Bourdev, S. Maji, and J. Malik, Semantic contours from inverse detectors, ICCV, 2011.