Information Dropout: learning optimal representations through noisy computation, 2016.

Emergence of invariance and disentanglement in deep representations, arXiv, 2017.

Deep variational information bottleneck, International Conference on Learning Representations, 2017.

How (not) to train your neural network using the information bottleneck principle, 2018.

Information-theoretic analysis of generalization capability of learning algorithms, Advances in Neural Information Processing Systems, vol. 30, pp. 2524-2533, 2017.

Rademacher and Gaussian complexities: Risk bounds and structural results, Journal of Machine Learning Research, 2002.

Invariant scattering convolution networks, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013.

InfoGAN: Interpretable representation learning by information maximizing generative adversarial nets, Advances in Neural Information Processing Systems, 2016.

Elements of information theory, 1991.

Stochastic pooling for regularization of deep convolutional neural networks, 2013.

Dropout as a Bayesian approximation: Representing model uncertainty in deep learning, Proceedings of the International Conference on Machine Learning, 2016.

Unsupervised domain adaptation by backpropagation, ICML, pp. 1180-1189, 2015.

Deep residual learning for image recognition, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016.

DOI : 10.1109/cvpr.2016.90

URL : http://arxiv.org/pdf/1512.03385

Batch normalization: Accelerating deep network training by reducing internal covariate shift, Proceedings of the International Conference on Machine Learning, 2015.

On large-batch training for deep learning: Generalization gap and sharp minima, International Conference on Learning Representations, 2017.

Auto-encoding variational Bayes, International Conference on Learning Representations, 2014.

Learning multiple layers of features from tiny images, 2009.

ImageNet classification with deep convolutional neural networks, Advances in Neural Information Processing Systems 25, 2012.

DOI : 10.1145/3065386

URL : http://dl.acm.org/ft_gateway.cfm?id=3065386&type=pdf

A simple weight decay can improve generalization, Advances in Neural Information Processing Systems, 1992.

Group invariant scattering, Communications on Pure and Applied Mathematics, 2012.

DOI : 10.1002/cpa.21413

URL : http://arxiv.org/pdf/1101.2286

On the uniform convergence of relative frequencies of events to their probabilities, Measures of Complexity, 1972.

Deep learning and the information bottleneck principle, Information Theory Workshop (ITW), 2015.

Estimation of entropy and mutual information, Neural Computation, 2003.

DOI : 10.1162/089976603321780272

URL : http://www.cns.nyu.edu/pub/eero/paninski03-reprint.pdf

Regularizing neural networks by penalizing confident output distributions, International Conference on Learning Representations Workshop, 2017.

Modeling by shortest data description, Automatica, 1978.

DOI : 10.1016/0005-1098(78)90005-5

ImageNet large scale visual recognition challenge, International Journal of Computer Vision, 2015.

DOI : 10.1007/s11263-015-0816-y

URL : http://dspace.mit.edu/bitstream/1721.1/104944/1/11263_2015_Article_816.pdf

Learning and generalization with the information bottleneck, Theoretical Computer Science, 2010.

DOI : 10.1007/978-3-540-87987-9_12

URL : http://www.cs.huji.ac.il/labs/learning/Papers/ibgen_full.pdf

Dropout: A simple way to prevent neural networks from overfitting, Journal of Machine Learning Research, 2014.

Weakly supervised learning of deep convolutional neural networks, CVPR, 2016.

The information bottleneck method, Annual Allerton Conference on Communication, Control and Computing, 1999.