Information Dropout: learning optimal representations through noisy computation, 2016.

Emergence of invariance and disentanglement in deep representations, arXiv, 2017.

Deep variational information bottleneck, International Conference on Learning Representations, 2017.

How (not) to train your neural network using the information bottleneck principle, 2018.

Information-theoretic analysis of generalization capability of learning algorithms, Advances in Neural Information Processing Systems, vol. 30, pp. 2524-2533, 2017.

Rademacher and Gaussian complexities: Risk bounds and structural results, Journal of Machine Learning Research, 2002.

Invariant scattering convolution networks, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013.

InfoGAN: Interpretable representation learning by information maximizing generative adversarial nets, Advances in Neural Information Processing Systems, 2016.

Elements of information theory, 1991.

Stochastic pooling for regularization of deep convolutional neural networks, 2013.

Dropout as a Bayesian approximation: Representing model uncertainty in deep learning, Proceedings of the International Conference on Machine Learning, 2016.

Unsupervised domain adaptation by backpropagation, ICML, pp. 1180-1189, 2015.

Deep residual learning for image recognition, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016.

DOI : 10.1109/cvpr.2016.90

URL : http://arxiv.org/pdf/1512.03385

Batch normalization: Accelerating deep network training by reducing internal covariate shift, Proceedings of the International Conference on Machine Learning, 2015.

On large-batch training for deep learning: Generalization gap and sharp minima, International Conference on Learning Representations, 2017.

Auto-encoding variational Bayes, International Conference on Learning Representations, 2014.

Learning multiple layers of features from tiny images, 2009.

ImageNet classification with deep convolutional neural networks, Advances in Neural Information Processing Systems 25, 2012.

DOI : 10.1145/3065386

URL : http://dl.acm.org/ft_gateway.cfm?id=3065386&type=pdf

A simple weight decay can improve generalization, Advances in Neural Information Processing Systems, 1992.

Group invariant scattering, Communications on Pure and Applied Mathematics, 2012.

DOI : 10.1002/cpa.21413

URL : http://arxiv.org/pdf/1101.2286

On the uniform convergence of relative frequencies of events to their probabilities, Measures of Complexity, 1972.

Deep learning and the information bottleneck principle, Information Theory Workshop (ITW), 2015.

Estimation of entropy and mutual information, Neural Computation, 2003.

DOI : 10.1162/089976603321780272

URL : http://www.cns.nyu.edu/pub/eero/paninski03-reprint.pdf

Regularizing neural networks by penalizing confident output distributions, International Conference on Learning Representations Workshop, 2017.

Modeling by shortest data description, Automatica, 1978.

DOI : 10.1016/0005-1098(78)90005-5

ImageNet large scale visual recognition challenge, International Journal of Computer Vision, 2015.

DOI : 10.1007/s11263-015-0816-y

URL : http://dspace.mit.edu/bitstream/1721.1/104944/1/11263_2015_Article_816.pdf

Learning and generalization with the information bottleneck, Theoretical Computer Science, 2010.

DOI : 10.1007/978-3-540-87987-9_12

URL : http://www.cs.huji.ac.il/labs/learning/Papers/ibgen_full.pdf

Dropout: A simple way to prevent neural networks from overfitting, Journal of Machine Learning Research, 2014.

Weakly supervised learning of deep convolutional neural networks, CVPR, 2016.

The information bottleneck method, Annual Allerton Conference on Communication, Control and Computing, 1999.