Improving Latent Representations of ConvNets for Visual Understanding

Thomas Robert

Thesis, Year: 2019

Amélioration des représentations latentes des ConvNets pour l'interprétation de données visuelles

Thomas Robert (Author)

Machine Learning and Information Access

Abstract

For a decade now, deep convolutional neural networks have produced excellent results in computer vision. To do so, these models transform the input image into a series of latent representations. In this thesis, we work on improving the "quality" of the latent representations of ConvNets for different tasks. First, we regularize those representations to increase their robustness to intra-class variations and thus improve classification performance. To this end, we develop a loss based on information-theoretic metrics that decreases the entropy conditioned on the class. Then, we propose to structure the information in two complementary latent spaces, resolving a conflict between the invariance of the representations and the reconstruction task. This structure relaxes the constraint imposed by classical architectures and yields better results in semi-supervised learning. Finally, we address the problem of disentangling, i.e., explicitly separating and representing independent factors of variation of the dataset. We continue our work on structuring the latent spaces and use adversarial costs to ensure an effective separation of the information. This improves the quality of the representations and enables semantic image editing.

Since the beginning of the decade, deep convolutional neural networks for image processing have demonstrated their ability to produce excellent results. To do so, these models transform an image into a succession of latent representations. In this thesis, we work on improving the quality of these latent representations. First, we regularize these representations to make them more robust to intra-class variations and to improve classification performance, using a penalty based on metrics from information theory. Second, we propose to structure the information into two complementary latent subspaces, resolving a conflict between the invariance of the representations and reconstruction. Structuring into two spaces relaxes the constraint imposed by classical architectures, which yields better results in semi-supervised classification. Finally, we study disentangling, that is, the separation of independent semantic factors. We continue our work on structuring latent spaces and use adversarial costs to ensure an effective separation of the information. This improves the quality of the representations as well as semantic image editing.

Keywords

Machine learning; Deep learning; Semi-supervised learning; Regularization; Disentangling; Computer vision; Autoencoders

Domains

Artificial intelligence [cs.AI]; Neural networks [cs.NE]; Computer vision and pattern recognition [cs.CV]

Main file

ROBERT_Thomas_2019.pdf (13.82 MB)

Origin: Version validated by the jury (STAR)

https://hal.science/tel-02309812

Submitted on: Friday, November 20, 2020, 17:18:21

Last modified on: Saturday, October 7, 2023, 21:36:22

Dates and versions

tel-02309812, version 1 (2019-10-09)

tel-02309812, version 2 (2020-11-20)

Identifiers

HAL Id: tel-02309812, version 2

Cite

Thomas Robert. Improving Latent Representations of ConvNets for Visual Understanding. Artificial Intelligence [cs.AI]. Sorbonne Université, 2019. English. ⟨NNT : 2019SORUS343⟩. ⟨tel-02309812v2⟩

Collections

CNRS; STAR; LIP6; SORBONNE-UNIVERSITE; THESES-SU; SU-SCIENCES

1251 views

723 downloads
