Naive imputation implicitly regularizes high-dimensional linear models

Alexis Ayme; Claire Boyer; Aymeric Dieuleveut; Erwan Scornet

Conference paper, Year: 2023


Alexis Ayme

Role: Author
PersonId : 1382603
IdRef : 281644004

Laboratoire de Probabilités, Statistique et Modélisation

Claire Boyer

Role: Author

Laboratoire de Probabilités, Statistique et Modélisation

Aymeric Dieuleveut

Role: Author
PersonId : 1109167
IdHAL : aymeric-dieuleveut
ORCID : 0009-0005-1848-1724

Centre de Mathématiques Appliquées de l'Ecole polytechnique

Erwan Scornet

Role: Author

Centre de Mathématiques Appliquées de l'Ecole polytechnique

Abstract

Two approaches exist for handling missing values in prediction: imputation prior to fitting any predictive algorithm, or dedicated methods that natively incorporate missing values. While imputation is widely (and easily) used, it is unfortunately biased when low-capacity predictors (such as linear models) are applied afterward. In practice, however, naive imputation exhibits good predictive performance. In this paper, we study the impact of imputation in a high-dimensional linear model with missing completely at random (MCAR) data. We prove that zero imputation performs an implicit regularization closely related to the ridge method, often used in high-dimensional problems. Leveraging this connection, we establish that the imputation bias is controlled by a ridge bias, which vanishes in high dimension. As a predictor, we argue in favor of the averaged SGD strategy applied to zero-imputed data. We establish an upper bound on its generalization error, highlighting that imputation is benign in the $d \gg \sqrt{n}$ regime. Experiments illustrate our findings.
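The strategy named in the abstract, averaged SGD applied to zero-imputed data, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names (`zero_impute`, `averaged_sgd`, `simulate`), the isotropic Gaussian features, the ground-truth coefficients, the noise level, the constant step size 1/(2d), and the uniform averaging of all iterates (Polyak-Ruppert style). The paper's own step-size choice and guarantees may differ.

```python
import random

def zero_impute(x, mask):
    """Replace entries flagged as missing (mask value False) with 0."""
    return [xj if mj else 0.0 for xj, mj in zip(x, mask)]

def averaged_sgd(samples, dim, step):
    """Single-pass SGD on the squared loss; returns the running average of the iterates."""
    theta = [0.0] * dim
    theta_bar = [0.0] * dim
    for t, (x, y) in enumerate(samples, start=1):
        residual = sum(tj * xj for tj, xj in zip(theta, x)) - y
        # Gradient of 0.5 * (theta . x - y)^2 with respect to theta is residual * x.
        theta = [tj - step * residual * xj for tj, xj in zip(theta, x)]
        # Online update of the uniform average over iterates 1..t.
        theta_bar = [bj + (tj - bj) / t for bj, tj in zip(theta_bar, theta)]
    return theta_bar

def simulate(n=2000, dim=20, observe_prob=0.7, seed=0):
    """Hypothetical experiment: linear model, MCAR mask, zero imputation, averaged SGD."""
    rng = random.Random(seed)
    beta = [1.0 / dim ** 0.5] * dim  # assumed ground-truth coefficients
    samples = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        y = sum(bj * xj for bj, xj in zip(beta, x)) + 0.1 * rng.gauss(0.0, 1.0)
        # MCAR: each coordinate is observed independently of the data.
        mask = [rng.random() < observe_prob for _ in range(dim)]
        samples.append((zero_impute(x, mask), y))
    step = 1.0 / (2.0 * dim)  # assumed constant step size, not the paper's choice
    return averaged_sgd(samples, dim, step), beta

if __name__ == "__main__":
    theta_bar, beta = simulate()
    print("first coefficients of the averaged iterate:", theta_bar[:3])
    print("first ground-truth coefficients:", beta[:3])
```

Calling `simulate()` with a different `observe_prob` shows how the fitted coefficients change with the fraction of observed entries; the sketch does not reproduce any experiment from the paper.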

Domains

Statistics Theory [stat.TH]

Main file

main.pdf (421.04 KB)

Origin: Files produced by the author(s)

Contributor: Claire Boyer

https://hal.sorbonne-universite.fr/hal-03958825

Submitted on: Monday, January 30, 2023, 14:59:20

Last modified on: Tuesday, December 3, 2024, 15:14:03

Dates and versions

hal-03958825 , version 1 (30-01-2023)

Identifiers

HAL Id : hal-03958825 , version 1
ARXIV : 2301.13585

Cite

Alexis Ayme, Claire Boyer, Aymeric Dieuleveut, Erwan Scornet. Naive imputation implicitly regularizes high-dimensional linear models. International Conference on Machine Learning, Jul 2023, Hawaii, United States. ⟨hal-03958825⟩


Collections

X CNRS INRIA INSMI X-CMAP X-DEP-MATHA CMAP LPSM SORBONNE-UNIVERSITE SU-SCIENCES IP_PARIS UP-SCIENCES

69 views

136 downloads
