PETRA: Parallel End-to-end Training with Reversible Architectures

Stéphane Rivaud; Louis Fournier; Thomas Pumir; Eugene Belilovsky; Michael Eickenberg; Edouard Oyallon

Preprint, Working Paper. Year: 2024

Stéphane Rivaud

Role: Author
PersonId : 1259952
IdHAL : stephane-rivaud
ORCID : 0000-0002-1363-442X

Machine Learning and Information Access

Louis Fournier

Role: Author
PersonId : 1259950
IdHAL : louis-fournier
ORCID : 0009-0007-9912-8061
IdRef : 28156972X

Machine Learning and Information Access

Thomas Pumir

Role: Author
PersonId : 1386151

Helm.ai

Eugene Belilovsky

Role: Author
PersonId : 1100884

Montreal Institute for Learning Algorithms [Montréal]

Concordia University [Montreal]

Michael Eickenberg

Role: Author
PersonId : 1100887

Flatiron Institute

Edouard Oyallon

Role: Author
PersonId : 179157
IdHAL : edouard-oyallon
ORCID : 0000-0002-4826-7527
IdRef : 228745500

Flatiron Institute

Abstract

Reversible architectures have been shown to perform on par with their non-reversible counterparts and have been applied in deep learning for memory savings and generative modeling. In this work, we show how reversible architectures can solve challenges in parallelizing deep model training. We introduce PETRA, a novel alternative to backpropagation for parallelizing gradient computations. PETRA facilitates effective model parallelism by enabling stages (i.e., sets of layers) to compute independently on different devices, while only needing to communicate activations and gradients between each other. Decoupling the forward and backward passes and keeping a single updated version of the parameters also removes the need for weight stashing. We develop a custom autograd-like training framework for PETRA, and we demonstrate its effectiveness on CIFAR-10, ImageNet32, and ImageNet, achieving accuracies comparable to backpropagation using ResNet-18, ResNet-34, and ResNet-50 models.
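The memory savings mentioned in the abstract come from the fact that a reversible block's inputs can be recomputed from its outputs instead of being stored. The sketch below illustrates this with a generic additive-coupling block in pure Python. It is not the authors' implementation or the PETRA framework: the functions `f` and `g` and all other names are hypothetical placeholders chosen only for illustration.

```python
import math

def f(x):
    # Hypothetical residual function F (assumption: any deterministic function works).
    return [math.tanh(v) for v in x]

def g(x):
    # Hypothetical residual function G (assumption, chosen only for illustration).
    return [0.5 * v * v for v in x]

def reversible_forward(x1, x2):
    # Additive coupling: y1 = x1 + F(x2), then y2 = x2 + G(y1).
    y1 = [a + b for a, b in zip(x1, f(x2))]
    y2 = [a + b for a, b in zip(x2, g(y1))]
    return y1, y2

def reversible_inverse(y1, y2):
    # Undo the coupling in reverse order: x2 = y2 - G(y1), then x1 = y1 - F(x2).
    x2 = [a - b for a, b in zip(y2, g(y1))]
    x1 = [a - b for a, b in zip(y1, f(x2))]
    return x1, x2

# Round trip: the inputs are recovered from the outputs alone.
x1, x2 = [0.1, -0.4, 2.0], [1.5, 0.0, -0.7]
y1, y2 = reversible_forward(x1, x2)
r1, r2 = reversible_inverse(y1, y2)
max_error = max(abs(a - b) for a, b in zip(x1 + x2, r1 + r2))
print(max_error)  # tiny, limited only by floating-point rounding
```

Because the inverse needs only the block's outputs, a stage built from such blocks could in principle rebuild its input activations during the backward pass rather than keeping them in memory. How PETRA uses this property is described in the paper itself.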

Domains

Machine Learning [stat.ML]; Machine Learning [cs.LG]

Main file

neurips_2024.pdf (811.72 KB)

Origin: Files produced by the author(s)

Contributor: Edouard Oyallon

https://hal.science/hal-04594647

Submitted on: Monday, June 3, 2024, 14:11:02

Last modified on: Wednesday, October 30, 2024, 13:28:43

Dates and versions

hal-04594647, version 1 (2024-06-03)

Identifiers

HAL Id: hal-04594647, version 1
arXiv: 2406.02052

Cite

Stéphane Rivaud, Louis Fournier, Thomas Pumir, Eugene Belilovsky, Michael Eickenberg, et al. PETRA: Parallel End-to-end Training with Reversible Architectures. 2024. ⟨hal-04594647⟩

Collections

CNRS ISIR GENCI SORBONNE-UNIVERSITE SU-SCIENCES ANR ISIR_MLIA

111 views

85 downloads
