PETRA: Parallel End-to-end Training with Reversible Architectures
Preprint, working paper, 2024


Louis Fournier
Thomas Pumir
Michael Eickenberg
Edouard Oyallon

Abstract

Reversible architectures have been shown to perform on par with their non-reversible counterparts, and have been applied in deep learning for memory savings and for generative modeling. In this work, we show how reversible architectures can solve challenges in parallelizing deep model training. We introduce PETRA, a novel alternative to backpropagation for parallelizing gradient computations. PETRA facilitates effective model parallelism by allowing stages (i.e., sets of layers) to compute independently on different devices, while only needing to communicate activations and gradients with each other. By decoupling the forward and backward passes and keeping a single updated version of the parameters, the need for weight stashing is also removed. We develop a custom autograd-like training framework for PETRA, and we demonstrate its effectiveness on CIFAR-10, ImageNet32, and ImageNet, achieving accuracies competitive with backpropagation using ResNet-18, ResNet-34, and ResNet-50 models.
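The abstract rests on the defining property of reversible architectures: a block's inputs can be reconstructed exactly from its outputs, so activations need not be stored (or "stashed") for the backward pass. A minimal sketch of this additive-coupling idea (RevNet-style) is below; the functions `F` and `G` are arbitrary placeholders, not the paper's actual layers, and this is an illustration of the general technique rather than PETRA's implementation.

```python
# Sketch of a reversible block with additive coupling.
# Forward:  y1 = x1 + F(x2);  y2 = x2 + G(y1)
# Inverse:  x2 = y2 - G(y1);  x1 = y1 - F(x2)
# Because the inverse is exact, a stage can recompute its inputs from its
# outputs during the backward pass instead of keeping activations in memory.
import numpy as np

rng = np.random.default_rng(0)
W_f = rng.standard_normal((4, 4))  # placeholder weights
W_g = rng.standard_normal((4, 4))  # placeholder weights

def F(x):
    # Arbitrary residual function (stands in for a stage's layers).
    return np.tanh(x @ W_f)

def G(x):
    # Arbitrary residual function (stands in for a stage's layers).
    return np.tanh(x @ W_g)

def forward(x1, x2):
    y1 = x1 + F(x2)
    y2 = x2 + G(y1)
    return y1, y2

def inverse(y1, y2):
    # Recover the inputs exactly, with no stored activations.
    x2 = y2 - G(y1)
    x1 = y1 - F(x2)
    return x1, x2

x1, x2 = rng.standard_normal((2, 4))
y1, y2 = forward(x1, x2)
r1, r2 = inverse(y1, y2)
assert np.allclose(x1, r1) and np.allclose(x2, r2)
```

In a pipeline-parallel setting, each stage holding such blocks can run its forward pass, send the outputs downstream, and later reconstruct what it needs for the backward pass locally, which is what makes decoupling the two passes feasible without stashing weights or activations.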
Main file: neurips_2024.pdf (811.72 KB). Origin: file produced by the author(s).

Dates and versions

hal-04594647, version 1 (03-06-2024)

Identifiers

Cite

Stéphane Rivaud, Louis Fournier, Thomas Pumir, Eugene Belilovsky, Michael Eickenberg, et al. PETRA: Parallel End-to-end Training with Reversible Architectures. 2024. ⟨hal-04594647⟩
