Preprints, Working Papers, ... Year: 2020

Pipelined Model Parallelism: Complexity Results and Memory Considerations

Abstract

The training phase of Deep Neural Networks has become a major consumer of computing resources, and given the resulting volume of computation, it is crucial to perform it efficiently on parallel architectures. Data parallelism remains the most widely used approach, but its requirement to replicate all the weights on every computing resource creates memory problems at the level of each node and collective-communication overheads at the level of the platform. In this context, model parallelism, which distributes the different layers of the network over the computing nodes, is an attractive alternative: it spreads the weights across nodes (alleviating memory problems) and avoids large collective communications, since only forward activations are communicated. To be efficient, however, it must be combined with a pipelined / streaming approach, which in turn induces new memory costs. The goal of this paper is to model these memory costs in detail, to analyze the complexity of the associated throughput optimization problem under memory constraints, and to show that this optimization problem can be formalized as an Integer Linear Program (ILP).
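To give a concrete flavor of what "formalizing the problem as an ILP" can look like, the sketch below is a deliberately simplified toy model, not the formulation from the paper: it assigns a chain of layers to contiguous pipeline stages so that each stage fits within a per-device memory budget, while minimizing the bottleneck stage time (a proxy for the pipeline period). All layer costs, the memory budget, and the solver choice (PuLP with its bundled CBC backend) are illustrative assumptions.

```python
# Minimal illustrative ILP (NOT the paper's formulation): place a chain of layers
# onto P devices as contiguous stages, respect a memory budget per device, and
# minimize the slowest stage time (the pipeline period).
import pulp

compute = [4.0, 6.0, 3.0, 5.0, 2.0]   # hypothetical per-layer compute times
memory  = [2.0, 3.0, 1.0, 2.0, 1.0]   # hypothetical per-layer memory footprints
P, M = 2, 5.0                          # number of devices, memory budget per device
L = len(compute)

prob = pulp.LpProblem("toy_pipeline_partition", pulp.LpMinimize)
# x[l][p] = 1 if layer l is placed on device p
x = [[pulp.LpVariable(f"x_{l}_{p}", cat="Binary") for p in range(P)] for l in range(L)]
period = pulp.LpVariable("period", lowBound=0)
prob += period  # objective: minimize the bottleneck stage time

for l in range(L):  # each layer is assigned to exactly one device
    prob += pulp.lpSum(x[l][p] for p in range(P)) == 1
for p in range(P):
    prob += pulp.lpSum(memory[l] * x[l][p] for l in range(L)) <= M        # memory constraint
    prob += pulp.lpSum(compute[l] * x[l][p] for l in range(L)) <= period  # bottleneck bound
# contiguity: if layer l is on device p, then layer l+1 is on a device with index >= p
for l in range(L - 1):
    for p in range(P):
        prob += pulp.lpSum(x[l + 1][q] for q in range(p, P)) >= x[l][p]

prob.solve(pulp.PULP_CBC_CMD(msg=False))
for p in range(P):
    layers = [l for l in range(L) if x[l][p].value() > 0.5]
    print(f"device {p}: layers {layers}")
print("period:", period.value())
```

The paper's actual model additionally accounts for the memory of stored activations induced by pipelining; this sketch only illustrates the general ILP ingredients (binary assignment variables, per-device memory constraints, and a bottleneck objective).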

Dates and versions

hal-02968802, version 1 (16-10-2020)
hal-02968802, version 2 (16-10-2020)
hal-02968802, version 3 (18-02-2021)

Identifiers

  • HAL Id: hal-02968802, version 2

Cite

Olivier Beaumont, Lionel Eyraud-Dubois, Alena Shilova. Pipelined Model Parallelism: Complexity Results and Memory Considerations. 2020. ⟨hal-02968802v2⟩