Qualitatively Analyzing Optimization Objectives in the Design of HPC Resource Manager - DATAMOVE - Data Movement for High-Performance Computing
Preprint, Working Paper. Year: 2023

Abstract

A correct evaluation of scheduling algorithms and a good understanding of their optimization criteria are key components of resource management in HPC. In this work, we discuss the biases and limitations of the most frequent optimization metrics from the literature. We provide elements on how to evaluate performance when studying HPC batch scheduling. We experimentally demonstrate these limitations by focusing on two use cases: a study of the impact of runtime estimates on scheduling performance, and the reproduction of a recent high-impact work that designed an HPC batch scheduler based on a network trained with reinforcement learning. We demonstrate that focusing on a quantitative optimization criterion ("our work improves the literature by X%") may hide extremely important caveats, to the point that the results obtained are opposed to the actual goals of the authors. Key findings show that mean bounded slowdown and mean response time are hazardous for a purely quantitative analysis in the context of HPC. Despite some limitations, utilization appears to be a good objective. We propose to complement it with the standard deviation of the throughput in some pathological cases. Finally, we argue for a wider use of area-weighted response time, which we find to be a very relevant objective.
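
The abstract names several objectives without defining them. As a rough illustration only, the sketch below computes four of them using definitions commonly found in the batch scheduling literature; these definitions, the `Job` fields, the threshold `tau = 10` seconds, and the two-job workload are all assumptions made for this example and may differ from what the paper uses.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Job:
    """Hypothetical job record; field names are illustrative."""
    submit: float   # submission time (s)
    start: float    # start time (s)
    runtime: float  # execution time (s)
    procs: int      # number of processors used


def response_time(job: Job) -> float:
    """Time between submission and completion."""
    return job.start + job.runtime - job.submit


def bounded_slowdown(job: Job, tau: float = 10.0) -> float:
    """Response time over runtime, with the runtime floored at tau (assumed 10 s)."""
    return max(response_time(job) / max(job.runtime, tau), 1.0)


def mean_response_time(jobs: list[Job]) -> float:
    return mean(response_time(j) for j in jobs)


def mean_bounded_slowdown(jobs: list[Job], tau: float = 10.0) -> float:
    return mean(bounded_slowdown(j, tau) for j in jobs)


def area_weighted_response_time(jobs: list[Job]) -> float:
    """Response time averaged with weights equal to each job's area (procs * runtime)."""
    total_area = sum(j.procs * j.runtime for j in jobs)
    return sum(j.procs * j.runtime * response_time(j) for j in jobs) / total_area


def utilization(jobs: list[Job], total_procs: int) -> float:
    """Fraction of processor-time used between first submission and last completion."""
    t0 = min(j.submit for j in jobs)
    t1 = max(j.start + j.runtime for j in jobs)
    return sum(j.procs * j.runtime for j in jobs) / (total_procs * (t1 - t0))


# Made-up workload: a 1-second, 1-processor job waits behind a
# 1000-second job that occupies all 100 processors.
jobs = [
    Job(submit=0.0, start=0.0, runtime=1000.0, procs=100),
    Job(submit=0.0, start=1000.0, runtime=1.0, procs=1),
]

print("mean response time:         ", mean_response_time(jobs))
print("mean bounded slowdown:      ", mean_bounded_slowdown(jobs))
print("area-weighted response time:", area_weighted_response_time(jobs))
print("utilization:                ", utilization(jobs, total_procs=100))
```

In this toy workload the single tiny job dominates the mean bounded slowdown, while the area-weighted response time and the utilization are driven almost entirely by the large job. This is only meant to show how differently the metrics weight the same schedule, not to reproduce the paper's experiments.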
Main file
main.pdf (2.57 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-04187517, version 1 (25-08-2023)
hal-04187517, version 2 (02-09-2024)
hal-04187517, version 3 (28-11-2024)

Identifiers

  • HAL Id: hal-04187517, version 2

Cite

Robin Boëzennec, Fanny Dufossé, Guillaume Pallez. Qualitatively Analyzing Optimization Objectives in the Design of HPC Resource Manager. 2023. ⟨hal-04187517v2⟩
216 views
130 downloads
