Stab-FD: a cooperative and adaptive failure detector for wide area networks - Sorbonne Université
Journal article. Journal of Parallel and Distributed Computing. Year: 2024

Abstract

Failure detectors (FDs) are a fundamental abstraction that plays a central role in the design of distributed systems. FDs are distributed oracles that provide processes with unreliable information about process failures, often in the form of a list of trusted or suspected process identities. In this article, we propose a timer-based FD which assesses the quality of its input links, and exchanges its local estimations with other nodes. Nodes use this information to adjust their timers dynamically. Capturing the variations in the quality of each link reduces the number of false suspicions without degrading failure detection time. We present experiments on a dataset of real traces collected on PlanetLab, and compare our approach to well-known state-of-the-art algorithms. Our results show that our new algorithms yield a good trade-off in terms of failure detection speed and accuracy in real scenarios.
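To make the idea of a timer-based detector with dynamically adjusted timers concrete, here is a minimal sketch of a generic adaptive-timeout heartbeat monitor. It is not the Stab-FD algorithm: the class name `AdaptiveTimeoutDetector`, the sliding window, the fixed safety margin, and the default values are all assumptions chosen for illustration, and the cooperative exchange of link-quality estimates described in the abstract is not modeled.

```python
from collections import deque


class AdaptiveTimeoutDetector:
    """Hypothetical monitor: suspects a peer when no heartbeat arrives
    before a deadline derived from recently observed inter-arrival times."""

    def __init__(self, window=100, margin=0.5, initial_timeout=2.0):
        # All parameters are illustrative assumptions, in seconds
        # (except `window`, a number of samples).
        self.intervals = deque(maxlen=window)  # recent inter-arrival times
        self.margin = margin                   # safety margin added to the mean
        self.initial_timeout = initial_timeout # used before any interval is seen
        self.last_arrival = None

    def heartbeat(self, now):
        """Record a heartbeat received at time `now`."""
        if self.last_arrival is not None:
            self.intervals.append(now - self.last_arrival)
        self.last_arrival = now

    def timeout(self):
        """Current timeout: mean observed inter-arrival time plus the margin."""
        if not self.intervals:
            return self.initial_timeout
        return sum(self.intervals) / len(self.intervals) + self.margin

    def suspects(self, now):
        """Return True if the peer is suspected at time `now`."""
        if self.last_arrival is None:
            return False  # nothing heard yet, so there is no basis for suspicion
        return now - self.last_arrival > self.timeout()


# Usage: heartbeats arrive once per second, then stop.
fd = AdaptiveTimeoutDetector()
for t in (0.0, 1.0, 2.0, 3.0):
    fd.heartbeat(t)
print(fd.timeout())      # mean interval 1.0 plus margin 0.5
print(fd.suspects(4.0))  # 1.0 s of silence, within the timeout
print(fd.suspects(5.0))  # 2.0 s of silence, beyond the timeout
```

A larger margin lowers the number of false suspicions on a noisy link but delays detection of a real crash, which is the speed and accuracy trade-off the abstract refers to.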
Main file: JPDC-2024.pdf (935.14 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-04389132, version 1 (11-01-2024)

Cite

Pierre Sens, Luciana Arantes, Anubis Graciela de Moraes Rossetto, Olivier Marin. Stab-FD: a cooperative and adaptive failure detector for wide area networks. Journal of Parallel and Distributed Computing, 2024, 186, pp.104803. ⟨10.1016/j.jpdc.2023.104803⟩. ⟨hal-04389132⟩