Towards an efficient implementation of CADNA in the BLAS : Example of DgemmCADNA routine. - Sorbonne Université
Conference Papers. Year: 2012

Abstract

Several approximations occur during a numerical simulation: physical phenomena are modelled using mathematical equations, continuous functions are replaced by discretized ones, and real numbers are replaced by finite-precision representations (floating-point numbers). The use of IEEE-754 arithmetic generates round-off errors at each elementary arithmetic operation. By accumulation, these errors can affect the accuracy of computed results, possibly leading to partial or total inaccuracy. The effect of these rounding errors can be analyzed with methods such as forward/backward analysis, interval arithmetic, or stochastic arithmetic (which is implemented in the CADNA validation tool). Numerical verification of industrial codes, such as those developed at EDF R&D (EDF being the French electricity provider), is required to estimate the precision and quality of computed results, all the more so for codes running in HPC environments, where millions of instructions are performed each second. These programs usually rely on external libraries (MPI, BLACS, BLAS, LAPACK) [1]. In this context, a tool that is as nonintrusive as possible is required to avoid rewriting the original code. In this regard, the CADNA library appears to be one of the most promising approaches for industrial applications. The CADNA library, developed by the Laboratoire d'Informatique de Paris 6, makes it possible to estimate round-off error propagation using a probabilistic approach in any simulation program (written in C/C++ or Fortran) and to control its numerical quality by detecting numerical instabilities that may occur at run time [2]. CADNA implements Discrete Stochastic Arithmetic, which is based on a probabilistic model of round-off errors (this arithmetic is defined with the CESTAC method). CADNA provides new numerical types, the so-called stochastic types, on which round-off errors can be estimated. However, a problem remains: stochastic types are not compatible with the aforementioned libraries.
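As an illustration of the probabilistic approach described above, the following Python sketch emulates random rounding and a CESTAC-style estimate of the number of exact significant digits. It is not CADNA code: the names `random_round`, `stochastic_dot` and `significant_digits` are hypothetical, random rounding is emulated with exact rational arithmetic rather than by switching the hardware rounding mode, and the estimate uses the usual Student's t formula for three samples at the 95% level. It requires Python 3.9 or later for `math.nextafter`.

```python
import math
import random
from fractions import Fraction


def random_round(exact, rng):
    """Round an exact rational to one of its two neighbouring doubles at random."""
    nearest = float(exact)  # correctly rounded to nearest
    if Fraction(nearest) == exact:
        return nearest  # exactly representable: no round-off error
    if Fraction(nearest) < exact:
        down, up = nearest, math.nextafter(nearest, math.inf)
    else:
        down, up = math.nextafter(nearest, -math.inf), nearest
    return up if rng.random() < 0.5 else down


def stochastic_dot(x, y, rng):
    """Dot product in which every multiplication and addition is randomly rounded."""
    acc = 0.0
    for a, b in zip(x, y):
        prod = random_round(Fraction(a) * Fraction(b), rng)
        acc = random_round(Fraction(acc) + Fraction(prod), rng)
    return acc


def significant_digits(samples):
    """Estimate the decimal digits shared by three samples (CESTAC-style)."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / (n - 1)
    if var == 0.0:
        return 15  # samples agree; capped at double precision (assumption)
    if mean == 0.0:
        return 0
    tau = 4.303  # Student's t, 2 degrees of freedom, 95% (valid for n == 3 only)
    digits = math.log10(math.sqrt(n) * abs(mean) / (math.sqrt(var) * tau))
    return max(0, min(15, math.floor(digits)))


rng = random.Random(2012)  # fixed seed only to make the sketch reproducible
x = [0.1, 0.2, 0.3]
y = [0.7, 0.8, 0.9]
samples = [stochastic_dot(x, y, rng) for _ in range(3)]
print(samples, significant_digits(samples))
```

Running the same computation three times with different random roundings and comparing the results is the core idea; digits on which the three samples disagree are attributed to round-off error.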
It is, therefore, necessary to develop extensions for these external libraries. We are interested in an efficient implementation of the BLAS routine xGEMM that is compatible with CADNA. We have called this new routine DgemmCADNA. The BLAS (Basic Linear Algebra Subprograms) are routines that provide standard building blocks for performing basic vector and matrix operations, and xGEMM is the routine whose goal is to perform matrix multiplication [5]. Implementing a basic matrix product algorithm with stochastic types leads to an overhead greater than 1000 for a 1024 × 1024 matrix compared to the standard version and commercial versions of xGEMM. This overhead is due to the use of stochastic types, to the rounding mode, which changes randomly at each elementary operation (×, /, +, −), and to a non-optimized use of the memory (cache and TLB misses). We will present different solutions to reduce this overhead and the results we have obtained. To improve the use of the memory hierarchy, special data structures (Block Data Layout) are used; they reduce cache and TLB misses. A new implementation of the CESTAC method has been introduced to reduce the overhead due to the random rounding mode. Finally, we have obtained an overhead of about 25 compared to GotoBLAS in sequential mode. We will also briefly present new extensions for CADNA: CADNA MPI and CADNA BLACS, which allow stochastic data to be used in programs that rely on the standard communication routines (MPI or BLACS).
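The Block Data Layout idea mentioned above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the DgemmCADNA implementation: the function names are hypothetical, matrices are square and stored as flat lists, the block size is assumed to divide the matrix order, and ordinary floats stand in for stochastic types. Each block is stored contiguously, so the inner loops of the product touch only three small contiguous regions at a time.

```python
def to_block_layout(a, n, bs):
    """Copy a row-major n x n matrix (flat list) into Block Data Layout.

    Blocks are stored one after another in block-row-major order, and the
    elements of each bs x bs block are contiguous and row-major.
    """
    assert n % bs == 0, "sketch assumes the block size divides n"
    nb = n // bs
    out = [0.0] * (n * n)
    for bi in range(nb):
        for bj in range(nb):
            base = (bi * nb + bj) * bs * bs
            for i in range(bs):
                for j in range(bs):
                    out[base + i * bs + j] = a[(bi * bs + i) * n + (bj * bs + j)]
    return out


def from_block_layout(ab, n, bs):
    """Inverse of to_block_layout: return a row-major flat list."""
    nb = n // bs
    out = [0.0] * (n * n)
    for bi in range(nb):
        for bj in range(nb):
            base = (bi * nb + bj) * bs * bs
            for i in range(bs):
                for j in range(bs):
                    out[(bi * bs + i) * n + (bj * bs + j)] = ab[base + i * bs + j]
    return out


def gemm_blocked(a_bdl, b_bdl, n, bs):
    """Compute C = A * B with A, B and C all held in Block Data Layout."""
    nb = n // bs
    c = [0.0] * (n * n)
    for bi in range(nb):
        for bj in range(nb):
            c_base = (bi * nb + bj) * bs * bs
            for bk in range(nb):
                a_base = (bi * nb + bk) * bs * bs
                b_base = (bk * nb + bj) * bs * bs
                # Multiply one pair of contiguous blocks into one block of C.
                for i in range(bs):
                    for k in range(bs):
                        a_ik = a_bdl[a_base + i * bs + k]
                        for j in range(bs):
                            c[c_base + i * bs + j] += a_ik * b_bdl[b_base + k * bs + j]
    return c
```

A caller would convert both operands once with `to_block_layout`, call `gemm_blocked`, and convert the result back with `from_block_layout`; the conversion cost is linear in the number of elements, while the product itself is cubic in the matrix order.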
Main file
SethyMontan_scan2012.pdf (47.66 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-00765529, version 1 (14-12-2012)

Identifiers

  • HAL Id: hal-00765529, version 1

Cite

Séthy Montan, Jean-Marie Chesneaux, Christophe Denis, Jean-Luc Lamotte. Towards an efficient implementation of CADNA in the BLAS : Example of DgemmCADNA routine. 15th GAMM - IMACS International Symposium on Scientific Computing, Computer Arithmetic, and Validated Numerics (SCAN), Sep 2012, Novosibirsk, Russia. ⟨hal-00765529⟩