HyPoRes: An Hybrid Representation System for ECC

The Residue Number System (RNS) is a numeral representation that enables more efficient implementations of addition and multiplication. However, due to its non-positional nature, modular reductions, required for example by Elliptic Curve (EC) Cryptography (ECC), become costlier. Traditional approaches to RNS modular reduction resort to the Montgomery algorithm, underpinned by large basis extensions. Recently, Hybrid-Positional Residue Number Systems (HPRs) have been proposed, providing a trade-off between the efficiency of RNS and the flexibility of positional number representations. Numbers are represented in a positional representation with the coefficients represented in RNS. By crafting primes of a special form, the complexity of reductions modulo those primes is mitigated, relying on extensions of smaller bases. Because special primes must be crafted, this approach does not extend directly to group operations over currently standardised elliptic curves. In this paper, the Hybrid-Polynomial Residue Number System (HyPoRes) is proposed, enabling improved modular reductions for any prime. Experimental results show that the modular reduction of HyPoRes, although at most 1.4 times slower than HPR for HPR-crafted primes, is up to 1.4 times faster than a generic RNS approach for primes of ECC standards.


I. INTRODUCTION
The RNS has found extensive application in cryptographic systems [1], since it reduces the complexity of long-integer additions and multiplications. However, due to the non-positional nature of RNS, operations such as modular reductions and comparisons are more difficult to implement efficiently. The complexity of most RNS-based modular multiplications is dominated by RNS basis extensions [2], [3]. In [4], this problem is mitigated through a mixed positional-RNS representation, called Hybrid-Positional Residue Number System (HPR). Numbers are expressed in a positional representation with the coefficients represented in RNS. This approach reduces the size of the RNS bases, thus reducing the complexity of basis extensions, while still benefiting from the arithmetic independence of RNS channels.
Bigou and Tisserand have applied [4] to ECC by crafting special primes of the form $P = B_1^n - \beta$, where $B_1$ corresponds to the dynamic range of an RNS basis, $n$ is the number of positional digits and $\beta$ is a small integer. This enables efficient reductions modulo $P$. After a product, the Most Significant Digits (MSDs) are multiplied by $\beta$ and added to the Least Significant Digits (LSDs). Then, the size of the coefficients is reduced through carry propagation. Since carry propagation requires computing a division and flooring, basis extensions are still required, but with bases of a smaller size than in a generic RNS approach.
While the methods described in [4] are an efficient approach to ECC, one does not often have the opportunity to choose the underlying prime $P$, since these are standardised for most cryptographic applications. Herein, we propose the Hybrid-Polynomial Residue Number System (HyPoRes), which is of a hybrid nature but allows for efficient reductions for any prime. We achieve this by decoupling the positional representation radix from the RNS dynamic range. After a product, one can still multiply the MSDs by a small constant and add them to the LSDs. However, carry propagation is no longer possible. Hence, to reduce the size of the coefficients, a Montgomery reduction that exploits a redundant representation of zero is used.
Since the Montgomery reduction is more computationally expensive than carry propagation, experimental results show that HyPoRes is at most 1.4 times slower than the HPR. Nevertheless, while the HPR does not support primes currently standardised for ECC, the proposed approach is applicable to any prime, achieving speed-ups of up to 1.4 when compared to generic RNS approaches. These results show a connection between the strength of the assumptions a representation relies on and its efficiency. First, the HPR restricts itself to primes of a special form and achieves the best performance. Second, the HyPoRes relies on the precomputation of values that require one to know the factorisation of the underlying modulus, producing good performance. Finally, generic RNS approaches make no assumption about the underlying modulus, achieving reasonable performance.
The remainder of the paper is organised as follows. Section II provides the necessary background on modular arithmetic, the RNS and lattices to build the proposed HyPoRes. The proposed representation system is described in Section III. Section IV presents the related art. The proposed approach is compared with the related art in Sections V and VI from theoretical and practical perspectives, respectively. Representational techniques to improve resistance against Side-Channel Attacks (SCAs) are introduced in Section VII. Finally, Section VIII concludes the paper.

II. BACKGROUND
In this section, fundamental results about modular arithmetic, the RNS and lattices are reviewed. In this paper, $[\cdot]_P$ is used to denote the centred remainder of the division by $P$ and lcm the least common multiple. Note that the techniques herein proposed can be slightly modified to handle positive instead of centred remainders.

A. Modular Arithmetic
Most branches of cryptography deal with modular arithmetic. In such a system, two integers $a, b \in \mathbb{Z}$ are said to be congruent if their difference $(a - b)$ is divisible by a modulus $P$. This is denoted as $a = b \bmod P$. Furthermore, we denote by $\mathbb{Z}_P$ the set of all congruency classes modulo $P$. After each addition $(a + b)$ or multiplication $(a \times b)$ in $\mathbb{Z}_P$, the result $c$ is mapped to a smaller integer congruent with $c$ modulo $P$ to reduce memory requirements and the computational cost of further operations. Still, in ECC, $P$ is a large prime, typically hundreds of bits wide, and there is a need to map operations modulo $P$ to smaller moduli $b_{i,j}$ that are compatible with current computer architectures. To do so efficiently, the representation system herein proposed requires precomputing $n$-th roots in $\mathbb{Z}_P^\times$. Lemma II.1 states under which conditions a value $r$ has an $n$-th root in $\mathbb{Z}_P^\times$, i.e. there exists $x \in \mathbb{Z}_P^\times$ such that $[x^n]_P = r$. This root may be computed with [5]. Moreover, herein $a^{-1} \bmod P$ and $1/a \bmod P$ are both used to denote the modular inverse of $a$ modulo $P$, i.e. the integer $-P/2 \le x < P/2$ such that $xa = 1 \bmod P$.
Lemma II.1. $r \in \mathbb{Z}_P^\times$ has an $n$-th root modulo $P$, where $P$ is a prime number, iff $[r^t]_P = 1$ for $\mathrm{lcm}(n, P - 1) = nt$.

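Proof. If $r$ has an $n$-th root $x$, then $r^t = x^{nt} = 1 \bmod P$, since $P - 1$ divides $nt = \mathrm{lcm}(n, P - 1)$. Otherwise, let $g$ be a generator of $\mathbb{Z}_P^\times$. Then $r = g^u \bmod P$ for some $u \in \mathbb{Z}$. $r^t = g^{tu} = 1 \bmod P$ iff $P - 1$ divides $tu$. In this case, $u = k(P-1)/t = kn(P-1)/\mathrm{lcm}(n, P-1)$ for some $k \in \mathbb{Z}$, and so $n$ divides $u$ since $\mathrm{lcm}(n, P - 1)/(P - 1)$ does not.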
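The criterion of Lemma II.1 can be checked numerically on a toy prime. The sketch below is illustrative only: the function names `has_nth_root` and `nth_root_bruteforce` are hypothetical, the exhaustive search stands in for the root-finding method of [5], and the prime $P = 103$ is far smaller than any ECC prime.

```python
from math import gcd

def has_nth_root(r, n, P):
    """Lemma II.1 criterion: r^t = 1 mod P, where lcm(n, P - 1) = n * t."""
    lcm = n * (P - 1) // gcd(n, P - 1)
    t = lcm // n
    return pow(r, t, P) == 1

def nth_root_bruteforce(r, n, P):
    """Exhaustive search for x with x^n = r mod P (toy sizes only)."""
    for x in range(1, P):
        if pow(x, n, P) == r % P:
            return x
    return None

if __name__ == "__main__":
    P, n = 103, 2
    # The criterion and the exhaustive search agree on every unit of Z_P.
    for r in range(1, P):
        assert has_nth_root(r, n, P) == (nth_root_bruteforce(r, n, P) is not None)
    gamma = nth_root_bruteforce(2, n, P)
    print("a square root of 2 modulo 103:", gamma)
```

For realistic prime sizes the exhaustive search is infeasible; only the exponentiation test scales.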
Modular arithmetic extends naturally to polynomials. Two polynomials in $\mathbb{Z}[X]/f(X)$ are congruent when their difference is divisible by $f(X)$; and in $\mathbb{Z}_P[X]/f(X)$ when their difference is divisible by $f(X)$ or their coefficients are congruent in $\mathbb{Z}_P$.
More concretely, the RNS exploits the CRT to replace additions, subtractions and multiplications over large integers in $\mathbb{Z}_{B_1}$ by coefficient-wise additions, subtractions and multiplications over the smaller channels $\mathbb{Z}_{b_{1,0}}, \ldots, \mathbb{Z}_{b_{1,h_1-1}}$. While these operations are made faster with the RNS, operations such as divisions and modular reductions are harder to implement. One often has to use basis extensions to deal with these operations. This procedure exploits (1) to extend the representation of a number in basis $B_1 = \{b_{1,0}, \ldots, b_{1,h_1-1}\}$ to another basis $B_2 = \{b_{2,0}, \ldots, b_{2,h_2-1}\}$. In cases where an error of $\alpha B_1$ can be tolerated, extensions may be performed with FastBConv to approximate (1) in an efficient way [2]. When such an error cannot be tolerated, an extra residue $a_{sk} = a \bmod b_{sk}$ may be computed. This enables the computation of $\alpha$ as $\alpha = (\mathrm{FastBConv}(a, B_1) - a_{sk}) B_1^{-1} \bmod b_{sk}$ Lemma 6]. In this case, the basis extension may be completed with FastBConvSK.
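The two basis extensions can be sketched as follows. This is a minimal illustration, not the formulation of [2]: it assumes non-negative residues and a value $0 \le a < B_1$ (the paper works with centred remainders), so that the approximate extension returns the residues of $a + \alpha B_1$ with $0 \le \alpha < h_1$, and it assumes $b_{sk} \ge h_1$ is coprime to $B_1$. The function names and the toy moduli are hypothetical.

```python
from math import prod

def to_rns(a, basis):
    """Residues of a with respect to each modulus of the basis."""
    return [a % b for b in basis]

def fast_bconv(res, basis, targets):
    """Approximate extension: residues of a + alpha*B modulo each target,
    where B is the product of the source basis and 0 <= alpha < len(basis)."""
    B = prod(basis)
    # xi_j = [a_j * (B / b_j)^(-1)] mod b_j
    xis = [(r * pow(B // b, -1, b)) % b for r, b in zip(res, basis)]
    return [sum(xi * (B // b) for xi, b in zip(xis, basis)) % m for m in targets]

def fast_bconv_sk(res, res_sk, basis, b_sk, targets):
    """Exact extension: alpha is recovered in the extra channel b_sk
    and then subtracted from the approximate result."""
    B = prod(basis)
    approx = fast_bconv(res, basis, list(targets) + [b_sk])
    alpha = ((approx[-1] - res_sk) * pow(B, -1, b_sk)) % b_sk
    return [(v - alpha * B) % m for v, m in zip(approx[:-1], targets)]

if __name__ == "__main__":
    B1, B2, b_sk = [13, 17, 19], [23, 29], 7
    a = 1234
    print(fast_bconv(to_rns(a, B1), B1, B2))                   # may be off by alpha*B1
    print(fast_bconv_sk(to_rns(a, B1), a % b_sk, B1, b_sk, B2))  # exact residues
```

The approximate variant only needs one multiplication per source channel plus a matrix-vector product, which is why it is preferred whenever the $\alpha B_1$ error can be absorbed later.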

C. Lattices
Given a basis, the lattice $L(R)$ generated by the rows of $R \in \mathbb{R}^{m \times n}$ corresponds to the following discrete subgroup of $\mathbb{R}^n$: $L(R) = \{zR : z \in \mathbb{Z}^m\}$. Herein, we will only be dealing with full-rank integer lattices, corresponding to the case where $m = n$ and $R \in \mathbb{Z}^{n \times n}$. In this case, the lattice $L(R)$ induces a congruence relation over $\mathbb{Z}^n$. Two vectors are said to be congruent when their difference is in $L(R)$: $u = v \bmod L(R) \iff u - v \in L(R)$. Reducing a vector $u$ modulo a basis $R$ corresponds to finding the vector $v$ satisfying $v = u \bmod L(R)$ and

III. PROPOSED HYPORES SYSTEM
The proposed representation system can be seen as a generalisation of [7], and is described in Definition III.1. A number $a \in \mathbb{Z}_P$ is represented as a polynomial $A(X)$ with coefficients of norm smaller than $\rho$ that, when evaluated in $\gamma$, produces $A(\gamma) = a \bmod P$. Here $\gamma$ satisfies $[\gamma^n]_P = \beta$ for a small integer $\beta$. Thus, operating with these polynomials modulo $X^n - \beta$ is isomorphic to operating with the corresponding integers modulo $P$. In [7], it is proven that for digits satisfying $|a^{(i)}| < \rho$, a bound $\rho \ge \beta P^{1/n}$ suffices to represent all congruency classes $a \in \mathbb{Z}_P$. Herein, digits are represented with respect to two RNS bases $B_1$ and $B_2$ and a modulus $b_{sk}$. The need for these moduli will become evident when describing the HyPoRes multiplication algorithm.
$\beta$ is defined to be the smallest integer that is not an $n$-th power over $\mathbb{Z}$, but that has an $n$-th root modulo $P$ (see Lemma II.1). We denote this root by $\gamma$, so that $\gamma^n = \beta \bmod P$. Integers $0 \le a < P$ are represented as a polynomial of $n$ coefficients $(a^{(0)}, \ldots, a^{(n-1)})$, wherein each coefficient $a^{(i)}$ is represented with respect to the two RNS bases and the modulus $b_{sk}$, satisfying: $a = \sum_{i=0}^{n-1} a^{(i)} \gamma^i \bmod P$, $|a^{(i)}| < \rho$ and $a_{i,k,j} = a^{(i)} \bmod b_{k,j}$. We use $a_{i,k}$ to denote $a^{(i)} \bmod B_k$, capital letters $A$ to denote a representation of $a$ under $H$, $A_1$ to denote the representation of $a$ under $B_1$, $A_2$ the representation of $a$ under $B_2$, $a_{sk}$ the representation of $a$ modulo $b_{sk}$ and define the norm $\|A\|_\infty = \max_{0 \le i < n} |a^{(i)}|$.
where element-wise multiplications are conducted in RNS.
The vector $M$ used in Algorithm 1 corresponds to a small nonzero representation of zero under $H$. Such an $M$ with norm smaller than $P^{1/n}$ is guaranteed to exist, as described in Lemma III.1.
Lemma III.1. A nonzero representation of zero of norm smaller than $P^{1/n}$ exists under $H$.
Proof. We start by building the lattice $L(\Gamma)$ of the representations of zero under $H$, where
$$\Gamma = \begin{pmatrix} P & 0 & 0 & \cdots & 0 \\ -\gamma & 1 & 0 & \cdots & 0 \\ -\gamma^2 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ -\gamma^{n-1} & 0 & 0 & \cdots & 1 \end{pmatrix}.$$
Each line in $\Gamma$ corresponds to either $P = 0 \bmod P$ or $-\gamma^i + X^i$, which when evaluated at $X = \gamma$ produces a value congruent with 0. Minkowski's theorem [8] guarantees that $L(\Gamma)$ contains a nonzero vector of norm at most $(\det L(\Gamma))^{1/n} = P^{1/n}$. Thus $M$ can be obtained by finding the shortest nonzero lattice point in $L(\Gamma)$. While this problem is complex in general, herein we are dealing with lattices of a small dimension, making it solvable in a short time.
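For $n = 2$ the search for $M$ can be illustrated with Lagrange-Gauss reduction of the two rows $(P, 0)$ and $(-\gamma, 1)$. This is a sketch under stated assumptions: the paper does not say which lattice algorithm is used, the function names are hypothetical, and the reduction returns a shortest vector in the Euclidean norm, which is close to but not always identical with a shortest vector in the infinity norm.

```python
def lagrange_gauss(u, v):
    """Reduce a basis (u, v) of a 2-dimensional integer lattice.
    Returns (shortest, second) with respect to the Euclidean norm."""
    def dot(x, y):
        return x[0] * y[0] + x[1] * y[1]

    if dot(u, u) > dot(v, v):
        u, v = v, u
    while True:
        # Nearest integer to <u, v> / <u, u>, using integer arithmetic only.
        m = (2 * dot(u, v) + dot(u, u)) // (2 * dot(u, u))
        v = (v[0] - m * u[0], v[1] - m * u[1])
        if dot(v, v) >= dot(u, u):
            return u, v
        u, v = v, u

def short_zero_representation(P, gamma):
    """Short nonzero (m0, m1) with m0 + m1*gamma = 0 mod P (case n = 2)."""
    shortest, _ = lagrange_gauss((P, 0), (-gamma, 1))
    return shortest

if __name__ == "__main__":
    P, gamma = 103, 38          # toy values: 38^2 = 2 mod 103, so beta = 2
    M = short_zero_representation(P, gamma)
    print(M, (M[0] + M[1] * gamma) % P)
```

For larger $n$ a general lattice reduction routine would be needed; the dimensions involved remain small, as noted in the proof.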
In essence, Algorithm 1 starts by computing $D = A\,C \cong A \times C \bmod P$. Then, to reduce the size of the coefficients, a multiple of a nonzero representation of zero $M$ is added to $D$, making the result divisible by $B_1$. Since the scaling factor $Q$ is computed modulo $B_1$, it is first produced in $B_1$ and then extended to $B_2$. Afterwards, $R = (D + Q\,M)/B_1$ is output. The division by $B_1$ guarantees that the norm of the result is small. However, since this division is not possible in $B_1$, this value is first produced in $B_2$ and then extended to $B_1$.
Algorithm 1 requires that an inverse of $M$ exists in the ring $\mathbb{Z}_{B_1}[X]/(X^n - \beta)$. Lemma III.2 guarantees that this is the case when $B_1$ is built from prime numbers $b_{1,i}$ that do not divide the resultant of $M$ and $X^n - \beta$ and $M \neq 0 \bmod B_1$.
where $r$ is the resultant of $M$ and $X^n - \beta$ and $r \neq 0$ [9]. We find the inverse of $M$ modulo $B_1$ and $X^n - \beta$ by computing $U r^{-1} \bmod b_{1,i}$ for $0 \le i \le h_1 - 1$ and lifting the result to $\mathbb{Z}_{B_1}$ with the CRT. The resultant $r$ must be invertible modulo $b_{1,i}$, i.e. coprime to $b_{1,i}$. Since $b_{1,i}$ is prime, it must not divide $r$.
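The reduction step described above can be sketched for $n = 2$ with ordinary Python integers in place of RNS residues. This is only an illustration of the arithmetic, not Algorithm 1 itself: no basis extension takes place, so the error term $\alpha$ is zero, and the helper names and toy parameters ($P = 103$, $\gamma = 38$, $\beta = 2$, $M = 5 + 8X$, $B_1 = 13 \cdot 17$) are assumptions chosen for readability.

```python
def centred(x, m):
    """Centred remainder: representative of x mod m in [-m/2, m/2)."""
    r = x % m
    return r - m if 2 * r >= m else r

def polymul(a, c, beta):
    """Product of two polynomials with n coefficients modulo X^n - beta."""
    n = len(a)
    d = [0] * n
    for i in range(n):
        for j in range(n):
            if i + j < n:
                d[i + j] += a[i] * c[j]
            else:
                d[i + j - n] += beta * a[i] * c[j]
    return d

def inverse_n2(m, beta, B1):
    """Inverse of m0 + m1*X modulo B1 and X^2 - beta, using the
    resultant r = m0^2 - beta*m1^2 (requires gcd(r, B1) = 1)."""
    r = m[0] * m[0] - beta * m[1] * m[1]
    r_inv = pow(r % B1, -1, B1)
    return [(m[0] * r_inv) % B1, (-m[1] * r_inv) % B1]

def hypores_mul(a, c, m, m_inv, beta, B1):
    """Montgomery-like product: returns R with R(gamma)*B1 = A(gamma)*C(gamma) mod P."""
    d = polymul(a, c, beta)
    q = [centred(-x, B1) for x in polymul(d, m_inv, beta)]   # Q = -D*M^(-1) mod B1
    s = [x + y for x, y in zip(d, polymul(q, m, beta))]      # D + Q*M
    assert all(x % B1 == 0 for x in s)                       # exact division is possible
    return [x // B1 for x in s]

def evaluate(a, gamma, P):
    """Value represented by the polynomial a, i.e. a(gamma) mod P."""
    return sum(c * pow(gamma, i, P) for i, c in enumerate(a)) % P

if __name__ == "__main__":
    P, gamma, beta, M, B1 = 103, 38, 2, [5, 8], 13 * 17
    M_inv = inverse_n2(M, beta, B1)
    R = hypores_mul([3, -4], [7, 2], M, M_inv, beta, B1)
    print(R, evaluate(R, gamma, P))
```

In the actual system each coefficient would live in the channels of $B_1$, $B_2$ and $b_{sk}$, with $Q$ produced in $B_1$ and the division carried out in $B_2$.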
Finally, Theorem III.1 proves that Algorithm 1 produces the correct result. Even though values represented under $H$ will normally satisfy $\|A\|_\infty < \rho$, their norm might grow after non-reduced additions. Hence, we assume that the inputs to Algorithm 1 have their norm bounded by $\|A\|_\infty < k\rho$. One of the main conditions of Theorem III.1 is that Moreover, an appropriate choice of $\lambda$ might in some cases reduce the number of moduli of $B_2$ by one when compared with the case $\lambda = 1$.
, for an integer $\lambda$, pairwise coprime $B_1$, $B_2$ and $b_{sk}$ and
To prove that Algorithm 1 produces a correct result, we notice that $D + Q\,M = D - (D\,M^{-1} + \alpha B_1)\,M = 0 \bmod B_1$ (for a polynomial error $\alpha$ that results from an inexact basis extension) and hence $D + Q\,M$ is divisible by $B_1$. Moreover, since $M$ is a representation of zero modulo $P$, we have that $R \equiv (D + Q\,M)/B_1 = D/B_1 = D B_1^{-1} \bmod P$. We assume that the inputs to Algorithm 1 satisfy $\|A\|_\infty, \|C\|_\infty < k\rho$. Furthermore, we assume that $\rho < B_1$. Notice that since the value of $Q$ is extended from $B_1$ to $B_2$ in an inexact way, its norm is bounded by $h_1 B_1$. In this case, the norm of $R$ will satisfy Since we require that $\|R\| < \rho$, should satisfy:
Since Algorithm 1 introduces a $B_1^{-1}$ factor, one should multiply values by $B_1 \bmod P$ before representing them in HyPoRes, as represented in Fig. 1. Additions and subtractions are performed without any modular reduction. While this implies that coefficients may grow, we have taken this into account during the design of the modular multiplication algorithm. Therein, one assumes the inputs' coefficients to be bounded by $k\rho$, where $\rho$ is the maximum norm of the coefficients of the output. It is clear that the format $aB_1$ is maintained after additions since $aB_1 + bB_1 = (a + b)B_1$. This format is also maintained after modular multiplications. In particular, Algorithm 1 computes $(aB_1)(bB_1)B_1^{-1} = abB_1 \bmod P$.
A first approach to convert a value $a$ from a binary representation to the HyPoRes system would start by considering the lattice generated by $M X^i$, where $M$ is the nonzero representation of zero referred to in Lemma III.1 and $0 \le i < n$. This lattice has a basis
$$R = \begin{pmatrix} m^{(0)} & m^{(1)} & \cdots & m^{(n-1)} \\ \beta m^{(n-1)} & m^{(0)} & \cdots & m^{(n-2)} \\ \vdots & \vdots & \ddots & \vdots \\ \beta m^{(1)} & \beta m^{(2)} & \cdots & m^{(0)} \end{pmatrix}$$
with vectors of norm at most $\beta \|M\|_\infty$. Since the $M X^i$ are representations of 0 under $H$, the vector $U$ that results from the reduction of $(a, 0, \ldots, 0)$ modulo $R$ represents $a$ under $H$. Moreover, since the vectors $M X^i$ have norm at most $\beta \|M\|_\infty$, so does $U$. A standard HyPoRes representation is achieved by reducing the coefficients of $U$ modulo each value in $B_1$, $B_2$ and $b_{sk}$.
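The first conversion approach can be illustrated for $n = 2$ by reducing $(a, 0)$ modulo the basis with rows $M$ and $M X$. The sketch uses Babai's round-off technique with exact rational arithmetic as the reduction procedure; this choice, the function names and the toy parameters are assumptions, since the text does not fix a particular reduction algorithm.

```python
from fractions import Fraction

def lattice_basis_n2(m, beta):
    """Rows M and M*X mod (X^2 - beta) for M = m0 + m1*X."""
    return [(m[0], m[1]), (beta * m[1], m[0])]

def to_hypores_n2(a, m, beta):
    """Reduce (a, 0) modulo the lattice basis by rounding its
    coordinates with respect to the basis (Babai round-off)."""
    (r00, r01), (r10, r11) = lattice_basis_n2(m, beta)
    det = r00 * r11 - r01 * r10
    # (a, 0) * R^(-1) = (a*r11/det, -a*r01/det)
    z0 = round(Fraction(a * r11, det))
    z1 = round(Fraction(-a * r01, det))
    # Subtract the nearby lattice point z0*row0 + z1*row1.
    return (a - z0 * r00 - z1 * r10, -z0 * r01 - z1 * r11)

if __name__ == "__main__":
    P, gamma, beta, M = 103, 38, 2, (5, 8)   # toy values, gamma^2 = beta mod P
    U = to_hypores_n2(77, M, beta)
    print(U, (U[0] + U[1] * gamma) % P)
```

The coefficients of the returned vector would then be reduced modulo each modulus of $B_1$, $B_2$ and $b_{sk}$ to obtain the RNS form.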
A second, more efficient approach relies on the precomputation of the values $T[i] = [2^{i \lceil \log_2 P / n \rceil} B_1^2]_P$ under $H$ using the aforementioned conversion. With this second approach, the multiplication by $B_1$ is integrated in the conversion process and does not need to be performed in a separate step. The value $a$ to be converted is decomposed into $n$ words of $\lceil \log_2 P / n \rceil$ bits each: $a = \sum_{i=0}^{n-1} a[i]\, 2^{i \lceil \log_2 P / n \rceil}$. Each $a[i]$ can be directly represented in $H$ by reducing it modulo each value in $B_1$, $B_2$ and $b_{sk}$, and associating it with the first entry of the vector $A[i] \in H$, which has 0 in all the other entries. Then, $[aB_1]_P$ is computed as $\sum_{i=0}^{n-1} \text{HyPoRes-mul}(A[i], T[i])$.

IV. RELATED ART
The optimisation of modular reductions has often focused on primes of particular forms [10], [4]. A first approach [10] induces a quadratic-time multiplication algorithm. A value $a$ is represented as a polynomial $A$ in a ring $\mathbb{Z}[X]/(X^n - \beta)$ such that $A(\gamma) = a \bmod P$. The coefficients of the polynomials are each represented as a single computer word; and $P$ and $\gamma$ are chosen such that $\gamma^n - \beta = 0 \bmod P$ and $2^k$ can be represented as a polynomial $M$ with small coefficients. After computing the product $D = A\,C$, the magnitude of the coefficients of $D$ is recursively reduced by rewriting $D$ as $D = D_L + D_H 2^k$ with $\|D_L\| < 2^k$, and updating $D$ with the value of $D_L + D_H M$ using only shifts and additions.
A second approach [4] induces a subquadratic-time multiplication algorithm. It similarly represents $a$ as a polynomial $A$ in a ring $\mathbb{Z}[X]/(X^n - \beta)$ such that $A(\gamma) = a \bmod P$. However, coefficients are represented in RNS with two bases of dynamic range $B_1$ and $B_2$, with $\gamma = B_1$ and $P = B_1^n - \beta$. After computing the product $D = d^{(0)} + d^{(1)} X + \ldots + d^{(n-1)} X^{n-1} = A\,C$, the norm of the coefficients of $D$ is reduced through two rounds of carry propagation. Carries are computed approximately with basis extensions. In particular, for $i \in \{0, \ldots, n - 1\}$, the carry $e_{i,2}$ is computed as $e_{i,2} = (d_{i,2} - \mathrm{FastBConv}(d^{(i)}, B_1))/B_1 \bmod B_2$, and then $e_{i,1}$ is computed through an exact extension from $B_2$ to $B_1$. A second round of carry propagation is necessary to achieve small enough coefficients. In this case, the carries are small enough that it suffices to compute $e_{i,2} = (d_{i,2} - \mathrm{FastBConv}(d^{(i)}, B_1))/B_1$ in a single modulus of $B_2$, and then copy the residue to the remaining moduli in $B_1$ and $B_2$.
The two modular reduction methods described above do not extend to primes currently used by ECC standards. First, most standards follow the approach in [11] by choosing a $P$ that is optimised for binary representation systems, i.e. $P = f(2)$ for a very sparse polynomial $f$. Second, isogeny-based ECC [12] is based on primes built as $P = \ell_A^{e_A} \ell_B^{e_B} f \pm 1$, for two small different primes $\ell_A$ and $\ell_B$, two large exponents $e_A$ and $e_B$ and a small cofactor $f$. Both these cases are incompatible with [10], [4]. In contrast, [7], [3] are suitable for any underlying prime. [7] builds polynomial systems as in [10], but for $\gamma^n = \beta$ and any $P$. After a product $D = A\,C$, the coefficients of $D$ are reduced by recursively rewriting $D$ as $D = D_L + D_H 2^k$ with $\|D_L\| < 2^k$, and updating $D$ with the value of $D_L + \tilde{D}_H$, where $\tilde{D}_H = D_H 2^k \bmod (X^n - \beta, P)$, with $\|\tilde{D}_H\| < \beta P^{1/n}$, is precomputed for all possible $D_H$. The iterative nature of this approach, along with the need to look up precomputed tables, makes it more appropriate for hardware implementations. Finally, [3] can be seen as a specific case of HyPoRes with $n = 1$, i.e. when polynomial reductions play no role in the algorithm.
While [7], [3] can handle any $P$, both lead to quadratic-time modular multiplication algorithms. In contrast, it will be seen in Section V that HyPoRes achieves a subquadratic-time complexity for any prime $P$. Comparisons in Sections V and VI will focus on [4], [3], since [4] also achieves subquadratic-time complexity but for specially crafted primes, and [3] works for any modulus, even when its factorisation is not known, enabling us to evaluate the impact of the assumptions one is allowed to make on the performance of the resulting systems.
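The fold-and-carry reduction that [4] applies to primes of the form $P = B_1^n - \beta$ can be illustrated with plain integer digits. The sketch is a simplification under stated assumptions: digits are ordinary non-negative integers rather than RNS residues, and carries are obtained by exact integer division instead of approximate basis extensions. The function names and the toy prime $P = 15^2 - 2 = 223$ are hypothetical choices.

```python
def to_digits(a, B1, n):
    """Radix-B1 digits of a, least significant first."""
    return [(a // B1**i) % B1 for i in range(n)]

def value(digits, B1):
    """Integer represented by radix-B1 digits (digits may exceed B1)."""
    return sum(d * B1**i for i, d in enumerate(digits))

def hpr_mulmod(a_digits, c_digits, B1, beta):
    """Digit-wise product modulo P = B1^n - beta."""
    n = len(a_digits)
    prod = [0] * (2 * n - 1)
    for i, x in enumerate(a_digits):
        for j, y in enumerate(c_digits):
            prod[i + j] += x * y
    # Fold: B1^(n+i) = beta * B1^i mod P, so MSDs are scaled and added to LSDs.
    low = prod[:n]
    for i, d in enumerate(prod[n:]):
        low[i] += beta * d
    # Two rounds of carry propagation; the carry out of the top digit
    # wraps around to the lowest digit multiplied by beta.
    for _ in range(2):
        carry = 0
        for i in range(n):
            carry, low[i] = divmod(low[i] + carry, B1)
        low[0] += beta * carry
    return low

if __name__ == "__main__":
    B1, n, beta = 15, 2, 2
    P = B1**n - beta
    r = hpr_mulmod(to_digits(200, B1, n), to_digits(150, B1, n), B1, beta)
    print(r, value(r, B1) % P, (200 * 150) % P)
```

The result is congruent to the product modulo $P$ but is not necessarily the canonical representative, which matches the redundant nature of the representation.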

V. COMPUTATIONAL COMPLEXITY
The efficiency of Algorithm 1 can be evaluated in terms of the number of Single-precision Modular Multiplications (SMMs). We assume that in (4) multiplications by $\beta$ can either be computed through shifts and additions, since $\beta$ is a small integer, or, when multiplying by $M$ and $M^{-1}$, $\beta$ can be integrated into the precomputed $M$ and $M^{-1}$. Hence, (4) requires $n^2$ multiplications for each modulus it is being operated on. Part of the constants needed for the basis extensions, related for instance to the multiplication of the residues of $q$ by $b_{1,j}/B_1$ in preparation for the extension of $q$ to $B_2$, can also be integrated in the precomputation of $M$ and dealt with at no cost. A further $n h_2$ multiplications may be saved by storing the values $\xi_{i,2,j} = a_{i,2,j}\, b_{2,j}/B_2 \bmod b_{2,j}$ instead of $a_{i,2,j} \bmod b_{2,j}$ in basis $B_2$ for any $a$ represented under $H$. One can conclude that the cost of Algorithm 1 in terms of SMMs is: To achieve a fairer comparison, the approach of [4] has been adapted to make use of the basis extensions described in Section II-B. In this case, the number of SMMs required to compute a multiplication modulo $P$, including the two rounds of carry propagation described in Section IV, is: Under similar assumptions, a classical RNS modular multiplication, as described in [3], with RNS bases $B_1$ and $B_2$ with 2 P , both the proposed scheme and [4] have complexities of $O(\log_2^{3/2} P)$ SMMs. In contrast, with a pure RNS approach, one has that $H_1 \sim H_2 \sim \log_2 P$, leading to a complexity of $O(\log_2^2 P)$.

VI. EXPERIMENTAL RESULTS
The proposed method for modular multiplication was implemented in C++. Also, [4], [3] were implemented for comparison. The pure RNS-based multiplication can be seen as a simplification of the proposed method when $n = 1$, and thus $M = P$ and $\gamma$ and $\beta$ play no role. We have considered the primes $P_{383}$, $P_{448}$ and $P_{521}$, and the HyPoRes parameters described in Table I. Notice that the primes $P_{383}$, $P_{448}$ and $P_{521}$ have been used to define the elliptic curves M-383, Ed448-Goldilocks and E-521 [13], [14], respectively. Moreover, HPR-crafted primes $P_{384}$, $P_{448}$ and $P_{512}$ of 384, 448 and 512 bits, respectively, have been considered for the implementation of [4]. The bases $B_1$ and $B_2$ in Table I, as well as those chosen for [4], [3], are composed of integers of the form $b_{i,j} = 2^{32} - c_{i,j}$ for small $c_{i,j}$, enabling fast reductions [15]. In particular, a number $a$ resulting from a product is rewritten as $a = a_0 + 2^{32} a_1$, and the equality $2^{32} = c \bmod b_{i,j}$ is applied iteratively as $a = a_0 + c a_1$ to reduce the magnitude of $a$.
TABLE I: The HyPoRes parameters used to evaluate the performance of the proposed method for the primes $P_{383}$, $P_{448}$ and $P_{521}$
Fig. 2 presents the required number of elementary multiplications for the proposed approach with the parameters described in Table I; for a pure-RNS approach with equivalent parameters ($h_1 = 13$ for $P_{383}$; $h_1 = 15$ for $P_{448}$; and $h_1 = 17$ for $P_{521}$); and for HPR with the primes $P_{384}$ ($n = 2$ and $h_1 = 6$), $P_{448}$ ($n = 2$ and $h_1 = 7$) and $P_{512}$ ($n = 4$ and $h_1 = 4$). Moreover, the above-described code was compiled with gcc 4.8.5 with the -Ofast and -march=native flags and executed on an i7-3770K processor with 8GB of main memory running CentOS 7.3. No parallelism was exploited. The average modular multiplication time for the HyPoRes, pure-RNS and HPR representations can be seen in Fig. 3.
Figures 2 and 3 suggest that, although a similar performance is attained for both HyPoRes and a pure-RNS approach for the prime $P_{383}$, the HyPoRes system scales better as the bit-length of the primes increases. This behaviour was predicted in Section V. Moreover, while second-order factors, such as the number of additions, limit the obtained speed-up of HyPoRes when compared with a pure-RNS approach for the smaller primes, when comparing the theoretical predictions of Fig. 2 with the experimental results of Fig. 3, a maximum speed-up of approximately 1.4 is obtained in both cases for $P_{521}$.
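The iterative reduction modulo $b = 2^{32} - c$ described above can be sketched as follows. Python integers are used for clarity, whereas a C++ implementation would operate on fixed-width words; the function name and the sample constants $c$ are hypothetical.

```python
def reduce_pseudo_mersenne(a, c, k=32):
    """Reduce a non-negative integer a modulo b = 2^k - c, for small c,
    by repeatedly applying 2^k = c mod b to the high part of a."""
    b = (1 << k) - c
    mask = (1 << k) - 1
    while a >> k:
        # a = a0 + 2^k * a1  ->  a0 + c * a1 (same residue modulo b)
        a = (a & mask) + c * (a >> k)
    # Now a < 2^k = b + c, so at most one subtraction of b remains.
    return a - b if a >= b else a

if __name__ == "__main__":
    c = 5
    b = (1 << 32) - c
    x, y = b - 1, b - 2
    print(reduce_pseudo_mersenne(x * y, c), (x * y) % b)
```

Each iteration replaces a multiplication-free shift and mask plus one multiplication by the small constant $c$ for a full division, which is what makes moduli of this shape attractive.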
While Figures 2 and 3 show that the performance of HyPoRes is slightly worse than HPR, it relies on weaker assumptions (since it does not require the use of specially crafted primes), making it more flexible and applicable in practice. Fig. 4 emphasises the relation between the assumptions pure-RNS, HyPoRes and HPR rely on and the performance of the resulting system. Since HPR relies on primes of a particular kind, it is hardly practical, because the primes for cryptographic applications have already been standardised with a different shape. In contrast, HyPoRes can be used whenever one knows the factorisation of the underlying modulus. While the applicability to ECC has been herein demonstrated, HyPoRes can also be applied to ElGamal [16] and Rivest-Shamir-Adleman (RSA) [17] decryption and signing. In addition, while HyPoRes is less widely applicable than a pure-RNS approach, it makes the application of small RNS bases of near power-of-two moduli [18], typically employed in signal processing, viable for cryptographic applications.
Fig. 3: Average execution time of a pure-RNS and the proposed approaches, as well as of HPR with specially crafted primes. The settings in parentheses were used for the HPR modular multiplication
Fig. 4: Qualitative comparison of HyPoRes with related art [3], [4] (axes: Better Performance, Weaker Assumptions; points: Pure-RNS [3], HyPoRes, HPR [4])

VII. BEYOND PERFORMANCE: PROTECTION AGAINST SCAS
SCAs exploit weaknesses in the implementation of cryptosystems to derive sensitive information from power traces, timing analysis and other physical sources of information. In the context of ECC, Simple Power Analyses (SPAs), wherein one tries to distinguish between EC point-doubling and adding by directly analysing power traces, can be mitigated through the use of formulae in which the two operations are realised with the same basic operations [19]. In contrast, Differential Power Analyses (DPAs) [20] predict power consumption based on a hypothesis for a subset of the bits of the sensitive information and correlate it with actual power measurements to find the most likely hypothesis, requiring a large number of power traces to retrieve the private data. Certain techniques [21] use internal correlations to reduce the number of traces necessary for a successful attack. A single trace may sometimes suffice.
The resistance against this type of attack may be improved through message blinding, wherein the representation of the values being operated on is randomised, to prevent the prediction of the consumed power. First, resistance against DPAs [20] may be achieved by randomising the representation of values at the beginning of EC point multiplication. Second, if one wants to protect an implementation against [21], values should be randomised during point multiplication.

A. Generalisation of the Reduction Polynomial
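Definition III.1 is herein relaxed to allow for multiple representations of the same value, and to enable randomisation through an arbitrary choice of one of them. Instead of selecting $\gamma$ as a root of $X^n - \beta$ modulo $P$, $\gamma$ is defined to be a root of $E(X) = e^{(0)} + \ldots + e^{(n-1)} X^{n-1} + X^n$ modulo $P$, where $E$ is an irreducible polynomial over $\mathbb{Z}[X]$ with small coefficients. In this case, Lemma III.1 is still applicable since it is independent of the underlying $E$, ensuring the existence of a small nonzero representation of zero $M$. Furthermore, multiplication of $A$ by $X$ is achieved with the following vector-matrix multiplication:
$$A X \bmod E = \begin{pmatrix} a^{(0)} & a^{(1)} & \cdots & a^{(n-1)} \end{pmatrix} \mathcal{D}, \quad \mathcal{D} = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ -e^{(0)} & -e^{(1)} & -e^{(2)} & \cdots & -e^{(n-1)} \end{pmatrix} \quad (6)$$
Thus, (4) may be generalised to
$$A\,C = AC \bmod E = \begin{pmatrix} a^{(0)} & a^{(1)} & \cdots & a^{(n-1)} \end{pmatrix} \begin{pmatrix} C \\ C \mathcal{D} \\ \vdots \\ C \mathcal{D}^{n-1} \end{pmatrix}$$
where the multiplication by $X^i$ is achieved by multiplying $C$ by the $i$-th power of $\mathcal{D}$ in (6). Since $E$ has small coefficients, the multiplication by $\mathcal{D}^i$ can be implemented with only shifts and additions. Therefore, the complexity of Algorithm 1 is maintained for the new definition of the product.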
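The generalised product can be sketched by accumulating the rows $C X^i \bmod E$, each obtained from the previous one through a multiplication by $X$. The sketch works on plain integer coefficients, and the function names together with the toy choice $E(X) = X^2 + X + 1$, which has the root $\gamma = 56$ modulo $P = 103$, are assumptions made for illustration.

```python
def mul_by_x(a, e):
    """Multiply a(X) by X modulo the monic polynomial
    E(X) = e[0] + e[1]*X + ... + e[n-1]*X^(n-1) + X^n."""
    top = a[-1]
    shifted = [0] + list(a[:-1])
    # X^n is replaced by -(e[0] + e[1]*X + ... + e[n-1]*X^(n-1)).
    return [s - top * ei for s, ei in zip(shifted, e)]

def polymul_mod_e(a, c, e):
    """Product a(X)*c(X) mod E, accumulated as sum_i a[i] * (c * X^i mod E)."""
    out = [0] * len(a)
    row = list(c)                      # row holds c * X^i mod E
    for ai in a:
        out = [o + ai * r for o, r in zip(out, row)]
        row = mul_by_x(row, e)
    return out

def evaluate(a, gamma, P):
    """Value represented by a, i.e. a(gamma) mod P."""
    return sum(c * pow(gamma, i, P) for i, c in enumerate(a)) % P

if __name__ == "__main__":
    P, gamma, e = 103, 56, [1, 1]      # E(X) = 1 + X + X^2, E(56) = 0 mod 103
    A, C = [3, -4], [7, 2]
    D = polymul_mod_e(A, C, e)
    print(D, evaluate(D, gamma, P), evaluate(A, gamma, P) * evaluate(C, gamma, P) % P)
```

With $e = (-\beta, 0, \ldots, 0)$ the same routine reproduces the product modulo $X^n - \beta$, so the original definition is recovered as a special case.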
The parameters of an EC may be precomputed under multiple HyPoRes associated with different reducing polynomials $E$. The prevention of DPA may be achieved by selecting a random $E$ thereof at the beginning of point multiplication. In this case, a conversion from the binary representation system to a generic HyPoRes is required. When preventing attacks like [21], conversions between HyPoRes of different $E$ are required throughout the EC point multiplication. We now consider these two types of conversions. The first deals with the conversion of a binary value to a HyPoRes with generic $E$. In this case the strategy proposed in Section III-B is still applicable by replacing $R$ with the basis whose rows are $M X^i \bmod E$ for $0 \le i < n$. The second type deals with conversions from a HyPoRes associated with a polynomial $E$, $H_E$, to another associated with $E'$, $H_{E'}$. By noticing that the representation of $[aB_1]_P$ under $H_E$ satisfies $[aB_1]_P = \sum_{i=0}^{n-1} a^{(i)} \gamma^i \bmod P$, the conversion to $H_{E'}$ is achieved as

$$\sum_{i=0}^{n-1} \text{HyPoRes-mul}_{E'}(A[i], T_{E \to E'}[i])$$
where $T_{E \to E'}[i]$ is a representation of $\gamma^i B_1 \bmod P$ under $H_{E'}$.

VIII. CONCLUSION
While the HPR has made subquadratic-time multiplication algorithms viable for ECC, the need to use primes of a special form makes it hardly practical. Through a weakening of the underlying assumptions, the HyPoRes has been herein proposed. It not only achieves a similar subquadratic time complexity, but it also supports any prime, making it compatible with standardised elliptic curves. This results in a slow-down of at most 1.4 when HyPoRes is compared with HPR for HPR-crafted primes, but produces an acceleration of up to 1.4 when compared with generic RNS-based approaches for primes of ECC standards. The implications of HyPoRes are wide-ranging. While it is less widely applicable than pure-RNS approaches, it makes the application of small RNS bases of near power-of-two moduli, typically employed in signal processing, viable for cryptographic applications. Also, since it reduces the complexity of basis extensions when compared with a pure-RNS approach, HyPoRes is more amenable to parallelism at a smaller scale. Finally, through a generalisation of the reducing polynomial, redundant representations are introduced, providing resistance against SCAs.

Fig. 1: Values in $\mathbb{Z}_P$ are premultiplied by $B_1$ before the proposed arithmetic routines are applied