Title: Fast and Low Complexity Architectures for Arithmetics over GF(2^m)
- Hua Li
- Department of Math & CS
- University of Lethbridge
Introduction
- Finite field arithmetic is of great importance in public-key cryptography (ECC, Elliptic Curve Cryptography) and in digital signal processing (Reed-Solomon encoders/decoders).
- Addition is fast and inexpensive, since it can be realized with m bitwise XOR operations. Multiplication is costly in terms of gate count and time delay.
- There is a great need for fast as well as low-complexity VLSI (Very Large Scale Integration) chips that can efficiently implement the fundamental finite field arithmetic operations.
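To make the cost contrast above concrete, here is a minimal Python sketch of GF(2^m) arithmetic in a polynomial basis (an assumption: the slide does not fix a basis), using GF(2^4) with x^4 + x + 1 purely as an example. Addition is one XOR of the coefficient vectors, while multiplication needs AND-selected partial products, XOR accumulation and reduction modulo the irreducible polynomial, which is where the extra gates and delay come from.

```python
# GF(2^m) elements in a polynomial (canonical) basis, stored as Python ints:
# bit i holds the coefficient of x^i.

def gf_add(a: int, b: int) -> int:
    """Addition: m independent bitwise XORs, no carry chain."""
    return a ^ b


def gf_mul(a: int, b: int, poly: int, m: int) -> int:
    """Shift-and-add multiplication reduced modulo the irreducible polynomial.

    `poly` includes the x^m term, e.g. 0b10011 for x^4 + x + 1.
    """
    result = 0
    for i in range(m):
        if (b >> i) & 1:   # AND: select the partial product a * x^i
            result ^= a    # XOR: accumulate it
        a <<= 1            # a <- a * x
        if (a >> m) & 1:   # degree reached m: reduce by the field polynomial
            a ^= poly
    return result


POLY, M = 0b10011, 4       # example field GF(2^4) with x^4 + x + 1
print(gf_add(0b1010, 0b0110))            # 12  (x^3 + x^2)
print(gf_mul(0b1000, 0b0010, POLY, M))   # 3   (x^3 * x = x^4 = x + 1)
```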
- This thesis focuses on the design of fast and low-complexity architectures for fundamental finite field arithmetic operations and their applications in public-key cryptosystems.
- The proposed architectures are modular, simple, and regularly interconnected, which makes them well suited to VLSI implementation.
A LOW-COMPLEXITY PIPELINED ARCHITECTURE FOR NORMAL BASIS MULTIPLIER
- In a normal basis, squaring is a cost-free cyclic shift.
- Inversion (the most complicated of the important finite field arithmetic operations) can be computed efficiently by recursive squaring and multiplication.
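A small software sketch of the two normal basis facts above, assuming GF(2^5) built from x^5 + x^2 + 1 and a normal basis found by brute force (both illustrative choices, not taken from the slides). Squaring is a cyclic shift of the coefficient vector, and inversion uses Fermat's identity a^-1 = a^(2^m - 2) = a^2 * a^4 * ... * a^(2^(m-1)), which needs m-1 free squarings and m-2 multiplications. The normal basis multiplier itself is only simulated here, by converting to the polynomial basis and back.

```python
M, POLY = 5, 0b100101          # example field GF(2^5) with x^5 + x^2 + 1
MASK = (1 << M) - 1


def pmul(a, b):
    """Reference multiplication in the polynomial basis, modulo POLY."""
    r = 0
    for i in range(M):
        if (b >> i) & 1:
            r ^= a
        a <<= 1
        if (a >> M) & 1:
            a ^= POLY
    return r


def find_normal_basis():
    """Brute-force a beta whose conjugates beta^(2^i), i = 0..m-1, are linearly independent."""
    for beta in range(1, 1 << M):
        conj, x = [], beta
        for _ in range(M):
            conj.append(x)
            x = pmul(x, x)
        span = {0}
        for v in conj:                   # span doubles iff v is independent
            span |= {s ^ v for s in span}
        if len(span) == 1 << M:
            return conj
    raise ValueError("no normal element found")


BASIS = find_normal_basis()              # BASIS[i] = beta^(2^i) in polynomial form


def nb_to_poly(a):
    r = 0
    for i in range(M):
        if (a >> i) & 1:
            r ^= BASIS[i]
    return r


POLY_TO_NB = {nb_to_poly(a): a for a in range(1 << M)}
NB_ONE = MASK                            # 1 = beta + beta^2 + ... (trace of beta is 1)


def nb_square(a):
    """Squaring in a normal basis: coefficient i moves to i+1 (cyclic shift)."""
    return ((a << 1) | (a >> (M - 1))) & MASK


def nb_mul(a, b):
    """Stand-in for a hardware normal basis multiplier (convert, multiply, convert back)."""
    return POLY_TO_NB[pmul(nb_to_poly(a), nb_to_poly(b))]


def nb_inverse(a):
    """a^-1 = a^(2^m - 2) = a^2 * a^4 * ... * a^(2^(m-1)).

    Uses m-1 squarings (free cyclic shifts) and m-2 multiplications.
    """
    sq = nb_square(a)
    result = sq
    for _ in range(M - 2):
        sq = nb_square(sq)
        result = nb_mul(result, sq)
    return result
```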
- Most previously proposed finite field multipliers operate over a fixed field.
- In other words, a new multiplier is needed whenever the irreducible polynomial changes.
- A new versatile pipelined multiplier based on the normal basis representation is proposed.
- Advantages
- 1. The finite field parameters can be changed according to the application environment, so the same multiplier can be used flexibly for different applications.
- 2. The structure of the multiplier can easily be extended to higher-order finite fields.
- 3. The basic architecture of the proposed multiplier can be modified into a low-cost multiplier that is well suited to both embedded systems and wireless devices.
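The slides do not give the internal structure of the versatile multiplier. One standard way to make a normal basis multiplier field-independent is the Massey-Omura formulation: a single m x m binary matrix lam, holding the coefficient of beta in each product beta^(2^i) * beta^(2^j), determines the whole multiplication, and output bit c_k is the same bilinear form evaluated on inputs rotated by k. The sketch below treats lam as the loadable field parameter; it illustrates the idea rather than the proposed pipelined circuit, and all helper names are hypothetical.

```python
def pmul(a, b, m, poly):
    """Reference polynomial-basis multiplication modulo `poly` (used only offline)."""
    r = 0
    for i in range(m):
        if (b >> i) & 1:
            r ^= a
        a <<= 1
        if (a >> m) & 1:
            a ^= poly
    return r


def normal_basis(m, poly):
    """Return [beta^(2^i) for i in range(m)] for the first normal element found."""
    for beta in range(1, 1 << m):
        conj, x = [], beta
        for _ in range(m):
            conj.append(x)
            x = pmul(x, x, m, poly)
        span = {0}
        for v in conj:
            span |= {s ^ v for s in span}
        if len(span) == 1 << m:
            return conj
    raise ValueError("no normal element found")


def nb_to_poly(a, basis):
    r = 0
    for i, v in enumerate(basis):
        if (a >> i) & 1:
            r ^= v
    return r


def field_parameters(m, poly):
    """Offline: lam[i][j] = coefficient of beta in beta^(2^i) * beta^(2^j)."""
    basis = normal_basis(m, poly)
    to_nb = {nb_to_poly(a, basis): a for a in range(1 << m)}
    return [[to_nb[pmul(basis[i], basis[j], m, poly)] & 1 for j in range(m)]
            for i in range(m)]


def versatile_nb_mul(a, b, m, lam):
    """Massey-Omura style: c_k = XOR over i, j of a_(i+k) b_(j+k) lam[i][j] (indices mod m).

    The same logic serves any field: only m and the loaded matrix `lam` change.
    """
    def bit(x, i):
        return (x >> (i % m)) & 1

    c = 0
    for k in range(m):                   # one output coefficient per clock cycle
        ck = 0
        for i in range(m):
            for j in range(m):
                ck ^= bit(a, i + k) & bit(b, j + k) & lam[i][j]
        c |= ck << k
    return c


# Usage: the same multiplier serves two different fields after reloading `lam`.
BASIS_4, LAM_4 = normal_basis(4, 0b10011), field_parameters(4, 0b10011)     # x^4 + x + 1
BASIS_5, LAM_5 = normal_basis(5, 0b100101), field_parameters(5, 0b100101)   # x^5 + x^2 + 1
```

Changing the field then amounts to loading a different lam (and m) rather than building a new multiplier.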
PIPELINED ARCHITECTURE FOR SERIAL VERSATILE NORMAL BASIS MULTIPLIER
A LOW-COST SERIAL VERSATILE NORMAL BASIS MULTIPLIER
DESIGN OF MULTIPLIERS USING REDUNDANT CANONICAL BASIS
- A redundant canonical basis representation is defined using the irreducible All One Polynomial (AOP).
- With this redundant representation, multiplication is simplified and squaring becomes a cost-free permutation of the element's coefficients.
- Three new multipliers in the redundant basis are presented.
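A minimal sketch of the redundant representation, assuming the usual construction (not spelled out on the slide): when the AOP 1 + x + ... + x^m is irreducible, its root alpha satisfies alpha^(m+1) = 1, so elements can be held as m+1 coefficients over {1, alpha, ..., alpha^m}. Multiplication becomes a cyclic convolution, and squaring is a fixed permutation because i -> 2i mod (m+1) permutes the positions when m+1 is odd. The example uses m = 4; the canonical reference multiplier is there only to check results.

```python
M = 4        # AOP 1 + x + x^2 + x^3 + x^4 is irreducible: 5 is prime, 2 is primitive mod 5
N = M + 1    # redundant basis {1, alpha, ..., alpha^m} with alpha^(m+1) = 1


def rb_mul(a, b):
    """Cyclic convolution: c_k = XOR of (a_i AND b_j) over i + j = k (mod m+1)."""
    c = [0] * N
    for i in range(N):
        for j in range(N):
            c[(i + j) % N] ^= a[i] & b[j]
    return c


def rb_square(a):
    """Squaring is a permutation: coefficient a_i moves to position 2i mod (m+1)."""
    c = [0] * N
    for i in range(N):
        c[(2 * i) % N] = a[i]
    return c


def to_canonical(c):
    """Back to the regular canonical basis using alpha^m = 1 + alpha + ... + alpha^(m-1)."""
    return sum((c[i] ^ c[M]) << i for i in range(M))


def canonical_mul(x, y, poly=(1 << N) - 1):
    """Reference shift-and-add multiplication modulo the AOP (0b11111 for m = 4)."""
    r = 0
    for i in range(M):
        if (y >> i) & 1:
            r ^= x
        x <<= 1
        if (x >> M) & 1:
            x ^= poly
    return r


def bits(v):
    """The m+1 redundant coefficients of an (m+1)-bit integer, least significant first."""
    return [(v >> i) & 1 for i in range(N)]
```

Each field element has two redundant representations (a vector and its complement), since adding 1 + alpha + ... + alpha^m adds zero.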
- The first two (Designs 1 and 2) are fast bit-parallel multipliers whose time delays are lower than those of previous bit-parallel multipliers [31, 33, 34, 35].
Design 1: Logic circuit diagram for a bit-parallel multiplier in GF(2^4)
Design 2: Another structure for a bit-parallel multiplier in GF(2^4)
Design 3: Bit-serial multiplier in GF(2^4)
- The third (Design 3) is a low-cost bit-serial multiplier that requires only m+1 2-input AND/XOR gates.
- It reduces the clock period to $T_{\rm AND} + T_{\rm XOR}$, a significant improvement over the bit-serial multiplier proposed by Wu in [52].
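The gate count and clock period quoted above fit a simple cyclic shift-and-add structure. The simulation below is a hypothetical model of such a multiplier, not the exact Design 3 circuit: it consumes one bit of b per clock, uses one AND and one XOR per cell, and relies on multiplication by alpha being a free rotation of the register because alpha^(m+1) = 1.

```python
def bit_serial_rb_mul(a, b):
    """Simulate m+1 clock cycles of a bit-serial multiplier in the redundant basis.

    a, b: lists of m+1 coefficients over {1, alpha, ..., alpha^m}, alpha^(m+1) = 1.
    Each clock, cell k computes  c[k] <- c[k-1] XOR (b_i AND a[k]).
    Multiplying the register by alpha is a cyclic rewiring (no gates), so every cell
    is one AND plus one XOR and the clock period is T_AND + T_XOR for any m.
    """
    n = len(a)
    c = [0] * n
    for i in reversed(range(n)):     # feed b_m first: c <- c * alpha + b_i * a (Horner)
        c = [c[(k - 1) % n] ^ (b[i] & a[k]) for k in range(n)]
    return c


def cyclic_convolution(a, b):
    """Bit-parallel reference: c_k = XOR of a_i AND b_j over i + j = k (mod m+1)."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            c[(i + j) % n] ^= a[i] & b[j]
    return c
```

After m+1 clocks the register holds the same product as the bit-parallel cyclic convolution, and the per-clock critical path does not grow with m.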
The time delays of the proposed redundant basis bit-parallel multipliers (Designs 1 and 2) are lower than those of the previous bit-parallel multipliers in [Itoh89, Hasan92, Koc98, Hasan93, Wu98]. The reduction in time delay is especially significant when many multiplication/squaring operations are performed, because all arithmetic operations are carried out in the redundant canonical basis until the final operation, at which point the result is converted to a regular canonical basis (if required) by simple XOR-gate hardware.
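To illustrate why keeping intermediate results in the redundant basis pays off, the sketch below computes an inverse, a^(2^m - 2), entirely with redundant-basis squarings (permutations) and multiplications, and applies the XOR-gate conversion to the regular canonical basis only once, at the end. The field (m = 4) and the function names are illustrative assumptions.

```python
M = 4                      # AOP of degree 4 (5 is prime and 2 is primitive mod 5)
N = M + 1


def rb_mul(a, b):
    c = [0] * N
    for i in range(N):
        for j in range(N):
            c[(i + j) % N] ^= a[i] & b[j]      # cyclic convolution
    return c


def rb_square(a):
    c = [0] * N
    for i in range(N):
        c[(2 * i) % N] = a[i]                  # free permutation
    return c


def from_canonical(x):
    """Regular canonical basis -> redundant basis: append a zero coefficient (free)."""
    return [(x >> i) & 1 for i in range(M)] + [0]


def to_canonical(c):
    """Redundant -> regular canonical basis: c_i XOR c_m for i < m (m XOR gates)."""
    return sum((c[i] ^ c[M]) << i for i in range(M))


def inverse_canonical(x):
    """x^-1 = x^(2^m - 2): every squaring and multiplication stays in the redundant
    basis, and the XOR-gate conversion is applied once, to the final result."""
    a = from_canonical(x)
    sq = rb_square(a)
    result = sq
    for _ in range(M - 2):
        sq = rb_square(sq)
        result = rb_mul(result, sq)
    return to_canonical(result)
```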
The proposed bit-serial multiplier (Design 3) is innovative and reduces the clock period to $T_{\rm AND} + T_{\rm XOR}$, a significant improvement over the bit-serial multiplier proposed by Wu et al. [Wu99], whose clock period is $T_{\rm AND} + \lceil \log_2 (m+1) \rceil T_{\rm XOR}$. The proposed bit-serial multiplier is very competitive in restricted computing environments, such as smart cards and wireless communications, especially in applications that use large values of m.
A HYBRID ARITHMETIC ARCHITECTURE FOR REDUNDANT CANONICAL BASIS AND NORMAL BASIS
- To make the proposed redundant basis multipliers applicable to the normal basis representation, a new hybrid arithmetic architecture is presented that computes multiplication, squaring, and inversion efficiently in both redundant and normal bases.
The conversion from a normal basis to a redundant canonical basis is a cost-free permutation of the coefficients of the element. The conversion from a redundant canonical basis to a normal basis requires m 2-input XOR gates.
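Both conversion costs follow from the type-I optimal normal basis that exists whenever the AOP is irreducible (assumed here; the slide states only the costs). With alpha a root of the AOP, the normal basis {alpha^(2^i)} is a reordering of {alpha, ..., alpha^m}, and 1 = alpha + ... + alpha^m. A sketch for m = 10, matching the GF(2^10) chip described later:

```python
M = 10        # p = m + 1 = 11 is prime and 2 is primitive mod 11 (type-I optimal normal basis)
P = M + 1


def nb_to_rb(a):
    """Normal basis -> redundant canonical basis: pure wiring.

    The normal basis element alpha^(2^i) equals alpha^(2^i mod p), so coefficient a_i
    moves to position 2^i mod p; position 0 (the constant 1) stays 0.
    """
    c = [0] * P
    for i in range(M):
        c[pow(2, i, P)] = a[i]
    return c


def rb_to_nb(c):
    """Redundant canonical basis -> normal basis with m two-input XOR gates.

    Since 1 = alpha + alpha^2 + ... + alpha^m, the constant coefficient c_0 is
    XORed into every normal basis coefficient.
    """
    return [c[pow(2, i, P)] ^ c[0] for i in range(M)]
```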
The logic structure of the hybrid arithmetic unit is illustrated in the figure for m = 10. Two signals, the Basis-signal and the Operation-signal, control the output and the input basis. The Basis-signal is set to 0 when the unit operates in a redundant canonical basis and to 1 when it operates in a normal basis. The Operation-signal is set to 0 for multiplication and to 1 for squaring. The output of the multiplexer is its left input if the control signal is "0"; otherwise, the output is its right input. The core unit of the hybrid arithmetic architecture is the redundant basis multiplier proposed in the previous chapter.
There are two options for the multiplier. The first is the proposed bit-parallel multiplier (Designs 1 and 2) for fast parallel computation; the second is the bit-serial multiplier (Design 3), which achieves the best space and time trade-off in embedded systems. The Shifter and Permutation modules in Figure \ref{hybrid_mul} perform squaring in the normal basis and in the redundant canonical basis, respectively. Inversion is computed by iterated squaring and multiplication.
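A behavioural Python model of the control described above, under stated assumptions: m = 10, redundant-basis operands carry m+1 coefficients and normal-basis operands m, the core is the bit-parallel redundant multiplier, and the names Shifter and Permutation mirror the figure. Basis selection is written as an `if` because the two bases use different word lengths; the Operation-signal goes through a 2-to-1 `mux` as described. This is a sketch, not the Verilog design.

```python
M = 10        # GF(2^10): p = 11 is prime and 2 is primitive mod 11
P = M + 1


def mux(control, left, right):
    """2-to-1 multiplexer: left input when the control signal is 0, right input otherwise."""
    return left if control == 0 else right


def rb_mul(a, b):                       # core redundant basis multiplier (cyclic convolution)
    c = [0] * P
    for i in range(P):
        for j in range(P):
            c[(i + j) % P] ^= a[i] & b[j]
    return c


def permutation(a):                     # redundant basis squaring: a_i -> position 2i mod p
    c = [0] * P
    for i in range(P):
        c[(2 * i) % P] = a[i]
    return c


def shifter(a):                         # normal basis squaring: cyclic shift by one
    return [a[-1]] + a[:-1]


def nb_to_rb(a):                        # free permutation of the coefficients
    c = [0] * P
    for i in range(M):
        c[pow(2, i, P)] = a[i]
    return c


def rb_to_nb(c):                        # m XOR gates
    return [c[pow(2, i, P)] ^ c[0] for i in range(M)]


def hybrid_unit(basis_signal, operation_signal, a, b):
    """Basis-signal: 0 = redundant canonical basis, 1 = normal basis.
    Operation-signal: 0 = multiplication a*b, 1 = squaring of a."""
    if basis_signal == 0:
        product, square = rb_mul(a, b), permutation(a)
    else:                               # normal basis: convert around the redundant core
        product = rb_to_nb(rb_mul(nb_to_rb(a), nb_to_rb(b)))
        square = shifter(a)
    return mux(operation_signal, product, square)


def hybrid_inverse(basis_signal, a):
    """Inversion by iterated squaring and multiplication: a^(2^m - 2)."""
    sq = hybrid_unit(basis_signal, 1, a, a)
    result = sq
    for _ in range(M - 2):
        sq = hybrid_unit(basis_signal, 1, sq, sq)
        result = hybrid_unit(basis_signal, 0, result, sq)
    return result
```

In this type-I normal basis the field element 1 is the all-ones vector, which gives a simple check of the inverse.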
The proposed bit-parallel hybrid arithmetic architecture requires $m^2 + 2m$ XOR gates and $(m+1)^2$ AND gates, with a maximum time delay of $T_{\rm AND} + (\lceil \log_2 (m+1) \rceil + 1) T_{\rm XOR}$. This is a significant space improvement over the optimal normal basis multiplier proposed by Sunar and Koc [Sunar2001], which requires $1.5(m^2 - m)$ XOR gates. A reconfigurable VLSI chip for hybrid finite field arithmetic in GF(2^10) has been designed and simulated in Verilog HDL. It was also synthesized and placed using Synopsys VLSI design packages.
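A quick check of the complexity formulas above. The functions only evaluate the stated expressions; the sample values of m are chosen so that m+1 is prime with 2 primitive modulo m+1, so that an irreducible AOP exists.

```python
def hybrid_cost(m):
    """Bit-parallel hybrid unit: (XOR gates, AND gates, XOR levels on the critical path).

    Critical path = T_AND + (ceil(log2(m+1)) + 1) * T_XOR; the third value is the
    number of XOR levels, ceil(log2(m+1)) + 1.
    """
    xor_gates = m * m + 2 * m
    and_gates = (m + 1) ** 2
    xor_levels = m.bit_length() + 1      # m.bit_length() == ceil(log2(m + 1)) for m >= 1
    return xor_gates, and_gates, xor_levels


def sunar_koc_xor(m):
    """XOR gates of the Sunar-Koc optimal normal basis multiplier: 1.5 (m^2 - m)."""
    return 3 * (m * m - m) // 2


for m in (10, 28, 52):
    xor_gates, _, _ = hybrid_cost(m)
    print(m, xor_gates, sunar_koc_xor(m), round(xor_gates / sunar_koc_xor(m), 3))
```

For m = 10 this gives 120 XOR gates against 135, and the ratio of the two XOR counts tends to 2/3 as m grows.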
CELLULAR AUTOMATA BASED RECONFIGURABLE ARCHITECTURE FOR SYMMETRIC-KEY AND PUBLIC-KEY CRYPTOSYSTEMS
- In practice, hybrid cryptosystems are employed that combine public-key and symmetric-key cryptosystems. They offer both the security advantages of public-key cryptosystems and the speed advantages of symmetric-key cryptosystems.
- A low-complexity reconfigurable architecture based on Programmable Cellular Automata (PCA) is proposed.
- Through simple configuration, the architecture can be used not only in the PCA-based block cipher for symmetric-key encryption but also as an efficient versatile modular multiplier in GF(2^m), an essential operation in public-key cryptography.
- The unique properties of this reconfigurable architecture are that it can be reconfigured on-line and that its throughput/area ratio is much higher than that of the traditional FPGA (Field Programmable Gate Array) approach.
Preliminary CA Theory
- Programmable CA (PCA) is a structure where the CL
(Combinational Logic) of each cell is not fixed
but controlled by a number of control signals
such that different functions (rules) can be
realized on the same structure.
PCA-based Block Cipher Scheme
- Encryption: $C = E_K(M)$
- Decryption: $M = E_K^{-1}(C)$
- A PCA-based block cipher scheme can be built by exploiting special characteristics of certain CA rules to form the transformation function E. The fundamental transformations are combinations of rules 51, 153, and 195.
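A sketch of a programmable CA step, assuming null boundary conditions and Wolfram rule numbering (neither is stated on the slide). Each cell's rule is a per-cell parameter, which stands in for the control signals. Rules 51, 153 and 195 are complemented (XNOR-type) rules. When a configuration is invertible, the transformation permutes the 2^n states, so decryption can be done by iterating the same transformation around its cycle. The key configuration RULES is hypothetical.

```python
def pca_step(state, rules):
    """One clock of a programmable CA with null boundary conditions.

    state[i] is cell i; rules[i] is the Wolfram rule number the control signals
    currently select for cell i. Rule 51: NOT x_i; rule 153: XNOR(x_i, x_(i+1));
    rule 195: XNOR(x_(i-1), x_i).
    """
    n = len(state)
    nxt = []
    for i in range(n):
        left = state[i - 1] if i > 0 else 0
        right = state[i + 1] if i < n - 1 else 0
        pattern = (left << 2) | (state[i] << 1) | right   # neighbourhood as 0..7
        nxt.append((rules[i] >> pattern) & 1)
    return nxt


def pca_decrypt(block, rules):
    """Invert one step by walking the state cycle.

    If the configuration permutes the states, iterating it from `block` returns to
    `block`; the state just before that is the preimage of `block`.
    """
    prev, cur = block, pca_step(block, rules)
    while cur != block:
        prev, cur = cur, pca_step(cur, rules)
    return prev


# Hypothetical 8-cell key configuration. Mixing only rules 51 and 153 gives an
# invertible (upper triangular, affine) map; some 153/195 mixes are not invertible.
RULES = [153, 51, 153, 153, 51, 153, 153, 51]
message = [1, 0, 1, 1, 0, 0, 1, 0]
cipher = pca_step(message, RULES)
print(cipher, pca_decrypt(cipher, RULES) == message)
```

A hardware decryptor would rather apply the transformation a known number of times (cycle length minus one) than search the cycle at run time; the loop here only demonstrates that E_K is a permutation.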
2-D Pipelined PCA-based Block Cipher
We first propose a fast 2-D (two-dimensional) pipelined PCA block cipher scheme. In this scheme, each message block is enciphered by only one fundamental transformation; that is, the q fundamental transformations ($T_0, T_1, \ldots, T_{q-1}$) are applied to q message blocks.
High Security PCA-based Block Cipher Scheme
- To improve encryption security, the fast 2-D scheme can be extended so that each message block is encrypted by q fundamental transformations.
PCA-based Versatile Multiplier in GF(2^m)
- A PCA-based versatile modular multiplier in the canonical (standard, polynomial) basis.
- The field parameters can be changed according to the application environment.
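A minimal bit-level model of a canonical basis multiplier whose field polynomial is a run-time input. The cell update is an assumption chosen to match the extended PCA description (XOR cells that read the nearest left neighbour and the leftmost cell); it is the standard MSB-first shift-and-add algorithm, not necessarily the exact circuit on the slide.

```python
def bits(x, m):
    """Little-endian coefficient list of an m-bit integer."""
    return [(x >> i) & 1 for i in range(m)]


def pca_versatile_mul(a, b, f, m):
    """MSB-first multiplication in GF(2^m) with the field polynomial loaded at run time.

    a, b: coefficient lists (length m, least significant first).
    f: f_0 .. f_(m-1) of f(x) = x^m + f_(m-1) x^(m-1) + ... + f_0 (leading 1 implicit),
       held in a register so the field can change without new hardware.
    Each clock, cell i uses its left neighbour and the leftmost cell c_(m-1):
        c_i <- c_(i-1) XOR (c_(m-1) AND f_i) XOR (b_j AND a_i)
    """
    c = [0] * m
    for j in reversed(range(m)):           # m clock cycles, b_(m-1) first
        top = c[m - 1]                     # coefficient that overflows into x^m
        c = [(c[i - 1] if i > 0 else 0) ^ (top & f[i]) ^ (b[j] & a[i]) for i in range(m)]
    return c


def reference_mul(x, y, poly, m):
    """Plain shift-and-add multiplication modulo `poly` (x^m term included)."""
    r = 0
    for i in range(m):
        if (y >> i) & 1:
            r ^= x
        x <<= 1
        if (x >> m) & 1:
            x ^= poly
    return r
```

The same function can be checked against several fields by reloading only f.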
Optimal PCA Multiplier
Complexity of the PCA Based Multiplier
The major differences between the standard PCA and the extended PCA are: (1) the three neighborhood cells of the standard PCA are the cell itself and its nearest left and right neighbors, whereas in the extended PCA the neighbor cells can be either the nearest left neighbor or the leftmost/rightmost cells; and (2) the standard PCA cell (used in the PCA block cipher) performs an XNOR operation, whereas the extended PCA cell performs an XOR operation.
Unified PCA Cell
- If the control signal C is set to '1' → block cipher encryption (standard PCA).
- If C is set to '0' → finite field multiplier (extended PCA).
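One way to picture a unified cell, as a hypothetical behavioural model: with C = 1 it computes the XNOR of the cell and its selected nearest neighbours (the selection pairs in RULE_SELECT reproduce rules 51, 153 and 195), and with C = 0 it computes the XOR used by the extended PCA multiplier cell. The signal names are illustrative, not taken from the design.

```python
# Neighbour selections that reproduce the standard PCA rules (hypothetical encoding).
RULE_SELECT = {51: (0, 0), 153: (0, 1), 195: (1, 0)}   # rule -> (sel_left, sel_right)


def unified_cell(C, left, center, right, sel_left=0, sel_right=0, extra=0):
    """Behavioural model of one unified PCA cell.

    C = 1 (standard PCA, block cipher): XNOR of the cell itself and the selected
          nearest neighbours, which gives rules 51, 153 and 195.
    C = 0 (extended PCA, multiplier): XOR of the nearest left neighbour and `extra`,
          where `extra` stands for (c_(m-1) AND f_i) XOR (b_j AND a_i).
    """
    if C == 1:
        return 1 ^ center ^ (sel_left & left) ^ (sel_right & right)
    return left ^ extra
```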
Unified PCA Based Reconfigurable Architecture
- If the control signal C is set to '1' → block cipher encryption. The CA rules of the fundamental transformations ($T_0, T_1, \ldots, T_{q-1}$) can be loaded into the cyclic shift registers at configuration time and need not change once the encryption scheme is settled.
- If C is set to '0' → the architecture is configured as a finite field multiplier. The coefficients of the irreducible polynomial are loaded into the registers at initialization, and the register values change only if the irreducible polynomial changes.
A FAST ALGORITHM FOR MULTIPLICATION ON ELLIPTIC CURVES
- A new fast algorithm is proposed for multiplying a point on an elliptic curve by a large integer, based on the non-adjacent form (NAF) Frobenius expansion.
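For background, the sketch below computes the standard tau-adic NAF of an integer (Solinas' algorithm), the Frobenius expansion this kind of method builds on; it is not the new algorithm proposed here. On a Koblitz curve y^2 + xy = x^3 + a x^2 + 1, the Frobenius map tau(x, y) = (x^2, y^2) satisfies tau^2 - mu*tau + 2 = 0 with mu = (-1)^(1-a), so kP = sum of u_i * tau^i(P) needs only field squarings and point additions or subtractions. The usual reduction of k modulo (tau^m - 1)/(tau - 1), which shortens the expansion, is omitted.

```python
def tau_naf(k, mu):
    """tau-adic NAF of an integer k (Solinas' algorithm).

    tau is the Frobenius map of a Koblitz curve, with tau^2 - mu*tau + 2 = 0 and
    mu = +1 or -1. Returns digits u_0, u_1, ... in {-1, 0, 1}, least significant
    first, such that k = sum(u_i * tau^i) and no two adjacent digits are nonzero.
    """
    r0, r1 = k, 0                                # current remainder r0 + r1*tau
    digits = []
    while r0 != 0 or r1 != 0:
        if r0 % 2:                               # odd: not divisible by tau
            u = 2 - ((r0 - 2 * r1) % 4)          # +1 or -1; forces the next digit to 0
            r0 -= u
        else:
            u = 0
        digits.append(u)
        r0, r1 = r1 + mu * (r0 // 2), -(r0 // 2)  # exact division by tau
    return digits


def tau_eval(digits, mu):
    """Evaluate sum(u_i * tau^i) as (a, b) meaning a + b*tau, via tau^2 = mu*tau - 2."""
    a, b = 0, 0
    for u in reversed(digits):                   # Horner: (a + b*tau)*tau + u
        a, b = -2 * b + u, a + mu * b
    return a, b


print(tau_naf(2, 1))        # [0, -1, 0, -1], i.e. 2 = -tau - tau^3 when mu = 1
```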
Simulation and VLSI Design
- An 8-bit Reconfigurable Crypto-Chip
- 879 Standard Cells
- 55 I/O Circuit Cells
- Total Area is 131845
- 200 MHz clock
- As a multiplier: two clock cycles are required for one multiplication.
- As a PCA block cipher: one clock cycle is required to encrypt a message block, and two message blocks can be encrypted simultaneously.
Future Research
- In future research, we will extract the common components of the different basis multipliers and design a reconfigurable, versatile multiplier that can be used in any basis and applied in hybrid cryptosystems.
- Furthermore, hardware/software co-design and system-on-chip approaches will be considered for implementing the fast and low-complexity reconfigurable VLSI architectures.