Superscalar Coprocessor for High-speed Curve-based Cryptography

K. Sakiyama, L. Batina, B. Preneel, I. Verbauwhede, Katholieke Universiteit Leuven / IBBT

1
Superscalar Coprocessor for High-speed
Curve-based Cryptography
  • K. Sakiyama, L. Batina, B. Preneel,
    I. Verbauwhede
  • Katholieke Universiteit Leuven / IBBT
  • Department of Electrical Engineering - ESAT/COSIC

2
Overview
  • Introduction
  • Curve-based Cryptography
  • HW/SW Partitioning
  • Superscalar Coprocessor
  • Results
  • Conclusions

3
Introduction: Motivation
  • High-speed curve-based cryptography in HW/SW
    co-design
  • How much instruction-level parallelism can we
    obtain from coprocessor instructions?
  • Performance improvement for different operation
    forms in the datapath
  • AB+C mod P vs. A(B+D)+C mod P, where A, B, C, D
    and P are polynomials
  • Performance comparison of three different
    curve-based cryptosystems
  • Which is fastest: ECC, HECC, or ECC over a
    composite field?
  • Programmability and scalability
  • Programmable in order to support different
    cryptosystems?
  • Scalable in field size?

4
Introduction: Target Architecture
  • Curve-based cryptography over binary fields
  • Hardware can be smaller and faster than for
    prime fields
  • ECC over a binary field, e.g. GF(2^163)
  • HECC of genus 2
  • Field length can be shorter by a factor of
    2, e.g. GF(2^83)
  • ECC over a composite field
  • Field length can be shorter by a factor of
    2, e.g. GF((2^83)^2)
  • The datapath can be shared
  • Programmable coprocessor supporting three
    curve-based cryptosystems by defining coprocessor
    instruction(s)
  • (Coprocessor) instruction-level parallelism by
    superscalar execution
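
To illustrate why a composite field can reuse the smaller datapath, here is a minimal sketch of multiplication in a quadratic extension built from base-field multiplications only. It uses a toy base field GF(2^3) with x^3 + x + 1 in place of GF(2^83), and it assumes the extension polynomial y^2 + y + 1 (irreducible over GF(2^n) whenever n is odd); the slides do not state which extension polynomial is used. The names fmul, ext_mul and ext_pow are hypothetical.

```python
# Toy stand-in for GF((2^83)^2): base field GF(2^3), extension y^2 + y + 1.
# Both polynomial choices are assumptions made for illustration only.
P = 0b1011  # x^3 + x + 1


def fmul(a, b, p=P):
    """Multiply two reduced elements of GF(2^n), n = deg(p)."""
    n = p.bit_length() - 1
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> n:      # degree reached n: reduce by p
            a ^= p
    return r


def ext_mul(x, y, p=P):
    """Multiply (a1*Y + a0) by (b1*Y + b0) with Y^2 = Y + 1."""
    a1, a0 = x
    b1, b0 = y
    hh = fmul(a1, b1, p)                    # coefficient of Y^2
    ll = fmul(a0, b0, p)                    # constant term
    mid = fmul(a1, b0, p) ^ fmul(a0, b1, p)  # coefficient of Y
    return (mid ^ hh, ll ^ hh)              # fold Y^2 into Y + 1


def ext_pow(x, e, p=P):
    """Square-and-multiply exponentiation in the extension field."""
    r = (0, 1)
    while e:
        if e & 1:
            r = ext_mul(r, x, p)
        x = ext_mul(x, x, p)
        e >>= 1
    return r


print(ext_mul((1, 0), (1, 0)))  # Y * Y = Y + 1, i.e. (1, 1)
```

Each extension-field multiplication here costs four base-field multiplications plus XORs, which is the sense in which the same datapath can be shared.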

6
Curve-based Cryptography: HW/SW partitioning (1)
  • General hierarchy in coprocessor for curve-based
    cryptography

[Figure: hierarchy from top to bottom. Point/divisor multiplication (SW or HW controller); point/divisor addition and doubling (SW or HW controller); finite field addition, multiplication and inversion (HW datapath).]
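
The top level of this hierarchy can be sketched as a loop that only calls the level below it. The example uses plain left-to-right double-and-add, which is an assumption: the slides do not say which point/divisor multiplication algorithm is used. Integers modulo 97 under addition stand in for the curve group, and scalar_multiply is a hypothetical name.

```python
def scalar_multiply(scalar, point, add, double, identity):
    """Left-to-right double-and-add using only the group operations
    supplied by the level below (point/divisor addition and doubling)."""
    result = identity
    for bit in bin(scalar)[2:]:
        result = double(result)
        if bit == "1":
            result = add(result, point)
    return result


# Stand-in group (NOT a curve): integers modulo 97 under addition.
add = lambda a, b: (a + b) % 97
double = lambda a: (2 * a) % 97

print(scalar_multiply(13, 5, add, double, 0))  # 13 * 5 = 65
```

In a real system the add and double callbacks would themselves expand into sequences of finite field operations executed by the datapath.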
7
Curve-based Cryptography: Proposed Hierarchy (1)
  • Single instruction for all finite field
    operations
  • Fixed-cycle execution enables efficient
    implementation

[Figure: two hierarchies side by side. Conventional: point/divisor multiplication; point/divisor addition and doubling; finite field addition, multiplication and inversion. Single instruction (datapath): point/divisor multiplication; point/divisor addition, point/divisor doubling and finite field inversion; one finite field operation, e.g. AB+C mod P.]
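
As a rough illustration of how one instruction can cover all finite field operations, the sketch below models AB+C mod P on polynomials over GF(2) stored as Python integers, then expresses addition, multiplication and inversion through it. Inversion via Fermat's little theorem is an assumption (the slides do not name the inversion algorithm), and malu, f_add, f_mul and f_inv are hypothetical names. The test values use the AES field polynomial and the NIST degree-163 polynomial.

```python
def clmul(a, b):
    """Carry-less product of two GF(2)[x] polynomials stored as ints."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def pmod(a, p):
    """Remainder of a modulo p in GF(2)[x]."""
    dp = p.bit_length() - 1
    while a.bit_length() - 1 >= dp:
        a ^= p << (a.bit_length() - 1 - dp)
    return a


def malu(a, b, c, p):
    """The single instruction: A*B + C mod P."""
    return pmod(clmul(a, b) ^ c, p)


def f_add(a, c, p):
    return malu(a, 1, c, p)      # A*1 + C


def f_mul(a, b, p):
    return malu(a, b, 0, p)      # A*B + 0


def f_inv(a, p):
    """a^(2^k - 2) with k = deg(p); assumed method (Fermat)."""
    k = p.bit_length() - 1
    result, sq = 1, a
    for _ in range(k - 1):       # 2^k - 2 = 2 + 4 + ... + 2^(k-1)
        sq = f_mul(sq, sq, p)
        result = f_mul(result, sq, p)
    return result


AES_P = 0x11B                                               # x^8+x^4+x^3+x+1
P163 = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1      # NIST binary field

print(hex(f_mul(0x57, 0x83, AES_P)))  # 0xc1
print(hex(f_inv(0x53, AES_P)))        # 0xca
```

Because every call is the same fixed-length operation, inversion becomes a software-visible sequence of instructions rather than a separate hardware block, which matches the proposed hierarchy.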
8
Curve-based Cryptography: Modular Arithmetic
Logic Unit (MALU)
  • (a) Building block: regular XOR chains
  • (b) Scalable in digit size (d) and field size (k)
    by interconnecting several building blocks
  • We use MALU83 (n = 83, d = 12) as the building block
  • 2xMALU83 can be configured as 1xMALU163
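
The role of the digit size d can be sketched with a digit-serial model of AB+C mod P that consumes d bits of B per step, most significant digit first. This is a behavioural model only, under the assumption that one digit is processed per cycle, so a field of size k needs ceil(k/d) steps; the real MALU may spend additional cycles. All function names are hypothetical.

```python
def clmul(a, b):
    """Carry-less product of two GF(2)[x] polynomials stored as ints."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def pmod(a, p):
    """Remainder of a modulo p in GF(2)[x]."""
    dp = p.bit_length() - 1
    while a.bit_length() - 1 >= dp:
        a ^= p << (a.bit_length() - 1 - dp)
    return a


def ex_cycles(k, d):
    """Assumed step count: one d-bit digit per cycle."""
    return -(-k // d)   # ceil(k / d)


def malu_digit_serial(a, b, c, p, d):
    """A*B + C mod P, processing d bits of B per step (Horner form)."""
    k = p.bit_length() - 1
    steps = ex_cycles(k, d)
    mask = (1 << d) - 1
    acc = 0
    for i in reversed(range(steps)):
        digit = (b >> (i * d)) & mask
        acc = pmod((acc << d) ^ clmul(a, digit), p)
    return pmod(acc ^ c, p), steps


P163 = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1
a = (1 << 162) | 0x1234567
b = (1 << 160) | 0xABCDEF
result, steps = malu_digit_serial(a, b, 0x55, P163, 12)
print(steps)              # 14 steps for k = 163, d = 12
print(ex_cycles(83, 12))  # 7 steps for k = 83, d = 12
```

A larger d shortens the loop at the cost of a wider XOR array per step, which is the area/speed trade-off the building-block approach exposes.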

10
HW/SW Partitioning: TYPE I, smallest
implementation (baseline)
[Figure: block diagram. Labels: Main CPU, SRAM, Program ROM, memory-mapped I/O (32-bit instructions, 32-bit data), coprocessor with IBC, DBC, instruction bus, data bus and MALU83.]
11
HW/SW Partitioning: TYPE II = TYPE I + m-code RAM
[Figure: block diagram. Labels: Main CPU, SRAM, Program ROM, memory-mapped I/O (32-bit instructions, 32-bit data), coprocessor with IBC, FSM, m-code RAM, DBC, instruction bus, data bus and MALU83.]
12
HW/SW Partitioning: TYPE III = TYPE I +
Coprocessor Memory
[Figure: block diagram. Labels: Main CPU, SRAM, Program ROM, memory-mapped I/O (32-bit instructions, 32-bit data), coprocessor with DBC, IBC, instruction bus, data bus, MALU83 and coprocessor memory.]
13
HW/SW Partitioning: TYPE IV = TYPE I + Copro.
Mem. + m-code RAM
[Figure: block diagram. Labels: Main CPU, SRAM, Program ROM, memory-mapped I/O (32-bit instructions, 32-bit data), coprocessor with IBC, FSM, m-code RAM, DBC, instruction bus, data bus, MALU83 and coprocessor memory.]
14
HW/SW Partitioning: Co-design flow with GEZEL
[Figure: co-design flow. Labels: C/C++ codes for PKCs; partitioning of functions; C/C++ codes and H/W behavior blocks with interface; ARM (SW) and co-processor (HW); C/C++ codes with physical memory map; GEZEL FDL codes; cycle-true simulation (GEZEL); cross compile to program codes; synthesis from VHDL codes.]
15
HW/SW Partitioning Result: Vertical Exploration
of System
  • HECC performance for different HW/SW partitionings
  • (Performance: point/divisor multiplication)

17
Superscalar Coprocessor: Proposed Hierarchy (2)
  • Multiple Modular Arithmetic Logic Units (MALUs)
    in the coprocessor

[Figure: two hierarchies side by side. Single MALU: point/divisor multiplication; point/divisor addition, point/divisor doubling and finite field inversion; one finite field operation, e.g. AB+C mod P. Multiple MALUs: the same upper levels, with several finite field operations (e.g. AB+C mod P) side by side.]

18
Superscalar Coprocessor: Parallel Processing
Architecture (TYPE IV-based)
19
Superscalar Coprocessor: Horizontal Exploration
of System
  • Performance of ECC and HECC

21
Results: Performance for ECC over GF(2^83)
  • Fastest of the three
  • 1.8x speed-up by 2-way superscalar execution
    (ILPDP6) with A(B+D)+C
  • Further improvement is still possible by adding
    MALUs

[Chart legend: AB+C, A(B+D)+C]
22
Results: Performance of HECC over GF(2^83)
  • Faster than ECC over a composite field
  • 2.7x speed-up by 4-way superscalar execution
    (ILPDP5) with A(B+D)+C
  • Diminishing improvement as the number of MALUs
    increases

[Chart legend: AB+C, A(B+D)+C]
23
Results: Performance for ECC over GF((2^83)^2)
  • Slowest of the three
  • 2.5x speed-up by 4-way superscalar execution
    (ILPDP6) with A(B+D)+C
  • Diminishing improvement as the number of MALUs
    increases

[Chart legend: AB+C, A(B+D)+C]
24
Results: Comparison of ECC/HECC implementations on
FPGAs
[11] T. Wollinger, PhD thesis, 2004.
[13] G. Orlando and C. Paar, CHES 2000.
[14] N. Gura et al., CHES 2002.
[29] Nazar A. Saqib et al., International Journal of Embedded Systems, 2005.
25
Conclusions
  • Performance improvement / comparison
  • ECC was improved by a factor of 1.8 (2-way)
  • HECC (genus 2) was improved by a factor of 2.7
    (4-way)
  • ECC over a composite field was improved by a
    factor of 2.5 (4-way)
  • A(B+D)+C offers better performance than AB+C
  • ECC is the fastest in this case study
  • Programmability and flexibility
  • Support for three different curve-based
    cryptosystems over a binary field
  • Arbitrary irreducible polynomial
  • Field size up to 332 bits by using 4xMALU83
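
One way to see why the A(B+D)+C form can help: with only AB+C available, an expression of the shape A(B+D)+C needs two instructions, while the fused form needs one. The sketch below checks that both routes give the same result and counts the instructions. It is an illustration of the operation forms only, not of the measured speed-ups; the function names are hypothetical and the AES field polynomial is used as a small stand-in modulus.

```python
def clmul(a, b):
    """Carry-less product of two GF(2)[x] polynomials stored as ints."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def pmod(a, p):
    """Remainder of a modulo p in GF(2)[x]."""
    dp = p.bit_length() - 1
    while a.bit_length() - 1 >= dp:
        a ^= p << (a.bit_length() - 1 - dp)
    return a


def ab_c(a, b, c, p):
    """Operation form AB+C mod P."""
    return pmod(clmul(a, b) ^ c, p)


def a_bd_c(a, b, d, c, p):
    """Operation form A(B+D)+C mod P."""
    return pmod(clmul(a, b ^ d) ^ c, p)


def with_ab_c_only(a, b, d, c, p):
    """Compute A(B+D)+C using two AB+C instructions."""
    t = ab_c(b, 1, d, p)         # instruction 1: B*1 + D
    return ab_c(a, t, c, p), 2   # instruction 2: A*T + C


def with_fused_form(a, b, d, c, p):
    """Compute A(B+D)+C using one A(B+D)+C instruction."""
    return a_bd_c(a, b, d, c, p), 1


P = 0x11B  # stand-in modulus (AES field polynomial)
print(with_ab_c_only(0x57, 0x83, 0x1F, 0x02, P))
print(with_fused_form(0x57, 0x83, 0x1F, 0x02, P))
```

How often this pattern occurs in the point/divisor formulas determines how much of the saving shows up in practice.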

26
Thank you!
27
Parallel issue of instructions: Case of using 4
MALUs
  • IF/D: Instruction Fetch and Decode
  • R_: Read operands (dependent on the type of
    operation)
  • EX: Execution (dependent on MALU
    configuration, k and d)
  • W_: Write (dependent on the number of
    instructions issued in parallel)

28
Parallel issue of instructions: Out-of-order
Execution
  • Check RAW (Read After Write) dependency for
    in-/out-of-order execution
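
The RAW check can be sketched as a small scheduler that groups instructions into issue packets of at most n_malus instructions. This is a simplified model under stated assumptions: a result becomes readable by the next packet, only RAW hazards are modelled (as named on the slide), and the Instr type and schedule function are hypothetical rather than taken from the presented design.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    dst: str      # register written by the instruction
    srcs: tuple   # registers read by the instruction


def has_raw(earlier, later):
    """True if `later` reads the register that `earlier` writes."""
    return earlier.dst in later.srcs


def schedule(program, n_malus, out_of_order=True):
    """Return issue packets as lists of indices into `program`."""
    pending = list(range(len(program)))
    packets = []
    while pending:
        packet = []
        for pos, idx in enumerate(pending):
            if len(packet) == n_malus:
                break
            # Blocked if any earlier, not yet completed instruction
            # produces one of this instruction's operands.
            blocked = any(has_raw(program[e], program[idx])
                          for e in pending[:pos])
            if blocked:
                if out_of_order:
                    continue   # skip it and look further ahead
                break          # in-order: stop at the first stall
            packet.append(idx)
        packets.append(packet)
        pending = [i for i in pending if i not in packet]
    return packets


prog = [
    Instr("T1", ("A", "B", "C")),
    Instr("T2", ("T1", "D", "E")),   # RAW on T1
    Instr("T3", ("A", "D", "F")),
    Instr("T4", ("T3", "B", "G")),   # RAW on T3
]
print(schedule(prog, 2, out_of_order=True))   # [[0, 2], [1, 3]]
print(schedule(prog, 2, out_of_order=False))  # [[0], [1, 2], [3]]
```

In this toy program, looking past the stalled instruction fills both MALUs in every packet, while in-order issue leaves one idle in the first and last packets.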