Title: Lucas-Lehmer Primality Tester


1
Lucas-Lehmer Primality Tester
  • Team W-4
  • Nathan Stohs W4-1
  • Brian Johnson W4-2
  • Joe Hurley W4-3
  • Marques Johnson W4-4
  • Design Manager Prateek Goenka

2
Agenda
  • Background (Marques)
  • Project Description (Marques)
  • Algorithmic Description (Joe)
  • Data Flow/Block Diagram (Joe)
  • Design Process (Nathan)
  • Simulations (Nathan)
  • Floorplan/Layout (Brian)
  • Conclusions (Brian)

3
History of 2^P - 1
  • In the 16th century it was believed that 2^P - 1 was
    prime for all prime P
  • 1536: Hudalricus Regius proved 2^11 - 1 was not prime
  • French monk Marin Mersenne published Cogitata
    Physica-Mathematica, where he stated that 2^P - 1 was
    prime for P = 2, 3, 5, 7, 13, 17, 19, 31, 67, 127,
    and 257

4
Lucas-Lehmer
  • François Edouard Anatole Lucas
  • 1876: proved that the number 2^127 - 1 is prime
    using his own methods
  • Derrick Lehmer
  • 1930: refined Lucas's method

5
Make History
  • December 2005
  • 43rd Known Mersenne Prime Found!!
  • Dr. Curtis Cooper and Dr. Steven Boone
  • Professors at Central Missouri State University
  • 2^30,402,457 - 1

6
Prime Number Competitions
  • Electronic Frontier Foundation
  • $50,000 to the first individual or group who
    discovers a prime number with at least 1,000,000
    decimal digits (awarded Apr. 6, 2000)
  • $100,000 to the first individual or group who
    discovers a prime number with at least 10,000,000
    decimal digits
  • $150,000 to the first individual or group who
    discovers a prime number with at least
    100,000,000 decimal digits
  • $250,000 to the first individual or group who
    discovers a prime number with at least
    1,000,000,000 decimal digits

7
rank  prime                digits   who  when  reference
1     2^30402457-1         9152052  G9   2005  Mersenne 43
2     2^25964951-1         7816230  G8   2005  Mersenne 42
3     2^24036583-1         7235733  G7   2004  Mersenne 41
4     2^20996011-1         6320430  G6   2003  Mersenne 40
5     2^13466917-1         4053946  G5   2001  Mersenne 39
6     27653·2^9167433+1    2759677  SB8  2005
7     28433·2^7830457+1    2357207  SB7  2004
8     2^6972593-1          2098960  G4   1999  Mersenne 38
9     5359·2^5054502+1     1521561  SB6  2003
10    4847·2^3321063+1     999744   SB9  2005
8
Mersenne Prime Algorithm
  • Only used for numbers of the form 2^P - 1
  • For P > 2
  • 2^P - 1 is prime if and only if S_(P-2) is zero in this
    sequence
  • S_0 = 4
  • S_N = (S_(N-1)^2 - 2) mod (2^P - 1)
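
The recurrence above can be sketched in a few lines of C. This is a minimal software illustration, not the team's program: the function name `lucas_lehmer` and the 64-bit arithmetic are assumptions made here, and the sketch only works for exponents small enough that the square fits in 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the Lucas-Lehmer test (hypothetical helper, not from the slides).
   Returns 1 if 2^p - 1 is prime, 0 otherwise.
   Assumes 3 <= p <= 32 so that s * s + m fits in 64 bits. */
int lucas_lehmer(unsigned p)
{
    assert(p >= 3 && p <= 32);
    uint64_t m = ((uint64_t)1 << p) - 1;   /* Mersenne number 2^p - 1 */
    uint64_t s = 4;                        /* S_0 */
    for (unsigned n = 1; n <= p - 2; n++)
        s = (s * s + m - 2) % m;           /* S_n; adding m avoids going negative */
    return s == 0;                         /* prime iff S_(p-2) is zero */
}
```

Calling `lucas_lehmer(7)` walks through S_1 to S_5 for 2^7 - 1 = 127. The `%` operator here is exactly the division step that the hardware design tries to avoid.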

9
Example to Show 2^7 - 1 is Prime
  • 2^7 - 1 = 127
  • S_0 = 4
  • S_1 = (4 * 4 - 2) mod 127 = 14
  • S_2 = (14 * 14 - 2) mod 127 = 67
  • S_3 = (67 * 67 - 2) mod 127 = 42
  • S_4 = (42 * 42 - 2) mod 127 = 111
  • S_5 = (111 * 111 - 2) mod 127 = 0

10
Algorithmic description
We knew the necessary computations, but how do we
translate them into gates?
  • Computations needed
  • Squaring (not a problem)
  • Add/Subtract (not a problem)
  • Modulo (2^n - 1) multiplication (?)

11
Mechanisms behind the math
  • If done with brute force, modulo 2^n - 1 could have
    been ugly.
  • We would need to square and then find the remainder
    via division.
  • Luckily, for that specific computation, math is
    on our side: the 2^n - 1 constraint saves us from
    division, as will be seen.
  • A quick search on www.ieee.org produced
    inspiration.
  • Reto Zimmermann. Efficient VLSI Implementation
    of Modulo (2^n - 1) Addition and Multiplication.
    Computer Arithmetic, 1999, pp. 158-167.
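
Why the 2^n - 1 constraint removes the division: since 2^n is congruent to 1 modulo 2^n - 1, any bits above position n can be added back onto the low n bits. The C sketch below illustrates that idea only. The name `reduce_mersenne` is hypothetical, and it is not code from the Zimmermann paper or from this project.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch: reduce x modulo 2^n - 1 using only shifts, masks and adds.
   Hypothetical helper for illustration; assumes 1 <= n <= 32. */
uint32_t reduce_mersenne(uint64_t x, unsigned n)
{
    assert(n >= 1 && n <= 32);
    uint64_t m = ((uint64_t)1 << n) - 1;    /* modulus 2^n - 1, also the low-bit mask */
    while (x > m)
        x = (x & m) + (x >> n);             /* fold the high part onto the low part */
    return (uint32_t)(x == m ? 0 : x);      /* all ones is a second encoding of zero */
}
```

For example, 196 is 1 * 128 + 68, so folding with n = 7 gives 68 + 1 = 69, which is 196 mod 127.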

12
Useful Math: Multiplication
Just like any other multiplication, a modulo
multiplication can be computed by (modulo)
summing the partial products. So modulo
multiplication is ordinary multiplication built
on a modulo adder.
From the Zimmermann paper
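
Written out, the partial-product view looks roughly as follows. This is a reconstruction from the sentence above, not the formula shown on the original slide; the notation rotl_n (an n-bit left rotation) is introduced here for illustration.

```latex
\begin{align*}
X &= \sum_{i=0}^{n-1} x_i \, 2^i, \qquad x_i \in \{0, 1\} \\
X \cdot Y \bmod (2^n - 1)
  &= \Bigl( \sum_{i=0}^{n-1} x_i \cdot \bigl( 2^i Y \bmod (2^n - 1) \bigr) \Bigr) \bmod (2^n - 1) \\
2^i Y &\equiv \operatorname{rotl}_n(Y, i) \pmod{2^n - 1}, \qquad 0 \le Y < 2^n
\end{align*}
```

As a check with n = 7, Y = 14 and i = 4: 14 * 16 = 224, and 224 mod 127 = 97. Rotating the 7-bit pattern 0001110 left by four places gives 1100001, which is also 97.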
13
Block Diagram
[Block diagram. Blocks: FSM, Counter, Next Partial Product, Mod Calc, Mod add, Register (two), Subtract 2, Count, Compare. Signals: P, start, done, Out. Example annotation: S_2 = (14 * 14) mod 127 - 2 = 67]
14
Design Process
  • The process so far
  • Found mathematical means (core algorithm)
  • Found computational means (modulo multiplier,
    adder)
  • From the above, a high-level C program was
    written in a manner that would easily translate
    to Verilog and gates, or at least to more standard
    operations

int mod_square_minus(int value, int p, int offset)
{
    int acc, i;
    int mod = (1 << p) - 1;
    for (acc = offset, i = 0; i < (sizeof(int)*8 - 1); i++) {
        int a = (value >> i) & 1;        /* i-th bit of value */
        int temp;
        if (a) {
            if (i - p > 0)
                temp = value << (i - p);
            else
                temp = value >> (p - i); /* bits that wrap around past bit p */
            /* add the partial product value * 2^i, reduced modulo 2^p - 1 */
            acc = acc + temp + ((value << i) & ((1 << p) - 1));
            if (acc >= mod)              /* keep acc in the range 0..mod-1 */
                acc = acc - mod;
        }
    }
    return acc;
}
This easily translated into behavioral Verilog,
and readily turned into a gate-level
implementation. Essentially, it was written in a
more low-level manner.
15
Design Process
The rest of the design can simply be thought of
as a wrapper for the modulo multiplier. The
following slides contain Verilog code that was
taken directly from the C code on the previous slide.
module mod_mult(out, itrCount, x, y, mod, p, reset, en, clk);
  input  [15:0] x, y, mod, p;
  output [15:0] out;
  input  reset, en, clk;
  wire   [15:0] pp, ma0, temp;
  output [3:0] itrCount;
  counter mycount(itrCount, reset, en, clk);
  partial_product ppg(pp, x, y, itrCount, mod, p);
  mod_add modAdder(out, pp, temp, mod);
  dff_16_lp partial(clk, out, temp, reset, en);
endmodule
Top level of multiplier
16
module partial_product(out, x, y, i, mod, p);
  output [15:0] out;
  input  [15:0] x, y, mod, p;
  input  [3:0] i;
  wire   [15:0] diff1, diff2, added, result, corrected, final;
  wire   [15:0] high, low, shifted, toadd;
  wire   cout1, cout2, ithbit, toobig;
  sub_16 difference1(diff1, cout1, {12'b0, i}, p);
  sub_16 difference2(diff2, cout2, p, {12'b0, i});
  shift_left shiftL(high, y, diff1[3:0]);
  shift_right shiftR(low, y, diff2[3:0]);
  mux16 choose(high, low, shifted, cout1);
  shift_left shiftL2(toadd, y, i);
  and16 bigand(added, toadd, mod);
  fulladder_16 addhighlow(.out(result), .xin(added), .yin(shifted),
                          .cin(1'b0), .cout(nowhere));
  sub_16 correct(.out(corrected), .cout(toobig), .xin(mod), .yin(result));
  mux16 correctionMux(.out(final), .high(corrected), .low(result),
                      .sel(toobig));
  shift_right ibit({15'b0, ithbit}, x, i);
  select16 checkfor0(.out(out), .x(result), .sel(ithbit));
endmodule
Partial Product Unit w/ modulo reduction
17
Modulo Adder
module mod_add(out, x, y, mod);
  input  [15:0] x, y, mod;
  output [15:0] out;
  wire   cout, isDouble, cin;
  wire   [15:0] plus, lowbits, done, mod_bar, check;
  fulladder_16 add(.out(plus), .xin(x), .yin(y), .cin(cin), .cout());
  invert_16 inverter(mod_bar, mod);
  and16 hihnbits(check, plus, mod_bar);
  and16 lownbits(done, plus, mod);
  or8 (cin, check[0], check[1], check[2], check[3], check[4], check[5],
       check[6], check[7], check[8], check[9], check[10], check[11],
       check[12], check[13], check[14], check[15]);
  compare_16 checkfordouble(isDouble, done, 16'b1111_1111_1111_1111);
  mux16 fixdouble(.out(out), .high(16'b0), .low(done), .sel(isDouble));
endmodule
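
The adder's behavior can be modeled in C to make the end-around carry easier to follow. This is a behavioral sketch under stated assumptions, not a translation of the netlist: the name `mod_add_model` is hypothetical, the modulus is passed as an exponent `p` rather than as a mask, the inputs are assumed to satisfy x + y <= 2 * (2^p - 1), and the zero fix-up compares against the modulus itself.

```c
#include <assert.h>
#include <stdint.h>

/* Behavioral sketch of a modulo (2^p - 1) adder with end-around carry.
   Hypothetical model for illustration; assumes 1 <= p <= 16 and
   x + y <= 2 * (2^p - 1). */
uint16_t mod_add_model(uint16_t x, uint16_t y, unsigned p)
{
    assert(p >= 1 && p <= 16);
    uint32_t mod  = ((uint32_t)1 << p) - 1;     /* 2^p - 1, also the low-bit mask */
    uint32_t plus = (uint32_t)x + y;            /* raw sum, may spill above bit p-1 */
    assert(plus <= 2 * mod);
    uint32_t cin  = (plus & ~mod) != 0;         /* 1 if any bit above the low p bits is set */
    uint32_t done = (plus + cin) & mod;         /* feed the carry back in, keep low p bits */
    return (uint16_t)(done == mod ? 0 : done);  /* all ones also represents zero */
}
```

With p = 7, `mod_add_model(128, 68, 7)` sums to 196, detects the bit above position 6, and returns 69.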
18
Final Design Process Notes
  • Lessons learned: never tweak the schematics
    without retesting the Verilog first. Timing
    issues can be subtle. Verilog is better than
    schematics for catching them and for quickly
    fixing and retesting.
  • Of the total time spent during this phase,
    roughly half was on the core and the FSM, and the
    rest on the wrapper.

19
Road to verification: C
Two example runs of the high-level C implementation
Tyrion/Desktop/15525 nstohs ./prime4 7
round 1: (4 * 4 - 2) mod 127 = 14
round 2: (14 * 14 - 2) mod 127 = 67
round 3: (67 * 67 - 2) mod 127 = 42
round 4: (42 * 42 - 2) mod 127 = 111
round 5: (111 * 111 - 2) mod 127 = 0
2^7-1 is prime
Tyrion/Desktop/15525 nstohs ./prime4 11
round 1: (4 * 4 - 2) mod 2047 = 14
round 2: (14 * 14 - 2) mod 2047 = 194
round 3: (194 * 194 - 2) mod 2047 = 788
round 4: (788 * 788 - 2) mod 2047 = 701
round 5: (701 * 701 - 2) mod 2047 = 119
round 6: (119 * 119 - 2) mod 2047 = 1877
round 7: (1877 * 1877 - 2) mod 2047 = 240
round 8: (240 * 240 - 2) mod 2047 = 282
round 9: (282 * 282 - 2) mod 2047 = 1736
2^11-1 is not prime
20
Road to verification: Verilog
Tests were either specific tests on important
units such as partial_product, or top-level tests.
Samples of Verilog verification output:
Partial Product Unit, p = 7
  380 ppOut = 56, x = 14, y = 14, i = 2, mod = 127, p = 7
  400 ppOut = 112, x = 14, y = 14, i = 3, mod = 127, p = 7
  420 ppOut = 0, x = 14, y = 14, i = 4, mod = 127, p = 7
  440 ppOut = 0, x = 14, y = 14, i = 5, mod = 127, p = 7
Top Level, p = 7
  itrOut = x
  itrOut = 4
  itrOut = 14
  itrOut = 67
  itrOut = 42
  itrOut = 111
  itrOut = 0
Top Level, p = 11
  itrOut = x
  itrOut = 4
  itrOut = 14
  itrOut = 194
  itrOut = 788
  itrOut = 701
  itrOut = 119
  itrOut = 1877
Note that these are the same results as those
generated by the C code.
21
Road to verification: Schematic I
Schematic test of our modular adder:
(128 + 68) mod 127 = 69
22
Road to verification: Schematic II
Plot of the top-level output after a single
iteration, p = 7. The output after a single
iteration is 14, the expected value.
23
Road to verification: Schematic III
[Waveform plot: output values 4, 14, 67, 42, 111]
24
Road to verification: Intermission
  • Disk space required for a full-length schematic
    test of p = 7: 6 GB
  • Time required for a full-length schematic test
    of p = 7: 5 hours
  • Disk space required for a full-length extractedRC
    test of p = 7: 20 GB
  • Time required for a full-length extractedRC test
    of p = 7: 8 hours
Simulations become lengthy because tests need to
be deep to be useful.
25
Layout ExtractedRC Full Run
[Waveform plot: output values 4, 14, 67, 42, 111]
26
Timing
To determine the bounds of our clock, PathMill
was used once major portions of the schematic were
complete. The critical path through our design is
one loop through the modular multiplier, which
runs through the modular adder and the partial
products module. The PathMill delay was 9 ns
through the modular adder and 5.2 ns through the
partial products module. This already puts our
total delay at 14.2 ns, which limits the schematic
design to a clock of about 70 MHz. For extractedRC,
due in part to simulation issues, a conservative
50 MHz was chosen as the final clock.
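
The arithmetic behind those figures can be written out as below. This is only a back-of-the-envelope check using the two delays quoted above. It ignores register setup and clock-to-output times, which the slides do not give, and the symbol names are chosen here for illustration.

```latex
\begin{align*}
t_{\text{crit}} &\ge t_{\text{mod\_add}} + t_{\text{partial\_product}}
                 = 9\,\text{ns} + 5.2\,\text{ns} = 14.2\,\text{ns} \\
f_{\max} &\le \frac{1}{14.2\,\text{ns}} \approx 70.4\,\text{MHz} \\
T_{50\,\text{MHz}} &= \frac{1}{50\,\text{MHz}} = 20\,\text{ns},
  \qquad 20\,\text{ns} - 14.2\,\text{ns} = 5.8\,\text{ns}
\end{align*}
```

At 50 MHz the clock period leaves about 5.8 ns beyond the schematic estimate to absorb extracted parasitics.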
27
Issues
  • extractedRC of partial_product module
  • Registers switch
  • Custom design to DFFs with muxes
  • Switching from parallel calculations to series
  • Transistor count vs. clock cycles
  • Syncing up design between people
  • Transferring files
  • Different design styles
  • LONG simulation times
  • Floorplanning
  • Too much emphasis on aspect ratios and not enough
    on wiring
  • Couldn't decide on one set floorplan

28
Floorplan v1.0
29
Floorplan v2.0
30
Final Floorplan
31
Pin Specifications
Pin       Type     # of Pins
Vdd!      In/Out   1
Gnd!      In/Out   1
p<15:0>   In       16
clk       In       1
start     In       1
Done      Out      1
out       Out      1
Total     -        22
32
Initial Module Specifications
Module           Transistor Count  Area (µm²)  Density (transistors/µm²)
FSM              300               900         .33
mod_p            2,440             7,000       .35
mod_add          1,282             9,000       .14
partial_product  8,676             65,000      .13
count            1,656             6,000       .27
sub_16           704               3,500       .20
Registers        1,848             6,000       .30
compare          36                300         .12
Total            16,942            97,700      .17
33
Final Module Specifications
Module           Transistor Count  Area (µm²)  Density (transistors/µm²)  Aspect Ratio
FSM              152               1,200       .13                        2.45
mod_p            1,280             8,603       .15                        0.79
mod_add          1,168             5,603       .21                        2.40
partial_product  7,520             54,680      .14                        1.16
count            1,424             8,701       .16                        6.88
sub_16           576               2,934       .20                        4.49
Registers        896               6,028       .15                        4.76
compare          56                201         .28                        4.41
Total            13,702            86,621      .16                        1.01
34
Chip Specifications
  • Transistor Count 13,702
  • Size 296.51µm x 292.13µm
  • Area 86,621µm²
  • Aspect Ratio 1.011
  • Density 0.16 transistors/µm²

35
Final Floorplan
36
Final Floorplan
37
Partial Product
[Layout figure. Labeled blocks: adder, Sub_16, shift_right, shift_left, mux, shift_right, shift_left, Select16, 16-bit and]
38
Poly Layer
Density 7.14
39
Active Layer
Density 8.76
40
Metal1 Layer
Density 23.86
41
Metal2 Layer
Density 19.97
42
Metal3 Layer
Density 11.30
43
Metal4 Layer
Density 10.34
44
Conclusions
  • Plan for buffers
    - They will be hard to put in after the fact
  • Your design will change dramatically from start
    to finish, so be flexible
  • Communication is key
  • Do layout in parallel