396-ps 32-bit Han-Carlson ALU in 180nm TSMC process

Description:

Self-Resetting to minimize the clock period. 10/7/09. 45. Reference ... Implementation of a Self-Resetting CMOS 64-Bit Parallel Adder with Enhanced ...

1
396-ps 32-bit Han-Carlson ALU in 180nm TSMC process
  • Liang-Kai Wang

VLSI CAD Lab, University of Wisconsin, Madison
2
Outline
  • Review of Adders
  • The Idea of Han-Carlson Adder
  • The Implementation of Han-Carlson Adder
  • Simulation Result
  • Discussion
  • Comparison between Ling and H-C Adders
  • Future work
  • Reference

3
Review of Adders
  • Carry Ripple Adder

4
Review of Adders(cont.)
  • Carry Skip Adder

5
Review of Adders(cont.)
  • Carry-Select Adder

6
Review of Adders(cont.)
  • Carry-Save Adder

7
Review of Adders(cont.)
  • Carry Lookahead Adder

8
Review of Adders(cont.)
  • Ling Adder

Observation
9
Review of Adders(cont.)
  • Hybrid (Parallel) Prefix Adder
  • Brent-Kung Adder
  • Kogge-Stone
  • Han-Carlson Adder

10
Review of Adders(cont.)
  • Brent-Kung Adder
  • Cost: $C(k) = C(k/2) + k - 1 = 2k - 2 - \log_2 k$ (number of
    adder cells)
  • Time: $2\log_2 k - 2$ (in terms of adder levels)

11
Review of Adders(cont.)
  • Kogge-Stone Adder
  • Cost: $k\log_2 k - (k - 1)$
  • Time: $\log_2 k$
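
The Brent-Kung and Kogge-Stone cost and time expressions can be checked numerically. The following is a minimal Python sketch, assuming k is a power of two and a base case C(1) = 0 for the Brent-Kung recurrence (the slides do not state one). All function names are illustrative.

```python
def log2_int(k: int) -> int:
    """Exact log2 of a power-of-two word width k."""
    if k < 1 or k & (k - 1):
        raise ValueError("k must be a power of two")
    return k.bit_length() - 1


def brent_kung_cost_recursive(k: int) -> int:
    """C(k) = C(k/2) + k - 1, with the assumed base case C(1) = 0."""
    return 0 if k == 1 else brent_kung_cost_recursive(k // 2) + k - 1


def brent_kung_cost(k: int) -> int:
    """Closed form 2k - 2 - log2(k), in adder cells."""
    return 2 * k - 2 - log2_int(k)


def brent_kung_levels(k: int) -> int:
    """2*log2(k) - 2 adder levels."""
    return 2 * log2_int(k) - 2


def kogge_stone_cost(k: int) -> int:
    """k*log2(k) - (k - 1) adder cells."""
    return k * log2_int(k) - (k - 1)


def kogge_stone_levels(k: int) -> int:
    """log2(k) adder levels."""
    return log2_int(k)


if __name__ == "__main__":
    for k in (8, 16, 32, 64):
        # The recurrence and the closed form should agree.
        assert brent_kung_cost_recursive(k) == brent_kung_cost(k)
        print(k, brent_kung_cost(k), brent_kung_levels(k),
              kogge_stone_cost(k), kogge_stone_levels(k))
```

For k = 32 these expressions give 57 cells in 8 levels for Brent-Kung against 129 cells in 5 levels for Kogge-Stone.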

12
The idea of Han-Carlson Adder
  • Han-Carlson Adder
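  • B-K adder: small area, but slow
  • K-S adder: large area, but fast
  • Speed (B-K → K-S): $2\log_2 k - 2 \rightarrow \log_2 k$
    (roughly a 1/2 reduction)
  • Cost (B-K → K-S): $2k - 2 - \log_2 k \rightarrow k\log_2 k - k + 1$
    (roughly a $(\log_2 k)/2$-fold increase)
  • The Han-Carlson adder results from this area-time
    tradeoff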

13
The idea of Han-Carlson Adder (cont.)
  • Han-Carlson Adder
  • Cost: $O\big((k/2)\log_2 k\big)$
  • Time: $O(\log_2 k + 1)$
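
To make the structure concrete, here is a bit-level Python sketch of one common way to arrange a Han-Carlson prefix network: a Brent-Kung-style first level on the odd bits, Kogge-Stone levels on the odd bits only, and a final level that fills in the even bits. It is an illustrative model, not the circuit in these slides. Folding the carry-in into bit 0's generate signal is an assumption, and n is assumed to be a power of two with n >= 4.

```python
def han_carlson_add(a: int, b: int, cin: int = 0, n: int = 32):
    """Model an n-bit Han-Carlson adder.

    Returns (sum, carry_out, merge_cells, merge_levels).
    """
    mask = (1 << n) - 1
    a, b, cin = a & mask, b & mask, cin & 1
    p = [((a ^ b) >> i) & 1 for i in range(n)]  # propagate = partial sum
    g = [((a & b) >> i) & 1 for i in range(n)]  # generate
    G, P = g[:], p[:]
    G[0] |= p[0] & cin  # assumption: carry-in merged into bit 0

    cells = levels = 0

    def merge_stage(indices, dist):
        """One level: bit i absorbs the group that ends at bit i - dist."""
        nonlocal G, P, cells, levels
        new_g, new_p = G[:], P[:]
        for i in indices:
            new_g[i] = G[i] | (P[i] & G[i - dist])
            new_p[i] = P[i] & P[i - dist]
            cells += 1
        G, P = new_g, new_p
        levels += 1

    merge_stage(range(1, n, 2), 1)       # first level: odd bits, distance 1
    dist = 2
    while dist < n:                      # Kogge-Stone levels on odd bits only
        merge_stage(range(dist + 1, n, 2), dist)
        dist *= 2
    merge_stage(range(2, n, 2), 1)       # last level: even bits take odd carries

    carries = [cin] + G[:-1]             # carry into each bit position
    total = 0
    for i in range(n):
        total |= (p[i] ^ carries[i]) << i  # Sum = Partial_Sum XOR CarryIn
    return total, G[n - 1], cells, levels
```

In this arrangement a 32-bit adder uses 80 merge cells, which equals (k/2) log2 k, in 6 levels, which equals log2 k + 1.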

14
Review of Adders(cont.)
  • Optimized Brent-Kung Adder
  • Cost: $C(k) = C(k/2) + k - 1 = 2k - 2 - \log_2 k$
  • Time: $\log_2 k$ (in terms of adder levels)

15
The idea of Han-Carlson Adder (cont.)
16
The idea of Han-Carlson Adder (cont.)
  • Produce the Generate, Propagate, and Partial Sum
    bits in the first stage.
  • Single-rail circuit, with double-rail only in the
    last stage to perform the XOR function.
  • Sum = Partial_Sum XOR CarryIn
  • Improved domino circuit: odd stages are dynamic,
    even stages are static.

17
The implementation of Han-Carlson Adder
  • Schematic design in Composer, simulation in
    Spectre. Both are part of the Cadence design
    kit.
  • The simulation results are from the schematic
    (pre-layout).
  • The best speed is achieved by using the fast mode
    in the technology file rather than by tuning the
    bulk voltage.
  • The clock is generated by a ring oscillator with
    five inverters in the loop.
  • Cadence tutorials for both tools, and for setting
    up the environment, are provided here.

18
The implementation of Han-Carlson Adder(cont.)
  • Clock generation
  • Ring oscillator: five inverters followed by a
    large number of buffers
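
As a rough check on the clock generator, an ideal ring oscillator with N inverters (N odd) has period T = 2 · N · t_d, where t_d is the average delay per inverter. The sketch below applies that textbook relation to the five-inverter ring. It assumes the buffers outside the loop do not change the period, and it borrows the 396.23 ps clock period quoted in the simulation results; the function names are illustrative.

```python
def ring_period_ps(n_inverters: int, stage_delay_ps: float) -> float:
    """Ideal ring-oscillator period: T = 2 * N * t_d."""
    if n_inverters % 2 == 0:
        raise ValueError("a ring oscillator needs an odd number of inverters")
    return 2 * n_inverters * stage_delay_ps


def stage_delay_ps(n_inverters: int, period_ps: float) -> float:
    """Average per-inverter delay implied by a measured period."""
    return period_ps / (2 * n_inverters)


def frequency_ghz(period_ps: float) -> float:
    """Clock frequency in GHz for a period given in picoseconds."""
    return 1000.0 / period_ps


if __name__ == "__main__":
    t_d = stage_delay_ps(5, 396.23)
    print(f"per-stage delay: {t_d:.3f} ps")
    print(f"clock frequency: {frequency_ghz(396.23):.3f} GHz")
```

Under these assumptions, a 396.23 ps period corresponds to roughly 39.6 ps per inverter and a clock of about 2.52 GHz.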

19
The implementation of Han-Carlson Adder(cont.)
  • Clock distribution

20
The implementation of Han-Carlson Adder(cont.)
  • The whole view

Single Rail Circuit
Foot-transistor added
Double Rail inside
21
The implementation of Han-Carlson Adder(cont.)
  • ALU PG/Partial Sum Circuit.

22
The implementation of Han-Carlson Adder (cont.)
  • Dynamic and Static Carry Merge Stage

i = 0, 2, ..., 30
Even Stage
i = 1, 3, ..., 31, or the carry at that bit has already been obtained
Odd Stage
23
The implementation of Han-Carlson Adder (cont.)
  • Dynamic and Static Carry Merge Stage (cont.)
  • The carry-in of the LSB must be merged so that
    subtraction can be performed.
  • The generate and propagate bits of the MSB are
    passed to the last stage to produce the carry_out
    of the ALU (for the check bit).

24
The implementation of Han-Carlson Adder (cont.)
  • Even/Odd-bits CSG Sum Generation

Complementary signal generator (CSG) logic
25
The implementation of Han-Carlson Adder (cont.)
  • Even/Odd-bits CSG Sum Generation
  • Use a latch to increase noise tolerance

Carry_bar
Carry
26
Simulation Result
  • Test the design with the worst-case pattern.
  • A = 0, B = -2, Carry-In = 1 gives the worst-case
    delay.
  • Why? From the structure of the circuit, the
    worst-case path is 3N-2P-2N-2P-2N-2P-3N (for the
    propagate bit).

27
Simulation Result (cont.)
  • 0th stage: Carry-In = 1
  • 1st stage: g = 0, p = 0, Psum = 0 (P/G/Psum, 3N)
  • 2nd stage: g = 1, p = 1 (static, 2P)
  • 3rd stage: g = 0, p = 0 (dynamic, 2N)
  • 4th stage: g = 1, p = 1 (static, 2P)
  • 5th stage: g = 0, p = 0 (dynamic, 2N)
  • 6th stage: g = 1, p = 1 (static, 2P)
  • 7th stage: Cin31 = 0 (dynamic, 3N)
  • The result should be 2, with Correct = 1

28
Simulation Result (cont.)
29
Simulation Result (cont.)
  • The result window

30
Simulation Result (cont.)
  • Test whether the error flag is correct.
  • 1st test pattern: A = $-2^{31}$, B = -1. The answer
    is $2^{31} - 1$ (1'b0, 31'b1), which is the wrong
    answer, so the correct bit should be equal to 0
    (tests the lower bound).
  • Also check that the clock period is about 396.23 ps.

31
Simulation Result (cont.)
32
Simulation Result (cont.)
  • 2nd test pattern: A = $2^{31} - 1$, B = 2. The answer
    is $-2^{31} + 1$ (1'b1, 30'b0, 1'b1; wrong answer),
    so the correct bit should be equal to 0 (tests the
    upper bound).
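
Both overflow patterns can be reproduced with a small behavioural model of signed 32-bit addition. This Python sketch assumes the "correct" bit is 1 when the wrapped result equals the exact sum and 0 on signed overflow. How the hardware derives the bit is not modelled, and the name `add32_with_flag` is illustrative.

```python
def add32_with_flag(a: int, b: int):
    """Signed 32-bit add. Returns (wrapped_result, correct_bit)."""
    exact = a + b
    # Wrap into the two's-complement range [-2**31, 2**31 - 1].
    wrapped = (exact + 2**31) % 2**32 - 2**31
    correct = 1 if wrapped == exact else 0
    return wrapped, correct


def bits32(x: int) -> str:
    """32-character binary string of a signed 32-bit value."""
    return format(x & 0xFFFFFFFF, "032b")


if __name__ == "__main__":
    for a, b in ((-2**31, -1), (2**31 - 1, 2)):
        result, correct = add32_with_flag(a, b)
        print(a, b, result, bits32(result), correct)
```

The lower-bound pattern wraps to 2^31 - 1 and the upper-bound pattern wraps to -2^31 + 1, each with the correct bit at 0.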

33
Simulation Result (cont.)
34
Discussion: P/G/Psum Block
P circuit, G circuit, Psum circuit
Psum = A xor B
Mine
35
Discussion (cont.)
  • What might be the problem?
  • Longer path to ground
  • During precharge, both the propagate and generate
    bits are 1
  • What do we need to consider? If p = 0 and g = 0,
    this circuit may perform well.
  • However, what if g goes from 1 to 0 while p = 1?

36
Discussion (Cont.)
37
Discussion (cont.)
  • If the longest path is cut, then

Mine
38
Discussion (Cont.)
  • Mine

39
Comparison between H-C adder and Ling Adder
  • Ling Adder
  • For an n-bit Ling adder combining r groups
  • Critical path
  • $\log_r n - 1$ levels
  • An $r \rightarrow 1$ reduction results in $\log_r n$ levels
  • The $-1$ comes from using the CLA expression
    rather than Ling's expression for the last group,
    which saves one stage.
  • The worst-case delay remains the second path
    from the last block
  • In each block, $r + 1$ transistors are connected
    in series.
  • A carry-select block generates the Sum bits.
    Only 2 additional gate delays are needed.

40
Comparison between H-C adder and Ling Adder(cont.)
Lookahead Network
  • $T_d = (\log_r n - 1)(r + 1) + 2$
  • E.g. $r = 3$, $n = 32$: $\log_r n$ rounds up to 4, so $T_d = 14$

Group Generation
CLA expression
Carry-Select structure (MUX)
41
Comparison between H-C adder and Ling Adder(cont.)
  • H-C Adder
  • P, G generation: 3
  • Carry merge in each stage (including dynamic and
    static): 2
  • CSG Sum: 5
  • $T_d = 2\log_2 n + 3$ (P, G generation) $+ 5$ (CSG Sum)
  • E.g. $n = 32$, $T_d = 18$
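
The two delay estimates can be evaluated side by side. This Python sketch encodes the Ling expression from the previous slide and the H-C expression above. It assumes the logarithm is rounded up to a whole number of levels, which is what makes the r = 3, n = 32 example come out to 14. The function names are illustrative.

```python
def ceil_log(n: int, r: int) -> int:
    """Smallest number of levels L such that r**L >= n."""
    levels, span = 0, 1
    while span < n:
        span *= r
        levels += 1
    return levels


def ling_delay(n: int, r: int) -> int:
    """Td = (log_r(n) - 1) * (r + 1) + 2, with the log rounded up."""
    return (ceil_log(n, r) - 1) * (r + 1) + 2


def han_carlson_delay(n: int) -> int:
    """Td = 2*log2(n) + 3 (P, G generation) + 5 (CSG Sum)."""
    return 2 * ceil_log(n, 2) + 3 + 5


if __name__ == "__main__":
    print("Ling, n=32, r=3:", ling_delay(32, 3))
    print("Han-Carlson, n=32:", han_carlson_delay(32))
```

For n = 32 this reproduces the figures on the slides: 14 for the Ling adder with r = 3 and 18 for the Han-Carlson adder.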

42
Comparison between H-C adder and Ling Adder(cont.)
  • What are the pros and cons?
  • Ling Adder
  • Advantage: shorter worst-case path → might be
    faster in theory.
  • Disadvantages:
  • Irregular layout → wasted area
  • Many complex gates imply charge-sharing
    problems.
  • Many inputs per stage lead to long wires →
    delay problems at high frequency
  • Carry-select logic makes the area bigger.

43
Comparison between H-C adder and Ling Adder(cont.)
  • Han-Carlson Adder
  • Disadvantage: longer path to the output
  • Advantages:
  • Regular layout for each stage
  • Fewer inputs per path ease the interconnection
    problem
  • Simpler gates mean fewer charge-sharing problems

44
Future Work
  • Power Reduction by inserting sleep transistors
  • Speed improvement by inserting discharge
    transistors in the intermediate stack nodes of
    the dynamic stages during precharge phase.
  • Area Reduction in layout
  • SOI model test
  • Self-Resetting to minimize the clock period

45
Reference
  • A 6.5GHz 130nm Single-Ended Dynamic ALU and
    Instruction Scheduler Loop, ISSCC 2002
  • Sub-500-ps 64-b ALUs in 0.18-µm SOI/Bulk CMOS:
    Design and Scaling Trends, JSSC, Nov. 2001
  • Fast Area-Efficient VLSI Adders, Proc. 8th Symp.
    Computer Arithmetic, Sept. 1987

46
Reference (cont.)
  • Computer Arithmetic, Algorithms and Hardware
    Design. Behrooz Parhami, Oxford University Press.
  • Advanced Computer Arithmetic Design. Michael J.
    Flynn, et al. John Wiley & Sons, Inc.
  • 5 GHz 32b Integer-Execution Core in 130nm Dual-Vt
    CMOS, ISSCC 2002
  • Implementation of a Self-Resetting CMOS 64-Bit
    Parallel Adder with Enhanced Testability, JSSC
    Aug. 1999