Title: Farhan Mohamed Ali (W2-1) Jigar Vora (W2-2) Sonali Kapoor (W2-3) Avni Jhunjhunwala (W2-4)
1. Presentation 9: MAD MAC 525
Farhan Mohamed Ali (W2-1), Jigar Vora (W2-2), Sonali Kapoor (W2-3), Avni Jhunjhunwala (W2-4)
W2
Design Manager: Zack Menegakis
29th March, 2006: Functional Block Simulations
Project Objective: Design a crucial part of a GPU called the Multiply Accumulate Unit (MAC), which will revolutionize graphics.
2. MAD MAC 525 Status
- Project chosen
- Specifications defined
- Architecture
- Design
- Behavioral Verilog
- Testbenches
- Verilog Gate Level Design
- Floor plan
- Schematics and Analog Verifications
- Layout of basic gates and small modules
- Spring Break
- Top level layouts, extractions, LVS, simulations (in progress)
To be done:
- Full chip layout and simulation
3. Block Diagram
[Figure: block diagram. Blocks shown: three Inputs (16 bits each) feeding RegArray A, RegArray B, and RegArray C; Multiplier; Exp Calc; Align; Control Logic / Sign Determine; Leading 0 Anticipator; Adder/Subtractor; Normalize; Round; Ovf Checker; Reg Y; Output (16 bits). Bus-width labels appearing in the figure: 1, 4, 5, 10, 14, 15, 16, 22, 35, 36.]
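As a rough software reference for the datapath in the block diagram, the sketch below computes A * B + C on 16-bit operands. It assumes the operands follow the IEEE 754 half-precision layout (1 sign bit, 5 exponent bits, 10 fraction bits), which matches the 1/5/10 bus widths in the figure but is not stated on the slides. The function names are hypothetical, the result is computed in double precision and rounded once, and overflow is saturated to infinity. The hardware's rounding and overflow behavior may differ.

```python
import struct


def half_bits_to_float(bits: int) -> float:
    """Interpret a 16-bit pattern as an IEEE 754 half-precision value (assumed format)."""
    return struct.unpack("<e", struct.pack("<H", bits & 0xFFFF))[0]


def float_to_half_bits(value: float) -> int:
    """Round a Python float to half precision and return its 16-bit pattern."""
    try:
        return struct.unpack("<H", struct.pack("<e", value))[0]
    except OverflowError:
        # Assumption: saturate to signed infinity when the result is out of range.
        return 0xFC00 if value < 0 else 0x7C00


def mac_reference(a_bits: int, b_bits: int, c_bits: int) -> int:
    """Hypothetical reference model: Y = A * B + C on 16-bit operands."""
    a = half_bits_to_float(a_bits)
    b = half_bits_to_float(b_bits)
    c = half_bits_to_float(c_bits)
    return float_to_half_bits(a * b + c)


if __name__ == "__main__":
    # 1.5 * 2.0 + 0.5 = 3.5
    print(hex(mac_reference(0x3E00, 0x4000, 0x3800)))
```

A model like this could serve as an expected-value generator for a testbench, with the caveat that bit-exact agreement depends on the rounding scheme actually implemented.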
4. Design Decisions
- Removed carry select from the top adder bits
- Reduced hardware at the cost of speed
- Speed still well within required parameters
- Easier to lay out
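To illustrate the trade-off behind this decision, here is a minimal bit-level sketch of a plain ripple-carry adder next to a carry-select variant. The 36-bit width is taken from the adder bus in the block diagram. The split point of 24 and the function names are hypothetical, since the slides do not say which bits used carry select. The carry-select version computes the upper bits twice (once per possible carry-in) and picks one, which is the duplicated hardware the team removed.

```python
def ripple_add(a: int, b: int, width: int, carry_in: int = 0):
    """Ripple-carry addition, one full adder per bit; returns (sum, carry_out)."""
    total = 0
    carry = carry_in
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        total |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))
    return total, carry


def carry_select_add(a: int, b: int, width: int, split: int):
    """Carry-select addition: upper bits are computed for carry-in 0 and 1, then muxed."""
    low_mask = (1 << split) - 1
    low_sum, low_carry = ripple_add(a & low_mask, b & low_mask, split)
    hi_a, hi_b = a >> split, b >> split
    hi_width = width - split
    hi_if_0 = ripple_add(hi_a, hi_b, hi_width, 0)  # speculative result, carry-in 0
    hi_if_1 = ripple_add(hi_a, hi_b, hi_width, 1)  # speculative result, carry-in 1
    hi_sum, carry_out = hi_if_1 if low_carry else hi_if_0
    return (hi_sum << split) | low_sum, carry_out


if __name__ == "__main__":
    a, b = 0xABCDEF123, 0x123456789  # two 36-bit example operands
    print(ripple_add(a, b, 36) == carry_select_add(a, b, 36, 24))
```

Both functions return the same result. The difference in hardware is that carry select shortens the carry chain at the cost of a second upper adder and a multiplexer.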
5. Pipelining Stages
[Figure: pipeline diagram. Blocks shown: Reg A, Reg B, Reg C, Multiplier, Exp Calc, Align C, Adder, Ld Zero, Normalize, Round, Overflow checker, and Reg Y, separated by pipeline registers.]
6. Timing Diagram
Pipeline stage 1: Multiplier lower 7 outputs; Exponent calculator
Pipeline stage 2: Multiplier mid 4 outputs; Align; Holds exponent calculator
Pipeline stage 3: Multiplier top 11 outputs; Invert Adder inputs; Holds exponent calculator
Pipeline stage 4: Adder; Zero Counter; Holds exponent calculator
Pipeline stage 5: Normalize; Round; Overflow Checker
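The five-stage flow in the timing diagram can be sketched as a simple shift-register model. This is only an illustration of latency and throughput. It assumes one new operation can enter the pipeline every clock cycle and that there are no stalls, neither of which the slides state explicitly. The function name and operation labels are hypothetical.

```python
def run_pipeline(ops, stages=5):
    """Push operations through a linear pipeline.

    Returns a list of (operation, cycle) pairs, where cycle is the clock
    cycle in which the operation occupies the final stage.
    """
    pipeline = [None] * stages
    pending = list(ops)
    finished = []
    cycle = 0
    while pending or any(slot is not None for slot in pipeline):
        cycle += 1
        # Every operation advances one stage; a new one enters stage 1 if available.
        entering = pending.pop(0) if pending else None
        pipeline = [entering] + pipeline[:-1]
        if pipeline[-1] is not None:
            finished.append((pipeline[-1], cycle))
    return finished


if __name__ == "__main__":
    for op, cycle in run_pipeline(["op0", "op1", "op2"]):
        print(op, "reaches stage 5 in cycle", cycle)
```

Under these assumptions the first result takes five cycles, and each later result follows one cycle after the previous one.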
7. New Floorplan
8. Newer Floorplan
9. Adder Schematic
10. Adder Bit Slice Layout
11. Adder Layout
12. Adder Schematic Simulation
13. Adder Layout Simulation
14. Adder Schematic vs Layout
- Layout is about 19% slower than schematic
- Layout: 1150 ps
- Schematic: 962 ps
- Other logic in the adder module will slow it down further
- Expecting about 1.6-1.8 ns total
- Well within the 2 ns target
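The slowdown figure and the remaining margin follow directly from the delays quoted on this slide. A short worked calculation, using only those numbers (variable names are illustrative):

```python
layout_ps = 1150      # extracted layout delay from the slide
schematic_ps = 962    # schematic delay from the slide
target_ps = 2000      # 2 ns target from the slide
expected_ps = (1600, 1800)  # expected total adder module delay, 1.6-1.8 ns

# Relative slowdown of layout versus schematic, in percent.
slowdown_pct = (layout_ps - schematic_ps) / schematic_ps * 100

# Slack against the 2 ns target for the best and worst expected totals.
slack_ps = tuple(target_ps - d for d in expected_ps)

if __name__ == "__main__":
    print(f"slowdown: {slowdown_pct:.1f}%")
    print(f"slack: {slack_ps[1]} to {slack_ps[0]} ps")
```

The slowdown works out to roughly 19.5%, and the expected total leaves between 200 ps and 400 ps of slack against the target.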
15.
Module       Transistor Count   Area (um^2)   Prop. Delay   Power (mW at 350 MHz)
Multiplier   3600               16560         4.64 ns       8.5
Exponents    738                3800          942 ps        1.608
Align        500                2990          637 ps        0.393
Adder        3174               15870         1.7 ns        5.236
Leading 0    364                1222          551 ps        0.857
Normalize    942                520           434 ps        2.291
Round        462                2310          948 ps        0.631
OvfCheck     100                500           475 ps        0.13
Registers    1850               9200          120 ps        -
Total        11730              55628         -             -
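The totals row can be cross-checked by summing the per-module entries. The sketch below uses only the figures from the table. The transistor counts add up to the listed 11730. The area entries as transcribed add up to 52972 rather than the listed 55628, so one of the area figures may have been garbled or may predate the total. The table gives no power total. The sum of the listed module powers (registers excluded, since no value is given) is shown for reference.

```python
# (transistor count, area in um^2, power in mW at 350 MHz or None if not listed)
modules = {
    "Multiplier": (3600, 16560, 8.5),
    "Exponents": (738, 3800, 1.608),
    "Align": (500, 2990, 0.393),
    "Adder": (3174, 15870, 5.236),
    "Leading 0": (364, 1222, 0.857),
    "Normalize": (942, 520, 2.291),
    "Round": (462, 2310, 0.631),
    "OvfCheck": (100, 500, 0.13),
    "Registers": (1850, 9200, None),
}

transistor_total = sum(row[0] for row in modules.values())
area_total = sum(row[1] for row in modules.values())
power_total = sum(row[2] for row in modules.values() if row[2] is not None)

if __name__ == "__main__":
    print("transistors:", transistor_total)
    print("area (um^2):", area_total)
    print(f"power (mW, registers excluded): {power_total:.3f}")
```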
16. Normalize Layout
17. Normalize Layout Simulation
18. Problems
- Cadence refused to run extractRC on some of our modules
- Turns out that Cadence discriminates against certain output pins for a reason we cannot yet determine
- Solution was to copy output pins from modules that work when running extractRC and rename them
- Certain group members not happy with group picture
- Solution is to take a new picture, iron out the wrinkles, and photoshop our project manager in
19. Questions?