Title: Using Carry-Save Adders
1. Using Carry-Save Adders
- For Radix-4, a CSA Can Be Used to Generate 3a (No Booth's Recoding)
- Slight Delay Penalty from CSA: 3 Gates
2. Upper Half of P in Stored-Carry Form
- For Radix-2, Better Used to Keep the Cumulative Product in Redundant Form for the First k - 1 Cycles
- Then Use a CPA in the Last Cycle
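A minimal behavioral sketch of the idea above, assuming unsigned operands and Python integers in place of registers. The names `csa` and `multiply_carry_save` are illustrative, and the model does not reproduce the shifting of the upper half of P or the exact cycle count; it only shows the product kept as a sum/carry pair until a single carry-propagate addition at the end.

```python
def csa(x, y, z):
    """3-input carry-save adder on integers: x + y + z == s + c."""
    s = x ^ y ^ z                              # bitwise sum without carries
    c = ((x & y) | (x & z) | (y & z)) << 1     # majority bits, moved one position left
    return s, c


def multiply_carry_save(a, x, k):
    """Multiply a by the unsigned k-bit multiplier x, one multiplier bit per step."""
    s, c = 0, 0                                # cumulative product in redundant form
    for i in range(k):
        multiple = (a << i) if (x >> i) & 1 else 0
        s, c = csa(s, c, multiple)             # no carry propagation inside the loop
    return s + c                               # the only carry-propagate addition
```

For example, `multiply_carry_save(13, 11, 4)` performs four carry-save steps and one final addition.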
3. CSA With Booth Recoding
- Better Usage when Combined with Booth's Recoding
- Reduces Cycles by 50%
- Each Cycle Faster Due to CSA
- Sign of ±a, ±2a Incorporated Directly in Recoder/Selector Instead of Add/Subtract Signal Generation
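The combination can be sketched in Python as follows. This is a behavioral model under stated assumptions: the multiplier x is a k-bit 2's-complement value with k even, and the function names are hypothetical. Each recoded digit lies in {-2, -1, 0, 1, 2}, so only k/2 accumulation steps are needed, and each step is a carry-save addition.

```python
def csa(x, y, z):
    """3-input carry-save adder on integers: x + y + z == s + c."""
    return x ^ y ^ z, ((x & y) | (x & z) | (y & z)) << 1


def booth_radix4_digits(x, k):
    """Recode the k-bit 2's-complement pattern x (k even) into k/2 radix-4 digits,
    least significant digit first."""
    x &= (1 << k) - 1
    digits = []
    prev = 0                                   # implicit bit x_{-1} = 0
    for i in range(0, k, 2):
        b0 = (x >> i) & 1
        b1 = (x >> (i + 1)) & 1
        digits.append(-2 * b1 + b0 + prev)     # digit value in {-2, ..., 2}
        prev = b1
    return digits


def multiply_booth_csa(a, x, k):
    """Signed multiply: one carry-save step per radix-4 digit, one final addition."""
    s, c = 0, 0
    for j, d in enumerate(booth_radix4_digits(x, k)):
        s, c = csa(s, c, (d * a) << (2 * j))   # selected multiple: 0, ±a, or ±2a
    return s + c
```

Here the sign of the multiple is folded into the selected value `d * a`, loosely mirroring a recoder/selector that delivers the signed multiple directly rather than a separate add/subtract control.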
4. CSA Combined with Booth Recoding
5. Booth Recoder/Selector
- Circuitry Shown on Following Slide
- Negative Multiples -a, -2a in 2's Complement
- a, 2a Aligned at Right with Position i
- Must be Padded with i Zeros to Right
- Bitwise Complement (when -a, -2a Needed) Converts Padding Zeros to Ones; LSb Add of 1 Then Converts Them Back to Zeros
- Causes a Carry-in of 1 into Position i
- Can Ignore Positions 0 through i - 1 (in Negative Multiples)
- Insert Carry-in Directly (Dot)
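The carry-in shortcut can be checked numerically. The sketch below assumes a fixed word width and uses a hypothetical helper `negative_multiple`; it forms the negated multiple by complementing only the bits at positions i and above, leaving the i padding positions at zero, and supplies the 2's-complement "+1" as a separate carry-in (dot) at position i.

```python
def negative_multiple(a, i, width):
    """Return (complemented_bits, carry_in) representing -(a << i) in `width` bits."""
    mask = (1 << width) - 1
    complemented = (~a << i) & mask    # positions 0..i-1 stay zero and can be ignored
    carry_in = 1 << i                  # single dot inserted directly at position i
    return complemented, carry_in


def negative_multiple_full(a, i, width):
    """Reference version: complement the whole padded word, then add 1 at the LSb."""
    mask = (1 << width) - 1
    return (~(a << i) + 1) & mask
```

Adding the two values returned by `negative_multiple` gives the same word as `negative_multiple_full`, because the complemented padding bits (all ones) plus 1 ripple back to zeros and deliver a carry into position i.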
6. Booth Recoder/Selector Circuit
7. Radix-4 with CSA, No Booth
8. Radices > 4
- Radix-8 (3 bits at a time, k/3 multiples) Requires 3-Level CSA Tree
- Might as Well Use Radix-16 (4 bits at a time)
- Still 3-level tree with one more CSA
- MUXes Can Be Replaced with Booth Recoder/Selector Circuits in Higher Radix Multipliers
- Can Continue to Increase Radix (256: 8 bits) Leading to Wider Trees
- Tradeoff is Speed Versus Area
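The level counts quoted above can be reproduced with a small counting sketch. It assumes no Booth recoding, one multiple per multiplier bit handled in a cycle, and two extra operands for the sum and carry of the cumulative product; `csa_tree` and `radix_tree` are illustrative names, and the greedy grouping shown is only one way to arrange the tree.

```python
def csa_tree(operands):
    """Reduce `operands` values to 2 with 3-input CSAs; return (levels, csa_count)."""
    levels = csas = 0
    while operands > 2:
        used = operands // 3       # each CSA turns 3 values into 2
        csas += used
        operands -= used
        levels += 1
    return levels, csas


def radix_tree(bits_per_cycle):
    """Tree needed when `bits_per_cycle` multiples join the stored sum and carry."""
    return csa_tree(bits_per_cycle + 2)
```

Under these assumptions `radix_tree(3)` and `radix_tree(4)` both report three levels, with radix-16 needing one more CSA than radix-8, while `radix_tree(8)` (radix-256) needs a deeper and wider tree.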
9. Radix-16 Multiplication
10. Classification of Multipliers
11. Twin-Beat Multiplier with Radix-8 Booth Recoding
12. Full Tree Multipliers
- All k PPs Produced Simultaneously
- Input to k-Input Multioperand Tree
- Multiples of a (Binary, High-Radix or Recoded) Formed at Top of Tree
- Multiple-Forming Circuits
- AND Gates (binary multiplier)
- radix-4 Booth (recoded multiplier)
- Tree Results in Product in Redundant Form (2 Values, Carry-Save for Example)
- Final Product Formed With Converter (Fast CPA for Example)
13. General Parallel Multiplier
14. Tree-Type Multiplier Classification
- Distinguished by Design of
- Partial Product Forming Circuits (e.g., Booth, High-Radix)
- Reduction Tree Type
- Redundant-to-Binary Converter
- If Redundant Result in Carry-Save Form, Converter is Just a CPA
- Could Use Other Redundant Adders Such as Signed Binary (4:2 Compressors)
- High-Radix Multipliers Lead to Fewer Values to Accumulate
- Sequential Design: Fewer Cycles
- Parallel Design: Smaller Tree
- Tradeoff: Tree Complexity Versus Multiple-Forming Circuit
15. Wallace and Dadda Tree Multipliers
- Wallace: Combine Partial Products as Soon as Possible
- Dadda: Maintain Critical Path Length (Tree Depth) but Combine as Late as Possible
- Wallace: Fastest Possible Design Since Typically Smaller CPA at End
- Dadda: Simpler Tree but Wider CPA at End
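Both styles aim for the same minimum tree depth, which depends only on the number of operands. A short sketch, assuming the tree is built from 3-input CSAs (full adders) and using illustrative function names: the largest operand count reducible in h levels follows n(0) = 2, n(h) = floor(3 n(h-1) / 2).

```python
def max_operands(levels):
    """Largest number of operands a CSA tree of the given depth can reduce to 2."""
    n = 2
    for _ in range(levels):
        n = n * 3 // 2
    return n


def tree_depth(k):
    """Minimum number of CSA levels needed to reduce k operands to 2."""
    levels = 0
    while max_operands(levels) < k:
        levels += 1
    return levels
```

The values `max_operands(h)` for h = 0, 1, 2, ... are 2, 3, 4, 6, 9, 13, 19, ..., which are also the target column heights used when combining as late as possible.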
16. 4 × 4 Example
- 16 AND Gates Used to Form x_i a_j Terms (Dots)
- Dots per Column: 1 2 3 4 3 2 1
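The column heights 1 2 3 4 3 2 1 come from counting how many AND terms share each weight. A small sketch, with `column_heights` as an illustrative name:

```python
def column_heights(k):
    """Number of dots (AND terms x_i a_j) in each column of a k x k multiplier."""
    heights = [0] * (2 * k - 1)
    for i in range(k):
        for j in range(k):
            heights[i + j] += 1    # the term x_i a_j has weight 2**(i + j)
    return heights
```

For k = 4 the heights sum to 16, one dot per AND gate.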
17. Wallace Example
- Initial Dots per Column: 1 2 3 4 3 2 1
18. Dadda Examples
- Initial Dots per Column (Both Examples): 1 2 3 4 3 2 1
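A bit-level sketch of a Dadda-style reduction, assuming unsigned operands and the textbook rule of reducing each column only as far as the next target height (2, 3, 4, 6, ...). The function name and return format are illustrative. Other equally valid schedules exist, so the adder counts produced here need not match a particular figure.

```python
def dadda_reduce(a, x, k):
    """Reduce the k x k dot matrix of a * x to two rows.
    Returns (row0, row1, full_adders, half_adders) with row0 + row1 == a * x."""
    cols = [[] for _ in range(2 * k + 1)]      # one spare column for stray carries
    for m in range(k):
        for n in range(k):
            cols[m + n].append(((a >> m) & 1) & ((x >> n) & 1))

    targets = [2]                              # target heights below the initial maximum k
    while targets[-1] * 3 // 2 < k:
        targets.append(targets[-1] * 3 // 2)

    full_adders = half_adders = 0
    for target in reversed(targets):
        for j in range(2 * k):                 # carries from column j-1 count toward column j
            while len(cols[j]) > target:
                if len(cols[j]) == target + 1:     # one dot too many: half adder
                    p, q = cols[j].pop(0), cols[j].pop(0)
                    cols[j].append(p ^ q)
                    cols[j + 1].append(p & q)
                    half_adders += 1
                else:                              # two or more too many: full adder
                    p, q, r = cols[j].pop(0), cols[j].pop(0), cols[j].pop(0)
                    cols[j].append(p ^ q ^ r)
                    cols[j + 1].append((p & q) | (p & r) | (q & r))
                    full_adders += 1

    row0 = sum(col[0] << j for j, col in enumerate(cols) if len(col) > 0)
    row1 = sum(col[1] << j for j, col in enumerate(cols) if len(col) > 1)
    return row0, row1, full_adders, half_adders
```

The two returned rows are what the final CPA would add; for the 4 × 4 case only the lowest column ends with a single dot, so this schedule leaves a comparatively wide final addition.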
19. Trees in Numeric Representation
- Hybrid Approach Often Used to Find Smallest-Width CPA
- MS Thesis Topic: Optimize Tree With Different Counter Types
20. Implementation Issues
- Logarithmic-Depth Tree Has Irregular Structure
- Design/Layout Difficult
- Various Length Signal Propagation Paths
- Hazards and Signal Skew
- Need Iterated Recursive Structures
- Automatic Synthesis and Layout
- Motivates Search for Alternative Reduction Tree Structures
21. Other Tree Architectures
- Can Compose from Larger Counters, e.g. (7:2)
- Use 0 Inputs for Some
- Or Prune the Tree for Some
- Use Slices: Example is (11:2), Next Slide
- Can be Laid Out to Occupy Narrow Vertical Slice and Replicated
- All Carries Produced in Level i Enter Level i + 1
- Balanced Delay Tree Results
- 3 Columns: 1, 3, 5 FAs
- Can Expand from 11 to 18 Inputs: Append Column of 7 FAs
22. (11:2) Tree Slice
23. Other Tree Blocks
- Converter Stage is Fast CPA
- Can Also Use SBD
- With SBD the Converter Stage is a Fast Subtractor
24. Array Multipliers
- Can Eliminate Top CSA With 0 Input
- Can Replace 0 With y to Compute ax + y
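A behavioral sketch of the array as a chain of CSA rows, assuming unsigned operands and k of at least 2; the names are illustrative. The top CSA receives y in place of the constant 0, so the same structure yields ax + y.

```python
def csa(x, y, z):
    """3-input carry-save adder on integers: x + y + z == s + c."""
    return x ^ y ^ z, ((x & y) | (x & z) | (y & z)) << 1


def array_multiply_add(a, x, y, k):
    """Compute a * x + y with k - 1 CSA rows followed by one carry-propagate addition."""
    # AND-gate rows: partial product i is a shifted left by i when x_i is 1
    pp = [(a << i) if (x >> i) & 1 else 0 for i in range(k)]
    s, c = csa(y, pp[0], pp[1])        # top CSA: the otherwise-zero input carries y
    for i in range(2, k):
        s, c = csa(s, c, pp[i])        # each later row absorbs one more partial product
    return s + c                       # final CPA
```

With y = 0 the top row only combines two partial products, which is why that CSA can be eliminated when the extra input is not needed.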
25. Array Multipliers
- Tree is One-Sided
- Longest Delay is 4 CSA Plus k-bit CPA
- Slower than Wallace/Dadda Tree
- Regular Structure
- short wires in horizontal, vertical, and diagonal positions
- simple, efficient layout
- easily pipelined (latches after each CSA row)
26. Methods for Reducing Array Size
27. Reducing Array Size (cont.)
28. 5 by 5 Array Multiplier (Unsigned)
29. Signed Array Multiplier
- Array with 2's Complement
- Alternative is Pezaris Array with Different Cell Types
- Need Array of AND Gates for Multiple Generation
- Critical Path is Main Diagonal then Ripple Through CPA
- Can Skip h Cells Along Main Diagonal
- lower right cell now has 4 inputs
- move to extra input in second cell in diagonal
- less regular layout now but faster
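The reason a signed array needs special handling can be seen from the weights of the dots. The sketch below is an arithmetic illustration, not a model of any specific cell design: it assumes k-bit 2's-complement operands, in which the sign bit has weight -2^(k-1), so every AND term that involves exactly one sign bit carries a negative weight. The function names are hypothetical.

```python
def signed_dot_terms(a, x, k):
    """List (position, value) for every AND term of two k-bit 2's-complement patterns.
    value is 0, +1, or -1; it is negative when exactly one sign bit is involved."""
    terms = []
    for i in range(k):
        for j in range(k):
            bit = ((a >> i) & 1) & ((x >> j) & 1)
            negative = (i == k - 1) != (j == k - 1)
            terms.append((i + j, -bit if negative else bit))
    return terms


def signed_product(a, x, k):
    """Sum the weighted dots; equals a * x for operands in the k-bit signed range."""
    mask = (1 << k) - 1
    return sum(value * (1 << pos) for pos, value in signed_dot_terms(a & mask, x & mask, k))
```

An array that accepts negatively weighted dots directly, or one that converts them to positive dots plus correction constants, would implement this same sum in hardware.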
305 by 5 Array Multiplier (signed)
315 by 5 Array Multiplier
- AND Gates Embedded inside FA Blocks
32Pipelined Partial Tree Multiplier
33Pipelined Array Multiplier