Title: Welcome to the ECE 449 Computer Design Lab
1. ECE 448 Lecture 14
Multipliers
2. Required reading
- S. Brown and Z. Vranesic, Fundamentals of Digital Logic with VHDL Design
  - Chapter 10.2.3, Shift-and-Add Multiplier
  - Chapter 10.2.5, Arithmetic Mean
  - Chapter 10.2.6, Sort Operation
3. Shift-and-Add Multiplier
4. An algorithm for multiplication
5. Expected behavior of the multiplier
6. Datapath for the multiplier
7. ASM chart for the multiplier
8. ASM chart for the multiplier control circuit
9. VHDL code of multiplier circuit (1)
    LIBRARY ieee ;
    USE ieee.std_logic_1164.all ;
    USE ieee.std_logic_unsigned.all ;
    USE work.components.all ;

    ENTITY multiply IS
        GENERIC ( N : INTEGER := 8 ; NN : INTEGER := 16 ) ;
        PORT ( Clock     : IN     STD_LOGIC ;
               Resetn    : IN     STD_LOGIC ;
               LA, LB, s : IN     STD_LOGIC ;
               DataA     : IN     STD_LOGIC_VECTOR(N-1 DOWNTO 0) ;
               DataB     : IN     STD_LOGIC_VECTOR(N-1 DOWNTO 0) ;
               -- The product is NN = 2N bits wide, matching RegP and Sum
               P         : BUFFER STD_LOGIC_VECTOR(NN-1 DOWNTO 0) ;
               Done      : OUT    STD_LOGIC ) ;
    END multiply ;
10. VHDL code of multiplier circuit (2)
    ARCHITECTURE Behavior OF multiply IS
        TYPE State_type IS ( S1, S2, S3 ) ;
        SIGNAL y : State_type ;
        SIGNAL Psel, z, EA, EB, EP, Zero : STD_LOGIC ;
        SIGNAL B, N_Zeros : STD_LOGIC_VECTOR(N-1 DOWNTO 0) ;
        SIGNAL A, Ain, DataP, Sum : STD_LOGIC_VECTOR(NN-1 DOWNTO 0) ;
    BEGIN
        -- State register: asynchronous active-low reset, rising-edge clock
        FSM_transitions: PROCESS ( Resetn, Clock )
        BEGIN
            IF Resetn = '0' THEN
                y <= S1 ;
            ELSIF (Clock'EVENT AND Clock = '1') THEN
                CASE y IS
                    WHEN S1 =>
                        IF s = '0' THEN y <= S1 ; ELSE y <= S2 ; END IF ;
                    WHEN S2 =>
                        IF z = '0' THEN y <= S2 ; ELSE y <= S3 ; END IF ;
                    WHEN S3 =>
                        IF s = '1' THEN y <= S3 ; ELSE y <= S1 ; END IF ;
                END CASE ;
            END IF ;
        END PROCESS ;
11. VHDL code of multiplier circuit (3)
        FSM_outputs: PROCESS ( y, s, B(0) )
        BEGIN
            -- Default values; each state overrides only what it needs
            EP <= '0' ; EA <= '0' ; EB <= '0' ; Done <= '0' ; Psel <= '0' ;
            CASE y IS
                WHEN S1 =>
                    EP <= '1' ;
                WHEN S2 =>
                    EA <= '1' ; EB <= '1' ; Psel <= '1' ;
                    IF B(0) = '1' THEN
                        EP <= '1' ;
                    ELSE
                        EP <= '0' ;
                    END IF ;
                WHEN S3 =>
                    Done <= '1' ;
            END CASE ;
        END PROCESS ;
12. Datapath for the multiplier
13. VHDL code of multiplier circuit (4)
        -- Define the datapath circuit
        Zero <= '0' ;
        N_Zeros <= (OTHERS => '0' ) ;
        Ain <= N_Zeros & DataA ;
        ShiftA: shiftlne GENERIC MAP ( N => NN )
            PORT MAP ( Ain, LA, EA, Zero, Clock, A ) ;
        ShiftB: shiftrne GENERIC MAP ( N => N )
            PORT MAP ( DataB, LB, EB, Zero, Clock, B ) ;
        z <= '1' WHEN B = N_Zeros ELSE '0' ;
        Sum <= A + P ;
        -- Define the 2n 2-to-1 multiplexers for DataP
        GenMUX: FOR i IN 0 TO NN-1 GENERATE
            Muxi: mux2to1 PORT MAP ( Zero, Sum(i), Psel, DataP(i) ) ;
        END GENERATE ;
        RegP: regne GENERIC MAP ( N => NN )
            PORT MAP ( DataP, Resetn, EP, Clock, P ) ;
    END Behavior ;
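The loop that the circuit performs in state S2 can be mirrored in software. The following is a minimal Python sketch, not part of the lecture's VHDL; the function name `shift_and_add_multiply` and its return convention are assumptions made for illustration. It follows the datapath above: A is shifted left, B is shifted right, and A is added into P only when B(0) = 1.

```python
def shift_and_add_multiply(data_a: int, data_b: int, n: int = 8) -> tuple:
    """Software model of the shift-and-add loop.

    Returns (product, shift_iterations). The names are illustrative
    and do not come from the VHDL source.
    """
    if not (0 <= data_a < 2 ** n and 0 <= data_b < 2 ** n):
        raise ValueError("operands must be unsigned n-bit values")

    mask = (1 << (2 * n)) - 1   # A and P are NN = 2n bits wide
    a = data_a                  # Ain <= N_Zeros & DataA
    b = data_b
    p = 0                       # S1 loads zero into P (Psel = '0', EP = '1')
    iterations = 0

    while b != 0:               # z = '1' (B = 0) ends the loop
        if b & 1:               # EP = '1' only when B(0) = '1'
            p = (p + a) & mask  # Sum <= A + P
        a = (a << 1) & mask     # ShiftA: shift left
        b >>= 1                 # ShiftB: shift right
        iterations += 1

    return p, iterations


product, iterations = shift_and_add_multiply(13, 11)
print(product, iterations)
```

Because the loop ends as soon as B becomes zero, the number of iterations in this model depends on the position of the most significant 1 in the multiplier rather than always being N.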
14. Array Multiplier
15. Notation
a   Multiplicand      $a_{k-1} a_{k-2} \ldots a_1 a_0$
x   Multiplier        $x_{k-1} x_{k-2} \ldots x_1 x_0$
p   Product (a × x)   $p_{2k-1} p_{2k-2} \ldots p_2 p_1 p_0$
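16. Unsigned Multiplication
                                          a4    a3    a2    a1    a0
×                                         x4    x3    x2    x1    x0
----------------------------------------------------------------------
a·x0·2^0                                  a4x0  a3x0  a2x0  a1x0  a0x0
a·x1·2^1                            a4x1  a3x1  a2x1  a1x1  a0x1
a·x2·2^2                      a4x2  a3x2  a2x2  a1x2  a0x2
a·x3·2^3                a4x3  a3x3  a2x3  a1x3  a0x3
a·x4·2^4          a4x4  a3x4  a2x4  a1x4  a0x4
----------------------------------------------------------------------
p           p9    p8    p7    p6    p5    p4    p3    p2    p1    p0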
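The partial-product array can be checked numerically. The sketch below is an illustration in Python rather than anything from the slides; the helper name `partial_products` and the example operands are assumptions. Each row is the multiplicand a multiplied by one multiplier bit x_i and weighted by 2^i, and the rows sum to the product.

```python
def partial_products(a: int, x: int, k: int = 5) -> list:
    """Return the k rows a * x_i * 2**i of the unsigned multiplication array."""
    rows = []
    for i in range(k):
        x_i = (x >> i) & 1           # i-th bit of the multiplier
        rows.append((a * x_i) << i)  # row is a or 0, shifted left by i
    return rows


a, x = 0b10110, 0b01101              # 22 and 13, both 5-bit operands
rows = partial_products(a, x)
product = sum(rows)
print(rows, product)
```

For 5-bit operands the sum always fits in the 10 product bits p9 ... p0, since (2^5 - 1)^2 < 2^10.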
17. 5 x 5 Array Multiplier
18. Array Multiplier - Basic Cell
[Figure: full adder (FA) cell with inputs x, y, cin and outputs s, cout]
19. Array Multiplier Modified Basic Cell
[Figure: modified cell built around a full adder (FA), with inputs am, xn, ci, s(i-1) and outputs c(i+1), s(i)]
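A bit-level model can show how such cells combine into a multiplier. The Python sketch below rests on two assumptions that the extracted slide text does not confirm: that the modified cell ANDs am with xn and adds the result to the incoming sum and carry bits in a full adder, and that each row of cells ripples its carry from right to left. The names `modified_cell` and `array_multiply` are illustrative, and the slide's figure may wire the carries differently.

```python
def modified_cell(a_m: int, x_n: int, s_in: int, c_in: int) -> tuple:
    """One cell: AND gate feeding a full adder. Returns (s_out, c_out)."""
    total = (a_m & x_n) + s_in + c_in
    return total & 1, total >> 1


def array_multiply(a: int, x: int, k: int = 5) -> int:
    """Multiply two unsigned k-bit numbers using a k x k grid of cells."""
    s = [0] * (2 * k)                  # running sum bits, least significant first
    for n in range(k):                 # one row of cells per multiplier bit
        x_n = (x >> n) & 1
        carry = 0
        for m in range(k):             # one cell per multiplicand bit
            a_m = (a >> m) & 1
            s[n + m], carry = modified_cell(a_m, x_n, s[n + m], carry)
        s[n + k] = carry               # carry out of the row
    return sum(bit << i for i, bit in enumerate(s))


print(array_multiply(22, 13))
```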
20. 5 x 5 Array Multiplier with modified cells
21. Pipelined 5 x 5 Multiplier
22. Array Multiplier Modified Basic Cell
[Figure: the modified cell (FA with inputs am, xn, ci, s(i-1) and outputs c(i+1), s(i)) with flip-flops added]
23. Timing parameters
parameter         definition                            units
delay             time from point → point               ns
clock period T    rising edge → rising edge of clock    ns
clock frequency   1 / clock period                      MHz
latency           time from input → output              ns
throughput        output bits / time unit               Mbits/s
24. Latency
[Figure: top-level entity with 8-bit input and 8-bit output; combinational logic blocks separated by registers clocked by clk at 100 MHz; timing diagram of clk, input(0), input(1), input(2) and output(0), output(1)]
- Latency is the time between input(n) and output(n)
  - i.e., the time it takes from first input to first output, second input to second output, etc.
- Latency is usually constant for a system (but not always)
- Also called input-to-output latency
- Count the number of rising edges of the clock!
  - In this example, 3 rising edges from input to output → latency is 3 cycles
- Latency is measured in clock cycles (then translated to seconds)
  - In this example, say the clock period is 10 ns; then latency is 30 ns
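The conversion from cycles to time is a single multiplication. As a small Python sketch of the arithmetic in this example (the function name `latency_ns` is an assumption for illustration):

```python
def latency_ns(rising_edges: int, clock_period_ns: float) -> float:
    """Latency in ns = clock cycles from input to output * clock period."""
    return rising_edges * clock_period_ns


clock_period_ns = 1000.0 / 100.0     # a 100 MHz clock has a 10 ns period
print(latency_ns(3, clock_period_ns))
```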
25. Throughput
[Figure: top-level entity with 8-bit input and 8-bit output; combinational logic blocks separated by registers clocked by clk; timing diagram of clk, input(0), input(1), input(2) and output(0), output(1), annotated "1 cycle between output samples"]
- Throughput = (bits per output sample) / (time between consecutive output samples)
- Bits per output sample
  - In this example, 8 bits per output sample
- Time between consecutive output samples = clock cycles between output(n) and output(n+1)
  - Can be measured in clock cycles, then translated to time
  - In this example, time between consecutive output samples = 1 clock cycle = 10 ns
- Throughput = (8 bits per output sample) / (10 ns) = 0.8 bits/ns = 800 Mbits/s
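The same calculation can be written as a short Python sketch; the function name `throughput_mbits_per_s` and its parameter names are assumptions for illustration, not terms from the slides.

```python
def throughput_mbits_per_s(bits_per_sample: int,
                           cycles_between_samples: int,
                           clock_period_ns: float) -> float:
    """Throughput = bits per output sample / time between output samples."""
    time_between_samples_ns = cycles_between_samples * clock_period_ns
    # 1 bit/ns equals 1000 Mbits/s
    return bits_per_sample * 1000.0 / time_between_samples_ns


print(throughput_mbits_per_s(8, 1, 10.0))
```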
26. Pipelining: Conceptual
[Figure: combinational logic between two registers clocked by clk; tLOGIC = 10 ns]
- Assuming tCLK2Q = tS = 0 ns, the critical path is 10 ns, and the maximum clock frequency is 100 MHz
- Latency = 2 cycles
27. Pipelining: Conceptual
[Figure: the combinational logic split in half by a register into Combinational Logic A and Combinational Logic B, with three registers clocked by clk; tLOGICA = 5 ns, tLOGICB = 5 ns]
- The purpose of pipelining is to reduce the critical path of the circuit by inserting an additional register (called a pipeline register)
  - This splits the combinational logic in half
  - Now the critical path delay is 5 ns, so the maximum clock frequency is 200 MHz
  - Double the clock frequency
- However, latency increases to 3 cycles (and area is increased due to the additional register)
- In general, pipelining increases throughput at the cost of increased latency and area/power
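The two configurations can be compared with a few lines of Python. This is a sketch under the slides' idealized assumption tCLK2Q = tS = 0 ns; the function names, and the rule of counting one clock cycle per register on the path (an input register plus one register after each logic stage), are assumptions chosen to match the cycle counts quoted above.

```python
def max_clock_mhz(stage_delays_ns: list,
                  t_clk2q_ns: float = 0.0,
                  t_setup_ns: float = 0.0) -> float:
    """Maximum clock frequency set by the slowest register-to-register stage."""
    critical_path_ns = max(stage_delays_ns) + t_clk2q_ns + t_setup_ns
    return 1000.0 / critical_path_ns   # period in ns -> frequency in MHz


def latency_cycles(stage_delays_ns: list) -> int:
    """Input register plus one register after each stage (assumed counting rule)."""
    return len(stage_delays_ns) + 1


unpipelined = [10.0]        # one 10 ns block of combinational logic
pipelined = [5.0, 5.0]      # the same logic split in half by a pipeline register

print(max_clock_mhz(unpipelined), latency_cycles(unpipelined))
print(max_clock_mhz(pipelined), latency_cycles(pipelined))
```

With nonzero tCLK2Q and tS, splitting the logic in half gives less than double the clock frequency, since the register overhead is paid in every stage.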