
ECE 366 -- Computer Architecture
Lecture Notes 11 -- Multiply, Shift, Divide

Shantanu Dutt, Univ. of Illinois at Chicago

Excerpted from Computer Architecture and Engineering, Lecture 6: VHDL, Multiply, Shift

- September 12, 1997
- Dave Patterson (http.cs.berkeley.edu/patterson)
- lecture slides: http://www-inst.eecs.berkeley.edu/cs152/

MULTIPLY (unsigned)

- Paper and pencil example (unsigned):

      Multiplicand          1000
      Multiplier          x 1001
                            1000
                           0000
                          0000
                         1000
      Product           01001000

- m bits x n bits = m+n bit product
- Binary makes it easy:
  - 0 => place 0 (0 x multiplicand)
  - 1 => place a copy (1 x multiplicand)
- 4 versions of multiply hardware and algorithm:
  - successive refinement

Unsigned Combinational Multiplier

- Stage i accumulates A * 2^i if B_i == 1
- Q: How much hardware for 32 bit multiplier? Critical path?

How does it work?

[Figure: array multiplier with 0 inputs, multiplier bits B0-B3, and product bits P0-P7]

- at each stage shift A left (x 2)
- use next bit of B to determine whether to add in shifted multiplicand
- accumulate 2n bit partial product at each stage
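The stage-by-stage accumulation described above can be sketched in Python. This is only an illustration of the arithmetic, not of the hardware: the function name `array_multiply` and the default width `n = 4` are assumptions made for the example.

```python
def array_multiply(a: int, b: int, n: int = 4) -> int:
    """Model an n-stage unsigned array multiplier (hypothetical helper)."""
    product = 0
    for i in range(n):                # one adder stage per bit of B
        if (b >> i) & 1:              # B_i == 1: add in the shifted multiplicand
            product += a << i         # A * 2^i
    return product                    # at most 2n bits wide


# Paper-and-pencil example from the notes: 1000 x 1001
print(format(array_multiply(0b1000, 0b1001), "08b"))  # 01001000
```

Each loop iteration stands in for one row of adders, so a 32-bit version would unroll into 32 such stages.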

Unsigned shift-add multiplier (version 1)

- 64-bit Multiplicand reg, 64-bit ALU, 64-bit Product reg, 32-bit Multiplier reg

[Figure: Version 1 datapath. Multiplicand register (64 bits, Shift Left), 64-bit ALU, Product register (64 bits, Write), Multiplier register (32 bits, Shift Right), Control]

Multiplier = datapath + control

Multiply Algorithm Version 1

Start

1. Test Multiplier0.
   - Multiplier0 = 1: 1a. Add multiplicand to product and place the result in Product register.
   - Multiplier0 = 0: skip to step 2.
2. Shift the Multiplicand register left 1 bit.
3. Shift the Multiplier register right 1 bit.

32nd repetition?
- No (< 32 repetitions): repeat from step 1.
- Yes (32 repetitions): Done.

Example trace (4-bit operands, 2 x 3):

    Product      Multiplier   Multiplicand
    0000 0000    0011         0000 0010
    0000 0010    0001         0000 0100
    0000 0110    0000         0000 1000
    0000 0110
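The Version 1 flowchart can be simulated with a short Python sketch. The function name `multiply_v1`, the `n` parameter, and the returned trace are assumptions for illustration; register widths are modeled with masks.

```python
def multiply_v1(multiplicand: int, multiplier: int, n: int = 32):
    """Shift-add multiply, Version 1 (hypothetical helper).

    Returns (product, trace), where trace holds the
    (Product, Multiplier, Multiplicand) registers after each repetition.
    """
    mask = (1 << (2 * n)) - 1         # Multiplicand and Product are 2n bits wide
    product = 0
    trace = []
    for _ in range(n):                # n repetitions
        if multiplier & 1:            # 1. test Multiplier0
            product = (product + multiplicand) & mask   # 1a. add
        multiplicand = (multiplicand << 1) & mask       # 2. shift left
        multiplier >>= 1                                # 3. shift right
        trace.append((product, multiplier, multiplicand))
    return product, trace


product, trace = multiply_v1(0b0010, 0b0011, n=4)
for p, q, m in trace:
    print(format(p, "08b"), format(q, "04b"), format(m, "08b"))
```

With `n=4` the first two rows of the trace match the 2 x 3 register values shown in the notes.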

Observations on Multiply Version 1

- 1 clock per cycle => about 100 clocks per multiply (32 repetitions x 3 steps)
- Ratio of multiply to add: 5:1 to 100:1
- 1/2 bits in multiplicand always 0 => 64-bit adder is wasted
- 0s are inserted at the right of the multiplicand as it is shifted left => least significant bits of product never change once formed
- Instead of shifting multiplicand to left, shift product to right?

MULTIPLY HARDWARE Version 2

- 32-bit Multiplicand reg, 32-bit ALU, 64-bit Product reg, 32-bit Multiplier reg

[Figure: Version 2 datapath. Multiplicand register (32 bits), 32-bit ALU, Product register (64 bits, Shift Right, Write), Multiplier register (32 bits, Shift Right), Control]

Multiply Algorithm Version 2

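Start

    Multiplier   Multiplicand   Product
    0011         0010           0000 0000

1. Test Multiplier0.
   - Multiplier0 = 1: 1a. Add multiplicand to the left half of the product and place the result in the left half of the Product register.
   - Multiplier0 = 0: skip to step 2.
2. Shift the Product register right 1 bit.
3. Shift the Multiplier register right 1 bit.

32nd repetition?
- No (< 32 repetitions): repeat from step 1.
- Yes (32 repetitions): Done.

Trace (initial values):

    Product      Multiplier   Multiplicand
    0000 0000    0011         0010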

What's going on?

[Figure: array multiplier with 0 inputs, multiplier bits B0-B3, and product bits P0-P7]

- Multiplicand stays still and product moves right

Break

- 5-minute Break / Do it yourself Multiply

      Multiplier   Multiplicand   Product
      0011         0010           0000 0000

Multiply Algorithm Version 2

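Start

1. Test Multiplier0.
   - Multiplier0 = 1: 1a. Add multiplicand to the left half of the product and place the result in the left half of the Product register.
   - Multiplier0 = 0: skip to step 2.
2. Shift the Product register right 1 bit.
3. Shift the Multiplier register right 1 bit.

32nd repetition?
- No (< 32 repetitions): repeat from step 1.
- Yes (32 repetitions): Done.

Trace (4-bit operands, 2 x 3):

    Product      Multiplier   Multiplicand
    0000 0000    0011         0010
    0010 0000
    0001 0000    0001         0010
    0011 0000    0001         0010
    0001 1000    0000         0010
    0000 1100    0000         0010
    0000 0110    0000         0010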
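A minimal Python sketch of Version 2, assuming the add goes into the left half of the Product register. The name `multiply_v2` and the `n` parameter are illustrative assumptions. Python integers are unbounded, so the carry out of the left-half add is kept and shifted back in, which real hardware would have to handle explicitly.

```python
def multiply_v2(multiplicand: int, multiplier: int, n: int = 32) -> int:
    """Shift-add multiply, Version 2 (hypothetical helper).

    The multiplicand stays still; the 2n-bit product shifts right.
    """
    product = 0
    for _ in range(n):
        if multiplier & 1:                 # 1. test Multiplier0
            product += multiplicand << n   # 1a. add into the left half of Product
        product >>= 1                      # 2. shift Product right
        multiplier >>= 1                   # 3. shift Multiplier right
    return product


print(format(multiply_v2(0b0010, 0b0011, n=4), "08b"))  # 00000110
```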

Observations on Multiply Version 2

- Product register wastes space that exactly matches size of multiplier => combine Multiplier register and Product register

MULTIPLY HARDWARE Version 3

- 32-bit Multiplicand reg, 32-bit ALU, 64-bit Product reg, (0-bit Multiplier reg)

[Figure: Version 3 datapath. Multiplicand register (32 bits), 32-bit ALU, Product register holding the Multiplier in its right half (64 bits, Shift Right, Write), Control]

Multiply Algorithm Version 3

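Start

    Multiplicand   Product
    0010           0000 0011

1. Test Product0.
   - Product0 = 1: 1a. Add multiplicand to the left half of the product and place the result in the left half of the Product register.
   - Product0 = 0: skip to step 2.
2. Shift the Product register right 1 bit.

32nd repetition?
- No (< 32 repetitions): repeat from step 1.
- Yes (32 repetitions): Done.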
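Version 3 can be sketched the same way, with the multiplier loaded into the right half of the Product register. The name `multiply_v3` and the `(hi, lo)` return value are assumptions chosen to mirror the MIPS Hi and Lo registers; as before, Python's unbounded integers stand in for the carry bit that hardware would need.

```python
def multiply_v3(multiplicand: int, multiplier: int, n: int = 32):
    """Shift-add multiply, Version 3 (hypothetical helper); returns (hi, lo)."""
    mask = (1 << n) - 1
    product = multiplier & mask            # multiplier starts in the right half
    for _ in range(n):
        if product & 1:                    # 1. test Product0
            product += multiplicand << n   # 1a. add into the left half
        product >>= 1                      # 2. shift Product right
    return product >> n, product & mask    # left half (Hi), right half (Lo)


hi, lo = multiply_v3(0b0010, 0b0011, n=4)
print(format(hi, "04b"), format(lo, "04b"))  # 0000 0110
```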

Observations on Multiply Version 3

- 2 steps per bit because Multiplier and Product combined
- MIPS registers Hi and Lo are left and right half of Product
- Gives us MIPS instruction MultU
- How can you make it faster?
- What about signed multiplication?
  - easiest solution is to make both positive and remember whether to complement product when done (leave out the sign bit, run for 31 steps)
  - apply definition of 2's complement
    - need to sign-extend partial products and subtract at the end
  - Booth's Algorithm is elegant way to multiply signed numbers using same hardware as before and save cycles
    - can handle multiple bits at a time

Motivation for Booth's Algorithm

- Example 2 x 6 = 0010 x 0110:

              0010
      x       0110
      +       0000    shift (0 in multiplier)
      +      0010     add (1 in multiplier)
      +     0010      add (1 in multiplier)
      +    0000       shift (0 in multiplier)
          00001100

- ALU with add or subtract gets same result in more than one way:

      6 = -2 + 8
      0110 = -00010 + 01000 = 11110 + 01000

- For example:

              0010
      x       0110
              0000    shift (0 in multiplier)
      -      0010     sub (first 1 in multiplier)
            0000      shift (mid string of 1s)
      +    0010       add (prior step had last 1)
          00001100

Booth's Algorithm

    Current Bit   Bit to the Right   Explanation           Example      Op
    1             0                  Begins run of 1s      0001111000   sub
    1             1                  Middle of run of 1s   0001111000   none
    0             1                  End of run of 1s      0001111000   add
    0             0                  Middle of run of 0s   0001111000   none

- Originally for Speed (when shift was faster than add)
- Replace a string of 1s in multiplier with an initial subtract when we first see a one and then later add for the bit after the last one
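The table above can be turned into a small Python sketch that lists the operation chosen at each bit position. The names `booth_ops` and `booth_value` are hypothetical, and the bit to the right of bit 0 is assumed to start at 0.

```python
def booth_ops(multiplier: int, n: int):
    """Return the Booth operation for bits 0..n-1 of an n-bit multiplier."""
    ops = []
    prev = 0                               # assumed bit to the right of bit 0
    for i in range(n):
        cur = (multiplier >> i) & 1
        if (cur, prev) == (1, 0):
            ops.append("sub")              # begins run of 1s
        elif (cur, prev) == (0, 1):
            ops.append("add")              # end of run of 1s
        else:
            ops.append("none")             # middle of a run of 1s or 0s
        prev = cur
    return ops


def booth_value(ops):
    """Recombine the operations: sub at bit i is -2^i, add at bit i is +2^i."""
    weight = {"sub": -1, "add": 1, "none": 0}
    return sum(weight[op] << i for i, op in enumerate(ops))


print(booth_ops(0b0110, 4))                # ['none', 'sub', 'none', 'add']
print(booth_value(booth_ops(0b0110, 4)))   # 6
```

For 0110 the operations amount to -2 + 8 = 6, the same rewrite used in the motivation slide.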

Booth's Example (2 x 7)

    Operation          Multiplicand   Product        next?
    0. initial value   0010           0000 0111 0    10 -> sub
    1a. P = P - m      1110         + 1110
                                      1110 0111 0    shift P (sign ext)
    1b.                0010           1111 0011 1    11 -> nop, shift
    2.                 0010           1111 1001 1    11 -> nop, shift
    3.                 0010           1111 1100 1    01 -> add
    4a.                0010         + 0010
                                      0001 1100 1    shift
    4b.                0010           0000 1110 0    done

Booth's Example (2 x -3)

    Operation          Multiplicand   Product        next?
    0. initial value   0010           0000 1101 0    10 -> sub
    1a. P = P - m      1110         + 1110
                                      1110 1101 0    shift P (sign ext)
    1b.                0010           1111 0110 1    01 -> add
                                    + 0010
    2a.                               0001 0110 1    shift P
    2b.                0010           0000 1011 0    10 -> sub
                                    + 1110
    3a.                0010           1110 1011 0    shift
    3b.                0010           1111 0101 1    11 -> nop
    4a.                               1111 0101 1    shift
    4b.                0010           1111 1010 1    done
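The two worked examples can be reproduced with a Python sketch that keeps the Product register plus the extra bit to its right in a single integer. The name `booth_multiply` and the register packing are assumptions for illustration. As written, the sketch does not handle the most negative multiplicand (-2^(n-1)), whose negation does not fit in n bits.

```python
def booth_multiply(m: int, q: int, n: int = 4) -> int:
    """Booth multiply of n-bit two's complement m and q (hypothetical helper)."""
    mask = (1 << n) - 1
    full = (1 << (2 * n + 1)) - 1          # 2n-bit Product plus 1 extra bit
    p = (q & mask) << 1                    # [left half | multiplier | extra bit = 0]
    for _ in range(n):
        pair = p & 0b11                    # (current bit, bit to the right)
        if pair == 0b10:                   # begins run of 1s -> sub
            p = (p - ((m & mask) << (n + 1))) & full
        elif pair == 0b01:                 # end of run of 1s -> add
            p = (p + ((m & mask) << (n + 1))) & full
        sign = p >> (2 * n)                # top bit of the register
        p = (p >> 1) | (sign << (2 * n))   # arithmetic shift right (sign ext)
    result = p >> 1                        # drop the extra bit
    if result >> (2 * n - 1):              # reinterpret as signed 2n-bit value
        result -= 1 << (2 * n)
    return result


print(booth_multiply(2, 7))    # 14
print(booth_multiply(2, -3))   # -6
```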

MIPS logical instructions

    Instruction            Example          Meaning            Comment
    and                    and $1,$2,$3     $1 = $2 & $3       3 reg. operands; Logical AND
    or                     or $1,$2,$3      $1 = $2 | $3       3 reg. operands; Logical OR
    xor                    xor $1,$2,$3     $1 = $2 ^ $3       3 reg. operands; Logical XOR
    nor                    nor $1,$2,$3     $1 = ~($2 | $3)    3 reg. operands; Logical NOR
    and immediate          andi $1,$2,10    $1 = $2 & 10       Logical AND reg, constant
    or immediate           ori $1,$2,10     $1 = $2 | 10       Logical OR reg, constant
    xor immediate          xori $1,$2,10    $1 = $2 ^ 10       Logical XOR reg, constant
    shift left logical     sll $1,$2,10     $1 = $2 << 10      Shift left by constant
    shift right logical    srl $1,$2,10     $1 = $2 >> 10      Shift right by constant
    shift right arithm.    sra $1,$2,10     $1 = $2 >> 10      Shift right (sign extend)
    shift left logical     sllv $1,$2,$3    $1 = $2 << $3      Shift left by variable
    shift right logical    srlv $1,$2,$3    $1 = $2 >> $3      Shift right by variable
    shift right arithm.    srav $1,$2,$3    $1 = $2 >> $3      Shift right arith. by variable
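The difference between the logical and arithmetic right shifts is easiest to see on a value with the sign bit set. The sketch below models 32-bit registers with Python integers; the helper names mirror the MIPS mnemonics but are otherwise hypothetical, and shift amounts are assumed to be in the range 0 to 31.

```python
MASK32 = 0xFFFFFFFF


def sll(x: int, sa: int) -> int:
    """Shift left logical: zeros enter on the right."""
    return (x << sa) & MASK32


def srl(x: int, sa: int) -> int:
    """Shift right logical: zeros enter on the left."""
    return (x & MASK32) >> sa


def sra(x: int, sa: int) -> int:
    """Shift right arithmetic: copies of the sign bit enter on the left."""
    x &= MASK32
    if x & 0x80000000:
        x -= 1 << 32                  # reinterpret the bit pattern as negative
    return (x >> sa) & MASK32         # Python's >> on negative ints sign-extends


def nor(a: int, b: int) -> int:
    """Logical NOR, truncated to 32 bits."""
    return ~(a | b) & MASK32


print(hex(srl(0x80000000, 4)))        # 0x8000000
print(hex(sra(0x80000000, 4)))        # 0xf8000000
```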

Shifters

Two kinds:
- logical -- value shifted in is always "0"
- arithmetic -- on right shifts, sign extend

[Figure: single-bit logical and arithmetic shifts, showing msb, lsb, and the "0" shifted in]

Note: these are single bit shifts. A given instruction might request 0 to 32 bits to be shifted!

Combinational Shifter from MUXes

[Figure: basic building block, a mux with inputs A and B, select sel, and output D; 8-bit right shifter built from these blocks]

- What comes in the MSBs?
- How many levels for 32-bit shifter?
- What if we use 4-1 Muxes?

General Shift Right Scheme using 16 bit example

Stage select signals (shift amounts): S0 (0, 1), S1 (0, 2), S2 (0, 4), S3 (0, 8)

If added Right-to-left connections, could support Rotate (not in MIPS but found in other ISAs)
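A staged shift-right scheme of this kind can be sketched in Python, with one mux level per bit of the shift amount. The names `mux` and `shift_right_staged` are hypothetical, and the sketch assumes a logical shift (zeros enter at the msb) with a width that is a power of two.

```python
def mux(sel: int, a: int, b: int) -> int:
    """2-to-1 mux: pass a when sel is 0, b when sel is 1."""
    return b if sel else a


def shift_right_staged(value: int, amount: int, width: int = 16) -> int:
    """Logical right shift built from stages that shift by 1, 2, 4, 8, ..."""
    value &= (1 << width) - 1
    stage_shift = 1
    bit = 0
    while stage_shift < width:
        sel = (amount >> bit) & 1                     # S0, S1, S2, S3, ...
        value = mux(sel, value, value >> stage_shift) # shift by 0 or stage_shift
        stage_shift *= 2
        bit += 1
    return value


print(format(shift_right_staged(0b1010000000000000, 5), "016b"))  # 0000010100000000
```

A 16-bit shifter needs four stages this way, since each stage handles one bit of the shift amount.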

Funnel Shifter

Instead: extract 32 bits of 64.

[Figure: inputs Y and X feeding a Shift Right block]

- Shift A by i bits (sa = shift right amount)
- Logical: Y = 0, X = A, sa = i
- Arithmetic? Y = _, X = _, sa = _
- Rotate? Y = _, X = _, sa = _
- Left shifts? Y = _, X = _, sa = _
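A funnel shifter can be sketched in Python as "concatenate Y and X, shift right by sa, keep the low 32 bits". The function names are hypothetical. The logical setting is the one given in the list above; the rotate setting shown (Y = A, X = A) is one possible answer and is an assumption of this sketch, not taken from the notes.

```python
def funnel_shift(y: int, x: int, sa: int, n: int = 32) -> int:
    """Extract n bits from the 2n-bit value Y:X after shifting right by sa."""
    mask = (1 << n) - 1
    combined = ((y & mask) << n) | (x & mask)   # Y is the upper half, X the lower
    return (combined >> sa) & mask


def srl_funnel(a: int, i: int, n: int = 32) -> int:
    """Logical right shift: Y = 0, X = A, sa = i."""
    return funnel_shift(0, a, i, n)


def rotate_right_funnel(a: int, i: int, n: int = 32) -> int:
    """Rotate right (assumed setting): Y = A, X = A, sa = i."""
    return funnel_shift(a, a, i, n)


print(hex(srl_funnel(0x80000001, 1)))           # 0x40000000
print(hex(rotate_right_funnel(0x80000001, 1)))  # 0xc0000000
```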

[Figure: Y and X feed the Shift Right block, which produces R]

Barrel Shifter

Technology-dependent solutions: transistor per switch

[Figure: barrel shifter switch matrix with control lines SR0-SR3, inputs A0-A6, and outputs D0-D3]

Divide: Paper & Pencil

                      1001    Quotient
    Divisor 1000 | 1001010    Dividend
                  -1000
                      10
                      101
                      1010
                     -1000
                        10    Remainder (or Modulo result)

- See how big a number can be subtracted, creating quotient bit on each step
  - Binary => 1 x divisor or 0 x divisor
- Dividend = Quotient x Divisor + Remainder => |Dividend| = |Quotient| + |Divisor|
- 3 versions of divide, successive refinement
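The paper-and-pencil procedure can be sketched in Python as bit-at-a-time long division for unsigned operands. The function name `long_divide` is hypothetical, and this mirrors the hand method rather than any of the hardware versions.

```python
def long_divide(dividend: int, divisor: int):
    """Unsigned binary long division; returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("divisor must be nonzero")
    quotient = 0
    remainder = 0
    for i in range(dividend.bit_length() - 1, -1, -1):
        # Bring down the next dividend bit.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        quotient <<= 1
        if remainder >= divisor:      # 1 x divisor can be subtracted
            remainder -= divisor
            quotient |= 1             # quotient bit is 1, otherwise 0
    return quotient, remainder


q, r = long_divide(0b1001010, 0b1000)
print(format(q, "b"), format(r, "b"))  # 1001 10
```

The result satisfies Dividend = Quotient x Divisor + Remainder: 74 = 9 x 8 + 2.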

DIVIDE HARDWARE Version 1

- 64-bit Divisor reg, 64-bit ALU, 64-bit Remainder reg, 32-bit Quotient reg

[Figure: Divide Version 1 datapath. Divisor register (64 bits, Shift Right), 64-bit ALU, Remainder register (64 bits, Write), Quotient register (32 bits, Shift Left), Control]

Divide Algorithm Version 1

- Takes n+1 steps for n-bit Quotient and Rem.

      Remainder    Quotient   Divisor
      0000 0111    0000       0010 0000

Remainder < 0

Test Remainder

Remainder