Chapter 4: Arithmetic for Computers (Part 1)


Transcript and Presenter's Notes

Title: Chapter 4: Arithmetic for Computers (Part 1)


1
Chapter 4: Arithmetic for Computers (Part 1)
  • CS 447
  • Jason Bakos

2
Notes on Project 1
  • There are two different ways the following two
    words can be stored in computer memory
  • word1: .byte 0,1,2,3
  • word2: .half 0,1
  • One way is big-endian, where the word is stored
    in memory in its original order
  • word1  00 01 02 03
  • word2  0000 0001
  • Another way is little-endian, where the word is
    stored in memory in reverse order
  • word1  03 02 01 00
  • word2  0001 0000
  • Of course, this affects the way in which the lw
    instruction works (see the C sketch below)
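As a quick illustration (not part of the original slides), here is a minimal C sketch showing the same four bytes read back as a word; the printed value depends on the endianness of the machine it runs on:

    #include <stdio.h>
    #include <string.h>
    #include <stdint.h>

    int main(void) {
        uint8_t bytes[4] = {0x00, 0x01, 0x02, 0x03};  /* like word1: .byte 0,1,2,3 */
        uint32_t word;
        memcpy(&word, bytes, sizeof word);            /* reinterpret the 4 bytes   */
        /* Prints 0x00010203 on a big-endian machine and 0x03020100 on a
           little-endian machine (e.g. Intel), which is why lw appears to
           load the bytes "in reverse" */
        printf("0x%08x\n", (unsigned)word);
        return 0;
    }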
3
Notes on Project 1
  • MIPS uses the endianness of the architecture
    underneath it
  • Intel uses little-endian, so we need to deal with
    that
  • This affects assignment 1 because the input data
    is stored as a series of bytes
  • If you use lw on your data set, the values will
    be loaded into your dest. register in reverse
    order
  • Hint: Try the lb/sb instructions
  • These instructions will load/store a byte from an
    unaligned address and perform the translation for
    you

4
Notes on Project 1
  • Hint: Use SPIM's breakpoint and single-step
    features to help debug your program
  • Also, make sure you use the registers and
    memory/stack displays
  • Hint: You may want to temporarily store your
    input set into a word array for sorting
  • Make sure you check Appendix A for additional
    useful instructions that I didn't cover in class
  • Make sure you comment your code!

5
Goals of Chapter 4
  • Data representation
  • Hardware mechanisms for performing arithmetic on
    data
  • Hardware implications on the instruction set
    design

6
Review of Binary Representation
  • Binary/Hex -> Decimal conversion
  • Decimal -> Binary/Hex conversion
  • Least/Most significant bits
  • Highest representable number/maximum number of
    unique representable symbols
  • Two's complement representation
  • One's complement
  • Finding signed number ranges (-2^(n-1) to 2^(n-1) - 1)
  • Doing arithmetic with two's complement
  • Sign extending with load half/byte (see the C
    sketch after this list)
  • Unsigned loads
  • Signed/unsigned comparison
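As a small aside (not from the slides), this C sketch mirrors the lb vs. lbu distinction: loading the same byte with and without sign extension:

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        int8_t  sb = (int8_t)0xF0;   /* like lb:  byte is sign-extended  */
        uint8_t ub = 0xF0;           /* like lbu: byte is zero-extended  */
        int32_t s = sb;              /* becomes 0xFFFFFFF0 = -16         */
        int32_t u = ub;              /* becomes 0x000000F0 = 240         */
        printf("%d %d\n", s, u);     /* prints: -16 240                  */
        return 0;
    }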

7
Binary Addition/Subtraction
  • Binary subtraction works exactly like addition,
    except the second operand is converted to two's
    complement
  • Overflow in signed arithmetic occurs under the
    following conditions (a C sketch follows the
    table)

Operation Operand A Operand B Result
A+B Positive Positive Negative
A+B Negative Negative Positive
A-B Positive Negative Negative
A-B Negative Positive Positive
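A small C sketch (an illustration, not from the slides) of the rule in the table: signed overflow is flagged when the operands' signs permit it and the result's sign comes out wrong:

    #include <stdint.h>

    /* Overflow for A + B: operands have the same sign, result has a different one */
    int add_overflows(int32_t a, int32_t b) {
        int32_t r = (int32_t)((uint32_t)a + (uint32_t)b);   /* wraparound add */
        return ((a < 0) == (b < 0)) && ((r < 0) != (a < 0));
    }

    /* Overflow for A - B: operands have different signs, result differs from A's */
    int sub_overflows(int32_t a, int32_t b) {
        int32_t r = (int32_t)((uint32_t)a - (uint32_t)b);   /* wraparound sub */
        return ((a < 0) != (b < 0)) && ((r < 0) != (a < 0));
    }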
8
What Happens When Overflow Occurs?
  • MIPS detects overflow with an exception/interrupt
  • When an interrupt occurs, a branch occurs to code
    in the kernel at address 0x80000080 where special
    registers (BadVAddr, Status, Cause, and EPC) are
    used to handle the interrupt
  • SPIM has a simple interrupt handler built-in that
    deals with interrupts
  • We may come back to interrupts later

9
Review of Shift and Logical Operations
  • MIPS has operations for SLL, SRL, and SRA
  • We covered this in the last chapter
  • MIPS implements bit-wise AND, OR, and XOR logical
    operations
  • These operations perform a bit-by-bit parallel
    logical operation on two registers
  • In C, use << and >> for shifts, and &, |, ^, and
    ~ for bitwise AND, OR, XOR, and NOT, respectively
    (see the example below)
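A tiny C example (illustrative, not from the slides) of those operators:

    #include <stdio.h>

    int main(void) {
        int a = 0x0F, b = 0x3C;
        printf("%x\n", a << 2);   /* shift left   -> 3c */
        printf("%x\n", b >> 2);   /* shift right  -> f  */
        printf("%x\n", a & b);    /* bitwise AND  -> c  */
        printf("%x\n", a | b);    /* bitwise OR   -> 3f */
        printf("%x\n", a ^ b);    /* bitwise XOR  -> 33 */
        printf("%x\n", ~a);       /* bitwise NOT  -> fffffff0 */
        return 0;
    }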

10
Review of Logic Operations
  • The three main parts of a CPU
  • ALU (Arithmetic and Logic Unit)
  • Performs all logical, arithmetic, and shift
    operations
  • CU (Control Unit)
  • Controls the CPU; performs load/store, branch,
    and instruction fetch
  • Registers
  • Physical storage locations for data

11
Review of Logic Operations
  • In this chapter, our goal is to learn how the ALU
    is implemented
  • The ALU is entirely constructed using boolean
    functions as hardware building blocks
  • The 3 basic digital logic building blocks can be
    used to construct any digital logic system: AND,
    OR, and NOT
  • These functions can be directly implemented using
    electric circuits (wires and transistors)

12
Review of Logic Operations
  • These combinational logic devices can be
    assembled to create a much more complex digital
    logic system

A B A AND B
0 0 0
0 1 0
1 0 0
1 1 1
A B A OR B
0 0 0
0 1 1
1 0 1
1 1 1
A not A
0 1
1 0
13
Review of Logic Operations
  • We need another device to build an ALU
  • This is called a multiplexor; it implements an
    if-then-else in hardware

A B D C (out)
0 0 0 0 (a)
0 0 1 0 (b)
0 1 0 0 (a)
0 1 1 1 (b)
1 0 0 1 (a)
1 0 1 0 (b)
1 1 0 1 (a)
1 1 1 1 (b)
14
A 1-bit ALU
  • Perform logic operations in parallel and mux the
    output
  • Next, we want to include addition, so let's build
    a single-bit adder
  • Called a full adder

15
Full Adder
  • From the following table, we can construct the
    circuit for a full adder and link multiple full
    adders together to form a multi-bit adder
  • We can also add this input to our ALU
  • How do we give subtraction ability to our adder?
  • How do we detect overflow and zero results?

A B CarryIn  CarryOut Sum  Comments
0 0 0        0 0           0 + 0 + 0 = 00
0 0 1        0 1           0 + 0 + 1 = 01
0 1 0        0 1           0 + 1 + 0 = 01
0 1 1        1 0           0 + 1 + 1 = 10
1 0 0        0 1           1 + 0 + 0 = 01
1 0 1        1 0           1 + 0 + 1 = 10
1 1 0        1 0           1 + 1 + 0 = 10
1 1 1        1 1           1 + 1 + 1 = 11
16
Chapter 4: Arithmetic for Computers (Part 2)
  • CS 447
  • Jason Bakos

17
Logic/Arithmetic
  • From the truth table for the mux, we can use
    sum-of-products to derive the logic equation
  • With sum-of-products, for each row where an
    output is 1, we AND together all the inputs
    (inverting the inputs that are 0), then OR
    together all the row products
  • To make it simpler, let's add don't cares to
    the table

18
Logic/Arithmetic
A B D C (out)
0 X 0 0 (a)
X 0 1 0 (b)
1 X 0 1 (a)
X 1 1 1 (b)
  • This gives us the following equation
  • C = (A and (not D)) or (B and D)
    (see the C sketch after this list)
  • We don't need the don't-care inputs in our
    product terms
  • This is one way to simplify our logic equation
  • Other ways include propositional calculus,
    Karnaugh Maps, and the Quine-McCluskey algorithm
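To make the derived equation concrete, here is a small C sketch (illustrative) that applies C = (A and (not D)) or (B and D) to every bit of two words:

    #include <stdint.h>

    /* 2-to-1 mux: returns a when d == 0 and b when d == 1.
       The select bit is expanded to a mask so the equation works bitwise. */
    uint32_t mux2(uint32_t a, uint32_t b, int d) {
        uint32_t mask = d ? 0xFFFFFFFFu : 0x00000000u;
        return (a & ~mask) | (b & mask);
    }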

19
Logic/Arithmetic
  • Here is a (crude) digital logic design for the
    2-to-1 mux
  • Note that multiple muxes can be assembled in
    stages to implement multiple-input muxes

20
Logic/Arithmetic
  • For the adder, let's minimize the logic using a
    Karnaugh Map
  • For CarryOut, we need 2^3 entries
  • We can minimize this to
  • CarryOut = (A * B) + (CarryIn * B) + (CarryIn * A)

                 AB
    CarryIn   00  01  11  10
       0       0   0   1   0
       1       0   1   1   1
21
Logic/Arithmetic
  • There's no way to minimize this equation, so we
    need the full sum of products
  • Sum = ((not A) * (not B) * CarryIn) + (A * B * CarryIn)
        + ((not A) * B * (not CarryIn)) + (A * (not B) * (not CarryIn))
    (see the C sketch after the map)

                 AB
    CarryIn   00  01  11  10
       0       0   1   0   1
       1       1   0   1   0
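Written directly from the two equations (CarryOut from the previous slide and Sum above), a one-bit full adder looks like this in C (a sketch; inputs and outputs are 0 or 1):

    /* CarryOut = A*B + A*CarryIn + B*CarryIn; Sum = the full sum-of-products */
    void full_adder(int a, int b, int cin, int *sum, int *cout) {
        *cout = (a & b) | (a & cin) | (b & cin);
        *sum  = (!a & !b &  cin) | (a &  b &  cin) |
                (!a &  b & !cin) | (a & !b & !cin);
    }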
22
Logic/Arithmetic
  • In order to implement subtraction, we can invert
    the B input to the adder and set CarryIn to be 1
  • This can be implemented with a mux: select B or
    not B (call this input Binvert)
  • Now we can build a 1-bit ALU using an AND, OR,
    addition, and subtraction operation
  • We can perform the AND, OR, and ADD in parallel
    and switch the results with a 4-input mux
    (Operation will be our D-input)
  • To make the adder a subtractor, we'll need to
    set Binvert and CarryIn to 1

23
Lecture 4: Arithmetic for Computers (Part 3)
  • CS 447
  • Jason Bakos

24
Chapter 4 Review
  • So far, we've covered the following topics for
    this chapter
  • Binary representation of signed integers
  • 16 to 32 bit signed conversion
  • Binary addition/subtraction
  • Overflow detection/overflow exception handling
  • Shift and logical operations
  • Parts of the CPU
  • AND, OR, XOR, and inverter gates
  • Multiplexor (mux) and full adder
  • Sum-of-products logic equations (truth tables)
  • Logic minimization techniques
  • Don't cares and Karnaugh Maps

25
1-bit ALU Design
  • A 1-bit ALU can be constructed
  • Components
  • AND, OR, and adder
  • 4-to-1 mux
  • Binverter (inverter and 2-to-1 mux)
  • Interface
  • Inputs: A, B, Binvert, Operation (2 bits),
    CarryIn, and Less
  • Outputs: CarryOut and Result
  • Digital functions are performed in parallel and
    the outputs are routed into a mux
  • The mux will also accept a Less input which we'll
    accept from outside the 1-bit ALU
  • The select lines of the mux make up the
    Operation input to the ALU

26
32-bit ALU
  • In order to create a multi-bit ALU, array 32
    1-bit ALUs
  • Connect the CarryOut of each bit to the CarryIn
    of the next bit
  • A and B of each 1-bit ALU will be connected to
    each successive bit of the 32-bit A and B
  • The Result outputs of each 1-bit ALU will form
    the 32-bit result
  • We need to add an SLT unit and connect the output
    to the least significant 1-bit ALU's Less input
  • Hardwire the other Less inputs to 0
  • We need to add an Overflow unit
  • We need to add a Zero detection unit

27
SLT Unit
  • To compute SLT, we need to make sure that when
    the 1-bit ALU's Operation is set to 11, a
    subtract operation is also being computed
  • With this happening, the SLT unit can compute
    Less based on the MSB (sign) of A, B, and Result
    (see the sketch after the table)

Asign Bsign Rsign Less
0 0 0 0
0 0 1 1
0 1 X 0
1 0 X 1
1 1 0 0
1 1 1 1
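The table can be read as a two-case rule; here is a small C sketch (illustrative, assuming the subtract A - B has already produced Rsign):

    /* Less = 1 when A < B, decided from the sign bits after computing A - B */
    int slt_from_signs(int a_sign, int b_sign, int r_sign) {
        if (a_sign != b_sign)     /* signs differ: the negative operand is smaller */
            return a_sign;        /* A negative -> Less = 1                        */
        return r_sign;            /* signs agree: no overflow, result sign decides */
    }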
28
Overflow Unit
  • When doing signed arithmetic, we need to follow
    this table, as we covered previously
  • How do we implement this in hardware?

Operation Operand A Operand B Result
A+B Positive Positive Negative
A+B Negative Negative Positive
A-B Positive Negative Negative
A-B Negative Positive Positive
29
Overflow Unit
  • We need a truth table
  • Since we'll be computing the logic equation with
    SOP, we only need the rows where the output is 1

Operation A(31) B(31) R(31) Overflow
010 (add) 0 0 1 1
010 (add) 1 1 0 1
110 (sub) 0 1 1 1
110 (sub) 1 0 0 1
30
Zero Detection Unit
  • OR together all the 1-bit ALU Result outputs and
    invert; this gives the Zero output of the ALU

31
32-bit ALU Operation
  • We need a 3-bit ALU Operation input into our
    32-bit ALU
  • The two least significant bits can be routed into
    all the 1-bit ALUs internally
  • The most significant bit can be routed into the
    least significant 1-bit ALU's CarryIn, and to
    Binvert of all the 1-bit ALUs

32
32-bit ALU Operation
  • Here's the final ALU Operation table

ALU Operation Function
000 and
001 or
010 add
110 subtract
111 set on less than
33
32-bit ALU
  • In the end, our ALU will have the following
    interface
  • Inputs
  • A and B (32 bits each)
  • ALU Operation (3 bits)
  • Outputs
  • CarryOut (1 bit)
  • Zero (1 bit)
  • Result (32 bits)
  • Overflow (1 bit)
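As a software analogue (a sketch, not the hardware itself), the operation table and interface above can be mimicked in C like this:

    #include <stdint.h>

    /* op uses the 3-bit encodings from the table: 000 and, 001 or,
       010 add, 110 subtract, 111 set on less than */
    uint32_t alu(uint32_t a, uint32_t b, int op, int *zero, int *overflow) {
        uint32_t result = 0;
        *overflow = 0;
        switch (op) {
        case 0x0: result = a & b; break;
        case 0x1: result = a | b; break;
        case 0x2: result = a + b;                                /* add      */
                  *overflow = (int)((~(a ^ b) & (a ^ result)) >> 31); break;
        case 0x6: result = a - b;                                /* subtract */
                  *overflow = (int)(((a ^ b) & (a ^ result)) >> 31); break;
        case 0x7: result = (int32_t)a < (int32_t)b; break;       /* slt      */
        }
        *zero = (result == 0);                                   /* Zero detection */
        return result;
    }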

34
Carry Lookahead
  • The adder architecture we previously looked at
    requires 2n gate delays to compute its result
    (worst case)
  • The longest path that a digital signal must
    propagate through is called the critical path
  • This is WAAAYYYY too slow!
  • There are other ways to build an adder that
    require lg n delay
  • Obviously, using SOP, we can build a circuit that
    will compute ANY function in 2 gate delays (2
    levels of logic)
  • Obviously, in the case of a 64-input system, the
    resulting design will be too big and too complex

35
Carry Lookahead
  • For example, we can easily see that the CarryIn
    for bit 1 is computed as
  • c1 = (a0 * b0) + (a0 * c0) + (b0 * c0)
  • c2 = (a1 * b1) + (a1 * c1) + (b1 * c1)
  • Hardware executes in parallel, so using the
    following fast CarryIn computation, we can
    perform an add with 3 gate delays
  • c2 = (a1 * b1) + (a1 * a0 * b0) + (a1 * a0 * c0) + (a1 * b0 * c0)
       + (b1 * a0 * b0) + (b1 * a0 * c0) + (b1 * b0 * c0)
  • I used the logical distributive law to compute
    this
  • As you can see, the CarryIn logic gets bigger and
    bigger for consecutive bits

36
Carry Lookahead
  • Carry Lookahead adders are faster than
    ripple-carry adders
  • Recall
  • c(i+1) = (ai * bi) + (ai * ci) + (bi * ci)
  • ci can be factored out
  • c(i+1) = (ai * bi) + (ai + bi) * ci
  • So
  • c2 = (a1 * b1) + (a1 + b1) * ((a0 * b0) + (a0 + b0) * c0)

37
Carry Lookahead
  • Note the repeated appearance of (ai * bi) and
    (ai + bi)
  • They are called generate (gi) and propagate (pi)
  • gi = ai * bi,  pi = ai + bi
  • c(i+1) = gi + (pi * ci)
  • This means if gi = 1, a CarryOut is generated
  • If pi = 1, a CarryOut is propagated from CarryIn

38
Carry Lookahead
  • c1 = g0 + (p0 * c0)
  • c2 = g1 + (p1 * g0) + (p1 * p0 * c0)
  • c3 = g2 + (p2 * g1) + (p2 * p1 * g0) + (p2 * p1 * p0 * c0)
  • c4 = g3 + (p3 * g2) + (p3 * p2 * g1) + (p3 * p2 * p1 * g0)
       + (p3 * p2 * p1 * p0 * c0)
  • This system will give us an adder with 5 gate
    delays, but it is still too complex (see the C
    sketch below)
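Here is a C sketch (illustrative) that computes all four carries of a 4-bit carry lookahead block directly from c0 using the gi/pi equations above:

    /* a[i], b[i], c0 are single bits (0 or 1); c[1..4] are the lookahead carries */
    void carry_lookahead4(const int a[4], const int b[4], int c0, int c[5]) {
        int g[4], p[4];
        for (int i = 0; i < 4; i++) {
            g[i] = a[i] & b[i];   /* generate  */
            p[i] = a[i] | b[i];   /* propagate */
        }
        c[0] = c0;
        c[1] = g[0] | (p[0] & c0);
        c[2] = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0);
        c[3] = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
                    | (p[2] & p[1] & p[0] & c0);
        c[4] = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
                    | (p[3] & p[2] & p[1] & g[0])
                    | (p[3] & p[2] & p[1] & p[0] & c0);
    }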

39
Carry Lookahead
  • To solve this, we'll build our adder using 4-bit
    adders with carry lookahead, and connect them
    using super-propagate and super-generate logic
  • The super-propagate is only true if all the bits
    propagate a carry
  • P0 = p0 * p1 * p2 * p3
  • P1 = p4 * p5 * p6 * p7
  • P2 = p8 * p9 * p10 * p11
  • P3 = p12 * p13 * p14 * p15

40
Carry Lookahead
  • The super-generate follows a similar equation
  • G0 = g3 + (p3 * g2) + (p3 * p2 * g1) + (p3 * p2 * p1 * g0)
  • G1 = g7 + (p7 * g6) + (p7 * p6 * g5) + (p7 * p6 * p5 * g4)
  • G2 = g11 + (p11 * g10) + (p11 * p10 * g9) + (p11 * p10 * p9 * g8)
  • G3 = g15 + (p15 * g14) + (p15 * p14 * g13) + (p15 * p14 * p13 * g12)
  • The super-generate and super-propagate logic for
    the four 4-bit carry lookahead adders is contained
    in a Carry Lookahead Unit
  • This yields a worst-case delay of 7 gate delays
  • Reason?

41
Carry Lookahead
  • We've covered all ALU functions except for the
    shifter
  • We'll talk about the shifter later

42
Lecture 4: Arithmetic for Computers (Part 4)
  • CS 447
  • Jason Bakos

43
Binary Multiplication
  • In multiplication, the first operand is called
    the multiplicand, and the second is called the
    multiplier
  • The result is called the product
  • Not counting the sign bits, if we multiply an
    n-bit multiplicand with an m-bit multiplier, we'll
    get an (n+m)-bit product

44
Binary Multiplication
  • Binary multiplication works exactly like decimal
    multiplication
  • In fact, multiply 100101 by 111001 and pretend
    you're using decimal numbers

45
First Hardware Design for Multiplier
Note that the multiplier is not routed into the
ALU
46
Second Hardware Design for Multiplier
  • Architects realized that, at the least, half of
    the bits in the multiplicand register are always 0
  • Reduce ALU to 32 bits, shift the product right
    instead of shifting the multiplicand left
  • In this case, the product is only 32 bits

47
Second Hardware Design for Multiplier
48
Final Hardware Design for Multiplier
  • Let's combine the product register with the
    multiplier register
  • Put the multiplier in the right half of the
    product register and initialize the left half
    with zeros; when we're done, the product will be
    in the right half (see the C sketch below)
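A C sketch of this final design (unsigned version, purely illustrative): the multiplier starts in the right half of the combined register, the multiplicand is added into the left half when the low bit is 1, and the whole register shifts right each step:

    #include <stdint.h>

    uint64_t multiply(uint32_t multiplicand, uint32_t multiplier) {
        uint32_t hi = 0, lo = multiplier;         /* {hi, lo} is the 64-bit register */
        for (int i = 0; i < 32; i++) {
            uint64_t sum = hi;                    /* 33-bit add into the left half   */
            if (lo & 1)                           /* test low bit of the multiplier  */
                sum += multiplicand;
            lo = (lo >> 1) | ((uint32_t)sum << 31);   /* shift {sum, lo} right by 1  */
            hi = (uint32_t)(sum >> 1);
        }
        return ((uint64_t)hi << 32) | lo;         /* 64-bit product                  */
    }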

49
Final Hardware Design for Multiplier
50
Final Hardware Design for Multiplier
  • For the first two designs, the multiplicand and
    the multiplier must be converted to positive
    values
  • The signs would need to be remembered so the
    product can be converted to whatever sign it
    needs to be
  • The third design will deal with signed numbers,
    as long as the sign bit is extended in the
    product register

51
Booth's Algorithm
  • Booth's Algorithm starts with the observation
    that if we have the ability to both add and
    subtract, there are multiple ways to compute a
    product
  • For every 0 in the multiplier, we shift the
    multiplicand
  • For every 1 in the multiplier, we add the
    multiplicand to the product, then shift the
    multiplicand

52
Booth's Algorithm
  • Instead, when a 1 is seen in the multiplier,
    subtract instead of add
  • Shift for all 1s after this, until the first 0
    is seen, then add
  • The method was developed because in Booth's era,
    shifters were faster than adders

53
Booth's Algorithm
  • Example (a C sketch follows)
  •   0010   2  (multiplicand)
  • x 0110   6  (multiplier)
  •   0000   0  shift
  •   0010  -2  (x 2^1) subtract (first 1)
  •   0000   0  shift (second 1)
  •   0010   2  (x 2^3) add (first 0)
  • -4 + 16 = 12 = 2 x 6
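The same rule, written as a C sketch (illustrative, for 32-bit signed operands): look at the current multiplier bit and the bit to its right; 10 means subtract the shifted multiplicand, 01 means add it, 00 and 11 mean just shift:

    #include <stdint.h>

    int64_t booth_multiply(int32_t multiplicand, int32_t multiplier) {
        int64_t product = 0;
        int prev_bit = 0;                               /* implicit bit to the right  */
        for (int i = 0; i < 32; i++) {
            int bit = ((uint32_t)multiplier >> i) & 1;
            if (bit == 1 && prev_bit == 0)              /* first 1 of a run: subtract */
                product -= (int64_t)multiplicand * ((int64_t)1 << i);
            else if (bit == 0 && prev_bit == 1)         /* first 0 after a run: add   */
                product += (int64_t)multiplicand * ((int64_t)1 << i);
            prev_bit = bit;
        }
        return product;                                 /* e.g. 2 * 6 = -4 + 16 = 12  */
    }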

54
Lecture 4: Arithmetic for Computers (Part 5)
  • CS 447
  • Jason Bakos

55
Binary Division
  • Like last lecture, we'll start with some basic
    terminology
  • Again, let's assume our numbers are base 10, but
    let's only use 0s and 1s

56
Binary Division
  • Recall
  • Dividend = Quotient x Divisor + Remainder
  • Let's assume that both the dividend and divisor
    are positive and hence the quotient and the
    remainder are nonnegative
  • The division operands and both results are 32-bit
    values and we will ignore the sign for now

57
First Hardware Design for Divider
Initialize the Quotient register to 0, initialize
the left-half of the Divisor register with the
divisor, and initialize the Remainder register
with the dividend (right-aligned)
58
Second Hardware Design for Divider
Much like with the multiplier, the divisor and
ALU can be reduced to 32 bits if we shift the
remainder right instead of shifting the divisor
to the left
Also, the algorithm must be changed so the
remainder is shifted left before the subtraction
takes place
59
Third Hardware Design for Divider
Shift the bits of the quotient into the remainder
register. Also, the last step of the algorithm
is to shift the left half of the remainder right
1 bit (see the C sketch below)
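A C sketch of this style of divider (unsigned restoring division, illustrative; assumes the divisor is nonzero): the dividend starts right-aligned in a 64-bit remainder register, which is shifted left each step while quotient bits shift in from the right:

    #include <stdint.h>

    void divide(uint32_t dividend, uint32_t divisor,
                uint32_t *quotient, uint32_t *remainder) {
        uint64_t rem = dividend;                /* dividend right-aligned          */
        uint32_t q = 0;
        for (int i = 0; i < 32; i++) {
            rem <<= 1;                          /* shift remainder register left   */
            q <<= 1;
            if ((rem >> 32) >= divisor) {       /* does the divisor fit?           */
                rem -= (uint64_t)divisor << 32; /* subtract from the left half     */
                q |= 1;                         /* shift a 1 into the quotient     */
            }
        }
        *quotient = q;
        *remainder = (uint32_t)(rem >> 32);     /* remainder ends in the left half */
    }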
60
Signed Division
  • Simplest solution: remember the signs of the
    divisor and the dividend and then negate the
    quotient if the signs disagree
  • The dividend and the remainder must have the same
    signs

61
Considerations
  • The same hardware can be used for both multiply
    and divide
  • Requirement: a 64-bit register that can shift left
    or right and a 32-bit ALU that can add or subtract

62
Floating Point
  • Floating point (also called real) numbers are
    used to represent values that are fractional or
    that are too big to fit in a 32-bit integer
  • Floating point numbers are expressed in
    scientific notation (base 2) and are normalized
    (no leading 0s)
  • 1.xxxx (base 2) x 2^yyyy
  • In this case, xxxx is the significand and yyyy is
    the exponent

63
Floating Point
  • In MIPS, a floating point number is represented
    in the following manner (IEEE 754 standard)
  • bit 31: sign of significand
  • bits 30..23 (8): exponent (2's comp)
  • bits 22..0 (23): significand
  • Note that size of exponent and significand must
    be traded off... accuracy vs. range
  • This allows us to represent signed numbers from
    as small as 2x10^-38 to as large as 2x10^38
  • Overflow and underflow must be detected
  • Double-precision floating point numbers are 2
    words... the significand is extended to 52 bits
    and the exponent to 11 bits
  • Also, the first bit of the significand is
    implicit (only the fractional part is specified)
  • In order to represent 0 in a float, put 0 in the
    exponent field
  • So here's the equation we use: (-1)^S x
    (1 + Significand) x 2^E
  • Or: (-1)^S x (1 + (s1 x 2^-1) + (s2 x 2^-2) + (s3 x 2^-3)
    + (s4 x 2^-4) + ...) x 2^E
    (see the C sketch below)
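A small C sketch (illustrative) that pulls the three fields out of a single-precision value using exactly this layout:

    #include <stdio.h>
    #include <string.h>
    #include <stdint.h>

    int main(void) {
        float f = -0.75f;
        uint32_t bits;
        memcpy(&bits, &f, sizeof bits);               /* view the bit pattern   */
        uint32_t sign        = bits >> 31;            /* bit 31                 */
        uint32_t exponent    = (bits >> 23) & 0xFF;   /* bits 30..23            */
        uint32_t significand = bits & 0x7FFFFF;       /* bits 22..0 (fraction)  */
        /* value = (-1)^sign x (1 + significand/2^23) x 2^(exponent - 127)      */
        /* -0.75 -> sign=1, exponent=126, significand=0x400000                  */
        printf("sign=%u exponent=%u significand=0x%06x\n",
               (unsigned)sign, (unsigned)exponent, (unsigned)significand);
        return 0;
    }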

64
Considerations
  • IEEE 754 sought to make floating-point numbers
    easier to sort
  • sign is first bit
  • exponent comes first
  • But we want an all-0s exponent to represent
    the most-negative exponent and an all-1s exponent
    to be the most positive
  • This is called biased notation, so we'll use the
    following equation
  • (-1)^S x (1 + Significand) x 2^(Exponent - Bias)
  • Bias is 127 for single-precision and 1023 for
    double-precision

65
Lecture 4: Arithmetic for Computers (Part 6)
  • CS 447
  • Jason Bakos

66
Converting Decimal Floating Point to Binary
  • Use the method I showed last lecture...
  • Significand
  • Use the iterative method to convert the
    fractional part to binary
  • Convert the integer part to binary using the
    old-fashioned method
  • Shift the decimal point to the left until the
    number is normalized
  • Drop the leading 1, and set the exponent to be
    the number of positions you shifted the decimal
    point
  • Adjust the exponent for bias (127/1023)
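For instance, following these steps for 5.75 (a worked example added here for illustration, not from the slides):

    5 = 101 (base 2), 0.75 = .11 (base 2), so 5.75 = 101.11 (base 2)
    Normalize: 101.11 = 1.0111 x 2^2  (the point was shifted 2 places left)
    Drop the leading 1: significand field = 0111000...0 (23 bits)
    Bias the exponent: 2 + 127 = 129 = 10000001 (base 2)
    Result (single precision): 0 | 10000001 | 01110000000000000000000 = 0x40B80000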

67
Floating Point Addition
  • Let's add two decimal floating point numbers...
  • Let's try 9.999 x 10^1 + 1.610 x 10^-1
  • Assume we can only store 4 digits of the
    significand and two digits of the exponent

68
Floating Point Addition
  • Match exponents for both operands by
    un-normalizing one of them
  • Match to the exponent of the larger number
  • Add significands
  • Normalize result
  • Round significand

69
Binary Floating Point Addition
70
Floating Point Multiplication
  • Example: 1.110 x 10^10  x  9.200 x 10^-5
  • Assume 4 digits for significand and 2 digits for
    exponent
  • Calculate the exponent of the product by simply
    adding the exponents of the operands
  • 10 + (-5) = 5
  • Bias the exponents
  • 137 + 122 = 259
  • Something's wrong! We added the biases along with
    the exponents...
  • 5 + 127 = 132

71
Floating Point Multiplication
  • Multiply the significands...
  • 1.110 x 9.200 = 10.212000
  • Normalize and add 1 to the exponent
  • 1.0212 x 10^6
  • Round significand to four digits
  • 1.021
  • Set sign based on signs of operands
  • +1.021 x 10^6

72
Floating Point Multiplication
73
Accurate Arithmetic
  • Integers can represent every value between the
    largest and smallest possible values
  • This is not the case with floating point
  • Only 2^53 unique values can be represented with
    double precision fp
  • IEEE 754 always keeps 2 extra bits on the right
    of the significand during intermediate
    calculations, called guard and round, to minimize
    rounding errors

74
Accurate Arithmetic
  • Since the worst case for rounding would be when
    the actual number is halfway between two floating
    point representations, accuracy is measured in
    terms of the number of least-significant bits of
    error
  • This is called units in the last place (ulp)
  • IEEE 754 guarantees that the computer is within
    0.5 ulp (using guard and round)