Csci 136 Computer Architecture II - PowerPoint PPT Presentation

Slides: 36
Provided by: xiuzhe

1
Csci 136 Computer Architecture II Branch
Hazards, Exceptions
  • Xiuzhen Cheng
  • cheng_at_gwu.edu

2
Announcement
  • Homework assignment 10: due before class,
    April 12
  • Readings: Sections 6.4 and 6.5
  • Problems: 6.17-6.19, 6.21-6.22, 6.33-6.36,
    6.39-6.40 (six of them will be graded; your TA
    will give hints in the lab sections)
  • Project 3 is due on April 10, 2005
  • Quiz 4: April 12, 2005
  • Final: Thursday, May 12, 12:40AM-2:40PM
  • Note: you must pass the final to pass this course!

3
The Big Picture: Where Are We Now?
  • The Five Classic Components of a Computer
  • Current Topics
  • Control/Branch Hazard
  • Exceptions

[Figure: the five classic components of a computer: Processor (Control, Datapath), Memory, Input, Output]
4
Review on Data Hazards, Forwarding, Stall
  • When does a data hazard happen?
  • Data dependencies
  • Using forwarding to overcome data hazards
  • Data is available after ALU stage
  • Forwarding conditions
  • Stall the pipeline for load-use instructions
  • Data is available after MEM stage (lw
    instruction)
  • Hazard detection conditions
  • Why in ID stage?
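
The hazard detection condition reviewed above can be sketched in Python. This is a minimal sketch assuming the usual textbook load-use test; the function name `must_stall` and its parameter names are illustrative and do not come from the slides.

```python
def must_stall(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """Load-use check, evaluated while the dependent instruction is in ID.

    Stall when the instruction in EX is a load (MemRead asserted) and its
    destination register (rt) is a source register of the instruction in ID.
    """
    return id_ex_mem_read and id_ex_rt in (if_id_rs, if_id_rt)


# lw $2, 20($1) is in EX while and $4, $2, $5 is in ID: stall
print(must_stall(True, 2, 2, 5))
# a non-load in EX writing $2: forwarding covers it, so no stall
print(must_stall(False, 2, 2, 5))
```

Checking in ID means the dependent instruction can be held in place (PC and IF/ID frozen) while a bubble moves into EX, which is one way to answer the "Why in ID stage?" question.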

5
Review on Data Hazards
6
Review on Data Hazards, Forwarding, Stall
[Figure: pipelined datapath diagram; surviving labels: PC+4, Sign-extend]
7
LW and SW
  • lw $5, 0($15); sw $5, 100($15)
  • lw $5, 0($15); beq $5, $0, Exit; sw $5, 100($15)
  • lw $5, 0($15); add $8, $8, $8; sw $5, 100($15)

8
SW is in MEM Stage
[Figure: datapath with sw and lw in flight; surviving labels: Sign-Ext, EX/MEM, Data memory]
  • lw $5, 0($15); sw $5, 100($15)
  • Forwarding condition:
  • MEM/WB.RegWrite and EX/MEM.MemWrite and
  • MEM/WB.RegisterRd = EX/MEM.RegisterRd and
  • MEM/WB.RegisterRd != 0

9
SW is In EX Stage
[Figure: datapath with sw and lw in flight; surviving label: Sign-Ext]
  • Forwarding condition:
  • ID/EX.MemWrite and MEM/WB.RegWrite and
  • MEM/WB.RegisterRd = ID/EX.RegisterRt and
  • MEM/WB.RegisterRd != 0
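
The two lw-to-sw forwarding conditions from the "SW is in MEM Stage" and "SW is In EX Stage" slides can be written as predicates. This is a sketch only: the `PipelineRegs` class and the Python field names are hypothetical stand-ins for the pipeline-register fields named on the slides.

```python
from dataclasses import dataclass


@dataclass
class PipelineRegs:
    """Only the pipeline-register fields the two conditions read."""
    mem_wb_reg_write: bool = False
    mem_wb_rd: int = 0
    ex_mem_mem_write: bool = False
    ex_mem_rd: int = 0
    id_ex_mem_write: bool = False
    id_ex_rt: int = 0


def forward_to_sw_in_mem(r):
    # sw is in MEM while the producing instruction is in WB
    return (r.mem_wb_reg_write and r.ex_mem_mem_write
            and r.mem_wb_rd == r.ex_mem_rd and r.mem_wb_rd != 0)


def forward_to_sw_in_ex(r):
    # sw is in EX while the producing instruction is in WB
    return (r.id_ex_mem_write and r.mem_wb_reg_write
            and r.mem_wb_rd == r.id_ex_rt and r.mem_wb_rd != 0)


# lw $5, 0($15) immediately followed by sw $5, 100($15)
regs = PipelineRegs(mem_wb_reg_write=True, mem_wb_rd=5,
                    ex_mem_mem_write=True, ex_mem_rd=5)
print(forward_to_sw_in_mem(regs))
```

The `!= 0` term keeps a write to register $0 from ever being forwarded, since $0 always reads as zero.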

10
More Cases
  • lw $15, 0($8); sw $5, 100($15): load-use on the
    base register, so stall the pipeline
  • R-Type followed by sw?
  • The result from R-Type will be saved into memory
  • R-Type will overwrite base register for sw

11
An Example
  • 40: lw $2, 20($1)
  • 44: and $4, $2, $5
  • 48: or $8, $2, $4
  • Clock Cycle 1
  • Clock Cycle 2
  • Clock Cycle 3
  • Clock Cycle 4
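
The clock-by-clock behavior of this example can be reproduced with a small scheduling sketch. It assumes a five-stage pipeline with full forwarding, so the only stall is the one-cycle load-use bubble between adjacent instructions; `PROGRAM` and `schedule` are illustrative names, not part of the slides.

```python
# (text, is_load, destination register, source registers)
PROGRAM = [
    ("lw  $2, 20($1)", True, 2, (1,)),
    ("and $4, $2, $5", False, 4, (2, 5)),
    ("or  $8, $2, $4", False, 8, (2, 4)),
]


def schedule(program):
    """Return one {clock: stage} dict per instruction, with load-use stalls."""
    rows = []
    if_start, id_start = 1, 2
    prev = None  # (is_load, dest) of the previous instruction
    for text, is_load, dest, srcs in program:
        stall = 1 if prev and prev[0] and prev[1] in srcs else 0
        ex = id_start + 1 + stall
        row = {}
        for c in range(if_start, id_start):
            row[c] = "IF"
        for c in range(id_start, ex):
            row[c] = "ID"
        row[ex], row[ex + 1], row[ex + 2] = "EX", "MEM", "WB"
        rows.append(row)
        # the next instruction enters IF when this one enters ID,
        # and enters ID when this one enters EX
        if_start, id_start = id_start, ex
        prev = (is_load, dest)
    return rows


rows = schedule(PROGRAM)
for (text, *_), row in zip(PROGRAM, rows):
    print(f"{text:16}", " ".join(f"{row.get(c, '-'):>4}" for c in range(1, 9)))
```

In clock 4 the `and` is still in ID and the `or` is still in IF while `lw` is in MEM, which corresponds to the bubble sitting in EX.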

12
Clock 1
[Figure: pipelined datapath at clock 1, showing lw $2, 20($1)]
13
Clock 2
[Figure: pipelined datapath at clock 2, showing lw $2, 20($1) and and $4, $2, $5]
14
Clock 3
[Figure: pipelined datapath at clock 3, showing and $4, $2, $5; or $8, $2, $4; lw $2, 20($1)]
15
Clock 4
[Figure: pipelined datapath at clock 4, showing and $4, $2, $5; or $8, $2, $4; a bubble; lw $2, 20($1)]
16
Clock 5
[Figure: pipelined datapath at clock 5, showing and $4, $2, $5; or $8, $2, $4; a bubble; lw $2, 20($1)]
17
Branch Hazards
Control hazard: attempting to make a decision before the
condition is evaluated
18
Branch Hazards
19
Observations
  • The branch decision does not occur until the MEM stage, so 3
    CCs are wasted (current, non-optimized design)
  • Is it possible to reduce the branch delay?
  • YES
  • In the EX stage?
  • Two-CC branch delay
  • In the ID stage?
  • One-CC branch delay
  • How? For beq x, y, label: XOR x with y, then OR
    all the result bits, which is much faster than an ALU
    operation. We also have a separate ALU to compute the
    branch address.
  • 3 strategies
  • Delayed branch, static branch prediction, dynamic
    branch prediction
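
The XOR-then-OR equality test and the separate branch-address computation can be sketched as follows. This is a software model under stated assumptions: a 32-bit register width and the standard MIPS target formula (PC + 4 plus the word offset shifted left by 2). The function names are illustrative.

```python
def branch_equal(x, y, width=32):
    """Equality test as described on the slide: XOR, then OR all bits."""
    mask = (1 << width) - 1
    diff = (x ^ y) & mask          # a bit is 1 wherever x and y differ
    any_bit_set = 0
    for i in range(width):         # OR-reduce all bits of diff
        any_bit_set |= (diff >> i) & 1
    return any_bit_set == 0        # equal only if no bit differs


def branch_target(pc, offset_words):
    """Target from the separate adder: (PC + 4) + (offset << 2)."""
    return pc + 4 + (offset_words << 2)


print(branch_equal(7, 7), branch_equal(7, 5))
print(branch_target(40, 7))
```

No carry chain is involved in the comparison, which is why it fits into the ID stage more easily than a full ALU subtraction.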

20
Delayed Branch
  • Will always execute the instruction following the
    branch.
  • Only one will be executed
  • Done by compiler or assembler
  • 50% success rate
  • Losing popularity
  • Why?
  • More pipeline stages
  • Superscalar

21
Scheduling the Branch Delay Slot
Independent instruction, best choice
B is good when branch taking probability is high.
It must be OK to execute the sub instruction
when the branch goes in the unexpected direction
22
Static Branch Prediction
  • Assume the branch will not be taken. If the
    prediction is wrong, clear the effects of
    sequential instruction execution.
  • How to discard instructions in the pipeline?
  • Branch decision is made at the MEM stage:
    instructions in the IF, ID, EX stages need to be
    discarded.
  • Branch decision is made at the ID stage: only flush
    the IF/ID pipeline register!
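
How much has to be discarded depends only on the stage where the branch is resolved. A minimal sketch, assuming the classic five-stage pipeline and a predict-not-taken policy; `stages_to_flush` is an illustrative helper, not a hardware signal.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def stages_to_flush(decision_stage):
    """Stages holding wrongly fetched instructions when a predict-not-taken
    branch, resolved in `decision_stage`, turns out to be taken."""
    return STAGES[:STAGES.index(decision_stage)]


for stage in ("MEM", "EX", "ID"):
    flushed = stages_to_flush(stage)
    print(f"decision in {stage}: flush {flushed}, penalty {len(flushed)} cycle(s)")
```

Resolving in ID leaves only the instruction being fetched to discard, which matches the single IF/ID flush mentioned above.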

23
Static Branch Prediction
24
Static Branch Prediction
[Figure: pipelined datapath with the IF.Flush signal]
25
Pipelined Branch: An Example
[Figure: pipelined datapath for the branch example; surviving labels include 36, 40, 44, 28, 72, IF.Flush]
26
Pipelined Branch: An Example
[Figure: pipelined datapath for the branch example, next clock cycle; surviving label: 72]
27
Dynamic Branch Prediction
  • Static branch prediction is crude!
  • Take history into consideration
  • If a branch was taken last time, then fetch
    the next instruction from the same place as last time
  • Branch prediction buffer indexed by the lower
    bits of the branch instruction's address
  • This memory contains a bit (or bits) which tells
    whether the branch was recently taken or not
  • Is the prediction correct? Any bad effect?
  • 1-bit prediction scheme
  • 2-bit prediction scheme
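
A branch prediction buffer of the kind described above can be sketched as a small table of bits. This is a minimal 1-bit version; the 16-entry size and the choice to drop the two low address bits (word-aligned instructions) are assumptions made for illustration, and the class name is hypothetical.

```python
class BranchPredictionBuffer:
    """1-bit prediction buffer indexed by the low bits of the branch address."""

    def __init__(self, entries=16):
        assert entries & (entries - 1) == 0, "entries must be a power of two"
        self.entries = entries
        self.bits = [False] * entries   # False = predict not taken

    def index(self, pc):
        # drop the two always-zero byte-offset bits, keep the low index bits
        return (pc >> 2) & (self.entries - 1)

    def predict(self, pc):
        return self.bits[self.index(pc)]

    def update(self, pc, taken):
        self.bits[self.index(pc)] = taken


bpb = BranchPredictionBuffer()
bpb.update(0x40, True)      # branch at 0x40 was taken
print(bpb.predict(0x40))    # predicted taken next time
print(bpb.predict(0x80))    # a different branch that maps to the same entry
```

Because the buffer has no tags, the branch at `0x80` inherits the bit written by the branch at `0x40`. The prediction is only a hint, so this affects performance, not correctness.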

28
Observation
  • Since we move the branch decision to the ID stage,
    we need to copy the forwarding-control
    hardware to the ID stage too!
  • Beq following lw
  • Hazard detection unit should work.

29
In-Class Exercise
  • Consider a loop branch that branches nine times
    in a row, then is not taken once. What is the
    prediction accuracy for this branch, assuming the
    prediction bit for this branch remains in the
    prediction buffer?
  • 1-bit prediction?
  • With 2-bit prediction?
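
One way to check an answer to this exercise is to simulate both schemes. In this sketch the 2-bit scheme is modeled as a saturating counter, which is one common realization; the state diagram used in the course may differ in detail. The first pass through the loop only warms up the predictor state, and accuracy is measured on the second pass.

```python
def accuracy_1bit(outcomes, state=False):
    """Return (accuracy, final state); state is the last outcome seen."""
    correct = 0
    for taken in outcomes:
        correct += (state == taken)
        state = taken
    return correct / len(outcomes), state


def accuracy_2bit(outcomes, counter=0):
    """Return (accuracy, final counter); predict taken when counter >= 2."""
    correct = 0
    for taken in outcomes:
        correct += ((counter >= 2) == taken)
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return correct / len(outcomes), counter


loop = [True] * 9 + [False]   # taken nine times, then not taken once

_, warm1 = accuracy_1bit(loop)
_, warm2 = accuracy_2bit(loop)
print(accuracy_1bit(loop, warm1)[0])  # 0.8
print(accuracy_2bit(loop, warm2)[0])  # 0.9
```

The 1-bit scheme mispredicts twice per loop execution (the first iteration and the exit), while the 2-bit counter mispredicts only the exit.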

30
Performance Comparison
  • Compare the performance of single-cycle,
    multi-cycle and pipelined datapaths
  • 200ps for memory access, 100ps for ALU operation,
    50ps for register file access
  • 25% loads, 10% stores, 11% branches, 2% jumps,
    52% ALU ops
  • For the pipelined datapath,
  • 50% of loads are immediately followed by an
    instruction that uses the result
  • Branch delay on misprediction is 1 clock cycle
    and 25% of branches are mispredicted
  • Jump delay is 1 clock cycle
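
The comparison can be worked through numerically. This sketch treats the instruction-mix figures as percentages and, for the multi-cycle design, assumes the usual textbook cycle counts per instruction class (5 for loads, 4 for stores and ALU ops, 3 for branches and jumps); those counts are an assumption and are not given on the slide.

```python
MEM_PS, ALU_PS, REG_PS = 200, 100, 50
MIX = {"load": 0.25, "store": 0.10, "branch": 0.11, "jump": 0.02, "alu": 0.52}

# Single-cycle: the clock must fit the slowest instruction (lw):
# instruction fetch + register read + ALU + data memory + register write.
single_cycle_ps = MEM_PS + REG_PS + ALU_PS + MEM_PS + REG_PS

# Multi-cycle: clock = slowest step; cycles per class are ASSUMED.
MULTI_CYCLES = {"load": 5, "store": 4, "alu": 4, "branch": 3, "jump": 3}
multi_clock_ps = max(MEM_PS, ALU_PS, REG_PS)
multi_cpi = sum(MIX[k] * MULTI_CYCLES[k] for k in MIX)
multi_ps = multi_cpi * multi_clock_ps

# Pipelined: base CPI of 1 plus the stall cycles listed on the slide.
pipe_clock_ps = max(MEM_PS, ALU_PS, REG_PS)
pipe_cpi = (1
            + MIX["load"] * 0.50 * 1      # load-use stalls
            + MIX["branch"] * 0.25 * 1    # mispredicted branches
            + MIX["jump"] * 1)            # jump delay
pipe_ps = pipe_cpi * pipe_clock_ps

print(f"single-cycle: {single_cycle_ps} ps per instruction")
print(f"multi-cycle:  CPI {multi_cpi:.2f}, {multi_ps:.1f} ps per instruction")
print(f"pipelined:    CPI {pipe_cpi:.4f}, {pipe_ps:.1f} ps per instruction")
```

Under these assumptions the average time per instruction is 600 ps for single-cycle, about 824 ps for multi-cycle, and about 234.5 ps for the pipelined datapath.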

31
Exceptions
  • Exceptions: events other than branches or jumps that
    change the normal flow of instruction execution
  • Arithmetic overflow, undefined instruction, etc.
  • Internal to the processor
  • Interrupts: caused by external I/O
  • Use arithmetic overflow as an example
  • When an overflow is detected, we need to transfer
    control to the exception handling routine at
    location 0x80000180 immediately, because we do
    not want this invalid value to contaminate other
    registers or memory locations
  • Similar idea to the branch hazard
  • Detected in the EX stage
  • De-assert all control signals in the EX and ID
    stages, flush IF/ID
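
The overflow case can be modeled in a few lines. This is a sketch under stated assumptions: operands are signed 32-bit values, and the function names and the textual list of pipeline actions are illustrative rather than actual control signals.

```python
HANDLER_PC = 0x80000180  # exception handler address from the slide


def add32(a, b):
    """32-bit two's-complement add; returns (result, overflow)."""
    raw = (a + b) & 0xFFFFFFFF
    result = raw - (1 << 32) if raw & 0x80000000 else raw
    # overflow: operands share a sign but the result's sign differs
    overflow = (a < 0) == (b < 0) and (result < 0) != (a < 0)
    return result, overflow


def next_pc_and_squash(pc_plus_4, overflow_in_ex):
    """Return (next PC, pipeline actions) for the instruction now in EX."""
    if overflow_in_ex:
        return HANDLER_PC, ["flush IF/ID", "zero ID controls", "zero EX controls"]
    return pc_plus_4, []


result, overflow = add32(2**31 - 1, 1)
print(result, overflow)
print(hex(next_pc_and_squash(0x50, overflow)[0]))
```

Zeroing the control signals of the overflowing instruction keeps its invalid result from being written back, in the same spirit as flushing after a mispredicted branch.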

32
Exceptions
[Figure: pipelined datapath with exception handling; surviving label: 80000180]
33
Example
  • sub $11, $2, $4
  • and $12, $2, $5
  • or $13, $2, $6
  • add $1, $2, $1 (overflow occurs)
  • slt $15, $6, $7
  • lw $16, 50($7)
  • Exception handling routine
  • 0x80000180: sw $25, 1000($0)
  • 0x80000184: sw $26, 1004($0)

34
Example
[Figure: pipelined datapath for the exception example at clock 6; surviving label: 80000180]
35
Example
[Figure: pipelined datapath for the exception example at clock 7; surviving label: 80000180]
36
Questions?