Lecture 13 Instruction Execution Pipeline - PowerPoint PPT Presentation

About This Presentation
Title:

Lecture 13 Instruction Execution Pipeline

Description:

Title: Lecture 12 Instruction Execution Pipeline Author: Last modified by: jwcho Created Date: 1/23/2001 8:23:30 AM Document presentation format – PowerPoint PPT presentation


Transcript and Presenter's Notes

1
Lecture 13 Instruction Execution Pipeline
2
Lecture 13 Instruction Execution Pipeline
  • In this lecture, we will study
  • Principle of pipeline
  • Characteristics of pipeline
  • Number of pipeline stages and the performance
  • Delays of pipeline stages and the performance
  • Instruction execution steps in RISC-S
  • 5-stage instruction execution pipeline for RISC-S
  • Ideal pipeline
  • Hazards
  • Improving RISC-S pipeline for hazards

3
Car Wash Station
  • Car wash stations
  • 1. S (Spray water)
  • 2. W (Wash with detergent and brush)
  • 3. R (Rinse)
  • 4. B (Blow dry)
  • Each stage takes 1 minute (identical delay)

1st car  S W R B
2nd car          S W R B
3rd car                  S W R B
. . .
  • To improve the profit
  • Improve the speed of the wash stations -
    expensive solution
  • Improve the throughput - Parallel wash stations
    - expensive solution
  • Improve the effective wash time - Pipeline - a
    less expensive solution

4
Pipeline Principle
Ordinary car wash station: 1 car / 4 min

Parallel car wash station: 4 cars / 4 min
1st car  S W R B
2nd car  S W R B
3rd car  S W R B
4th car  S W R B
5th car          S W R B
. . .

Pipeline car wash station: 1 car / 1 min
1st car  S W R B
2nd car    S W R B
3rd car      S W R B
4th car        S W R B
5th car          S W R B
. . .
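The three arrangements can be compared with a few lines of arithmetic. The sketch below is illustrative only: the function names are made up for this example, and it assumes exactly the slide's setup (4 stages of 1 minute each, 4 parallel stations).

```python
STAGES = 4        # S, W, R, B
STAGE_MIN = 1     # minutes per stage, as on the slide


def ordinary_minutes(cars: int) -> int:
    """One station, one car at a time: each car holds it for all 4 stages."""
    return cars * STAGES * STAGE_MIN


def parallel_minutes(cars: int, stations: int = 4) -> int:
    """Several complete stations side by side, washing cars in batches."""
    batches = -(-cars // stations)  # ceiling division
    return batches * STAGES * STAGE_MIN


def pipelined_minutes(cars: int) -> int:
    """One station split into stages: a new car can enter every minute."""
    if cars == 0:
        return 0
    return (STAGES + cars - 1) * STAGE_MIN


for n in (1, 4, 100):
    print(n, ordinary_minutes(n), parallel_minutes(n), pipelined_minutes(n))
```

For 100 cars the pipelined station needs 103 minutes against 400 for the ordinary one, which is close to the parallel arrangement's 100 minutes while using a single set of equipment.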
5
Pipeline Terminology
  • Pipeline Stage
  • Pipeline consists of a finite number of Pipeline
    Stages
  • Pipeline Cycle
  • Delay of a pipeline stage is called Pipeline
    Cycle
  • In practice, the delays of the pipeline stages are not necessarily
    identical
  • Unequal stage delays complicate control
  • The pipeline cycle can be set to the longest pipeline stage delay,
    at the cost of performance (a longer pipeline cycle time)
  • Pipeline Latency
  • Time from beginning of a task to the completion
    of the task
  • Ideal Pipeline
  • Delays of the Pipeline Stages are identical -
    Pipeline Cycle
  • All the pipeline stages are occupied with tasks
    to be executed
  • Simple to control and provides the best
    performance
  • 1 instruction/cycle

6
Pipeline Characteristics
Stage 1:  I0 I1 I2 I3 I4 I5 I6 I7 . . . In-1 In
Stage 2:     I0 I1 I2 I3 I4 I5 I6 . . . In-2 In-1 In
Stage 3:        I0 I1 I2 I3 I4 I5 . . . In-3 In-2 In-1 In
Stage 4:           I0 I1 I2 I3 I4 . . . In-4 In-3 In-2 In-1 In
  • Assuming that there are plenty of
    tasks(instructions) to be executed
  • All of the pipeline stages are busy most of time
  • Pipeline Filling
  • At the initial phase of the execution, pipeline
    stages are not fully occupied with tasks
  • For an n-stage pipeline, first (n-1) pipeline
    cycles are filling time
  • Pipeline Draining
  • At the final phase of the execution, pipeline
    stages are not fully occupied with tasks
  • For an n-stage pipeline, last (n-1) pipeline
    cycles are draining time
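Filling and draining can be seen by counting the busy stages in every cycle. This is a minimal sketch for an ideal pipeline; `occupancy` and its parameters are names chosen for this example.

```python
def occupancy(num_stages: int, num_tasks: int) -> list[int]:
    """Number of busy stages in each pipeline cycle of an ideal pipeline."""
    total_cycles = num_stages + num_tasks - 1
    busy = []
    for cycle in range(total_cycles):
        # Task i enters at cycle i, so in this cycle it sits in stage (cycle - i).
        count = sum(1 for i in range(num_tasks) if 0 <= cycle - i < num_stages)
        busy.append(count)
    return busy


busy = occupancy(num_stages=4, num_tasks=8)
print(busy)  # [1, 2, 3, 4, 4, 4, 4, 4, 3, 2, 1]
```

With 4 stages, the first 3 cycles (filling) and the last 3 cycles (draining) have fewer than 4 busy stages.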

7
Number of Pipeline Stages
  • Comparison of car wash stations with
    4-stage(S,W,R,B) and 2-stage(SW,RB) pipeline,
    identical pipeline latency(4 minutes)
  • 4-stage pipeline with 1 minute pipeline cycle
  • 2-stage pipeline with 2 minute pipeline cycle

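For the same pipeline latency, more pipeline stages give a shorter pipeline cycle and therefore better performance: one car per minute with 4 stages, one car per 2 minutes with 2 stages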
8
Delay of Pipeline Stages
  • Comparison of 4-stage car wash stations with
    different pipeline stage delays
  • Identical 1 minute delay for every stage
  • S(0.5 min) - W(1.5 min) - R(0.5 min) - B(1.5
    min) pipeline

Identical pipeline stage delays show better
performance
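A rough way to quantify the comparison, assuming a synchronous pipeline whose cycle equals the longest stage delay (as described under Pipeline Terminology). The helper names are hypothetical.

```python
def pipeline_cycle(stage_delays: list[float]) -> float:
    """All stages advance together, so the slowest stage sets the cycle."""
    return max(stage_delays)


def total_time(stage_delays: list[float], tasks: int) -> float:
    """Time to finish `tasks` tasks, including the filling time."""
    return (len(stage_delays) + tasks - 1) * pipeline_cycle(stage_delays)


uniform = [1.0, 1.0, 1.0, 1.0]   # S, W, R, B at 1 minute each
uneven = [0.5, 1.5, 0.5, 1.5]    # same 4 minutes of work, unevenly split

print(total_time(uniform, 100))  # 103.0
print(total_time(uneven, 100))   # 154.5
```

Both stations do 4 minutes of work per car, but the uneven one completes a car only every 1.5 minutes.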

9
Instruction Execution Steps
10
Instruction Execution Pipeline: RISC-S
  • A 5-stage pipeline
  • IF-DR-A-M-SR pipeline
  • For the instruction execution pipeline,
    information has to be passed to the succeeding
    pipeline stage
  • Inter-stage buffers made of latches are needed
  • I/D buffer, D/A buffer, A/M buffer, M/S buffer

11
IF Stage
Instruction Fetch and update PC stage
12
DR Stage
  • Instruction decoding and register read stage
  • OP <- OP-code
  • A(Rs1) <- R[IR14..18]
  • t <- IR13
  • B(Rs2) <- R[IR0..4]
  • D(S2) <- (IR12)^19 IR0..12 (IR12 replicated 19 times, followed by
    IR0..12: sign extension to 32 bits)
  • C(Cond) <- IR19..22
  • cc(SCC) <- IR24
  • (NPC <- NPC)
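The field extraction in this stage can be sketched as bit slicing on a 32-bit instruction word. The bit ranges are the ones listed on the slide; the helper names are invented here, bit 0 is assumed to be the least significant bit, and the sketch returns register numbers rather than reading a register file.

```python
def bits(word: int, lo: int, hi: int) -> int:
    """Extract bits lo..hi (inclusive), with bit 0 as the least significant."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def sign_extend(value: int, width: int) -> int:
    """Interpret a width-bit field as a two's-complement signed integer."""
    sign_bit = 1 << (width - 1)
    return (value ^ sign_bit) - sign_bit


def decode(ir: int) -> dict:
    """Split an instruction word into the fields latched by the DR stage."""
    return {
        "rs1": bits(ir, 14, 18),                   # register number for A
        "t": bits(ir, 13, 13),
        "rs2": bits(ir, 0, 4),                     # register number for B
        "imm": sign_extend(bits(ir, 0, 12), 13),   # D: 13 bits, sign-extended
        "cond": bits(ir, 19, 22),
        "scc": bits(ir, 24, 24),
    }


word = (1 << 24) | (0b1010 << 19) | (7 << 14) | (1 << 13) | 0x1FFF
print(decode(word))
```

Here the low 13 bits are all ones, so the immediate decodes to -1, matching the replication of IR12 into the upper 19 bits.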

13
A Stage
  • ALU operations using operands, effective
    address computation,
  • and condition test for conditional branches
  • Memory reference instruction (t=1): AO <- NPC+D (imm32)
  • LD instruction: C <- C
  • Functional instruction (t=0): AO <- A op B
  • C <- C
  • Control instruction: AO <- NPC+D (imm32)
  • T <- (flag(C) op 0)
  • (OP <- OP)
  • (NPC <- NPC)

14
M Stage
  • Memory access for read and write, and decide
    final PC value for branch instructions
  • LD: DATA <- M[AO]
  • ST: M[AO] <- B
  • Functional instruction: AO <- AO
  • Branch instruction: if T=0, PC <- AO
  • if T=1, PC <- NPC
  • (OP <- OP)
  • (C <- C)

15
SR Stage
  • Store the result of operation in a register for a
    functional instruction,
  • and store the data read from memory to a register
    for load instruction
  • Functional instruction: R[C] <- AO
  • LD: R[C] <- DATA

16
Time Out

17
Ideal Pipeline
  • Ideal Pipeline
  • Delays of the pipeline stages are identical -
    Pipeline Cycle
  • All the pipeline stages are occupied with tasks,
    except the filling time and draining time
  • Complete one task for every pipeline cycle after
    the filling time
  • What prevents a pipeline from operating as an
    ideal pipeline even when the delays of the
    pipeline stages are identical
  • Hazards
  • Structural Hazard
  • Data Hazard
  • Control Hazard

18
Structural Hazard
  • Cases when structural hazards take place
  • More than one instruction requires the same
    pipeline stage in the same clock cycle
  • This never happens when the delays of the pipeline
    stages are identical
  • More than one pipeline stage tries to use the same
    hardware resource in the same clock cycle
  • IF and A stages: operation with the adder
  • DR and SR stages: access to the register file
  • IF and M stages: access to memory

19
Example Structural Hazard
  • Structural Hazard due to Adder - IF and A stage
    in the same cycle
  • Structural Hazard due to Register
  • Structural Hazard due to Memory

20
Hardware Solution - For Structural Hazards -
  • Adder hazard in IF and A stages
  • Include a simple +4 adder in the IF stage, so that
    calculating PC+4 does not use the ALU in the A stage
  • Register hazard
  • The register file can be written in the first
    half of the clock cycle and read in the
    second half of the clock cycle
  • Memory hazard
  • Dedicated memories, i.e., separate Instruction
    Memory and Data Memory
  • 2-port memory
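One way to check a design for structural hazards is to list the hardware each stage uses and look for resources claimed by more than one stage in a cycle in which every stage is busy. The resource names below are hypothetical labels for this sketch, and the half-cycle register split is modeled simply as two separate resources.

```python
STAGES = ["IF", "DR", "A", "M", "SR"]

# Before the fixes: one adder, one memory, one undivided register file.
SHARED = {
    "IF": {"adder", "memory"},
    "DR": {"regfile"},
    "A": {"adder"},
    "M": {"memory"},
    "SR": {"regfile"},
}

# After the fixes: +4 adder in IF, split memories, half-cycle register access.
SEPARATED = {
    "IF": {"pc_adder", "imem"},
    "DR": {"regfile_read_half"},
    "A": {"alu"},
    "M": {"dmem"},
    "SR": {"regfile_write_half"},
}


def conflicts(resources: dict) -> dict:
    """Resources used by more than one stage when all five stages are busy."""
    users = {}
    for stage in STAGES:
        for res in resources[stage]:
            users.setdefault(res, []).append(stage)
    return {res: stages for res, stages in users.items() if len(stages) > 1}


print(conflicts(SHARED))
print(conflicts(SEPARATED))
```

The shared table reports the three stage pairs listed on the Structural Hazard slide; the separated table reports none.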

21
Data Hazard
  • Data Hazard is possible when more than one
    instruction in a sequence shares the same data
  • SLL R5, R1        IF DR A  M  SR
  • ADD R1, R2, R3       IF DR A  M  SR
  • AND R1, R4, R4          IF DR A  M  SR
  • SUB R5, R1, R6             IF DR A  M  SR
  • XOR R1, R7, R8                IF DR A  M  SR
  • Read After Write (RAW) Hazard
  • The later instruction is supposed to read the data
    after it is written, but the read takes place first
  • Write After Read (WAR) Hazard
  • The earlier instruction is supposed to read the data
    before it is overwritten, but the write takes place first
  • Write After Write (WAW) Hazard
  • Two writes to the same location take place in the
    wrong order

22
Data Hazards
  • RAW Hazard
  • Ii precedes Ij, and Ij tries to read a register
    or data memory location before Ii stores data
    into it.
  • ADD R2, R3, R1
  • AND R1, R4, R4
  • WAR Hazard
  • Ii precedes Ij, Ii reads data and Ij writes
    data at the same location, and the write takes place
    earlier than the read
  • This never happens if all the instructions go
    through the same pipeline stages with the same delay,
    because instructions go through the SR stage (for
    writing) later than the DR stage (for reading)
  • WAW Hazard
  • Ii precedes Ij, and both Ii and Ij write data at
    the same location, but in the wrong order
  • This also never happens if the assumption in WAR
    is true
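The three dependence types can be detected from the registers each instruction reads and writes. The sketch below assumes the operand convention suggested by the RAW example (ADD R2, R3, R1 writes R1, so the destination is the last operand); `parse` and `hazards` are names made up for this illustration. It reports potential hazards only; whether they actually stall the pipeline depends on the stage timing discussed above.

```python
def parse(instr: str):
    """Split 'ADD R2, R3, R1' into (source registers, destination register)."""
    _, operands = instr.split(maxsplit=1)
    regs = [r.strip() for r in operands.split(",")]
    return set(regs[:-1]), regs[-1]   # assumption: destination is last


def hazards(first: str, second: str) -> set:
    """Data dependences of `second` on the earlier instruction `first`."""
    src_i, dst_i = parse(first)
    src_j, dst_j = parse(second)
    found = set()
    if dst_i in src_j:
        found.add("RAW")   # Ij reads what Ii writes
    if dst_j in src_i:
        found.add("WAR")   # Ij writes what Ii reads
    if dst_i == dst_j:
        found.add("WAW")   # both write the same register
    return found


print(hazards("ADD R2, R3, R1", "AND R1, R4, R4"))  # {'RAW'}
```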

23
Forwarding Circuit For RAW Data Hazard
  • Circuit that forwards the data to be stored in SR
    stage to ALU input
  • MUX in A stage
  • Data to be stored in a register in SR stage
  • DATA, AO in M/S Buffer
  • AO in A/M Buffer
  • These values in inter-stage buffers are
    forwarded to the ALU input MUX
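The selection made by the ALU input MUX can be sketched as a priority check over the two inter-stage buffers. The record layout (`dest`, `value`) and the function name are assumptions for this example, and giving the A/M buffer priority (it holds the most recent result) is a common design choice rather than something stated on the slide.

```python
def select_alu_input(src_reg, regfile_value, am_buffer, ms_buffer):
    """Pick the operand for the ALU: a forwarded value or the register file.

    am_buffer / ms_buffer are dicts like {"dest": "R1", "value": 7}, or None
    when the buffer holds no register result.
    """
    if am_buffer and am_buffer["dest"] == src_reg:
        return am_buffer["value"], "A/M"      # result of the previous instruction
    if ms_buffer and ms_buffer["dest"] == src_reg:
        return ms_buffer["value"], "M/S"      # result from two instructions back
    return regfile_value, "register file"


print(select_alu_input("R1", 0, {"dest": "R1", "value": 7}, None))
```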

24
Instruction Scheduling with Forwarding Circuit
  • Resolving Data Hazard with registers by
    forwarding: no delay
  • SLL R5, R1        IF DR A  M  SR
  • ADD R1, R2, R3       IF DR A  M  SR
  • AND R1, R4, R4          IF DR A  M  SR
  • SUB R5, R1, R6             IF DR A  M  SR
  • XOR R1, R7, R8                IF DR A  M  SR

25
Load Delay Due To RAW: Improvement by Forwarding
Circuit
  • Load delay: 2 cycles
  • LD R1, X          IF DR A  M  SR
  • stall
  • stall
  • ADD R1, R2, R3             IF DR A  M  SR
  • AND R1, R4, R4                IF DR A  M  SR
  • SUB R5, R1, R6                   IF DR A  M  SR
  • XOR R1, R7, R8                      IF DR A  M  SR
  • Load delay with forwarding: 1 cycle
  • LD R1, X          IF DR A  M  SR
  • stall
  • ADD R1, R2, R3          IF DR A  M  SR
  • AND R1, R4, R4             IF DR A  M  SR
  • SUB R5, R1, R6                IF DR A  M  SR
  • XOR R1, R7, R8                   IF DR A  M  SR
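The two load delays follow from when the loaded value becomes available and when the dependent instruction needs it. This is a sketch under the timing assumptions of the slides (register write in the first half of SR and read in the second half of DR; with forwarding, DATA is taken from the M/S buffer at the A stage). The function name and the `distance` parameter are introduced here for illustration.

```python
STAGE_INDEX = {"IF": 0, "DR": 1, "A": 2, "M": 3, "SR": 4}


def load_stalls(distance: int, forwarding: bool) -> int:
    """Stall cycles for an instruction `distance` slots after the LD it depends on.

    distance = 1 means the instruction immediately follows the load.
    """
    if forwarding:
        # DATA is in the M/S buffer in the cycle after M; needed at the A stage.
        ready, need = STAGE_INDEX["M"] + 1, STAGE_INDEX["A"]
    else:
        # Register is written in the first half of SR; read in the second half of DR.
        ready, need = STAGE_INDEX["SR"], STAGE_INDEX["DR"]
    return max(0, ready - (distance + need))


print(load_stalls(1, forwarding=False))  # 2
print(load_stalls(1, forwarding=True))   # 1
```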

26
Load Delay Due To RAW: Improvement by Software
Scheduling
  • LD R1, X          IF DR A  M  SR
  • stall
  • ADD R1, R2, R3          IF DR A  M  SR
  • SUB R1, R5, R4             IF DR A  M  SR
  • LD R6, Y                      IF DR A  M  SR

Software Scheduling
LD R1, X          IF DR A  M  SR
LD R6, Y             IF DR A  M  SR
ADD R1, R2, R3          IF DR A  M  SR
SUB R1, R5, R4             IF DR A  M  SR
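The benefit of rescheduling can be checked by counting load-use stalls before and after the reordering. This sketch assumes forwarding is present (a 1 cycle load delay), that LD writes its first operand, and that other instructions read every operand except the last; the function names are hypothetical.

```python
def parse(instr: str):
    """Split 'ADD R1, R2, R3' into ('ADD', ['R1', 'R2', 'R3'])."""
    op, operands = instr.split(maxsplit=1)
    return op, [r.strip() for r in operands.split(",")]


def load_use_stalls(program: list[str]) -> int:
    """Count instructions that read a register loaded by the LD just before them."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        p_op, p_regs = parse(prev)
        c_op, c_regs = parse(cur)
        # Assumption: LD's first operand is its destination; non-LD
        # instructions read all operands except the last one.
        if p_op == "LD" and c_op != "LD" and p_regs[0] in c_regs[:-1]:
            stalls += 1
    return stalls


original = ["LD R1, X", "ADD R1, R2, R3", "SUB R1, R5, R4", "LD R6, Y"]
scheduled = ["LD R1, X", "LD R6, Y", "ADD R1, R2, R3", "SUB R1, R5, R4"]
print(load_use_stalls(original), load_use_stalls(scheduled))  # 1 0
```

Moving the independent LD R6, Y into the slot after the first load fills the cycle that would otherwise be a stall.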
27
Control Hazard
  • Address of the instruction after a branch
    instruction is determined in M stage. Therefore,
    the next instruction fetch must be delayed until
    the branch instruction completes in M stage.
  • ADD R1, R2, R3    IF DR A  M  SR
  • JMP COND, X          IF DR A  M  SR
  • stall
  • stall
  • stall
  • next instruction                 IF DR A  M  SR
  • Branch Delay of 3 cycles
  • Value of PC is decided by the value of T, which
    selects one of the input addresses to the MUX in M
    stage: AO (branch address) or NPC (the updated PC)
  • Value of T is decided by testing the conditions
    in A stage
  • Branch address can be decided earlier if branch
    condition can be tested earlier

28
Reduction of Branch Effect
  • If calculation of Branch Address and Testing
    Condition are made earlier, Branch delay can be
    reduced.
  • Move these operations to DR stage
  • Include an Adder for branch address calculation
    in DR stage
  • Move Circuit to test the branch condition in M
    stage to DR stage
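The branch delay is simply the number of cycles between fetching the branch and knowing the next PC. The sketch below reproduces the 3 cycle delay when the PC is decided in M and the 1 cycle delay when it is decided in DR. The CPI helper and its `branch_fraction` parameter are additions for illustration, not figures from the lecture.

```python
STAGES = ["IF", "DR", "A", "M", "SR"]


def branch_delay(resolve_stage: str) -> int:
    """Stall cycles before the next fetch when the PC is decided in resolve_stage."""
    # The branch is fetched in cycle 0 and resolves at the end of cycle k,
    # so the next fetch happens in cycle k + 1 instead of cycle 1.
    return STAGES.index(resolve_stage)


def effective_cpi(branch_fraction: float, delay: int) -> float:
    """Cycles per instruction if every branch costs `delay` stall cycles."""
    return 1.0 + branch_fraction * delay


print(branch_delay("M"), branch_delay("DR"))  # 3 1
```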

29
Branch Delay: Improvement by Software Rescheduling
  • ADD R1, R2, R3    IF DR A  M  SR
  • JMP COND, X          IF DR A  M  SR
  • stall
  • next instruction           IF DR A  M  SR

Branch Delay: 1 cycle

Rescheduling
JMP COND, X         IF DR A  M  SR
ADD R1, R2, R3         IF DR A  M  SR
next instruction          IF DR A  M  SR

No branch delay. This is possible only if COND is set by an
instruction before the JMP instruction. A conditional branch
on the COND set by the ADD (now following the JMP) is not possible.
30
Branch Delay: Improvement by Hardware Branch
Predictor

Code sequence
ADD R1, R2, R3
JMP COND, X
LD R1, Y
SUB R3, R4, R5
X: ADD R1, R6, R5

Predict TAKEN, and actually TAKEN: 1 Cycle Delay
ADD R1, R2, R3      IF DR A M SR
JMP COND, X         IF DR A M SR
X: ADD R1, R6, R5   IF DR A M SR

Predict TAKEN, and actually NOT TAKEN: 1 Cycle Delay
Four instructions go through IF DR A M SR; one more instruction is fetched (IF) and then discarded

Predict NOT TAKEN, and actually NOT TAKEN: No Delay
Four instructions go through IF DR A M SR

Predict NOT TAKEN, and actually TAKEN: 1 Cycle Delay
ADD R1, R2, R3      IF DR A M SR
JMP COND, X         IF DR A M SR
LD R1, Y            IF (discarded)
X: ADD R1, R6, R5   IF DR A M SR
31
Branch Prediction Penalty
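The content of this slide is not in the transcript. As a rough illustration only, the delays labeled on the previous slide can be combined into an expected penalty per branch. The penalty table is one reading of those labels (three cases with a 1 cycle delay, and no delay when not-taken is predicted correctly), and the probability that a branch is taken, `p_taken`, is a free parameter of the sketch.

```python
# (prediction, actual outcome) -> delay in cycles; an assumed reading of the
# "1 Cycle Delay" / "No Delay" labels on the branch predictor slide.
PENALTY = {
    ("taken", "taken"): 1,
    ("taken", "not taken"): 1,
    ("not taken", "not taken"): 0,
    ("not taken", "taken"): 1,
}


def expected_penalty(prediction: str, p_taken: float) -> float:
    """Average delay per branch for a fixed prediction and a taken probability."""
    return (p_taken * PENALTY[(prediction, "taken")]
            + (1.0 - p_taken) * PENALTY[(prediction, "not taken")])


for scheme in ("taken", "not taken"):
    print(scheme, expected_penalty(scheme, p_taken=0.6))
```

Under this reading, always predicting taken costs one cycle per branch regardless of the outcome, while predicting not taken costs a cycle only for the branches that are actually taken.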