CS 4290/6290 Lecture 07: Out-of-order execution, Out-of-order completion (a.k.a. the cool stuff)

1
CS 4290/6290 Lecture 07: Out-of-order
execution, Out-of-order completion (a.k.a. the
cool stuff)
  • (Lectures based on the work of Jay Brockman,
    Sharon Hu, Randy Katz, Peter Kogge, Bill Leahy,
    Ken MacKenzie, Richard Murphy, Michael Niemier,
    and Milos Prvulovic)

2
Scheduling
  • Finds instructions to execute in each cycle
  • Static (in-order) scheduling: looks only at the
    next instruction
  • Dynamic (out-of-order) scheduling: looks at a
    window of instructions
  • How many instructions are we looking for?
  • 3-4 is typical today, 8 is in the works
  • A CPU that can ideally do N instrs per cycle is
    called N-way superscalar, N-issue
    superscalar, or simply N-way or N-issue.

3
Static Scheduling
  • Cycle 1
  • Start I1.
  • Can we also start I2? No.
  • Cycle 2
  • Start I2.
  • Can we also start I3? Yes.
  • Can we also start I4? No.
  • If the next instruction cannot start, the CPU
    stops looking for things to do in this cycle!

Program code
I1 ADD R1, R2, R3
I2 SUB R4, R1, R5
I3 AND R6, R1, R7
I4 OR R8, R2, R6
I5 XOR R10, R2, R11
4
Dynamic Scheduling
  • Cycle 1
  • Operands ready? I1, I5.
  • Start I1, I5.
  • Cycle 2
  • Operands ready? I2, I3.
  • Start I2, I3.
  • Window size (W): how many instructions ahead
    we look.
  • Do not confuse with issue width (N).
  • E.g. a 4-issue out-of-order processor can have a
    128-entry window (it can look at the next 128
    instructions).

Program code
I1 ADD R1, R2, R3
I2 SUB R4, R1, R5
I3 AND R6, R1, R7
I4 OR R8, R2, R6
I5 XOR R10, R2, R11
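
The two walkthroughs above can be reproduced with a small simulation. This is only a sketch under stated assumptions that are not on the slides: every instruction has a one-cycle latency, the issue width is 4, the window covers all five instructions, and only true (data) dependences are tracked. The names `Instr`, `true_deps`, and `schedule` are illustrative.

```python
from collections import namedtuple

Instr = namedtuple("Instr", "name dest srcs")

PROGRAM = [
    Instr("I1", "R1", ("R2", "R3")),
    Instr("I2", "R4", ("R1", "R5")),
    Instr("I3", "R6", ("R1", "R7")),
    Instr("I4", "R8", ("R2", "R6")),
    Instr("I5", "R10", ("R2", "R11")),
]

def true_deps(program):
    """For each instruction, the indices of older instructions it reads from."""
    last_writer, deps = {}, []
    for i, ins in enumerate(program):
        deps.append({last_writer[s] for s in ins.srcs if s in last_writer})
        last_writer[ins.dest] = i
    return deps

def schedule(program, width, window, in_order):
    """Return, per cycle, the names of the instructions started in that cycle."""
    deps = true_deps(program)
    start = {}                       # instruction index -> cycle it started
    pending = list(range(len(program)))
    trace, cycle = [], 1
    while pending:
        started = []
        for i in pending[:window]:   # only look at the first `window` instrs
            if len(started) == width:
                break
            # Ready if every producer started in an earlier cycle.
            ready = all(d in start and start[d] < cycle for d in deps[i])
            if ready:
                started.append(i)
            elif in_order:
                break                # static scheduling stops at the first stall
        for i in started:
            start[i] = cycle
        pending = [i for i in pending if i not in start]
        trace.append([program[i].name for i in started])
        cycle += 1
    return trace

static_trace = schedule(PROGRAM, width=4, window=5, in_order=True)
dynamic_trace = schedule(PROGRAM, width=4, window=5, in_order=False)
print(static_trace)   # [['I1'], ['I2', 'I3'], ['I4', 'I5']]
print(dynamic_trace)  # [['I1', 'I5'], ['I2', 'I3'], ['I4']]
```

Both schedules take three cycles here, but the dynamic one has already finished I5 in cycle 1, which matters once the window holds more independent work.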
5
Dynamic Scheduling Pipeline
  • Fetch: gets the next few instructions (reads the
    instruction stream in-order)
  • Decode: decodes the instructions fetched in the
    previous cycle (in-order)
  • Then we can start looking at instructions and try
    to execute them out of order.
  • Important: we fetch and decode in-order even in
    an out-of-order processor.

6
Register Renaming
  • Name dependences
  • I3 cannot go before I2 because I3 will overwrite
    R5
  • I5 cannot go before I2 because I2, when it goes,
    will overwrite R2 with a stale value
  • These are called name dependences because they
    arise from the register name, not from the flow
    of data.

Program code
I1 ADD R1, R2, R3
I2 SUB R2, R1, R5
I3 AND R5, R11, R7
I4 OR R8, R6, R2
I5 XOR R2, R4, R11
7
Register Renaming
  • Solution: give I3 some other name
    (e.g. S) for the value it produces.
  • But a later instruction that uses that value
    must also be changed to read S
  • In fact, all uses of R5 from I3 to the next
    instruction that writes to R5 again must now be
    changed to S!
  • We get rid of output dependences in the same way:
    change R2 in I5 (and subsequent instrs) to T.

Program code
I1 ADD R1, R2, R3
I2 SUB R2, R1, R5
I3 AND R5, R11, R7
I4 OR R8, R6, R2
I5 XOR R2, R4, R11
8
Register Renaming
  • Implementation
  • Space for T, S, etc.
  • How do we know when to rename a register?
  • Simple Solution
  • Do renaming in-order, just after decoding
  • Change the name of a register each time we decode
    an instruction that will write to it.
  • Remember what name we gave it

Program code
I1 ADD R1, R2, R3
I2 SUB R2, R1, R5
I3 AND S, R11, R7
I4 OR R8, R6, R2
I5 XOR T, R4, R11
9
Register Renaming Example
Renaming table
Original   Renamed
R1         T1        (destination)
R2         R2        (source)
R5         R5
R8         R8

Decoded              Renamed
I1 ADD R1, R2, R3    I1 ADD T1, R2, R3
10
Register Renaming Example
Renaming table
Original   Renamed
R1         T1        (source)
R2         T2        (destination)
R5         R5        (source)
R8         R8

Decoded              Renamed
I1 ADD R1, R2, R3    I1 ADD T1, R2, R3
I2 SUB R2, R1, R5    I2 SUB T2, T1, R5
11
Register Renaming Example
Renaming table
Original   Renamed
R1         T1
R2         T2
R5         T3        (destination)
R8         R8

Decoded              Renamed
I1 ADD R1, R2, R3    I1 ADD T1, R2, R3
I2 SUB R2, R1, R5    I2 SUB T2, T1, R5
I3 AND R5, R11, R7   I3 AND T3, R11, R7
12
Register Renaming Example
Renaming table
Original   Renamed
R1         T1
R2         T2
R5         T3
R8         T4

Decoded              Renamed
I1 ADD R1, R2, R3    I1 ADD T1, R2, R3
I2 SUB R2, R1, R5    I2 SUB T2, T1, R5
I3 AND R5, R11, R7   I3 AND T3, R11, R7
I4 OR R8, R6, R2     I4 OR T4, R6, T2
13
Register Renaming Example
Renaming table
Original   Renamed
R1         T1
R2         T5
R5         T3
R8         T4

Decoded              Renamed
I1 ADD R1, R2, R3    I1 ADD T1, R2, R3
I2 SUB R2, R1, R5    I2 SUB T2, T1, R5
I3 AND R5, R11, R7   I3 AND T3, R11, R7
I4 OR R8, R6, R2     I4 OR T4, R6, T2
I5 XOR R2, R4, R11   I5 XOR T5, R4, R11
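
The in-order renaming walked through on these slides can be sketched in a few lines. This is a minimal illustration, assuming an unlimited supply of fresh names T1, T2, ... (name recycling is not modelled); the function name `rename` and the tuple layout are illustrative choices.

```python
import itertools

def rename(program):
    """Rename destinations in program order; return (renamed program, table)."""
    table = {}                                   # original name -> current name
    fresh = (f"T{n}" for n in itertools.count(1))
    renamed = []
    for op, dest, src1, src2 in program:
        # Sources are looked up BEFORE the destination gets its new name.
        s1 = table.get(src1, src1)
        s2 = table.get(src2, src2)
        table[dest] = next(fresh)
        renamed.append((op, table[dest], s1, s2))
    return renamed, table

program = [
    ("ADD", "R1", "R2", "R3"),
    ("SUB", "R2", "R1", "R5"),
    ("AND", "R5", "R11", "R7"),
    ("OR",  "R8", "R6", "R2"),
    ("XOR", "R2", "R4", "R11"),
]
renamed, table = rename(program)
for op, dest, s1, s2 in renamed:
    print(op, dest, s1, s2)
```

Reading the sources before updating the destination is what keeps an instruction such as `ADD R4, R4, R5` correct: it must read the old name of R4 and write a new one.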
14
Register Names
  • We keep using new names
  • Each name needs a place to keep its value
  • We can have only so many of those places
  • What happens when we run out of names?
  • There must be a way to recycle names
  • When can we recycle a name?
  • When we have given its value to all instructions
    that use it as a source operand!
  • This is not as easy as it sounds

15
Implementing Dynamic Scheduling
  • Tomasulo's Algorithm
  • Used in IBM 360/91 (in the 60s)
  • Tracks when operands are available to satisfy
    data dependences
  • Removes name dependences through register
    renaming
  • Very similar to what is used today

16
Tomasulo's Algorithm: The Picture
17
Tomasulo's Algorithm: Issue
  • Get next instruction from instruction queue.
  • Find a free reservation station for it (if none
    are free, stall until one is)
  • Read operands that are in the registers
  • If the operand is not in the register, find which
    reservation station will produce it
  • In effect, this step renames registers
    (reservation station IDs are temporary names)

18
Tomasulo's Algorithm: Execute
  • Monitor results as they are produced
  • Put a result into all reservation stations
    waiting for it (missing source operand)
  • When all operands are available for an
    instruction, it is ready (we can actually execute
    it)
  • Several ready instrs for one functional unit?
  • Pick one.
  • Except for load/store: loads and stores must be
    done in the proper order to avoid hazards through
    memory

19
Tomasulo's Algorithm: Write Result
  • When result is computed, make it available on the
    common data bus (CDB), where waiting
    reservation stations can pick it up
  • Stores write to memory
  • Result stored in the register file
  • This step frees the reservation station
  • For our register renaming, this recycles the
    temporary name (future instructions can again find
    the value in the actual register, until it is
    renamed again)
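
The Issue, Execute, and Write Result steps can be sketched together as follows. This is a simplified model, not the IBM hardware: it assumes a single pool of reservation stations, integer ADD/SUB only, and it merges Execute and Write Result into one call. The class and field names (`Station`, `Tomasulo`, `vj`, `qj`, `producer`) are illustrative.

```python
from dataclasses import dataclass
from typing import Optional

OPS = {"ADD": lambda a, b: a + b, "SUB": lambda a, b: a - b}

@dataclass
class Station:
    name: str
    busy: bool = False
    op: str = ""
    vj: Optional[int] = None   # operand values, once known
    vk: Optional[int] = None
    qj: Optional[str] = None   # station that will produce a missing operand
    qk: Optional[str] = None

    def ready(self):
        return self.busy and self.qj is None and self.qk is None

class Tomasulo:
    def __init__(self, regs, n_stations=3):
        self.regs = dict(regs)
        self.producer = {}     # register -> station that will write it
        self.stations = {f"RS{i}": Station(f"RS{i}") for i in range(n_stations)}

    def _operand(self, reg):
        tag = self.producer.get(reg)
        return (None, tag) if tag else (self.regs[reg], None)

    def issue(self, op, dest, src1, src2):
        rs = next((s for s in self.stations.values() if not s.busy), None)
        if rs is None:
            return None        # no free station: stall
        rs.busy, rs.op = True, op
        rs.vj, rs.qj = self._operand(src1)
        rs.vk, rs.qk = self._operand(src2)
        self.producer[dest] = rs.name   # the station ID is the temporary name
        return rs.name

    def execute_and_write(self, name):
        rs = self.stations[name]
        assert rs.ready(), "operands still missing"
        result = OPS[rs.op](rs.vj, rs.vk)
        for other in self.stations.values():      # broadcast on the CDB
            if other.busy and other.qj == name:
                other.vj, other.qj = result, None
            if other.busy and other.qk == name:
                other.vk, other.qk = result, None
        for reg, tag in list(self.producer.items()):
            if tag == name:                       # register still waits for us
                self.regs[reg] = result
                del self.producer[reg]
        rs.busy = False                           # frees the station
        return result

cpu = Tomasulo({"R1": 0, "R2": 5, "R3": 7, "R5": 2})
i1 = cpu.issue("ADD", "R1", "R2", "R3")   # I1 ADD R1, R2, R3
i2 = cpu.issue("SUB", "R2", "R1", "R5")   # I2 SUB R2, R1, R5 (waits for I1)
waiting_on = cpu.stations[i2].qj
r1 = cpu.execute_and_write(i1)
r2 = cpu.execute_and_write(i2)
```

Note that the register file is written only if the register is still mapped to the finishing station; a later instruction that renamed the same register takes precedence.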

20
Tomasulo's Algorithm: Load/Store
  • The reservation stations take care of dependences
    through registers.
  • Dependences are also possible through memory
  • Stores cannot be reordered with respect to other
    load/store operations to the same address
  • Example
  • Can I3 execute before I2?
  • Not if R3 is -100! (Then 100(R1) and (R2) are
    the same address.)

I1 ADD R1, R2, R3
I2 ST R4, 100(R1)
I3 LD R4, (R2)
21
Tomasulo's Algorithm: Load/Store
  • Load
  • Wait for all previous stores to compute address
  • If any store to the same address, wait for it to
    actually write to memory
  • Alternatively, just get the value of the last
    such store
  • Store
  • Wait for all previous loads and stores to compute
    addresses
  • If any load/store from/to the same address, wait
    for it to read/write
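
The load rule above can be written as a small decision function. This is a sketch that takes the "alternatively" branch (forwarding the value of the last matching store); the function name `load_action` and the dictionary layout of a store are assumptions made for illustration.

```python
def load_action(load_addr, older_stores):
    """Decide what a load may do, given the older stores in program order.

    Each store is a dict with "addr" (None until computed), "value",
    and "written" (True once the store has updated memory).
    """
    if any(s["addr"] is None for s in older_stores):
        return ("wait", None)                  # some address still unknown
    matches = [s for s in older_stores
               if s["addr"] == load_addr and not s["written"]]
    if matches:
        return ("forward", matches[-1]["value"])   # last such store wins
    return ("read_memory", None)

# I1 ADD R1, R2, R3 ; I2 ST R4, 100(R1) ; I3 LD R4, (R2)
R2, R3, R4 = 500, -100, 42
store = {"addr": None, "value": R4, "written": False}
before = load_action(R2, [store])        # store address not yet computed
store["addr"] = (R2 + R3) + 100          # R1 + 100 = 500, same as the load
after = load_action(R2, [store])
```

With R3 = -100 the store and the load both address 500, so the load must either wait or take the stored value; with any other R3 it could read memory as soon as the store address is known.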

22
Tomasulo's Algorithm: Example
  • We need to have
  • Instruction status
  • Not part of HW, but having it makes our life
    easier
  • Reservation stations
  • All fields for each reservation station
  • Register status
  • Which reservation station it is renamed to

Loop: L.D    F0, 0(R1)       Load 64-bit FP value
      MUL.D  F4, F0, F2      Multiply FP
      S.D    F4, 0(R1)       Store 64-bit FP value
      DADDUI R1, R1, -8      Add (int) immediate
      BNE    R1, R2, Loop    Branch if R1 != R2
23
Branches Kill!
  • Branches are very frequent
  • Approx. 20% of all instructions
  • Can not wait until we know where it goes
  • Long pipelines
  • Branch outcome known after B cycles
  • No scheduling past the branch until outcome known
  • Superscalars (e.g. 4-way)
  • Branch every cycle or so!
  • One cycle of work, then bubbles for B cycles?

24
Surviving Branches: Prediction
  • Predict Branches
  • And predict them well!
  • Fetch, decode, etc. on the predicted path
  • Option 1: No execute until branch resolved
  • Option 2: Execute anyway (speculation)
  • Recover from mispredictions
  • Option 1: Restart fetch from correct path
  • Option 2: Arrgh! Let's look at Option 1 for now

25
Branch Prediction
  • Need to know two things
  • Whether the branch is taken or not (direction)
  • The target address if it is taken (target)
  • Direct jumps, Function calls
  • Direction known (always taken), target easy to
    compute
  • Conditional Branches (typically PC-relative)
  • Direction difficult to predict, target easy to
    compute
  • Indirect jumps, function returns
  • Direction known (always taken), target difficult

26
Branch Prediction: Direction
  • Needed for conditional branches
  • Most branches are of this type
  • Many, many kinds of predictors for this
  • Static: compiler annotation (e.g. BEQL is
    "branch if equal likely")
  • Dynamic: hardware prediction
  • Dynamic prediction is usually history-based
  • Example: predict the direction to be the same as
    the last time this branch was executed

27
One-Bit Branch Predictor
  • Index: K bits of the branch instruction address
  • Branch history table of 2^K entries, 1 bit per
    entry
  • Use this entry to predict this branch:
    0 = predict not taken, 1 = predict taken
  • When the branch direction is resolved, go back
    into the table and update the entry:
    0 if not taken, 1 if taken
28
The Bit Is Not Enough!
  • Example: short loop (8 iterations)
  • Taken 7 times, then not taken once
  • The not-taken branch is mispredicted (it was
    taken previously)
  • Execute the same loop again
  • The first branch is always mispredicted (the
    previous outcome was not taken)
  • Then 6 predicted correctly
  • Then the last one is mispredicted again
  • Each fluke in a stable pattern results in two
    mispredicts per loop
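
The loop example can be checked with a tiny model of the one-bit predictor. This is a sketch with assumed parameters: a 16-entry table (K = 4), 4-byte instructions, all entries initially 0, and an arbitrary branch address 0x40. `OneBitPredictor` and `loop_mispredicts` are illustrative names.

```python
class OneBitPredictor:
    def __init__(self, k_bits=4):
        self.mask = (1 << k_bits) - 1
        self.table = [0] * (1 << k_bits)     # 0 = predict not taken

    def _index(self, pc):
        return (pc >> 2) & self.mask         # assumes 4-byte instructions

    def predict(self, pc):
        return self.table[self._index(pc)] == 1

    def update(self, pc, taken):
        self.table[self._index(pc)] = 1 if taken else 0

def loop_mispredicts(predictor, pc, passes, iterations=8):
    """Run the loop-closing branch `passes` times; count misses per pass."""
    per_pass = []
    for _ in range(passes):
        misses = 0
        for i in range(iterations):
            taken = i < iterations - 1       # taken 7 times, then not taken
            if predictor.predict(pc) != taken:
                misses += 1
            predictor.update(pc, taken)
        per_pass.append(misses)
    return per_pass

one_bit_result = loop_mispredicts(OneBitPredictor(), pc=0x40, passes=3)
print(one_bit_result)  # [2, 2, 2]
```

Every pass through the loop costs two mispredicts: one on the exit and one on re-entry.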

29
Two Bits are Better Than One
  • Two-Bit Predictor
  • First bit is the prediction
  • Second bit tells if it is strong or weak
  • A mispredict will
  • Weaken a strong prediction
  • Change a weak prediction to the opposite
    strong prediction
  • Correct prediction will
  • Strengthen a weak prediction
  • Leave strong predictions strong
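
The update rules above translate directly into a two-field state. The sketch below follows the rules exactly as listed on this slide (a mispredicted weak prediction flips to the opposite strong one), which differs slightly from the common saturating-counter variant. The initial state (strong not-taken) and the names are assumptions.

```python
class TwoBitPredictor:
    """One (prediction, strength) pair, updated by the rules listed above."""
    def __init__(self):
        self.taken = False     # first bit: the prediction
        self.strong = True     # second bit: strong or weak

    def predict(self):
        return self.taken

    def update(self, outcome):
        if outcome == self.taken:
            self.strong = True             # strengthen, or stay strong
        elif self.strong:
            self.strong = False            # weaken a strong prediction
        else:
            self.taken = outcome           # flip a weak prediction ...
            self.strong = True             # ... to the opposite strong one

def loop_mispredicts(predictor, passes, iterations=8):
    per_pass = []
    for _ in range(passes):
        misses = 0
        for i in range(iterations):
            taken = i < iterations - 1     # taken 7 times, then not taken
            if predictor.predict() != taken:
                misses += 1
            predictor.update(taken)
        per_pass.append(misses)
    return per_pass

two_bit_result = loop_mispredicts(TwoBitPredictor(), passes=3)
print(two_bit_result)  # [3, 1, 1]
```

After the first pass trains the predictor, only the loop exit is mispredicted, so the steady-state cost drops from two mispredicts per loop to one.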

30
Still Not Good Enough
We can live with these
These are good
This is bad!
31
(N,M) Correlating Predictors
  • Branch outcome correlates with the outcome of
    some recently executed branches
  • Use this in our prediction
  • Keep N bits of history of recent outcomes
  • Use a different M-bit predictor for each
    different history
  • Note: N-bit history means 2^N different
    predictors for each branch

32
The gShare Predictor
  • Correlating predictors often wasteful
  • Some histories are rare or even impossible
  • Yet we dedicate a predictor for each history
  • Solution: hashing
  • Use a single large predictor table
  • Hash history and branch address together
  • Use the hash to index into the table
  • The hash is just an XOR, so it's fast

33
The gShare Predictor
  • Index = (K bits of branch instruction address)
    XOR (N bits of global branch history)
  • Table of 2-bit predictors with 2^max(N,K)
    entries
  • The indexed entry gives the prediction
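
A minimal gShare sketch follows. It assumes K = N = 4, 4-byte instructions, and 2-bit saturating counters initialised to "weakly not taken"; the saturating counter is a common choice swapped in here for brevity, not necessarily the exact two-bit scheme of the earlier slide. The class name `GShare` is illustrative.

```python
class GShare:
    def __init__(self, k_bits=4, n_bits=4):
        self.k_mask = (1 << k_bits) - 1
        self.n_mask = (1 << n_bits) - 1
        # 2^max(N,K) two-bit counters: 0,1 predict not taken; 2,3 predict taken
        self.table = [1] * (1 << max(k_bits, n_bits))
        self.history = 0                       # global branch history

    def _index(self, pc):
        return ((pc >> 2) & self.k_mask) ^ (self.history & self.n_mask)

    def predict(self, pc):
        return self.table[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        c = self.table[i]
        self.table[i] = min(3, c + 1) if taken else max(0, c - 1)
        self.history = ((self.history << 1) | int(taken)) & self.n_mask

# A branch that alternates taken / not taken defeats a plain per-branch
# counter, but the history bits let gShare tell the two cases apart.
predictor = GShare()
late_misses = 0
for step in range(40):
    taken = step % 2 == 0
    if step >= 20 and predictor.predict(0x40) != taken:
        late_misses += 1
    predictor.update(0x40, taken)
print(late_misses)  # 0
```

Because the index is a plain XOR, rare or impossible histories do not reserve table space; they simply never get indexed.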
34
The pShare Predictor
  • Similar to gShare, but uses local history

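  • K bits of the branch address index a table of
    local histories with 2^K entries; each entry has
    N bits
  • L bits of the branch address are XORed with the
    N-bit local history to form the index into the
    predictor table
  • Table of 2-bit predictors with 2^max(N,L)
    entries; the indexed entry gives the prediction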
35
Why pShare is Good?
  • Long local history (e.g. 10 bits) used to choose
    the actual predictor for the branch
  • Back to our 8-iteration loop example
  • The 8th (not taken) branch would always have a
    history of 1101111111
  • All other seven instances of this branch in that
    loop have different histories
  • So, after a few passes through the loop to
    train the predictors, we have perfect
    prediction each time

36
Why pShare is Bad?
  • Needs a lot of branch instances to train the
    different 2-bit predictors
  • Simple 2-bit predictor
  • Has a prediction after it sees one instance of a
    branch
  • The pShare predictor
  • Has a prediction after it sees an instance of
    that branch and that particular history
  • Back to our loop example
  • pShare needs two entire 8-iteration loops to
    warm up
  • Starts making useful predictions only when we
    enter the same loop for the second time
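
The warm-up cost can be observed in a small pShare model. This sketch assumes K = L = 4, a 10-bit local history, 4-byte instructions, 2-bit saturating counters initialised to "weakly not taken", and an all-zero initial history; other initial conditions shift the exact counts. `PShare` and `loop_mispredicts` are illustrative names.

```python
class PShare:
    def __init__(self, k_bits=4, l_bits=4, n_bits=10):
        self.k_mask = (1 << k_bits) - 1
        self.l_mask = (1 << l_bits) - 1
        self.n_mask = (1 << n_bits) - 1
        self.histories = [0] * (1 << k_bits)              # N bits per entry
        self.counters = [1] * (1 << max(n_bits, l_bits))  # 2-bit counters

    def _slots(self, pc):
        h_slot = (pc >> 2) & self.k_mask
        c_slot = ((pc >> 2) & self.l_mask) ^ self.histories[h_slot]
        return h_slot, c_slot

    def predict(self, pc):
        return self.counters[self._slots(pc)[1]] >= 2

    def update(self, pc, taken):
        h_slot, c_slot = self._slots(pc)
        c = self.counters[c_slot]
        self.counters[c_slot] = min(3, c + 1) if taken else max(0, c - 1)
        self.histories[h_slot] = (
            (self.histories[h_slot] << 1) | int(taken)) & self.n_mask

def loop_mispredicts(predictor, pc, passes, iterations=8):
    per_pass = []
    for _ in range(passes):
        misses = 0
        for i in range(iterations):
            taken = i < iterations - 1       # taken 7 times, then not taken
            if predictor.predict(pc) != taken:
                misses += 1
            predictor.update(pc, taken)
        per_pass.append(misses)
    return per_pass

pshare_result = loop_mispredicts(PShare(), pc=0x40, passes=5)
print(pshare_result)  # [7, 7, 1, 0, 0]
```

Under these assumptions the first two passes are almost entirely mispredicted while each distinct history trains its own counter; after that the loop exit, whose history is 1101111111, is predicted correctly every time.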

37
Tournament Predictors
  • No predictor is clearly the best
  • Simple 2-bit warms up quickly and uses only 2
    bits per branch
  • pShare uses many bits per branch, but tends to be
    much better after warming up
  • Idea: let's have a predictor to predict which
    predictor will predict better!

38
Direction Predictor Accuracy
39
Target Address Prediction
  • Branch Target Buffer
  • The IF stage needs to know the fetch address
    every cycle
  • Need the target address one cycle after fetching
    a branch
  • For some branches (e.g. indirect) the target is
    known only after the EX stage, which is way too
    late
  • Even easily-computed branch targets need to wait
    until the instruction is decoded and the direction
    predicted in the ID stage (still at least one
    cycle too late)
  • So, we have a quick-and-dirty predictor for the
    target that only needs the address of the branch
    instruction

40
Branch Target Buffer
  • BTB indexed by instruction address
  • We don't even know if it is a branch!
  • If address matches a BTB entry, it is predicted
    to be a branch
  • BTB entry tells whether it is taken (direction)
    and where it goes if taken
  • BTB takes only the instruction address, so while
    we fetch one instruction in the IF stage we are
    predicting where to fetch the next one from
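
A BTB lookup can be sketched as below. The organisation is an assumption made for illustration: a direct-mapped table of 16 entries, the full instruction address stored as the tag, 4-byte instructions, and fall-through to pc + 4 on a miss. The names `BranchTargetBuffer`, `next_fetch`, and `update` are hypothetical.

```python
class BranchTargetBuffer:
    def __init__(self, entries=16):
        self.entries = entries
        self.slots = [None] * entries       # each slot: (pc, taken, target)

    def _slot(self, pc):
        return (pc >> 2) % self.entries     # assumes 4-byte instructions

    def next_fetch(self, pc):
        """Predict the next fetch address from the instruction address alone."""
        entry = self.slots[self._slot(pc)]
        if entry is not None and entry[0] == pc and entry[1]:
            return entry[2]                 # predicted-taken branch: go to target
        return pc + 4                       # not a known taken branch

    def update(self, pc, taken, target):
        """Record a resolved branch so later fetches of pc hit in the BTB."""
        self.slots[self._slot(pc)] = (pc, taken, target)

btb = BranchTargetBuffer()
cold = btb.next_fetch(0x1008)               # nothing known yet: fall through
btb.update(0x1008, taken=True, target=0x0FF0)
warm = btb.next_fetch(0x1008)               # now predicted taken
other = btb.next_fetch(0x100C)              # different address: no match
```

The lookup needs nothing but the fetch address, which is why it can run in the IF stage in parallel with the instruction fetch itself.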

41
Branch Target Buffer
42
Return Address Stack (RAS)
  • Function returns are frequent, yet
  • Address is difficult to compute (have to wait
    until EX stage done to know it)
  • Address difficult to predict with BTB (function
    can be called from multiple places)
  • But return address is actually easy to predict
  • It is the address after the last call
    instruction that we haven't returned from yet
  • Hence the Return Address Stack

43
Return Address Stack (RAS)
  • Call pushes return address into the RAS
  • When a return instruction is decoded, pop the
    predicted return address from RAS
  • Accurate prediction even w/ small RAS
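
The push/pop behaviour can be sketched with a bounded stack. The size (8 entries), the policy of dropping the oldest entry on overflow, and 4-byte instructions are assumptions for illustration, as are the method names.

```python
from collections import deque

class ReturnAddressStack:
    def __init__(self, size=8):
        # A bounded deque: when full, appending drops the oldest entry.
        self.stack = deque(maxlen=size)

    def on_call(self, call_pc, instr_bytes=4):
        self.stack.append(call_pc + instr_bytes)   # address after the call

    def on_return(self):
        """Predicted return address, or None if the stack is empty."""
        return self.stack.pop() if self.stack else None

ras = ReturnAddressStack()
ras.on_call(0x100)          # main calls f
ras.on_call(0x200)          # f calls g
first = ras.on_return()     # g returns into f
second = ras.on_return()    # f returns into main
third = ras.on_return()     # nothing left to predict
```

Because calls and returns nest, even a small stack predicts most returns; only call chains deeper than the stack lose their oldest entries.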

44
Life Story of a Branch
  • BTB predicts next address in IF stage
  • Later, after decoding we can get a second
    prediction from RAS or direction predictor
  • These are usually better than BTB, so if they say
    differently, we make bubbles and restart fetch
    from new prediction
  • Eventually, the actual branch outcome becomes
    known. If it differs from the prediction, we
    make bubbles and restart fetch again

45
Speculation
  • Predict branches, then do everything (execute,
    write result, schedule instructions)
  • What do we do when we mispredict?
  • Two things
  • Allow things-before-the-branch to complete
  • Undo things-after-the-branch we have completed
  • Solution
  • At the end, put instructions in the correct order
    again

46
Speculation: Pipeline
  • New structure: Reorder Buffer (ROB)
  • Queues instructions in the original order
  • Use ROB entry number as name in renaming
  • ROB entry keeps the result after Write Result
  • New stage: Commit
  • Takes the oldest instruction in ROB
  • If the instruction has executed and its result is
    in the ROB entry
  • Write result to registers
  • Free the ROB entry
  • Do this N times per cycle in an N-way superscalar

47
Recovery From a Misprediction
  • Mispredicted branch eventually committed
  • Now precise state is in the registers
  • Everything before the branch done and in regs
  • Nothing after the branch is in regs yet
  • Flush all the other structures
  • Reservation stations, ROB, instruction queue
  • Restart fetch from correct destination
  • Precise exceptions? Same thing!
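
In-order commit and misprediction recovery can be sketched together. This is a simplified model: entries are plain dictionaries, the `mispredicted` flag is passed at issue time for brevity (real hardware learns it when the branch executes), and flushing only clears the ROB. All names are illustrative.

```python
from collections import deque

class ReorderBuffer:
    def __init__(self, regs):
        self.regs = dict(regs)      # committed (precise) register state
        self.entries = deque()      # oldest instruction on the left

    def issue(self, dest=None, mispredicted=False):
        entry = {"dest": dest, "value": None, "done": False,
                 "mispredicted": mispredicted}
        self.entries.append(entry)  # queue in original program order
        return entry

    def write_result(self, entry, value=None):
        entry["value"], entry["done"] = value, True

    def commit(self, width=1):
        """Commit up to `width` finished instructions from the head."""
        committed = 0
        while committed < width and self.entries and self.entries[0]["done"]:
            entry = self.entries.popleft()
            if entry["dest"] is not None:
                self.regs[entry["dest"]] = entry["value"]
            committed += 1
            if entry["mispredicted"]:
                self.entries.clear()    # flush everything after the branch
                break
        return committed

rob = ReorderBuffer({"R1": 0, "R2": 0})
i1 = rob.issue("R1")                    # before the branch
br = rob.issue(mispredicted=True)       # the mispredicted branch
wrong = rob.issue("R2")                 # wrong-path instruction
rob.write_result(wrong, 99)             # finishes first, out of order
first = rob.commit(width=4)             # head (i1) not done: nothing commits
rob.write_result(i1, 5)
rob.write_result(br)
second = rob.commit(width=4)            # commits i1 and the branch, then flushes
```

The wrong-path result 99 never reaches the registers, because it sits behind the branch in the ROB and is discarded when the branch commits.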

48
Speculation: Stores
  • ROB takes over the role of the store queue
  • Stores go to memory when they commit
  • Commit is in-order, so store order is correct
  • Mispredictions do not affect memory state

49
Speculation: The Picture
50
ROB vs. Register Renaming
  • How many ports do we need for the ROB?
  • Lots! Look at a single-issue processor
  • Issue: read two entries and write one
  • Write Result: write one entry
  • Commit: read and write one entry
  • ROB has a dual role
  • Keeps results (names)
  • Keeps order
  • Let's split the two roles

51
ROB vs. Register Renaming
  • Keeping results: physical registers
  • Have a large physical register file
  • Keep architected-to-physical mapping in a table
  • Physical registers hold all values (names)
  • Keeping order: simplified ROB
  • Only keeps info needed to commit instructions
  • Reservation stations also simplified
  • No need to keep values
  • Called instruction window instead of RS

52
How does it work?
  • Rename
  • Find in the rename RAT (Register Allocation
    Table) which physical registers are sources
  • Get a free physical register for destination and
    change rename RAT
  • Dispatch
  • Wait in window until all source registers have
    values, then
  • Read source values from registers
  • Write Result
  • Send result to destination register
  • Send destination register number to window

53
Committing
  • Wait until oldest instruction done
  • Change commit RAT
  • Before it said Rn is in Pj
  • Now change it so Rn is in Pk (the destination)
  • Free physical register Pj
  • Everything that wants Pj is already committed
  • All future uses of Rn should use Pk

54
Recovering Precise State
  • To get precise state after instruction X, we
  • Wait until X commits
  • The commit RAT is the precise state
  • E.g. recovery from branch misprediction
  • Wait until X commits
  • Rename map = commit map
  • Flush window and ROB, restart fetch
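
The rename RAT, commit RAT, and free list can be sketched together. The assumptions here are not stated on the slides: architectural register Ri initially maps to physical register Pi, the remaining physical registers form a FIFO free list, and recovery rebuilds the free list from the commit RAT. The class and method names are illustrative.

```python
from collections import deque

class Renamer:
    def __init__(self, n_arch=8, n_phys=16):
        self.all_phys = [f"P{i}" for i in range(n_phys)]
        self.rename_rat = {f"R{i}": f"P{i}" for i in range(n_arch)}
        self.commit_rat = dict(self.rename_rat)
        self.free = deque(self.all_phys[n_arch:])

    def rename(self, dest, srcs):
        """Look up sources, then give the destination a free physical register."""
        phys_srcs = [self.rename_rat[s] for s in srcs]
        phys_dest = self.free.popleft()     # IndexError here would mean "stall"
        self.rename_rat[dest] = phys_dest
        return phys_dest, phys_srcs

    def commit(self, dest, phys_dest):
        """Rn was in Pj; now Rn is in Pk, and Pj can be reused."""
        old = self.commit_rat[dest]
        self.commit_rat[dest] = phys_dest
        self.free.append(old)

    def recover(self):
        """Rename map = commit map; uncommitted registers become free again."""
        self.rename_rat = dict(self.commit_rat)
        in_use = set(self.commit_rat.values())
        self.free = deque(p for p in self.all_phys if p not in in_use)

r = Renamer()
xor_dest, xor_srcs = r.rename("R0", ["R0", "R0"])   # XOR R0, R0, R0
ld_dest, ld_srcs = r.rename("R1", ["R0"])           # LD.IMM R1, 416(R0)
r.commit("R0", xor_dest)                            # frees the old P0
r.recover()                                         # discard uncommitted LD.IMM
```

With 8 architectural and 16 physical registers, the first two renames hand out P8 and P9, and committing the XOR returns P0 to the free list.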

55
Reading Assignment
  • J. E. Smith and A. R. Pleszkun, "Implementing
    Precise Interrupts in Pipelined Processors", IEEE
    Transactions on Computers, 37(5), pages 562-573,
    May 1988.
  • How to get the paper: http://gtel.gatech.edu:2051/Xplore/DynWel.jsp
  • Then log on with your GT user/pass
  • Search in Journals & Magazines for "Computers",
    select ToC
  • Find the year 1988, the May issue

56
Register Renaming Example
  • 8 architectural (logical) registers R0..R7
  • 16 physical registers (numbered 0..15),
    6-instruction window
  • Single-issue, nine-stage pipeline
  • Fetch (also use BTB to predict next fetch addr)
  • Decode
  • Rename and put in instruction window
  • Also use RAS and direction predictor, calculate
    target address if not indirect
  • Schedule
  • Instruction stays in schedule stage until
    operands ready
  • Read Operands
  • Execute
  • Also calculate target address if indirect
  • Read Memory
  • Write Result
  • Commit
  • Instruction stays in commit stage until it can
    actually commit

57
Register Renaming Example
Program code
I1 XOR    R0, R0, R0
I2 LD.IMM R1, 416(R0)
I3 LD.IMM R2, 4(R0)
I4 LD.IMM R3, 400(R0)
I5 AND    R4, R0, R0
I6 LD     R5, 0(R3)
I7 ADD    R4, R4, R5
I8 ADD    R3, R3, R2
I9 BNE    R3, R1, -12(PC)
Cycle 3: Rename I1
(Figure: instruction window, rename table, and pipeline state)
58
Register Renaming Example
End of Cycle 3
(Figure: instruction window, rename table, and pipeline state)
59
Register Renaming Example
Cycle 4: Renamed I2
(Figure: instruction window, rename table, and pipeline state)
60
Register Renaming Example
Cycle 4: Schedule (the XOR is scheduled)
(Figure: instruction window, rename table, and pipeline state)
61
Register Renaming Example
Cycle 5: Renamed I3
(Figure: instruction window, rename table, and pipeline state)
62
Register Renaming Example
Cycle 5: Schedule
(Figure: instruction window, rename table, and pipeline state)
63
Register Renaming Example
Cycle 5: I1 Reads Regs
(Figure: instruction window, rename table, and pipeline state)
64
Register Renaming Example
Cycle 6: Renamed I4
(Figure: instruction window, rename table, and pipeline state)
65
Register Renaming Example
Cycle 7: Renamed I5, scheduled nothing, then I1
writes result
(Figure: instruction window, rename table, and pipeline state)
66
Register Renaming Example
Cycle 8: Renamed I6
(Figure: instruction window, rename table, and pipeline state)
67
Register Renaming Example
Cycle 8: Schedule. I2..I5 can be scheduled, pick I2
(Figure: instruction window, rename table, and pipeline state)
68
Register Renaming Example
Cycle 8: Commit
(Figure: instruction window, rename table, and pipeline state)
69
Register Renaming Example
Cycle 8: After Commit
(Figure: instruction window, rename table, and pipeline state)