Title: CS 4290/6290 Lecture 07: Out-of-order execution, Out-of-order completion (a.k.a. the cool stuff)
1 CS 4290/6290 Lecture 07: Out-of-order execution, Out-of-order completion (a.k.a. the cool stuff)
- (Lectures based on the work of Jay Brockman, Sharon Hu, Randy Katz, Peter Kogge, Bill Leahy, Ken MacKenzie, Richard Murphy, Michael Niemier, and Milos Prvulovic)
2 Scheduling
- Finds instructions to execute in each cycle
- Static (in-order) scheduling looks only at the next instruction
- Dynamic (out-of-order) scheduling looks at a window of instructions
- How many instructions are we looking for?
  - 3-4 is typical today, 8 is in the works
- A CPU that can ideally do N instrs per cycle is called N-way superscalar, N-issue superscalar, or simply N-way or N-issue.
3 Static Scheduling
- Cycle 1
  - Start I1.
  - Can we also start I2? No (I2 needs R1, which I1 is still producing).
- Cycle 2
  - Start I2.
  - Can we also start I3? Yes.
  - Can we also start I4? No (I4 needs R6 from I3).
- If the next instruction cannot start, the processor stops looking for things to do in this cycle!
Program code
I1 ADD R1, R2, R3
I2 SUB R4, R1, R5
I3 AND R6, R1, R7
I4 OR R8, R2, R6
I5 XOR R10, R2, R11
4 Dynamic Scheduling
- Cycle 1
  - Operands ready? I1, I5.
  - Start I1, I5.
- Cycle 2
  - Operands ready? I2, I3.
  - Start I2, I3.
- Window size (W): how many instructions ahead do we look.
  - Do not confuse with issue width (N).
  - E.g. a 4-issue out-of-order processor can have a 128-entry window (it can look at the next 128 instructions).
Program code
I1 ADD R1, R2, R3
I2 SUB R4, R1, R5
I3 AND R6, R1, R7
I4 OR R8, R2, R6
I5 XOR R10, R2, R11
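The two schedules above can be reproduced with a small simulation. This is only a sketch under stated assumptions that are not in the slides: every instruction takes one cycle, a result becomes usable in the cycle after its producer starts, the issue width is 4, and the window covers the whole five-instruction program. The function name `schedule` and the tuple layout are made up for illustration.

```python
# Each instruction: (name, destination register, source registers)
PROGRAM = [
    ("I1", "R1", ["R2", "R3"]),
    ("I2", "R4", ["R1", "R5"]),
    ("I3", "R6", ["R1", "R7"]),
    ("I4", "R8", ["R2", "R6"]),
    ("I5", "R10", ["R2", "R11"]),
]

def schedule(program, width, in_order):
    """Return one list of started instruction names per cycle."""
    pending = list(program)
    cycles = []
    while pending:
        started = []
        unfinished_dests = set()  # destinations of older instructions not yet complete
        for instr in pending:
            name, dest, srcs = instr
            ready = not any(s in unfinished_dests for s in srcs)
            if ready and len(started) < width:
                started.append(instr)
            elif in_order:
                break             # static: stop at the first instruction that cannot start
            # dynamic: skip this instruction and keep looking further into the window
            unfinished_dests.add(dest)
        pending = [i for i in pending if i not in started]
        cycles.append([name for name, _, _ in started])
    return cycles

print(schedule(PROGRAM, width=4, in_order=True))
print(schedule(PROGRAM, width=4, in_order=False))
```

Under these assumptions the static run starts I1, then I2 and I3, then I4 and I5, while the dynamic run starts I1 and I5 together in the first cycle.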
5 Dynamic Scheduling Pipeline
- Fetch: gets the next few instructions (reads the instruction stream in-order)
- Decode: decodes the instructions fetched in the previous cycle (in-order)
- Then we can start looking at instructions and try to execute them out of order.
- Important: we fetch and decode in-order even in an out-of-order processor.
6 Register Renaming
- Name dependences
  - I3 cannot go before I2 because I3 will overwrite R5, which I2 still needs to read
  - I5 cannot go before I2 because I2, when it goes, will overwrite R2 with a stale value
- They are called name dependences because the dependence comes from the register name, not from the flow of data.
Program code
I1 ADD R1, R2, R3
I2 SUB R2, R1, R5
I3 AND R5, R11, R7
I4 OR R8, R6, R2
I5 XOR R2, R4, R11
7 Register Renaming
- Solution: give I3 some other name (e.g. S) for the value it produces.
- But any later instruction that uses that value must then read S instead.
- In fact, all uses of R5 from I3 to the next instruction that writes to R5 again must now be changed to S!
- We get rid of output dependences in the same way: change R2 in I5 (and subsequent instrs) to T.
Program code
I1 ADD R1, R2, R3
I2 SUB R2, R1, R5
I3 AND R5, R11, R7
I4 OR R8, R6, R2
I5 XOR R2, R4, R11
8 Register Renaming
- Implementation
  - Space for T, S, etc.
  - How do we know when to rename a register?
- Simple solution
  - Do renaming in-order, just after decoding
  - Change the name of a register each time we decode an instruction that will write to it.
  - Remember what name we gave it
Program code
I1 ADD R1, R2, R3
I2 SUB R2, R1, R5
I3 AND S, R11, R7
I4 OR R8, R6, R2
I5 XOR T, R4, R11
9 Register Renaming Example
Renaming table
Original  Renamed
R1        T1       (destination)
R2        R2       (source)
R5        R5
R8        R8
Decoded              Renamed
I1 ADD R1, R2, R3    I1 ADD T1, R2, R3
10 Register Renaming Example
Renaming table
Original  Renamed
R1        T1       (source)
R2        T2       (destination)
R5        R5       (source)
R8        R8
Decoded              Renamed
I1 ADD R1, R2, R3    I1 ADD T1, R2, R3
I2 SUB R2, R1, R5    I2 SUB T2, T1, R5
11 Register Renaming Example
Renaming table
Original  Renamed
R1        T1
R2        T2
R5        T3       (destination)
R8        R8
Decoded              Renamed
I1 ADD R1, R2, R3    I1 ADD T1, R2, R3
I2 SUB R2, R1, R5    I2 SUB T2, T1, R5
I3 AND R5, R11, R7   I3 AND T3, R11, R7
12 Register Renaming Example
Renaming table
Original  Renamed
R1        T1
R2        T2
R5        T3
R8        T4
Decoded              Renamed
I1 ADD R1, R2, R3    I1 ADD T1, R2, R3
I2 SUB R2, R1, R5    I2 SUB T2, T1, R5
I3 AND R5, R11, R7   I3 AND T3, R11, R7
I4 OR R8, R6, R2     I4 OR T4, R6, T2
13 Register Renaming Example
Renaming table
Original  Renamed
R1        T1
R2        T5
R5        T3
R8        T4
Decoded              Renamed
I1 ADD R1, R2, R3    I1 ADD T1, R2, R3
I2 SUB R2, R1, R5    I2 SUB T2, T1, R5
I3 AND R5, R11, R7   I3 AND T3, R11, R7
I4 OR R8, R6, R2     I4 OR T4, R6, T2
I5 XOR R2, R4, R11   I5 XOR T5, R4, R11
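The renaming walk-through above can be written as a short loop. This is a minimal sketch, assuming an unlimited supply of temporary names T1, T2, ... and a plain dictionary as the renaming table; the function name `rename` and the tuple layout are illustrative only.

```python
def rename(program):
    """Rename destinations in program order; return (renamed program, final table)."""
    table = {}       # original register -> current temporary name
    renamed = []
    next_temp = 1
    for op, dest, src1, src2 in program:
        # Sources read the current mapping (or keep the original name if never renamed).
        s1 = table.get(src1, src1)
        s2 = table.get(src2, src2)
        # Every decoded instruction that writes a register gets a fresh name for it.
        new_name = f"T{next_temp}"
        next_temp += 1
        table[dest] = new_name
        renamed.append((op, new_name, s1, s2))
    return renamed, table

PROGRAM = [
    ("ADD", "R1", "R2", "R3"),
    ("SUB", "R2", "R1", "R5"),
    ("AND", "R5", "R11", "R7"),
    ("OR", "R8", "R6", "R2"),
    ("XOR", "R2", "R4", "R11"),
]

renamed, table = rename(PROGRAM)
for op, dest, s1, s2 in renamed:
    print(op, dest, s1, s2)
```

Sources are looked up before the destination is renamed, so an instruction such as `ADD R2, R2, R3` would read the old name of R2 and write the new one.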
14 Register Names
- We keep using new names
  - Each name needs a place to keep its value
  - We can have only so many of those places
  - What happens when we run out of names?
- There must be a way to recycle names
- When can we recycle a name?
  - When we have given its value to all instructions that use it as a source operand!
  - This is not as easy as it sounds
15 Implementing Dynamic Scheduling
- Tomasulo's Algorithm
  - Used in the IBM 360/91 (in the 60s)
  - Tracks when operands are available to satisfy data dependences
  - Removes name dependences through register renaming
  - Very similar to what is used today
16 Tomasulo's Algorithm: The Picture
17 Tomasulo's Algorithm: Issue
- Get next instruction from instruction queue.
- Find a free reservation station for it (if none are free, stall until one is)
- Read operands that are in the registers
- If the operand is not in the register, find which reservation station will produce it
- In effect, this step renames registers (reservation station IDs are temporary names)
18 Tomasulo's Algorithm: Execute
- Monitor results as they are produced
- Put a result into all reservation stations waiting for it (missing source operand)
- When all operands are available for an instruction, it is ready (we can actually execute it)
- Several ready instrs for one functional unit?
  - Pick one.
  - Except for load/store: loads and stores must be done in the proper order to avoid hazards through memory
19 Tomasulo's Algorithm: Write Result
- When result is computed, make it available on the common data bus (CDB), where waiting reservation stations can pick it up
- Stores write to memory
- Result stored in the register file
- This step frees the reservation station
- For our register renaming, this recycles the temporary name (future instructions can again find the value in the actual register, until it is renamed again)
20 Tomasulo's Algorithm: Load/Store
- The reservation stations take care of dependences through registers.
- Dependences are also possible through memory
  - Stores cannot be reordered with respect to other load/store operations to the same address
- Example
  - Can I3 execute before I2?
  - Not if R3 is -100! Then I2 stores to 100 + R1 = 100 + R2 + R3 = R2, the same address I3 loads from.
I1 ADD R1, R2, R3
I2 ST R4, 100(R1)
I3 LD R4, (R2)
21 Tomasulo's Algorithm: Load/Store
- Load
  - Wait for all previous stores to compute address
  - If any store to the same address, wait for it to actually write to memory
  - Alternatively, just get the value of the last such store
- Store
  - Wait for all previous loads and stores to compute addresses
  - If any load/store from/to the same address, wait for it to read/write
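The load and store rules above amount to a check against all older memory operations. The sketch below is one possible reading, not the hardware's actual structure: the `MemOp` record and `may_proceed` function are hypothetical names, and it implements the "wait for the store to write" variant rather than forwarding the stored value.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MemOp:
    kind: str                # "LD" or "ST"
    address: Optional[int]   # None until the address has been computed
    done: bool = False       # True once the op has actually read/written memory

def may_proceed(ops, index):
    """Can ops[index] access memory now? ops is listed in program order."""
    op = ops[index]
    if op.address is None:
        return False
    for older in ops[:index]:
        # Loads only check older stores; stores check older loads and stores.
        if op.kind == "LD" and older.kind != "ST":
            continue
        if older.address is None:
            return False     # the older address is still unknown: wait
        if older.address == op.address and not older.done:
            return False     # same address: wait for the older access to finish
    return True

# A store followed by a load to address 500; the store address is not known yet.
ops = [MemOp("ST", None), MemOp("LD", 500)]
print(may_proceed(ops, 1))   # the load must wait
```

Once the store's address turns out to be different, or the store has written memory, the same call lets the load go ahead.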
22 Tomasulo's Algorithm: Example
- We need to have
  - Instruction status
    - Not part of HW, but having it makes our life easier
  - Reservation stations
    - All fields for each reservation station
  - Register status
    - Which reservation station it is renamed to
Loop: L.D    F0, 0(R1)     ; Load 64-bit FP value
      MUL.D  F4, F0, F2    ; Multiply FP
      S.D    F4, 0(R1)     ; Store 64-bit FP value
      DADDUI R1, R1, -8    ; Add (int) immediate
      BNE    R1, R2, Loop  ; Branch if R1 != R2
23 Branches Kill!
- Branches are very frequent
  - Approx. 20% of all instructions
- Cannot wait until we know where it goes
  - Long pipelines
    - Branch outcome known after B cycles
    - No scheduling past the branch until outcome known
  - Superscalars (e.g. 4-way)
    - Branch every cycle or so!
    - One cycle of work, then bubbles for B cycles?
24 Surviving Branches: Prediction
- Predict branches
  - And predict them well!
- Fetch, decode, etc. on the predicted path
  - Option 1: No execute until branch resolved
  - Option 2: Execute anyway (speculation)
- Recover from mispredictions
  - Option 1: Restart fetch from correct path
  - Option 2: Arrgh! Let's look at Option 1 for now
25 Branch Prediction
- Need to know two things
  - Whether the branch is taken or not (direction)
  - The target address if it is taken (target)
- Direct jumps, function calls
  - Direction known (always taken), target easy to compute
- Conditional branches (typically PC-relative)
  - Direction difficult to predict, target easy to compute
- Indirect jumps, function returns
  - Direction known (always taken), target difficult to predict
26 Branch Prediction: Direction
- Needed for conditional branches
  - Most branches are of this type
- Many, many kinds of predictors for this
  - Static: compiler annotation (e.g. BEQL is "branch if equal likely")
  - Dynamic: hardware prediction
- Dynamic prediction usually history-based
  - Example: predict direction is the same as the last time this branch was executed
27 One-Bit Branch Predictor
- Index: K bits of branch instruction address
- Branch history table of 2^K entries, 1 bit per entry
- Use this entry to predict this branch: 0 = predict not taken, 1 = predict taken
- When branch direction resolved, go back into the table and update entry: 0 if not taken, 1 if taken
28 The Bit Is Not Enough!
- Example: short loop (8 iterations)
  - Taken 7 times, then not taken once
  - Not-taken mispredicted (was taken previously)
- Execute the same loop again
  - First always mispredicted (previous outcome was not taken)
  - Then 6 predicted correctly
  - Then last one mispredicted again
- Each fluke in a stable pattern results in two mispredicts per loop
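The loop example above is easy to check by simulating a single 1-bit entry. This sketch assumes the entry starts at "taken" and is not shared with any other branch; `one_bit_mispredicts` and `LOOP` are illustrative names.

```python
def one_bit_mispredicts(outcomes, initial=True):
    """Count mispredictions of one 1-bit entry over a list of outcomes (True = taken)."""
    bit = initial
    misses = 0
    for taken in outcomes:
        if bit != taken:
            misses += 1
        bit = taken          # update: remember only the last direction
    return misses

LOOP = [True] * 7 + [False]  # taken 7 times, then not taken once

first_pass = one_bit_mispredicts(LOOP)
two_passes = one_bit_mispredicts(LOOP * 2)
print(first_pass, two_passes - first_pass)  # misses in pass 1, misses in pass 2
```

The first pass costs one misprediction (the loop exit); every later pass costs two, one on re-entry and one on exit.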
29 Two Bits are Better Than One
- Two-bit predictor
  - First bit is the prediction
  - Second bit tells if it is strong or weak
- A mispredict will
  - Weaken a strong prediction
  - Change a weak prediction to the opposite strong prediction
- Correct prediction will
  - Strengthen a weak prediction
  - Leave strong predictions strong
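The update rules above translate directly into a small state machine. This is a sketch that follows the rules exactly as listed on the slide (a mispredicted weak prediction jumps to the opposite strong one), which differs from the saturating-counter variant found in many textbooks; the class and function names are made up, and the starting state "strong taken" is an assumption.

```python
class TwoBitPredictor:
    def __init__(self, taken=True, strong=True):
        self.taken = taken     # first bit: the prediction
        self.strong = strong   # second bit: strong or weak

    def predict(self):
        return self.taken

    def update(self, outcome):
        if outcome == self.taken:
            self.strong = True       # correct: strengthen weak, leave strong as is
        elif self.strong:
            self.strong = False      # mispredict: weaken a strong prediction
        else:
            self.taken = outcome     # mispredict while weak: flip ...
            self.strong = True       # ... to the opposite strong prediction

def count_mispredicts(predictor, outcomes):
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses

LOOP = [True] * 7 + [False]  # the 8-iteration loop: taken 7 times, then not taken
print(count_mispredicts(TwoBitPredictor(), LOOP * 3))
```

On the 8-iteration loop this predictor misses only the loop exit, so three passes cost three mispredictions instead of five with a single bit.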
30 Still Not Good Enough
[Figure annotations: "We can live with these", "These are good", "This is bad!"]
31 (N,M) Correlating Predictors
- Branch outcome correlates with the outcome of some recently executed branches
- Use this in our prediction
  - Keep N bits of history of recent outcomes
  - Use a different M-bit predictor for each different history
- Note: N-bit history means 2^N different predictors for each branch
32 The gShare Predictor
- Correlating predictors often wasteful
  - Some histories are rare or even impossible
  - Yet we dedicate a predictor for each history
- Solution: hashing
  - Use a single large predictor table
  - Hash history and branch address together
  - Use the hash to index into the table
  - The hash is just an XOR, so it's fast
33 The gShare Predictor
- K bits of branch instruction address XOR N bits of global branch history give the index
- The index selects the prediction from a table of 2-bit predictors with 2^max(N,K) entries
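The gShare index computation is just masking and an XOR. In the sketch below the function names are illustrative, and taking the low-order K bits of the address is an assumption: the slides do not say which K bits are used.

```python
def gshare_index(branch_address, global_history, k_bits, n_bits):
    """XOR K address bits with N global-history bits to index the predictor table."""
    addr_part = branch_address & ((1 << k_bits) - 1)   # assumed: low-order K bits
    hist_part = global_history & ((1 << n_bits) - 1)
    return addr_part ^ hist_part

def update_history(global_history, taken, n_bits):
    """Shift the newest branch outcome into the N-bit global history."""
    return ((global_history << 1) | int(taken)) & ((1 << n_bits) - 1)

index = gshare_index(0b10110110, 0b1100, k_bits=4, n_bits=4)
print(index)
```

Because both operands are masked first, the result always fits in max(N,K) bits, which is why the table needs 2^max(N,K) entries.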
34 The pShare Predictor
- Similar to gShare, but uses local history
- K bits of the branch address index a table of local histories with 2^K entries; each entry has N bits
- L bits of the branch address XOR the N-bit local history give the index
- That index selects the prediction from a table of 2-bit predictors with 2^max(N,L) entries
35 Why pShare is Good?
- Long local history (e.g. 10 bits) used to choose the actual predictor for the branch
- Back to our 8-iteration loop example
  - The 8th (not taken) branch would always have a history of 1101111111
  - All other seven instances of this branch in that loop have different histories
  - So, after a few passes through the loop to train the predictors, we have perfect prediction each time
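The claim that each of the eight branch instances sees its own 10-bit history can be checked directly. This sketch only computes the histories; it does not model the predictor table. The function name is made up, and it assumes the loop has already run at least twice so the history is fully warmed up.

```python
def local_histories(pattern, bits, passes=3):
    """History (oldest outcome first) seen before each branch of the last pass.

    Assumes bits <= len(pattern) * (passes - 1), so every window is full.
    """
    outcomes = pattern * passes
    start = len(pattern) * (passes - 1)
    histories = []
    for i in range(start, len(outcomes)):
        window = outcomes[i - bits:i]   # the last `bits` outcomes before branch i
        histories.append("".join("1" if taken else "0" for taken in window))
    return histories

LOOP = [True] * 7 + [False]   # taken 7 times, then not taken once
histories = local_histories(LOOP, bits=10)
print(histories[-1])          # history seen by the 8th (not taken) branch
print(len(set(histories)))    # number of distinct histories in one pass
```

Since all eight histories differ, each instance of the branch trains and uses its own 2-bit predictor.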
36 Why pShare is Bad?
- Needs a lot of branch instances to train the different 2-bit predictors
- Simple 2-bit predictor
  - Has a prediction after it sees one instance of a branch
- The pShare predictor
  - Has a prediction after it sees an instance of that branch and that particular history
- Back to our loop example
  - pShare needs two entire 8-iteration loops to warm up
  - Starts making useful predictions only when we enter the same loop for the second time
37 Tournament Predictors
- No predictor is clearly the best
  - Simple 2-bit warms up quickly and uses only 2 bits per branch
  - pShare uses many bits per branch, but tends to be much better after warming up
- Idea: let's have a predictor to predict which predictor will predict better
38 Direction Predictor Accuracy
39 Target Address Prediction
- Branch Target Buffer
  - IF stage needs to know fetch addr every cycle
  - Need target address one cycle after fetching a branch
  - For some branches (e.g. indirect) target known only after EX stage, which is way too late
  - Even easily-computed branch targets need to wait until instruction decoded and direction predicted in ID stage (still at least one cycle too late)
  - So, we have a quick-and-dirty predictor for the target that only needs the address of the branch instruction
40 Branch Target Buffer
- BTB indexed by instruction address
- We don't even know if it is a branch!
- If address matches a BTB entry, it is predicted to be a branch
- BTB entry tells whether it is taken (direction) and where it goes if taken
- BTB takes only the instruction address, so while we fetch one instruction in the IF stage we are predicting where to fetch the next one from
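A BTB lookup can be sketched as a table keyed by instruction address. The version below is a simplification with assumptions that are not in the slides: it uses an unbounded dictionary instead of a fixed-size tagged table, and 4-byte instructions. The class and method names are hypothetical.

```python
class BranchTargetBuffer:
    def __init__(self):
        self.entries = {}   # instruction address -> (predicted taken?, target)

    def next_fetch_address(self, pc, instr_size=4):
        """Predict where to fetch next, using only the current fetch address."""
        entry = self.entries.get(pc)
        if entry is None:
            return pc + instr_size   # no match: not predicted to be a branch
        taken, target = entry
        return target if taken else pc + instr_size

    def update(self, pc, taken, target):
        """Record the resolved direction and target of the branch at pc."""
        self.entries[pc] = (taken, target)

btb = BranchTargetBuffer()
print(hex(btb.next_fetch_address(0x100)))   # unknown address: fall through
btb.update(0x100, taken=True, target=0x80)
print(hex(btb.next_fetch_address(0x100)))   # now predicted taken to 0x80
```

The lookup never inspects the instruction itself, which is what lets it run in the IF stage alongside the fetch.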
41 Branch Target Buffer
42 Return Address Stack (RAS)
- Function returns are frequent, yet
  - Address is difficult to compute (have to wait until EX stage done to know it)
  - Address difficult to predict with BTB (function can be called from multiple places)
- But return address is actually easy to predict
  - It is the address after the last call instruction that we haven't returned from yet
  - Hence the Return Address Stack
43 Return Address Stack (RAS)
- Call pushes return address into the RAS
- When a return instruction decoded, pop the predicted return address from RAS
- Accurate prediction even w/ small RAS
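The push and pop behaviour above fits in a few lines. This is a sketch with assumed details: 4-byte instructions, a default size of 8 entries, and dropping the oldest entry when the stack is full. None of these come from the slides, and the names are illustrative.

```python
class ReturnAddressStack:
    def __init__(self, size=8):
        self.size = size
        self.stack = []

    def on_call(self, call_pc, instr_size=4):
        """A call pushes the address of the instruction after it."""
        if len(self.stack) == self.size:
            self.stack.pop(0)            # full: drop the oldest entry (assumed policy)
        self.stack.append(call_pc + instr_size)

    def on_return(self):
        """A return pops the predicted return address (None if the stack is empty)."""
        return self.stack.pop() if self.stack else None

ras = ReturnAddressStack()
ras.on_call(0x100)                # outer call
ras.on_call(0x200)                # nested call
print(hex(ras.on_return()))       # return from the nested call
print(hex(ras.on_return()))       # return from the outer call
```

Nested calls return in reverse order, so a small stack already covers most call chains.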
44 Life Story of a Branch
- BTB predicts next address in IF stage
- Later, after decoding we can get a second prediction from RAS or direction predictor
  - These are usually better than BTB, so if they say differently, we make bubbles and restart fetch from new prediction
- Finally, the actual branch outcome becomes known. If it is different from prediction, bubbles and restart fetch again
45 Speculation
- Predict branches, then do everything (execute, write result, schedule instructions)
- What do we do when we mispredict?
- Two things
  - Allow things-before-the-branch to complete
  - Undo things-after-the-branch we have completed
- Solution
  - At the end, put instructions in the correct order again
46 Speculation: Pipeline
- New structure: Reorder Buffer (ROB)
  - Queues instructions in the original order
  - Use ROB entry number as name in renaming
  - ROB entry keeps the result after Write Result
- New stage: Commit
  - Takes the oldest instruction in ROB
  - If instruction executed and result in ROB entry
    - Write result to registers
    - Free the ROB entry
  - Do this N times per cycle in an N-way superscalar
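The commit rule above can be sketched with a queue. This is a simplified model, not a full pipeline: entries are plain dictionaries, the entry object itself stands in for the ROB entry number, and all names are illustrative.

```python
from collections import deque

class ReorderBuffer:
    def __init__(self):
        self.entries = deque()   # oldest instruction on the left
        self.registers = {}      # committed register values

    def issue(self, dest):
        """Allocate an entry in program order; the entry doubles as the renamed name."""
        entry = {"dest": dest, "value": None, "done": False}
        self.entries.append(entry)
        return entry

    def write_result(self, entry, value):
        """Results may arrive out of order; they wait in the ROB entry."""
        entry["value"] = value
        entry["done"] = True

    def commit(self, width=1):
        """Commit up to `width` finished instructions, oldest first."""
        committed = 0
        while committed < width and self.entries and self.entries[0]["done"]:
            entry = self.entries.popleft()                 # free the ROB entry
            self.registers[entry["dest"]] = entry["value"]
            committed += 1
        return committed

rob = ReorderBuffer()
first = rob.issue("R1")
second = rob.issue("R2")
rob.write_result(second, 7)    # the younger instruction finishes first
print(rob.commit(width=2))     # nothing commits: the oldest is not done yet
rob.write_result(first, 3)
print(rob.commit(width=2), rob.registers)
```

Even though the second instruction finished first, nothing reaches the registers until the oldest entry is done, which keeps the register state precise.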
47 Recovery From a Misprediction
- Mispredicted branch eventually committed
- Now precise state is in the registers
  - Everything before the branch done and in regs
  - Nothing after the branch is in regs yet
- Flush all the other structures
  - Reservation stations, ROB, instruction queue
- Restart fetch from correct destination
- Precise exceptions? Same thing!
48 Speculation: Stores
- ROB takes over the role of the store queue
- Stores go to memory when they commit
- Commit is in-order, so store order is correct
- Mispredictions do not affect memory state
49 Speculation: The Picture
50 ROB vs. Register Renaming
- How many ports do we need for the ROB?
- Lots! Look at a single-issue processor
  - Issue: read two entries and write one
  - Write Result: write one entry
  - Commit: read and write one entry
- ROB has a dual role
  - Keeps results (names)
  - Keeps order
- Let's split the two roles
51 ROB vs. Register Renaming
- Keeping results: physical registers
  - Have a large physical register file
  - Keep architected-to-physical mapping in a table
  - Physical registers hold all values (names)
- Keeping order: simplified ROB
  - Only keeps info needed to commit instructions
- Reservation stations also simplified
  - No need to keep values
  - Called instruction window instead of RS
52 How does it work?
- Rename
  - Find in the rename RAT (Register Allocation Table) which physical registers are sources
  - Get a free physical register for destination and change rename RAT
- Dispatch
  - Wait in window until all source registers have values, then
  - Read source values from registers
- Write Result
  - Send result to destination register
  - Send destination register number to window
53 Committing
- Wait until oldest instruction done
- Change commit RAT
  - Before it said Rn is in Pj
  - Now change it so Rn is in Pk (the destination)
- Free physical register Pj
  - Everything that wants Pj is already committed
  - All future uses of Rn should use Pk
54 Recovering Precise State
- To get precise state after instruction X, we
  - Wait until X commits
  - The commit RAT is the precise state
- E.g. recovery from branch misprediction
  - Wait until X commits
  - Rename map = commit map
  - Flush window and ROB, restart fetch
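Rename, commit, and recovery with two RATs and a free list can be sketched as follows. The class `Renamer` and its methods are hypothetical names. The initial mapping (Ri lives in Pi), the first-free allocation order, and rebuilding the free list on recovery are assumptions made for the sketch; a real design would also stall when no physical register is free.

```python
class Renamer:
    def __init__(self, num_arch, num_phys):
        self.all_phys = [f"P{i}" for i in range(num_phys)]
        # Assumed initial state: Ri lives in Pi, the remaining registers are free.
        self.rename_rat = {f"R{i}": f"P{i}" for i in range(num_arch)}
        self.commit_rat = dict(self.rename_rat)
        self.free = self.all_phys[num_arch:]

    def rename(self, dest, srcs):
        """Look up sources in the rename RAT, then map dest to a free physical register."""
        phys_srcs = [self.rename_rat[s] for s in srcs]
        phys_dest = self.free.pop(0)
        self.rename_rat[dest] = phys_dest
        return phys_dest, phys_srcs

    def commit(self, dest, phys_dest):
        """Commit RAT said Rn is in Pj; now Rn is in Pk, and Pj becomes free."""
        old = self.commit_rat[dest]
        self.commit_rat[dest] = phys_dest
        self.free.append(old)

    def recover(self):
        """Misprediction recovery: rename map = commit map."""
        self.rename_rat = dict(self.commit_rat)
        in_use = set(self.commit_rat.values())
        self.free = [p for p in self.all_phys if p not in in_use]

r = Renamer(num_arch=8, num_phys=16)
print(r.rename("R0", ["R0", "R0"]))   # e.g. XOR R0, R0, R0
```

Freeing the old mapping only at commit guarantees that every instruction that wanted the old physical register has already committed.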
55 Reading Assignment
- J. E. Smith and A. R. Pleszkun, "Implementing Precise Interrupts in Pipelined Processors", IEEE Transactions on Computers, 37(5), pages 562-573, May 1988.
- How to get the paper: http://gtel.gatech.edu:2051/Xplore/DynWel.jsp
  - Then log on with your GT user/pass
  - Search in Journals & Magazines for "Computers", select ToC
  - Find the year 1988, the May issue
56 Register Renaming Example
- 8 architectural (logical) registers R0..R7
- 16 physical registers (numbered 0..15), 6-instruction window
- Single-issue, nine-stage pipeline
  - Fetch (also use BTB to predict next fetch addr)
  - Decode
  - Rename and put in instruction window
    - Also use RAS and direction predictor, calculate target address if not indirect
  - Schedule
    - Instruction stays in schedule stage until operands ready
  - Read Operands
  - Execute
    - Also calculate target address if indirect
  - Read Memory
  - Write Result
  - Commit
    - Instruction stays in commit stage until it can actually commit
57 Register Renaming Example
Program code
XOR    R0, R0, R0
LD.IMM R1, 416(R0)
LD.IMM R2, 4(R0)
LD.IMM R3, 400(R0)
AND    R4, R0, R0
LD     R5, 0(R3)
ADD    R4, R4, R5
ADD    R3, R3, R2
BNE    R3, R1, -12(PC)
[Figure: Cycle 3, Rename I1]
58 Register Renaming Example
[Figure: End of Cycle 3]
59 Register Renaming Example
[Figure: Cycle 4, Renamed I2]
60 Register Renaming Example
[Figure: Cycle 4, Schedule (the XOR is scheduled)]
61 Register Renaming Example
[Figure: Cycle 5, Renamed I3]
62 Register Renaming Example
[Figure: Cycle 5, Schedule]
63 Register Renaming Example
[Figure: Cycle 5, I1 Reads Regs]
64 Register Renaming Example
[Figure: Cycle 6, Renamed I4]
65 Register Renaming Example
[Figure: Cycle 7, Renamed I5, scheduled nothing, then I1 writes result]
66 Register Renaming Example
[Figure: Cycle 8, Renamed I6]
67 Register Renaming Example
[Figure: Cycle 8, Schedule: I2..I5 can be scheduled, pick I2]
68 Register Renaming Example
[Figure: Cycle 8, Commit]
69 Register Renaming Example
[Figure: Cycle 8, After Commit]