???? (Cont), ?????ILP - PowerPoint PPT Presentation

Description:

Title: Subject: Dynamic Scheduling Author: Xuehai Zhou Last modified by: walkinnet Created Date: 10/22/1997 7:56:43 AM Document presentation format – PowerPoint PPT presentation

Slides: 64
Provided by: Xue59

Transcript and Presenter's Notes


1
???? (Cont), ?????ILP
2
Review Tomasulo
  • ?????????????????
  • ???????
  • ????????????????????????
  • ?????????
  • Out-of-order execution => out-of-order completion!
  • ?????????
  • ????????????,??????
  • ????????????????
  • Scoreboard ?Tomasulo
  • Tomasulo ????
  • Dynamic scheduling
  • Register renaming: eliminates WAW and WAR hazards
  • Load/store disambiguation
  • ???????
  • ??
  • ????CDB
  • ?????Common Data Bus

3
????????
  • ???????????????????
  • ????????????????????
  • ????????,?????RAW?WAR??
  • ???????????????????????
  • ?????? rename table ,??????????????????
  • ????????????RS?.
  • ????????2x ?????x????.

4
?????????
  • ????????????????
  • ???????????,???????????????????.
  • ?????????????????,????
  • ??
  • DIVD F10, F0, F2
  • SUBD F4, F6, F8
  • ADDD F12, F14, F16
  • ??rollback ???????????
  • ??????????????
  • ??????????????
  • ???????????
  • ?????????????(???)
  • ??????????????????????

5
??????????????????!
  • ?????????,??????????????????,??????????!
    Loop: LD    F0, 0(R1)
          MULTD F4, F0, F2
          SD    F4, 0(R1)
          SUBI  R1, R1, 8
          BNEZ  R1, Loop
  • ???????multd,?????
  • ?????????
  • ??????,???????????
  • ??superscalar??????????

6
???????????
  • ???????????????????,?????????????????????????????
    ??????
  • ??????????????????
  • ???????????????,???????????????,??????????????????
    ???????
  • ???????????????????????
  • ????
  • ????????????
  • ?????????????????

7
????????
  • ??????K??????,?????????????
  • ??????????,???????????????,???????????????????,???
    ???????????,???????,????????????????????

8
???????????????
  • ???????K?????,?????????,??????,?????????k-1???????
    ????????????????????p, ????????q??????????????
  • ?????????????????,????????????????
  • ?????????Static (at compile time) ?????
    Dynamic (at runtime)
  • ?????? vs. ??????,????

9
Dynamic Branch Prediction
  • ????????????????????????
  • ?????????
  • ??????????
  • ??????????????,???????
  • ????
  • ??BPB(Branch Prediction Buffer)?BHT(Branch
    History Table)???
  • 1-bit BHT?2-bit BHT
  • Correlating Branch Predictors
  • Tournament Predictors: Adaptively Combining Local
    and Global Predictors
  • High Performance Instruction Delivery
  • BTB
  • Integrated Instruction Fetch Units
  • Return Address Predictors
  • Performance = ƒ(accuracy, cost of misprediction)
  • Misprediction => Flush Reorder Buffer

10
1-bit BHT
  • Branch History Table ?????PC?????1-bit BHT
  • ?????????????
  • ??????
  • ?????????10?,??????9?,1????,????????????????,????
    ??????????
  • ???? vs. ????
  • ?? ??????, 1-bit BHT ???2???????(avg is 9
    iterations before exit)
  • ??????, ????????,?????????
  • ?????,?????????,?????????
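The loop behaviour on this slide can be checked with a small simulation. This is only a sketch: the function name `one_bit_mispredictions` and the choice of three passes through the loop are illustrative assumptions, not part of the slides. It models a 1-bit predictor that simply remembers the last outcome of the branch.

```python
def one_bit_mispredictions(outcomes, initial=True):
    """Count mispredictions of a 1-bit predictor over a list of branch outcomes."""
    prediction = initial
    misses = 0
    for taken in outcomes:
        if prediction != taken:
            misses += 1
        prediction = taken  # 1-bit scheme: remember only the most recent outcome
    return misses


# Loop branch: taken 9 times, then not taken once when the loop exits.
loop_once = [True] * 9 + [False]

# Assumed scenario: the loop is entered three times in a row.
total_misses = one_bit_mispredictions(loop_once * 3)

# Steady state: the previous exit left the predictor at "not taken".
steady_state_misses = one_bit_mispredictions(loop_once, initial=False)
print(total_misses, steady_state_misses)
```

In steady state every pass through the loop costs two mispredictions: one on the first iteration (the predictor still remembers the previous exit) and one on the exit itself.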

11
2-bit BHT
  • ???? 2???????
  • Red: stop, not taken
  • Green: go, taken
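A minimal sketch of a 2-bit scheme, assuming the saturating-counter variant (states 0 and 1 predict not taken, states 2 and 3 predict taken). The function name and the starting state are assumptions made for illustration. The prediction only flips after two consecutive mispredictions.

```python
def two_bit_mispredictions(outcomes, state=3):
    """2-bit saturating counter: states 0-1 predict not taken, 2-3 predict taken."""
    misses = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        if predicted_taken != taken:
            misses += 1
        # Move one step toward the observed outcome, saturating at 0 and 3.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return misses


# Same loop branch as before: 9 taken iterations, then one exit.
loop_once = [True] * 9 + [False]
two_bit_total = two_bit_mispredictions(loop_once * 3)
print(two_bit_total)
```

On this trace only the loop exit is mispredicted, so three passes through the loop cost three mispredictions.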

12
(No Transcript)
13
(No Transcript)
14
BHT Accuracy
  • ?????????
  • ????
  • ????PC?????BHT?,?????????????
  • BHT??????
  • 4096-entry table: misprediction rates vary from 1% (nasa7, tomcatv) to 18%
    (eqntott), with spice at 9% and gcc at 12%
  • ?????,??????????????(in Alpha 21164)

15
Correlating Branch Predictor
  • ??
  • if (aa == 2) aa = 0;
  • if (bb == 2) bb = 0;
  • if (aa != bb) { ... }
  • ???DLX
  • SUBI R3,R1,2
  • BNEZ R3,L1      ; branch b1 (aa != 2)
  • ADD R1,R0,R0    ; aa = 0
  • L1: SUBI R3,R2,2
  • BNEZ R3,L2      ; branch b2 (bb != 2)
  • ADD R2,R0,R0    ; bb = 0
  • L2: SUB R3,R1,R2 ; R3 = aa - bb
  • BEQZ R3,L3      ; branch b3 (aa == bb)

16
Correlating Branches
  • ????b3 ???b2 ?b1?????b1?b2?????,?b3?????
  • Correlating predictors ? ????????????????????????
    ?
  • ??????????????????????
  • if (d == 0) d = 1;
  • if (d == 1) d = 0;
  • ???DLX
  • BNEZ R1,L1       ; branch b1 (d != 0)
  • ADDI R1,R0,1     ; d == 0, so d = 1
  • L1: ADDI R3,R1,-1
  • BNEZ R3,L2       ; branch b2 (d != 1)
  • ...
  • L2:

17
???????????
  • ??d???????0,1,2
  • b1 ??????,b2????????
  • ????????????????????,???????????

18
  • ??d?????2?0?????
  • ?1-bit???,?????????,T??????,NT???????
  • ?????????????,?????100

19
Correlating Branches
  • ?????1???correlation????????????????????????????
    ??????????????,??????????????????????????
  • ?????????????????????????
  • ?? (1,1)??????????????????,??????????????????

20
  • Correlating ???????????,
  • ????????d2?,????,???????
  • ??(1,1)???,?????????????????1-bit????????
  • ???????(m, n),??????m???,?2m????????????,?????????
    n
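The effect of adding one bit of global history can be sketched with a small simulator for an (m, n) predictor. The assumptions here are mine: the function name `run_correlating`, the initial state (all counters and the history start at "not taken"), and a trace in which d alternates between 2 and 0, so that b1 and b2 are both taken when d = 2 and both not taken when d = 0. Setting m = 0 degenerates to a plain per-branch n-bit predictor.

```python
def run_correlating(branch_trace, m=1, n=1):
    """Count mispredictions of an (m, n) predictor on (branch_name, taken) pairs."""
    history = 0                       # last m branch outcomes, newest in bit 0
    counters = {}                     # (branch_name, history) -> n-bit counter
    top = (1 << n) - 1                # saturation value of an n-bit counter
    threshold = 1 << (n - 1)          # counter >= threshold predicts taken
    misses = 0
    for name, taken in branch_trace:
        key = (name, history)
        state = counters.get(key, 0)  # assumed initial state: not taken
        misses += (state >= threshold) != taken
        counters[key] = min(state + 1, top) if taken else max(state - 1, 0)
        history = ((history << 1) | int(taken)) & ((1 << m) - 1)
    return misses


# d alternates 2, 0, 2, 0: b1 (d != 0) and b2 (d != 1) are both T, then both NT.
trace = []
for d in (2, 0, 2, 0):
    b1_taken = d != 0
    if d == 0:
        d = 1
    b2_taken = d != 1
    trace += [("b1", b1_taken), ("b2", b2_taken)]

plain_misses = run_correlating(trace, m=0, n=1)        # 1-bit predictor, no history
correlating_misses = run_correlating(trace, m=1, n=1)  # (1,1) predictor
print(plain_misses, correlating_misses)
```

Under these assumptions the plain 1-bit predictor mispredicts every branch in the trace, while the (1,1) predictor mispredicts only in the first iteration.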

21
Correlating Branches
  • (2,2) predictor: 2-bit global, 2-bit local
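The storage cost of an (m, n) predictor follows from its structure: each entry selected by the branch address holds 2^m counters of n bits each. A small sketch, where the function name and the entry counts (1K and 4K) are assumptions chosen for illustration:

```python
def predictor_bits(m, n, entries):
    """Storage in bits: 2**m counters of n bits for each address-selected entry."""
    return (2 ** m) * n * entries


# A (2,2) predictor with 1K entries uses as many bits as
# a plain 2-bit predictor, i.e. (0,2), with 4K entries.
bits_2_2 = predictor_bits(2, 2, 1024)
bits_0_2 = predictor_bits(0, 2, 4096)
print(bits_2_2, bits_0_2)
```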

22
(No Transcript)
23
Tournament Predictors
  • Tournament predictors
  • ???????
  • ????????
  • ????????
  • ???????
  • Use the predictor that tends to guess correctly

[Figure: a selector, indexed by branch address (addr) and history, chooses between Predictor A and Predictor B]
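A heavily simplified sketch of the tournament idea for a single branch. The structure is an assumption for illustration and not a model of any real processor: predictor A is one 2-bit counter, predictor B is a table of 2-bit counters indexed by recent outcomes, and a 2-bit selector is trained only when the two predictors disagree.

```python
class Counter2:
    """2-bit saturating counter; values 2 and 3 count as 'high'."""

    def __init__(self, value=0):
        self.value = value

    def high(self):
        return self.value >= 2

    def update(self, up):
        self.value = min(self.value + 1, 3) if up else max(self.value - 1, 0)


def run_tournament(outcomes, history_bits=2):
    """Count mispredictions of a selector choosing between predictors A and B."""
    local = Counter2()                                       # predictor A
    table = [Counter2() for _ in range(1 << history_bits)]   # predictor B
    selector = Counter2()                                    # low: use A, high: use B
    history = 0
    misses = 0
    for taken in outcomes:
        pred_a = local.high()
        pred_b = table[history].high()
        chosen = pred_b if selector.high() else pred_a
        misses += chosen != taken
        if pred_a != pred_b:
            # Train the selector toward whichever predictor was right.
            selector.update(pred_b == taken)
        local.update(taken)
        table[history].update(taken)
        history = ((history << 1) | int(taken)) & ((1 << history_bits) - 1)
    return misses


# A branch that strictly alternates taken / not taken.
alternating = [True, False] * 20
tournament_misses = run_tournament(alternating)
print(tournament_misses)
```

On the alternating pattern predictor A never learns the taken outcomes, predictor B learns them from the history, and the selector drifts to B after a short warm-up, after which no further mispredictions occur.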
24
Tournament Predictor in Alpha 21264
  • Selector4K 2-bit ???????????????????
  • Global predictor ??4K?,?????12???????????,??????2
    -bit???
  • 12-bit pattern: ith bit 0 => ith prior branch not
    taken; ith bit 1 => ith prior branch taken
  • Local predictor ?????
  • Top level 1024?10-bit ???,??10-bit?????10????????
  • Next level ?Top level?????10-bit????1K
    ??????,???3-bit saturating counter ???????
  • Total size: 4K*2 + 4K*2 + 1K*10 + 1K*3 = 29K
    bits!
  • (180,000 transistors)
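The total can be checked by adding up the four tables named on this slide. The variable names below are illustrative labels for those tables.

```python
K = 1024

selector_bits = 4 * K * 2         # 4K 2-bit selector counters
global_bits = 4 * K * 2           # 4K 2-bit global predictor counters
local_history_bits = 1 * K * 10   # 1K 10-bit local history entries
local_counter_bits = 1 * K * 3    # 1K 3-bit saturating counters

total_bits = selector_bits + global_bits + local_history_bits + local_counter_bits
print(total_bits // K, "K bits")
```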

25
% of predictions from local predictor in
Tournament Scheme
26
Accuracy of Branch Prediction
fig 3.40
  • Profile: branch profile from last
    execution (static in that it is encoded in the
    instruction, but based on a profile)

27
Accuracy v. Size (SPEC89)
28
Simple dynamic prediction: Branch Target Buffer
(BTB)
  • ?????????BTB???,?????????
  • ???????????????,??????????
  • ?????????
  • ???????,?????PC

29
(No Transcript)
30
  • Determine the total branch penalty for a
    branch-target buffer, assuming the penalty cycles
    for individual mispredictions from figure 2.24.
    Make the following assumptions about the
    prediction accuracy and hit rate:
  • Prediction accuracy is 90% (for instructions in
    the buffer)
  • Hit rate in the buffer is 90% (for branches
    predicted taken)

Ans. P1 = 90% × 10% = 0.09, P2 = 10% = 0.1, BP =
(0.09 + 0.1) × 2 = 0.38
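The same arithmetic as a short script. The variable names are illustrative, and the 2-cycle penalty is taken from the factor of 2 in the answer above.

```python
hit_rate = 0.90        # branch is found in the buffer
accuracy = 0.90        # prediction is correct, given a buffer hit
penalty_cycles = 2     # cycles lost per misprediction or buffer miss

p_hit_but_wrong = hit_rate * (1 - accuracy)   # P1: in buffer, mispredicted
p_not_in_buffer = 1 - hit_rate                # P2: not in buffer

branch_penalty = (p_hit_but_wrong + p_not_in_buffer) * penalty_cycles
print(round(branch_penalty, 2))
```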
31
Review ???????????
  • ????????????????????????
  • ????????
  • ??????????
  • ??????????????,???????
  • ????
  • 1-bit BHT?2-bit BHT
  • Correlating Branch Predictors
  • Tournament Predictors: Adaptively Combining Local
    and Global Predictors
  • High Performance Instruction Delivery
  • BTB
  • Integrated Instruction Fetch Units
  • Return Address Predictors
  • Performance = ƒ(accuracy, cost of misprediction)
  • Misprediction => Flush Reorder Buffer

32
Accuracy of Different Schemes
33
HW support for More ILP
  • ??????????????????????
  • if (x) then A = B op C else NOP
  • ???????,?????,??????
  • ??ISA, Alpha, MIPS, PowerPC, SPARC ?ISA ,
    ???????? PA-RISC ???????
  • EPIC 64 1-bit ???????????
  • ??????(conditional instructions)???
  • ?????????,????????
  • ????????,???Stall
  • ???????????,??????????????????

34
????????
  • ??????????????? reorder buffer (ROB)
  • 3 ?? ????,????, ?
  • Reorder buffer ???????? => ?????????(?RS??)
  • ??????????,?ROB?????RS???
  • ????????
  • ROB?????????????????
  • ???????,????????
  • ??,??????,???????????,??????,??????

35
??????? Tomasulo ??????
  • 1. Issue: get instruction from FP Op Queue
  • ??RS?ROB?????????????????ROB???????,??????RS,?????
    ROB??????RS (this stage sometimes called
    dispatch)
  • 2. Execution: operate on operands (EX)
  • ???????,???????????,??CDB,??RAW??
  • 3. Write result: finish execution (WB)
  • ???????CDB??????????FU??ROB??,??RS??
  • 4. Commit: update register with reorder result
  • ?ROB????,??????,??????(????),??????ROB????
  • ?????????,??ROB
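The key property of the commit step is that results may be written out of order, but registers are updated only from the head of the reorder buffer. A minimal sketch under that assumption: the class and method names are hypothetical, the values are placeholders, and reservation stations and the CDB are not modelled.

```python
from collections import deque


class RobEntry:
    def __init__(self, dest, text):
        self.dest = dest      # destination register name
        self.text = text      # instruction text, for display only
        self.value = None
        self.done = False


class ReorderBuffer:
    def __init__(self):
        self.entries = deque()   # program order: oldest at the left
        self.registers = {}

    def issue(self, dest, text):
        entry = RobEntry(dest, text)
        self.entries.append(entry)
        return entry

    def write_result(self, entry, value):
        entry.value = value      # result is held in the ROB, not the register file
        entry.done = True

    def commit(self):
        """Retire finished instructions from the head only, in program order."""
        committed = []
        while self.entries and self.entries[0].done:
            entry = self.entries.popleft()
            self.registers[entry.dest] = entry.value
            committed.append(entry.text)
        return committed

    def flush(self):
        """On a misprediction, discard every uncommitted entry."""
        self.entries.clear()


rob = ReorderBuffer()
div = rob.issue("F10", "DIVD F10, F0, F2")
sub = rob.issue("F4", "SUBD F4, F6, F8")

rob.write_result(sub, 1.0)        # SUBD finishes first (out of order)
first_commit = rob.commit()       # nothing retires: DIVD at the head is not done

rob.write_result(div, 2.0)
second_commit = rob.commit()      # both retire, in program order
print(first_commit, second_commit)
```

Because uncommitted results never reach the register file, flushing the buffer is enough to undo speculative work.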

36
  • LD F6, 34(R2)
  • LD F2, 45(R3)
  • MULT F0, F2, F4
  • SUBD F8, F6, F2
  • DIVD F10, F0, F6
  • ADDD F6, F8, F2

37
(No Transcript)
38
(No Transcript)
39
(No Transcript)
40
(No Transcript)
41
(No Transcript)
42
(No Transcript)
43
(No Transcript)
44
(No Transcript)
45
(No Transcript)
46
(No Transcript)
47
(No Transcript)
48
(No Transcript)
49
(No Transcript)
50
(No Transcript)
51
(No Transcript)
52
(No Transcript)
53
(No Transcript)
54
(No Transcript)
55
(No Transcript)
56
(No Transcript)
57
????????? ?????????RAW??
  • Question ????????,store,load ??????????
  • ????????????? Eg st 0(R2),R5 ld R6,0(R3)
  • ??????????ld?
  • Store????????????????.
  • ?????????????????????.
  • ????
  • No Speculation ???load??,???????? 0(R2) ? 0(R3)
  • Speculation ??????????????? (called dependence
    speculation) ,????????ROB???

58
Hardware Support for Memory Disambiguation
  • ?????????????????????
  • ????(?????)??(????)
  • FIFO ordering ?????????store
  • ?????load???,????store??????.
  • ?load??????,??store??
  • ???load??????????store??,?stall?load??
  • ??load??????store????,?? memory-induced RAW
    hazard
  • ?????? ? ???
  • ?????????? ? ???store???ROB??
  • ?????????
  • ?????store??????,?????WAW ??.

59
Memory Disambiguation
[Figure: the FP Op Queue feeds a reorder buffer (ROB1 oldest to ROB7 newest, with a "Done?" column), alongside the registers, reservation stations, FP adders, FP multipliers, and paths to and from memory. ROB contents, oldest first: ST 0(R3), F4 (value <val 1>, done: Y); LD F0, 32(R2) (done: N); ST 10(R3), F5 (done: N); LD F4, 10(R3) (done: N). Reservation-station entries read "2 32+R2" and "4 ROB3".]
60
Relationship between precise interrupts and
speculation
  • ??????????
  • Branch prediction, data prediction
  • If we speculate and are wrong, we need to back up
    and restart execution at the point where we
    predicted incorrectly
  • This is exactly the same as precise exceptions!
  • ????????
  • Need to take our best shot at predicting branch
    direction.
  • If we issue multiple instructions per cycle, we
    otherwise lose lots of potential instructions
  • Consider 4 instructions per cycle
  • If it takes a single cycle to decide on a branch, we
    waste from 4 to 7 instruction slots!
  • ??????????????
  • in-order completion or commit
  • This is why reorder buffers are in all new processors

61
(No Transcript)
62
??1/2
  • Reservation stations ??????,??????
  • ?????????
  • ???Scoreboard?????? WAR, WAW ??
  • ?????????
  • ??????(IU??,??????)
  • ??
  • Dynamic scheduling
  • Register renaming
  • Load/store disambiguation
  • 360/91 ? Pentium II PowerPC 604 MIPS R10000
    HP-PA 8000 Alpha 21264??????

63
?? 2/2
  • ???????????????????
  • ??????????WAR? WAW ??
  • Reorder Buffer
  • ????????????
  • ?????????ROB?
  • ??????
  • ???????????????
  • ????????ROB?????????
  • Superscalar and VLIW: CPI < 1 (IPC > 1)