Computer Logic Design - PowerPoint PPT Presentation

Title: Computer Logic Design

Description: Author: Taeweon Suh; last modified by Taeweon Suh; created 8/14/2004 10:46:03 PM. PowerPoint presentation.

Slides: 57


1
COM515 Advanced Computer Architecture
Lecture 5. Dynamic Scheduling II
Prof. Taeweon Suh, Computer Science Education, Korea University
2
Modern Processors
  • Branch prediction results in speculative execution
  • Speculative instructions (if wrongly speculated) must not alter the architectural state:
    • Architectural registers
    • Memory
  • Precise exceptions/interrupts are required

Prof. Sean Lee's Slide
3
Modern Out-of-Order Core
[Figure: out-of-order core with the RAT, reservation station, ROB, architectural register file, and LSQ]
  • Allocate instructions
  • Register Alias Table (RAT): renames architecture registers
  • Reservation Station: issues instructions to functional units
  • Reorder Buffer (ROB): maintains state information (physical registers) for precise interrupts and speculative execution
  • Load Store Queue (LSQ): maintains memory access ordering
Prof. Sean Lee's Slide
4
Register Renaming
[Figure: architectural registers R0 to R7 renamed onto physical registers]
No false dependencies!
Sandy Bridge: 160 physical registers (PRs) for INT, 144 PRs for FP
Adapted from Prof. G. Loh's Slides
5
Register Renaming
Dest = Src1 op Src2
Mapping mechanism: Src1 → TagS1, Src2 → TagS2, Dest → TagD
TagD = TagS1 op TagS2
Repeat for each instruction
Adapted from Prof. G. Loh's Slides
6
Register Alias Table (RAT)
  • Use a lookup table for renaming
  • One entry per architectural register
  • Each entry maps to the most recent version of the architectural register, which could be in
    • the physical register file, or
    • the architectural register file

Prof. Sean Lee's Slide
7
RAT Example
Free physical registers at start: T13, T14, T15, T16

Original          Renamed             Free list afterwards
R1 = R2 op R3     T13 = R2 op R3      T14, T15, T16
R5 = R4 op R1     T14 = R4 op T13     T15, T16
R1 = R1 op R5     T15 = T13 op T14    T16
R2 = R5 / R1      T16 = T14 / T15     (empty)

Adapted from Prof. G. Loh's Slides
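The renaming steps in the RAT example can be sketched in Python. This is a minimal sketch under assumptions not in the slides: instructions are modelled as hypothetical `(dst, src1, op, src2)` tuples, the RAT is a plain dict, and a register with no RAT entry is read from the ARF under its own name. The key detail is that sources are looked up before the destination is remapped.

```python
from collections import deque


def rename(instrs, free_regs):
    """Rename (dst, src1, op, src2) tuples one at a time through a RAT."""
    rat = {}                   # architectural register -> newest physical tag
    free = deque(free_regs)    # free physical registers, allocated in order
    renamed = []
    for dst, src1, op, src2 in instrs:
        # Read both sources BEFORE remapping dst, so "R1 = R1 op R5"
        # still reads the older version of R1.
        tag1 = rat.get(src1, src1)   # no entry: value still lives in the ARF
        tag2 = rat.get(src2, src2)
        if not free:
            raise RuntimeError("free list empty: rename must stall")
        new_tag = free.popleft()
        rat[dst] = new_tag           # later readers of dst now see new_tag
        renamed.append((new_tag, tag1, op, tag2))
    return renamed, rat, list(free)


# The four instructions from the RAT example ("op" stands for the operator).
program = [
    ("R1", "R2", "op", "R3"),
    ("R5", "R4", "op", "R1"),
    ("R1", "R1", "op", "R5"),
    ("R2", "R5", "/", "R1"),
]
renamed, rat, free = rename(program, ["T13", "T14", "T15", "T16"])
for inst in renamed:
    print(inst)
```

The two writes to R1 land in different physical registers (T13 and T15), so neither can overwrite the other's value, which is what removes the false (WAW/WAR) dependencies.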
8
Superscalar Rename
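Rename group: R1 = R2 op R3; R4 = R5 op R7; R3 = R0 / R2; R5 = Ld 12(R6)
[Figure: source registers looked up in the RAT in parallel, returning tags such as T16, T39, T14, T5, T23, T7; the immediate 12 is marked X (not renamed)]
Don't rename immediates
For an N-wide superscalar: 2N RAT read ports, N RAT write ports
Prof. Sean Lee's Slide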
9
Intra-Group Dependencies
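Rename group: R2 = R2 op R3; R4 = R5 op R7; R3 = R0 / R2; R5 = Ld 12(R6)
[Figure: the same parallel RAT lookups; new destination tags T10, T31, T19, T6 come from the free register pool. R3 = R0 / R2 reads R2, which the first instruction of the same group writes]
Prof. Sean Lee's Slide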
10
Intra-Group Dependencies
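Rename group: R1 = R2 op R1; R2 = R1 op R2; R1 = R2 / R1; R1 = R2 >> R1
Source tags read from the RAT in parallel (R1 → T34, R2 → T16): T16 T34 | T34 T16 | T16 T34 | T16 T34
These reads are wrong for the last three instructions, which read registers written earlier in the same group
[Figure: correct final renamed registers]
Modified from Prof. Sean Lee's Slide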
11
Resolving Intra-Group Dependencies
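[Figure: intra-group dependency checker. Inst 0 to Inst 3 each read left and right source tags (T0L/T0R to T3L/T3R) from the RAT; new destination tags Pdst0 to Pdst2 come from the free register pool, and the checker substitutes them for the RAT tags when a source matches the destination of an older instruction in the group]
Adapted from Prof. G. Loh's Slides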
12
Intra-Group Dependency Checking
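[Figure: dependency-checking logic comparing source registers against dst0, dst1, dst2 of older instructions in the group and selecting the matching Pdst0, Pdst1, Pdst2]
Adapted from Prof. G. Loh's Slides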
13
Mapping Selection
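Rename group: R1 = R2 op R1; R2 = R1 op R2; R1 = R2 / R1; R1 = R2 >> R1
Only the mapping for R1 from the last instruction (R1 = R2 >> R1) should be written into the RAT
Condition: use a mapping only if the instruction is the last writer to that register in the group
Adapted from Prof. G. Loh's Slides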
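Dependency checking and mapping selection for one rename group can be sketched as follows. This is a hypothetical model, not the slides' circuit: it assumes the initial RAT maps R1 → T34 and R2 → T16 (consistent with the tags read in the Intra-Group Dependencies slide) and borrows T10, T31, T19, T6 as the free tags for illustration. Every source first takes its parallel RAT read, then is overridden by the new tag of the youngest older writer in the group; only last writers update the RAT.

```python
def rename_group(group, rat, free):
    """Rename an N-wide group of (dst, src1, op, src2) tuples in one step.

    rat: architectural register -> physical tag (not modified in place).
    free: list of free tags; the first len(group) are consumed.
    Returns (renamed group, updated RAT).
    """
    n = len(group)
    if len(free) < n:
        raise RuntimeError("not enough free physical registers: stall")
    pdst = [free.pop(0) for _ in range(n)]   # Pdst0 .. Pdst(n-1)
    renamed = []
    for i, (dst, src1, op, src2) in enumerate(group):
        tags = []
        for src in (src1, src2):
            tag = rat.get(src, src)          # parallel RAT read (2N read ports)
            # Dependency check: compare src with dst0 .. dst(i-1); scanning
            # oldest to youngest lets the youngest older writer win.
            for j in range(i):
                if group[j][0] == src:
                    tag = pdst[j]
            tags.append(tag)
        renamed.append((pdst[i], tags[0], op, tags[1]))
    # Mapping selection: only the last writer of each register updates the RAT.
    new_rat = dict(rat)
    for i, inst in enumerate(group):
        dst = inst[0]
        if all(later[0] != dst for later in group[i + 1:]):
            new_rat[dst] = pdst[i]
    return renamed, new_rat


group = [
    ("R1", "R2", "op", "R1"),
    ("R2", "R1", "op", "R2"),
    ("R1", "R2", "/", "R1"),
    ("R1", "R2", ">>", "R1"),
]
renamed, new_rat = rename_group(group, {"R1": "T34", "R2": "T16"},
                                ["T10", "T31", "T19", "T6"])
print(renamed)
print(new_rat)
```

Instructions 0 and 2 also write R1, but their mappings are suppressed, so only two RAT writes occur for this group.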
14
Issue with Imprecise Interrupt
lw r5, 8(r10); add r10, r9, r8; add r12, r10, r7
  • add instructions take one cycle
  • E.g., the load induces a data page fault
    • If out-of-order completion is allowed, r10 and r12 will already have been modified
    • The re-issued load will then use the wrong value of r10
  • Interrupt classes
    • Program interrupts (exceptions or traps)
    • External interrupts (asynchronous)

Modified from Prof. Sean Lee's Slide
15
Precise Interrupts
  • Reflect a sequential architecture model → serially correct (think of a single-issue, non-pipelined processor)
  • Keep a precise state of execution
    • All instructions before the interrupted instruction must be completed
    • The state should appear as if no instruction after the interrupted instruction had been issued
    • The interrupted PC should be presented to the interrupt handler (restartable)
  • Similar to branch misprediction handling
  • Out-of-order execution makes the ordering hard
    • Undo whatever comes after the interrupt

Prof. Sean Lee's Slide
16
Why Support Precise Interrupts
  • Need to maintain a precise state (for recovery)
    • Software debugging
    • I/O or timer interrupts
    • Virtual memory (page faults)
    • Instruction emulation
    • Virtual machines

Prof. Sean Lee's Slide
17
Supporting Precise Interrupts
  • Buffer results
  • Can reconstruct the state as it would be under sequential execution
  • Restart from the saved PC with the saved state

Prof. Sean Lee's Slide
18
Reorder Buffer (ROB) [Smith & Pleszkun 85, 88]
  • Architecture register file keeps the in-order state
  • Reorder Buffer (ROB)
    • A circular buffer
    • Contains all in-flight instructions
    • Buffers the lookahead state
    • In-order allocation/deallocation with head/tail pointers
  • When an exception occurs
    • Halt instruction issue
    • Revert to the in-order state using the RF and discard ROB results
  • Also used for branch misprediction recovery
  • Pentium Pro/II/III integrates the physical register file within the ROB
  • Pentium 4 decouples the ROB and the physical register file

Modified from Prof. Sean Lee's Slide
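The ROB behaviour described above (circular buffer, in-order retirement, discarding results on an exception or misprediction) can be sketched as below. This is a simplified model with hypothetical class and method names; like the walkthrough slides that follow, it handles both exceptions and mispredictions when the instruction reaches the head, and it writes the ARF only at commit so the ARF always holds a precise state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RobEntry:
    pc: int
    reg_dst: Optional[str]          # None for branches and stores
    done: bool = False
    value: Optional[int] = None
    exception: bool = False
    mispredicted: bool = False


class ReorderBuffer:
    """Circular buffer: allocate at the tail, retire in order from the head."""

    def __init__(self, size):
        self.slots = [None] * size
        self.head = 0      # oldest instruction
        self.tail = 0      # next slot to allocate
        self.count = 0

    def allocate(self, pc, reg_dst=None):
        if self.count == len(self.slots):
            raise RuntimeError("ROB full: stall allocation")
        idx = self.tail
        self.slots[idx] = RobEntry(pc, reg_dst)
        self.tail = (self.tail + 1) % len(self.slots)
        self.count += 1
        return idx

    def complete(self, idx, value=None, exception=False, mispredicted=False):
        entry = self.slots[idx]
        entry.done = True
        entry.value = value
        entry.exception = exception
        entry.mispredicted = mispredicted

    def _truncate(self, keep):
        """Discard all but the `keep` oldest entries by moving the tail back."""
        while self.count > keep:
            self.tail = (self.tail - 1) % len(self.slots)
            self.slots[self.tail] = None
            self.count -= 1

    def commit(self, arf):
        """Retire finished instructions in program order.

        Returns None, ("exception", pc) or ("mispredict", pc).
        """
        while self.count:
            entry = self.slots[self.head]
            if not entry.done:
                return None                   # oldest instruction still executing
            if entry.exception:
                self._truncate(0)             # drop the excepting instr and all younger
                return ("exception", entry.pc)  # restart or abort from this PC
            if entry.reg_dst is not None:
                arf[entry.reg_dst] = entry.value
            self.slots[self.head] = None
            self.head = (self.head + 1) % len(self.slots)
            self.count -= 1
            if entry.mispredicted:
                self._truncate(0)             # clear all wrong-path entries
                return ("mispredict", entry.pc)
        return None


# Loosely mirrors the xA004..xA010 walkthrough values.
arf = {"R2": 2, "R3": 3, "R4": 4}
rob = ReorderBuffer(8)
i_a004 = rob.allocate(0xA004, "R2")
i_a008 = rob.allocate(0xA008, "FR1")
i_a00c = rob.allocate(0xA00C, "R3")
i_a010 = rob.allocate(0xA010, "R4")
rob.complete(i_a00c, 4)                  # younger instructions finish first
rob.complete(i_a010, 8)
rob.complete(i_a004, 4)
rob.complete(i_a008, exception=True)     # FR1 = FR2 / 0.0 faults
result = rob.commit(arf)
print(result, arf)
```

R2 is committed because its instruction is older than the fault; the results for R3 and R4 are discarded, matching the "not committed into RF" step.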
19
ROB (with physical registers)
ROB entry fields: Exp (exception event), Spec?, Done?, PC, V (valid), Data (physical register), RegDst
Head: oldest instruction
Tail: next instruction to be allocated
Prof. Sean Lee's Slide
Sandy Bridge: 168-entry ROB
20
Handling Precise Interrupts
[Figure: ROB (PC, V, Spec, Done, Exp, RegDst, Data) beside the ARF (R1 = 1, R2 = 2, R3 = 3, R4 = 4, ..., R31). In flight: R1 = R1 op 10 (result 11); xA004: R2 = R2 op 2; xA008: FR1 = FR2 / 0.0]
Prof. Sean Lee's Slide
21
Handling Precise Interrupts
[Figure: ROB and ARF as before; xA00C: R3 = R3 op 1 is allocated at the tail behind xA004 and xA008]
Prof. Sean Lee's Slide
22
Handling Precise Interrupts
[Figure: xA00C completes (Done = 1, result 4); xA010: R4 = R4 op 2 is allocated]
Prof. Sean Lee's Slide
23
Handling Precise Interrupts
[Figure: xA004 completes (result 4); xA008: FR1 = FR2 / 0.0 records an exception (Exp = 0010); xA010 completes (result 8); xA014: FR4 = FR4 op 2.0 is allocated]
Prof. Sean Lee's Slide
24
Handling Precise Interrupts
[Figure: xA004 has committed (ARF R2 = 4); the excepting instruction xA008 is now at the head, followed by xA00C (result 4), xA010 (result 8), and xA014]
Prof. Sean Lee's Slide
25
Handling Precise Interrupts
These values were not committed into the RF
[Figure: the results of xA00C (4) and xA010 (8) remain in the ROB; the ARF still holds R3 = 3 and R4 = 4]
Back up the PC and the current RF
Depending on the exception, the process will either abort or the instruction will be resumed from this excepting instruction
Prof. Sean Lee's Slide
26
Handling Speculative Execution
[Figure: ROB holds R1 = R1 op 10 and xB004: BEQ R1,R0,L1; ARF holds R1 = 1, R2 = 2, R3 = 3, R4 = 4, ..., R31]
Prof. Sean Lee's Slide
27
Handling Speculative Execution
[Figure: speculative entries (Spec = 1) after the branch: xC100: R2 = R3 << 2 (result 12); xC104: R1 = R2 op R3; xC108: BEQ R3,R0,L1; xD2B0: R1 = R7 op 1 (result 8)]
BEQ R1, R0, L1 is predicted TAKEN
Modified from Prof. Sean Lee's Slide
28
Handling Speculative Execution
BEQ misprediction
[Figure: xB004: BEQ R1,R0,L1 is at the head, followed by the speculative entries xC100, xC104, xD2AC (BEQ R3,R0,L1), and xD2B0; ARF R1 now holds 11]
BEQ R1, R0, L1 is resolved: actually NOT TAKEN!
Prof. Sean Lee's Slide
29
Handling Speculative Execution
[Figure: ARF (R1 = 11, R2 = 2, R3 = 3, R4 = 4, ..., R31)]
Retire the branch and clear all entries after the mis-speculated branch
Prof. Sean Lee's Slide
30
Handling Speculative Execution
[Figure: xB008: R2 = R5 << 4 is allocated; ARF unchanged (R1 = 11, R2 = 2, R3 = 3, R4 = 4)]
Continue execution from the correct path (fall-through in this case)
Prof. Sean Lee's Slide
31
RAT Recovery
  • The ARF state corresponds to the state prior to the oldest non-committed instruction
  • As instructions are processed, the RAT corresponds to the register mapping after the most recently renamed instruction
  • On a branch misprediction, wrong-path instructions are flushed from the machine
  • The RAT is left with an invalid set of mappings corresponding to the wrong-path instruction state
[Figure: ARF, RAT, and an in-flight branch (br); after the flush the RAT contents are marked ?!?]
Adapted from Prof. G. Loh's Slide
32
Solution: Stall and Drain
  • Correct-path instructions from fetch can't rename because the RAT is wrong
  • Allow all instructions to execute and commit; the ARF then corresponds to the last committed instruction
  • The ARF now corresponds to the state right before the next instruction to be renamed (foo)
  • Reset the RAT so that all mappings refer to the ARF
  • Resume renaming the new correct-path instructions from fetch
  • Pros: very simple to implement
  • Cons: performance loss due to stalls

[Figure: ARF, RAT, and the mispredicted branch (br); the stale RAT state (?!?) is reset (X)]
Prof. Sean Lee's Slide
33
Another Solution: Checkpointing
At each branch, make a copy of the RAT (the register mapping at the time of the branch)
[Figure: ARF, the current RAT, and one RAT checkpoint per in-flight branch (br), allocated from a checkpoint free pool]
On a misprediction:
1. Flush wrong-path instructions
2. Deallocate RAT checkpoints
3. Recover the RAT from the checkpoint
4. Resume renaming
Prof. Sean Lee's Slide
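The four checkpoint-recovery steps can be sketched as a small class. The names (`CheckpointedRAT`, `on_branch`, `on_mispredict`, branch ids like `"br1"`) are hypothetical, and flushing the pipeline itself (step 1) is assumed to happen elsewhere. Unlike stall-and-drain, the RAT is restored immediately instead of waiting for older instructions to commit.

```python
class CheckpointedRAT:
    """RAT with one saved copy per in-flight branch (illustrative sketch)."""

    def __init__(self, max_checkpoints):
        self.rat = {}                 # arch reg -> physical tag; absent = in ARF
        self.checkpoints = []         # (branch_id, RAT copy), oldest first
        self.max_checkpoints = max_checkpoints

    def lookup(self, reg):
        return self.rat.get(reg, reg)

    def write(self, reg, tag):
        self.rat[reg] = tag

    def on_branch(self, branch_id):
        if len(self.checkpoints) == self.max_checkpoints:
            raise RuntimeError("checkpoint free pool empty: stall rename")
        self.checkpoints.append((branch_id, dict(self.rat)))

    def on_correct(self, branch_id):
        # Branch resolved as predicted: return its checkpoint to the free pool.
        self.checkpoints = [c for c in self.checkpoints if c[0] != branch_id]

    def on_mispredict(self, branch_id):
        # Step 1 (flushing wrong-path instructions) happens elsewhere.
        pos = [b for b, _ in self.checkpoints].index(branch_id)
        saved = self.checkpoints[pos][1]
        # Step 2: free this checkpoint and those of younger (wrong-path) branches.
        del self.checkpoints[pos:]
        # Step 3: recover the RAT; step 4: renaming resumes with this mapping.
        self.rat = saved


rat = CheckpointedRAT(max_checkpoints=4)
rat.write("R1", "T13")
rat.on_branch("br1")            # snapshot: R1 -> T13
rat.write("R1", "T20")          # wrong-path renames
rat.write("R2", "T21")
rat.on_branch("br2")
rat.write("R3", "T22")
rat.on_mispredict("br1")
print(rat.rat, rat.checkpoints)
```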
34
Modern Instruction Scheduler
  • At dispatch, instructions read all available operands from the register files and store a copy in the scheduler (Tomasulo's algorithm)
  • Unavailable operands will be captured from the functional unit outputs (CDB broadcast)
  • When ready, instructions can issue directly from the scheduler without reading additional operands from any other register files (wakeup and select)

[Figure: Fetch/Dispatch feeding the ARF and PRF/ROB, the instruction scheduler, the bypass network, and the functional units, with physical register update]
Adapted from Prof. G. Loh's Slide
35
Instruction Scheduling: Wakeup and Select
  • Wakeup logic
    • Notifies instructions that the data dependencies of their input operands are resolved
    • Wakes up instructions with zero remaining input dependencies
  • Select logic
    • Chooses and fires ready instructions
    • Deals with structural hazards
  • Wakeup-select is likely on the critical path
    • Associative match

Prof. Sean Lee's Slide
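The wakeup and select loop can be sketched as follows. This is a behavioural model under stated assumptions: every instruction has one-cycle latency, `width` identical functional units are available each cycle, and selection is leftmost ready first. The destination tags T50 to T53 are hypothetical; T14, T8, T6, and T39 are borrowed from the scheduler figures as already-available tags.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)        # compare entries by identity, not by field values
class RSEntry:
    dest: str                                   # tag this instruction broadcasts
    waiting: set = field(default_factory=set)   # source tags not yet available


def wakeup(rs, tag):
    """Tag broadcast: every entry compares the tag with its sources (associative match)."""
    for entry in rs:
        entry.waiting.discard(tag)


def select(rs, width):
    """Fire up to `width` ready entries, leftmost first."""
    chosen = [e for e in rs if not e.waiting][:width]
    rs[:] = [e for e in rs if e not in chosen]
    return chosen


def run(rs, available_tags, width):
    """Issue everything in rs; returns the destination tags issued per cycle."""
    for tag in available_tags:
        wakeup(rs, tag)
    schedule = []
    while rs:
        issued = select(rs, width)
        if not issued:
            raise RuntimeError("no ready instruction: a source tag never arrives")
        schedule.append([e.dest for e in issued])
        # One-cycle latency: results wake dependents for the next select.
        for e in issued:
            wakeup(rs, e.dest)
    return schedule


def make_rs():
    return [RSEntry("T50", {"T39", "T14"}), RSEntry("T51", {"T50"}),
            RSEntry("T52", {"T8", "T6"}), RSEntry("T53", {"T50", "T52"})]


print(run(make_rs(), ["T14", "T8", "T6", "T39"], width=1))
print(run(make_rs(), ["T14", "T8", "T6", "T39"], width=2))
```

With `width=1` (a scalar scheduler) the four instructions take four cycles; with `width=2` the independent T50 and T52 issue together and their dependents issue back to back in the next cycle.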
36
Scalar Scheduler (Issue Width 1)
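[Figure: scalar scheduler. Reservation-station entries hold source tags (e.g., T14, T16, T8, T6, T42, T17, T15); the tag T39 is broadcast on the tag broadcast bus, matching sources are marked ready, and the select logic sends one ready instruction to the execute logic]
From Prof. G. Loh's Slide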
37
Superscalar Scheduler (Issue Width 4)
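[Figure: superscalar scheduler with a 4-wide tag broadcast bus (tags 3..0); entries hold source tags such as T8, T6, T42, T17, T15 and compare them against broadcast tags such as T39; the select logic sends ready instructions to the execute logic]
Snapshot of RS (only 4 entries shown)
Adapted from Prof. G. Loh's Slide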
38
Selection Logic
  • Select ready instructions to be issued
  • Goal: reduce the height of the data-flow graph (DFG)
  • Methods
    • Location-based (e.g., leftmost ready first)
      • Allows simple, faster hardware
    • Oldest ready first
      • Can use location-based (in-order issue) selection with compaction
      • Compact the issue window to the left every time instructions are issued, inserting new instructions at the right end
      • Can be slow and complex

Prof. Sean Lee's Slide
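The difference between location-based and age-based selection, and why compaction makes them coincide, can be shown with a toy issue window. In this sketch, slots are hypothetical `(age, ready)` tuples (smaller age means older) and `None` marks an empty slot; it models only the ordering logic, not the hardware cost of shifting entries.

```python
def leftmost_ready(window):
    """Location-based select: index of the first ready slot, or None."""
    for i, slot in enumerate(window):
        if slot is not None and slot[1]:
            return i
    return None


def oldest_ready(window):
    """Age-based select: index of the ready slot with the smallest age, or None."""
    ready = [(slot[0], i) for i, slot in enumerate(window)
             if slot is not None and slot[1]]
    return min(ready)[1] if ready else None


def issue_and_compact(window, idx, new_entry=None):
    """Remove the issued slot, shift the rest left, insert new_entry at the right end."""
    rest = [s for i, s in enumerate(window) if i != idx and s is not None]
    if new_entry is not None:
        rest.append(new_entry)
    return rest + [None] * (len(window) - len(rest))


# Without compaction, freed slots are refilled by newer instructions.
holes = [(7, True), (2, False), (3, True), None]
print(leftmost_ready(holes), oldest_ready(holes))   # location and age disagree

# With compaction, slot order equals age order, so leftmost == oldest.
window = [(2, False), (3, True), (5, True), (6, False)]
window = issue_and_compact(window, leftmost_ready(window), new_entry=(7, True))
print(window, leftmost_ready(window), oldest_ready(window))
```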
39
Simple Select Logic Implementation
Reservation Station: leftmost ready first
  • The Enable signal to the root cell is high whenever the functional unit is ready to execute an instruction
  • The AnyReq signal is raised if any of the input Req signals is high
[Figure: tree of arbiter cells above the reservation station]
Modified from Prof. Sean Lee's Slide
Palacharla Dissertation
40
Simple Select Logic Implementation
[Figure: reservation station feeding the select tree]
Prof. Sean Lee's Slide
Palacharla Dissertation
41
Simple Select Logic Implementation
[Figure: multiple ready instructions request issue. A tree of arbiter cells over the reservation station; each cell has inputs Req0 to Req3 and Enable and outputs Grant0 to Grant3 and AnyReq]
Prof. Sean Lee's Slide
Palacharla Dissertation
42
Simple Select Logic Implementation
[Figure: selective issue for one FU. The same arbiter tree (Req0 to Req3, Grant0 to Grant3, Enable, AnyReq) grants exactly one of the requesting instructions]
Prof. Sean Lee's Slide
Palacharla Dissertation
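The select tree can be modelled with one function per arbiter cell. This is a behavioural sketch with hypothetical function names and a two-level tree of 4-input cells: leaf cells report AnyReq upward, the root's Enable is the functional unit's readiness, and each root grant becomes the Enable of one leaf cell, which then grants its leftmost request.

```python
def arbiter_cell(reqs, enable):
    """One arbiter cell: AnyReq = OR of the requests; if enabled, grant the leftmost."""
    any_req = any(reqs)
    grants = [False] * len(reqs)
    if enable:
        for i, req in enumerate(reqs):
            if req:
                grants[i] = True
                break
    return any_req, grants


def select_tree(requests, fu_ready, fan_in=4):
    """Two-level select tree over the reservation-station request lines."""
    leaves = [requests[i:i + fan_in] for i in range(0, len(requests), fan_in)]
    # Up the tree: each leaf's AnyReq becomes a Req input of the root cell.
    leaf_any = [arbiter_cell(leaf, enable=False)[0] for leaf in leaves]
    # Root: Enable is high when the functional unit can accept an instruction.
    _, root_grants = arbiter_cell(leaf_any, fu_ready)
    # Down the tree: a root grant is the Enable of the matching leaf cell.
    grants = []
    for leaf, leaf_enable in zip(leaves, root_grants):
        grants.extend(arbiter_cell(leaf, leaf_enable)[1])
    return grants


reqs = [False] * 16
reqs[5] = reqs[6] = reqs[8] = True      # three ready instructions
print([i for i, g in enumerate(select_tree(reqs, fu_ready=True)) if g])
```

AnyReq is computed without waiting for Enable, so the request signals travel up the tree while the grant travels back down, which is what keeps the tree's delay logarithmic in the window size.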
43
Issue to Distinct Functional Units
Distributed instruction windows (e.g., MIPS R10000 or Alpha 21264)
[Figure: separate instruction windows for the Integer Unit and the FPU]
Faster to have separate instruction schedulers for different instruction types
Prof. Sean Lee's Slide
44
Dual Issue to Multiple Units (e.g., 2 Adders)
[Figure: request lines Req0 to Req3 feed the selection logic for Adder0 and the selection logic for Adder1, which produce Grant0 to Grant3]
Prof. Sean Lee's Slide
Palacharla Dissertation
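One simple way to drive two adders from the same request lines is to cascade two selection blocks and mask the request already granted to Adder0 before Adder1's block sees it. The sketch below assumes that arrangement (the figure only shows two selection blocks); function names are hypothetical.

```python
def leftmost_grant(reqs, enable=True):
    """Grant the leftmost asserted request when enabled."""
    grants = [False] * len(reqs)
    if enable:
        for i, req in enumerate(reqs):
            if req:
                grants[i] = True
                break
    return grants


def dual_select(reqs, adder0_ready=True, adder1_ready=True):
    """Two cascaded selection blocks, one per adder."""
    grant0 = leftmost_grant(reqs, adder0_ready)
    # Mask the request already granted to Adder0 so it is not issued twice.
    reqs1 = [r and not g for r, g in zip(reqs, grant0)]
    grant1 = leftmost_grant(reqs1, adder1_ready)
    return grant0, grant1


print(dual_select([False, True, False, True]))
```

The masking puts Adder1's selection in series after Adder0's, so this arrangement trades extra delay for simplicity.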
45
Memory Disambiguation
  • Can we undo stores?
  • Stores cannot be committed to memory until they are marked ready to retire
  • Completed stores wait in a store queue or store buffer
  • Disambiguate (and resolve) memory dependencies dynamically

Prof. Sean Lee's Slide
46
Memory Ordering
Source: Alpha 21264 Hardware Reference Manual (HRM)
  • A load to X bypassing an older load to X violates certain memory consistency models (e.g., sequential consistency)
  • Load-load order traps cause replays

Prof. Sean Lee's Slide
47
Load Store Queue (LSQ)
[Figure: age-ordered split LSQ (Store Queue and Load Queue) alongside the ROB]
  • Memory instructions are allocated into the LSQ in program order
  • The LSQ manages memory reference ordering
  • Unified LSQ vs. split LSQ
  • Sandy Bridge: 64 load buffers, 36 store buffers

Prof. Sean Lee's Slide
48
Issuing a Load for Execution
[Figure: Load Queue (Issued?, age, address) and Store Queue (Issued?, age, address, data) with example entries for addresses A, B, and C]
  • Each load checks against older stores
    • Associative search
    • A scalability (performance) issue

Prof. Sean Lee's Slide
49
Issuing a Load for Execution
[Figure: the same Load Queue and Store Queue entries]
  • Implementation dependent: comprehensive size matching can be prohibitively expensive
  • Simple method: forward only when a larger store (word) precedes a smaller load (half)

Prof. Sean Lee's Slide
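A load's check against older stores, including the "forward only when the store covers the load" rule, can be sketched as below. Assumptions not in the slides: addresses are byte addresses (hypothetical values standing in for A and B), memory is little-endian, a load waits if any closer older store has an unknown address (no speculation yet), and a partial overlap also makes the load wait.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoreEntry:
    age: int                 # smaller = older in program order
    addr: Optional[int]      # None: address not computed yet
    size: int                # bytes written
    data: Optional[int]      # None: data not available yet


def issue_load(load_age, addr, size, store_queue):
    """Return ("forward", value), ("memory", None) or ("wait", None)."""
    older = sorted((s for s in store_queue if s.age < load_age),
                   key=lambda s: s.age, reverse=True)
    for st in older:                              # youngest older store first
        if st.addr is None:
            return ("wait", None)                 # might alias: be conservative
        overlaps = st.addr < addr + size and addr < st.addr + st.size
        if not overlaps:
            continue
        covers = st.addr <= addr and addr + size <= st.addr + st.size
        if covers and st.data is not None:
            shift = 8 * (addr - st.addr)          # little-endian byte offset
            return ("forward", (st.data >> shift) & ((1 << (8 * size)) - 1))
        return ("wait", None)                     # partial overlap or data not ready
    return ("memory", None)                       # no older store to this address


sq = [StoreEntry(1, 0x100, 4, 0x12340000),       # word store ("A")
      StoreEntry(2, 0x200, 4, 0xFFFF1111)]       # word store ("B")
print(issue_load(3, 0x102, 2, sq))               # half load inside the word store
print(issue_load(3, 0x100, 8, sq))               # load larger than the store
```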
50
Issuing a Load for Execution
[Figure: Load Queue and Store Queue entries, including an entry with age 2 whose address is still unknown (???); the load is speculatively issued for execution]
  • Loads can be issued speculatively to shorten latency (Alpha 21264, Pentium 4 (Prescott))
  • A store, when its address is ready, checks newer loads in the Load Queue
  • Replay is needed if the speculation turns out to be incorrect (e.g., Alpha's store-load replay)

Modified from Prof. Sean Lee's Slide
51
Store Checks Premature Loads
[Figure: a store (age 2, address K) resolves its address and finds a younger load (age 3, address K) that has already issued. Conflict detected! Replay the load]
  • A store, when its address is ready, checks newer loads in the Load Queue
    • Associative search
  • Replay is needed if the speculation turns out to be incorrect (e.g., Alpha's store-load replay)

Prof. Sean Lee's Slide
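The store-side check can be sketched as a search over the load queue. This is a conservative, hypothetical model using symbolic addresses as in the slides (A, B, K) and treating only exact address matches as conflicts; it ignores the case where an intervening store already forwarded the correct value to the load, so it may replay more loads than strictly necessary.

```python
from dataclasses import dataclass


@dataclass
class LoadEntry:
    age: int          # smaller = older
    addr: str         # symbolic address, as in the slides
    issued: bool


def store_address_resolved(store_age, store_addr, load_queue):
    """Associative search for premature loads.

    Returns the ages of younger loads to the same address that already
    issued; they read a stale value and must be replayed.
    """
    return [ld.age for ld in load_queue
            if ld.age > store_age and ld.issued and ld.addr == store_addr]


lq = [LoadEntry(1, "A", True), LoadEntry(3, "K", True),
      LoadEntry(4, "K", False), LoadEntry(5, "B", True)]
print(store_address_resolved(2, "K", lq))   # the age-3 load issued too early
```

The age-4 load to K has not issued yet, so it needs no replay; it will see the store through the normal older-store check.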
52
Issuing a Store for Execution
[Figure: Load Queue and Store Queue entries (ages 4 and 6; addresses A and C)]
  • The figure shows the basic concept
  • Implementation dependent
  • Stores are not allowed to bypass loads, since allowing it has little impact on performance
  • Perform an associative search

Prof. Sean Lee's Slide
53
Issuing a Store for Execution
[Figure: a store (age 6, address C) finds an older load (age 5, address C) that has not yet issued, so the store cannot issue for execution]
Prof. Sean Lee's Slide
54
Load-Load Ordering
  • Needed for
    • Multiprocessor support
    • Maintaining the memory consistency model
  • Load-load trap invoked
    • Trap on the later, conflicting instructions
    • Replay

[Figure: Load Queue entries (age 5, address C, issued) and (age 6, address A, issued) with a load-load trap]
Prof. Sean Lee's Slide
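A load-load ordering check can be sketched in the same style as the store check. This is an assumption-laden model, not the Alpha 21264 rule: when a load issues, it looks for younger loads to the same address that already issued, and the oldest such load becomes the trap point from which execution is replayed. Addresses are symbolic and names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LoadEntry:
    age: int          # smaller = older
    addr: str         # symbolic address
    issued: bool


def load_load_check(load_age, addr, load_queue):
    """When a load issues, find younger same-address loads that already issued.

    Returns the age of the oldest such load (trap and replay from it), or None.
    """
    conflicts = [ld.age for ld in load_queue
                 if ld.age > load_age and ld.issued and ld.addr == addr]
    return min(conflicts) if conflicts else None


lq = [LoadEntry(5, "C", False), LoadEntry(6, "C", True),
      LoadEntry(7, "C", True), LoadEntry(8, "A", True)]
print(load_load_check(5, "C", lq))   # replay from the age-6 load onward
```

Replaying from the oldest conflicting load also re-executes everything after it, including the age-7 load.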
55
  • Backup Slides

56
Issue with Imprecise Interrupt
lw r5, 8(r10); add r10, r9, r8; add r12, r10, r7
L1: add r3, r1, r2; add r4, r1, r4; add r2, r4, r4
[Figure: L1: add r3, r1, r2 lies at the end of non-resident page X (instruction page fault); the following adds lie at the start of resident page X+1]
  • add instructions take one cycle
  • E.g.,
    • The load (first sequence) induces a data page fault
    • An add (second sequence) induces an instruction page fault
  • If out-of-order completion is allowed
    • r10, r12 (or r2, r4) will be modified
    • Wrong values will be used by the re-issued load
  • Interrupt classes
    • Program interrupts (exceptions or traps)
    • External interrupts (asynchronous)

Prof. Sean Lee's Slide