William Stallings Computer Organization and Architecture - PowerPoint PPT Presentation

Provided by: adrianjpul6

1
William Stallings Computer Organization and Architecture
  • Chapter 14
  • Instruction Level Parallelism and Superscalar Processors

2
Topics
  • Overview
  • Design Issues
  • Pentium 4 and PowerPC

3
Overview - What is Superscalar?
  • Originally refers to a machine designed to
    improve the performance of execution of scalar
    instructions. (Agerwala, 1987)
  • Common instructions (arithmetic, load/store,
    conditional branch) can be initiated and executed
    independently
  • Equally applicable to RISC and CISC
  • In practice usually RISC
  • Has become the standard method for implementing
    high-performance microprocessors

4
Why Superscalar?
  • Most operations are on scalar quantities (see
    RISC notes)
  • Improve these operations to get an overall
    improvement

5
Idea...
  • What if we could have more than one pipeline?
  • Ability to execute instructions independently in
    different pipelines
  • Instructions can be executed in an order
    different from the program order
  • e.g., program order: a = a + 2; b = c + d
  • reordered: b = c + d; a = a + 2
  • Do they give the same result?
  • Degree of parallelism goes up as more
    instructions are executed in parallel
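
A minimal sketch of the reordering question on this slide, assuming the two statements are `a = a + 2` and `b = c + d` (the operators and the initial values are assumptions made for illustration). Neither statement reads or writes a variable that the other writes, so both orders leave the same final state.

```python
def run(order, state):
    """Apply the named statements in the given order to a copy of state."""
    s = dict(state)
    for name in order:
        if name == "S1":        # S1: a = a + 2
            s["a"] = s["a"] + 2
        elif name == "S2":      # S2: b = c + d
            s["b"] = s["c"] + s["d"]
    return s

initial = {"a": 1, "b": 0, "c": 3, "d": 4}    # arbitrary illustrative values
program_order = run(["S1", "S2"], initial)
swapped_order = run(["S2", "S1"], initial)
print(program_order == swapped_order)
```

If S2 were changed to read `a`, the two orders could diverge, which is the situation the later dependency slides describe.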

6
General Superscalar Organization
  • 2 integer, 2 FP, and 1 memory operations can be
    executed at the same time

7
Superpipelined
  • Many pipeline stages need less than half a clock
    cycle
  • Double internal clock speed gets two tasks per
    external clock cycle
  • Degree of superpipelining: number of substages
  • Speed goes up with the degree of superpipelining
  • Superscalar allows parallel fetch and execute

8
Superscalar vs. Superpipeline

9
Example
  • 6 instructions, 4 stages
  • No pipelining: ___ time units
  • Basic pipelining: ___ time units
  • Degree of superpipelining = 2: ___ time units
  • Degree of superscalar = 2: ___ time units
  • Try generalizing it w/ n instructions and k
    stages
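
One way to approach the generalization is sketched here. It assumes an idealized machine: every stage takes one time unit, there are no stalls or dependencies, a superpipelined machine of degree m starts a new instruction every 1/m time unit, and a superscalar machine of degree m starts m instructions per time unit. The function names are hypothetical.

```python
import math

def no_pipelining(n, k):
    # Each instruction runs all k stages before the next one starts.
    return n * k

def basic_pipelining(n, k):
    # The first instruction needs k units; each later one finishes 1 unit after.
    return k + (n - 1)

def superpipelined(n, k, degree):
    # A new instruction starts every 1/degree time unit.
    return k + (n - 1) / degree

def superscalar(n, k, degree):
    # `degree` instructions enter the pipeline together each time unit.
    return k + math.ceil(n / degree) - 1

n, k = 6, 4
print(no_pipelining(n, k), basic_pipelining(n, k),
      superpipelined(n, k, 2), superscalar(n, k, 2))
```

Running it with n = 6 and k = 4 gives values that can be checked against the blanks above.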

10
Limitations of Superscalar Approach
  • Instruction level parallelism
  • Degree to which instructions can be executed in
    parallel
  • Can be maximized by:
  • Compiler-based optimisation
  • Hardware techniques
  • Limited by:
  • True data dependency
  • Procedural dependency
  • Resource conflicts
  • Output dependency
  • Antidependency

11
True Data Dependency
  • Example
  • ADD r1, r2 // r1 ← r1 + r2
  • MOVE r3, r1 // r3 ← r1
  • Can fetch and decode second instruction in
    parallel with first
  • Can NOT execute second instruction until first is
    finished
  • Also called flow dependency or write-read dependency

12
Procedural Dependency
  • Cannot execute instructions after a branch in
    parallel with instructions before a branch
  • Also, if instruction length is not fixed,
    instructions have to be decoded to find out how
    many fetches are needed
  • This prevents simultaneous fetches

13
Resource Conflict
  • Two or more instructions requiring access to the
    same resource at the same time
  • e.g. two arithmetic instructions
  • Can duplicate resources
  • e.g. have two arithmetic units

14
Dependencies

15
Dependencies - Examples
  • Example 1
  • LOAD R1 ← R2
  • ADD R3 ← R3, 1 // 1: immediate mode
  • ADD R4 ← R4, R2
  • Degree of parallelism: __
  • Example 2
  • ADD R3 ← R3, 1
  • ADD R4 ← R3, R2
  • STORE R4 ← R0 // R4: register indirect
  • Degree of parallelism: __
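
The two examples can be checked with a small sketch. It assumes that "degree of parallelism" means the largest group of instructions that could execute in the same step when only true (write-read) dependencies are considered, and that the STORE reads both R4 (the address) and R0 (the data). The `(text, writes, reads)` encoding and the function name are hypothetical.

```python
from collections import Counter

def parallelism(instrs):
    """Largest number of instructions that share the same earliest step."""
    last_writer = {}    # register -> index of the most recent writer
    level = []          # level[i] = earliest step in which instrs[i] can run
    for i, (_, writes, reads) in enumerate(instrs):
        # An instruction must wait one step after each producer it reads from.
        deps = [level[last_writer[r]] + 1 for r in reads if r in last_writer]
        level.append(max(deps, default=0))
        for w in writes:
            last_writer[w] = i
    return max(Counter(level).values())

example1 = [
    ("LOAD R1 <- R2",     {"R1"}, {"R2"}),
    ("ADD R3 <- R3, 1",   {"R3"}, {"R3"}),
    ("ADD R4 <- R4, R2",  {"R4"}, {"R4", "R2"}),
]
example2 = [
    ("ADD R3 <- R3, 1",   {"R3"}, {"R3"}),
    ("ADD R4 <- R3, R2",  {"R4"}, {"R3", "R2"}),
    ("STORE R4 <- R0",    set(),  {"R4", "R0"}),   # writes memory, not a register
]
print(parallelism(example1), parallelism(example2))
```

In the first example no instruction reads a register written by an earlier one, while the second forms a chain through R3 and then R4.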

(Figure: main memory (MM) with registers R0 and R4)

16
Design Issues
  • Instruction level parallelism
  • Instructions in a sequence are independent
  • Execution can be overlapped
  • Governed by data and procedural dependency
  • Machine Parallelism
  • Ability to take advantage of instruction level
    parallelism
  • Governed by number of parallel pipelines, i.e.,
    number of instructions that can be fetched and
    executed at a time

17
Instruction Issue Policy (1)
  • Instruction issue
  • Process of initiating instruction execution in
    the processor's functional units
  • Instruction-issue policy
  • Protocol used to issue instructions
  • Processor tries to look ahead to locate
    instructions that can be brought into pipeline
    and executed

18
Instruction Issue Policy (2)
  • Important orderings: the order in which
  • instructions are fetched
  • instructions are executed
  • instructions change registers and memory
  • Order(s) changed to optimize performance
  • Constraint: result must be CORRECT!
  • Three categories of instruction-issue policies
  • In-order issue, in-order completion
  • In-order issue, out-of-order completion
  • Out-of-order issue, out-of-order completion

19
In-Order Issue In-Order Completion
  • Issue instructions in the order they occur
  • Not very efficient
  • May fetch more than one instruction
  • Instructions must stall if necessary

20
In-Order Issue In-Order Completion (Diagram)
(Diagram: pipeline stages against time. Labels: "Takes 2 cycles", "F.D.", "D.D.", "F.D.", "3 Functional Units", "Assumptions")

21
In-Order Issue Out-of-Order Completion (1)
  • Any number of instructions may be in the
    execution stage at a time
  • Up to maximum degree of machine parallelism
  • Instruction issuing is stalled by resource
    conflicts, data or procedural dependencies
  • Output dependency must be solved

22
In-Order Issue Out-of-Order Completion (2)
  • Output dependency - Example
  • R3 ← R3 + R5 (I1)
  • R4 ← R3 + 1 (I2)
  • R3 ← R5 + 1 (I3)
  • I2 depends on result of I1 - data dependency
  • If I3 completes before I1, I1's later write leaves
    the wrong value in R3 → output dependency
  • Write-write dependency
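
A minimal sketch of why completion order matters for I1 and I3, assuming both operators are `+` and using arbitrary initial register values. Each result is computed from the operands read at issue time; only the order of the write-back to R3 changes.

```python
regs = {"R3": 10, "R5": 20}          # arbitrary illustrative values

# Results computed from the operands each instruction read at issue time.
results = {
    "I1": regs["R3"] + regs["R5"],   # I1: R3 <- R3 + R5
    "I3": regs["R5"] + 1,            # I3: R3 <- R5 + 1
}

def final_r3(completion_order):
    """Write results to R3 in completion order; the last write wins."""
    r3 = regs["R3"]
    for name in completion_order:
        r3 = results[name]
    return r3

in_program_order = final_r3(["I1", "I3"])   # I3's value survives, as the program intends
out_of_order = final_r3(["I3", "I1"])       # I1's stale value overwrites I3's
print(in_program_order, out_of_order)
```

Any later instruction that reads R3 would see I1's value in the second case, even though the program says it should see I3's.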

23
In-Order Issue Out-of-Order Completion (Diagram)
(Diagram: pipeline stages against time)

24
Out-of-Order Issue Out-of-Order Completion
  • Decouple decode from execution by a buffer
    (instruction window) that stores decoded
    instructions
  • Can continue to fetch and decode until this
    buffer is full
  • When a functional unit becomes available, an
    instruction whose operands are ready can be issued
  • Since instructions have been decoded, processor
    can look ahead

25
Out-of-Order Issue Out-of-Order Completion (Diagram)
(Diagram: pipeline)

26
Antidependency
  • Read-write dependency
  • R3 ← R3 + R5 (I1)
  • R4 ← R3 + 1 (I2)
  • R3 ← R5 + 1 (I3)
  • R7 ← R3 + R4 (I4)
  • I3 cannot complete before I2 starts as I2 needs a
    value in R3 and I3 changes R3
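
A sketch that classifies the dependencies in the four-instruction example, using the slides' terms (write-read, write-write, read-write). The `(name, destination, sources)` encoding and the function name are hypothetical, and immediate operands are left out because they are not registers.

```python
def classify(program):
    """Return (kind, earlier, later, register) for each dependency found."""
    last_writer = {}    # register -> name of the most recent writer
    readers = {}        # register -> instructions that read it since that write
    found = []
    for name, dest, srcs in program:
        for s in srcs:
            if s in last_writer:                      # true data dependency
                found.append(("write-read", last_writer[s], name, s))
        for r in readers.get(dest, []):               # antidependency
            found.append(("read-write", r, name, dest))
        if dest in last_writer:                       # output dependency
            found.append(("write-write", last_writer[dest], name, dest))
        for s in srcs:
            readers.setdefault(s, []).append(name)
        last_writer[dest] = name
        readers[dest] = []                            # new value, no readers yet
    return found

program = [
    ("I1", "R3", ("R3", "R5")),
    ("I2", "R4", ("R3",)),
    ("I3", "R3", ("R5",)),
    ("I4", "R7", ("R3", "R4")),
]
for dep in classify(program):
    print(dep)
```

The read-write entry between I2 and I3 on R3 is the antidependency described on this slide; the write-write entry between I1 and I3 is the output dependency from the earlier example.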

27
Register Renaming (1)
  • Output dependencies and antidependencies occur
    because register contents may not reflect the
    correct ordering from the program
  • May result in a pipeline stall
  • Solution: allocate registers dynamically
  • i.e. registers are not specifically named

28
Register Renaming (2)
  • By processor hardware
  • Associated with values needed by instructions at
    various points in time
  • When a new register value is created, a new
    register is allocated for it
  • Subsequent instructions that access that value
    as a source must go through renaming

29
Register Renaming example
  • Original (left) and after renaming (right):
  • I1: R3 ← R3 + R5    I1: R3b ← R3a + R5a
  • I2: R4 ← R3 + 1     I2: R4a ← R3b + 1
  • I3: R3 ← R5 + 1     I3: R3c ← R5a + 1
  • I4: R7 ← R3 + R4    I4: R7b ← R3c + R4a
  • Q: Where are the dependencies?
  • Without subscript refers to logical register in
    instruction
  • With subscript is hardware register allocated
  • Write-write and read-write dependencies are gone!
  • Q: Can we get rid of write-read dependencies?
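
Renaming can be sketched as a table that maps each logical register to its current hardware register. This is an illustration under assumptions, not the mechanism of any particular processor: the physical names P0, P1, ... are hypothetical (the slide uses letter subscripts instead), the pool of hardware registers is unlimited, and registers are never freed.

```python
import itertools

def rename(program):
    """Give every write a fresh physical register; reads use the current mapping."""
    fresh = itertools.count()
    mapping = {}                          # logical register -> physical register

    def current(reg):
        if reg not in mapping:            # first use of an initial value
            mapping[reg] = f"P{next(fresh)}"
        return mapping[reg]

    renamed = []
    for name, dest, srcs in program:
        new_srcs = tuple(current(s) for s in srcs)   # look up sources first
        mapping[dest] = f"P{next(fresh)}"            # then allocate the destination
        renamed.append((name, mapping[dest], new_srcs))
    return renamed

program = [
    ("I1", "R3", ("R3", "R5")),
    ("I2", "R4", ("R3",)),
    ("I3", "R3", ("R5",)),
    ("I4", "R7", ("R3", "R4")),
]
renamed = rename(program)
for row in renamed:
    print(row)
```

After renaming, no two instructions write the same hardware register, so the write-write and read-write dependencies disappear. I2 still reads the register I1 writes, and I4 still reads the registers written by I3 and I2, so the write-read dependencies remain.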

30
Machine Parallelism
  • Three hardware techniques to enhance performance
  • Duplication of Resources
  • Out of order issue
  • Renaming
  • Not worth duplicating functional units without
    register renaming
  • Need instruction window large enough (more than 8)

31
Branch Prediction
  • 80486 fetches both next sequential instruction
    after branch and branch target instruction
  • → Gives two-cycle delay if branch taken
  • Pre-RISC technique

32
RISC - Delayed Branch
  • Calculate result of branch before unusable
    instructions pre-fetched
  • → Always execute single instruction immediately
    following branch
  • → Keeps pipeline full while fetching new
    instruction stream
  • Not as good for superscalar, because
  • Multiple instructions need to execute in delay
    slot
  • → Instruction dependence problems
  • Revert to branch prediction

33
Superscalar Execution

34
Superscalar Implementation
  • Simultaneously fetch multiple instructions
  • Logic to determine true dependencies involving
    register values
  • Mechanisms to communicate these values
  • Mechanisms to initiate multiple instructions in
    parallel
  • Resources for parallel execution of multiple
    instructions
  • Mechanisms for committing process state in
    correct order
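
The last item, committing state in the correct order, can be sketched with a queue that holds instructions in program order and retires only from its head. Such a structure is commonly called a reorder buffer; that term, the function name, and the sample completion order are assumptions for illustration and do not come from the slides.

```python
from collections import deque

def commit_in_order(program_order, completion_order):
    """For each completion event, list the instructions that can now commit."""
    buffer = deque(program_order)   # oldest instruction at the left
    completed = set()
    log = []
    for name in completion_order:
        completed.add(name)
        retired = []
        # Only the oldest instruction may commit, and only once it has completed.
        while buffer and buffer[0] in completed:
            retired.append(buffer.popleft())
        log.append((name, retired))
    return log

log = commit_in_order(["I1", "I2", "I3", "I4"], ["I2", "I1", "I4", "I3"])
for event in log:
    print(event)
```

I2 finishes first but cannot commit until I1 has, so the sequence of commits still follows program order.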

35
Pentium 4
  • 80486 - CISC
  • Pentium: some superscalar components
  • Two separate integer execution units
  • Pentium Pro: full-blown superscalar
  • Subsequent models refine and enhance superscalar
    design

36
Pentium 4 Block Diagram

37
Pentium 4 Operation
  • Fetch instructions from memory in order of static
    program
  • Translate instruction into one or more fixed
    length RISC instructions (micro-operations)
  • Execute micro-ops on superscalar pipeline
  • micro-ops may be executed out of order
  • Commit results of micro-ops to register set in
    original program flow order
  • Outer CISC shell with inner RISC core
  • Inner RISC core pipeline at least 20 stages
  • Some micro-ops require multiple execution stages
  • Longer pipeline
  • cf. five-stage pipeline on x86 up to Pentium

38
Pentium 4 Pipeline

39
Pentium 4 Pipeline Operation (1)

40
Pentium 4 Pipeline Operation (2)

41
Pentium 4 Pipeline Operation (3)

42
Pentium 4 Pipeline Operation (4)

43
Pentium 4 Pipeline Operation (5)

44
Pentium 4 Pipeline Operation (6)

45
PowerPC
  • Direct descendant of IBM 801, RT PC and RS/6000
  • All are RISC
  • RS/6000 first superscalar
  • PowerPC 601 superscalar design similar to RS/6000
  • Later versions extend superscalar concept

46
PowerPC 601 General View

47
PowerPC 601 Pipeline Structure

48
PowerPC 601 Pipeline

49
Required Reading
  • Stallings chapter 14
  • Manufacturers' web sites
  • IMPACT web site: research on predicated execution