Instruction Set Architectures: RISC, CISC, 64-bit Processors

Provided by: mot112
1
Instruction Set Architectures: RISC, CISC,
64-bit Processors
2
CISC (Complex Instruction Set Computers)
3
The Rationale for CISC
  • One of the most visible forms of evolution
    associated with computers is that of programming
    languages
  • As the cost of hardware has dropped, the relative
    cost of software has risen.
  • Complexity of modern software has increased the
    prevalence of faults (bugs).
  • Thus, the major cost in the lifecycle of a system
    is software, not hardware.

4
The Rationale for CISC
  • The response from researchers and industry has
    been to develop ever more powerful and complex
    high-level languages.
  • These high-level languages (HLL) allow the
    programmer to express algorithms more concisely,
    take care of much of the detail, and naturally
    support structured programming and
    object-oriented design.
  • This solution gave rise to another problem, known
    as the semantic gap. This is the difference
    between the operations provided in HLLs and those
    provided in computer architecture.

5
The Rationale for CISC
  • Symptoms of this gap include
  • Execution inefficiency
  • Excessive program size
  • Compiler complexity
  • Designers responded with architectures intended
    to close this gap. Key features include
  • Large instruction sets
  • Dozens of addressing modes
  • Various HLL statements implemented in hardware.

6
The Rationale for CISC
  • Such complex instruction sets are intended to
  • Ease the task of the compiler writer
  • Improve execution efficiency, because complex
    sequences of operations can be implemented in
    microcode
  • Provide support for even more complex and
    sophisticated HLLs.

7
Motivations for CISC
  • Compiler simplification.
  • The task of the compiler writer is to generate
    machine instructions for each HLL statement. If
    there are machine instructions that resemble HLL
    statements, this task is simplified.
  • This reasoning has been disputed by RISC
    researchers. They have found that CISC
    instructions are often hard to exploit because
    the compiler must find those cases that exactly
    fit the construct.
  • The task of optimizing the generated code to
    minimize code size, reduce instruction execution
    count, and enhance pipelining is much more
    difficult with a complex instruction set.
  • Most of the instructions in a compiled program
    are the relatively simple ones.

8
Motivations for CISC
  • Smaller programs.
  • Because the program takes up less memory, there
    is a savings in that resource. Memory today is
    relatively inexpensive, so this advantage is no
    longer compelling.
  • Smaller programs should improve performance.
    This will happen in two ways
  • Fewer instructions means fewer instruction bytes
    to be fetched.
  • In a paging environment, smaller programs occupy
    fewer pages, reducing page faults.
  • The problem with this line of reasoning is that
    it is not obvious that a CISC program will be
    smaller than a corresponding RISC program. In
    many cases, the CISC program, expressed in
    symbolic machine language, may be shorter (i.e.,
    fewer instructions), but the number of bits of
    memory occupied may not be noticeably smaller.

9
Motivations for CISC
  • Improved performance.
  • It seems to make sense that a complex HLL
    operation will execute more quickly as a single
    machine instruction than as a set of more
    primitive instructions.
  • Because of the bias toward the simpler
    instructions, this may not be so.
  • The entire control unit must be made more
    complex, and/or the microprogram control store
    must be made larger to accommodate a richer
    instruction set. Both of these factors increase
    the execution time of the simple instructions.

10
Motivations for CISC
  • It is far from clear that CISC is the appropriate
    solution. This has led a number of groups to
    pursue the opposite path.

11
RISC (Reduced Instruction Set Computers)
12
The Rationale for RISC
  • Meanwhile, a number of studies have been done to
    determine the characteristics and patterns of
    execution of machine instructions generated from
    HLL programs.
  • The results of these studies inspired some
    researchers to look for a different approach.
  • Namely, to make the architecture that supports
    the HLL simpler, rather than more complex.

13
RISC
  • RISC systems have been defined and designed in a
    variety of ways; the key elements shared by most
    designs are
  • A limited and simple instruction set.
  • A large number of general-purpose registers, and
    the use of compiler technology to optimize
    register usage.
  • An emphasis on optimizing the instruction
    pipeline.

14
Characteristics of RISC Architectures
  • Although there are a variety of approaches taken
    to RISC architectures, certain characteristics
    are common to all of them
  • One instruction per cycle: RISC machine
    instructions comprise only one cycle of fetch,
    execute, store. With simple, one-cycle
    instructions, there is no need for microcode (as
    in CISC); machine instructions can be hardwired.
    Such instructions should execute faster than
    comparable machine instructions on CISC machines,
    as it is not necessary to access a microprogram
    control store.
  • Register-to-register operation: If most
    operations are register-to-register, this
    simplifies the instruction set and therefore the
    control unit. For example, a RISC instruction
    set may include only one or two ADD instructions,
    whereas the VAX has 25 different ADD instructions.
    This also encourages the optimization of register use.

15
Characteristics of RISC Architectures
  • Simple addressing modes: Almost all RISC
    instructions use simple register addressing.
    Complex addressing modes can be synthesized in
    software from simple ones. Again, this design
    feature simplifies the instruction set and the
    control unit.
  • Simple instruction formats: Generally, only one
    or a few formats are used. Instruction length is
    fixed and aligned on word boundaries. Field
    locations, especially the opcode, are fixed.
    This yields a number of benefits:
  • With fixed fields, opcode decoding and register
    operand accessing can occur simultaneously.
  • Simplified formats simplify the control unit.
  • Instruction fetching is optimized because
    word-length units are fetched.
  • Alignment on word boundary also means that a
    single instruction does not cross page
    boundaries.
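
The benefit of fixed field locations can be illustrated with a short sketch. The field layout below is borrowed from the MIPS R-type format, which these slides do not discuss; it is used here only as an assumed example of a fixed 32-bit format. Because every field sits at a known bit position, each one can be extracted independently with a shift and a mask, without first interpreting the opcode.

```python
from typing import NamedTuple


class RType(NamedTuple):
    """Fields of an assumed MIPS-style R-type instruction word."""
    opcode: int  # bits 31..26
    rs: int      # bits 25..21, first source register
    rt: int      # bits 20..16, second source register
    rd: int      # bits 15..11, destination register
    shamt: int   # bits 10..6, shift amount
    funct: int   # bits 5..0, function code


def decode_rtype(word: int) -> RType:
    """Slice a 32-bit word into fixed-position fields with shifts and masks."""
    return RType(
        opcode=(word >> 26) & 0x3F,
        rs=(word >> 21) & 0x1F,
        rt=(word >> 16) & 0x1F,
        rd=(word >> 11) & 0x1F,
        shamt=(word >> 6) & 0x1F,
        funct=word & 0x3F,
    )


if __name__ == "__main__":
    # 0x012A4020 is used here as a sample instruction word.
    print(decode_rtype(0x012A4020))
```

None of the field extractions depends on another, which is the software analogue of decoding the opcode and reading the register operands at the same time.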

16
Potential Benefits of RISC
  • These characteristics can be assessed to
    determine the potential benefits of RISC. These
    benefits fall into two main categories:
    performance and VLSI implementation.

17
Performance
  • More effective optimizing compilers can be
    developed. With more primitive instructions,
    there are more opportunities for moving functions
    out of loops, reorganizing code for efficiency,
    maximizing register utilization, etc.
  • With simple instructions (and little or no
    microcode), only a relatively simple control unit
    is required. It is likely that a simple control
    unit could be made to execute faster than a more
    complex one.
  • Instruction pipelining. RISC researchers feel
    that the instruction pipelining technique can be
    applied much more effectively with a reduced
    instruction set.

18
VLSI Implementation
  • Chip real estate: a CISC processor typically
    devotes about half of its area to the control
    unit. A RISC processor typically uses only about
    10% of the area for the control unit, using
    precious real estate for registers instead.
  • Design and implementation time. The simple
    control unit and circuitry of RISC result in
    faster design cycles.

19
CISC vs. RISC Characteristics
  • The RISC vs. CISC controversy is now 20 years old.
  • After the initial enthusiasm for RISC machines,
    there has been a growing realization that
  • RISC designs may benefit from the inclusion of
    some CISC features, and
  • Vice versa.
  • The result is that more recent RISC designs,
    such as PowerPC and SPARC, are no longer "pure" RISC,
    and the more recent CISC designs, notably the
    Pentium, incorporate core RISC characteristics.

20
Example CISC ISA: Intel x86, 386/486/Pentium
  • Operand sizes
  • Can be 8, 16, 32, 48, 64, or 80 bits long.
  • Also supports string operations.
  • Instruction Encoding
  • The smallest instruction is one byte.
  • The longest instruction is 12 bytes long.
  • The first bytes generally contain the opcode,
    mode specifiers, and register fields.
  • The remaining bytes are for address displacement
    and immediate data.
  • 12 addressing modes
  • Register.
  • Immediate.
  • Direct.
  • Base.
  • Base Displacement.
  • Index Displacement.
  • Scaled Index Displacement.
  • Based Index.
  • Based Scaled Index.
  • Based Index Displacement.
  • Based Scaled Index Displacement.
  • Relative.

21
Example RISC ISA: PowerPC
  • Operand sizes
  • Four operand sizes: 1, 2, 4, or 8 bytes.
  • Instruction Encoding
  • Instruction set has 15 different formats with
    many minor variations.
  • All are 32 bits in length.
  • 8 addressing modes
  • Register direct.
  • Immediate.
  • Register indirect.
  • Register indirect with immediate index (loads and
    stores).
  • Register indirect with register index (loads and
    stores).
  • Absolute (jumps).
  • Link register indirect (calls).
  • Count register indirect (branches).

22
Example RISC ISA: HP Precision
Architecture, HP-PA
  • Operand sizes
  • Five operand sizes, ranging in powers of two from
    1 to 16 bytes.
  • Instruction Encoding
  • Instruction set has 12 different formats.
  • All are 32 bits in length.
  • 7 addressing modes
  • Register
  • Immediate
  • Base with displacement
  • Base with scaled index and displacement
  • Predecrement
  • Postincrement
  • PC-relative

23
Example RISC ISA: SPARC
  • Operand sizes
  • Four operand sizes: 1, 2, 4, or 8 bytes.
  • Instruction Encoding
  • Instruction set has 3 basic instruction formats
    with 3 minor variations.
  • All are 32 bits in length.
  • 5 addressing modes
  • Register indirect with immediate displacement.
  • Register indirect indexed by another register.
  • Register direct.
  • Immediate.
  • PC relative.

24
Example RISC ISA: Compaq Alpha AXP
  • 4 addressing modes
  • Register direct.
  • Immediate.
  • Register indirect with displacement.
  • PC-relative.
  • Operand sizes
  • Four operand sizes: 1, 2, 4, or 8 bytes.
  • Instruction Encoding
  • Instruction set has 7 different formats.
  • All are 32 bits in length.

25
Which is winning?
  • It turns out to be a non-issue.
  • Intel clearly can get their machines to run fast
    (3 Giga-Hertz)
  • How?
  • By making the microarchitecture RISC-like and
    converting CISC to RISC during decode.
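
The idea of converting CISC to RISC during decode can be sketched as follows. This is a toy model, not Intel's actual decoder: the function name `crack`, the micro-op names (`load`, `store`), the temporaries `t0` and `t1`, and the `[name]` notation for memory operands are all assumptions made for illustration. An instruction with a memory operand is split into simple load, compute, and store steps, each of which touches memory or computes, but not both.

```python
from typing import List, Tuple

MicroOp = Tuple[str, str, str]  # (operation, destination, source)


def is_mem(operand: str) -> bool:
    """Operands written as "[name]" denote memory; anything else is a register."""
    return operand.startswith("[") and operand.endswith("]")


def crack(op: str, dst: str, src: str) -> List[MicroOp]:
    """Split 'dst = dst op src' into load / compute / store micro-ops."""
    uops: List[MicroOp] = []
    a, b = dst, src
    if is_mem(src):
        uops.append(("load", "t1", src))   # bring the source operand into a temporary
        b = "t1"
    if is_mem(dst):
        uops.append(("load", "t0", dst))   # bring the destination's old value in
        a = "t0"
    uops.append((op, a, b))                # register-to-register compute step
    if is_mem(dst):
        uops.append(("store", dst, "t0"))  # write the result back to memory
    return uops


if __name__ == "__main__":
    for uop in crack("add", "[x]", "r1"):
        print(uop)
```

A register-to-register instruction passes through unchanged, so only the complex cases pay for the extra steps.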

26
Another Example
  • Transmeta Crusoe
  • Unknown architecture
  • You can't buy the chip without the software
  • Converts IA-32 to intermediate machine ISA
  • Executes that machine ISA

27
64-Bit Processors
28
32-bit Computing
  • In computer architecture, a word is defined as a
    unit of data that can be addressed and moved
    between the computer processor and the storage
    area.
  • In 32-bit computing a word is 32 bits.
  • Usually, the defined bit-length of a word is
    equivalent to the width of the computer's data
    bus (and registers) so that a word can be moved
    in a single operation from the storage to the
    processor registers

29
32-bit Computing
  • In a 32-bit microprocessor
  • There are 32-bit general purpose registers in
    the processor.
  • There are 2^32 bytes = 4 GB of addressable memory.

30
64-bit Computing
  • The simplest definition is widening the
    processing word in the architecture to 64 bits.
  • The addressable memory increases from 4 GB to
    2^64 bytes, about 18 billion GB
  • Size of registers extended to 64 bits
  • Integer and address data up to 64 bits in length
    can now be operated on
  • 2^64 (about 1.8 x 10^19) integers can be
    represented with 64 bits vs. 4.3 x 10^9 with 32 bits
  • Dynamic range has increased by a factor of 4.3
    billion!
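
The figures on this slide follow from powers of two, as the short check below shows. The helper name `addressable_bytes` and the constant `GIB` are assumptions introduced for the sketch. Note that 2^64 bytes is about 17.2 billion when counted in binary gigabytes (2^30 bytes) and about 18.4 billion when counted in decimal gigabytes (10^9 bytes).

```python
GIB = 2 ** 30  # one binary gigabyte, in bytes


def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses for a given address width."""
    return 2 ** address_bits


if __name__ == "__main__":
    for bits in (32, 64):
        n = addressable_bytes(bits)
        print(f"{bits}-bit: {n} byte addresses = {n // GIB} GiB")
    # Growth in range when moving from 32 to 64 bits.
    print("ratio:", addressable_bytes(64) // addressable_bytes(32))
```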

31
64-bit Processor Basics
  • Stepping up from 32 to 64 bits does not mean
    doubling performance
  • Certain applications will benefit, others will not

32
What Applications Can Benefit Most From 64-bit?
  • Large databases
  • Business and scientific simulation and modeling
    programs
  • Highly graphics-intensive software (CAD, 3-D
    games)
  • Cryptography
  • Etc.

33
Benefits of 64-bit Computing
  • Allowing applications to store vast amounts of
    data in main memory.
  • Allowing complex calculations with a high level
    of precision.
  • Manipulating data and executing instructions in
    chunks that are twice as large as in 32-bit
    computing.

34
Intel Strategy
35
Intel's Approach to the Market
  • Only producing a 64-bit processor for servers and
    workstations: Itanium
  • Intel believes there is not currently enough
    market demand for 64-bit in PCs
  • There is still room to continue to improve
    Pentium 4 for desktop customers

36
64-bit Computing: Two industry-standard architectures,
different usages
x86: The platform of choice just got better
  • Broadest software choice
  • Versatile 32- and 64-bit support
  • Enterprise proven
EPIC: Intel's highest performance, most reliable server
platform for RISC replacement
  • High-end performance
  • Reliability/data integrity
  • OS, HW, SW choice

37
Migration To 64-bit
  • Step 1: Validate IA32 binaries to run on 64-bit OS
    (OK? yes/no)
  • Step 2: Make the code 64-bit clean
  • Step 3: Compile
  • Step 4: Optimize for x86, or optimize for EPIC
38
The Move to Intel Architecture 64-bit and
Multi-core
39
AMD Strategy
40
AMD's Approach
  • Provide a bridge between the 32-bit present and
    the 64-bit future
  • Design processors for the server, workstation,
    and personal computing markets
  • Beyond 64 bits: improve interaction of processor
    with memory and I/O

41
Windows for x64-based Systems: 32-bit and 64-bit
on a single platform
  • An AMD64-based processor can run both 32- and
    64-bit Windows operating systems
  • Boot sequence: START, boot up using 32-bit BIOS,
    then look at the OS
  • 32-bit: load 32-bit OS, run 32-bit applications
  • 64-bit: load 64-bit OS, run 32- and 64-bit
    applications
42
Before AMD64: Computing infrastructure
islands on either side of the wall
  • Platform A: 32-bit native-only system
  • Platform B: 64-bit native-only system
43
AMD's Industry Vision: Compatible systems that
bridge from 32- to 64-bit
AMD Single Platform
  • Leverages existing infrastructure
  • Runs existing 32-bit applications natively with
    unsurpassed performance
  • No tools or O/S work needed
  • Runs existing 32-bit applications on 64-bit O/S
  • Take full advantage of 4GB local memory
  • Allows customers to migrate to 64-bit performance
    according to their schedule
  • Low learning curve for users and support staff

44
The Role of Compilers
45
Compiler and ISA
  • ISA decisions are no longer made just to ease
    assembly language (AL) programming
  • Because of HLLs, the ISA is a compiler target today
  • Performance of a computer will be significantly
    affected by compiler
  • Understanding the compiler technology today is
    critical to designing and efficiently
    implementing an instruction set
  • Architecture choice affects the code quality and
    the complexity of building a compiler for it

46
Goal of the Compiler
  • Primary goal is correctness
  • Second goal is speed of the object code
  • Others
  • Speed of the compilation
  • Ease of providing debug support
  • Inter-operability among languages
  • Flexibility of the implementation - languages may
    not change much but they do evolve - e.g.
    Fortran 66 to HPF

Make the frequent cases fast and the rare case
correct
47
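Typical Modern Compiler Structure
[Figure: compiler passes communicating through a common
intermediate representation. The passes are labeled by
their dependencies:]
  • Somewhat language dependent, largely machine
    independent
  • Small language dependencies, slight machine
    dependencies
  • Language independent, highly machine dependent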
48
Typical Modern Compiler Structure (Cont.)
  • Multi-pass structure makes it easier to write
    bug-free compilers
  • Transform high-level, more abstract
    representations into progressively lower-level
    representations, eventually reaching the
    instruction set
  • Compilers must make assumptions about the ability
    of later steps to deal with certain problems
  • Ex. 1: choose which procedure calls to expand
    inline before knowing the exact size of the
    procedure being called
  • Ex. 2: global common sub-expression elimination
  • Find two instances of an expression that compute
    the same value and save the result of the first
    one in a temporary
  • The temporary must be in a register, not memory
    (for performance)
  • Assume the register allocator will allocate the
    temporary to a register
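
Common sub-expression elimination can be sketched on straight-line three-address code. This is a simplified version restricted to a single basic block, not the global form named on the slide, and the instruction encoding `(dest, op, arg1, arg2)` and the `copy` operation are assumptions made for illustration. The first computation of an expression is remembered; a later identical computation is replaced by a copy from the variable that still holds the value. Whether that variable ends up in a register is left to the register allocator, as the slide assumes.

```python
from typing import Dict, List, Optional, Tuple

Instr = Tuple[str, str, str, Optional[str]]  # (dest, op, arg1, arg2)
Expr = Tuple[str, str, Optional[str]]        # (op, arg1, arg2)


def eliminate_common_subexpressions(block: List[Instr]) -> List[Instr]:
    """Replace recomputed expressions in one basic block with copies."""
    available: Dict[Expr, str] = {}  # expression -> variable holding its value
    out: List[Instr] = []
    for dest, op, a, b in block:
        key = (op, a, b)
        holder = available.get(key)
        if holder is not None:
            out.append((dest, "copy", holder, None))  # reuse the earlier result
        else:
            out.append((dest, op, a, b))
        # dest has a new value: forget expressions that read it or were held in it.
        available = {
            k: v for k, v in available.items()
            if v != dest and dest not in (k[1], k[2])
        }
        # Record the expression unless dest overwrote one of its own operands.
        if holder is None and dest not in (a, b):
            available[key] = dest
    return out


if __name__ == "__main__":
    code = [
        ("t1", "+", "a", "b"),
        ("t2", "+", "a", "b"),   # same expression: becomes a copy of t1
        ("a", "+", "t2", "c"),   # redefines a, so a + b is no longer available
        ("t3", "+", "a", "b"),   # must be recomputed
    ]
    for instr in eliminate_common_subexpressions(code):
        print(instr)
```

The invalidation step is what keeps the transformation correct: once an operand is redefined, an expression that looks identical no longer computes the same value.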

49
Optimization Types
  • High level - done at source code level
  • Procedure called only once - so put it in-line
    and save CALL
  • Local - done on basic sequential block
    (straight-line code)
  • Common sub-expressions produce same value
  • Constant propagation - replace constant valued
    variable with the constant - saves multiple
    variable accesses with same value
  • Global - same as local but done across branches
  • Code motion - remove code from loops that compute
    same value on each pass and put it before the
    loop
  • Simplify or eliminate array addressing
    calculations in loop

50
Optimization Types (Cont.)
  • Register allocation
  • Use graph coloring (graph theory) to allocate
    registers
  • NP-complete
  • Heuristic algorithm works best when there are at
    least 16 (and preferably more) registers
  • Processor-dependent optimization
  • Strength reduction: replace multiply with shift
    and add sequence
  • Pipeline scheduling: reorder instructions to
    minimize pipeline stalls
  • Branch offset optimization: reorder code to
    minimize branch offsets

51
Register Allocation
  • One of the most important optimizations
  • Based on graph coloring techniques
  • Construct graph of possible allocations to a
    register
  • Use graph to allocate registers efficiently
  • Goal is to achieve 100% register allocation for
    all active variables.
  • Graph coloring works best when there are at least
    16 general-purpose registers available for
    integers and more for floating-point variables.
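
A minimal sketch of register allocation by graph coloring follows. It uses a simple greedy heuristic (highest-degree variable first), which is one of several possible heuristics and is not claimed to be the one the slides have in mind. The interference graph, the function name `color_registers`, and the use of `None` to mark a variable that must be spilled to memory are assumptions for illustration. Two variables are connected when they are live at the same time and therefore cannot share a register.

```python
from typing import Dict, Optional, Set


def color_registers(
    interference: Dict[str, Set[str]], num_regs: int
) -> Dict[str, Optional[int]]:
    """Greedily assign register numbers; None marks a variable to spill."""
    # Visit the most constrained variables first; ties broken by name.
    order = sorted(interference, key=lambda v: (-len(interference[v]), v))
    assignment: Dict[str, Optional[int]] = {}
    for var in order:
        # Registers already taken by neighbours that have been colored.
        taken = {
            assignment[n]
            for n in interference[var]
            if assignment.get(n) is not None
        }
        free = [r for r in range(num_regs) if r not in taken]
        assignment[var] = free[0] if free else None
    return assignment


if __name__ == "__main__":
    graph = {
        "a": {"b", "c"},
        "b": {"a", "c"},
        "c": {"a", "b", "d"},
        "d": {"c"},
    }
    print(color_registers(graph, 3))  # enough registers for every variable
    print(color_registers(graph, 2))  # too few: one variable is spilled
```

With more registers available, the greedy pass finds a free color more often, which is consistent with the slide's remark that the heuristic works best with at least 16 registers.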

52
Constant propagation
    a = 5;
    ...               // no change to a so far
    if (a relop b) . . .
The condition (a relop b), where relop stands for the
comparison operator, can be replaced by (5 relop b).
This could free a register when the comparison is
executed. When applied systematically, constant
propagation can improve the code significantly.
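
The replacement described above can be sketched as a single pass over straight-line code. The statement encoding `(dest, op, args)`, the `const` and `cmp` operation names, and the function name are assumptions made for this sketch; it handles only one basic block and does not fold expressions. A variable is treated as a known constant from the point where it is assigned a literal until it is assigned again.

```python
from typing import Dict, List, Optional, Tuple, Union

Operand = Union[int, str]                            # literal or variable name
Stmt = Tuple[Optional[str], str, Tuple[Operand, ...]]  # (dest, op, args)


def propagate_constants(stmts: List[Stmt]) -> List[Stmt]:
    """Substitute known constant values for variable uses in one basic block."""
    known: Dict[str, int] = {}
    out: List[Stmt] = []
    for dest, op, args in stmts:
        # Replace each variable operand whose value is currently known.
        new_args = tuple(
            known.get(x, x) if isinstance(x, str) else x for x in args
        )
        out.append((dest, op, new_args))
        if dest is not None:
            if op == "const" and isinstance(new_args[0], int):
                known[dest] = new_args[0]  # dest now holds a known constant
            else:
                known.pop(dest, None)      # dest was overwritten: value unknown
    return out


if __name__ == "__main__":
    program = [
        ("a", "const", (5,)),        # a = 5
        (None, "cmp", ("a", "b")),   # compare a with b
    ]
    for stmt in propagate_constants(program):
        print(stmt)
```

After the pass, the comparison reads the literal 5 instead of the variable, so the variable no longer needs to occupy a register at that point.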
53
Strength reduction
Example:
    for (j = 0; j < n; j++) A[j] = 2*j;
    for (i = 0; 4*i ...) A[4*i] = 0;
An optimizing compiler can replace the multiplication
by 4 with an addition of 4. This is an example of
strength reduction. In general, scalar
multiplications can be replaced by additions.
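
The multiplication-by-4 example above can be checked with a small before-and-after sketch. The loop bound `n` and both function names are assumptions, since the bound is not recoverable from the slide. The first version multiplies on every iteration; the second keeps a running index that is advanced by 4, which is what a compiler applying strength reduction would effectively generate.

```python
from typing import List


def zero_every_fourth_mul(a: List[int], n: int) -> List[int]:
    """Original form: the index 4*i is recomputed by multiplication."""
    i = 0
    while 4 * i < n:
        a[4 * i] = 0
        i += 1
    return a


def zero_every_fourth_add(a: List[int], n: int) -> List[int]:
    """Strength-reduced form: k tracks 4*i and is advanced by addition."""
    k = 0
    while k < n:
        a[k] = 0
        k += 4
    return a


if __name__ == "__main__":
    n = 10
    print(zero_every_fourth_mul([1] * n, n))
    print(zero_every_fourth_add([1] * n, n))
```

Both functions write to exactly the same elements; only the way the index is produced differs.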
54
Major Types of Optimizations and Example in Each
Class
55
Change in IC Due to Optimization
  • Level 1: local optimizations, code scheduling,
    and local register allocation
  • Level 2: global optimization, loop transformation
    (software pipelining), global register allocation
  • Level 3: procedure integration

56
How can Architects Help Compiler Writers
  • Provide Regularity
  • Address modes, operations, and data types should
    be orthogonal to (independent of) each other
  • Simplifies code generation, especially multi-pass
  • Counterexample: restricting which registers can be
    used for certain classes of instructions
  • Provide primitives - not solutions
  • Special features that match an HLL construct are
    often unusable
  • What works in one language may be detrimental to
    others

57
How can Architects Help Compiler Writers (Cont.)
  • Simplify trade-offs among alternatives
  • How to write good code? What is good code?
  • Metric: IC or code size (no longer true because
    of caches and pipelining)
  • Anything that makes code sequence performance
    obvious is a definite win!
  • How many times a variable should be referenced
    before it is cheaper to load it into a register
  • Provide instructions that bind the quantities
    known at compile time as constants
  • Don't hide compile-time constants
  • Instructions that work off something the
    compiler thinks could be a run-time determined
    value handcuff the optimizer

58
Short Summary -- Compilers
  • ISA should have at least 16 GPRs (not counting FP
    registers) to simplify allocation of registers
    using graph coloring
  • Orthogonality suggests all supported addressing
    modes apply to all instructions that transfer
    data
  • Simplicity: understand that less is more in ISA
    design
  • Provide primitives instead of solutions
  • Simplify trade-offs between alternatives
  • Don't bind constants at runtime
  • Counterexample: lack of compiler support for
    multimedia instructions