Recap - PowerPoint PPT Presentation

Provided by: Ham71

Title: Recap


1
Recap
2
Complex Instruction Set Computing (CISC)
  • CISC: the older design idea (the x86
    instruction set is CISC)
  • Many (powerful) instructions supported within the
    ISA
  • Upside: makes assembly programming much easier
    (lots of assembly programming was done in the
    1960s-70s); the compiler is also simpler
  • Upside: reduced instruction memory usage
  • Downside: designing the CPU is much harder

3
Reduced Instruction Set Computing (RISC)
  • RISC: a newer concept than CISC (but still old)
  • MIPS, PowerPC, SPARC: all RISC designs
  • Small instruction set; a CISC-type operation
    becomes a chain of RISC operations
  • Upside: easier to design the CPU
  • Upside: smaller instruction set -> higher clock
    speed
  • Downside: assembly code is typically longer
    (though that is mainly a compiler design issue)
  • Most modern x86 processors are implemented using
    RISC techniques
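To make "a CISC-type operation becomes a chain of RISC operations" concrete, here is a minimal sketch. It assumes a hypothetical CISC memory-to-memory ADD and hypothetical LOAD/ADD/STORE mnemonics (not any real ISA's syntax), and includes a toy interpreter to show the chain has the same effect on memory.

```python
from typing import Dict, List

def expand_mem_add(dst: str, src: str) -> List[str]:
    """Expand a hypothetical CISC 'ADD [dst], [src]' (memory-to-memory add)
    into a chain of load/store-style RISC operations (illustrative mnemonics)."""
    return [
        f"LOAD r1, {dst}",   # bring the first operand into a register
        f"LOAD r2, {src}",   # bring the second operand into a register
        "ADD r1, r1, r2",    # arithmetic operates on registers only
        f"STORE r1, {dst}",  # write the result back to memory
    ]

def run(program: List[str], memory: Dict[str, int]) -> Dict[str, int]:
    """Toy interpreter for the three hypothetical mnemonics above."""
    regs: Dict[str, int] = {}
    for instr in program:
        op, *args = instr.replace(",", " ").split()
        if op == "LOAD":
            regs[args[0]] = memory[args[1]]
        elif op == "STORE":
            memory[args[1]] = regs[args[0]]
        elif op == "ADD":
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        else:
            raise ValueError(f"unknown op {op}")
    return memory
```

For example, `run(expand_mem_add("x", "y"), {"x": 2, "y": 3})` leaves `x` holding 5: one CISC instruction became four simpler ones, which is why RISC code tends to be longer.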

4
Birth of RISC
  • Roots can be traced to three research projects
  • IBM 801 project (J. Cocke)
  • Berkeley RISC processor (1980, D. Patterson)
  • Stanford MIPS processor (1981, J. Hennessy)
  • The Stanford and Berkeley projects were driven by
    interest in building a simple chip that could be
    made in a university environment
  • Commercialization benefited from these two
    independent projects
  • Berkeley project -> led to the SPARC architecture
    at Sun Microsystems
  • Stanford project -> led to MIPS Computer Systems
    (MIPS processors were later used by SGI)

5
Modern RISC processors
  • Complexity has nonetheless increased
    significantly
  • Superscalar execution (where the CPU issues
    multiple instructions per cycle to multiple
    functional units, e.g. two add units) requires
    complex circuitry to control the scheduling of
    operations
  • What if we could remove the scheduling complexity
    by using a smart compiler?

6
VLIW / EPIC
  • VLIW: very long instruction word
  • Idea: pack a number of mutually independent
    operations into one long instruction
  • Strong emphasis on the compiler to schedule
    instructions
  • Natural successor to RISC, designed to avoid the
    need for complex hardware scheduling in RISC
    designs
  • EPIC (explicitly parallel instruction computing)
    is the Intel/HP VLIW-style approach; its ISA is
    called IA-64 (Itanium)

[Figure: Instr 1 | Instr 2 | Instr 3, i.e. 3 instructions
scheduled into one long instruction word]
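The packing idea can be sketched as a greedy, in-order bundler. This is a simplified illustration, assuming a hypothetical 3-slot word and register-named operations; it only checks read-after-write and write-after-write conflicts, does not reorder operations the way a real VLIW compiler would, and does not model IA-64's actual bundle encoding.

```python
from typing import List, NamedTuple, Tuple

class Op(NamedTuple):
    text: str              # human-readable mnemonic (illustrative only)
    dest: str              # register written
    srcs: Tuple[str, ...]  # registers read

def pack_bundles(ops: List[Op], width: int = 3) -> List[List[Op]]:
    """Greedily place consecutive independent ops into long instruction words."""
    bundles: List[List[Op]] = []
    current: List[Op] = []
    written = set()  # registers written by ops already in the current word
    for op in ops:
        # An op cannot share a word with an op whose result it reads (RAW)
        # or whose destination it also writes (WAW).
        depends = op.dest in written or any(s in written for s in op.srcs)
        if len(current) == width or depends:
            bundles.append(current)  # unused slots would become NOPs
            current, written = [], set()
        current.append(op)
        written.add(op.dest)
    if current:
        bundles.append(current)
    return bundles
```

Three independent operations fill one word; a fourth that reads their results must start a new word. A smarter scheduler would look further ahead to find independent work and avoid the empty slots.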
7
The EPIC Philosophy
8
Who won?
  • Modern x86 processors are RISC-CISC hybrids
  • The ISA is translated in hardware into simpler,
    RISC-like micro-operations
  • Very complicated designs though, with lots of
    scheduling hardware
  • MIPS, Sun SPARC, and DEC Alpha are much truer
    implementations of the RISC ideal
  • A modern test of how RISC-like a design is: is
    it a load/store ISA, i.e. is memory accessed only
    through LOAD and STORE instructions?

9
Multi-Core Technology
[Figure: multi-core roadmap, 2004 single core ->
2005 dual core -> 2007 multi-core; labels "2 or more
cores", "4 or more cores", "2X more cores"; each core
drawn with its cache]
  • The Itanium architecture has a smaller core size,
    enabling up to 2x more cores per die than IA-32
    for higher performance at the same cost

Itanium architecture expected to enable up to 2x
more cores per processor than Xeon processors by
2007
10
Multi Core Processors
  • Historical trend: 20-30% performance gain per
    year by raising clock frequency
  • Future trend: multiply cores at lower frequency
    for a better performance-to-power ratio
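A rough worked example of why lower-frequency cores help, using the first-order CMOS dynamic power model (α = activity factor, C = switched capacitance, V = supply voltage, f = clock frequency). The numbers rest on two idealized assumptions: voltage can be scaled down in proportion to frequency, and the workload splits perfectly across two cores.

```latex
\begin{aligned}
P_{\text{dyn}} &\approx \alpha\, C\, V^{2} f, \qquad V \propto f \;\Rightarrow\; P_{\text{dyn}} \propto f^{3}\\
\text{one core at } f:&\quad P = P_0, \quad \text{throughput} = T_0\\
\text{two cores at } 0.8f:&\quad P \approx 2 \times 0.8^{3}\, P_0 = 1.024\, P_0, \quad \text{throughput} \approx 2 \times 0.8\, T_0 = 1.6\, T_0
\end{aligned}
```

Under these assumptions, roughly the same power buys about 60% more throughput. In practice voltage cannot be lowered indefinitely and not all code parallelizes, so real gains are smaller.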

11
32-Bit and 64-Bit Processors
[Figure: x86-64 operating modes, Long Mode and Legacy
Mode, with the layers User Application, Operating
System, Kernel, and Drivers]
12
The Role of Compilers
13
Compiler and ISA
  • ISA decisions are no longer made just to make
    assembly language (AL) programming easy
  • Because software is written in high-level
    languages (HLLs), the ISA today is a compiler
    target
  • The performance of a computer is significantly
    affected by the compiler
  • Understanding compiler technology today is
    critical to designing and efficiently
    implementing an instruction set
  • Architecture choice affects the code quality and
    the complexity of building a compiler for it

14
Goal of the Compiler
  • Primary goal is correctness
  • Second goal is speed of the object code
  • Other goals:
  • Speed of compilation
  • Ease of providing debug support
  • Interoperability among languages
  • Flexibility of the implementation: languages may
    not change much, but they do evolve, e.g.
    Fortran 66 -> HPF

Make the frequent case fast and the rare case
correct
15
Typical Modern Compiler Structure
[Figure: compiler phases sharing a common intermediate
representation; phase annotations: "somewhat language
dependent, largely machine independent"; "small
language dependencies, slight machine dependencies";
"language independent, highly machine dependent"]
16
Typical Modern Compiler Structure (Cont.)
  • Multi-pass structure -> easier to write bug-free
    compilers
  • Transform high-level, more abstract
    representations into progressively lower-level
    representations, eventually reaching the
    instruction set
  • Compilers must make assumptions about the ability
    of later steps to deal with certain problems
  • Ex. 1: choose which procedure calls to expand
    inline before knowing the exact size of the
    procedure being called
  • Ex. 2: global common sub-expression elimination
  • Find two instances of an expression that compute
    the same value and save the result of the first
    one in a temporary
  • The temporary must be in a register, not memory
    (for performance)
  • So the pass assumes the register allocator will
    place the temporary in a register
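A minimal sketch of common sub-expression elimination, restricted to a single basic block (the global version on the slide additionally needs data-flow analysis across branches). It assumes a hypothetical three-address form `(dest, op, lhs, rhs)`, where op `"="` is a plain copy, and it reuses the variable that already holds the value rather than creating a new temporary; commutativity (a+b vs. b+a) is not handled.

```python
from typing import Dict, List, Tuple

# Hypothetical three-address instruction: (dest, op, lhs, rhs); "=" copies lhs.
Instr = Tuple[str, str, str, str]

def local_cse(block: List[Instr]) -> List[Instr]:
    available: Dict[Tuple[str, str, str], str] = {}  # (op, lhs, rhs) -> holder
    out: List[Instr] = []
    for dest, op, lhs, rhs in block:
        key = (op, lhs, rhs)
        if op != "=" and key in available:
            # Same expression already computed: copy it instead of recomputing.
            out.append((dest, "=", available[key], ""))
        else:
            out.append((dest, op, lhs, rhs))
        # dest now holds a new value: forget expressions that read dest
        # or whose result was held in dest.
        available = {k: v for k, v in available.items()
                     if v != dest and dest not in (k[1], k[2])}
        # Record the expression unless it reads its own destination (x = x + 1).
        if op != "=" and dest not in (lhs, rhs):
            available[key] = dest
    return out
```

The second `a + b` becomes a copy of `t1`; after `a` is reassigned, `a + b` is no longer available and is computed again. Keeping `t1` in a register is exactly the assumption about the register allocator made on the slide.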

17
Optimization Types
  • High level: done at the source code level
  • Procedure called only once: put it in-line and
    save the CALL
  • Local: done within a basic block (straight-line
    code)
  • Common sub-expressions: reuse an expression that
    produces the same value instead of recomputing it
  • Constant propagation: replace a constant-valued
    variable with the constant, saving repeated
    accesses to a variable whose value is known
  • Global: same as local, but done across branches
  • Code motion: move code that computes the same
    value on each pass out of the loop and place it
    before the loop
  • Simplify or eliminate array addressing
    calculations in loops
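Code motion can be illustrated at the source level with a before/after pair. This is a sketch only: a compiler performs the same hoisting on its intermediate representation, and only after proving the expression is loop-invariant and free of side effects. The function names are hypothetical.

```python
from typing import List

def scale_naive(xs: List[int], a: int, b: int) -> List[int]:
    out = []
    for x in xs:
        # a * b + 1 does not depend on x, yet it is recomputed on every pass
        out.append(x * (a * b + 1))
    return out

def scale_hoisted(xs: List[int], a: int, b: int) -> List[int]:
    # Code motion: the loop-invariant value is computed once, before the loop
    k = a * b + 1
    out = []
    for x in xs:
        out.append(x * k)
    return out
```

Both versions return `[7, 14, 21]` for `xs = [1, 2, 3], a = 2, b = 3`; the hoisted one does one multiply-add in total instead of one per iteration.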

18
Optimization Types (Cont.)
  • Register allocation
  • Use graph coloring (graph theory) to allocate
    registers
  • Optimal graph coloring is NP-complete, so
    heuristic algorithms are used
  • The heuristics work best when there are at least
    16 (and preferably more) registers
  • Processor-dependent optimization
  • Strength reduction: replace a multiply with a
    shift-and-add sequence
  • Pipeline scheduling: reorder instructions to
    minimize pipeline stalls
  • Branch offset optimization: reorder code to
    minimize branch offsets
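The multiply-to-shift-and-add replacement can be sketched as follows, assuming a non-negative compile-time constant and the naive decomposition of one shifted term per set bit (so x*10 = (x<<1) + (x<<3)). Real compilers may also use subtraction (x*7 = (x<<3) - x) and choose sequences based on target instruction costs; the function names here are hypothetical.

```python
from typing import List

def shift_add_plan(c: int) -> List[int]:
    """Shift amounts s such that the sum of (x << s) equals x * c."""
    if c < 0:
        raise ValueError("this sketch only handles non-negative constants")
    # One term per set bit of the constant.
    return [bit for bit in range(c.bit_length()) if (c >> bit) & 1]

def multiply_via_shifts(x: int, plan: List[int]) -> int:
    """Evaluate x * c as a sum of shifted copies of x (no multiply)."""
    return sum(x << s for s in plan)
```

`shift_add_plan(10)` returns `[1, 3]`. The plan is computed once at compile time, since the constant is known, and only the shifts and adds remain in the generated code.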

19
Register Allocation
  • One of the most important optimizations
  • Based on graph coloring techniques
  • Construct a graph (the interference graph) of
    possible allocations to registers
  • Use the graph to allocate registers efficiently
  • Goal is to achieve 100% register allocation for
    all active variables
  • Graph coloring works best when there are at least
    16 general-purpose registers available for
    integers and more for floating-point variables
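One common heuristic can be sketched in the Chaitin style of "simplify, then select", with Briggs-style optimistic spilling. This is an illustration, not a production allocator. It assumes a symmetric interference graph given as a dict of neighbor sets, k registers numbered 0..k-1, and a crude spill choice (highest degree) where real allocators weigh spill cost.

```python
from typing import Dict, List, Set, Tuple

def color_registers(graph: Dict[str, Set[str]], k: int) -> Tuple[Dict[str, int], Set[str]]:
    """graph[v] = variables live at the same time as v (must be symmetric).
    Returns (register assignment, spilled variables)."""
    work = {v: set(ns) for v, ns in graph.items()}
    stack: List[str] = []
    # Simplify: a node with degree < k can always be colored later, so remove it.
    while work:
        node = next((v for v in work if len(work[v]) < k), None)
        if node is None:
            # Every node has >= k neighbors: push a spill candidate optimistically.
            node = max(work, key=lambda v: len(work[v]))
        stack.append(node)
        for n in work[node]:
            work[n].discard(node)
        del work[node]
    # Select: pop in reverse order and give each node the lowest free register.
    assignment: Dict[str, int] = {}
    spilled: Set[str] = set()
    while stack:
        node = stack.pop()
        used = {assignment[n] for n in graph[node] if n in assignment}
        free = [r for r in range(k) if r not in used]
        if free:
            assignment[node] = free[0]
        else:
            spilled.add(node)  # would be kept in memory, with loads/stores added
    return assignment, spilled
```

Three mutually interfering variables fit in 3 registers but force one spill with only 2, which is why more registers make the heuristic work better.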

20
Constant propagation
    a = 5;
    ...            // no change to a so far
    if (a > b) { ... }
The statement (a > b) can be replaced by (5 > b).
This could free a register when the comparison is
executed. When applied systematically, constant
propagation can improve the code significantly.
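A minimal sketch of constant propagation over straight-line code, assuming a hypothetical statement form `(dest, op, lhs, rhs)` in which ints are constants, strings are variable names, and op `"="` is a plain assignment. Propagating across branches and loops requires data-flow analysis, which this sketch omits.

```python
from typing import Dict, List, Tuple, Union

Operand = Union[int, str, None]           # int = constant, str = variable
Stmt = Tuple[str, str, Operand, Operand]  # (dest, op, lhs, rhs)

def propagate_constants(stmts: List[Stmt]) -> List[Stmt]:
    known: Dict[str, int] = {}  # variables currently holding a known constant
    out: List[Stmt] = []
    for dest, op, lhs, rhs in stmts:
        # Replace operands whose value is a known constant.
        if isinstance(lhs, str):
            lhs = known.get(lhs, lhs)
        if isinstance(rhs, str):
            rhs = known.get(rhs, rhs)
        out.append((dest, op, lhs, rhs))
        # Update what is known about dest after this statement.
        if op == "=" and isinstance(lhs, int):
            known[dest] = lhs
        else:
            known.pop(dest, None)  # dest now holds an unknown value
    return out
```

The slide's example `a = 5; t = (a > b)` becomes `a = 5; t = (5 > b)`. If `a` is reassigned to an unknown value in between, the comparison is left alone.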
21
Strength reduction
Example:
    for (j = 0; j < n; j++)
        A[j] = 2*j;
    for (i = 0; 4*i < n; i++)
        A[4*i] = 0;
An optimizing compiler can replace the multiplication
by 4 with an addition of 4 (keeping a running value of
4*i). This is an example of strength reduction. In
general, multiplications of a loop induction variable
by a constant can be replaced by additions.
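The second loop above can be sketched before and after strength reduction. This is an illustration with hypothetical function names: it returns the offsets the loop would touch instead of writing to an array, and a compiler would additionally drop `i` entirely only if it is not used after the loop.

```python
from typing import List

def offsets_multiply(n: int) -> List[int]:
    """Original form: for (i = 0; 4*i < n; i++) touches A[4*i]."""
    out = []
    i = 0
    while 4 * i < n:       # a multiply in the loop test...
        out.append(4 * i)  # ...and another in the address calculation
        i += 1
    return out

def offsets_strength_reduced(n: int) -> List[int]:
    """Same loop after strength reduction: k tracks 4*i and is bumped by 4."""
    out = []
    k = 0                  # invariant: k == 4*i
    while k < n:
        out.append(k)
        k += 4             # an addition replaces the multiplication
    return out
```

For `n = 10` both return `[0, 4, 8]`.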
22
Major Types of Optimizations and Examples in Each
Class
23
Change in IC Due to Optimization
  • Level 1: local optimizations, code scheduling,
    and local register allocation
  • Level 2: global optimization, loop transformation
    (software pipelining), global register allocation
  • Level 3: procedure integration

24
How Can Architects Help Compiler Writers?
  • Provide regularity
  • Address modes, operations, and data types should
    be orthogonal (independent of each other)
  • This simplifies code generation, especially in a
    multi-pass compiler
  • Counterexample: restricting which registers can
    be used for certain classes of instructions
  • Provide primitives, not solutions
  • Special features that match an HLL construct are
    often unusable
  • What works in one language may be detrimental to
    others

25
How Can Architects Help Compiler Writers? (Cont.)
  • Simplify trade-offs among alternatives
  • How to write good code? What is good code?
  • Metric: IC or code size (no longer a reliable
    metric because of caches and pipelining)
  • Anything that makes code sequence performance
    obvious is a definite win!
  • E.g. how many times a variable should be
    referenced before it is cheaper to load it into a
    register
  • Provide instructions that bind the quantities
    known at compile time as constants
  • Don't hide compile-time constants
  • Instructions that work off something the
    compiler thinks could be a run-time-determined
    value handcuff the optimizer

26
Short Summary -- Compilers
  • Provide at least 16 GPRs (not counting FP
    registers) to simplify register allocation
    using graph coloring
  • Orthogonality suggests all supported addressing
    modes apply to all instructions that transfer
    data
  • Simplicity: understand that less is more in ISA
    design
  • Provide primitives instead of solutions
  • Simplify trade-offs between alternatives
  • Don't bind constants at run time
  • Counterexample: lack of compiler support for
    multimedia instructions