Instruction Set Principles and Examples - PowerPoint PPT Presentation

Provided by: ccNct
1
Instruction Set Principles and Examples
2
Outline
  • Introduction
  • Classifying instruction set architectures
  • Instruction set measurements
  • Memory addressing
  • Addressing modes for signal processing
  • Type and size of operands
  • Operations in the instruction set
  • Operations for media and signal processing
  • Instructions for control flow
  • Encoding an instruction set
  • Role of compilers
  • MIPS architecture

3
Brief Introduction to ISA
  • Instruction Set Architecture (ISA): a set of
    instructions
  • Each instruction is directly executed by the
    CPU's hardware
  • How is it represented?
  • By a binary format, since the hardware understands
    only bits
  • Concatenate together binary encodings for
    instructions, registers, constants, memories
  • Typical physical blobs are bits, bytes, words,
    n-words
  • Word size is typically 16, 32, or 64 bits today
  • Options - fixed or variable length formats
  • Fixed - each instruction encoded in same size
    field (typically 1 word)
  • Variable - half-word, whole-word, multiple-word
    instructions are possible

4
Example of Program Execution
  • Commands (opcodes)
  • 1 = Load AC from memory
  • 2 = Store AC to memory
  • 5 = Add to AC from memory
  • Add the contents of memory 940 to the contents of
    memory 941 and store the result at 941

Fetch
Execution
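The fetch/execute cycle for this example can be sketched as a tiny simulator. Only the three opcodes (1, 2, 5) and the addresses 940 and 941 come from the slide. The 16-bit word layout (4-bit opcode, 12-bit address), the hexadecimal reading of the addresses, the program location 0x300, and the data values 3 and 2 are assumptions made for illustration.

```python
# Minimal accumulator-machine sketch. Opcodes 1, 2 and 5 come from the slide;
# the word layout (4-bit opcode + 12-bit address) is an assumption.
LOAD, STORE, ADD = 0x1, 0x2, 0x5


def run(memory, pc, count):
    """Fetch and execute `count` instructions starting at `pc`; return the final AC."""
    ac = 0
    for _ in range(count):
        ir = memory[pc]                        # fetch: read the instruction word
        pc += 1
        opcode, addr = ir >> 12, ir & 0xFFF    # decode: split opcode and address
        if opcode == LOAD:                     # execute
            ac = memory[addr]
        elif opcode == ADD:
            ac = (ac + memory[addr]) & 0xFFFF
        elif opcode == STORE:
            memory[addr] = ac
        else:
            raise ValueError(f"unknown opcode {opcode:#x}")
    return ac


def demo():
    # Hypothetical memory image: program at 0x300, operands 3 and 2 at 0x940/0x941.
    memory = {0x300: 0x1940, 0x301: 0x5941, 0x302: 0x2941, 0x940: 3, 0x941: 2}
    run(memory, 0x300, 3)
    return memory[0x941]
```

With these assumed values, `demo()` leaves the sum of the two operands in location 0x941.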
5
A Note on Measurements
  • We're taking the quantitative approach
  • BUT measurements will vary
  • Due to application selection or application mix
  • Due to the particular compiler being used
  • Also dependent on compiler optimization selection
  • And the target ISA
  • Hence the measurements we'll talk about
  • Are useful to understand the method
  • Are a typical yet small sample derived from
    benchmark codes
  • To do it for real
  • You would want lots of real applications
  • Plus - your compiler and ISA

6
Classifying Instruction Set Architecture
7
Instruction Set Design
The instruction set influences everything
8
Instruction Characteristics
  • Usually a simple operation
  • Which operation is identified by the op-code
    field
  • But operations require operands - 0, 1, or 2
  • To identify where they are, they must be
    addressed
  • Address is to some piece of storage
  • Typical storage possibilities are main memory,
    registers, or a stack
  • 2 options: explicit or implicit addressing
  • Implicit - the op-code implies the address of the
    operands
  • ADD on a stack machine - pops the top 2 elements
    of the stack, then pushes the result
  • HP calculators work this way
  • Explicit - the address is specified in some field
    of the instruction
  • Note the potential for 3 addresses - 2 operands +
    the destination

9
Classifying Instruction Set Architectures
Based on CPU internal storage options AND # of
operands
These choices critically affect # of instructions,
CPI, and cycle time
10
Operand Locations for Four ISA Classes
11
C = A + B
  • Stack
  • Push A
  • Push B
  • Add
  • Pop the top-2 values of the stack (A, B) and push
    the result value into the stack
  • Pop C
  • Accumulator (AC)
  • Load A
  • Add B
  • Add AC (A) with B and store the result into AC
  • Store C
  • Register (register-memory)
  • Load R1, A
  • Add R3, R1, B
  • Store R3, C
  • Register (load-store)
  • Load R1, A
  • Load R2, B
  • Add R3, R1, R2
  • Store R3, C
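The four code sequences above can be executed by small interpreters to confirm that they compute the same result. This is a minimal sketch covering only the stack and load-store classes. The tuple-based instruction format and the function names are hypothetical, chosen for illustration.

```python
def run_stack(program, mem):
    """Stack machine: operands are implicit (top of stack)."""
    stack = []
    for op, *args in program:
        if op == "push":
            stack.append(mem[args[0]])
        elif op == "add":
            b, a = stack.pop(), stack.pop()   # pop the top two values
            stack.append(a + b)               # push the result
        elif op == "pop":
            mem[args[0]] = stack.pop()
    return mem


def run_load_store(program, mem):
    """Load-store machine: only load/store touch memory, ALU ops use registers."""
    regs = {}
    for op, *args in program:
        if op == "load":
            regs[args[0]] = mem[args[1]]
        elif op == "add":
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        elif op == "store":
            mem[args[1]] = regs[args[0]]
    return mem


STACK_CODE = [("push", "A"), ("push", "B"), ("add",), ("pop", "C")]
LOAD_STORE_CODE = [("load", "R1", "A"), ("load", "R2", "B"),
                   ("add", "R3", "R1", "R2"), ("store", "R3", "C")]
```

Both programs have four instructions here, but the stack version names no registers at all, while the load-store version names every operand explicitly.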

12
Pros and Cons of Stack, Accumulator, Register
Machine
13
Modern Choice Load-store Register (GPR)
Architecture
  • Reasons for choosing GPR (general-purpose
    registers) architecture
  • Registers (stacks and accumulators) are faster
    than memory
  • Registers are easier and more effective for a
    compiler to use
  • (A*B) - (C*D) - (E*F)
  • May be evaluated in any order (for pipelining
    concerns or ...)
  • But on a stack machine -> must be evaluated left
    to right
  • Registers can be used to hold variables
  • Reduce memory traffic
  • Speed up programs
  • Improve code density (fewer bits are used to name
    a register)
  • Compiler writers prefer that all registers be
    equivalent and unreserved
  • The number of GPRs: at least 16

14
Characteristics Divide GPR Architectures
  • # of operands
  • Three-operand: 1 result and 2 source operands
  • Two-operand: 1 both source/result and 1 source
  • How many operands are memory addresses
  • 0 to 3 (two sources + 1 result)

Load-store
Register-memory
Memory-memory
15
Pros and Cons of Three Most Common GPR Computers
16
Short Summary Classifying Instruction Set
Architectures
  • Expect the use of general-purpose registers
  • Figure 2.4 pipelining (Appendix A)
  • Expect the use of Register-Register (load-store)
    GPR architecture

17
Memory Addressing
18
Memory Addressing Basics
All architectures must address memory
  • What is accessed - byte, word, multiple words?
  • Today's machines are byte addressable
  • Main memory is organized in 32 - 64 byte lines
  • Big-Endian or Little-Endian addressing
  • Hence there is a natural alignment problem
  • An access of size s bytes at byte address A is
    aligned if A mod s = 0
  • Misaligned access takes multiple aligned memory
    references
  • Memory addressing mode influences instruction
    counts (IC) and clock cycles per instruction (CPI)
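The alignment rule and the two byte orders can be checked with a few lines of code. This is a sketch only; the helper names are hypothetical, and `aligned_accesses` assumes the memory system transfers naturally aligned blocks of the same size as the access.

```python
def is_aligned(address, size):
    """An access of `size` bytes at `address` is aligned if address mod size == 0."""
    return address % size == 0


def aligned_accesses(address, size):
    """Number of aligned `size`-byte blocks touched by an access at `address`."""
    first = address // size
    last = (address + size - 1) // size
    return last - first + 1


def byte_layout(value, size, byteorder):
    """Bytes of `value` in increasing address order for 'big' or 'little' endian."""
    return list(value.to_bytes(size, byteorder))
```

For example, a 4-byte access at address 6 is misaligned and touches two aligned words, which is why it costs more than one memory reference.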

19
Typical Address Modes (I)
20
Typical Address Modes (II)
21
Use of Memory Addressing Mode (Figure 2.7)
Based on a VAX which supported everything
Not counting Register mode (50% of all)
22
Displacement Field Size
At least 12 to 16 bits (75% to 99% of the
displacements)
23
Immediate Operands
24
Distribution of Immediate Values
25
Addressing Modes for Signal Processing
  • DSPs deal with infinite, continuous streams of
    data, so they routinely rely on circular buffers
  • Modulo or circular addressing mode
  • Support data shuffling in Fast Fourier Transform
    (FFT)
  • Bit reverse addressing
  • 011 (base 2) -> 110 (base 2)
  • However, the two fancy addressing modes are not
    used heavily
  • Mismatch between what programmers and compilers
    actually use versus what architects expect
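Both DSP addressing modes reduce to simple index arithmetic. The sketch below shows what the hardware computes for each; the function names and the buffer length used in the note are hypothetical.

```python
def circular_next(index, step, length):
    """Modulo (circular) addressing: advance and wrap around the buffer end."""
    return (index + step) % length


def bit_reverse(index, bits):
    """Bit-reverse addressing: reverse the low `bits` bits of `index`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)   # shift in the lowest bit
        index >>= 1
    return result
```

For an 8-point FFT, `[bit_reverse(i, 3) for i in range(8)]` gives the shuffled access order, and `bit_reverse(0b011, 3)` is `0b110`, matching the example on the slide.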

26
Frequency of Addressing Modes for TI TMS320C54x
DSP
27
Short Summary Memory Addressing
  • Need to support at least three addressing modes
  • Displacement, immediate, and register deferred
    (register indirect)
  • They represent 75% to 99% of the addressing modes
    used in benchmarks
  • The size of the address for displacement mode
    should be at least 12 to 16 bits (75% to 99%)
  • The size of the immediate field should be at
    least 8 to 16 bits (50% to 80%)

28
Operand Type & Size
  • Specified by instruction (opcode) or by hardware
    tag
  • Tagged machines are extinct
  • Typical types (assume word = 32 bits)
  • Character - byte - ASCII or EBCDIC (IBM) - 4 per
    word
  • Short integer - 2 bytes, 2's complement
  • Integer - one word - 2's complement
  • Float - one word - usually IEEE 754 these days
  • Double precision float - 2 words - IEEE 754
  • BCD or packed decimal - 4-bit values packed 8
    per word
  • Instructions will be needed for common
    conversions -- software can do the rare ones

29
Data Access Patterns
30
Operands for Media and Signal Processing
  • Graphics applications: vertex
  • (x, y, z) + w to help with color or hidden
    surfaces, (R, G, B, A)
  • 32-bit floating-point values
  • DSPs
  • Fixed point: a binary point just to the right of
    the sign bit
  • Represent fractions between -1 and +1
  • Have a separate exponent variable
  • Blocked floating point: a block of variables has
    a common exponent
  • Need some registers that are wider to guard
    against round-off error
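A common instance of the fixed-point format described above is 16-bit "Q15" (1 sign bit, 15 fraction bits). The sketch below assumes that format; the slide does not name a specific width. It also shows why a wider accumulator helps: products are summed at full precision and truncated only once.

```python
Q = 15  # number of fraction bits (assumed Q15 format)


def to_q15(x):
    """Convert a fraction to a 16-bit Q15 integer, saturating at the range limits."""
    return max(-32768, min(32767, round(x * (1 << Q))))


def from_q15(n):
    """Convert a Q15 integer back to a Python float."""
    return n / (1 << Q)


def dot_q15(xs, ys):
    """Multiply-accumulate in a wide accumulator, truncating once at the end."""
    acc = 0                 # Python ints are unbounded; stands in for a wide accumulator
    for x, y in zip(xs, ys):
        acc += x * y        # each product has 30 fraction bits
    return acc >> Q         # scale back to 15 fraction bits
```

For example, `from_q15(dot_q15([to_q15(0.5)] * 2, [to_q15(0.5)] * 2))` evaluates 0.5*0.5 + 0.5*0.5 in fixed point.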

31
Operand Type and Size in DSP
32
Short Summary Type and Size of Operand
  • The future - as we go to 64 bit machines
  • Decimal's future is unclear
  • Larger offsets, immediates, etc. are likely
  • Usage of 64 and 128 bit values will increase
  • DSPs need wider accumulating registers than the
    size in memory to aid accuracy in fixed-point
    arithmetic

33
What Operations are Needed
  • Arithmetic & Logical
  • Integer arithmetic: ADD, SUB, MULT, DIV, SHIFT
  • Logical operation: AND, OR, XOR, NOT
  • Data Transfer - copy, load, store
  • Control - branch, jump, call, return, trap
  • System - OS and memory management
  • We'll ignore these for now - but remember they
    are needed
  • Floating Point
  • Same as arithmetic but usually take bigger
    operands
  • Decimal - if you go for it what else do you need?
  • legacy from COBOL and the commercial application
    domain
  • String - move, compare, search
  • Graphics - pixel and vertex,
    compression/decompression operations

34
Top 10 Instructions for 80x86
  • load 22%
  • conditional branch 20%
  • compare 16%
  • store 12%
  • add 8%
  • and 6%
  • sub 5%
  • move register-register 4%
  • call 1%
  • return 1%
  • The most widely executed instructions are the
    simple operations of an instruction set
  • The top-10 instructions for 80x86 account for 96%
    of instructions executed
  • Make them fast, as they are the common case

35
Control Instructions are a Big Deal
  • Jumps - unconditional transfer
  • Conditional Branches
  • How is condition code set? by flag or part of
    the instruction
  • How is target specified? How far away is it?
  • Calls
  • How is target specified? How far away is it?
  • Where is return address kept?
  • How are the arguments passed? Callee vs. Caller
    save!
  • Returns
  • Where is the return address? How far away is it?
  • How are the results passed?

36
Breakdown of Control Flows
  • Call/Returns
  • Integer 19%, FP 8%
  • Jump
  • Integer 6%, FP 10%
  • Conditional Branch
  • Integer 75%, FP 82%

37
Branch Address Specification
  • Known at compile time for unconditional and
    conditional branches - hence specified in the
    instruction
  • As a register containing the target address
  • As a PC-relative offset
  • Consider word length addresses, registers, and
    instructions
  • Full address desired? Then pick the register
    option.
  • BUT - setup and effective address will take
    longer.
  • If you can deal with smaller offset then PC
    relative works
  • PC relative is also position independent - so
    simple linker duty

38
Returns and Indirect Jumps
  • Branch target is not known at compile time
  • Need a way to specify the target dynamically
  • Use a register
  • Permit any addressing mode
  • Regs[R4] <- Regs[R4] + Mem[Regs[R1]]
  • Also useful for
  • case or switch
  • Dynamically shared libraries
  • High-order functions or function pointers
  • Virtual functions in OO
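The uses listed above all come down to loading a target address at run time and jumping through it. A jump table for a case/switch statement is the simplest illustration. In this sketch Python function objects stand in for code addresses, and all names are hypothetical.

```python
def case_zero():
    return "zero"


def case_one():
    return "one"


def default_case():
    return "other"


# The table plays the role of memory holding branch targets.
JUMP_TABLE = [case_zero, case_one]


def switch(selector):
    """Load the target from the table, then 'jump' through it (indirect jump)."""
    in_range = 0 <= selector < len(JUMP_TABLE)
    target = JUMP_TABLE[selector] if in_range else default_case
    return target()
```

The target is not known until `selector` is, which is why a PC-relative offset fixed at compile time cannot express this kind of jump.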

39
Branch Stats - 90% are PC Relative
  • Call/Return
  • TeX 16%, Spice 13%, GCC 10%
  • Jump
  • TeX 18%, Spice 12%, GCC 12%
  • Conditional
  • TeX 66%, Spice 75%, GCC 78%

40
Branch Distances
41
Condition Testing Options
42
What kinds of compares do Branches Use?
43
Direction, Frequency, and real Change
Key points: 75% are forward branches. Most
backward branches are loops - taken about 90%.
Branch statistics are both compiler and
application dependent. Any loop optimizations
may have a large effect.
44
Short Summary Operations in the Instruction Set
  • Branch addressing should be able to reach about
    100 instructions either above or below the
    branch
  • This implies a PC-relative branch displacement of
    at least 8 bits
  • Register-indirect and PC-relative addressing for
    jump instructions to support returns as well as
    many other features of current systems
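The link between "about 100 instructions" and "at least 8 bits" follows from the range of a two's-complement field. The sketch below assumes the displacement is a signed count of instructions; the function names are hypothetical.

```python
def displacement_range(bits):
    """Smallest and largest values of a `bits`-wide two's-complement field."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


def bits_needed(offset):
    """Smallest two's-complement width that can hold the signed `offset`."""
    bits = 1
    while True:
        low, high = displacement_range(bits)
        if low <= offset <= high:
            return bits
        bits += 1
```

A 7-bit field reaches only -64..63, so a branch 100 instructions away in either direction needs the 8-bit field.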

45
Encoding an Instruction Set
46
Encoding the ISA
  • Encode instructions into a binary representation
    for execution by CPU
  • Can pick anything but
  • Affects the size of code - so it should be tight
  • Affects the CPU design - in particular the
    instruction decode
  • So it may have a big influence on the CPI or
    cycle-time
  • Must balance several competing forces
  • Desire for lots of addressing modes and registers
  • Desire to make average program size compact
  • Desire to have instructions encoded into lengths
    that will be easy to handle in a pipelined
    implementation (multiple of bytes)

47
3 Popular Encoding Choices
  • Variable (compact code but difficult to encode)
  • Primary opcode is fixed in size, but opcode
    modifiers may exist
  • Opcode specifies number of arguments - each used
    as address fields
  • Best when there are many addressing modes and
    operations
  • Use as few bits as possible, but individual
    instructions can vary widely in length
  • e.g. VAX - integer ADD versions vary between 3
    and 19 bytes
  • Fixed (easy to encode, but lengthy code)
  • Every instruction looks the same - some field may
    be interpreted differently
  • Combine the operation and the addressing mode
    into the opcode
  • e.g. all modern RISC machines
  • Hybrid
  • Set of fixed formats
  • e.g. IBM 360 and Intel 80x86

Trade-off between size of program vs. ease of
decoding
48
3 Popular Encoding Choices (Cont.)
49
An Example of Variable Encoding -- VAX
  • addl3 r1, 737(r2), (r3): 32-bit integer add
    instruction with 3 operands -> needs 6 bytes to
    represent it
  • Opcode for addl3: 1 byte
  • A VAX address specifier is 1 byte (4 bits
    addressing mode, 4 bits register)
  • r1: 1 byte (register addressing mode, r1)
  • 737(r2)
  • 1 byte for address specifier (displacement
    addressing, r2)
  • 2 bytes for displacement 737
  • (r3): 1 byte for address specifier (register
    indirect, r3)
  • Length of VAX instructions: 1 to 53 bytes
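The byte count for the addl3 example can be reproduced by adding one opcode byte, one specifier byte per operand, and any displacement bytes that follow a specifier. This sketch models only the sizes given on the slide; the function and constant names are hypothetical.

```python
OPCODE_BYTES = 1      # addl3 opcode, per the slide
SPECIFIER_BYTES = 1   # 4-bit addressing mode + 4-bit register


def vax_length(extra_bytes_per_operand):
    """Instruction length given the extra bytes (e.g. displacement) of each operand."""
    operands = sum(SPECIFIER_BYTES + extra for extra in extra_bytes_per_operand)
    return OPCODE_BYTES + operands


# addl3 r1, 737(r2), (r3): register, 2-byte displacement, register indirect
ADDL3_EXAMPLE = [0, 2, 0]
```

Here `vax_length(ADDL3_EXAMPLE)` is 1 + 1 + (1 + 2) + 1 bytes.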

50
Short Summary Encoding the Instruction Set
  • Choice between variable and fixed instruction
    encoding
  • Code size more important than performance ->
    variable encoding
  • Performance more important than code size ->
    fixed encoding

51
Role of Compilers
  • Critical goals in ISA from the compiler viewpoint
  • What features will lead to high-quality code
  • What makes it easy to write efficient compilers
    for an architecture

52
Compiler and ISA
  • ISA decisions are no longer made to ease
    assembly language (AL) programming
  • Due to high-level languages (HLL), the ISA is a
    compiler target today
  • Performance of a computer will be significantly
    affected by compiler
  • Understanding compiler technology today is
    critical to designing and efficiently
    implementing an instruction set
  • Architecture choice affects the code quality and
    the complexity of building a compiler for it

53
Goal of the Compiler
  • Primary goal is correctness
  • Second goal is speed of the object code
  • Others
  • Speed of the compilation
  • Ease of providing debug support
  • Inter-operability among languages
  • Flexibility of the implementation - languages may
    not change much but they do evolve - e.g.
    Fortran 66 -> HPF

Make the frequent cases fast and the rare case
correct
54
Typical Modern Compiler Structure
Common Intermediate Representation
Somewhat language dependent; largely machine
independent
Small language dependence; slight machine dependence
Language independent; highly machine dependent
55
Typical Modern Compiler Structure (Cont.)
  • Multi-pass structure -> easier to write bug-free
    compilers
  • Transform high-level, more abstract representations
    into progressively lower-level representations,
    eventually reaching the instruction set
  • Compilers must make assumptions about the ability
    of later steps to deal with certain problems
  • Ex. 1: choose which procedure calls to expand
    inline before knowing the exact size of the
    procedure being called
  • Ex. 2: global common sub-expression elimination
  • Find two instances of an expression that compute
    the same value and save the result of the first
    one in a temporary
  • Temporary must be in a register, not memory
    (performance)
  • Assume the register allocator will allocate the
    temporary into a register

56
Optimization Types
  • High level - done at source code level
  • Procedure called only once - so put it in-line
    and save CALL
  • Local - done on basic sequential block
    (straight-line code)
  • Common sub-expressions produce same value
  • Constant propagation - replace constant valued
    variable with the constant - saves multiple
    variable accesses with same value
  • Global - same as local but done across branches
  • Code motion - remove code from loops that compute
    same value on each pass and put it before the
    loop
  • Simplify or eliminate array addressing
    calculations in loop

57
Optimization Types (Cont.)
  • Register allocation
  • Use graph coloring (graph theory) to allocate
    registers
  • NP-complete
  • Heuristic algorithm works best when there are at
    least 16 (and preferably more) registers
  • Processor-dependent optimization
  • Strength reduction: replace multiply with shift
    and add sequence
  • Pipeline scheduling: reorder instructions to
    minimize pipeline stalls
  • Branch offset optimization: reorder code to
    minimize branch offsets
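Register allocation by graph coloring treats variables as nodes and joins two variables with an edge when they are live at the same time. Since optimal coloring is NP-complete, allocators use heuristics. The sketch below is one simple greedy heuristic (highest degree first), not the algorithm of any particular compiler; names are hypothetical.

```python
def color_registers(interference, num_registers):
    """Greedy coloring: map each variable to a register index, or None to spill.

    `interference` maps each variable to the set of variables live at the same time.
    """
    assignment = {}
    # Handle the most constrained (highest-degree) variables first.
    for var in sorted(interference, key=lambda v: -len(interference[v])):
        used = {assignment[n] for n in interference[var] if n in assignment}
        free = [r for r in range(num_registers) if r not in used]
        assignment[var] = free[0] if free else None   # None means spill to memory
    return assignment
```

With too few registers some variables spill, which is one way to see why the heuristic works better when at least 16 registers are available.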

58
Major Types of Optimizations and Example in Each
Class
59
Change in IC Due to Optimization
  • Level 1: local optimizations, code scheduling,
    and local register allocation
  • Level 2: global optimization, loop transformation
    (software pipelining), global register allocation
  • Level 3: procedure integration

60
Optimization Observations
  • Hard to reduce branches
  • Biggest reduction is often memory references
  • Some ALU operation reduction happens but it is
    usually small
  • Implication
  • Branch, Call, and Return become a larger relative
    % of the instruction mix
  • Control instructions are among the hardest to
    speed up

61
Impact of Compiler Technology on Architect's
Decisions
  • Important questions
  • How are variables allocated and addressed?
  • How many registers will be needed?
  • We must look at 3 areas to allocate data

62
Where to allocate data?
  • Stack
  • Local variable access in activation records,
    almost no push/pop
  • Addressing is relative to the stack pointer
  • Grown or shrunk on calls and returns
  • Global data area - the easy one
  • Constants and global static structures
  • For arrays addressing may be indexed off head
  • Heap
  • Used for dynamic objects
  • Access usually by pointers
  • Data is typically not scalar

63
Register Allocation Data
  • Reasonably simple for stack objects
  • Hard for global data due to aliasing opportunity
  • Must be conservative
  • Heap objects & pointers in general are even
    harder
  • Computed pointers make it impossible to keep the
    target data in a register
  • Any structured data - string, array, etc. is too
    big to save
  • Since register allocation is a major optimization
    source
  • The effect is clearly important

Example: p = &a; a = ...; *p = ...
64
How can Architects Help Compiler Writers
  • Provide Regularity
  • Address modes, operations, and data types should
    be orthogonal (independent) of each other
  • Simplifies code generation, especially multi-pass
  • Counterexample: restricting which registers can be
    used for certain classes of instructions
  • Provide primitives - not solutions
  • Special features that match a HLL construct are
    often un-usable
  • What works in one language may be detrimental to
    others

65
How can Architects Help Compiler Writers (Cont.)
  • Simplify trade-offs among alternatives
  • How to write good code? What is a good code?
  • Metric: IC or code size (no longer true) -> caches
    and pipelining
  • Anything that makes code sequence performance
    obvious is a definite win!
  • How many times a variable should be referenced
    before it is cheaper to load it into a register
  • Provide instructions that bind the quantities
    known at compile time as constants
  • Don't hide compile-time constants
  • Instructions that work off of something that the
    compiler thinks could be a run-time determined
    value handcuff the optimizer

66
Short Summary -- Compilers
  • ISA has at least 16 GPR (not counting FP
    registers) to simplify allocation of registers
    using graph coloring
  • Orthogonality suggests all supported addressing
    modes apply to all instructions that transfer
    data
  • Simplicity: understand that less is more in ISA
    design
  • Provide primitives instead of solutions
  • Simplify trade-offs between alternatives
  • Don't bind constants at runtime
  • Counterexample: lack of compiler support for
    multimedia instructions

67
The MIPS Architecture
68
Expectations for New ISA
  • Use general-purpose registers, with a load-store
    architecture
  • Support displacement (offset size 12 to 16 bits),
    immediate (size 8 to 16 bits), and register
    indirect
  • Support 8-, 16-, 32-, and 64-bit integers and
    64-bit IEEE 754 floating-point numbers
  • Support the following simple instructions: load,
    store, add, subtract, move register-register,
    and, shift, compare equal, compare not equal,
    branch (with a PC-relative address at least 8
    bits long), jump, call, return
  • Use fixed instruction encoding if interested in
    performance and use variable instruction encoding
    if interested in code size
  • Provide at least 16 general-purpose registers
    (GPRs) plus separate floating-point registers, be
    sure all addressing modes apply to all data
    transfer instructions, and aim for a minimalist
    instruction set

69
MIPS
  • Simple load-store ISA
  • Enable efficient pipeline implementation
  • Fixed instruction set encoding
  • Efficiency as a compiler target
  • MIPS64 variant is discussed here

70
Register for MIPS
  • 32 64-bit integer GPRs - R0, R1, ... R31; R0 = 0
    always
  • 32 FPRs - used for single or double precision
  • For single precision: F0, F1, ... , F31 (32-bit)
  • For double precision: F0, F2, ... , F30 (64-bit)
  • Extra status registers - moves via GPRs
  • Instructions for moving between an FPR and a GPR
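The integer register file described above, with R0 hard-wired to zero, can be modeled in a few lines. This is a behavioral sketch only; the class and method names are hypothetical.

```python
MASK64 = (1 << 64) - 1


class RegisterFile:
    """32 64-bit general-purpose registers; R0 always reads as 0."""

    def __init__(self):
        self._regs = [0] * 32

    def read(self, n):
        return self._regs[n]

    def write(self, n, value):
        if n != 0:                          # writes to R0 are discarded
            self._regs[n] = value & MASK64  # keep only the low 64 bits
```

A constant-zero register lets other operations be synthesized from existing instructions, for example a register move as an add with R0.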

71
Data Types for MIPS
  • 8-bit byte, 16-bit half words, 32-bit word, and
    64-bit double words for integer data
  • 32-bit single precision and 64-bit double
    precision for FP
  • MIPS64 operations work on 64-bit integer and 32-
    or 64-bit floating point
  • Bytes, half words, and words are loaded into the
    GPRs with zeros or the sign bit replicated to
    fill the 64 bits of the GPRs
  • All references between memory and either GPRs or
    FPRs are through loads or stores
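Filling a 64-bit GPR from a narrower load means either zero extension or sign extension. The sketch below shows both, returning the 64-bit register contents as a non-negative Python integer; the function names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def zero_extend(value, bits):
    """Keep the low `bits` bits; the upper bits of the register become 0."""
    return value & ((1 << bits) - 1)


def sign_extend(value, bits):
    """Replicate bit `bits - 1` (the sign bit) into the upper bits of a 64-bit register."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):   # sign bit set: the value is negative
        value -= 1 << bits
    return value & MASK64
```

Loading the byte 0x80 gives 0x0000000000000080 with zero extension and 0xFFFFFFFFFFFFFF80 with sign extension.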

72
Addressing Modes for MIPS
  • Data addressing: immediate and displacement (16
    bits)
  • Displacement: Add R4, 100(R1)
    (Regs[R4] <- Regs[R4] + Mem[100 + Regs[R1]])
  • Register-indirect: placing 0 in the displacement
    field
  • Add R4, (R1) (Regs[R4] <- Regs[R4] + Mem[Regs[R1]])
  • Absolute addressing (16 bits): using R0 as the
    base register
  • Add R1, (1001) (Regs[R1] <- Regs[R1] + Mem[1001])
  • Byte addressable with 64-bit address
  • Mode selection for Big Endian or Little Endian

73
MIPS Instruction Format
  • Encode addressing mode into the opcode
  • All instructions are 32 bits with 6-bit primary
    opcode

74
MIPS Instruction Format (Cont.)

I-Type Instruction
  • Loads and stores: LW R1, 30(R2), S.S F0, 40(R4)
  • ALU ops on immediates: DADDIU R1, R2, 3
  • rt <- rs op immediate
  • Conditional branches: BEQZ R3, offset
  • rs is the register checked
  • rt unused
  • immediate specifies the offset
  • Jump register, jump and link register: JR R3
  • rs is target register
  • rt and immediate are unused but 011
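The I-type fields can be packed into a 32-bit word with shifts and masks. The slide gives only the 6-bit opcode and the rs, rt and immediate fields; the widths used here (5, 5 and 16 bits) and the LW opcode value 0x23 are taken from the standard MIPS encoding and should be read as assumptions.

```python
LW_OPCODE = 0x23  # assumed value from the standard MIPS opcode map


def encode_i_type(opcode, rs, rt, immediate):
    """Pack opcode(6) | rs(5) | rt(5) | immediate(16) into one 32-bit word."""
    if not (0 <= opcode < 64 and 0 <= rs < 32 and 0 <= rt < 32):
        raise ValueError("field out of range")
    return (opcode << 26) | (rs << 21) | (rt << 16) | (immediate & 0xFFFF)


def decode_i_type(word):
    """Unpack a 32-bit word into (opcode, rs, rt, signed immediate)."""
    immediate = word & 0xFFFF
    if immediate & 0x8000:          # sign-extend the 16-bit immediate
        immediate -= 0x10000
    return (word >> 26) & 0x3F, (word >> 21) & 0x1F, (word >> 16) & 0x1F, immediate
```

For LW R1, 30(R2), the base register R2 goes in rs and the destination R1 in rt, so the call is `encode_i_type(LW_OPCODE, 2, 1, 30)`.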

75
MIPS Instruction Format (Cont.)
R-Type Instruction
  • Register-register ALU operations: rd <- rs funct rt,
    e.g. DADDU R1, R2, R3
  • Function encodes the data path operation: Add,
    Sub, ...
  • read/write special registers
  • Moves
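R-type instructions can be packed the same way. The field layout (opcode 6, rs 5, rt 5, rd 5, shamt 5, funct 6 bits), the opcode 0, and the DADDU funct value 0x2D are taken from the standard MIPS64 encoding rather than from the slide, so treat them as assumptions.

```python
DADDU_FUNCT = 0x2D  # assumed value from the standard MIPS64 encoding


def encode_r_type(rs, rt, rd, shamt, funct, opcode=0):
    """Pack opcode(6) | rs(5) | rt(5) | rd(5) | shamt(5) | funct(6) into 32 bits."""
    for field in (rs, rt, rd, shamt):
        if not 0 <= field < 32:
            raise ValueError("register or shift field out of range")
    if not (0 <= funct < 64 and 0 <= opcode < 64):
        raise ValueError("opcode or funct out of range")
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct
```

For DADDU R1, R2, R3 the destination R1 is rd and the sources R2 and R3 are rs and rt: `encode_r_type(2, 3, 1, 0, DADDU_FUNCT)`.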

J-Type Instruction: Jump, Jump and Link, Trap and
return from exception
Format: opcode (6 bits) | offset added to PC (26 bits)
76
MIPS instruction MIX
SPECint2000
77
MIPS instruction MIX (Cont.)
SPECfp2000