Advanced Computer Architecture 5MD00 5Z033 Instruction Set Design - PowerPoint PPT Presentation

1 / 38
About This Presentation
Title:

Advanced Computer Architecture 5MD00 5Z033 Instruction Set Design

Description:

The instruction set architecture serves as the interface between software and hardware ... Same compiler can compile different languages. Multiple Target Compilers ... – PowerPoint PPT presentation

Number of Views:185
Avg rating:3.0/5.0
Slides: 39
Provided by: henkcor2
Category:

less

Transcript and Presenter's Notes

Title: Advanced Computer Architecture 5MD00 5Z033 Instruction Set Design


1
Advanced Computer Architecture5MD00 /
5Z033Instruction Set Design
  • Henk Corporaal
  • www.ics.ele.tue.nl/heco/courses/aca
  • TUEindhoven
  • November 2009

2
Lecture overview
  • ISA and Evolution
  • Architecture classes
  • Addressing
  • Operands
  • Operations
  • Encoding
  • RISC
  • SIMD extensions

3
Instruction Set Architecture
  • The instruction set architecture serves as the
    interface between software and hardware
  • It provides the mechanism by which the software
    tells the hardware what should be done
  • Architecture definitionthe architecture of a
    system/processor is (a minimal description of)
    its behavior as observed by its immediate users

4
Instruction Set Design Issues
  • Where are operands stored?
  • registers, memory, stack, accumulator
  • How many explicit operands are there?
  • 0, 1, 2, or 3
  • How is the operand location specified?
  • register, immediate, indirect, . . .
  • What type size of operands are supported?
  • byte, int, float, double, string, vector. . .
  • What operations are supported?
  • basic operations add, sub, mul, move, compare .
    . .
  • or also very complex operations?

5
Operands
  • How are operands designated?
  • fixed always in the same place
  • by opcode always the same for groups of
    instructions
  • by a field in the instruction requires decode
    first
  • What is the format of the data?
  • binary
  • character
  • decimal (packed and unpacked)
  • floating-point IEEE 754 (others used less and
    less)
  • size 8-, 16-, 32-, 64-, 128-bit
  • What is the influence on the ISA (
    Instruction-Set Architecture)?

6
Operand Locations
7
Classifying ISAs
Accumulator (before 1960) 1 address add A acc
acc memA Stack (1960s to 1970s) 0
address add tos tos next Memory-Memory
(1970s to 1980s) 2 address add A, B memA
memA memB 3 address add A, B, C memA
memB memC Register-Memory (1970s to
present) 2 address add R1, A R1 R1
memA load R1, A R1 memA Register-Regis
ter (Load/Store) (1960s to present) 3
address add R1, R2, R3 R1 R2 R3 load R1,
R2 R1 memR2 store R1, R2 memR1 R2
8
Evolution of Architectures
Single Accumulator (EDSAC 1950)
Accumulator Index Registers
(Manchester Mark I, IBM 700 series 1953)
Separation of Programming Model from
Implementation
High-level Language Based
Concept of a Family
(B5000 1963)
(IBM 360 1964)
General Purpose Register Machines
Complex Instruction Sets
Load/Store Architecture
(CDC 6600, Cray 1 1963-76)
(Vax, Intel 8086 1977-80)
RISC
(Mips,Sparc,88000,IBM RS6000, . . .1987)
9
Addressing Modes
  • Types
  • Register data in a register
  • Immediate data in the instruction
  • Memory data in memory
  • Calculation of Effective Address
  • Direct address in instruction
  • Indirect address in register
  • Displacement address register or PC offset
  • Indexed address register register
  • Memory Indirect address at address in register
  • What is the influence on ISA?

10
Types of Addressing Mode (VAX)
  • Addressing Mode Example Action
  • 1. Register direct Add R4, R3 R4 lt- R4 R3
  • 2. Immediate Add R4, 3 R4 lt- R4 3
  • 3. Displacement Add R4, 100(R1) R4 lt- R4 M100
    R1
  • 4. Register indirect Add R4, (R1) R4 lt- R4
    MR1
  • 5. Indexed Add R4, (R1 R2) R4 lt- R4 MR1
    R2
  • 6. Direct Add R4, (1000) R4 lt- R4 M1000
  • 7. Memory Indirect Add R4, _at_(R3) R4 lt- R4
    MMR3
  • 8. Autoincrement Add R4, (R2) R4 lt- R4 MR2
  • R2 lt- R2 d
  • 9. Autodecrement Add R4, (R2)- R4 lt- R4 MR2
  • R2 lt- R2 - d
  • 10. Scaled Add R4, 100(R2)R3 R4 lt- R4
  • M100 R2 R3d
  • Studies by Clark and Emer indicate that modes
    1-4 account for 93 of all operands on the VAX

11
Operations
  • Types
  • ALU Integer arithmetic and logical functions
  • Data transfer Loads/stores
  • Control Branch, jump, call, return, traps,
    interrupts
  • System O/S calls, virtual memory management
  • Floating point Floating point arithmetic
  • Decimal Decimal arithmetic (BCD binary coded
    decimal)
  • String moves, compares, search, etc.
  • Graphics Pixel/vertex operations
  • Vector Vector (SIMD) functions
  • more complex ones
  • Addressing
  • Which addressing modes for which operands are
    supported?

12
80x86 Instruction Frequency
13
Relative Frequency of Control Instructions
  • Design hardware to handle branches quickly,
    since these occur most frequently

14
Frequency of Operand Sizeson 32-bit Load-Store
Machines
  • For floating-point want good performance for 64
    bit operands.
  • For integer operations want good performance for
    32 bit operands
  • Recent architectures also support 64-bit integers

15
Instruction Encoding
  • Variable
  • Instruction length varies based on opcode and
    address specifiers
  • For example, VAX instructions vary between 1 and
    53 bytes, while x86 instruction vary between 1
    and 17 bytes.
  • Good code density, but difficult to decode and
    pipeline
  • Fixed
  • Only a single size for all instructions
  • For example MIPS, Power PC, Sparc all have 32 bit
    instructions
  • Not as good code density, but easier to decode
    and pipeline
  • Hybrid
  • Have multiple format lengths specified by the
    opcode
  • For example, IBM 360/370
  • Compromise between code density and ease of decode

16
Instruction Encoding
17
Example MIPS
Operands mostly at fixed positions
Fixed instruction size few formats
18
Compilers and ISA
  • Compiler Goals
  • All correct programs compile correctly
  • Most compiled programs execute quickly
  • Most programs compile quickly
  • Achieve small code size
  • Provide debugging support
  • Multiple Source Compilers
  • Same compiler can compile different languages
  • Multiple Target Compilers
  • Same compiler can generate code for different
    machines
  • 'cross-compiler'

19
Compiler basics trajectory
Source program
Preprocessor
Compiler
Error messages
Assembler
Library code
Loader/Linker
Object program
20
Compiler basics structure / passes
Source code
Lexical analyzer
token generation
check syntax check semantic
parse tree generation
Parsing
Intermediate code
data flow analysis local optimizations
global optimizations
Code optimization
code selection peephole optimizations
Code generation
making interference graph graph
coloring
spill code insertion
caller / callee save and restore code
Register allocation
Sequential code
Scheduling and allocation
exploiting ILP
Object code
21
Compiler basics structure Simple compilation
example
position initial rate 60
Lexical analyzer
temp1 intoreal(60) temp2 id3 temp1 temp3
id2 temp2 id1 temp3
id id id 60
Syntax analyzer
Code optimizer
temp1 id3 60.0 id1 id2 temp1
Code generator
movf id3, r2 mulf 60, r2, r2 movf id2,
r1 addf r2, r1 movf r1, id1
Intermediate code generator
22
Designing ISA to Improve Compilation
  • Provide enough general purpose registers to ease
    register allocation ( at least 16)
  • Provide regular instruction sets by keeping the
    operations, data types, and addressing modes
    largely orthogonal
  • Provide primitive constructs rather than trying
    to map to a high-level language
  • Allow compilers to help make the common case fast

23
A "Typical" RISC
  • 32-bit fixed length instruction
  • Only few instruction formats
  • 32 32-bit GPRs (general purpose registers)
  • 3-address, reg-reg-reg / reg-imm-reg arithmetic
    instruction
  • Single address mode for load/store base
    displacement
  • no indirection
  • Simple branch conditions
  • Pipelined implementation
  • Separate Instruction and Data level-1 caches
  • Delayed branch ?

24
Comparison MIPS with 80x86
  • How would you expect the x86 and MIPS
    architectures to compare on the following
  • CPI on SPEC benchmarks
  • Ease of design and implementation
  • Ease of writing assembly language compilers
  • Code density
  • Overall performance
  • What other advantages/disadvantages are there to
    the two architectures?

25
Instruction Set ExtensionsSubword parallelism
  • Support graphics and multimedia applications
  • Intels MMX Technology (introduced in 1997)
  • Intels Internet Streaming SIMD Extensions (SSE
    SSE4)
  • AMDs 3DNow! Technology
  • Suns Visual Instruction Set
  • Motorolas and IBMs AltiVec Technology
  • These extensions improve the performance of
  • Computer-aided design
  • Internet applications
  • Computer visualization
  • Video games
  • Speech recognition

26
MMX Data Types
  • MMX Technology supports operations on the
    following 64-bit integer data types

Packed byte (eight 8-bit elements)
Packed word (four 16-bit elements)
Packed double word (two 32-bit elements)
Packed quad word (one 64-bit elements)
27
SIMD Operations
  • MMX Technology allows a Single Instruction to
    work on Multiple pieces of Data (SIMD)
  • PADDW Packed add word
  • In the above example, 4 parallel adds are
    performed on 16-bit elements
  • Most MMX instructions only require a single cycle

A3
A2
A1
A0
B3
B2
B1
B0
A3B3
A2B2
A1B1
A0B0
28
Saturating Arithmetic
  • Both wrap-around and saturating adds are
    supported
  • With saturating arithmetic, results that
    overflow/underflow are set to the
    largest/smallest value

PADDW Packed wrap-around add
PADDUSW Packed saturating add
29
Pack and Unpack Instructions
  • Pack and unpack instructions provide conversion
    between standard data types and packed data types

PACKSSDW Pack signed, with saturating, double
to packed word
30
Multiply-Add Operations
  • Many graphics applications require
    multiply-accumulate operations
  • Vector Dot Products a b
  • Matrix Multiplies
  • Fast Fourier Transforms (FFTs)
  • Filter implementations

PMADDWD Packed multiply-add word to double
31
Vector Dot Product
  • A dot product on an 8-element vector can be
    performed using 9 MMX instructions
  • Without MMX 40 instructions are required

a0c0.. a3c3
a4c4.. a7c7
0
0
a0c0.. a7c7
32
Packed Compare Instructions
  • Packed compare instructions allow a bit mask to
    be set or cleared
  • This is useful when images with certain qualities
    need to be extracted

33
MMX Instructions
  • MMX Technology adds 57 new instructions to the
    x86 architecture.
  • Some of these instructions include
    (bbytew32-bitd64-bit)
  • PADD(b, w, d) Packed addition
  • PSUB(b, w, d) Packed subtraction
  • PCMPE(b, w, d) Packed compare equal
  • PMULLw Packed word multiply low
  • PMULHw Packed word multiply high
  • PMADDwd Packed word multiply-add
  • PSRL(w, d, q) Pack shift right logical
  • PACKSS(wb, dw) Pack data
  • PUNPCK(bw, wd, dq) Unpack data
  • PAND, POR, PXOR Packed logical operations

34
MMX Performance Comparison
35
MMX Technology Summary
  • MMX technology extends the Intel x86 architecture
    to improve the performance of multimedia and
    graphics applications.
  • It provides a speedup of 1.5 to 2.0 for certain
    applications.
  • MMX instructions are hand-coded in assembly or
    implemented as libraries to achieve high
    performance.
  • MMX data types use the x86 floating point
    registers to avoid adding state to the processor.
  • Makes it easy to handle context switches
  • Makes it hard to perform MMX and floating point
    instructions at the same time
  • Only increase the chip area by about 5.

36
Questions on MMX
  • What are the strengths and weaknesses of MMX
    Technology?
  • How could MMX Technology potentially be improved?
  • How did the developers of MMX preserve backward
    compatibility with the x86 architecture?
  • Why was this important?
  • What are the disadvantages of this approach?
  • What restrictions/limitations are there on the
    use of MMX Technology?

37
Internet Streaming SIMD Extensions (SSE)
  • Help improve the performance of video and 3D
    applications
  • Are designed for streaming data, which is used
    once and then discarded.
  • 70 new instructions beyond MMX Technology
  • Adds 8 new 128-bit vector registers (XMM0 XMM7)
  • Provide the ability to perform multiple floating
    point operations
  • Four parallel operations on 32-bit numbers
  • Reciprocal and reciprocal root instructions -
    normalization
  • Packed average instruction Motion compensation
  • Provide data prefetch instructions
  • Make certain applications 1.5 to 2.0 times faster

38
Beyond SSE
  • SSE2 SIMD on any data type from 8-bit int to
    64-bit double, using XMM vector registers
  • SSE4 dot-product operation
  • AVX (Advanced Vector Extensions) 2010
  • 16 256-bit vector registers, YMM0-YMM15
  • later extended to 512 and 1024 bits
  • 3 operand instructions (instead of 2, with one
    implicit register operand)
Write a Comment
User Comments (0)
About PowerShow.com