ECE 4100/6100 Advanced Computer Architecture, Lecture 4: ISA Taxonomy

Prof. Hsien-Hsin Sean Lee, School of Electrical and Computer Engineering, Georgia Institute of Technology

1
ECE 4100/6100 Advanced Computer Architecture
Lecture 4: ISA Taxonomy
  • Prof. Hsien-Hsin Sean Lee
  • School of Electrical and Computer Engineering
  • Georgia Institute of Technology

2
Instruction Set Architecture
  • Specification of a microprocessor design
  • Interface between the user and the machine's
    functionality
  • Good instruction set design principles
  • Compatibility
  • Implementability
  • Programmability
  • Usability
  • Encoding efficiency

3
Main ISA Design Philosophy
  • CISC (Complex Instruction Set Computer)
  • RISC (Reduced Instruction Set Computer)
  • VLIW (Very Long Instruction Word)
  • EPIC (Explicitly Parallel Instruction Computing)

4
CISC
  • Complex Instruction Set Computers
  • Close semantic gap between programming and
    execution
  • Smaller code size (memory was expensive!)
  • Simplify compilation
  • Another state machine (controlled by microcode)
    inside the machine
  • Examples: x86, Intel 432, IBM 360, DEC VAX

5
CISC Example: x86
  • MOVSD: move a double word, 1-byte instruction
  • MOVSD // m32[ES:EDI] ← m32[DS:ESI]
  • REP: 1-byte prefix to repeat string operations
  • REP MOVSD // count set up in ECX
  • LOCK ADD ds:[esi+ecx*2+0x67452301], 0xEFCDAB89
    // 13-byte
  • F0 3E 81 84 4E 01 23 45 67 89 AB CD EF
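
The 13 bytes listed above can be split into their encoding fields by hand. The sketch below does this in Python for that one instruction shape only (two prefixes, opcode, ModRM, SIB, disp32, imm32). The function name `decode_lock_add` and the dictionary keys are illustrative and not from the slides, and this is not a general x86 decoder.

```python
import struct

# 32-bit general-purpose registers in x86 encoding order.
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def decode_lock_add(raw: bytes) -> dict:
    """Split the 13-byte LOCK ADD example into its encoding fields.

    Handles only: prefix, prefix, opcode, ModRM, SIB, disp32, imm32.
    """
    if len(raw) != 13:
        raise ValueError("expected exactly 13 bytes")
    lock, seg, opcode, modrm, sib = raw[:5]
    disp, imm = struct.unpack_from("<II", raw, 5)  # both little-endian
    return {
        "lock_prefix": lock == 0xF0,
        "ds_override": seg == 0x3E,
        "opcode": opcode,             # 0x81: group-1 op on r/m32 with imm32
        "mod": modrm >> 6,            # 0b10: memory operand with disp32
        "reg": (modrm >> 3) & 7,      # /0 selects ADD within group 1
        "rm": modrm & 7,              # 0b100: a SIB byte follows
        "scale": 1 << (sib >> 6),
        "index": REG32[(sib >> 3) & 7],
        "base": REG32[sib & 7],
        "disp": disp,
        "imm": imm,
    }

fields = decode_lock_add(bytes.fromhex("F03E81844E0123456789ABCDEF"))
print(f"lock add ds:[{fields['base']}+{fields['index']}*{fields['scale']}"
      f"+{fields['disp']:#x}], {fields['imm']:#x}")
# prints: lock add ds:[esi+ecx*2+0x67452301], 0xefcdab89
```

The point of the exercise is the CISC trait on display: one instruction carries two prefixes, a two-level addressing byte pair, and two 32-bit constants, so its length is only known after several bytes have been examined.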

6
RISC
  • Observation made by IBM (John Cocke,
    Eckert-Mauchly Award '85, Turing Award '87, Nat'l
    Medal of Technology '91, Nat'l Medal of
    Science '94)
  • Few of the available instructions are used
  • CISC "n+1" phenomenon
  • Adding an instruction requiring an extra level of
    decoding logic can slow down the entire ISA
  • Reduced Instruction Set Computer
  • Originated at IBM in 1975 as a telephone project
  • To achieve 12 MIPS (300 calls per sec, 20k inst
    per call)
  • Simple instructions
  • IBM 801 in 1978
  • More compiler effort to gain performance
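
As a quick check on the telephone-project numbers above, the raw instruction rate implied by 300 calls per second and 20k instructions per call can be computed directly. The helper name `required_mips` is illustrative. The slide does not say why the stated target is 12 MIPS rather than the raw product, so the factor printed last is only an observation, not an explanation.

```python
def required_mips(calls_per_second: int, instructions_per_call: int) -> float:
    """Raw instruction rate, in millions of instructions per second."""
    return calls_per_second * instructions_per_call / 1_000_000

raw = required_mips(300, 20_000)
print(raw)       # 6.0
print(12 / raw)  # 2.0: the 12 MIPS target is twice the raw product
```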

7
A Typical RISC
  • Smaller number of instructions
  • Fixed format instruction (e.g., 32 bits)
  • 3-address, reg-to-reg arithmetic instructions
  • Single cycle operation for execution
  • Load-store architecture
  • Simple address modes
  • Base + displacement
  • No indirection
  • Simple branch conditions
  • Hardwired control (No microcode)
  • More compiler effort
  • Examples
  • RISC I and RISC II at Berkeley
  • MIPS (Microprocessor without Interlocked Pipeline
    Stages) at Stanford
  • IBM RISC Technology, Sun SPARC, HP PA-RISC, ARM

8
RISC Example: MIPS
R-format (Register-Register), e.g., add $1, $2, $3
  • Op [31:26], Rs [25:21], Rt [20:16], Rd [15:11],
    Shamt [10:6], Funct [5:0]
I-format (Register-Immediate), e.g., addi $1, $2, -5
  • Op [31:26], Rs [25:21], Rt [20:16],
    immediate [15:0]
I-format (Load/Store), e.g., lw $1, 24($9)
  • Op [31:26], Base [25:21], Dest [20:16],
    immediate [15:0]
I-format (Branch), e.g., beq $4, $0, L1
  • Op [31:26], Rs [25:21], Rt [20:16],
    immediate [15:0]
J-format (Jump / Call), e.g., j L2
  • Op [31:26], target [25:0]

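
The three MIPS formats on this slide are plain bit-field concatenations, which a few shifts make concrete. The sketch below packs the slide's example instructions into 32-bit words. The function names are illustrative, and the opcode and funct values (add: op 0, funct 0x20; addi: op 0x08; lw: op 0x23; j: op 0x02) are the standard MIPS32 encodings, which the slide itself does not list.

```python
def encode_r(op, rs, rt, rd, shamt, funct):
    """R-format: Op[31:26] Rs[25:21] Rt[20:16] Rd[15:11] Shamt[10:6] Funct[5:0]."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct

def encode_i(op, rs, rt, imm):
    """I-format: Op[31:26] Rs[25:21] Rt[20:16] immediate[15:0] (two's complement)."""
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)

def encode_j(op, target):
    """J-format: Op[31:26] target[25:0]."""
    return (op << 26) | (target & 0x03FFFFFF)

# add $1, $2, $3  ->  Rd=1, Rs=2, Rt=3
print(f"{encode_r(0x00, 2, 3, 1, 0, 0x20):08x}")  # 00430820
# addi $1, $2, -5 ->  Rt=1, Rs=2
print(f"{encode_i(0x08, 2, 1, -5):08x}")          # 2041fffb
# lw $1, 24($9)   ->  Dest=1, Base=9
print(f"{encode_i(0x23, 9, 1, 24):08x}")          # 8d210018
```

Every instruction is exactly one 32-bit word and the Op field is always in the same place, which is what keeps RISC decoding simple compared with the variable-length x86 case.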
9
CISC vs. RISC
CISC | RISC
Variable-length instructions | Fixed-length instructions, single-cycle operation
Abundant instructions and addressing modes | Fewer instructions and addressing modes
Long, complex decoding | Simple decoding
Contain mem-to-mem operations | Load/store architecture
Use microcode | No microinstructions, directly decoded and executed by HW logic
Closer semantic gap (shift complexity to microcode) | Needs smart compilers, or intelligent hardware to reorder instructions
IBM 360, DEC VAX, x86, Moto 68030 | IBM 801, MIPS, RISC I, IBM POWER, Sun SPARC
  • Some definitions were from the paper by Colwell
    et al. in 1985

10
CISC vs. RISC (Reality)
Machine | IBM 370/168 | VAX 11/780 | Xerox Dorado | IBM 801 | Berkeley RISC I | Stanford MIPS
Class | CISC | CISC | CISC | RISC | RISC | RISC
Year introduced | 1973 | 1978 | 1978 | 1980 | 1981 | 1983
# instructions | 208 | 303 | 270 | 120 | 39 | 55
Microcode | 54KB | 61KB | 17KB | 0 | 0 | 0
Instruction size | 2 to 6 B | 2 to 57 B | 1 to 3 B | 4 B | 4 B | 4 B
Execution model | Reg-reg, Reg-mem, Mem-mem | Reg-reg, Reg-mem, Mem-mem | Stack | Reg-reg | Reg-reg | Reg-reg

11
Observation and Controversy
  • "Instruction Sets and Beyond: Computers,
    Complexity, and Controversy" by Bob Colwell
    (Eckert-Mauchly Award, 2005) and gang from CMU;
    also see the response from the RISC camp: Patterson
    (Eckert-Mauchly Award, 2008) and Hennessy
    (Eckert-Mauchly Award, 2001)
  • CISC/RISC classification should not be a
    dichotomy
  • Case in point: MicroVAX-32 by DEC, a single-chip
    implementation
  • Subsetting VAX instructions (but still, 175
    instructions!)
  • Emulate complex instructions
  • A RISC or a CISC? (Well, it has variable-length
    instructions, is not a ld/st machine, uses
    microcode control, and has all VAX addressing modes)
  • Effective processor design = CISC experiences +
    RISC tenets
  • RISC features are not incompatible or mutually
    exclusive
  • Large register file (w/ register windows)
  • RISC/CISC issues are best considered in light of
    their function-to-implementation level assignment

12
Modern X86 Machine Design
  • CISC outfit
  • RISC inside
  • E.g., Intel P6/Netburst/Core, AMD
    Athlon/Phenom/Opteron
  • Each x86 instruction is decoded into micro-ops
    (μops) or RISC-ops on-the-fly
  • Internal microarchitecture resembles RISC design
    philosophy
  • Processor dynamically schedules μops
  • Compiler scheduling is still beneficial
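
To make the "CISC outfit, RISC inside" idea concrete, the toy sketch below cracks a two-operand x86-style instruction into load/operate/store steps whenever an operand is in memory. Everything here is an illustrative assumption: the function name `crack`, the `tmp` register, and the tuple layout are invented for this sketch, and real μop formats are implementation-specific and not described in the slides.

```python
def crack(mnemonic: str, dest: str, src: str) -> list:
    """Toy decoder: split one instruction into RISC-like micro-ops.

    Operands are strings; a memory operand is written like "[ebx]".
    Each micro-op is (operation, destination, sources).
    Only one operand may be in memory, as in most x86 ALU instructions.
    """
    def is_mem(operand: str) -> bool:
        return operand.startswith("[")

    if is_mem(dest) and is_mem(src):
        raise ValueError("at most one memory operand is supported")
    if is_mem(dest):
        # Read-modify-write: load, operate on a temporary, store back.
        return [
            ("load", "tmp", (dest,)),
            (mnemonic, "tmp", ("tmp", src)),
            ("store", dest, ("tmp",)),
        ]
    if is_mem(src):
        return [
            ("load", "tmp", (src,)),
            (mnemonic, dest, (dest, "tmp")),
        ]
    return [(mnemonic, dest, (dest, src))]  # reg-reg: already RISC-like

for uop in crack("add", "[ebx]", "eax"):
    print(uop)
```

Once an instruction is in this form, each step is a simple register or load/store operation, which is what lets the out-of-order core schedule it like a RISC instruction stream.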

13
Recent ISA Design Trend
  • Look at this instruction in MIPS (CISC or RISC?)
  • CABS.LE.PS $fcc0, $f8, $f10 // |y| ≤ |w|,
    |x| ≤ |w|?
  • Many complex instructions emerged for new apps
  • Viterbi instruction for wireless
    communication/DSP
  • Sum of absolute differences in SSE (PSADBW) or
    other DSPs: C = Σ|A−B| for MPEG (motion
    estimation)
  • In embedded domain, code size is critical
  • Reducing programming efforts
  • Optimizing performance via
  • Specialized hardware (accelerator-based)
  • Co-processor (controlled by main processor)
  • ISA plug-in (flexible)
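
The sum of absolute differences mentioned above is simple to state in software, which shows what a single specialized instruction replaces. The sketch below is a plain scalar version with a tiny motion-estimation style search. The function names and the sample pixel values are made up for illustration.

```python
def sum_abs_diff(block_a, block_b) -> int:
    """C = sum(|A[i] - B[i]|) over two equally sized pixel blocks."""
    if len(block_a) != len(block_b):
        raise ValueError("blocks must have the same length")
    return sum(abs(a - b) for a, b in zip(block_a, block_b))

def best_match(current, candidates) -> int:
    """Index of the candidate block with the smallest SAD against `current`."""
    return min(range(len(candidates)),
               key=lambda i: sum_abs_diff(current, candidates[i]))

current = [10, 20, 30, 40]
candidates = [
    [12, 18, 33, 40],  # SAD = 2 + 2 + 3 + 0 = 7
    [10, 21, 30, 39],  # SAD = 0 + 1 + 0 + 1 = 2
    [0, 0, 0, 0],      # SAD = 100
]
print(best_match(current, candidates))  # 1
```

The scalar loop needs a subtract, an absolute value, and an add per pixel, while a SAD instruction performs that work for a whole group of bytes at once.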

14
VLIW
  • Very Long Instruction Word
  • Originated from microcode compaction
  • Coined by Josh Fisher (Eckert-Mauchly Award,
    2003)
  • Compiler will
  • Perform instruction scheduling (latency-aware)
  • Pack several independent instructions into a VLIW
    instruction
  • Issues
  • Compatibility
  • Many nops
  • Very complex compiler
  • Information unavailable at static compile time
    (e.g., interprocedural optimization is difficult)
  • Pioneers
  • Culler Scientific
  • Led by Prof. Glen J. Culler (National Medal of
    Technology winner 2000, Berkeley Prof. David
    Culler's father)
  • Multiflow (Fisher)
  • Led by Josh Fisher (Eckert-Mauchly Award 2003),
    John O'Donnell, John Ruttenberg, David Papworth,
    Bob Colwell (Eckert-Mauchly Award 2005), Geoffrey
    Lowney, etc.
  • Several Multiflow TRACE machines were delivered
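
The compiler's job described on this slide (schedule, then pack independent operations into one long instruction) can be sketched with a greedy packer. This is a minimal illustration under loud assumptions: every operation has a latency of one bundle, each register is written only once, all slots accept any operation, and the names `pack_vliw`, `ops`, and `width` are invented here. Real VLIW compilers such as Multiflow's used far more elaborate techniques.

```python
def pack_vliw(ops, width=3):
    """Greedily pack operations into fixed-width VLIW bundles.

    ops: list of (name, dest_register, source_registers) in program order.
    Assumes unit latency and that each register is written at most once.
    Unused slots are filled with "nop".
    """
    ready_at = {}   # register -> index of first bundle that may read it
    bundles = []
    for name, dest, sources in ops:
        # Earliest bundle in which all source values are available.
        slot = max((ready_at.get(s, 0) for s in sources), default=0)
        # Skip bundles that are already full.
        while slot < len(bundles) and len(bundles[slot]) >= width:
            slot += 1
        while len(bundles) <= slot:
            bundles.append([])
        bundles[slot].append(name)
        ready_at[dest] = slot + 1
    return [b + ["nop"] * (width - len(b)) for b in bundles]

program = [
    ("ld a", "a", []),
    ("ld b", "b", []),
    ("add c", "c", ["a", "b"]),
    ("mul d", "d", ["c"]),
    ("ld e", "e", []),
]
for bundle in pack_vliw(program):
    print(bundle)
# ['ld a', 'ld b', 'ld e']
# ['add c', 'nop', 'nop']
# ['mul d', 'nop', 'nop']
```

Even in this tiny example four of the nine slots end up as nops, which is the "many nops" issue listed above: a dependence chain leaves the other slots of each bundle empty.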

15
Intel/HP EPIC
  • Explicitly Parallel Instruction Computing
  • A close relative of VLIW (e.g., the compiler holds
    the key to high performance)
  • Some new features
  • Stop bits to address compatibility
  • ISA enabling data speculation and control
    speculation (minimum hardware support needed)
  • Fully predicated ISA
  • Rotating registers, RSE (not so new, e.g., MRS in
    RISC I)
  • Lots of ideas from Polycyclic architecture (TRW)
    and Cydrome by the late Bob Rau (Eckert-Mauchly
    Award, 2002)

An Itanium Instruction Bundle
  • ld4 r43 = [r38]
  • add r38 = 16, r38
  • br.call.sptk b0 = printf

16
VLIW Tradeoffs
  • Plentiful registers, simple encodings
  • Potentially lower # of transistors than other
    designs
  • Reduced speculation, OoO not needed
  • Size efficiencies, price, power consumption
  • Is this true for Itanium?
  • Drawbacks
  • Backward compatibility or upgradeability
  • Due to exposed implementation details
  • VLIW is orthogonal to other techniques
  • Pipeline, SMT, and CMP/Multi-core can be built on
    top of processors including VLIW

17
Design Philosophy VLIW vs. Superscalar
[Figure: superscalar vs. VLIW. Superscalar: RISC object code is scheduled at run time by scheduling and operation-independence-recognizing hardware. VLIW: a normal compiler plus scheduling and operation-independence-recognizing software does this work at compile time. The same ILP hardware is used in both cases.]

18
Design Philosophy VLIW vs. Superscalar
  • VLIW
  • Requiring less hardware and lower power
  • Programs need to be changed to run correctly when
    the implementation changes even slightly (not
    always, though)
  • Superscalar
  • Object-code compatible
  • Sequential programs can be presented to different
    superscalar implementations of the same ISA

19
Design Philosophy VLIW vs. Superscalar
20
Superscalar or VLIW?
  • Reality: the current world is dominated by
  • x86: Core (quad-issue), Atom (dual-issue)
  • And ARM (Cortex-A8 is dual-issue; Cortex-A9 has OoO)
  • VLIW is largely embraced by the DSP camp

21
Should we continue to teach this Chapter about
ISA?