EE37E2005 - PowerPoint PPT Presentation

Title:

EE37E2005

Description:

Instruction set processor design. Principles ... (http://babbage.clarku.edu/~jbreecher/arch/arch.html) EE37E 2005. 30. Introduction ... – PowerPoint PPT presentation

Provided by: drlucien7

Transcript and Presenter's Notes



1
Lesson 5 Processor Design
  • Topic 1 Methods and Concepts

2
Introduction
  • References
  • Modern Processor Design book (pp. 1-16)
  • Computer Organization and Design book (pp. 54-89)

3
  • While introducing this topic we will focus on
    these points
  • Evolution of microprocessors
  • Instruction set processor design
  • Principles
  • Microprocessors are Instruction set processors
    (ISPs).
  • An ISP executes instructions from a predefined
    instruction set.
  • A microprocessor's functionality is fully
    characterized by the instruction set it is
    capable of executing.
  • This predefined instruction set is also called
    the instruction set architecture.

4
  • An ISA serves as an interface between software
    and hardware.
  • In terms of processor design methodology, an ISA
    is the specification of the design while the
    microprocessor or ISP is the implementation of a
    design.

5
Computer System Components
  • CPU
    1000 MHz - 3 GHz (a multiple of system bus speed)
    Pipelined (7-21 stages)
    Superscalar (max 4 instructions/cycle), single-threaded
    Dynamically scheduled or VLIW
    Dynamic and static branch prediction
  • Caches: L1, L2, L3
  • System Bus (support for one or more CPUs)
    Examples: Alpha, AMD K7: EV6, 400 MHz
    Intel PII, PIII: GTL, 133 MHz
    Intel P4: 800 MHz
  • Memory Bus
    SDRAM PC100/PC133: 100-133 MHz, 64-128 bits wide, 2-way interleaved, 900 MBytes/sec
    Double Data Rate (DDR) SDRAM PC3200: 400 MHz (effective 200x2), 64-128 bits wide, 4-way interleaved, 3.2 GBytes/sec (second half 2002)
    Rambus DRAM (RDRAM) PC800, PC1060: 400-533 MHz (DDR), 16-32 bits wide channel, 1.6 - 3.2 GBytes/sec (per channel)
  • Chipset: North Bridge, South Bridge
  • I/O Buses (controllers, adapters)
    Example: PCI-X, 133 MHz; PCI, 33-66 MHz, 32-64 bits wide, 133-1024 MBytes/sec
  • I/O Devices: Disks, Displays, Keyboards
  • Networks: Fast Ethernet, Gigabit Ethernet, ATM, Token Ring ..
6
Computer System Components
Enhanced CPU Performance Capabilities
  • Support for Simultaneous Multithreading (SMT):
    Alpha EV8.
  • VLIW and intelligent compiler techniques:
    Intel/HP EPIC IA-64.
  • More advanced branch prediction techniques.
  • Chip Multiprocessors (CMPs): The Hydra
    Project, IBM Power 4, 5.
  • Vector processing capability: Vector
    Intelligent RAM (VIRAM).
  • Or multimedia ISA extensions.
  • Digital Signal Processing (DSP) capability in
    the system.
  • Re-configurable computing hardware capability
    in the system.

Figure labels:
  • CPU: SMT, CMP
  • Caches (L1, L2, L3): memory latency reduction;
    conventional and block-based trace cache
  • Memory: integrate the memory controller and a portion of
    main memory with the CPU (Intelligent RAM).
    Integrated memory controller: AMD Opteron, IBM Power5
  • System Bus, Memory Bus, I/O Buses
  • Chipset: North Bridge, South Bridge
  • Controllers, adapters
  • I/O Devices: Disks (RAID), Displays, Keyboards, Networks
7
Recent Trends in Computer Design
  • The cost/performance ratio of computing systems
    has seen a steady decline due to advances in:
  • Integrated circuit technology: decreasing
    feature size, λ
  • Clock rate improves roughly proportional to
    improvement in λ
  • Number of transistors improves proportional to
    λ² (or faster).
  • Architectural improvements in CPU design.
  • Microprocessor systems directly reflect IC
    improvement in terms of a yearly 35% to 55%
    improvement in performance.
  • Assembly language has been mostly eliminated and
    replaced by other alternatives such as C or C++.
  • Standard operating Systems (UNIX, NT) lowered
    the cost of introducing new architectures.
  • Emergence of RISC architectures and RISC-core
    architectures.
  • Adoption of quantitative approaches to computer
    design based on empirical performance
    observations.

8
Microprocessor Architecture Trends
CMPs
(SMT)
SMT/CMPs (e.g. IBM Power5 in 2004)
9
Evolution of microprocessors
Graduation Window
Transistor counts:
  • Alpha 21264: 15 million
  • Pentium Pro: 5.5 million
  • PowerPC 620: 6.9 million
  • Alpha 21164: 9.3 million
  • Sparc Ultra: 5.2 million
Moore's Law
  • CMOS improvements
  • Die size: 2X every 3 yrs
  • Line width: halves every 4-7 yrs

Figure 1: Evolution of microprocessors
10
  • Three decades of the history of microprocessors
    tell a truly remarkable story of advances in the
    computer industry (Table 1).

Table 1. The amazing decades of the evolution of
microprocessors
11
Hierarchy of Computer Architecture
High-Level Language Programs
Assembly Language Programs
Software
Machine Language Program
Software/Hardware Boundary
Hardware
Microprogram
Register Transfer Notation (RTN)
Logic Diagrams
Circuit Diagrams
12
Instruction Set Processor Design
  • Critical to an ISP is the instruction set
    architecture, which specifies the functionality
    that must be implemented by the instruction set
    processor (ISP).

13
The Design Process
  • "To Design Is To Represent"
  • Design activity yields description/representation
    of an object
  • Traditional craftsman does not distinguish
    between the conceptualization and the artifact
  • Separation comes about because of complexity
  • Concept is captured in one or more representation
    languages
  • This process IS design
  • Design Begins With Requirements
  • Functional Capabilities: what it will do
  • Performance Characteristics: Speed, Power, Area,
    Cost, ...

14
Design Process (cont.)
CPU
  • Design Finishes As Assembly
  • Design understood in terms of components and how
    they have been assembled
  • Top-down: decomposition of complex functions
    (behaviors) into more primitive functions
  • Bottom-up: composition of primitive building
    blocks into more complex assemblies

Datapath
Control
ALU
Regs
Shifter
Nand Gate
Design is a "creative process," not a simple
method
15
Design as Search
Problem A
Strategy 1
Strategy 2
SubProb2
SubProb3
SubProb 1
BB1
BB2
BB3
BBn
16
Instruction Set Architecture (subset of Computer
Architecture)
  • "... the attributes of a computing system as
    seen by the programmer, i.e., the conceptual
    structure and functional behavior, as distinct
    from the organization of the data flows and
    controls, the logic design, and the physical
    implementation." Amdahl, Blaauw, and Brooks,
    1964
  • Organization of Programmable Storage
  • Data Types & Data Structures
  • Encodings & Representations
  • Instruction Set
  • Instruction Formats
  • Modes of Addressing and Accessing Data Items
    and Instructions
  • Exceptional Conditions

17
The Instruction Set: a Critical Interface
software
instruction set
hardware
Figure 2: ISA
18
Dynamic-Static Interface
  • We have discussed two critical roles played by
    the ISA:
  • Contract between software and hardware, which
    facilitates the development of programs and
    machines
  • Specification for microprocessor design
  • The third role is an associated definition of an
    interface that separates what is done statically
    at compile time from what is done
    dynamically at run time. This interface is called
    the Dynamic-Static Interface (DSI).

19
(Software)
Program
Compiler complexity
Exposed to software
Static
Architecture (DSI)
Hardware complexity
Hidden in hardware
Dynamic
Machine
(Hardware)
Figure 3 The dynamic-static feature
20
Computer Architecture Topics
Input/Output and Storage
Disks, WORM, Tape
RAID
Emerging Technologies Interleaving Bus protocols
DRAM
Coherence, Bandwidth, Latency
Memory Hierarchy
L2 Cache
L1 Cache
Addressing, Protection, Exception Handling
VLSI
Instruction Set Architecture
Pipelining and Instruction Level Parallelism
Pipelining, Hazard Resolution, Superscalar,
Reordering, Prediction, Speculation, Vector, DSP
21
Principles of Processor Performance
22
Definitions
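  • Performance is in units of things per second
  • bigger is better
  • If we are primarily concerned with response time:
    performance(X) = 1 / execution_time(X)

"X is n times faster than Y" means
    n = performance(X) / performance(Y)
      = execution_time(Y) / execution_time(X)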
23
Cycles Per Instruction
IC = Instruction Count
CPI = Clock cycles Per Instruction
24
Cycles Per Instruction
We may separate the contribution of each type
of instruction to the execution time by defining
a CPI and an instruction count for each type:
    CPU time = Clock cycle time x sum over i of (CPI_i x IC_i)
Processor pipelining and memory interactions
limit the accuracy of this approach, but it's a
good first guess. For accuracy, it is necessary
to simulate the instructions of an entire program
with issue, pipeline and memory interactions.
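As a rough sketch of this first-guess estimate, the per-type contributions can be summed in a few lines of Python. The instruction mix, the per-type CPI values and the 1 GHz clock below are hypothetical numbers chosen for illustration, and the formula used is the standard textbook form (total cycles = sum of count x CPI over instruction types), which is assumed here rather than taken from the slide.

```python
from typing import Dict, Tuple

# Each entry maps an instruction type to (instruction count, CPI for that type).
InstructionMix = Dict[str, Tuple[int, float]]


def total_cycles(mix: InstructionMix) -> float:
    """Sum the cycles contributed by each instruction type."""
    return sum(count * cpi for count, cpi in mix.values())


def average_cpi(mix: InstructionMix) -> float:
    """Overall CPI: total cycles divided by total instruction count (IC)."""
    instruction_count = sum(count for count, _ in mix.values())
    return total_cycles(mix) / instruction_count


def cpu_time(mix: InstructionMix, clock_rate_hz: float) -> float:
    """CPU time in seconds: total cycles divided by the clock rate."""
    return total_cycles(mix) / clock_rate_hz


# Hypothetical program: one million instructions on a 1 GHz processor.
HYPOTHETICAL_MIX: InstructionMix = {
    "alu": (500_000, 1.0),
    "load": (200_000, 2.0),
    "store": (100_000, 2.0),
    "branch": (200_000, 2.0),
}

if __name__ == "__main__":
    print("average CPI:", average_cpi(HYPOTHETICAL_MIX))
    print("CPU time (s):", cpu_time(HYPOTHETICAL_MIX, 1e9))
```

This ignores pipeline and memory interactions entirely, which is exactly the limitation noted above.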
25
Aspects of CPU Performance (CPU Law)
26
Amdahl's Law
  • Speedup due to enhancement E
  • Suppose that enhancement E accelerates a fraction
    F of the task by a factor S, and the remainder of
    the task is unaffected
  • E.g. special instructions, memory, IO, parallel
    processing

27
Amdahl's Law
28
Amdahl's Law
  • Example: Floating point instructions improved to
    run 2X, but only 10% of actual instructions are FP
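A minimal sketch of this example in Python, using the usual statement of Amdahl's Law: overall speedup = 1 / ((1 - F) + F / S). It assumes that "10% of instructions" also means 10% of execution time, which the slide does not state; the function name is illustrative.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when a fraction F of the task is accelerated by a factor S."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0:
        raise ValueError("speedup_enhanced must be positive")
    unaffected = 1.0 - fraction_enhanced             # part of the task that does not change
    enhanced = fraction_enhanced / speedup_enhanced  # accelerated part, after the enhancement
    return 1.0 / (unaffected + enhanced)


if __name__ == "__main__":
    # FP instructions run 2X faster, but FP is assumed to be only 10% of the time.
    print(f"overall speedup: {amdahl_speedup(0.10, 2.0):.3f}")
```

The result is only about a 5% overall gain, which is the point of the example: the unaffected 90% dominates.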

29
Topic 2 Instruction Set Architecture Design
  • Adapted from Prof. Jerry Breecher's notes and my
    CS21Q notes
  • (http://babbage.clarku.edu/~jbreecher/arch/arch.html)

30
Introduction
  • 7.1 Introduction
  • 7.2 Classifying Instruction Set Architectures
  • 7.3 Memory Addressing
  • 7.4 Operations in the Instruction Set
  • 7.5 Type and Size of Operands
  • 7.6 Encoding and Instruction Set
  • 7.7 The Role of Compilers
  • 7.8 The MIPS Architecture and Bonus
  • 7.9 Endianness

31
Introduction
  • The Instruction Set Architecture is that portion
    of the machine visible to the assembly level
    programmer or to the compiler writer.

Questions:
  - What are the advantages and disadvantages of
    various instruction set alternatives?
  - How do languages and compilers affect ISA?
32
Classifying Instruction Set Architectures
  • Classifications can be by
  • Stack/accumulator/register
  • Number of memory operands.
  • Number of total operands.

33
Instruction Set Architectures
Basic ISA Classes
  • Accumulator
  • 1 address: add A          acc <- acc + mem[A]
  • 1+x address: addx A       acc <- acc + mem[A + x]
  • Stack
  • 0 address: add            tos <- tos + next
  • General Purpose Register
  • 2 address: add A B        EA(A) <- EA(A) + EA(B)
  • 3 address: add A B C      EA(A) <- EA(B) + EA(C)
  • Load/Store
  • 0 Memory: load R1, Mem1
  •           load R2, Mem2
  •           add R1, R2
  • 1 Memory: add R1, Mem2

General purpose register ALU instructions can have two or three operands.
ALU instructions can have 0, 1, 2, or 3 memory operands.
Shown here are cases of 0 and 1.
34
Instruction Set Architectures
Basic ISA Classes
The results of the different address classes are
easiest to see with the examples here, all of
which implement the sequence for C = A + B.
Registers are the class that won out. The more
registers on the CPU, the better.
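As an illustration of how the classes differ, here is a small Python sketch that runs C = A + B on three toy machines: stack, accumulator and load/store. The instruction sequences follow the usual textbook pattern, and the mnemonics, tuple format and memory values are assumptions made for this sketch, not the contents of the slide's figure.

```python
from typing import Dict

# Hypothetical memory contents for the three variables.
MEMORY: Dict[str, int] = {"A": 4, "B": 7, "C": 0}


def run_stack(mem: Dict[str, int]) -> Dict[str, int]:
    """Stack machine (0 address): operands are implicit on top of the stack."""
    mem, stack = dict(mem), []
    for op, arg in [("push", "A"), ("push", "B"), ("add", None), ("pop", "C")]:
        if op == "push":
            stack.append(mem[arg])
        elif op == "add":
            rhs, lhs = stack.pop(), stack.pop()
            stack.append(lhs + rhs)
        elif op == "pop":
            mem[arg] = stack.pop()
    return mem


def run_accumulator(mem: Dict[str, int]) -> Dict[str, int]:
    """Accumulator machine (1 address): one operand is always the accumulator."""
    mem, acc = dict(mem), 0
    for op, arg in [("load", "A"), ("add", "B"), ("store", "C")]:
        if op == "load":
            acc = mem[arg]
        elif op == "add":
            acc += mem[arg]
        elif op == "store":
            mem[arg] = acc
    return mem


def run_load_store(mem: Dict[str, int]) -> Dict[str, int]:
    """Load/store machine: ALU instructions touch registers only."""
    mem, regs = dict(mem), {}
    program = [
        ("load", "R1", "A"),
        ("load", "R2", "B"),
        ("add", "R3", "R1", "R2"),
        ("store", "R3", "C"),
    ]
    for instr in program:
        if instr[0] == "load":
            regs[instr[1]] = mem[instr[2]]
        elif instr[0] == "add":
            regs[instr[1]] = regs[instr[2]] + regs[instr[3]]
        elif instr[0] == "store":
            mem[instr[2]] = regs[instr[1]]
    return mem


if __name__ == "__main__":
    for machine in (run_stack, run_accumulator, run_load_store):
        print(machine.__name__, machine(MEMORY)["C"])
```

The load/store version uses the most instructions but keeps every ALU operation register-to-register, which is part of why registers won out.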
35
Instruction Set Architectures
Intel 80x86 Integer Registers
36
Memory Addressing
  • Sections Include
  • Interpreting Memory Addresses
  • Addressing Modes
  • Displacement Address Mode
  • Immediate Address Mode

37
Memory Addressing
Interpreting Memory Addresses
  • What object is accessed as a function of the
    address and length?
  • Objects have byte addresses: an address refers
    to the number of bytes counted from the beginning
    of memory.
  • Little Endian: puts the byte whose address is
    xx00 at the least significant position in the
    word.
  • Big Endian: puts the byte whose address is xx00
    at the most significant position in the word.
  • Alignment: data must be aligned on a boundary
    equal to its size. Misalignment typically
    results in an alignment fault that must be
    handled by the Operating System.
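The alignment rule can be sketched as a simple check in Python. The function name and the sample addresses are illustrative assumptions.

```python
def is_aligned(address: int, size: int) -> bool:
    """True if an object of `size` bytes at `address` sits on a boundary equal to its size."""
    if size <= 0:
        raise ValueError("size must be positive")
    return address % size == 0


if __name__ == "__main__":
    print(is_aligned(0x100, 4))  # a 4-byte word at 0x100
    print(is_aligned(0x102, 4))  # a 4-byte word at 0x102
    print(is_aligned(0x102, 2))  # a 2-byte half word at 0x102
```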

38
Memory Addressing
Addressing Modes
  • This table shows the most common modes. A more
    complete set is in Figure 2.6

39
Memory Addressing
Displacement Addressing Mode
  • How big should the displacement be?
  • For addresses that do fit in the displacement size:
  • Add R4, 10000 (R0)
  • For addresses that don't fit in the displacement
    size, the compiler must do the following:
  • Load R1, address
  • Add R4, 0 (R1)
  • How big the field should be depends on the
    displacements that typically occur.
  • On both IA32 and DLX, the space allocated is 16
    bits.
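A minimal sketch of the compiler's decision, assuming the 16-bit displacement field is a signed two's complement value (as it is on DLX and MIPS). The function names and the returned instruction strings are hypothetical.

```python
from typing import List


def fits_signed(value: int, bits: int = 16) -> bool:
    """True if `value` is representable as a signed two's complement field of `bits` bits."""
    lowest = -(1 << (bits - 1))
    highest = (1 << (bits - 1)) - 1
    return lowest <= value <= highest


def emit_add(displacement: int) -> List[str]:
    """Return the instruction sequence for adding a memory operand to R4."""
    if fits_signed(displacement):
        # The displacement fits in the instruction itself.
        return [f"Add R4, {displacement} (R0)"]
    # Otherwise the address is first loaded into a register.
    return [f"Load R1, {displacement}", "Add R4, 0 (R1)"]


if __name__ == "__main__":
    print(emit_add(10000))
    print(emit_add(100000))
```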

40
Memory Addressing
Immediate Address Mode
  • Used where we want to get to a numerical value in
    an instruction.

At high level:
    a = b + 3
    if (a > 17) goto Addr
At assembler level:
    Load R2, 3
    Add R0, R1, R2
    Load R2, 17
    CMPBGT R1, R2
    Load R1, Address
    Jump (R1)
So how would you get a 32-bit value into a
register?
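One common answer on machines with 16-bit immediates, such as MIPS, is a two-instruction sequence: load the upper 16 bits (lui), then OR in the lower 16 bits (ori). The Python sketch below models that split. It is an assumption about the intended answer, and the function names are illustrative.

```python
from typing import Tuple


def split_for_lui_ori(value: int) -> Tuple[int, int]:
    """Split a 32-bit value into the two 16-bit immediates used by lui and ori."""
    value &= 0xFFFFFFFF
    return value >> 16, value & 0xFFFF


def load_32bit(value: int) -> int:
    """Model the register contents after 'lui reg, upper' then 'ori reg, reg, lower'."""
    upper, lower = split_for_lui_ori(value)
    reg = (upper << 16) & 0xFFFFFFFF  # lui: upper half set, low 16 bits zeroed
    reg |= lower                      # ori: zero-extended 16-bit immediate
    return reg


if __name__ == "__main__":
    upper, lower = split_for_lui_ori(0x12345678)
    print(hex(upper), hex(lower))
    print(hex(load_32bit(0x12345678)))
```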
41
Operations In The Instruction Set
  • Sections Include
  • Detailed information about types of instructions.
  • Instructions for Control Flow (conditional
    branches, jumps)

42
Operations In The Instruction Set
Operator Types
  • Arithmetic and logical: and, add
  • Data transfer: move, load
  • Control: branch, jump, call
  • System: system call, traps
  • Floating point: add, mul, div, sqrt
  • Decimal: add, convert
  • String: move, compare
  • Multimedia: 2D, 3D? e.g., Intel MMX and Sun
    VIS

43
Operations In The Instruction Set
Control Instructions
Conditional branches are 20% of all instructions!!
  • Control Instructions Issues
  • taken or not
  • where is the target
  • link return address
  • save or restore
  • Instructions that change the PC
  • (conditional) branches, (unconditional) jumps
  • function calls, function returns
  • system calls, system returns

44
Type And Size of Operands
  • The type of the operand is usually encoded in the
    opcode: an LDW implies loading of a word.
  • Common sizes are:
  • Character (1 byte)
  • Half word (16 bits)
  • Word (32 bits)
  • Single Precision Floating Point (1 Word)
  • Double Precision Floating Point (2 Words)
  • Integers are two's complement binary.
  • Floating point is IEEE 754.
  • Some languages (like COBOL) use packed decimal.
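These sizes and encodings can be inspected with Python's standard `struct` module. The mapping from the slide's operand names to `struct` format characters is an assumption made for this sketch.

```python
import struct

# "<" selects little-endian byte order with standard (platform-independent) sizes.
OPERAND_FORMATS = {
    "character": "<b",
    "half word": "<h",
    "word": "<i",
    "single precision FP": "<f",
    "double precision FP": "<d",
}

# Size in bytes of each operand type.
sizes = {name: struct.calcsize(fmt) for name, fmt in OPERAND_FORMATS.items()}

# Two's complement: -1 as a 32-bit word has every bit set.
minus_one = struct.pack("<i", -1)

# IEEE 754 single precision: 1.0, shown most significant byte first.
one_single = struct.pack(">f", 1.0)

if __name__ == "__main__":
    for name, size in sizes.items():
        print(f"{name}: {size} byte(s)")
    print("-1 as a word:", minus_one.hex())
    print("1.0 as single precision:", one_single.hex())
```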

45
The MIPS Architecture
  • MIPS is very RISC oriented.

46
The MIPS Architecture
  • MIPS Characteristics
  • Addressing Modes
  • Immediate
  • Displacement
  • (Register Mode used only for ALU)

There's MIPS 32, which we learned in CS140:
  • 32-bit byte addresses, aligned
  • Load/store, only displacement addressing
  • Standard datatypes
  • 3 fixed-length formats
  • 32 32-bit GPRs (r0 = 0)
  • 16 64-bit (32 32-bit) FPRs
  • FP status register
  • No condition codes
  • Data transfer
  • load/store word, load/store byte/halfword signed?
  • load/store FP single/double
  • moves between GPRs and FPRs
  • ALU
  • add/subtract signed? immediate?
  • multiply/divide signed?
  • and,or,xor immediate?, shifts ll, rl, ra
    immediate?
  • sets immediate?

There's MIPS 64, the current arch:
  • Standard datatypes
  • 4 fixed-length formats (8, 16, 32, 64)
  • 32 64-bit GPRs (r0 = 0)
  • 64 64-bit FPRs
47
The MIPS Architecture
  • MIPS Characteristics
  • Control
  • branches: == 0, <> 0
  • conditional branch testing FP bit
  • jump, jump register
  • jump link, jump link register
  • trap, return-from-exception
  • Floating Point
  • add/sub/mul/div
  • single/double
  • fp converts, fp set

48
The MIPS Architecture
  • The MIPS Encoding
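As a sketch of one of the fixed-length formats, the standard MIPS32 R-type layout packs six fields into a 32-bit word: opcode (6 bits), rs (5), rt (5), rd (5), shamt (5) and funct (6). The helper name below is illustrative, and only the R-type format is shown.

```python
def encode_r_type(rs: int, rt: int, rd: int, shamt: int, funct: int, opcode: int = 0) -> int:
    """Pack the R-type fields into a 32-bit instruction word, most significant field first."""
    fields = [(opcode, 6), (rs, 5), (rt, 5), (rd, 5), (shamt, 5), (funct, 6)]
    word = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"field value {value} does not fit in {width} bits")
        word = (word << width) | value
    return word


if __name__ == "__main__":
    # add $t0, $t1, $t2  ->  rd = 8 ($t0), rs = 9 ($t1), rt = 10 ($t2), funct = 0x20
    print(f"{encode_r_type(rs=9, rt=10, rd=8, shamt=0, funct=0x20):#010x}")
```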

49
Byte Ordering
  • How should bytes within a multi-byte word be
    ordered in memory?
  • Conventions
  • Suns, Macs are Big Endian machines
  • Least significant byte has highest address
  • Alphas, PCs are Little Endian machines
  • Least significant byte has lowest address

50
Byte Ordering Example
  • Big Endian
  • Least significant byte has highest address
  • Little Endian
  • Least significant byte has lowest address
  • Example
  • Variable x has 4-byte representation 0x01234567
  • Address given by &x is 0x100

Address          0x100  0x101  0x102  0x103
Big Endian         01     23     45     67
Little Endian      67     45     23     01
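The same example can be reproduced in Python with `int.to_bytes`, which lays out an integer in either byte order. The `layout` helper and its base address argument are illustrative assumptions.

```python
from typing import Dict

x = 0x01234567

# Bytes in increasing address order for each convention.
big = x.to_bytes(4, byteorder="big")
little = x.to_bytes(4, byteorder="little")


def layout(raw: bytes, base: int = 0x100) -> Dict[int, str]:
    """Map each byte address, starting at `base`, to the byte stored there."""
    return {base + offset: f"{byte:02x}" for offset, byte in enumerate(raw)}


if __name__ == "__main__":
    print("big endian:   ", layout(big))
    print("little endian:", layout(little))
```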
51
Machine-Level Code Representation
  • Encode Program as Sequence of Instructions
  • Each simple operation
  • Arithmetic operation
  • Read or write memory
  • Conditional branch
  • Instructions encoded as bytes
  • Alphas, Suns, Macs use 4 byte instructions
  • Reduced Instruction Set Computer (RISC)
  • PCs use variable length instructions
  • Complex Instruction Set Computer (CISC)
  • Different instruction types and encodings for
    different machines
  • Most code not binary compatible
  • Programs are Byte Sequences Too!

52
Classification of Processors
  • We can classify processors according to the areas
    in which they are mostly used.
  • We can identify four different groups of
    processors:
  • General purpose processors, which are used in
    building computers
  • Digital signal processors, which are processors
    designed specifically for signal processing
  • Microcontrollers, which are small microcomputers
    that integrate on the same chip a processor
    core plus I/O elements and a small amount of
    memory
  • Application-specific processors, which are
    designed to perform a specific function (e.g.,
    network processors)

53
General Purpose Processors
  • These processors are used to build major computer
    platforms.
  • We can name:
  • Intel / AMD based computers, also called IBM
    compatible
  • Macintosh computers built using PowerPC
    processors
  • Sun machines that use UltraSPARC processors.

54
Examples of General Purpose Processors
55
DSP
  • Digital Signal Processing (DSP) is used in a wide
    variety of applications, and it is hard to find a
    good definition that is general.
  • We can start by dictionary definitions of the
    words
  • Digital
  • operating by the use of discrete signals to
    represent data in the form of numbers
  • Signal
  • a variable parameter by which information is
    conveyed through an electronic circuit
  • Processing
  • to perform operations on data according to
    programmed instructions
  • Which leads us to a simple definition of Digital
    Signal processing
  • changing or analyzing information which is
    measured as discrete sequences of numbers

56
  • Note two unique features of Digital Signal
    processing as opposed to plain old ordinary
    digital processing
  • signals come from the real world - this intimate
    connection with the real world leads to many
    unique needs such as the need to react in real
    time and a need to measure signals and convert
    them to digital numbers
  • signals are discrete - which means the
    information in between discrete samples is lost
  • The advantages of DSP are common to many digital
    systems and include
  • Versatility
  • digital systems can be reprogrammed for other
    applications (at least where programmable DSP
    chips are used)
  • digital systems can be ported to different
    hardware (for example a different DSP chip or
    board level product)
  • Repeatability
  • digital systems can be easily duplicated
  • digital systems do not depend on strict component
    tolerances
  • digital system responses do not drift with
    temperature
  • Simplicity
  • some things can be done more easily digitally
    than with analogue systems

57
  • DSP is used in a very wide variety of
    applications.
  • But most share some common features
  • they use a lot of math (multiplying and adding
    signals)
  • they deal with signals that come from the real
    world
  • they require a response in a certain time
  • Where general purpose DSP processors are
    concerned, most applications deal with signal
    frequencies that are in the audio range.
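The "multiplying and adding signals" pattern is the multiply-accumulate loop at the heart of many DSP kernels. A finite impulse response (FIR) filter is used below as a representative sketch; it is an example chosen for illustration, not one taken from the slides, and the sample values and coefficients are hypothetical.

```python
from typing import List, Sequence


def fir_filter(samples: Sequence[float], coefficients: Sequence[float]) -> List[float]:
    """Compute y[n] = sum over k of coefficients[k] * samples[n - k]."""
    output = []
    for n in range(len(samples)):
        acc = 0.0
        for k, coeff in enumerate(coefficients):
            if n - k >= 0:  # samples before the start of the signal count as zero
                acc += coeff * samples[n - k]  # one multiply-accumulate step
        output.append(acc)
    return output


if __name__ == "__main__":
    # Two-tap moving average over a short hypothetical signal.
    print(fir_filter([1.0, 2.0, 3.0, 4.0], [0.5, 0.5]))
```

DSP chips typically execute the inner multiply-accumulate step in a single cycle, which is how they meet the response-time requirement listed above.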