CpE 442 Introduction to Computer Architecture The Role of Performance - PowerPoint PPT Presentation

Slides: 26
Provided by: hanya6


1
CpE 442 Introduction to Computer Architecture
The Role of Performance
  • Instructor H. H. Ammar

2
Overview of Today's Lecture: The Role of Performance
  • Review from Last Lecture
  • Definition and Measures of Performance
  • Summarizing Performance and Performance Pitfalls

3
Review: What is "Computer Architecture"?
Coordination of levels of abstraction:
  • Application
  • Operating System
  • Compiler
  • Instruction Set Architecture
  • I/O system, Instr. Set Proc.
  • Digital Design
  • Circuit Design
Under a set of rapidly changing forces
4
Review: Levels of Representation
High Level Language Program
    temp = v[k];
    v[k] = v[k+1];
    v[k+1] = temp;
  (Compiler)
Assembly Language Program
    lw $15, 0($2)
    lw $16, 4($2)
    sw $16, 0($2)
    sw $15, 4($2)
  (Assembler)
Machine Language Program
    0000 1001 1100 0110 1010 1111 0101 1000
    1010 1111 0101 1000 0000 1001 1100 0110
    1100 0110 1010 1111 0101 1000 0000 1001
    0101 1000 0000 1001 1100 0110 1010 1111
  (Machine Interpretation)
Control Signal Specification
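To make the assembly level concrete, the following sketch interprets the four lw/sw instructions above on a toy register file and a byte-addressed memory of 32-bit words. It is not a real MIPS simulator; it assumes $2 holds the address of v[k] (so v[k] is at 0($2) and v[k+1] at 4($2)), and the base address 100 and the values 7 and 9 are hypothetical.

```python
# Toy interpreter for the slide's lw/sw sequence (a sketch, not a MIPS simulator).
# Memory maps byte addresses to 32-bit words; only word-aligned addresses are used.

def run(program, regs, mem):
    """Execute (op, rt, offset, base) tuples against regs and mem in place."""
    for op, rt, offset, base in program:
        addr = regs[base] + offset      # effective address = base register + offset
        if op == "lw":
            regs[rt] = mem[addr]        # load word: memory -> register
        elif op == "sw":
            mem[addr] = regs[rt]        # store word: register -> memory
        else:
            raise ValueError(f"unsupported op {op}")

# lw $15, 0($2); lw $16, 4($2); sw $16, 0($2); sw $15, 4($2)
swap = [("lw", 15, 0, 2), ("lw", 16, 4, 2), ("sw", 16, 0, 2), ("sw", 15, 4, 2)]

regs = {2: 100, 15: 0, 16: 0}   # hypothetical: $2 holds the address of v[k]
mem = {100: 7, 104: 9}          # hypothetical: v[k] = 7, v[k+1] = 9
run(swap, regs, mem)
print(mem)  # {100: 9, 104: 7}
```

The two loads copy both words into registers before either store overwrites memory, which is what makes the swap work without the explicit temp variable of the high-level version.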
5
Review: Levels of Organization
Example: SPARCstation 20
  • Computer
    • SPARC Processor: Control, Datapath
    • Memory
    • Devices: Input, Output
6
Review: Summary from Last Lecture
  • All computers consist of five components
    • Processor: (1) datapath and (2) control
    • (3) Memory
    • (4) Input devices and (5) Output devices
  • Not all memory is created equal
    • Cache: fast (expensive) memory placed closer to the processor
    • Main memory: less expensive memory, so we can have more
  • Input and output (I/O) devices have the messiest organization
    • Wide range of speed: graphics vs. keyboard
    • Wide range of requirements: speed, standard, cost, etc.
    • Least amount of research (so far)

7
Processor Performance
8
Metrics of performance (by level)
  • Application: answers per month, operations per second
  • Programming Language
  • Compiler
  • ISA: (millions) of instructions per second (MIPS); (millions) of (F.P.) operations per second (MFLOP/s)
  • Datapath, Control: megabytes per second
  • Function Units, Transistors, Wires, Pins: cycles per second (clock rate)
9
Relating Processor Metrics
  • $\text{CPU execution time} = \text{CPU clock cycles/pgm} \times \text{clock cycle time}$
  • or $\text{CPU execution time} = \dfrac{\text{CPU clock cycles/pgm}}{\text{clock rate}}$
  • $\text{CPU clock cycles/pgm} = \text{Instructions/pgm} \times \text{CPI}$, where CPI is the avg. clock cycles per instruction
  • or $\text{CPI} = \dfrac{\text{CPU clock cycles/pgm}}{\text{Instructions/pgm}}$
  • CPI tells us something about the Instruction Set Architecture, the implementation of that architecture, and the program measured
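The relations above fit in two small functions. This is a minimal sketch; the function names and the example numbers (2 million instructions, a CPI of 1.5, a 500 MHz clock) are hypothetical.

```python
def cpu_time(instructions, cpi, clock_rate_hz):
    """CPU execution time (s) = instructions/pgm x CPI / clock rate."""
    clock_cycles = instructions * cpi      # CPU clock cycles/pgm
    return clock_cycles / clock_rate_hz    # same as clock_cycles * clock cycle time

def average_cpi(clock_cycles, instructions):
    """CPI = CPU clock cycles/pgm / instructions/pgm."""
    return clock_cycles / instructions

# Hypothetical program: 2 million instructions, CPI 1.5, 500 MHz clock
print(cpu_time(2_000_000, 1.5, 500e6))      # 0.006
print(average_cpi(3_000_000, 2_000_000))    # 1.5
```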

10
Aspects of CPU Performance
                      instr. count   CPI   clock rate
  Program
  Compiler
  Instr. Set Arch.
  Organization
  Technology

11
Aspects of CPU Performance
                      instr. count   CPI   clock rate
  Program                  X         (X)
  Compiler                 X         (X)
  Instr. Set               X          X
  Organization                        X        X
  Technology                                   X

12
Organizational Trade-offs
  • Application, Programming Language, Compiler, ISA → Instruction Mix
  • Datapath, Control → CPI
  • Function Units, Transistors, Wires, Pins → Cycle Time
13
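CPI
Average cycles per instruction:
$$\text{CPI} = \frac{\text{CPU Time} \times \text{Clock Rate}}{\text{Instruction Count}} = \frac{\text{Clock Cycles}}{\text{Instruction Count}}$$
$$\text{CPU time} = \text{ClockCycleTime} \times \sum_{i=1}^{n} \text{CPI}_i \times I_i$$
$$\text{CPI} = \sum_{i=1}^{n} \text{CPI}_i \times F_i \quad \text{where} \quad F_i = \frac{I_i}{\text{Instruction Count}} \text{ ("instruction frequency")}$$
Here $I_i$ is the number of instructions of class $i$ and $\text{CPI}_i$ is the cycles per instruction for that class.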
  • Invest Resources where time is Spent!
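The two summation formulas can be checked numerically. The sketch below assumes a hypothetical instruction mix (class names, counts, and per-class CPIs are made up) and a 2 GHz clock; it computes CPU time from the sum of CPI_i × I_i and confirms that total cycles divided by instruction count equals the frequency-weighted sum of CPI_i × F_i.

```python
import math

# Hypothetical mix: class -> (I_i = instruction count, CPI_i = cycles per instruction)
mix = {"alu": (600_000, 1), "load_store": (300_000, 3), "branch": (100_000, 2)}
clock_cycle_time = 0.5e-9  # seconds; an assumed 2 GHz clock

instruction_count = sum(i for i, _ in mix.values())
clock_cycles = sum(cpi * i for i, cpi in mix.values())   # sum of CPI_i * I_i
cpu_time = clock_cycle_time * clock_cycles               # CPU time = cycle time * sum

# CPI two ways: total cycles / instruction count, and sum of CPI_i * F_i
cpi_direct = clock_cycles / instruction_count
cpi_weighted = sum(cpi * (i / instruction_count) for i, cpi in mix.values())

print(cpi_direct)                              # 1.7
print(math.isclose(cpi_direct, cpi_weighted))  # True
```

In this made-up mix, loads/stores are 30% of instructions but contribute 0.9 of the 1.7 cycles per instruction, which is the kind of imbalance "invest resources where time is spent" points at.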

14
Example
Base Machine (Reg / Reg)
  Op            Freq (F_i)   CPI(i)   CPI(i) × F_i   % Time
  ALU           50%          1        .5             33%
  Load          20%          2        .4             27%
  Store         10%          2        .2             13%
  Branch        20%          2        .4             27%
  Typical Mix                         1.5
The CPI = 1.5 cycles per instruction.
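The CPI and % Time columns of the base-machine table can be reproduced from the frequencies and per-class CPIs alone. A minimal sketch; the function name `cpi_and_time_share` is hypothetical.

```python
# Base machine (Reg/Reg) mix from the table: op -> (frequency F_i, CPI_i)
base = {"ALU": (0.50, 1), "Load": (0.20, 2), "Store": (0.10, 2), "Branch": (0.20, 2)}

def cpi_and_time_share(mix):
    """Return (average CPI, {op: fraction of execution time}) for a frequency mix."""
    contrib = {op: f * cpi for op, (f, cpi) in mix.items()}   # CPI(i) * F_i column
    cpi = sum(contrib.values())
    return cpi, {op: c / cpi for op, c in contrib.items()}    # % Time column

cpi, share = cpi_and_time_share(base)
print(round(cpi, 2))                                     # 1.5
print({op: round(100 * s) for op, s in share.items()})   # {'ALU': 33, 'Load': 27, 'Store': 13, 'Branch': 27}
```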
Assignment 1: Turn in the solutions to the following problems from the textbook by Thursday, September 4: Chapter 2, Exercises section, problems 2.1, 2.2, 2.3, 2.4, 2.10, 2.11, 2.12, 2.13, and 2.15.
15
Assume a program of 1 million instructions. Compare the performance of the Base Machine (B), with the above CPI and a 1 GHz clock, against an Enhanced Machine (E) with a 1.333 GHz clock and a one-cycle increase in CPI for load/store and branch instructions.
Enhanced Machine (Reg / Reg)
  Op            Freq (F_i)   CPI(i)   CPI(i) × F_i   % Time
  ALU           50%          1        .5             25%
  Load          20%          3        .6             30%
  Store         10%          3        .3             15%
  Branch        20%          3        .6             30%
                                      2.0
16
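Perf. of machine X = 1 / exec. time of program on machine X, so (with the 1.333 GHz clock giving a 0.75 ns cycle time):
$$\frac{\text{Perf. of E}}{\text{Perf. of B}} = \frac{\text{Exec. time of B}}{\text{Exec. time of E}} = \frac{10^6 \times 1.5 \times 1\,\text{ns}}{10^6 \times 2.0 \times 0.75\,\text{ns}} = \frac{1.5}{1.5} = 1$$
The performance of E equals that of B: the faster clock is exactly cancelled by the higher CPI, so there is no gain in performance.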
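The same comparison as a short computation. This sketch treats 1.333 GHz as exactly 4/3 GHz (a 0.75 ns cycle), which is an assumption matching the slide's arithmetic; the helper name `exec_time` is hypothetical.

```python
def exec_time(instructions, cpi, clock_rate_hz):
    """Execution time in seconds = instructions * CPI / clock rate."""
    return instructions * cpi / clock_rate_hz

n = 1_000_000                          # program of 1 million instructions
t_base = exec_time(n, 1.5, 1e9)        # B: CPI 1.5, 1 GHz
t_enh = exec_time(n, 2.0, 4e9 / 3)     # E: CPI 2.0, assumed exactly 4/3 GHz (0.75 ns cycle)

# Perf(E) / Perf(B) = ExTime(B) / ExTime(E)
print(round(t_base / t_enh, 6))        # 1.0
```

With the literal 1.333 GHz instead of 4/3 GHz the ratio is 1.5 × 1.333 / 2 ≈ 0.9998, which leads to the same conclusion.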
17
Marketing Metrics
  • $\text{MIPS} = \dfrac{\text{Instruction Count}}{\text{Time} \times 10^6} = \dfrac{\text{Clock Rate}}{\text{CPI} \times 10^6}$
    • machines with different instruction sets?
    • programs with different instruction mixes?
    • dynamic frequency of instructions
    • uncorrelated with performance
  • $\text{MFLOP/s} = \dfrac{\text{FP Operations}}{\text{Time} \times 10^6}$
    • machine dependent
    • often not where time is spent

18
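Example showing why MIPS can fail: compare performance with Compilers 1 and 2 for a given program on a given machine.
Instruction count (in billions) for each instruction class:
                       A     B     C
  Compiler 1           5     1     1
  Compiler 2          10     1     1
  Clock cycles (CPI)   1     2     3
Clock cycles using compiler 1 = $5 \times 1 + 1 \times 2 + 1 \times 3 = 10$ billion
Clock cycles using compiler 2 = $10 \times 1 + 1 \times 2 + 1 \times 3 = 15$ billion
Assuming a 1 GHz clock: CPU Time 1 = 10 secs, CPU Time 2 = 15 secs
Yet the MIPS rating, $\text{MIPS} = \text{instr. count} / (\text{CPU time in sec} \times 10^6)$, is:
MIPS 1 = $7 \times 10^9 / (10 \times 10^6) = 700$, MIPS 2 = $12 \times 10^9 / (15 \times 10^6) = 800$
So compiler 2 gets the higher MIPS rating even though its code takes 50% longer to run.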
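The compiler example can be recomputed directly, including a check that both MIPS formulas from the Marketing Metrics slide agree. A minimal sketch; variable names are hypothetical, and the data are the slide's.

```python
cpi_per_class = {"A": 1, "B": 2, "C": 3}
compilers = {                                  # instruction counts in billions
    "compiler 1": {"A": 5, "B": 1, "C": 1},
    "compiler 2": {"A": 10, "B": 1, "C": 1},
}
clock_rate_hz = 1e9                            # 1 GHz

results = {}
for name, counts in compilers.items():
    instr = sum(counts.values()) * 1e9                                # total instructions
    cycles = sum(counts[c] * cpi_per_class[c] for c in counts) * 1e9  # total clock cycles
    cpu_time = cycles / clock_rate_hz                                 # seconds
    mips = instr / (cpu_time * 1e6)                                   # Instruction Count / (Time x 10^6)
    mips_alt = clock_rate_hz / ((cycles / instr) * 1e6)               # Clock Rate / (CPI x 10^6)
    results[name] = (cpu_time, mips, mips_alt)
    print(f"{name}: {cpu_time:.0f} s, {mips:.0f} MIPS, {mips_alt:.0f} MIPS")
# compiler 1: 10 s, 700 MIPS, 700 MIPS
# compiler 2: 15 s, 800 MIPS, 800 MIPS
```

The faster program (compiler 1) gets the lower MIPS rating, because compiler 2 adds many cheap class-A instructions that raise the instruction rate without reducing total time.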
19
Why Do Benchmarks?
  • How we evaluate differences
    • Different systems
    • Changes to a single system
  • Provide a target
    • Benchmarks should represent a large class of important programs
    • Improving benchmark performance should help many programs
  • For better or worse, benchmarks shape a field
  • Good ones accelerate progress
    • good target for development
  • Bad benchmarks hurt progress
    • help real programs vs. sell machines/papers?
    • Inventions that help real programs don't help the benchmark

20
Programs to Evaluate Processor Performance
  • (Toy) Benchmarks
    • 10-100 lines
    • e.g., sieve, puzzle, quicksort
  • Synthetic Benchmarks
    • attempt to match average frequencies of real workloads
    • e.g., Whetstone, Dhrystone
  • Kernels
    • time-critical excerpts of real programs
  • Real programs
    • e.g., gcc, spice
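To make the "toy benchmark" category concrete, here is a minimal sketch of one: the sieve mentioned above, wrapped in a simple best-of-N timer. The function names, the 100,000 limit, and the five repeats are arbitrary illustrative choices. In Python the measured time largely reflects the interpreter, which hints at why a 10-100 line program says little about how a machine handles real workloads.

```python
import time

def sieve(limit):
    """Toy benchmark: count primes below limit (assumes limit >= 2)."""
    is_prime = [True] * limit
    is_prime[0:2] = [False, False]              # 0 and 1 are not prime
    for p in range(2, int(limit ** 0.5) + 1):
        if is_prime[p]:
            for multiple in range(p * p, limit, p):
                is_prime[multiple] = False      # cross off multiples of p
    return sum(is_prime)

def time_benchmark(fn, *args, repeats=5):
    """Return (best wall-clock time in seconds over several runs, fn's result)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn(*args)
        best = min(best, time.perf_counter() - start)
    return best, result

elapsed, count = time_benchmark(sieve, 100_000)
print(count)  # 9592
```

Taking the best of several runs is one common way to reduce timer and scheduling noise; it is a design choice here, not a SPEC rule.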

21
Successful Benchmark SPEC
  • EE Times and 5 companies (Sun, MIPS, HP, Apollo, DEC) band together in 1988 to form the Systems Performance Evaluation Committee (SPEC)
  • Create a standard list of programs, inputs, and reporting: some real programs, including OS calls and some I/O

22
SPEC first round
  • First round 1989: 10 programs, single number to summarize performance
  • One program: 99% of time in a single line of code
  • New front-end compiler could improve its result dramatically

23
SPEC second round, SPEC95
  • 8 integer benchmarks in C and 10 floating pt
    benchmarks in Fortran

24
Amdahl's Law
  • Speedup due to enhancement E:
$$\text{Speedup}(E) = \frac{\text{ExTime w/o } E}{\text{ExTime w/ } E} = \frac{\text{Performance w/ } E}{\text{Performance w/o } E}$$
  • Suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected; then
$$\text{ExTime(with } E) = \left((1-F) + \frac{F}{S}\right) \times \text{ExTime(without } E)$$
$$\text{Speedup(with } E) = \frac{\text{ExTime(without } E)}{\left((1-F) + \frac{F}{S}\right) \times \text{ExTime(without } E)} = \frac{1}{(1-F) + \frac{F}{S}} < \frac{1}{1-F}$$
  • The speedup is bounded by the factor $1/(1-F)$, no matter how large S is.
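Amdahl's Law as a function, with a check of the $1/(1-F)$ bound. A minimal sketch; the name `amdahl_speedup` and the example values F = 0.5, S = 4 are hypothetical.

```python
def amdahl_speedup(fraction, factor):
    """Overall speedup when a fraction F of execution time is accelerated by a factor S."""
    if not 0 <= fraction <= 1 or factor <= 0:
        raise ValueError("need 0 <= F <= 1 and S > 0")
    return 1 / ((1 - fraction) + fraction / factor)

print(amdahl_speedup(0.5, 4))         # 1.6
# Even an enormous S cannot beat 1 / (1 - F) = 2 when F = 0.5
print(amdahl_speedup(0.5, 1e9) < 2)   # True
```

Speeding up half the task by 4x yields only 1.6x overall, because the untouched half still takes 0.5 of the original time.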

25
Performance Evaluation Summary
  • Time is the measure of computer performance!
  • Good products are created when we have
    • Good benchmarks
    • Good ways to summarize performance
  • Without good benchmarks and summaries, in the choice between improving the product for real programs and improving it to get more sales, sales almost always wins
  • Remember Amdahl's Law: speedup is limited by the unimproved part of the program