Title: CpE 442 Introduction to Computer Architecture The Role of Performance
1 CpE 442 Introduction to Computer Architecture
The Role of Performance
2 Overview of Today's Lecture: The Role of Performance
- Review from Last Lecture
- Definition and Measures of Performance
- Summarizing Performance and Performance Pitfalls
3 Review: What is "Computer Architecture"?
Coordination of levels of abstraction, under a set of rapidly changing forces.
[Figure: levels of abstraction, top to bottom]
- Application
- Operating System
- Compiler
- Instruction Set Architecture
- Instr. Set Proc. and I/O system
- Digital Design
- Circuit Design
4 Review: Levels of Representation
High Level Language Program:
    temp = v[k];
    v[k] = v[k+1];
    v[k+1] = temp;
Compiler
Assembly Language Program:
    lw $15, 0($2)
    lw $16, 4($2)
    sw $16, 0($2)
    sw $15, 4($2)
Assembler
Machine Language Program:
    0000 1001 1100 0110 1010 1111 0101 1000
    1010 1111 0101 1000 0000 1001 1100 0110
    1100 0110 1010 1111 0101 1000 0000 1001
    0101 1000 0000 1001 1100 0110 1010 1111
Machine Interpretation
Control Signal Specification
5 Review: Levels of Organization
[Figure: SPARCstation 20]
- Computer
  - SPARC Processor: Control, Datapath
  - Memory
  - Devices: Input, Output
6 Review: Summary from Last Lecture
- All computers consist of five components
  - Processor: (1) datapath and (2) control
  - (3) Memory
  - (4) Input devices and (5) Output devices
- Not all memory is created equal
  - Cache: fast (expensive) memory placed closer to the processor
  - Main memory: less expensive memory, so we can have more of it
- Input and output (I/O) devices have the messiest organization
  - Wide range of speeds: graphics vs. keyboard
  - Wide range of requirements: speed, standard, cost, etc.
  - Least amount of research (so far)
7 Processor Performance
8 Metrics of Performance
[Figure: each level of the system and the metric typically quoted for it]
- Application: answers per month, operations per second
- Programming Language, Compiler, ISA: (millions) of instructions per second (MIPS); (millions) of (F.P.) operations per second (MFLOP/s)
- Datapath, Control: megabytes per second
- Function Units, Transistors, Wires, Pins: cycles per second (clock rate)
9 Relating Processor Metrics
- CPU execution time = CPU clock cycles/pgm × clock cycle time
- or CPU execution time = CPU clock cycles/pgm ÷ clock rate
- CPU clock cycles/pgm = Instructions/pgm × CPI, the avg. clock cycles per instruction
- or CPI = CPU clock cycles/pgm ÷ Instructions/pgm
- CPI tells us something about the Instruction Set Architecture, the implementation of that architecture, and the program measured
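The relations above can be sketched in a few lines of Python. This is only an illustration: the function name `cpu_time_seconds` and the sample values (1 million instructions, CPI of 1.5, 1 GHz clock) are assumptions chosen for the example, not part of the slide.

```python
def cpu_time_seconds(instruction_count, cpi, clock_rate_hz):
    """CPU execution time = (instructions x CPI) / clock rate."""
    clock_cycles = instruction_count * cpi   # CPU clock cycles per program
    return clock_cycles / clock_rate_hz      # same as cycles x clock cycle time


def cpi_from_cycles(clock_cycles, instruction_count):
    """CPI = CPU clock cycles per program / instructions per program."""
    return clock_cycles / instruction_count


if __name__ == "__main__":
    # Illustrative values only.
    t = cpu_time_seconds(1_000_000, 1.5, 1e9)
    print(f"CPU time: {t * 1e3:.3f} ms")
```

Either form of the first relation gives the same result, since clock cycle time is the reciprocal of clock rate.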
10 Aspects of CPU Performance
                    instr. count    CPI    clock rate
Program
Compiler
Instr. Set Arch.
Organization
Technology
11 Aspects of CPU Performance
                    instr. count    CPI    clock rate
Program                  X          (x)
Compiler                 X          (x)
Instr. Set Arch.         X           X
Organization                         X         X
Technology                                     X
12 Organizational Trade-offs
[Figure: system levels and the performance factor each group determines]
- Application, Programming Language, Compiler, ISA: Instruction Mix
- Datapath, Control: CPI
- Function Units, Transistors, Wires, Pins: Cycle Time
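13 CPI
Average cycles per instruction:
$\mathrm{CPI} = \dfrac{\text{CPU Time} \times \text{Clock Rate}}{\text{Instruction Count}} = \dfrac{\text{Clock Cycles}}{\text{Instruction Count}}$
$\text{CPU time} = \text{ClockCycleTime} \times \sum_{i=1}^{n} \mathrm{CPI}_i \times I_i$
$\mathrm{CPI} = \sum_{i=1}^{n} \mathrm{CPI}_i \times F_i$, where $F_i = \dfrac{I_i}{\text{Instruction Count}}$ is the "instruction frequency"
- Invest resources where time is spent!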
14 Example
Base Machine (Reg / Reg), Typical Mix
Op        Freq (F_i)   CPI_i   F_i × CPI_i   % Time
ALU          50%         1        0.5          33%
Load         20%         2        0.4          27%
Store        10%         2        0.2          13%
Branch       20%         2        0.4          27%
Total                             1.5
The CPI = 1.5 cycles per instruction
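As a rough check of the table, the weighted CPI and each class's share of execution time can be computed directly from the mix. The names `weighted_cpi`, `time_share`, and `base_mix` are made up for this sketch; the frequencies and per-class cycle counts are the ones given for the base machine.

```python
def weighted_cpi(mix):
    """Average CPI: sum over classes of frequency x cycles per instruction.

    mix maps an instruction class name to a (frequency, cpi) pair.
    """
    return sum(freq * cpi for freq, cpi in mix.values())


def time_share(mix):
    """Fraction of total execution time spent in each instruction class."""
    total = weighted_cpi(mix)
    return {op: freq * cpi / total for op, (freq, cpi) in mix.items()}


# Base machine mix from the slide: (frequency, cycles per instruction).
base_mix = {
    "ALU": (0.5, 1),
    "Load": (0.2, 2),
    "Store": (0.1, 2),
    "Branch": (0.2, 2),
}

if __name__ == "__main__":
    print(f"CPI = {weighted_cpi(base_mix):.2f}")
    for op, share in time_share(base_mix).items():
        print(f"{op:7s} {share:.0%} of time")
```

The time shares are what the "invest resources where time is spent" advice refers to: ALU instructions are half of the mix but only about a third of the time.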
Assignment 1: Turn in the solutions to the following problems from the textbook by Thursday, September 4: Chapter 2, Exercises section, problems 2.1, 2.2, 2.3, 2.4, 2.10, 2.11, 2.12, 2.13, and 2.15.
15 Assume a program of 1 million instructions. Compare the performance of the Base Machine (B), with the above CPI and a 1 GHz clock, and the Enhanced Machine (E), with a 1.333 GHz clock and a one-cycle increase for load/store and branch instructions.
Enhanced Machine (Reg / Reg)
Op        Freq (F_i)   CPI_i   F_i × CPI_i   % Time
ALU          50%         1        0.5          25%
Load         20%         3        0.6          30%
Store        10%         3        0.3          15%
Branch       20%         3        0.6          30%
Total                             2.0
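16 Perf. of machine X = 1 / exec. time of prog on machine X
Perf. of E / Perf. of B = exec. time of B / exec. time of E = (1.5 × 1) / (2 × 0.75) = 1
(CPI × clock cycle time in ns for each machine; the instruction count is the same and cancels.)
Performance of B is the same as that of E: no gain in performance.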
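The comparison can be reproduced numerically. This is a sketch under the slide's stated numbers (1 million instructions, CPI of 1.5 at 1 GHz for B, CPI of 2.0 at 1.333 GHz for E); the function name `exec_time` is an assumption for illustration. Because 1.333 GHz is a rounded figure, the ratio comes out very slightly below 1 rather than exactly 1.

```python
def exec_time(instruction_count, cpi, clock_rate_hz):
    """Execution time in seconds: instructions x CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz


INSTRUCTIONS = 1_000_000

time_b = exec_time(INSTRUCTIONS, 1.5, 1.0e9)     # base machine B
time_e = exec_time(INSTRUCTIONS, 2.0, 1.333e9)   # enhanced machine E

# Perf(E) / Perf(B) = time(B) / time(E)
speedup = time_b / time_e

if __name__ == "__main__":
    print(f"B: {time_b * 1e3:.4f} ms, E: {time_e * 1e3:.4f} ms")
    print(f"Perf(E) / Perf(B) = {speedup:.4f}")
```

The faster clock is cancelled out by the higher CPI, which is why clock rate alone says little about performance.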
17 Marketing Metrics
- MIPS = Instruction Count / (Time × 10^6) = Clock Rate / (CPI × 10^6)
  - machines with different instruction sets?
  - programs with different instruction mixes?
  - dynamic frequency of instructions
  - uncorrelated with performance
- MFLOP/s = FP Operations / (Time × 10^6)
  - machine dependent
  - often not where time is spent
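18 Example showing why MIPS can fail
Compare performance with Compilers 1 and 2 for a given program on a given machine.
Instruction count (in billions) for instruction classes:
                              A     B     C
Compiler 1                    5     1     1
Compiler 2                   10     1     1
Clock cycles per instr.       1     2     3
Clock cycles using compiler 1 = 5×1 + 1×2 + 1×3 = 10 billion
Clock cycles using compiler 2 = 10×1 + 1×2 + 1×3 = 15 billion
Assuming a 1 GHz clock:
CPU Time 1 = 10 secs
CPU Time 2 = 15 secs
Yet the MIPS ratings are:
MIPS 1 = instr. count / (CPU time in sec × 10^6) = 7 billion / (10 × 10^6) = 700
MIPS 2 = 12 billion / (15 × 10^6) = 800
Compiler 2 gets the higher MIPS rating even though its code takes longer to run.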
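The arithmetic of this example can be checked with a short script. The helper name `evaluate` and the dictionary layout are assumptions made for this sketch; the instruction counts, per-class cycle counts, and 1 GHz clock are the ones from the example.

```python
# Clock cycles per instruction for each instruction class (from the example).
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}


def evaluate(counts_billions, clock_rate_hz=1e9):
    """Return (cpu_time_seconds, mips) for per-class counts given in billions."""
    instructions = sum(counts_billions.values()) * 1e9
    cycles = sum(n * CPI_BY_CLASS[c] for c, n in counts_billions.items()) * 1e9
    cpu_time = cycles / clock_rate_hz
    mips = instructions / (cpu_time * 1e6)
    return cpu_time, mips


compiler_1 = {"A": 5, "B": 1, "C": 1}
compiler_2 = {"A": 10, "B": 1, "C": 1}

if __name__ == "__main__":
    for name, counts in (("Compiler 1", compiler_1), ("Compiler 2", compiler_2)):
        cpu_time, mips = evaluate(counts)
        print(f"{name}: {cpu_time:.0f} s, {mips:.0f} MIPS")
```

Compiler 2 adds many cheap class A instructions, which raises the instruction rate while also raising the total time.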
19 Why Do Benchmarks?
- How we evaluate differences
  - Different systems
  - Changes to a single system
- Provide a target
  - Benchmarks should represent a large class of important programs
  - Improving benchmark performance should help many programs
- For better or worse, benchmarks shape a field
- Good ones accelerate progress
  - good target for development
- Bad benchmarks hurt progress
  - help real programs vs. sell machines/papers?
  - Inventions that help real programs don't help the benchmark
20 Programs to Evaluate Processor Performance
- (Toy) Benchmarks
  - 10-100 lines
  - e.g., sieve, puzzle, quicksort
- Synthetic Benchmarks
  - attempt to match average frequencies of real workloads
  - e.g., Whetstone, Dhrystone
- Kernels
  - Time-critical excerpts
- Real programs
  - e.g., gcc, spice
21 Successful Benchmark: SPEC
- EE Times + 5 companies band together to form the Systems Performance Evaluation Committee (SPEC) in 1988: Sun, MIPS, HP, Apollo, DEC
- Create a standard list of programs, inputs, and reporting: some real programs, includes OS calls, some I/O
22 SPEC first round
- First round 1989: 10 programs, single number to summarize performance
- One program: 99% of time in a single line of code
- New front-end compiler could improve dramatically
23 SPEC second round, SPEC95
- 8 integer benchmarks in C and 10 floating-point benchmarks in Fortran
24 Amdahl's Law
- Speedup due to enhancement E:
  $\text{Speedup}(E) = \dfrac{\text{ExTime w/o } E}{\text{ExTime w/ } E} = \dfrac{\text{Performance w/ } E}{\text{Performance w/o } E}$
- Suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected. Then:
  $\text{ExTime(with } E) = \left((1-F) + \dfrac{F}{S}\right) \times \text{ExTime(without } E)$
  $\text{Speedup(with } E) = \dfrac{\text{ExTime(without } E)}{\left((1-F) + \dfrac{F}{S}\right) \times \text{ExTime(without } E)} = \dfrac{1}{(1-F) + \dfrac{F}{S}} < \dfrac{1}{1-F}$
- Speedup is bounded by this factor, $1/(1-F)$
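A minimal sketch of the formula in Python may help make the bound concrete. The function names and the sample values (F = 0.5 with increasing S) are hypothetical, picked only to show the speedup approaching but never reaching 1/(1-F).

```python
def amdahl_speedup(fraction, factor):
    """Overall speedup when `fraction` of the task is sped up by `factor`."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)


def speedup_bound(fraction):
    """Upper bound on speedup as the enhancement factor grows without limit."""
    return 1.0 / (1.0 - fraction)


if __name__ == "__main__":
    f = 0.5  # hypothetical: half of the task benefits from the enhancement
    for s in (2, 10, 100, 1_000_000):
        print(f"S = {s:>9}: speedup = {amdahl_speedup(f, s):.4f}")
    print(f"bound 1/(1-F) = {speedup_bound(f):.4f}")
```

With F = 0.5, even an enormous S cannot push the overall speedup to 2, because the unimproved half of the task still takes its original time.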
25 Performance Evaluation Summary
- Time is the measure of computer performance!
- Good products are created when we have:
  - Good benchmarks
  - Good ways to summarize performance
- Without good benchmarks and summaries, the choice is between improving the product for real programs vs. improving the product to get more sales; sales almost always wins
- Remember Amdahl's Law: speedup is limited by the unimproved part of the program