Performance - PowerPoint PPT Presentation

About This Presentation
Title:

Performance

Description:

Performance: what it is and how it is measured. The CPU performance equation: execution time as the measure, what affects execution time, examples. Choosing good benchmarks.

Slides: 39
Provided by: Systemtek1

Transcript and Presenter's Notes


1
Performance
  • Performance
  • what is it: measures of performance
  • The CPU Performance Equation
  • Execution time as the measure
  • what affects execution time
  • examples
  • Choosing good benchmarks?
  • choosing bad benchmarks?
  • Amdahl's Law

2
Performance is Time
  • Time to do the task (Execution Time)
  • execution time, response time, latency
  • Tasks per unit time (sec, minute, ...)
  • throughput, bandwidth

3
Performance as Response Time
  • Performance is most often measured as response
    time or execution time for some task.
  • "X is n times faster than Y" means
  • n = Performance(X) / Performance(Y) = Execution Time(Y) / Execution Time(X)
  • Example: execution time of program P
  • on X is 5 sec, on Y is 10 sec.
  • X is 10 / 5 = 2 times faster than Y.

4
What time to measure?
  • Elapsed time, wall-clock time
  • actual time from start to completion
  • depends on CPU, system, I/O, etc.
  • often used in real benchmarks
  • only suitable choice when I/O is included
  • CPU Time
  • measure/analyze CPU performance only
  • may be suitable when machine is timeshared
  • possibly both user and system component
  • User CPU time is our focus for first part of
    course
  • Elapsed time = CPU time + Idle time
  • (usually, and assuming time is accurately accounted
    for)
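The difference between elapsed time and CPU time can be observed directly. The following is a minimal sketch, assuming Python's standard `time` module; the helper `measure` and the workload `mostly_idle` are hypothetical names introduced only for illustration, and the sleep merely stands in for I/O wait.

```python
import time

def measure(task):
    """Return (elapsed_seconds, cpu_seconds) for one call of task()."""
    wall_start = time.perf_counter()   # wall-clock (elapsed) timer
    cpu_start = time.process_time()    # user + system CPU time of this process
    task()
    cpu = time.process_time() - cpu_start
    elapsed = time.perf_counter() - wall_start
    return elapsed, cpu

def mostly_idle():
    # Hypothetical workload: a little computation, then waiting.
    sum(i * i for i in range(100_000))
    time.sleep(0.2)  # stands in for I/O wait; uses almost no CPU

elapsed, cpu = measure(mostly_idle)
print(f"elapsed {elapsed:.3f} s, CPU {cpu:.3f} s, idle about {elapsed - cpu:.3f} s")
```

On a lightly loaded machine the elapsed time should be dominated by the wait, while the CPU time stays small; the exact numbers vary from run to run.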

5
Metrics of performance
  • Different performance metrics are appropriate at
    different levels

[Figure: system levels (ISA, datapath, control, function units, transistors) alongside the metrics used at each: frames per second and operations per second; millions of instructions per second (MIPS) and millions of floating-point operations per second (MFLOP/s); cycles per second (clock rate) and cycles per instruction.]

6
Relating Processor Metrics
  • CPU execution time per program
  • = CPU clock cycles/program × Clock cycle time
  • = CPU clock cycles/program ÷ Clock rate
    (frequency)
  • CPU clock cycles/program
  • = Instructions/program × Clock cycles per
    instruction
  • Clock cycles per instruction (CPI) is an average
    measurement; it depends on
  • the ISA, the implementation, and the program measured
  • CPI = (CPU clock cycles/program) ÷
    (Instructions/program)
  • Also, instructions per clock cycle, or IPC = 1 /
    CPI
  • CPU execution time = Instructions × CPI × Clock
    cycle time
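The relations above can be checked with a few lines of arithmetic. This is a small sketch, not part of the original slides; the function names and the example numbers (2 billion instructions, CPI 1.5, 1 GHz clock) are assumptions chosen only to make the calculation concrete.

```python
def cpu_time_seconds(instructions, cpi, clock_rate_hz):
    """CPU execution time = Instructions x CPI x Clock cycle time."""
    cycles = instructions * cpi      # CPU clock cycles per program
    return cycles / clock_rate_hz    # cycle time is 1 / clock rate

def cpi_from_counts(cycles, instructions):
    """CPI = (CPU clock cycles per program) / (Instructions per program)."""
    return cycles / instructions

# Hypothetical program: 2 billion instructions, CPI 1.5, 1 GHz clock.
t = cpu_time_seconds(2_000_000_000, 1.5, 1_000_000_000)
print(t)  # 3.0 seconds

# Going the other way: 3 billion cycles for 2 billion instructions.
cpi = cpi_from_counts(3_000_000_000, 2_000_000_000)
print(cpi, 1 / cpi)  # CPI and IPC
```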

7
Let's look at the single-cycle model analytically
8
Static timing analysis
  • Memories 10 ns
  • Register 5 ns
  • Adders 10 ns
  • ALU 10 ns
  • Use topological sort!
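As an illustration of the topological-sort idea, the sketch below computes signal arrival times through a small dependency graph. The component delays are the ones listed above; the graph itself (which block feeds which, and the node names such as `imem` and `pc_adder`) is a hypothetical simplification of a load path, not the datapath from the slides. It assumes Python 3.9 or later for `graphlib`.

```python
from graphlib import TopologicalSorter

# Delays from the slide: memories 10 ns, registers 5 ns, adders 10 ns, ALU 10 ns.
delay_ns = {"imem": 10, "regfile": 5, "alu": 10, "dmem": 10, "pc_adder": 10}

# Hypothetical wiring: each node maps to the nodes whose outputs it waits for.
preds = {
    "imem": [],
    "pc_adder": [],
    "regfile": ["imem"],
    "alu": ["regfile"],
    "dmem": ["alu"],
}

def arrival_times(delay_ns, preds):
    """Visit nodes in topological order; a node's output is ready after its
    slowest input has arrived plus the node's own delay."""
    arrival = {}
    for node in TopologicalSorter(preds).static_order():
        start = max((arrival[p] for p in preds[node]), default=0)
        arrival[node] = start + delay_ns[node]
    return arrival

arrival = arrival_times(delay_ns, preds)
critical_path_ns = max(arrival.values())
print(arrival, critical_path_ns)
```

With this assumed wiring the longest path ends at the data memory after 10 + 5 + 10 + 10 = 35 ns, while a path that stops after the ALU needs only 25 ns.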

9
[Figure: single-cycle datapath annotated with component delays (5 ns and 10 ns blocks). Highlighted path for lw 2 const(3): 35 ns delay.]
10
But that path goes through the data memory!
  • What if this is not a load/store?
  • How about an instruction that does nothing?
  • NOP

11
[Figure: single-cycle datapath annotated with component delays (5 ns and 10 ns blocks). Highlighted path for NOP: 10 ns delay.]
12

[Figure: single-cycle datapath annotated with component delays (5 ns and 10 ns blocks). Highlighted path for Add ra rb rc: 25 ns delay.]
13

[Figure: single-cycle datapath annotated with component delays (5 ns and 10 ns blocks). Highlighted path for B label: 20 ns delay.]
14
  • 35 ns for load/store
  • but
  • 10 ns for NOP !?

15
Amdahl's Law
  • Make the common case fast

16
Amdahl's Law
  • Handy for evaluating impact of a change not tied
    to CPU performance equation
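  • Insight: no improvement of a feature enhances
    performance by more than the use of the feature.
  • Suppose that enhancement E accelerates fraction F
    of a program by a factor S (remainder of the task
    is unaffected)
  • ExecTime(with E) = (1 - F(1 - 1/S)) × ExecTime(without E)
    = ((1 - F) + F/S) × ExecTime(without E)

[Figure: execution-time bar before E, split into F and 1-F; after E, the F part shrinks to F/S while 1-F is unchanged.]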
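Amdahl's Law is easy to try out numerically. The sketch below is not from the slides; the function names and the sample values (half the program sped up by a factor of 2) are assumptions for illustration. The overall speedup is simply the ratio of the time without E to the time with E.

```python
def exec_time_with_enhancement(time_without, fraction, speedup):
    """ExecTime(with E) = ((1 - F) + F/S) x ExecTime(without E)."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction F must be between 0 and 1")
    if speedup <= 0:
        raise ValueError("speedup S must be positive")
    return ((1.0 - fraction) + fraction / speedup) * time_without

def overall_speedup(fraction, speedup):
    """Ratio ExecTime(without E) / ExecTime(with E)."""
    return 1.0 / ((1.0 - fraction) + fraction / speedup)

# Hypothetical case: E doubles the speed of half of a 100-second program.
print(exec_time_with_enhancement(100.0, 0.5, 2.0))  # 75.0

# Even an enormous S cannot beat 1 / (1 - F), which is 2 when F = 0.5.
print(overall_speedup(0.5, 1e12))
```

The second call shows the insight from the slide: with F = 0.5 the overall speedup stays below 2 no matter how large S becomes.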
17
What if we don't need the ALU?
  • A branch instruction?

18
BUT!
  • The single-cycle model has to accommodate the
    slowest instruction
  • Even if it rarely occurs!

19
How much work can our structure perform?
  • For a program Q
  • Time = Number of executed instructions
  • × Number of cycles per instruction
  • × Time per cycle
  • T = Nq × CPI × Tc

20
For the single cycle model....
  • CPI = 1 for all instructions
  • Tc determined by the slowest instruction

21
How to reduce T?
  • T = Nq × CPI × Tc
  • Reduce Nq.
  • More powerful instructions!
  • But: more hardware, longer paths, so cycle time
    goes up (slower machine)

22
No free lunch
  • Why designers are so well paid -
  • to optimize designs.

23
How to reduce T?
  • T = Nq × CPI × Tc
  • Faster hardware
  • Technological limits
  • Cost increase not linearly related
  • Sales volume drops

24
How to reduce T?
  • T = Nq × CPI × Tc
  • Make CPI a function of the instruction
  • For example, NOP: 1 cycle
  • LW: 4 cycles
  • Chapter 5.4, the classical method

25
How to reduce T?
  • T = Nq × CPI × Tc
  • Make CPI a function of the instruction
  • CPI goes up, but we can use an average,
  • not the worst case
  • Tc goes down: time to do the longest step,
  • not the entire instruction

26
Example
  • Branch: Step 1: fetch
  • Step 2: new PC
  • Add: Step 1: fetch
  • Step 2: decode / register fetch
  • Step 3: compute and write back

27
Example
  • LW: 4 steps
  • Cycle time = 1/4 old time
  • T = 4 × 1/4 old time,
  • where 4 is the LW CPI
  • just as slow for the lw instruction,
  • our worst case!

28
But that's not important if LW is not common!
  • T = Nq × CPI × 1/4 old time

CPI is averaged over all Nq executed instructions:
1.3? 1.7? Never 4.0!
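A quick way to see why the average matters is to weight each instruction's cycle count by how often it occurs. In the sketch below the cycle counts (NOP 1, branch 2, add 3, LW 4) and the quarter-length cycle time come from the preceding slides, but the instruction mix is purely hypothetical, as are the names `mix` and `average_cpi`.

```python
import math

# Cycles per instruction class, as given on the slides.
cycles_per_instr = {"lw": 4, "add": 3, "branch": 2, "nop": 1}

# Hypothetical instruction mix (fractions of all executed instructions).
mix = {"lw": 0.2, "add": 0.5, "branch": 0.2, "nop": 0.1}

def average_cpi(cycles, mix):
    """Frequency-weighted average of the per-instruction cycle counts."""
    if not math.isclose(sum(mix.values()), 1.0):
        raise ValueError("instruction mix must sum to 1")
    return sum(mix[name] * cycles[name] for name in mix)

cpi = average_cpi(cycles_per_instr, mix)

# Multi-cycle time relative to single-cycle time for the same Nq:
# (Nq x CPI x Tc_old / 4) / (Nq x 1 x Tc_old) = CPI / 4
relative_time = cpi / 4
print(f"average CPI {cpi:.2f}, relative execution time {relative_time:.2f}")
```

For this assumed mix the average CPI comes out near 2.8, so the multi-cycle design needs about 0.7 of the single-cycle time. A different mix gives a different number, which is exactly why the choice of workload matters.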
29
We win because of quantitative statistical properties of our programs!
30
What value of CPI do we use?
  • 1.3? 1.5? 1.7?
  • Easy: use the average program!
  • ?

31
There is no such thing!
32
Artificial average programs called benchmarks
  • Are they something to trust?
  • What about peak performance values?
  • MIPS? MFLOPS?
  • We have a peak at CPI = 1...
  • ...a program of only NOPs!

33
Why Do Benchmarks?
  • How we evaluate performance differences
  • Across and within a single system (design
    variations)
  • What should benchmarks do?
  • Represent a large class of important programs
  • Behave like typical programs
  • improved benchmark performance => improved
    performance broadly
  • For better or worse, benchmarks shape a field
  • Good ones accelerate progress
  • Bad benchmarks hurt progress
  • help real programs vs. sell machines/papers?
  • Enhancements that help benchmarks may not help
    most programs and v.v.

34
Classes of Benchmarks
  • (Toy) Benchmarks
  • 10-100 lines, e.g., sieve, puzzle, quicksort
  • good first programming assignments
  • Synthetic Benchmarks
  • attempt to match average frequencies of real
    workloads
  • e.g., Whetstone, Dhrystone
  • mostly good for nothing: too artificial
  • Kernels
  • Time critical excerpts of real programs
  • e.g., Livermore loops, Linpack
  • good for micro-performance studies
  • Real programs
  • e.g., gcc, spice, Verilog, Database, stock trading

35
Successful Benchmark SPEC Collection
  • 1987: RISC industry (workstations) mired in
    "benchmarketing"
  • ("That is an 8 MIPS machine, but they claim 10
    MIPS!")
  • EE Times + 5 companies band together to form the
    Systems Performance Evaluation Committee (SPEC)
    in 1988
  • Sun, MIPS, HP, Apollo, DEC
  • Create standard list of programs, inputs,
    reporting rules
  • several real programs, including OS calls
  • some I/O
  • rules for running and reporting

36
Multiple clock cycle designs
  • State machines
  • Micro programming
  • Chapter 5.4,
  • Computer Organization and Design

37
How to reduce T?
  • T = Nq × CPI × Tc
  • Reduce the quotient cycles / instruction
  • reduce cycles: multiple clock-cycle design
  • increase instructions: execute more than one
    instruction per cycle!

38
More than one instruction per cycle?
  • Parallelism
  • Div/mult, floating point, integer
  • Superscalarity
  • Multiple issue etc.
  • Pipelining
  • Of general importance