1
Computer System Performance Evaluation
Introduction
  • Eileen Kraemer
  • August 25, 2004

2
Evaluation Metrics
  • What are the measures of interest?
  • Time to complete task
  • Per workload type (real-time / transaction processing / interactive / batch)
  • Ability to deal with failures
  • Catastrophic / benign
  • Effective use of system resources

3
Performance Measures
  • Responsiveness
  • Usage level
  • Missionability
  • Dependability
  • Productivity

4
Classification of Computer Systems
  • General purpose
  • High availability
  • Real-time control
  • Mission-oriented
  • Long-life

5
Techniques in Performance Evaluation
  • Measurement
  • Simulation Modeling
  • Analytic Modeling
  • Hybrid Modeling

6
Applications of Performance Evaluation
  • System Design
  • System Selection
  • System Upgrade
  • System Tuning
  • System Analysis

7
Workload Characterization
  • Inputs to evaluation
  • Under admin control
  • Scheduling discipline, device connections, resource allocation policies
  • Environmental inputs
  • Inter-event times, service demands, failures
  • Workload
  • Drives the real system (measurement)
  • Input to simulation
  • Basis of distribution for analytic modeling

8
Workload characterization
  • How much detail? How to represent?
  • Analytical modeling
  • statistical properties
  • Simulation
  • Event trace, either recorded or generated
    according to some statistical properties

9
Benchmarking
  • Benchmarks are sets of well-known programs
  • Vendors run these programs and report results
    (some problems with this process)

10
Metrics used (in absence of benchmarks)
  • Processing rate
  • MIPS (million instructions per second)
  • MFLOPS (million f.p. ops per second)
  • Not particularly useful
  • different instructions can take different amounts
    of time
  • Instructions and complexity of instructions differ from machine to machine, as will the number of instructions required to execute a particular program

11
Benchmarks
  • Provide opportunity to compare running times of programs written in an HLL (high-level language)
  • Characterize an application domain
  • Consist of a set of typical programs
  • Some application benchmarks (real programs),
    others are synthetic benchmarks

12
Synthetic benchmarks
  • Programs designed to mimic real programs by
    matching their statistical properties
  • Fraction of statements of each type (e.g., assignment, if, for)
  • Fraction of variables of each type (int vs. real vs. char; local vs. global)
  • Fraction of expressions with certain number and type of operators, operands (see the sketch below)
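To make the statement-mix idea concrete, here is a minimal Python sketch of a toy generator. The STATEMENT_MIX fractions and the emitted statement templates are made-up illustrative values, not from the slides or any real benchmark suite.

```python
import random

# Hypothetical statement-type fractions for an imagined application domain.
STATEMENT_MIX = {"assign": 0.5, "if": 0.3, "for": 0.2}

def generate_synthetic_program(n_statements, seed=0):
    """Emit Python source whose statement-type mix approximates STATEMENT_MIX."""
    rng = random.Random(seed)
    kinds, weights = zip(*STATEMENT_MIX.items())
    lines = ["x = 1", "y = 2"]
    for i in range(n_statements):
        kind = rng.choices(kinds, weights)[0]
        if kind == "assign":
            lines.append(f"x = x + {i % 7}")
        elif kind == "if":
            lines.append(f"if x % {i % 5 + 2} == 0: y = y + 1")
        else:  # "for"
            lines.append(f"for j in range({i % 4 + 1}): x = x + j")
    return "\n".join(lines)

print(generate_synthetic_program(10))
```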

13
Synthetic Benchmarks
  • Pro
  • Can model a domain of application programs in a
    single program

14
Synthetic Benchmarks
  • Con
  • If expressions for conditionals are chosen randomly, then code sections may be unreachable and eliminated by a smart compiler
  • Locality-of-reference seen in normal programs may be violated → resource allocation algorithms that rely on locality-of-reference are affected
  • May be small enough to fit in cache → unusually good performance, not representative of the domain the benchmark is designed to represent

15
Well-known benchmarks for measuring CPU
performance
  • Whetstone: old
  • Dhrystone: improved on Whetstone
  • Linpack
  • Newer:
  • Spice, gcc, li, nasa7, livermore
  • See http://www.netlib.org/benchmark/
  • Java benchmarks:
  • See http://www-2.cs.cmu.edu/~jch/java/resources.html

16
Whetstone (1972)
  • Synthetic
  • Models Fortran, heavy on f.p. ops
  • Outdated, arbitrary instruction mixes
  • Not useful with optimizing or parallelizing
    compilers
  • Results in mega-whetstones/sec

17
Dhrystone (1984)
  • Synthetic, C (originally Ada)
  • Models progs with mostly integer arithmetic and
    string manipulation
  • Only 100 HLL statements; fits in cache
  • Calls only strcpy(), strcmp(); if the compiler inlines these, then not representative of real programs
  • Results stated in Dhrystones / second

18
Linpack
  • Solves a dense 100 x 100 linear system of equations using the Linpack library package
  • The kernel A(x) = B(x) + C*D(x) accounts for ~80% of the time (sketched below)
  • Still too small to really test out hardware
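As an illustration only (the real Linpack benchmark is Fortran built on BLAS-style routines), here is a Python rendering of the element-wise operation the slide names:

```python
# A(x) = B(x) + C*D(x), applied element-wise over arrays of length 100.
def kernel(a, b, d, c):
    for x in range(len(a)):
        a[x] = b[x] + c * d[x]

n = 100
a, b, d = [0.0] * n, [1.0] * n, [2.0] * n
kernel(a, b, d, c=3.0)
print(a[0])  # 1.0 + 3.0 * 2.0 = 7.0
```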

19
Newer
  • Spice
  • Mostly Fortran, int and fp arith, analog circuit
    simulation
  • gcc
  • Gnu C compiler
  • Li
  • Lisp interpreter, written in C
  • Nasa7
  • Fortran, 7 kernels using double-precision
    arithmetic

20
How to compare machines?
[Chart comparing five machines, labeled A through E]
21
How to compare machines?
[Same chart, now including the VAX 11/780, labeled as a typical 1 MIPS machine]
22
To calculate MIPS rating
  • Choose a benchmark
  • MIPS rating of X = time on VAX / time on X
  • So, if the benchmark takes 100 sec on the VAX and 4 sec on X, then X is a 25 MIPS machine (see the sketch below)
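A minimal sketch of this rule in Python, using the slide's own 100 s / 4 s example:

```python
def mips_rating(time_on_vax_sec, time_on_x_sec):
    """Rating of machine X relative to the VAX 11/780, treated as a 1 MIPS machine."""
    return time_on_vax_sec / time_on_x_sec

print(mips_rating(100.0, 4.0))  # 25.0 -> X is a 25 MIPS machine
```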

23
Cautions in calculating MIPS
  • Benchmarks for all machines should be compiled by
    similar compilers with similar settings
  • Need to control and explicitly state the configuration (cache size, buffer sizes, etc.)

24
Features of interest for evaluation
  • Integer arithmetic
  • Floating point arithmetic
  • Cache management
  • Paging
  • I/O
  • Could test one at a time or, using a synthetic program, exercise all at once

25
Synthetic programs
  • Evaluate multiple features simultaneously,
    parameterized for characteristics of workload
  • Pro
  • Beyond CPU performance, can also measure system
    throughput, investigate alternative strategies
  • Con
  • Complex, OS-dependent
  • Difficult to choose params that accurately
    reflect real workload
  • Generates lots of raw data

26
Script approach
  • Have real users work on machine of interest,
    recording all actions of users in real computing
    environment
  • Pro
  • Can compare system under control and test conditions (disk 1 vs. disk 2, buffer size 1 vs. buffer size 2, etc.) under real workload conditions
  • Con
  • Too many dependencies; may not work on other installations even if the machine is the same
  • System needs to be up and running already
  • Bulky

27
SPEC: System Performance Evaluation Cooperative (now the Standard Performance Evaluation Corporation)
  • Mission: to establish, maintain, and endorse a standardized set of relevant benchmarks for performance evaluation of modern computer systems
  • SPEC CPU: both int and fp versions
  • Also benchmarks for JVMs, web, graphics, and other special purposes
  • See http://www.specbench.org

28
Methodology
  • 10 benchmarks
  • Integer: gcc, espresso, li, eqntott
  • Floating point: spice, doduc, nasa7, matrix, fpppp, tomcatv

29
Metrics
  • SPECint
  • Geometric mean of t(gcc), t(espresso), t(li),
    t(eqntott)
  • SPECfp
  • Geometric mean of t(spice), t(doduc), t(nasa7), t(matrix), t(fpppp), t(tomcatv)
  • SPECmark
  • Geometric mean of SPECint, SPECfp

30
Metrics, cont'd
  • SPECthruput: a measure of CPU performance under moderate CPU contention
  • Multiprocessor with n processors: two copies of the SPEC benchmark run concurrently on each CPU; elapsed time noted
  • SPECthruput = time on machine X / time on VAX 11/780

31
Geometric mean ???
  • Arithmetic mean(x1, x2, …, xn)
  • = (x1 + x2 + … + xn) / n
  • AM(10, 50, 90) = (10 + 50 + 90) / 3 = 50
  • Geometric mean(x1, x2, …, xn)
  • = (x1 · x2 · … · xn)^(1/n)
  • GM(10, 50, 90) = (10 · 50 · 90)^(1/3) ≈ 35.6
  • Harmonic mean(x1, x2, …, xn)
  • = n / (1/x1 + 1/x2 + … + 1/xn)
  • HM(10, 50, 90) = 3 / (1/10 + 1/50 + 1/90) ≈ 22.88 (checked below)
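A quick Python check of the three means for (10, 50, 90); the variable names are ours, but the arithmetic matches the slide:

```python
from math import prod

xs = [10, 50, 90]
n = len(xs)

am = sum(xs) / n                 # arithmetic mean: 50.0
gm = prod(xs) ** (1 / n)         # geometric mean: ~35.57
hm = n / sum(1 / x for x in xs)  # harmonic mean: ~22.88

print(am, gm, hm)
```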

32
Why geometric mean? Why not AM?
  • Arithmetic mean doesn't preserve running-time ratios (nor does harmonic mean); geometric mean does
  • Example: see the sketch below
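A hypothetical illustration in Python (the timings are invented, not from the slides): two programs timed on two machines, each machine's times normalized to a reference machine. The ratio of geometric means comes out the same whichever machine serves as reference; the ratio of arithmetic means does not.

```python
from math import prod

def gmean(xs): return prod(xs) ** (1 / len(xs))
def amean(xs): return sum(xs) / len(xs)

# Hypothetical running times (seconds) of two programs on two machines.
times_a = [1.0, 1000.0]  # machine A
times_b = [10.0, 100.0]  # machine B

for name, ref in (("ref = A", times_a), ("ref = B", times_b)):
    ra = [t / r for t, r in zip(times_a, ref)]  # A's times, normalized
    rb = [t / r for t, r in zip(times_b, ref)]  # B's times, normalized
    print(name,
          "GM(A)/GM(B) =", gmean(ra) / gmean(rb),              # 1.0 either way
          "AM(A)/AM(B) =", round(amean(ra) / amean(rb), 3))    # flips with reference
```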

33
Highly Parallel Architectures
  • For parallel machines/programs, performance
    depends on
  • Inherent parallelism of application
  • Ability of machine to exploit parallelism
  • Less than full parallelism may result in performance << peak rate

34
Amdahl's Law
  • f = fraction of a program that is parallelizable
  • 1 - f = fraction of a program that is purely sequential
  • S(n) = effective speed with n processors
  • S(n) = S(1) / ((1 - f) + f/n)
  • As n → ∞, S(n) → S(1) / (1 - f)

35
Example
  • S(n) = S(1) / ((1 - f) + f/n)
  • As n → ∞, S(n) → S(1) / (1 - f)
  • Let f = 0.5: with infinite n, max speedup S(∞) = 2
  • Let f = 0.8: with infinite n, max speedup S(∞) = 5 (see the sketch below)
  • MIPS/MFLOPS not particularly useful for a parallel machine
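A short Python sketch of Amdahl's Law as stated above, normalized so S(1) = 1 (i.e., S(n) is the speedup over one processor); the processor counts are arbitrary:

```python
def amdahl_speedup(f, n):
    """f: parallelizable fraction; n: number of processors."""
    return 1.0 / ((1.0 - f) + f / n)

for f in (0.5, 0.8):
    print(f, [round(amdahl_speedup(f, n), 2) for n in (2, 8, 64, 10**9)])
# f = 0.5 approaches the limit 2; f = 0.8 approaches 5
```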

36
Are synthetic benchmarks useful for evaluating
parallel machines?
  • Will depend on inherent parallelism
  • Data parallelism
  • Code parallelism

37
Data parallelism
  • Multiple data items operated on in parallel by the same op
  • SIMD machines
  • Works well with vectors, matrices, lists, sets
  • Metrics
  • avg data items operated on per op
  • (depends on problem size)
  • (data items operated on / data items) per op
  • Depends on type of problem

38
Code parallelism
  • How finely can problem be divided into parallel
    sub-units?
  • Metric: average parallelism = Σ (n = 1 to ∞) n · f(n) (see the sketch below)
  • f(n) = fraction of code that can be split into at most n parallel activities
  • Not that easy to estimate
  • Not all that informative when you do:
  • dependencies may exist between parallel tasks, or between parallel and non-parallel sections of code
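A minimal Python sketch of the metric, with a made-up profile f(n):

```python
# f(n): fraction of code splittable into at most n parallel activities
# (hypothetical values; the fractions must sum to 1).
f = {1: 0.2, 2: 0.3, 4: 0.4, 8: 0.1}

avg_parallelism = sum(n * fn for n, fn in f.items())
print(avg_parallelism)  # 0.2*1 + 0.3*2 + 0.4*4 + 0.1*8 = 3.2
```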

39
Evaluating performance of parallel machines is
more difficult than doing so for sequential
machines
  • Problem
  • A well-designed parallel algorithm depends on number of processors, interconnection pattern (bus, crossbar, mesh), interaction mechanism (shared memory, message passing), vector register size
  • Solution
  • Pick the optimal algorithm for each machine
  • Problem: that's hard to do! And may also depend on the actual number of processors, etc.

40
Other complications
  • Language limitations, dependencies
  • Compiler dependencies
  • OS characteristics
  • Timing (communication v. computation)
  • Process management (light v. heavy)

41
More complications
  • Small benchmark may reside in cache (Dhrystone)
  • Large memory may eliminate paging for medium
    programs, and effects of poor paging scheme
    hidden
  • Benchmark may not have enough I/O
  • Benchmark may have dead code, optimizable code

42
Metrics
  • Speedup S(p) = running time of the best possible sequential algorithm / running time of the parallel implementation using p processors
  • Efficiency = S(p) / p (see the sketch below)
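A minimal Python sketch of both metrics, with hypothetical timings and processor count:

```python
def speedup(t_best_sequential, t_parallel):
    return t_best_sequential / t_parallel

def efficiency(s_p, p):
    return s_p / p

s = speedup(t_best_sequential=120.0, t_parallel=20.0)
print(s, efficiency(s, p=8))  # speedup 6.0 on 8 processors -> efficiency 0.75
```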