Computer Architecture Chapter 4 Assessing and Understanding Performance - PowerPoint PPT Presentation

1
Computer Architecture
Chapter 4: Assessing and Understanding Performance
  • Yu-Lun Kuo
  • Department of Computer Science and Information
    Engineering
  • Tunghai University, Taichung, Taiwan R.O.C.
  • sscc6991_at_gmail.com
  • http://www.csie.ntu.edu.tw/~d95037/

2
Opening
  • Indeed, the cost-performance ratio of the
    product will depend most heavily on the
    implementer, just as ease of use depends most
    heavily on the architect.
  • The Mythical Man-Month, Brooks

3
Introduction
  • Measure, Report, and Summarize
  • Make intelligent choices
  • See through the marketing hype
  • Key to understanding underlying organizational
    motivation
  • Why is some hardware better than others for
    different programs?
  • What factors of system performance are
    hardware-related? (e.g., Do we need a new machine,
    or a new operating system?)
  • How does the machine's instruction set affect
    performance?

4
Best Performance
  • How much faster is the Concorde compared to the
    747?
  • How much bigger is the 747 than the Douglas DC-8?

Figure: airplane comparison, labeled "Response Time" and "Throughput"
5
Performance Metrics
  • Purchasing perspective
    • Given a collection of machines, which has the
      • best performance?
      • least cost?
      • best cost/performance?
  • Design perspective
    • Faced with design options, which has the
      • best performance improvement?
      • least cost?
      • best cost/performance?

6
Performance Metrics
  • Both require
    • A basis for comparison
    • A metric for evaluation
  • Our goal is to understand which factors in the
    architecture
    • contribute to overall system performance
    • and the relative importance (and cost) of these
      factors

7
Computer Performance
  • Response time (latency, execution time)
    • The total time required for the computer to
      complete a task, including disk access, memory
      access, I/O activities, OS overhead, CPU
      execution time, and so on.
    • How long does it take for my job to run?
    • How long does it take to execute a job?
    • How long must I wait for the database query?
  • Throughput
    • How many jobs can the machine run at once?
    • What is the average execution rate?
    • How much work is getting done?

8
Computer Performance
  • If we upgrade a machine with a new processor,
    what do we increase?
  • If we add a new machine to the lab, what do we
    increase?

9
Measuring Performance
  • Elapsed time
    • Counts everything (disk and memory accesses, I/O,
      etc.)
    • A useful number, but often not good for
      comparison purposes
  • CPU time
    • Doesn't count I/O or time spent running other
      programs
    • Can be broken up into system time and user time
  • Our focus: user CPU time
    • Time spent executing the lines of code that are
      in our program

10
Example (Linux time Command)
  • Command: time
  • Output: 90.7u 12.9s 2:39 65%
  • CPU time ratio: (90.7 + 12.9) / 159 = 65%
    (2:39 = 159 seconds)
  • I/O (and other) time: more than 1/3
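The slide's arithmetic can be checked with a short sketch. It assumes the four fields are the C shell's time summary: user CPU seconds (u), system CPU seconds (s), elapsed time as minutes:seconds, and CPU time as a percentage of elapsed time. The helper parse_time_output is hypothetical, and real shells may print extra fields or use a different layout.

```python
import re

# Hypothetical pattern for the four-field, csh-style summary shown on the slide.
TIME_RE = re.compile(r"\s*([\d.]+)u\s+([\d.]+)s\s+(\d+):(\d+(?:\.\d+)?)\s+(\d+)%\s*")


def parse_time_output(line: str) -> dict:
    """Return user/system CPU seconds, elapsed seconds, and the reported CPU %."""
    m = TIME_RE.fullmatch(line)
    if m is None:
        raise ValueError(f"unrecognized time output: {line!r}")
    return {
        "user": float(m.group(1)),
        "system": float(m.group(2)),
        "elapsed": int(m.group(3)) * 60 + float(m.group(4)),  # m:ss -> seconds
        "reported_pct": int(m.group(5)),
    }


fields = parse_time_output("90.7u 12.9s 2:39 65%")
cpu_time = fields["user"] + fields["system"]   # user + system CPU time
cpu_ratio = cpu_time / fields["elapsed"]       # share of elapsed time spent on the CPU
other_ratio = 1 - cpu_ratio                    # I/O, waiting, other programs

print(f"elapsed   = {fields['elapsed']:.0f} s")
print(f"CPU ratio = {cpu_ratio:.0%} (reported {fields['reported_pct']}%)")
print(f"other     = {other_ratio:.0%} of elapsed time")
```

About 35% of the elapsed time is not CPU time, which is where the slide's "more than 1/3" comes from.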

11
Book's Definition of Performance
  • Normally interested in reducing
    • Response time (aka execution time): the time
      between the start and the completion of a task
    • Important to individual users
  • Thus, to maximize performance, we need to minimize
    execution time

$\text{performance}_X = \dfrac{1}{\text{execution\_time}_X}$
If X is n times faster than Y, then
$\dfrac{\text{performance}_X}{\text{performance}_Y} = \dfrac{\text{execution\_time}_Y}{\text{execution\_time}_X} = n$
12
Book's Definition of Performance
  • Throughput
  • The total amount of work done in a given time
  • Important to data center managers
  • Decreasing response time almost always improves
    throughput

13
Book's Definition of Performance
  • Problem
    • Machine X runs a program in 10 seconds
    • Machine Y runs the same program in 15 seconds
    • How much faster is X than Y?
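To compare X and Y, a minimal sketch can apply the book's definition directly: performance is the reciprocal of execution time, so the speed ratio is the inverse ratio of the times. The function names performance and times_faster are illustrative, not from the slides.

```python
def performance(execution_time_s: float) -> float:
    """Performance is defined as the reciprocal of execution time."""
    return 1.0 / execution_time_s


def times_faster(time_x_s: float, time_y_s: float) -> float:
    """Return n such that X is n times faster than Y.

    performance_X / performance_Y == execution_time_Y / execution_time_X
    """
    return performance(time_x_s) / performance(time_y_s)


n = times_faster(10.0, 15.0)  # Machine X: 10 s, Machine Y: 15 s
print(f"X is {n:.1f} times faster than Y")
```

Here X is 1.5 times faster than Y, since 15 s / 10 s = 1.5.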

14
Performance Factors
  • We want to distinguish elapsed time from the time
    spent on our task
  • CPU execution time (CPU time): the time the CPU
    spends working on a task
    • Does not include time waiting for I/O or running
      other programs

15
Performance Factors
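$\text{CPU execution time} = \text{CPU clock cycles} \times \text{clock cycle time}$
or
$\text{CPU execution time} = \dfrac{\text{CPU clock cycles}}{\text{clock rate}}$
  • We can improve performance by reducing either the
    length of the clock cycle or the number of clock
    cycles required for a program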

16
Clock Cycles
  • Instead of reporting execution time in seconds,
    we often use cycles
  • Clock ticks indicate when to start activities
  • Cycle time: time between ticks (seconds per
    cycle)
  • Clock rate (frequency): cycles per second (1 Hz =
    1 cycle/sec)
  • A 4 GHz clock has a cycle time of
    1 / (4 x 10^9) s = 250 ps

17
Review Machine Clock Rate
  • Clock rate (MHz, GHz) is inverse of clock cycle
    time (clock period)
  • CC = 1 / CR

Figure: one clock period
  • 10 nsec clock cycle => 100 MHz clock rate
  • 5 nsec clock cycle => 200 MHz clock rate
  • 2 nsec clock cycle => 500 MHz clock rate
  • 1 nsec clock cycle => 1 GHz clock rate
  • 500 psec clock cycle => 2 GHz clock rate
  • 250 psec clock cycle => 4 GHz clock rate
  • 200 psec clock cycle => 5 GHz clock rate
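The table follows from CR = 1 / CC. This sketch reproduces it; it stores cycle times as integer picoseconds (an implementation choice, not from the slides) so that every conversion is exact, which works because each listed cycle time divides 10^12 evenly.

```python
PS_PER_SECOND = 10**12  # picoseconds in one second


def clock_rate_hz(cycle_time_ps: int) -> int:
    """CR = 1 / CC, with CC in integer picoseconds (exact for divisors of 10**12)."""
    return PS_PER_SECOND // cycle_time_ps


def format_rate(hz: int) -> str:
    """Show rates of 1 GHz and above in GHz, lower rates in MHz."""
    if hz >= 10**9:
        return f"{hz / 10**9:g} GHz"
    return f"{hz / 10**6:g} MHz"


for cc_ps in [10_000, 5_000, 2_000, 1_000, 500, 250, 200]:
    print(f"{cc_ps:>6} ps clock cycle => {format_rate(clock_rate_hz(cc_ps))} clock rate")
```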
18
How to Improve Performance
  • So, to improve performance (everything else being
    equal), you can either decrease the number of
    cycles required for a program, or decrease the
    clock cycle time or, said another way, increase
    the clock rate.

19
How many cycles are required for a program?
  • We could assume that the number of cycles equals
    the number of instructions
  • This assumption is incorrect: different
    instructions take different amounts of time on
    different machines. Why? Hint: remember that
    these are machine instructions, not lines of C
    code

20
Different Numbers of Cycles for Different Instructions
  • Multiplication takes more time than addition
  • Floating point operations take longer than
    integer ones
  • Accessing memory takes more time than accessing
    registers
  • Important point: changing the cycle time often
    changes the number of cycles required for various
    instructions (more later)

21
Improving Performance
  • Our favorite program runs in 10 seconds on
    computer A, which has a 4GHz clock.
  • We are trying to help a computer designer build a
    new machine, B, that will run this program in 6
    seconds.
  • The designer can use new (or perhaps more
    expensive) technology to substantially increase
    the clock rate, but has informed us that this
    increase will affect the rest of the CPU design,
    causing machine B to require 1.2 times as many
    clock cycles as machine A for the same program.
    What clock rate should we tell the designer to
    target?
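The question can be worked through with the relation CPU time = clock cycles / clock rate. This is a minimal sketch; the function target_clock_rate and its parameter names are illustrative only.

```python
def target_clock_rate(time_a_s, rate_a_hz, time_b_s, cycle_factor):
    """Clock rate machine B needs, given A's time and clock rate, B's target
    time, and how many times as many cycles B needs as A."""
    cycles_a = time_a_s * rate_a_hz      # CPU time = cycles / clock rate
    cycles_b = cycle_factor * cycles_a   # B needs more cycles for the same program
    return cycles_b / time_b_s           # clock rate = cycles / CPU time


rate_b = target_clock_rate(time_a_s=10.0, rate_a_hz=4e9, time_b_s=6.0, cycle_factor=1.2)
print(f"Target clock rate for B: {rate_b / 1e9:.1f} GHz")
```

Under the slide's numbers the target is 8 GHz: A needs 10 s x 4 GHz = 40 x 10^9 cycles, B needs 1.2 times that (48 x 10^9 cycles), and fitting those into 6 seconds requires 8 x 10^9 cycles per second.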

22
Improving Performance
23
Improving Performance
24
Clock Cycles per Instruction (CPI)
  • Not all instructions take the same amount of time
    to execute
  • One way to think about execution time is that it
    equals the number of instructions executed
    multiplied by the average time per instruction

25
Clock Cycles per Instruction (CPI)
  • Clock cycles per instruction (CPI)
  • The average number of clock cycles each
    instruction takes to execute
  • A way to compare two different implementations of
    the same ISA

Instruction class               A   B   C
CPI for this instruction class  1   2   3
26
Effective CPI
  • Computing the overall effective CPI is done by
    looking at the different types of instructions
    and their individual cycle counts and averaging
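$\text{Overall effective CPI} = \sum_{i=1}^{n} (\text{CPI}_i \times \text{IC}_i)$
  • Where $\text{IC}_i$ is the fraction (percentage) of
    executed instructions that belong to class i; if
    raw counts are used instead, divide the sum by
    the total instruction count
  • $\text{CPI}_i$ is the (average) number of clock
    cycles per instruction for that instruction class
  • n is the number of instruction classes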
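A weighted-average sketch of the effective CPI, using the per-class CPIs from the table (A = 1, B = 2, C = 3). The two instruction mixes are hypothetical. The function divides by the total instruction count, so it accepts either raw counts or fractions.

```python
def effective_cpi(mix):
    """Weighted-average CPI.

    mix maps an instruction class to (instructions executed, CPI for that class).
    Raw counts or fractions both work because we divide by the total.
    """
    total_instructions = sum(count for count, _ in mix.values())
    total_cycles = sum(count * cpi for count, cpi in mix.values())
    return total_cycles / total_instructions


# Class CPIs from the table; the instruction counts below are hypothetical.
seq1 = {"A": (2, 1), "B": (1, 2), "C": (2, 3)}  # 5 instructions, 10 cycles
seq2 = {"A": (4, 1), "B": (1, 2), "C": (1, 3)}  # 6 instructions, 9 cycles

print(effective_cpi(seq1), effective_cpi(seq2))
```

The second mix executes more instructions but needs fewer cycles (CPI 1.5 versus 2.0), which is why instruction count alone does not determine performance.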
27
Performance Equation
  • Our basic performance equation is then

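$\text{CPU time} = \text{Instruction\_count} \times \text{CPI} \times \text{clock\_cycle}$
or
$\text{CPU time} = \dfrac{\text{Instruction\_count} \times \text{CPI}}{\text{clock\_rate}}$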
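The basic performance equation translates directly into code. In this sketch the program (10^9 instructions, CPI 1.5) and the 4 GHz clock are hypothetical values chosen only to exercise the formula; both forms of the equation give the same result.

```python
def cpu_time_from_cycle(instruction_count, cpi, clock_cycle_s):
    """CPU time = Instruction_count x CPI x clock_cycle."""
    return instruction_count * cpi * clock_cycle_s


def cpu_time_from_rate(instruction_count, cpi, clock_rate_hz):
    """CPU time = Instruction_count x CPI / clock_rate."""
    return instruction_count * cpi / clock_rate_hz


# Hypothetical program: 10^9 instructions, CPI 1.5, on a 4 GHz clock.
t1 = cpu_time_from_cycle(1e9, 1.5, 1 / 4e9)
t2 = cpu_time_from_rate(1e9, 1.5, 4e9)
print(f"CPU time = {t1:.3f} s (cycle form), {t2:.3f} s (rate form)")
```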
28
Determinants of CPU Performance
  • CPU time = Instruction_count x CPI x
    clock_cycle

Component               Instruction_count  CPI   clock_cycle
Algorithm
Programming language
Compiler
ISA
Processor organization
Technology
29
Determinants of CPU Performance
  • CPU time = Instruction_count x CPI x
    clock_cycle

Component               Instruction_count  CPI   clock_cycle
Algorithm                       X           X
Programming language            X           X
Compiler                        X           X
ISA                             X           X         X
Processor organization                      X         X
Technology                                            X
30
CPI Example
  • Suppose we have two implementations of the same
    instruction set architecture (ISA).
  • For some program, Machine A has a clock cycle
    time of 250 ps and a CPI of 2.0
  • Machine B has a clock cycle time of 500 ps and a
    CPI of 1.2
  • Which machine is faster for this program, and by
    how much?
  • If two machines have the same ISA, which of our
    quantities (e.g., clock rate, CPI, execution
    time, number of instructions, MIPS) will always be
    identical?
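Because both machines implement the same ISA and run the same program, the instruction count is identical and cancels out, so comparing the average time per instruction (CPI x cycle time) is enough. A minimal sketch with the slide's numbers; the function name is illustrative.

```python
def time_per_instruction_ps(cycle_time_ps, cpi):
    """Average time per instruction = CPI x clock cycle time."""
    return cpi * cycle_time_ps


# Same ISA and program, so the instruction count is identical and cancels out.
t_a = time_per_instruction_ps(cycle_time_ps=250, cpi=2.0)  # Machine A: 500 ps
t_b = time_per_instruction_ps(cycle_time_ps=500, cpi=1.2)  # Machine B: 600 ps

faster, slower = ("A", "B") if t_a < t_b else ("B", "A")
ratio = max(t_a, t_b) / min(t_a, t_b)
print(f"Machine {faster} is {ratio:.1f} times faster than machine {slower}")
```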

31
CPI Example (2)
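  • Suppose we have two implementations of the same
    instruction set architecture (ISA).
  • For some program,
  • Machine A has a clock cycle time of 10 ns and a
    CPI of 2.0
  • Machine B has a clock cycle time of 20 ns and a
    CPI of 1.2
  • Which machine is faster for this program, and by
    how much?
  • If two machines have the same ISA, which of our
    quantities (e.g., clock rate, CPI, execution
    time, number of instructions, MIPS = million
    instructions per second) will always be identical?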

32
Improving Performance
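  • Both machines implement the same ISA, so the
    program executes the same number of instructions,
    I, on each
  • CPU clock cycles_A = I x 2.0
  • CPU clock cycles_B = I x 1.2
  • CPU time = CPU clock cycles x clock cycle time
  • CPU performance_A / CPU performance_B =
    Execution_B / Execution_A =
    (1.2 x 20) / (2.0 x 10) = 1.2, so machine A is
    1.2 times faster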

33
Improving Performance
34
Now that We Understand Cycles
  • A given program will require
    • some number of instructions (machine
      instructions)
    • some number of cycles
    • some number of seconds
  • We have a vocabulary that relates these
    quantities:
    • cycle time (seconds per cycle)
    • clock rate (cycles per second)
    • CPI (cycles per instruction): a floating-point-
      intensive application might have a higher CPI
    • MIPS (millions of instructions per second): this
      would be higher for a program using simple
      instructions
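A short sketch of why a higher MIPS rating need not mean faster execution. It uses MIPS = instruction count / (execution time x 10^6). The 4 GHz machine and the two program versions (many simple instructions versus fewer complex ones) are hypothetical numbers chosen for illustration.

```python
def execution_time_s(instruction_count, cpi, clock_rate_hz):
    """CPU time = Instruction_count x CPI / clock_rate."""
    return instruction_count * cpi / clock_rate_hz


def mips(instruction_count, exec_time_s):
    """MIPS = instruction count / (execution time x 10^6)."""
    return instruction_count / (exec_time_s * 1e6)


CLOCK_HZ = 4e9  # hypothetical 4 GHz machine

# Two hypothetical versions of the same program: (instruction count, CPI).
versions = {
    "simple": (10e9, 1.0),   # many simple instructions
    "complex": (4e9, 2.0),   # fewer, more complex instructions
}
for name, (ic, cpi) in versions.items():
    t = execution_time_s(ic, cpi, CLOCK_HZ)
    print(f"{name:>7}: {t:.1f} s, {mips(ic, t):.0f} MIPS")
```

The "simple" version earns twice the MIPS rating (4000 versus 2000) yet takes longer (2.5 s versus 2.0 s), which is consistent with execution time being the measure that determines performance.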

35
Performance
  • Performance is determined by execution time
  • Do any of the other variables equal performance?
    • number of cycles to execute the program?
    • number of instructions in the program?
    • number of cycles per second?
    • average number of cycles per instruction?
    • average number of instructions per second?

36
Performance - Time
37
Number of Instructions Example
38
Number of Instructions Example
39
Number of Instructions Example
40
Performance: What to Measure
  • We usually rely on benchmarks rather than real
    workloads
  • To increase predictability, collections of
    benchmark applications (benchmark suites) are
    popular
  • SPEC CPU: popular desktop benchmark suite
    • CPU only, split between integer and
      floating-point programs
    • SPECint2000 has 12 integer programs; SPECfp2000
      has 14 floating-point programs
    • SPEC CPU2006 to be announced Spring 2006
  • SPECSFS (NFS file server) and SPECWeb (web server)
    added as server benchmarks

41
Performance: What to Measure
  • The Transaction Processing Performance Council
    (TPC) measures server performance and
    cost-performance for databases
    • TPC-C: complex query for online transaction
      processing
    • TPC-H: models ad hoc decision support
    • TPC-W: a transactional web benchmark
    • TPC-App: application server and web services
      benchmark

42
Benchmarks
  • Performance is best determined by running a real
    application
    • Use programs typical of the expected workload
    • Or programs typical of the expected class of
      applications, e.g., compilers/editors, scientific
      applications, graphics, etc.
  • Small benchmarks
    • nice for architects and designers
    • easy to standardize
    • can be abused

43
Benchmarks
  • SPEC (System Performance Evaluation Cooperative)
    • companies have agreed on a set of real programs
      and inputs
    • valuable indicator of performance (and compiler
      technology)
    • can still be abused

44
SPEC Benchmarks (www.spec.org)
Integer benchmarks                   FP benchmarks
gzip     Compression                 wupwise   Quantum chromodynamics
vpr      FPGA place and route        swim      Shallow water model
gcc      GNU C compiler              mgrid     Multigrid solver in 3D fields
mcf      Combinatorial optimization  applu     Parabolic/elliptic PDE
crafty   Chess program               mesa      3D graphics library
parser   Word processing program     galgel    Computational fluid dynamics
eon      Computer visualization      art       Image recognition (NN)
perlbmk  Perl application            equake    Seismic wave propagation simulation
gap      Group theory interpreter    facerec   Facial image recognition
vortex   Object-oriented database    ammp      Computational chemistry
bzip2    Compression                 lucas     Primality testing
twolf    Circuit place and route     fma3d     Crash simulation (FEM)
                                     sixtrack  Nuclear physics accelerator
                                     apsi      Pollutant distribution
51
MIPS example
53
MIPS Example
54
Amdahl's Law
  • Execution time after improvement:

$\text{Execution time after improvement} = \text{Execution time unaffected} + \dfrac{\text{Execution time affected}}{\text{Amount of improvement}}$

  • Example: "Suppose a program runs in 100 seconds
    on a machine, with multiply responsible for 80
    seconds of this time. How much do we have to
    improve the speed of multiplication if we want
    the program to run 4 times faster?" How about
    making it 5 times faster?
  • Principle: make the common case fast
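The multiply example can be solved by rearranging Amdahl's Law for the amount of improvement. A minimal sketch; the function names are illustrative. The function returns None when even an infinitely fast multiplier cannot reach the target.

```python
def time_after_improvement(unaffected_s, affected_s, improvement):
    """Amdahl's Law: unaffected time + affected time / amount of improvement."""
    return unaffected_s + affected_s / improvement


def required_improvement(total_s, affected_s, target_speedup):
    """How much the affected part must speed up for the whole program to run
    target_speedup times faster; None if no finite improvement suffices."""
    unaffected_s = total_s - affected_s
    remaining_s = total_s / target_speedup - unaffected_s  # time left for the affected part
    if remaining_s <= 0:
        return None
    return affected_s / remaining_s


# Slide example: 100 s total, 80 s of it spent in multiply.
for speedup in (4, 5):
    n = required_improvement(total_s=100.0, affected_s=80.0, target_speedup=speedup)
    print(f"{speedup}x overall:", "impossible" if n is None else f"multiply must be {n:g}x faster")
```

For 4 times faster (25 s), multiplication must become 16 times faster: 20 s + 80 s / 16 = 25 s. For 5 times faster (20 s), the 20 unaffected seconds already use the whole budget, so no multiplier improvement is enough.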

55
Amdahl's Law
56
Remember
  • Performance is specific to a particular program
  • Total execution time is a consistent summary of
    performance
  • For a given architecture performance increases
    come from
  • Increases in clock rate (without adverse CPI
    effects)
  • Improvements in processor organization that lower
    CPI
  • Compiler enhancements that lower CPI and/or
    instruction count
  • Algorithm/Language choices that affect
    instruction count

57
Textbook Contents
  • 1 Computer Abstractions and Technology
  • 2 Instructions: Language of the Computer
  • 3 Arithmetic for Computers
  • 4 Assessing and Understanding Performance
  • 5 The Processor: Datapath and Control
  • 6 Enhancing Performance with Pipelining
  • 7 Large and Fast: Exploiting Memory Hierarchy
  • 8 Storage, Networks, and Other Peripherals
  • 9 Multiprocessors and Clusters