CPE 619 Workloads: Types, Selection, Characterization - PowerPoint PPT Presentation

Description: CPU-I/O overlap may not be representative ... SPECapc: performance of several 3D-intensive popular applications on a given system ...

Slides: 106
Provided by: Mil36
Learn more at: http://www.ece.uah.edu

Transcript and Presenter's Notes
1
CPE 619 Workloads: Types, Selection,
Characterization
  • Aleksandar Milenkovic
  • The LaCASA Laboratory
  • Electrical and Computer Engineering Department
  • The University of Alabama in Huntsville
  • http://www.ece.uah.edu/milenka
  • http://www.ece.uah.edu/lacasa

2
Part II Measurement Techniques and Tools
  • "Measurements are not to provide numbers but
    insight" - Ingrid Bucher
  • Measure computer system performance
  • Monitor the system that is being subjected to a
    particular workload
  • How to select appropriate workload
  • In general, a performance analyst should know
  • What are the different types of workloads?
  • Which workloads are commonly used by other
    analysts?
  • How are the appropriate workload types selected?
  • How is the measured workload data summarized?
  • How is the system performance monitored?
  • How can the desired workload be placed on the
    system in a controlled manner?
  • How are the results of the evaluation presented?

3
Types of Workloads
"benchmark v. trans. To subject (a system) to a
series of tests in order to obtain prearranged
results not available on competitive systems."
- S. Kelly-Bootle, The Devil's DP Dictionary
  • Test workload: any workload used in a
    performance study
  • Real workload: one observed on a system while
    it is being used
  • Cannot be repeated (easily)
  • May not even exist (proposed system)
  • Synthetic workload: has characteristics similar
    to those of the real workload
  • Can be applied in a repeated manner
  • Relatively easy to port; relatively easy to
    modify without affecting operation
  • No large real-world data files; no sensitive data
  • May have built-in measurement capabilities
  • Benchmark workload
  • Benchmarking is the process of comparing two or
    more systems with workloads

4
Test Workloads for Computer Systems
  • Addition instructions
  • Instruction mixes
  • Kernels
  • Synthetic programs
  • Application benchmarks

5
Addition Instructions
  • Early computers had the CPU as the most expensive
    component
  • System performance = processor performance
  • CPUs supported few operations; the most frequent
    one was addition
  • Computer with a faster addition instruction
    performed better
  • Run many addition operations as the test workload
  • Problems
  • Programs use more operations, not only addition
  • Some operations are more complicated than others

6
Instruction Mixes
  • Number and complexity of instructions increased
  • Additions were no longer sufficient
  • Could measure instructions individually, but
    they are used in different amounts
  • ⇒ Measure relative frequencies of various
    instructions on real systems
  • Use as weighting factors to get average
    instruction time
  • Instruction mix: specification of various
    instructions coupled with their usage frequency
  • Use average instruction time to compare different
    processors
  • Often use inverse of average instruction time
  • MIPS: Millions of Instructions Per Second
  • MFLOPS: Millions of Floating-Point Operations Per
    Second
  • Gibson mix: developed by Jack C. Gibson in 1959
    for IBM 704 systems

7
Example Gibson Instruction Mix
  • Load and Store: 31.2%
  • Fixed-Point Add/Sub: 6.1%
  • Compares: 3.8%
  • Branches: 16.6%
  • Float Add/Sub: 6.9%
  • Float Multiply: 3.8%
  • Float Divide: 1.5%
  • Fixed-Point Multiply: 0.6%
  • Fixed-Point Divide: 0.2%
  • Shifting: 4.4%
  • Logical And/Or: 1.6%
  • Instructions not using regs: 5.3%
  • Indexing: 18.0%
  • Total: 100%

(1959, IBM 650 and IBM 704 systems)
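Turning a mix like the one above into an average instruction time and a MIPS rating is a one-line weighted sum. A minimal sketch: the frequencies follow the published Gibson mix, but the per-class times (in microseconds) are hypothetical, chosen only for illustration.

```python
# Average instruction time from an instruction mix.
# Frequencies (%) follow the published Gibson mix; the per-class
# times (microseconds) are hypothetical, for illustration only.
mix = {
    "load/store":         (31.2, 2.0),
    "fixed add/sub":      (6.1, 1.0),
    "compares":           (3.8, 1.0),
    "branches":           (16.6, 1.0),
    "float add/sub":      (6.9, 4.0),
    "float multiply":     (3.8, 8.0),
    "float divide":       (1.5, 16.0),
    "fixed multiply":     (0.6, 6.0),
    "fixed divide":       (0.2, 12.0),
    "shifting":           (4.4, 1.0),
    "logical and/or":     (1.6, 1.0),
    "no-register instrs": (5.3, 2.0),
    "indexing":           (18.0, 1.0),
}

# frequencies of a valid mix must sum to 100%
assert abs(sum(f for f, _ in mix.values()) - 100.0) < 1e-9

# weighted average time per instruction (microseconds)
avg_time = sum(f / 100 * t for f, t in mix.values())

# inverse of average time in microseconds = millions of instructions/second
mips = 1 / avg_time
```

With these invented times the average works out to about 2.1 microseconds per instruction, i.e., roughly 0.47 MIPS, which is how mixes were used to rank processors.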
8
Problems with Instruction Mixes
  • In modern systems, instruction time is variable,
    depending upon
  • Addressing modes, cache hit rates, pipelining
  • Interference with other devices during
    processor-memory access
  • Distribution of zeros in the multiplier
  • Number of times a conditional branch is taken
  • Mixes do not reflect special hardware such as
    page table lookups
  • Only represents speed of processor
  • Bottleneck may be in other parts of system

9
Kernels
  • Pipelining, caching, address translation, made
    computer instruction times highly variable
  • Cannot use individual instructions in isolation
  • Instead, use higher level functions
  • Kernel the most frequent function (kernel
    nucleus)
  • Commonly used kernels Sieve, Puzzle, Tree
    Searching, Ackerman's Function, Matrix Inversion,
    and Sorting
  • Disadvantages
  • Do not make use of I/O devices
  • Ad-hoc selection of kernels (not based on real
    measurements)

10
Synthetic Programs
  • Proliferation in computer systems, OS emerged,
    changes in applications
  • No more processing-only apps, I/O became
    important too
  • Use simple exerciser loops
  • Make a number of service calls or I/O requests
  • Compute average CPU time and elapsed time for
    each service call
  • Easy to port, distribute (Fortran, Pascal)
  • First exerciser loop by Buchholz (1969)
  • Called it synthetic program
  • May have built-in measurement capabilities
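An exerciser loop of this kind fits in a few lines. The sketch below is illustrative, not Buchholz's original program: the "service call" is assumed to be a file write plus a read-back, and the built-in measurement is just averaging CPU and elapsed time per call.

```python
import os
import tempfile
import time

def exerciser(n_calls=200, record=b"x" * 512):
    """Issue n_calls write+read service requests and report the
    average CPU time and elapsed (wall-clock) time per service call."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    cpu0, wall0 = time.process_time(), time.perf_counter()
    for _ in range(n_calls):
        with open(path, "wb") as f:      # service call 1: write a record
            f.write(record)
        with open(path, "rb") as f:      # service call 2: read it back
            assert f.read() == record
    cpu = time.process_time() - cpu0
    wall = time.perf_counter() - wall0
    os.remove(path)
    return cpu / (2 * n_calls), wall / (2 * n_calls)

cpu_per_call, elapsed_per_call = exerciser()
```

Varying the record size, the number of calls, and the mix of request types is how such a loop is tuned to resemble a measured real workload.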

11
Example of Synthetic Workload Generation Program
Buchholz, 1969
12
Synthetic Programs
  • Advantages
  • Quickly developed and given to different vendors
  • No real data files
  • Easily modified and ported to different systems
  • Have built-in measurement capabilities
  • Measurement process is automated
  • Repeated easily on successive versions of the
    operating systems
  • Disadvantages
  • Too small
  • Do not make representative memory or disk
    references
  • Mechanisms for page faults and disk cache may not
    be adequately exercised
  • CPU-I/O overlap may not be representative
  • Not suitable for multi-user environments because
    loops may create synchronizations, which may
    result in better or worse performance

13
Application Workloads
  • For special-purpose systems, may be able to run
    representative applications as measure of
    performance
  • E.g. airline reservation
  • E.g. banking
  • Make use of entire system (I/O, etc)
  • Issues may be
  • Input parameters
  • Multiuser
  • Only applicable when specific applications are
    targeted
  • For a particular industry Debit-Credit for Banks

14
Benchmarks
  • Benchmark workload
  • Kernels, synthetic programs, application-level
    workloads are all called benchmarks
  • Instruction mixes are not called benchmarks
  • Some authors try to restrict the term benchmark
    only to a set of programs taken from real
    workloads
  • Benchmarking is the process of performance
    comparison of two or more systems by measurements
  • Workloads used in measurements are called
    benchmarks

15
Popular Benchmarks
  • Sieve
  • Ackermann's Function
  • Whetstone
  • Linpack
  • Dhrystone
  • Lawrence Livermore Loops
  • SPEC
  • Debit-Credit Benchmark
  • TPC
  • EEMBC

16
Sieve (1 of 2)
  • Sieve of Eratosthenes (finds primes)
  • Write down all numbers from 1 to n
  • Strike out multiples of k, for k = 2, 3, 5, ...,
    up to sqrt(n)
  • k steps through the remaining (unstruck) numbers
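The steps above, as a short illustrative Python function:

```python
def sieve(n):
    """Sieve of Eratosthenes: return all primes <= n."""
    candidates = [True] * (n + 1)
    candidates[0:2] = [False, False]      # 0 and 1 are not prime
    k = 2
    while k * k <= n:                     # only k up to sqrt(n) is needed
        if candidates[k]:                 # k is a remaining (prime) number
            for m in range(k * k, n + 1, k):
                candidates[m] = False     # strike out multiples of k
        k += 1
    return [i for i, is_prime in enumerate(candidates) if is_prime]
```

As a benchmark it exercises loops, array indexing, and branches, but no floating point and no I/O.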

17
Sieve (2 of 2)
18
Ackermann's Function (1 of 2)
  • Assess efficiency of procedure-calling mechanisms
  • Ackermann's Function has two parameters, and it
    is defined recursively
  • Benchmark is to call Ackermann(3, n) for values of
    n = 1 to 6
  • Average execution time per call, the number of
    instructions executed, and the amount of stack
    space required for each call are used to compare
    various systems
  • Return value is 2^(n+3) - 3, which can be used to
    verify the implementation
  • Number of calls:
  • (512 x 4^(n-1) - 15 x 2^(n+3) + 9n + 37) / 3
  • Can be used to compute time per call
  • Depth of recursion is 2^(n+3) - 4; stack space
    doubles when n is incremented by 1
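Both closed forms, for the return value and for the number of calls, can be checked directly with a toy instrumented implementation (a sketch; real benchmark versions were in Simula, Pascal, and C):

```python
def ack(m, n, stats):
    """Ackermann's function, instrumented to count every invocation."""
    stats["calls"] += 1
    if m == 0:
        return n + 1
    if n == 0:
        return ack(m - 1, 1, stats)
    return ack(m - 1, ack(m, n - 1, stats), stats)

for n in range(1, 4):          # n up to 6 gets slow in pure Python
    stats = {"calls": 0}
    # return value matches the closed form 2^(n+3) - 3
    assert ack(3, n, stats) == 2 ** (n + 3) - 3
    # call count matches (512*4^(n-1) - 15*2^(n+3) + 9n + 37) / 3
    assert stats["calls"] == (512 * 4 ** (n - 1)
                              - 15 * 2 ** (n + 3) + 9 * n + 37) // 3
```

For n = 1, 2, 3 this gives 106, 541, and 2432 calls, so dividing measured time by the call count yields the time per procedure call.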

19
Ackermann's Function (2 of 2)
(Simula)
20
Whetstone
  • Set of 11 modules designed to match observed
    frequencies in ALGOL programs
  • Array addressing, arithmetic, subroutine calls,
    parameter passing
  • Ported to Fortran; versions in C are now the most
    popular
  • Many variations of Whetstone exist, so take care
    when comparing results
  • Problems: it is a specific kernel
  • Only valid for small, scientific (floating-point)
    apps that fit in cache
  • Does not exercise I/O

21
LINPACK
  • Developed by Jack Dongarra (1983) at ANL
  • Programs that solve dense systems of linear
    equations
  • Many float adds and multiplies
  • Core is Basic Linear Algebra Subprograms (BLAS),
    called repeatedly
  • Usually, solve 100x100 system of equations
  • Represents mechanical engineering applications
    on workstations
  • Drafting to finite element analysis
  • High computation speed and good graphics
    processing

22
Dhrystone
  • Pun on Whetstone
  • Intent to represent systems programming
    environments
  • Most common version was in C, but many versions
    exist
  • Low nesting depth, with few instructions per call
  • Large amount of time copying strings
  • Mostly integer performance with no float
    operations

23
Lawrence Livermore Loops
  • 24 vectorizable, scientific tests
  • Floating point operations
  • Physics and chemistry apps spend about 40-60% of
    execution time performing floating-point
    operations
  • Relevant for fluid dynamics, airplane design,
    weather modeling

24
SPEC
  • System Performance Evaluation Cooperative (SPEC)
    (http://www.spec.org)
  • Non-profit, founded in 1988, by leading HW and SW
    vendors
  • Aim ensure that the marketplace has a fair and
    useful set of metrics to differentiate candidate
    systems
  • Product fair, impartial and meaningful
    benchmarks for computers
  • Initially, focus on CPUs SPEC89, SPEC92, SPEC95,
    SPEC CPU 2000, SPEC CPU 2006
  • Now, many suites are available
  • Results are published on the SPEC web site

25
SPEC (contd)
  • Benchmarks aim to test "real-life" situations
  • E.g., SPECweb2005 tests web server performance by
    performing various types of parallel HTTP
    requests
  • E.g., SPEC CPU tests CPU performance by measuring
    the run time of several programs such as the
    compiler gcc and the chess program crafty.
  • SPEC benchmarks are written in a platform neutral
    programming language (usually C or Fortran), and
    the interested parties may compile the code using
    whatever compiler they prefer for their platform,
    but may not change the code
  • Manufacturers have been known to optimize their
    compilers to improve performance of the various
    SPEC benchmarks

26
SPEC Benchmark Suites (Current)
  • SPEC CPU2006: combined performance of CPU, memory,
    and compiler
  • CINT2006 ("SPECint"): testing integer arithmetic,
    with programs such as compilers, interpreters,
    word processors, chess programs, etc.
  • CFP2006 ("SPECfp"): testing floating-point
    performance, with physical simulations, 3D
    graphics, image processing, computational
    chemistry, etc.
  • SPECjms2007: Java Message Service performance
  • SPECweb2005: PHP and/or JSP performance
  • SPECviewperf: performance of an OpenGL 3D
    graphics system, tested with various rendering
    tasks from real applications
  • SPECapc: performance of several 3D-intensive
    popular applications on a given system
  • SPEC OMP V3.1: for evaluating performance of
    parallel systems using OpenMP
    (http://www.openmp.org) applications
  • SPEC MPI2007: for evaluating performance of
    parallel systems using MPI (Message Passing
    Interface) applications
  • SPECjvm98: performance of a Java client system
    running a Java virtual machine
  • SPECjAppServer2004: a multi-tier benchmark for
    measuring the performance of Java 2 Enterprise
    Edition (J2EE) technology-based application
    servers
  • SPECjbb2005: evaluates the performance of
    server-side Java by emulating a three-tier
    client/server system (with emphasis on the middle
    tier)
  • SPEC MAIL2001: performance of a mail server,
    testing SMTP and POP protocols
  • SPECpower_ssj2008: evaluates the energy efficiency
    of server systems
  • SPEC SFS97_R1: NFS file server throughput and
    response time

27
SPEC CPU Benchmarks
28
SPEC CPU2006 Speed Metrics
  • Run and reporting rules guidelines required to
    build, run, and report on the SPEC CPU2006
    benchmarks
  • http://www.spec.org/cpu2006/Docs/runrules.html
  • Speed metrics
  • SPECint_base2006 (Required Base result)
    SPECint2006 (Optional Peak result)
  • SPECfp_base2006 (Required Base result)
    SPECfp2006 (Optional Peak result)
  • The elapsed time in seconds for each of the
    benchmarks is given and the ratio to the
    reference machine (a Sun UltraSparc II system at
    296MHz) is calculated
  • The SPECint_base2006 and SPECfp_base2006 metrics
    are calculated as a Geometric Mean of the
    individual ratios
  • Each ratio is based on the median execution time
    from three VALIDATED runs
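The speed-metric computation, taking the median of three runs, forming ratios to the reference machine, and then the geometric mean, can be sketched as follows. The benchmark names, reference times, and run times below are invented for illustration; they are not actual SPEC CPU2006 values.

```python
from math import prod  # Python 3.8+

# Hypothetical reference-machine times (seconds) per benchmark
ref_time = {"bench_a": 9000.0, "bench_b": 6000.0, "bench_c": 12000.0}

# Three validated run times per benchmark (also hypothetical)
runs = {
    "bench_a": [820.0, 800.0, 810.0],
    "bench_b": [410.0, 400.0, 405.0],
    "bench_c": [1180.0, 1200.0, 1190.0],
}

def median3(xs):
    return sorted(xs)[1]          # median of three validated runs

# per-benchmark ratio: reference time / measured median time
ratios = [ref_time[b] / median3(runs[b]) for b in ref_time]

# overall speed metric: geometric mean of the individual ratios
metric = prod(ratios) ** (1 / len(ratios))
```

The geometric mean is used so that no single benchmark dominates: doubling performance on any one benchmark scales the metric by the same factor.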

29
SPEC CPU2006 Throughput Metrics
  • SPECint_rate_base2006 (Required Base result)
    SPECint_rate2006 (Optional Peak result)
  • SPECfp_rate_base2006 (Required Base result)
    SPECfp_rate2006 (Optional Peak result)
  • Select the number of concurrent copies of each
    benchmark to be run (e.g., the number of CPUs)
  • The same number of copies must be used for all
    benchmarks in a base test
  • This is not true for the peak results where the
    tester is free to select any combination of
    copies
  • The "rate" calculated for each benchmark is
    (number of copies run x reference factor for the
    benchmark) / elapsed time in seconds, which yields
    a rate in jobs/time
  • The rate metrics are calculated as a geometric
    mean from the individual SPECrates using the
    median result from three runs

30
Debit-Credit (1/3)
  • Application-level benchmark
  • Was de-facto standard for Transaction Processing
    Systems
  • Retail bank wanted 1,000 branches, 10,000 tellers,
    and 10,000,000 accounts online with a peak load of
    100 TPS
  • Performance in TPS such that 95% of all
    transactions have a response time of 1 second or
    less (measured from arrival of the last bit to
    sending of the first bit)
  • Each TPS requires 10 branches, 100 tellers, and
    100,000 accounts
  • A system claiming 50 TPS performance should run
    500 branches, 5,000 tellers, and 5,000,000 accounts

31
Debit-Credit (2/3)
32
Debit-Credit (3/3)
  • Metric: price/performance ratio
  • Performance: throughput in terms of TPS such that
    95% of all transactions provide one second or
    less response time
  • Response time: measured as the time interval
    between the arrival of the last bit from the
    communications line and the sending of the first
    bit to the communications line
  • Cost: total expenses for a five-year period on
    purchase, installation, and maintenance of the
    hardware and software in the machine room
  • Cost does not include expenditures for
    terminals, communications, application
    development, or operations
  • Pseudo-code Definition of Debit-Credit
  • See Figure 4.5 in the book

33
TPC
  • Transaction Processing Performance Council (TPC)
  • Mission: create realistic and fair benchmarks for
    transaction processing
  • For more info: http://www.tpc.org
  • Benchmark types
  • TPC-A (1989)
  • TPC-C (1992): complex query environment
  • TPC-H: models ad-hoc decision support (unrelated
    queries, no local history to optimize future
    queries)
  • TPC-W: transactional Web benchmark (simulates the
    activities of a business-oriented transactional
    Web server)
  • TPC-App: application server and Web services
    benchmark (simulates activities of a B2B
    transactional application server operating 24/7)
  • Metric: transactions per second; also includes
    response time (throughput performance is measured
    only when response time requirements are met)

34
EEMBC
  • Embedded Microprocessor Benchmark Consortium
    (EEMBC, pronounced "embassy")
  • Non-profit consortium supported by member dues
    and license fees
  • Real world benchmark software helps designers
    select the right embedded processors for their
    systems
  • Standard benchmarks and methodology ensure fair
    and reasonable comparisons
  • EEMBC Technology Center manages development of
    new benchmark software and certifies benchmark
    test results
  • For more info: http://www.eembc.com/
  • 41 kernels used in different embedded
    applications
  • Automotive/Industrial
  • Consumer
  • Digital Entertainment
  • Java
  • Networking
  • Office Automation
  • Telecommunications

35
The Art of Workload Selection
36
The Art of Workload Selection
  • Workload is the most crucial part of any
    performance evaluation
  • Inappropriate workload will result in misleading
    conclusions
  • Major considerations in workload selection
  • Services exercised by the workload
  • Level of detail
  • Representativeness
  • Timeliness

37
Services Exercised
  • SUT System Under Test
  • CUS Component Under Study

38
Services Exercised (contd)
  • Do not confuse SUT with CUS
  • Metrics depend upon the SUT: MIPS is OK for
    comparing two CPUs but not for comparing two
    timesharing systems
  • Workload depends upon the system
  • Examples
  • CPU: instructions
  • System: transactions
  • Transactions are not good for comparing CPUs, and
    vice versa
  • Two systems identical except for the CPU
  • Comparing systems: use transactions
  • Comparing CPUs: use instructions
  • Multiple services Exercise as complete a set of
    services as possible

39
Example Timesharing Systems
  • Hierarchy of interfaces
  • Applications: application benchmarks
  • Operating system: synthetic programs
  • Central processing unit: instruction mixes
  • Arithmetic logical unit: addition instruction

40
Example Networks
  • Application: user applications, such as mail,
    file transfer, http, ...
  • Workload: frequency of various types of
    applications
  • Presentation: data compression, security, ...
  • Workload: frequency of various types of security
    and (de)compression requests
  • Session: dialog between the user processes on the
    two end systems (initiate, maintain, disconnect)
  • Workload: frequency and duration of various types
    of sessions
  • Transport: end-to-end aspects of communication
    between the source and the destination nodes
    (segmentation and reassembly of messages)
  • Workload: frequency, sizes, and other
    characteristics of various messages
  • Network: routes packets over a number of links
  • Workload: the source-destination matrix, the
    distance, and characteristics of packets
  • Datalink: transmission of frames over a single
    link
  • Workload: characteristics of frames, length,
    arrival rates, ...
  • Physical: transmission of individual bits (or
    symbols) over the physical medium
  • Workload: frequency of various symbols and bit
    patterns

41
Example Magnetic Tape Backup System
  • Backup System
  • Services: back up files, back up changed files,
    restore files, list backed-up files
  • Factors: file-system size, batch or background
    process, incremental or full backups
  • Metrics: backup time, restore time
  • Workload: a computer system with files to be
    backed up; vary the frequency of backups
  • Tape Data System
  • Services: read/write to the tape, read tape
    label, auto-load tapes
  • Factors: type of tape drive
  • Metrics: speed, reliability, time between
    failures
  • Workload: a synthetic program generating
    representative tape I/O requests

42
Magnetic Tape System (contd)
  • Tape Drives
  • Services: read record, write record, rewind, find
    record, move to end of tape, move to beginning of
    tape
  • Factors: cartridge or reel tapes, drive size
  • Metrics: time for each type of service, for
    example, time to read a record and to write a
    record, speed (requests/time), noise, power
    dissipation
  • Workload: a synthetic program exerciser
    generating various types of requests in a
    representative manner
  • Read/Write Subsystem
  • Services: read data, write data (as digital
    signals)
  • Factors: data-encoding technique, implementation
    technology (CMOS, TTL, and so forth)
  • Metrics: coding density, I/O bandwidth (bits per
    second)
  • Workload: read/write data streams with varying
    patterns of bits

43
Magnetic Tape System (contd)
  • Read/Write Heads
  • Services: read signal, write signal (electrical
    signals)
  • Factors: composition, inter-head spacing, gap
    sizing, number of heads in parallel
  • Metrics: magnetic field strength, hysteresis
  • Workload: read/write currents of various
    amplitudes, tapes moving at various speeds

44
Level of Detail
  • Workload description varies from least detailed
    to a time-stamped list of all requests
  • 1) Most frequent request
  • Examples Addition Instruction, Debit-Credit,
    Kernels
  • Valid if one service is much more frequent than
    others
  • 2) Frequency of request types
  • List various services, their characteristics, and
    frequency
  • Examples Instruction mixes
  • Context sensitivity
  • A service may depend on the services required in
    the past
  • ⇒ Use sets of services (group individual service
    requests)
  • E.g., caching is a history-sensitive mechanism

45
Level of Detail (Cont)
  • 3) Time-stamped sequence of requests (trace)
  • May be too detailed
  • Not convenient for analytical modeling
  • May require exact reproduction of component
    behavior
  • 4) Average resource demand
  • Used for analytical modeling
  • Similar services are grouped into classes
  • 5) Distribution of resource demands
  • Used if the variance is large
  • Used if the distribution impacts the performance
  • Workloads used in simulation and analytical
    modeling
  • Non-executable: used in analytical/simulation
    modeling
  • Executable: can be executed directly on a system

46
Representativeness
  • Workload should be representative of the real
    application
  • How do we define representativeness?
  • The test workload and real workload should have
    the same
  • Arrival Rate the arrival rate of requests should
    be the same or proportional to that of the real
    application
  • Resource Demands the total demands on each of
    the key resources should be the same or
    proportional to that of the application
  • Resource Usage Profile relates to the sequence
    and the amounts in which different resources are
    used

47
Timeliness
  • Workloads should follow the changes in usage
    patterns in a timely fashion
  • Difficult to achieve since users are a moving
    target
  • New systems → new workloads
  • Users tend to optimize their demand
  • Use those features that the system performs
    efficiently
  • E.g., fast multiplication → higher frequency of
    multiplication instructions
  • Important to monitor user behavior on an ongoing
    basis

48
Other Considerations in Workload Selection
  • Loading Level A workload may exercise a system
    to its
  • Full capacity (best case)
  • Beyond its capacity (worst case)
  • At the load level observed in real workload
    (typical case)
  • For procurement purposes → typical case
  • For design → best to worst, all cases
  • Impact of external components
  • Do not use a workload that makes an external
    component a bottleneck → all alternatives in the
    system would give equally good performance
  • Repeatability
  • Workload should be such that the results can be
    easily reproduced without too much variance

49
Summary
  • Services exercised determine the workload
  • Level of detail of the workload should match that
    of the model being used
  • Workload should be representative of the real
    systems usage in recent past
  • Loading level, impact of external components, and
    repeatability are other criteria in workload
    selection

50
Workload Characterization
51
Workload Characterization Techniques
"Speed, quality, price. Pick any two." - James M.
Wallace
  • Want to have repeatable workload so can compare
    systems under identical conditions
  • Hard to do in real-user environment
  • Instead
  • Study real-user environment
  • Observe key characteristics
  • Develop workload model
  • → Workload Characterization

52
Terminology
  • Assume the system provides services
  • User (workload component, workload unit): entity
    that makes service requests at the SUT interface
  • Applications: mail, editing, programming, ...
  • Sites: workloads at different organizations
  • User sessions: complete user sessions from login
    to logout
  • Workload parameters: the measured quantities,
    service requests, or resource demands used to
    model or characterize the workload
  • Ex: instructions, packet sizes, source or
    destination of packets, page reference patterns, ...
53
Choosing Parameters
  • The workload component should be at the SUT
    interface.
  • Each component should represent as homogeneous a
    group as possible. Combining very different users
    into a site workload may not be meaningful.
  • Better to pick parameters that depend upon the
    workload and not upon the system
  • Ex: response time of email is not good
  • Depends upon the system
  • Ex: email size is good
  • Depends upon the workload
  • Several characteristics are of interest
  • Arrival time, duration, quantity of resources
    demanded
  • Ex: network packet size
  • Should have significant impact (exclude if little
    impact)
  • Ex: type of Ethernet card

54
Techniques for Workload Characterization
  • Averaging
  • Specifying dispersion
  • Single-parameter histograms
  • Multi-parameter histograms
  • Principal-component analysis
  • Markov models
  • Clustering

55
Averaging
  • Mean
  • Standard deviation
  • Coefficient of variation (C.O.V.) = standard
    deviation / mean
  • Mode (for categorical variables): most frequent
    value
  • Median: 50-percentile
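These statistics are one-liners with the standard library. The per-session CPU times below are hypothetical; they illustrate how one outlier pulls the mean away from the median and drives the C.O.V. toward 1, a warning that the mean alone is misleading.

```python
import statistics as st

# Hypothetical per-session CPU times (seconds); 13.0 is an outlier
cpu_times = [2.0, 3.0, 3.0, 4.0, 13.0]

mean = st.mean(cpu_times)        # arithmetic average
stdev = st.stdev(cpu_times)      # sample standard deviation
cov = stdev / mean               # coefficient of variation
median = st.median(cpu_times)    # robust to the outlier
mode = st.mode(cpu_times)        # most frequent value
```

Here the mean is 5.0 while the median and mode are both 3.0, and the C.O.V. is about 0.91, so averaging alone would misrepresent a typical session.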

56
Case Study Program Usage in Educational
Environments
  • High Coefficient of Variation

57
Characteristics of an Average Editing Session
  • Reasonable variation

58
Techniques for Workload Characterization
  • Averaging
  • Specifying dispersion
  • Single-parameter histograms
  • Multi-parameter histograms
  • Principal-component analysis
  • Markov models
  • Clustering

59
Single Parameter Histograms
  • With n buckets, m parameters, and k components:
    n x m x k values
  • Use only if the variance is high
  • Ignores correlation among parameters
  • E.g., short jobs have low CPU time and a small
    number of disk I/O requests; with single-parameter
    histograms, we may generate a workload with low
    CPU time and a large number of I/O requests,
    something that is not possible in real systems

60
Multi-parameter Histograms
  • Difficult to plot joint histograms for more than
    two parameters

61
Techniques for Workload Characterization
  • Averaging
  • Specifying dispersion
  • Single-parameter histograms
  • Multi-parameter histograms
  • Principal-component analysis
  • Markov models
  • Clustering

62
Principal-Component Analysis
  • Goal is to reduce number of factors
  • PCA transforms a number of (possibly) correlated
    variables into a (smaller) number of uncorrelated
    variables called principal components

63
Principal Component Analysis (contd)
  • Key Idea Use a weighted sum of parameters to
    classify the components
  • Let x_ij denote the ith parameter for the jth
    component
  • y_j = Σ (i=1..n) w_i * x_ij
  • Principal-component analysis assigns the weights
    w_i such that the y_j provide the maximum
    discrimination among the components
  • The quantity y_j is called the principal factor
  • The factors are ordered: the first factor explains
    the highest percentage of the variance

64
Principal Component Analysis (contd)
  • Given a set of n parameters x_1, x_2, ..., x_n,
    the PCA produces a set of factors y_1, y_2, ...,
    y_n such that
  • 1) The y's are linear combinations of the x's:
  • y_i = Σ (j=1..n) a_ij * x_j. Here, a_ij is called
    the loading of variable x_j on factor y_i
  • 2) The y's form an orthogonal set, that is, their
    inner product is zero:
  • <y_i, y_j> = Σ_k a_ik * a_jk = 0
  • This is equivalent to stating that the y_i are
    uncorrelated with each other
  • 3) The y's form an ordered set such that y_1
    explains the highest percentage of the variance
    in resource demands

65
Finding Principal Factors
  • Find the correlation matrix
  • Find the eigenvalues of the matrix and sort them
    in order of decreasing magnitude
  • Find the corresponding eigenvectors; these give
    the required loadings
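The whole procedure can be reproduced with NumPy in a few lines. The packets-sent/received observations below are hypothetical, since the slide's data table is an image not reproduced in this transcript.

```python
import numpy as np

# Hypothetical (packets sent, packets received) observations
x = np.array([[4, 2], [6, 5], [10, 8], [3, 1], [8, 9], [5, 4]], float)

# Steps 1-2: normalize each variable to zero mean, unit std. dev.
z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)

# Steps 3-4: correlation matrix of the normalized variables
C = (z.T @ z) / (len(z) - 1)

# Steps 5-6: eigenvalues and eigenvectors, sorted by decreasing eigenvalue
vals, vecs = np.linalg.eigh(C)
order = np.argsort(vals)[::-1]
vals, vecs = vals[order], vecs[:, order]

# Step 7: principal factors = normalized data projected onto eigenvectors
y = z @ vecs

# Steps 8-9: the factors sum to zero; the first factor's share of the
# variance is its eigenvalue over the total
explained = vals[0] / vals.sum()
```

The columns of `y` are uncorrelated by construction, and `explained` plays the role of the 95.7% figure computed by hand on the following slides.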

66
Principal Component Analysis Example
  • xs = packets sent, xr = packets received

67
Principal Component Analysis
  • 1) Compute the mean and standard deviations of
    the variables

68
Principal Component Analysis (contd)
  • Similarly

69
Principal Component Analysis (contd)
  • 2) Normalize the variables to zero mean and unit
    standard deviation. The normalized values xs and
    xr are given by

70
Principal Component Analysis (contd)
  • 3) Compute the correlation among the variables
  • 4) Prepare the correlation matrix

71
Principal Component Analysis (contd)
  • 5) Compute the eigenvalues of the correlation
    matrix by solving the characteristic equation
  • The eigenvalues are 1.916 and 0.084

72
Principal Component Analysis (contd)
  • 6) Compute the eigenvectors of the correlation
    matrix. The eigenvector q1 corresponding to
    λ1 = 1.916 is defined by the relationship
  • C q1 = λ1 q1
  • which gives
  • q11 = q21

73
Principal Component Analysis (contd)
  • Restricting the length of the eigenvectors to
    one
  • 7) Obtain the principal factors by multiplying
    the eigenvectors by the normalized vectors

74
Principal Component Analysis (contd)
  • 8) Compute the values of the principal factors
    (last two columns)
  • 9) Compute the sum and the sum of squares of the
    principal factors
  • The sum must be zero
  • The sum of squares gives the percentage of
    variation explained

75
Principal Component Analysis (contd)
  • The first factor explains 32.565/(32.565 + 1.435),
    or 95.7%, of the variation
  • The second factor explains only 4.3% of the
    variation and can, thus, be ignored

76
Techniques for Workload Characterization
  • Averaging
  • Specifying dispersion
  • Single-parameter histograms
  • Multi-parameter histograms
  • Principal-component analysis
  • Markov models
  • Clustering

77
Markov Models
  • Sometimes it is important to know not just the
    number of requests of each type but also their
    order
  • If the next request depends upon the previous
    request, then a Markov model can be used
  • Actually more general: applies whenever the next
    state depends only upon the current state

78
Markov Models (contd)
  • Example: a process moving among CPU, disk, and
    terminal
  • Transition matrices can also be used for
    application transitions
  • E.g., P(Link | Compile)
  • Used to specify page-reference locality
  • P(Reference module i | Referenced module j)

79
Transition Probability
  • Given the same relative frequency of requests of
    different types, it is possible to realize the
    frequency with several different transition
    matrices
  • Each matrix may result in a different performance
    of the system
  • If order is important, measure the transition
    probabilities directly on the real system
  • Example: Two packet sizes: Small (80%), Large
    (20%)

80
Transition Probability (contd)
  • Option 1: An average of four small packets are
    followed by an average of one big packet, e.g.,
    ssssbssssbssss...
  • Option 2: Eight small packets followed by two
    big packets, e.g., ssssssssbbssssssssbb...
  • Option 3: Generate a random number x; if x < 0.8,
    generate a small packet, otherwise generate a
    large packet
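All three options give 80% small packets, but with different transition structure. A sketch of generating packets from a transition matrix; the matrix below is one possible choice (matching Option 1 on average: after a small packet, small with probability 0.75; after a big packet, always small), not the only one.

```python
import random

# Transition probabilities P[current][next]; each row sums to 1.
P = {"s": {"s": 0.75, "b": 0.25},
     "b": {"s": 1.00, "b": 0.00}}

def generate(n, start="s", seed=1):
    """Generate a packet-size sequence from the transition matrix."""
    rng = random.Random(seed)
    seq, state = [], start
    for _ in range(n):
        seq.append(state)
        state = "s" if rng.random() < P[state]["s"] else "b"
    return "".join(seq)

seq = generate(100_000)
frac_small = seq.count("s") / len(seq)   # steady state: about 0.8
```

Note the structural property the marginal frequencies alone cannot capture: with this matrix two big packets never occur back to back, unlike Option 2.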

81
Techniques for Workload Characterization
  • Averaging
  • Specifying dispersion
  • Single-parameter histograms
  • Multi-parameter histograms
  • Principal-component analysis
  • Markov models
  • Clustering

82
Clustering
  • May have large number of components
  • Cluster such that components within are similar
    to each other
  • Then, can study one member to represent
    component class
  • Ex: 30 jobs characterized by CPU and I/O demands,
    grouped into five clusters

83
Clustering Steps
  1. Take sample
  2. Select parameters
  3. Transform, if necessary
  4. Remove outliers
  5. Scale observations
  6. Select distance metric
  7. Perform clustering
  8. Interpret
  9. Change and repeat 3-7
  10. Select representative components

84
1) Sampling
  • Usually too many components to do clustering
    analysis
  • That's why we are doing clustering in the first
    place!
  • Select small subset
  • If careful, will show similar behavior to the
    rest
  • May choose randomly
  • However, if interested in a specific aspect, may
    choose to cluster only the top consumers
  • E.g., if interested in a disk, only do clustering
    analysis on components with high I/O

85
2) Parameter Selection
  • Many components have a large number of parameters
    (resource demands)
  • Some important, some not
  • Remove the ones that do not matter
  • Two key criteria: impact on performance and
    variance
  • If a parameter has no impact on performance, omit it
  • If a parameter has little variance, omit it
  • Method: redo clustering with one less parameter;
    count the number of components that change
    cluster membership. If not many change, remove
    the parameter
  • Principal-component analysis: identify parameters
    with the highest variance

86
3) Transformation
  • If distribution is skewed, may want to transform
    the measure of the parameter
  • Ex: one study measured CPU time
  • Two programs taking 1 and 2 seconds are as
    different as two programs taking 10 and 20
    milliseconds
  • → Take the ratio of CPU times, not the difference
  • (More in Chapter 15)
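
A one-line check of that reasoning, using the values from the slide: on a ratio (log) scale the two pairs of programs are equally far apart, while the raw differences say otherwise.

```python
import math

# CPU-time pairs from the slide: (1 s, 2 s) and (10 ms, 20 ms)
a, b = 1.0, 2.0
c, d = 0.010, 0.020

print('raw differences:', b - a, 'vs', d - c)   # 1.0 vs 0.01 -- very unequal
print('ratios:', b / a, 'vs', d / c)            # 2.0 vs 2.0 -- identical
print('log differences:', math.log(b / a), 'vs', math.log(d / c))
```

This is why a logarithmic transformation is a common choice for skewed resource-demand distributions.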

87
4) Outliers
  • Data points with extreme parameter values
  • Can significantly affect max or min (or mean or
    variance)
  • For normalization (scaling, next) their
    inclusion/exclusion may significantly affect
    outcome
  • Only exclude outliers if they do not consume a
    significant portion of resources
  • E.g., a disk backup may make a large number of
    disk I/O requests; it should not be excluded if
    backup is done frequently (e.g., several times a
    day), but may be excluded if done once a month

88
5) Data Scaling
  • Final results depend upon relative ranges
  • Typically scale so relative ranges equal
  • Different ways of doing this

89
5) Data Scaling (contd)
  • Normalize to zero mean and unit variance:
    x'ik = (xik - mean_k) / s_k
  • Weights: x'ik = wk * xik, where wk is proportional
    to the relative importance of parameter k, or
    wk = 1/s_k
  • Range normalization: change from [xmin,k, xmax,k]
    to [0, 1]:
    x'ik = (xik - xmin,k) / (xmax,k - xmin,k)
Affected by outliers
90
5) Data Scaling (contd)
  • Percentile Normalization
  • Scale so that 95% of the values fall between 0
    and 1

Less sensitive to outliers
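
The three scaling techniques can be sketched as plain functions (the function names and the percentile cutoffs, chosen so roughly 95% of values land in [0, 1], are mine):

```python
def zscore(values):
    """Normalize to zero mean and unit variance (sample std dev)."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / (n - 1)) ** 0.5
    return [(v - mean) / sd for v in values]

def range_norm(values):
    """Map [min, max] to [0, 1]; sensitive to outliers."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def percentile_norm(values, lo_pct=0.025, hi_pct=0.975):
    """Scale so ~95% of values fall in [0, 1]; less outlier-sensitive."""
    s = sorted(values)
    lo = s[int(lo_pct * (len(s) - 1))]
    hi = s[int(hi_pct * (len(s) - 1))]
    return [(v - lo) / (hi - lo) for v in values]

cpu_times = [2, 3, 1, 4, 5, 100]   # 100 is an outlier
print(range_norm(cpu_times))       # outlier compresses the rest near 0
print(percentile_norm(cpu_times))  # outlier lands outside [0, 1] instead
```

With range normalization the single outlier squeezes the remaining observations into a narrow band, which distorts distances in the later clustering step; percentile normalization leaves the bulk of the data well spread.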
91
6) Distance Metric
  • Map each component to n-dimensional space and see
    which are close to each other
  • Euclidean distance between two components
    (xi1, xi2, ..., xin) and (xj1, xj2, ..., xjn):
    d = sqrt( Σk (xik - xjk)² )
  • Weighted Euclidean distance
  • Assign weights ak to the n parameters:
    d = sqrt( Σk ak (xik - xjk)² )
  • Used if values are not scaled or if parameters
    differ significantly in importance
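
Both distances are straightforward to compute; a minimal sketch (the example component values are made-up):

```python
import math

def euclidean(x, y):
    """Straight-line distance in the n-dimensional parameter space."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def weighted_euclidean(x, y, w):
    """Euclidean distance with per-parameter weights a_k."""
    return math.sqrt(sum(wk * (xi - yi) ** 2
                         for xi, yi, wk in zip(x, y, w)))

# Two components described by (CPU, memory, I/O) demands
a = (2.0, 4.0, 1.0)
b = (3.0, 5.0, 1.0)
print(euclidean(a, b))                       # sqrt(2), about 1.414
print(weighted_euclidean(a, b, (1, 1, 10)))  # third parameter is equal,
                                             # so its weight has no effect here
```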

92
6) Distance Metric (contd)
  • Chi-square distance
  • Used in distribution fitting
  • Need to use normalized values, or the relative
    sizes of the parameters will influence the
    chi-square distance measure
  • Overall, Euclidean distance is the most commonly
    used

93
7) Clustering Techniques
  • Goal Partition into groups so the members of a
    group are as similar as possible and different
    groups are as dissimilar as possible
  • Statistically, the intragroup variance should be
    as small as possible, and inter-group variance
    should be as large as possible
  • Total variance = intra-group variance +
    inter-group variance
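
The identity can be checked numerically with sums of squared deviations (a sketch with two made-up, well-separated clusters):

```python
# Two well-separated clusters of one-dimensional observations
clusters = [[1, 2, 3], [10, 11, 12]]
all_points = [x for c in clusters for x in c]

def ss(values, center):
    """Sum of squared deviations from a given center."""
    return sum((v - center) ** 2 for v in values)

grand_mean = sum(all_points) / len(all_points)
total = ss(all_points, grand_mean)

# Intra-group: deviations of members from their own cluster mean
intra = sum(ss(c, sum(c) / len(c)) for c in clusters)
# Inter-group: deviations of cluster means from the grand mean,
# weighted by cluster size
inter = sum(len(c) * (sum(c) / len(c) - grand_mean) ** 2 for c in clusters)

print(total, '=', intra, '+', inter)   # 125.5 = 4.0 + 121.5
```

A good partition pushes almost all of the total variance into the inter-group term, as it does here.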

94
7) Clustering Techniques (contd)
  • Nonhierarchical techniques: start with an
    arbitrary set of k clusters and move members
    until the intra-group variance is minimal
  • Hierarchical techniques
  • Agglomerative: start with n clusters and merge
  • Divisive: start with one cluster and divide
  • Two popular techniques
  • Minimum spanning tree method (agglomerative)
  • Centroid method (divisive)

95
Clustering Techniques Minimum Spanning Tree
Method
  • Start with k = n clusters.
  • Find the centroid of the ith cluster, i = 1, 2, ..., k.
  • Compute the inter-cluster distance matrix.
  • Merge the two nearest clusters.
  • Repeat steps 2 through 4 until all components are
    part of one cluster.

96
Minimum Spanning Tree Example (1/5)
  • Workload with 5 components (programs), 2
    parameters (CPU/IO)
  • Measure CPU and I/O for each of the 5 programs

97
Minimum Spanning Tree Example (2/5)
  • Step 1) Consider 5 clusters, with the ith cluster
    containing only the ith program
  • Step 2) The centroids are (2,4), (3,5), (1,6),
    (4,3), and (5,2)

98
Minimum Spanning Tree Example (3/5)
  • Step 3) Euclidean distance

Step 4) Minimum → merge
99
Minimum Spanning Tree Example (4/5)
  • The centroid of AB is ((2+3)/2, (4+5)/2)
    = (2.5, 4.5); the centroid of DE is (4.5, 2.5)

Minimum → merge
100
Minimum Spanning Tree Example (5/5)
  • Centroid of ABC: ((2+3+1)/3, (4+5+6)/3) = (2, 5)
  • Minimum → merge
  • Stop
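
The whole example can be replayed with a short sketch of the agglomerative centroid-merging procedure (the five points are the programs from the slides; ties in distance are broken by list order):

```python
import math

# Program resource demands (CPU, disk I/O) from the slide example
points = {'A': (2, 4), 'B': (3, 5), 'C': (1, 6), 'D': (4, 3), 'E': (5, 2)}

def centroid(cluster):
    """Mean of each parameter over the cluster's members."""
    xs = [points[m][0] for m in cluster]
    ys = [points[m][1] for m in cluster]
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def dist(c1, c2):
    """Euclidean distance between two cluster centroids."""
    a, b = centroid(c1), centroid(c2)
    return math.hypot(a[0] - b[0], a[1] - b[1])

clusters = [(m,) for m in points]          # start with k = n singletons
merges = []
while len(clusters) > 1:
    # find the pair of clusters with the nearest centroids
    pairs = [(dist(c1, c2), i, j)
             for i, c1 in enumerate(clusters)
             for j, c2 in enumerate(clusters) if i < j]
    d, i, j = min(pairs)
    merged = tuple(sorted(clusters[i] + clusters[j]))
    merges.append((merged, round(d, 3)))
    clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]

for m, d in merges:
    print(''.join(m), 'merged at distance', d)
```

Running it reproduces the slide's merge order: AB and DE at distance 1.414, then ABC at 2.121, then the full set at 3.536.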

101
Representing Clustering
  • The spanning tree is called a dendrogram
  • Each branch is a cluster; the height shows where
    clusters merge

Can obtain clusters for any allowable distance.
Ex: at a distance of 3, get {A,B,C} and {D,E}
102
Nearest Centroid Method
  • Start with k = 1.
  • Find the centroid and intra-cluster variance for
    the ith cluster, i = 1, 2, ..., k.
  • Find the cluster with the highest variance and
    arbitrarily divide it into two clusters
  • Find the two components that are farthest apart;
    assign the other components according to their
    distance from these two points.
  • Place all components below the centroid in one
    cluster and all components above this hyperplane
    in the other.
  • Adjust the points in the two new clusters until
    the inter-cluster distance between the two
    clusters is maximal
  • Set k = k + 1. Repeat steps 2 through 4 until k = n

103
Interpreting Clusters
  • Clusters with small populations may be discarded
  • If they use few resources
  • If a cluster with 1 component uses 50% of the
    resources, it cannot be discarded!
  • Name clusters, often by resource demands
  • Ex: CPU-bound or I/O-bound
  • Select one or more components from each cluster as
    a test workload
  • Can make the number selected proportional to
    cluster size, total resource demands, or other
    criteria

104
Problems with Clustering
105
Problems with Clustering (Cont)
  • Goal Minimize variance
  • The results of clustering are highly variable.
    No rules for
  • Selection of parameters
  • Distance measure
  • Scaling
  • Labeling each cluster by functionality is
    difficult
  • In one study, editing programs appeared in 23
    different clusters
  • Requires many repetitions of the analysis