Evaluating Performance Information for Mapping Algorithms to
Advanced Architectures

Transcript and Presenter's Notes
1
Evaluating Performance Information for Mapping
Algorithms to Advanced Architectures
  • Nayda G. Santiago, PhD, PE
  • Electrical and Computer Engineering Department
  • University of Puerto Rico, Mayaguez Campus

Sept 1, 2006
2
Outline
  • Introduction
  • Problems
  • Methodology
  • Objectives
  • Previous Work
  • Description of Methodology
  • Case Study
  • Results
  • Conclusions
  • Future Work

3
Introduction
  • Problem solving on an HPC facility
  • Conceptualization
  • Instantiation
  • Mapping
  • Parallel Implementation
  • Goal
  • Can we obtain metrics to characterize what is
    happening in the HPC system?
  • Test a methodology for obtaining information from
    the HPC system.
  • Compare with current results.

4
Introduction
Mapping Process: Source Code → Compiler → Linker → Executable
File → Running Program → Measurement, with instrumentation
libraries linked into the executable
5
Introduction
  • Application Programmer Decisions
  • Programming paradigm
  • Language
  • Compiler
  • Libraries
  • Advanced architecture
  • Programming style
  • Algorithms

6
Problems
  • Different factors affect the performance of an
    implementation.
  • Information about high-level effects is lost in the
    mapping process:
  • Out-of-order execution
  • Compiler optimizations
  • Complex interactions of parallel code and systems
  • Current performance analysis tools are not appealing

7
Current Tuning Methodology
Inputs shaping the high-level code: system configuration,
programming style, programming paradigm, languages, libraries,
algorithms
Flow: High-level Code → Computer System with Instrumentation
Tools → Performance Data → Analysis and Evaluation Tools →
Programmer (evaluation)
The evaluation demands experience, knowledge of the tools,
in-depth knowledge of the computer system, and an understanding
of the relations between the performance data and the code:
a burden on the programmer
8
New Tuning Methodology
Flow: High-level Code → Computer System with Instrumentation
Tools → Performance Data → Statistical Data Analysis →
Information → Knowledge-Based System → Suggestions → Programmer,
who modifies the code and experiments with alternatives within a
problem solving environment
9
Proposed Tuning Methodology
10
Integrative Performance Analysis
Measurement: low-level information is collected
Abstraction: low-level information is hidden
Metrics
Problem Translation
Mapping back to the user's view?
User's View
System Levels
  • Machine
  • OS
  • Node
  • Network
  • Tools
  • High-level Language
  • Domain Factors

11
Objectives
  • Obtain information on the relation between
    low-level performance information and factors
    that affect performance.
  • Lessen the programmer's burden of incorporating
    experience and knowledge into the tuning process.
  • Identify the most important metrics describing
    system-software interactions.
  • Identify how many metrics convey most of the
    information of the system.

12
Methodology
Preliminary Problem Analysis → Design of Experiment → Data
Collection → Data Analysis
13
Methodology
(Same flow as the New Tuning Methodology: High-level Code →
Computer System with Instrumentation Tools → Performance Data →
Statistical Data Analysis → Information → Knowledge-Based
System → Suggestions → Programmer, who modifies the code and
experiments with alternatives)
14
Preliminary Problem Analysis
Inputs: application, performance goal, potential factors
affecting performance
Outputs: evaluation of alternatives, screening experiment,
factors for experimentation, understanding of the problem
  • Results
  • Profiling is useful for preliminary analysis
  • Contribution
  • Screening is required to limit the number of factors
    in the experiment
  • Feasibility
  • Significance
  • Because of the large number of factors affecting
    performance and the long running times,
    experimentation has not been commonly used for
    performance data evaluation. Screening makes it
    feasible.

15
Design of Experiment (DOE)
Input: design factors
Outputs: levels of each factor, response variable, choice of
design, order of treatments
  • Systematic planning of experiments
  • Most information
  • Minimize the effect of extraneous factors
  • Causal relations
  • Correlational relations

16
Design of Experiment
  • Three basic principles (a run-schedule sketch follows
    below)
  • Replication
  • Estimate experimental error
  • Precision
  • Randomization
  • Independence between observations
  • Average out the effect of extraneous factors
  • Blocking
  • Block: a set of homogeneous experimental conditions
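
A minimal sketch of how these three principles translate into a
run schedule; the factor names, levels, and replication count are
hypothetical illustrations:

    # DOE principles applied to a performance-run schedule.
    # Factor names and levels below are hypothetical examples.
    import random

    factors = {
        "compiler_opt": ["-O0", "-O2"],   # hypothetical factor 1
        "num_procs": [2, 4, 8],           # hypothetical factor 2
    }
    replications = 3  # replication: repeat treatments to estimate error

    # Full factorial list of treatments.
    treatments = [(opt, p) for opt in factors["compiler_opt"]
                           for p in factors["num_procs"]]

    # Replicate, then randomize the run order so extraneous effects
    # (e.g., background load drift) average out across treatments.
    run_order = treatments * replications
    random.shuffle(run_order)

    # Blocking would group runs into homogeneous conditions (e.g.,
    # one replicate per day) and randomize only within each block.
    for i, (opt, p) in enumerate(run_order):
        print(f"run {i}: compiler_opt={opt} num_procs={p}")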

17
Design of Experiment
  • Results
  • The appropriate randomization scheme, number of
    replications, and treatment order for the
    experimental runs.
  • Contributions
  • Innovative use of DOE for establishing causal
    relations for application tuning
  • The appropriate design of experiment should be
    selected according to the performance analysis
    problem
  • Significance
  • The use of DOE and ANOVA will determine the cause
    of the performance differences in the results

18
Data Collection
Inputs: executable file, system, instrumentation tool
Outputs: raw data, sampling, profiles
  • Instrumentation
  • Dependent on the system
  • Observable computing system
  • Metrics can be observed in the system

19
Data Collection
  • Instrumentation
  • Software
  • Hardware
  • Instrumentation tool setup
  • Experimental runs and data collection (a collection
    loop sketch follows below)
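
A minimal sketch of such a collection loop, assuming a
hypothetical benchmark binary ./app and the standard sar utility
as the software instrumentation tool; substitute the commands
actually used on the target system:

    # Run one experimental treatment while sampling system metrics.
    import subprocess

    def run_and_measure(run_id, app_args, interval=5, samples=60):
        # Sample all sar activities every `interval` seconds into a
        # per-run raw-data file while the application executes.
        with open(f"raw_run_{run_id}.txt", "w") as out:
            sar = subprocess.Popen(
                ["sar", "-A", str(interval), str(samples)], stdout=out)
            subprocess.run(["./app"] + app_args, check=True)
            sar.wait()

    run_and_measure(0, ["input.dat"])  # hypothetical input file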

20
Data Collection
Inputs: tool configuration, order of runs, crontab file, system
Output: raw data (metrics)
  • Results
  • Measurements of the metrics observed from the
    system
  • Particular to this case study
  • Between 36 and 52 metrics

21
Data Analysis
Raw Data → Data Analysis → Information
  • Statistical Analysis
  • Correlation matrix
  • Multidimensional methods
  • Dimensionality estimation
  • Subset Selection
  • Entropy cost function
  • ANOVA
  • Post hoc comparisons

22
Data Analysis
Analysis flow: Raw Data → Convert Format → Performance Data
Matrix → Correlation Matrix → Normalize → Dimension → Subset
Selection → ANOVA → Post Hoc Comparisons → Information
23
Data Analysis: Data Conversion
  • Raw data
  • Sampling
  • Profiling
  • Performance Data Matrix
  • Random process
  • Average
  • Random variable

Raw Data → Convert Format → Performance Data Matrix
24
Data Analysis: Data Conversion
  • Performance data matrix, with entries ma(k,p):

        | ma(0,0)     ma(0,1)     ...  ma(0,P-1)   |
        | ma(1,0)     ma(1,1)     ...  ma(1,P-1)   |
    M = | ...         ...         ...  ...         |
        | ma(K-1,0)   ma(K-1,1)   ...  ma(K-1,P-1) |

    where a is abs or avg, k is the experimental run, and p is
    the metric identification number
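
A minimal sketch of this conversion step, assuming the raw data
of each run is a sampled time series per metric that is averaged
(the avg case) into one value; metric names and sample data are
illustrative:

    # Build the K x P performance data matrix from raw samples.
    import numpy as np

    metrics = ["ExecTime", "Pgfaults/s", "IdleTime"]  # illustrative
    K, P = 5, len(metrics)

    # samples[k][p]: raw time series sampled during run k for metric p.
    rng = np.random.default_rng(0)
    samples = [[rng.random(100) for _ in range(P)] for _ in range(K)]

    # ma(k,p): average the sampled random process over run k.
    M = np.array([[samples[k][p].mean() for p in range(P)]
                  for k in range(K)])
    print(M.shape)  # (K, P): one row per run, one column per metric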
25
Data Analysis: Data Conversion
  • Performance data matrix example

                 Metric 0       Metric 1        ...  Metric P-1
    Run 0        ExecTime0      Pgfaults/s0     ...  IdleTime0
    Run 1        ExecTime1      Pgfaults/s1     ...  IdleTime1
    ...          ...            ...             ...  ...
    Run K-1      ExecTimeK-1    Pgfaults/sK-1   ...  IdleTimeK-1
26
Data Analysis: Correlation Study
Analysis flow as above; current step: Correlation Matrix
27
Data Analysis: Correlation Study
  • Correlation
  • Measure of linear relation among variables
  • Not causal

28
Data Analysis: Correlation Study
Example (correlation plot figure)
29
Data Analysis: Correlation Study
  • Correlation formula (a code sketch follows below):

    r = sum_i (x_i - mean(x)) (y_i - mean(y)) / ((n - 1) Sx Sy)

    where Sx and Sy are the sample estimates of the standard
    deviations
  • Which metrics were most correlated with execution time?
  • Results of correlation analysis
  • Collinearity
  • Software instrumentation
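
A minimal sketch of this correlation study on a synthetic data
matrix, computing r exactly as in the formula above; treating
column 0 as execution time is an assumption for illustration:

    # Pearson correlation of each metric against execution time.
    import numpy as np

    def pearson_r(x, y):
        sx, sy = x.std(ddof=1), y.std(ddof=1)  # sample std. deviations
        return (((x - x.mean()) * (y - y.mean())).sum()
                / ((len(x) - 1) * sx * sy))

    rng = np.random.default_rng(1)
    M = rng.random((20, 4))        # K=20 runs, P=4 metrics (synthetic)
    exec_time = M[:, 0]            # assume column 0 is ExecTime
    for p in range(1, M.shape[1]):
        r = pearson_r(exec_time, M[:, p])
        print(f"metric {p} vs ExecTime: r = {r:.3f}")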
30
Data Analysis: Normalization
Analysis flow as above; current step: Normalize
31
Data Analysis: Normalization
  • Normalization maps the Performance Data Matrix to a
    Normalized Performance Data Matrix (a code sketch of
    all three schemes follows below)
  • Log normalization: na(k,p) = log(ma(k,p))
  • Min-max normalization:
    na(k,p) = (ma(k,p) - min(ma_p)) / (max(ma_p) - min(ma_p))
  • Dimension normalization: na(k,p) = ma(k,p) / EuclNorm(ma_p)
  • Motivation: the scales of the metrics vary widely

    Here ma_p denotes the vector of metric p's values over all
    K runs.
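
A minimal sketch of the three schemes applied per metric (per
column) of a synthetic positive-valued matrix:

    # Three normalization schemes for the performance data matrix.
    import numpy as np

    rng = np.random.default_rng(2)
    M = rng.random((20, 4)) * 1000 + 1   # synthetic positive data

    log_norm = np.log(M)                 # log normalization

    mins, maxs = M.min(axis=0), M.max(axis=0)
    minmax_norm = (M - mins) / (maxs - mins)   # min-max to range (0, 1)

    # Dimension normalization: divide each column by its Euclidean norm.
    eucl_norm = M / np.linalg.norm(M, axis=0)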
32
Data Analysis: Normalization
  • Normalization evaluation
  • Artificially assign classes to the data set
  • Long execution time
  • Short execution time
  • Used a visual separability criterion
  • Principal Component Analysis (PCA); a projection
    sketch follows below
  • Project the data along the principal components
  • Visualize the separation of the data
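
A minimal sketch of this separability check, assuming
scikit-learn and matplotlib are available; the data and the
long/short class assignment are synthetic:

    # Project the normalized data onto the first two principal
    # components and color the runs by the artificial class.
    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(3)
    X = rng.random((40, 10))              # normalized K x P matrix
    labels = (X[:, 0] > 0.5).astype(int)  # artificial long/short classes

    proj = PCA(n_components=2).fit_transform(X)
    plt.scatter(proj[:, 0], proj[:, 1], c=labels)
    plt.xlabel("PC 1")
    plt.ylabel("PC 2")
    plt.show()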

33
Data Analysis: Normalization
Not normalized (PCA projection plot)
34
Data Analysis: Normalization
Not normalized (PCA projection plot)
35
Data Analysis: Normalization
Min-max normalization (PCA projection plot)
36
Data Analysis: Normalization
Normalizing to range (0,1) (PCA projection plot)
37
Data Analysis: Normalization
Normalizing with the Euclidean norm (PCA projection plot)
38
Data Analysis: Normalization
Normalizing with the Euclidean norm (PCA projection plot)
39
Data Analysis: Normalization
  • Results
  • Appropriate normalization scheme:
  • Euclidean normalization
  • Contribution
  • Usage of normalization schemes for performance data
  • Significance
  • Because of differences in scale, some statistical
    methods may be biased. With normalization, the
    results obtained reflect the true nature of the
    problem rather than scale variations.

40
Data Analysis: Dimension Estimation
Analysis flow as above; current step: Dimension
41
Data Analysis: Dimension Estimation
  • Dimensionality estimation (a code sketch of the three
    criteria follows below)
  • How many metrics will explain the system's behavior?
  • Scree test
  • Plot of the eigenvalues of the correlation matrix
  • Cumulative percentage of total variation
  • Keep the components explaining the variance of the data
  • Kaiser-Guttman
  • Eigenvalues of the correlation matrix greater than one

Dimension reduction: from P metrics down to K metrics, K << P
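
A minimal sketch of the three criteria applied to the eigenvalues
of the correlation matrix of a synthetic data set; the 90%
variance threshold is an assumed example:

    # Estimate the dimension of the performance data set.
    import numpy as np

    rng = np.random.default_rng(4)
    X = rng.random((50, 10))                 # K=50 runs, P=10 metrics
    R = np.corrcoef(X, rowvar=False)         # P x P correlation matrix
    eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]

    # Scree test: look for the "elbow" in a plot of these values.
    print("eigenvalues:", np.round(eigvals, 2))

    # Cumulative percentage of total variation (assumed 90% cutoff).
    cum = np.cumsum(eigvals) / eigvals.sum()
    print("components for 90% variance:",
          int(np.searchsorted(cum, 0.90)) + 1)

    # Kaiser-Guttman: keep eigenvalues greater than one.
    print("Kaiser-Guttman dimension:", int((eigvals > 1).sum()))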
42
Data Analysis: Dimension Estimation
  • Example (scree plot of the eigenvalues of the
    correlation matrix)

43
Data Analysis: Dimension Estimation
  • Results
  • Dimension reduction to approximately 18% of the
    original size
  • All three methods give similar results
  • Contribution
  • Estimation of the performance data set's dimension
  • Significance
  • Provides the minimum set of metrics that contains
    most of the information needed to evaluate the
    system

44
Data Analysis: Metric Subset Selection
Analysis flow as above; current step: Subset Selection
45
Data Analysis: Metric Subset Selection
  • Subset selection (a search sketch follows below)
  • Sequential forward search
  • Entropy cost function:

    E = - sum_i sum_j [ S_ij log S_ij + (1 - S_ij) log(1 - S_ij) ]

    where S_ij is the similarity value of two instances i and j
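
A minimal sketch of the forward search under this cost, assuming
a similarity of the form S_ij = exp(-alpha * distance) with an
assumed scale constant alpha; lower entropy indicates more
structure in the selected metrics:

    # Sequential forward search with an entropy cost function.
    import numpy as np

    def entropy(X, alpha=0.5):
        # Similarity S_ij from pairwise Euclidean distances of runs.
        d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
        S = np.clip(np.exp(-alpha * d), 1e-12, 1 - 1e-12)
        s = S[np.triu_indices(len(X), k=1)]  # each pair once
        return -np.sum(s * np.log(s) + (1 - s) * np.log(1 - s))

    def forward_search(X, k):
        selected = []
        while len(selected) < k:
            rest = [p for p in range(X.shape[1]) if p not in selected]
            # Greedily add the metric that minimizes the entropy.
            best = min(rest, key=lambda p: entropy(X[:, selected + [p]]))
            selected.append(best)
        return selected

    rng = np.random.default_rng(5)
    X = rng.random((30, 8))                  # normalized K x P matrix
    print(forward_search(X, 3))              # indices of chosen metrics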
46
Data Analysis: Metric Subset Selection
  • Results
  • Establishment of the most important metrics
  • For the case study:
  • Experiment 1: paging activity
  • Experiment 2: memory faults
  • Experiment 3: buffer activity
  • Experiment 4: a mix of metrics

47
Data Analysis: Metric Subset Selection
  • Contributions
  • The usage of
  • Feature subset selection to identify the most
    important metrics
  • Entropy as a cost function for this purpose
  • Significance
  • The system is viewed as a source of information.
    If we can select metrics based on the amount of
    information they provide, we can narrow down the
    search for sources of performance problems.

48
Data Analysis: ANOVA
Analysis flow as above; current step: ANOVA
49
Data Analysis: ANOVA
  • Analysis of Variance (ANOVA)
  • Cause of variations
  • Null hypothesis
  • Post hoc comparisons
  • After the null hypothesis is rejected (a code sketch
    follows below)

Flow: Raw Data → ANOVA: do the factors cause the variations? →
Post Hoc Comparisons: which level? how? significant differences?
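
A minimal sketch of this step, assuming a single factor with
three hypothetical levels (compiler optimization flags) and
synthetic measurements; pairwise t-tests stand in for a full
post hoc procedure such as Tukey's HSD:

    # One-way ANOVA on a metric measured at three factor levels,
    # followed by pairwise post hoc comparisons.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(6)
    o0 = rng.normal(10.0, 0.5, size=8)   # metric at level "-O0"
    o2 = rng.normal(8.0, 0.5, size=8)    # metric at level "-O2"
    o3 = rng.normal(7.8, 0.5, size=8)    # metric at level "-O3"

    f, p = stats.f_oneway(o0, o2, o3)    # null: all level means equal
    print(f"F = {f:.2f}, p = {p:.4f}")

    if p < 0.05:  # null rejected: ask which levels differ
        pairs = [("O0", o0, "O2", o2), ("O0", o0, "O3", o3),
                 ("O2", o2, "O3", o3)]
        for na, a, nb, b in pairs:
            # In practice apply a correction (e.g., Bonferroni, Tukey).
            t, pp = stats.ttest_ind(a, b)
            print(f"{na} vs {nb}: p = {pp:.4f}")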
50
Data Analysis: ANOVA
  • Results
  • The set of factors affecting the metric values, and
    the values
  • Contribution
  • Use of ANOVA for the analysis of performance metrics
  • Significance
  • ANOVA identifies whether the variations in the
    measurements are due to the random nature of the
    data or to the factors. Incorrect conclusions may
    be reached if personal judgment is used instead.

51
Publications
  • N. G. Santiago, D. T. Rover, and D. Rodriguez, "A
    Statistical Approach for the Analysis of the
    Relation Between Low-Level Performance
    Information, the Code, and the Environment",
    Information: An International Interdisciplinary
    Journal, Vol. 9, No. 3, May 2006, pp. 503-518.
  • N. G. Santiago, D. T. Rover, and D. Rodriguez,
    "Subset Selection of Performance Metrics
    Describing System-Software Interactions", SC2002:
    Supercomputing High Performance Networking and
    Computing 2002, Baltimore, MD, November 16-22,
    2002.
  • N. G. Santiago, D. T. Rover, and D. Rodriguez, "A
    Statistical Approach for the Analysis of the
    Relation Between Low-Level Performance
    Information, the Code, and the Environment", 4th
    Workshop on High Performance Scientific and
    Engineering Computing with Applications
    (HPSECA-02), Proceedings of the International
    Conference on Parallel Processing Workshops,
    Vancouver, British Columbia, Canada, August 18-21,
    2002, pp. 282-289.

52
Future Work
  • Develop a means of providing feedback to the
    scientific programmer
  • Design a knowledge-based system, PR system?
  • Assign classes to performance outcomes and use a
    classifier
  • Compare different entropy estimators for
    performance data evaluation
  • Evaluate other subset selection schemes
  • Compare software versus hardware metrics
  • Compare different architectures and programming
    paradigms

53
Questions?