Title: Evaluating Performance Information for Mapping Algorithms to Advanced Architectures

Slide 1: Evaluating Performance Information for Mapping Algorithms to Advanced Architectures
- Nayda G. Santiago, PhD, PE
- Electrical and Computer Engineering Department
- University of Puerto Rico, Mayaguez Campus
- Sept 1, 2006
Slide 2: Outline
- Introduction
- Problems
- Methodology
- Objectives
- Previous Work
- Description of Methodology
- Case Study
- Results
- Conclusions
- Future Work
Slide 3: Introduction
- Problem solving on an HPC facility
- Conceptualization
- Instantiation
- Mapping
- Parallel implementation
- Goal
- Can we obtain metrics to characterize what is happening in the HPC system?
- Test a methodology for obtaining information from the HPC system.
- Compare with current results.
Slide 4: Introduction
Mapping process:
Source Code -> Compiler -> Linker -> Executable File -> Running Program
Instrumentation libraries are linked in, and measurement is performed on the running program.
Slide 5: Introduction
- Application Programmer Decisions
- Programming paradigm
- Language
- Compiler
- Libraries
- Advanced architecture
- Programming style
- Algorithms
Slide 6: Problems
- Different factors affect the computer performance of an implementation.
- Information about high-level effects is lost in the mapping process
- Out-of-order execution
- Compiler optimizations
- Complex interactions of parallel code and systems
- Current performance analysis tools are not appealing
Slide 7: Current Tuning Methodology
[Diagram: the programmer's choices (system configuration, programming style, programming paradigm, languages, libraries, algorithms) produce high-level code that runs on the computer system; instrumentation tools yield performance data, which the programmer evaluates using analysis and evaluation tools]
For this evaluation the programmer needs:
- Experience
- Knowledge of the tools
- In-depth knowledge of the computer system
- Understanding of the relations between performance data and code
Result: the burden is on the programmer
Slide 8: New Tuning Methodology
[Diagram: within a problem solving environment, experimentation on the high-level code, computer system, and instrumentation tools produces performance data; statistical data analysis distills the data into information for a knowledge-based system, which gives the programmer suggestions and alternatives for modifying the code]
Slide 9: Proposed Tuning Methodology
Slide 10: Integrative Performance Analysis
- Measurement: low-level information is collected
- Abstraction: low-level information is hidden
- Metrics
- Problem translation: mapping back to the user's view
- System levels
- User's view
- Tools
- High-level language
- Domain factors
Slide 11: Objectives
- Obtain information on the relation between low-level performance information and the factors that affect performance.
- Lessen the burden on the programmer of incorporating experience and knowledge into the tuning process.
- Identify the most important metrics describing system-software interactions.
- Identify how many metrics convey most of the information about the system.
Slide 12: Methodology
Preliminary Problem Analysis -> Design of Experiment -> Data Collection -> Data Analysis
Slide 13: Methodology
[The new tuning methodology diagram from Slide 8 is repeated here]
Slide 14: Preliminary Problem Analysis
Inputs: application, performance goal, potential factors affecting performance
Outputs: evaluation of alternatives, screening experiment, factors for experimentation, understanding of the problem
- Results
- Profiling is useful for preliminary analysis
- Contribution
- Screening is required to limit the number of factors in the experiment
- Feasibility
- Significance
- Due to the large number of factors affecting performance and the long running times, experimentation has not been commonly used for performance data evaluation. Screening makes it feasible.
Slide 15: Design of Experiment (DOE)
Design factors: levels of each factor, response variable, choice of design, order of treatments
- Systematic planning of experiments
- Obtain the most information
- Minimize the effect of extraneous factors
- Causal relations
- Correlational relations
Slide 16: Design of Experiment
- Three basic principles
- Replication
- Estimates the experimental error
- Improves precision
- Randomization
- Independence between observations
- Averages out the effect of extraneous factors
- Blocking
- A block is a set of homogeneous experimental conditions
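The replication and randomization principles can be sketched in a few lines of Python. The factors and levels here (compiler optimization flag, math library) are hypothetical stand-ins, not the factors used in the case study:

```python
import itertools
import random

# Hypothetical factors and levels for illustration only
factors = {
    "opt_level": ["-O0", "-O2", "-O3"],
    "library": ["libA", "libB"],
}
replications = 3

# Full factorial design: every combination of factor levels is a treatment
treatments = list(itertools.product(*factors.values()))

# Replicate each treatment, then randomize the run order so that
# extraneous time-varying effects (system load, etc.) average out
runs = treatments * replications
random.Random(42).shuffle(runs)

print(len(runs))  # 3 x 2 levels, 3 replications each -> 18 runs
```

Each entry of `runs` would then be executed in the listed order, so no treatment is systematically favored by when it happens to run.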
Slide 17: Design of Experiment
- Results
- The appropriate randomization scheme, number of replications, and treatment order for the experimental runs
- Contributions
- Innovative use of DOE for establishing causal relations for application tuning
- The appropriate design of experiment should be selected according to the performance analysis problem
- Significance
- The use of DOE and ANOVA determines the cause of the performance differences in the results
Slide 18: Data Collection
Inputs: executable file, system, instrumentation tool
Outputs: raw data, via sampling and profiles
- Instrumentation
- Dependent on the system
- Observable computing system: the metrics can be observed in the system
Slide 19: Data Collection
- Instrumentation
- Software
- Hardware
- Instrumentation tool setup
- Experimental runs and data collection
Slide 20: Data Collection
Setup: tool configuration, order of runs, crontab file, system -> raw data (metrics)
- Results
- Measurements of the metrics observed from the system
- Particular to this case study
- Between 36 and 52 metrics
Slide 21: Data Analysis
Raw data -> information
- Statistical analysis
- Correlation matrix
- Multidimensional methods
- Dimensionality estimation
- Subset selection
- Entropy cost function
- ANOVA
- Post hoc comparisons
Slide 22: Data Analysis
Pipeline: Raw Data -> Convert Format -> Performance Data Matrix -> Normalize -> (Correlation Matrix, Dimension, Subset Selection, ANOVA with Post Hoc Comparisons) -> Information
Slide 23: Data Analysis: Data Conversion
Raw Data -> Convert Format -> Performance Data Matrix
- Raw data
- Sampling
- Profiling
- Performance data matrix
- Random process
- Average
- Random variable
Slide 24: Data Analysis: Data Conversion
The performance data matrix has entries ma(k,p), where a is abs or avg, k is the experimental run, and p is the metric identification number:

    [ ma(0,0)    ma(0,1)    ...  ma(0,P-1)   ]
    [ ma(1,0)    ma(1,1)    ...  ma(1,P-1)   ]
    [  ...        ...       ...   ...        ]
    [ ma(K-1,0)  ma(K-1,1)  ...  ma(K-1,P-1) ]

The matrix is multidimensional: K runs by P metrics.
Slide 25: Data Analysis: Data Conversion
- Performance data matrix example:

               Metric 0      Metric 1       Metric P-1
    Run 0      ExecTime0     Pgfaults/s0    IdleTime0
    Run 1      ExecTime1     Pgfaults/s1    IdleTime1
    ...
    Run K-1    ExecTimeK-1   Pgfaults/sK-1  IdleTimeK-1
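Building such a K x P matrix from per-run measurements might look like the following sketch. The metric names follow the example on this slide, but the values are made up:

```python
import numpy as np

# Hypothetical per-run measurements (K = 3 runs, P = 3 metrics); real raw
# data would come from sampling or profiling, reduced to one value per run
measured = [
    {"ExecTime": 12.4, "Pgfaults/s": 310.0, "IdleTime": 2.1},
    {"ExecTime": 11.9, "Pgfaults/s": 295.0, "IdleTime": 2.4},
    {"ExecTime": 13.1, "Pgfaults/s": 330.0, "IdleTime": 1.8},
]
metrics = ["ExecTime", "Pgfaults/s", "IdleTime"]

# Row k is experimental run k, column p is metric p
M = np.array([[run[m] for m in metrics] for run in measured])
print(M.shape)  # (3, 3)
```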
Slide 26: Data Analysis: Correlation Study
[Pipeline overview repeated from Slide 22; current step: Correlation Matrix]
Slide 27: Data Analysis: Correlation Study
- Correlation
- A measure of the linear relation among variables
- Not causal
Slide 28: Data Analysis: Correlation Study
Example
Slide 29: Data Analysis: Correlation Study
- Sample correlation formula:

    r_xy = sum_i (x_i - xbar)(y_i - ybar) / ((n - 1) * Sx * Sy)

  where Sx and Sy are the sample estimates of the standard deviations
- Which metrics were most correlated with execution time
- Results of the correlation analysis
- Collinearity
- Software instrumentation
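As a sketch, the sample correlation can be computed directly from this definition and cross-checked against numpy's built-in routine. The measurement values are hypothetical:

```python
import numpy as np

# Hypothetical measurements across 5 runs: execution time and page faults/s
x = np.array([12.4, 11.9, 13.1, 12.7, 11.5])       # ExecTime
y = np.array([310.0, 295.0, 330.0, 322.0, 288.0])  # Pgfaults/s

# Sample correlation: sum of cross-deviations divided by (n - 1) times the
# product of the sample standard deviations Sx and Sy
sx = x.std(ddof=1)
sy = y.std(ddof=1)
r = ((x - x.mean()) * (y - y.mean())).sum() / ((len(x) - 1) * sx * sy)

# Cross-check against numpy's correlation matrix
assert np.isclose(r, np.corrcoef(x, y)[0, 1])
```

A value of r near 1 here would flag the page-fault metric as strongly (linearly) related to execution time, though, as the slides note, correlation alone says nothing about causation.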
Slide 30: Data Analysis: Normalization
[Pipeline overview repeated from Slide 22; current step: Normalize]
Slide 31: Data Analysis: Normalization
- The scales of the metrics vary widely, so the performance data matrix is normalized.
- Log normalization: na(k,p) = log(ma(k,p))
- Min-max normalization: na(k,p) = (ma(k,p) - min(m_p)) / (max(m_p) - min(m_p))
- Dimension normalization: na(k,p) = ma(k,p) / EuclNorm(m_p)

Here m_p denotes column p of the performance data matrix, i.e., metric p across all runs.
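The three normalization schemes can be sketched as column-wise operations on a toy performance data matrix (the values are hypothetical):

```python
import numpy as np

# Toy performance data matrix: rows are runs, columns are metrics whose
# scales differ widely (seconds vs. events per second)
M = np.array([[12.4, 310.0],
              [11.9, 295.0],
              [13.1, 330.0]])

# Log normalization: compress wide dynamic ranges
N_log = np.log(M)

# Min-max normalization: rescale each metric (column) to [0, 1]
N_minmax = (M - M.min(axis=0)) / (M.max(axis=0) - M.min(axis=0))

# Euclidean-norm ("dimension") normalization: divide each column by its 2-norm
N_eucl = M / np.linalg.norm(M, axis=0)
```

After Euclidean normalization every column has unit length, so no single metric dominates distance-based methods simply because of its units.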
Slide 32: Data Analysis: Normalization
- Normalization evaluation
- Artificially assign classes to the data set
- Long execution time
- Short execution time
- Use a visual separability criterion
- Principal Component Analysis (PCA)
- Project the data along the principal components
- Visualize the separation of the data
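A minimal sketch of this PCA-based evaluation, using random stand-in data: center the matrix, take the SVD, and project each run onto the first two principal components (plotting those two coordinates per run is what makes class separation visible):

```python
import numpy as np

# Stand-in for a normalized performance data matrix: 6 runs x 4 metrics
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
X = X - X.mean(axis=0)            # center the data before PCA

# Principal components come from the SVD of the centered matrix
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Project each run onto the first two principal components
scores = X @ Vt[:2].T
print(scores.shape)  # (6, 2)
```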
Slides 33-38: Data Analysis: Normalization
[PCA projection plots of the data set: not normalized (Slides 33-34), min-max normalized to the range (0,1) (Slides 35-36), and normalized with the Euclidean norm (Slides 37-38)]
Slide 39: Data Analysis: Normalization
- Results
- Appropriate normalization scheme: Euclidean normalization
- Contribution
- Usage of normalization schemes for performance data
- Significance
- Differences in scale can bias some statistical methods. With normalization, the results obtained reflect the true nature of the problem rather than scale variations.
Slide 40: Data Analysis: Dimension Estimation
[Pipeline overview repeated from Slide 22; current step: Dimension]
Slide 41: Data Analysis: Dimension Estimation
- Dimensionality estimation: how many metrics will explain the system's behavior? (reduce P metrics to K metrics, with K << P)
- Scree test
- Plot of the eigenvalues of the correlation matrix
- Cumulative percentage of total variation
- Keep the components explaining most of the variance of the data
- Kaiser-Guttman
- Keep the eigenvalues of the correlation matrix greater than one
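These estimates can be sketched as follows. The data are synthetic, constructed so that a couple of latent sources drive most of the metrics; the 90% variance threshold is an illustrative choice, not the one used in the study:

```python
import numpy as np

# Synthetic metrics: 40 runs x 6 metrics, where two latent sources drive
# four of the columns and two columns are independent noise
rng = np.random.default_rng(1)
base = rng.normal(size=(40, 2))
X = np.hstack([base,
               base + 0.05 * rng.normal(size=(40, 2)),
               rng.normal(size=(40, 2))])

# Eigenvalues of the metrics' correlation matrix, largest first
# (plotting these in order is the scree test)
R = np.corrcoef(X, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]

# Kaiser-Guttman: keep components whose eigenvalue exceeds one
kaiser_dim = int((eigvals > 1.0).sum())

# Cumulative percentage of total variation: keep enough components to
# explain, say, 90% of the variance
cumvar = np.cumsum(eigvals) / eigvals.sum()
cpv_dim = int(np.searchsorted(cumvar, 0.90) + 1)

print(kaiser_dim, cpv_dim)
```

Because the correlation matrix has ones on its diagonal, the eigenvalues always sum to the number of metrics, which is why "eigenvalue greater than one" means "explains more than an average metric's share".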
Slide 42: Data Analysis: Dimension Estimation
Slide 43: Data Analysis: Dimension Estimation
- Results
- Dimension reduction to approximately 18% of the original size
- All three methods give similar results
- Contribution
- Estimation of the dimension of performance data sets
- Significance
- Provides the minimum set of metrics that contains most of the information needed to evaluate the system
Slide 44: Data Analysis: Metric Subset Selection
[Pipeline overview repeated from Slide 22; current step: Subset Selection]
Slide 45: Data Analysis: Metric Subset Selection
- Subset selection
- Sequential forward search
- Entropy cost function:

    E = - sum over pairs (i,j) of [ S_ij log(S_ij) + (1 - S_ij) log(1 - S_ij) ]

  where S_ij is the similarity value of two instances
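A sketch of sequential forward search with a similarity-based entropy cost; the distance-to-similarity scaling used here is an assumption for illustration, and the exact similarity measure in the original work may differ:

```python
import numpy as np

def entropy_cost(X):
    """Entropy of a data set from pairwise instance similarity: a data set
    whose pairs are clearly similar or clearly dissimilar has low entropy."""
    # Pairwise Euclidean distances between runs
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Scale chosen so the average pair has similarity 0.5 (an assumption)
    alpha = -np.log(0.5) / D.mean()
    S = np.exp(-alpha * D)
    iu = np.triu_indices(len(X), k=1)          # each unordered pair once
    s = np.clip(S[iu], 1e-12, 1 - 1e-12)
    return float(-(s * np.log(s) + (1 - s) * np.log(1 - s)).sum())

def forward_search(X, k):
    """Sequential forward search: greedily add the metric (column) whose
    inclusion yields the lowest entropy."""
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < k:
        best = min(remaining, key=lambda j: entropy_cost(X[:, selected + [j]]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy data: 10 runs x 5 metrics
rng = np.random.default_rng(2)
X = rng.normal(size=(10, 5))
picked = forward_search(X, 2)
print(picked)
```

The greedy search evaluates only O(k * P) subsets instead of all 2^P, which is what makes subset selection tractable for dozens of metrics.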
Slide 46: Data Analysis: Metric Subset Selection
- Results
- Establishment of the most important metrics for the case study:
- Experiment 1: paging activity
- Experiment 2: memory faults
- Experiment 3: buffer activity
- Experiment 4: a mix of metrics
Slide 47: Data Analysis: Metric Subset Selection
- Contributions
- Usage of feature subset selection to identify the most important metrics
- Usage of entropy as a cost function for this purpose
- Significance
- The system is viewed as a source of information. If we can select metrics based on the amount of information they provide, we can narrow down the search for sources of performance problems.
Slide 48: Data Analysis: ANOVA
[Pipeline overview repeated from Slide 22; current step: ANOVA and Post Hoc Comparisons]
Slide 49: Data Analysis: ANOVA
- Analysis of Variance (ANOVA)
- Identifies the cause of the variations
- Null hypothesis
- Post hoc comparisons
- Performed after the null hypothesis is rejected
Flow: Raw Data -> ANOVA (do the factors cause the variations?) -> Post Hoc Comparisons (which level? how? are the differences significant?)
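This flow can be sketched with scipy. The execution times and the Bonferroni-corrected t-tests below are illustrative assumptions, not the study's actual data or post hoc scheme:

```python
from scipy import stats

# Hypothetical execution times (s) for three levels of one factor
# (compiler optimization), five replicated runs per level
o0 = [14.1, 13.8, 14.4, 14.0, 13.9]
o2 = [11.2, 11.5, 11.1, 11.4, 11.3]
o3 = [11.0, 11.3, 10.9, 11.2, 11.1]

# One-way ANOVA: the null hypothesis is that all level means are equal,
# i.e. the factor does not cause the variations
f_stat, p_value = stats.f_oneway(o0, o2, o3)

# Once the null hypothesis is rejected, post hoc pairwise comparisons
# show which levels differ (Bonferroni-corrected t-tests as one option)
if p_value < 0.05:
    pairs = [("O0", "O2", o0, o2), ("O0", "O3", o0, o3), ("O2", "O3", o2, o3)]
    for name_a, name_b, a, b in pairs:
        t, p = stats.ttest_ind(a, b)
        print(name_a, "vs", name_b, "significant:", p * len(pairs) < 0.05)
```

The point of the F-test is exactly the one on the slide: it separates variation caused by the factor from the run-to-run noise, so the decision does not rest on personal judgment.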
Slide 50: Data Analysis: ANOVA
- Results
- The set of factors affecting the metric values, and their values
- Contribution
- Use of ANOVA for the analysis of performance metrics
- Significance
- ANOVA identifies whether the variations in the measurements are due to the random nature of the data or to the factors. Incorrect conclusions may be reached if personal judgment alone is used.
Slide 51: Publications
- N. G. Santiago, D. T. Rover, and D. Rodriguez, "A Statistical Approach for the Analysis of the Relation Between Low-Level Performance Information, the Code, and the Environment", Information: An International Interdisciplinary Journal, Vol. 9, No. 3, May 2006, pp. 503-518.
- N. G. Santiago, D. T. Rover, and D. Rodriguez, "Subset Selection of Performance Metrics Describing System-Software Interactions", SC2002: Supercomputing, High Performance Networking and Computing 2002, Baltimore, MD, November 16-22, 2002.
- N. G. Santiago, D. T. Rover, and D. Rodriguez, "A Statistical Approach for the Analysis of the Relation Between Low-Level Performance Information, the Code, and the Environment", The 4th Workshop on High Performance Scientific and Engineering Computing with Applications (HPSECA-02), Proceedings of the International Conference on Parallel Processing Workshops, August 18-21, 2002, Vancouver, British Columbia, Canada, pp. 282-289.
Slide 52: Future Work
- Develop a means of providing feedback to the scientific programmer
- Design a knowledge-based system (PR system?)
- Assign classes to performance outcomes and use a classifier
- Compare different entropy estimators for performance data evaluation
- Evaluate other subset selection schemes
- Compare software versus hardware metrics
- Compare different architectures and programming paradigms
Slide 53: Questions?