Transcript and Presenter's Notes

Title: Summary Presentation (3/24/2005)


1
Summary Presentation (3/24/2005)
  • UPC Group
  • HCS Research Laboratory
  • University of Florida

2
Performance Analysis Strategies
3
Methods
  • Three general performance analysis approaches
  • Performance modeling
  • Mostly predictive methods
  • Useful to examine in order to extract important
    performance factors
  • Could also be used in conjunction with
    experimental performance measurement
  • Experimental performance measurement
  • Strategy used by most modern PATs
  • Uses actual event measurement to perform the
    analysis
  • Simulation
  • We will probably not use this approach
  • See Combined Report - Section 4 for details

4
Performance Modeling
5
Purpose and Method
  • Purpose
  • Review of existing performance models and their
    applicability to our project
  • Method
  • Perform literature search on performance modeling
    techniques
  • Categorize the techniques
  • Give weight to methods that are easily adapted to
    UPC SHMEM
  • Also more heavily consider methods that give
    accurate performance predictions or that help the
    user pick between different design strategies
  • See Combined Report - Section 5 for details

6
Performance Modeling Overview
  • Why do performance modeling? Several reasons
  • Grid systems need a way to estimate how long a
    program will take (billing/scheduling issues)
  • Could be used in conjunction with optimization
    methods to suggest improvements to user
  • Also can guide user on what kind of benefit can
    be expected from optimizing aspects of code
  • Figure out how far code is from optimal
    performance
  • Indirectly detect problems: if a section of code
    is not performing as predicted, it probably has
    cache locality problems, etc.
  • Challenge
  • Many models already exist, with varying degrees
    of accuracy and speed
  • Choose the best model to fit into the UPC/SHMEM
    PAT
  • Existing performance models fall into different
    categories
  • Formal models (process algebras, petri nets)
  • General models that provide mental pictures of
    hardware/performance
  • Predictive models that try to estimate timing
    information

7
Formal Performance Models
  • Least useful for our purposes
  • Formal methods are strongly rooted in math
  • Can make strong statements and guarantees
  • However, difficult to adapt and automate for new
    programs
  • Examples include
  • Petri nets (specialized graphs which represent
    processes and systems)
  • Process algebras (formal algebra for specifying
    how parallel processes interact)
  • Queuing theory (strongly rooted in math)
  • PAMELA (C-style language to model concurrency and
    time-related operations)
  • For our purposes, formal models are too abstract
    to be directly useful

8
General Performance Models
  • Provide user with mental picture
  • Rules of thumb for cost of operations
  • Guides strategies used while creating programs
  • Usually analytical in nature
  • Examples include
  • PRAM (classical model, unit cost operations)
  • BSP (breaks execution into communication and
    computation phases)
  • LogP (analytical model of network operations)
  • Many more (see report for details)
  • For our purposes, general models can be useful
  • Created to be easily understood by programmers
  • But may need lots of adaptation (and model
    fitting) to be directly useful; a rough
    cost-estimate sketch follows below
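
As a concrete illustration of how a general model might be used for prediction, below is a minimal sketch (in C) of the classic BSP superstep cost w + g*h + l. The function name, parameter values, and example numbers are ours for illustration only; g and l would have to be fitted to the target machine.

    /* Rough BSP-style cost estimate for one superstep (sketch only).
     * w: max local computation time (s) on any process
     * h: max words sent or received by any process
     * g: fitted per-word communication cost (s/word)
     * l: fitted barrier/synchronization latency (s) */
    #include <stdio.h>

    static double bsp_superstep_cost(double w, double h, double g, double l)
    {
        return w + g * h + l;   /* classic BSP cost model */
    }

    int main(void)
    {
        double g = 2.0e-8, l = 5.0e-6;        /* hypothetical fitted values */
        /* one superstep: 3 ms of computation, 10,000-word exchange */
        printf("predicted superstep time: %g s\n",
               bsp_superstep_cost(3.0e-3, 1.0e4, g, l));
        return 0;
    }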

9
Predictive Performance Models
  • Models that specifically predict performance of
    parallel codes
  • Similar to general models, except meant to be
    used with existing systems
  • Usually a combination of mathematical
    models/equations and very simple simulation
  • Examples include
  • Lost cycles (samples program state to see if
    useful work is being done)
  • Task graphs (algorithm structure represented with
    graphs)
  • Vienna Fortran Compilation System (uses an
    analytical model to parallelize code by examining
    cost of operations)
  • PACE (Geared towards grid applications)
  • Convolution (Snavely's method uses a combination
    of existing tools to predict performance based on
    memory traces and network traces)
  • Many more (see report, section 5 for details)
  • Lost cycles is very promising
  • Provides a very easy way to quantify performance
    scalability
  • Needs extension for greater correlation with
    source code (a minimal state-sampling sketch
    follows below)
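
The following is a minimal sketch of lost-cycles-style state sampling on a POSIX system. The state names, sampling period, and reporting are ours for illustration; the actual Lost Cycles work categorizes parallel overhead more finely than this.

    #include <signal.h>
    #include <stdio.h>
    #include <sys/time.h>

    /* illustrative program states; a real tool would track more */
    enum state { WORKING, WAITING, COMMUNICATING, NUM_STATES };
    static volatile sig_atomic_t current_state = WORKING;
    static volatile long samples[NUM_STATES];

    static void on_prof(int sig)
    {
        (void)sig;
        samples[current_state]++;   /* attribute sample to current state */
    }

    int main(void)
    {
        struct sigaction sa = { .sa_handler = on_prof };
        struct itimerval it = { { 0, 10000 }, { 0, 10000 } };  /* 10 ms */
        sigaction(SIGPROF, &sa, NULL);
        setitimer(ITIMER_PROF, &it, NULL);

        /* instrumented code would update current_state around
           communication/synchronization calls; here we just compute */
        for (volatile long i = 0; i < 200000000L; i++) ;

        for (int s = 0; s < NUM_STATES; s++)
            printf("state %d: %ld samples\n", s, (long)samples[s]);
        return 0;
    }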

10
Experimental Performance Measurement
11
Overview
  • Instrumentation: insertion of instrumentation
    code (in general)
  • Measurement: the actual measuring stage
  • Analysis: filtering, aggregation, and analysis of
    the data gathered
  • Presentation: display of the analyzed data to the
    user. The only phase that deals directly with the
    user
  • Optimization: the process of resolving
    bottlenecks (a minimal measurement sketch follows
    below)
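
To make the measurement stage concrete, here is a minimal sketch of manually instrumenting one code region with a wall-clock timer. The helper function and the placeholder region are ours; a real PAT would insert and manage such probes automatically.

    #include <stdio.h>
    #include <time.h>

    /* wall-clock time in seconds (POSIX clock_gettime) */
    static double now_sec(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec * 1e-9;
    }

    int main(void)
    {
        double t0 = now_sec();

        /* region of interest (placeholder work) */
        double sum = 0.0;
        for (long i = 1; i <= 10000000L; i++)
            sum += 1.0 / i;

        double t1 = now_sec();
        printf("region took %.6f s (result %.6f)\n", t1 - t0, sum);
        return 0;
    }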

12
Profiling/Tracing Methods
13
Purpose and Method
  • Purpose
  • Review on existing profiling and tracing methods
    (instrumentation stage) based on experimental
    performance measurement
  • Evaluate the various methods and their
    applicability to our PAT
  • Method
  • Literature search on profiling and tracing
    (include some review of existing tools)
  • Categorize the methods
  • Evaluate the applicability of each method toward
    design of UPC/SHMEM PAT
  • Quick overview of method and recommendations
    included here
  • See Combined Report - Section 6.1 for complete
    description and recommendations

14
Summary (1)
  • Overhead
  • Manual: amount of work needed from the user
  • Performance: overhead added by the tool to the
    program
  • Profiling / Tracing
  • Profiling: collection of statistical event data.
    Generally refers to filtering and aggregating a
    subset of event data after the program terminates
  • Tracing: used to record as many events as
    possible in logical order (generally with
    timestamps). Can be used to reconstruct accurate
    program behavior. Requires a large amount of
    storage
  • Two ways to lower tracing cost: (1) a compact
    trace file format, (2) a smart tracing system
    that turns tracing on and off (a sketch of a
    compact trace record follows below)
  • Manual vs. automatic: whether the user or the
    tool is responsible for instrumenting the
    original code. Categorizing which events are
    better suited to which method is desirable
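
As an illustration of the "compact trace file format" idea, here is a minimal sketch of a fixed-size binary trace record with a buffered writer. The field layout, names, and file name are ours and do not correspond to any existing tool's trace format.

    #include <stdint.h>
    #include <stdio.h>

    typedef struct {
        uint64_t timestamp_ns;  /* event time */
        uint32_t event_id;      /* e.g., enter/exit of a function */
        uint32_t thread_id;     /* UPC thread / SHMEM PE */
    } trace_record;             /* 16 bytes per event */

    #define TRACE_BUF_EVENTS 4096
    static trace_record buf[TRACE_BUF_EVENTS];
    static size_t buf_used;
    static FILE *trace_file;

    static void trace_flush(void)
    {
        fwrite(buf, sizeof buf[0], buf_used, trace_file);
        buf_used = 0;
    }

    static void trace_event(uint64_t t_ns, uint32_t event, uint32_t thread)
    {
        buf[buf_used++] = (trace_record){ t_ns, event, thread };
        if (buf_used == TRACE_BUF_EVENTS)
            trace_flush();      /* write in large chunks to cut overhead */
    }

    int main(void)
    {
        trace_file = fopen("trace.bin", "wb");
        for (uint32_t i = 0; i < 10; i++)
            trace_event(1000u * i, i % 2, 0);   /* fake enter/exit events */
        trace_flush();
        fclose(trace_file);
        return 0;
    }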

15
Summary (2)
  • Number of passes: the number of times a program
    needs to be executed to get performance data. One
    pass is desirable for long-running programs, but
    multi-pass can provide more accurate data (e.g.,
    first pass profiling, later passes tracing, using
    the profiling data to turn tracing on and off).
    Hybrid methods are available but might not be as
    accurate as multi-pass
  • Levels: need at least source and binary to be
    useful (some events are more suited to the source
    level and others to the binary level)
  • Source level: manual, pre-compiler,
    instrumentation language
  • System level: library or compiler
  • Operating system level
  • Binary level: static or dynamic

16
Performance Factors
17
Purpose and Method
  • Purpose
  • Provide a formal definition of the term
    performance factor
  • Present motivation for calculating performance
    factors
  • Discuss what constitutes a good performance
    factor
  • Introduce a three-step approach to determine if a
    factor is good
  • Method
  • Review and provide a concise summary of the
    literature in the area of performance factors for
    parallel systems
  • See Combined Report - Section 6.2 for more details

18
Features of Good Performance Factors
  • Characteristics of a good performance factor
  • Reliability
  • Repeatability
  • Ease of Measurement
  • Consistency
  • Testing
  • On each platform, determine ease of measurement
  • Determine repeatability
  • Determine reliability and consistency by one of
    the following
  • Modify the factor using real hardware
  • Find justification in the literature
  • Derive the information from performance models

19
Analysis Strategies
20
Purpose and Method
  • Purpose
  • Review of existing analysis and bottleneck
    detection methods
  • Method
  • Literature search on existing analysis strategies
  • Categorize the strategies
  • Examine methods that are applied before, during,
    or after execution
  • Weight post-mortem and runtime analysis (the most
    useful for a PAT)
  • Evaluate the applicability of each method toward
    design of UPC/SHMEM PAT
  • See Analysis Strategies report for details

21
Analysis Methods
  • Performance analysis methods
  • The why of performance tools
  • Make sense of data collected from tracing or
    profiling
  • Classically performed after trace collection and
    before visualization
  • But, some strategies choose to do it at other
    times and in different ways
  • Bottleneck detection
  • Another form of analysis!
  • Bottleneck detection methods are also shown in
    this report
  • Optimizations are also closely related, but are
    discussed in the Combined Report - Section 6.5

22
When/How to Perform Analysis
  • Can be done at different times
  • Post-mortem: after a program runs
  • Usually performed in conjunction with tracing
  • During runtime: must be quick, but can guide data
    collection
  • Beforehand: work on abstract syntax trees from
    parsing source code
  • But hard to know what will happen at runtime!
  • Only one existing strategy fits in this category
  • Also manual vs. automatic
  • Manual: rely on the user to perform actions
  • e.g., manual post-mortem analysis: look at
    visualizations and manually determine bottlenecks
  • User is clever, but hard to scale this analysis
    technique
  • Semi-automatic: perform some work to make the
    user's job easier
  • e.g., filtering, aggregation, pattern matching
  • Most techniques try to strike a balance
  • Too automated: can miss things (the computer is
    dumb)
  • Too manual: high overhead for the user
  • Can also be used to guide data collection at
    runtime
  • Automatic: no existing systems are really fully
    automatic

23
Post-mortem
  • Manual techniques
  • Types
  • Let the user figure it out based on
    visualizations
  • Data can be very overwhelming!
  • Simulation based on collected data at runtime
  • Traditional analysis techniques (Amdahl's law,
    isoefficiency; a worked Amdahl's law example
    follows below)
  • De-facto standard for most existing tools
  • Tools: Jumpshot, Paraver, VampirTrace, mpiP,
    SvPablo
  • Semi-automated techniques
  • Let the machine do the hard work
  • Types
  • Critical path analysis, phase analysis (IPS-2)
  • Sensitivity analysis (S-Check)
  • Automatic event classification (machine learning)
  • Record overheads and predict the effect of
    removing them (Scal-tool, SCALEA)
  • Knowledge based (Poirot, KAPPA-PI, FINESSE,
    KOJAK/EXPERT)
  • Knowledge representation techniques (ASL, EDL,
    EARL)
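
For reference, a worked example of the Amdahl's law bound mentioned above (the numbers are ours, chosen only for illustration):

    speedup(N) = 1 / ((1 - f) + f / N)

    With parallelizable fraction f = 0.9 and N = 16 processors:
    speedup(16) = 1 / (0.1 + 0.9/16) = 1 / 0.15625 = 6.4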

24
On-line
  • Manual techniques
  • Make the user perform analysis during execution
  • Not a good idea!
  • Too many things going on
  • Semi-automated techniques
  • Try to reduce overhead of full tracing
  • Look at a few metrics at a time
  • Most use dynamic instrumentation
  • Types
  • Paradyn-like approach
  • Start with hypotheses
  • Use refinements based on data collected at
    runtime
  • Paradyn, Peridot (not implemented?), OPAL
    (incremental approach)
  • Lost cycles (sample program state at runtime)
  • Trace file clustering

25
Pre-execution
  • Manual techniques
  • Simulation modeling (FASE approach at UF, etc.)
  • Can be powerful, but
  • Computationally expensive to do for accuracy
  • High user overhead in creating models
  • Semi-automated techniques
  • Hard to analyze a program automatically!
  • One existing system: PPA
  • Parallel program analyzer
  • Works on the source code's abstract syntax tree
  • Requires compiler/parsing support
  • Vaporware?

26
Presentation Methodology
27
Purpose and Method
  • Purpose
  • Discuss visualization concepts
  • Present general approaches for performance
    visualization
  • Summarize a formal user interface evaluation
    technique
  • Discuss the integration of user-feedback into a
    graphical interface
  • Methods
  • Review and provide a concise summary of the
    literature in the area of visualization for
    parallel performance data
  • See Presentation Methodology report for details

28
Summary of Visualizations
  • Animation
    Advantages: adds another dimension to visualizations
    Disadvantages: CPU intensive
    Include in the PAT: Yes
    Used for: various
  • Program Graphs (N-ary tree)
    Advantages: built-in zooming; integration of high- and low-level data
    Disadvantages: difficult to see inter-process data
    Include in the PAT: Maybe
    Used for: comprehensive program visualization
  • Gantt Charts (time histogram / timeline)
    Advantages: ubiquitous; intuitive
    Disadvantages: not as applicable to shared memory as to message passing
    Include in the PAT: Yes
    Used for: communication graphs
  • Data Access Displays (2D array)
    Advantages: provide detailed information regarding the dynamics of shared data
    Disadvantages: narrow focus; users may not be familiar with this type of visualization
    Include in the PAT: Maybe
    Used for: data structure visualization
  • Kiviat Diagrams
    Advantages: provide an easy way to represent statistical data
    Disadvantages: can be difficult to understand
    Include in the PAT: Maybe
    Used for: various statistical data (processor utilization, cache miss rates, etc.)
  • Event Graph Displays (timeline)
    Advantages: can be used to display multiple data types (event-based)
    Disadvantages: mostly provide only high-level information
    Include in the PAT: Maybe
    Used for: inter-process dependency
29
Evaluation of User Interfaces
  • General Guidelines
  • Visualization should guide, not rationalize
  • Scalability is crucial
  • Color should inform, not entertain
  • Visualization should be interactive
  • Visualizations should provide meaningful labels
  • Default visualization should provide useful
    information
  • Avoid showing too much detail
  • Visualization controls should be simple
  • GOMS
  • Goals, Operators, Methods, and Selection Rules
  • Formal user interface evaluation technique
  • A way to characterize a set of design decisions
    from the point of view of the user
  • A description of what the user must learn may be
    the basis for reference documentation
  • The knowledge is described in a form that can
    actually be executed (there have been several
    fairly successful attempts to implement GOMS
    analysis in software, e.g., GLEAN)
  • There are various incarnations of GOMS with
    different assumptions, useful for more specific
    analyses (KLM, CMN-GOMS, NGOMSL, CPM-GOMS, etc.;
    a small KLM-style example follows below)
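
As a rough illustration of a KLM-style (keystroke-level) estimate, using commonly cited approximate operator times (values vary across the literature: roughly K = 0.2 s per keystroke or click, P = 1.1 s to point with the mouse, H = 0.4 s to home the hands, M = 1.35 s to mentally prepare), opening a visualization via two menu selections might be estimated as:

    H + M + P + K + M + P + K
      = 0.4 + 1.35 + 1.1 + 0.2 + 1.35 + 1.1 + 0.2
      = 5.7 s (approximately)

The task, operator sequence, and total are hypothetical; the point is only that GOMS/KLM lets interface alternatives be compared numerically before user testing.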

30
Conclusion
  • Plan for development
  • Develop a preliminary interface that provides the
    functionality required by the user while
    conforming to visualization guidelines presented
    previously
  • After the preliminary design is complete, elicit
    user feedback
  • During periods where user contact is unavailable,
    we may be able to use GOMS analysis or another
    formal interface evaluation technique

31
Usability
32
Purpose and Method
  • Purpose
  • Provide a discussion on the factors influencing
    the usability of performance tools
  • Outline how to incorporate user-centered design
    into the PAT
  • Discuss common problems seen in performance tools
  • Present solutions to these problems
  • Method
  • Review and provide a concise summary of the
    literature in the area of usability for parallel
    performance tools
  • See Combined Report - Section 6.4.1 for complete
    description and reasons behind inclusion of
    various criteria

33
Usability Factors
  • Ease-of-learning
  • Discussion
  • Important for attracting new users
  • A tool's interface shapes the user's
    understanding of its functionality
  • Inconsistency leads to confusion
  • Example: providing defaults for some objects but
    not all
  • Conclusions
  • We should strive for an internally and externally
    consistent tool
  • Stick to established conventions
  • Provide as uniform an interface as possible
  • Target as many platforms as possible so the user
    can amortize the time invested over many uses
  • Ease-of-use
  • Discussion
  • Amount of effort required to accomplish work with
    the tool
  • Conclusions
  • Don't force the user to memorize information
    about the interface. Use menus, mnemonics, and
    other mechanisms
  • Provide a simple interface
  • Make all user-required actions concrete and
    logical
  • Usefulness

34
User-Centered Design
  • General Principles
  • Usability will be achieved only if the software
    design process is user-driven
  • Understand the target users
  • Usability should be the driving factor in tool
    design
  • Four-step model to incorporate user feedback
    (Chronological)
  • Ensure initial functionality is based on user
    needs
  • Solicit input directly from the user
  • MPI users
  • UPC/SHMEM users
  • Meta-user
  • We can't just go by what we think is useful
  • Analyze how users identify and correct
    performance problems
  • UPC/SHMEM users primarily
  • Gain a better idea of how the tool will actually
    be used on real programs
  • Information from users is then presented to the
    meta-user for critique/feedback
  • Develop incrementally
  • Organize the interface so that the most useful
    features are the best supported
  • User evaluation of preliminary/prototype designs
  • Maintain a strong relationship with the users
    with whom we have access

35
UPC/SHMEM Language Analysis
36
Purpose and Method
  • Purpose
  • Determine performance factors purely from the
    language's perspective
  • Correlate performance factors to individual
    UPC/SHMEM constructs
  • Method
  • Come up with a complete and minimal factor list
  • Analyze the UPC and SHMEM (Quadrics and SGI)
    specs
  • Analyze the various implementations
  • Berkeley and Michigan UPC: translated file system
    code
  • HP UPC: pending until the NDA process is
    completed
  • GPSHMEM: based on system code
  • See Language Analysis report for complete details

37
Tool Evaluation Strategy
38
Purpose and Method
  • Purpose
  • Provide the basis for evaluation of existing
    tools
  • Method
  • Literature search on existing evaluation methods
  • Categorize, add, and filter applicable criteria
  • Evaluate the importance of these criteria
  • Summary table of the final 23 criteria
  • See Combined Report - Section 9 for complete
    description and reasons behind inclusion of
    various criteria

39
Each criterion below lists: Feature (report section), Description, Information to gather, Categories, and Importance Rating.
  • Available metrics (9.2.1.3)
    Description: kind of metrics/events the tool can track (e.g., function, hardware, synchronization)
    Information to gather: metrics it can provide (function, hw, etc.)
    Categories: Productivity
    Importance: Critical
  • Cost (9.1.1)
    Description: physical cost of obtaining the software, license, etc.
    Information to gather: how much
    Categories: Miscellaneous
    Importance: Average
  • Documentation quality (9.3.2)
    Description: helpfulness of the documentation in terms of understanding the tool design and its usage (usage more important)
    Information to gather: clear documentation? helpful documentation?
    Categories: Miscellaneous
    Importance: Minor
  • Extendibility (9.3.1)
    Description: ease of (1) adding new metrics, (2) extending to a new language, particularly UPC/SHMEM
    Information to gather: estimate of how easy it is to extend to UPC/SHMEM; how easy it is to add new metrics
    Categories: Miscellaneous
    Importance: Critical
  • Filtering and aggregation (9.2.3.1)
    Description: filtering is the elimination of noise data; aggregation is the combining of data into a single meaningful event
    Information to gather: does it provide filtering? aggregation? to what degree?
    Categories: Productivity, Scalability
    Importance: Critical
  • Hardware support (9.1.4)
    Description: hardware support of the tool
    Information to gather: which platforms?
    Categories: Usability, Portability
    Importance: Critical
  • Heterogeneity support (9.1.5)
    Description: heterogeneity deals with the ability to run the tool in a system where nodes have different HW/SW configurations
    Information to gather: support for running in a heterogeneous environment?
    Categories: Miscellaneous
    Importance: Minor
40
  • Installation (9.1.2)
    Description: ease of installing the tool
    Information to gather: how to get the software; how hard it is to install; components needed; estimated number of hours needed for installation
    Categories: Usability
    Importance: Minor
  • Interoperability (9.2.2.2)
    Description: ease of viewing the tool's results using other tools, using other tools in conjunction with this tool, etc.
    Information to gather: list of other tools that can be used with this one
    Categories: Portability
    Importance: Average
  • Learning curve (9.1.6)
    Description: learning time required to use the tool
    Information to gather: estimated learning time for the basic set of features and for the complete set of features
    Categories: Usability, Productivity
    Importance: Critical
  • Manual overhead (9.2.1.1)
    Description: amount of work needed from the user to instrument their program
    Information to gather: method for manual instrumentation (source code, instrumentation language, etc.); automatic instrumentation support
    Categories: Usability, Productivity
    Importance: Average
  • Measurement accuracy (9.2.2.1)
    Description: accuracy level of the measurements
    Information to gather: evaluation of the measuring method
    Categories: Productivity, Portability
    Importance: Critical
  • Multiple analyses (9.2.3.2)
    Description: amount of post-measurement analysis the tool provides; generally good to have different analyses for the same set of data
    Information to gather: provides multiple analyses? useful analyses?
    Categories: Usability
    Importance: Average
  • Multiple executions (9.3.5)
    Description: tool support for executing multiple programs at once
    Information to gather: supports multiple executions?
    Categories: Productivity
    Importance: Minor to Average
  • Multiple views (9.2.4.1)
    Description: tool's ability to provide different views/presentations of the same set of data
    Information to gather: provides multiple views? intuitive views?
    Categories: Usability, Productivity
    Importance: Critical
41
  • Performance bottleneck identification (9.2.5.1)
    Description: tool's ability to identify performance bottlenecks and to help resolve them
    Information to gather: supports automatic bottleneck identification? how?
    Categories: Productivity
    Importance: Minor to Average
  • Profiling / tracing support (9.2.1.2)
    Description: profiling/tracing method the tool utilizes
    Information to gather: profiling? tracing? trace format; trace strategy; mechanism for turning tracing on and off
    Categories: Productivity, Portability, Scalability
    Importance: Critical
  • Response time (9.2.6)
    Description: amount of time needed before any useful information is fed back to the user after program execution
    Information to gather: how long it takes to get back useful information
    Categories: Productivity
    Importance: Average
  • Searching (9.3.6)
    Description: tool support for searching for a particular event or set of events
    Information to gather: supports data searching?
    Categories: Productivity
    Importance: Minor
  • Software support (9.1.3)
    Description: software support of the tool
    Information to gather: libraries it supports; languages it supports
    Categories: Usability, Productivity
    Importance: Critical
  • Source code correlation (9.2.4.2)
    Description: tool's ability to correlate event data back to the source code
    Information to gather: able to correlate performance data to source code?
    Categories: Usability, Productivity
    Importance: Critical
  • System stability (9.3.3)
    Description: stability of the tool
    Information to gather: crash rate
    Categories: Usability, Productivity
    Importance: Average
  • Technical support (9.3.4)
    Description: responsiveness of the tool developers
    Information to gather: time to get a response from the developers; quality/usefulness of system messages
    Categories: Usability
    Importance: Minor to Average
42
Tool Evaluations
43
Purpose and Method
  • Purpose
  • Evaluation of existing tools
  • Method
  • Pick a set of modern performance tools to
    evaluate
  • Try to pick most popular tools
  • Also pick tools that are innovative in some form
  • For each tool, evaluate and score using the
    standard set of criteria
  • Also
  • Evaluate against a set of programs with known
    bottlenecks to test how well each tool helps
    improve performance
  • Attempt to find out which metrics are recorded by
    a tool and why
  • Tools: TAU, PAPI, Paradyn, MPE/Jumpshot-4, mpiP,
    Vampir/VampirTrace (now Intel cluster tools),
    Dynaprof, KOJAK, SvPablo (in progress),
    MPICL/ParaGraph (in progress)
  • See Tool Evaluation presentations for complete
    evaluation of each tool

44
Instrumentation Methods
  • Instrumentation methodology
  • Most tools use the MPI profiling interface (a
    minimal wrapper sketch is given below)
  • Reduces instrumentation overhead for the user and
    the tool developer
  • We are exploring ways to create and use something
    similar for UPC and SHMEM
  • A few tools use dynamic, binary instrumentation
  • Paradyn and Dynaprof are examples
  • Makes things very easy for user, but very
    complicated for tool developer
  • Tools that rely entirely on manual
    instrumentation can be very frustrating to use!
  • We should avoid this by using existing
    instrumentation libraries and code from other
    projects
  • Instrumentation overhead
  • Most tools achieved less than 20% overhead for
    the default set of instrumentation
  • Seems to be a likely target we should aim for in
    our tool
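
To make the MPI profiling interface concrete, below is a minimal sketch of a PMPI wrapper that times MPI_Send calls (assuming an MPI-3 style const prototype). The counters and the report printed at MPI_Finalize are illustrative; real tools record far more.

    #include <mpi.h>
    #include <stdio.h>

    static double send_time;
    static long   send_calls;

    /* intercepts MPI_Send; the real work is done by PMPI_Send */
    int MPI_Send(const void *buf, int count, MPI_Datatype datatype,
                 int dest, int tag, MPI_Comm comm)
    {
        double t0 = MPI_Wtime();
        int rc = PMPI_Send(buf, count, datatype, dest, tag, comm);
        send_time += MPI_Wtime() - t0;   /* accumulate time in sends */
        send_calls++;
        return rc;
    }

    /* report per-rank totals when the program shuts down */
    int MPI_Finalize(void)
    {
        int rank;
        PMPI_Comm_rank(MPI_COMM_WORLD, &rank);
        fprintf(stderr, "rank %d: %ld MPI_Send calls, %.6f s total\n",
                rank, send_calls, send_time);
        return PMPI_Finalize();
    }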

45
Visualizations
  • Many tools provide one way of looking at things
  • Do one thing, but do it well
  • Can cause problems if performance is hindered due
    to something not being shown
  • Gantt-chart/timeline visualizations most
    prevalent
  • Especially in MPI-specific tools
  • Tools that allow multiple ways of looking at
    things can ease analysis
  • However, too many methods can become confusing
  • Best to use a few visualizations that display
    different information
  • In general, creating good visualizations not
    trivial
  • Some visualizations that look neat aren't
    necessarily useful
  • We should try to export to known formats (Vampir,
    etc) to leverage existing tools and code

46
Bottleneck Detection
  • To test, used PPerfMark
  • Extension of GrindStone benchmark suite for MPI
    applications
  • Contains short (under 100 lines of C code)
    applications with obvious bottlenecks (a
    hypothetical example of such a bottleneck is
    sketched below)
  • Most tools rely on user to pick out bottlenecks
    from visualization
  • This affects scalability of tool as size of
    system increases
  • Notable exceptions: Paradyn, KOJAK
  • In general, most tools fared well
  • The system-time benchmark was the hardest to pick
    out
  • Tools that lack source code correlation also make
    it hard to track down where a bottleneck occurs
  • Best strategy seems to be combination of trace
    visualization and automatic analysis
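
For illustration only (this is not taken from GrindStone or PPerfMark), the sketch below shows the kind of obvious bottleneck such micro-benchmarks contain: one rank does far more work than the others, so the imbalance shows up immediately as barrier wait time in a timeline view.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* rank 0 does 10x the work of everyone else */
        long iters = (rank == 0) ? 200000000L : 20000000L;
        volatile double x = 0.0;
        for (long i = 0; i < iters; i++)
            x += 1e-9;

        double t0 = MPI_Wtime();
        MPI_Barrier(MPI_COMM_WORLD);   /* idle ranks wait here */
        printf("rank %d waited %.3f s at the barrier\n",
               rank, MPI_Wtime() - t0);

        MPI_Finalize();
        return 0;
    }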

47
Conclusions and Status 1
  • Completed tasks
  • Programming practices
  • Mod 2^n inverse, convolution, CAMEL cipher,
    concurrent wave equation, depth-first search
  • Literature searches/preliminary research
  • Experimental performance measurement techniques
  • Language analysis for UPC (spec, Berkeley,
    Michigan) and SHMEM (spec, GPSHMEM, Quadrics
    SHMEM, SGI SHMEM)
  • Optimizations
  • Performance analysis strategies
  • Performance factors
  • Presentation methodologies
  • Performance modeling and prediction
  • Creation of tool evaluation strategy
  • Tool evaluations
  • Paradyn, TAU, PAPI/Perfometer, MPE/Jumpshot,
    Dimemas/Paraver/MPITrace, mpiP, Intel cluster
    tools, Dynaprof, KOJAK

48
Conclusions and Status 2
  • Tasks currently in progress
  • Finish tool evaluations
  • SvPablo and MPICL/ParaGraph
  • Finish up language analysis
  • Waiting on NDAs for HP UPC
  • Also on access to a Cray machine
  • Write tool evaluation and language analysis
    report
  • Creation of high-level PAT design documents
    (starting week of 3/28/2005)
  • Creating a requirements list
  • Generating a specification for each requirement
  • Creating a design plan based on the
    specifications and requirements
  • For more information, see PAT Design Plan on
    project website