1
A Framework for Online Performance Analysis and
Visualization of Large-Scale Parallel Applications
  • Kai Li, Allen D. Malony, Robert Bell, Sameer
    Shende
  • {likai,malony,bertie,sameer}@cs.uoregon.edu
  • Department of Computer and Information Science
  • Computational Science Institute, NeuroInformatics
    Center
  • University of Oregon

2
Outline
  • Problem description
  • Scaling and performance observation
  • Interest in online performance analysis
  • General online performance system architecture
  • Access models
  • Profiling issues and control issues
  • Framework for online performance analysis
  • TAU performance system
  • SCIRun computational and visualization
    environment
  • Experiments
  • Conclusions and future work

3
Problem Description
  • Need for parallel performance observation
  • Instrumentation, measurement, analysis,
    visualization
  • In general, there is the concern for intrusion
  • Seen as a tradeoff with accuracy of performance
    diagnosis
  • Scaling complicates observation and analysis
  • Issues of data size, processing time, and
    presentation
  • Online approaches add capabilities as well as
    problems
  • Performance interaction, but at what cost?
  • Tools for large-scale performance observation
    online
  • Supporting performance system architecture
  • Tool integration, effective usage, and portability

4
Scaling and Performance Observation
  • Consider traditional measurement methods
  • Profiling: summary statistics calculated during
    execution
  • Tracing: time-stamped sequence of execution
    events
  • More parallelism → more performance data overall
  • Performance specific to each thread of execution
  • Possible increase in number of interactions between
    threads
  • Harder to manage the data (memory, transfer,
    storage, ...)
  • More parallelism / performance data → harder
    analysis
  • More time consuming to analyze
  • More difficult to visualize (meaningful displays)
  • Need techniques to address scaling at all levels
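
The difference between the two measurement methods can be sketched in a few lines of Python. This is a minimal illustration, not TAU's implementation: the function names, the synthetic interval data, and the record layout are all assumptions made for the example.

```python
from collections import defaultdict

def profile(intervals):
    """Profiling: reduce events to per-event summary statistics."""
    stats = defaultdict(lambda: {"calls": 0, "time": 0.0})
    for name, start, end in intervals:
        stats[name]["calls"] += 1
        stats[name]["time"] += end - start
    return dict(stats)

def trace(intervals):
    """Tracing: keep a time-stamped enter/exit record for every event."""
    records = []
    for name, start, end in intervals:
        records.append((start, "enter", name))
        records.append((end, "exit", name))
    return records

# Synthetic measurements for one thread: (event name, start, end) in seconds.
intervals = [("compute", 0.0, 1.5), ("MPI_Recv", 1.5, 2.0), ("compute", 2.0, 3.0)]
summary = profile(intervals)
timeline = trace(intervals)
print(len(summary), "profile entries;", len(timeline), "trace records")
```

The profile grows with the number of distinct events, while the trace grows with the number of event occurrences, which is one reason data volume becomes harder to manage as parallelism and run length increase.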

5
Why Complicate Matters with Online Methods?
  • Adds interactivity to performance analysis
    process
  • Opportunity for dynamic performance observation
  • Instrumentation change
  • Measurement change
  • Allows for control of performance data volume
  • Post-mortem analysis may be too late
  • View on status of long running jobs
  • Allow for early termination
  • Computation steering to achieve better results
  • Performance steering to achieve better
    performance
  • Online performance observation may be intrusive

6
Related Ideas
  • Computational steering
  • Falcon (Schwan, Vetter): computational steering
  • Dynamic instrumentation and performance search
  • Paradyn (Miller): online performance bottleneck
    analysis
  • Adaptive control and performance steering
  • Active Harmony (Hollingsworth): auto decision
    control
  • Autopilot (Reed): actuator/sensor performance
    steering
  • Scalable monitoring
  • Peridot (Gerndt): automatic online performance
    analysis
  • MRNet (Miller): multicast reduction for access /
    control
  • Scalable analysis and visualization
  • VNG (Brunst): parallel trace analysis

7
General Online Performance Observation System
8
Models of Performance Data Access (Monitoring)
  • Push Model
  • Producer/consumer style of access and transfer
  • Application decides when/what/how much data to
    send
  • External analysis tools only consume performance
    data
  • Availability of new data is signaled passively or
    actively
  • Pull Model
  • Client/server style of performance data access
    and transfer
  • Application is a performance data server
  • Access decisions are made externally by analysis
    tools
  • Two-way communication is required
  • Push/Pull Models
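
The two access models can be contrasted with a small in-process sketch. The class names `PushSource` and `PullSource` and the queue standing in for the transfer channel are hypothetical. A real system would move data between processes or machines.

```python
import queue

class PushSource:
    """Push model: the application decides when and what to send."""
    def __init__(self, outbox):
        self.outbox = outbox
        self.profile = {}

    def step(self, event, seconds, dump_now):
        self.profile[event] = self.profile.get(event, 0.0) + seconds
        if dump_now:  # application-side decision
            self.outbox.put(dict(self.profile))

class PullSource:
    """Pull model: the application acts as a performance data server."""
    def __init__(self):
        self.profile = {}

    def step(self, event, seconds):
        self.profile[event] = self.profile.get(event, 0.0) + seconds

    def handle_request(self, wanted_events):
        # The external tool chooses which events to read, and when.
        return {e: self.profile[e] for e in wanted_events if e in self.profile}

outbox = queue.Queue()
pusher = PushSource(outbox)
pusher.step("compute", 1.0, dump_now=False)
pusher.step("MPI_Recv", 0.5, dump_now=True)
pushed = outbox.get_nowait()          # the tool only consumes what was sent

puller = PullSource()
puller.step("compute", 1.0)
puller.step("MPI_Recv", 0.5)
pulled = puller.handle_request(["MPI_Recv"])  # the tool asks for a subset
```

In the push case the consumer needs only a one-way channel. In the pull case the request itself must reach the application, which is why two-way communication is required.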

9
Online Profiling Issues
  • Profiles are summary statistics of performance
  • Kept with respect to some unit of parallel
    execution
  • Profiles are distributed across the machine (in
    memory)
  • Must be gathered and delivered to profile
    analysis tool
  • Profile merging must take place (possibly in
    parallel)
  • Consistency checking of profile data
  • Callstack must be updated to generate correct
    profile data
  • Correct communication statistics may require
    completion
  • Event identification (not necessary if event
    names are saved)
  • Sequence of profile samples allows interval
    analysis
  • Interval frequency depends on profile collection
    delay
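
Interval analysis from a sequence of profile samples can be sketched as differencing consecutive cumulative samples. The sample layout (a timestamp plus a dictionary of cumulative seconds per event) and the numbers below are assumptions for illustration only.

```python
def interval_profiles(samples):
    """Return per-interval deltas from a sequence of cumulative profile samples.

    Each sample is (timestamp, {event: cumulative_seconds}).
    """
    intervals = []
    for (t0, prev), (t1, curr) in zip(samples, samples[1:]):
        delta = {event: curr[event] - prev.get(event, 0.0) for event in curr}
        intervals.append((t0, t1, delta))
    return intervals

# Synthetic cumulative samples, one every 10 seconds.
samples = [
    (0.0, {"compute": 0.0, "MPI_Recv": 0.0}),
    (10.0, {"compute": 6.0, "MPI_Recv": 2.0}),
    (20.0, {"compute": 9.0, "MPI_Recv": 8.0}),
]
result = interval_profiles(samples)
for t0, t1, delta in result:
    print(f"[{t0}, {t1}) {delta}")
```

The spacing of the timestamps bounds the resolution of the analysis: intervals cannot be finer than the rate at which profile samples can be collected and delivered.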

10
Performance Control
  • Instrumentation control
  • Dynamic instrumentation
  • Inserts / removes instrumentation at runtime
  • Measurement control
  • Dynamic measurement
  • Enabling / disabling / changing of measurement
    code
  • Dynamic instrumentation or measurement variables
  • Data access control
  • Selection of what performance data to access
  • Control of frequency of access
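
Measurement control, as opposed to instrumentation control, leaves the probes in place and switches the measurement code on or off at runtime. The sketch below shows one way this might look. `MeasurementControl` and the group names are hypothetical and do not correspond to any TAU interface.

```python
import time
from contextlib import contextmanager

class MeasurementControl:
    """Runtime switchboard for which event groups are measured."""
    def __init__(self):
        self.enabled = set()
        self.totals = {}

    def enable(self, group):
        self.enabled.add(group)

    def disable(self, group):
        self.enabled.discard(group)

    @contextmanager
    def timer(self, group, event):
        if group not in self.enabled:
            yield  # probe stays in the code, but nothing is measured
            return
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            self.totals[event] = self.totals.get(event, 0.0) + elapsed

control = MeasurementControl()
control.enable("MPI")
with control.timer("MPI", "MPI_Recv"):
    pass
with control.timer("IO", "write_checkpoint"):  # group never enabled
    pass
control.disable("MPI")
with control.timer("MPI", "MPI_Waitsome"):     # group disabled at runtime
    pass
```

Only `MPI_Recv` ends up in `control.totals`. Disabling a group reduces both intrusion and data volume without removing any instrumentation.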

11
TAU Performance System Framework
  • Tuning and Analysis Utilities (aka Tools Are Us)
  • Performance system framework for scalable
    parallel and distributed high-performance
    computing
  • Targets a general complex system computation
    model
  • nodes / contexts / threads
  • Multi-level system / software / parallelism
  • Measurement and analysis abstraction
  • Integrated toolkit for performance
    instrumentation, measurement, analysis, and
    visualization
  • Portable performance profiling/tracing facility
  • Open software approach

12
TAU Performance System Architecture
[Figure: TAU architecture diagram; surviving labels: Paraver, EPILOG, ParaProf]

13
Online Profile Measurement and Analysis in TAU
  • Standard TAU profiling
  • Per node/context/thread
  • Profile dump routine
  • Context-level
  • Profile file for each thread in context
  • Appends to profile file
  • Selective event dumping
  • Analysis tools access files through shared file
    system
  • Application-level profile access routine
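
The dump-and-read pattern above can be sketched as follows. The file naming, the JSON line format, and the helper names are assumptions for this example and are not TAU's actual dump format. A temporary directory stands in for the shared file system.

```python
import json
import os
import tempfile

def dump_profile(directory, node, context, thread, profile):
    """Append one profile snapshot to the file for this thread."""
    path = os.path.join(directory, f"profile.{node}.{context}.{thread}")
    with open(path, "a") as f:
        f.write(json.dumps(profile) + "\n")
    return path

def read_new_samples(path, already_read):
    """Return snapshots appended since the reader last looked."""
    with open(path) as f:
        lines = f.read().splitlines()
    return [json.loads(line) for line in lines[already_read:]]

shared_dir = tempfile.mkdtemp()  # stand-in for a shared file system
path = dump_profile(shared_dir, 0, 0, 0, {"compute": 1.0})
first = read_new_samples(path, 0)
dump_profile(shared_dir, 0, 0, 0, {"compute": 2.5})
second = read_new_samples(path, len(first))
```

This sketch ignores reader synchronization: a real reader would have to cope with a snapshot that is only partially written at the moment it reads the file.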

14
Online Performance Analysis and Visualization
[Figure: architecture diagram. Surviving labels: Application; TAU Performance System; performance data output; file system; Performance Data Reader; Performance Data Integrator; Performance Analyzer; Performance Visualizer; SCIRun (Univ. of Utah); performance data streams; accumulated samples; sample sequencing; reader synchronization]

15
Profile Sample Data Structure in SCIRun
[Figure: profile sample data structure; surviving labels: node, context, thread]

16
Performance Analysis/Visualization in SCIRun
[Figure: SCIRun program]

17
Uintah Computational Framework (UCF)
  • University of Utah
  • UCF analysis
  • Scheduling
  • MPI library
  • Components
  • 500 processes
  • Use for online and offline visualization
  • Apply SCIRun steering

18
Terrain Performance Visualization
19
Scatterplot Displays
  • Each point coordinate determined by three values
  • MPI_Reduce
  • MPI_Recv
  • MPI_Waitsome
  • Min/max value range
  • Effective for cluster analysis
  • Relation between MPI_Recv and MPI_Waitsome
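
One plausible reading of the scatterplot mapping is that each thread becomes a 3D point whose coordinates are its times in the three chosen events, each scaled by that event's min/max range. The sketch below follows that reading. The function name and the timing values are invented for illustration.

```python
def scatter_points(profiles, events):
    """Map each thread to a 3D point, one axis per event, scaled to [0, 1]."""
    lo = {e: min(p[e] for p in profiles) for e in events}
    hi = {e: max(p[e] for p in profiles) for e in events}
    points = []
    for p in profiles:
        point = tuple(
            (p[e] - lo[e]) / (hi[e] - lo[e]) if hi[e] > lo[e] else 0.0
            for e in events
        )
        points.append(point)
    return points

events = ("MPI_Reduce", "MPI_Recv", "MPI_Waitsome")
# Synthetic per-thread times in seconds, not measured data.
profiles = [
    {"MPI_Reduce": 1.0, "MPI_Recv": 4.0, "MPI_Waitsome": 2.0},
    {"MPI_Reduce": 3.0, "MPI_Recv": 8.0, "MPI_Waitsome": 6.0},
    {"MPI_Reduce": 2.0, "MPI_Recv": 6.0, "MPI_Waitsome": 4.0},
]
points = scatter_points(profiles, events)
```

Threads with similar behavior land close together, which is what makes the display useful for cluster analysis.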

20
Online Uintah Performance Profiling
  • Demonstration of online profiling capability
  • Colliding elastic disks
  • Test material point method (MPM) code
  • Executed on 512 processors of ASCI Blue Pacific at
    LLNL
  • Example 1 (Terrain visualization)
  • Exclusive execution time across event groups
  • Multiple time steps
  • Example 2 (Bargraph visualization)
  • MPI execution time and performance mapping
  • Example 3 (Domain visualization)
  • Task time allocation to patches

21
Example 1 (Event Groups)
22
Example 2 (MPI Performance)
23
Example 3 (Domain-Specific Visualization)
24
ParaProf Framework Architecture
  • Portable, extensible, and scalable tool for
    profile analysis
  • Offer best-of-breed capabilities to performance
    analysts
  • Built as a profile analysis framework for
    extensibility

25
ParaProf Profile Display (VTF)
  • Virtual Testshock Facility (VTF), Caltech, ASCI
    Center
  • Dynamic measurement, online analysis,
    visualization

26
Full Profile Display (SAMRAI)
  • Structured AMR toolkit (SAMRAI), LLNL

512 processes
27
Evaluation of Experimental Approaches
  • Currently only supporting push model
  • File system solution for moving performance data
  • Is this a scalable solution?
  • Robust solution that can leverage
    high-performance I/O
  • May result in high intrusion
  • However, does not require IPC
  • Should be relatively portable
  • Analysis and visualization run only sequentially

28
Possible Improvements
  • Profile merging at context level to reduce number
    of files
  • Merging at node level may require explicit
    processing
  • Concurrent trace merging could also reduce files
  • Hierarchical merge tree
  • Will require explicit processing
  • Could consider IPC transfer
  • MPI (e.g., used in mpiP for profile merging)
  • Create own communicators
  • Sockets or PACX between compute server and
    analyzer
  • Leverage large-scale systems infrastructure
  • Parallel profile analysis
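
A hierarchical merge tree can be sketched as a pairwise reduction over per-thread profiles. This is an illustration under simple assumptions: profiles are dictionaries of seconds per event, merging means summing, and the input list is non-empty. A real merge might instead keep per-thread detail or compute other statistics.

```python
def merge_pair(a, b):
    """Merge two profiles by summing the time recorded for each event."""
    merged = dict(a)
    for event, seconds in b.items():
        merged[event] = merged.get(event, 0.0) + seconds
    return merged

def tree_merge(profiles):
    """Reduce profiles pairwise, level by level; returns (result, levels)."""
    level = list(profiles)
    levels = 0
    while len(level) > 1:
        nxt = [merge_pair(level[i], level[i + 1])
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # odd one out moves up unchanged
        level = nxt
        levels += 1
    return level[0], levels

# Eight synthetic per-thread profiles.
profiles = [{"compute": 1.0, "MPI_Recv": 0.5} for _ in range(8)]
total, depth = tree_merge(profiles)
```

The merges within one level are independent of each other, so they could run concurrently on different nodes, and the number of levels grows with the logarithm of the number of profiles rather than linearly.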

29
Concluding Remarks
  • Interest in online performance monitoring,
    analysis, and visualization for large-scale
    parallel systems
  • Need to intelligently use
  • Benefit from other scalability considerations of
    the system software and system architecture
  • Seen as an extension to the parallel system
    architecture
  • Avoid solutions that have portability
    difficulties
  • In part, this is an engineering problem
  • Need to work with the system configuration you
    have
  • Need to understand if approach is applicable to
    problem
  • Not clear if there is a single solution

30
Future Work
  • Build online support in TAU performance system
  • Extend to support PULL model capabilities
  • Develop hierarchical data access solutions
  • Performance studies of full system
  • Latency analysis
  • Bandwidth analysis
  • Integration with other performance tools
  • System performance monitors
  • ParaProf parallel profile analyzer
  • Development of 3D visualization library
  • Portability focus