Title: Design of a Test Suite for Automatic Performance Analysis Tools


1
Design of a Test Suite for (Automatic) Performance Analysis Tools
  • Bernd Mohr
  • Forschungszentrum Jülich
  • NIC
  • Germany
  • b.mohr_at_fz-juelich.de

  • Jesper Larsson Träff
  • NEC Europe Ltd.
  • C&C Research Labs
  • Germany
  • traff_at_ccrl-nece.de

  • Michael Gerndt
  • Tech. Universität München
  • LRR
  • Germany
  • gerndt_at_in.tum.de
2
APART Terminology
  • Performance Property
      • Aspect of performance behavior of an application
      • E.g., communication dominated by waiting time
      • Specified as a condition referring to performance data
      • Quantified and normalized in terms of a
        behavior-independent metric (severity)
  • Performance Problem
      • Performance property with negative implications
  • Performance Bottleneck
      • Performance problem with the highest severity
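
The three terms can be illustrated with a small C sketch. This is not ATS or APART code: the type perf_property_t, the functions is_problem and find_bottleneck, and the threshold parameter are hypothetical, and the sketch assumes that severity is a normalized value in [0, 1] and that "negative implications" can be approximated by a severity above a chosen threshold.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record for one evaluated performance property. */
typedef struct {
    const char *name;     /* e.g. "late_sender" */
    int         holds;    /* 1 if the condition on the performance data holds */
    double      severity; /* normalized metric, assumed to lie in [0, 1] */
} perf_property_t;

/* Assumption: a property counts as a problem if its condition holds
   and its severity exceeds the given threshold. */
int is_problem(const perf_property_t *p, double threshold) {
    assert(p != NULL);
    return p->holds && p->severity > threshold;
}

/* The bottleneck is the problem with the highest severity.
   Returns NULL if no property qualifies as a problem. */
const perf_property_t *find_bottleneck(const perf_property_t *props,
                                       size_t n, double threshold) {
    const perf_property_t *best = NULL;
    for (size_t i = 0; i < n; ++i) {
        if (is_problem(&props[i], threshold) &&
            (best == NULL || props[i].severity > best->severity)) {
            best = &props[i];
        }
    }
    return best;
}
```

A tool following this scheme would evaluate every property, keep those that are problems, and report the one returned by find_bottleneck first.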

3
Example Performance Property: Message in Wrong Order
[Figure: time-line diagram with axes Location and Time, showing processes A and B with SEND, SEND, RECV, and wait events]
4
The APART Test Suite (ATS)
  • Users rely on tools working correctly
      • Tools need to be especially well tested
      • Systematic approach needed
  • APART Test Suite
      • Common project inside APART group
      • Every member needs this → minimize resources
      • Ensures re-usability
      • Will also allow evaluation / comparison of the
        different member projects
  • Main focus: automatic performance analysis tools
      • But also useful for regular performance tools
  • http://www.fz-juelich.de/apart/ats/

5
Desired Functionality
  • Tests to verify that the semantics of the
    original program were not altered
  • Tests to see whether the recorded performance
    data is correct
  • Synthetic positive test cases for each known and
    defined performance property and combinations of
    them
  • Negative test cases which have no known
    performance problem
  • Real-world-size parallel applications and
    benchmarks
  • Can be partially based on existing validation
    suites → WWW
  • Probably needs to be tool specific
  • Collect available benchmarks and applications →
    WWW
  • Design and implementation of an ATS framework

6
Validation Suites and Kernel Benchmarks (I)
  • Validation
      • MPI test / validation suites from Intel, IBM, ANL
      • http://www-unix.mcs.anl.gov/mpi/mpi-test/tsuite.html
  • MPI Benchmarks
      • PARKBENCH (PARallel Kernels and BENCHmarks)
      • http://www.netlib.org/parkbench/
      • PMB - Pallas MPI Benchmarks
      • http://www.pallas.com/e/products/pmb/
      • SKaMPI (Special Karlsruher MPI Benchmark)
      • http://liinwww.ira.uka.de/skampi/

7
Kernel Benchmarks (II)
  • OpenMP Benchmarks
      • EPCC OpenMP Microbenchmarks
      • http://www.epcc.ed.ac.uk/research/openmpbench/openmp_index.html
  • Hybrid Benchmarks
      • The Los Alamos MicroBenchmarks Suite (LAMB)
      • MPI and multithreading (Pthreads and OpenMP)
        programming models, based on SKaMPI and EPCC

8
Real World Applications and Benchmarks
  • The NAS Parallel Benchmarks (NPB)
      • http://www.nas.nasa.gov/Software/NPB/
  • The ASCI Purple and Blue Benchmark Codes
      • http://www.llnl.gov/asci/purple/benchmarks/limited/code_list.html
      • asci_benchmarks/asci/asci_code_list.html
  • NCAR Benchmarks
      • http://www.scd.ucar.edu/css/software/bench/

9
Current Design of ATS Framework
10
The Distribution Module
  • Distribution specified by
      • Distribution function
      • Distribution parameters
  • All distribution functions have the same signature
      • double distr_func(int me, int size, double sf,
        distr_t dd)
      • me, size: member me of a group of size size
      • sf: scaling factor
      • dd: distribution parameter descriptor
      • Returns the value for me, calculated from me,
        size, and dd, and scaled by sf
  • ATS provides a set of predefined distribution
    functions
      • Can easily be extended if needed
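
A minimal sketch of what distribution functions with this signature might look like. The slides name distr_t, distr_func_t, val2_distr_t, and df_cyclic2 but do not show their definitions, so the typedefs below are assumptions, val1_distr_t, df_same, and df_block2 are hypothetical names, and the exact meaning of "cyclic2" (alternating low/high by even/odd member) and "block2" (low for the first half, high for the second) is a guess.

```c
#include <assert.h>

/* Assumed definitions; the slides only show the names. */
typedef void *distr_t;
typedef double (*distr_func_t)(int me, int size, double sf, distr_t dd);

typedef struct { double val; } val1_distr_t;              /* hypothetical */
typedef struct { double low; double high; } val2_distr_t; /* fields as used in late_sender */

/* "same": every member of the group gets the same value. */
double df_same(int me, int size, double sf, distr_t dd) {
    const val1_distr_t *d = (const val1_distr_t *)dd;
    assert(size > 0 && me >= 0 && me < size);
    return d->val * sf;
}

/* "cyclic2" (assumed semantics): even members get low, odd members high. */
double df_cyclic2(int me, int size, double sf, distr_t dd) {
    const val2_distr_t *d = (const val2_distr_t *)dd;
    assert(size > 0 && me >= 0 && me < size);
    return (me % 2 == 0 ? d->low : d->high) * sf;
}

/* "block2" (assumed semantics): first half gets low, second half high. */
double df_block2(int me, int size, double sf, distr_t dd) {
    const val2_distr_t *d = (const val2_distr_t *)dd;
    assert(size > 0 && me >= 0 && me < size);
    return (me < size / 2 ? d->low : d->high) * sf;
}
```

Because all functions share one signature, a property function can take any of them through a distr_func_t pointer together with the matching descriptor, which is what makes the set easy to extend.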

11
Predefined Distribution Functions
[Figure: plots of the predefined distribution functions (axis label: n): same, linear, peak, block2, cyclic2, block3, cyclic3]
12
Current Design of ATS Framework
13
Example: MPI Property Function late_sender
void par_do_mpi_work(distr_func_t df, distr_t dd, MPI_Comm c) {
  int me, sz;
  MPI_Comm_rank(c, &me);
  MPI_Comm_size(c, &sz);
  do_work(df(me, sz, 1.0, dd));
}

void late_sender(double bwork, double ework, int r, MPI_Comm c) {
  val2_distr_t dd;
  int i;
  mpi_buf_t* buf = alloc_mpi_buf(base_type, base_cnt);
  dd.low = bwork + ework;
  dd.high = bwork;
  for (i = 0; i < r; ++i) {
    par_do_mpi_work(df_cyclic2, &dd, c);
    mpi_commpattern_sendrecv(buf, DIR_UP, 0, 0, c);
  }
  free_mpi_buf(buf);
}
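
The property function calls do_work but the slide does not show it. The following is a hypothetical stand-in, not the ATS implementation: it assumes the argument is an amount of work expressed in seconds of processor time and simply keeps the CPU busy for that long.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical stand-in for do_work(): busy-loop until roughly `secs`
   seconds of processor time have been consumed by this process. */
void do_work(double secs) {
    assert(secs >= 0.0);
    clock_t start = clock();
    volatile double sink = 0.0; /* volatile so the loop is not optimized away */
    while ((double)(clock() - start) / CLOCKS_PER_SEC < secs) {
        sink += 1.0;
    }
}
```

With a work routine of this kind, giving some ranks bwork + ework and the others bwork makes the first group reach the communication call later by about ework per repetition, which is the delay the late_sender property is meant to provoke.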
14
Currently Implemented Performance Property
Functions
  • MPI Point-to-Point Communication Performance
    Properties
      • late_sender(basework, extrawork, rf, MPI_Comm)
      • late_receiver(basework, extrawork, rf, MPI_Comm)
  • MPI Collective Communication Performance
    Properties
      • imbalance_at_mpi_barrier(distr_func, distr_param,
        rf, MPI_Comm)
      • imbalance_at_mpi_alltoall(distr_func,
        distr_param, rf, MPI_Comm)
      • late_broadcast(basework, rootextrawork, root, rf,
        MPI_Comm)
      • late_scatter(basework, rootextrawork, root, rf,
        MPI_Comm)
      • late_scatterv(basework, rootextrawork, root, rf,
        MPI_Comm)
      • early_reduce(rootwork, baseextrawork, root, rf,
        MPI_Comm)
      • early_gather(rootwork, baseextrawork, root, rf,
        MPI_Comm)
      • early_gatherv(rootwork, baseextrawork, root, rf,
        MPI_Comm)

15
Currently Implemented Performance Property
Functions
  • OpenMP Imbalance Performance Properties
      • imbalance_in_parallel_region(distr_func,
        distr_param, rf)
      • imbalance_at_barrier(distr_func, distr_param,
        rf)
      • imbalance_in_parallel_loop(distr_func,
        distr_param, rf)
      • imbalance_in_parallel_loop_nowait(distr_func,
        distr_param, rf)
      • imbalance_in_parallel_section(distr_func,
        distr_param, rf)
      • imbalance_due_to_uneven_section_distribution(workload,
        rf)
      • imbalance_due_to_not_enough_section(workload,
        rf)
      • imbalance_in_ordered_loop(distr_func,
        distr_param, factor, rf)
  • OpenMP Unparallelized Performance Properties
      • unparallelized_in_master_region(workload, rf)
      • unparallelized_in_single_region(workload, rf)
      • unparallelized_in_ordered_loop(workload, factor,
        rf)

16
Currently Implemented Performance Property
Functions
  • OpenMP Synchronization Performance Properties
      • critical_section_locking(rf)
      • critical_section_contention(factor, rf)
      • serialization_due_to_critical_section(factor,
        workload, rf)
      • frequent_atomic(factor, rf)
      • setting_lock(rf)
      • lock_testing(workload, rf)
      • lock_waiting(workload, rf)
      • all_threads_lock_contention(factor, rf)
      • pairwise_lock_contention(factor, rf)

17
Currently Implemented Performance Property
Functions
  • OpenMP Parallel Overhead Performance Properties
      • dynamic_scheduling_overhead(factor, rf)
      • scheduling_overhead_in_parallelized_inner_loop(bigfactor,
        smallfactor, rf)
      • insufficient_work_in_parallel_loop(factor, rf)
      • firstprivate_initialization(arraysize, rf)
      • lastprivate_overhead(factor, rf)
      • reduction_handling(factor, rf)
      • false_sharing_in_parallel_region(factor, rf)

18
Current Design of ATS Framework
19
Performance Property Test Programs
  • Single performance property testing
      • Programs can be generated automatically from the
        performance property function signature
      • Generator based on Program Database Toolkit (PDT)
      • http://www.cs.uoregon.edu/research/paracomp/pdtoolkit/
      • Property parameters become test program arguments
      • More extensive tests through scripting languages
        or an experiment management system (e.g., Zenturio)
      • http://www.par.univie.ac.at/project/zenturio/
  • Composite performance property testing
      • Program containing multiple performance property
        functions
      • Complexity only limited by imagination
      • Currently manually implemented

20
Example: Single Performance Property Test Program
#include <stdio.h>
#include <stdlib.h>
#include "mpi_pattern.h"

int main(int argc, char *argv[]) {
  distr_func_t df = atodf("b2:0.5:1.0");
  distr_t dd = atodd("b2:0.5:1.0");
  int r = 1;
  char *basecomm = "d20000";

  MPI_Init(&argc, &argv);
  /* The cases fall through on purpose: each extra argument
     overrides one more default. */
  switch ( argc ) {
    case 4: basecomm = argv[3];
    case 3: r = atoi(argv[2]);
    case 2: df = atodf(argv[1]); dd = atodd(argv[1]);
    case 1: break;
    default: fprintf(stderr, "usage: %s <distf> <rfac> <bcomm>\n",
                     argv[0]);
             break;
  }
  set_base_comm_a(basecomm);
  imbalance_at_mpi_barrier(df, dd, r, MPI_COMM_WORLD);
  MPI_Finalize();
  return 0;
}
21
Example: Single Performance Property Test Program
  • imbalance_at_mpi_barrier <distribution-spec>
    <repetition-factor>
  • b2:0.5:1.0 2
    b2:0.1:2.0 5
  • Problem: the additional property "MPI
    Setup/Termination Overhead" also holds!
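
The test program turns a distribution spec string into a distribution function and descriptor through atodf and atodd, whose implementations are not shown. As a rough illustration only, the sketch below parses a spec under the assumption that it has the colon-separated form name:low:high (for example b2:0.5:1.0). The type distr_spec_t and the function parse_distr_spec are hypothetical and are not part of ATS.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical parsed form of a spec such as "b2:0.5:1.0". */
typedef struct {
    char   name[16]; /* distribution name, e.g. "b2" */
    double low;
    double high;
} distr_spec_t;

/* Parse "<name>:<low>:<high>".
   Returns 1 on success and 0 if the string is malformed. */
int parse_distr_spec(const char *spec, distr_spec_t *out) {
    char tail; /* only set if there is trailing garbage after <high> */
    assert(spec != NULL && out != NULL);
    int n = sscanf(spec, "%15[^:]:%lf:%lf%c",
                   out->name, &out->low, &out->high, &tail);
    return n == 3; /* exactly three fields, nothing after them */
}
```

A real atodf would then map the parsed name to a function pointer, and atodd would allocate and fill the matching descriptor; both steps are left out here because the slides do not describe them.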

22
Example: Collection of MPI Performance Properties
23
Examples: Detail MPI Properties
24
Example: MPI Properties in 2 Communicators
25
EXPERT Analysis of MPI 2 Communicator Example
26
Example: OpenMP Performance Property
27
EWOMP03 Study
  • Evaluating OpenMP Analysis Tools with the APART
    Test Suite
      • Hitachi SR8000 Profiling
      • KOJAK/Expert
      • Vampir (based on EPILOG traces)
      • Intel Vtune
  • Result summary
      • All tools find imbalance properties
      • Expert and Vtune also allow inspection of
        sequential code
      • Vtune is the only tool
          • Distinguishing lock competition and
            synchronized time
          • Identifying imbalance in nowait loop
      • None of the tools is able to give more detailed
        information, e.g., distinguishing imbalance in
        section and imbalanced number of sections

28
ATS Status and Future Work
  • Initial prototype available from APART website
      • List of MPI, OpenMP, and hybrid validation and
        benchmark suites
      • 1st version of ATS framework including
          • C and Fortran version of code
          • Single property test program generator
  • Future Work
      • More complete collection of validation and
        benchmark suites
      • Real real world applications
      • More studies
      • ATS Framework
          • Even more complete list of property functions
            for MPI, OpenMP, hybrid, and sequential
            performance properties
          • Documentation