1
Tool Evaluation Scoring Criteria
  • Professor Alan D. George, Principal Investigator
  • Mr. Hung-Hsun Su, Sr. Research Assistant
  • Mr. Adam Leko, Sr. Research Assistant
  • Mr. Bryan Golden, Research Assistant
  • Mr. Hans Sherburne, Research Assistant
  • HCS Research Laboratory
  • University of Florida

2
Usability Characteristics
3
Available Metrics
  • Description
  • Depth of metrics provided by tool
  • Examples
  • Communication statistics or events
  • Hardware counters
  • Importance rating
  • Critical, users must be able to obtain
    representative performance data to debug
    performance problems
  • Rating strategy
  • Scored using relative ratings (subjective
    characteristic)
  • Compare a tool's available metrics with those
    provided by other tools

4
Documentation Quality
  • Description
  • Quality of documentation provided
  • Includes user manuals, READMEs, and quick-start
    guides
  • Importance rating
  • Important, can have a large effect on overall
    usability
  • Rating strategy
  • Scored using relative ratings (subjective
    characteristic)
  • Correlated to how long it takes to decipher
    documentation enough to use tool
  • Tools with quick start guides or clear, concise
    high-level documentation receive higher scores

5
Installation
  • Description
  • Measure of time needed for installation
  • Also incorporates level of expertise necessary to
    perform installation
  • Importance
  • Minor, installation only needs to be done once
    and may not even be done by end user
  • Rating strategy
  • Scored using relative ratings based on mean
    installation time for all tools
  • All tools installed by a single person with
    significant system administration experience (see
    the sketch below)
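
As a hedged illustration, here is a minimal Python sketch of a relative rating based on mean installation time; the breakpoints, tool names, and times are invented for the example, not part of the original rubric:

  def installation_score(time_minutes, mean_minutes, max_points=5):
      """Rate a tool's installation time relative to the mean time
      across all evaluated tools (shorter than mean scores higher).
      Breakpoints are illustrative assumptions."""
      ratio = time_minutes / mean_minutes
      if ratio <= 0.5:
          return max_points
      if ratio <= 0.75:
          return 4
      if ratio <= 1.0:
          return 3
      if ratio <= 1.5:
          return 2
      return 1

  times = {"ToolA": 20, "ToolB": 45, "ToolC": 90}  # minutes, hypothetical
  mean = sum(times.values()) / len(times)
  print({name: installation_score(t, mean) for name, t in times.items()})
  # -> {'ToolA': 5, 'ToolB': 3, 'ToolC': 1}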

6
Learning Curve
  • Description
  • Difficulty level associated with learning to use
    tool effectively
  • Importance rating
  • Critical, tools that users perceive as too
    difficult to operate will be avoided
  • Rating strategy
  • Scored using relative ratings (subjective
    characteristic)
  • Based on time necessary to get acquainted with
    all features needed for day-to-day operation of
    tool

7
Manual Overhead
  • Description
  • Amount of user effort needed to instrument their
    code
  • Importance rating
  • Important, tool must not create more work for the
    user in the end (it should instead save time!)
  • Rating strategy
  • Use hypothetical test case
  • MPI program, 2.5 kloc in 20 .c files with 50
    user functions
  • Score one point for each of the following actions
    that can be completed on a fresh copy of the source
    code in 10 minutes (estimated; see the sketch after
    this list)
  • Instrument all MPI calls
  • Instrument all functions
  • Instrument five arbitrary functions
  • Instrument all loops, or a subset of loops
  • Instrument all function callsites, or a subset of
    callsites (about 35)
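
A sketch of how this checklist could be tallied in Python; the action labels paraphrase the list above, and the example feasibility set is invented:

  ACTIONS = (
      "instrument all MPI calls",
      "instrument all functions",
      "instrument five arbitrary functions",
      "instrument all loops (or a subset)",
      "instrument all callsites (or a subset of ~35)",
  )

  def manual_overhead_score(feasible_actions):
      """One point per rubric action judged completable in 10 minutes."""
      return sum(1 for action in ACTIONS if action in feasible_actions)

  # Example: a tool whose automatic instrumentation covers MPI calls
  # and whole functions, but offers no loop or callsite instrumentation.
  print(manual_overhead_score({"instrument all MPI calls",
                               "instrument all functions"}))  # -> 2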

8
Measurement Accuracy
  • Description
  • How much runtime instrumentation overhead tool
    imposes
  • Importance rating
  • Important, inaccurate data may lead to incorrect
    diagnosis which creates more work for user with
    no benefit
  • Rating strategy
  • Use the CAMEL MPI program as a standard test
    application
  • Score based on runtime overhead of the
    instrumented executable (percent increase in
    wallclock time), binned as sketched below
  • 0-4% overhead: five points
  • 5-9%: four points
  • 10-14%: three points
  • 15-19%: two points
  • 20% or greater: one point
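
A minimal Python sketch of this binning, assuming the bins denote percent wallclock overhead relative to the uninstrumented run:

  def accuracy_score(overhead_percent):
      """Map percent wallclock overhead on CAMEL to 1-5 points."""
      if overhead_percent < 5:
          return 5
      if overhead_percent < 10:
          return 4
      if overhead_percent < 15:
          return 3
      if overhead_percent < 20:
          return 2
      return 1

  print(accuracy_score(7.2))  # -> 4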

9
Multiple Analyses/Views
  • Description
  • Different ways tool presents data to user
  • Different analyses available from within tool
  • Importance rating
  • Critical, tools must provide enough ways of
    looking at data so that users may track down
    performance problems
  • Rating strategy
  • Score based on relative number of views and
    analyses provided by each tool
  • Approximately one point for each distinct view or
    analysis provided by the tool

10
Profiling/Tracing Support
  • Description
  • Low-overhead profile mode offered by tool
  • Comprehensive event trace offered by tool
  • Importance rating
  • Critical, profile mode useful for quick analysis
    and trace mode necessary for examining what
    really happens during execution
  • Rating strategy
  • Two points if a profiling mode is available
  • Two points if a tracing mode is available
  • One extra point if trace file size is within a
    few percent of the best trace file size across all
    tools (see the sketch below)
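
A Python sketch of this rubric; the 5% tolerance is our stand-in for "within a few percent" and is an assumption:

  def profiling_tracing_score(has_profiling, has_tracing,
                              trace_bytes, best_trace_bytes,
                              tolerance=0.05):
      """2 points per supported mode, plus 1 bonus point if the tool's
      trace size is close to the smallest trace across all tools."""
      score = 2 * bool(has_profiling) + 2 * bool(has_tracing)
      if has_tracing and trace_bytes <= best_trace_bytes * (1 + tolerance):
          score += 1
      return score

  print(profiling_tracing_score(True, True, 102_000_000, 100_000_000))  # -> 5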

11
Response Time
  • Description
  • How much time is needed to get data from tool
  • Importance rating
  • Average, user should not have to wait an
    extremely long time for data but high-quality
    information should always be first goal of tools
  • Rating strategy
  • Score is based on relative time taken to get
    performance data from tool
  • Tools that perform complicated post-mortem
    analyses or bottleneck detection receive lower
    scores
  • Tools that provide data while program is running
    receive five points

12
Source Code Correlation
  • Description
  • How well tool relates performance data back to
    original source code
  • Importance rating
  • Critical, necessary to see which statements and
    regions of code are causing performance problems
  • Rating strategy
  • Four to five points if tool supports direct source
    correlation at the function or line level
  • One to three points if tool supports an indirect
    method of attributing data to functions or source
    lines
  • Zero points if tool does not provide enough data
    to map performance metrics back to source code
    (see the tiering sketch below)
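
One way to express the tiering in Python; the quality parameter that selects a point within each band is a hypothetical stand-in for evaluator judgment:

  def correlation_score(level, quality=1.0):
      """level: 'direct' (function/line), 'indirect', or 'none'.
      quality in [0, 1] picks a point within the band."""
      bands = {"direct": (4, 5), "indirect": (1, 3), "none": (0, 0)}
      low, high = bands[level]
      return round(low + quality * (high - low))

  print(correlation_score("direct"))         # -> 5
  print(correlation_score("indirect", 0.5))  # -> 2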

13
Stability
  • Description
  • How likely tool is to crash while under use
  • Importance rating
  • Important, unstable tools will frustrate users
    and decrease productivity
  • Rating strategy
  • Scored using relative ratings (subjective
    characteristic)
  • Score takes into account
  • Number of crashes experienced during evaluation
  • Severity of crashes
  • Number of bugs encountered

14
Technical Support
  • Description
  • How quickly responses are received from tool
    developers or support departments
  • Quality of information and helpfulness of
    responses
  • Importance rating
  • Average, important for users during installation
    and initial use of tool but becomes less
    important as time goes on
  • Rating strategy
  • Relative rating based on personal communication
    with our contacts for each tool (subjective
    characteristic)
  • Timely, informative responses result in four or
    more points

15
Portability Characteristics
16
Extensibility
  • Description
  • How easily tool may be extended to support UPC and
    SHMEM
  • Importance rating
  • Critical, tools that cannot be extended for UPC
    and SHMEM are almost useless for us
  • Rating strategy
  • Commercial tools receive zero points
  • Regardless of whether export or import
    functionality is available
  • Interoperability covered by another
    characteristic
  • Subjective score based on functionality provided
    by tool
  • Also incorporates quality of code (after quick
    review)

17
Hardware Support
  • Description
  • Number and depth of hardware platforms supported
  • Importance rating
  • Critical, essential for portability
  • Rating strategy
  • Based on our estimate of important architectures
    for UPC and SHMEM
  • Award one point for support of each of the
    following architectures (see the sketch after this
    list)
  • IBM SP (AIX)
  • IBM BlueGene/L
  • AlphaServer (Tru64)
  • Cray X1/X1E (UNICOS)
  • Cray XD1 (Linux w/Cray proprietary interconnect)
  • SGI Altix (Linux w/NUMALink)
  • Generic 64-bit Opteron/Itanium Linux cluster
    support
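
The per-platform tally is a simple set intersection; the platform strings below are shorthand labels for the list above, not identifiers from any tool:

  RATED_PLATFORMS = {
      "IBM SP (AIX)",
      "IBM BlueGene/L",
      "AlphaServer (Tru64)",
      "Cray X1/X1E (UNICOS)",
      "Cray XD1",
      "SGI Altix",
      "64-bit Opteron/Itanium Linux cluster",
  }

  def hardware_score(supported_platforms):
      """One point per rated platform the tool supports."""
      return len(RATED_PLATFORMS & set(supported_platforms))

  print(hardware_score({"SGI Altix", "Cray XD1", "IBM SP (AIX)"}))  # -> 3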

18
Heterogeneity
  • Description
  • Tool support for running programs across
    different architectures within a single run
  • Importance rating
  • Minor, not very useful on shared-memory machines
  • Rating strategy
  • Five points if heterogeneity is supported
  • Zero points if heterogeneity is not supported

19
Software Support
  • Description
  • Number of languages, libraries, and compilers
    supported
  • Importance rating
  • Important, should support many compilers and not
    hinder library support but hardware support and
    extensibility are more important
  • Rating strategy
  • Score based on relative number of languages,
    libraries, and compilers supported compared with
    other tools
  • Tools that instrument or record data for existing
    closed-source libraries receive an extra point
    (up to max of five points)

20
Scalability Characteristics
21
Filtering and Aggregation
  • Description
  • How well the tool provides users with mechanisms
    to simplify and summarize the data being displayed
  • Importance rating
  • Critical, necessary for users to effectively work
    with large data sets generated by performance
    tools
  • Rating strategy
  • Scored using relative ratings (slightly
    subjective characteristic)
  • Tools that provide many different ways of
    filtering and aggregating data receive higher
    scores

22
Multiple Executions
  • Description
  • Support for relating and comparing performance
    information from different runs
  • Examples
  • Automated display of speedup charts
  • Differences between time taken for methods using
    different algorithms or variants of a single
    algorithm
  • Importance rating
  • Critical, important for doing scalability analysis
  • Rating strategy
  • Five points if tool supports relating data from
    different runs
  • Zero points if not

23
Performance Bottleneck Detection
  • Description
  • How well tool identifies each known (and unknown)
    bottleneck in our test suite
  • Importance rating
  • Critical, bottleneck detection is the most
    important function of a performance tool
  • Rating strategy
  • Score proportional to the number of PASS ratings
    given for test suite programs, as sketched below
  • Slightly subjective characteristic: we must judge
    whether the user can determine the bottleneck from
    the data provided by the tool
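
The proportional part of the score reduces to a one-line Python function (the counts in the example are invented):

  def bottleneck_score(pass_count, total_tests, max_points=5):
      """Score proportional to PASS ratings across the test suite."""
      return max_points * pass_count / total_tests

  print(bottleneck_score(9, 12))  # -> 3.75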

24
Searching
  • Description
  • Ability of the tool to search for particular
    information or events
  • Importance rating
  • Minor, can be useful but difficult to provide
    users with a powerful search that is
    user-friendly
  • Rating strategy
  • Five points if searching is supported
  • Points deducted if only a simple search is
    available
  • Zero points if no search functionality

25
Miscellaneous Characteristics
26
Cost
  • Description
  • How much (per seat) the tool costs to use
  • Importance rating
  • Important, tools that are prohibitively expensive
    reduce overall availability of tool
  • Rating strategy
  • Scale based on per-seat cost (see the sketch
    below)
  • Free: five points
  • $1.00 to $499.99: four points
  • $500.00 to $999.99: three points
  • $1,000.00 to $1,999.99: two points
  • $2,000.00 or more: one point
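
The cost scale as a Python sketch, assuming per-seat prices in US dollars:

  def cost_score(per_seat_usd):
      """Map per-seat cost to 1-5 points (free tools score highest)."""
      if per_seat_usd == 0:
          return 5
      if per_seat_usd < 500:
          return 4
      if per_seat_usd < 1000:
          return 3
      if per_seat_usd < 2000:
          return 2
      return 1

  print(cost_score(750))  # -> 3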

27
Interoperability
  • Description
  • How well the tool works and integrates with other
    performance tools
  • Importance rating
  • Important, tools lacking in areas like trace
    visualization can make up for it by exporting
    data that other tools can understand (also
    helpful for getting data from 3rd-party sources)
  • Rating strategy
  • Zero points if data cannot be imported into or
    exported from the tool
  • One point for export of data in a simple ASCII
    format
  • Additional points (up to five total) for each
    further format the tool can export to and import
    from, as sketched below
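
A Python sketch of one reading of this rubric; how the per-format points combine with the ASCII-export point, and the cap at five, are our interpretation of the slide:

  def interoperability_score(export_formats, import_formats):
      """0 if no data exchange at all; 1 point for simple ASCII export;
      one further point per additional format, capped at 5."""
      formats = set(export_formats) | set(import_formats)
      if not formats:
          return 0
      score = 1 if "ascii" in export_formats else 0
      score += len(formats - {"ascii"})
      return min(score, 5)

  print(interoperability_score({"ascii", "otf", "slog2"}, {"otf"}))  # -> 3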