Transcript and Presenter's Notes

Title: HPCS OSC Slides


1
HPCS Benchmarks and Test and Specification Environments
Ashok Krishnamurthy, Ohio Supercomputer Center
HPCS Workshop, SC05, 14 Nov. 2005
2
Outline of Talk
  • HPCS Benchmarks
  • HPC Challenge
  • Synthetic Scalable Compact Applications (SSCAs)
  • Overview of Benchmarks and Implementations
  • HPCS Test Environment
  • QMTest features
  • HPCS-specific additions

3
Contributors

David Bader, Kamesh Madduri
Piotr Luszczek
John Gilbert, Viral Shah
4
HPCS Benchmark Spectrum
(Figure: spectrum chart mapping the HPCS benchmarks against application
classes, including signal processing and knowledge formation.)
5
HPC Challenge Benchmarks
  • http://icl.cs.utk.edu/hpcc/
  • Benchmarks
  • HPL
  • DGEMM
  • STREAM
  • PTRANS
  • RandomAccess
  • FFTE
  • Comm. bandwidth & latency
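
To make one of these kernels concrete, the following is a minimal
Python/NumPy sketch of the STREAM triad, a(i) = b(i) + q*c(i); the array
size, data, and timing harness are illustrative rather than part of the
official benchmark.

    import time
    import numpy as np

    N = 10_000_000   # illustrative size; the real benchmark sizes arrays to exceed cache
    q = 3.0          # the triad's scalar multiplier

    b = np.random.rand(N)
    c = np.random.rand(N)

    t0 = time.perf_counter()
    a = b + q * c    # the triad kernel: a(i) = b(i) + q*c(i)
    elapsed = time.perf_counter() - t0

    # Triad touches three arrays of 8-byte doubles: two reads plus one write
    gbytes = 3 * N * 8 / 1e9
    print(f"Triad bandwidth: {gbytes / elapsed:.2f} GB/s")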

6
HPC Challenge
7
HPC Challenge Awards
  • Class 1: Best Performance (4 awards)
  • $500 Certificate
  • Class 2: Most Productivity
  • $1,500 Certificate
  • 2005 HPCchallenge Award BOF: Tuesday, 11/15/05,
    11:15-12:15 PM
  • Room 602-604
  • Sponsored by HPCWire

8
Synthetic Scalable Compact Applications: The Vision
  • To bridge the gap between scalable synthetic
    kernel benchmarks and (non-scalable) real
    applications → an important future benchmarking
    tool
  • Must be representative of actual workloads within
    an application while not being numerically
    rigorous
  • memory access characteristics
  • communications characteristics
  • I/O characteristics
  • etc.
  • Will have no limits on the distribution to
    vendors and universities
  • Scalable Synthetic Compact Applications (SSCAs)
    will try to represent the wide spectrum of
    potential HPCS Mission Partner applications

9
SSCAs: The Goal
  • Building on a motivation slide from Fred
    Johnson (15 January 2004)

(Figure: app size/complexity vs. system size/complexity, spanning
Micro BMKs, HPCS Compact Apps, Full Apps, and NextGen Apps.)
Identify which dimensions must be examined at full complexity and
which can be examined at reduced scale, while providing understanding
of both full applications today and future applications.
10
SSCA 1: Bioinformatics
Intent
  • To develop a scalable synthetic compact
    application that has multiple analysis techniques
    (multiple kernels) identifying similarities
    between sequences of symbols
  • Symbols can be identical, closely related, or
    entirely different
  • A symbol in one sequence can match a gap in
    another
  • Each of the five kernels is based on an
    application from bioinformatics, including
  • Local alignment
  • Searching for similarities
  • Global alignment
  • Multiple alignment
  • Each kernel operates on either the original
    sequences, the results of the previous kernel, or
    both
  • To be entirely integer and character based
  • Except for incidental statistics
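
For orientation, local alignment of this kind is classically computed
with Smith-Waterman dynamic programming. Below is a minimal Python
sketch; the match/mismatch/gap scores are toy values, not the SSCA 1
executable specification.

    def local_align_score(s1, s2, match=2, mismatch=-1, gap=-1):
        """Smith-Waterman local alignment score (toy scoring, not SSCA 1's)."""
        H = [[0] * (len(s2) + 1) for _ in range(len(s1) + 1)]
        best = 0
        for i in range(1, len(s1) + 1):
            for j in range(1, len(s2) + 1):
                diag = H[i-1][j-1] + (match if s1[i-1] == s2[j-1] else mismatch)
                # A symbol in one sequence may be matched against a gap in the other
                H[i][j] = max(0, diag, H[i-1][j] + gap, H[i][j-1] + gap)
                best = max(best, H[i][j])
        return best

    print(local_align_score("ACACACTA", "AGCACACA"))  # -> 12 with these scores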

11
SSCA 1 Status
Protein Alignment
  • Written and Serial Executable Specification v0.6
    has been released
  • Components
  • Scalable Data Generator
  • Kernel 1: Local Alignment
  • Kernel 2: Sequence Extraction
  • Kernel 3: Sequence Matching
  • Kernel 4: Global Alignment
  • Kernel 5: Multiple Alignment
  • Seeking comment/feedback from the community

(Figure: protein in 3D.)
12
SSCA 2: Graph Analysis
Intent
  • To develop a scalable synthetic compact
    application that has multiple analysis techniques
    (multiple kernels) accessing a single data
    structure representing a directed asymmetric
    weighted multigraph with no self loops
  • In addition to a kernel to construct the graph
    from the input tuple list, there will be three
    computational kernels to operate on the graph
  • Each of the kernels will require irregular access
    to the graph's data structures
  • No single data layout will be optimal for all
    three computational kernels
  • To be entirely integer and character based
  • Except for statistics
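
As a rough illustration of the construction kernel, the sketch below
builds a directed weighted multigraph from an input tuple list; the
tuple format and the adjacency-list layout are assumptions for
illustration, not the SSCA 2 specification.

    from collections import defaultdict

    def build_multigraph(tuples):
        """Build a directed weighted multigraph from (start, end, weight) tuples.

        An adjacency list is one plausible layout; as the slide notes, no
        single layout is optimal for all three computational kernels.
        """
        adj = defaultdict(list)
        for start, end, weight in tuples:
            if start != end:              # the graph has no self loops
                adj[start].append((end, weight))
        return adj

    edges = [(0, 1, 5), (0, 1, 2), (1, 2, 7), (2, 0, 1)]  # parallel edges allowed
    g = build_multigraph(edges)
    print(g[0])  # [(1, 5), (1, 2)]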

13
SSCA 2 Status
  • Written and Serial Executable Specification v1.0
    has been released
  • Over 1200 lines of well-commented MATLAB code
  • Also works with Octave 2.9.0
  • Carefully picked functional breakdown, data
    structures, variable names, and comments
  • Accompanying documentation
  • Written specification and slides
  • MANIFEST.txt: list of files with brief
    descriptions
  • README.txt: installation and run-time
    instructions, code overview
  • KNOWN_ISSUES.txt: known issues in the current
    release
  • parallelization.txt: design notes on
    parallelization issues

14
Georgia Tech's Shared-Memory C Implementation
David A. Bader, Kamesh Madduri
(Figures: execution times of various kernels, and relative speedup plots.)
15
Cray MTA-2 (John Feo)
  • Shared-memory, multithreaded
  • 1039 source lines (35% fewer lines than in
    Bader's shared-memory version)
  • Execution time
  • 1M vertices, 1 processor: 32.47 s
  • 1M vertices, 8 processors: 4.48 s
  • Speedup of 7.24
  • 3X faster on 1P than Bader shared-memory
  • 7X faster on 8P than Bader shared-memory

16
More SSCA2 Implementations
  • MATLAB*P implementation
  • Alan Edelman, John Gilbert et al.
  • Largest problem size solved
  • 67M vertices, 500M edges
  • IBM X10 implementation (in progress)
  • David A. Bader and Mehmet F. Su
  • Uses Java port of SPRNG 2.0 pseudo-random number
    generation suite
  • Enhanced X10 array-based adjacency-list storage
    for graph structures
  • Generic implementation for clusters
  • Bader and Madduri

17
SSCA 2 Movie from John Gilbert and Viral Shah
  • URL is http://csc.cs.ucsb.edu/ssca/ssca.mov

18
OpenMP Contest
  • The OpenMP ARB chose to use SSCA2 as a basis for
    a programming contest because it was reasonably
    well specified and had two independent
    shared-memory implementations (Bader/Madduri and
    Feo).
  • URL: http://www.openmp.org/drupal/sc05/omp-contest.htm
  • First prize: $1000 plus a 60GB iPod.
  • Second prize: $500 plus a 4GB iPod nano.
  • Third prize: $250 plus a 1GB iPod shuffle.
  • Larry Meadows, for the OpenMP ARB:
    lawrence.f.meadows@intel.com
  • Prizes will be announced at OpenMP BOF
  • Wednesday 11/16/05, 5:15-6:45 PM
  • Room 6A

19
SSCA 3: Sensor Processing, Knowledge Formation,
and File I/O
Intent
  • SSCA 3 focuses on two stages
  • Front end: image processing and storage
  • Back end: image retrieval and knowledge formation
  • These two stages are representative of many areas
  • Medical imaging (e.g. tumor growth)
  • Image many patients daily
  • Later compare images of same patient over time
  • Astronomical image processing (e.g. monitor
    supernovae)
  • Image many regions of the sky daily
  • Later compare images of a region over time
  • Reconnaissance monitoring (e.g. enemy movement)
  • Image many areas daily
  • Later compare images of a given region over time
  • The benchmark has a significant file I/O component

20
SSCA 3 Status
(Figure: SAR image from Kernel 1.)
  • Written and Serial Executable Specification v0.7
    has been released
  • Components
  • Synthetic Scalable Data Generator
  • Kernel 1: SAR Image Formation
  • Kernel 2: Image Storage
  • Kernel 3: Image Retrieval
  • Kernel 4: Detection
  • Validation
  • Seeking comment/feedback from the community

(Figure: system diagram.)
21
Benchmark Summary and Computational Challenges
Front-End Sensor Processing
  • Pulse compression
  • Polar interpolation
  • FFT, IFFT (corner turn)
  • Sequential store
Back-End Knowledge Formation
  • Non-sequential retrieve
  • Large & small IO
  • Large image difference
  • Threshold
  • Many small correlations on random pieces of a
    large image
  • Scalable synthetic data generation
ISC has finished a v0.7 C version of SSCA 3.
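
To ground the back-end operations, here is a toy Python/NumPy sketch of
large-image differencing followed by thresholding; the data, sizes, and
threshold are invented, and it stands in for (rather than implements)
the detection kernel.

    import numpy as np

    rng = np.random.default_rng(0)

    # Two co-registered images of the same scene at different times (toy data)
    before = rng.random((1024, 1024))
    after = before.copy()
    after[500:510, 700:710] += 5.0        # inject a synthetic change to detect

    diff = np.abs(after - before)          # large-image difference
    detections = np.argwhere(diff > 3.0)   # threshold to flag changed pixels
    print(f"{len(detections)} changed pixels, first at {detections[0]}")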
22
Benchmarks and Implementations
http://www.highproductivity.org
23
Test Environment Goals
  • Model the relationship between application
    performance and developer effort for various HPC
    programming languages
  • Develop an automated method of collecting
    software-metric and runtime data for various HPC
    benchmark codes
  • Automatically generate a large number of tests
    based on relatively few user-specified parameters

(Figure: HPCS workflow model, feeding QMTest HPC benchmark analysis,
existing-codes analysis, and development-time experiments.)
24
Test Environment
  • QMTest from CodeSourcery
  • Open source software
  • Work done as part of the HPCS effort is being
    folded into the mainstream QMTest release

25
QMTest Reporting
(Figure: QMTest reporting dataflow. Benchmark code, e.g. the NAS
Parallel Benchmarks, a platform-specific context file, and an NPB test
database with regexp parsing rules feed QMTest through the benchmark
database and NPB QMTest extensions; QMTest emits an XML test report,
xml2csv.py parses the report and collates results, and an R script
plots the results.)
26
Using QMTest for HPCS
  • Step 1 (benchmark integrators): create test database
  • Benchmark test database
  • Benchmark output parsing rules
  • Step 2 (HPC system users): specify platform configuration
  • Context file (compilers, libraries, etc.)
  • Step 3: run tests
  • QMTest generates a Test Report <XML>
  • Step 4: automated analysis
  • A parsing script turns the report into Benchmark
    Results <CSV> for analysis (Excel, Matlab, R, etc.)
27
QMTest Reporting
  • QMTest report (excerpt from a 2500-line report):

    <result id="npb.ser.cg.s" kind="test" outcome="FAIL">
      <annotation name="ExecTest.stdout">
        "<pre>
        Lines  Blank  Cmnts  NCSL  TPtoks
         1031    194    341   501    3252  /home/afunk/Projects/HPCS/NPB3.2/NPB3.2-SER/CG/cg.f (FORTRAN)
           97     11      0    86     495  /home/afunk/Projects/HPCS/NPB3.2/NPB3.2-SER/CG/globals.h (C)
           34      0      0    34     180  /home/afunk/Projects/HPCS/NPB3.2/NPB3.2-SER/CG/npbparams.h (C)
          131     11      0   120     675  ----- C ----- (2 files)
         1031    194    341   501    3252  ----- FORTRAN ----- (1 file)
         1162    205    341   621    3927  TOTAL (3 files)
  • xml2csv output
  • Test, NCSL
  • npb.ser.bt.s, 2576
  • npb.ser.cg.s, 621
  • npb.ser.ep.s, 179
  • npb.ser.ft.s, 625
  • npb.ser.is.s,
  • npb.ser.lu.s, 2612
  • npb.ser.mg.s, 906
  • npb.ser.sp.s, 2168
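
The slides do not show xml2csv.py itself; the following Python sketch
suggests how such a parser might collate NCSL totals, assuming the
<result>/<annotation> element names visible in the excerpt above (the
regexp and the helper name report_to_csv are assumptions).

    import re
    import xml.etree.ElementTree as ET

    def report_to_csv(report_path):
        """Collate NCSL totals from a QMTest XML report into CSV rows (illustrative)."""
        rows = ["Test, NCSL"]
        for result in ET.parse(report_path).getroot().iter("result"):
            ncsl = ""
            for ann in result.iter("annotation"):
                if ann.get("name") == "ExecTest.stdout" and ann.text:
                    # Pull NCSL (4th column) off the TOTAL line of the sclc table
                    m = re.search(r"^\s*\d+\s+\d+\s+\d+\s+(\d+)\s+\d+\s+TOTAL",
                                  ann.text, re.MULTILINE)
                    if m:
                        ncsl = m.group(1)
            rows.append(f"{result.get('id')}, {ncsl}")
        return "\n".join(rows)

    print(report_to_csv("npb.ser.s.xml"))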

28
QMTest Reporting
  • > qmtest run npb.ser.s → run tests to count NCSL
  • > qmtest report -o npb.ser.s.xml results.qmr →
    generate report
  • > xml2csv npb.ser.s.xml > npb.ser.s.csv → parse
    report to cull data
  • > NCSL_benchmark.R → plot data using R script

29
Test Environment Status
  • QMTest can run multiple tests with varying
    parameters, e.g. all permutations of NPB (500
    tests)
  • QMTest can automate collection of software
    metrics and performance data: SLOC, complexity,
    runtime, profiling
  • QMTest has been used successfully to test various
    HPC benchmark suites
  • NPB, HPChallenge, SSCA 2
  • Example parameter sets for the above are
    available
  • The sclc tool has been modified to recognize and
    count SLOC for
  • Ada, Assembly, Awk, C, C++, Eiffel, FORTRAN,
    Java, Lisp, MATLAB, Octave, Pascal, Perl,
    Tcl, ZPL, shell, make, ...
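
For intuition, the kind of non-comment source line (NCSL) counting that
sclc performs can be sketched as below; the comment-prefix table is a
toy subset and does not reflect sclc's actual language handling (only
full-line comments are recognized; block comments are not).

    COMMENT_PREFIX = {".py": "#", ".f": "!", ".c": "//", ".m": "%"}  # toy subset

    def count_ncsl(path):
        """Count non-comment source lines: skip blanks and full-line comments."""
        prefix = COMMENT_PREFIX[path[path.rfind("."):]]
        ncsl = 0
        with open(path) as f:
            for line in f:
                stripped = line.strip()
                if stripped and not stripped.startswith(prefix):
                    ncsl += 1
        return ncsl

    print(count_ncsl("cg.f"))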

30
Test Environment Planned Additions
  • QMTest parallel enhancements
  • GUI for parameterized tests
  • Output processing to ease further analysis
  • Token counting for selected languages