1
Blue Waters and the Future of High Performance Computing
Marc Snir
2
Outline
  • What a supercomputer looks like beyond 2010
  • Cannot go into the details of the Blue Waters
    configuration
  • What are the main challenges to making good use
    of such systems

3
Large NSF Funded Supercomputers Beyond 2010
  • One Petascale platform -- Blue Waters at NCSA, U
    Illinois
  • Sustained performance petaflop range
  • Memory petabyte range
  • Disk 10s petabytes
  • Archival storage exabyte range
  • Power Megawatts
  • Price 200M (not including building, operation,
    application development)
  • Multiple 1/4 scale platforms at various
    universities
  • Available to NSF-funded grand challenge teams
    on a competitive basis

4
The Uniprocessor Crisis
  • Manufacturers cannot increase clock rate anymore
    (power problem)
  • Computer architects have run out of productive
    ideas on how to use more transistors to increase
    single-thread performance
  • Diminishing returns from caches
  • Diminishing returns from instruction-level
    parallelism
  • Increased processor performance will come only
    from an increase in the number of cores per chip
  • Petascale = 250K -- 1M threads
  • Need algorithms with massive levels of parallelism

5
Average Number of Processors per Top 500 System
[Chart]
6
Mileage is Less than Advertised
[Chart: nominal IPC vs. achieved instructions per cycle, frequent-item mining (M. Wei)]
7
It's the Memory, Stupid
[Chart: PC balance (word operands from memory per flop) -- seems stuck at a 1:10 ratio (source: McCalpin)]
8
The Memory Wall and Palliatives
  • The problem
  • Memory bandwidth is limited (cost)
  • Queue of pending loads has limited depth
    (performance)
  • Compilers cannot issue enough concurrent loads to
    fill the memory pipe
  • Compilers cannot issue loads early enough to
    avoid stalls
  • Solutions
  • Multicore and vector operations -- to fill the
    pipe (see the sketch after this list)
  • Simultaneous multithreading -- to tolerate
    latency
  • Need even higher levels of parallelism!
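
A minimal sketch (not from the talk) of the "fill the memory pipe" point: a STREAM-style triad in C, where the OpenMP parallel/simd pragma lets every core and vector lane keep independent loads and stores in flight, so the loop is limited by memory bandwidth rather than by peak flops. Array size and compiler flags are illustrative.

/* Sketch only: a bandwidth-bound triad kernel.
 * Compile e.g. with: gcc -O3 -fopenmp triad.c -o triad */
#include <stdlib.h>
#include <stdio.h>

#define N (1 << 24)   /* illustrative array size (16M doubles per array) */

int main(void) {
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);
    double *c = malloc(N * sizeof *c);
    if (!a || !b || !c) return 1;

    for (long i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

    /* Roughly 2 flops per 24 bytes moved: performance is bound by memory
     * bandwidth, not by peak flops.  The parallel/simd pragma lets every
     * core and vector lane issue independent loads and stores. */
    #pragma omp parallel for simd
    for (long i = 0; i < N; i++)
        a[i] = b[i] + 3.0 * c[i];

    printf("a[0] = %f\n", a[0]);
    free(a); free(b); free(c);
    return 0;
}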

9
Solutions to the Memory Wall
  • Caching and locality
  • Need algorithms with good locality
  • Split communication
  • Memory prefetch (local memory) -- see the sketch
    after this list
  • Put/get (remote memory)
  • Need programmed communication to local and remote
    memory (not necessarily message-passing)
  • N.B. Compute power is essentially free; you pay
    for storing and moving data
  • Peak FLOPs are a silly measure of performance
  • A computer that achieves a high fraction of its
    peak flop rate is ill-designed
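
As one concrete form of split communication to local memory, the sketch below uses the GCC/Clang __builtin_prefetch intrinsic to issue a load well before the data is used; the function name and prefetch distance are made up for the example.

/* Sketch only: software prefetch splits a memory access into "request"
 * and "use", so load latency overlaps with useful work. */
double sum_with_prefetch(const double *x, long n) {
    const long dist = 16;   /* prefetch distance in elements (tuning knob) */
    double sum = 0.0;
    for (long i = 0; i < n; i++) {
        if (i + dist < n)
            __builtin_prefetch(&x[i + dist], 0 /* read */, 1 /* low locality */);
        sum += x[i];
    }
    return sum;
}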

10
Global Communication
  • Under software control
  • Remote loads too expensive
  • Global coherence too expensive
  • "Software" means user + library now (MPI); it can
    mean compiler + hardware in 201x
  • But programmer has to manage locality
  • Probably moving from 2-sided communication to
    1-sided communication (put/get); see the sketch
    after this list
  • May have hw accelerators for global operations
    (e.g., global barriers)
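
A minimal sketch of 1-sided (put/get) communication using the standard MPI-2 RMA interface; the neighbor-exchange pattern is invented for illustration, not taken from the talk.

/* Sketch only: each rank puts one double into its right neighbor's
 * window; the fence calls delimit the access epoch, and no receive
 * call is needed on the target side. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double recv = -1.0, send = (double)rank;
    MPI_Win win;
    MPI_Win_create(&recv, sizeof recv, sizeof recv,
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);
    /* Write our rank directly into the neighbor's memory. */
    MPI_Put(&send, 1, MPI_DOUBLE, (rank + 1) % size, 0, 1, MPI_DOUBLE, win);
    MPI_Win_fence(0, win);

    printf("rank %d received %g\n", rank, recv);
    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}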

11
I/O
  • Parallel file system
  • Optimized for the case where 10K-100K processes
    share one file (see the MPI-IO sketch after this
    list)
  • Unfortunately, users often open multiple files
    per process
  • File system is logically shared, physically
    distributed
  • May need 2 parity disks
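
A minimal sketch of the "many processes share one file" pattern using MPI-IO collective writes; the file name and block size are made up for the example.

/* Sketch only: each rank writes its own block at a rank-dependent
 * offset into a single shared file. */
#include <mpi.h>

#define BLOCK 1024

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double buf[BLOCK];
    for (int i = 0; i < BLOCK; i++) buf[i] = rank;   /* dummy data */

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "checkpoint.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* Collective write: the MPI-IO layer can aggregate all ranks' blocks
     * into large, well-aligned requests to the parallel file system. */
    MPI_Offset offset = (MPI_Offset)rank * BLOCK * sizeof(double);
    MPI_File_write_at_all(fh, offset, buf, BLOCK, MPI_DOUBLE, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}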

12
Supercomputer vs. Cluster -- is it Merely a
Matter of Size?
  • All systems use commodity processors
  • Size matters: need denser packaging and
    higher-quality components
  • Supercomputers use more expensive server
    technology and more advanced cooling
  • Need more scalable switch, with higher bw and
    lower latency
  • Proprietary interface, vs. NIC on I/O bus
  • Need parallel file system
  • Cluster file systems still have a way to go
  • Need very robust, out of band, system control
    infrastructure

13
Do we Need Petascale Systems?
  • Yes, every self-respecting science and engineering
    discipline has a roadmap explaining why it needs
    petascale performance, and beyond
  • Two buzzwords: multiphysics, multiscale
  • Do we need this performance now (rather than in
    2020, when it will be much cheaper)?
  • Yes, many simulations have high potential
    societal impact (health, energy, global warming)
  • Note: since programming for petascale performance
    will not be significantly easier in 2020,
    scientists (who do not pay for compute time) have
    no incentive to wait

14
Do We Have Applications with Sufficient
Parallelism?
  • Probably -- we have analyzed in detail plausible
    applications as part of the Track 1 competition
  • But simple benchmark applications are not the
    same as complete applications of scientific
    interest
  • Solve a larger problem
  • Easy, but not always needed
  • Increase resolution (finer mesh)
  • Parallelism increases (by k³)
  • Number of iterations also increases (by k)
    (see the worked example after this list)
  • May be limited by accuracy of initial conditions
  • Increase complexity of simulation
  • Hard
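
A small worked example (not on the slide) of the resolution-scaling arithmetic: refining a 3D mesh by a factor k multiplies the cell count, and hence the available parallelism, by k³, and (for an explicit time-stepping scheme) the number of time steps by roughly k, so total work grows like k⁴.

/* Sketch only: scaling factors for mesh refinement by k. */
#include <stdio.h>

int main(void) {
    for (int k = 2; k <= 8; k *= 2) {
        long cells = (long)k * k * k;   /* parallelism grows by k^3 */
        long steps = k;                 /* iterations grow by ~k    */
        printf("k=%d: %4ldx parallelism, %2ldx steps, %5ldx total work\n",
               k, cells, steps, cells * steps);
    }
    return 0;
}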

15
Will Codes be Ready in 201x?
  • Likely -- there are sufficiently many research
    groups who want to be ready
  • Main obstacle: NSF underestimates the cost of
    application development
  • Will programming be any easier than it is now?
  • No. High Performance Computing is about
    performance programming; this is hard, even on
    uniprocessors
  • Problem is inherently hard, and there are no good
    tools for performance tuning
  • Market for HPC software is too small
  • Practically no independent HPC software companies

16
Are we Making Progress on Software Productivity
for HPC?
  • Not much; there is significant focus on new
    languages for HPC -- most likely misplaced
  • Not clear that parallelism requires new
    high-level constructs; frameworks built with
    existing OO languages do hide parallelism
  • Not clear that small HPC market can justify
    unique languages
  • New languages needed for performance, not for
    raising level of abstraction
  • Good tools for debugging and performance tuning
    are missing
  • A screen has only one pixel per thread

17
Will Blue Waters Stay Up Long Enough to Complete
Long Computations?
  • Of course not -- MTBF is measured in days, if we
    are lucky
  • Any petascale application must do periodic
    checkpointing and have restart code
  • Also needed for splitting long computations into
    multiple submissions, possibly on different
    machines, and for checking computation evolution
  • User checkpoint, not system checkpoint
  • The optimal checkpoint interval depends on system
    MTBF, checkpoint overhead, and recovery overhead
  • Low MTBF means low machine utilization but is not
    a disaster (assuming that file system is
    reliable)

18
System Utilization as a Function of MTBF
[Chart: system utilization (%) and optimal checkpoint interval (hours) vs. MTBF (hours)]
  • Assumes a 5-minute checkpoint, 15-minute recovery,
    and the optimal checkpoint frequency (see the
    sketch below)
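
The slide does not name the underlying model; the sketch below reproduces the general shape of the curve using Young's first-order approximation for the optimal checkpoint interval, tau = sqrt(2 * C * MTBF), and a simple first-order loss estimate, with the 5-minute checkpoint and 15-minute recovery costs from the slide.

/* Sketch only: fraction of time lost ~= C/tau (checkpoint overhead)
 * + (tau/2 + R)/MTBF (rework plus recovery after a failure). */
#include <math.h>
#include <stdio.h>

int main(void) {
    const double C = 5.0 / 60.0;    /* checkpoint cost, hours */
    const double R = 15.0 / 60.0;   /* recovery cost, hours   */

    printf("MTBF(h)  tau_opt(h)  utilization\n");
    for (double mtbf = 2.0; mtbf <= 64.0; mtbf *= 2.0) {
        double tau = sqrt(2.0 * C * mtbf);                /* Young's formula */
        double lost = C / tau + (tau / 2.0 + R) / mtbf;   /* fraction lost   */
        printf("%7.0f  %10.2f  %10.0f%%\n", mtbf, tau, 100.0 * (1.0 - lost));
    }
    return 0;
}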

19
Now that any Processor has Multiple Cores, is
High Performance Computing Getting out of the
Ghetto?
  • No. By definition, a supercomputer is at the
    bleeding edge
  • Different concerns, different scales, different
    communities
  • 1-100 way parallelism vs. 100k-1M way parallelism
  • Tightly coupled shared memory vs. distributed
    memory
  • $100B industry vs. $1B industry
  • Good sw (e.g. Windows) vs. lousy sw
  • Programming for the masses vs. programming for
    the elite
  • Expert environments vs. Joe Shmoe environment
  • I/O and network intensive vs. compute intensive
  • Reactive vs. transformational

20
Summary
  • Fitzgerald "The very rich are different from you
    and me." Hemingway "Yeahthey have more money.
  • Supercomputing is different from normal
    computing they have bigger machines
  • P.S. The famous Scott-Ernest dialogue apparently
    never took place. Oh well