PACT 98 - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: PACT 98


1
PACT 98
  • http://www.research.microsoft.com/barc/gbell/pact.ppt

2
What Architectures? Compilers? Run-time
environments? Programming models? Any Apps?
Parallel Architectures and Compilation Techniques (PACT)
Paris, 14 October 1998
  • Gordon Bell
  • Microsoft

3
Talk plan
  • Where are we today?
  • History: predicting the future
  • Ancient
  • Strategic Computing Initiative and ASCI
  • Bell Prize since 1987
  • Apps: architecture taxonomy
  • Petaflops: when, how, how much
  • New ideas: Grid, Globus, Legion
  • Bonus: Input to Thursday panel

4
1998 ISVs, buyers, users?
  • Technical: supers dying; DSM (and SMPs) trying
  • Mainline user & ISV apps ported to PCs & workstations
  • Supers (legacy code) market lives ...
  • Vector apps (e.g. ISVs) ported to DSM (SMP)
  • MPI for custom and a few, leading-edge ISVs
  • Leading-edge, one-of-a-kind apps: clusters of 16, 256, ... 1000s built from uni, SMP, or DSM
  • Commercial: mainframes, SMPs (DSMs), and clusters are interchangeable (control is the issue)
  • Dbase & TP: SMPs compete with mainframes if central control is an issue, else clusters
  • Data warehousing may emerge; just a Dbase
  • High growth, web and stream servers: clusters have the advantage

5
c2000 Architecture Taxonomy
(Taxonomy diagram, flattened:)
  • SMP (mainline): Xpt-connected SMPs, Xpt-SMP vector, Xpt-multithread (Tera)
  • multi: Xpt-multi hybrid
  • DSM: SCI (commodity), DSM (high bandwidth)
  • Multicomputers aka clusters / MPP (mainline), 16-(64)-10K processors: commodity multis + switches, proprietary multis + switches, proprietary DSMs
6
TOP500 Technical Systems by Vendor (sans PC and
mainframe clusters)
7
Parallelism of Jobs On NCSA Origin Cluster
20 Weeks of Data, March 16 - Aug 2, 1998: 15,028 Jobs / 883,777 CPU-Hrs
(Chart: job distribution by % of jobs and by % of CPU delivered, vs. number of CPUs)
8
How are users using the Origin Array?
9
National Academic Community Large Project
Requests September 1998
Over 5 Million NUs Requested
One NU = one XMP processor-hour
Source: National Resource Allocation Committee
10
GB's Estimate of Parallelism in Engineering & Scientific Applications
(Chart: log(# of apps) vs. granularity / degree of coupling (computation/communication), spanning PCs & WSs, supers, scalable multiprocessors, and clusters aka MPPs aka multicomputers; "dusty decks" for supers vs. new or scaled-up apps)
  • scalar: 60%
  • vector: 15%
  • vector & //: 5%
  • one-of, >>//: 5%
  • embarrassingly / perfectly parallel: 15%
11
Application Taxonomy
Technical:
  • General purpose, non-parallelizable codes (PCs have it!)
  • Vectorizable
  • Vectorizable & //able (supers & small DSMs)
  • Hand tuned, one-of
  • MPP coarse grain
  • MPP embarrassingly // (clusters of PCs ...)
Commercial:
  • Database
  • Database/TP
  • Web host
  • Stream audio/video
If central control & rich then IBM or large SMPs, else PC clusters
12
One-processor perf. as % of Linpack
(Chart: single-processor application performance as a percentage of Linpack for CFD, Biomolecular, Chemistry, Materials, and QCD; values roughly 14-33%)
13
(Chart: 10-processor Linpack (Gflops); 10-P apps x10; apps vs. 1-P Linpack; apps vs. 10-P Linpack)
14
Ancient history
15
Growth in Computational Resources Used for UK
Weather Forecasting
10^10 / 50 yrs = 1.58^50 (checked below)
(Chart: resources from ~10 operations in 1950 to ~10T by 2000, across machines Leo, Mercury, KDF9, 195, 205, and YMP)
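A quick check of the growth rate stated above (a sketch; the factor 10^10 and the 50-year window are from the slide):
\[
x^{50} = 10^{10} \;\Rightarrow\; x = 10^{10/50} = 10^{0.2} \approx 1.585,
\]
i.e. roughly 58% more computing per year, sustained for half a century.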
16
Harvard Mark I aka IBM ASCC
17
I think there is a world market for maybe five
computers.


Thomas Watson Senior, Chairman of IBM, 1943
18
The scientific market is still about that size: 3 computers
  • When scientific processing was 100% of the industry, it was a good predictor
  • $3 billion; 6 vendors, 7 architectures
  • DOE buys 3 very big ($100-200M) machines every 3-4 years

19
NCSA cluster of 6 x 128-processor SGI Origins
20
Our Tax Dollars At Work: ASCI for Stockpile Stewardship
  • Intel/Sandia: 9000 x 1-node PPro
  • LLNL/IBM: 512 x 8 PowerPC (SP2)
  • LANL/Cray: ?
  • Maui Supercomputer Center: 512 x 1 SP2

21
"LARC doesn't need 30,000 words!" -- von Neumann, 1955
  • During the review, someone said von Neumann was right: 30,000 words was too much IF all the users were as skilled as von Neumann ... for ordinary people, 30,000 was barely enough! -- Edward Teller, 1995
  • The memory was approved.
  • Memory solves many problems!

22
Parallel processing computer architectures will
be in use by 1975.

  • Navy Delphi Panel, 1969

23
In Dec. 1995 computers with 1,000 processors will
do most of the scientific processing.

  • Danny Hillis, 1990 (1 paper or 1 company)

24
The Bell-Hillis Bet: Massive Parallelism in 1995
(Table: TMC vs. world-wide supers, compared on applications, petaflops/month, and revenue)
25
Bell-Hillis Bet wasn't paid off!
  • My goal was not necessarily to just win the bet!
  • Hennessy and Patterson were to evaluate what was really happening
  • Wanted to understand degree of MPP progress and
    programmability

26
DARPA, 1985: Strategic Computing Initiative (SCI)

  • A 50x LISP machine
  • Tom Knight, Symbolics
  • A 1,000-node multiprocessor; a teraflops by 1995
  • Gordon Bell, Encore

→ All of 20 HPCC projects failed!
27
SCI (c1980s): Strategic Computing Initiative funded
  • AT&T/Columbia (Non-Von), BBN Labs, Bell Labs/Columbia (DADO), CMU Warp (GE & Honeywell), CMU (Production Systems), Encore, ESL, GE (like Connection Machine), Georgia Tech, Hughes (dataflow), IBM (RP3), MIT/Harris, MIT/Motorola (Dataflow), MIT Lincoln Labs, Princeton (MMMP), Schlumberger (FAIM-1), SDC/Burroughs, SRI (Eazyflow), University of Texas, Thinking Machines (Connection Machine)

28
Those who gave up their lives in SCI's search for parallelism
  • Alliant, American Supercomputer, Ametek, AMT, Astronautics, BBN Supercomputer, Biin, CDC (independent of ETA), Cogent, Culler, Cydrome, Denelcor, Elxsi, ETA, Evans & Sutherland Supercomputers, Flexible, Floating Point Systems, Gould/SEL, IPM, Key, Multiflow, Myrias, Pixar, Prisma, SAXPY, SCS, Supertek (part of Cray), Suprenum (German national effort), Stardent (Ardent + Stellar), Supercomputer Systems Inc., Synapse, Vitec, Vitesse, Wavetracer.

29
Worlton "Bandwagon Effect"explains massive
parallelism
  • Bandwagon A propaganda device by which the
    purported acceptance of an idea ...is claimed in
    order to win further public acceptance.
  • Pullers vendors, CS community
  • Pushers funding bureaucrats deficit
  • Riders innovators and early adopters
  • 4 flat tires training, system software,
    applications, and "guideposts"
  • Spectators most users, 3rd party ISVs

30
Parallel processing is a constant distance away.
  • Our vision ... is a system of millions of hosts
    in a loose confederation. Users will have the
    illusion of a very powerful desktop computer
    through which they can manipulate objects.
  • Grimshaw, Wulf, et al., "Legion," CACM, Jan. 1997




31
Progress"Parallelism is a journey."
Paul Borrill
32
Let us not forget: "The purpose of computing is insight, not numbers." -- R. W. Hamming
33
Progress 1987-1998
34
Bell Prize Peak Gflops vs time
35
Bell Prize: 1000x, 1987-1998
  • 1987: Ncube, 1,000 computers; showed that with more memory, apps scaled
  • 1987: Cray XMP, 4 proc. @ 200 Mflops/proc.
  • 1996: Intel, 9,000 proc. @ 200 Mflops/proc.; 1998: 600 RAP Gflops Bell Prize
  • Parallelism gains (rough check below)
  • 10x in parallelism over Ncube
  • 2000x in parallelism over XMP
  • Spend 2-4x more
  • Cost effectiveness 5x: ECL → CMOS, SRAM → DRAM
  • Moore's Law 100x
  • Clock 2-10x: CMOS-ECL speed cross-over
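A rough consistency check of the figures above, using only numbers from the slide:
\[
\frac{9{,}000\ \text{proc.}}{4\ \text{proc. (XMP)}} \approx 2{,}250 \approx 2000\times, \qquad
\frac{9{,}000}{1{,}000\ \text{(Ncube)}} \approx 10\times, \qquad
\frac{600\ \text{Gflops}}{4 \times 0.2\ \text{Gflops}} = 750\times,
\]
consistent with the ~1000x headline in delivered (RAP) performance.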

36
No more 1000x/decade. We are now (hopefully) only limited by Moore's Law and not limited by memory access.
1 GF to 10 GF took 2 years; 10 GF to 100 GF took 3 years; 100 GF to 1 TF took >5 years. 2n+1 or 2(n-1)+1?
37
Commercial Perf./$
38
Commercial Perf.
39
1998 Observations vs. 1989 Predictions for technical
  • Got a TFlops PAP 12/1996 vs. 1995. Really impressive progress! (RAP < 1 TF)
  • More diversity results in NO software!
  • Predicted SIMD, mC; hoped for scalable SMP
  • Got supers, mCv, mC, SMP, SMP/DSM; SIMD disappeared
  • $3B (un-profitable?) industry; 10 platforms
  • PCs and workstations diverted users
  • MPP apps DID NOT materialize

40
Observation: CMOS supers replaced ECL in Japan
  • 2.2 Gflops vector units have dual use
  • in traditional mPv supers
  • as basis for computers in mC
  • Software & apps are present
  • Vector processor out-performs n micros for many scientific apps
  • It's memory bandwidth, cache prediction, and inter-communication

41
Observation: price & performance
  • Breaking the $30M barrier increases PAP
  • Eliminating state computers increased prices, but got fewer, more committed suppliers, less variation, and more focus
  • Commodity micros aka Intel are critical to improvement. DEC, IBM, and SUN are ??
  • Conjecture: supers and MPPs may be equally cost-effective despite PAP
  • Memory bandwidth determines performance & price
  • You get what you pay for, aka there's no free lunch

42
Observation: MPPs 1, Users <1
  • MPPs with relatively low-speed micros and lower memory bandwidth ran over supers, but didn't kill 'em.
  • Did the U.S. industry enter an abyss?
  • Is crying "unfair trade" hypocritical?
  • Are users denied tools?
  • Are users not getting with the program?
  • Challenge: we must learn to program clusters ...
  • Cache idiosyncrasies
  • Limited memory bandwidth
  • Long inter-communication delays
  • Very large numbers of computers

43
Strong recommendation: Utilize in situ workstations!
  • NOW (Berkeley) set sort record, decrypting
  • Grid, Globus, Condor and other projects
  • Need standard interface and programming model for clusters using commodity platforms & fast switches
  • Giga- and tera-bit links and switches allow geo-distributed systems
  • Each PC in a computational environment should have an additional 1GB/9GB!

44
Petaflops by 2010

  • DOE Accelerated Strategic Computing Initiative (ASCI)

45
DOE's 1997 PathForward Accelerated Strategic Computing Initiative (ASCI)
  • 1997: 1-2 Tflops, $100M
  • 1999-2001: 10-30 Tflops, $200M??
  • 2004: 100 Tflops
  • 2010: Petaflops

46

When is a Petaflops possible? What price?

Gordon Bell, ACM 1997
  • Moore's Law: 100x. But how fast can the clock tick?
  • Increase parallelism 10K → 100K: 10x
  • Spend more ($100M → $500M): 5x
  • Centralize center or fast network: 3x
  • Commoditization (competition): 3x (product of factors below)
47
Micros' gains if 20, 40, 60% / year (compounding sketched below)
(Chart: performance from 1.E6 to 1.E21 over the years 1995-2045 for the three growth rates)
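What those annual rates compound to over a decade (simple arithmetic, not from the slide):
\[
1.2^{10} \approx 6\times, \qquad 1.4^{10} \approx 29\times, \qquad 1.6^{10} \approx 110\times .
\]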
48
Processor Limit: DRAM Gap
Moore's Law
  • Alpha 21264 full cache miss, in instructions executed: 180 ns / 1.7 ns = 108 clks, x 4 or 432 instructions (arithmetic below)
  • Caches in Pentium Pro: 64% of area, 88% of transistors
  • Taken from Patterson-Keeton talk to SIGMOD
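The miss cost quoted above, worked out (slide numbers; the slide's 108 clocks corresponds to a ~600 MHz, 1.67 ns clock):
\[
\frac{180\ \text{ns}}{1.7\ \text{ns/clk}} \approx 106\ \text{clks}, \qquad 108\ \text{clks} \times 4\ \text{issue slots/clk} = 432\ \text{instructions lost per miss}.
\]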

49
Five Scalabilities
  • Size scalable -- designed from a few components,
    with no bottlenecks
  • Generation scaling -- no rewrite/recompile is
    required across generations of computers
  • Reliability scaling
  • Geographic scaling -- compute anywhere (e.g.
    multiple sites or in situ workstation sites)
  • Problem x machine scalability -- ability of an
    algorithm or program to exist at a range of sizes
    that run efficiently on a given, scalable
    computer.
  • Problem x machine space → run time: problem scale, machine scale (p), and run time imply speedup and efficiency (standard definitions sketched below)
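The standard definitions implied above (a sketch; T(p) denotes the run time of the, possibly scaled, problem on p processors):
\[
S(p) = \frac{T(1)}{T(p)}, \qquad E(p) = \frac{S(p)}{p}.
\]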

50
The Law of Massive Parallelism (mine) is based on
application scaling
  • There exists a problem that can be made sufficiently large such that any network of computers can run it efficiently, given enough memory, searching, work -- but this problem may be unrelated to any other.
  • A ... any parallel problem can be scaled to run efficiently on an arbitrary network of computers, given enough memory and time, but it may be completely impractical
  • Challenge to theoreticians and tool builders: How well will, or will, an algorithm run?
  • Challenge for software and programmers: Can a package be scalable & portable? Are there models?
  • Challenge to users: Do larger scale, faster, longer run times increase problem insight, and not just total flop or flops?
  • Challenge to funders: Is the cost justified?

51
Manyflops for Manybucks: what are the goals of spending?
  • Getting the most flops, independent of how much
    taxpayers give to spend on computers?
  • Building or owning large machines?
  • Doing a job (stockpile stewardship)?
  • Understanding and publishing about parallelism?
  • Making parallelism accessible?
  • Forcing other labs to follow?

52
Petaflops Alternatives c2007-14 from 1994 DOE
Workshop
53
Or: more parallelism, and use installed machines
  • 10,000 nodes in 1998, or a 10x increase
  • Assume 100K nodes
  • 10 Gflops / 10 GBy / 100 GB nodes, i.e. low-end c2010 PCs (aggregate worked out below)
  • Communication is the first problem: use the network
  • Programming is still the major barrier
  • Will any problems fit it?
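The aggregate those node counts imply (arithmetic on the slide's numbers):
\[
100{,}000 \times 10\ \text{Gflops} = 1\ \text{Pflops}, \qquad 100{,}000 \times 10\ \text{GBy} = 1\ \text{PBy of memory}.
\]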

54
Next, short steps
55
The Alliance LES NT Supercluster
"Supercomputer performance at mail-order prices" -- Jim Gray, Microsoft
  • Andrew Chien, CS UIUC → UCSD
  • Rob Pennington, NCSA
  • Myrinet network, HPVM, Fast Messages
  • Microsoft NT OS, MPI API (minimal MPI sketch below)
192 HP 300 MHz
64 Compaq 333 MHz
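As a minimal illustration of the MPI programming model used on this cluster (a generic sketch, not code from the Alliance machines), each rank contributes a partial value and rank 0 collects the sum:

    /* Minimal MPI sketch: one value per rank, reduced to a global sum on rank 0. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        double local = (double)rank + 1.0;   /* stand-in for real per-node work */
        double global = 0.0;
        MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("sum over %d ranks = %g\n", size, global);

        MPI_Finalize();
        return 0;
    }

Built with an MPI C compiler wrapper and launched across the nodes, the same SPMD structure scales from a few ranks to the 256 processors listed above.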
56
2D Navier-Stokes Kernel - Performance
Preconditioned Conjugate Gradient method with multi-level additive Schwarz / Richardson preconditioner (iteration structure sketched below)
Sustaining 7 GF on 128-proc. NT Cluster
Danesh Tafti, Rob Pennington (NCSA); Andrew Chien (UIUC, UCSD)
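For reference, the basic preconditioned conjugate gradient iteration named above, in serial form on a 1-D model problem; a simple Jacobi (diagonal) preconditioner stands in for the multi-level additive Schwarz / Richardson preconditioner, and the matrix, size, and tolerance are illustrative assumptions, not the Alliance kernel:

    /* PCG on a 1-D Laplacian (tridiagonal [-1 2 -1]), Jacobi preconditioner. */
    #include <stdio.h>
    #include <math.h>

    #define N 1000

    static void matvec(const double *x, double *y)   /* y = A x */
    {
        for (int i = 0; i < N; i++) {
            y[i] = 2.0 * x[i];
            if (i > 0)     y[i] -= x[i - 1];
            if (i < N - 1) y[i] -= x[i + 1];
        }
    }

    static void precond(const double *r, double *z)  /* z = M^{-1} r, M = diag(A) */
    {
        for (int i = 0; i < N; i++) z[i] = r[i] / 2.0;
    }

    static double dot(const double *a, const double *b)
    {
        double s = 0.0;
        for (int i = 0; i < N; i++) s += a[i] * b[i];
        return s;
    }

    int main(void)
    {
        static double x[N], b[N], r[N], z[N], p[N], q[N];
        for (int i = 0; i < N; i++) { b[i] = 1.0; x[i] = 0.0; }

        matvec(x, q);                                 /* r = b - A x */
        for (int i = 0; i < N; i++) r[i] = b[i] - q[i];
        precond(r, z);
        for (int i = 0; i < N; i++) p[i] = z[i];
        double rz = dot(r, z);

        for (int k = 0; k < 10000 && sqrt(dot(r, r)) > 1e-8; k++) {
            matvec(p, q);
            double alpha = rz / dot(p, q);
            for (int i = 0; i < N; i++) { x[i] += alpha * p[i]; r[i] -= alpha * q[i]; }
            precond(r, z);
            double rz_new = dot(r, z);
            double beta = rz_new / rz;
            for (int i = 0; i < N; i++) p[i] = z[i] + beta * p[i];
            rz = rz_new;
        }
        printf("residual norm = %g\n", sqrt(dot(r, r)));
        return 0;
    }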
57
The Grid: Blueprint for a New Computing Infrastructure, Ian Foster, Carl Kesselman (Eds), Morgan Kaufmann, 1999
  • Published July 1998
  • ISBN 1-55860-475-8
  • 22 chapters by expert authors including
  • Andrew Chien,
  • Jack Dongarra,
  • Tom DeFanti,
  • Andrew Grimshaw,
  • Roch Guerin,
  • Ken Kennedy,
  • Paul Messina,
  • Cliff Neuman,
  • Jon Postel,
  • Larry Smarr,
  • Rick Stevens,
  • Charlie Catlett
  • John Toole
  • and many others

"A source book for the history of the future" -- Vint Cerf
http://www.mkp.com/grids
58
The Grid
  • Dependable, consistent, pervasive access to high-end resources
  • Dependable: Can provide performance and functionality guarantees
  • Consistent: Uniform interfaces to a wide variety of resources
  • Pervasive: Ability to plug in from anywhere

59
Alliance Grid Technology Roadmap: It's not just flops or records/sec
60
Globus Approach
  • Focus on architecture issues
  • Propose set of core services as basic infrastructure
  • Use to construct high-level, domain-specific solutions
  • Design principles
  • Keep participation cost low
  • Enable local control
  • Support for adaptation

(Layer diagram: Applications / Diverse global services / Core Globus services / Local OS)
61
Globus Toolkit Core Services
  • Scheduling (Globus Resource Alloc. Manager)
  • Low-level scheduler API
  • Information (Metacomputing Directory Service)
  • Uniform access to structure/state information
  • Communications (Nexus)
  • Multimethod communication & QoS management
  • Security (Globus Security Infrastructure)
  • Single sign-on, key management
  • Health and status (Heartbeat monitor)
  • Remote file access (Global Access to Secondary
    Storage)

62
Summary of some beliefs
  • 1000x increase in PAP has not been accompanied by RAP, insight, infrastructure, and use.
  • What was the PACT?
  • The PC World Challenge is to provide commodity, clustered parallelism to commercial and technical communities
  • Only comes true if ISVs believe and act
  • Grid etc., using world-wide resources, including in situ PCs, is the new idea

63
PACT 98
  • http://www.research.microsoft.com/barc/gbell/pact.ppt

64
The end