Title: NRC Review Panel on High Performance Computing 11 March 1994 Gordon Bell
1 NRC Review Panel on High Performance Computing, 11 March 1994, Gordon Bell
2 Position
- Dual use: exploit parallelism with in situ nodes & networks. Leverage WS & mP industrial HW/SW/app infrastructure!
- No Teraflop before its time -- it's Moore's Law
- It is possible to help fund computing: heuristics from federal funding & use (50 computer systems and 30 years)
- Stop Duel Use, genetic engineering of State Computers
  - 10 years: nil payback, mono use, poor, & still to come
  - plan for apps porting to monos will also be ineffective -- apps must leverage, be cross-platform & self-sustaining
  - let "Challenges" choose apps, not mono use computers
  - "industry" offers better computers & these are jeopardized
  - users must be free to choose their computers, not funders
  - next generation State Computers "approach" industry
  - 10 Tflop ... why?
- Summary recommendations
3 Principal computing environments circa 1994 -- >4 networks to support mainframes, minis, UNIX servers, workstations & PCs
[Figure: network diagram of the computing "worlds" and their interconnects; labels recovered below]
- IBM proprietary mainframe world ('50s): mainframes, clusters, 3270 (& PC) terminals, ASCII & PC terminals, POTS net for switching terminals
- '70s mini (prop.) world & '90s UNIX mini world: minicomputers, clusters, ASCII & PC terminals, X terminals
- '80s UNIX distributed workstations & servers world: UNIX workstations, NFS servers, compute & dbase uni- & mP servers, UNIX multiprocessor servers operated as traditional minicomputers
- Late '80s LAN-PC world: Novell & NT servers, PCs (DOS, Windows, NT)
- Interconnects: wide-area inter-site network, data comm. worlds, Token-ring LANs and Ethernet LANs (gateways, bridges, routers, hubs, etc.)
- >4 interconnect & comm. stds:
  - POTS & 3270 terms.
  - WAN (comm. stds.)
  - LAN (2 stds.)
  - Clusters (prop.)
4 Computing environments circa 2000
[Figure: network diagram; labels recovered below]
- Legacy mainframe & minicomputer servers & terminals
- Wide-area global ATM network
- Local & global data comm world: ATM & Local Area Networks for terminal, PC, workstation, & servers; also 10 - 100 mb/s pt-to-pt Ethernet
- NT, Windows & UNIX person servers
- Centralized & departmental scalable uni- & mP servers (NT & UNIX)
- Multicomputers built from multiple simple servers
- NFS, database, compute, print, & communication servers
- TC=TV+PC home ... (CATV or ATM)
- Platforms: X86, PowerPC, Sparc, etc.
- Universal high speed data service using ATM or ??
5 Beyond Dual & Duel Use Technology: Parallelism can & must be free!
- HPCS, corporate R&D, and technical users must have the goal to design, install and support parallel environments using and leveraging every in situ workstation & multiprocessor server as part of the local ... national network.
- Parallelism is a capability that all computing environments can & must possess! -- not a feature to segment "mono use" computers
- Parallel applications become a way of computing utilizing existing, zero cost resources -- not subsidy for specialized ad hoc computers
- Apps follow pervasive computing environments
6 Computer genetic engineering & species selection has been ineffective
- Although Problem x Machine Scalability using SIMD for simulating some physical systems has been demonstrated, given extraordinary resources, the efficacy of larger problems to justify cost-effectiveness has not. Hamming: "The purpose of computing is insight, not numbers."
- The "demand side" Challenge users have the problems and should be drivers. ARPA's contractors should re-evaluate their research in light of driving needs.
- Federally funded "Challenge" apps porting should be to multiple platforms, including workstations & compatible multis that support // environments, to ensure portability and understand main line cost-effectiveness
- Continued "supply side" programs aimed at designing, purchasing, supporting, sponsoring, & porting of apps to specialized State Computers, including programs aimed at 10 Tflops, should be re-directed to networked computing.
- Users must be free to choose and buy any computer, including PCs & WSs, WS clusters, multiprocessor servers, supercomputers, mainframes, and even highly distributed, coarse grain, data parallel, MPP State computers.
7 Performance (t)
[Figure: performance vs. time; labels: the teraflops, Bell Prize]
8 We get no Teraflop before its time -- it's Moore's Law!
- Flops = f(t, $), not f(t); technology plans, e.g. BAA 94-08, ignore $s!
- All Flops are not equal (peak announced performance, PAP, or real app performance, RAP)
- Flops(CMOS, PAP) < $ x 1.6^(t-1992) x C, where C = 128 x 10^6 flops / $30,000
- Flops(RAP) = Flops(PAP) x 0.5 for real apps; 1/2 PAP is a great goal
- Flops(supers) = Flops(CMOS) x 0.1; improvement of supers is 15-40%/year; higher cost is f(need for profitability, lack of subsidies, volume, SRAM)
- '92-'94: Flops(PAP)/$ = 4K; Flops(supers)/$ = 500; Flops(vsp)/$ = 50 M (1.6G @ $25)
- Assumes primary & secondary memory size & costs scale with time
  - memory at $50/MB in 1992-1994 violates Moore's Law
  - disks at $1/MB in 1993; size must continue to increase at 60%/year
- When does a Teraflop arrive if only $30 million is spent on a super?
  - 1 Tflop CMOS PAP in 1996 (x7.8) with 1 GFlop nodes!!! or 1997 if RAP
  - 10 Tflop CMOS PAP will be reached in 2001 (x78) or 2002 if RAP
- How do you get a teraflop earlier?
  - A $60 - $240 million Ultracomputer reduces the time by 1.5 - 4.5 years.
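The arithmetic behind these dates can be checked with a short sketch. It assumes the reading of the slide used here: peak CMOS flops per dollar start at 128 Mflops per $30,000 in 1992 and grow by a factor of 1.6 per year, and real application performance (RAP) is half of peak (PAP). The function and constant names are illustrative only and do not come from the talk.

```python
import math

# Assumptions taken from the slide as read here (not an authoritative model):
BASE_YEAR = 1992
GROWTH_PER_YEAR = 1.6                      # CMOS flops per dollar grow ~60%/year
FLOPS_PER_DOLLAR_1992 = 128e6 / 30_000     # C: 128 Mflops peak per $30,000
RAP_FRACTION = 0.5                         # real app perf. as a fraction of peak


def peak_flops(budget_dollars, year):
    """Upper bound on peak (PAP) flops that a budget buys in a given year."""
    return (budget_dollars * FLOPS_PER_DOLLAR_1992
            * GROWTH_PER_YEAR ** (year - BASE_YEAR))


def arrival_year(target_flops, budget_dollars, rap=False):
    """Fractional year in which the budget first buys target_flops."""
    fraction = RAP_FRACTION if rap else 1.0
    flops_1992 = budget_dollars * FLOPS_PER_DOLLAR_1992 * fraction
    years_needed = math.log(target_flops / flops_1992) / math.log(GROWTH_PER_YEAR)
    return BASE_YEAR + years_needed


if __name__ == "__main__":
    for target, label in ((1e12, "1 Tflop"), (1e13, "10 Tflop")):
        pap = arrival_year(target, 30e6)
        rap = arrival_year(target, 30e6, rap=True)
        print(f"{label}: PAP in {int(pap)}, RAP in {int(rap)}")
    base = arrival_year(1e12, 30e6)
    for budget in (60e6, 240e6):
        saved = base - arrival_year(1e12, budget)
        print(f"${budget / 1e6:.0f}M buys the teraflop {saved:.1f} years sooner")
```

Under these assumptions a $30 million machine starts at 128 Gflops peak in 1992, so a teraflop needs a factor of about 7.8 and 10 Tflops a factor of about 78, which is where the 1996/1997 and 2001/2002 dates come from. Doubling the budget to $60 million saves about 1.5 years and $240 million about 4.4 years, in line with the slide's "1.5 - 4.5 years".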
9 Funding Heuristics (50 computers & 30 years of hindsight)
- 1. Demand side works, i.e., "we need this product/technology for x". Supply side doesn't work! "Field of Dreams": build it and they will come.
- 2. Direct funding of university research resulting in technology and product prototypes that are carried over to start up a company is the most effective -- provided the right person & team are backed and have a transfer avenue.
  - a. Forest Baskett > Stanford to fund various projects (SGI, SUN, MIPS)
  - b. Transfer to large companies has not been effective
  - c. Government labs... rare, an accident if something emerges
- 3. A demanding & tolerant customer or user who "buys" products works best to influence and evolve products (e.g., CDC, Cray, DEC, IBM, SGI, SUN)
  - a. DOE labs have been effective buyers and influencers ("Fernbach policy"); unclear if labs are effective product or apps or process developers
  - b. Universities were effective at influencing computing in timesharing, graphics, workstations, AI workstations, etc.
  - c. ARPA, per se, and its contractors have not demonstrated a need for flops.
  - d. Universities have failed ARPA in defining work that demands HPCS -- hence are unlikely to be very helpful as users in the trek to the teraflop.
- 4. Direct funding of "large scale projects" is risky in outcome, long-term, training, and other effects. ARPAnet established an industry after it escaped BBN!
10 Funding Heuristics-2
- 5. Funding product development, targeted purchases, and other subsidies to establish "State Companies" in a vibrant and overcrowded market is wasteful, likely to be wrong, and likely to impede computer development (e.g. by having to feed an overpopulated industry). Furthermore, it is likely to have a deleterious effect on a healthy industry (e.g. supercomputers).
  - A significantly smaller universe of computing environments is needed. Cray & IBM are given; SGI is probably the most profitable technical; HP/Convex are likely to be a contender; others (e.g., DEC) are trying. No state co (Intel, TMC, Tera) is likely to be profitable & hence self-sustaining.
- 6. University-company collaboration is a new area of government R&D. So far it hasn't worked, nor is it likely to, unless the company invests. Appears to be a way to help company fund marginal people and projects.
- 7. CRADAs, or co-operative research and development agreements, are very closely allied to direct product development and are equally likely to be ineffective.
- 8. Direct subsidy of software apps or the porting of apps to one platform, e.g., EMI analysis, is a way to keep marginal computers afloat. If government funds apps, they must be ported cross-platform!
- 9. Encourage the use of computers across the board, but discourage designs from those who have not used or built a successful computer.
11 Scalability: The Platform of HPCS & why continued funding is unnecessary
- Mono use aka MPPs have been, are, and will be doomed
- The law of scalability
- Four scalabilities: machine, problem x machine, generation (t), & now spatial
- How do flops, memory size, efficiency & time vary with problem size? Does insight increase with problem size?
- What's the nature of problems & work for monos?
- What about the mapping of problems onto monos?
- What about the economics of software to support monos?
- What about all the competitive machines? e.g. workstations, workstation clusters, supers, scalable multis, attached P?
12 Special, mono-use MPPs are doomed... no matter how much fedspend!
- Special because it has non-standard nodes & networks -- with no apps. Having not evolved to become mainline -- events have overtaken them.
- It's special purpose if it's only in Dongarra's Table 3. Flop rate, execution time, and memory size vs problem size show limited applicability to very large scale problems that must be scaled to cover the inherent, high overhead.
- Conjecture: a properly used supercomputer will provide greater insight and utility because of the apps and generality -- running more, smaller sized problems with a plan produces more insight
- The problem domain is limited & now they have to compete with:
  - supers -- do scalars, fine grain, and work and have apps
  - workstations -- do very long grain, are in situ and have apps
  - workstation clusters -- have identical characteristics and have apps
  - low priced ($2 million) multis -- are superior, i.e., shorter grain, and have apps
  - scalable multiprocessors -- formed from multis, are in design stage
- Mono useful (>>//) -- hence, are illegal because they are not dual use. Duel use -- only useful to keep a high budget intact, e.g., 10 TF
13 The Law of Massive Parallelism is based on application scale
- There exists a problem that can be made sufficiently large such that any network of computers can run efficiently given enough memory, searching, & work -- but this problem may be unrelated to any other problem.
- A ... any parallel problem can be scaled to run on an arbitrary network of computers, given enough memory and time
- Challenge to theoreticians: How well will an algorithm run?
- Challenge for software: Can a package be scalable & portable?
- Challenge to users: Do larger scale, faster, longer run times increase problem insight and not just flops?
- Challenge to HPCC: Is the cost justified? If so, let users do it!
14 Scalabilities
- Size scalable computers are designed from a few components, with no bottleneck component.
- Generation scalable computers can be implemented with the next generation technology with no rewrite/recompile
- Problem x machine scalability -- ability of a problem, algorithm, or program to exist at a range of sizes so that it can be run efficiently on a given, scalable computer.
  - Although large scale problems allow high flops, large probs running longer may not produce more insight.
- Spatial scalability -- ability of a computer to be scaled over a large physical space to use in situ resources.
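Problem x machine scalability can be illustrated with a toy cost model. Everything in the sketch below is an assumption made for illustration and does not come from the talk: a dense solve of order n costs about 2n^3/3 flops split evenly over p nodes, each parallel run pays a communication cost proportional to n^2 words, and the constants `flop_time` and `word_time` are arbitrary.

```python
def parallel_time(n, p, flop_time=1.0, word_time=50.0):
    """Toy run time for a dense solve of order n on p nodes (assumed model).

    Compute: ~2n^3/3 flops divided evenly over p nodes.
    Communication: ~n^2 words, paid only when p > 1.
    """
    compute = (2.0 * n ** 3 / 3.0) / p * flop_time
    communicate = n ** 2 * word_time if p > 1 else 0.0
    return compute + communicate


def efficiency(n, p, flop_time=1.0, word_time=50.0):
    """Speedup over one node divided by p; 1.0 means perfect scaling."""
    t1 = parallel_time(n, 1, flop_time, word_time)
    tp = parallel_time(n, p, flop_time, word_time)
    return t1 / (p * tp)


if __name__ == "__main__":
    p = 100
    for n in (1_000, 10_000, 100_000):
        print(f"n={n:>7}: efficiency={efficiency(n, p):.2f}, "
              f"time={parallel_time(n, p):.3g}")
```

With these made-up constants, 100 nodes run at roughly 12% efficiency for n = 1,000 and roughly 93% for n = 100,000, but the larger run takes over 100,000 times longer. That is the trade the slide points at: a problem can always be scaled until the machine looks efficient, which says nothing about whether the bigger, longer run yields more insight.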
15 Linpack rate in Gflops vs Matrix Order
[Figure: Linpack rate in Gflops vs. matrix order]
16 Linpack Solution time vs Matrix Order
[Figure: Linpack solution time vs. matrix order]
17 GB's Estimate of Parallelism in Engineering & Scientific Applications
[Figure: log (# of apps) vs. granularity & degree of coupling (comp./comm.); labels recovered below]
- scalar: 60%
- vector: 15%
- mP (<8) vector: 5%
- >>//: 5%
- embarrassingly or perfectly parallel: 15%
- Machine classes marked: supers, scalable multiprocessors, massive mCs & WSs, WSs
- App populations marked: dusty decks for supers; new or scaled-up apps
18 MPPs are only for unique, very large scale, data parallel apps
[Figure: machine price ($M: 100, 10, 1, .1, .01) vs. application characterization (scalar, vector, vector mP, data //, emb. //, gp work, viz, apps); regions marked s, >>// (mono use), mP, and WS]
19 Applicability of various technical computer alternatives
Domain        PC/WS   Multi servr   SC & Mfrm   >>//   WS Clusters
scalar        1       1             2           na     1
vector        2       2             1           3      2
vect. mP      na      2             1           3      na
data //       na      1             2           1      1
ep & inf. //  1       2             3           2      1
gp wrkld      3       1             1           na     2
visualiz'n    1       na            na          na     1
apps          1       1             1           na     from WS
- Current micros are weak, but improving rapidly such that subsequent >>//s that use them will have no advantage for node vectorization
20 Performance using distributed computers depends on problem & machine granularity
- Berkeley's LogP model characterizes granularity & needs to be understood, measured, and used
- Three parameters are given in terms of processing ops:
  - l = latency -- delay time to communicate between apps
  - o = overhead -- time lost transmitting messages
  - g = gap -- 1 / message-passing rate (bandwidth) -- time between messages
- p = number of processors
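A minimal sketch of how the four LogP parameters combine, using the usual textbook reading of the model: one message costs a send overhead, the network latency, and a receive overhead, and back-to-back messages from one sender are spaced by the larger of the gap and the overhead. The class name, the `efficiency` helper, and its no-overlap assumption are illustrative choices, not part of the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogP:
    """LogP parameters, all expressed in processing ops as on the slide."""
    L: float  # latency: delay to communicate between nodes
    o: float  # overhead: processor time lost per send or receive
    g: float  # gap: minimum interval between messages at one node
    P: int    # number of processors

    def one_message(self):
        """End-to-end cost of one small message: send + flight + receive."""
        return self.o + self.L + self.o

    def stream(self, n):
        """Cost of n back-to-back messages from one sender to one receiver."""
        if n < 1:
            raise ValueError("n must be at least 1")
        return (n - 1) * max(self.g, self.o) + self.one_message()


def efficiency(model, compute_ops_per_message):
    """Fraction of time spent computing if every message follows
    compute_ops_per_message ops of work and nothing overlaps (assumption)."""
    comm = model.one_message()
    return compute_ops_per_message / (compute_ops_per_message + comm)


if __name__ == "__main__":
    net = LogP(L=6, o=2, g=4, P=8)  # hypothetical values for illustration
    print(net.one_message(), net.stream(3), efficiency(net, 90))
```

The `efficiency` helper is the granularity point in miniature: the more ops of useful work a problem does per message relative to L + 2o, the closer a distributed machine gets to full utilization.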
21 Granularity Nomograph
[Figure: granularity nomograph]
23 Economics of Packaged Software
Platform               Cost       Leverage   # copies
MPP                    >100K      1          1-10
Minis, mainframe (*)   10-100K    10-100     1000s
Workstation            1-100K     1-10K      1-100K
PC                     25-500     50K-1M     1-10M
(*) also, evolving high performance multiprocessor servers
24 Chuck Seitz comments on multicomputers
- I believe that the commercial, medium grained
multicomputers aimed at ultra-supercomputer
performance have adopted a relatively
unprofitable scaling track, and are doomed to
extinction. ... they may as Gordon Bell believes
be displaced over the next several years by
shared memory multiprocessors. ... For loosely
coupled computations at which they excel,
ultra-super multicomputers will, in any case, be
more economically implemented as networks of
high-performance workstations connected by
high-bandwidth, local area networks...
25 Convergence to a single architecture with a single address space that uses a distributed, shared memory
- limited (<20) scalability multiprocessors >> scalable multiprocessors
- workstations with 1-4 processors >> workstation clusters & scalable multiprocessors
- workstation clusters >> scalable multiprocessors
- State Computers built as message passing multicomputers >> scalable multiprocessors
26 Convergence to one architecture
mPs continue to be the main line
27 Re-engineering HPCS
- Genetic engineering of computers has not produced a healthy strain that lives more than one, 3 year computer generation. Hence no app base can form.
  - No inter-generational MPPs exist with compatible networks & nodes.
  - All parts of an architecture must scale from generation to generation!
  - An architecture must be designed for at least three, 3 year generations!
- High price to support a DARPA U. to learn computer design -- the market is only $200 million and R&D is billions -- competition works far better
- Inevitable movement of standard networks and nodes cannot, or need not, be accelerated; these best evolve by a normal market mechanism driven by users
- Dual use of Networks & Nodes is the path to widescale parallelism, not weird computers
  - Networking is free via ATM
  - Nodes are free via in situ workstations
  - Apps follow pervasive computing environments
- Applicability was small and getting smaller very fast with many experienced computer companies entering the market with fine products, e.g. Convex/HP, Cray, DEC, IBM, SGI & SUN, that are leveraging their R&D, apps, apps, & apps
- Japan has a strong supercomputer industry. The more we jeopardize ours by mandating use of weird machines that take away from use, the weaker it becomes.
- MPP won, mainstream vendors have adopted multiple CMOS. Stop funding!
- Environments & apps are needed, but are unlikely because the market is small
28 Recommendations to HPCS
- Goal: By 2000, massive parallelism must exist as a by-product that leverages a widescale national network & workstation/multi HW/SW nodes
- Dual use, not duel use, of products and technology, or the principle of "elegance" -- one part serves more than one function
  - network companies supply networks
  - node suppliers use ordinary workstations/servers with existing apps; will leverage 30 billion x 106 R&D
- Fund high speed, low latency networks for a ubiquitous service as the base of all forms of interconnections from WANs to supercomputers (in addition, some special networks will exist for small grain probs)
- Observe heuristics in future federal program funding scenarios ... eliminate direct or indirect product development and mono-use computers. Fund Challenges who in turn fund purchase, not product development
- Funding or purchase of apps porting must be driven by Challenges, but build on binary compatible workstation/server apps to leverage nodes & be cross-platform based to benefit multiple vendors & have cross-platform use
- Review effectiveness of State Computers, e.g., need, economics, efficacy. Each committee member might visit 2-5 sites using a >>// computer
- Review // program environments & the efficacy to produce & support apps
- Eliminate all forms of State Computers & recommend a balanced HPCS program: nodes & networks, based on industrial infrastructure
  - stop funding the development of mono computers, including the 10 Tflop
  - it must be acceptable & encouraged to buy any computer for any contract
29 Gratis advice for HPCC & BS
- D. Bailey warns that scientists have almost lost credibility....
- Focus on Gigabit NREN with low overhead connections that will enable multicomputers as a by-product
- Provide many small, scalable computers vs large, centralized
- Encourage (revert to) & support not so grand challenges
- Grand Challenges (GCs) need explicit goals & plans -- disciplines fund & manage (demand side)... HPCC will not
- Fund balanced machines/efforts; stop starting Viet Nams
- Drop the funding & directed purchase of state computers
- Revert to university research -> company & product development
- Review the HPCC & GCs program's output ...
- HPCC & BS: High Performance Cash Conscriptor & Big Spenders
30 Disclaimer
- This talk may appear inflammatory ... i.e. the speaker may have appeared "to flame".
- It is not the speaker's intent to make ad hominem attacks on people, organizations, countries, or computers ... it just may appear that way.
31 Scalability: The Platform of HPCS
- The law of scalability
- Three kinds: machine, problem x machine, & generation (t)
- How do flops, memory size, efficiency & time vary with problem size?
- What's the nature of problems & work for the computers?
- What about the mapping of problems onto the machines?