NRC Review Panel on High Performance Computing -- 11 March 1994 -- Gordon Bell

Transcript of the presentation slides:

1
NRC Review Panel on High Performance Computing
11 March 1994
Gordon Bell
2
Position
  • Dual use: exploit parallelism with in situ nodes & networks. Leverage the WS & mP industrial HW/SW/app infrastructure!
  • No Teraflop before its time -- it's Moore's Law
  • It is possible to help fund computing: heuristics from federal funding use (50 computer systems and 30 years)
  • Stop "Duel Use" genetic engineering of State Computers:
    10+ years: nil pay back, mono use, poor, still to come
    the plan for apps porting to monos will also be ineffective -- apps must leverage, be cross-platform & self-sustaining
    let "Challenges" choose apps, not mono-use computers
    "industry" offers better computers & these are jeopardized
    users must be free to choose their computers, not funders
    next generation State Computers "approach" industry
    10 Tflop ... why?
  • Summary recommendations

3
Principal computing environments circa 1994: >4 networks to support mainframes, minis, UNIX servers, workstations & PCs
[Diagram: four coexisting worlds tied together by a wide-area inter-site network --
  the '50s IBM proprietary mainframe world (mainframes, clusters, a POTS net for switching terminals, 3270 (PC) and ASCII PC terminals);
  the '70s mini (proprietary) world / '90s UNIX mini world (minicomputers, clusters, ASCII PC and X terminals, data comm. worlds);
  the '80s Unix distributed workstations & servers world (UNIX multiprocessor servers operated as traditional minicomputers, NFS servers, compute & dbase uni- & mP servers, UNIX workstations, Ethernet and Token-ring LANs with gateways, bridges, routers, hubs, etc.);
  the late '80s LAN-PC world (Novell & NT servers; PCs running DOS, Windows, NT).]
  • >4 interconnect & comm. stds:
  • POTS & 3270 terms.
  • WAN (comm. stds.)
  • LAN (2 stds.)
  • Clusters (prop.)
4
Computing Environments circa 2000
[Diagram: a local & global data comm. world built on a wide-area & global ATM network and ATM Local Area Networks (also 10-100 Mb/s point-to-point Ethernet) for terminals, PCs, workstations, and servers, connecting --
  legacy mainframe & minicomputer servers & terminals;
  NT, Windows & UNIX person servers;
  centralized & departmental uni- & mP servers (UNIX & NT);
  centralized & departmental scalable servers, i.e. multicomputers built from multiple simple servers;
  NFS, database, compute, print, and communication servers;
  TC=TV=PC at home ... (via CATV or ATM).
Platforms: x86, PowerPC, SPARC, etc. Universal high-speed data service using ATM or ??]

5
Beyond Dual (Duel) Use Technology: Parallelism can & must be free!
  • HPCS, corporate R&D, and technical users must have the goal to design, install, and support parallel environments using and leveraging every in situ workstation & multiprocessor server as part of the local ... national network.
  • Parallelism is a capability that all computing environments can & must possess! -- not a feature to segment "mono use" computers
  • Parallel applications become a way of computing utilizing existing, zero-cost resources -- not a subsidy for specialized, ad hoc computers
  • Apps follow pervasive computing environments

6
Computer genetic engineering & species selection has been ineffective
  • Although Problem x Machine Scalability using SIMD for simulating some physical systems has been demonstrated, given extraordinary resources, the efficacy of larger problems to justify cost-effectiveness has not. Hamming: "The purpose of computing is insight, not numbers."
  • The "demand side" Challenge users have the problems and should be drivers. ARPA's contractors should re-evaluate their research in light of driving needs.
  • Federally funded "Challenge" apps porting should be to multiple platforms, including workstation-compatible multis that support // environments, to ensure portability and understand mainline cost-effectiveness.
  • Continued "supply side" programs aimed at designing, purchasing, supporting, sponsoring, and porting of apps to specialized State Computers, including programs aimed at 10 Tflops, should be re-directed to networked computing.
  • Users must be free to choose and buy any computer, including PCs & WSs, WS clusters, multiprocessor servers, supercomputers, mainframes, and even highly distributed, coarse grain, data parallel, MPP State Computers.

7
[Chart: Performance (t) -- the trajectory toward the teraflops and the Bell Prize.]
8
We get no Teraflop before its time -- it's Moore's Law!
  • Flops = f(t, $), not f(t); technology plans, e.g. BAA 94-08, ignore the $s!
  • All Flops are not equal (peak announced performance -- PAP -- or real app performance -- RAP)
  • Flops_CMOS(PAP) < C x 1.6^(t - 1992), where C = 128 x 10^6 flops per $30,000
  • Flops_RAP = Flops_PAP x 0.5; for real apps, 1/2 of PAP is a great goal
  • Flops_supers = Flops_CMOS x 0.1; improvement of supers is 15-40%/year; higher cost is f(need for profitability, lack of subsidies, volume, SRAM)
  • '92-'94: Flops_PAP/$ = 4K; Flops_supers/$ = 500; Flops_vsp/$ = 50M (1.6G @ $25)
  • Assumes primary & secondary memory size & costs scale with time: memory at $50/MB in 1992-1994 violates Moore's Law; disks at $1/MB in 1993; size must continue to increase at 60%/year
  • When does a Teraflop arrive if only $30 million is spent on a super? (see the sketch after this list)
  • 1 Tflop_CMOS PAP in 1996 (x7.8) with 1 GFlop nodes!!! or 1997 if RAP
  • 10 Tflop_CMOS PAP will be reached in 2001 (x78), or 2002 if RAP
  • How do you get a teraflop earlier?
  • A $60-240 million Ultracomputer reduces the time by 1.5-4.5 years.
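
The dates above follow directly from the growth formula. Below is a minimal sketch, assuming the reconstructed relation Flops_CMOS(PAP) = budget x C x 1.6^(t - 1992) with C = 128 x 10^6 flops per $30,000 and RAP = PAP/2; the constants come from this slide, but the code itself is only an illustration, not Bell's.

```python
import math

FLOPS_PER_DOLLAR_1992 = 128e6 / 30_000   # ~4.3 Kflops/$ of peak (PAP) in 1992
GROWTH_PER_YEAR = 1.6                    # assumed CMOS improvement per year

def year_target_reached(target_flops: float, budget: float, rap: bool = False) -> float:
    """Year in which a machine bought for `budget` dollars reaches `target_flops`.
    RAP (real application performance) is taken as half of PAP, per the slide."""
    base_flops = budget * FLOPS_PER_DOLLAR_1992      # PAP buyable in 1992
    needed = target_flops * (2 if rap else 1)        # a RAP target needs 2x the PAP
    return 1992 + math.log(needed / base_flops, GROWTH_PER_YEAR)

print(math.floor(year_target_reached(1e12, 30e6)))             # 1996: 1 Tflop PAP, $30M super
print(math.floor(year_target_reached(1e12, 30e6, rap=True)))   # 1997: 1 Tflop RAP
print(math.floor(year_target_reached(10e12, 30e6)))            # 2001: 10 Tflops PAP
# A $240M "Ultracomputer" buys 8x the flops of a $30M machine, i.e. it arrives
# log_1.6(8) ~= 4.4 years earlier -- matching the last bullet's 1.5-4.5 year range.
```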

9
Funding Heuristics (50 computers & 30 years of hindsight)
  • 1. Demand side works, i.e., "we need this product/technology for x". Supply side doesn't work! "Field of Dreams": build it and they will come.
  • 2. Direct funding of university research resulting in technology and product prototypes that is carried over to start up a company is the most effective -- provided the right person & team are backed & have a transfer avenue.
    a. Forest Baskett -> Stanford to fund various projects (SGI, SUN, MIPS)
    b. Transfer to large companies has not been effective
    c. Government labs ... rare, an accident if something emerges
  • 3. A demanding & tolerant customer or user who "buys" products works best to influence and evolve products (e.g., CDC, Cray, DEC, IBM, SGI, SUN).
    a. DOE labs have been effective buyers and influencers ("Fernbach policy"); unclear if labs are effective product or apps or process developers
    b. Universities were effective at influencing computing in timesharing, graphics, workstations, AI workstations, etc.
    c. ARPA, per se, and its contractors have not demonstrated a need for flops.
    d. Universities have failed ARPA in defining work that demands HPCS -- hence are unlikely to be very helpful as users in the trek to the teraflop.
  • 4. Direct funding of "large scale projects" is risky in outcome, long-term, training, and other effects. ARPAnet established an industry after it escaped BBN!

10
Funding Heuristics-2
  • 5. Funding product development, targeted purchases, and other subsidies to establish "State Companies" in a vibrant and overcrowded market is wasteful, likely to be wrong, and likely to impede computer development (e.g. by having to feed an overpopulated industry). Furthermore, it is likely to have a deleterious effect on a healthy industry (e.g. supercomputers).
    A significantly smaller universe of computing environments is needed. Cray & IBM are given; SGI is probably the most profitable technical vendor; HP/Convex are likely to be a contender; others (e.g., DEC) are trying. No State Co. (Intel, TMC, Tera) is likely to be profitable & hence self-sustaining.
  • 6. University-company collaboration is a new area of government R&D. So far it hasn't worked, nor is it likely to, unless the company invests. It appears to be a way to help a company fund marginal people and projects.
  • 7. CRADAs, or co-operative research and development agreements, are very closely allied to direct product development and are equally likely to be ineffective.
  • 8. Direct subsidy of software apps, or the porting of apps to one platform (e.g., EMI analysis), is a way to keep marginal computers afloat. If government funds apps, they must be ported cross-platform!
  • 9. Encourage the use of computers across the board, but discourage designs from those who have not used or built a successful computer.

11
Scalability: the platform of HPCS & why continued funding is unnecessary
  • Mono use, aka MPPs, have been, are, and will be doomed
  • The law of scalability
  • Four scalabilities: machine, problem x machine, generation (t), and now spatial
  • How do flops, memory size, efficiency & time vary with problem size? Does insight increase with problem size?
  • What's the nature of problems & work for monos?
  • What about the mapping of problems onto monos?
  • What about the economics of software to support monos?
  • What about all the competitive machines? e.g. workstations, workstation clusters, supers, scalable multis, attached processors?

12
Special, mono-use MPPs are doomed ... no matter how much the feds spend!
  • Special because they have non-standard nodes & networks -- with no apps. Having not evolved to become mainline, events have overtaken them.
  • It's special purpose if it's only in Dongarra's Table 3. Flop rate, execution time, and memory size vs. problem size show limited applicability: very large scale problems must be scaled up to cover the inherent, high overhead.
  • Conjecture: a properly used supercomputer will provide greater insight and utility because of the apps and generality -- running more, smaller-sized problems with a plan produces more insight.
  • The problem domain is limited & now they have to compete with:
    supers -- do scalars, fine grain, and work, and have apps
    workstations -- do very long grain, are in situ, and have apps
    workstation clusters -- have identical characteristics and have apps
    low-priced ($2 million) multis -- are superior, i.e., shorter grain, and have apps
    scalable multiprocessors -- formed from multis, are in the design stage
  • Mono useful (>>//) -- hence, are illegal because they are not dual use. Duel use -- only useful to keep a high budget intact, e.g., 10 TF

13
The Law of Massive Parallelism is based on application scale
  • There exists a problem that can be made sufficiently large such that any network of computers can run it efficiently, given enough memory, searching, & work -- but this problem may be unrelated to any other problem. (A scaled-efficiency sketch follows this list.)
  • A ... any parallel problem can be scaled to run on an arbitrary network of computers, given enough memory and time.
  • Challenge to theoreticians: How well will an algorithm run?
  • Challenge for software: Can packages be scalable & portable?
  • Challenge to users: Do larger scale, faster, longer run times increase problem insight, and not just flops?
  • Challenge to HPCC: Is the cost justified? If so, let users do it!
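
To make the first bullet concrete, here is a minimal sketch (my illustration, not from the slides, with made-up cost ratios): for a 3-D grid problem split across p nodes, per-step computation grows as n^3/p while the halo exchanged with neighbors grows only as (n^3/p)^(2/3), so parallel efficiency can always be pushed toward 1 simply by making n -- and hence memory and run time -- large enough.

```python
def efficiency(n: int, p: int, t_calc: float = 1.0, t_comm: float = 10.0) -> float:
    """Parallel efficiency for an n x n x n grid on p nodes.
    t_calc: cost per grid-point update; t_comm: cost per halo point exchanged
    (assumed 10x the compute cost, to mimic a slow network)."""
    work_per_node = n**3 / p                      # interior points each node updates
    halo_per_node = 6 * (n**3 / p) ** (2 / 3)     # surface points exchanged per step
    t_parallel = work_per_node * t_calc + halo_per_node * t_comm
    t_serial = n**3 * t_calc
    return t_serial / (p * t_parallel)

for n in (64, 256, 1024, 4096):
    print(f"n = {n:5d}  efficiency on 1024 nodes = {efficiency(n, p=1024):.2f}")
# Efficiency climbs from ~0.10 toward ~0.87 as n grows -- but only because memory
# and run time grow with it, which is exactly the challenge posed to users above.
```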

14
Scalabilities
  • Size-scalable computers are designed from a few components, with no bottleneck component.
  • Generation-scalable computers can be implemented with the next generation of technology with no rewrite/recompile.
  • Problem x machine scalability -- the ability of a problem, algorithm, or program to exist at a range of sizes so that it can be run efficiently on a given, scalable computer.
  • Although large scale problems allow high flops, large problems running longer may not produce more insight.
  • Spatial scalability -- the ability of a computer to be scaled over a large physical space to use in situ resources.

15
[Chart: Linpack rate in Gflops vs. matrix order.]
16
[Chart: Linpack solution time vs. matrix order.]
17
GB's estimate of parallelism in engineering & scientific applications
[Chart: log(# of apps) vs. granularity & degree of coupling (comp./comm.), spanning dusty decks for supers through new or scaled-up apps, with machine ranges marked for supers, massive mCs, WSs, and scalable multiprocessors. Estimated mix of applications:
  scalar 60%
  vector 15%
  mP (<8) vector 5%
  >>// 5%
  embarrassingly or perfectly parallel 15%]
18
MPPs are only for unique, very large scale, data parallel apps
[Chart: application characterization (scalar, vector, vector mP, data //, embarrassingly //, general-purpose work, visualization, apps) plotted against a price scale ($M: 100 ... 0.01), with supers (s), >>//, mP, and WS machines marked; the >>// machines occupy only the "mono use" very large scale, data-parallel corner.]
19
Applicability of various technical computer alternatives

  Domain         PC/WS   Multi servr   SC & Mfrm   >>//   WS Clusters
  scalar           1         1             2        na        1
  vector           2         2             1         3        2
  vect. mP        na         2             1         3       na
  data //         na         1             2         1        1
  ep & inf. //     1         2             3         2        1
  gp wrkld         3         1             1        na        2
  visualiz'n       1        na            na        na        1
  apps             1         1             1        na     from WS

  • Current micros are weak, but improving rapidly, such that subsequent >>//s that use them will have no advantage for node vectorization.

20
Performance using distributed computers depends on problem & machine granularity
  • Berkeley's LogP model characterizes granularity & needs to be understood, measured, and used (see the sketch after this list)
  • Three of the parameters are given in terms of processing ops:
  • l = latency -- delay time to communicate between apps
  • o = overhead -- time lost transmitting messages
  • g = gap -- 1 / message-passing rate (bandwidth), i.e. the minimum time between messages
  • p = number of processors
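
A rough sketch of how these parameters determine whether a distributed machine pays off; the per-message cost model and the parameter values below are my own illustrative assumptions, not numbers from the talk.

```python
def compute_fraction(grain_ops: float, l: float, o: float, g: float) -> float:
    """Fraction of time a node spends computing when every message carrying
    `grain_ops` worth of useful work costs send+receive overhead plus the
    larger of the gap and the latency (a crude LogP-style cost model)."""
    per_message_cost = 2 * o + max(g, l)
    return grain_ops / (grain_ops + per_message_cost)

# Workstation cluster over a LAN: large l and o, expressed in processor ops.
print(compute_fraction(grain_ops=1_000,     l=50_000, o=5_000, g=1_000))  # ~0.02
print(compute_fraction(grain_ops=1_000_000, l=50_000, o=5_000, g=1_000))  # ~0.94
# The grain (computation per message) must dwarf l, o, and g before a network of
# computers is efficient -- the slide's point about problem & machine granularity.
```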

21
Granularity Nomograph
[Chart: nomograph (figure not recoverable from the transcript).]
22
[Chart (figure not recoverable from the transcript).]
23
Economics of Packaged Software

  Platform            Cost        Leverage   # of copies
  MPP                 >$100K      1          1-10 copies
  Minis, mainframes   $10-100K    10-100     1000s of copies (also evolving high-performance multiprocessor servers)
  Workstation         $1-100K     1-10K      1-100K copies
  PC                  $25-500     50K-1M     1-10M copies

  (A toy amortization sketch follows below.)
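
The cost column tracks how a fixed development or porting cost must be amortized over each platform's installed base. A toy sketch, with an assumed (not sourced) $5M development cost:

```python
# Spreading one assumed development/porting cost over each platform's installed
# base shows why packaged software for small-volume MPPs is so expensive.
DEV_COST = 5_000_000  # hypothetical cost to build or port one packaged app

installed_base = {
    "MPP": 10,
    "Minis, mainframes": 1_000,
    "Workstation": 100_000,
    "PC": 10_000_000,
}

for platform, copies in installed_base.items():
    print(f"{platform:18s} ${DEV_COST / copies:>12,.2f} per copy just to recoup development")
```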

24
Chuck Seitz comments on multicomputers
  • "I believe that the commercial, medium grained multicomputers aimed at ultra-supercomputer performance have adopted a relatively unprofitable scaling track, and are doomed to extinction. ... they may, as Gordon Bell believes, be displaced over the next several years by shared memory multiprocessors. ... For loosely coupled computations at which they excel, ultra-super multicomputers will, in any case, be more economically implemented as networks of high-performance workstations connected by high-bandwidth, local area networks..."

25
Convergence to a single architecture with a single address space that uses a distributed, shared memory
  • limited (<20) scalability multiprocessors -> scalable multiprocessors
  • workstations with 1-4 processors -> workstation clusters & scalable multiprocessors
  • workstation clusters -> scalable multiprocessors
  • State Computers built as message-passing multicomputers -> scalable multiprocessors

26
Convergence to one architecture
mPs continue to be the main line
27
Re-engineering HPCS
  • Genetic engineering of computers has not produced a healthy strain that lives more than one 3-year computer generation. Hence no app base can form. No inter-generational MPPs exist with compatible networks & nodes. All parts of an architecture must scale from generation to generation! An architecture must be designed for at least three 3-year generations!
  • High price to support a DARPA U. to learn computer design -- the market is only $200 million and the R&D is billions -- competition works far better
  • The inevitable movement of standard networks and nodes cannot, and need not, be accelerated; these best evolve by a normal market mechanism driven by users
  • Dual use of Networks & Nodes is the path to widescale parallelism, not weird computers
  • Networking is free via ATM
  • Nodes are free via in situ workstations
  • Apps follow pervasive computing environments
  • Applicability was small and is getting smaller very fast, with many experienced computer companies entering the market with fine products, e.g. Convex/HP, Cray, DEC, IBM, SGI & SUN, that are leveraging their R&D, apps, apps, apps
  • Japan has a strong supercomputer industry. The more we jeopardize ours by mandating use of weird machines that take away from use, the weaker it becomes.
  • MPP won; mainstream vendors have adopted multiple CMOS. Stop funding!
  • Environments & apps are needed, but are unlikely because the market is small

28
Recommendations to HPCS
  • Goal: By 2000, massive parallelism must exist as a by-product that leverages a widescale national network & workstation/multi HW/SW nodes
  • Dual use, not duel use, of products and technology, or the principle of "elegance" -- one part serves more than one function: network companies supply networks; node suppliers use ordinary workstations/servers with existing apps & will leverage $30 billion of R&D
  • Fund high speed, low latency networks for a ubiquitous service as the base of all forms of interconnection, from WANs to supercomputers (in addition, some special networks will exist for small grain problems)
  • Observe the heuristics in future federal program funding scenarios ... eliminate direct or indirect product development and mono-use computers. Fund Challenges, who in turn fund & purchase, not product development
  • Funding or purchase of apps porting must be driven by Challenges, but build on binary-compatible workstation/server apps to leverage nodes, and be cross-platform based to benefit multiple vendors & have cross-platform use
  • Review the effectiveness of State Computers, e.g., need, economics, efficacy. Each committee member might visit 2-5 sites using a >>// computer
  • Review // program environments & the efficacy to produce & support apps
  • Eliminate all forms of State Computers & recommend a balanced HPCS program (nodes & networks) based on industrial infrastructure; stop funding the development of mono computers, including the 10 Tflop; it must be acceptable & encouraged to buy any computer for any contract

29
Gratis advice for HPCC & BS
  • D. Bailey warns that scientists have almost lost credibility ...
  • Focus on a Gigabit NREN with low overhead connections that will enable multicomputers as a by-product
  • Provide many small, scalable computers vs. large, centralized ones
  • Encourage (revert to) & support not-so-grand challenges
  • Grand Challenges (GCs) need explicit goals & plans -- disciplines fund & manage (demand side) ... HPCC will not
  • Fund balanced machines/efforts & stop starting Viet Nams
  • Drop the funding & directed purchase of State Computers
  • Revert to university research -> company product development
  • Review the HPCC & GCs program's output ...
  • High Performance Cash Conscriptor & Big Spenders

30
Disclaimer
  • This talk may appear inflammatory ... i.e. the
    speaker may have appeared "to flame".
  • It is not the speaker's intent to make ad hominem
    attacks on people, organizations, countries, or
    computers ... it just may appear that way.

31
Scalability: The Platform of HPCS
  • The law of scalability
  • Three kinds: machine, problem x machine, generation (t)
  • How do flops, memory size, efficiency & time vary with problem size?
  • What's the nature of problems & work for the computers?
  • What about the mapping of problems onto the machines?