The Computational Grid: Aggregating Performance and Enhanced Capability from Federated Resources

Transcript and Presenter's Notes

1
  • The Computational Grid: Aggregating Performance
    and Enhanced Capability from Federated Resources
  • Rich Wolski
  • University of California, Santa Barbara

2
The Goal
  • To provide a seamless, ubiquitous, and
    high-performance computing environment using a
    heterogeneous collection of networked computers.
  • But there won't be one big, uniform system
  • Resources must be able to come and go dynamically
  • The base system software supported by each
    resource must remain inviolate
  • Multiple languages and programming paradigms must
    be supported
  • The environment must be secure
  • Programs must run fast
  • For distributed computing: the Holy Grail

3
For Example: Rich's Computational World
umich.edu
wisc.edu
ameslab.gov
osc.edu
harvard.edu
wellesley.edu
anl.gov
ncsa.edu
ksu.edu
uiuc.edu
lbl.gov
indiana.edu
virginia.edu
ncni.net
utk.edu
ucsb.edu
titech.jp
isi.edu
vu.nl
csun.edu
caltech.edu
utexas.edu
ucsd.edu
npaci.edu
rice.edu
4
Zoom In
[Diagram: UCSB desktops connected via the Internet to SDSC resources
(CT94, IBM SP, HPSS, T-3E, Sun)]
5
The Landscape
  • Heterogeneous
  • Processors: x86, SPARC, RS6000, Alpha, MIPS,
    PowerPC, Cray
  • Networks: GigE, Myrinet, 100baseT, ATM
  • OS: Linux, Solaris, AIX, Unicos, OS X, NT, Windows
  • Dynamically changing
  • Completely dedicated access is impossible =>
    contention
  • Failures, upgrades, reconfigurations, etc.
  • Federated
  • Local administrative policies take precedence
  • Performance?

6
The Computational Grid
  • Vision: Application programs plug into the
    system to draw computational power from a
    dynamically changing pool of resources.
  • Electrical Power Grid analogy
  • Power generation facilities = computers,
    networks, storage devices, palmtops, databases,
    libraries, etc.
  • Household appliances = application programs
  • Scale to national and international levels
  • Grid users (both power producers and application
    consumers) can join and leave the Grid at will.

7
The Shape of Things to Come?
  • Grid Research Adventures
  • Infrastructure
  • Grid Programming
  • State of the Grid Art
  • What do Grids look like today?
  • Interesting developments, trends, and
    prognostications of the Grid future

8
Fundamental Questions
  • How do we build it?
  • software infrastructures
  • policies
  • maintenance, support, accounting, etc.
  • How do we program it?
  • concurrency, synchronization
  • heterogeneity
  • dynamism
  • How do we use it for performance?
  • metrics
  • models

9
General Approach
  • Combine results from distributed operating
    systems, parallel computing, and internet
    computing research domains
  • Remote procedure call / remote invocation (see
    the sketch below)
  • Public/private key encryption
  • Domain decomposition
  • Location independent naming
  • Engineering strategy: Implement Grid software
    infrastructure as middleware
  • Allows resource owners to maintain ultimate
    control locally over the resources they commit
    to the Grid
  • Permits new resources to be incorporated easily
  • Aids in developing a user community
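The remote-invocation building block above can be made concrete with a
minimal sketch using Python's standard xmlrpc modules; this is purely
illustrative (the host name and port are placeholders, and it is not the
API of Globus, Legion, or any other Grid middleware).

    # --- server.py: runs on the resource, exposes one function over XML-RPC ---
    from xmlrpc.server import SimpleXMLRPCServer

    def multiply(a, b):
        # Stand-in for real work performed on the remote resource.
        return a * b

    server = SimpleXMLRPCServer(("0.0.0.0", 8000), allow_none=True)
    server.register_function(multiply, "multiply")
    server.serve_forever()

    # --- client.py: runs anywhere, invokes the function as if it were local ---
    from xmlrpc.client import ServerProxy

    proxy = ServerProxy("http://grid-node.example.org:8000")  # placeholder host
    print(proxy.multiply(6, 7))  # 42, computed remotely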

10
Middleware Research Efforts
  • Globus (I. Foster and C. Kesselman)
  • Collection of independent remote execution and
    naming services
  • Legion (A. Grimshaw)
  • Distributed object-oriented programming
  • NetSolve (J. Dongarra)
  • Multi-language brokered RPC
  • Condor (M. Livny)
  • Idle cycle harvesting
  • NINF (S. Matsuoka)
  • Java-based brokered RPC

11
Commonalities
  • Runtime systems
  • All current infrastructures are implemented as a
    set of run-time services
  • Resource is an abstract notion
  • Anything with an API is a resource: operating
    systems, libraries, databases, hardware devices
    (see the sketch below)
  • Support for multiple programming languages
  • legacy codes
  • performance
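One way to read "anything with an API is a resource" is as a thin, uniform
wrapper interface. The sketch below is hypothetical: the class and method
names are illustrative and not taken from any of the systems above.

    from abc import ABC, abstractmethod

    class Resource(ABC):
        """Uniform view: anything with an API can be wrapped as a resource."""

        @abstractmethod
        def describe(self) -> dict:
            """Static attributes: architecture, OS, capacity, owner policy."""

        @abstractmethod
        def invoke(self, operation: str, *args):
            """Forward an operation to the underlying native API."""

    class BatchQueue(Resource):
        def describe(self) -> dict:
            return {"type": "batch-queue", "arch": "x86", "os": "Linux"}

        def invoke(self, operation, *args):
            raise NotImplementedError("would call the local scheduler here")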

12
Infrastructure Concerns
  • Leverage emerging distributed technologies
  • Buy it rather than build it
  • Network infrastructure
  • Web services
  • Complexity
  • Performance
  • Installation, configuration, fault-diagnosis
  • Mean time to reconfiguration is probably measured
    in minutes
  • Bringing the Grid down is not an option
  • Who operates it?

13
NPACI
  • National Partnership for Advanced Computational
    Infrastructure
  • high-performance computing for the scientific
    research community
  • Goal: Build a production-quality Grid
  • Leverage emerging standards
  • Harden and deploy mature Grid technologies
  • Packaging, configuration, deployment,
    diagnostics, accounting
  • Deliver the Grid to scientists

14
PACI-sized Questions
  • If the national infrastructure is managed as a
    Grid...
  • What resources are attached to it?
  • X86 is certainly plentiful
  • Earth Simulator is certainly expensive
  • Multithreading is certainly attractive
  • What is the right blend?
  • How are they managed?
  • How long will you wait for your job to get
    through the queue?
  • Accounting
  • What are the units of Grid allocation?

15
Grid Programming
  • Two models
  • Manual: Application is explicitly coded to be a
    Grid application
  • Automatic: Grid software "Gridifies" a parallel
    or sequential program
  • Start with the simpler approach: build programs
    that can adapt to changing Grid conditions
  • What are the current Grid conditions?
  • Need a way to assess the available performance
  • For example
  • What is the speed of your ethernet?

16
Ethernet Doesn't Have a Speed -- It Has Many
[Figure: measured TCP/IP throughput (Mb/s)]
17
More Importantly
  • It is not what the speed was, but what the speed
    will be that matters
  • Performance prediction
  • Analytical models remain elusive
  • Statistical models are difficult
  • Whatever models are used, the prediction itself
    needs to be fast

18
The Network Weather Service
  • On-line Grid system that
  • monitors the performance that is available from
    distributed resources
  • forecasts future performance levels using fast
    statistical techniques
  • delivers forecasts on-the-fly dynamically
  • Uses adaptive, non-parametric time series
    analysis models to make short-term predictions
    (see the sketch below)
  • Records and reports forecasting error with each
    prediction stream
  • Runs as any user (no privileged access required)
  • Scalable and end-to-end
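A minimal sketch of the adaptive forecasting idea, under assumptions made
for illustration (a small predictor set and absolute-error bookkeeping;
this is not the actual NWS code): several cheap predictors run side by
side, each is charged its past error, and the one with the lowest
cumulative error supplies the next forecast.

    from statistics import mean, median

    # Candidate predictors: each maps a measurement history to a forecast.
    PREDICTORS = {
        "last_value":     lambda hist: hist[-1],
        "running_mean":   lambda hist: mean(hist),
        "sliding_median": lambda hist: median(hist[-10:]),
    }

    def forecast(history, errors):
        """Use the predictor with the lowest cumulative absolute error so far."""
        best = min(PREDICTORS, key=lambda name: errors.get(name, 0.0))
        return PREDICTORS[best](history), best

    def update(history, errors, measurement):
        """When a new measurement arrives, charge each predictor its error."""
        if history:
            for name, predict in PREDICTORS.items():
                errors[name] = errors.get(name, 0.0) + abs(predict(history) - measurement)
        history.append(measurement)

    # Feed bandwidth measurements as they arrive; ask for forecasts on demand.
    history, errors = [], {}
    for bw in [42.0, 40.5, 44.1, 39.8, 41.2]:   # hypothetical Mb/s samples
        update(history, errors, bw)
    print(forecast(history, errors))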

19
NWS Predictions and Errors
[Plot: Red = NWS prediction, Black = measured data.
MSE 73.3, FED 8.5 Mb/s, MAE 5.8 Mb/s]
20
Clusters Too
[Plot: MSE 4089, FED 63 Mb/s, MAE 56 Mb/s]
21
Many Challenges, No Waiting
  • On-line predictions
  • Need it better, faster, cheaper, and more
    accurate
  • Adaptive programming
  • Even if predictions are available, they will
    have errors
  • Performance fluctuates at machine speeds, not
    human speeds
  • Which resource to use? When? (see the sketch
    below)
  • Can programmers really manage a fluctuating
    abstract machine?
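A hedged sketch of the "which resource, when" decision: choose the host
with the best predicted bandwidth, but switch away from the current choice
only when the winner is clearly better, to avoid thrashing. The host names,
forecast values, and 10% threshold are assumptions for the example.

    def pick_host(forecasts, current=None, hysteresis=0.10):
        """forecasts maps host -> predicted bandwidth (e.g., from an NWS-style
        forecaster as sketched earlier). Keep the current host unless the best
        alternative is more than `hysteresis` (10%) better."""
        best = max(forecasts, key=forecasts.get)
        if current in forecasts and forecasts[best] <= forecasts[current] * (1 + hysteresis):
            return current
        return best

    # Hypothetical use inside an adaptive application's outer loop:
    print(pick_host({"ucsb.edu": 41.2, "utk.edu": 44.0, "ucsd.edu": 38.5},
                    current="ucsb.edu"))
    # Prints 'ucsb.edu': 44.0 is not more than 10% better than 41.2.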

22
GrADS
  • Grid Application Development Software (GrADS)
    Project (K. Kennedy, PI)
  • Investigates Grid programmability
  • Soup-to-nuts integrated approach
  • Compilers, Debuggers, libraries, etc.
  • Automatic Resource Control strategies
  • Selection and Scheduling
  • Resource economies (stability)
  • Performance Prediction and Monitoring
  • Applications and resources
  • Effective Grid simulation
  • Builds upon middleware successes
  • Tested with real applications

23
Four Observations
  • The performance of the Grid middleware and
    services matters
  • Grid fabric must scale even if the individual
    applications do not
  • Adaptivity is critical
  • So far, only short-term performance predictions
    are possible
  • Both application and system must adapt on same
    time scale
  • Extracting performance is really really hard
  • Things happen at machine speeds
  • Complexity is a killer
  • We need more compilation technology

24
Grid Compilers
  • Adaptive compilation
  • Compiler and program preparation environment
    needs to manage complexity
  • The machine for which the compiler is
    optimizing is changing dynamically
  • Challenges
  • Performance of the compiler is important
  • Legacy codes
  • Security?
  • GrADS has broken ground, but there is much more
    to do

25
Grid Research Challenges
  • Four foci characterize Grid problems
  • Heterogeneity
  • Dynamism
  • Federalism
  • Performance
  • Just building the infrastructure makes research
    questions out of previously solved problems
  • Installation
  • Configuration
  • Accounting
  • Grid programming is extremely complex
  • New programming technologies

26
Okay, so where are we now?
27
Rational Exuberance
28
For Example -- TeraGrid
  • Joint effort between
  • San Diego Supercomputer Center (SDSC)
  • National Center for Supercomputing Applications
    (NCSA)
  • Argonne National Laboratory (ANL)
  • Center for Advanced Computational Research (CACR)
  • Stats
  • 13.6 Teraflops (peak)
  • 600 Terabytes on-line storage
  • 40 Gb/s full connectivity, cross country, between
    sites
  • Software Infrastructure is primarily Globus based
  • Funded by NSF last year

29
Non-trivial Endeavor
30
It's Big, but There is Room to Grow
  • Baseline infrastructure
  • IA64 processors running Linux
  • Gigabit ethernet
  • Myrinet
  • The Phone Company
  • Designed to be heterogeneous and extensible
  • Sites have plugged their resources in
  • IBM Blue Horizon
  • SGI Origin
  • Sun Enterprise
  • Convex X and V Class
  • CAVEs, ImmersaDesks, etc.

31
Middleware Status
  • Several research and commercial infrastructures
    have reached maturity
  • Research Globus, Legion, NetSolve, Condor, NINF,
    PUNCH
  • Commercial Globus, Avaki, Grid Engine
  • By far, the most prevalent Grid infrastructure
    deployed today is Globus

32
Globus on One Slide
  • Grid protocols for resource access, sharing, and
    discovery
  • Grid Security Infrastructure (GSI)
  • Grid Resource Allocation Manager (GRAM)
  • MetaDirectory Service (MDS)
  • Reference implementation of protocols in toolkit
    form

33
Increasing Research Leverage
  • Grid research software artifacts turn out to be
    valuable
  • Much of the extant work is empirical and
    engineering focused
  • Robustness concerns mean that the prototype
    systems need to work
  • Heterogeneity implies the need for portability
  • Open source impetus
  • Need to go from research prototypes to nationally
    available software infrastructure
  • Download, install, run

34
Packaging Efforts
  • NSF Middleware Initiative (NMI)
  • USC/ISI, SDSC, U. Wisc., ANL, NCSA, I2
  • Identifies maturing Grid services and tools
  • Provides support for configuration tools,
    testing, packaging
  • Implements a release schedule and coordination
  • R1 out 8/02
  • Globus, Condor-G, NWS, KX509/KCA
  • Release every 3 months
  • Many more packages slated
  • The NPACkage
  • Use NMI technology for PACI infrastructure

35
State of the Art
  • Dozens of Grid deployments underway
  • Linux cluster technology is the primary COTS
    computing platform
  • Heterogeneity is built in from the start
  • Networks
  • Extant systems
  • Special-purpose devices
  • Globus is the leading Middleware
  • Grid services and software tools reaching
    maturity and mechanisms are in place to maximize
    leverage

36
What's next?
37
Grid Standards
  • Interoperability is an issue
  • Technology drift is starting to become a problem
  • Protocol zoo is open for business
  • The Global Grid Forum (GGF)
  • Modeled after the IETF (e.g., working groups)
  • Organized at a much earlier stage of development
    (relatively speaking)
  • Meetings every 4 months
  • Truly an international organization

38
Webification
  • Open Grid Service Architecture (OGSA)
  • "The Physiology of the Grid", I. Foster, C.
    Kesselman, J. Nick, S. Tuecke
  • Based on W3C standards (XML, WSDL, WSIL, UDDI,
    etc.)
  • Incorporates web service support for interface
    publication, multiple protocol bindings, and
    local/remote transparency
  • Directly interoperable with Internet-targeted
    hosting environments
  • J2EE, .NET
  • The Vendors are excited

39
Grid@Home
  • Entropia (www.entropia.com)
  • Commercial enterprise
  • Peer-2-Peer approach
  • Napster for compute cycles (without the
    lawsuits)
  • Windows PC-based instead of Linux/Unix-based
  • More compute leverage -- a lot more
  • Way more configuration support, deployment
    support, fault-management built into the system
  • Proprietary technology
  • Deployed at NPACI on 250 hosts

40
Thanks and Credit
  • organizations
  • NPACI, SDSC, NCSA, The Globus Project (ISI/USC),
    The Legion Project (UVa), UTK, LBL
  • support
  • NSF, NASA, DARPA, USPTO, DOE

41
More Information
http://www.cs.ucsb.edu/rich
  • Entropia
  • http://www.entropia.com
  • Globus
  • http://www.globus.org
  • GrADS
  • http://hipersoft.cs.rice.edu/grads
  • NMI
  • http://www.nsf-middleware.org
  • NPACI
  • http://www.npaci.edu
  • NWS
  • http://nws.cs.ucsb.edu
  • TeraGrid
  • http://www.teragrid.org