The Computational Grid: Aggregating Performance and Enhanced Capability from Federated Resources

Slides: 42

Provided by: lro96

Transcript and Presenter's Notes

1

The Computational Grid: Aggregating Performance
and Enhanced Capability from Federated Resources
Rich Wolski
University of California, Santa Barbara

2
The Goal

To provide a seamless, ubiquitous, and
high-performance computing environment using a
heterogeneous collection of networked computers.
But there won't be one, big, uniform system
Resources must be able to come and go dynamically
The base system software supported by each
resource must remain inviolate
Multiple languages and programming paradigms must
be supported
The environment must be secure
Programs must run fast
For distributed computing: The Holy Grail

3
For Example: Rich's Computational World
umich.edu
wisc.edu
ameslab.gov
osc.edu
harvard.edu
wellesley.edu
anl.gov
ncsa.edu
ksu.edu
uiuc.edu
lbl.gov
indiana.edu
virginia.edu
ncni.net
utk.edu
ucsb.edu
titech.jp
isi.edu
vu.nl
csun.edu
caltech.edu
utexas.edu
ucsd.edu
npaci.edu
rice.edu
4
Zoom In
Diagram labels: CT94, SDSC, IBM SP, HPSS, Desktops, Sun, T-3E, C, The Internet, UCSB
5
The Landscape

Heterogeneous
Processors: X86, SPARC, RS6000, Alpha, MIPS,
PowerPC, Cray
Networks: GigE, Myrinet, 100baseT, ATM
OS: Linux, Solaris, AIX, Unicos, OSX, NT, Windows
Dynamically changing
Completely dedicated access is impossible, which
means contention
Failures, upgrades, reconfigurations, etc.
Federated
Local administrative policies take precedence
Performance?

6
The Computational Grid

Vision: Application programs plug into the
system to draw computational power from a
dynamically changing pool of resources.
Electrical Power Grid analogy
Power generation facilities = computers,
networks, storage devices, palm tops, databases,
libraries, etc.
Household appliances = application programs
Scale to national and international levels
Grid users (both power producers and application
consumers) can join and leave the Grid at will.
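The "dynamically changing pool" idea can be illustrated with a small toy model. This is only a sketch: the class name `ResourcePool`, its methods, and the notion of abstract "capacity units" are assumptions made for illustration and do not come from any Grid toolkit.

```python
class ResourcePool:
    """Toy model of a Grid resource pool whose members come and go."""

    def __init__(self):
        # Hypothetical bookkeeping: resource name -> capacity units offered.
        self._capacity = {}

    def join(self, name, capacity):
        """A resource owner commits a resource to the pool."""
        self._capacity[name] = capacity

    def leave(self, name):
        """A resource owner withdraws a resource at will."""
        self._capacity.pop(name, None)

    def total_capacity(self):
        return sum(self._capacity.values())

    def draw(self, demand):
        """Plan how to satisfy `demand` units from the current members.

        Returns {name: units}, filling from the largest resource first.
        Raises RuntimeError if the pool is currently too small.
        """
        if demand > self.total_capacity():
            raise RuntimeError("not enough capacity in the pool right now")
        plan = {}
        remaining = demand
        largest_first = sorted(self._capacity.items(), key=lambda kv: -kv[1])
        for name, capacity in largest_first:
            if remaining <= 0:
                break
            take = min(capacity, remaining)
            plan[name] = take
            remaining -= take
        return plan
```

The same `draw` request can succeed at one moment and fail the next, after a large resource leaves, which is the behavior an application plugged into the Grid has to tolerate.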

7
The Shape of Things to Come?

Grid Research Adventures
Infrastructure
Grid Programming
State of the Grid Art
What do Grids look like today?
Interesting developments, trends, and
prognostications of the Grid future

8
Fundamental Questions

How do we build it?
software infrastructures
policies
maintenance, support, accounting, etc.
How do we program it?
concurrency, synchronization
heterogeneity
dynamism
How do we use it for performance?
metrics
models

9
General Approach

Combine results from distributed operating
systems, parallel computing, and internet
computing research domains
Remote procedure call/ remote invocation
Public/private key encryption
Domain decomposition
Location independent naming
Engineering strategy: Implement Grid software
infrastructure as middleware
Allows resource owners to maintain ultimate control
locally over the resources they commit to the
Grid
Permits new resources to be incorporated easily
Aids in developing a user community
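Domain decomposition on a heterogeneous Grid usually cannot split work evenly, because the resources differ in speed. A minimal sketch, assuming each resource has a known relative speed (the function name `decompose` and the speed values are hypothetical), is to give each resource a contiguous index range proportional to its speed:

```python
def decompose(n_items, speeds):
    """Split range(n_items) into contiguous blocks proportional to speed.

    speeds: dict mapping resource name -> relative speed (assumed known).
    Returns dict mapping resource name -> (start, end), end exclusive.
    """
    total = sum(speeds.values())
    names = list(speeds)
    # Ideal (fractional) share of the items for each resource.
    shares = {name: n_items * speeds[name] / total for name in names}
    counts = {name: int(shares[name]) for name in names}
    # Hand leftover items to the resources with the largest fractional part.
    leftover = n_items - sum(counts.values())
    by_fraction = sorted(names, key=lambda n: shares[n] - counts[n], reverse=True)
    for name in by_fraction[:leftover]:
        counts[name] += 1
    # Turn the counts into contiguous index ranges.
    ranges = {}
    start = 0
    for name in names:
        ranges[name] = (start, start + counts[name])
        start += counts[name]
    return ranges
```

For example, `decompose(100, {"slow": 1.0, "fast": 3.0})` assigns the first 25 items to `slow` and the remaining 75 to `fast`. On a real Grid the speeds themselves change over time, so a static split like this is only a starting point.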

10
Middleware Research Efforts

Globus (I. Foster and K. Kesselman)
Collection of independent remote execution and
naming services
Legion (A. Grimshaw)
Distributed object-oriented programming
NetSolve (J. Dongarra)
Multi-language brokered RPC
Condor (M. Livny)
Idle cycle harvesting
NINF (S. Matsuoka)
Java-based brokered RPC
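The "brokered RPC" idea can be sketched in a few lines: servers advertise a service to a broker, and the client asks the broker to run the call on whichever server currently looks best. This is an in-process toy under stated assumptions. The `Broker` class, its methods, and the use of a single "load" number are hypothetical and are not the NetSolve or NINF API.

```python
class Broker:
    """Toy broker: picks the least-loaded server that offers a service."""

    def __init__(self):
        # service name -> list of (server_name, load, callable)
        self._servers = {}

    def register(self, service, server_name, load, func):
        """A server advertises that it can run `service`."""
        self._servers.setdefault(service, []).append((server_name, load, func))

    def call(self, service, *args):
        """Run `service` on the least-loaded server offering it.

        Returns (server_name, result) so the caller can see where it ran.
        """
        candidates = self._servers.get(service)
        if not candidates:
            raise LookupError(f"no server offers {service!r}")
        server_name, _, func = min(candidates, key=lambda c: c[1])
        return server_name, func(*args)


def dot(xs, ys):
    """Example numerical routine a server might export."""
    return sum(x * y for x, y in zip(xs, ys))


broker = Broker()
broker.register("dot", "server-a", load=0.9, func=dot)
broker.register("dot", "server-b", load=0.2, func=dot)
where, result = broker.call("dot", [1, 2, 3], [4, 5, 6])
```

In a real system the call would cross the network and the load figures would come from monitoring; here both are stand-ins so that the selection step is visible.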

11
Commonalities

Runtime systems
All current infrastructures are implemented as a
set of run time services
Resource is an abstract notion
Anything with an API is a resource: operating
systems, libraries, databases, hardware devices
Support for multiple programming languages
legacy codes
performance

12
Infrastructure Concerns

Leverage emerging distributed technologies
Buy it rather than build it
Network infrastructure
Web services
Complexity
Performance
Installation, configuration, fault-diagnosis
Mean time to reconfiguration is probably measured
in minutes
Bringing the Grid down is not an option
Who operates it?

13
NPACI

National Partnership for Advanced Computational
Infrastructure
high-performance computing for the scientific
research community
Goal: Build a production-quality Grid
Leverage emerging standards
Harden and deploy mature Grid technologies
Packaging, configuration, deployment,
diagnostics, accounting
Deliver the Grid to scientists

14
PACI-sized Questions

If the national infrastructure is managed as a
Grid...
What resources are attached to it?
X86 is certainly plentiful
Earth Simulator is certainly expensive
Multithreading is certainly attractive
What is the right blend?
How are they managed?
How long will you wait for your job to get
through the queue?
Accounting
What are the units of Grid allocation?

15
Grid Programming

Two models
Manual: Application is explicitly coded to be a
Grid application
Automatic: Grid software Gridifies a parallel
or sequential program
Start with the simpler approach: build programs
that can adapt to changing Grid conditions
What are the current Grid conditions?
Need a way to assess the available performance
For example
What is the speed of your ethernet?

16
Ethernet Doesn't Have a Speed -- it Has Many
TCP/IP throughput (mb/s)
17
More Importantly

It is not what the speed was, but what the speed
will be that matters
Performance prediction
Analytical models remain elusive
Statistical models are difficult
Whatever models are used, the prediction itself
needs to be fast

18
The Network Weather Service

On-line Grid system that
monitors the performance that is available from
distributed resources
forecasts future performance levels using fast
statistical techniques
delivers forecasts on-the-fly dynamically
Uses adaptive, non-parametric time series
analysis models to make short-term predictions
Records and reports forecasting error with each
prediction stream
Runs as any user (no privileged access required)
Scalable and end-to-end
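One way to picture "adaptive" forecasting with a reported error is to run several cheap predictors side by side, score each against every new measurement, and report the forecast of whichever has been most accurate so far. The sketch below assumes that scheme. The class `AdaptiveForecaster`, the three predictors, and the window size are illustrative choices and should not be read as the NWS implementation.

```python
from statistics import mean, median


class AdaptiveForecaster:
    """Runs several simple predictors and reports the best one so far."""

    def __init__(self, window=5):
        self.window = window
        self.history = []
        # Each predictor maps the measurement history to a next-value guess.
        self.predictors = {
            "last_value": lambda h: h[-1],
            "running_mean": lambda h: mean(h),
            "window_median": lambda h: median(h[-self.window:]),
        }
        self.abs_error = {name: 0.0 for name in self.predictors}
        self.pending = {}   # guesses made for the next measurement
        self.n_scored = 0   # how many measurements have been scored

    def update(self, measurement):
        """Score the pending guesses, then make new ones."""
        for name, guess in self.pending.items():
            self.abs_error[name] += abs(guess - measurement)
        if self.pending:
            self.n_scored += 1
        self.history.append(measurement)
        self.pending = {
            name: predict(self.history)
            for name, predict in self.predictors.items()
        }

    def forecast(self):
        """Return (predictor name, forecast, mean absolute error so far)."""
        if not self.pending:
            raise ValueError("no measurements yet")
        best = min(self.abs_error, key=self.abs_error.get)
        mae = self.abs_error[best] / self.n_scored if self.n_scored else None
        return best, self.pending[best], mae
```

Fed a throughput series such as `10, 10, 10, 50, 10, 10, 10`, the windowed median ends up with the lowest accumulated error, because it ignores the single spike that the other two predictors chase. Each update costs only a few arithmetic operations, which matters when forecasts must be delivered on the fly.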

19
NWS Predictions and Errors
Red: NWS Prediction, Black: Data
MSE 73.3, FED 8.5 mb/s, MAE 5.8 mb/s
20
Clusters Too
MSE 4089, FED 63 mb/s, MAE 56 mb/s
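The error figures on these two slides can be reproduced for any prediction stream with a few lines of arithmetic. The sketch below computes mean squared error (MSE) and mean absolute error (MAE) in their standard forms. The slides do not define FED, so it is not computed here. The square root of the MSE is included only because the quoted FED values (8.5 and 63) sit close to the square roots of the quoted MSE values (about 8.56 and 63.9). That closeness is an observation, not a definition.

```python
import math


def error_summary(predictions, measurements):
    """Return MSE, sqrt(MSE) and MAE for paired predictions/measurements."""
    if not predictions or len(predictions) != len(measurements):
        raise ValueError("need two non-empty sequences of equal length")
    errors = [p - m for p, m in zip(predictions, measurements)]
    mse = sum(e * e for e in errors) / len(errors)
    mae = sum(abs(e) for e in errors) / len(errors)
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae}


# Made-up numbers, purely to exercise the function.
summary = error_summary([10, 12, 8], [8, 12, 12])
```

Because the MSE squares each error, a few large misses dominate it, whereas the MAE weights all misses equally. A prediction stream with occasional big errors therefore shows a much wider gap between the two.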
21
Many Challenges, No Waiting

On-line predictions
Need it better, faster, cheaper, and more
accurate
Adaptive programming
Even if predictions are there they will have
errors
Performance fluctuates at machine speeds, not
human speeds
Which resource to use? When?
Can programmers really manage a fluctuating
abstract machine?
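Since every prediction carries an error, one simple adaptive policy is to discount each forecast by its reported error before choosing a resource. The sketch below assumes each candidate comes with a predicted throughput and a mean absolute error in the same units. The function name `pick_resource` and all numbers are hypothetical.

```python
def pick_resource(work_units, forecasts):
    """Pick the resource with the best pessimistic completion time.

    forecasts: dict name -> (predicted_rate, mean_abs_error), both in
    work units per second. The pessimistic rate is predicted_rate minus
    the error; resources whose pessimistic rate is not positive are
    skipped. Returns (name, estimated_seconds), or (None, inf) if no
    resource is usable.
    """
    best_name, best_time = None, float("inf")
    for name, (predicted, error) in forecasts.items():
        pessimistic = predicted - error
        if pessimistic <= 0:
            continue
        seconds = work_units / pessimistic
        if seconds < best_time:
            best_name, best_time = name, seconds
    return best_name, best_time


# Hypothetical forecasts: the cluster is faster on average but far
# less predictable than the campus link.
choice, seconds = pick_resource(
    342.0,
    {"campus": (40.0, 5.8), "cluster": (80.0, 56.0)},
)
```

Here the campus resource is chosen even though its forecast is lower, because the cluster's large error leaves it with a worse pessimistic rate. How heavily to weight the error is itself a policy decision, and subtracting it outright is only one option.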

22
GrADS

Grid Application Development Software (GrADS)
Project (K. Kennedy, PI)
Investigates Grid programmability
Soup-to-nuts integrated approach
Compilers, Debuggers, libraries, etc.
Automatic Resource Control strategies
Selection and Scheduling
Resource economies (stability)
Performance Prediction and Monitoring
Applications and resources
Effective Grid simulation
Builds upon middleware successes
Tested with real applications

23
Four Observations

The performance of the Grid middleware and
services matters
Grid fabric must scale even if the individual
applications do not
Adaptivity is critical
So far, only short-term performance predictions
are possible
Both application and system must adapt on same
time scale
Extracting performance is really really hard
Things happen at machine speeds
Complexity is a killer
We need more compilation technology

24
Grid Compilers

Adaptive compilation
Compiler and program preparation environment
needs to manage complexity
The machine for which the compiler is
optimizing is changing dynamically
Challenges
Performance of the compiler is important
Legacy codes
Security?
GrADS has broken ground, but there is much more
to do

25
Grid Research Challenges

Four foci characterize Grid problems
Heterogeneity
Dynamism
Federalism
Performance
Just building the infrastructure makes research
questions out of previously solved problems
Installation
Configuration
Accounting
Grid programming is extremely complex
New programming technologies

26
Okay, so where are we now?
27
Rational Exuberance
28
For Example -- TeraGrid

Joint effort between
San Diego Supercomputer Center (SDSC)
National Center for Supercomputing Applications
(NCSA)
Argonne National Laboratory (ANL)
Center for Advanced Computing Research (CACR)
Stats
13.6 Teraflops (peak)
600 Terabytes on-line storage
40 gb/s full connectivity, cross country, between
sites
Software Infrastructure is primarily Globus based
Funded by NSF last year

29
Non-trivial Endeavor
30
It's Big, but there is Room to Grow

Baseline infrastructure
IA64 processors running Linux
Gigabit ethernet
Myrinet
The Phone Company
Designed to be heterogeneous and extensible
Sites have plugged their resources in
IBM Blue Horizon
SGI Origin
Sun Enterprise
Convex X and V Class
CAVEs, ImmersaDesks, etc.

31
Middleware Status

Several research and commercial infrastructures
have reached maturity
Research: Globus, Legion, NetSolve, Condor, NINF,
PUNCH
Commercial: Globus, Avaki, Grid Engine
By far, the most prevalent Grid infrastructure
deployed today is Globus

32
Globus on One Slide

Grid protocols for resource access, sharing, and
discovery
Grid Security Infrastructure (GSI)
Grid Resource Allocation Manager (GRAM)
MetaDirectory Service (MDS)
Reference implementation of protocols in toolkit
form

33
Increasing Research Leverage

Grid research software artifacts turn out to be
valuable
Much of the extant work is empirical and
engineering focused
Robustness concerns mean that the prototype
systems need to work
Heterogeneity implies the need for portability
Open source impetus
Need to go from research prototypes to nationally
available software infrastructure
Download, install, run

34
Packaging Efforts

NSF Middleware Initiative (NMI)
USC/ISI, SDSC, U. Wisc., ANL, NCSA, I2
Identifies maturing Grid services and tools
Provides support for configuration tools,
testing, packaging
Implements a release schedule and coordination
R1 out 8/02
Globus, Condor-G, NWS, KX509/KCA
Release every 3 months
Many more packages slated
The NPACkage
Use NMI technology for PACI infrastructure

35
State of the Art

Dozens of Grid deployments underway
Linux cluster technology is the primary COTS
computing platform
Heterogeneity is built in from the start
Networks
Extant systems
Special-purpose devices
Globus is the leading Middleware
Grid services and software tools reaching
maturity and mechanisms are in place to maximize
leverage

36
What's next?
37
Grid Standards

Interoperability is an issue
Technology drift is starting to become a problem
Protocol zoo is open for business
The Global Grid Forum (GGF)
Modeled after IETF (e.g., working groups)
Organized at a much earlier stage of development
(relatively speaking)
Meetings every 4 months
Truly an international organization

38
Webification

Open Grid Service Architecture (OGSA)
"The Physiology of the Grid", I. Foster, K.
Kesselman, J. Nick, S. Tuecke
Based on W3C standards (XML, WSDL, WSIL, UDDI,
etc.)
Incorporates web service support for interface
publication, multiple protocol bindings, and
local/remote transparency
Directly interoperable with Internet-targeted
hosting environments
J2EE, .NET
The Vendors are excited

39
Grid@Home

Entropia (www.entropia.com)
Commercial enterprise
Peer-2-Peer approach
Napster for compute cycles (without the
lawsuits)
Microsoft PC-based instead of Linux/Unix based
More compute leverage -- a lot more
Way more configuration support, deployment
support, fault-management built into the system
Proprietary technology
Deployed at NPACI on 250 hosts

40
Thanks and Credit

organizations
NPACI, SDSC, NCSA, The Globus Project (ISI/USC),
The Legion Project (UVa), UTK, LBL
support
NSF, NASA, DARPA, USPTO, DOE

41
More Information
http://www.cs.ucsb.edu/rich

Entropia
http://www.entropia.com
Globus
http://www.globus.org
GrADS
http://hipersoft.cs.rice.edu/grads
NMI
http://www.nsf-middleware.org
NPACI
http://www.npaci.edu
NWS
http://nws.cs.ucsb.edu
TeraGrid
http://www.teragrid.org
