Title: The Grid: Opportunities, Achievements, and Challenges for (Computer) Science
1The Grid: Opportunities, Achievements, and
Challenges for (Computer) Science
- Ian Foster
- Argonne National Laboratory
- University of Chicago
- Globus Alliance
- www.mcs.anl.gov/foster
2Abstract
- Grid technologies and infrastructure support
the integration of services and resources within
and among enterprises, and thus allow new
approaches to problem solving and interaction
within distributed, multi-organizational
collaborations. Sustained effort by computer
scientists and application developers has
resulted in the creation of a substantial open
source technology, numerous infrastructure
deployments, a vibrant international community,
and significant application success stories.
Application communities are now working to deploy
and apply these technologies more broadly, and
thus we encounter ever more challenging
requirements for scale, functionality, and
robustness. In this talk, I seek to define the
nature of the opportunities, achievements, and
challenges that underlie this work. I describe
the current state and likely evolution of the
core technologies, focusing in particular on the
Open Grid Services Architecture (OGSA), which
integrates Grid technologies with emerging Web
services standards. I discuss the implications of
these developments for science, engineering, and
industry, and present some of the lessons learned
within large projects that apply the
technologies. I also examine the opportunities
and challenges that Grid deployments and
applications present for computer scientists.
3The Grid
- Resource sharing & coordinated problem solving
in dynamic, multi-institutional virtual
organizations
- Enable integration of distributed resources
- Using general-purpose protocols & infrastructure
- To achieve useful qualities of service
4The Grid Phenomenon: An Abbreviated History
- Early 90s
- Gigabit testbeds, metacomputing
- Mid to late 90s
- Early experiments (e.g., I-WAY), academic
software (Globus, Condor, Legion), experiments
- 2003
- Hundreds of application communities & projects in
scientific and technical computing
- Major infrastructure deployments
- Open source technology: Globus Toolkit, etc.
- Global Grid Forum: 2000 people, 30 countries
- Growing industrial adoption
5Context (1): Dramatic Technological Evolution
- Ubiquitous Internet: 100 million hosts
- Collaboration & resource sharing the norm
- Ultra-high-speed networks: 10 Gb/s
- Global optical networks
- Enormous quantities of data: petabytes
- For an increasing number of communities, gating
step is not collection but analysis
- Huge quantities of computing: 100 Top/s
- Ubiquitous computing via clusters
- Moore's law everywhere: 1000x/decade
- Instruments, detectors, sensors, scanners
6Context (1): Technological Evolution
- Internet revolution: 100M hosts
- Collaboration & sharing the norm
- Universal Moore's law: x10^3 per 10 yrs
- Sensors as well as computers
- Petascale data tsunami
- Gating step is analysis
- our old infrastructure?
7Context (2): A Powerful New Three-way Alliance
Requires much engineering and innovation
Changes culture, mores, and behaviours
"CS as the new mathematics" (George Djorgovski)
8Context (3): New Commercial Computing Models
- Utility computing
- On-demand
- Service-orientation
- Virtualization
[Figure: axes "value" and "shared, traded resources". Labels: today (clusters: Tru64, HP-UX, Linux; OpenVMS clusters, TruCluster, MC ServiceGuard), grid-enabled systems, UDC, programmable data center, virtual data center, computing utility or GRID]
(Based on a slide from HP)
9Problem-Driven, Collaborative Research Methodology
[Figure: cycle of Design, Build, Deploy, Apply, and Analyze activities centered on a Global Community]
10Problem-Driven, Collaborative Research Methodology
[Figure: cycle of Design, Build, Deploy, Apply, and Analyze activities centered on a Global Community, connecting Infrastructure; Software, Standards; Computer Science; and Discipline Advances]
11Resource/Service Integration as a Fundamental
Challenge
12Scale Metrics: Participants, Data, Tasks,
Performance, Interactions, ...
13Profound Technical Challenges
- How do we, in dynamic, scalable,
multi-institutional, computationally & data-rich
settings:
- Negotiate & manage trust
- Access & integrate data
- Construct & reuse workflows
- Plan complex computations
- Detect & recover from failures
- Capture & share knowledge
- Represent & enforce policies
- Achieve end-to-end QoX
- Move data rapidly & reliably
- Support collaborative work
- Define primitive protocols
- Build reusable software
- Package & deliver software
- Deploy & operate services
- Operate infrastructure
- Upgrade infrastructure
- Perform troubleshooting
- Etc., etc., etc.
14Grid Technology R&D Seeks to Identify Enabling
Mechanisms
- Infrastructure (middleware) for establishing,
managing, and evolving multi-organizational
federations
- Dynamic, autonomous, domain independent
- On-demand, ubiquitous access to computing, data,
and services
- Mechanisms for creating and managing workflow
within such federations
- New capabilities constructed dynamically and
transparently from distributed services
- Service-oriented, virtualization
15Grid as Computer Science Integrator and
Contributor
Computer supported collaborative work
Networking, security, distributed systems
Grid technologies and applications
Databases and knowledge representation
Compilers, algorithms, formal methods
17Computer Science Contributions
- Protocols and/or tools for use in dynamic,
scalable, multi-institutional, computationally &
data-rich settings, for:
- Large-scale distributed system architecture
- Cross-org authentication
- Scalable community-based policy enforcement
- Robust & scalable discovery
- Wide-area scheduling
- High-performance, robust, wide-area data
management
- Knowledge-based workflow generation
- High-end collaboration
- Resource & service virtualization
- Distributed monitoring & manageability
- Application development
- Wide area fault tolerance
- Infrastructure deployment & management
- Resource provisioning & quality of service
- Performance monitoring & modeling
18GriPhyN Virtual Data Technology
www.griphyn.org/chimera
"I've come across some interesting data, but I
need to understand the nature of the corrections
applied when it was constructed before I can
trust it for my purposes."
"I've detected a calibration error in an
instrument and want to know which derived data to
recompute."
[Figure: Virtual Data System relating Data, Transformation, and Derivation through the relations created-by, consumed-by/generated-by, and execution-of]
"I want to apply an astronomical analysis program
to millions of objects. If the results already
exist, I'll save weeks of computation."
"I want to search an astronomical database for
galaxies with certain characteristics. If a
program that performs this analysis exists, I
won't have to write one from scratch."
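The user scenarios on this slide can be illustrated with a small provenance catalog. What follows is a minimal sketch, not the Chimera system or its API: the class `VirtualDataCatalog`, its method names, and the dataset and transformation names are all hypothetical. It assumes only what the slide states, namely that data is consumed and generated by derivations and that each derivation is an execution of a transformation.

```python
from collections import deque


class VirtualDataCatalog:
    """Toy provenance catalog (hypothetical; not the Chimera API)."""

    def __init__(self):
        self.derivations = {}   # derivation id -> (transformation, inputs, outputs)
        self.consumed_by = {}   # dataset -> ids of derivations that read it
        self.generated_by = {}  # dataset -> id of the derivation that wrote it

    def record(self, deriv_id, transformation, inputs, outputs):
        """Register one derivation, i.e. one execution of a transformation."""
        self.derivations[deriv_id] = (transformation, tuple(inputs), tuple(outputs))
        for dataset in inputs:
            self.consumed_by.setdefault(dataset, []).append(deriv_id)
        for dataset in outputs:
            self.generated_by[dataset] = deriv_id

    def lineage(self, dataset):
        """List the (derivation, transformation) steps behind a dataset."""
        steps, seen, stack = [], set(), [dataset]
        while stack:
            deriv_id = self.generated_by.get(stack.pop())
            if deriv_id is None or deriv_id in seen:
                continue
            seen.add(deriv_id)
            transformation, inputs, _ = self.derivations[deriv_id]
            steps.append((deriv_id, transformation))
            stack.extend(inputs)
        return steps

    def downstream(self, dataset):
        """Return all datasets derived, directly or indirectly, from a dataset."""
        affected, queue = set(), deque([dataset])
        while queue:
            for deriv_id in self.consumed_by.get(queue.popleft(), []):
                for output in self.derivations[deriv_id][2]:
                    if output not in affected:
                        affected.add(output)
                        queue.append(output)
        return affected

    def existing_outputs(self, transformation, inputs):
        """Return the outputs of a matching earlier derivation, or None."""
        wanted = (transformation, tuple(inputs))
        for name, ins, outs in self.derivations.values():
            if (name, ins) == wanted:
                return outs
        return None


catalog = VirtualDataCatalog()
catalog.record("d1", "calibrate", ["raw_images"], ["calibrated_images"])
catalog.record("d2", "find_clusters", ["calibrated_images"], ["clusters"])
catalog.record("d3", "size_histogram", ["clusters"], ["size_distribution"])

print(catalog.lineage("clusters"))                 # how was this data made?
print(sorted(catalog.downstream("raw_images")))    # what must be recomputed?
print(catalog.existing_outputs("find_clusters", ["calibrated_images"]))
```

In this sketch, `lineage` addresses the "can I trust this data" question, `downstream` the calibration-error question, and `existing_outputs` the "do the results already exist" question. A real system would also need to catalog transformations for discovery, which is omitted here.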
19Problem-Driven, Collaborative Research Methodology
[Figure: methodology cycle diagram, as on slide 10]
20Evolution of Open Grid Standards and Software
[Figure: timeline from 1990 to 2010 (1990, 1995, 2000, 2005, 2010). Labels: Custom solutions; Increased functionality, standardization]
21Open Grid Services Architecture
- Service-oriented architecture
- Key to virtualization, discovery, composition,
local-remote transparency
- Leverage industry standards
- Internet, Web services
- Distributed service management
- A component model for Web services
- A framework for the definition of composable,
interoperable services
"The Physiology of the Grid: An Open Grid
Services Architecture for Distributed Systems
Integration", Foster, Kesselman, Nick, Tuecke,
2002
22OGSI: Standard Web Services Interfaces & Behaviors
- Naming and bindings (basis for virtualization)
- Every service instance has a unique name, from
which one can discover supported bindings
- Lifecycle (basis for fault-resilient state
management)
- Service instances created by factories
- Destroyed explicitly or via soft state
- Information model (basis for monitoring &
discovery)
- Service data (attributes) associated with GS
instances
- Operations for querying and setting this info
- Asynchronous notification of changes to service
data
- Service Groups (basis for registries & collective
svcs)
- Group membership rules & membership management
- Base Fault type
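The lifecycle and information-model behaviors listed above can be sketched in a few lines. This is an illustration under stated assumptions, not the OGSI interfaces themselves: `Factory`, `ServiceInstance`, and every method name below are hypothetical, and the injectable clock exists only so the example runs without waiting. The sketch shows factory-based creation, explicit destruction, soft-state destruction (an instance disappears unless its lifetime is refreshed), and notification of service data changes.

```python
import itertools
import time


class ServiceInstance:
    """A stateful instance with a unique name, a lifetime, and service data."""

    def __init__(self, handle, expires_at):
        self.handle = handle          # unique name of this instance
        self.expires_at = expires_at  # soft-state termination time
        self.service_data = {}        # named attributes describing the instance
        self._listeners = []

    def query(self, name):
        """Read one service data element (None if it is not set)."""
        return self.service_data.get(name)

    def update(self, name, value):
        """Set a service data element and notify every subscriber."""
        self.service_data[name] = value
        for listener in self._listeners:
            listener(self.handle, name, value)

    def subscribe(self, listener):
        """Register a callback invoked on each service data change."""
        self._listeners.append(listener)


class Factory:
    """Creates instances and reaps those whose lifetime has lapsed."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._counter = itertools.count(1)
        self.instances = {}

    def create(self, lifetime):
        handle = f"instance-{next(self._counter)}"
        instance = ServiceInstance(handle, self._clock() + lifetime)
        self.instances[handle] = instance
        return instance

    def keepalive(self, handle, lifetime):
        """Soft state: the client must keep extending the lifetime."""
        self.instances[handle].expires_at = self._clock() + lifetime

    def destroy(self, handle):
        """Explicit destruction."""
        del self.instances[handle]

    def sweep(self):
        """Remove expired instances and return their handles."""
        now = self._clock()
        expired = [h for h, inst in self.instances.items() if inst.expires_at <= now]
        for handle in expired:
            del self.instances[handle]
        return expired


now = [0.0]                               # fake clock for the demonstration
factory = Factory(clock=lambda: now[0])
job = factory.create(lifetime=10)
job.subscribe(lambda handle, name, value: print(handle, name, value))
job.update("status", "running")           # subscriber is notified
now[0] = 5.0
factory.keepalive(job.handle, lifetime=10)
now[0] = 12.0
print(factory.sweep())                    # refreshed in time, so still alive
now[0] = 20.0
print(factory.sweep())                    # no refresh since t=5, so reaped
```

The point of the soft-state design is fault resilience: if a client crashes or the network partitions, no explicit cleanup message is needed, because unrefreshed instances simply expire.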
23Open Grid Services Architecture
Users in Problem Domain X
Applications in Problem Domain X
Application Integration Technology for Problem
Domain X
Generic Virtual Service Access and Integration
Layer
OGSA
OGSI Interface to Grid Infrastructure
Compute, Data & Storage Resources
Distributed
Virtual Integration Architecture
24The Globus Alliance & Toolkit (Argonne, USC/ISI,
Edinburgh, PDC)
- An international partnership dedicated to
creating & disseminating high-quality open source
Grid technology: the Globus Toolkit
- Design, engineering, support, governance
- Academic Affiliates make major contributions
- EU: CERN, Imperial, MPI, Poznan
- AP: AIST, TIT, Monash
- US: NCSA, SDSC, TACC, UCSB, UW, etc.
- Significant industrial contributions
- 1000s of users worldwide, many contribute
25Globus Toolkit History: An Unreliable Memoir
Only Globus.Org; not downloads from NMI, UK
eScience, EU DataGrid, IBM, Platform, etc.
GT 2.0 Released
GT 2.2 Released
Physiology of the Grid Paper Released
GT 2.0 beta Released
NSF GRIDS Center Initiated
Anatomy of the Grid Paper Released
Significant Commercial Interest in Grids
GT 1.1.4 and MPICH-G2 Released
The Grid: Blueprint for a New Computing
Infrastructure published
NSF & European Commission Initiate Many New Grid
Projects
First EuroGlobus Conference Held in Lecce
GT 1.1.3 Released
MPICH-G released
Early Application Successes Reported
GT 1.1.2 Released
Globus Project wins Global Information
Infrastructure Award
GT 1.0.0 Released
GT 1.1.1 Released
NASA initiates Information Power Grid
26Globus Toolkit Contributors Include
- Grid Packaging Technology (GPT): NCSA
- Persistent GRAM Jobmanager: Condor
- GSI/Kerberos interchangeability: Sandia
- Documentation: NASA, NCSA
- Ports: IBM, HP, Sun, SDSC, ...
- MDS stress testing: EU DataGrid
- Support: IBM, Platform, UK eScience
- Testing and patches: Many
- Interoperable tools: Many
- Replica location service: EU DataGrid
- Python hosting environment: LBNL
- Data access & integration: UK eScience
- Data mediation services: SDSC
- Tooling, Xindice, JMS: IBM
- Brokering framework: Platform
- Management framework: HP
- DARPA, DOE, NSF, NASA, Microsoft, EU
27Problem-Driven, Collaborative Research Methodology
[Figure: methodology cycle diagram, as on slide 10]
28Infrastructure
- Broadly deployed services in support of virtual
organization formation and operation
- Authentication, authorization, discovery, ...
- Services, software, and policies enabling
on-demand access to important resources
- Computers, databases, networks, storage, software
services, ...
- Operational support for 24x7 availability
- Integration with campus infrastructures
- Distributed, heterogeneous, instrumented systems
can be wonderful CS testbeds
29Infrastructure Status
- >100 infrastructure deployments worldwide
- Community-specific & general-purpose
- From campus to international
- Most based on GT technology
- U.S. examples: TeraGrid, Grid2003, NEESgrid,
Earth System Grid, BIRN
- Major open issues include practical aspects of
operations and federation
- Scalability issues (number of users, sites,
resources, files, jobs, etc.) also arising
31TeraGrid
[Figure: TeraGrid network diagram. Site labels: ANL, Caltech, NCSA, SDSC, PSC. Network labels: 4 Lambdas, CHI, LA. Resource labels: 100 TB DataWulf; 96 GeForce4 Graphics Pipes; 32 Pentium4, 52 2p Madison, 20 2p Madison, Myrinet; 96 Pentium4, 64 2p Madison, Myrinet; 20 TB; 256 2p Madison, 667 2p Madison, Myrinet; 128 2p Madison, 256 2p Madison, Myrinet; 1.1 TF Power4 Federation; 500 TB FCS SAN; 230 TB FCS SAN]
32Grid2003: Towards a Persistent U.S. Open Science
Grid
Status on 11/19/03 (http://www.ivdgl.org/grid2003)
33NEESgrid
Instrumented Structures and Sites
Remote Users
Simulation Tools Repository
High-Performance Network(s)
Laboratory Equipment
Field Equipment
Curated Data Repository
Leading Edge Computation
Global Connections (FY 2005-FY 2014)
Remote Users (K-12 Faculty and Students)
Laboratory Equipment
www.neesgrid.org
34DOE Earth System Grid
- Goal: address technical obstacles to the
sharing & analysis of high-volume data from
advanced earth system models
www.earthsystemgrid.org
35Earth System Grid
36EGEE: Enabling Grids for E-Science in Europe
Operations Center
Infrastructure
Regional Support Center (Support for
Applications & Local Resources)
Resource Center (Processors, disks)
Grid server Nodes
37Problem-Driven, Collaborative Research Methodology
[Figure: methodology cycle diagram, as on slide 10]
38Applications
- 100s of projects applying Grid technologies in
science, engineering, and industry
- Many are exploratory but a significant number are
delivering real value, in such areas as
- Remote access to computers, data, services,
instrumentation
- Federation of computers, data, instruments
- Collaborative environments
- No single recipe for success, but well-defined
goals, modest ambition, & skilled staff help
39Sloan Galaxy Cluster Analysis
DAG
Sloan Data
Galaxy cluster size distribution
Jim Annis, Steve Kent, Vijay Sehkri, Fermilab,
Michael Milligan, Yong Zhao, Chicago
40NEESgrid Multi-site Online Simulation Test
All computational models written in Matlab.
41MOST: A Grid Perspective
UIUC Experimental Model
U. Colorado Experimental Model
SIMULATION COORDINATOR
NCSA Computational Model
42MOST: User Perspective
43Industry Adopts Grid Technology
44Concluding Remarks
[Figure: methodology cycle diagram, as on slide 10]
45Grid R&D Has Produced Significant Success
Stories
- Computer science results
- Metrics: papers, citations, students
- Widely used software
- Globus Toolkit, Condor, NMI, etc., etc.
- International cooperation & community
- Science, technology, infrastructure
- Interdisciplinary science and engineering
- Effective partnerships & community
- Industrial adoption
- Broad spectrum of large & small companies
46Significant Challenges Remain
- Scaling in multiple dimensions
- Ambition and complexity of applications
- Number of users, datasets, services, ...
- From technologies to solutions
- The need for persistent infrastructure
- Software and people as well as hardware
- Currently no long-term commitment
- Institutionalizing the 3-way alliance
- Understand implications on the practice of
computer science research
47Thanks, in particular, to
- Carl Kesselman and Steve Tuecke, my long-time
Globus co-conspirators
- Gregor von Laszewski, Kate Keahey, Jennifer
Schopf, Mike Wilde, Argonne colleagues
- Globus Alliance members at Argonne, U.Chicago,
USC/ISI, Edinburgh, PDC
- Miron Livny, U.Wisconsin Condor project, Rick
Stevens, Argonne & U.Chicago
- Other partners in Grid technology, application,
& infrastructure projects
- DOE, NSF, NASA, IBM for generous support
48For More Information
- The Globus Alliance
- www.globus.org
- Global Grid Forum
- www.ggf.org
- Background information
- www.mcs.anl.gov/foster
2nd Edition Just Out