Title: e-moreorlessanything: The Killer Application; Grids, P2P and Web Services: The Killer Technologies
1e-moreorlessanything: The Killer Application; Grids, P2P and Web Services: The Killer Technologies
- University of Southern California
- 7-9pm March 29 2006
- Geoffrey Fox
- Computer Science, Informatics, Physics
- Pervasive Technology Laboratories
- Indiana University Bloomington IN 47401
- gcf_at_indiana.edu
- http://www.infomall.org
2Web services
- Web Services build loosely coupled, distributed applications (wrapping existing codes and databases) based on SOA (service-oriented architecture) principles.
- Web Services interact by exchanging messages in SOAP format.
- The contracts for the message exchanges that implement those interactions are described via WSDL interfaces.
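As a rough illustration of the SOAP exchange described on this slide, the following Python sketch builds a minimal SOAP 1.1 envelope with the standard library and reads it back on the receiving side. The operation name `GetQuote`, its `symbol` parameter and the `urn:example:quotes` namespace are hypothetical; in a real deployment they would be taken from the service's WSDL.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "urn:example:quotes"  # hypothetical application namespace


def build_request(symbol: str) -> bytes:
    """Client side: serialize a SOAP 1.1 envelope carrying one request element."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")  # left empty in this sketch
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    request = ET.SubElement(body, f"{{{APP_NS}}}GetQuote")  # hypothetical operation
    ET.SubElement(request, f"{{{APP_NS}}}symbol").text = symbol
    return ET.tostring(envelope, encoding="utf-8")


def read_symbol(message: bytes) -> str:
    """Service side: extract the request parameter from the SOAP body."""
    root = ET.fromstring(message)
    path = f"{{{SOAP_NS}}}Body/{{{APP_NS}}}GetQuote/{{{APP_NS}}}symbol"
    return root.find(path).text
```

The two sides share only the message format, not code or object references, which is the loose coupling the slide refers to.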
3A typical Web Service
- In principle, services can be in any language (Fortran .. Java .. Perl .. Python) and the interfaces can be method calls, Java RMI messages, CGI Web invocations, or totally compiled away (inlining)
- The simplest implementations involve XML messages (SOAP) and programs written in net-friendly languages like Java and Python
[Figure labels: Payment (Credit Card) Web Services; Warehouse / Shipping control Web Services; WSDL interfaces]
4Philosophy of Web Service Grids
- Much of Distributed Computing was built by natural extensions of computing models developed for sequential machines
- This leads to the distributed object (DO) model represented by Java and CORBA
- RPC (Remote Procedure Call) or RMI (Remote Method Invocation) for Java
- Key people think this is not a good idea as it scales badly and ties distributed entities together too tightly
- Distributed Objects Replaced by Services
- Note: CORBA was considered too complicated in both organization and proposed infrastructure
- and Java was considered as tightly coupled to Sun
- So there were other reasons to discard
- Thus replace distributed objects by services connected by one-way messages and not by request-response messages
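A minimal sketch of the one-way messaging style, assuming in-process queues stand in for the network. The sender posts a message and returns immediately; any answer travels back as a separate one-way message that carries a correlation identifier. The field names `id`, `relates_to` and `x` are hypothetical.

```python
import queue
import uuid

inbox_service = queue.Queue()  # messages travelling to the service
inbox_client = queue.Queue()   # messages travelling back to the client


def send_one_way(destination: queue.Queue, payload: dict) -> str:
    """Post a message and return at once; the sender does not block for a reply."""
    message_id = str(uuid.uuid4())
    destination.put({"id": message_id, **payload})
    return message_id


def service_step() -> None:
    """Handle one incoming message; the answer is itself a new one-way message."""
    msg = inbox_service.get()
    result = msg["x"] * 2  # hypothetical work done by the service
    send_one_way(inbox_client, {"relates_to": msg["id"], "result": result})
```

Because the reply is correlated by identifier rather than by a held-open call, sender and service never share a call stack, in contrast to RPC or RMI.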
5Typical Grid Architecture
[Figure labels: User Services; Core Grid. Each blob is a computer program!]
6Classic Grid Architecture
[Figure labels: Resources; Content Access; Composition; Middle Tier: Brokers, Service Providers; Netsolve; Security; Collaboration; Computing; Middle Tier becomes Web Services; Clients: Users and Devices]
7Peer to Peer Grid
[Figure: Peer to Peer Grid, a democratic organization. Labels: Peers; Service-Facing Web Service Interfaces; User-Facing Web Service Interfaces]
8The Grid and Web Service Institutional Hierarchy
4 Application or Community of Interest (CoI) Specific Services such as Map Services, Run BLAST or Simulate a Missile. Examples: XBML, XTCE, VOTABLE, CML, CellML
3 Generally Useful Services and Features (OGSA and other GGF, W3C) such as Collaborate, Access a Database or Submit a Job. Standards: OGSA GS-* and some WS-* (GGF/W3C/...)
2 System Services and Features (WS-* from OASIS/W3C/Industry): Handlers like WS-RM, Security, UDDI Registry
1 Container and Run Time (Hosting) Environment (Apache Axis, .NET etc.)
Must set standards to get interoperability
9Sources of Grid Technology
- Grids support distributed collaboratories or virtual organizations integrating concepts from
- The Web
- Agents
- Distributed Objects (CORBA, Java/Jini, COM)
- Globus, Legion, Condor, NetSolve, Ninf and other High Performance Computing activities
- Peer-to-peer Networks
- With perhaps the Web and P2P networks being the most important for Information Grids and Globus for Compute/File Grids
10The Essence of Grid Technology?
- We will start from the Web view and assert that the basic paradigm is
- Meta-data rich Web Services communicating via messages
- These have some basic support from some runtime such as .NET, Jini (pure Java), Apache Tomcat+Axis (Web Service toolkit), Enterprise JavaBeans, WebSphere (IBM) or GT3/4 (Globus Toolkit 3/4)
- These are the distributed equivalent of operating system functions as in UNIX Shell
- Called Hosting Environment or platform
- W3C standard WSDL defines IDL (Interface standard) for Web Services
11What is Happening?
- Grid ideas are being developed in (at least) four communities
- Web Service: W3C, OASIS, (DMTF)
- Global Grid Forum (High Performance Computing, e-Science)
- Enterprise Grid Alliance (Commercial Grid Forum with a near term focus)
- Service Standards are being debated
- Grid Operational Infrastructure is being deployed
- Grid Architecture and core software being developed
- Apache has several important projects as do academia and large and small companies
- Particular System Services are being developed centrally: OGSA framework for this in GGF; WS-* for OASIS/W3C/Microsoft-IBM
- Lots of fields are setting domain specific standards and building domain specific services
- USA started but now Europe is probably in the lead and Asia will soon catch USA if momentum (roughly zero for USA) continues
12Technical Activities of Note
- Look at different styles of Grids such as Autonomic (Robust, Reliable, Resilient)
- New Grid architectures hard due to investment required
- Program the Grid: Workflow
- Access the Grid: Portals, Grid Computing Environments
- Critical Services such as
- Security: build message based not connection based
- Notification: event services
- Metadata: use Semantic Web, provenance
- Fabric and Service Management
- Databases and repositories: instruments, sensors
- Computing: submit job, scheduling, distributed file systems
- Visualization, Computational Steering
- Network performance
Low Level: WS-*
High Level: e.g. OGSA
13What do Web Services Prescribe?
- They specify interfaces for system services (and generally useful services like database)
- They specify an interface language (WSDL) for all services
- They develop containers and frameworks to use to host services
- They specify a message format (SOAP) for ALL messages that defines both application and system actions precisely
- They imply a process be started to define domain specific services
- There are multiple competing activities from Microsoft and IBM to Apache, IU and Anabas (for example) developing system and application services
- Unlike for RTI and CORBA, services from different vendors should interoperate
14What do Grids Add?
- Grids use all of the Web Services
- They address management and deployment of large distributed systems of services
- Internet Scale Distributed Services
- I will use Grid more simply as a composable coordinated collection of services
- They address security and management issues of virtual organizations crossing multiple administrative domains
- GGF is developing specific services of relevance including job management, many aspects of data and scheduling
- Not much on sensors, real-time, P2P
- GGF has a good process for developing new higher level specifications
15Plethora of Standards
- Java is very powerful partly due to its many frameworks that generalize libraries, e.g.
- Java Media Framework
- Java Database Connectivity (JDBC)
- Web Services have a corresponding collection of specifications that represent critical features of the distributed operating systems for Grids of Simple Services
- About 60 WS-* specifications introduced in last 2-3 years
- These are low level with higher level standards such as access database (OGSA-DAI) or Submit a job built on top of these
- Many battles both between standards bodies and between companies as each tries to set the standards they consider best; thus there are multiple standards for many key Web Service functionalities
- Microsoft is a key player and stands to benefit as Web Services open up the enterprise software space to all participants
- e.g. MQSeries (IBM) and Tibco have to change their messaging systems to support new open standards
16The Ten areas covered by the 60 core WS-* Specifications
17Activities in Global Grid Forum Working Groups
18The Global Information Grid Core Enterprise
Services
19The Core Service Areas I
20The Core Service Areas II
21A List of Web Services 1
- 1) Core Service Architecture
- XSD: XML Schema (W3C Recommendation) V1.0 February 1998, V1.1 February 2004
- WSDL 1.1: Web Services Description Language Version 1.1 (W3C Note) March 2001
- WSDL 2.0: Web Services Description Language Version 2.0 (W3C under development) March 2004
- SOAP 1.1 (W3C Note) May 2000
- SOAP 1.2 (W3C Recommendation) June 24 2003
22A List of Web Services 2
- 2) Service Internet including messaging
- WS-Addressing: Web Services Addressing (BEA, IBM, Microsoft, SAP, Sun) in W3C consideration August 2004
- WS-MessageDelivery: Web Services Message Delivery (W3C Submission by Oracle, Sun ..) April 2004
- WS-Reliability: Web Services Reliable Messaging (OASIS Web Services Reliable Messaging TC) March 2004
- WS-RM: Web Services Reliable Messaging (BEA, IBM, Microsoft, Tibco) v0.992 February 2005, linked to WS-Reliability in OASIS as Web Services Reliable Exchange (WS-RX)
- WS-RM Policy: Web Services Reliable Messaging Policy Assertion (BEA, IBM, Microsoft, Tibco) March 2006
- WS-RX: Web Services Reliable Exchange (many members) integrating previous reliability specifications
- SOAP MTOM: SOAP Message Transmission Optimization Mechanism (W3C) June 2004
- SOAP-over-UDP: Binding of SOAP to UDP (Microsoft, BEA ...) September 2004
- Many obsolete specifications like WS-Routing and Referral: SOAP Routing Protocol (Microsoft) October 2001
23Layered Architecture for Web Services and Grids
[Figure: layered stack. Upper layers: Application Specific Grids; Generally Useful Services and Grids; Workflow (WSFL/BPEL); Service Management (Context etc.); Service Discovery (UDDI) / Information; Service Internet Transport Protocol; Service Interfaces (WSDL); Base Hosting Environment. Side labels: Higher Level Services; Service Context; Service Internet. Lower layers, the bit level Internet (OSI Stack): Protocol (HTTP, FTP, DNS); Presentation (XDR); Session (SSH); Transport (TCP, UDP); Network (IP); Data Link / Physical]
24WS-* implies the Service Internet
- We have the classic (CISCO, Juniper ...) Internet routing the flood of ordinary packets in OSI stack architecture
- Web Services build the Service Internet or IOI (Internet on Internet) with
- Routing via WS-Addressing not IP header
- Fault Tolerance (WS-RM not TCP)
- Security (WS-Security/SecureConversation not IPSec/SSL)
- Data Transmission by WS-Transfer not HTTP
- Information Services (UDDI/WS-Context not DNS/Configuration files)
- At message/web service level and not packet/IP address level
- Software-based Service Internet possible as computers are fast
- Familiar from Peer-to-peer networks and built as a software overlay network defining Grid (analogy is VPN)
- SOAP Header contains all information needed for the Service Internet (Grid Operating System) with SOAP Body containing information for Grid application service
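The header/body split can be sketched as follows: routing information sits in WS-Addressing header blocks, and an intermediary picks the next hop from the header alone without opening the body. This is a sketch under assumptions: it uses the SOAP 1.2 namespace and the namespace of the W3C WS-Addressing 1.0 Recommendation, and the endpoint URNs, host name and `routes` table are hypothetical.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://www.w3.org/2003/05/soap-envelope"  # SOAP 1.2 envelope namespace
WSA_NS = "http://www.w3.org/2005/08/addressing"      # WS-Addressing 1.0 namespace


def addressed_envelope(to: str, action: str, message_id: str,
                       payload: ET.Element) -> ET.Element:
    """Build an envelope whose header carries the addressing information."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    header = ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")
    ET.SubElement(header, f"{{{WSA_NS}}}To").text = to
    ET.SubElement(header, f"{{{WSA_NS}}}Action").text = action
    ET.SubElement(header, f"{{{WSA_NS}}}MessageID").text = message_id
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    body.append(payload)  # application data stays opaque to intermediaries
    return envelope


def route(envelope: ET.Element, routes: dict) -> str:
    """Choose the next hop from the wsa:To header only, never reading the body."""
    to = envelope.find(f"{{{SOAP_NS}}}Header/{{{WSA_NS}}}To").text
    return routes[to]
```

Here `routes` plays the part of a (hypothetical) overlay routing table that maps logical service addresses to hosts, which is the sense in which routing happens at the message level rather than the IP level.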
25A List of Web Services 3
- 3) Notification and high-level publish/subscribe information dissemination
- WS-Eventing: Web Services Eventing (BEA, Microsoft, TIBCO) August 2004
- WS-EventNotification (HP, IBM, Intel, Microsoft) March 2006; uses resources to manage subscriptions
- WS-Notification: Framework for Web Services Notification with WS-Topics, WS-BaseNotification, and WS-BrokeredNotification (OASIS); OASIS Web Services Notification TC set up March 2004
- JMS: Java Message Service V1.1 March 2002
- Different from using publish-subscribe to robustly support messaging between Web services
- Bind SOAP to JMS or MQSeries
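The publish/subscribe pattern behind these notification specifications can be sketched with a small in-process broker. This illustrates the pattern only and does not implement WS-Eventing, WS-Notification or JMS; the class name `TopicBroker` and the topic strings are hypothetical.

```python
from collections import defaultdict


class TopicBroker:
    """Minimal topic-based broker: publishers and subscribers know only topics."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        """Register a callback to be invoked for every message on a topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        """Deliver a message to all current subscribers; return how many received it."""
        callbacks = self._subscribers[topic]
        for callback in callbacks:
            callback(topic, message)
        return len(callbacks)
```

The publisher never addresses a subscriber directly, which is what lets a broker sit between services and decouple them in time and location.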
26A List of Web Services 4
- 4) Coordination and Workflow, Transactions and Contextualization
- BPEL: Business Process Execution Language for Web Services (OASIS) V1.1 May 2003, with V2.0 under development
- WS-CDL: Web Services Choreography Description Language (W3C) V1.0 Working Draft 17 December 2004
- WSCI: Web Service Choreography Interface V1.0 (W3C Note from BEA, Intalio, SAP, Sun, Yahoo)
- WSCL: Web Services Conversation Language (W3C Note) HP March 2002
- Workflow is general linkage between services; transactions are a critical special case
- Concept of workflow generalizes traditional workflow processes in business
27A List of Web Services 4-Continued
- 4) Transactions, Business Processes and Contextualization
- WS-CAF: Web Services Composite Application Framework including WS-CTX, WS-CF and WS-TXM below (OASIS Web Services Composite Application Framework TC)
- WS-CTX: Web Services Context (OASIS Web Services Composite Application Framework TC) V0.9.2 July 2005
- WS-CF: Web Services Coordination Framework (OASIS Web Services Composite Application Framework TC) V0.1 April 2005
- WS-TXM: Web Services Transaction Management (OASIS Web Services Composite Application Framework TC) including WS-ACID (V0.1 May 2005), WS-BP (Business Process V0.1 May 2005), WS-LRA (Long running action V0.1 May 2005)
- WS-Coordination: Web Services Coordination (BEA, IBM, Microsoft) November 2004
- WS-AtomicTransaction: Web Services Atomic Transaction (BEA, IBM, Microsoft) November 2004
- WS-BusinessActivity: Web Services Business Activity Framework (BEA, IBM, Microsoft) November 2004
- BTP: Business Transaction Protocol (OASIS) May 2002 with V1.1 November 2004
- ebXML BPSS: Business Process (OASIS) with V2.0.1 pre-Committee Draft review 17 July 2005
28A List of Web Services 5
- 5) Security Frameworks and Core Specifications
- WS-Security 2004: Web Services Security SOAP Message Security (OASIS) Standard March 2004
- WS-I Basic Security Profile V1.0: Web Services Interoperability Organization Working Group Draft May 15 2005
- WS-Security Username Token Profile: Web Services Security Username Token Profile V1.0 OASIS Standard, March 2004
- WS-Security X.509 Certificate Token Profile: Web Services Security X.509 Certificate Token Profile OASIS Standard, March 2004
- WS-Security REL Profile: Web Services Security Rights Expression Language (REL) Token Profile OASIS Standard 19 December 2004
- WS-I REL Token Profile V1.0: Web Services Interoperability Organization Working Group Draft 13 May 2005
- WS-Security Kerberos: Web Services Security Kerberos Binding (Microsoft) December 2003
- Web-SSO: Web Single Sign-On Metadata Exchange Protocol (Microsoft, Sun) April 2005
- Web-SSO-Mex: Web Single Sign-On Interoperability Profile (Microsoft, Sun) April 2005
- WS-SecurityPolicy: Web Services Security Policy Language (IBM, Microsoft, RSA, Verisign) V1.1 July 2005
29A List of Web Services 5 - Contd
- 5) Security Capabilities
- WS-Trust: Web Services Trust Language (BEA, IBM, Microsoft, RSA, Verisign ...) February 2005
- WS-SecureConversation: Web Services Secure Conversation Language (BEA, IBM, Microsoft, RSA, Verisign ...) February 2005
- WS-Federation: Web Services Federation Language (BEA, IBM, Microsoft, RSA, Verisign) July 2003
- WS-Federation Active Requestor Profile: Web Services Federation Language Active Requestor Profile V1.0 (BEA, IBM, Microsoft, RSA, Verisign) July 8, 2003
- WS-Federation Passive Requestor Profile: Web Services Federation Language Passive Requestor Profile V1.0 (BEA, IBM, Microsoft, RSA, Verisign) July 8, 2003
- WS-Authorization is being developed by IBM and Microsoft and will build on WS-Trust to describe how access to particular web services is specified and managed.
- WS-Privacy is being developed by IBM and Microsoft and will build on WS-Policy to describe the binding of privacy policies to Web services and their exchanged data.
30A List of Web Services 5 - Contd
- 5) Security Languages
- SAML: Assertions and Protocols for the OASIS Security Assertion Markup Language (SAML) V2.0 OASIS Standard, 15 March 2005
- WS-Security SAML Token Profile: Web Services Security SAML Token Profile OASIS Standard, 1 December 2004
- WS-I SAML Token Profile V1.0: Web Services Interoperability Organization Working Group Draft 13 May 2005
- XACML: eXtensible Access Control Markup Language (OASIS) V2.0 1 February 2005
31A List of Web Services 6
- 6) Service Discovery
- UDDI (Broadly Supported OASIS Standard) V3 August 2003
- WS-Discovery: Web Services Dynamic Discovery (Microsoft, BEA, Intel ...) February 2004
- WS-IL: Web Services Inspection Language (IBM, Microsoft) November 2001
- Note WS-Context as a metadata catalog and WS-Management Catalog are examples of related services
- There are many UDDI extensions
32A List of Web Services 7
- 7) Metadata and State
- RDF: Resource Description Framework (W3C) Set of recommendations expanded from original February 1999 standard
- DAML+OIL: combining DAML (DARPA Agent Markup Language) and OIL (Ontology Inference Layer) (W3C) Note December 2001
- OWL: Web Ontology Language (W3C) Recommendation February 2004
- WS-MetadataExchange 1.1: Web Services Metadata Exchange (HP, IBM, Intel, Microsoft) March 2006
- ASAP: Asynchronous Service Access Protocol (OASIS) with V1.0 working draft 2B December 11 2004
- WS-GAF: Web Service Grid Application Framework (Arjuna, Newcastle University) August 2003
- WBEM: Web-Based Enterprise Management including CIM (Common Information Model) from DMTF (Distributed Management Task Force) 2004-2005
33A List of Web Services 7
- 7) Metadata and State: Resource Framework
- WS-RF: Web Services Resource Framework (OASIS) including
- WS-Resource Framework: Web Services Resource 1.2 (OASIS) Public Review Draft 01, 10 June 2005
- WS-ResourceProperties: Web Services Resource Properties V1.2 Public Review Draft 01, 10 June 2005
- WS-ResourceLifetime: Web Services Resource Lifetime V1.2 Public Review Draft 01, 13 June 2005
- WS-ServiceGroup: Web Services Service Group V1.2 Public Review Draft 01, 10 June 2005
- WS-BaseFaults: Web Services Base Faults V1.2 Public Review Draft 01, June 13, 2005
34Metadata and Service Context
- Consider a collection of services working together
- Workflow tells you how to specify service interaction but more basically there is shared information or context specifying/controlling the collection
- WS-RF and WS-GAF have different approaches to contextualization: supplying a common context which at its simplest is a token to represent state
- More generally core shared information includes dynamic service metadata and the equivalent of configuration information
- One can support such a common context either as a pool of messages or as message-based access to a database (Context Service)
- Two services linked by a stream are perhaps the simplest example of a collection of services needing context
- Note that there is a tension between storing metadata in messages and in services
- This is the shared versus distributed memory debate in parallel computing
35Stateful Interactions
- There are (at least) four approaches to specifying state
- OGSI uses factories to generate separate services for each session in standard distributed object fashion
- Globus GT-4 and WSRF use metadata of a resource to identify state associated with a particular session
- WS-GAF uses WS-Context to provide an abstract context defining state. It has the strength and weakness that it reveals less about the nature of the session
- WS-I Pure Web Service leaves state specification to the application, e.g. put a context in the SOAP body
- I think we should smile and write a great metadata service hiding all these different models for state and metadata
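One way to picture a metadata service that hides the state model is a context store keyed by an opaque token: services stay stateless and pass the token along with each message. The sketch below is illustrative only and does not follow WS-Context, WSRF or OGSI; `ContextService`, `stateless_counter` and the `count` key are hypothetical names.

```python
import uuid


class ContextService:
    """Holds session state on behalf of services, addressed by an opaque token."""

    def __init__(self):
        self._contexts = {}

    def create(self, **initial) -> str:
        """Start a new session and return the token that identifies it."""
        token = f"urn:uuid:{uuid.uuid4()}"
        self._contexts[token] = dict(initial)
        return token

    def get(self, token: str, key: str):
        return self._contexts[token][key]

    def put(self, token: str, key: str, value) -> None:
        self._contexts[token][key] = value


def stateless_counter(contexts: ContextService, token: str) -> int:
    """A service with no state of its own: it reads and updates the shared context."""
    count = contexts.get(token, "count") + 1
    contexts.put(token, "count", count)
    return count
```

Whether the token travels in a SOAP header or body, and whether the store is a database or a pool of messages, is exactly the design space the four approaches on this slide disagree about.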
36A List of Web Services 8
- 8) Management: original OASIS
- WS-DistributedManagement: Web Services Distributed Management Framework with MUWS and MOWS below (OASIS)
- WSDM-MUWS: Web Services Distributed Management: Management Using Web Services (OASIS) OASIS Standard March 9 2005
- WSDM-MOWS: Web Services Distributed Management: Management of Web Services (OASIS) OASIS Standard March 9 2005
37A List of Web Services 8- Contd
- 8) Management: Microsoft Converged Stack
- WS-Management: Web Services for Management (Microsoft, Intel, Sun ...) August 2005
- WS-Management Catalog: The WS-Management Catalog (Microsoft, Intel, Sun ...) August 2005
- WS-ResourceTransfer: Web Service Resource Transfer (HP, IBM, Intel, Microsoft) March 2006
- WS-Transfer: Web Service Transfer (Microsoft, BEA, Sonic Software etc.) September 2004
- WS-TransferAddendum: Extensions to Web Service Transfer (HP, IBM, Intel, Microsoft) March 2006
- WS-Enumeration: Web Service Enumeration (Microsoft, BEA, Sonic Software etc.) September 2004
38A List of Web Services 9
- 9) General Service Characteristics
- WS-PolicyFramework: Web Services Policy Framework (BEA, IBM, Microsoft, SAP ...) September 2004
- WS-PolicyAttachment: Web Services Policy Attachment (BEA, IBM, Microsoft, SAP ...) September 2004
- WS-PolicyAssertions: Web Services Policy Assertions Language (BEA, IBM, Microsoft, SAP) 18 December 2002 (Superseded by WS-PolicyFramework)
- WS-Agreement: Web Services Agreement Specification (GGF under development) 9 August 2004
39A List of Web Services 10
- 10) User Interfaces
- WSRP: Web Services for Remote Portlets (OASIS) OASIS Standard August 2003
- JSR168: JSR-000168 Portlet Specification for Java binding (Java Community Process) October 2003
- WSRP specifies the client-service protocol while JSR168 specifies how portlets are implemented for the user-facing Web service ports of each supported service inside aggregating portals like JetSpeed, GridSphere or uPortal
40WS-I Interoperability
- Critical underpinning of Grids and Web Services is the gradually growing set of specifications in the Web Service Interoperability Profiles
- Web Services Interoperability (WS-I) Interoperability Profile 1.0a (http://www.ws-i.org) gives us XSD, WSDL1.1, SOAP1.1, UDDI in the basic profile and parts of WS-Security in their first security profile
- We imagine the 60 Specifications being checked out and evolved in the cauldron of the real world, and occasionally best practice identifies a new specification to be added to WS-I, which gradually increases in scope
- Note only 4.5 out of 60 specifications have made it in this definition
41Some ideas to Remember
- Grids are managed Web Services exchanging Messages
- P2P Networks are differently managed and architected services exchanging messages
- Any computer operation involves messages; not all these messages can be isolated
- With services all messages are explicit and can be examined
- Grid Services extend WS-* Web Service Specifications
- Web Service container replaces computer
- Service replaces process
- A stream is an ordered set of messages
- Service Internet replaces Internet; messages replace packets
- (Sub)Grids replace Libraries
42Internet Scale Distributed Services
- Grids use Internet technology and are distinguished by managing or organizing sets of network connected resources
- Classic Web allows independent one-to-one access to individual resources
- Grids integrate together and manage multiple Internet-connected resources: people, sensors, computers, data systems
- Organization can be explicit as in
- TeraGrid which federates many supercomputers
- Information Retrieval Grid which federates multiple data resources
- CrisisGrid which federates first responders, commanders, sensors, GIS, (Tsunami) simulations, science/public data
- Organization can be implicit as in Internet resources such as curated databases and simulation resources that harmonize a community
43Different Visions of the Grid
- e-Science or Cyberinfrastructure are virtual organization Grids supporting global distributed engineering and science research (note sensors, instruments and people are all distributed)
- Utility Computing or X-on-demand (X = data, computer ..) is a major computer industry interest in Grids and this is a key part of enterprise or campus Grids
- Skype (Kazaa) VOIP system is a Peer-to-peer Grid (and VRVS/GlobalMMCS-like Internet A/V conferencing systems are Collaboration Grids)
- DoD's vision of Network Centric Computing can be considered a Grid (linking sensors, warfighters, commanders, backend resources) and they are building the GIG (Global Information Grid)
- Commercial 3G cell-phones and the DoD ad-hoc network initiative are forming mobile Grids
- Grids support universal Globalization in life, fun, research, business
44Why use SOAs
- Globalization of applications: Life, Fun, Research, Business, Defense as an international collaborative activity
- Globalization of Software Production: software components including open-source made everywhere
- Interoperability in interfaces and protocol (messages) requires Web Services as the only broadly supported SOA
- Anti-Performance: if Moore's law gives you a factor X, then use √X for performance and √X for improved lifecycle (re-use)
- Software Engineering: software paradigms are ways of packaging modules/components/objects/methods/subroutines. Services have minimal coupling and best re-use (lowest performance). 1962 Fortran easier re-use than 2006 Java
- Multicore chips require pervasive concurrency without side effects. Even Microsoft must be able to use 32-128 way parallelism on a chip over next 5 years
45Intel Fall 2005 Multicore Roadmap
March 2006: Sun T1000 8 core Server at <$6,000
46Performance Per Transistor
Peter Kogge 1997
[Figure: Normalized SPECINTS and Normalized SPECFLTS, each plotted against Millions of Transistors (CPU)]
- Performance data from uP vendors
- Transistor count excludes on-chip caches
- Performance normalized by clock rate
- Conclusion: Simplest is best! (250K Transistor CPU)
47 1962: Licklider's Vision
- "Lick had this concept: all of the stuff linked together throughout the world, that you can use a remote computer, get data from a remote computer, or use lots of computers in your job."
- Larry Roberts, Principal Architect of the ARPANET
48Physics and the Web
- Tim Berners-Lee developed the Web at CERN as a tool for exchanging information between the partners in physics collaborations
- The first Web Site in the USA was a link to the SLAC library catalogue
- It was the international particle physics community who first embraced the Web
- Killer application for the Internet
- Transformed modern world: academia, business and leisure
49What is e-Science?
- "e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it."
- John Taylor
- Director General of Research Councils
- UK, Office of Science and Technology
- e-Science is about developing tools and technologies that allow scientists to do faster, better or different research
50Example e-Science Projects
- Particle Physics: global sharing of data and computation
- Astronomy: Virtual Observatory for multi-wavelength astrophysics
- Chemistry: remote control of equipment and electronic logbook
- Bioinformatics: data integration, knowledge discovery and workflow
- Healthcare: sharing normalized mammograms
- Environment: ocean, weather, climate modelling, sensor networks
51e-moreorlessanything and the Grid
- e-Business captures an emerging view of corporations as dynamic virtual organizations linking employees, customers and stakeholders across the world.
- The growing use of outsourcing is one example
- e-Science is the similar vision for scientific research with international participation in large accelerators, satellites or distributed gene analyses.
- The Grid integrates the best of the Web, traditional enterprise software, high performance computing and Peer-to-peer systems to provide the information technology e-infrastructure for e-moreorlessanything.
- A deluge of data of unprecedented and inevitable size must be managed and understood.
- People, computers, data and instruments must be linked.
- On demand assignment of experts, computers, networks and storage resources must be supported
52Science is a Team Sport
Life Sciences
53Technology Today is More than Computers
- Today's computer is a coordinated set of hardware, software, and services providing an end-to-end resource.
- Cyberinfrastructure captures how the S&E community has redefined "computer"
The computer as an integrated set of resources
54Integrated Cyberinfrastructure
Cyberinfrastructure = resources (computers, data storage, networks, scientific instruments, experts, etc.) + glue (integrating software, systems, and organizations).
NSF's Atkins Report provided a compelling vision for integrated Cyberinfrastructure
55How does Cyberinfrastructure Work? Cyberinfrastructure-enabled Neurosurgery
- PROBLEM: Neuro-surgeons seek to remove as much tumor tissue as possible while minimizing removal of healthy brain tissue
- Brain deforms during surgery
- Surgeons must align the preoperative brain image with intra-operative images to have the best opportunity for intra-surgical navigation
56Cyberinfrastructure and Computation -- Parallelism
- Two ways of making computers solve problems faster
- Make CPUs faster
- Divide the problem into parts: use more than one CPU, interconnected by a network, to run each of the parts simultaneously (parallelism)
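The "divide the problem into parts" idea can be sketched as a split, compute and combine pattern. The example below is a minimal sketch that uses a thread pool from the Python standard library; in CPython, threads do not speed up pure computation, so a real speed-up would need processes or a message-passing library such as MPI. The function names and the choice of four parts are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(chunk):
    """Work done on one part of the problem: sum of squares of one chunk."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, parts=4):
    """Split data into roughly equal chunks, process them concurrently, combine."""
    size = max(1, -(-len(data) // parts))  # ceiling division, at least 1
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=parts) as pool:
        return sum(pool.map(partial_sum, chunks))
```

The combine step (the outer `sum`) is the only point where the parts communicate, which is why this kind of problem parallelizes well.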
57Cyberinfrastructure and Computation: Grid Computing
- Grid Computing takes the parallel computer out of the box
- Allow the CPUs to be in different geographical locations
- Connect many different kinds of components
NVO analysis can involve connecting the telescope, data archive, and computer through grid computing
[Figure label: Internet]
58National-scale Grid Projects
Open Science Grid: Physics-driven Grid infrastructure
NEES: Earthquake Engineering Grid
59Community Tools
- e-mail and list-serves are oldest and best used
- Kazaa, Instant Messengers, Skype, Napster, BitTorrent for P2P Collaboration: text, audio-video conferencing, files
- del.icio.us, Connotea, Citeulike manage shared bookmarks
- hotornot.com or similar sites allow you to create community resources and share them
- Writely, Wikis and Blogs are powerful specialized shared document systems
- ConferenceXP and WebEx share general applications
- Google Scholar tells you who has cited your papers while publisher sites tell you about co-authors
- Note sharing resources creates (implicit) communities
- Social network tools study graphs to both define communities and extract their properties
60Entertainment Cyberinfrastructure
Role Playing Games support distributed players
in a shared scenario
Meanwhile games like chess are (apart from issues
like cheating) probably equally good on the
Internet as face-to-face. Grandmasters can give
lessons using Skype, text chats and shared chess
games as in internetchess.com. They can be paid
by paypal.com
61Raw Data → Data → Information → Knowledge → Wisdom
[Figure: a Grid of services exchanging SOAP Messages, with a Portal, Decisions, and links to Another Service and Another Grid. Legend: SS = Sensor Service, FS = Filter Service, OS = Other Service, MD = MetaData]
62Semantic Grid and Services
- Implications of SOA (Service Oriented Architectures) for SG (Semantic Grid)
- Build services to implement SG
- Implications of SG for SOA
- Build metadata rich systems of services using SG
- Services receive data in SOAP messages, manipulate it and produce transformed data as further messages
- Meta-data is carried in SOAP messages
- Meta-data controls processing and transport of SOAP Messages
- Knowledge is created from data by services
- The Grid enhances Web services with semantically rich system and application specific management
- One must exploit and work around the different approaches to meta-data and their manipulation in Web Services
63Structure of SOAP Messages
- SOAP Messages have System information in the header including WS-Policy based meta-data defining processing options
- Processed by Handlers
- Application data and meta-data is in the body (controversies here!)
- Processed by the Service itself
- Some meta-data like WS-RF is logically only in messages
- Other meta-data like that in WS-Context or the SRB is stored in the logical equivalent of XML databases
- We only need to preserve semantic structure (XML/SOAP Infoset) so we can transport in fast XML and store in efficient relational databases
64What Type of Services are there?
- There are a horde of support services supplying security, collaboration, database access, user interfaces
- The support services are either associated with system or application
- We studied the WS-* and GS-* which implicitly or explicitly define many support services
- There are generalized filter services which are applications that accept messages and produce new messages with some data derived from that in the input
- Simulations (including PDEs and reactive systems)
- Data-mining
- Transformations
- Agents
- Reasoning
- All of these are termed filters here
- There are services like author ontology, parse RDF or attach provenance that directly support the Semantic Grid
- But all services and their interactions are bathed in a sea of meta-data and so implicitly need and support the Semantic Grid
65It's a Composite Hierarchical World
- Filters can be a workflow, which means they are just collections of other simpler services
- One needs meta-data to control the workflow
- Services are programs that accept messages and produce messages
- Grids are a distributed collection of services supporting managed shared resources
- Management requires meta-data
- Grids are distributed systems that accept distributed messages and produce distributed result messages
- Can always talk about Grids and view a service or a workflow as a special case of a Grid
- It just requires meta-data to send a message to a Grid and have it routed to the correct computer holding the requested service
- Meta-data allows mapping of virtual to real addresses
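A minimal sketch of the composite view, assuming messages are plain dictionaries and filters are ordinary functions. The filter names, the grid:// address and the registry are all hypothetical; the point is only that a workflow of filters is itself a filter, and that a meta-data lookup maps a virtual address to whatever actually handles the message.

```python
from typing import Callable, Dict, List

Message = Dict[str, object]
Filter = Callable[[Message], Message]


def scale(msg: Message) -> Message:
    """Simple filter: double every data value."""
    return {**msg, "data": [2 * x for x in msg["data"]]}


def total(msg: Message) -> Message:
    """Simple filter: reduce the data values to their sum."""
    return {**msg, "data": sum(msg["data"])}


def workflow(steps: List[Filter]) -> Filter:
    """A workflow is a collection of simpler filters and is itself a filter."""
    def run(msg: Message) -> Message:
        for step in steps:
            msg = step(msg)
        return msg
    return run


# Hypothetical meta-data registry mapping virtual addresses to real handlers.
registry: Dict[str, Filter] = {
    "grid://demo/scale-then-total": workflow([scale, total]),
}


def send(virtual_address: str, msg: Message) -> Message:
    """Route a message using only the meta-data lookup."""
    return registry[virtual_address](msg)


reply = send("grid://demo/scale-then-total", {"data": [1, 2, 3]})
```

The caller sees one address and one reply whether the handler is a single service or a whole workflow behind it.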
66Semantically Rich Services with a Semantically
Rich Distributed Operating Environment
(Diagram labels: Portal, Filter Service (FS), Other Service (OS), MetaData (MD), Sensor Service (SS))
67Consequences of Rule of the Millisecond
- Useful to remember critical time scales
- 1) 0.000001 ms: CPU does a calculation
- 2a) 0.001 to 0.01 ms: Parallel Computing MPI latency
- 2b) 0.001 to 0.01 ms: Overhead of a Method Call
- 3) 1 ms: wake-up a thread or process
- 4) 10 to 1000 ms: Internet delay
- 2a), 4) implies geographically distributed metacomputing can't in general compete with parallel systems
- 3) << 4) implies a software overlay network is possible without significant overhead
- We need to explain why it adds value of course!
- 2b) versus 3) and 4) describes regions where method and message based programming paradigms are important
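The comparisons above come down to ratios of the listed time scales. The sketch below simply encodes the slide's numbers (in milliseconds) and computes the range of each ratio; the dictionary keys and the helper name are illustrative choices, not part of the talk.

```python
# Time scales from the slide, as (low, high) bounds in milliseconds.
TIME_SCALES_MS = {
    "cpu_calculation": (1e-6, 1e-6),
    "mpi_latency": (1e-3, 1e-2),
    "method_call": (1e-3, 1e-2),
    "thread_wakeup": (1.0, 1.0),
    "internet_delay": (10.0, 1000.0),
}


def ratio_range(slow: str, fast: str):
    """Return the (smallest, largest) possible value of slow / fast."""
    slow_lo, slow_hi = TIME_SCALES_MS[slow]
    fast_lo, fast_hi = TIME_SCALES_MS[fast]
    return (slow_lo / fast_hi, slow_hi / fast_lo)


# 2a) versus 4): Internet delay is at least about a thousand MPI latencies.
internet_vs_mpi = ratio_range("internet_delay", "mpi_latency")

# 3) versus 4): a thread wake-up is small next to any Internet delay.
internet_vs_wakeup = ratio_range("internet_delay", "thread_wakeup")
```

With these numbers the Internet delay is roughly a thousand to a million MPI latencies, and ten to a thousand thread wake-ups, which is the gap behind both conclusions on the slide.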
68Linking Modules
- From method based to RPC to message based to event-based publish-subscribe Message Oriented Middleware
Listener: Subscribe to Events
Publisher: Post Events
Message Queue in the Sky
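As a rough illustration of the last step in that progression, here is a toy in-process publish-subscribe queue. The class and method names are hypothetical and this is not any particular Message Oriented Middleware product; real brokers add networking, persistence and security. It only shows that publishers post events to a queue and never call listeners directly.

```python
from collections import defaultdict, deque


class MessageQueue:
    """Toy stand-in for the 'message queue in the sky'."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> listener callbacks
        self._pending = deque()                # events posted but not yet delivered

    def subscribe(self, topic, callback):
        """Listener side: register interest in a topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, event):
        """Publisher side: post an event without knowing who listens."""
        self._pending.append((topic, event))

    def deliver(self):
        """Push every pending event to its subscribers; return deliveries made."""
        delivered = 0
        while self._pending:
            topic, event = self._pending.popleft()
            for callback in self._subscribers[topic]:
                callback(event)
                delivered += 1
        return delivered


received = []
queue = MessageQueue()
queue.subscribe("sensor", received.append)
queue.publish("sensor", {"reading": 1})
queue.publish("unwatched", "dropped: no listener")
count = queue.deliver()
```

Publisher and listener share only a topic name, so either side can be replaced or moved without the other changing.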
69What is a Simple Service?
- Take any system: it has multiple functionalities
- We can implement each functionality as an independent distributed service
- Or we can bundle multiple functionalities in a single service
- Whether functionality is an independent service or one of many method calls into a glob of software, we can always make them Web services by converting the interface to WSDL
- Simple services are obtained by taking functionalities and making them as small as possible subject to the rule of the millisecond
- Distributed services incur messaging overhead of one (local) to 100s (far apart) of milliseconds to use a message rather than a method call
- Use scripting or compiled integration of functionalities ONLY when you require <1 millisecond interaction latency
- Apache web site has many (pre Web Service) projects that are multiple functionalities presented as (Java) globs and NOT (Java) Simple Services
- Makes it hard to integrate sharing common security, user profile, file access .. services
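The rule of the millisecond can be read as a simple decision rule. The function below is a hypothetical sketch, not something from the talk: it assumes the only inputs are the latency an interaction needs and the messaging overhead between the two parties, with 1 ms as the default (local) overhead quoted above.

```python
def choose_linkage(required_latency_ms: float, message_overhead_ms: float = 1.0) -> str:
    """Pick how two functionalities should be linked.

    If the interaction must complete faster than a message can be delivered,
    keep both functionalities inside one service and link them by method
    calls; otherwise split them into separate services linked by messages.
    """
    if required_latency_ms < message_overhead_ms:
        return "method call"
    return "message"


tight_loop = choose_linkage(0.01)                                # inner-loop interaction
remote_ok = choose_linkage(50.0)                                 # tolerates local messaging
too_far = choose_linkage(50.0, message_overhead_ms=100.0)        # far-apart services are too slow
```

The same functionality can therefore be a separate service in one deployment and a bundled method in another, depending on how far apart the parties sit.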
70Grids of Grids of Simple Services
- Link via methods -> messages -> streams
- Services and Grids are linked by messages
- Internally to service, functionalities are linked by methods
- A simple service is the smallest Grid
- We are familiar with the method-linked hierarchy: Lines of Code -> Methods -> Objects -> Programs -> Packages
71Component Grids?
- So we build collections of Web Services which we package as component Grids
- Visualization Grid
- Sensor Grid
- Utility Computing Grid
- Collaboration Grid
- Earthquake Simulation Grid
- Control Room Grid
- Crisis Management Grid
- Drug Discovery Grid
- Bioinformatics Sequence Analysis Grid
- Intelligence Data-mining Grid
- We build bigger Grids by composing component
Grids using the Service Internet
72Using the Grid of Grids and Core Services to
build multiple application grids re-using common
components.
BioInformatics Grid
Chemical Informatics Grid
15 Application Services Sequencing
Tools Biocomplexity Simulations
Domain Specific Grids/Services
15 Application Services Screening Tools Quantum
Calculations
14 Information
Instrument/Sensor
11 Portals
Services
13 Data Access/Storage
12 Computing
17 Collaboration
9 Management
18 Scheduling
10 Policy
4 Notification
8 Metadata
7 Discovery
Core Low Level Grid Services
5 Workflow
6 Security
3 Messaging
9 Management
Physical Network (monitored by FS16)
73Critical Infrastructure (CI) Grids built as Grids
of Grids
74Mediation and Transformation in a Grid of Grids
and Simple Services
75Why can we build better software?
- In 1962 I was punching holes in cards and paper tape to persuade tiny slow computers to manipulate words in memory to string together instructions like a b c
- Now computers are much faster and languages are better but not a lot better
- I suspect I would only be a factor of 2 or so faster programming the same program today
- However A B C can now be resources (Bank records, Drugs, Games, Supernova) and can be a service
- Objects were wrong as they distributed ordinary programs; services express distributed independent entities (communication time very different inter and intra computers)
- Services are essential for reliable modular programming
76What's wrong with old programs
- They were made of instructions, methods, subroutines and libraries thereof
- Languages (Java, C) encouraged spaghetti programming that linked parts of programs together
- This leads to efficient unmaintainable software
- However now computers and networks are several orders of magnitude faster
- Optimize for modularity and maintainability and rarely if ever optimize for performance
- Old programs have the wrong optimization and by construction are hard to optimize
77Old and New Software Regime
- Web Services, Grids and P2P systems are built with
- The new software model: independent entities connected by explicit messages
- All computer entities are actually connected by some form of message (traveling on bus or from memory to register) but often implicit
- And they support the distributed services and resources needed for global science, fun and business
- Google, Amazon, Yahoo and perhaps Microsoft and Electronic Arts can use
- Old programs have the old architecture and cannot be modified
- At best can wrap partial functionalities as services and use as a black box
- IBM, Oracle and the old Enterprise software companies have this noose around their necks
78Large and Small Grids
- N resources in a community (N is billions for the world and 1000-10000 for many scientific fields)
- Communities are arranged hierarchically with real work being done in groups of M resources; M could be 10-100 in e-Science
- Metcalfe's law: value of network grows like square of number of nodes M; we call Grids where this is true Metcalfe or M^2 Grids
- Nature of Interaction depends on size of M or N
- Shared Information O(N) Complexity Grids for largish N
- Complexity M^2 Metcalfe Grids for smaller M < N
- Grids must merge with peer-to-peer networks to support both Complexity O(N) and M^2 Systems
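To make the two regimes concrete, the sketch below compares O(N) shared-information cost with M^2 Metcalfe value for sizes in the ranges quoted above. The function names are illustrative, and the pairwise link count M(M-1)/2 is added here only as the usual justification for the M^2 growth; the slide itself states just the square law.

```python
def shared_information_cost(n: int) -> int:
    """O(N): each of N resources deposits into, or views, a shared store once."""
    return n


def metcalfe_value(m: int) -> int:
    """Metcalfe's law as stated on the slide: value grows like M squared."""
    return m * m


def pairwise_links(m: int) -> int:
    """Number of distinct pairs among M nodes, which also grows like M^2."""
    return m * (m - 1) // 2


# A working group of M = 100 inside a community of N = 10000.
group_value = metcalfe_value(100)
group_links = pairwise_links(100)
community_cost = shared_information_cost(10000)
```

With M = 100 and N = 10000 the two measures are of the same order, which is one way to see why small groups can afford all-to-all interaction while the full community needs the O(N) shared-information pattern.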
79Community Resources
- Grid Community databases have analogy to Television and the News Web that allow individuals to communicate instantly with each other via Web Pages and Headline News acting as proxies
- N resources deposit information and N can view: Complexity O(N)
80M^2 Interactions
- Superimpose M^2 Grids on the sea (heatbath) of O(N) ordinary interactions
81Architecture of (Web Service) Grids
- Grids built from Web Services communicating through an overlay network built in SOFTWARE on the ordinary internet at the application level
- Grids provide the special quality of service (security, performance, fault-tolerance) and customized services needed for distributed complex enterprises
- We need to work with Web Service community as they debate the 60 or so proposed Web Service specifications
- Use Web Service Interoperability WS-I as best practice
- Must add further specifications to support high performance
- Database Grid Services for O(N) Community case
- Streaming support for M^2 case
- We add to WS-*, Grid services for managed shared resources
82e-Defense and e-Crisis
- Grids support Command and Control and provide Global Situational Awareness
- Link commanders and frontline troops to themselves and to archival and real-time data; link to what-if simulations
- Dynamic heterogeneous wired and wireless networks
- Security and fault tolerance essential
- System of Systems: Grid of Grids
- The command and information infrastructure of each ship is a Grid; each fleet is linked together by a Grid; the President is informed by and informs the national defense Grid
- Grids must be heterogeneous and federated
- Crisis Management and Response enabled by a Grid linking sensors, disaster managers, and first responders with decision support
83DAME Grid based tools and Infer-structure for
Aero-Engine Diagnosis and Prognosis
XTO
Companies: Rolls-Royce, DSS, Cybula
Universities: York, Leeds, Sheffield, Oxford
Engine Model
Case Based Reasoning
Signal Data Explorer
84DAME Operational Scenario
Engine flight data
5000 engines
Gigabyte per aircraft per Engine per
transatlantic flight
London Airport
New York Airport
Grid
Airline office
Diagnostics Centre
Maintenance Centre
American data center
European data centre
Rolls Royce and UK e-Science Program: Distributed Aircraft Maintenance Environment
85DAME Signal Data Explorer Service
86NASA Aerospace Engineering Grid
87Some Important Styles of Grids
- Computational Grids were origin of concepts and link computers across the globe; high latency stops this from being used as a parallel machine
- Typically Compute/File Grids where information (messages) exchanged by writing and reading files
- Knowledge and Information Grids link sensors and information repositories as in Virtual Observatories or BioInformatics
- Education Grids link teachers, learners, parents as a VO with learning tools, distant lectures etc.
- e-Science Grids link multidisciplinary researchers across laboratories and universities
- Community Grids focus on Grids involving large numbers of peers rather than focusing on linking major resources; links Grid and Peer-to-peer network concepts
- Semantic Grid links Grid and AI community with Semantic web (ontology/meta-data enriched resources) and Agent concepts
- Collaboration Grids support the linkage of multiple people and electronic resources (often peer-to-peer architecture)
88Types of Computing Grids
- Running Pleasingly Parallel Jobs as in United Devices, Entropia (Desktop Grid) cycle stealing systems
- Can be managed (inside the enterprise as in Condor) or more informal (as in SETI@Home)
- Computing-on-demand in Industry where jobs spawned are perhaps very large (SAP, Oracle ...)
- Support distributed file systems as in Legion (Avaki), Globus with (web-enhanced) UNIX programming paradigm
- Particle Physics will run some 30,000 simultaneous jobs
- Distributed Simulation HLA style Grids (some work)
- Linking Supercomputers as in TeraGrid
- Pipelined applications linking data/instruments, compute, visualization
- Seamless Access where Grid portals allow one to choose one of multiple resources with a common interface
- Parallel Computing typically NOT suited for a Grid (latency)
89Analysis and Visualization
Large Disks
Old Style Metacomputing Grid
Large Scale Parallel Computers
Spread a single large Problem over multiple
supercomputers
90Utility and Service Computing
- An important business application of Grids is believed to be utility computing
- Namely support a pool of computers to be assigned as needed to take up extra demand
- Pool shared between multiple applications
- Natural architecture is not a cluster of computers connected to each other but rather a Farm of Grid Services connected to Internet and supporting services such as
- Web Servers
- Financial Modeling
- Run SAP
- Data-mining
- Simulation response to crisis like forest fire or earthquake
- Media Servers for Video-over-IP
- Note classic Supercomputer use is to allow full access to do anything via ssh etc.
- In service model, one pre-configures services for all programs and you access portal to run job with fewer security issues
91UK National Grid Service
Web Services based National Grid Infrastructure
92Towards an International Grid Infrastructure
UK NGS
Leeds
Manchester
Starlight (Chicago)
US TeraGrid
Netherlight (Amsterdam)
Oxford
RAL
SDSC
NCSA
PSC
UCL
UKLight
SC05
Local laptops in Seattle and UK
All sites connected by production network (not
all shown)
Computation
Steering clients
Network PoP
Service Registry
93Cyberinfrastructure At Home
- BOINC (Berkeley Open Infrastructure for Network Computing) (http://boinc.berkeley.edu)
- Climateprediction.net: study climate change
- Einstein@home: search for gravitational signals emitted by pulsars
- LHC@home: improve the design of the CERN LHC particle accelerator
- Predictor@home: investigate protein-related diseases
- Rosetta@home: help researchers develop cures for human diseases
- SETI@home: look for radio evidence of extraterrestrial life
- Etc.
Arecibo telescope
SETI@Home averages 138 TFLOPS on 100,000s of computers in 100s of countries
94climateprediction.net
Since September 2003:
- 95,000 registered participants in 150 countries
- Donated 8,000 years of computer time
- Completed 100,000 simulations of over 4M model years
95Results so Far: the first steps towards a fully probability-based forecast
96Information/Knowledge Grids
- Distributed (10s to 1000s) of data sources (instruments, file systems, curated databases ...)
- Data Deluge: 1 (now) to 100s petabytes/year (2012)
- Moore's law for Sensors
- Possible filters assigned dynamically (on-demand)
- Run image processing algorithm on telescope image
- Run Gene sequencing algorithm on compiled data
- Needs decision support front end with what-if simulations
- Metadata (provenance) critical to annotate data
- Integrate across experiments as in multi-wavelength astronomy
Data Deluge comes from pixels/year available
97Data Deluged Science
- In the past, we worried about data in the form of parallel I/O or MPI-IO, but we didn't consider it as an enabler of new algorithms and new ways of computing
- Data assimilation was not central to HPCC
- DoE ASCI set up because didn't want test data!
- Now particle physics will get 100 petabytes from CERN
- Nuclear physics (Jefferson Lab) in same situation
- Use around 30,000 CPUs simultaneously 24x7
- Weather, climate, solid earth (EarthScope)
- Bioinformatics curated databases (Biocomplexity only 1000s of data points at present)
- Virtual Observatory and SkyServer in Astronomy
- Environmental Sensor nets
98The Data Deluge
- In next 5 years e-Science projects will produce more scientific data than has been collected in the whole of human history
- Some normalizations
- The Bible: 5 Megabytes
- Annual refereed papers: 1 Terabyte
- Library of Congress: 20 Terabytes
- Internet Archive (1996-2002): 100 Terabytes
- In many fields new high throughput devices, sensors and surveys will be producing Petabytes of scientific data
99Tracking the Heavens
Hubble Telescope
Palomar Telescope
Sloan Telescope
100Virtual Observatory Astronomy Grid: Integrate Experiments
Radio
Far-Infrared
Visible
Dust Map
Visible X-ray
Galaxy Density Map
101The Virtual Observatory
- Premise: most observatory data is (or could be) online
- So, the Internet is the world's best telescope
- It has data on every part of the sky
- In every measured spectral band: optical, x-ray, radio..
- It's as deep as the best instruments
- It is up when you are up
- The seeing is always great
- It's a smart telescope: links objects and data to literature on them
- Software has become a major expense
- Share, standardize, reuse..
Slide modified from Alex Szalay, NVO
102Downloading the Night Sky
- The National Virtual Observatory
- Astronomy community came together to set standards for services and data
- Interoperable, multi-terabyte online databases
- Technology-enabled, science-driven.
- NVO combines over 100 TB of data from 50 ground and space-based telescopes and instruments to create a comprehensive picture of the heavens
- Sloan Digital Sky Survey, Hubble Space Telescope, Two Micron All Sky Survey, National Radio Astronomy Observatory, etc.
Hubble Telescope
Palomar Telescope
Sloan Telescope
103Using Technology to Evolve Astronomy
- Looking for
- Needles in haystacks: the Higgs particle
- Haystacks: Dark matter, Dark energy
- Statistical analysis often deals with
- Creating uniform samples
- Data filtering
- Assembling relevant subsets
- Censoring bad data
- Likelihood calculations
- Hypothesis testing, etc.
- Traditionally these are performed on files; most of these tasks are much better done inside a database
Slide modified from Alex Szalay, NVO
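As a small illustration of doing filtering, subsetting and censoring inside a database rather than over files, the sketch below uses Python's built-in sqlite3 module with an in-memory catalog. The table layout, column names and sample rows are invented for the example and are not the NVO or SDSS schema.

```python
import sqlite3


def build_catalog(rows):
    """Load (id, ra_deg, dec_deg, magnitude, quality_ok) rows into an indexed table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE objects ("
        "id INTEGER PRIMARY KEY, ra_deg REAL, dec_deg REAL, "
        "magnitude REAL, quality_ok INTEGER)"
    )
    conn.executemany("INSERT INTO objects VALUES (?, ?, ?, ?, ?)", rows)
    # A spatial index lets region queries avoid scanning every row.
    conn.execute("CREATE INDEX idx_position ON objects (ra_deg, dec_deg)")
    return conn


def bright_objects_in_region(conn, ra_min, ra_max, dec_min, dec_max, max_magnitude):
    """Censor bad data, select a sky region, and keep only bright objects."""
    cursor = conn.execute(
        "SELECT id FROM objects "
        "WHERE quality_ok = 1 "
        "AND ra_deg BETWEEN ? AND ? "
        "AND dec_deg BETWEEN ? AND ? "
        "AND magnitude <= ? "
        "ORDER BY id",
        (ra_min, ra_max, dec_min, dec_max, max_magnitude),
    )
    return [row[0] for row in cursor]


# Invented sample rows, for illustration only.
catalog = build_catalog([
    (1, 10.0, 5.0, 14.0, 1),   # in region, bright, good quality
    (2, 10.5, 5.2, 19.0, 1),   # in region but too faint
    (3, 10.2, 5.1, 13.0, 0),   # in region but flagged as bad data
    (4, 50.0, -5.0, 12.0, 1),  # outside the region
])
selected = bright_objects_in_region(catalog, 9.0, 11.0, 4.0, 6.0, 15.0)
```

One declarative query replaces a hand-written loop over files, and the index is what makes the same query practical on a catalog with many more rows.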
104How NVO Works
- Raw data comes from large-scale telescopes
- Telescopes provide daily sweep of the sky; scientists clean data, which is then converted from temporal to spatial data, allowing indexing over both dimensions.
- All NVO data on website available to the public without restriction (by community agreement, all data public after 1 year)
- NVO databases distributed and mirrored at multiple sites
Crab Nebula
Palomar Telescope
105Making Discoveries Using the NVO
Scientists at Johns Hopkins, Caltech and other institutions confirmed the discovery of a new brown dwarf. Search time on 5,000,000 files went from months to minutes using NVO database tools and technologies. Brown dwarfs are often called the missing link in the study of star formation. They are considered small, cool failed stars.
106Cyberinfrastructure and NVO
- Sky surveys from major telescopes indexed and catalogued in NVO databases by time and spatial location using Storage Resource Broker and other tools
- NVO collections archived at multiple sites, accessed by Grid technologies
- Software tools and web portals create an environment for ingestion of new information, mining, discovery and dissemination
107International Virtual Observatory Alliance
- Reached international agreements on Astronomical Data Query Language, VOTable 1.1, UCD 1, Resource Metadata Schema
- Image Access Protocol, Spectral Access Protocol and Spectral Data Model, Space-Time Coordinates definitions and schema
- Interoperable registries by Jan 2005 (NVO, AstroGrid, AVO, JVO) using OAI publishing and harvesting
- So each Community of Interest builds data AND service standards that build on GS-* and WS-*
108 Particle Physics at the CERN LHC
ATLAS at LHC, 2006-2020: 150*10^6 sensors
UA1 at CERN 1981-1989 "hermetic detector"
LHC experimental collaborations (e.g. ATLAS)
typically involve over 100 institutes and over
1000 physicists world wide
109Particle Physicists need to support a truly
global Virtual Organization
Europe: 267 institutes, 4603 users
Elsewhere: 208 institutes, 1632 users
110Comb-e-Chem Project
Video
Simulation
Properties
Analysis
Structures Database
Diffractometer
X-Ray e-Lab
Properties e-Lab
Grid Middleware
111myGrid Project
- Imminent deluge of data
- Highly heterogeneous
- Highly complex and inter-related
- Convergence of data and literature archives
112The Williams Workflows
A
B
C
A: Identification of overlapping sequence
B: Characterisation of nucleotide sequence
C: Characterisation of protein sequence
113eDiaMoND Project
Mammograms have diffe