Title: Myth Busting and Bridge Building Professor Carole Goble University of Manchester, UK
1 Myth Busting and Bridge Building
Professor Carole Goble
University of Manchester, UK
2 The Grid
The Semantic Grid
Two-way traffic
Building the bridge
3
- Pervasive and dependable computing utility
- Proposed a distributed computing infrastructure for advanced science and engineering
- Application problem holders and developers
- Global Grid Forum
- http://www.ggf.org
5
- "e-Science is about global collaboration in key areas of science and the next generation of computing infrastructure that will enable it."
- "e-Science will change the dynamic of the way science is undertaken."
John Taylor, Director General of UK Research Councils
(And to prove it he invested £240 million over 5 years)
6
- Quantity explosion
- Geographical, organisational, data and information complexity
- Global collaboration
Analysis paralysis
In silico experiments
Figure courtesy of LION BioSciences
7 Sharing expensive resources more effectively, on demand
- Sharing of Ultra High Voltage Electron Microscopy at Osaka University, Japan with the National Center for Microscopy and Imaging Research in San Diego, USA. http://www.nbirn.net
- OP3D: compute-intensive surgical visualisation system, University of Manchester, UK. http://www.esnw.ac.uk
Figures courtesy of Nigel John (OP3D) and Mark Ellisman (BIRN)
8 "A collaboratory is a center without walls, in which the nation's researchers can perform their research without regard to geographical location, interacting with colleagues, accessing instrumentation, sharing data and computational resources, and accessing information in digital libraries."
William Wulf, 1989, U.S. National Science Foundation
9 The Grid as Collaboratory
Figure courtesy of Ian Foster
Controlled sharing of resources and know-how with overlapping and volatile membership to generate new results
Unanticipated reuse
10 Building Global Knowledge Communities
- Teams organised around common goals
- Communities: virtual organisations
- Overlapping memberships, resources and activities
- Essential diversity is a strength and a challenge
- Membership and capabilities
- Geographic and political distribution
- No location/organisation/country possesses all required skills and resources
- Dynamic: adapt as a function of their situation
- Adjust membership, reallocate responsibilities, renegotiate resources
Slide derived from Ian Foster's SSDBM 03 keynote
11 The Grid
- "Grid computing has emerged as an important new field, distinguished from conventional distributed computing by its focus on large-scale resource sharing, innovative applications... we define the 'Grid problem' as flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources - what we refer to as virtual organizations."
- Middleware for establishing, managing and evolving multi-organisational federations.
- On-demand distributed computing
The Anatomy of the Grid: Enabling Scalable Virtual Organizations. Foster, Kesselman and Tuecke
12
- Implement One from Many
- Virtualization: one database, one computer
- Provisioning of work and resources based on policies and dynamic requirements
- Pooling of resources to increase utilization and sharing
- Manage Many as One
- Self-adaptive software that largely tunes and fixes itself
- Unified management and provisioning
Figure courtesy of Ian Foster
13
- Virtual Organisations are dynamic, ad hoc, long lived, heterogeneous and large
- Performance, reliability, scalability, fault tolerance, quality of service, authentication and authorisation all matter.
Figure courtesy of Ian Foster
14 Layers of collaboration
[Figure. Layers: SCIENTISTS, INFORMATION, PLUMBING. Labels: Steer Simulation; Cross-DB Query; Data Grid; Compute Grid; Results; DBX Copy 1; DBX Copy 2; DBY XML; DBZ RDBM; (Re)Compute]
15 Kepler
http://kepler.ecoinformatics.org/
Courtesy of Bertram Ludaescher
16 Data Grid
Many sources of data, services, computation
Registries organize services of interest to a
community
Figure courtesy of Ian Foster
17 Smallpox Grid
http://www.grid.org/projects/smallpox/
- Analysis of 35 million drug compounds against 11 smallpox proteins to try to find a way to stop the replication of the virus.
- Volunteers from over 190 countries donated spare CPU power at www.grid.org, the world's largest public computing resource
- Contributed over 39,000 years of computing time in less than six months.
- 44 lead molecules identified
United Devices, IBM, Oxford University, Accelrys
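The Smallpox Grid figures imply some simple scale estimates. The following is a back-of-envelope sketch only. It assumes that every compound was analysed against every protein, that a "year of computing time" means one CPU busy for one year, and that "less than six months" can be taken as 0.5 years; none of these assumptions is stated on the slide.

```python
# Back-of-envelope scale estimates from the Smallpox Grid figures.
compounds = 35_000_000      # drug compounds screened
proteins = 11               # smallpox protein targets
cpu_years = 39_000          # "over 39,000 years of computing time"
wall_clock_years = 0.5      # "less than six months" (assumed upper bound)

# Assumption: every compound is analysed against every protein.
pairings = compounds * proteins

# Average number of CPUs that must have been busy at any moment.
# cpu_years is a floor and wall_clock_years a ceiling, so this is
# a lower bound on the average.
avg_cpus = cpu_years / wall_clock_years

print(f"{pairings:,} compound-protein pairings")
print(f"at least {avg_cpus:,.0f} CPUs busy on average")
```

Under these assumptions the project covered 385 million compound-protein pairings and kept the equivalent of at least 78,000 CPUs busy around the clock, which gives a feel for what "the world's largest public computing resource" meant in practice.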
18 R&D Collaboration
http://www.avaki.com/
[Figure. Sites: US, UK, Germany. Labels: DAS; NAS; Screening App; NFS, CIFS; NFS; Avaki; App; Files; Cached; Share Files; Application]
- Users at multiple sites need access to shared genomic data
- Current replication approach is manual, unwieldy, with high latency (FTP)
- Multiple copies introduce data currency and consistency issues
- R&D on a major new drug is being delayed as scientists are forced to search for data and to do rework because of using outdated data
- AVAKI provides transparent secure access to up-to-date production data across a wide area
Courtesy of Andrew Grimshaw at AVAKI
19 Smallpox Grid
http://www.astrogrid.org
- Analysis of 35 million drug compounds against eleven smallpox proteins to try to find a way to stop the replication of the virus.
- Volunteers from over 190 countries donated spare CPU power at www.grid.org, the world's largest public computing resource
- Contributed over 39,000 years of computing time in less than six months.
- 44 lead molecules identified and turned over to United States Army
Courtesy of United Devices
Courtesy of Andy Palmer
20 AVAKI Data Grid
http://www.astrogrid.org
[Figure. Sites: US, UK, Germany. Labels: DAS; NAS; Screening App; NFS, CIFS; NFS; Avaki; App; Files; Cached; Share Files; Application]
- Users at multiple sites need access to shared genomic data
- Current replication approach is manual, unwieldy, with high latency (FTP)
- Multiple copies introduce data currency and consistency issues
- R&D on a major new drug is being delayed as scientists are forced to search for data and to do rework because of using outdated data
- Transparent secure access to up-to-date production data across a wide area
Courtesy of Andrew Grimshaw
Courtesy of Andy Palmer
21
ID   MURA_BACSU     STANDARD;      PRT;   429 AA.
DE   PROBABLE UDP-N-ACETYLGLUCOSAMINE 1-CARBOXYVINYLTRANSFERASE
DE   (EC 2.5.1.7) (ENOYLPYRUVATE TRANSFERASE) (UDP-N-ACETYLGLUCOSAMINE
DE   ENOLPYRUVYL TRANSFERASE) (EPT).
GN   MURA OR MURZ.
OS   BACILLUS SUBTILIS.
OC   BACTERIA; FIRMICUTES; BACILLUS/CLOSTRIDIUM GROUP; BACILLACEAE;
OC   BACILLUS.
KW   PEPTIDOGLYCAN SYNTHESIS; CELL WALL; TRANSFERASE.
FT   ACT_SITE    116    116       BINDS PEP (BY SIMILARITY).
FT   CONFLICT    374    374       S -> A (IN REF. 3).
SQ   SEQUENCE   429 AA;  46016 MW;  02018C5C CRC32;
     MEKLNIAGGD SLNGTVHISG AKNSAVALIP ATILANSEVT IEGLPEISDI ETLRDLLKEI
     GGNVHFENGE MVVDPTSMIS MPLPNGKVKK LRASYYLMGA MLGRFKQAVI GLPGGCHLGP
     RPIDQHIKGF EALGAEVTNE QGAIYLRAER LRGARIYLDV VSVGATINIM LAAVLAEGKT
     IIENAAKEPE IIDVATLLTS MGAKIKGAGT NVIRIDGVKE LHGCKHTIIP DRIEAGTFMI
22 In silico biology http://www.mygrid.org.uk
Middleware for data intensive in silico biology by bioinformaticians
23 In silico biology http://www.mygrid.org.uk
- Construct in silico experiments, find and adapt others, manage the experiment lifecycle
- Application workflows and web services
- Semantic discovery
- Semantic workflow composition
- Semantic integration of knowledge
- Williams-Beuren Syndrome, Graves' Disease, Trypanosomiasis in cattle.
- 2 weeks -> 2 hours
24 http://www.accessgrid.org
- Interactive environments and virtual presence integrated with Grid middleware, multicast over IP
- SARS Combat Grid, Taiwan
- Emergency Access Grids
- Integration of patient data
- Integration of models of disease dissemination
- Data mining using compute grid
26 Myth Busting
- The Academics-only myth
- 67 companies using or planning to use Grids (Forrester 2004)
- Commercial vendors investing.
- The Particle Physics-only myth
- Life Sciences and Medicine will dominate because of their complex organisational, data and diversity characteristics
27 Grid stakeholders and meanings
28
http://www.nbirn.net
http://egee-intranet.web.cern.ch/
No ONE Grid. Logical and Physical Grid configurations
http://www.teragrid.org/
http://www.ngs.ac.uk
29 The Computational Grid myth
- Isn't it just High Performance Computing and cycle stealing?
- Most mature kind of Grid.
- A generic mechanism for forming, managing and disbanding dynamic federations of services
- Data integration, data access, data transport and transaction management will dominate
- Application integration and cooperative information systems are key
- This myth persists in the USA. Everyone else has gotten over it.
30
Service stacks, policies, protocols, standards, APIs
Reference implementations: Globus Toolkit, Condor, Unicore
Commercial implementations: Avaki, United Devices, Platform
Tools: portals, heart beat monitors
Confusagram courtesy of David Snelling, Fujitsu Europe
31 No Architecture myth
- Isn't it just a bag of protocols glued together?
- Actually, it was.
- Stop press - The Grid discovers Service Oriented Architectures!! (2002)
- The Open Grid Service Architecture gives a well specified middleware stack built on industry standard web services
32 Generation Game
[Figure, adapted from Ian Foster's GGF7 plenary. Axes: Time; Increased functionality, standardization. Stages: Custom solutions; App-specific Services; Web services; Open Grid Services Architecture]
Computationally intensive; file access/transfer; bag of various heterogeneous protocols and toolkits; monolithic design; recognised internet, ignored Web; academic teams
Data and knowledge intensive; open services-based architecture; builds on Web services; GGF, OASIS, W3C; multiple implementations; Global Grid Forum; industry participation
34
Grid Applications. Specific services: drug discovery pipeline, sky surveys, engineering simulations
Open Grid Service Architecture "Tupperware" (upper services). Standard services: VO forming, semantic data integration, service discovery, workflow enactment and composition, provenance, portals
Open Grid Service Architecture "Underware" (plumbing services). Standard services: provisioning, data access and integration, reliable data shipment, workload, authentication, job execution, replica management, resource scheduling, brokering and monitoring
Web Service Resource Framework, Web Service-Notification, WS-I. Standard interfaces and behaviours for distributed systems: naming, service state, lifetime management, notification, registry management
Web Services. Standard mechanisms for describing and invoking services: WSDL, SOAP, WS-Security etc.
35 WS Resource Framework: a basis for agents?
- Resource Addressing
- Reference and identification of stateful resources in a Web services context.
- Resource Properties
- Modeling of state as an XML document.
- Accessing state: WSDL-defined interfaces.
- Resource Lifetime
- Management of leases on resource access.
- Create, destroy, expire.
- Service Groups
- Creating and managing aggregations of Web services.
- Base Faults
- Baseline for extensible fault framework.
- Ability to reproduce exception hierarchies, as in Java.
- Patterns for managing service configuration contexts
Identity, Lifetime, State, Type
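The ideas of identity, state and leased lifetime can be illustrated with a toy model. The following is a minimal sketch in plain Python, not the WSRF API: the class and method names are hypothetical, the state is held in a dictionary rather than an XML document, and nothing here speaks SOAP or WSDL.

```python
import time
import uuid


class ResourceExpired(Exception):
    """Raised when a resource is used after destruction or lease expiry (hypothetical)."""


class StatefulResource:
    """Toy stand-in for a stateful resource: identity, state and a lease."""

    def __init__(self, properties, lease_seconds, clock=time.monotonic):
        self.resource_id = str(uuid.uuid4())   # identity / addressing
        self._properties = dict(properties)    # state (an XML document in WSRF)
        self._clock = clock                    # injectable clock, handy for testing
        self._expires_at = clock() + lease_seconds
        self._destroyed = False

    def _check_alive(self):
        if self._destroyed or self._clock() >= self._expires_at:
            raise ResourceExpired(self.resource_id)

    def get_property(self, name):
        self._check_alive()
        return self._properties[name]

    def set_property(self, name, value):
        self._check_alive()
        self._properties[name] = value

    def renew(self, lease_seconds):
        """Extend the lease, counted from the current time."""
        self._check_alive()
        self._expires_at = self._clock() + lease_seconds

    def destroy(self):
        """Explicit destruction, as opposed to letting the lease expire."""
        self._destroyed = True
```

A client would create a resource with a lease, read and update its properties while the lease is valid, renew the lease if it still needs the resource, and either call `destroy()` or simply let the lease run out. After that, any access raises `ResourceExpired`.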
36 WS Notification
- Publish/Subscribe Pattern
- WS Base Notification
- Notification producer and consumer
- Notification subscription
- WS Brokered Notification
- Addition of publisher mechanisms
- Broker role
- WS Topics
- Framework for Topics and Topic spaces in XML.
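The publish/subscribe pattern with a broker and hierarchical topics can be sketched in a few lines. This is an illustrative in-process model only: the class name, the callback signature and the slash-separated topic paths are assumptions made for the example and are not taken from the WS-Notification specifications.

```python
from collections import defaultdict


class NotificationBroker:
    """Toy broker: producers publish to topics, consumers subscribe to them."""

    def __init__(self):
        self._subscribers = defaultdict(list)   # topic -> list of callbacks

    def subscribe(self, topic, consumer):
        """Register a consumer callback; returns a handle for unsubscribing."""
        self._subscribers[topic].append(consumer)
        return (topic, consumer)

    def unsubscribe(self, subscription):
        topic, consumer = subscription
        self._subscribers[topic].remove(consumer)

    def publish(self, topic, message):
        """Deliver to subscribers of this topic and of every parent topic.

        Returns the number of deliveries made.
        """
        delivered = 0
        parts = topic.split("/")
        for depth in range(len(parts), 0, -1):
            prefix = "/".join(parts[:depth])
            for consumer in list(self._subscribers.get(prefix, [])):
                consumer(topic, message)
                delivered += 1
        return delivered
```

With this sketch, a consumer subscribed to `jobs` also receives a message published to `jobs/42/state`, which is one simple way to model a topic space. The producer never needs to know who the consumers are, which is the decoupling the broker role provides.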
37 Knowledge everywhere
- Declarative specification of services and their requirements
- Classification and discovery of computational and data resources, codes and models
- Encoding performance metrics, service state, event notification topics, typing service inputs and outputs, provenance trails, access rights to databases, personal profiles and security groupings, charging infrastructure
- Job control: semantic integration, workflow descriptions, resource brokering, resource scheduling
- Problem solving: selection and intelligent portals
- GGF WG-CMM, CIM, GIS
38 Vision
Reality
39 Grid Computing trajectory
[Figure. Axes: time; cost. Stages, from most ambitious to earliest:]
- Virtual organisations with dynamic access to unlimited resources (for all)
- Sharing of apps and know-how (with controlled set of unknown clients)
- Sharing standard scientific process and data, sharing of common infrastructure (between trusted partners)
- CPU intensive workload, Grid as a utility, data Grids, robust infrastructure (intra-company, intra-community, e.g. Life Science Grid)
- CPU scavenging
40 Grid Reality
41 Using today's grid
- Obtain frequency spectrum for signal S in instrument I and timeframe T
- User identifies which code generates desired products, required inputs as files, physical location of the files, hosts that support execution given code requirements, availability of hosts, access policies, etc.
- User queries Grid middleware: metadata catalog, replica locator, resource descriptor and monitoring, etc.
- User oversees execution and repair
42 Vision and Reality
- The Grid is in the same state as the Web 10 years ago
- Few production grids and not many killer demos - something you couldn't have done before.
- Middleware hard to use and incomplete (and not invisible!)
- OGSA in its infancy.
- Varying degrees of maturity, but people use it anyway!
- Deployment, research, development, applications and standardisation all happening together
- Danger of half-baked solutions, premature standardisation, a Grid Winter
- The Invisible Grid? 10 years?
43 Bridging the Gap
- Intelligently manage knowledge
- The explicit representation of metadata semantics -> knowledge-based Grid services
- Semantic-based integration and aggregation of metadata
- Knowledge Representation and Ontologies
- Semantic Grid Services
- Semantic Web, RDF and OWL
- Semantic-based decision making, building and coordinating dynamic, distributed communities
- Re-thinking the architecture of the Grid to be a cooperative agent-based system
- Multi-agent Systems
- Planning and scheduling
44 "The Semantic Grid is an extension of the current Grid in which information and services are given well-defined and explicitly represented meaning, better enabling computers and people to work in cooperation."
Grid with Semantics
Intelligent Grid middleware
45
[Figure, based on an idea by Norman Paton. Axes: scale of interoperability; scale of data and computation. Quadrants: Classical Web, Semantic Web, Classical Grid, Semantic Grid]
46 Knowledge Representation
47 Getting knowledge into the light
- Managing and operating a Grid intelligently requires the interpretation of knowledge about the state and properties of Grid components, and their configurations
- Knowledge is already there.
- It's embedded in middleware code, in schemas, in applications and in practice.
- It needs to be explicit, exchangeable and machine processable
- Grid people know this.
49 The semantics of knowledge
- Semantic Grids
- Grids and Grid middleware that make use of semantics for their installation, deployment, running etc.
- I.e. Semantics IN the Grid FOR the Grid.
- Knowledge Grids
- A virtual knowledge base derived by using the Grid resources, in the same spirit as a data grid is a virtual data resource and a compute grid a virtual computer.
- Knowledge Grids include services for knowledge and data mining.
- I.e. Semantics ON the Grid arising from the USE of the Grid.
50 Coupling Semantic Web and Grid
- Expose the meaning of Grid services, resources and entities by assertions in a common data model, the Resource Description Framework
- Publish and share consensually agreed ontologies in OWL
- Query, filter, integrate and aggregate the metadata
- Reason over metadata to infer more metadata
- Attribute trust to the metadata.
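The assert, query and reason steps can be made concrete with a very small example. The sketch below represents RDF-style assertions as plain Python tuples and applies a single inference rule (type propagation along `rdfs:subClassOf`). It is a hand-rolled illustration, not an RDF or OWL toolkit, and the `svc:` and `ex:` names are hypothetical.

```python
SUBCLASS_OF = "rdfs:subClassOf"
TYPE = "rdf:type"


def infer_types(triples):
    """Return the triples plus rdf:type facts implied by rdfs:subClassOf.

    Rule: if (x, rdf:type, C) and (C, rdfs:subClassOf, D) then (x, rdf:type, D).
    The rule is applied repeatedly until no new fact appears (a fixed point).
    """
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        subclass = [(s, o) for (s, p, o) in facts if p == SUBCLASS_OF]
        typed = [(s, o) for (s, p, o) in facts if p == TYPE]
        for instance, cls in typed:
            for child, parent in subclass:
                new_fact = (instance, TYPE, parent)
                if cls == child and new_fact not in facts:
                    facts.add(new_fact)
                    changed = True
    return facts


def query(facts, s=None, p=None, o=None):
    """Match triples against a pattern; None acts as a wildcard."""
    return sorted(
        t for t in facts
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Hypothetical metadata about one service and a tiny class hierarchy.
metadata = {
    ("svc:blast", TYPE, "ex:AlignmentService"),
    ("ex:AlignmentService", SUBCLASS_OF, "ex:BioinformaticsService"),
    ("ex:BioinformaticsService", SUBCLASS_OF, "ex:GridService"),
}

print(query(metadata, p=TYPE, o="ex:GridService"))               # nothing asserted directly
print(query(infer_types(metadata), p=TYPE, o="ex:GridService"))  # found after reasoning
```

Nobody asserted that `svc:blast` is a Grid service, but the fact follows from the shared class hierarchy. That is the sense in which reasoning over metadata infers more metadata, and why agreeing on the ontology matters as much as publishing the assertions.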
51 Enablers for e-Science
52 Semantic Web/Grid Services
- Web and Grid services, and workflows, require a semantic-driven description
- Semantics is the key to negotiation, discovery and workflow composition
- If you can't describe what you want, you can't have it
- If you can't describe what you've got, no-one will or can use it
Ontology based service discovery in RDF-based registry
Ontology based workflow discovery
http://www.mygrid.org.uk
53 Semantics in e-Science
Ontology-aided workflow construction
- RDF-based service and data registries
- RDF-based metadata for experimental components
- RDF-based provenance graphs
- OWL based controlled vocabularies for database content
- OWL based integration of experiment entities
RDF-based semantic mark up of results, logs, notes, data entries
http://www.mygrid.org.uk
54 Consuming Semantic Metadata: knowledge advisor integrated with the domain script editor
Horizontal advice on component configuration
Vertical advice on what can be done before and next
http://www.geodise.org
55 Translation Service: Unicore <-> GLUE
http://www.grid-interoperability.org/
56 awareness of colleagues' presence
BuddySpace
Access Grid Node
virtual meetings
mapping real time discussions/group sense making
NetMeeting
recovering information from meetings
enacting decisions/coordinating activities
synthesising artefacts
I-X planning tools
http://www.aktors.org/coakting/
Courtesy of David De Roure
57 GEON Grid Applications
http://www.geongrid.org/
Courtesy of Bertram Ludaescher
58 Knowledge aware Grid computing and services.
Grid Computing
Knowledge Management
- Knowledge aware grid services
Grid aware distributed knowledge management.
Replica management for ontologies; event notification for metadata updates; authentication and authorisation for ontology updates; OGSA data access for RDF repositories; metadata update workflows; distributed reasoning.
59 Multi-Agent Computing, Planning and Scheduling
60 Pegasus: Detecting gravitational waves
http://pegasus.isi.edu/
Slide courtesy of Jim Blyth
61 Agent
- "an encapsulated computer system that is situated in some environment, and that is capable of flexible, autonomous action in that environment in order to meet its design objectives"
62 WS-Agreement: Negotiating service level agreements for resource scheduling
[Figure labels: AgreementFactory; Agreement 1; Agreement 2; (binds)]
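The relationship between an agreement factory and the agreements it issues can be sketched as follows. This is a simplified illustration under stated assumptions: the `Offer` fields, the CPU-budget acceptance rule and the method name `create_agreement` are invented for the example and do not reproduce the WS-Agreement message formats.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Offer:
    """What a consumer asks for (field names are illustrative)."""
    consumer: str
    cpus: int
    hours: int


@dataclass(frozen=True)
class Agreement:
    """An accepted offer, bound to the provider that accepted it."""
    agreement_id: int
    provider: str
    offer: Offer


class AgreementFactory:
    """Accepts or rejects offers against a fixed CPU budget (assumed policy)."""

    def __init__(self, provider, total_cpus):
        self.provider = provider
        self.free_cpus = total_cpus
        self._ids = count(1)

    def create_agreement(self, offer):
        """Return an Agreement if the offer fits, else None (rejection)."""
        if offer.cpus <= 0 or offer.cpus > self.free_cpus:
            return None
        self.free_cpus -= offer.cpus
        return Agreement(next(self._ids), self.provider, offer)
```

A scheduler built on this sketch would submit offers on behalf of users, keep the returned agreements as the record of what each provider has committed to, and treat a `None` result as a cue to renegotiate or try another provider.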
63 Plumbing for Resource Broker / Scheduler
Slide courtesy of David Snelling, UNICORE
64 AstroGrid http://www.astrogrid.org
Courtesy of Andy Palmer
65 Flexible, decentralized decision making capabilities.
- Knowledge aware grid services
Grid Computing
Multi-agent Computing
A robust distributed computing platform to discover, acquire, federate, and manage the capabilities necessary to execute their decisions.
Make Grid technologies more agent-like and Agent technologies more Grid-like
66 Brains and Brawn
- Service architecture
- System management and trouble shooting
- Trust
- Negotiation
- Service composition
- VO formation and management
- System predictability
- Human computer collaboration
- Evaluation
- Semantic integration again
Foster, Jennings and Kesselman, 2004
67
Grid Applications. Knowledge management and ontologies, NLP, data mining, most stuff
Open Grid Service Architecture "Tupperware" (upper services). Knowledge management and ontologies, case based reasoning, agents, people
Open Grid Service Architecture "Underware" (plumbing services). Knowledge management and ontologies, agent negotiation, ML, intelligent planning and scheduling, CBR, constraint modelling
Web Service Resource Framework, Web Service-Notification, WS-I. Standard interfaces and behaviours for distributed systems
Web Services. Standard mechanisms for describing and invoking services
68 myGrid Service stack
Taverna workbench
Web Portal
LSID Launch Pad
Haystack
Apps
e-Science process patterns
e-Science Mediator
e-Science event bus
Service and workflow discovery
Core services
Metadata management
Data management
Workflow enactment
Web Service (Grid Service) communication fabric
External services
AMBIT Text Extraction Service
Native Web Services
SoapLab
GowLab
Websites
Legacy apps
69
Semantic Grid security and trust policies, management and frameworks
Resource selection and scheduling
Ontologies for service classification
Knowledge Representation for Semantic Grid Services
Semantic interoperability and integration
Semantics in Agent Communication Languages
Workflow and schedule repair
Knowledge-based provenance and audit trails
Semantics for service delegation and knowledge aggregation
Service Negotiation
Quality of service and service level agreement management
(Semantic) event notification
Models for quality and accessibility of data sources, incl. versioning, recoverability, etc.
Lifetime management
Architectures for supporting Semantic Grid Services
New models for fault tolerance and dependability
(Semantic) Service state
Virtualisation and provisioning of knowledge service
Audit trails over transient state
Naming
Scaleable service composition for heterogeneous environments
Service enactment/invocation frameworks
70 Building the bridge: Pioneers and travellers
71 Building Bridges
WWW2002 Waikiki, Hawaii
72 What is Grid?
Courtesy of Eoghan O'Neill
73
- Knowledge aware Grid services
- Agent negotiation
- Grid compliant knowledge services
- Grid aware distributed knowledge services
74 Challenges
75 Remarks to Bridge Builders
- Overcoming community divisions
- Growing pains of middleware
- Make it easier not harder or more interesting
- A little semantics goes a long way
- Evolution not revolution: deal with reality
- The network effect: service providers rule
- Return on investment for service providers and users
- Applications keep it real: listen to users
- Activation Energy
- Implementation is not a luxury
76 Vision vs Benefits
- You don't have to buy into the visions to benefit from the technologies
- A standard ontology language for interchange
- A little bit of semantics goes a long way
- In the 3 stage project lifecycle
- It will never work.
- It could be useful.
- It was my idea all along.
We are here!
77 Summary
- The Grid is a knowledge driven collaboratory.
- The Grid The Semantic Grid.
- Grid applications and middleware looking to Semantic Web technologies.
- Agent frameworks less visible but now is a good time.
- Mutual benefit for all.
78 What can the Semantic Grid do for you, and what can you do for the Semantic Grid? http://www.semanticgrid.org
79 Acknowledgements
- Semantic Grid Colleagues
- All those names on the slides
- David De Roure, my Semantic Grid GGF co-chair and co-author
- Chris Wroe, Ewa Deelman, Nigel Shadbolt, Marlon Pierce, Luc Moreau, Sean Bechhofer, Savas Parastatidis
- Colleagues on the myGrid, Geodise and Link-Up projects, and in the e-Science North West Regional Centre (ESNW)
- Special thanks to Simon Miles and Claire Dixon for their advice on this presentation
- Funders
- EPSRC (Geodise, myGrid and Link-Up)
- UK e-Science Core Programme (EPSRC/DTI) (ESNW)