Title: Everything you always wanted to know about the Grid and never dared to ask
1Everything you always wanted to know about the
Grid and never dared to ask
- Tony Hey and Geoffrey Fox
2Outline
- Lecture 1: Origins of the Grid: The Past to the Present (TH)
- Lecture 2: Web Services, Globus, OGSA and the Architecture of the Grid (GF)
- Lecture 3: Data Grids, Computing Grids and P2P Grids (GF)
- Lecture 4: Grid Functionalities: Metadata, Workflow and Portals (GF)
- Lecture 5: The Future of the Grid: e-Science to e-Business (TH)
3Lecture 1
- Origins of the Grid
- The Past to the Present
- Grid Computing Book
- Chs 1,2,3,4,5,6,36
4Lecture 5
- The Future of the Grid
- e-Science to e-Business
- Grid Computing Book
- Chs 38, 39, 40,41,42,43
5Lecture 5
- e-Science Research and the Future of Scientific Research
- Computer Science Research Issues
- A Business Case for the Grid
- Concluding Remarks
6e-Science and the Future of Scientific Research
- "e-Science will change the dynamic of the way science is undertaken."
- John Taylor, 2001
7Integrated e-Science Environment
Framework for distributed scientific computing
and experimentation
8e-Science Examples
- Particle Physics
- Virtual Observatories
- e-Engineering
- e-Chemistry
- Bioinformatics
- High-Throughput Applications
- e-Health
9DAME Project
In flight data
Global Network eg SITA
Ground Station
Airline
DSS Engine Health Center
Maintenance Centre
Internet, e-mail, pager
Data centre
10Comb-e-Chem
Structure Properties
Knowledge Prediction
Structures DB
Properties DB
Simulation and calculation
11Combinatorial Chemistry
- Parallel synthetic approach
- create hundreds of materials
- screen properties to find those that fit the bill
- Typically requires several passes
- find chemical structure of the best candidates
- create new batches of similar materials for subsequent passes
- Leads to explosive growth in
- volume of data generated
- potential to exploit this data
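The multi-pass screening loop on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the "materials" are plain numbers, and `score`, `screen_passes` and the Gaussian variation step are hypothetical stand-ins for real synthesis and property measurement.

```python
import random

def score(material, target=0.75):
    """Hypothetical property screen: closer to the target value is better."""
    return -abs(material - target)

def screen_passes(n_materials=100, n_keep=10, n_passes=3, seed=0):
    rng = random.Random(seed)
    # First batch: a library of candidate materials (here: numbers in [0, 1)).
    batch = [rng.random() for _ in range(n_materials)]
    records = 0  # every screened material adds one record to the data volume
    best = []
    for _ in range(n_passes):
        records += len(batch)
        # Screen properties and keep the best candidates.
        best = sorted(batch, key=score, reverse=True)[:n_keep]
        # Create a new batch of similar materials around each best candidate.
        per_parent = n_materials // n_keep
        batch = [b + rng.gauss(0, 0.02) for b in best for _ in range(per_parent)]
        batch.extend(best)  # keep the parents so the best score never gets worse
    return best, records
```

Calling `screen_passes()` returns the ten best candidates from the final pass together with the number of measurement records accumulated, which grows with every pass.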
12Array production of different chemical species
13Well plate with typically 96 or 384 cells
Library synthesis
Mass Spec
Raman
databases
Structure and properties analysis
x-ray
High throughput systems
- 100,000s compounds at a time analysis
- Produces huge amounts of complex data
14Remote equipment, multiple users, few experts
Data control links
Access Grid links
Remote (Dark) Laboratory
- Model for National Crystallography Service (NCS)
15NCS Workflow
Send sample material to NCS service
Search materials database and predict properties
using Grid computations
Download full data on materials of interest
Collaborate in e-Lab experiment and obtain
structure
16NCS Portal Access
17NCS Experimental Services
18NCS Lab Service
19myGrid Project
- Imminent deluge of data
- Highly heterogeneous
- Highly complex and inter-related
- Convergence of data and literature archives
20myGrid Generic Technologies
- Database access from the Grid
- Process enactment on the Grid
- Personalisation services
- Metadata services
- Development of Agent Services
- Ultimate goal is to put Grid Services together with Ontologies to develop the Semantic Grid
21Workflow
- Know how.
- Associate base resources with derived data.
- Keep, describe, find, compare, protect, share.
- Repeat/reuse/re-enact
- Specialise/Customise/Personalise
- Evolution notification, knowledge
- Quality best practice
- Need the workflows to be effective
- good experimental practice.
22Personalisation
- Dynamic creation of personal data sets
- Personal views over repositories
- Personalisation of workflows
- Personal notification
- Annotation of datasets and workflows
- Personalisation of service descriptions: what I think the service does
23Provenance
- Who, what, where, why, when, how?
- The traceability of knowledge as it evolves and as it is derived.
- Identity: the Life Sciences ID
- Lab Books, Methods in papers.
- Immutable Metadata
- Migration: travels with its data but may not be stored with it.
- Private vs Shared provenance records.
- Ownership/credit
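The provenance questions listed above can be sketched as a small record type. This is a minimal sketch, not the myGrid data model: `ProvenanceRecord`, its field names and the `trace` helper are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

@dataclass(frozen=True)  # frozen: a record cannot be changed once written
class ProvenanceRecord:
    data_id: str   # identifier of the data item described (e.g. a Life Sciences ID)
    who: str
    what: str
    where: str
    why: str
    how: str
    when: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    derived_from: Optional[str] = None  # data_id of the parent item, if any
    shared: bool = False                # private vs shared provenance record

def trace(data_id, store):
    """Walk back from a derived item to the original data it came from."""
    chain = []
    while data_id is not None:
        record = store[data_id]
        chain.append(record)
        data_id = record.derived_from
    return chain
```

The records live in a separate `store` keyed by identifier, which mirrors the point that provenance travels with its data but need not be stored with it.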
24Discovery Net Project
25Discovery Process Management
- Workflow Service Composition Discovery Pathway
- Towards a Standard Workflow Representation for Discovery Informatics: Discovery Process Markup Language (DPML)
- Discovery Pathway Construction: Recording and managing a collaboratively-built discovery process
- Distributed Service Composition: Components organised by the workflow can be executing anywhere
- Discovery Pathway as Key Intellectual Property: Discovery Processes can be stored, reused, audited, refined and deployed in various forms
D-Net Workflow for Genome Annotation: 16 services executing across Internet
26Dynamic Integration Services
- Dynamic Application Integration: On-demand access and composition of remote analysis components
- Towards a Dynamic Component Integration
- Knowledge Servers allow users to register, locate and remotely execute components
- Execution Servers allow users to control the execution of components in distributed environments
- Easy Maintenance: New components can be added through a clean API
Clustering
Classification
Text analysis
Gene function prediction
D-NET API
Promoter Prediction
Homology Search
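The register / locate / execute pattern described on this slide can be sketched as follows. This is not the D-NET API: `KnowledgeServer`, `ExecutionServer`, `run_workflow` and the two toy components are hypothetical, and everything runs in one process rather than across remote servers.

```python
class KnowledgeServer:
    """Registry where analysis components are registered and located by name."""
    def __init__(self):
        self._components = {}

    def register(self, name, func):
        self._components[name] = func

    def locate(self, name):
        return self._components[name]

class ExecutionServer:
    """Runs a located component; here simply in-process."""
    def __init__(self, knowledge_server):
        self._ks = knowledge_server

    def execute(self, name, *args):
        return self._ks.locate(name)(*args)

def run_workflow(execution_server, steps, data):
    """Feed the output of each named component into the next one."""
    for name in steps:
        data = execution_server.execute(name, data)
    return data

ks = KnowledgeServer()
# Two toy components standing in for real analysis services.
ks.register("clean", lambda seq: seq.strip().upper())
ks.register("gc_content", lambda seq: (seq.count("G") + seq.count("C")) / len(seq))
result = run_workflow(ExecutionServer(ks), ["clean", "gc_content"], " atgc ")
```

Adding a new component is a single `register` call, which is the kind of easy maintenance the slide refers to.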
27Case Study SC2002 HPC Challenge
D-Net based Global Collaborative Real-Time Genome Annotation
High Throughput Sequencers
Nucleotide-level Annotation
Genome Annotation
Protein-level Annotation
Process-level Annotation
15 DBs
21 Applications
28How It Works
Interactive Editor Visualisation
Nucleotide Annotation Workflows
Download sequence from Reference Server
Save to Distributed Annotation Server
- 500 Web accesses
- 1800 clicks
- 200 copy/paste operations
- 3 weeks of work
- in 1 workflow and a few seconds of execution
30eDiamond
Applications of SMF
Teleradiology and QC: VirtualMammo
Training and Differential Diagnosis: "Find one like it?"
Advanced CAD: SMF-CAD workstation
Epidemiology: SMF-computed breast density
31Image guided interventions
Images courtesy of Derek Hill, Guy's Hospital
32Image guided interventions (2)
Images courtesy of Guy's Hospital
33Surgical verification: Accuracy of surgical placement against plan
- Surgeon plans on X-ray or CT, uses database of prostheses
- Operation takes place using plan as guidance
- Post operative X-ray evaluated for accuracy of placement
- Data stored and used for short term assessment and long term evaluation studies
Courtesy of Ian Revie, DePuy International
34- UK e-Science projects emphasize data federation and integration as much as computation
- Metadata and ontologies key to higher level Grid services
- e-Science projects will produce a deluge of scientific data that will need to be annotated and curated in scientific data digital libraries
35Databases in the Grid
Semantic Web
Data Complexity
Classical Grid
Classical Web
Computational Complexity
36OGSA DAI Project
- Key middleware project for UK Program
- Total Budget 3M (CP 1.5M)
- Three Centres involved
- Edinburgh, Manchester and Newcastle
- Industrial partners
- IBM US, IBM Hursley and Oracle UK
- Goal is to develop high-quality data-centric middleware
37OGSA DAI Project
- Design Specification completed
- Papers for GGF WG on Database Access and Integration Services
- Alpha versions delivered
- Distributed Query Service
- XML Database Interface
- Relational Database Interface
- Beta versions by April 2003
- Integrate with Globus GT3 release
38e-Science and the Future of Scientific Research
- "e-Science will change the dynamic of the way science is undertaken."
- John Taylor, 2001
- Need to break down the barriers between the Victorian bastions of science: biology, chemistry, physics, ...
- Develop permeable structures that promote rather than hinder multidisciplinary collaboration
- Engage Computing Services and Libraries in developing a new e-Science support service on Campus
39e-Science and Computer Science
- The lesson of the Web
- The Semantic Grid
- The myGrid project
- The Discovery Net Project
- Computer Science Research and the Grid
40Error 404 Page not found
- If you want the Web to scale,
- You must allow the links to fail
- Wendy Hall after Tim Berners-Lee
- HTML as the Fortran of Hypertext!
41 Semantic Web
42Metadata Ontologies
- Metadata: computationally accessible data about the services
- Ontologies: the shared and common understanding of a domain
- A vocabulary of terms
- Definition of what those terms mean.
- A shared understanding for people and machines
- Usually organised into a taxonomy.
43Reasoning in DAML+OIL
- Consistency: check if knowledge is meaningful
- Subsumption: structure knowledge, compute classification
- Equivalence: check if two classes denote same set of instances
- Instantiation: check if an individual is an instance of class C
- Retrieval: retrieve set of individuals that instantiate C
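The five reasoning tasks above can be illustrated with a toy model in which each class is just a conjunction of required properties. This is a sketch under that simplifying assumption, not a DAML+OIL reasoner; the class names, properties and individuals are hypothetical examples.

```python
# Each class is a conjunction of required atomic properties.
CLASSES = {
    "Protein": frozenset({"molecule", "amino_acid_chain"}),
    "Enzyme": frozenset({"molecule", "amino_acid_chain", "catalyst"}),
    "Biocatalyst": frozenset({"catalyst", "amino_acid_chain", "molecule"}),
}
# Pairs of properties declared disjoint: no individual can have both.
DISJOINT = {frozenset({"catalyst", "inert"})}

INDIVIDUALS = {
    "lysozyme": {"molecule", "amino_acid_chain", "catalyst"},
    "collagen": {"molecule", "amino_acid_chain"},
}

def consistent(props):
    """Consistency: a description is meaningful if it has no disjoint pair."""
    return not any(pair <= set(props) for pair in DISJOINT)

def subsumes(c, d):
    """Subsumption: every instance of d must also be an instance of c."""
    return CLASSES[c] <= CLASSES[d]

def equivalent(c, d):
    """Equivalence: the two classes denote the same set of instances."""
    return subsumes(c, d) and subsumes(d, c)

def instance_of(individual, c):
    """Instantiation: the individual has every property class c requires."""
    return CLASSES[c] <= INDIVIDUALS[individual]

def retrieve(c):
    """Retrieval: all known individuals that instantiate c."""
    return sorted(i for i in INDIVIDUALS if instance_of(i, c))
```

In this toy model, `subsumes("Protein", "Enzyme")` holds because an enzyme requires everything a protein does, and computing such relations for every pair of classes yields the classification hierarchy.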
44Computer Science Challenges from e-Science
- UK CS Team led by Tom Rodden identified 4 major research challenges arising from e-Science
- Developing a Semantic Grid
- Trusted Ubiquitous Systems
- Rapid Customized Assembly of Services
- Autonomic Computing
45Towards a Semantic Grid
- Trace provenance from initial data to information and knowledge structures
- Techniques to allow scalable reasoning over uncertain/incomplete knowledge
- Tools for design, development and deployment of large-scale ontologies
- Support for semantic-directed knowledge discovery to complement data-mining
- Development of flexible network-based reasoning and decision support services
46Trusted Ubiquitous Systems
- New theories to model, specify and analyse trust in distributed ubiquitous systems
- New quality of service and service-based models for ubiquitous systems
- New design guidelines and practices to enable the development of reusable trusted components
- New understanding of the practical engineering trade-offs required to realise trusted ubiquitous systems
47Rapid Customised Assembly of Services
- New theories to describe and reason about semantics and behaviour of services and compositional effects
- Agent and service representations that promote adaptability and emergent, opportunistic and implicit arrangement of services
- New tools to support the discovery, composition and use of services based on high-level description of requirements
- Techniques to support directed automatic composition, decomposition and recomposition of services
48Autonomic Computing
- Techniques to analyze, describe and reason about adaptive systems
- Management of semi-autonomous systems with policies, services and software agents
- Interoperability and reasoning across and between different autonomous domains
- Modeling and measurement of performance of QoS for autonomic structures
- Techniques to capture and represent history, context and environment
49IBM Autonomic Computing Vision
50A Business Case for the Grid
- Total Cost of Ownership TCO
- Value of Open Standards
- Industrial Applications
- Time to exploitation
- e-Utilities
51Current IT Environment
Distributed, Heterogeneous, Complex
52Current IT Environment
Distributed, Heterogeneous, Complex
Complexity, TCO
Tech. Cost, Utilization
53Server / Storage Utilization
Utilization (%) by platform, given as 24-hour period / prime-shift / peak-hour:
- Mainframes: 60 / 70 / 85-100
- UNIX: <10 / 10-15 / 50-70
- Intel-based: 2-5 / 5-10 / 30
- Storage: 52 / N/A / N/A
Source: IBM Scorpion White Paper, "Simplifying the Corporate IT Infrastructure", 2000
54Total Cost of Ownership TCO
Much More than Hardware and Software Costs
[Pie chart: IT Budgets, split across Hardware, Software, Personnel, Maintenance and Integration; extracted values: 10.0, 12.0, 16.0, 30.0, 32]
55Grid Computing Sales Pitch
Storage
Operating System
I/O
Data
Processing
Applications
Distributed Computing Over a Network, Using Open
Standards to Enable Heterogeneous Operations
56Grid Technology Enables
- Increased Server Utilization
- Workload Management and Consolidation
- Reduced Cycle Times
- Collaboration and Access to Data
- Federation of Data
- Global Distribution
- Resilient/Highly Available Infrastructure
- Business Continuity
- Recovery and Failover
Supporting Heterogeneous Resources Through
Open Standards.
57Increased Server Utilization
- Exploit distributed resources to provide capacity for high-demand applications
- Existing applications that cannot be run effectively on a single processor
- New large scale applications that provide strategic business advantages
- Reduce infrastructure cost associated with over-provisioned resources
- Balance workload based on policies
- Optimize for cost or throughput
- Reduce the cost of manpower to manage and configure resources
- Fewer resources to manage for the same workload
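Balancing workload under a policy that optimizes for either cost or throughput can be sketched with a simple greedy placement. This is an illustrative sketch only: the `assign` function, the server attributes `speed` and `cost`, and the greedy strategy are assumptions, not the behaviour of any particular Grid scheduler.

```python
def assign(jobs, servers, policy="throughput"):
    """Greedy placement: each job goes to the best server under the policy.

    jobs maps job name -> units of work; servers maps server name ->
    {"speed": work units per hour, "cost": price per unit of work}.
    """
    load = {name: 0.0 for name in servers}  # busy hours already assigned
    placement = {}
    # Place the largest jobs first.
    for job, work in sorted(jobs.items(), key=lambda kv: -kv[1]):
        if policy == "cost":
            # Cheapest run: work multiplied by the server's price per unit.
            key = lambda s: work * servers[s]["cost"]
        else:
            # Earliest finish: current load plus run time on that server.
            key = lambda s: load[s] + work / servers[s]["speed"]
        chosen = min(servers, key=key)
        load[chosen] += work / servers[chosen]["speed"]
        placement[job] = chosen
    return placement, load
```

With the throughput policy the work spreads across servers so the longest-running server finishes sooner, while the cost policy concentrates work on the cheapest resource regardless of how long it takes.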
58Collaboration and Access to Data
- Enable collaboration across applications to integrate results
- Leverage Distributed Data and Resources
- Support large multi-disciplinary collaborations
- Link Business Processes
- Federation of Data
- Both within a single organization and between partners
- Exploit Replication Services Across Enterprises
Design Analytics
Design
Pricing
Design
Simulation
59The Value of Open Standards
Distributed Computing: Grid (Globus -> OGSA)
Applications: Web Services (SOAP, WSDL, UDDI)
Information: World-wide Web (HTML, HTTP, J2EE, XML)
Communications: e-mail (POP3, SMTP, MIME)
Networking: The Internet (TCP/IP)
60- Sun and the Grid
- "Grid Computing is one of the three next big things for Sun and our customers"
- Ed Zander, COO
- Microsoft and the Grid
- "The alignment of OGSA with XML Web services is important because it will make Internet-scale, distributed Grid Computing possible"
- Robert Wahbe, General Manager of Web Services
63Industry Applications
Unique by Industry with Common Characteristics
Manufacturing
Financial Services
LS/ Bioinformatics
Govt Education
Energy
Telco Media
Primary Focus
64Globalization Grid Butterfly.net
- Unlimited Numbers of Players
- Distributed Artificial Intelligence
- Multiple Concurrent Players
- 1,000 downloads of developers kit per week
- Hot-swappable Components
- Developers, Publishers, ESPs
66HP, the Grid and e-Utilities
- The Grid fabric for e-Utilities will be
- Soft: malleable, multi-purpose
- Dynamic: resources will be constantly changing
- Federated: global structure not owned by any single authority
- Heterogeneous: from supercomputer clusters to PCs
- John Manley, HP Labs
67Timescales for Exploitation?
- IBM see early adopters of Grid technology coming from pharmaceutical, engineering and petrochemical sectors
- UK program confirms this picture (AstraZeneca, GSK, Merck, Pfizer, Rolls Royce, BAE Systems, Schlumberger)
- IBM see Grid middleware being adopted by more mainstream commerce and industry in 2003/2004 timeframe
69Status of the Grid
- Today: early adoption phase, just like the Web in the early days
- Industry now selling IntraGrid solutions
- Genuine Virtual Organisation InterGrid middleware not yet mature
- Tomorrow: sophisticated combinations of services to locate information, applications to process it, and computer systems to run them
- Autonomic Middleware infrastructure capable of supporting Virtual Organisations, c-Commerce and e-Utilities will take time!
70e-Government and the Grid
- "The Grid intends to make access to computing power, scientific data repositories and experimental facilities as easy as the Web makes access to information."
- Tony Blair, 2002
71Acknowledgements
- With thanks to
- Gerd Breiter, Phillipe Bricard, David Boyd,
- Jens Jensen, Daron Green, Mike Brady,
- Derek Hill, Carole Goble, Yike Guo,
- Jeremy Frey, Bill Johnston, Ray Browne,
- Jim Fleming, Anne Trefethen and many others