1
Everything you always wanted to know about the
Grid and never dared to ask
  • Tony Hey and Geoffrey Fox

2
Outline
  • Lecture 1: Origins of the Grid - The Past to the
    Present (TH)
  • Lecture 2: Web Services, Globus, OGSA and the
    Architecture of the Grid (GF)
  • Lecture 3: Data Grids, Computing Grids and P2P
    Grids (GF)
  • Lecture 4: Grid Functionalities - Metadata,
    Workflow and Portals (GF)
  • Lecture 5: The Future of the Grid - e-Science to
    e-Business (TH)

3
Lecture 1
  • Origins of the Grid
  • The Past to the Present
  • Grid Computing Book
  • Chs 1,2,3,4,5,6,36

4
Lecture 5
  • The Future of the Grid
  • e-Science to e-Business
  • Grid Computing Book
  • Chs 38, 39, 40,41,42,43

5
Lecture 5
  • e-Science Research and the Future of Scientific
    Research
  • Computer Science Research Issues
  • A Business Case for the Grid
  • Concluding Remarks

6
e-Science and the Future of Scientific Research
  • "e-Science will change the dynamic of the way
    science is undertaken."
  • John Taylor, 2001

7
Integrated e-Science Environment
Framework for distributed scientific computing
and experimentation
8
e-Science Examples
  • Particle Physics
  • Virtual Observatories
  • e-Engineering
  • e-Chemistry
  • Bioinformatics
  • High-Throughput Applications
  • e-Health

9
DAME Project
In flight data
Global Network, e.g. SITA
Ground Station
Airline
DSS Engine Health Center
Maintenance Centre
Internet, e-mail, pager
Data centre
10
Comb-e-Chem
Structure Properties
Knowledge Prediction
Structures DB
Properties DB
Simulation and calculation
11
Combinatorial Chemistry
  • Parallel synthetic approach
  • - create hundreds of materials
  • - screen properties to find those that fit the bill
  • Typically requires several passes
  • - find chemical structure of the best candidates
  • - create new batches of similar materials for
    subsequent passes
  • Leads to explosive growth in
  • - volume of data generated
  • - potential to exploit this data

12
Array production of different chemical species
13
Well plate with typically 96 or 384 cells
Library synthesis
Mass Spec
Raman
databases
Structure and properties analysis
x-ray
High throughput systems
  • Analysis of 100,000s of compounds at a time
  • Produces huge amounts of complex data

14
Remote equipment, multiple users, few experts
Data control links
Access Grid links
Remote (Dark) Laboratory
  • Model for the National Crystallography Service (NCS)

15
NCS Workflow
Send sample material to NCS service
Search materials database and predict properties
using Grid computations
Download full data on materials of interest
Collaborate in e-Lab experiment and obtain
structure
16
NCS Portal Access
17
NCS Experimental Services
18
NCS Lab Service
19
myGrid Project
  • Imminent deluge of data
  • Highly heterogeneous
  • Highly complex and inter-related
  • Convergence of data and literature archives

20
myGrid Generic Technologies
  • Database access from the Grid
  • Process enactment on the Grid
  • Personalisation services
  • Metadata services
  • Development of Agent Services
  • Ultimate goal is to put Grid Services together
    with Ontologies to develop a Semantic Grid

21
Workflow
  • Know how.
  • Associate base resources with derived data.
  • Keep, describe, find, compare, protect, share.
  • Repeat/reuse/re-enact
  • Specialise/Customise/Personalise
  • Evolution notification, knowledge
  • Quality best practice
  • Need the workflows to be effective: good
    experimental practice.

22
Personalisation
  • Dynamic creation of personal data sets
  • Personal views over repositories
  • Personalisation of workflows
  • Personal notification
  • Annotation of datasets and workflows
  • Personalisation of service descriptions: what I
    think the service does

23
Provenance
  • Who, what, where, why, when, how?
  • The traceability of knowledge as it evolves
    and as it is derived.
  • Identity: the Life Sciences ID
  • Lab Books, Methods in papers.
  • Immutable Metadata
  • Migration: travels with its data but may not be
    stored with it.
  • Private vs Shared provenance records.
  • Ownership/credit
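
A minimal sketch of the provenance ideas on this slide, assuming an in-memory store. The class names, field names and the example identifier are hypothetical illustrations, not part of myGrid. Each record is immutable, answers who/what/where/why/when/how, is linked to its data only by an identifier (so it can travel with the data without being stored with it), and is private unless marked shared.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class ProvenanceRecord:
    data_id: str          # identifier of the data item this record describes
    who: str
    what: str
    where: str
    why: str
    when: datetime
    how: str
    shared: bool = False  # private unless explicitly shared


class ProvenanceStore:
    """Keeps records apart from the data itself; the only link is data_id."""

    def __init__(self) -> None:
        self._records: Dict[str, List[ProvenanceRecord]] = {}

    def add(self, record: ProvenanceRecord) -> None:
        self._records.setdefault(record.data_id, []).append(record)

    def history(self, data_id: str, include_private: bool = False) -> List[ProvenanceRecord]:
        """Return the records for one data item, hiding private ones by default."""
        records = self._records.get(data_id, [])
        return [r for r in records if r.shared or include_private]


# Usage with a made-up identifier written in the LSID URN style.
SAMPLE_ID = "urn:lsid:example.org:sample:1"
store = ProvenanceStore()
store.add(ProvenanceRecord(
    data_id=SAMPLE_ID, who="alice", what="structure determination",
    where="remote laboratory", why="screening pass 1",
    when=datetime(2003, 1, 1, tzinfo=timezone.utc), how="x-ray diffraction",
))
store.add(ProvenanceRecord(
    data_id=SAMPLE_ID, who="bob", what="annotation",
    where="desktop", why="curation",
    when=datetime(2003, 1, 2, tzinfo=timezone.utc), how="manual review",
    shared=True,
))
```

A real system would also need persistent storage, signed records and access control; the sketch only shows the separation of record and data and the private/shared split.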

24
Discovery Net Project
25
Discovery Process Management
  • Workflow Service Composition Discovery
    Pathway
  • Towards a Standard Workflow Representation for
    Discovery Informatics: Discovery Process Markup
    Language (DPML)
  • Discovery Pathway Construction: Recording and
    managing a collaboratively-built discovery
    process
  • Distributed Service Composition: Components
    organised by the workflow can be executing
    anywhere
  • Discovery Pathway as Key Intellectual Property:
    Discovery Processes can be stored, reused,
    audited, refined and deployed in various forms
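
A minimal sketch of a discovery process that can be stored, re-enacted and audited. It assumes a JSON list of step names as a stand-in for DPML, whose actual syntax is not shown in these slides. The three services and all function names are hypothetical, and each runs locally here rather than on a remote server.

```python
import json
from typing import Callable, Dict, List, Tuple

# Hypothetical services; in a distributed setting each would be a remote call.
SERVICES: Dict[str, Callable[[str], str]] = {
    "uppercase": str.upper,
    "complement": lambda s: s.translate(str.maketrans("ACGT", "TGCA")),
    "reverse": lambda s: s[::-1],
}


def save_workflow(steps: List[str]) -> str:
    """Store: serialise the ordered list of service names."""
    return json.dumps({"steps": steps})


def run_workflow(stored: str, data: str) -> Tuple[str, List[Dict[str, str]]]:
    """Re-enact a stored workflow; return the result and an audit trail."""
    audit: List[Dict[str, str]] = []
    for name in json.loads(stored)["steps"]:
        result = SERVICES[name](data)
        # Record every step so the pathway can be audited later.
        audit.append({"service": name, "input": data, "output": result})
        data = result
    return data, audit


# Usage: the same stored pathway can be re-run on any input.
stored = save_workflow(["uppercase", "complement", "reverse"])
result, audit = run_workflow(stored, "atgc")
```

Because the pathway is plain data, it can be versioned, shared between collaborators and refined by editing the step list without touching the services.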

D-Net Workflow for Genome Annotation: 16
services executing across the Internet
26
Dynamic Integration Services
  • Dynamic Application Integration: On-demand
    access and composition of remote analysis
    components
  • Towards Dynamic Component Integration
  • Knowledge Servers: allow users to register,
    locate and remotely execute components
  • Execution Servers: allow users to control the
    execution of components in distributed environments
  • Easy Maintenance: New components can be added
    through a clean API
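
A minimal sketch of the register / locate / execute pattern described above. It assumes a single-process registry; the class and method names are hypothetical and are not the D-NET API. Remote transport, security and scheduling are left out.

```python
from typing import Any, Callable, Dict, List

Component = Callable[..., Any]


class KnowledgeServer:
    """Toy registry: register components and locate them by tag."""

    def __init__(self) -> None:
        self._components: Dict[str, Component] = {}
        self._tags: Dict[str, List[str]] = {}

    def register(self, name: str, component: Component, tags: List[str]) -> None:
        self._components[name] = component
        for tag in tags:
            self._tags.setdefault(tag, []).append(name)

    def locate(self, tag: str) -> List[str]:
        """Return the names of all components registered under a tag."""
        return list(self._tags.get(tag, []))

    def get(self, name: str) -> Component:
        return self._components[name]


class ExecutionServer:
    """Runs a registered component on behalf of a caller."""

    def __init__(self, knowledge: KnowledgeServer) -> None:
        self._knowledge = knowledge

    def execute(self, name: str, *args: Any, **kwargs: Any) -> Any:
        return self._knowledge.get(name)(*args, **kwargs)


def gc_content(sequence: str) -> float:
    """Hypothetical analysis component: fraction of G and C bases."""
    if not sequence:
        return 0.0
    return sum(base in "GC" for base in sequence.upper()) / len(sequence)


# Usage: adding a new component is a single register() call.
knowledge = KnowledgeServer()
knowledge.register("gc_content", gc_content, tags=["sequence-analysis"])
executor = ExecutionServer(knowledge)
fraction = executor.execute("gc_content", "GGCA")
```

Keeping lookup and execution in separate classes mirrors the split between Knowledge Servers and Execution Servers on the slide, so either side could be replaced without changing the other.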

Clustering
Classification
Text analysis
Gene function prediction
D-NET API
Promoter Prediction
Homology Search
27
Case Study: SC2002 HPC Challenge
D-Net based Global Collaborative Real-Time
Genome Annotation
High Throughput Sequencers
Nucleotide-level Annotation
Genome Annotation
Protein-level Annotation
Process-level Annotation
15 DBs
21 Applications
28
How It Works
Interactive Editor Visualisation
Nucleotide Annotation Workflows
Download sequence from Reference Server
Save to Distributed Annotation Server
  • 500 Web accesses
  • 1800 clicks
  • 200 copy/paste operations
  • 3 weeks' work
  • in 1 workflow and a few seconds' execution

30
eDiamond
Applications of SMF
  • Teleradiology and QC: VirtualMammo
  • Training and Differential Diagnosis: "Find one
    like it"
  • Advanced CAD: SMF-CAD workstation
  • Epidemiology: SMF-computed breast density

31
Image guided interventions
Images courtesy of Derek Hill, Guy's Hospital
32
Image guided interventions (2)
Images courtesy of Guy's Hospital
33
Surgical verification: Accuracy of surgical
placement against plan
  • Surgeon plans on X-ray or CT, uses database of
    prostheses
  • Operation takes place using plan as guidance
  • Post operative X-ray evaluated for accuracy of
    placement
  • Data stored and used for short term assessment
    and long term evaluation studies

Courtesy of Ian Revie, DePuy International
34
  • UK e-Science projects emphasize data federation
    and integration as much as computation
  • Metadata and ontologies key to higher level Grid
    services
  • e-Science projects will produce a deluge of
    scientific data that will need to be annotated
    and curated in scientific data digital
    libraries

35
Databases in the Grid
Semantic Web
Data Complexity
Classical Grid
Classical Web
Computational Complexity
36
OGSA-DAI Project
  • Key middleware project for UK Program
  • - Total Budget 3M (CP 1.5M)
  • Three Centres involved
  • - Edinburgh, Manchester and Newcastle
  • Industrial partners
  • - IBM US, IBM Hursley and Oracle UK
  • Goal is to develop high-quality data-centric
    middleware

37
OGSA-DAI Project
  • Design Specification completed
  • Papers for GGF WG on Database Access and
    Integration Services
  • Alpha versions delivered
  • - Distributed Query Service
  • - XML Database Interface
  • - Relational Database Interface
  • Beta versions by April 2003
  • Integrate with Globus GT3 release

38
e-Science and the Future of Scientific Research
  • "e-Science will change the dynamic of the way
    science is undertaken."
  • John Taylor, 2001
  • Need to break down the barriers between the
    Victorian bastions of science: biology,
    chemistry, physics, ...
  • Develop permeable structures that promote
    rather than hinder multidisciplinary
    collaboration
  • Engage Computing Services and Libraries in
    developing a new e-Science support service on
    Campus

39
e-Science and Computer Science
  • The lesson of the Web
  • The Semantic Grid
  • The myGrid project
  • The Discovery Net Project
  • Computer Science Research and the Grid

40
Error 404: Page not found
  • "If you want the Web to scale,
    you must allow the links to fail"
  • Wendy Hall, after Tim Berners-Lee
  • HTML as the Fortran of Hypertext!

41
Semantic Web
42
Metadata and Ontologies
  • Metadata: computationally accessible data about
    the services
  • Ontologies: the shared and common understanding
    of a domain
  • A vocabulary of terms
  • Definition of what those terms mean.
  • A shared understanding for people and machines
  • Usually organised into a taxonomy.

43
Reasoning in DAML+OIL
  • Consistency: check if knowledge is meaningful
  • Subsumption: structure knowledge, compute
    classification
  • Equivalence: check if two classes denote the same
    set of instances
  • Instantiation: check if an individual is an
    instance of class C
  • Retrieval: retrieve the set of individuals that
    instantiate C
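
A minimal sketch of four of these reasoning tasks, assuming a toy taxonomy of named classes with hand-written superclass links. The class names, individuals and function names are hypothetical. A real DAML+OIL reasoner works over description-logic expressions and also performs consistency checking, which this sketch does not attempt.

```python
from typing import Dict, Set

# Hypothetical knowledge base: class -> direct superclasses.
SUPERCLASSES: Dict[str, Set[str]] = {
    "Molecule": set(),
    "Protein": {"Molecule"},
    "Enzyme": {"Protein"},
    "Kinase": {"Enzyme"},
}

# Hypothetical individuals -> classes asserted for them.
INSTANCES: Dict[str, Set[str]] = {
    "p53": {"Protein"},
    "CDK2": {"Kinase"},
    "water": {"Molecule"},
}


def ancestors(cls: str) -> Set[str]:
    """Return cls plus every class reachable through superclass links."""
    seen: Set[str] = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(SUPERCLASSES.get(current, set()))
    return seen


def subsumes(general: str, specific: str) -> bool:
    """Subsumption: every instance of `specific` is also a `general`."""
    return general in ancestors(specific)


def equivalent(a: str, b: str) -> bool:
    """Equivalence in this toy model: each class subsumes the other."""
    return subsumes(a, b) and subsumes(b, a)


def is_instance(individual: str, cls: str) -> bool:
    """Instantiation: asserted membership or membership via a subclass."""
    return any(subsumes(cls, asserted) for asserted in INSTANCES.get(individual, set()))


def retrieve(cls: str) -> Set[str]:
    """Retrieval: all individuals that instantiate cls."""
    return {name for name in INSTANCES if is_instance(name, cls)}
```

For example, `retrieve("Protein")` returns both `p53` (asserted directly) and `CDK2` (inferred because Kinase is a subclass of Enzyme, which is a subclass of Protein).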

44
Computer Science Challenges from e-Science
  • UK CS Team led by Tom Rodden identified 4 major
    research challenges arising from e-Science
  • - Developing a Semantic Grid
  • - Trusted Ubiquitous Systems
  • - Rapid Customized Assembly of Services
  • - Autonomic Computing

45
Towards a Semantic Grid
  • Trace provenance from initial data to information
    and knowledge structures
  • Techniques to allow scalable reasoning over
    uncertain/incomplete knowledge
  • Tools for design, development and deployment of
    large-scale ontologies
  • Support for semantic-directed knowledge discovery
    to complement data-mining
  • Development of flexible network-based reasoning
    and decision support services

46
Trusted Ubiquitous Systems
  • New theories to model, specify and analyse trust
    in distributed ubiquitous systems
  • New quality of service and service-based models
    for ubiquitous systems
  • New design guidelines and practices to enable the
    development of reusable trusted components
  • New understanding of the practical engineering
    trade-offs required to realise trusted ubiquitous
    systems

47
Rapid Customised Assembly of Services
  • New theories to describe and reason about
    semantics and behaviour of services and
    compositional effects
  • Agent and service representations that promote
    adaptability and emergent, opportunistic and
    implicit arrangement of services
  • New tools to support the discovery, composition
    and use of services based on high-level
    description of requirements
  • Techniques to support directed automatic
    composition, decomposition and recomposition of
    services

48
Autonomic Computing
  • Techniques to analyze, describe and reason about
    adaptive systems
  • Management of semi-autonomous systems with
    policies, services and software agents
  • Interoperability and reasoning across and between
    different autonomous domains
  • Modeling and measurement of performance of QoS
    for autonomic structures
  • Techniques to capture and represent history,
    context and environment

49
IBM Autonomic Computing Vision
50
A Business Case for the Grid
  • Total Cost of Ownership (TCO)
  • Value of Open Standards
  • Industrial Applications
  • Time to exploitation
  • e-Utilities

51
Current IT Environment
Distributed, Heterogeneous, Complex
52
Current IT Environment
Distributed, Heterogeneous, Complex
Complexity, TCO
Tech. Cost, Utilization
53
Server / Storage Utilization
               24-hour Period   Prime-shift    Peak-hour
               Utilization      Utilization    Utilization
Mainframes     60%              70%            85-100%
UNIX           <10%             10-15%         50-70%
Intel-based    2-5%             5-10%          30%
Storage        52%              N/A            N/A
Source: IBM Scorpion White Paper, "Simplifying the
Corporate IT Infrastructure", 2000
54
Total Cost of Ownership (TCO)
Much More than Hardware and Software Costs
[Chart: IT budgets broken down by Hardware, Software,
Personnel, Maintenance and Integration]
55
Grid Computing Sales Pitch
Storage
Operating System
I/O
Data
Processing
Applications
Distributed Computing Over a Network, Using Open
Standards to Enable Heterogeneous Operations
56
Grid Technology Enables
  • Increased Server Utilization
  • Workload Management and Consolidation
  • Reduced Cycle Times
  • Collaboration and Access to Data
  • Federation of Data
  • Global Distribution
  • Resilient/Highly Available Infrastructure
  • Business Continuity
  • Recovery and Failover

Supporting Heterogeneous Resources Through
Open Standards.
57
Increased Server Utilization
  • Exploit distributed resources to provide
    capacity for high-demand applications
  • Existing applications that cannot be run
    effectively on a single processor
  • New large-scale applications that provide
    strategic business advantages
  • Reduce infrastructure cost associated with
    over-provisioned resources
  • Balance workload based on policies
  • Optimize for cost or throughput
  • Reduce the cost of manpower to manage and
    configure resources
  • Fewer resources to manage for the same workload

58
Collaboration and Access to Data
  • Enable collaboration across applications to
    integrate results
  • Leverage Distributed Data and Resources
  • Support large multi-disciplinary collaborations
  • Link Business Processes
  • Federation of Data
  • Both within a single organization and between
    partners
  • Exploit Replication Services Across Enterprises

Design Analytics
Design
Pricing
Design
Simulation
59
The Value of Open Standards
Distributed Computing: Grid (Globus -> OGSA)
Applications: Web Services (SOAP, WSDL, UDDI)
Information: World-wide Web (HTML, HTTP, J2EE,
XML)
Communications: e-mail (POP3, SMTP, MIME)
Networking: The Internet (TCP/IP)

60
  • Sun and the Grid
  • Grid Computing is one of the three next big
    things for Sun and our customers
  • Ed Zander, COO
  • Microsoft and the Grid
  • The alignment of OGSA with XML Web services is
    important because it will make Internet-scale,
    distributed Grid Computing possible
  • Robert Wahbe, General Manager of Web Services

63
Industry Applications
Unique by Industry with Common Characteristics
Manufacturing
Financial Services
LS/ Bioinformatics
Govt Education
Energy
Telco Media
Primary Focus
64
Globalization Grid Butterfly.net
  • Unlimited Numbers of Players
  • Distributed Artificial Intelligence
  • Multiple Concurrent Players
  • 1,000 downloads of developers kit per week
  • Hot-swappable Components
  • Developers, Publishers, ESPs

66
HP, the Grid and e-Utilities
  • The Grid fabric for e-Utilities will be:
  • - Soft: malleable, multi-purpose
  • - Dynamic: resources will be constantly changing
  • - Federated: global structure not owned by any
    single authority
  • - Heterogeneous: from supercomputer clusters to
    PCs
  • John Manley, HP Labs

67
Timescales for Exploitation?
  • IBM see early adopters of Grid technology
    coming from pharmaceutical, engineering and
    petrochemical sectors
  • UK program confirms this picture (AstraZeneca,
    GSK, Merck, Pfizer, Rolls Royce, BAESystems,
    Schlumberger)
  • IBM see Grid middleware being adopted by more
    mainstream commerce and industry in 2003/2004
    timeframe

69
Status of the Grid
  • Today - early adoption phase - just like the
    Web in the early days
  • Industry now selling IntraGrid solutions
  • Genuine Virtual Organisation InterGrid
    middleware not yet mature
  • Tomorrow - sophisticated combinations of
    services to locate information, applications to
    process it, and computer systems to run them
  • Autonomic Middleware infrastructure capable of
    supporting Virtual Organisations, c-Commerce and
    e-Utilities will take time!

70
e-Government and the Grid
  • The Grid intends to make access to computing
    power, scientific data repositories and
    experimental facilities as easy as the Web makes
    access to information.
  • Tony Blair, 2002

71
Acknowledgements
  • With thanks to
  • Gerd Breiter, Phillipe Bricard, David Boyd,
  • Jens Jensen, Daron Green, Mike Brady,
  • Derek Hill, Carole Goble, Yike Guo,
  • Jeremy Frey, Bill Johnston, Ray Browne,
  • Jim Fleming, Anne Trefethen and many others
