5th Feb 03 - PowerPoint PPT Presentation

Transcript and Presenter's Notes

1
The Challenge of Data Integration Data Grid Discovery?
Prof. Malcolm Atkinson, Director
www.nesc.ac.uk
22nd January 2003
2
Overview
  • Essentials of e-Science
  • Collaboration
  • Resource Sharing
  • Data Sharing
  • Mutual Dependence
  • Essentials of the Grid
  • Distributed Virtual Machine?
  • Essentials of Data Sharing
  • Database Research did it?
  • New Challenges
  • Data Access and Integration Building Bricks
  • Band Wagon v Research Opportunity
  • Thresholds, Visions and Questions

3
Essentials of e-Science
What happened?
4
UK e-Science Programme (1) 2001 - 2003
DG Research Councils
Grid TAG
Total US 200 M
E-Science Steering Committee
Director
Directors Awareness and Co-ordination Role
Directors Management Role
Generic Challenges: EPSRC (15m), DTI (15m)
Over 3 years
Academic Application Support Programme: Research Councils (74m), DTI (5m)
  PPARC (26m), BBSRC (8m), MRC (8m), NERC (7m), ESRC (3m), EPSRC (17m), CLRC (5m)
80m Collaborative projects
Plus > 90M for HPCx 6 years
Industrial Collaboration (40m)
5
UK e-Science
150 Million e-Science, 55 Million HPCx
From presentation by Tony Hey
6
UK e-Science Investment
National e-Science Centre
Edinburgh
Glasgow
Newcastle
Belfast
  • Projects
  • > 60 started
  • > 30 proposed
  • EU Projects

Manchester
Daresbury Lab
Cambridge
Oxford
Hinxton
RAL
Cardiff
London
Southampton
7
UK e-Science Programme (2) 2003 - 2005
DG Research Councils
Grid TAG
Total > 150 M
E-Science Steering Committee
Director
Directors Awareness and Co-ordination Role
Directors Management Role
Generic Challenges: EPSRC (15m), DTI (15m)
Over 2 years
Academic Application Support Programme: Research Councils (74m), DTI (5m)
  PPARC (26m), BBSRC (8m), MRC (8m), NERC (7m), ESRC (3m), EPSRC (17m), CLRC (5m)
80m Collaborative projects
Industrial Collaboration (40m)
8
Essentials of e-Science
Why it's Happening
9
Collaboration Growing
What's New? Scale, At a Distance, Instantaneous, Dynamic
  • Hard Problems, Multi-disciplinary, Expense
  • Sharing
    • Ideas
    • Thought processes and Stimuli
    • Effort
    • Resources
  • Requires
    • Communication
    • Common understanding Framework
    • Mechanisms for sharing fairly
    • Organisation and Infrastructure

Requires Trust
Scientists have done this for Centuries
10
Collaboration Growing
Text, digital media, structured, organised &
curated data, annotation, computable models,
visualisation, shared instruments, shared
systems, shared administration,
  • Data, Policy Digital Infrastructure Key
  • Sharing
    • Ideas
    • Thought processes and Stimuli
    • Effort
    • Resources
  • Requires
    • Communication
    • Common understanding Framework
    • Mechanisms for sharing fairly
    • Organisation and Infrastructure

Changing the ways Science is done
Nationally Internationally Distributed,
Routine, Daily, Automated,
That Requires very Significant Investment in
Digital Systems and their Support
11
Collaboration Growing
  • Digital Communication, Metadata,
  • Sharing
    • Ideas
    • Thought processes and Stimuli
    • Effort
    • Resources
  • Requires
    • Communication
    • Common understanding Framework
    • Mechanisms for sharing fairly
    • Organisation and Infrastructure

Digital networks, digital work-places, digital
instruments,
Metadata, ontologies, standards, shared curated
data, shared codes,
Common platforms, shared software, shared
training,
Authentication, Authorisation, Accounting,
Provenance, Policies,
Shared Provision of Platform,
The Grid SHOULD make this much easier
by providing a common, supported high-level of
Software and Organisational infrastructure
12
Interdependence
  • Science has relied on experiment and theory
  • Simulation, Data Mining, Analysis

Plus Moore's Law ...
Experiment - Italy 1,500 AD
Computing Science at the Party
13
Interdependence
14
Database Growth
PDB protein structures
15
The Grid
What's happening?
16
Globus Toolkit History
Does not include downloads from NMI, UK
eScience, EU Datagrid, IBM, Platform, etc.
GT 2.0 Released
Physiology of the Grid Paper Released
Anatomy of the Grid Paper Released
Significant Commercial Interest in Grids
The Grid: Blueprint for a New Computing Infrastructure published
NSF and European Commission Initiate Many New Grid Projects
Early Application Successes Reported
GT 1.0.0 Released
NASA begins funding Grid work, DOE adds support
17
Encompassing Vision
18
People Industry
  • Global Grid Forum
    • GGF2 260 Jul 01
    • GGF3 220 Oct 01
    • GGF4 400 Feb 02
    • GGF5 900 Jul 02
    • GGF6 450 Oct 02
    • GGF7 > 1000 Mar 03
  • UK All Hands
    • AHM02 350 Sep 02
  • GlobusWorld
    • 1 450 Jan 03
  • IBM This week
    • IBM DRIVES GRID COMPUTING FOR COMMERCIAL
      BUSINESS WITH TEN NEW GRID OFFERINGS
    • Targets
      • Financial, Life Sciences
      • Automotive Aerospace
      • Governments
    • Partners
      • Platform, DataSynapse
      • Avaki, Entropia
      • United Devices
  • IBM last 20 months
    • Leaders of OGSI
    • Development teams
    • Grid Jamboree
    • GGF

This is a Global Phenomenon
19
The Grid
What is it?
20
High-Altitude Views
  • A Rallying Cry
  • Meeting a Hard Challenge requires Many Minds
  • Operating Maintaining Infrastructure requires
    Many Hands Many Companies
  • Another Stab at Distributed Computing
  • Hard Challenge Intellectually and Practically
    Important
  • Dependable Ubiquity over Heterogeneity
    Fallibility

All Views Significant
  • An Ambitious Virtual Machine
  • Consistent large scale computational environments
  • A Global Operating System
  • Collective Resources, Common Management

21
An Architectural View
Application
Application Platform Developers
Common Application Platform for Group of
Applications
Grid Plumbing Security Infrastructure
Operations Teams
Providers
22
Open Grid Services Infrastructure
  • Confluence of Web Services Grid
  • Consistent Interface Description
  • Based on WSDL 1.2 proposal
  • Extend Properties
  • Separate Binding from Interface
  • Function Composition Inheritance
  • Exploit WS Investment
  • Grid Features
  • Security
  • Life-Time Management
  • Service (state) Information via Data Elements
  • Discovery
  • Grouping
  • Notification
  • OGSI Version 1 Proposal at GGF7 (March 03)

Open, Strongly Led Design Process
Multiple Development Efforts
Open Source Alpha Release Jan 03
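The Grid Features listed above (life-time management, service state exposed as data elements, notification) can be illustrated with a toy sketch. This is only an illustration under stated assumptions: the class ToyGridService and all of its method names are hypothetical and are not the OGSI interfaces or any Globus API. An injectable clock is used so the lifetime behaviour can be followed step by step.

```python
import time


class ToyGridService:
    """Hypothetical stand-in for a stateful service; not a real OGSI interface."""

    def __init__(self, lifetime_seconds, clock=time.monotonic):
        self._clock = clock
        # Life-time management: the service expires unless a client keeps it alive.
        self._termination_time = clock() + lifetime_seconds
        self._service_data = {}   # named service (state) data elements
        self._subscribers = {}    # data element name -> list of callbacks

    def is_alive(self):
        return self._clock() < self._termination_time

    def extend_lifetime(self, extra_seconds):
        # Push the termination time later; never pull it earlier.
        candidate = self._clock() + extra_seconds
        self._termination_time = max(self._termination_time, candidate)

    def find_service_data(self, name):
        # Discovery of state: look up one data element by name.
        return self._service_data.get(name)

    def subscribe(self, name, callback):
        # Notification: register interest in changes to one data element.
        self._subscribers.setdefault(name, []).append(callback)

    def set_service_data(self, name, value):
        self._service_data[name] = value
        for callback in self._subscribers.get(name, []):
            callback(name, value)


# Usage with a fake clock held in a one-element list.
now = [0.0]
service = ToyGridService(lifetime_seconds=10, clock=lambda: now[0])

seen = []
service.subscribe("status", lambda name, value: seen.append((name, value)))
service.set_service_data("status", "running")

now[0] = 5.0
service.extend_lifetime(10)              # termination moves from t=10 to t=15
now[0] = 12.0
alive_after_extension = service.is_alive()
now[0] = 20.0
alive_at_end = service.is_alive()
```

In this sketch a client that stops extending the lifetime simply lets the service expire, which is one way to read "Life-Time Management" on the slide.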
23
Open Grid Services Architecture
  • Ubiquitous Building Blocks
  • Using OGSI Platform
  • Open Extensible
  • Encourage Refactoring Experiments
  • Initially
  • The Globus 2 model
  • Except State Information now distributed
  • Example New Features
  • Global Name Mapping Service
  • Replication and Caching Service
  • Data Access and Integration
  • Metering, Logging, Authorisation, Charging,

Many Open Issues
24
Grid Challenge
  • Balancing Direct Access to the Platforms with
    Abstraction Virtualisation
  • Developers often have exploitable application
    knowledge
  • Automation necessary helpful
  • Interface matching, operation validation,
  • Optimisation at many scales
  • There isn't enough effort to develop Languages
    Abstractions

Needs CS Research!
25
Data Sharing
What's needed?
26
Data Integration
Repeat until Hypothesis Tested
Scientist with Idea
Repeat for next Hypothesis
27
Wellcome Trust Cardiovascular Functional
Genomics
28
OGSA-DAI Partners
Partners: EPCC, NeSC, IBM UK, IBM USA, Manchester e-SC, Newcastle e-SC, Oracle
[Map labels: Glasgow, Newcastle, Belfast, Manchester, Daresbury Lab, Cambridge, Oxford, Hinxton, RAL, Cardiff, London, IBM Hursley, Southampton]
3 million, 18 months, started February 2002
29
OGSA-DAI Data Access and Integration for the New
Grid
30
DAI Key Services
  • GridDataService (GDS): Access to data & DB operations
  • GridDataServiceFactory (GDSF): Makes GDS & GDSF
  • GridDataServiceRegistry (GDSR): Discovery of GDS(F) & Data
  • GridDataTranslationService (GDTS): Translates or Transforms Data
  • GridDataTransportDepot (GDTD): Data transport with persistence
Integrated Structured Data Transport
Relational & XML models supported
Role-based Authorisation
Binary structured files (later)
31
DAI Architecture
32
1a. Request to Registry for sources of data about x
1b. Registry responds with Factory handle
2a. Request to Factory for access to database
2b. Factory creates GridDataService to manage access
2c. Factory returns handle of GDS to client
3a. Client queries GDS with XPath, SQL, etc
3b. GDS interacts with database
3c. Results of query returned to client as XML
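The Registry, Factory and GridDataService steps on this slide (1a to 3c) can be sketched as a small in-process Python example. This is a minimal sketch under stated assumptions, not the OGSA-DAI API: the class names echo the slide, but every method name (register, find, create_service, perform), the demo table and its rows are hypothetical, and an in-memory SQLite database stands in for the real data source.

```python
import sqlite3
import xml.etree.ElementTree as ET


class GridDataService:
    """Toy GDS: runs a query against one database and returns XML text."""

    def __init__(self, connection):
        self._conn = connection

    def perform(self, sql, params=()):
        # 3b. GDS interacts with the database.
        cursor = self._conn.execute(sql, params)
        columns = [description[0] for description in cursor.description]
        root = ET.Element("results")
        for row in cursor:
            row_element = ET.SubElement(root, "row")
            for name, value in zip(columns, row):
                ET.SubElement(row_element, name).text = str(value)
        # 3c. Results of the query are returned to the client as XML.
        return ET.tostring(root, encoding="unicode")


class GridDataServiceFactory:
    """Toy GDSF: creates a GDS on request."""

    def __init__(self, connect):
        self._connect = connect  # callable that opens the database

    def create_service(self):
        # 2b. Factory creates a GridDataService to manage access.
        # 2c. The new GDS object plays the role of the handle returned to the client.
        return GridDataService(self._connect())


class GridDataServiceRegistry:
    """Toy GDSR: maps a topic to the factories that can serve it."""

    def __init__(self):
        self._factories = {}

    def register(self, topic, factory):
        self._factories.setdefault(topic, []).append(factory)

    def find(self, topic):
        # 1b. Registry responds with Factory handles (empty list if none).
        return list(self._factories.get(topic, []))


def open_demo_db():
    # Hypothetical demo data, invented for this sketch only.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE protein (id TEXT, length INTEGER)")
    conn.executemany(
        "INSERT INTO protein VALUES (?, ?)", [("P1", 120), ("P2", 85)]
    )
    return conn


registry = GridDataServiceRegistry()
registry.register("protein", GridDataServiceFactory(open_demo_db))

factory = registry.find("protein")[0]   # 1a, 1b
gds = factory.create_service()          # 2a, 2b, 2c
xml_result = gds.perform(               # 3a, 3b, 3c
    "SELECT id, length FROM protein WHERE length > ?", (100,)
)
print(xml_result)
# prints <results><row><id>P1</id><length>120</length></row></results>
```

The client only ever holds handles: it never opens the database itself, which is the indirection the three numbered stages describe.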
33
1a. Request to Registry for sources of data about x & y
1b. Registry responds with Factory handle
2a. Request to Factory for access and integration to databases
34
Biomedical (or ANY) Data
  • Opportunities
  • Global Production of Published Data
  • Volume? Diversity?
  • Combination → Analysis → Discovery
  • Challenges
  • Data Huggers
  • Meagre metadata
  • Ease of Use
  • Automated, optimised integration
  • Traceability, Dependability
  • Opportunities
  • Specialised Indexing
  • Structurally varied replication
  • Consistent Structured Universe of Discourse
  • Data Computation Integration
  • Challenges
  • Approximate Matching
  • Multi-scale optimisation
  • Bad habits / industrial structures
  • Safety and Multi-scale optimisation

35
Data Integration Challenges
  • High-Level Languages
  • Describing the Data Extraction Recipes
  • Describing the Sources Components
  • Metadata that drives automation validation
  • Mobility
  • Code Data
  • Integrating Existing DB technology
  • Moving the DBMS to the Grid context
  • New Optimisation Challenges
  • Data Computation Storage Movement
  • Shared Distributed Annotation Systems
  • How to Reference
  • Provenance Acknowledgement

36
What Do We Do Now?
Address the Challenges?
37
Challenges
  • A Programming Development Model
  • Dependability at this Scale
  • Foundations for Trust
  • Raising the Level of Automation
  • Supporting New Forms of
    • Collaboration
    • Data

Opportunities for All
38
Questions & Answers