Transcript and Presenter's Notes

Title: IBM UK


1
IBM TCG Symposium
Prof. Malcolm Atkinson, Director, National e-Science Centre (www.nesc.ac.uk)
21st May 2003
2
Outline
  • What is e-Science?
  • UK e-Science
  • UK e-Science Roles and Resources
  • UK e-Science Projects
  • NeSC e-Science Institute
  • Events
  • Visitors
  • International Programme
  • Scientific Data Curation
  • Data Access & Integration
  • Data Analysis & Interpretation
  • e-Science driving Disruptive Technology
  • Economic impact, Mobile Code, Decomposition
  • Global infrastructure, optimisation & management
  • Don't-care-where computing

3
What is e-Science?
4
Foundation for e-Science
  • e-Science methodologies will rapidly transform
    science, engineering, medicine and business
  • driven by exponential growth (×1000/decade)
  • enabling a whole-system approach

(Figure: sensor nets)
5
Convergence & Ubiquity
Multi-national, Multi-discipline,
Computer-enabled Consortia, Cultures & Societies
New Opportunities, New Results, New Rewards
6
UCSF
UIUC
From Klaus Schulten, Center for Biomolecular
Modeling and Bioinformatics, Urbana-Champaign
7
Global in-flight engine diagnostics
100,000 engines × 2–5 GB/flight × 5 flights/day ≈ 2.5 PB/day
Distributed Aircraft Maintenance Environment
Universities of Leeds, Oxford, Sheffield & York
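The headline rate follows directly from the figures above; a one-line sanity check, assuming the top of the quoted 2–5 GB/flight range:

```python
# Sanity check of the 2.5 PB/day figure (assumes 5 GB/flight, the top of the
# 2-5 GB range quoted above).
engines = 100_000
gb_per_flight = 5            # assumed: upper end of the quoted range
flights_per_day = 5

gb_per_day = engines * gb_per_flight * flights_per_day
print(f"{gb_per_day / 1e6:.1f} PB/day")   # 1 PB = 1,000,000 GB -> 2.5 PB/day
```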
8
Tera → Peta Bytes

                        1 Terabyte                1 Petabyte
  RAM time to move      15 minutes                2 months
  1 Gb WAN move time    10 hours (£1,000)         14 months (£1 million)
  Disk cost             7 disks, £3,500 (SCSI)    6,800 disks, 490 units, 32 racks, £4.7 million
  Disk power            100 Watts                 100 Kilowatts
  Disk weight           5.6 kg                    33 tonnes
  Disk footprint        inside machine            60 m²

Now make it secure & reliable!
May 2003: approximately correct
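The WAN figures in the table can be reproduced with a short calculation; a minimal sketch, assuming a nominal 1 Gb/s link sustaining roughly a quarter of its rated throughput (the slide does not state its exact assumption):

```python
# Reproduces the order of magnitude of the WAN move times in the table above.
# Assumption: a nominal 1 Gb/s link sustaining ~25% effective throughput
# (protocol overhead, sharing, restarts); the slide's exact assumption is unstated.
effective_bytes_per_s = 0.25 * 1e9 / 8      # ~31 MB/s sustained

for label, size_bytes in [("1 TB", 1e12), ("1 PB", 1e15)]:
    seconds = size_bytes / effective_bytes_per_s
    print(f"{label}: {seconds / 3600:,.0f} hours = {seconds / 86400:,.0f} days")
# 1 TB -> ~9 hours; 1 PB -> ~370 days (roughly 12 months), the same order as
# the table's "10 hours" and "14 months"
```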
9
e-Science in the UK
10
UK 2000 Spending Review
From presentation by Tony Hey
11
Additional UK e-Science Funding
  • First Phase 2001–2004
  • Application Projects
  • £74M
  • All areas of science and engineering
  • >60 Projects
  • 340 at first All Hands Meeting
  • Core Programme
  • £35M
  • Collaborative industrial projects
  • 80 Companies
  • > £30 Million
  • Second Phase 2003–2006
  • Application Projects
  • £96M
  • All areas of science and engineering
  • Core Programme
  • £16M–£25M (?)
  • Core Grid Middleware

Plus EU money, £40M JANET upgrade and £55M HPC(x)
12
e-Science and SR2002
  • Allocations 2004–06 (2001–04 in parentheses)
  • MRC £13.1M (£8M)
  • BBSRC £10.0M (£8M)
  • NERC £8.0M (£7M)
  • EPSRC £18.0M (£17M)
  • HPC £2.5M (£9M)
  • Core Prog. £16.2M ? (£15M) £20M
  • PPARC £31.6M (£26M)
  • ESRC £10.6M (£3M)
  • CLRC £5.0M (£5M)

13
National e-Science Centre
14
NeSC in the UK
Sites: Edinburgh, Glasgow, Newcastle, Belfast, Manchester, Daresbury Lab, Cambridge, Oxford, Hinxton, RAL, Cardiff, London, Southampton
15
www.nesc.ac.uk
16
UK Grid: Operational & Heterogeneous
  • Currently a Level-2 Grid based on Globus Toolkit 2
  • Transition to OGSI/OGSA will prove worthwhile
  • There are still issues to be resolved
  • OGSA definition / delivery
  • Hosting environments & platforms
  • Combinations of services supported
  • Material and grids to support adopters
  • A schedule of transitions should be (approximately & provisionally) published
  • Expected time line
  • Now: GT2 L2 service; GT3 middleware development & evaluation
  • Q3–Q4 2003: GT2 L3; GT3 L1
  • Q1–Q2 2004: significant project transitions to GT3 L2/L3
  • Late Q4 2004: most projects have transitioned; end of GT2 L3

17
e-Science Institute
18
e-Science Institute: Past Programme of Events
  • Planned: 6 two-week research workshops / year
  • Actually ran 48 events in the first 12 months!
  • Highlights
  • GGF5, HPDC11 and a cluster of workshops
  • Protein Science, Neuroinformatics, ...
  • Major training events
  • Steve Tuecke: Grid & Globus (2)
  • Web Services, DiscoveryLink, Relational DB design, ...
  • e-SI Clientele and Outreach (year 1)
  • > 2,600 individuals
  • From > 500 organisations
  • 236 speakers
  • Many participants return frequently

19
Data Access & Integration
20
Biology & Medicine
  • Extensive Research Community
  • >1000 per research university
  • Extensive Applications
  • Many people care about them
  • Health, Food, Environment
  • Interacts with virtually every discipline
  • Physics, Chemistry, Nanoengineering,
  • 450 Databases relevant to bioinformatics
  • Heterogeneity, Interdependence, Complexity,
    Change,
  • Wonderful Scientific Questions
  • How does a cell work?
  • How does a brain work?
  • How does an organism develop?
  • Why is the biosphere so stable?
  • What happens to the biosphere when the earth
    warms up?

1 petabyte digital data / hospital / year
21
Database Growth
PDB Content Growth
22
ODD-Genes
PSE
23
Scientific Data
  • Challenges
  • Data Huggers
  • Meagre metadata
  • Ease of Use
  • Optimised integration
  • Dependability
  • Opportunities
  • Global Production of Published Data
  • Volume? Diversity?
  • Combination → Analysis → Discovery
  • Opportunities
  • Specialised Indexing
  • New Data Organisation
  • New Algorithms
  • Varied Replication
  • Shared Annotation
  • Intensive Data Computation
  • Challenges
  • Fundamental Principles
  • Approximate Matching
  • Multi-scale optimisation
  • Autonomous Change
  • Legacy structures
  • Scale and Longevity
  • Privacy and Mobility

24
Infrastructure Architecture: Virtual Integration Architecture
(layers, top to bottom)
  • Data Intensive X Scientists
  • Data Intensive Applications for Science X
  • Simulation, Analysis & Integration Technology for Science X
  • Generic Virtual Data Access and Integration Layer
  • OGSA
  • OGSI: Interface to Grid Infrastructure
  • Compute, Data & Storage Resources (distributed)
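A minimal sketch of how the layering is intended to be used: the application talks only to the generic data access & integration layer, which in turn talks to grid resources. The interface and class names here are illustrative, not the OGSA/OGSI or OGSA-DAI APIs.

```python
# Illustrative layering only; names are invented, not OGSA/OGSI/OGSA-DAI APIs.
from typing import Iterable, Protocol

class GridResource(Protocol):
    """Bottom layers: compute/data/storage reached through OGSI-style services."""
    def execute(self, task: str) -> bytes: ...

class DataAccessIntegration(Protocol):
    """Generic virtual data access and integration layer."""
    def query(self, virtual_source: str, statement: str) -> Iterable[dict]: ...

class ScienceXApplication:
    """Top layer: written against the generic layer, unaware of where the
    underlying resources live or how they are implemented."""
    def __init__(self, dai: DataAccessIntegration) -> None:
        self.dai = dai

    def count_observations(self) -> int:
        rows = self.dai.query("virtual-catalogue", "SELECT * FROM observations")
        return sum(1 for _ in rows)
```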
25
Draft Specification for GGF 7
26
Disruptive e-Science Drivers?
27
Mohammed & Mountains
  • Petabytes of Data cannot be moved
  • It stays where it is produced or curated
  • Hospitals, observatories, European Bioinformatics
    Institute,
  • Distributed collaborating communities
  • Expertise in curation, simulation & analysis
  • Distributed diverse data collections
  • Discovery depends on insights
  • Tested by combining data from many sources
  • Using sophisticated models algorithms
  • What can you do?

28
Move computation to the data
  • Assumption: code size << data size
  • Develop the database philosophy for this?
  • Queries are dynamically re-organised & bound
  • Develop the storage architecture for this?
  • Compute on disk? (SoC space on disk chips??)
  • Safe hosting of arbitrary computation
  • Proof-carrying code for data- and compute-intensive
    tasks; robust hosting environments
  • Provision combined storage & compute resources
  • Decomposition of applications
  • To ship behaviour-bounded sub-computations to
    data (see the sketch after this list)
  • Co-scheduling & co-optimisation
  • Data & code (movement), code execution
  • Recovery and compensation
29
Software Changes
  • Integrated Problem Solving Environments
  • Users & application developers see
  • Abstract computer and storage system
  • Where and how things are executed can be ignored
  • Diversity, detail, ownership, dependability, cost
  • Explicit and visible
  • Increasing sophistication of description
  • Metadata for discovery
  • Metadata for management and optimisation (see the sketch after this list)
  • Applications developed dynamically by composition
  • Mobile, Safe & Re-organisable Code
  • Predictable behaviour
  • Decomposition & re-composition
  • New programming languages & understanding needed
30
Organisational Cultural Changes
  • Access to Computation & Data must be simple
  • All use a computational, semantic, data-rich web
  • Responsibility of data publishers
  • Cost, dependability, trustworthiness, capability,
    flexibility, ...
  • Shared contributions compose indefinitely
  • Knowledge accumulation and interdependence
  • Contributor recognition and IPR
  • Complexity and management of infrastructure
  • Always on
  • Must be sustained
  • Paid for
  • Hidden

Health, Energy, Finance, Government, Education
Games@Home
31
Comments & Questions Please
www.ogsadai.org.uk
www.nesc.ac.uk
32
Extra slides?
33
DAI basic Services
34
DAIT basic Services

  1a. Client sends a request to the Registry for sources of data about x & y
  1b. Registry responds with a Factory handle
  2a. Client sends a request to the Factory for access and integration from resources Sx and Sy
  2b. Factory creates a network of GridDataServices (GDS / GDTS)
  2c. Factory returns the handle of the GDS to the client
  3a. Client submits a sequence of scripts to the GDS; each script has a set of queries (XPath, SQL, etc.)
  3b. Client tells the analyst
  3c. Sequences of result sets are returned to the analyst as formatted binary, described in a standard XML notation
  (Data sources Sx and Sy: one XML database, one relational database)
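A sketch of the Registry → Factory → GDS interaction described above, in client-side pseudocode. All class and method names are invented for illustration; they are not the OGSA-DAI API (see www.ogsadai.org.uk for the real interfaces).

```python
# Illustrative only: the Registry -> Factory -> GridDataService pattern above.
# Names are invented; they are not the OGSA-DAI API.
class Registry:
    def find_factories(self, topic: str) -> list[str]:
        # 1a/1b: ask for sources of data about `topic`; get factory handles back
        return ["factory:data-about-" + topic]

class GridDataService:
    def __init__(self, sources: list[str]) -> None:
        self.sources = sources

    def perform(self, script: str) -> str:
        # 3a/3c: each script carries queries (XPath, SQL, ...); result sets come
        # back as formatted binary described in a standard XML notation
        return f"<resultSets sources='{','.join(self.sources)}'/>"

class Factory:
    def create_gds(self, sources: list[str]) -> GridDataService:
        # 2a-2c: create GridDataServices over the requested sources and return
        # a handle to the client
        return GridDataService(sources)

# Client / analyst side
factory_handles = Registry().find_factories("x")
gds = Factory().create_gds(["Sx", "Sy"])
print(gds.perform("SELECT * FROM observations"))
```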