Title: Tony Hey
1. Tony Hey
- Director of the UK e-Science Programme
- Tony.Hey_at_epsrc.ac.uk
2. e-Science and the Grid
- "e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it."
- "e-Science will change the dynamic of the way science is undertaken."
- John Taylor, Director General of Research Councils, Office of Science and Technology
3. NASA's IPG
- The vision for the Information Power Grid is to promote a revolution in how NASA addresses large-scale science and engineering problems by providing persistent infrastructure for:
  - highly capable computing and data management services that, on demand, will locate and co-schedule the multi-Center resources needed to address large-scale and/or widely distributed problems
  - the ancillary services needed to support the workflow management frameworks that coordinate the processes of distributed science and engineering problems
4. IPG Baseline System
[Network diagram: NASA centres (ARC, GRC, LaRC, JPL, GSFC, MSFC, JSC, KSC) linked with SDSC, NCSA, EDC, CMU and Boeing over NREN, NGIX and NTON-II/SuperNet; resources include O2000 clusters, DMF mass storage, MDS directory servers, MCAT/SRB and a Certificate Authority (CA).]
5. Multi-disciplinary Simulations
Wing Models
- Lift capabilities
- Drag capabilities
- Responsiveness
Stabilizer Models
Airframe Models
- Deflection capabilities
- Responsiveness
Crew Capabilities
- Accuracy
- Perception
- Stamina
- Reaction times
- SOPs
Engine Models
- Thrust performance
- Reverse thrust performance
- Responsiveness
- Fuel consumption
Landing Gear Models
- Braking performance
- Steering capabilities
- Traction
- Damping capabilities
Whole-system simulations are produced by coupling all of the sub-system simulations.
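The coupling idea can be sketched in a few lines: each sub-system model advances in time steps, exchanging boundary state with the others at every step. This is an illustrative toy, not NASA's code; all class and variable names are invented, and the dynamics are a placeholder.

```python
# Hypothetical sketch of coupling sub-system models (wing, engine, airframe,
# ...) into one whole-system simulation. Names and dynamics are illustrative.

class SubSystemModel:
    """One sub-system advanced in discrete time steps."""
    def __init__(self, name, state=0.0):
        self.name = name
        self.state = state

    def step(self, dt, inputs):
        # Toy dynamics: relax toward the mean of the coupled inputs.
        coupling = sum(inputs) / len(inputs) if inputs else 0.0
        self.state += dt * (coupling - self.state)
        return self.state

def run_coupled(models, dt=0.1, steps=100):
    """Advance all sub-systems together, exchanging state each step."""
    for _ in range(steps):
        snapshot = {m.name: m.state for m in models}  # boundary-data exchange
        for m in models:
            others = [v for k, v in snapshot.items() if k != m.name]
            m.step(dt, others)
    return {m.name: round(m.state, 3) for m in models}

models = [SubSystemModel("wing", 1.0),
          SubSystemModel("engine", 0.0),
          SubSystemModel("airframe", -1.0)]
print(run_coupled(models))
```

In a real multi-disciplinary run each `step` would be a full solver on its own machine, which is exactly why the co-scheduling services described on the IPG slides matter.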
6. Multi-disciplinary Simulations
National Air Space Simulation Environment
- Virtual National Air Space (VNAS): 22,000 commercial US flights a day
- Simulation runs across centres (GRC, LaRC, ARC):
  - 44,000 wing runs
  - 66,000 stabilizer runs
  - 50,000 engine runs
  - 132,000 landing/take-off gear runs
  - 48,000 human crew runs
  - 22,000 airframe impact runs
- Simulation drivers:
  - FAA ops data
  - Weather data
  - Airline schedule data
  - Digital flight data
  - Radar tracks
  - Terrain data
  - Surface data
- Being pulled together under the NASA AvSP Aviation ExtraNet (AEN)
Many aircraft, flight paths, airport operations, and the environment are combined to get a virtual national airspace.
7. The Grid as an Enabler for Virtual Organisations
- Ian Foster and Carl Kesselman, take 2:
- "The Grid is a software infrastructure that enables flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions and resources"
- Resources include computational systems, data storage resources and specialized facilities
- Enabling infrastructure for transient Virtual Organisations
8. Globus Grid Middleware
- Single sign-on
  - Proxy credentials, GRAM
- Mapping to local security mechanisms
  - Kerberos, Unix, GSI
- Delegation
  - Restricted proxies
- Community authorization and policy
  - Group membership, trust
- File-based data transfer
  - GridFTP gives high-performance FTP integrated with GSI
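The proxy-credential idea behind single sign-on is worth a concrete illustration: the user signs a short-lived (possibly restricted) credential once, and services then trust that proxy instead of asking for the password again. The sketch below is not the Globus/GSI API; the hash-based "signature" stands in for real public-key cryptography, and all names are invented.

```python
# Conceptual sketch of proxy credentials for single sign-on. NOT the real
# GSI interface: sign() is a stand-in for a public-key signature.

import hashlib
import time

def sign(secret, message):
    # Stand-in for a real cryptographic signature.
    return hashlib.sha256((secret + message).encode()).hexdigest()

def make_proxy(user_secret, user_name, lifetime_s, restrictions=()):
    """The user signs a short-lived proxy once; services then trust the
    proxy rather than re-prompting for credentials (single sign-on)."""
    body = f"{user_name}|{time.time() + lifetime_s}|{','.join(restrictions)}"
    return {"body": body, "signature": sign(user_secret, body)}

def verify_proxy(user_secret, proxy):
    """A resource checks the signature and expiry before granting access.
    Restrictions in the body bound what the delegated proxy may do."""
    name, expires, restrictions = proxy["body"].split("|")
    return (proxy["signature"] == sign(user_secret, proxy["body"])
            and time.time() < float(expires))

proxy = make_proxy("alice-key", "alice", lifetime_s=3600,
                   restrictions=("read-only",))
print(verify_proxy("alice-key", proxy))  # True while the proxy is unexpired
```

The "restricted proxies" bullet corresponds to the `restrictions` field: a delegated credential can carry less authority than the credential that signed it.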
9. US Grid Projects
- NASA Information Power Grid
- DOE Science Grid
- NSF National Virtual Observatory
- NSF GriPhyN
- DOE Particle Physics Data Grid
- NSF Distributed Terascale Facility
- DOE ASCI Grid
- DOE Earth Systems Grid
- DARPA CoABS Grid
- NEESGrid
- DOH BIRN
- NSF iVDGL
10. EU Grid Projects
- DataGrid (CERN, ..)
- EuroGrid (Unicore)
- DataTag (TTT)
- Astrophysical Virtual Observatory
- GRIP (Globus/Unicore)
- GRIA (Industrial applications)
- GridLab (Cactus Toolkit)
- CrossGrid (Infrastructure Components)
- EGSO (Solar Physics)
11. National Grid Projects
- UK e-Science Grid
- Japan Grid Data Farm, ITBL
- Netherlands VLAM, PolderGrid
- Germany UNICORE, Grid proposal
- France Grid funding approved
- Italy INFN Grid
- Eire Grid proposals
- Switzerland - Grid proposal
- Hungary DemoGrid, Grid proposal
- ApGrid
12. UK e-Science Initiative
- £120M programme over 3 years
- £75M is for Grid applications in all areas of science and engineering
- £10M for a supercomputer upgrade
- £35M for development of industrial-strength Grid middleware
- Requires £20M of additional matching funds from industry
13. UK e-Science Grid
[Map of e-Science Centres: Edinburgh, Glasgow, Newcastle, Belfast, Manchester, DL (Daresbury Laboratory), Cambridge, Oxford, Hinxton, RAL, Cardiff, London, Southampton]
14. Generic Grid Middleware
- All e-Science Centres donate resources to form a UK national Grid
  - Supercomputers, clusters, storage, facilities
- All Centres will run the same Grid software
  - Starting point is Globus, Storage Resource Broker and Condor
- Work with the Global Grid Forum and major computing companies (IBM, Oracle, Microsoft, Sun, ...)
- Aim to industry-harden Grid software so it is capable of realizing the secure VO vision
15. IBM Grid Press Release (2/8/01)
- Interview with Irving Wladawsky-Berger:
- "Grid computing is a set of research management services that sit on top of the OS to link different systems together"
- "We will work with the Globus community to build this layer of software to help share resources"
- "All of our systems will be enabled to work with the grid, and all of our middleware will integrate with the software"
16. Particle Physics and Astronomy e-Science Projects
- GridPP
  - Links to the EU DataGrid, the CERN LHC Computing Project, the US GriPhyN and PPDG projects, and the iVDGL global Grid project
- AstroGrid
  - Links to the EU AVO and US NVO projects
17. GridPP Project (1)
- CERN LHC machine due to be completed by 2006
- The ATLAS and CMS experiments each involve more than 2000 physicists from more than 100 organisations in the USA, Europe and Asia
- Within the first year of operation from 2006 (7?), each experiment will need to store, access, process and analyze 10 PetaBytes of data
- Use hierarchical tiers of data and compute centres providing 200 Tflop/s
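A back-of-envelope check makes the scale of 10 PetaBytes per year concrete: sustained over a full year, it is roughly a third of a gigabyte every second, before any replication between tiers.

```python
# Back-of-envelope: 10 PB stored over one year of operation, expressed as a
# sustained data rate (decimal petabytes assumed).

PB = 10**15                      # bytes in a (decimal) petabyte
year_s = 365 * 24 * 3600         # seconds in a year

rate = 10 * PB / year_s          # bytes per second
print(f"{rate / 1e6:.0f} MB/s")  # ~317 MB/s sustained
```

Replication to the lower tiers of the hierarchy multiplies this further, which is why the tiered data/compute centre architecture is needed in the first place.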
18. GridPP Project (2)
- LHC data volume expected to reach 1 Exabyte and to require several PetaFlop/s of compute power by 2015
- Use simulated data production and analysis from 2002
  - 5% complexity data challenge (Dec 2002)
  - 20% of the 2007 CPU and 100% complexity (Dec 2005)
- Start of LHC operation 2006 (7?)
- Testbed Grid deployments from 2001
19. IRC e-HealthCare Grand Challenge
- Equator: technological innovation in physical and digital life
- AKT: Advanced Knowledge Technologies
- DIRC: Dependability of Computer-Based Systems
- From Medical Images and Signals to Clinical Information
20. EPSRC e-Science Projects (1)
- Comb-e-Chem: Structure-Property Mapping
  - Southampton, Bristol, Roche, Pfizer, IBM
- DAME: Distributed Aircraft Maintenance Environment
  - York, Oxford, Sheffield, Leeds, Rolls Royce
- RealityGrid: A Tool for Investigating Condensed Matter and Materials
  - QMW, Manchester, Edinburgh, IC, Loughborough, Oxford, Schlumberger, ...
21. EPSRC e-Science Projects (2)
- MyGrid: Personalised Extensible Environments for Data-Intensive in silico Experiments in Biology
  - Manchester, EBI, Southampton, Nottingham, Newcastle, Sheffield, GSK, AstraZeneca, IBM
- GEODISE: Grid-Enabled Optimisation and Design Search for Engineering
  - Southampton, Oxford, Manchester, BAE, Rolls Royce
- Discovery Net: High-Throughput Sensing Applications
  - Imperial College, Infosense, ...
22. Comb-e-Chem: Structure-Property Mapping
- Goal is to integrate structure and property data sources within a knowledge environment to find new chemical compounds with desirable properties
- Accumulate, integrate and model an extensive range of primary data from combinatorial methods
- Support for provenance and automation, including multimedia and metadata
- Southampton, Bristol, Cambridge Crystallographic Data Centre
- Roche Discovery, Pfizer, IBM
23. MyGrid: An e-Science Workbench
- Goal is to develop a workbench to support
  - the experimental process of data accumulation
  - use of community information
  - scientific collaboration
- Provide facilities for resource selection, data management and process enactment
- Bioinformatics applications
  - Functional genomics, database annotation
- Manchester, EBI, Newcastle, Nottingham, Sheffield, Southampton
- GSK, AstraZeneca, Merck, IBM, Sun, ...
24. Grid Database Requirements (1)
- Scalability
  - Store petabytes of data at TB/hr rates
  - Low response time for complex queries that retrieve data for further processing
  - Large number of clients needing high access throughput
- Grid standards for security, accounting, ...
  - GSI with digital certificates
- Data from multiple DBMSs
- Co-schedule database and compute servers
25. Grid Database Requirements (2)
- Handling unpredictable usage
  - Most existing DB applications have reasonably predictable access patterns, and usage of DB resources can be restricted
  - Typical commercial applications generate large numbers of small transactions from a large number of users
  - Grid applications can have a small number of large transactions needing more ad hoc access to DBMS resources
  - Much greater variations in time and resource usage
26. Grid Database Requirements (3)
- Metadata-driven access
  - Expect to need two-step access to data
  - Step 1: metadata search to locate the required data on one or more DBMSs
  - Step 2: data accessed and sent to a compute server for further analysis
  - The application writer does not know which specific DBMS is accessed in Step 2
  - Need a standard API for Grid-enabled DBMSs
- Multiple database integration
  - Support distributed queries and transactions
  - Scalability requirements
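The two-step pattern is easy to show in miniature: step 1 queries a metadata catalogue by attributes, step 2 fetches the data from whichever DBMS the catalogue names, so the application writer never hard-codes a specific database. Everything below (catalogue entries, store names, dataset IDs) is invented for illustration.

```python
# Hypothetical sketch of metadata-driven two-step data access.
# All catalogue entries, DBMS names and datasets are illustrative.

# Step-1 target: a metadata catalogue mapping dataset attributes to the
# DBMS that actually holds the data.
CATALOGUE = [
    {"dataset": "run-1001", "experiment": "atlas", "dbms": "db.site-a"},
    {"dataset": "run-1002", "experiment": "cms",   "dbms": "db.site-b"},
]

# Toy per-DBMS stores standing in for real database servers.
STORES = {
    "db.site-a": {"run-1001": [1, 2, 3]},
    "db.site-b": {"run-1002": [4, 5, 6]},
}

def locate(**attrs):
    """Step 1: metadata search. The caller names attributes, not a DBMS."""
    return [e for e in CATALOGUE
            if all(e.get(k) == v for k, v in attrs.items())]

def fetch(entry):
    """Step 2: pull the data from whichever DBMS the catalogue named,
    ready to ship to a compute server for further analysis."""
    return STORES[entry["dbms"]][entry["dataset"]]

hits = locate(experiment="atlas")
data = [fetch(h) for h in hits]
print(data)  # [[1, 2, 3]]
```

A standard API for Grid-enabled DBMSs would fix the shape of `locate` and `fetch` so that Step 2 works the same way against any back-end database.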
27. Grid-Service Interface to DBs (Thoughts of Paul Watson)
- Services would include:
- Metadata
  - Used by location and directory services
- Query
  - Use GridFTP; support streaming and computation co-scheduling?
- Transactions
  - Support distributed transactions via a virtual DBMS?
28. Grid-Service Interface to DBs (continued)
- Bulk loading
  - Use GridFTP?
- Scheduling
  - Allow DBMS and compute resources to be co-scheduled and bandwidth pre-allocated
  - Major challenge for DBMSs to support resource pre-allocation and management?
- Accounting
  - Provide information for Grid accounting and capacity planning
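The service list on these two slides can be condensed into an interface sketch. This is one possible shape written as an abstract Python class, not any real OGSA or Globus interface; method names merely paraphrase the slides' bullets.

```python
# Hypothetical interface sketch for a Grid-service front end to a DBMS,
# paraphrasing the service list above. Not a real standard or API.

from abc import ABC, abstractmethod

class GridDatabaseService(ABC):
    """One possible shape for a Grid-service interface to a database."""

    @abstractmethod
    def metadata(self):
        """Describe content/schema for location and directory services."""

    @abstractmethod
    def query(self, predicate, stream_to=None):
        """Run a query; optionally stream results toward a compute server."""

    @abstractmethod
    def bulk_load(self, source_url):
        """Ingest data in bulk (e.g. delivered over GridFTP)."""

    @abstractmethod
    def reserve(self, start, duration, resources):
        """Pre-allocate DBMS resources so database and compute servers
        can be co-scheduled."""

    @abstractmethod
    def usage(self):
        """Report consumption for Grid accounting and capacity planning."""

class InMemoryService(GridDatabaseService):
    """Minimal concrete implementation, purely for illustration."""
    def __init__(self):
        self.rows, self.reservations = [], []

    def metadata(self):
        return {"rows": len(self.rows)}

    def query(self, predicate, stream_to=None):
        hits = [r for r in self.rows if predicate(r)]
        if stream_to:
            for r in hits:
                stream_to(r)
        return hits

    def bulk_load(self, source_url):
        self.rows.extend({"src": source_url, "id": i} for i in range(3))

    def reserve(self, start, duration, resources):
        self.reservations.append((start, duration, resources))
        return len(self.reservations) - 1

    def usage(self):
        return {"stored_rows": len(self.rows)}

svc = InMemoryService()
svc.bulk_load("gridftp://example/data")
print(svc.metadata())  # {'rows': 3}
```

The hard parts flagged on the slides (distributed transactions, resource pre-allocation inside the DBMS) live behind `reserve` and would dominate any real implementation.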
29. Summary
- Application projects use clusters, supercomputers and data repositories
- Emphasis on support for data federation and annotation as much as computation
- Metadata and ontologies are key to higher-level Grid services
- For commercial success the Grid needs an interface to DBMSs