UK e-Science Future Infrastructure for Scientific Data Mining, Integration and Visualisation

Slides: 52
1
UK e-Science Future Infrastructure for
Scientific Data Mining, Integration and
Visualisation
Malcolm Atkinson, Director of
National e-Science Centre, www.nesc.ac.uk
25th October 2002, SDMIV workshop, e-Science
Institute, Edinburgh
2
Overview
  • UK e-Science
  • Reminder of Investment and Infrastructure
  • International e-Science
  • Examples and Collaboration
  • Data Access and Integration
  • Lego Bricks for Scientific Application Developers
  • Tailored Application and Computing Scientists
  • A Computer Scientist's Christmas List
  • Diversity and Opportunity
  • The Way Ahead

3
e-Science
  • Fundamentally about Collaboration
  • Sharing
  • Ideas
  • Thought processes and Stimuli
  • Effort
  • Resources
  • Requires
  • Communication
  • Common understanding Framework
  • Mechanisms for sharing fairly
  • Organisation and Infrastructure

Requires Trust
Scientists (Biologists) have done this for centuries
4
e-Science (take 2)
Text, digital media, structured, organised
curated data, computable models, visualisation,
shared instruments, shared systems, shared
administration,
  • Fundamentally about Collaboration
  • Sharing
  • Ideas
  • Thought processes and Stimuli
  • Effort
  • Resources
  • Requires
  • Communication
  • Common understanding Framework
  • Mechanisms for sharing fairly
  • Organisation and Infrastructure

Changing the ways Science is done
Nationally & Internationally Distributed,
Routine, Daily, Automated,
That Requires very Significant Investment in
Digital Systems and their Support
5
e-Science (take 3)
  • Fundamentally about Collaboration
  • Sharing
  • Ideas
  • Thought processes and Stimuli
  • Effort
  • Resources
  • Requires
  • Communication
  • Common understanding Framework
  • Mechanisms for sharing fairly
  • Organisation and Infrastructure

Digital networks, digital work-places, digital
instruments,
Metadata, ontologies, standards, shared curated
data, shared codes,
Common platforms, shared software, shared
training,
Citation, Authentication, Authorisation,
Accounting, Provenance, Policies,
Shared Provision of Platform,
The Grid SHOULD make this much easier
by providing a common, supported, high level of
Software and Organisational infrastructure
6
Grid Expectations
  • Persistence
  • Always there, Always Working, Always Supported
  • Stability
  • You can build on foundations that don't move
  • Trustworthy & Predictable
  • Honours commitments
  • Digital policies, digital contracts, security,
  • Data integrity, longevity and accessibility
  • Performance
  • High-level & Extensible
  • The capabilities you need are already there
  • Ubiquitous
  • Your collaborators use it

7
Grid Reality
Political, Economic & Technical issues to Solve
  • Persistence
  • Always there, Always Working, Always Supported
  • Stability
  • You can build on foundations that don't move
  • Trustworthy & Predictable
  • Honours commitments
  • Digital policies, digital contracts, security,
  • Data integrity, longevity and accessibility
  • Performance
  • High-level & Extensible
  • The capabilities you need are already there
  • Ubiquitous
  • Your collaborators use it

Early days, but Open Grid Services link with Web Services; GGF standardisation
Only Show in Town
Not yet, but very substantial global effort to achieve this
Good basis for extension; Commitment to basic functionality; WS Community effort
Global Industrial Rallying Cry; Must work with Web Services
8
UK Grid Network
National e-Science Centre
Edinburgh
Glasgow
Newcastle
Access Grid always-on video walls
Belfast
Manchester
Daresbury Lab
Cambridge
Oxford
Hinxton
RAL
Cardiff
London
Southampton
9
SuperJanet4, June 2002
20Gbps
10Gbps
Scotland via Glasgow
Scotland via Edinburgh
2.5Gbps
622Mbps
WorldCom Glasgow
WorldCom Edinburgh
155Mbps
NNW
NorMAN
YHMAN
WorldCom Manchester
WorldCom Leeds
Northern Ireland
EMMAN
MidMAN
WorldCom Reading
WorldCom London
EastNet
TVN
External Links
WorldCom Bristol
WorldCom Portsmouth
South Wales MAN
LMN
SWAN BWEMAN
Kentish MAN
Tony Hey July 2001
LeNSE
10
National e-Science Centre
  • Events
  • Workshops
  • Research Meetings
  • International Meetings
  • History of Events
  • GGF5
  • HPDC11
  • Summer school
  • > 50 workshops held
  • > 1000 people in total
  • Many return often
  • Planned Events
  • 25 workshops
  • Conferences to 2005
  • Visitors
  • 3 arrived
  • 4 arranged
  • International collaboration, visits & visitors
  • China
  • Argonne National Lab
  • SDSC
  • NCSA
  • Centre Projects
  • Pilot Projects
  • Regional Support
  • Research Projects
  • EPSRC, MRC, WT, SHEFC

Please use this Facility
11
A day in the life of NeSC
12
UCSF
UIUC
From Klaus Schulten, Center for Biomolecular
Modeling and Bioinformatics, Urbana-Champaign
13
DataGrid Testbed
(> 40)
Testbed Sites
Dubna
Moscow
Lund
Estec KNMI
RAL
Berlin
IPSL
Prague
Paris
Brno
CERN
Lyon
Santander
Milano
Grenoble
PD-LNL
Torino
Madrid
Marseille
BO-CNAF
Pisa
Lisboa
Barcelona
ESRIN
Roma
Valencia
Catania
Francois.Etienne@in2p3.fr - Antonia.Ghiselli@cnaf.infn.it
14
A Simplified Grid Anatomy
Scientific Application
Application Developers
Grid Plumbing & Security Infrastructure
Operations Team
Owners
15
The Crux
Keep all the (pink) groups HAPPY
Scientific Application
Application Developers
Grid Plumbing & Security Infrastructure
Operations Team
Owners
16
A SDMIV Grid Anatomy
SDMIV Users
Scientific Application
Grid Plumbing & Security Infrastructure
Data Providers & Data Curators
17
Database Growth
PDB protein structures
18
Data Mining: Science vs Commerce
  • Data in files: FTP a local copy / subset. ASCII or
    Binary.
  • Each scientist builds own analysis toolkit
  • Analysis is tcl script of toolkit on local data.
  • Some simple visualization tools: x vs y
  • Data in a database
  • Standard reports for standard things.
  • Report writers for non-standard things
  • GUI tools to explore data.
  • Decision trees
  • Clustering
  • Anomaly finders

Jim Gray UCSC April 2002
19
But some science is hitting a wall: FTP and GREP
are not adequate
  • You can GREP 1 MB in a second
  • You can GREP 1 GB in a minute
  • You can GREP 1 TB in 2 days
  • You can GREP 1 PB in 3 years.
  • Oh!, and 1 PB is about 10,000 disks
  • At some point you need indices to limit
    search, plus parallel data search and analysis
  • This is where databases can help
  • You can FTP 1 MB in 1 sec
  • You can FTP 1 GB / min (about $1/GB)
  • 1 TB takes 2 days and $1K
  • 1 PB takes 3 years and $1M

50,000 Kg 250 KW 60 Racks 120m2
Jim Gray UCSC April 2002
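The scan times quoted on this slide follow from simple arithmetic, and the case for indices can be shown in a few lines. The sketch below is illustrative only: it assumes decimal units and a sustained sequential rate of 10 MB/s (an assumption chosen to land near the petabyte figure; the slide's rounded numbers imply somewhat different rates at each scale), and the function names are hypothetical.

```python
import bisect
from typing import Sequence

SECONDS_PER_DAY = 86_400


def scan_seconds(num_bytes: float, bytes_per_second: float) -> float:
    """Time to read every byte once at a sustained sequential rate."""
    return num_bytes / bytes_per_second


def indexed_lookup(sorted_keys: Sequence[int], key: int) -> bool:
    """Binary search over a sorted index: O(log n) probes, not a full scan."""
    i = bisect.bisect_left(sorted_keys, key)
    return i < len(sorted_keys) and sorted_keys[i] == key


if __name__ == "__main__":
    rate = 10e6  # assumed sustained rate: 10 MB/s
    sizes = [("1 MB", 1e6), ("1 GB", 1e9), ("1 TB", 1e12), ("1 PB", 1e15)]
    for label, size in sizes:
        days = scan_seconds(size, rate) / SECONDS_PER_DAY
        print(f"{label}: {days:.4g} days to scan")
```

At the assumed rate a petabyte scan takes roughly three years, whereas a lookup in a sorted index of a billion keys needs only about 30 probes, which is the point of the "you need indices" bullet.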
20
OGSA OGSI
Grid Technology
Web Services
www.gridforum.org/ogsi-wg
www.gridforum.org/ogsa-wg
www.gridforum.org/
21
Web Services
  • Rapid Integration
  • Dynamic binding
  • Commercial Power
  • Financial & Political
  • Independence
  • Client from Service
  • Service from Client
  • Separation
  • Function from Delivery
  • Description
  • WSDL, WSC, WSEF,
  • Tools & Platforms
  • Java ONE, Visual .NET
  • WebSphere, Oracle,

www.w3c.org/TR/SOAP or TR/wsdl
22
Grid Technology
  • Virtual Organisations
  • Sharing & Collaboration
  • Security
  • Single Sign in, delegation
  • Distribution fast FTP
  • But Various Protocols
  • Resource Management
  • Discovery
  • Process Creation
  • Scheduling
  • Monitoring
  • Portability
  • Ubiquitous APIs & Modules
  • Government Agency Buy in
  • Industrial Buy in

Foster, I., Kesselman, C. and Tuecke, S., The
Anatomy of the Grid: Enabling Virtual
Organisations, Intl. J. Supercomputer
Applications, 15(3), 2001
http://www.gridforum.org/ogsi-wg
23
Open Grid Services Architecture
Industrial Commitment
Foster, I., Kesselman, C., Nick, J. and Tuecke,
S., The Physiology of the Grid: An Open Grid
Services Architecture for Distributed Systems
Integration
24
Scientific Data
  • Deluge of Data
  • Exponential growth
  • Doubling times: Astronomy 12 months; Bio-Sequences
    9 months; Functional Genomics 6 months; Bytes/dollar
    12 to 18 months
  • Not How big it is but

25
Scientific Data
  • Deluge of Data
  • Exponential growth
  • Doubling times: Astronomy 12 months; Bio-Sequences
    9 months; Functional Genomics 6 months; Bytes/dollar
    12 to 18 months
  • Not How big it is but
  • What you do with it
  • Sharing
  • Curation
  • Metadata
  • Automated movement, access & integration
  • Computational Access

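The doubling times on this slide can be turned into growth factors with one line of arithmetic: after t months, a quantity with doubling time d has grown by 2^(t/d). The sketch below is a rough illustration, not part of the original talk; the function names are hypothetical, and the 18-month figure used for storage is simply the slower end of the slide's "12 to 18 months" range.

```python
# Doubling times in months, taken from the slide.
DATA_DOUBLING_MONTHS = {
    "Astronomy": 12,
    "Bio-Sequences": 9,
    "Functional Genomics": 6,
}
# Assumption: slower end of the slide's 12 to 18 month range for bytes/dollar.
STORAGE_DOUBLING_MONTHS = 18


def growth_factor(months: float, doubling_months: float) -> float:
    """Multiplicative growth after `months` for a given doubling time."""
    return 2.0 ** (months / doubling_months)


def relative_cost_growth(data_doubling: float, storage_doubling: float,
                         months: float) -> float:
    """How much faster data volume grows than bytes-per-dollar improves."""
    return (growth_factor(months, data_doubling)
            / growth_factor(months, storage_doubling))


if __name__ == "__main__":
    horizon = 36  # months
    for field, doubling in DATA_DOUBLING_MONTHS.items():
        ratio = relative_cost_growth(doubling, STORAGE_DOUBLING_MONTHS, horizon)
        print(f"{field}: data x{growth_factor(horizon, doubling):.1f}, "
              f"storage cost x{ratio:.1f} over {horizon} months")
```

Under these assumptions, Functional Genomics data grows 64-fold in three years while bytes per dollar improves only 4-fold, so the cost of simply holding the data rises 16-fold.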
26
Scientific Data
  • Deluge of Data
  • Exponential growth
  • Doubling times: Astronomy 12 months; Bio-Sequences
    9 months; Functional Genomics 6 months; Bytes/dollar
    12 to 18 months
  • Not How big it is but
  • How you Embrace & Manage Change
  • The Database is a Knowledge chest
  • The Database is a Communication Hub
  • Autonomously Managed (Curated) change
  • An Essential part of e-BioMedical, Astronomical,
    ..., Science & Engineering

Data Federation & Integration is Hard
27
Wellcome Trust Cardiovascular Functional
Genomics
28
Data Access & Integration
  • Central to e-Science: Astronomy, Earth Sciences,
    Ecology, Biology, Medicine,
  • Collaboration
  • Shared Databases
  • Curated Knowledge
  • Accumulated Observations
  • Accumulated Simulations
  • Computation
  • Data mining
  • Input to models
  • Calibration of models
  • Presentation
  • Publication of results
  • Visualisation

29
GGF DAIS WG
  • Chairs
  • Norman Paton (Manchester Uni.)
  • Leanne Guy (CERN)
  • Dave Pearson (Oracle UK)
  • Activity
  • BoF GGF4 Toronto
  • WG Meeting GGF5 Edinburgh
  • Papers for GGF6
  • Workshops & Mail lists
  • Goals
  • Agree Standards for Database Access & Integration
  • Freely available reference implementations
  • OGSA-DAI one source focus for discussions

Norman Paton, Inderpal Narang, Leanne Guy, Susan
Malaika, Greg Riccardi,
http://www.cs.man.ac.uk/grid-db/
30
OGSA-DAI project
  • Lego kit for Data Access & Integration
  • Components for e-Science Applications
  • Accelerated Application Development
  • Multiple Data Models
  • Distributed Data
  • Access via Grid Proxies
  • Integration, Translation & Transformation
  • Open Source Reference Implementation
  • For DAIS-WG standard
  • Trigger for Component Construction
  • Start a community

31
OGSA-DAI Partners
IBM USA
EPCC NeSC
Glasgow
Newcastle
Belfast
Manchester
Daresbury Lab
Oxford
EPCC & NeSC, IBM UK, IBM USA, Manchester e-SC,
Newcastle e-SC, Oracle
Oracle
RAL
Cardiff
London
IBM Hursley
Southampton
£3 million, 18 months, started February 2002
32
Primary Components
33
Advanced Components
34
Composed Components
35
Composing Components
OGSA-DAI Component
Data Transport
OGSA-DAI Component
Data Transport
OGSA-DAI Component
Data Transport
Data Transport
36
DAI Key Components
GridDataService (GDS): Access to data & DB operations
GridDataServiceFactory (GDSF): Makes GDS & GDSF
GridDataServiceRegistry (GDSR): Discovery of GDS(F) & Data
GridDataTranslationService: Translates or Transforms Data
GridDataTransportDepot (GDTD): Data transport with persistence
Relational & XML models supported
Role-based Authorisation
Binary structured files
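The division of labour between registry, factory and data service on this slide can be illustrated with a small in-process sketch. This is not the OGSA-DAI API: the real components are Grid services with their own portTypes, whereas the classes below are plain Python objects, the method names (`register`, `find`, `create_service`, `perform`) are hypothetical, and an in-memory SQLite database stands in for the underlying data resource.

```python
import sqlite3


class GridDataService:
    """Stand-in for a GDS: gives access to one data resource."""

    def __init__(self, connection: sqlite3.Connection) -> None:
        self._conn = connection

    def perform(self, sql: str, params: tuple = ()) -> list:
        """Run one database operation and return any result rows."""
        return self._conn.execute(sql, params).fetchall()


class GridDataServiceFactory:
    """Stand-in for a GDSF: makes GDS instances for one data resource."""

    def __init__(self, database_path: str) -> None:
        self._path = database_path

    def create_service(self) -> GridDataService:
        return GridDataService(sqlite3.connect(self._path))


class GridDataServiceRegistry:
    """Stand-in for a GDSR: lets clients discover factories by name."""

    def __init__(self) -> None:
        self._factories = {}

    def register(self, name: str, factory: GridDataServiceFactory) -> None:
        self._factories[name] = factory

    def find(self, name: str) -> GridDataServiceFactory:
        return self._factories[name]


if __name__ == "__main__":
    registry = GridDataServiceRegistry()
    registry.register("demo", GridDataServiceFactory(":memory:"))

    # Client side: discover a factory, ask it for a service, then query.
    gds = registry.find("demo").create_service()
    gds.perform("CREATE TABLE proteins (name TEXT)")
    gds.perform("INSERT INTO proteins VALUES (?)", ("example",))
    print(gds.perform("SELECT name FROM proteins"))
```

The client never names a database directly: it discovers a factory through the registry and obtains a service from the factory, which is the indirection the three components provide.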
37
OGSA Relationship
38
DAI portType Usage
39
Distributed Query
40
OGSA-DAI Time Line
WS GSI UK support (> 100 downloads)
XML OGSA Prototypes for Early Adopters
Design Documents Demos for DAIS WG @ GGF5
XML OGSA Prototype Available
RDB GT2 / OGSA Prototypes Available
GGF6 WG Papers Prototypes
Ship Alpha Release for GT3 Integration
Presentation Beta @ GGF7
Productisation, RAMPS Extension
Feb 02
May 02
Jul 02
Sep 02
Dec 02
Feb 03
May 03
Sep 03
Phase 2 Starts
Phase 1 Starts
41
OGSA-DAI Summary
  • On Schedule & Going Well
  • Contributions via DAIS-WG @ GGF5 & 6
  • Releases with GT3 Releases scheduled
  • Status Early Days
  • Released prototypes
  • Tested Architectural Design
  • Using OGSA
  • Working with Early Adopter Pilot Projects
  • AstroGrid & MyGrid
  • First PRODUCT release Dec 02
  • Influence OGSA-DAI direction
  • Via DAIS-WG & Direct messages to us

42
Data Processing
  • Processing Characteristics
  • Well defined work flow
  • Correction, calibration, transformation, filtering,
    merging
  • Relatively static reference data
  • Stable processing functions (audited changes)
  • Periodic reprocessing from archive

Dave Pearson Provenance and Derivation workshop
18 Oct 02, Chicago
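A well-defined workflow of stable processing functions, rerun periodically from the archive, can be sketched as a fixed list of steps applied in order. The example below is a minimal illustration, not taken from the talk: the step names and the specific rules (dropping negative readings, a linear gain and offset, an upper bound of 100) are assumptions made only so the code runs.

```python
from functools import reduce
from typing import Callable, Dict, Iterable, List

Step = Callable[[List[float]], List[float]]


def correct(readings: List[float]) -> List[float]:
    """Correction: discard negative readings (assumed rule)."""
    return [r for r in readings if r >= 0.0]


def calibrate(readings: List[float]) -> List[float]:
    """Calibration: apply an assumed linear gain and offset."""
    gain, offset = 2.0, 1.0
    return [gain * r + offset for r in readings]


def keep_in_range(readings: List[float]) -> List[float]:
    """Filtering: keep calibrated values at or below an assumed limit."""
    return [r for r in readings if r <= 100.0]


# The workflow is fixed; changing it would be an audited event.
PIPELINE: List[Step] = [correct, calibrate, keep_in_range]


def run_pipeline(steps: Iterable[Step], raw: Iterable[float]) -> List[float]:
    """Apply each step in order to a copy of the raw data."""
    return reduce(lambda data, step: step(data), steps, list(raw))


def reprocess(archive: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Periodic reprocessing: rerun the same pipeline over archived raw data."""
    return {name: run_pipeline(PIPELINE, raw) for name, raw in archive.items()}


if __name__ == "__main__":
    print(reprocess({"run1": [1.0, -5.0, 2.0, 60.0]}))
```

Because the raw data stays in the archive and the steps are pure functions, reprocessing after a step is revised only requires rerunning `reprocess`.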
43
Analysis and Interpretation
  • Analysis Characteristics
  • Variable workflow
  • Standard functions
  • Standard and personal
    filtering and summarisation
  • Retain drill down capability

Dave Pearson Provenance and Derivation workshop
18 Oct 02, Chicago
44
Analysis and Interpretation
  • Conclusions/Inferences
  • Descriptions
  • Trends
  • Correlations
  • Relationships
  • Analysis and Interpretation Characteristics
  • Highly dynamic work flow
  • Multiple data types
  • Volatile data
  • Annotations, inferences, conclusions
  • Evidential reasoning
  • Shared multiple versions of truth
  • Periodic version consolidation

Dave Pearson Provenance and Derivation workshop
18 Oct 02, Chicago
45
Metadata Requirements
  • Technical Metadata
  • Direct referencing - Physical location and data
    schema/structure
  • Data currency/status - version, time stamping
  • Accreditation/Access permissions - Ownership
    (Dublin Core)
  • Query time/Governance - data volume, no. of
    records, access paths
  • Contextual Metadata
  • Logical referencing - physical data,
    semantic/syntactic ontologies
  • Lexical translation - Thesaurus, ontological
    mapping
  • Named derivations (summarisations)
  • Scope of Requirements
  • All science communities
  • Related to provenance

Dave Pearson Provenance and Derivation workshop
18 Oct 02, Chicago
46
Metadata Requirements
  • Data Versioning
  • Distinguish latest/agreed version of data
  • Maintain history record of change
  • Synchronise and mirror replicated data
  • Distinguish shared & personal interpretations
    and/or annotations
  • Provenance
  • Record of data processing - calibration,
    filtering, transformation
  • Record of workflow - methods, standards and
    protocols
  • Reasoning - evidential justification for
    inferences & conclusions
  • Scope of Requirements
  • All science communities
  • Includes Technical and Contextual Metadata

Dave Pearson Provenance and Derivation workshop
18 Oct 02, Chicago
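The versioning and provenance requirements on this slide suggest a simple data structure: a dataset holds a history of versions, and each version records the processing steps that produced it. The sketch below is one possible shape under stated assumptions, not a design from the talk; every class, field and method name is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class ProvenanceStep:
    operation: str           # e.g. "calibration", "filtering", "transformation"
    method: str              # workflow method, standard or protocol applied
    justification: str = ""  # evidential reasoning behind the step, if any


@dataclass
class DataVersion:
    version: int
    agreed: bool             # True for a community-agreed version
    steps: List[ProvenanceStep] = field(default_factory=list)


@dataclass
class VersionedDataset:
    name: str
    history: List[DataVersion] = field(default_factory=list)

    def add_version(self, steps: Iterable[ProvenanceStep],
                    agreed: bool = False) -> DataVersion:
        """Append a new version; earlier versions are never overwritten."""
        new = DataVersion(len(self.history) + 1, agreed, list(steps))
        self.history.append(new)
        return new

    def latest(self) -> Optional[DataVersion]:
        return self.history[-1] if self.history else None

    def latest_agreed(self) -> Optional[DataVersion]:
        """Most recent version marked as agreed, or None if there is none."""
        for candidate in reversed(self.history):
            if candidate.agreed:
                return candidate
        return None


if __name__ == "__main__":
    ds = VersionedDataset("survey")
    ds.add_version([ProvenanceStep("calibration", "lab protocol A")], agreed=True)
    ds.add_version([ProvenanceStep("filtering", "personal threshold")])
    print(ds.latest().version, ds.latest_agreed().version)
```

Keeping `latest` and `latest_agreed` separate is what lets a personal interpretation sit alongside the agreed version while the full history of change is retained.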
47
Provenance Issues
  • Schema evolution
  • Granularity of record
  • Processed v Derived
  • Inheritance
  • Lack of structured annotations, ontologies
  • Interactive analysis dynamic workflow
  • Multiple derived data sources
  • Context of usage
  • Best practice can change
  • Multiple versions of the truth
  • Evidential reasoning
  • Existing data applications
  • Where is the provenance record stored

Dave Pearson Provenance and Derivation workshop
18 Oct 02, Chicago
48
Collaborative Annotation
  • See DAS
  • Distributed Annotation Service
  • Challenges
  • Autonomy
  • Selective viewing
  • Identification
  • Provenance
  • Derivation

49
Biomedical e-Scientists
  • Is this one species?
  • Understanding bird energy
  • Understanding a river / ocean interaction
  • Understanding a biochemical pathway
  • Understanding a cell
  • Understanding a Heart or Brain
  • Understanding Rhododendra
  • Understanding Evolution
  • No One-Size fits all solutions
  • But sharable & re-usable components

50
Opportunities
  • Many, many
  • More than we can address
  • Compute needs
  • Data management needs
  • Data integration needs
  • Must choose some pioneers
  • To meet a range of common requirements
  • To provoke rich high-level platform
  • To generate re-usable components
  • A Long-Term Commitment Needed

51
Advancing SDMIV Grid
SDMIV Users
Scientific Application
SDMIV (Grid) Application Component Library
Grid Plumbing & Security Infrastructure
52
Summary
  • e-Science
  • Data as well as Compute Challenges
  • Needed to be put together
  • Need ubiquitous supported consistent platforms
  • Grid
  • A (potentially) invaluable platform
  • Only show in town
  • Data Integration
  • Hard, so Develop & Use Standard kit of parts
  • Started to build the kit
  • No ready made general integration
  • Combines application & computing science
  • Opportunities
  • No one-size fits all, but re-usable subsystems
  • Invest in wider range of Problem driven
    pioneering
  • Strategic choices needed