NeSC Data Projects and Initiatives

1
NeSC Data Projects and Initiatives
  • Dr. Dave Berry
  • Research Manager

2
Contents
  • The Data Deluge
  • Web Services
  • The DAI vision
  • The OGSA-DAI Project and GGF
  • The OGSA-DAI Software
  • Edikt
  • Other relevant projects in the UK

3
Acknowledgements
  • This talk includes material prepared by
  • The OGSA-DAI project
  • The e-Diamond project
  • The BRIDGES project
  • The GGF OGSA Working Group
  • and others

4
The Data Deluge
  • Entering an age of data
      • CERN LHC will generate 1 GB/s, 10 PB/year
      • VLBA (NRAO) generates 1 GB/s today
      • Pixar generates 100 TB per movie
  • Data stored in many different ways
      • Relational databases
      • XML databases
      • Flat files
  • Need ways to facilitate
      • Data discovery
      • Data access
      • Data integration

[Image labels: Mont Blanc (4810 m), Downtown Geneva]
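
The two LHC figures on this slide only agree under an assumption about running time: 1 GB/s sustained for a whole calendar year would be about 31.5 PB, so 10 PB/year implies roughly 10^7 seconds of data taking per year. The sketch below does that arithmetic. Decimal units (1 PB = 10^6 GB) and a constant rate are assumptions made for illustration; neither is stated on the slide.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 s in a non-leap year
GB_PER_PB = 1_000_000               # decimal units assumed


def yearly_volume_pb(rate_gb_per_s: float, active_seconds: float) -> float:
    """Volume in PB produced at a constant rate over the active time."""
    return rate_gb_per_s * active_seconds / GB_PER_PB


def implied_active_seconds(rate_gb_per_s: float, volume_pb: float) -> float:
    """Seconds of running needed to reach a volume at a constant rate."""
    return volume_pb * GB_PER_PB / rate_gb_per_s


full_year_pb = yearly_volume_pb(1.0, SECONDS_PER_YEAR)
active_s = implied_active_seconds(1.0, 10.0)

print(f"1 GB/s for a full year: {full_year_pb:.1f} PB")
print(f"10 PB/year at 1 GB/s implies {active_s:.0f} s of running "
      f"({active_s / SECONDS_PER_YEAR:.0%} of the year)")
```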
5
Astronomical Databases
Data and images courtesy of Alex Szalay, Johns
Hopkins
  • Number and sizes of data sets as of mid-2002,
    grouped by wavelength
  • 12-waveband coverage of large areas of the
    sky
  • Total about 200 TB data
  • Doubling every 12 months
  • Largest catalogues near 1 billion objects
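
To make "doubling every 12 months" concrete, here is a small extrapolation from the 200 TB mid-2002 total. It is a sketch that assumes the doubling rate simply continues unchanged; the projected figures are arithmetic consequences of that assumption, not numbers from the talk.

```python
def projected_size_tb(initial_tb: float, years_elapsed: float,
                      doubling_time_years: float = 1.0) -> float:
    """Exponential growth: the size doubles once per doubling time."""
    return initial_tb * 2 ** (years_elapsed / doubling_time_years)


for year in range(2002, 2008):
    size = projected_size_tb(200.0, year - 2002)
    print(f"mid-{year}: {size:,.0f} TB")
```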

6
Bioinformatics Databases
PDB Content Growth
  • Bibliographic (MedLine, ...)
  • Amino Acid Seq (SWISS-PROT, ...)
  • 3D Molecular Structure (PDB, ...)
  • Nucleotide Seq (GenBank, EMBL, ...)
  • Biochemical Pathways (KEGG, WIT)
  • Molecular Classifications (SCOP, CATH, ...)
  • Motif Libraries (PROSITE, Blocks, ...)

7
Web Services
  • Using the protocols and ideas that have made the
    web a success for humans
  • And applying them to distributed programming
  • HTTP
  • Single networking port
  • Autonomy and failure handling
  • Open standards
  • Tools and platforms
      • Apache Axis
      • WebSphere, .NET, Oracle Application Server, Sun
        ONE, ...

8
From Browsing to Programming
              Browsing the web          Programming the web
Readers       People                    Software
Discovery     Google, Altavista, ...    UDDI, ...
Description   N/A                       WSDL
Operations    Get, post, ...            Service-specific
Protocol      HTTP                      SOAP over HTTP
Format        HTML, XHTML               XML Schema
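
To illustrate the "SOAP over HTTP" row, the sketch below builds a SOAP 1.1 envelope and wraps it in an HTTP POST request object without sending it. The service namespace, operation name, parameter and endpoint URL are all hypothetical; only the SOAP 1.1 envelope namespace and the Python standard library calls are real.

```python
import urllib.request
import xml.etree.ElementTree as ET

SOAP_ENV_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope
SERVICE_NS = "urn:example:catalogue"                       # hypothetical service


def build_soap_request(operation: str, params: dict) -> bytes:
    """Serialise one service-specific operation call as a SOAP envelope."""
    ET.register_namespace("soapenv", SOAP_ENV_NS)
    envelope = ET.Element(f"{{{SOAP_ENV_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV_NS}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)


payload = build_soap_request("findObjects", {"waveband": "infrared"})

# The envelope travels as the body of an ordinary HTTP POST.
# The request is only constructed here, never sent.
request = urllib.request.Request(
    "http://localhost:8080/services/Catalogue",  # hypothetical endpoint
    data=payload,
    headers={"Content-Type": "text/xml; charset=utf-8",
             "SOAPAction": '""'},
    method="POST",
)
print(payload.decode("utf-8"))
```

In practice the message shapes would come from the service's WSDL and be generated by a toolkit rather than written by hand.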
9
A Perspective on WS Specifications
10
Open Grid Services Architecture
The architecture of the Global Grid Forum
[Diagram labels: Access resource; Manage resource; Share resource;
Continuous Availability; Applications on demand; Resources on demand;
Secure and universal access; Global Accessibility; Business
integration; Vast resource scalability; Web Services; Grid Protocols]
11
GGF11 OGSA specification informational document
[Diagram: OGSA service groups and capabilities]
Service groups: Context Services, Information Services, Data
Services, Infrastructure Services, Execution Mgmt Services, Resource
Mgmt Services, Self Mgmt Services, Security Services
Other labels: Cataloging, Provisioning, VO Mgmt, Integration, Policy
Mgmt, Access, Event Mgmt, Troubleshooting, Discovery, Logging, WSRF,
WSN, WSDM, Job Mgmt, Execution Planning, Workflow Mgmt, Workload
Mgmt, Application Mgmt, Naming, Deployment, Configuration,
Reservation, Heterogeneity Mgmt, Authentication, Optimization,
Authorization, Service Level Attainment, Integrity, QoS Mgmt,
Boundary Traversal
12
Data Access and Integration
  • Web Services for querying and integrating
    structured data resources
  • The foundation framework for
      • Building tailored DAI applications
      • Higher-level services
          • Replication: data located in multiple locations
          • Federation: composition of multiple sources
          • Provenance: how was data generated?

13
The OGSA-DAI Project
Funded by the Grid Core Programme
  • OGSA-DAI
      • 3 million, 18 months, from Feb 2002
      • Three major releases, three interim releases
  • DAIT (DAI-Two)
      • Keep the OGSA-DAI brand name
      • 1.5 million, 24 months, from Oct 2003
      • Four major releases
14
DAI in GGF and OGSA
  • Data Access and Integration Services WG
      • Strong involvement from OGSA-DAI members
      • Standardise the interfaces: WS-DAI
      • OGSA-DAI: a reference implementation
      • Experience informing specification work
  • OGSA WG Data Design Team
      • Designing the data-oriented aspects of OGSA
      • Created after GGF10 (March 2004)
      • Led by NeSC

15
OGSA Design Teams
Data Service design team
Information Service design team
EMS design team
Naming design team
OGSA-WG
Self Mgmt design team
Resource Mgmt design team
Security Service design team
Core (roadmap) design team
16
Data Services design team
  • Informal domain expert groups within OGSA
  • May include co-chairs of other WG/RGs
  • Output is included in OGSA specification

DAIS-WG
OGSA Data Service Design team
GSM-WG
GFS-WG
OGSA-WG
Telecons, F2F meetings
Info-D WG
ADF, OREP, ...
17
OGSA v2 Document Deliverables
Root Documents
Glossary
Use case doc
Architecture v2
Design team Documents
Service descriptions
Scenarios
Working Group Specifications
GGF Recommendation documents
18
How OGSA-DAI works
19
OGSA-DAI compared to JDBC
  • Language independence at the client end
  • Platform independence
  • Do not have to worry about connection technology,
    drivers, etc.
  • Can handle XML resources
  • Can embed additional functionality at the service
    end
      • Transformations
      • Third party delivery
      • Avoiding unnecessary data movement
  • Provision of Metadata is powerful
  • Usefulness of the Registry for service discovery
      • Dynamic service binding process

20
Future DAI Services

[Diagram labels: Client, Analyst, Registry, Factory, Data Access &
Integration master, GDS, GDTS, XML database, Relational database,
Sx, Sy]
  • 1a. Request to Registry for sources of data about "x" and "y"
  • 1b. Registry responds with Factory handle
  • 2a. Request to Factory for access and integration from
    resources Sx and Sy
  • 2b. Factory creates GridDataServices network
  • 2c. Factory returns handle of GDS to client
  • 3a. Client submits sequence of scripts, each with a set of
    queries, to GDS with XPath, SQL, etc.
  • 3b. Client tells analyst
  • 3c. Sequences of result sets returned to analyst as formatted
    binary described in a standard XML notation
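
The registry, factory and GDS interaction on this slide can be sketched as a toy, in-process simulation. This is not the OGSA-DAI API: the class names, method names and the `obs` table are hypothetical, an in-memory SQLite database stands in for the relational resource, and only a single resource ("Sx") is modelled.

```python
import sqlite3


class GridDataService:
    """Toy stand-in for a GDS bound to one relational resource."""

    def __init__(self, rows):
        self._db = sqlite3.connect(":memory:")
        self._db.execute("CREATE TABLE obs (name TEXT, value REAL)")
        self._db.executemany("INSERT INTO obs VALUES (?, ?)", rows)

    def perform(self, queries):
        # Steps 3a and 3c: a sequence of queries in, result sets out.
        return [self._db.execute(sql).fetchall() for sql in queries]


class Factory:
    """Creates a GDS on request (steps 2a to 2c)."""

    def __init__(self, resources):
        self._resources = resources  # resource name -> rows it holds

    def create_service(self, resource_name):
        return GridDataService(self._resources[resource_name])


class Registry:
    """Maps a topic to the factory that can serve it (steps 1a, 1b)."""

    def __init__(self):
        self._factories = {}

    def register(self, topic, factory):
        self._factories[topic] = factory

    def find_factory(self, topic):
        return self._factories[topic]


# One hypothetical resource, "Sx", advertised under topic "x".
registry = Registry()
registry.register("x", Factory({"Sx": [("alpha", 1.0), ("beta", 2.5)]}))

factory = registry.find_factory("x")   # 1a, 1b
gds = factory.create_service("Sx")     # 2a, 2b, 2c
results = gds.perform([                # 3a, 3c
    "SELECT name FROM obs ORDER BY name",
    "SELECT SUM(value) FROM obs",
])
print(results)
```

The point of the indirection is that the client only ever names a topic and a resource; which service instance answers is decided at run time.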

21
Activities are the drivers
  • Express a task to be performed by a GDS
  • Three broad classes of activities
      • Statement
      • Transformations
      • Delivery
  • Extensible
      • Easy to add new functionality
      • Does not require modification to the service
        interface
      • Extensions operate within the OGSA-DAI framework
  • Functionality
      • Implemented at the service
      • Work where the data is (no need to move data
        back)
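
The extensibility claim, that new activities can be added without changing the service interface, can be illustrated with a small plug-in registry. This is a sketch of the idea only; the decorator, the activity names and the single `perform` operation are hypothetical and do not reproduce OGSA-DAI's actual classes.

```python
from typing import Callable, Dict, List

# Activity name -> implementation. All names here are hypothetical.
ACTIVITIES: Dict[str, Callable[[list], list]] = {}


def activity(name: str):
    """Decorator that plugs a function into the framework under a name."""
    def register(fn):
        ACTIVITIES[name] = fn
        return fn
    return register


def perform(activity_names: List[str]) -> list:
    """The one fixed service operation: run the named activities in order."""
    data: list = []
    for name in activity_names:
        data = ACTIVITIES[name](data)
    return data


@activity("statement")
def statement(_previous: list) -> list:
    # Canned rows standing in for a query result.
    return [("beta", 2), ("alpha", 1)]


@activity("sortTransform")
def sort_transform(rows: list) -> list:
    return sorted(rows)


@activity("delivery")
def delivery(rows: list) -> list:
    # A real delivery would ship the data somewhere; here it is returned.
    return rows


# Extension: a new activity is registered later; perform() is unchanged.
@activity("upperCaseTransform")
def upper_case_transform(rows: list) -> list:
    return [(name.upper(), value) for name, value in rows]


output = perform(["statement", "sortTransform",
                  "upperCaseTransform", "delivery"])
print(output)
```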

22
OGSA-DAI Deck
23
Building Applications
  • Activities are grouped together
      • Perform document
  • Data can flow between activities
      • Optimisation
      • Avoids multiple message exchanges
  • Can deliver to other GDSs
      • Prerequisite for data integration
  • Base middleware for projects requiring data
    access
      • Some capability for data integration
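
A rough picture of a perform document: one request lists several activities, and named streams carry the output of one activity to the input of the next, so the client makes a single call instead of one per step. The document structure and the activity type names below are invented for illustration (the real perform document is XML with its own schema); SQLite, CSV and gzip are used simply because they are in the Python standard library.

```python
import csv
import gzip
import io
import sqlite3


def perform(db, document):
    """Run every activity of one request; named streams link them."""
    streams = {}
    for step in document:
        kind = step["type"]
        if kind == "sqlQuery":
            streams[step["output"]] = db.execute(step["sql"]).fetchall()
        elif kind == "csvTransform":
            buffer = io.StringIO()
            csv.writer(buffer).writerows(streams[step["input"]])
            streams[step["output"]] = buffer.getvalue()
        elif kind == "gzipTransform":
            text = streams[step["input"]]
            streams[step["output"]] = gzip.compress(text.encode("utf-8"))
        elif kind == "deliver":
            return streams[step["input"]]
        else:
            raise ValueError(f"unknown activity type: {kind}")
    return None


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE obs (name TEXT, value REAL)")
db.executemany("INSERT INTO obs VALUES (?, ?)",
               [("alpha", 1.0), ("beta", 2.5)])

document = [
    {"type": "sqlQuery", "output": "rows",
     "sql": "SELECT name, value FROM obs ORDER BY name"},
    {"type": "csvTransform", "input": "rows", "output": "csv"},
    {"type": "gzipTransform", "input": "csv", "output": "packed"},
    {"type": "deliver", "input": "packed"},
]
payload = perform(db, document)
print(gzip.decompress(payload).decode("utf-8"))
```

Query, transformation and compression all run next to the data; only the compressed result crosses the network.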

24
Release 4, April 2004
  • Provides Data Access components, an extensible
    framework for building applications and some
    integration components
  • Built on top of Globus Toolkit 3.2
  • Supports relational, XML and some files
      • MySQL, Oracle, DB2, SQL Server, Postgres,
        Xindice, CSV
  • Supports various delivery options
      • SOAP, FTP, GridFTP, HTTP, files, email,
        inter-service
  • Supports various transforms
      • XSLT, ZIP, GZip
  • Supports message level security using X.509
    certificates
  • Client Toolkit library for application developers
  • GUI data browser (contributed by FirstDIG
    project)
  • Separate Distributed Query Processing components
  • Comprehensive documentation and tutorials in
    XHTML format

25
Downloads by Release
2746 downloads (4.7 downloads a day)
26
Downloads by country
792 registered users at 23/8/04
27
Release 5, October 2004
  • Re-engineered interface-independent core OGSA-DAI
    functionality.
  • Improved dependability and security integration.
  • New file data resources representing flat files
    queried using full text searches (e.g. EMBL
    format).
  • Installation and Configuration Wizard, including
    all-in-one installer
  • Improved Data Browser which allows XPath
    querying.
  • Set of standard benchmarks.
  • JSP Quick View interface.
  • Support for other databases (e.g. Access, eXist,
    HSQL).

28
Release 6, April 2006
  • Data Integration applications supporting
    identified scenarios
  • OGSA-DQP as an integrated part of release
  • Fully compliant JDBC Driver for OGSA-DAI
  • Support for WS-Security implementations
  • Support for stored procedures on all supported
    databases
  • Improved support for different database specific
    SQL types
  • SQL translation between vendor dialects for
    subset of queries
  • Support for XQuery data resources
  • We expect to comply with a version of the
    emerging DAIS specification at this release.
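
As a flavour of what "SQL translation between vendor dialects for subset of queries" can mean, the sketch below rewrites a trailing `LIMIT n` clause (MySQL and PostgreSQL syntax) into the DB2 and SQL Server forms. It is a hypothetical toy, not how OGSA-DAI does it: it uses a regular expression rather than a SQL parser, and it refuses anything outside the narrow subset it understands.

```python
import re

_LIMIT = re.compile(r"\s+LIMIT\s+(\d+)\s*;?\s*$", re.IGNORECASE)
_SELECT = re.compile(r"^\s*SELECT\s+", re.IGNORECASE)
_SELECT_QUANTIFIED = re.compile(r"^\s*SELECT\s+(DISTINCT|ALL)\b", re.IGNORECASE)


def translate_limit(sql: str, target: str) -> str:
    """Rewrite a trailing LIMIT clause for the target dialect."""
    match = _LIMIT.search(sql)
    if match is None:
        return sql  # nothing to translate
    count = match.group(1)
    head = sql[:match.start()]
    if target == "db2":
        return f"{head} FETCH FIRST {count} ROWS ONLY"
    if target == "sqlserver":
        # Only plain SELECT is handled; DISTINCT/ALL need different placement.
        if not _SELECT.match(head) or _SELECT_QUANTIFIED.match(head):
            raise ValueError("query is outside the supported subset")
        return _SELECT.sub(f"SELECT TOP {count} ", head, count=1)
    raise ValueError(f"unsupported target dialect: {target}")


query = "SELECT name FROM gene ORDER BY name LIMIT 10"
print(translate_limit(query, "db2"))
print(translate_limit(query, "sqlserver"))
```

Rejecting unsupported queries outright, rather than guessing, is what keeps a subset translator safe to use.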

29
Who is Using OGSA-DAI?
N2Grid (http://www.cs.univie.ac.at/institute/index.html?project-8080)
Bridges (http://www.brc.dcs.gla.ac.uk/projects/bridges/)
BioSimGrid (http://www.biosimgrid.org/)
INWA (http://www.epcc.ed.ac.uk/projects/inwa/)
BioGrid (http://www.biogrid.jp/)
AstroGrid (http://www.astrogrid.org/)
eDiaMoND (http://www.ediamond.ox.ac.uk/)
OGSA-DAI (http://www.ogsadai.org.uk)
GEON (http://www.geongrid.org/)
myGrid (http://www.mygrid.org.uk/)
MCS (http://www.isi.edu/deelman/MCS/)
ODD-Genes (http://www.epcc.ed.ac.uk/oddgenes/)
OGSA-WebDB (http://www.gtrc.aist.go.jp/dbgrid/)
GridMiner (http://www.gridminer.org/)
FirstDig (http://www.epcc.ed.ac.uk/firstdig/)
GeneGrid (http://www.qub.ac.uk/escience/projects.phpgenegrid)
IU RGRBench (http://www.cs.indiana.edu/plale/projects/RGR/OGSA-DAI.html)
30
Project classification
31
Edikt
Requirements analysis
Technology matchmaking
Edikt project
Gap filling
Rigorous engineering
  • The team: 8 professional software engineers,
    support staff, project manager, commercialisation
    manager, architect, and SAB
  • SHEFC funded research and development grant
  • 3 years funding, May 2002 to 2005
  • 3 more years funding upon successful project and
    review

32
ELDAS Data Access Service
Grid User1
Grid User2
Java Framework
Another (partial) implementation of the GGF
WS-DAI specifications
ELDAS
EJB - DAS
DB2 DB
MySQL DB
Xindice DB
Oracle 9i DB
  • Implemented using Enterprise Java Beans
  • Data Access Components interface to distinct
    DBMSs
  • Accessible as a grid data service or a web data
    service

33
BinX: accessing legacy binary data
  • The Problem
      • Many binary data files
      • Applications must know the data format
      • Binary data formats are machine-specific
  • The Solution
      • Write a stand-aside format description in XML
      • Provide a library to
          • Interpret the description
          • Provide file access across different machines
      • Build higher-level services

[Diagram labels: simulations, Binary Data File (x3), BinX Library,
e-Science Application]
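
The idea of a stand-aside description can be sketched in a few lines: an XML document states the byte order and field types, and a small library turns it into a reader, so the application never hard-codes the layout. The XML vocabulary below (`dataset`, `field`, `byteOrder`) is made up for this example and is not the BinX schema; the decoding uses Python's standard `struct` module.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical description format, kept separate from the binary data.
DESCRIPTION = """
<dataset byteOrder="bigEndian">
  <field name="step" type="int32"/>
  <field name="temperature" type="float64"/>
</dataset>
"""

_TYPE_CODES = {"int32": "i", "float64": "d"}
_ORDER_CODES = {"bigEndian": ">", "littleEndian": "<"}


def compile_format(description_xml: str):
    """Turn the XML description into a struct layout plus field names."""
    root = ET.fromstring(description_xml.strip())
    fields = root.findall("field")
    codes = "".join(_TYPE_CODES[f.get("type")] for f in fields)
    layout = struct.Struct(_ORDER_CODES[root.get("byteOrder")] + codes)
    return layout, [f.get("name") for f in fields]


def read_records(data: bytes, description_xml: str):
    """Decode fixed-size records the same way on any machine."""
    layout, names = compile_format(description_xml)
    return [dict(zip(names, values)) for values in layout.iter_unpack(data)]


# Two records as a big-endian machine would have written them.
raw = struct.pack(">id", 1, 20.5) + struct.pack(">id", 2, 21.0)
records = read_records(raw, DESCRIPTION)
print(records)
```

Because the byte order comes from the description rather than from the host, the same file decodes identically on little-endian and big-endian machines.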
34
Mammography
A prototype of a national database of
mammographic images in support of the UK breast
screening programme
Temporal mammography
Computer Aided Detection
Standard Mammo Format
Mammograms have different appearances, depending
on image settings and acquisition systems
3D View
36
The BRIDGES Project
  • Biomedical Research Informatics Delivered by Grid
    Enabled Services
  • NeSC (Edinburgh and Glasgow) and IBM
  • www.brc.dcs.gla.ac.uk/projects/bridges
  • Supporting project for CFG project
      • Generating data on hypertension
      • Rat, Mouse, Human genome databases
  • Variety of tools used
      • BLAST, BLAT, Gene Prediction, visualisation, ...
  • Variety of data sources and formats
      • Microarray data, genome DBs, project partner
        research data, medical records, ...
  • Aim is integrated infrastructure supporting
      • Data federation
      • Security

37
BRIDGES
VO Authorisation
38
INWA Project
  • Innovation Node Western Australia
  • Informing Business and Regional Policy:
    Grid-enabled fusion of global data and local
    knowledge
  • Involved 10 partners (6 UK, 4 Australia)
  • Aim
      • Data mine commercially sensitive data
      • Security an absolute MUST
  • Employ Grid technologies
      • Need access to data and computational resources
  • OGSA-DAI
      • Access data resources
  • SunDCG's TOG (Transfer-queue Over Globus)
      • Handle job submission to analyse micro array data

39
INWA
40
Further Information on OGSA-DAI
  • The OGSA-DAI Project Site
      • http://www.ogsadai.org.uk
  • The DAIS-WG site
      • http://cs.man.ac.uk/grid-db
  • OGSA-DAI Users Mailing list
      • users@ogsadai.org.uk
      • General discussion on grid DAI matters
  • Formal support for OGSA-DAI releases
      • http://www.ogsadai.org.uk/support
      • support@ogsadai.org.uk
  • OGSA-DAI training courses