Client Toolkit - PowerPoint PPT Presentation

Title: Client Toolkit


1
OGSA-DAI Data Access and Integration for the Grid
Neil Chue Hong (N.ChueHong_at_epcc.ed.ac.uk)
Amy Krause (A.Krause_at_epcc.ed.ac.uk)
2
Overview
  • Motivation
  • Goals
  • Partners
  • Features
  • Projects
  • Further information

3
OGSA-DAI Motivation
  • Entering an age of data
  • Data Explosion
  • CERN LHC will generate 1 GB/s, 10 PB/year
  • VLBA (NRAO) generates 1 GB/s today
  • Pixar generates 100 TB per movie
  • Storage getting cheaper
  • Data stored in many different ways
  • Data resources
  • Relational databases
  • XML databases
  • Flat files
  • Need ways to facilitate
  • Data discovery
  • Data access
  • Data integration
  • Empower e-Business and e-Science
  • The Grid is a vehicle for achieving this

4
Goals for OGSA-DAI
  • Aim to deliver application mechanisms that
  • Meet the data requirements of Grid applications
  • Functionality, performance and reliability
  • Reduce development cost of data centric Grid
    applications
  • Provide consistent interfaces to data resources
  • Acceptable and supportable by database providers
  • Trustable, imposed demand is acceptable, etc.
  • Provide a standard framework that satisfies
    standard requirements
  • A base for developing higher-level services
  • Data federation
  • Distributed query processing
  • Data mining
  • Data visualisation

5
Integration Scenario
  • A patient moves hospital

Amalgamated patient record built from three data sources (Oracle, CSV file, DB2):
  • A (PID, name, address, DOB)
  • B (PID, first_contact)
  • C (PID, first_name, last_name, address, first_contact, DOB)
6
Why OGSA-DAI?
  • Why use OGSA-DAI over JDBC?
  • Language independence at the client end
  • Do not need to use Java
  • Platform independence
  • Do not have to worry about connection technology
    and drivers
  • Can handle XML and file resources
  • Can embed additional functionality at the service
    end
  • Transformations, Compression, Third party
    delivery
  • Avoiding unnecessary data movement
  • Provision of Metadata is powerful
  • Usefulness of the Registry for service discovery
  • Dynamic service binding process
  • The quickest way to make data accessible on the
    Grid
  • Installation and configuration of OGSA-DAI is
    fast and straightforward

7
Project Partners
  • Funded by the Grid Core Programme
  • OGSA-DAI
  • 3 million, 18 months, from Feb 2002
  • Three major releases, three interim releases
  • DAIT (DAI-Two)
  • Keep the OGSA-DAI brand name
  • 1.5 million, 24 months, from Oct 2003
  • Four major releases
  • GGF DAIS WG
  • Strong involvement
  • Standardise the interfaces
  • OGSA-DAI to be a reference implementation
8
Core features
  • An extensible framework for building applications
  • Supports relational databases, XML databases and some file formats
  • MySQL, Oracle, DB2, SQL Server, PostgreSQL,
    Xindice, CSV, EMBL
  • Supports various delivery options
  • SOAP, FTP, GridFTP, HTTP, files, email,
    inter-service
  • Supports various transforms
  • XSLT, ZIP, GZip
  • Supports message-level security using X.509
    certificates
  • Client Toolkit library for application developers
  • Comprehensive documentation and tutorials
  • Third production release is coming in November
  • OGSI/GT3 based
  • Also previews of WS-I and WS-RF/GT4 releases

9
Activities are the drivers
  • Express a task to be performed by a GDS (Grid Data Service)
  • Three broad classes of activities
  • Statement
  • Transformations
  • Delivery
  • Extensible
  • Easy to add new functionality
  • Does not require modification to the service
    interface
  • Extensions operate within the OGSA-DAI framework
  • Functionality
  • Implemented at the service
  • Works where the data is (no need to move
    data back)
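
The three activity classes above can be pictured as a chain in which each activity consumes the output of the previous one. The following is only a minimal sketch of that idea: the `Activity` interface and `ActivityPipeline` class are hypothetical names invented for illustration and are not the OGSA-DAI interfaces.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical stand-in for an activity; not an OGSA-DAI type. */
interface Activity {
    /** Consumes the previous activity's output and returns its own. */
    String process(String input);
}

/** Hypothetical chain of activities executed in the order they were added. */
final class ActivityPipeline {
    private final List<Activity> activities = new ArrayList<>();

    /** New functionality is added by supplying another Activity; this class does not change. */
    public ActivityPipeline add(Activity activity) {
        activities.add(activity);
        return this;
    }

    /** Runs every activity in order, feeding each output into the next. */
    public String perform() {
        String data = "";
        for (Activity activity : activities) {
            data = activity.process(data);
        }
        return data;
    }

    public static void main(String[] args) {
        StringBuilder delivered = new StringBuilder();
        new ActivityPipeline()
            .add(in -> "alice,bob")                           // statement: produce data
            .add(in -> in.toUpperCase(Locale.ROOT))           // transformation
            .add(in -> { delivered.append(in); return in; })  // delivery: hand data to a sink
            .perform();
        System.out.println(delivered);
    }
}
```

The point of the sketch is the extension model: a new kind of activity is one more implementation of the interface, and the `perform` entry point stays the same.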

10
OGSA-DAI Deck
11
Client Toolkit
  • Why? Nobody wants to write XML!
  • A programming API which makes writing
    applications easier
  • Now: Java
  • Next: Perl, C, C#, ML!?

// Create a query
SQLQuery query = new SQLQuery(SQLQueryString);
ActivityRequest request = new ActivityRequest();
request.addActivity(query);

// Perform the query
Response response = gds.perform(request);

// Display the result
ResultSet rs = query.getResultSet();
displayResultSet(rs, 1);
12
Project classification
  • AstroGrid
  • ODD-Genes
  • Bridges

Physical Sciences
  • BioSimGrid
  • BioGrid
  • GEON
  • eDiamond
  • myGrid

Biological Sciences
  • GeneGrid

OGSA-DAI
  • N2Grid
  • MCS
  • OGSA Web-DB
  • GridMiner
  • IU RGBench
  • FirstDig

Computer Sciences
  • INWA

Commercial Applications
13
  • e-Digital MammOgraphy National Database
  • Built a prototype of a national database of
    mammographic images in support of the UK Breast
    screening programme
  • Employ Grid technologies to facilitate this
    process

15
  • eDiaMoND Findings
  • OGSA-DAI provides a flexible framework
  • Dynamically configure the system through
    discovery
  • Activities can operate with different levels of
    granularity
  • Federation can be introduced at various levels
  • Extended Activities to access IBM DB2 Content
    Manager

16
GeneGrid
  • Grid Based Framework for Bioinformatics: Virtual
    Bioinformatics Laboratory
  • Integration of Existing Technologies and Data Sets
  • Gene Study in Silico
  • Develop Specialist Data Sets
  • Grid Services for Commercial or 3rd Party Use
  • Data resources as XML collections (XIndice), flat
    files and relational databases (MySQL)
  • OGSA-DAI plus custom extensions
  • Beta testers for file based activities
  • http://www.qub.ac.uk/escience/projects/genegrid/

17
GeneGrid Architecture
[Architecture diagram; labels recovered from the slide]
  • Components: GeneGrid Environment, GeneGrid Portal, GeneGrid Workflow Manager Service, GeneGrid Process Manager Service, GeneGrid Application Management Registry, GeneGrid Data Manager Registry, GeneGrid Input Results Parameters
  • Services: GAM Service and GDM Service (several instances of each)
  • Sites: BeSC, iGAP, EBI, SDSC
  • Applications: Blast, mpiBlast, TMHMM, SignalP
  • Databases: EMBL, SwissProt
18
Distributed Query Processing
  • Higher level services building on OGSA-DAI
  • Queries mapped to algebraic expressions for
    evaluation
  • Parallelism represented by partitioning queries
  • Use exchange operators
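
As a rough illustration of partitioned evaluation, the sketch below splits the input rows across partitions, evaluates the same sub-query on each partition in parallel, and merges the partial results. It is a toy under stated assumptions: the class `PartitionedQuery`, the example query (sum of values at or above a threshold) and the hash-based routing are all invented for this example and do not describe the actual distributed query processor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch of a partitioned query with exchange-style split and merge steps. */
final class PartitionedQuery {

    /** Split side of the exchange: routes each row to a partition by its hash. */
    static List<List<Integer>> partition(List<Integer> rows, int partitions) {
        List<List<Integer>> parts = new ArrayList<>();
        for (int i = 0; i < partitions; i++) {
            parts.add(new ArrayList<>());
        }
        for (Integer row : rows) {
            parts.get(Math.floorMod(row.hashCode(), partitions)).add(row);
        }
        return parts;
    }

    /** Evaluates SUM(x) WHERE x >= threshold on every partition in parallel, then merges. */
    static long sumAtLeast(List<Integer> rows, int threshold, int partitions) {
        ExecutorService pool = Executors.newFixedThreadPool(partitions);
        try {
            List<Future<Long>> partials = new ArrayList<>();
            for (List<Integer> part : partition(rows, partitions)) {
                Callable<Long> subQuery = () -> {
                    long sum = 0;
                    for (int x : part) {
                        if (x >= threshold) {
                            sum += x;
                        }
                    }
                    return sum;
                };
                partials.add(pool.submit(subQuery));
            }
            long total = 0;
            for (Future<Long> partial : partials) {
                total += partial.get(); // merge side of the exchange
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("query interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("sub-query failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Because addition is associative and commutative, the merged total does not depend on how the rows were partitioned, which is what makes this kind of sub-query safe to run in parallel.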

19
GridMiner
  • Test application area: medical
  • Traumatic brain injury treatment
  • Predicting the outcome of seriously ill patients
  • Analytical part focuses on data mining and
    On-Line Analytical Processing (OLAP)
  • Target
  • provide tools to discover and access relevant
    knowledge and information from different
    distributed and heterogeneous data sources
  • building on and extending OGSA-DAI
  • http://www.gridminer.org/

20
GridMiner Scenario
  • Heterogeneities
  • Name in A is First Last (as the target format)
  • Name in C has to be combined
  • Distribution
  • 3 data sources
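
The name heterogeneity can be sketched in a few lines, assuming the schemas from the integration scenario: A (PID, name, address, DOB), B (PID, first_contact) and C (PID, first_name, last_name, address, first_contact, DOB). The target layout (PID, name, address, DOB, first_contact) and all class and method names below are assumptions made for this example only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: the target layout and method names are assumptions. */
final class PatientRecordIntegration {

    /** Combines C's first_name and last_name into the "First Last" form used by A. */
    static String combineName(String firstName, String lastName) {
        return (firstName.trim() + " " + lastName.trim()).trim();
    }

    /** Maps a row of C onto the assumed target layout. */
    static Map<String, String> fromC(Map<String, String> c) {
        Map<String, String> target = new LinkedHashMap<>();
        target.put("PID", c.get("PID"));
        target.put("name", combineName(c.get("first_name"), c.get("last_name")));
        target.put("address", c.get("address"));
        target.put("DOB", c.get("DOB"));
        target.put("first_contact", c.get("first_contact"));
        return target;
    }

    /** Joins a row of A with the row of B that carries the same PID. */
    static Map<String, String> fromAAndB(Map<String, String> a, Map<String, String> b) {
        if (!a.get("PID").equals(b.get("PID"))) {
            throw new IllegalArgumentException("rows describe different patients");
        }
        Map<String, String> target = new LinkedHashMap<>(a); // copies PID, name, address, DOB
        target.put("first_contact", b.get("first_contact"));
        return target;
    }
}
```

Once both paths produce the same layout, records for one patient coming from different sources can be compared or merged field by field.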

21
Further information
  • The OGSA-DAI Project Site
  • http://www.ogsadai.org.uk
  • The DAIS-WG site
  • http://forge.gridforum.org/projects/dais-wg/
  • OGSA-DAI Users Mailing list
  • users_at_ogsadai.org.uk
  • General discussion on grid DAI matters
  • Formal support for OGSA-DAI releases
  • http://www.ogsadai.org.uk/support
  • support_at_ogsadai.org.uk
  • OGSA-DAI training courses

22
Project Membership
Malcolm
Norman
Paul
Kostas
Neil
Charaka
Mike
Ally
Amy
Mario
Andy
Simon
Brian
Dave
Patrick
Neil
Tom
IBM Dissemination Team
IBM Development Team
23
The End
  • Questions?

24
INWA Objectives
  • Innovation Node Western Australia
  • Informing Business and Regional Policy:
    Grid-enabled fusion of global data and local
    knowledge
  • Project
  • Ran from Nov 2003 to Aug 2004
  • Involved 10 partners (6 UK, 4 Australia)
  • Aim
  • Data mine commercially sensitive data
  • Security an absolute MUST
  • Employ Grid technologies
  • Need access to data and computational resources
  • Demonstrator using
  • OGSA-DAI
  • Incorporate data resources
  • Sun DCG's TOG (Transfer-queue Over Globus)
  • Handle job submission to analyse micro array data

25
INWA
26
INWA Lessons Learned
  • Performing Data Integration
  • TimeZone date problems
  • Security issues
  • Bugs in
  • JavaCoG in GT3
  • OGSA-DAI could not switch security for Grid data
    transfers
  • TOG had no security option
  • All of these have been fixed
  • Middleware not mature enough for commercial
    deployment

27
  • Biomedical Research Informatics Delivered by Grid
    Enabled Services
  • Want a Grid enabled front end to their software
  • Want to do a comparison evaluation between
  • IBM's Information Integrator
  • OGSA-DAI

28
Bridges Data Sources
29
Client
OGSA-DAI
IBM Information Integrator
30
FirstDIG
  • Data mining with the First Transport Group, UK
  • Example: when buses are more than 10 minutes
    late there is an 82% chance that revenue drops by
    at least 10
  • http://www.epcc.ed.ac.uk/firstdig

[Diagram labels: Data Mining Application, OGSA-DAI Client Application, four OGSA-DAI instances]
31
EdSkyQuery-G
[Diagram labels: four Sky Data sources]
32
[Diagram labels: multiple Data Service instances, three Scratch DB instances, and the data resources DB2, PostgreSQL, Xindice and MySQL]
33
OGSA-DAI Downloads R4
  • 690 downloads since May 04
  • Actual user downloads, not search engine crawlers
  • Does not include downloads as part of GT3.2
    releases
  • Total of 838 registered users
  • (as of 7/10/04)

Version (release date)    Downloads
R1.0 (Jan 03)             104
R1.5 (Feb 03)             108
R2.0 (Apr 03)             250
R2.5 (Jun 03)             291
R3.0 (Jul 03)             792
R3.1 (Feb 04)             630
Total                     2865 (includes the 690 R4 downloads)

34
Users Group
  • A separate independent body to engage with users
    and feed back to developers
  • Chair: Prof. Beth Plale of Indiana University
  • Twice-yearly meetings