OGSA-DAI Status and Benchmarks - PowerPoint PPT Presentation

About This Presentation
Title:

OGSA-DAI Status and Benchmarks

Description:

An extensible framework for data access and integration. ... Customise for your project using. Additional Activities. Client Toolkit APIs ... – PowerPoint PPT presentation

Slides: 30
Provided by: neilch2
Transcript and Presenter's Notes

Title: OGSA-DAI Status and Benchmarks


1
OGSA-DAI Status and Benchmarks
  • All Hands Meeting 2005
  • Nottingham, 22 September 2005

2
Overview
  • The all new OGSA-DAI overview
  • Benchmarking and profiling work
  • Project collaboration
  • Future plans

3
OGSA-DAI team
NeSC, Edinburgh
EPCC Team, Edinburgh
NEReSC, Newcastle
IBM Dissemination Team
IBM Development Team, Hursley
4
OGSA-DAI In One Slide
  • An extensible framework for data access and
    integration.
  • Expose heterogeneous data resources to a grid
    through web services.
  • Interact with data resources
  • Queries and updates.
  • Data transformation / compression
  • Data delivery.
  • Customise for your project using
  • Additional Activities
  • Client Toolkit APIs
  • Data Resource handlers
  • A base for higher-level services
  • federation, mining, visualisation, ...

5
The OGSA-DAI Framework
[Diagram, layers from top to bottom]
Application
Client Toolkit
OGSA-DAI service
Engine
Activities: SQLQuery, GZip, GridFTP, XPath, readFile, XSLT
Data Resources: JDBC, XMLDB, File
Databases: MySQL, DB2, Xindice, SWISS PROT, SQL Server
6
Extensibility Example
OGSA-DAI service
Engine
SQLQuery
SQLQuery
JDBC
Multiple SQL GDS
MySQL
7
Timeline
[Timeline figure spanning 2003 to 2005. Labels: Release 1, Release 1 interim, Release 2, Release 2 interim, Release 3, Release 3.1, Release 4, Release 5, OGSI Release 6, OGSA-DAI WSRF 1.0, OGSA-DAI WS-I 1.0 / OGSA-DAI WS-I 1.1 (OMII), ?]
8
Release downloads
Data up to 28/07/05
9
Geographical download profiles
OGSI            WSRF            WS-I
China (28)      China (32)      UK (30)
UK (20)         UK (19)         China (28)
US (12)         Germany (8)     US (8)
Unknown (10)    US (7)          Japan (7)
4556            330             120
Data up to 29/07/05
10
Our stakeholders
  • OMII
  • Current version of OGSA-DAI WS-I 1.0 distribution
    runs on OMII
  • Release 1.1 due out soon
  • Issues when security is introduced
  • Globus
  • WSRF 0.9.6 distribution bundled with GT4.0
  • WSRF 1.0 distribution bundled with GT4.0.1
  • Projects
  • A number of projects have used, use, or will use OGSA-DAI

AstroGrid Biogrid BioSimGrid Bridges caGrid DataMiningGrid
eDiamond FirstDig GEDDM GeneGrid GEON GridMiner
INWA IU RGRBench LEAD MCS myGrid N2Grid
ODD-Genes OGSA-WebDB SIMDAT GOLD    
11
Out with the old
Client
Client
Client Toolkit API
DAISGR
Server
GDSF
GDSF
GDSF
Relational
XML
Files
Data
12
in with the new!
Client
Server
Data
13
Changes in moving to WSRF/WS-I
  • Registry component (DAISGR) no longer supported
  • Hope to leverage third-party registration
    services
  • GRIMOIRES (http://www.omii.ac.uk/mp/mp_grimoires.htm)
  • Others
  • GDS/GDSF roles combined
  • Use data services
  • Currently static services but
  • Reconfigurable services
  • Improvements to the GDS
  • Data resource abstraction decoupled from the
    service
  • Renaming (consistent naming across platform
    versions)
  • Ability to enforce control flow constraints
    (ordering activities)
  • Refactored exception framework
  • Temporary set-backs (we promise we'll fix them)
  • No security model
  • No concurrency
  • Previously used GDSs for concurrency
  • Support now moving to the engine

14
The Client Toolkit (CTk)
  • Provides a programmatic abstraction for perform
    documents
  • Do not have to write XML explicitly
  • Abstraction over WS-I and WSRF services at the client
    side
  • don't need to know what type of service is at
    the other end (almost)
  • security model is the remaining issue
  • Currently only Java version of CTk
  • Stabilising API
  • Publish an API document
  • Allow 3rd parties to develop CTk for other
    programming languages

15
The Server Side
  • Server side
  • Presentation layer
  • Deal with messaging differences
  • Get one version per distribution
  • Core/Business Logic
  • Common to all distributions
  • Data Service Resource (DSR)
  • Data Layer
  • Relational databases
  • XML document repositories
  • File based repositories
  • New architecture being rolled out
  • see Malcolm's talk in next session
  • concurrency, sessions and transactions

16
Benchmarking/Profiling
  • Establish benchmark suite to
  • Measure performance gains/losses between releases
  • Reveal implementation issues
  • Allows focused improvements
  • Establish best practice
  • Summer intern (Heather Kelly) produced results
  • Profiling allows us to identify particular areas
    which are causing poor performance in the
    benchmarks
  • Summer intern (Radoslaw Ostrowski) extended
    Netlogger and did some profiling
  • Most of the results are for OGSA-DAI R6
  • one slide showing what is happening in R7

17
Configuration
  • Measure the time to
  • Send SQL query to server
  • Return nRows
  • Sum the values in one of the columns
  • Do this 30 times
  • Calculate mean and standard deviation
  • Repeat the process having increased nRows by
    stepsize
  • Try various different databases
  • Notes
  • Time to establish connection in JDBC runs not
    included
  • JDBC does not return results in WebRowSet format
  • Server is already running
  • Data source: littleblackbook
  • Test database included in distributions

18
Some benchmarks
  • Relational query
  • StreamServlet requires two communications
  • could improve this
  • FTP not iterating over result set
  • JDBC scales much better than SOAP
  • ResultSet implementations
  • Forwards-backwards implementation builds a DOM
    tree, giving a larger memory footprint

19
MySQL (nRows = 10000, number of runs = 30, stepsize = 500, blockSize = 200)
20
DB2 (nRows = 10000, number of runs = 30, stepsize = 500, blockSize = 200)
21
PostgreSQL (nRows = 10000, number of runs = 30, stepsize = 500, blockSize = 200)
22
SQL Server (nRows = 10000, number of runs = 30, stepsize = 500, blockSize = 200)
23
Oracle (nRows = 10000, number of runs = 30, stepsize = 500, blockSize = 200)
24
OGSA-DAI WS-I (nRows = 10000, number of runs = 30, stepsize = 500)
25
Database comparison (OGSA-DAI WSRF 1.0, nRows = 10000, number of runs = 30, stepsize = 500)
26
Platform comparison (MySQL database, nRows = 10000, number of runs = 30, stepsize = 500)
27
Profiling: better RowSet conversion
ResultSet to RowSet conversion
28
R6 -> R7: removal of RowSet
29
Challenges
  • Intermediate representation
  • between multiple models (relational, XML, ...)
  • XML WebRowSet is flexible (c.f. GridMiner) but
    expansive
  • DFDL and GridFTP/parallel HTTP?
  • Query definition
  • translation of queries
  • Data transport and workflow
  • workflow is typically compute driven
  • Move computation to data
  • mobile code activities?
  • data services hosted on DBMS?
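
The trade-off in the intermediate representation can be illustrated with a small size comparison. The sketch below is not the real WebRowSet schema: the element names (`rowset`, `row`, `column`), the sample rows and both helper functions are assumptions made up for this example. It only shows why a self-describing XML row set tends to need more bytes than a flat encoding of the same rows.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data; not taken from the OGSA-DAI test database.
COLUMNS = ("id", "name", "address")
ROWS = [(i, f"name{i}", f"{i} Example Street") for i in range(1, 201)]


def to_xml(rows, columns):
    """Serialise rows as a simplified, self-describing XML row set."""
    root = ET.Element("rowset")
    for row in rows:
        row_el = ET.SubElement(root, "row")
        for col, value in zip(columns, row):
            # Every value carries its column name, which is what makes
            # the format flexible and also what makes it large.
            ET.SubElement(row_el, "column", name=col).text = str(value)
    return ET.tostring(root, encoding="utf-8")


def to_csv(rows, columns):
    """Serialise the same rows as CSV with a single header line."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(columns)
    writer.writerows(rows)
    return buf.getvalue().encode("utf-8")


xml_bytes = to_xml(ROWS, COLUMNS)
csv_bytes = to_csv(ROWS, COLUMNS)
print(f"XML: {len(xml_bytes)} bytes, CSV: {len(csv_bytes)} bytes")
```

The XML form can carry per-value metadata and nested structure, which a flat format cannot, so the extra bytes buy flexibility rather than being pure waste.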

30
caBIG
  • Object-Oriented view of data
  • Data types are well-defined and registered in a
    repository
  • Standardized metadata facilitates discovery
  • custom query language implemented as an activity

31
LEAD
32
Users Group and DIALOGUE Workshops
  • 3rd Users Group meeting
  • June 1st
  • http://www.ogsadai.org.uk/docs/UG3/
  • DIALOGUE Workshops
  • Data Integration Applications Linking
    Organisations to Gain Understanding and
    Experience
  • Columbus, Edinburgh, Vienna, Indiana
  • Bringing together Data Integration middleware and
    application providers with users
  • http://www.datagrids.org

33
Future plans
  • A new version of the OGSA-DAI Engine
  • should look mostly the same externally
  • better support for concurrency, sessions and
    monitoring
  • see Architecture paper/talk presented on Monday
  • Implementing new versions of specifications
  • DAIS Specifications
  • Key things that we will be addressing after
    Release 7
  • Performance
  • A Security Model which can be applied across
    platforms
  • Full Transactions provision, including
    implementation of compensatory activities,
    distributed transactions
  • More data integration facilities
  • Better abstraction over DBMS variation

34
Conclusions
  • OGSA-DAI has had to undergo significant
    refactoring to keep stakeholders happy
  • Refactoring has allowed us to create an
    extensible framework which can be used for many
    data related tasks
  • We need to identify the components and
    improvements which will be useful to users
  • There is obviously room for improvement on
    performance, and we are working on it

35
Further information
  • The OGSA-DAI Project Site
  • http://www.ogsadai.org.uk
  • The DAIS-WG site
  • http://forge.gridforum.org/projects/dais-wg/
  • OGSA-DAI Users Mailing list
  • users@ogsadai.org.uk
  • General discussion on grid DAI matters
  • Formal support for OGSA-DAI releases
  • http://www.ogsadai.org.uk/support
  • support@ogsadai.org.uk
  • OGSA-DAI training courses

36
Core features of OGSA-DAI I
  • A framework for building applications
  • Supports data access, insert and update
  • Relational: MySQL, Oracle, DB2, SQL Server,
    Postgres
  • XML: Xindice, eXist
  • Files: CSV, BinX, EMBL, OMIM, SWISSPROT, ...
  • Supports data delivery
  • SOAP over HTTP
  • FTP, GridFTP
  • E-mail
  • Inter-service
  • Supports data transformation
  • XSLT
  • ZIP, GZIP
  • Supports security
  • X.509 certificate based security

37
Core features of OGSA-DAI II
  • A framework for building data clients
  • Client toolkit library for application developers
  • A framework for developing functionality
  • Extend existing activities, or implement your own
  • Mix and match activities to provide functionality
    you need
  • Highly-extensible
  • Customise our out-of-the-box product
  • Provide your own services, client-side support
    and data-related functionality
  • Comprehensive documentation and tutorials
  • Latest release supports GT3.2 (to be deprecated),
    GT4.0, and Axis 1.2 / OMII_2 using Java 1.4

38
OGSA-DAI Design Principles I
  • Efficient client-server communication
  • Minimise where possible
  • One request specifies multiple operations
  • No unnecessary data movement
  • Move computation to the data
  • Utilise third-party delivery
  • Apply transforms (e.g., compression)
  • Build on existing standards
  • Fill-in gaps where necessary
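
As a rough illustration of "one request specifies multiple operations" and of applying a transform before delivery, the sketch below sends a single request that asks for a query followed by compression. The request layout, the `handle_request` function and the `readings` table are hypothetical and chosen for this example; this is not the OGSA-DAI perform document format, and SQLite and JSON stand in for the real data resource and result encoding.

```python
import gzip
import json
import sqlite3


def make_db(n_rows):
    """Build an in-memory stand-in for a server-side data resource."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (id INTEGER, label TEXT)")
    conn.executemany(
        "INSERT INTO readings VALUES (?, ?)",
        ((i, "sample") for i in range(n_rows)),
    )
    return conn


def handle_request(conn, request):
    """Run every step of one request next to the data; return the final payload."""
    payload = None
    for step in request["steps"]:
        if step["op"] == "sql_query":
            rows = conn.execute(step["sql"]).fetchall()
            payload = json.dumps(rows).encode("utf-8")
        elif step["op"] == "gzip":
            if payload is None:
                raise ValueError("gzip needs output from an earlier step")
            payload = gzip.compress(payload)
        else:
            raise ValueError(f"unknown operation: {step['op']}")
    return payload


# One round trip: the query and the compression are requested together.
request = {
    "steps": [
        {"op": "sql_query", "sql": "SELECT id, label FROM readings ORDER BY id"},
        {"op": "gzip"},
    ]
}

conn = make_db(500)
response = handle_request(conn, request)
rows = json.loads(gzip.decompress(response))
print(len(rows), "rows delivered in", len(response), "compressed bytes")
```

Because both steps run where the data lives, only the compressed result crosses the network, which is the intent behind minimising communication and moving computation to the data.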

39
OGSA-DAI Design Principles II
  • Do not hide underlying data model
  • Users must know where to target queries
  • Data virtualisation is hard
  • Extensible architecture
  • Modular and customisable
  • e.g., to accommodate stronger security
  • Extensible activity framework
  • Cannot anticipate all desired functionality
  • Activity = unit of functionality
  • Allow users to plug-in their own
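
A minimal sketch of the plug-in idea follows, assuming an activity can be modelled as a function from a sequence of items to a sequence of items. The `ActivityRegistry` class and the activity names are invented for this example and do not reflect the actual OGSA-DAI activity interfaces.

```python
from typing import Callable, Dict, Iterable, List

# Assumption: an activity consumes a sequence of strings and yields another.
Activity = Callable[[Iterable[str]], Iterable[str]]


class ActivityRegistry:
    """Maps activity names to implementations so new ones can be plugged in."""

    def __init__(self) -> None:
        self._activities: Dict[str, Activity] = {}

    def register(self, name: str, activity: Activity) -> None:
        if name in self._activities:
            raise ValueError(f"activity already registered: {name}")
        self._activities[name] = activity

    def run(self, names: List[str], data: Iterable[str]) -> List[str]:
        """Chain the named activities, feeding each one's output to the next."""
        for name in names:
            data = self._activities[name](data)
        return list(data)


def upper_case(items: Iterable[str]) -> Iterable[str]:
    """Stand-in for a built-in transformation activity."""
    return (item.upper() for item in items)


def drop_duplicates(items: Iterable[str]) -> Iterable[str]:
    """Stand-in for a user-supplied activity: keep first occurrences only."""
    seen = set()
    for item in items:
        if item not in seen:
            seen.add(item)
            yield item


registry = ActivityRegistry()
registry.register("upperCase", upper_case)
registry.register("dropDuplicates", drop_duplicates)  # the user's own plug-in

result = registry.run(["upperCase", "dropDuplicates"], ["a", "b", "A", "c"])
print(result)
```

The framework code never needs to know about `drop_duplicates` in advance, which is the point of an extensible activity framework: functionality nobody anticipated can be added without changing the engine.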

40
Data Integration challenges
  • Metadata extraction
  • define a common model for e.g. database schema?
  • Intermediate representation
  • between multiple models (relational, XML, ...)
  • XML WebRowSet is flexible (c.f. GridMiner) but
    expansive
  • DFDL and GridFTP/parallel HTTP?
  • Query definition
  • translation of queries
  • Data transport and workflow
  • workflow is typically compute driven
  • Move computation to data
  • mobile code activities?
  • data services hosted on DBMS?

41
Contributing to OGSA-DAI
  • Additional functionality
  • Provide activities which implement specific
    functionality
  • Provide extra client functionality
  • Provide different security mechanisms
  • Provide higher level components and applications
  • Different levels of contributions
  • Based on OGSA-DAI?
  • Works with OGSA-DAI?
  • Part of OGSA-DAI?

42
Distributed Query Processing
  • Queries mapped to algebraic expressions for
    evaluation
  • Parallelism represented by partitioning queries
  • Use exchange operators
  • Prototype available from
  • http://www.ogsadai.org.uk
  • Being integrated into OGSA-DAI
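
The partitioning idea can be sketched with a grouped sum. This is a simplified, single-process illustration and not the DQP prototype: the function names are hypothetical, a thread pool stands in for distributed evaluators, and the `partition` step plays the role that an exchange operator would play in routing rows between them.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n_parts):
    """Route each (key, value) row to a partition chosen by its key."""
    parts = [[] for _ in range(n_parts)]
    for key, value in rows:
        parts[key % n_parts].append((key, value))
    return parts


def local_group_sum(rows):
    """The per-partition operator: SUM(value) GROUP BY key."""
    sums = {}
    for key, value in rows:
        sums[key] = sums.get(key, 0) + value
    return sums


def parallel_group_sum(rows, n_parts=4):
    """Evaluate the grouped sum over partitions in parallel, then gather."""
    parts = partition(rows, n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = list(pool.map(local_group_sum, parts))
    merged = {}
    for partial in partials:
        # Safe to merge directly: each key lives in exactly one partition.
        merged.update(partial)
    return merged


rows = [(i % 5, i) for i in range(100)]
print(parallel_group_sum(rows))
```

Partitioning by the grouping key means no partial result has to be combined across partitions, so the final gather step is a plain union of the per-partition answers.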

45
FirstDIG
  • Data mining with the First Transport Group, UK
  • Example: when buses are more than 10 minutes
    late there is an 82% chance that revenue drops by
    at least 10%
  • http://www.epcc.ed.ac.uk/firstdig

[Diagram labels: Data Mining Application; OGSA-DAI Client Application; OGSA-DAI (four instances)]
46
GridMiner
  • Test application area: medical
  • traumatic brain injury treatment
  • Predicting the outcome of seriously ill patients
  • analytical part focuses on data mining and
    On-Line Analytical Processing (OLAP)
  • Target
  • provide tools to discover and access relevant
    knowledge and information from different
    distributed and heterogeneous data sources
  • building on and extending OGSA-DAI
  • http://www.gridminer.org/

47
GridMiner Scenario
  • Heterogeneities
  • Name in A is "First Last" (the target format)
  • Name in C has to be combined
  • Distribution
  • 3 data sources
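
The name heterogeneity can be sketched as a pair of small adapters that map each source's records onto the target format. The field names (`name`, `first_name`, `last_name`) and the sample records are assumptions made for this example; the slide only states that A already uses "First Last" and that C stores the parts separately. The third source is not described, so it is left out.

```python
def from_source_a(record):
    """Source A already stores the name in the target 'First Last' format."""
    return {"name": record["name"]}


def from_source_c(record):
    """Source C stores the parts separately, so they are combined here."""
    return {"name": f"{record['first_name']} {record['last_name']}"}


def integrate(records_a, records_c):
    """Produce one list of records in the common target format."""
    unified = [from_source_a(r) for r in records_a]
    unified.extend(from_source_c(r) for r in records_c)
    return unified


# Hypothetical sample records for the two described sources.
source_a = [{"name": "Ada Lovelace"}]
source_c = [{"first_name": "Alan", "last_name": "Turing"}]

patients = integrate(source_a, source_c)
print(patients)
```

Keeping one adapter per source means a change in one source's schema only touches its own adapter, while the rest of the mining application sees a single format.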

48
Software Process
REVIEW
Programme Board
Technical Review Board
Technical Reviewer
Users Group
Peer Review and Inspection
Continual process ?
Reqs.
Design
Implement
QA
Ingest
DEVELOPERS
Nightly unit system tests
Deep track features
Release
Dissem.
Testing
Prototype
Additional test cases
System tests based on reqs
Test Cases
Fix Bugs
Support
Training
USERS
Use Cases
Prioritisation
Contribs
Requests
49
INWA