Database Access and Integration Workshop, 15-16 April 2002, NeSC: Report Back

1
Database Access and Integration Workshop, 15-16
April 2002, NeSC (Report Back)
  • Jasmin Wason
  • Zhuoan Jiao
  • DAI Workshop Online Documents
  • http://umbriel.dcs.gla.ac.uk/NeSC/general/esi/past

2
Introduction to Database Task Force: Norman
Paton (5 mins)
  • AIMS
  • Identify requirements of DB and information
    management applications within a Grid setting.
  • Develop road map for R&D of DB functionalities in
    a Grid setting.
  • Standards for Grid/DB, links through GGF (wider
    community).
  • Advise Architecture Task Force on requirements
    for DB support.
  • Foster reference implementations.
  • ACTIONS
  • OGSA-DAI Project funded (18 months)
  • GGF Working Group
  • See www.cs.man.ac.uk/grid-db

3
Database requirement analysis: Dave Pearson (45
mins)
  • Key requirements
  • Represent complex data models
  • Maintain metadata
  • Technical, contextual, mappings and rules
  • Establish reliability and quality of data
    (provenance)
  • Make data more accessible (publishing &
    discovery)
  • Restrict who can read & modify data (access
    control)
  • Get data in a ready-to-use state
  • Data retrieval
  • locational transparency, everyday language
  • Analysis and interpretation capability (e.g.
    evidential reasoning)
  • Managing large volumes of data

4
DAI Workshop Show-and-Tell: AstroGrid Project
(Clive Page, Leicester University)
  • Belfast, Cambridge, Edinburgh, Leicester,
    Manchester, RAL, UCL
  • Main aims
  • Federate the world's distributed astronomical
    data.
  • Data mining using Data Grid and Web Services
    concepts.
  • Design virtual observatory based on OGSA
  • Database Challenges
  • Interoperability: XML standards for queries,
    results and metadata.
  • Resource discovery: UDDI not very suitable
    (business-centric)
  • Query language: SQL poorly matched to
    astronomical needs.
  • Indexing: celestial position, poorly supported
    now.
  • Handling complex data structures. OODBMS a
    failure?
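
The indexing and query-language points above can be illustrated with a small sketch. A plain range predicate on right ascension and declination selects a box in coordinate space rather than a circle on the sky, and it breaks at the RA 0/360 wrap-around, which is one reason positional search fits awkwardly into standard SQL. The function names and the sample catalogue below are hypothetical and are not taken from AstroGrid.

```python
import math

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two sky positions (degrees)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine form: numerically stable for small separations.
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    a = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(a))))

def cone_search(catalogue, ra, dec, radius_deg):
    """Return names of catalogue entries within radius_deg of (ra, dec)."""
    return [name for name, (r, d) in catalogue.items()
            if angular_separation_deg(ra, dec, r, d) <= radius_deg]

# Hypothetical catalogue: name -> (RA, Dec) in degrees.
catalogue = {
    "src_a": (359.9, 10.0),   # just below the RA wrap-around
    "src_b": (0.1, 10.0),     # just above the RA wrap-around
    "src_c": (180.0, -45.0),
}

nearby = cone_search(catalogue, 0.0, 10.0, 0.5)
```

A naive filter such as `ra BETWEEN -0.5 AND 0.5` would miss `src_a`, whereas the separation test finds both sources on either side of the wrap. This linear scan is only a stand-in; a real system would need a spatial index to avoid testing every row.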

5
DAI Workshop Show-and-Tell: GEODISE Project
(Zhuoan Jiao, Southampton University)
  • GEODISE - Grid Enabled Optimisation and Design
    Search for Engineering
  • Goals: To bring together and further the
    technologies of Design Optimisation, CFD, GRID
    computation, Knowledge Management & Ontology in a
    demonstration of solutions to a challenging
    industrial problem.
  • Partners: Southampton University, Oxford
    University, Manchester University, BAE Systems,
    Rolls-Royce, Fluent,
  • Current Status
  • Simple ontology for describing optimisation
    process
  • CFD and optimisation code wrapped as web services
  • Computations run on Condor web service
  • Automatic web service message logging, user
    information logging
  • Web based queries to databases
  • Visualization of the data using SVG web service

6
DAI Workshop Show-and-Tell: GEODISE Project
(continued)
  • Future Directions (mainly regarding database work
    package)
  • Wrap databases as web/Grid services
  • Query interface to distributed databases
    (possibly ontology based)
  • Knowledge assistance in optimisation and design
    decision making
  • Data mining
  • Evolution of database schema to cope with WSDL
    changes
  • Investigate native XML DBMS as alternative to
    relational
  • Workflow management (WSFL, etc.)
  • Distributed file access (GridFTP)
  • Security

7
DAI Workshop Show-and-Tell: Distributed Query on
the Grid (Paul Watson, Newcastle University)
  • Projects
  • Polar (parallel query processing over the Grid)
  • Database Access and Integration (DAI) for the
    Grid
  • MyGrid
  • Research Issues
  • Schema integration
  • Distributed query execution infrastructure
  • e.g. how to allocate machines to perform the
    join queries
  • Adapting to changing information
  • Adapting to failures
  • Workflow

8
DAI Workshop Show-and-Tell: XSPAN Project, A
Cross-Species Anatomy Network
(Albert Burger, Heriot-Watt University)
  • Will allow anatomies from different species to be
    compared.
  • Links between textual ontology and 3D anatomy.
  • XSPAN Core
  • Generic XML schema for embryo anatomy (for
    interoperability)
  • DBs have different schemas but map data to
    generic XML Schema.
  • Graphical user interface and program level
    access.
  • To include heuristics, natural language
    processing techniques.
  • XSPAN Ontology
  • DAML+OIL
  • How to provide semi-automatic support for the
    creation of XSPAN ontology?

9
DAI Workshop Show-and-Tell: Data Portal (Kerstin
Kleese van Dam, CLRC-Daresbury Laboratory)
  • HPC portal to find and manipulate data.
  • Wrapper for local database that transforms your
    metadata into a common format.
  • Can select data slices using RasDaMan
    multidimensional DB.
  • If the requested data is not in the DB, a
    simulation is run and the data is created, then
    archived.
  • See http://esc.dl.ac.uk:9000/index.html for a
    simple demo.
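
The "run the simulation if the data is not in the DB" behaviour can be sketched as a compute-on-miss archive. This is a minimal illustration under stated assumptions, not the Data Portal's implementation: the class `DataPortal`, the in-memory dictionary standing in for the archive, and `fake_simulation` are all hypothetical.

```python
class DataPortal:
    """Toy portal: serve archived results, or run the simulation on a miss."""

    def __init__(self, simulate):
        self._archive = {}          # stands in for the database/archive
        self._simulate = simulate   # callable that produces missing data
        self.simulations_run = 0    # counter, only for demonstration

    def get(self, request_key):
        if request_key not in self._archive:
            # Not in the DB: run the simulation, then archive the result.
            self._archive[request_key] = self._simulate(request_key)
            self.simulations_run += 1
        return self._archive[request_key]


def fake_simulation(key):
    # Placeholder for an expensive HPC run.
    return {"request": key, "values": [len(key), len(key) * 2]}


portal = DataPortal(fake_simulation)
first = portal.get("ocean-temp-1999")    # miss: simulation runs
second = portal.get("ocean-temp-1999")   # hit: served from the archive
```

The second request returns the archived object without running the simulation again, so `portal.simulations_run` stays at 1.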

10
DAI Workshop Show-and-Tell: Discovery Net (Vasa
Curcin, Imperial College / InforSense spin-off)
  • Kensington Data Mining System
  • Data mining over heterogeneous data sources
  • Bioinformatics, pharmaceutical, remote sensing
    data,
  • Visualisation methods for interactive discovery.
  • Integration into the Grid
  • Execute pre-processing and mining procedures in a
    distributed environment.
  • Workflow management
  • Vasa has offered Geodise a free (time limited?)
    trial of Kensington when we are ready to do data
    mining.

11
DAI Workshop Show-and-Tell: European DataGrid
(EDG) project, Leanne Guy (CERN)
  • LHC, raw data reduced through 3 levels from
    40TB/s to 100MB/s.
  • Trigger algorithms filter out physics that's not
    important.
  • Work package 2: Grid Data Management
  • Replication, Grid metadata management service
  • Spitfire
  • a Relational DB Service for the Grid.
  • Decouples the client from the RDBMS backend via a
    mediator.
  • API interface with Work package 1 - Computation
  • e.g. where is this file? Where is the best place
    to run a job?
  • Web services (planned but not yet implemented)
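
The idea of a mediator that decouples the client from the RDBMS backend can be sketched as follows. This is an illustration only and does not reflect Spitfire's actual interface: the classes `SqliteBackend` and `Mediator`, the `replicas` table, and the file and site names are assumptions, and Python's built-in `sqlite3` module stands in for the real database.

```python
import sqlite3


class SqliteBackend:
    """One possible backend; the client never touches this class directly."""

    def __init__(self):
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute("CREATE TABLE replicas (lfn TEXT, site TEXT)")
        self._conn.executemany(
            "INSERT INTO replicas VALUES (?, ?)",
            [("run42.dat", "CERN"), ("run42.dat", "RAL"), ("run43.dat", "CERN")],
        )

    def sites_for(self, lfn):
        rows = self._conn.execute(
            "SELECT site FROM replicas WHERE lfn = ? ORDER BY site", (lfn,)
        )
        return [site for (site,) in rows]


class Mediator:
    """Client-facing layer: answers 'where is this file?' without exposing SQL."""

    def __init__(self, backend):
        self._backend = backend   # any object with a sites_for(lfn) method

    def where_is(self, lfn):
        return self._backend.sites_for(lfn)


mediator = Mediator(SqliteBackend())
locations = mediator.where_is("run42.dat")
```

Because the client only calls `where_is`, the backend could be swapped for a different RDBMS without changing client code, which is the point of the decoupling.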

12
UK Role in Open Grid Architecture (OGSA)
Towards an Architectural Road Map (1): Malcolm
Atkinson
  • What is OGSA? Web Services + Grid technology.

13
UK Role in Open Grid Architecture (OGSA): Towards
an Architectural Road Map (2)
  • OGSA describes many things that must be done, but
    not how to do them (half the story).
  • OGSA features
  • WSDL, WSIL, WSEL - Description, Discovery
  • Tools and Platforms: Apache Axis,
  • Invocation - SOAP, RPC/RMI, Optimised Binding
  • Representations: XML and XML Schema
  • Lifetime management: factories, GS handles
  • Authentication, Change management, etc.

14
UK Role in Open Grid Architecture (OGSA): Towards
an Architectural Road Map (3)
  • OGSA Development
  • Globus2 → Globus3, end of Dec. 2002
  • Need a standard model for web services, to achieve
    interoperability.
  • Discussion: Geodise Condor web service used as
    an example
  • Other projects may wrap other job submission
    tools as web services.
  • It would be good to have a standard interface to
    job submission services.
  • Allows users to shop around.

15
Database Access and Integration Services: Planned
Development Activities (Norman Paton)
  • OGSA-DAI Project
  • Reference implementations & docs (first 6 months)
  • Core relational services (Brian Collins, IBM)
  • Core XML services (Rob Baxter, EPCC)
  • Early adopters - AstroGrid and MyGrid at present.
  • Questionnaire to be sent round to see which
    projects want to participate/ use software etc.
  • Less well defined deliverables over following 12
    months
  • Enhancement, specialist data types, distributed
    queries

16
OGSA-Data Access and Integration Services (DAI)
  • To add data access & integration service to OGSA
  • Protocol for accessing DB over OGSA
  • Relational/SQL, XML, OO, other semi-structured
    data.
  • Transparent integration of multiple databases.
  • SOAP/HTTP binding for WSDL protocol.
  • Documents
  • OGSA, NeSC, DBTF, Globus, GGF, etc
  • NeSC: Baseline Grid Data Service Spec, Atkinson
    et al., Feb 2002
  • DBTF: Paton et al., Watson, 2002
  • Globus: Tuecke et al., Feb 2002

17
OGSA-DAI (RDBMS) - Brian Collins, IBM Hursley
  • Research/Information Gathering
  • Documents (see previous slide)
  • Globus Toolkit V2 & 3: GSI, CAS, GridFTP
  • Technology
  • Linux, SQL, XSQL, JDBC, ODBC, Web Services, Grid
    Java CoG Kit, SRB, etc.
  • Scope
  • Test with Oracle, DB2, MySQL, welcome feedback on
    other DB.
  • Languages (Java/C++) and API (JDBC/ODBC)
  • Handle large datasets, multiple protocols for data
    transport.
  • Architecture
  • DB web service architecture
  • 1 DB or multiple DBs (virtual DB) wrapped as web
    services.
  • 1st six months concentrate on query and metadata.

18
OGSA-DAI (XML) - Rob Baxter, EPCC-Edinburgh (1)
  • Grid XML Data Service (GXDS)
  • OGSA service wrappers for XML databases
  • Plus factory services for management of GXDS
  • XML Database Services (currently defines)
  • WSDL spec for GXDS version alpha
  • query, update
  • not implemented yet: bulkLoad, schemaUpdate
  • Default implementation in Java
  • Apache/Axis in Tomcat for hosting environment.
  • Apache Xindice (zeen-dee-chay) XML Database
    (supports XPath)
  • Possibility of MS .NET in future?
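
As a rough illustration of what a `query` operation against an XML store involves, the sketch below runs an XPath expression over an XML document. It is not the GXDS WSDL interface and does not use Xindice: Python's standard `xml.etree.ElementTree` module, which supports only a subset of XPath, stands in for the XML database, and the document content and the `query` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical collection of design records.
DOCUMENT = """
<designs>
  <design id="d1" status="converged"><objective>0.82</objective></design>
  <design id="d2" status="failed"><objective>1.40</objective></design>
  <design id="d3" status="converged"><objective>0.79</objective></design>
</designs>
"""


def query(xml_text, xpath):
    """Parse the document and return the elements matching the XPath."""
    root = ET.fromstring(xml_text)
    return root.findall(xpath)


# Select designs by attribute value, then read a field from each match.
converged = query(DOCUMENT, "./design[@status='converged']")
ids = [d.get("id") for d in converged]
objectives = [float(d.findtext("objective")) for d in converged]
```

In a service setting the XPath string would travel in the request message and the matching fragments would be serialised back to the caller; here both steps happen in-process.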

19
OGSA-DAI (XML) - Rob Baxter, EPCC-Edinburgh (2)
  • Third party services for data transfer
  • Service requesters can use DB services as proxies
  • For long-lived queries and large data transfer
  • Formal review Aug-Sep 2002 (post GGF5)
  • Seek feedback from AstroGrid, MyGrid, GridPP,
    RealityGrid, eDIKT
  • Geodise to contact Rob Baxter early May for an
    unsupported copy, or at least the WSDLs.

20
DAI Workshop (Day 2, morning)
  • Discussion Session
  • Security/Access Control Group
  • Query Language Group
  • Metadata Management Group
  • Replication Group
  • All groups mainly listed problems/requirements,
    but no solutions offered.
  • Future DBTF GGF WG Activities
  • Several special interest groups (SIGs) will be
    established, based on the above four discussion
    groups.
  • Share ideas between projects.
  • Web site/ bulletin board/ mailing list (NeSC to
    create early May)

21
  • Security and Access Control Breakout
  • Access rights by citizenship in a particular
    group
  • May change quickly and be valid for a certain
    period of time.
  • Data protection, granting access to your data.
  • There are solutions to the problems at a single
    (RDBMS) level, but it is harder at a federated level.
  • Query Languages Breakout
  • ADTs: how middleware might expose these through
    extensions to standard SQL-92 systems

[Diagram: query interface (query in problem domain) -> middleware (publish query capabilities) -> translated queries -> three DBMSs]

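
The layering on this slide (a query expressed in the problem domain, middleware, and several DBMSs receiving translated queries) can be sketched as below. The sketch assumes that each DBMS publishes which problem-domain concepts it can answer and where they live in its schema; that reading of "publish query capabilities", along with every name in the code, is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PublishedCapability:
    dbms: str      # which database published this capability
    concept: str   # problem-domain term the user queries by
    table: str     # physical table in that database
    column: str    # physical column holding the concept


def translate(concept, upper_bound, capabilities):
    """Return (dbms, sql, params) for every DBMS that published the concept."""
    plans = []
    for cap in capabilities:
        if cap.concept == concept:
            # Table/column names come from the trusted registry; the
            # user-supplied value is passed as a bound parameter.
            sql = f"SELECT * FROM {cap.table} WHERE {cap.column} <= ?"
            plans.append((cap.dbms, sql, (upper_bound,)))
    return plans


capabilities = [
    PublishedCapability("db_a", "drag", "RESULTS", "CD"),
    PublishedCapability("db_b", "drag", "cfd_runs", "drag_coeff"),
    PublishedCapability("db_b", "lift", "cfd_runs", "lift_coeff"),
]

# One problem-domain query fans out into one translated query per DBMS.
plans = translate("drag", 0.05, capabilities)
```

The user asks about "drag" once; the middleware produces a differently worded SQL statement for each database that can answer it.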
22
Metadata Management Breakout
  • Challenge
  • Manage integrity and applicability of mapping(s)
    over time between logical and physical
    representations of the data, and between logical
    and logical.

  • Diagram (layers and the links between them)
  • Common reference ontology (?): superset of
    definitions within a science discipline/company.
  • (rules)
  • Semantic structure: ontology, relationships,
    meaning of data values.
  • (rules)
  • Terminology: how to access data without knowing
    physical structure. Descriptive name, e.g.
    engineering optimisation data.
  • (1-1 mapping)
  • Physical description: internal structure in
    storage resource, e.g. ENG_OPT_DAT; e.g.
    relational tables, XML Schema.
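
The 1-1 mapping between a descriptive name and a physical name, and the challenge of keeping it intact over time, can be sketched as a small catalogue that refuses registrations which would break the one-to-one property. The class `MetadataCatalogue` and its methods are hypothetical; only the example names come from the slide.

```python
class MetadataCatalogue:
    """Keeps a strict 1-1 mapping between logical and physical names."""

    def __init__(self):
        self._logical_to_physical = {}
        self._physical_to_logical = {}

    def register(self, logical_name, physical_name):
        # Reject anything that would make the mapping many-to-one
        # or one-to-many.
        if logical_name in self._logical_to_physical:
            raise ValueError(f"logical name already mapped: {logical_name}")
        if physical_name in self._physical_to_logical:
            raise ValueError(f"physical name already mapped: {physical_name}")
        self._logical_to_physical[logical_name] = physical_name
        self._physical_to_logical[physical_name] = logical_name

    def resolve(self, logical_name):
        """Look up the physical name without the caller knowing the storage layout."""
        return self._logical_to_physical[logical_name]


catalogue = MetadataCatalogue()
catalogue.register("engineering optimisation data", "ENG_OPT_DAT")
physical = catalogue.resolve("engineering optimisation data")
```

A user asks for "engineering optimisation data" and the catalogue resolves it to `ENG_OPT_DAT`; a second logical name pointing at the same physical structure is rejected with a `ValueError`.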