Title: Database Access and Integration Workshop, 15-16 April 2002, NeSC (Report Back)
1. Database Access and Integration Workshop, 15-16 April 2002, NeSC (Report Back)
- Jasmin Wason
- Zhuoan Jiao
- DAI Workshop Online Documents
- http://umbriel.dcs.gla.ac.uk/NeSC/general/esi/past
2. Introduction to Database Task Force: Norman Paton (5 mins)
- AIMS
- Identify requirements of DB and information management applications within a Grid setting.
- Develop road map for R&D of DB functionalities in a Grid setting.
- Standards for Grid/DB, links through GGF (wider community).
- Advise Architecture Task Force on requirements for DB support.
- Foster reference implementations.
- ACTIONS
- OGSA-DAI Project funded (18 months)
- GGF Working Group
- See www.cs.man.ac.uk/grid-db
3. Database requirement analysis: Dave Pearson (45 mins)
- Key requirements
- Represent complex data models
- Maintain metadata
- Technical, contextual, mappings and rules
- Establish reliability and quality of data (provenance)
- Make data more accessible (publishing and discovery)
- Restrict who can read and modify data (access control)
- Get data in a ready-to-use state
- Data retrieval
- locational transparency, everyday language
- Analysis and interpretation capability (e.g. evidential reasoning)
- Managing large volumes of data
4. DAI Workshop Show-and-Tell: AstroGrid Project (Clive Page, Leicester University)
- Belfast, Cambridge, Edinburgh, Leicester, Manchester, RAL, UCL
- Main aims
- Federate the world's distributed astronomical data.
- Data mining using Data Grid and Web Services concepts.
- Design virtual observatory based on OGSA
- Database Challenges
- Interoperability: XML standards for queries, results and metadata.
- Resource discovery: UDDI not very suitable (business-centric)
- Query language: SQL poorly matched to astronomical needs.
- Indexing: celestial position, poorly supported now.
- Handling complex data structures. OODBMS a failure?
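The indexing challenge above concerns queries such as "all sources within a given radius of a sky position", which a plain range predicate on right ascension and declination does not express well. The following is a minimal sketch of such a cone search, assuming a small in-memory catalogue with hypothetical source names and coordinates in degrees. It uses the standard haversine formula for angular separation and a linear scan, not any AstroGrid interface.

```python
import math

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two sky positions given in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine formula: numerically stable for small separations.
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))

def cone_search(catalogue, ra, dec, radius_deg):
    """Return names of sources within radius_deg of (ra, dec); linear scan, no index."""
    return [name for name, (s_ra, s_dec) in catalogue.items()
            if angular_separation_deg(ra, dec, s_ra, s_dec) <= radius_deg]

# Hypothetical catalogue: name -> (right ascension, declination) in degrees.
CATALOGUE = {
    "src_a": (10.0, 20.0),
    "src_b": (10.5, 20.2),
    "src_c": (359.9, -5.0),
    "src_d": (0.1, -5.0),
}
```

A search centred on RA 0 should return both `src_c` and `src_d`, even though their right ascensions lie at opposite ends of the 0-360 range. That wrap-around is one reason a simple one-dimensional index on right ascension is a poor fit.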
5. DAI Workshop Show-and-Tell: GEODISE Project (Zhuoan Jiao, Southampton University)
- GEODISE: Grid Enabled Optimisation and Design Search for Engineering
- Goals: To bring together and further the technologies of Design Optimisation, CFD, GRID computation, Knowledge Management and Ontology in a demonstration of solutions to a challenging industrial problem.
- Partners: Southampton University, Oxford University, Manchester University, BAE Systems, Rolls Royce, Fluent
- Current Status
- Simple ontology for describing optimisation process
- CFD and optimisation code wrapped as web services
- Computations run on Condor web service
- Automatic web service message logging, user information logging
- Web based queries to databases
- Visualization of the data using SVG web service
6. DAI Workshop Show-and-Tell: GEODISE Project (continued)
- Future Directions (mainly regarding database work package)
- Wrap databases as web/Grid services
- Query interface to distributed databases (possibly ontology based)
- Knowledge assistance in optimisation and design decision making
- Data mining
- Evolution of database schema to cope with WSDL changes
- Investigate native XML DBMS as alternative to relational
- Work flow management (WSFL, etc.)
- Distributed file access (GridFTP)
- Security
7. DAI Workshop Show-and-Tell: Distributed Query on the Grid (Paul Watson, Newcastle University)
- Projects
- Polar (parallel query processing over the Grid)
- Database Access and Integration (DAI) for the Grid
- MyGrid
- Research Issues
- Schema integration
- Distributed query execution infrastructure
- e.g. how to allocate machines to perform the join queries
- Adapting to changing information
- Adapting to failures
- Workflow
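One common way to spread a join over several machines is to hash-partition both inputs on the join key, so that matching rows land on the same worker. The sketch below simulates this in a single process. It is an illustration under that assumption, not a description of how Polar allocates machines, and the table contents and worker count are hypothetical.

```python
from collections import defaultdict

def partition(rows, key_index, n_workers):
    """Assign each row to a worker by hashing its join key."""
    parts = defaultdict(list)
    for row in rows:
        parts[hash(row[key_index]) % n_workers].append(row)
    return parts

def local_hash_join(left, right, left_key, right_key):
    """Join the two partitions held by one worker using an in-memory hash table."""
    table = defaultdict(list)
    for lrow in left:
        table[lrow[left_key]].append(lrow)
    return [lrow + rrow for rrow in right for lrow in table.get(rrow[right_key], [])]

def partitioned_join(left, right, left_key, right_key, n_workers):
    """Simulate a parallel join: each worker joins only its own partitions."""
    lparts = partition(left, left_key, n_workers)
    rparts = partition(right, right_key, n_workers)
    result = []
    for w in range(n_workers):  # on a Grid, each iteration would run on a separate machine
        result.extend(local_hash_join(lparts[w], rparts[w], left_key, right_key))
    return result

# Hypothetical data: (gene_id, name) and (gene_id, expression_level).
GENES = [(1, "g1"), (2, "g2"), (3, "g3")]
LEVELS = [(1, 0.5), (3, 0.9), (3, 0.1), (4, 0.7)]
```

The joined rows are the same whatever the number of workers. Only the distribution of work changes, which is where the allocation question raised above comes in.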
8. DAI Workshop Show-and-Tell: XSPAN Project, A Cross-Species Anatomy Network (Albert Burger, Heriot-Watt University)
- Will allow anatomies from different species to be compared.
- Links between textual ontology and 3D anatomy.
- XSPAN Core
- Generic XML schema for embryo anatomy (for interoperability)
- DBs have different schemas but map data to generic XML Schema.
- Graphical user interface and program level access.
- To include heuristics, natural language processing techniques.
- XSPAN Ontology
- DAML+OIL
- How to provide semi-automatic support for the
creation of XSPAN ontology?
9. DAI Workshop Show-and-Tell: Data Portal (Kerstin Kleese van Dam, CLRC Daresbury Laboratory)
- HPC portal to find and manipulate data.
- Wrapper for local database that transforms your metadata into a common format.
- Can select data slices using RasDaMan multidimensional DB.
- If data requested not in DB, simulation is run and data created then archived.
- See http://esc.dl.ac.uk:9000/index.html for a simple demo.
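The "run the simulation if the data is missing" behaviour described above is essentially a compute-on-miss archive. The following is a minimal sketch, assuming a hypothetical `run_simulation` function and an in-memory dictionary standing in for the database. Neither reflects the Data Portal's actual interfaces.

```python
class DataArchive:
    """Returns archived data when present; otherwise simulates, archives, then returns."""

    def __init__(self, simulate):
        self._store = {}            # stands in for the portal's database
        self._simulate = simulate   # callable producing data for a request key
        self.simulations_run = 0

    def get(self, request):
        if request not in self._store:
            # Data not archived yet: run the simulation and keep the result.
            self._store[request] = self._simulate(request)
            self.simulations_run += 1
        return self._store[request]

def run_simulation(request):
    """Hypothetical stand-in for an HPC simulation: returns a small grid of values."""
    region, resolution = request
    return [[len(region) * i * j for j in range(resolution)] for i in range(resolution)]

archive = DataArchive(run_simulation)
first = archive.get(("north_sea", 3))    # triggers the simulation
second = archive.get(("north_sea", 3))   # served from the archive
```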
10. DAI Workshop Show-and-Tell: Discovery Net (Vasa Curcin, Imperial College and Inforsense spin-off)
- Kensington Data Mining System
- Data mining over heterogeneous data sources
- Bioinformatics, pharmaceutical and remote sensing data.
- Visualisation methods for interactive discovery.
- Integration into the Grid
- Execute pre-processing and mining procedures in a distributed environment.
- Workflow management
- Vasa has offered Geodise a free (time limited?)
trial of Kensington when we are ready to do data
mining.
11. DAI Workshop Show-and-Tell: European Datagrid (EDG) project (Leanne Guy, CERN)
- LHC, raw data reduced through 3 levels from 40TB/s to 100MB/s.
- Trigger algorithms filter out physics that's not important.
- Work package 2: Grid Data Management
- Replication, Grid metadata management service
- Spitfire
- a Relational DB Service for the Grid.
- Decouples the client from the RDBMS backend via a mediator.
- API interface with Work package 1 (Computation)
- e.g. where is this file? Where is the best place to run a job?
- Web services (planned but not yet implemented)
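As a quick check of the figures above, reducing 40 TB/s to 100 MB/s is an overall factor of 400,000. The sketch below works this out using decimal units. The even split across the three trigger levels is an assumption made only for illustration, since the slide does not give per-level rates.

```python
RAW_RATE_BYTES = 40 * 10**12      # 40 TB/s (decimal units assumed)
STORED_RATE_BYTES = 100 * 10**6   # 100 MB/s
TRIGGER_LEVELS = 3

overall_factor = RAW_RATE_BYTES // STORED_RATE_BYTES
# Assumption: each level reduces the rate by the same factor.
per_level_factor = overall_factor ** (1 / TRIGGER_LEVELS)

print(f"overall reduction: {overall_factor:,}x")
print(f"per level, if split evenly: about {per_level_factor:.0f}x")
```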
12. UK Role in Open Grid Architecture: Towards an Architectural Road Map (1), Malcolm Atkinson
- What is OGSA? Web Services + Grid technology.
13. UK Role in Open Grid Architecture (OGSA): Towards an Architectural Road Map (2)
- OGSA describes many things that must be done, but not how to do them (1/2 the story).
- OGSA features
- WSDL, WSIL, WSEL - Description, Discovery
- Tools and Platforms - Apache Axis
- Invocation - SOAP, RPC/RMI, Optimised Binding
- Representations - XML and XML Schema
- Life time management - factories, GS handles
- Authentication, Change management, etc.
14. UK Role in Open Grid Architecture (OGSA): Towards an Architectural Road Map (3)
- OGSA Development
- Globus2 -> Globus3, end of Dec. 2002
- Need standard model for web service, to achieve interoperability.
- Discussion: Geodise Condor web service used in an example
- Other projects may wrap other job submission tools as web services.
- It would be good to have a standard interface to job submission services.
- Allows users to shop around.
15. Database Access and Integration Services: Planned Development Activities (Norman Paton)
- OGSA-DAI Project
- Reference implementations and docs (first 6 months)
- Core relational services (Brian Collins, IBM)
- Core XML services (Rob Baxter, EPCC)
- Early adopters - AstroGrid and MyGrid at present.
- Questionnaire to be sent round to see which projects want to participate/use software etc.
- Less well defined deliverables over following 12 months
- Enhancement, specialist data types, distributed queries
16. OGSA-Data Access and Integration Services (DAI)
- To add data access and integration service to OGSA
- Protocol for accessing DB over OGSA
- Relational/SQL, XML, OO, other semi-structured data.
- Transparent integration of multiple databases.
- SOAP/HTTP binding for WSDL protocol.
- Documents
- OGSA, NeSC, DBTF, Globus, GGF, etc
- NeSC, Baseline Grid Data Service Spec, Atkinson et al, Feb 2002
- DBTF, Paton et al, Watson 2002
- Globus, Tuecke et al, Feb 2002
17. OGSA-DAI (RDBMS): Brian Collins, IBM Hursley
- Research/Information Gathering
- Documents (see previous slide)
- Globus Toolkit V2 and V3: GSI, CAS, GridFTP
- Technology
- Linux, SQL, XSQL, JDBC, ODBC, Web Services, Grid Java CoG Kit, SRB, etc.
- Scope
- Test with Oracle, DB2, MySQL, welcome feedback on other DB.
- Languages (Java/C++) and API (JDBC/ODBC)
- Handle large dataset, multiple protocols for data transport.
- Architecture
- DB web service architecture
- 1 DB or multiple DBs (virtual DB) wrapped as web services.
- 1st six months concentrate on query and metadata.
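The "virtual DB" idea, where several databases sit behind one service, can be sketched with Python's built-in `sqlite3` module. This is an illustration only, assuming two in-memory SQLite databases with an identical, hypothetical `runs` table. It does not reflect the OGSA-DAI design, which was still being defined at the time of the workshop.

```python
import sqlite3

class VirtualDatabase:
    """Presents several databases with the same schema as one queryable service."""

    def __init__(self, connections):
        self._connections = list(connections)

    def query(self, sql, params=()):
        """Run the same read-only query on every backend and concatenate the rows."""
        rows = []
        for conn in self._connections:
            rows.extend(conn.execute(sql, params).fetchall())
        return rows

    def metadata(self):
        """Return the table names exposed by each backend."""
        sql = "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        return [[name for (name,) in conn.execute(sql)] for conn in self._connections]

def make_backend(rows):
    """Create an in-memory database holding a hypothetical 'runs' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE runs (id INTEGER, site TEXT, score REAL)")
    conn.executemany("INSERT INTO runs VALUES (?, ?, ?)", rows)
    return conn

vdb = VirtualDatabase([
    make_backend([(1, "soton", 0.91), (2, "soton", 0.40)]),
    make_backend([(3, "oxford", 0.75)]),
])
```

The two methods mirror the stated six-month focus on query and metadata. A real service would also have to reconcile differing schemas, which this sketch sidesteps by assuming identical tables.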
18. OGSA-DAI (XML): Rob Baxter, EPCC-Edinburgh (1)
- Grid XML Data Service (GXDS)
- OGSA service wrappers for XML databases
- Plus factory services for management of GXDS
- XML Database Services (currently defined)
- WSDL spec for GXDS version alpha
- query, update
- not implemented yet: bulkLoad, schemaUpdate
- Default implementation in Java
- Apache/Axis in Tomcat for hosting environment.
- Apache Xindice (zeen-dee-chay) XML Database (supports XPath)
- Possibility of MS .NET in future?
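The following is a rough sketch of what `query` and `update` operations on an XML data service might look like. It uses Python's standard `xml.etree.ElementTree`, which supports only a subset of XPath, in place of Xindice. The class name, method signatures and sample document are hypothetical and are not taken from the GXDS WSDL.

```python
import xml.etree.ElementTree as ET

class XmlDataService:
    """In-memory stand-in for an XML database exposing query and update."""

    def __init__(self, xml_text):
        self._root = ET.fromstring(xml_text)

    def query(self, xpath):
        """Return the text of every element matched by the XPath expression."""
        return [el.text for el in self._root.findall(xpath)]

    def update(self, xpath, new_text):
        """Set the text of every matched element; return how many were changed."""
        matches = self._root.findall(xpath)
        for el in matches:
            el.text = new_text
        return len(matches)

# Hypothetical document describing two design runs.
service = XmlDataService("""
<designs>
  <design id="d1"><status>running</status></design>
  <design id="d2"><status>done</status></design>
</designs>
""")
```

A call such as `service.query("./design[@id='d1']/status")` selects one design's status by attribute, which is the kind of path-based access an XPath-capable store offers.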
19. OGSA-DAI (XML): Rob Baxter, EPCC-Edinburgh (2)
- Third party services for data transfer
- Service requesters can use DB services as proxies
- For long-lived queries and large data transfer
- Formal review Aug-Sep 2002 (post GGF5)
- Seek feedback from AstroGrid, MyGrid, GridPP, RealityGrid, eDIKT
- Geodise to contact Rob Baxter early May for an unsupported copy, or at least the WSDLs.
20. DAI Workshop (Day 2, morning)
- Discussion Session
- Security/Access Control Group
- Query Language Group
- Metadata Management Group
- Replication Group
- All groups mainly listed problems/requirements, but no solutions offered.
- Future DBTF and GGF WG Activities
- Several special interest groups (SIGs) will be established, based on the above four discussion groups.
- Share ideas between projects.
- Web site/bulletin board/mailing list (NeSC to create early May)
21.
- Security and Access Control Breakout
- Access rights by citizenship in a particular group
- May change quickly and be valid for a certain period of time.
- Data protection, granting access to your data.
- Solutions to these problems exist at a single (RDBMS) level, but are harder at a federated level.
- Query Languages Breakout
- ADTs: how middleware might expose these through extensions to standard SQL-92 systems
[Figure: Query interface -> Middleware -> DBMS (three instances). Arrow labels: "Query in problem domain", "Publish query capabilities", "Translated queries".]
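The query interface, middleware and DBMS arrangement above suggests middleware that accepts a query phrased in the problem domain and translates it for each underlying DBMS. A minimal sketch of that idea follows. The domain terms, table and column names, and per-backend mappings are all hypothetical, and the translation is simple string templating rather than a real query rewriter.

```python
# Hypothetical per-DBMS mappings from domain terms to physical table/column names.
BACKENDS = {
    "dbms_a": {"table": "STARS", "fields": {"brightness": "MAG_V", "name": "OBJ_NAME"}},
    "dbms_b": {"table": "catalog", "fields": {"brightness": "vmag", "name": "label"}},
    "dbms_c": {"table": "images", "fields": {"name": "title"}},
}

def capabilities(backends):
    """Publish which domain fields each backend can answer queries about."""
    return {db: sorted(spec["fields"]) for db, spec in backends.items()}

def translate(domain_query, backends):
    """Turn {'select': field, 'where': (field, op, value)} into SQL per capable backend.

    Values are returned separately as parameters rather than embedded in the SQL text.
    """
    select, (field, op, value) = domain_query["select"], domain_query["where"]
    if op not in {"<", "<=", "=", ">=", ">"}:
        raise ValueError(f"unsupported operator: {op}")
    translated = {}
    for db, spec in backends.items():
        cols = spec["fields"]
        if select in cols and field in cols:  # skip backends lacking a needed field
            sql = f"SELECT {cols[select]} FROM {spec['table']} WHERE {cols[field]} {op} ?"
            translated[db] = (sql, (value,))
    return translated

queries = translate({"select": "name", "where": ("brightness", "<", 6.0)}, BACKENDS)
```

Here `dbms_c` publishes no `brightness` field, so the middleware sends it nothing. Publishing capabilities is what lets the middleware make that decision without contacting the backend.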
22. Metadata Management Breakout
- Challenge
- Manage integrity and applicability of mapping(s)
over time between logical and physical
representations of the data, and between logical
and logical.
[Figure: layers of metadata, from most abstract to most concrete, with the link between adjacent layers shown in parentheses]
- Common reference ontology?: superset of definitions within a science discipline/company.
- (rules)
- Semantic structure: ontology, relationships, meaning of data values.
- (rules)
- Terminology: how to access data without knowing physical structure. Descriptive name, e.g. engineering optimisation data.
- (1-1 mapping)
- Physical description: internal structure in storage resource, e.g. ENG_OPT_DAT; e.g. relational tables, XML Schema.
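The 1-1 mapping between terminology and physical description can be pictured as a two-way lookup. The following is a minimal sketch, assuming a hypothetical registry class. Only the `engineering optimisation data` / `ENG_OPT_DAT` pair comes from the slide, and the second entry is invented for illustration.

```python
class NameRegistry:
    """Keeps a strict 1-1 mapping between descriptive names and physical names."""

    def __init__(self):
        self._to_physical = {}
        self._to_logical = {}

    def register(self, logical, physical):
        """Add a pair, refusing anything that would break the 1-1 property."""
        if logical in self._to_physical or physical in self._to_logical:
            raise ValueError(f"mapping for {logical!r} or {physical!r} already exists")
        self._to_physical[logical] = physical
        self._to_logical[physical] = logical

    def physical(self, logical):
        """Look up the storage-level name for a descriptive name."""
        return self._to_physical[logical]

    def logical(self, physical):
        """Look up the descriptive name for a storage-level name."""
        return self._to_logical[physical]

registry = NameRegistry()
registry.register("engineering optimisation data", "ENG_OPT_DAT")
registry.register("flow simulation results", "CFD_RESULTS")  # hypothetical pair
```

Rejecting duplicate registrations is one simple way to keep the mapping's integrity over time, which is the challenge this breakout identified.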