Approaches to the Integration of Distributed and Heterogeneous Data Resources - PowerPoint PPT Presentation

1
Approaches to the Integration of Distributed and
Heterogeneous Data Resources
  • Ahmet Sayar
  • Indiana University
  • Computer Science Department

2
Motivation
  • Integrating data from multiple data sources
  • Distributed queries and transactions over data.
  • Definition and adoption of data and metadata
    models and their storage.
  • Accessing the data seamlessly.
  • Transparency, support for heterogeneity,
    extensibility and scalability.

3
Outline
  • Data Integration Approaches
  • Application Specific Solutions
  • Application-Integration Framework
  • ASIS (Application Specific Information System)
  • Database Federation
  • Ogsa-DAI (Ogsa-Data Access and Integration)
  • Compare ASIS with Ogsa-DAI
  • Digital Libraries
  • SRB (Storage Resource Broker)
  • Sompel's Digital Library Approach
  • Compare ASIS with SRB and Sompel's DL

4
Application Specific Solutions
  • The most common means of data integration
  • Expensive in terms of time and skills
  • Developing and using requires deep system
    knowledge
  • Better results for special-purpose applications
  • Fragile
  • Changes to the underlying sources may easily
    break the application
  • Hard to extend
  • A new data source requires new code to be written

5
Outline
  • Data Integration Approaches
  • Application Specific Solutions
  • Application-Integration Framework
  • ASIS
  • Database Federation
  • Ogsa-DAI
  • Compare ASIS with Ogsa-DAI
  • Digital Libraries
  • SRB
  • Sompel's DL
  • Compare ASIS with SRB and Sompel's DL

6
Application-Integration Framework
  • Can also be called a component-based framework
  • Such as CORBA or Filters with common interfaces
  • Does not necessarily address data integration
    issues
  • Based on a common data model (such as CML and GML)
  • With adaptors, if a source changes, its adaptor
    may have to change, but the application may never
    see it.
  • Adding a new source is easy
  • A new adaptor may need to be written.
  • The adaptor may already exist online.
  • No need for detailed system knowledge
  • Ex. ASIS - OGC GIS Application Integration
    Framework
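
The adaptor idea on this slide can be sketched in a few lines of Python. This is a minimal illustration, not ASIS code: the `Feature` record stands in for a common data model such as GML, and both adaptor classes, their input formats, and the sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Protocol


@dataclass(frozen=True)
class Feature:
    """Stand-in for a record in the common data model (hypothetical)."""
    name: str
    lon: float
    lat: float


class Adaptor(Protocol):
    def features(self) -> List[Feature]: ...


class CsvAdaptor:
    """Wraps a source that delivers 'name,lon,lat' text lines."""

    def __init__(self, lines: Iterable[str]) -> None:
        self._lines = list(lines)

    def features(self) -> List[Feature]:
        out = []
        for line in self._lines:
            name, lon, lat = line.split(",")
            out.append(Feature(name.strip(), float(lon), float(lat)))
        return out


class DictAdaptor:
    """Wraps a source that delivers dictionaries with its own key names."""

    def __init__(self, rows: Iterable[dict]) -> None:
        self._rows = list(rows)

    def features(self) -> List[Feature]:
        return [Feature(r["label"], r["x"], r["y"]) for r in self._rows]


def integrate(adaptors: Iterable[Adaptor]) -> List[Feature]:
    """The application sees only Feature objects, never the source formats."""
    merged: List[Feature] = []
    for adaptor in adaptors:
        merged.extend(adaptor.features())
    return merged


sources = [
    CsvAdaptor(["San Andreas, -120.5, 36.0"]),
    DictAdaptor([{"label": "Hayward", "x": -122.1, "y": 37.7}]),
]
features = integrate(sources)
```

If the dictionary source renamed its keys, only `DictAdaptor` would need to change; `integrate` and anything built on `Feature` would stay as they are.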

7
ASIS (1)
  • Enables inter-service communication through
    well-defined service interfaces, message formats
    and capabilities metadata.
  • Data model is ASL (Application Specific Lang.)
  • Metadata model is capability document
  • Data and metadata have common predefined schema
  • Components are Filter Services
  • Web Services, common service interfaces defined in
    WSDL
  • Information/data services enabling distributed
    access, querying and transformation through their
    predictable input/output interfaces.
  • Chainable, located, and capable of updating their
    metadata manually or dynamically

8
ASIS (2)
  • Data and data storage model
  • Any data can be integrated into the system after
    transforming to ASL.
  • Heterogeneity is handled at the end-Filters with
    adaptors.
  • ASL is a community-accepted application-specific
    language
  • GML (Geography Markup Language) in GIS
    applications
  • CML (Chemical Markup Language) in chemistry
    applications
  • Filters' common service interfaces:
  • getCapabilities, getData, getFeatureInfo.
  • Requests to the Filters' interfaces:
  • getCapabilitiesReq, getDataReq, getFeatureInfoReq
  • Expected return types are defined in each Filter's
    capability metadata
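
As a rough sketch of the three common interfaces, assuming an in-memory object in place of a real Web Service: the class below, its method names in Python style, and the shape of the returned values are all hypothetical. Real Filters are described in WSDL and exchange XML.

```python
from typing import Dict, List


class Filter:
    """In-memory stand-in for an ASIS Filter (hypothetical)."""

    def __init__(self, name: str, layers: Dict[str, List[dict]]) -> None:
        self.name = name
        self._layers = layers  # layer name -> list of feature records

    def get_capabilities(self) -> dict:
        # Metadata describing what this Filter can serve.
        return {"service": self.name, "layers": sorted(self._layers)}

    def get_data(self, layer: str) -> List[dict]:
        # Data for one layer named in the capability metadata.
        if layer not in self._layers:
            raise KeyError(f"layer not advertised: {layer}")
        return list(self._layers[layer])

    def get_feature_info(self, layer: str, index: int) -> dict:
        # Details of a single feature within a layer.
        return self.get_data(layer)[index]


faults = Filter("fault-filter", {"Fault": [{"id": 1, "name": "San Andreas"}]})
caps = faults.get_capabilities()
```

A client would read `caps` first and only then request a layer it found there, which is what makes the capability metadata the contract between services.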

9
ASIS (3)
  • Metadata and Metadata storage model
  • Data integration is done through the Filters'
    capability metadata
  • Metadata is stored in each Filter's local file
    system as a flat file.
  • Capability
  • Inspired by the OGC WMS capabilities
    specification.
  • Resembles the Dublin Core format.
  • A capability-like structure is also used in
    Gannon's approach (XPOLA) for Grid services
    security issues.
  • Describes dynamic Web/Grid resources.
  • Updated manually or dynamically.
  • Consists of descriptor, service and provider
    metadata
  • Inter-service communication is achieved without a
    third party, which enables chains of Filters.

10
ASIS (4): Data Access and Filter Chaining
  • Each Filter is capable of acting as both a server
    and a client
  • Capability integration is done through the
    getCapabilities service interface
  • Requests for common service interfaces are
    created in accordance with a predefined XML schema

[Figure: chain of Filters F1, F2, F3 and F4; layer labels: State Boundary, Earth, Fault]

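
Filter chaining can be sketched as follows, assuming in-memory objects instead of Web Services. The `ChainedFilter` class is hypothetical, and the way F1 to F4 are wired together here is a guess for illustration, since the original diagram did not survive.

```python
from typing import Dict, List, Optional


class ChainedFilter:
    """Hypothetical Filter serving its own layers and those of upstream Filters."""

    def __init__(self, name: str,
                 own_layers: Optional[Dict[str, list]] = None,
                 upstream: Optional[List["ChainedFilter"]] = None) -> None:
        self.name = name
        self._own = own_layers or {}
        self._upstream = upstream or []

    def get_capabilities(self) -> List[str]:
        # Acting as a client: ask each upstream Filter what it offers,
        # then advertise the union together with the local layers.
        layers = set(self._own)
        for f in self._upstream:
            layers.update(f.get_capabilities())
        return sorted(layers)

    def get_data(self, layer: str) -> list:
        # Acting as a server: answer locally if possible, otherwise forward.
        if layer in self._own:
            return self._own[layer]
        for f in self._upstream:
            if layer in f.get_capabilities():
                return f.get_data(layer)
        raise KeyError(layer)


# Assumed topology: F1 -> F2 -> (F3, F4)
f3 = ChainedFilter("F3", {"Fault": ["fault-1"]})
f4 = ChainedFilter("F4", {"Earth": ["basemap"]})
f2 = ChainedFilter("F2", {"State Boundary": ["CA"]}, upstream=[f3, f4])
f1 = ChainedFilter("F1", upstream=[f2])
```

A client talking only to `f1` sees all three layers, although `f1` stores none of them itself.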
11
Outline
  • Data Integration Approaches
  • Application Specific Solutions
  • Application-Integration Framework
  • ASIS
  • Database Federation
  • Ogsa-DAI
  • Compare ASIS with Ogsa-DAI
  • Digital Libraries
  • SRB
  • Sompel's DL
  • Compare ASIS with SRB and Sompel's DL

12
Database Federation
  • Middleware consisting of a database management
    system
  • Uniform access to a number of heterogeneous data
    sources
  • Provides a query language used to combine,
    contrast, analyze and manipulate the data
  • Data integration is done through database
    integration.
  • Combines data from multiple sources in a single
    SQL statement (query re-creation).
  • Ex. Ogsa-DAI (Open Grid Services Architecture
    Data Access and Integration)
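
The "single SQL statement over multiple sources" idea can be tried locally with SQLite, used here only as a stand-in for federation middleware (it is not how Ogsa-DAI works internally). Two attached in-memory databases play the role of two separate sources; the schema names, tables and rows are made up for the example.

```python
import sqlite3

# Two independent in-memory databases stand in for two remote sources.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS seismic")
conn.execute("ATTACH DATABASE ':memory:' AS admin")

conn.execute("CREATE TABLE seismic.faults (name TEXT, state_code TEXT)")
conn.execute("CREATE TABLE admin.states (code TEXT, state_name TEXT)")
conn.executemany("INSERT INTO seismic.faults VALUES (?, ?)",
                 [("San Andreas", "CA"), ("Wasatch", "UT")])
conn.executemany("INSERT INTO admin.states VALUES (?, ?)",
                 [("CA", "California"), ("UT", "Utah")])

# One SQL statement combines rows from both sources.
rows = conn.execute(
    "SELECT f.name, s.state_name "
    "FROM seismic.faults AS f "
    "JOIN admin.states AS s ON f.state_code = s.code "
    "ORDER BY f.name"
).fetchall()
conn.close()
```

The application writes one query against a uniform namespace; where each table physically lives is the middleware's concern.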

13
Ogsa-DAI (1)
  • Provides a common Java API for accessing and
    integrating data resources, such as relational
    and XML databases and files, in a Grid environment
  • Specifically designed for the OGSA architecture
  • SQL queries on relational resources and XPath
    statements on XML collections
  • Provides data pipelining (similar to Filter
    chaining) via an XML document called the perform
    document.
  • Allows developers to easily add or extend
    functionality within Ogsa-DAI through activity
    documents.

14
Ogsa-DAI (2)
  • Data and storage model
  • Any data stored in XML or relational databases,
    files
  • No common data model
  • Data is provided through GDS (Grid Data Services)
  • Uses Ogsa-DQP (Distributed Query Processor) to
    coordinate access to multiple data services
  • The enactment engine is the core of Ogsa-DAI. It
    orchestrates the running of the perform document
  • Information in the perform document includes:
  • The list of activities and their XML schemas and
    implementation classes.
  • The list of role mappers and details
  • The info about data resource

15
Ogsa-DAI (3)
  • Metadata storage model
  • Metadata is kept in the Metadata Catalog Service (MCS)
  • MCS enables attribute-based querying
  • Metadata is for the datasets, data can be
    anything (binary, text ..)
  • Data integration is done through XML based
    activity file mixing activities (in SQL queries)
    and metadata
  • Simple data access scenario
  • A client contacts a DAISGR first to locate the
    GDSFs.
  • Accesses suitable GDSFs directly to find out more
    about their properties and the data resources
    they represent.
  • Asks GDSF to instantiate a GDS
  • Accesses resource by sending the GDS the
    GDS-Perform doc.
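
The four steps of the simple data access scenario can be traced with a small in-memory model. None of the classes below belong to the Ogsa-DAI API; `Registry`, `GridDataServiceFactory` and `GridDataService` are hypothetical stand-ins for the DAISGR, GDSF and GDS roles, and a Python dict takes the place of the XML perform document.

```python
from typing import Dict, List, Tuple


class GridDataService:
    """Stand-in for a GDS bound to one data resource."""

    def __init__(self, rows: List[dict]) -> None:
        self._rows = rows

    def perform(self, document: Dict[str, Tuple[str, str]]) -> List[dict]:
        # A real perform document is XML; here a dict carrying one
        # (column, value) filter plays that role.
        column, value = document["select_where"]
        return [row for row in self._rows if row[column] == value]


class GridDataServiceFactory:
    """Stand-in for a GDSF: describes a resource and creates GDS instances."""

    def __init__(self, description: str, rows: List[dict]) -> None:
        self.description = description
        self._rows = rows

    def create_service(self) -> GridDataService:
        return GridDataService(self._rows)


class Registry:
    """Stand-in for the DAISGR, where factories are registered and located."""

    def __init__(self) -> None:
        self._factories: List[GridDataServiceFactory] = []

    def register(self, factory: GridDataServiceFactory) -> None:
        self._factories.append(factory)

    def locate(self, keyword: str) -> List[GridDataServiceFactory]:
        return [f for f in self._factories if keyword in f.description]


registry = Registry()
registry.register(GridDataServiceFactory(
    "relational: earthquake faults",
    [{"name": "San Andreas", "state": "CA"}, {"name": "Wasatch", "state": "UT"}],
))

factory = registry.locate("faults")[0]   # 1. locate a GDSF via the registry
description = factory.description        # 2. inspect its properties
service = factory.create_service()       # 3. ask the GDSF to instantiate a GDS
result = service.perform({"select_where": ("state", "CA")})  # 4. send the perform document
```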

16
Ogsa-DAI (4)
  • Metadata model
  • No common schema for metadata like capability
  • Defines Metadata for the datasets
  • No schema in XML
  • Stored in Database tables as attributes
  • Defines Metadata for the Database system to
    enable querying and defining activities
  • Schema in XML (mcsActivity.xsd schema file)
  • Kept as XML file in the file system
    (mcsActivity.xml)

17
ASIS vs. Ogsa-DAI
  • Ogsa-DAI does not define metadata and data in an
    XML schema. Metadata is mixed with the database
    schema. ASIS has predefined data and metadata
    models.
  • Ogsa-DAI accepts any data and relies on a
    predefined database schema to enable querying
    and accessing data.
  • ASIS's data integration is on demand and based on
    capability federation. Ogsa-DAI's data
    integration is instead coded in XML-structured
    perform and activity documents.
  • Ogsa-DAI has a central metadata approach (MCS);
    ASIS has a distributed one.
  • Both systems are based on Web Services.
  • Ogsa-DAI uses GridFTP, and ASIS uses
    NaradaBrokering, to address performance issues
    in data transfers.

18
Outline
  • Data Integration Approaches
  • Application Specific Solutions
  • Application-Integration Framework
  • ASIS
  • Database Federation
  • Ogsa-DAI
  • Compare ASIS with Ogsa-DAI
  • Digital Libraries
  • SRB
  • Sompel's DL
  • Compare ASIS with SRB and Sompel's DL

19
Digital Libraries
  • Main focus is publishing and discovering
    digital objects.
  • Digital objects: a file, URL, SQL command string,
    or any string of bits.
  • Collects data from multiple different data
    sources.
  • Slightly different from the other data
    integration approaches
  • Data curation services such as publishing and
    removing data from the data sources.
  • Ex. SRB (Storage Resource Broker) and Sompel's
    Digital Library Approach

20
SRB (1)
  • A federated client-server system
  • Each server managing/brokering a set of resources
  • An implementation architecture for
  • Data grids
  • Digital Libraries.
  • Storage resources include digital libraries, MSS,
    UniTree and file systems
  • SRB consists of three components
  • MCAT services,
  • SRB servers to access storage repositories and
  • SRB clients
  • Mediates access to distributed heterogeneous
    resources
  • Uses MCAT (Metadata Catalog Service) to
    facilitate brokering and attribute based
    querying.
  • Integrates data and metadata

21
SRB (2)
  • Data and storage model
  • Uniform storage interface
  • Resource-specific drivers map the uniform
    storage interface onto each storage resource
  • Storage resources are registered within SRB as
    physical resources
  • Logical resources (LSR) enable replication.
  • An LSR consists of one or more physical resources
  • Client API refers to LSR. Collections are created
    by LSR
  • Metadata storage model (MCAT)
  • Serves both core metadata and domain-dependent
    metadata
  • Core metadata follows a standardized schema like
    Dublin Core
  • Stores metadata about data, collections, users,
    resources, methods
  • Attribute based access and querying, updating
    metadata catalog
  • Implemented as a relational database. Oracle, DB2
    or Sybase
  • Abstraction and Replica information for data
  • Global user name space and authentication
  • Authorization through ACL and tickets
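
How a logical resource hides replication can be shown with a toy model. This is an assumption-laden sketch, not SRB code: `LogicalResource`, its methods, and the resource names are hypothetical, and storage is simulated with dictionaries.

```python
from typing import Dict, FrozenSet, List


class LogicalResource:
    """Hypothetical logical storage resource made of physical resources."""

    def __init__(self, name: str, physical: List[str]) -> None:
        if not physical:
            raise ValueError("a logical resource needs at least one physical resource")
        self.name = name
        self.physical = list(physical)
        # physical resource name -> {object name: content}
        self._stores: Dict[str, Dict[str, bytes]] = {p: {} for p in physical}

    def put(self, obj: str, data: bytes) -> None:
        # Writing through the logical name replicates to every physical resource.
        for store in self._stores.values():
            store[obj] = data

    def get(self, obj: str, unavailable: FrozenSet[str] = frozenset()) -> bytes:
        # Read from the first physical resource that is reachable.
        for p in self.physical:
            if p not in unavailable:
                return self._stores[p][obj]
        raise IOError("no replica reachable")


archive = LogicalResource("archive", ["disk-a", "tape-b"])
archive.put("myFile", b"payload")
copy = archive.get("myFile", unavailable=frozenset({"disk-a"}))
```

The client only ever names `archive`; which physical copy answers is decided behind the logical name.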

22
SRB (3)
  • Metadata and Metadata Exchange Model
  • MAPS (Metadata Attribute Presentation Structure)
  • Independent of the internal representation of the
    attributes inside the catalog.
  • Provides a uniform interface specification that
    can be used between user applications and the
    MCAT catalog, and vice versa.
  • Structures which form the MAPS
  • MAPS_Query_Struct,
  • MAPS_Result_Struct,
  • MAPS_Update_Struct and
  • MAPS_Definition_Struct
  • Mapping from MAPS to other models and exchange
    format. Dublin Core format is under
    implementation.

23
SRB (4)
  • Simple data access scenario
  • SRB server spawns SRB agent to authenticate the
    user/Application by comparing it with information
    stored in MCAT.
  • Find the location in MCAT.
  • Check user request against permissions stored in
    MCAT.
  • SRB agent contacts user with the result of his
    request.
  • SRB agent communicates with the user through a
    port specific to this client session.
  • SRB server chaining scenario (integrated SRBs)
  • First 3 steps from simple data access case.
  • SRB agent contacts remote SRB agent via remote
    SRB server.
  • The second SRB agent returns the pointer to the
    data item to the first SRB agent, which passes it
    on to the user.
  • The SRB client interacts with the data item
    directly. In the federated SRB scheme, one SRB
    server acts as a client to another.
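
The simple data access scenario boils down to three catalog lookups followed by a reply. The sketch below assumes a dictionary-backed catalog; `Catalog` and `handle_request` are hypothetical names, the user, path and permission values are invented, and plain-text passwords are used only to keep the example short.

```python
from typing import Dict, Set, Tuple


class Catalog:
    """Stand-in for MCAT: credentials, object locations and permissions."""

    def __init__(self) -> None:
        self.passwords: Dict[str, str] = {"alice": "s3cret"}
        self.locations: Dict[str, str] = {"myFile": "unix-disk-1:/vault/myFile"}
        self.acl: Dict[Tuple[str, str], Set[str]] = {("alice", "myFile"): {"read"}}


def handle_request(catalog: Catalog, user: str, password: str,
                   obj: str, action: str) -> str:
    """What an agent does for one request; returns the object's location."""
    # 1. Authenticate the user against the catalog.
    if catalog.passwords.get(user) != password:
        raise PermissionError("authentication failed")
    # 2. Find the location of the requested object.
    if obj not in catalog.locations:
        raise FileNotFoundError(obj)
    # 3. Check the request against the stored permissions.
    if action not in catalog.acl.get((user, obj), set()):
        raise PermissionError(f"{user} may not {action} {obj}")
    # 4. Report the result back to the user.
    return catalog.locations[obj]


mcat = Catalog()
location = handle_request(mcat, "alice", "s3cret", "myFile", "read")
```

In the chaining scenario, step 2 would return a pointer to a remote server instead of a local path, and the agent would repeat the request there as a client.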

24
ASIS vs. SRB
  • SRB doesn't define metadata in an XML structure
    (as ASIS does)
  • SRB accepts any data, whereas ASIS uses ASL
  • SRB keeps the metadata in a catalog service
    (MCAT). ASIS uses XML-structured capability
    metadata
  • SRB has a central metadata handling approach; ASIS
    has a distributed one
  • ASIS's data integration is based on metadata
    federation; SRB's data integration is based on
    SRB server federation.
  • Instead of Filters, SRB uses SRB servers and
    agents for accessing data resources.

25
Sompel's DL (1)
  • Scholarly communication as a network-based
    workflow
  • Instead of Filters and ASL in ASIS, Sompel
    defines repositories and digital objects,
    respectively.
  • A repository is a networked system that provides
    services pertaining to a collection of Digital
    Objects
  • Repositories have common service interfaces.
  • Obtain, Harvest and Put.
  • Two classes of participants.
  • Data providers (DP) and Service providers (SP)
  • SPs collect metadata from DPs (via the 3 service
    interfaces), then normalize and cluster it to
    deal with duplicates.
  • DPs offer some type of search mechanism for their
    own repositories.
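
A minimal sketch of the data provider and service provider roles, assuming in-memory repositories. The `Repository` class, the `collect` function, and the rule of treating records with the same normalized title as duplicates are all illustrative choices, not part of Sompel's specification.

```python
from typing import Dict, List


class Repository:
    """Hypothetical data provider exposing Put, Obtain and Harvest."""

    def __init__(self) -> None:
        self._objects: Dict[str, dict] = {}

    def put(self, object_id: str, metadata: dict) -> None:
        self._objects[object_id] = dict(metadata)

    def obtain(self, object_id: str) -> dict:
        return dict(self._objects[object_id])

    def harvest(self) -> List[dict]:
        # Expose every metadata record together with its identifier.
        return [dict(m, id=i) for i, m in self._objects.items()]


def collect(repositories: List[Repository]) -> List[dict]:
    """Service-provider side: harvest, normalize titles, drop duplicates."""
    seen = set()
    merged: List[dict] = []
    for repo in repositories:
        for record in repo.harvest():
            key = record["title"].strip().lower()
            if key not in seen:
                seen.add(key)
                merged.append(record)
    return merged


library_a, library_b = Repository(), Repository()
library_a.put("a1", {"title": "Data Grids"})
library_b.put("b1", {"title": " data grids "})   # same work, different spelling
library_b.put("b2", {"title": "Digital Libraries"})
records = collect([library_a, library_b])
```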

26
Sompel's DL (2)
  • Data and storage model
  • Data is the abstraction of the Digital Objects
  • Digital Object = digital data + key metadata.
  • Serialization of Digital Objects = Surrogates
  • Surrogates
  • Information for the value chains and service
    information used at repository service
    interfaces.
  • In the XML/RDF format
  • Composed of dataStream and/or Entity tag
    elements.
  • Chained object is defined by keymetadataID or
    providerInfo.
  • Different storage types: book repositories,
    teaching object repositories, dataset
    repositories etc.
  • Repositories are active nodes. Repositories
    enable the use and re-use of materials in many
    contexts.

27
Sompel's DL (3)
  • Metadata model
  • Surrogates are essentially metadata records for
    objects
  • Based on the Dublin Core format with
    domain-specific extensions.
  • Dublin Core has 15 standard elements to describe
    resources.
  • For more details see http://dublincore.org
  • Chaining for integrating data
  • The application/user doesn't need a workflow
    engine or script to create or run the chain (as
    in ASIS).
  • The chain (which they call a value chain) is
    hidden in the surrogates.
  • Surrogates are updated through the common
    interfaces (Put, Obtain and Harvest) of the
    resources.
  • The chain is defined in the Entity element of the
    surrogate document, in the Lineage sub-element.
  • Sample chaining scenario
  • A paper might have references to some papers, and
    these papers might in turn reference other
    papers.
  • The value chain does not stop.
  • Papers gain different (value-added) metadata
    through the value chain
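
Following a value chain amounts to walking lineage links from one surrogate back to the original object. The sketch below assumes each surrogate records at most one predecessor; the identifiers and the `value_chain` helper are hypothetical, and a dictionary replaces the XML/RDF surrogate documents.

```python
from typing import Dict, List, Optional

# surrogate id -> id of the surrogate it was derived from (None for an original)
lineage: Dict[str, Optional[str]] = {
    "preprint": None,
    "reviewed": "preprint",
    "published": "reviewed",
}


def value_chain(surrogate_id: str, links: Dict[str, Optional[str]]) -> List[str]:
    """Walk lineage links back to the original object, oldest first."""
    chain: List[str] = []
    current: Optional[str] = surrogate_id
    while current is not None:
        if current in chain:
            raise ValueError("cycle in lineage")
        chain.append(current)
        current = links[current]
    return list(reversed(chain))


history = value_chain("published", lineage)
```

No workflow engine is involved: the chain is recovered purely from what the surrogates themselves record.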

28
ASIS vs. Sompel's Approach
  • Instead of Filters and ASL in ASIS, Sompel
    defines repositories and digital objects,
    respectively
  • DPs correspond to End-Filters, and SPs correspond
    to Filters in ASIS
  • ASIS does not have publishing or putting service
    interfaces
  • Obtain corresponds to getData in ASIS
  • Harvest corresponds to getCapabilities in
    ASIS
  • Both have distributed metadata approaches for
    data integration
  • ASIS: direct communication between Filters by
    using the getCapabilities interface
  • Sompel's DL: direct communication between
    repositories and services by using the Harvest
    interface
  • Sompel's DL uses Dublin Core for the
    representation of the resources; ASIS uses its
    own schema.
  • ASIS uses ASL for the representation of the data;
    Sompel's approach doesn't have a common data
    model.

29
Summary
  • Application-Integration Framework (ASIS)
  • Easy to add new sources
  • Using online Filters providing required adaptors
  • peer-to-peer chain of Filters
  • No central metadata catalog server; distributed
    capability exchange and aggregation
  • SOA
  • Re-usable components (Filters) for different
    applications in predefined domain
  • Implications of Filter services
  • Scalable and Fault-tolerant
  • Load-balancing and caching
  • Dynamically updating capability metadata

30
THANKS !
31
APPENDIX
32
Capability in Grid Services Security
  • XPOLA
  • The infrastructure is built on a peer-to-peer
    chain-of-trust model. No central admins
  • WS-Security compliant
  • Extensible, PKI and SAML based
  • Dynamic and reusable (manually or automatically
    generated)
  • Composed of two parts.
  • Policy document (SAML, lifetime info, binding
    info etc.)
  • Provider's signature
  • Existing grid security solutions for fine-grained
    authorization did not address general
    Web/Grid services in compliance with Web Services
    security specs.
  • Other approaches, which rely on central admins,
    don't address dynamic services

33
Sample Capabilities File (too simplified): GIS Domain

    <?xml version='1.0' encoding="UTF-8" standalone="no" ?>
    <!DOCTYPE WMT_MS_Capabilities SYSTEM "http://toro.ucs.indiana.edu:8086/xml/capabilities.dtd">
    <Capabilities version="1.1.1" updateSequence="0">
      <Service>
        <Name>CGL_Mapping</Name>
        <Title>CGL_Mapping WMS</Title>
        <OnlineResource xmlns:xlink="http://www.w3.org/1999/xlink" xlink:type="simple"
            xlink:href="http://toro.ucs.indiana.edu:8086/WMSServices.wsdl" />
        <ContactInformation>
          ..
        </ContactInformation>
      </Service>
      <Capability>
        <Request>
          <GetCapabilities>
            <Format>WMS_XML</Format>
            <DCPType><HTTP><Get>
              <OnlineResource xmlns:xlink="http://w3.org/1999/xlink" xlink:type="simple"
                  xlink:href="http://toro.ucs.indiana.edu:8086/WMSServices.wsdl" />
            </Get></HTTP></DCPType>
          </GetCapabilities>
          <GetMap>
            <Format>image/GIF</Format>
            <Format>image/PNG</Format>
            <DCPType><HTTP><Get>
              <OnlineResource xmlns:xlink="http://w3.org/1999/xlink" xlink:type="simple"
                  xlink:href="http://toro.ucs.indiana.edu:8086/WMSServices.wsdl" />
            </Get></HTTP></DCPType>
          </GetMap>
        </Request>
        <Layer>
          <Name>CaliforniaFaults</Name>
          <Title>CaliforniaFaults</Title>
          <SRS>EPSG:4326</SRS>
          <LatLonBoundingBox minx="-180" miny="-82" maxx="180" maxy="82" />
        </Layer>
      </Capability>
    </Capabilities>
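
A client that receives a capabilities document like the sample on this slide mostly needs three things from it: the layer names, the supported formats, and the bounding box. The sketch below uses Python's standard `xml.etree.ElementTree` on a trimmed copy of the sample; the trimming (no DOCTYPE, no OnlineResource elements) is done here only to keep the example short.

```python
import xml.etree.ElementTree as ET

CAPABILITIES = """<Capabilities version="1.1.1" updateSequence="0">
  <Service><Name>CGL_Mapping</Name><Title>CGL_Mapping WMS</Title></Service>
  <Capability>
    <Request>
      <GetMap><Format>image/GIF</Format><Format>image/PNG</Format></GetMap>
    </Request>
    <Layer>
      <Name>CaliforniaFaults</Name>
      <SRS>EPSG:4326</SRS>
      <LatLonBoundingBox minx="-180" miny="-82" maxx="180" maxy="82" />
    </Layer>
  </Capability>
</Capabilities>"""

root = ET.fromstring(CAPABILITIES)

# Names of all advertised layers.
layers = [layer.findtext("Name") for layer in root.iter("Layer")]

# Image formats the GetMap request can return.
formats = [f.text for f in root.findall("./Capability/Request/GetMap/Format")]

# Geographic extent of the first layer as (minx, miny, maxx, maxy).
bbox_el = root.find("./Capability/Layer/LatLonBoundingBox")
bbox = tuple(float(bbox_el.get(k)) for k in ("minx", "miny", "maxx", "maxy"))
```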

34
Dublin Core
  • Challenge of resource description and discovery
  • Language for making a particular class of
    statements about resources
  • There are 2 namespaces: the Dublin Core element
    set (dc) and Dublin Core qualifiers (dcq, e.g.
    dcq:iso8601).
  • Some of the Dublin Core metadata element set:
  • Title (dc:title), subject, description, creator,
    publisher, type, format, source, language, rights
  • Using DC in RDF: specifications for DC in RDF
    (work in progress)
  • Resource has (verb) property (dc:creator)
    X (dc:Ahmet)
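
A small Dublin Core record can be produced with the standard library alone. The namespace URI below is the one published for the Dublin Core element set; the `record` wrapper element and the `dc_record` helper are choices made for this sketch, not part of any Dublin Core specification.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace
ET.register_namespace("dc", DC_NS)


def dc_record(fields: dict) -> ET.Element:
    """Build a flat record with one dc:* child per field."""
    record = ET.Element("record")  # wrapper element name is arbitrary here
    for name, value in fields.items():
        child = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        child.text = value
    return record


record = dc_record({
    "title": "Approaches to the Integration of Distributed and "
             "Heterogeneous Data Resources",
    "creator": "Ahmet Sayar",
    "language": "en",
})
xml_text = ET.tostring(record, encoding="unicode")
```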

35
Sample Dublin Core
http://www.ils.unc.edu/mrc/jcdl2006/slides/kunze.pdf
36
Open Archives Initiative (OAI)
37
OAI
  • Deals with e-print server world
  • Need to develop services that permitted searching
    across papers housed at multiple repositories
  • Repositories also needed capabilities to
    automatically identify and copy papers that had
    been deposited in them.
  • Definition of an interface that permits e-print
    servers to expose the metadata for the papers
    they hold.
  • Service providers with similar metadata standards
    need to harvest this metadata
  • Service providers act as a federation of
    repositories by indexing documents, so that
    multiple collections can be searched as though
    they form a single collection

38
OAI-PMH
  • For the variety of the communities engaged in
    publishing content on the Web
  • Any networked server can employ the protocol to
    enable service providers to collect its metadata
  • HTTP-based request-response transaction
  • Service Providers
  • Harvest metadata from Data Providers using the
    OAI protocol and use the returned metadata as a
    basis for building value-added services.
  • Data Providers (repositories)
  • Adopt the OAI technical framework as a means of
    exposing metadata about their content.
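
Because OAI-PMH requests are ordinary HTTP requests, a harvester's first job is just building a URL. The sketch below composes a `ListRecords` request asking for unqualified Dublin Core (`oai_dc`); the repository address is a placeholder and `oai_request` is a helper name invented for this example. No network call is made.

```python
from urllib.parse import urlencode


def oai_request(base_url: str, verb: str, **arguments: str) -> str:
    """Build the URL of an OAI-PMH request in its HTTP GET form."""
    query = {"verb": verb, **arguments}
    return f"{base_url}?{urlencode(query)}"


# Hypothetical repository address used only for illustration.
BASE = "http://repository.example.org/oai"

# Ask the data provider for all records in unqualified Dublin Core.
url = oai_request(BASE, "ListRecords", metadataPrefix="oai_dc")

# Ask the data provider to describe itself.
identify_url = oai_request(BASE, "Identify")
```

A service provider would fetch `url`, parse the returned XML, and build its value-added services on the harvested records.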

39
Comments on OAI
  • OAI-PMH is ultimately only as useful as the
    metadata it transports.
  • The tendency of implementers to apply almost
    exclusively the lowest common denominator,
    unqualified Dublin Core, makes it difficult to
    implement more advanced search interface
    features.
  • Content providers should prefer a more expressive
    metadata schema, like MARC or qualified DC, and
    find ways to augment human-generated descriptive
    metadata.

40
Sompel's Digital Library Approach
41
Sompel's Approach: Hierarchy steps
http://msc.mellon.org/Meetings/Interop/lagoze_data_model.pdf
42
Sompel's DL: Data Model
msc.mellon.org/Meetings/Interop/lagoze_data_model.pdf
43
Ogsa-DAI
44
Ogsa-DAI Figure
http://www.globus.org/grid_software/data/dai.php
45
Perform Document
http://www.ogsadai.org.uk/documentation/ogsadai-wsi-2.2/doc/interaction/Perform.html
46
MCS
  • MCS presents a design for a Metadata Catalog
    Service that provides a mechanism for storing and
    accessing descriptive metadata attributes
  • Requirements: store domain-independent
    attributes and user-defined attributes, query
    with a set of attributes, query with a logical
    name, authentication, authorization and auditing
  • Allows users to discover data sets based on the
    value of descriptive attributes, rather than
    requiring them to know specific names or physical
    locations of data items
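
Attribute-based discovery can be reduced to a filter over a catalog that maps logical names to descriptive attributes. The catalog contents, attribute names and the `discover` helper below are invented for illustration and do not reflect the actual MCS schema or API.

```python
from typing import Dict, List

# logical name -> descriptive attributes (all values are illustrative)
catalog: Dict[str, Dict[str, str]] = {
    "faults-2004": {"domain": "geology", "region": "California", "format": "GML"},
    "faults-2005": {"domain": "geology", "region": "Utah", "format": "GML"},
    "molecules-a": {"domain": "chemistry", "format": "CML"},
}


def discover(entries: Dict[str, Dict[str, str]], **wanted: str) -> List[str]:
    """Return logical names whose attributes match every requested value."""
    return sorted(
        name for name, attrs in entries.items()
        if all(attrs.get(key) == value for key, value in wanted.items())
    )


geology_sets = discover(catalog, domain="geology")
california_gml = discover(catalog, region="California", format="GML")
```

The caller never supplies a file name or physical location; those would be resolved afterwards from the logical names returned.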

47
MCAT vs. MCS
  • MCAT can be used only with SRB
  • MCS can be used only in the OGSA architecture
  • MCAT stores both physical and logical addresses
  • MCS stores logical metadata attributes and
    handles that can be resolved by a data location
    or data access service.
  • Both can be extended to serve
    application-specific metadata, but they don't
    have a generalized way of doing that.

48
SRB
49
SRB
50
CLIENT
  • Example interaction with SRB using Scommands
  • Sinit
  • Start interaction with SRB
  • Spwd
  • Display current position within SRB repository
  • Smeta -i -I UDSMD0=author -I UDSMD1=bob
    myfile
  • Add metadata describing the author of the file
  • Smeta -i -I UDSMD0=author -I
    UDSMD1=arthur
  • Search for files with author metadata set to
    arthur
  • Sget myFile
  • Copy myFile from SRB to local storage
  • Sreplicate -S anotherResource myFile
  • Create a replica of myFile on anotherResource
  • Srm myFile
  • Remove myFile (and all replicas) from SRB
  • Sexit
  • End interaction with SRB