A Principled Approach to Data Integration and Reconciliation in Data Warehousing

Presentation author: Alan Wessman

1
A Principled Approach to Data Integration and
Reconciliation in Data Warehousing
  • Diego Calvanese
  • Giuseppe De Giacomo
  • Maurizio Lenzerini
  • Daniele Nardi
  • Riccardo Rosati
  • Presented by Alan Wessman

2
Introduction
  • Problem: Acquire data from a set of sources for a
    particular application
  • Typical architecture: wrappers and mediators
  • Core problem: specify and implement mediators
  • Paper focus: data warehouses

3
Data Warehouse Integration
  • Most sources internal to organization
  • Need global corporate view of data
  • Conceptual model defines sources and data
    warehouse (local-as-view)
  • Three levels of architecture:
      • Conceptual: global model
      • Logical: query specifications for sources and
        warehouse
      • Physical: wrappers and mediators implementing
        query specifications

4
Architecture

5
Specifying Logical Schemas
  • For each table of source S, create an adorned
    query
  • Head: table name, columns
  • Body: content of table (query over conceptual
    model)
  • Adornment:
      • Domains (data types) of columns
      • Key attributes

6
Adorned Query Example
Halibut(Date, Price) <- Menu(Date, Halibut, Price)
    | Price :: Lira, Date :: JulianDate
Swordfish(Date, Price) <- Menu(Date, Swordfish, Price)
    | Price :: Lira, Date :: JulianDate
SushiMenu(TunaPrice, SquidPrice, Date) <-
    Menu(Date, Tuna, TunaPrice), Menu(Date, Squid, SquidPrice)
    | TunaPrice :: Yen, SquidPrice :: Yen, Date :: JulianDate

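
One way to picture an adorned query is as a small record holding the head, the body atoms, and the adornment. The sketch below is only an illustration: the class names `Atom` and `AdornedQuery`, the `check_adornment` helper, and the choice of `Date` as key are assumptions, not notation from the paper or the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    """One atom of a query body, e.g. Menu(Date, Tuna, TunaPrice).

    Constants and variables are both plain strings in this sketch.
    """
    relation: str
    args: tuple


@dataclass
class AdornedQuery:
    head: str        # table name
    columns: tuple   # columns of the table (head variables)
    body: list       # list of Atom: query over the conceptual model
    domains: dict    # adornment: column -> domain (data type)
    keys: tuple = () # adornment: key attributes (assumed here)

    def check_adornment(self):
        """Return (columns without a domain, columns not bound in the body)."""
        body_terms = {arg for atom in self.body for arg in atom.args}
        missing_domain = [c for c in self.columns if c not in self.domains]
        unbound = [c for c in self.columns if c not in body_terms]
        return missing_domain, unbound


# The SushiMenu table from the slide; the key is a hypothetical choice.
sushi_menu = AdornedQuery(
    head="SushiMenu",
    columns=("TunaPrice", "SquidPrice", "Date"),
    body=[
        Atom("Menu", ("Date", "Tuna", "TunaPrice")),
        Atom("Menu", ("Date", "Squid", "SquidPrice")),
    ],
    domains={"TunaPrice": "Yen", "SquidPrice": "Yen", "Date": "JulianDate"},
    keys=("Date",),
)

problems = sushi_menu.check_adornment()
```

This only checks that the adornment is syntactically complete; it does not attempt the consistency reasoning over the conceptual model.
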
7
Query Consistency
  • Let Q be an adorned query and B its body.
  • Let M be the conceptual model.
  • B is inconsistent w.r.t. M if, for every
    interpretation of M, the evaluation of B is empty
  • Q is inconsistent w.r.t. M if either B is
    inconsistent or the annotations are inconsistent
  • Inference techniques exist for checking query
    consistency

8
Interschema Correspondences
  • Specify how data in different schemas relates
  • Non-materialized relational tables (computed
    on-demand)
  • Like adorned queries, but annotations identify
    helper programs
  • Reusable by other correspondences

9
Interschema Correspondences
  • Three types of correspondence:
      • Conversion: how data from one source is
        converted into data fitting a different schema
      • Matching: how data from different sources
        matches
      • Reconciliation: how data from different sources
        is reconciled to become data in the warehouse

10
Conversion Correspondence
  • How data from one source is converted into data
    fitting a different schema
  • convert(x, y) <- conj(x, y, z)
    through program(x, y, z)
  • conj: Conjunctive query, specifies when
    conversion applies
  • program: Program that performs the conversion
  • x: Input tuple of values satisfying conditions
    for x in conj
  • y: Output tuple of values satisfying conditions
    for y in conj
  • z: Additional parameters required by program
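
As a rough illustration of the roles of conj, program, x, y, and z, the sketch below converts a (date, price in Lira) tuple into a (date, price in Yen) tuple. The function names, the applicability conditions, and the exchange rate are all hypothetical; the slides do not specify any concrete helper program.

```python
from decimal import Decimal


def conj(x, z):
    """Stand-in for the conjunctive query: says when the conversion applies.

    Here (an assumption): the price must be non-negative and the rate positive.
    """
    _date, price = x
    (rate,) = z
    return price >= 0 and rate > 0


def lira_to_yen(x, z):
    """Stand-in for the helper program: computes the output tuple y."""
    date, price = x
    (rate,) = z
    return (date, price * rate)


def convert(x, z):
    """Return the converted tuple y, or None when conj does not hold."""
    if not conj(x, z):
        return None
    return lira_to_yen(x, z)


# Made-up values: date 2451545, price 1000 Lira, rate 0.5 Yen per Lira.
y = convert((2451545, Decimal("1000")), (Decimal("0.5"),))
```

In the declarative form y is an argument of `convert`; the sketch returns it instead, which is simply the more natural shape for a function.
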

11
Matching Correspondence
  • How data from different sources matches
  • match(x1, ..., xk) <- conj(x1, ..., xk, z)
    through program(x1, ..., xk, z)
  • Differs from Conversion Correspondence in use of
    k tuples that may be matched
  • program: returns true if the k tuples match
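
A minimal sketch of a matching correspondence over k tuples follows. It assumes each tuple is (dish name, date) and that two tuples match when the names agree after normalization and the dates are equal; these rules, and the names `conj`, `names_agree`, and `match`, are illustrative assumptions only.

```python
def conj(tuples):
    """Stand-in for the conjunctive query: every tuple is (name, date) with a non-empty name."""
    return all(len(t) == 2 and bool(t[0]) for t in tuples)


def names_agree(tuples, z):
    """Stand-in for the helper program: true if the k tuples match.

    z carries one extra parameter, whether to ignore letter case.
    """
    (ignore_case,) = z

    def normalize(name):
        name = " ".join(name.split())  # collapse stray whitespace
        return name.lower() if ignore_case else name

    names = {normalize(t[0]) for t in tuples}
    dates = {t[1] for t in tuples}
    return len(names) == 1 and len(dates) == 1


def match(*tuples, z=(True,)):
    """True when conj holds and the program reports a match."""
    return conj(tuples) and names_agree(tuples, z)


same = match(("Tuna ", 10), ("tuna", 10), ("TUNA", 10))
different = match(("Tuna", 10), ("Squid", 10))
```
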

12
Reconciliation Correspondence
  • How data from different sources is reconciled to
    the warehouse
  • reconcile(x1, ..., xk, z) <- conj(x1, ..., xk,
    z, w)
    through program(x1, ..., xk, z, w)
  • z: Data warehouse tuple resulting from
    reconciliation
  • w: Additional parameters (like z in previous
    slides)
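
The sketch below shows one possible shape for a reconciliation program with k = 2. It assumes source tuples of the form (dish, date, price) and a single extra parameter in w naming the preferred source; the policy of trusting one source's price is a made-up example, not something the slides prescribe.

```python
def reconcile(x1, x2, w):
    """Return the warehouse tuple z, or None if the sources cannot be reconciled.

    x1, x2: tuples (dish, date, price) from two sources.
    w: one extra parameter, the index (1 or 2) of the preferred source.
    """
    dish1, date1, price1 = x1
    dish2, date2, price2 = x2

    # Stand-in for conj: both tuples must describe the same dish on the same date.
    if (dish1, date1) != (dish2, date2):
        return None

    # Stand-in for program: take the price from the preferred source.
    (preferred,) = w
    price = price1 if preferred == 1 else price2
    return (dish1, date1, price)


# Illustrative data only.
z = reconcile(("Tuna", 10, 500), ("Tuna", 10, 520), (2,))
```
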

13
Reusing Correspondences
  • A correspondence can be reused only if previously
    defined
  • Example 1:
      match(x, y) <- convert1(x, z),
        convert2(y, z), conj(x, y, z, w)
      through none
  • Example 2:
      reconcile(x, y, z) <- convert1(x,
        w1), convert2(y, w2), match1(w1,
        w2), convert3(w1, z), conj(x, y, z, w)
      through none
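
Example 2 can be read as plain function composition: since the correspondence is defined "through none", no new helper program is needed. The sketch below mirrors that structure; the tuple layouts and the bodies of `convert1`, `convert2`, `match1`, and `convert3` are hypothetical, and the trailing conj condition is omitted for brevity.

```python
def convert1(x):
    """Source 1 stores (name, price); normalize to a common (name, price) form."""
    name, price = x
    return (name.strip().lower(), price)


def convert2(y):
    """Source 2 stores (price, name); normalize to the same common form."""
    price, name = y
    return (name.strip().lower(), price)


def match1(w1, w2):
    """Two normalized tuples match when they name the same dish."""
    return w1[0] == w2[0]


def convert3(w1):
    """Turn a normalized tuple into the warehouse representation."""
    return {"dish": w1[0], "price": w1[1]}


def reconcile(x, y):
    """Built only from previously defined correspondences, with no program of its own."""
    w1, w2 = convert1(x), convert2(y)
    if not match1(w1, w2):
        return None
    return convert3(w1)


z = reconcile((" Tuna", 500), (520, "tuna"))
```
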

14
Specifying Mediators
  • Aim: Specify for each relation in warehouse how
    the tuples should be constructed from the sources
  • Task: Materialize a new relation T in the
    warehouse
  • Steps:
  • Specify T as an adorned query q <- q' | c1, ...,
    cn
  • Look for a rewriting of q in terms of queries q1,
    ..., qs corresponding to materialized views in the
    warehouse
  • Look for a rewriting of (what remains of q) in
    terms of queries corresponding to tables in the
    sources and the conversion, matching, and
    reconciliation correspondences
  • Resulting query is specification for the mediator
    for T

15
Computing the Rewriting
  • Rewriting typically needs to merge results of
    several queries
  • Produce a set of merging clauses of the form:
      merging tuple-spec_1 and ... and tuple-spec_n
      such that matching-condition
      into tuple-spec_t1 and ... and tuple-spec_tm
  • The algorithm generates a template; the designer
    specifies the "such that" and "into" parts, or
    writes custom merging clauses
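
A merging clause can be pictured as a filtered combination of the results of several queries. The sketch below handles the two-query case: `such_that` plays the role of the matching condition and `into` builds the output tuple. The function name `merge`, its signature, and the sample rows are assumptions made for illustration; the paper's rewriting algorithm is not reproduced here.

```python
def merge(rows1, rows2, such_that, into):
    """merging r1 and r2 such that such_that(r1, r2) into into(r1, r2)."""
    return [
        into(r1, r2)
        for r1 in rows1
        for r2 in rows2
        if such_that(r1, r2)
    ]


# Made-up query results, each row being (date, price).
tuna_rows = [(1, 500), (2, 510)]
squid_rows = [(2, 300), (3, 310)]

# Build SushiMenu(TunaPrice, SquidPrice, Date) rows for dates present in both.
sushi_menu = merge(
    tuna_rows,
    squid_rows,
    such_that=lambda t, s: t[0] == s[0],
    into=lambda t, s: (t[1], s[1], t[0]),
)
```

Leaving `such_that` and `into` as parameters reflects the division of labor on the slide: the template is fixed, and the designer supplies those two parts.
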

16
Conclusion
  • Start with conceptual model and several types of
    correspondences
  • Query rewriting algorithm generates mediator
    specifications
  • Designer fills in any remaining details
  • No empirical results