Muse:%20A%20System%20for%20Understanding%20and%20Designing%20Mappings - PowerPoint PPT Presentation

About This Presentation
Title:

Muse:%20A%20System%20for%20Understanding%20and%20Designing%20Mappings

Description:

Muse: A System for Understanding and Designing Mappings ... Muse Overview ... Muse-G probes every attribute in S ... – PowerPoint PPT presentation

Number of Views:130
Avg rating:3.0/5.0
Slides: 2
Provided by: Bog6
Category:

less

Transcript and Presenter's Notes

Title: Muse:%20A%20System%20for%20Understanding%20and%20Designing%20Mappings


1
Muse A System for Understanding and Designing
Mappings
Bogdan Alexe Laura Chiticariu Renée J.
Miller Daniel Pepper Wang-Chiew Tan
UC Santa Cruz U. of
Toronto UC Santa Cruz
Motivation
Muse Overview
  • Muse is a mapping design wizard that uses data
    examples to help designers understand, design and
    refine schema mappings
  • In Muse, the designer works with data examples
    rather than with complex specifications to
    understand the semantics of a mapping
  • Muse uses real data examples whenever possible,
    otherwise it constructs synthetic examples
  • Muse consists of two components Muse-G (design
    of desired nesting semantics for mappings) and
    Muse-D (choosing the desired interpretation of
    ambiguous mappings)
  • Schema mapping relationship between a source
    database schema and a target database schema
  • Designing a schema mapping is a fundamental
    problem in information integration
  • Specifying a semantically correct schema mapping
    is usually a complex task
  • Automatic tools can suggest potential mappings
  • Ensuring mapping correctness still requires
    intricate manual work
  • Few tools are available for helping a designer
    understand and design alternative mappings

Designing Nesting Semantics with Muse-G
  • Nesting semantics are expressed through grouping
    functions, which are defined for each nested set
    in the target schema
  • A grouping function is a form of Skolem
    function, with atomic attributes as parameters
  • Example grouping function from mapping m2
  • SKProjs(ltall attributes of c, p and e
    gt) target Project records are grouped
    according to the values of all attributes of the
    Company, Project and Employee source records

Step 2 Probing on the cname attribute
OrgDB Rcd Orgs Set of Org Rcd
oname Projects Set
of Project Rcd
pname
manager Employees Set of
Employee Rcd eid
ename
CompDB Rcd Companies Set of
Company Rcd cid
cname location Projects Set
of Project Rcd pid
pname cid
manager Employees Set of
Employee Rcd eid
ename contact
Scenario 2 OrgDB Orgs IBM
ProjectsSK(y) DB e4 WiFi
e6 SBC ProjectsSK(y) DB
e4 WiFi e6 Employees e4 John
e6 Kat
Example source Companies 11 IBM NY 14
SBC NY Projects P1 DB 11 e4 P4 WiFi 14
e6 Employees e4 John x234 e6 Kat x331
Target instances Scenario 1 OrgDB Orgs
IBM ProjectsSK(IBM,y) DB e4
SBC ProjectsSK(SBC,y) WiFi e6
Employees e4 John e6 Kat
f1
  • Example Designing the grouping function for the
    target Projects set
  • Suppose the set of possible arguments is S
    cid, cname, location
  • Muse-G probes every attribute in S
  • At each probe, a small carefully chosen source
    instance is considered, from which two
    differentiating target instances are obtained
    one includes the probed attribute in the grouping
    function (Scenario 1 below), and the other omits
    it (Scenario 2 below).

f2
y subset of NY
The designer chooses scenario 1 (includes cname
in the grouping function)
Step 3 Probing on the location attribute
m1 for c in CompDB.Companies exists o in
OrgDB.Orgs where c.cnameo.oname and
o.Projects SKProjs(c.cid,c.cname,c.location)
m2 for c in CompDB.Companies, p in
CompDB.Projects, e in CompDB.Employees
satisfy p.cidc.cid and e.eidp.manager
exists o in OrgDB.Orgs, p1 in o.Projects, e1 in
OrgDB.Employees satisfy
p1.managere1.eid where c.cnameo.oname
and e.eide1.eid and e.enamee1.ename
and p.pnamep1.pname and
o.Projects SKProjs(ltall attributes of c, p and
e gt) m3 for e in CompDB.Employees exists e1 in
OrgDB.Employees where e.eid e1.eid and
e.enamee1.ename
Example source Companies 11 IBM NY 13 IBM
SF Projects P1 DB 11 e4 P2 Web 13
e5 Employees e4 John x234 e5 Anna x888
Target instances Scenario 1 OrgDB Orgs
IBM ProjectsSK(IBM,NY) DB e4
IBM ProjectsSK(IBM,SF) Web e5
Employees e4 John e5 Anna
Scenario 2 OrgDB Orgs IBM
ProjectsSK(IBM) DB e4 Web
e5 Employees e4 John e5 Anna
The designer chooses scenario 2 (excludes
location from the grouping function)
Conclusion the desired grouping function for
Projects is SK(cname)
Extensions
Choosing Desired Mapping Interpretation with
Muse-D
CompDB Rcd Projects Set of
Project Rcd pid
pname manager
tech-lead Employees Set of
Employee Rcd eid
ename contact
OrgDB Rcd Projects Set of
Project Rcd pname
supervisor email
  • The mapping scenario on the left is ambiguous
    it can be interpreted in several ways
  • e.g. the project supervisor can be either the
    manager or the tech-lead
  • In total, there are four alternative
    interpretations
  • Key idea of Muse-D provide an example source
    instance to illustrate the four interpretations
    in a compact way
  • Muse-G can take advantage of constraints on the
    source schema (such as keys, and more generally,
    functional dependencies)
  • The designer can refine the desired nesting
    semantics incrementally

Example source Projects P1 DB e4
e5 Employees e4 John john_at_ibm e5 Anna
anna_at_ibm
Target instance Orgs Projects DB
John john_at_ibm Anna anna_at_ibm
Ambiguous mapping
ma for p in CompDB.Projects, e1 in
CompDB.Employees, e2 in
CompDB.Employees satisfy
e1.eidp.manager and
e2.eidp.tech-lead exists p1 in
OrgDB.Projects where p.pnamep1.pname and
(e1.enamep1.supervisor
or e2.enamep1.superviso
r) and
(e1.contactp1.email
or e2.contactp1.email)
Choice values for supervisor and email (the
designer makes one selection for each attribute)
Write a Comment
User Comments (0)
About PowerShow.com