Interactive Query Formulation over Web ServiceAccessed Sources - PowerPoint PPT Presentation

About This Presentation
Title:

Interactive Query Formulation over Web ServiceAccessed Sources

Description:

CNET Computer. PCWorld Portals. Compatible Combinations. of Computers, Routers. and Printers. CNET's Top Combinations. CNET's Search Desktops. PCWorld's Product ... – PowerPoint PPT presentation

Number of Views:66
Avg rating:3.0/5.0
Slides: 32
Provided by: michailpe
Learn more at: https://cse.buffalo.edu
Category:

less

Transcript and Presenter's Notes

Title: Interactive Query Formulation over Web ServiceAccessed Sources


1
Interactive Query Formulationover Web
Service-Accessed Sources
  • Michalis Petropoulos
  • Alin Deutsch
  • Yannis Papakonstantinou

ACM SIGMOD, June 2006
2
Large-Scale Data Integration Systems
?
Web Domain
Web Forms Reports
  • CNETs Top Combinations
  • CNETs Search Desktops
  • PCWorlds Product Finder

End User
?
Application Domain
?
?
  • CNET Computer
  • PCWorld Portals

Developer
Application
Application
?
Integration Domain
Mediator
Integrated Schema
Compatible Combinations of Computers, Routers an
d Printers
Integration Engineer
Source Domain
  • Dell Computers by CPU
  • Cisco Routers by Rate
  • HP Printers by Speed

?
Web Service
Web Service
Web Service

Source Owner
Data Source
Data Source
  • Dell Computers
  • Cisco Routers
  • HP Printers

Source Schema
Source Schema

3
Large-Scale Data Integration Systems
?
Web Domain
Web Forms Reports
End User
?
What queries can the mediator answer for me? CL
IDE
Application Domain
?
?
Developer
Application
Application
?
Integration Domain
Mediator
Integrated Schema
Integration Engineer
Source Domain
?
Web Service
Web Service
Web Service

Source Owner
Data Source
Data Source
Source Schema
Source Schema

4
Running Example
Parameterized Views
Dell
Cisco
  • Schema
  • Computers(cid, cpu, ram, price)
  • NetCards(cid, rate, standard, interface)
  • Views
  • V1 ComByCpu(cpu) ? (Computer)
  • SELECT DISTINCT Com1.
  • FROM Computers Com1
  • WHERE Com1.cpucpu
  • V2 ComNetByCpuRate(cpu, rate) ?
  • (Computer, NetCard)
  • SELECT DISTINCT Com1., Net1.
  • FROM Computers Com1, Network Net1
  • WHERE Com1.cidNet1.cid

Schema Routers(rate, standard, price, type)
Views V3 RouWired() ? (Router) SELECT
DISTINCT Rou1. FROM Routers Rou1 WHERE Rou1.t
ype'Wired' V4 RouWireless() ? (Router) SE
LECT DISTINCT Rou1. FROM Routers Rou1 WHERE R
ou1.type'Wireless'
Wired Routers
Computers for a given cpu
Wireless Routers
Computers NetCards for a given cpu rate
  • Conjunctive Queries CQ
  • Equality Comparison Conditions
  • Parameters

5
Running Example
Integrated Schema
?
?
Developer
Application
  • Integrated schema puts togetherthe Dell and
    Cisco schemas
  • Attribute Associations
  • (Computers.cid, NetCards.cid)
  • (NetCards.rate, Routers.rate)
  • (NetCards.standard, Routers.standard)

Mediator
Integrated Schema
V1
V3
V2
V4
Dell
Cisco
6
Sophisticated Mediators MakeFeasible Queries
Hard to Predict
  • Feasible Queries FQ
  • Equivalent CQ query rewritings using the views
  • Might involve more than one views
  • Order might matter

Feasible
Query Get all P4 Computers, together with thei
r NetCards and their compatible Wireless Route
rs
Query Get all Computers
Infeasible
E
Mediator
B
D
Mediator
A
C
RouWireless()
ComNetByCpuRate(P4, 10)
V1
ComNetByCpuRate(P4, 54)
V4
V2
7
Problem
  • Large number of sources
  • Large number of views (web-services)
  • Mediator capabilities
  • Developer formulates an application query
  • Is an application query feasible?
  • If not, how do I know which ones are feasible?
  • Previous options
  • The developer had to browse the view definitions
    and somehow formulate a feasible query
  • Or formulate queries until a feasible one is
    found(trial-and-error)
  • No system-provided guidance

8
The CLIDE Solution
CLIDE
?
?
Developer
Application
  • A query formulation interface, which
    interactively guides the developer toward
    feasible queries by employing a coloring scheme

Mediator
Integrated Schema
V1
V3
V2
V4
Dell
Cisco
9
QBE-Like Interfaces
Microsoft SQL-Server
10
CLIDE Interface
Feasibility Flag
Selection Boxes
  • Table, selection, projection and join actions
  • Feasibility Flag
  • Color-based suggestions

11
Example Interaction
Snapshot 1
  • Yellow ? required action
  • All feasible queries require this action
  • White ? optional action
  • Feasible queries can be formulatedw/ or w/o
    these actions

12
Example Interaction
Snapshot 2
  • Blue ? required choice of action
  • At least one feasible query cannot be formulated
    unless this action is performed

C
Mediator
A
ComByCpu(P4)
B
V1
13
Example Interaction
Snapshot 2
  • Blue ? required choice of action
  • At least one of them is required to be taken in
    order to reach a feasible query
  • Join Lines
  • Only yellow and blue are displayed
  • Must appear in Attribute Associations

14
CLIDE Properties
  • Completeness of Suggestions
  • Every feasible query can be formulated by
    performing yellow and blue actions at every step
  • Minimality of Suggestions
  • At every step, only a minimal number of actions
    is suggested, i.e., the ones that are needed to
    preserve completeness
  • Rapid Convergence By Following Suggestions
  • The shortest sequence of actions from a query to
    any feasible query consists of suggested actions

15
Example Interaction
Snapshot 3
  • ? any other constant
  • Red ? prohibited action
  • Does not appear in any feasible query
  • Lead to Dead End state

16
Example Interaction
Snapshot 4
F
Mediator
A
D
RouWireless()
ComNetByCpuRate(P4, rate)
B
E
V4
V2
17
CLIDE Properties
  • Completeness of Suggestions
  • Every feasible query can be formulated by
    performing yellow and blue actions at every step
  • Minimality of Suggestions
  • At every step, only a minimal number of actions
    is suggested, i.e., the ones that are needed to
    preserve completeness
  • Rapid Convergence By Following Suggestions
  • The shortest sequence of actions from a query to
    any feasible query consists of suggested actions

18
Interaction Graph
Join Action
Table Action
Selection Action
Com1.cidNet1.cid
Com1.cpuP4
Com1
Com1.ram
Rou1
Com1.price
Net1









  • Nodes are queries
  • One for each q?CQ
  • Edges are actions
  • Table, selection, projection and join actions
  • Green nodes are feasible queries
  • Infinitely big structure
  • All CQ queries
  • All possible combinations of actions formulating
    them

19
Interaction Graph Colors
  • Yellow action ?
  • Every path from current node n to a feasible node
    contains ?
  • Blue action ?
  • At least one path from current node n to a
    feasible node nF contains ?
  • There is not path from n to nF that contains a
    feasible node
  • Red action ?
  • No path to a feasible node contains ?

Current Node
Com1.cid

Com1.cpu

Current Node
Com1.cid


Com1.cpu
Com1.cpu
Net1
Com1.cidNet1.cid
Net1.rate54Mbps
Com1.cidNet1.cid




Com1.ram
Com1.cidNet1.cid


Net1.rate54Mbps
Net1.rate54Mbps
Com1.price



Com1.cpu
Rou1
Net1.rateRou1.rate
Com1.cidNet1.cid
Rou1
Net1






Com2
Rou1
Com2

Com2
Com2.cidNet1.cid
Com2.cpuP4
Net1.rate54Mbps





20
CLIDE Properties
  • Rapid Convergence By Following Suggestions
  • The shortest path from the current node to any
    feasible node consists of suggested actions

21
CLIDE Architecture
?
Actions
Front-End
User
Current Query
Colored Actions Feasibility Flag
Back-End
Color Algorithm
Seed Queries SQ
Parameters Algorithm
Closest Feasible Queries FQC
Closest Feasible Queries Algorithm
Aliases Collapse Rule
Minimal Feasible Extension Queries
Maximally-Contained Rewriter
Views
Schemas
Attribute Associations
  • Back-End invoked every time the user performs an
    action
  • i.e., the user arrives at a new node in the
    interactions graph

22
Color DeterminedBy a Finite Set of Feasible
Queries
Challenge Infinitely Many Feasible Queries
?
Radius



n


Closest Feasible Queries FQC

Solution
  • Finite Set of Feasible Queries
  • Closest Feasible Queries FQC
  • FQC is sufficient to color actions

Challenge How far can the Closest Feasible
Queries FQC be?
23
Closest Feasible Queries FQC Algorithm
Solution Based on Maximally Contained Queries
FQMC
Maximally Contained Queries FQMC
pL Radius



n


Closest Feasible Queries FQC
  • Compute maximally contained queries FQMC
  • Radius pL is the longest path to a maximally
    contained query
  • Theorem All FQC queries are reachable via a
    path of length p ? pL

Challenge The set of nodes within pL can be too
many to explore
24
Closest Feasible Queries FQC Algorithm
Solution Find the Closest Feasible Queries
Directly
Maximally Contained Queries FQMC



n


Closest Feasible Queries FQC

More feasible nodes
  • Theorem All queries in FQMC are in FQC
  • But not all queries in FQC are in FQMC
  • Collapse Aliases to compute FQC \ FQMC

25
Remaining Issues of the Algorithm
  • Color remaining actions white or red
  • Handling projections
  • Dealing with parameterized queries

26
CLIDE Implementation Issues
  • Maximally contained queries need to be
    minimized(affect CLIDEs rapid convergence and
    minimality)
  • ? Implemented the minimization module from
    scratch
  • ? MiniCon computes mappings between the query
    and the views
  • ? Exposed these mappings to guide minimization
  • Optimizations to bring the response time below 3
    sec
  • ? Amortize the computation across multiple
    interactions steps by observing that query is
    changed minimally at every step
  • ? Instead of calling MiniCon from scratch at
    every step, we incrementally compute the closest
    feasible queries

27
Thank you
  • Play with our demo
  • http//www.clide.info

28
Maximally Contained Queries FQMC
Maximally Contained Query
Query Q2 Get all Computers with a given cpu
Query Q1 Get all Computers
  • Assuming fixed SELECT clause (projection list)
  • Covered extensively in literature
  • MiniCon, Bucket, InverseRules Algorithms
  • FQMC is finite

29
Closest Feasible Queries FQC Algorithm
Solution Collapse Aliases
Maximally Contained Feasible Queries FQMC

Closest Feasible Queries FQC

n



  • Collapse Aliases to compute FQC \ FQMC
  • Check satisfiability

30
Color Algorithm
  • Yellow and Blue
  • An action ? is colored based on which closest
    feasible queries it appear in
  • Yellow, if ? appears in all queries in FQC
  • Blue, if ? appears in at least one (but not all)
    query in FQC
  • White and Red
  • Attach Maximum Projection Lists to Closest
    Feasible Queries
  • Projections that can be added to a feasible
    query, without compromising feasibility
  • Projection ? is white if in the maximum
    projection list
  • Color selections based on projections

31
CLIDE Implementation Optimizations
  • Views expansion introduce redundancy
  • Affects CLIDEs rapid convergence and minimality
  • Efficient containment test crucial to redundancy
    removal

32
CLIDE Performance
Chains of Stars
  • Queries

A-span 7 B-span 3 Selections 4,6,8,10
A
Write a Comment
User Comments (0)
About PowerShow.com