Title: Querying for Information Integration: How to go from an Imprecise Intent to a Precise Query?
1. Querying for Information Integration: How to go from an Imprecise Intent to a Precise Query?
- Aditya Telang
- Sharma Chakravarthy, Chengkai Li
2. Motivation
- Retrieve castles near London that are reachable by train in less than 2 hours
- Find 3-bedroom houses in Houston within 2 miles of a school and within 5 miles of a highway and priced under 250,000
- Retrieve French restaurants within 1 mile of the IMAX Theater in Dallas, Texas
3. Current Scenario
- Retrieve castles near London that are reachable by train in less than 2 hours
- Decision-making process: manually combine results to arrive at a decision
[Figure labels: London train schedules; Trains from London; Castles near London]
4. Ideal Scenario
Intent: Retrieve castles near London that are reachable by train in less than 2 hours
Information Integration System
Actual results for the intent
5. The InfoMosaic Approach
6. Query Specification
- Query: Castle within 2 hours by train from London
- Query: Bank within 1 mile of University of Texas, Arlington
7. How to specify a query?
- Search Method (e.g., Google)
- Just needs to search for the keyword in a set of documents
- Get the list of documents and post-process (rank, cluster, classify, etc.)
- In an integration scenario, this doesn't work:
- "bank" returns document set D1
- "University of Texas, Arlington" returns document set D2
- "1" (from "1 mile") is ignored
- Intersecting the returned documents will not generate the desired results
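
The failure described on this slide can be sketched with a toy inverted index. This is only an illustration under stated assumptions: the four documents, the `build_index` and `keyword_search` helpers, and the document ids are all hypothetical and are not part of InfoMosaic. The sketch shows that intersecting D1 ("bank") with D2 ("University of Texas, Arlington") returns documents that merely mention both phrases, with no notion of the "within 1 mile" constraint.

```python
from collections import defaultdict

# Toy corpus (hypothetical documents, for illustration only).
DOCS = {
    "d1": "bank of america branch locations in dallas",
    "d2": "university of texas arlington campus map",
    "d3": "student bank accounts offered at university of texas arlington",
    "d4": "river bank erosion study by university of texas arlington",
}


def build_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[token].add(doc_id)
    return index


def keyword_search(index, phrase):
    """Return ids of documents containing every token of the phrase."""
    tokens = phrase.lower().replace(",", "").split()
    result = None
    for token in tokens:
        postings = index.get(token, set())
        result = postings if result is None else result & postings
    return result or set()


index = build_index(DOCS)
d1 = keyword_search(index, "bank")                            # document set D1
d2 = keyword_search(index, "University of Texas, Arlington")  # document set D2
hits = d1 & d2                                                # D1 intersect D2

print(sorted(hits))
```

In this toy corpus the intersection contains a page about student accounts and a page about river-bank erosion. Neither answers "which banks are within 1 mile of the university", because the distance constraint never enters the search.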
8. How to specify a query?
- Database Method (e.g., SQL)
- Too rigid
- Need to know the database (or source) and its corresponding attributes
- SELECT T1.a1, T2.a2 FROM T1, T2 WHERE ...
- The Web is not organized as a database, hence an exact mapping between sources and attributes is neither feasible nor available
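
To make the rigidity concrete, here is a minimal sketch using Python's standard `sqlite3` module. The `castle` table, its columns, and the `try_query` helper are hypothetical and chosen only for illustration. The point is that a query succeeds only when the user already knows the exact attribute names of the source.

```python
import sqlite3

# In-memory database with a hypothetical schema (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE castle (name TEXT, location TEXT)")
conn.execute("INSERT INTO castle VALUES ('Windsor Castle', 'Windsor')")


def try_query(sql):
    """Run a query; return its rows, or an error string if the schema does not match."""
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.OperationalError as exc:
        return f"error: {exc}"


# Works: the user knows that the attribute is called "location".
ok = try_query("SELECT name FROM castle WHERE location = 'Windsor'")

# Fails: the user guessed "place", which this source does not define.
bad = try_query("SELECT name FROM castle WHERE place = 'Windsor'")

print(ok)
print(bad)
```

On the Web there is no catalog telling the user which of the two spellings a given source uses, which is the mismatch this slide points at.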
9. How to specify a query?
- Natural Language
- Ideal mechanism
- Inherently hard, considering the ambiguities of natural language
- "school": an institution for education, or a group of fish?
- Mechanisms such as Question-Answering frameworks focus on sophisticated language models built for specific domains independently
- This is an incorrect assumption in an integration scenario
10. Query Specification
SELECT castle.name, ... FROM castle_DB WHERE castle.location = 'London'
Result: relation containing tuples
Keyword search result: list of documents retrieved from the Web containing the text "castle near London"
11. Query Specification
SELECT castle. WHERE castle.place London
No idea about user intent: is "castle" a building, a move in chess, ...?
Information Integration
No idea about source, schema, attributes, etc. No
idea about how to pose a query
12. Proposed Approach
- Approach 1: refine-as-you-input
- Approach 2: verify-after-input
13. Approach 1: Refine-as-you-input
- Based on the most popular querying paradigm in use today: keyword search
- Input: set of keywords/concepts (e.g., castle, train, ...)
- Output: set of one or more precise structured queries
- Challenge:
- Keyword resolution: is a keyword an entity, an attribute, or a value?
- Generating a query from minimal information
- Problem:
- Could result in too many non-relevant queries
- Positive:
- Paradigm accepted by the Web, IR, and even DB communities!
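
The keyword-resolution challenge and the "too many queries" problem can be sketched as follows. Everything here is an assumption made for illustration: the `CATALOG` of possible roles, the role names, and the `resolve` and `candidate_interpretations` helpers are hypothetical and do not describe the actual InfoMosaic implementation. Each keyword may resolve to several roles, and every combination of roles would seed one candidate structured query.

```python
from itertools import product

# Hypothetical catalog: the role(s) each keyword could play (illustration only).
CATALOG = {
    "castle": [("entity", "castle")],
    "train": [("entity", "train"), ("attribute", "transport.mode")],
    "london": [("value", "castle.location"), ("value", "train.origin")],
}


def resolve(keyword):
    """Return every (role, target) pair the keyword may denote."""
    return CATALOG.get(keyword.lower(), [("unknown", keyword)])


def candidate_interpretations(keywords):
    """Enumerate one interpretation per combination of keyword roles."""
    options = [resolve(k) for k in keywords]
    return [dict(zip(keywords, combo)) for combo in product(*options)]


candidates = candidate_interpretations(["castle", "train", "London"])
for interpretation in candidates:
    print(interpretation)
```

The number of candidates is the product of the per-keyword ambiguities, so it grows quickly as keywords are added. That is why this approach needs the user to refine the input as they go.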
14. Approach 1: Refine-as-you-input
User Interaction
15. Approach 2: Verify-after-input
- Based on a rigorous method of formulating queries, similar to SQL
- Input: a user-filled template
- Output: a single precise query
- Problem:
- Users don't like filling in too many details
- Coming up with a unique template across domains is hard
- Positive:
- Less ambiguous
- Reduced number of user interactions
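
A template-driven formulation might look like the sketch below. The `QueryTemplate` and `Constraint` classes, their fields, and the SQL-like output format are hypothetical and chosen only to illustrate the idea of "filled template in, single precise query out". The slides do not specify the actual template.

```python
from dataclasses import dataclass, field


@dataclass
class Constraint:
    attribute: str
    operator: str
    value: object


@dataclass
class QueryTemplate:
    """Hypothetical user-filled template (illustration only)."""
    entity: str
    outputs: list
    constraints: list = field(default_factory=list)

    def missing_fields(self):
        """List the mandatory fields the user left empty."""
        missing = []
        if not self.entity:
            missing.append("entity")
        if not self.outputs:
            missing.append("outputs")
        return missing

    def to_query(self):
        """Verify the template, then emit exactly one SQL-like query string."""
        missing = self.missing_fields()
        if missing:
            raise ValueError(f"template incomplete: {missing}")
        select = ", ".join(f"{self.entity}.{o}" for o in self.outputs)
        query = f"SELECT {select} FROM {self.entity}"
        if self.constraints:
            # repr() quoting is for display only; it is not safe SQL escaping.
            where = " AND ".join(
                f"{self.entity}.{c.attribute} {c.operator} {c.value!r}"
                for c in self.constraints
            )
            query += f" WHERE {where}"
        return query


template = QueryTemplate(
    entity="castle",
    outputs=["name"],
    constraints=[Constraint("location", "=", "London")],
)
query = template.to_query()
print(query)
```

Because every slot is named, there is nothing left to disambiguate, which matches the "less ambiguous" point above. The cost is that the user has to fill each slot explicitly.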
16. Approach 2: Verify-after-input
17. Evaluation Plan
- Testing the approaches on an RDBMS where the schema and output are known
- Actual user studies
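
One way to score a generated query against a known output is sketched below. Using precision and recall is an assumption made here, since the slides do not name a metric. The row values and the `precision_recall` helper are placeholders for illustration.

```python
def precision_recall(returned, expected):
    """Compare rows returned by a generated query with the known-correct rows."""
    returned, expected = set(returned), set(expected)
    if not returned or not expected:
        return 0.0, 0.0
    correct = len(returned & expected)
    return correct / len(returned), correct / len(expected)


# Placeholder rows (not real data): the known output vs. a generated query's output.
expected_rows = {("Castle A",), ("Castle B",)}
returned_rows = {("Castle A",), ("Castle C",)}

precision, recall = precision_recall(returned_rows, expected_rows)
print(precision, recall)
```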
18. Related Work
19. Future Work
- Perform extensive experiments to validate the proposed approaches
- Address other issues in information integration
- Current focus: ranking [TelangDBRank07]
20. Thank You!