
Title: Supporting Queries with Imprecise Constraints

1
Supporting Queries with Imprecise Constraints

Ullas Nambiar
Dept. of Computer Science
University of California, Davis

Subbarao Kambhampati
Dept. of Computer Science
Arizona State University
18th July, AAAI-06, Boston, USA
2
Dichotomy in Query Processing

IR Systems
User has an idea of what she wants
User query captures the need to some degree
Answers ranked by degree of relevance

Databases
User knows what she wants
User query completely expresses the need
Answers exactly matching query constraints

3
Why Support Imprecise Queries?
4
Others are following
5

What does Supporting Imprecise Queries Mean?

The Problem: Given a conjunctive query Q over a
relation R, find a set of tuples that will be
considered relevant by the user.
$\mathrm{Ans}(Q) = \{x \mid x \in R,\ \mathrm{Rel}(x \mid Q, U) > c\}$
where U denotes the user and c is a relevance threshold
Constraints:
Minimal burden on the end user
No changes to the existing database
Domain independent
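The problem statement amounts to filtering R by a relevance threshold and ranking what survives. The sketch below is a minimal illustration under stated assumptions: the user population U is folded into a caller-supplied relevance function, and `toy_rel` (fraction of exactly matched query attributes) is a hypothetical stand-in, not the relevance function AIMQ learns.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
# Rel(x | Q, U): the user population U is folded into the callable (assumption).
RelFn = Callable[[Row, Row], float]


def answers(relation: List[Row], query: Row, rel: RelFn, c: float) -> List[Row]:
    """Return {x in R : Rel(x | Q, U) > c}, most relevant first."""
    scored = [(rel(x, query), x) for x in relation]
    kept = [(score, x) for score, x in scored if score > c]
    # Sort on the score only; Python's sort is stable, so ties keep R's order.
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [x for _, x in kept]


def toy_rel(x: Row, query: Row) -> float:
    """Hypothetical relevance: fraction of query attributes matched exactly."""
    return sum(x.get(a) == v for a, v in query.items()) / len(query)


cars = [
    {"Model": "Camry", "Price": 10000},
    {"Model": "Camry", "Price": 12000},
    {"Model": "Civic", "Price": 9000},
]
q = {"Model": "Camry", "Price": 10000}
print(answers(cars, q, toy_rel, 0.4))  # the two Camrys, exact match first
```

Raising c trades recall for precision; the hard part, which the following slides address, is learning Rel without burdening the user.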

Relevance Assessment
6
Assessing the Relevance Function Rel(x|Q,U)

We looked at a variety of non-intrusive relevance
assessment methods
Basic idea is to learn the relevance function for
the user population rather than for single users
Methods:
From the analysis of the (sample) data itself
Allows us to understand the relative importance
of attributes, and the similarity between the
values of an attribute
(ICDE 2006, WWW 2005 poster)
From the analysis of query logs
Allows us to identify related queries, and then
throw in their answers
(WIDM 2003, WebDB 2004)
From co-click patterns
Allows us to identify similarity based on user
click patterns
(Under review)

7
Our Solution: AIMQ
8
The AIMQ Approach
For the special case where the base query returns an empty answer set, we start
with a relaxation that uses AFD analysis
9
An Illustrative Example

Relation: CarDB(Make, Model, Price, Year)
Imprecise query
Q :- CarDB(Model like Camry, Price like 10k)
Base query
Qpr :- CarDB(Model = Camry, Price = 10k)
Base set Abs
Make = Toyota, Model = Camry, Price = 10k, Year = 2000
Make = Toyota, Model = Camry, Price = 10k, Year = 2001
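The step from Q to Qpr to Abs can be sketched as: tighten every "like" into an equality, then run the precise query. This is a minimal illustration; the constraint encoding, the integer 10000 standing in for "10k", and the extra CarDB rows are assumptions (only the two base-set tuples come from the slide).

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]
Constraint = Tuple[str, str, Any]  # (attribute, operator, value): hypothetical encoding


def base_query(imprecise: List[Constraint]) -> Row:
    """Tighten each 'like' constraint into an equality to obtain Qpr."""
    precise: Row = {}
    for attr, op, value in imprecise:
        if op not in ("like", "="):
            raise ValueError(f"unsupported operator: {op}")
        precise[attr] = value
    return precise


def select(relation: List[Row], query: Row) -> List[Row]:
    """Conjunctive equality selection: tuples matching every binding."""
    return [t for t in relation if all(t.get(a) == v for a, v in query.items())]


# Toy CarDB: the first two rows are the slide's base set; the rest are hypothetical.
cardb = [
    {"Make": "Toyota", "Model": "Camry", "Price": 10000, "Year": 2000},
    {"Make": "Toyota", "Model": "Camry", "Price": 10000, "Year": 2001},
    {"Make": "Toyota", "Model": "Camry", "Price": 12000, "Year": 2002},
    {"Make": "Honda", "Model": "Accord", "Price": 10000, "Year": 2000},
]
q = [("Model", "like", "Camry"), ("Price", "like", 10000)]
qpr = base_query(q)            # {'Model': 'Camry', 'Price': 10000}
base_set = select(cardb, qpr)  # the two 10k Camrys
```

The 12k Camry is missed by Qpr even though a user asking for "Price like 10k" would likely accept it, which is why the base set is only the starting point.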

10
Obtaining Extended Set

Problem: Given the base set, find tuples from the
database that are similar to tuples in the base set.
Solution:
Consider each tuple in the base set as a selection
query.
e.g. Make = Toyota, Model = Camry, Price = 10k, Year = 2000
Relax each such query to obtain similar precise
queries.
e.g. Make = Toyota, Model = Camry, Year = 2000 (Price relaxed)
Execute the relaxed queries and keep tuples whose similarity
exceeds some threshold.
Challenge: Which attribute should be relaxed
first?
Make? Model? Price? Year?
Solution: Relax the least important attribute
first.
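The loop above can be sketched as follows. This is a minimal illustration, not the AIMQ implementation: the cumulative schedule (unbind one more attribute per step), the `frac_equal` similarity, the relaxation order and the extra tuples are all hypothetical.

```python
from typing import Any, Callable, Dict, Iterator, List

Row = Dict[str, Any]


def relaxations(base: Row, order: List[str]) -> Iterator[Row]:
    """Yield progressively relaxed selection queries built from one base tuple.

    `order` lists attributes from least to most important, so the least
    important attribute is unbound first. The last attribute is never unbound
    (that would select the whole relation)."""
    bound = dict(base)
    for attr in order[:-1]:
        bound.pop(attr, None)
        yield dict(bound)


def matches(t: Row, query: Row) -> bool:
    return all(t.get(a) == v for a, v in query.items())


def extended_set(relation: List[Row], base_set: List[Row], order: List[str],
                 sim: Callable[[Row, Row], float], threshold: float, k: int) -> List[Row]:
    """Collect up to k new tuples whose similarity to the base tuple that
    retrieved them exceeds the threshold."""
    found: List[Row] = []
    for base in base_set:
        for query in relaxations(base, order):
            for t in relation:
                if t in base_set or t in found or not matches(t, query):
                    continue
                if sim(base, t) > threshold:
                    found.append(t)
                    if len(found) >= k:
                        return found
    return found


def frac_equal(a: Row, b: Row) -> float:
    """Hypothetical similarity: fraction of attributes with equal values."""
    return sum(a[x] == b[x] for x in a) / len(a)


cardb = [
    {"Make": "Toyota", "Model": "Camry", "Price": 10000, "Year": 2000},  # base tuple
    {"Make": "Toyota", "Model": "Camry", "Price": 11000, "Year": 2000},
    {"Make": "Toyota", "Model": "Camry", "Price": 9000, "Year": 1999},
    {"Make": "Toyota", "Model": "Corolla", "Price": 8000, "Year": 2000},
    {"Make": "Honda", "Model": "Accord", "Price": 10000, "Year": 2000},
]
order = ["Price", "Year", "Model", "Make"]  # hypothetical least-to-most important
print(extended_set(cardb, cardb[:1], order, frac_equal, 0.4, k=10))
```

Because Make is relaxed last, the Honda tuple is never retrieved; a different order would change which tuples are even considered, which is why the relaxation order matters.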

11
Least Important Attribute

Definition: An attribute whose binding value,
when changed, has minimal effect on the values binding
other attributes.
Does not decide values of other attributes
Value may depend on other attributes
E.g. changing/relaxing Price will usually not
affect other attributes, but changing Model
usually affects Price
Dependence between attributes is useful to decide
relative importance
Approximate Functional Dependencies & Approximate
Keys
Approximate in the sense that they are obeyed by
a large percentage (but not all) of tuples in the
database
Can use TANE, an algorithm by Huhtala et al.
(1999)
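"Obeyed by a large percentage of tuples" can be made concrete with the g3 error used by TANE: the minimum fraction of tuples that must be removed for the dependency to hold exactly. The sketch below only scores one candidate AFD or key on a sample; it does not reproduce TANE's level-wise search over the attribute lattice, and the sample rows are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]


def afd_confidence(rows: List[Row], lhs: Sequence[str], rhs: str) -> float:
    """1 - g3 error of lhs -> rhs: fraction of tuples that can stay.

    Within each group sharing the same lhs values, only tuples carrying the
    most frequent rhs value can stay; the rest would have to be removed."""
    groups: Dict[tuple, Counter] = defaultdict(Counter)
    for r in rows:
        groups[tuple(r[a] for a in lhs)][r[rhs]] += 1
    kept = sum(c.most_common(1)[0][1] for c in groups.values())
    return kept / len(rows)


def key_confidence(rows: List[Row], attrs: Sequence[str]) -> float:
    """1 - g3 error of attrs as a key: one tuple per distinct value combination stays."""
    return len({tuple(r[a] for a in attrs) for r in rows}) / len(rows)


# Hypothetical sample: one Camry was entered with the wrong Make.
sample = [
    {"Make": "Toyota", "Model": "Camry"},
    {"Make": "Toyota", "Model": "Camry"},
    {"Make": "Toyota", "Model": "Camry"},
    {"Make": "Honda", "Model": "Camry"},
    {"Make": "Honda", "Model": "Accord"},
]
print(afd_confidence(sample, ["Model"], "Make"))  # 0.8: Model -> Make holds approximately
```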

12
Deciding Attribute Importance

Mine AFDs and Approximate Keys
Create a dependence graph using AFDs
The graph is strongly connected, hence a topological sort is not
possible
Using the Approximate Key with the highest support,
partition attributes into
Deciding set
Dependent set
Sort the subsets using dependence and influence
weights
Measure attribute importance as

Attribute relaxation order: all non-keys first,
then keys
Greedy multi-attribute relaxation
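The ordering rule (non-keys first, then keys) can be sketched as below. The importance formula from the slide is not available here, so the per-attribute importance values are hypothetical inputs, as is the choice of key; ties are broken by attribute name only to keep the output deterministic.

```python
from typing import Dict, List, Set, Tuple


def relaxation_order(attributes: List[str], key: Set[str],
                     importance: Dict[str, float]) -> List[str]:
    """All non-key (dependent) attributes first, then key (deciding) attributes;
    within each group, least important first."""
    def rank(a: str) -> Tuple[float, str]:
        return (importance[a], a)  # tie-break on name for determinism

    non_keys = sorted((a for a in attributes if a not in key), key=rank)
    keys = sorted((a for a in attributes if a in key), key=rank)
    return non_keys + keys


attrs = ["Make", "Model", "Price", "Year"]
key = {"Make", "Model"}  # hypothetical highest-support approximate key
weights = {"Make": 0.45, "Model": 0.30, "Price": 0.10, "Year": 0.15}  # hypothetical
print(relaxation_order(attrs, key, weights))  # ['Price', 'Year', 'Model', 'Make']
```

Key attributes are relaxed last even when their individual weights are low, reflecting that they decide the values of the other attributes.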

13
Tuple Similarity

Tuples obtained after relaxation are ranked
according to their similarity to the corresponding
tuples in the base set, computed as a weighted sum
of attribute value similarities,
where W_i are normalized influence weights, ΣW_i = 1,
i = 1 to |Attributes(R)|
Value Similarity
Euclidean for numerical attributes, e.g. Price,
Year
Concept Similarity for categorical attributes, e.g. Make,
Model
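A weighted-sum tuple similarity of this shape can be sketched as follows. Assumptions, all hypothetical: the weights, the learned categorical similarity table `vsim`, and the per-attribute `spans` used to turn a numeric distance into a similarity (for a single numeric attribute the Euclidean distance reduces to |a - b|).

```python
from typing import Any, Dict, FrozenSet, Set

Row = Dict[str, Any]


def numeric_sim(a: float, b: float, span: float) -> float:
    """Assumed form: 1 - |a - b| / span, clipped at 0 (span = hypothetical range)."""
    return max(0.0, 1.0 - abs(a - b) / span)


def tuple_sim(base: Row, t: Row, weights: Dict[str, float], categorical: Set[str],
              vsim: Dict[FrozenSet[Any], float], spans: Dict[str, float]) -> float:
    """Weighted sum of per-attribute value similarities; weights must sum to 1."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("influence weights must be normalised to sum to 1")
    total = 0.0
    for attr, w in weights.items():
        a, b = base[attr], t[attr]
        if attr in categorical:
            # categorical: learned concept similarity, 1.0 for identical values
            s = 1.0 if a == b else vsim.get(frozenset((a, b)), 0.0)
        else:
            # numerical: distance-based similarity
            s = numeric_sim(a, b, spans[attr])
        total += w * s
    return total


base = {"Make": "Toyota", "Model": "Camry", "Price": 10000, "Year": 2000}
cand = {"Make": "Toyota", "Model": "Corolla", "Price": 9000, "Year": 2000}
weights = {"Make": 0.4, "Model": 0.3, "Price": 0.2, "Year": 0.1}  # hypothetical W_i
vsim = {frozenset(("Camry", "Corolla")): 0.6}                     # hypothetical VSim table
spans = {"Price": 10000, "Year": 10}                               # hypothetical ranges
print(tuple_sim(base, cand, weights, {"Make", "Model"}, vsim, spans))  # approximately 0.86
```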

14
Categorical Value Similarity

Two words are semantically similar if they have a
common context (from NLP)
The context of a value is represented as a set of bags
of co-occurring values, called a Supertuple
Value Similarity: estimated as the percentage of
common (Attribute, Value) pairs
Measured as the Jaccard Similarity between the
supertuples representing the values

ST(Q_Make=Toyota)
Model | Camry: 3, Corolla: 4, ...
Year  | 2000: 6, 1999: 5, 2001: 2, ...
Price | 5995: 4, 6500: 3, 4000: 6
Supertuple for Concept Make=Toyota
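Supertuple construction and the Jaccard comparison can be sketched as below, treating each supertuple as one multiset of (Attribute, Value) pairs. This follows the "percentage of common (Attribute, Value) pairs" reading and weights all attributes equally; the original work may weight attribute bags differently. The sample rows are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Any, Dict, List

Row = Dict[str, Any]
SuperTuple = Dict[str, Counter]


def supertuple(rows: List[Row], attr: str, value: Any) -> SuperTuple:
    """Bags of values co-occurring with attr=value, one bag per other attribute."""
    st: SuperTuple = defaultdict(Counter)
    for r in rows:
        if r.get(attr) == value:
            for a, v in r.items():
                if a != attr:
                    st[a][v] += 1
    return st


def bag_jaccard(st1: SuperTuple, st2: SuperTuple) -> float:
    """Jaccard over multisets of (Attribute, Value) pairs: sum of mins / sum of maxes."""
    b1 = Counter({(a, v): n for a, bag in st1.items() for v, n in bag.items()})
    b2 = Counter({(a, v): n for a, bag in st2.items() for v, n in bag.items()})
    union = sum((b1 | b2).values())  # Counter '|' keeps the max count per pair
    return sum((b1 & b2).values()) / union if union else 0.0  # '&' keeps the min


rows = [  # hypothetical sample
    {"Make": "Toyota", "Model": "Camry", "Year": 2000},
    {"Make": "Toyota", "Model": "Corolla", "Year": 2001},
    {"Make": "Honda", "Model": "Accord", "Year": 2000},
    {"Make": "Honda", "Model": "Civic", "Year": 2001},
]
toyota = supertuple(rows, "Make", "Toyota")
honda = supertuple(rows, "Make", "Honda")
print(bag_jaccard(toyota, honda))  # 2 shared Year pairs out of 6 distinct pairs
```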
15
Value Similarity Graph
16
Empirical Evaluation

Goal
Evaluate the effectiveness of the query
relaxation and similarity estimation
Database
Used car database CarDB based on Yahoo Autos
CarDB(Make, Model, Year, Price, Mileage,
Location, Color)
Populated using 100k tuples from Yahoo Autos
Census Database from UCI Machine Learning
Repository
Populated using 45k tuples
Algorithms
AIMQ
RandomRelax: randomly picks an attribute to relax
GuidedRelax: uses the relaxation order determined
using approximate keys and AFDs
ROCK: RObust Clustering using linKs (Guha et al.,
ICDE 1999)
Computes Neighbours and Links between every pair of tuples
Neighbour: tuples similar to each other
Link: number of common neighbours between two
tuples
Clusters tuples having common neighbours
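The neighbour and link computation behind the ROCK baseline can be sketched as below. It covers only those two steps, not ROCK's agglomerative clustering with its goodness measure. Assumptions: Jaccard similarity over (Attribute, Value) pairs, a threshold theta of 0.5, hypothetical rows, and a tuple not being counted as its own neighbour.

```python
from itertools import combinations
from typing import Any, Callable, Dict, List, Set, Tuple

Row = Dict[str, Any]


def jaccard_rows(a: Row, b: Row) -> float:
    """Similarity of two categorical tuples: Jaccard over their (Attribute, Value) pairs."""
    sa, sb = set(a.items()), set(b.items())
    return len(sa & sb) / len(sa | sb)


def neighbours(rows: List[Row], sim: Callable[[Row, Row], float],
               theta: float) -> Dict[int, Set[int]]:
    """Tuples i and j are neighbours when sim >= theta (self not counted here)."""
    nbrs: Dict[int, Set[int]] = {i: set() for i in range(len(rows))}
    for i, j in combinations(range(len(rows)), 2):
        if sim(rows[i], rows[j]) >= theta:
            nbrs[i].add(j)
            nbrs[j].add(i)
    return nbrs


def links(nbrs: Dict[int, Set[int]]) -> Dict[Tuple[int, int], int]:
    """link(i, j) = number of common neighbours of tuples i and j."""
    return {(i, j): len(nbrs[i] & nbrs[j]) for i, j in combinations(sorted(nbrs), 2)}


rows = [  # hypothetical sample
    {"Make": "Toyota", "Model": "Camry", "Year": 2000},
    {"Make": "Toyota", "Model": "Camry", "Year": 2001},
    {"Make": "Toyota", "Model": "Corolla", "Year": 2000},
    {"Make": "Honda", "Model": "Civic", "Year": 1999},
]
nb = neighbours(rows, jaccard_rows, 0.5)
lk = links(nb)
```

Tuples 1 and 2 are not neighbours (similarity 0.2) yet share a link through tuple 0; links let ROCK group tuples using information beyond pairwise similarity.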

17
Efficiency of Relaxation
Guided Relaxation
Random Relaxation
18
Accuracy over CarDB

14 queries over 100K tuples
Similarity learned using a 25k sample
Mean Reciprocal Rank (MRR) estimated as
Overall, the high MRR shows the high relevance of
the suggested answers
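The exact MRR formula used in this evaluation did not survive extraction. For reference only, the sketch below implements the standard definition (average over queries of 1 divided by the rank of the first relevant answer); the evaluation may well use a variant, so this should not be read as the slide's formula.

```python
from typing import Hashable, List, Set


def mean_reciprocal_rank(rankings: List[List[Hashable]],
                         relevant: List[Set[Hashable]]) -> float:
    """Standard MRR: mean over queries of 1/rank of the first relevant answer (0 if none)."""
    if len(rankings) != len(relevant):
        raise ValueError("need one relevance set per ranking")
    total = 0.0
    for ranking, rel in zip(rankings, relevant):
        for pos, item in enumerate(ranking, start=1):
            if item in rel:
                total += 1.0 / pos
                break
    return total / len(rankings)


print(mean_reciprocal_rank([["a", "b", "c"], ["x", "y"], ["p"]],
                           [{"b"}, {"x"}, {"z"}]))  # (1/2 + 1 + 0) / 3 = 0.5
```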

19
Handling Imprecision & Incompleteness

Incompleteness in data
Databases are being populated by
Entry by lay people
Automated extraction
E.g. entering an "Accord" without mentioning
"Honda"

Imprecision in queries
Queries posed by lay users
Who combine querying and browsing

General Solution: Expected Relevance Ranking
Challenge: Automated & non-intrusive assessment
of Relevance and Density functions
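The slide names expected relevance ranking without giving a formula. One natural reading, stated here as an assumption: an incomplete tuple is scored by summing, over each possible value v of the missing attribute, the density P(v | observed values) times the relevance of the tuple completed with v. The density estimator below (conditional frequency of Make given Model among complete tuples), the exact-match relevance and the data are hypothetical stand-ins; assessing both functions automatically is exactly the challenge the slide poses.

```python
from collections import Counter, defaultdict
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Density = Callable[[Row], Dict[Any, float]]
RelFn = Callable[[Row, Row], float]


def make_density(rows: List[Row], target: str, given: str) -> Density:
    """Hypothetical density: P(target = v | given) from tuples where target is known."""
    counts: Dict[Any, Counter] = defaultdict(Counter)
    for r in rows:
        if r.get(target) is not None:
            counts[r[given]][r[target]] += 1

    def density(t: Row) -> Dict[Any, float]:
        c = counts.get(t[given], Counter())  # .get does not insert into the defaultdict
        n = sum(c.values())
        return {v: k / n for v, k in c.items()} if n else {}
    return density


def expected_relevance(t: Row, missing: str, density: Density,
                       rel: RelFn, query: Row) -> float:
    """Sum over completions v of P(v | t) * Rel(t with missing=v | Q)."""
    if t.get(missing) is not None:
        return rel(t, query)
    return sum(p * rel({**t, missing: v}, query) for v, p in density(t).items())


def exact_rel(t: Row, query: Row) -> float:
    """Hypothetical relevance: fraction of query attributes matched exactly."""
    return sum(t.get(a) == v for a, v in query.items()) / len(query)


rows = [
    {"Make": "Honda", "Model": "Accord"},
    {"Make": "Honda", "Model": "Accord"},
    {"Make": "Honda", "Model": "Accord"},
    {"Make": "Toyota", "Model": "Accord"},  # hypothetical data-entry error
    {"Make": "Toyota", "Model": "Camry"},
    {"Make": None, "Model": "Accord"},      # incomplete, as in the slide's example
]
density = make_density(rows, "Make", "Model")
query = {"Make": "Honda"}
ranked = sorted(rows, key=lambda t: expected_relevance(t, "Make", density, exact_rel, query),
                reverse=True)
```

Under these assumptions the incomplete Accord ranks just below the complete Honda tuples (expected relevance 0.75) instead of being dropped for failing Make = Honda.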
20
Handling Imprecision & Incompleteness
