Expressing and Optimizing the Similaritybased Queries In SQL - PowerPoint PPT Presentation

1 / 19
About This Presentation
Title:

Expressing and Optimizing the Similaritybased Queries In SQL

Description:

Photo. Age. ID. Animal. UDP: written by users, supported by DBMS. UDT: ... A quick review of RDBMS and SQL. http://mason.gmu.edu/~xywang/infs797/lecture1.pdf ... – PowerPoint PPT presentation

Number of Views:25
Avg rating:3.0/5.0
Slides: 20
Provided by: LG31
Category:

less

Transcript and Presenter's Notes

Title: Expressing and Optimizing the Similaritybased Queries In SQL


1
Expressing and Optimizingthe Similarity-based
QueriesIn SQL
  • by Like Gao (lgao_at_gmu.edu)
  • ISE Department
  • GMU
  • Advisor Prof. X. Sean Wang

2
What is similarity-based search?
3
Example time series case
  • Similarity Euclidean Distance, Correlation
    Coefficient
  • Similarity-based Search

Dissimilar Distance2.5
Similar Distance0.5
t
t
Pattern set
Query object
Nearest
Near
4
Example not similarity-based search!
5
Similarity-based Search is content based!
6
Query optimization in DBMS? What is
available/unavailable?
7
Query Optimization
Select ID From Employee Where Salary gt
100K And Age lt 40
Query Optimization is to find the query
execution plan that has the smallest cost
Employee
Supported by current DBMS!
8
UDT, UDF(P) are supported
Select ID From Animal Where Photo is most like
???.jpg And animal in Photo is puppy And Age
lt 4
UDP written by users, supported by DBMS
Animal
UDT supported by DBMS i.e., BLOB
9
But query optimization?
Select ID From Animal Where UDP1
(similarity-based search 1) And UDP2
(similarity-based search 2) And Age lt 4
The optimization for similarity-based queries can
not be properly optimized by current
DBMS! (traditional DBMS evaluates UDP as early as
possible!)
10
Summary
RDBMS
Build-in Optimization Algorithms
?
Similarity-Based Queries (SQL)
Answer
Similarity-functions (UDP), provided by users
11
Expressing and Optimizingthe Similarity-based
QueriesIn SQL
End of Part 1
12
Outline
  • RDBMS and SQL
  • Query Optimization
  • Similarity-Based Search
  • Expressing and Optimizing Similarity- Based
    Queries

13
RDBMS and SQL
  • RDBMS (Relational Database Management System)
  • Based on E. F. Codds relational model (theory),
    1969
  • Venders IBM DB2, ORACLE, Microsoft SQL Server,
    etc.
  • http//www.cs.yale.edu/homes/avi/db-book/i
    ndex.html
  • SQL (Structured Query Language)
  • The standard language used by RDBMS
  • http//sqlzoo.net/
  • A quick review of RDBMS and SQL
  • http//mason.gmu.edu/xywang/infs797/lectu
    re1.pdf

14
Query Optimization
If most of employees salaries are very likely
gt100K, ? sequential scan
Select ID From Employee Where Salary gt 100K
If 95 of the employees salaries are unlikely
gt100K, ? using the index on Salary
Execution Engine
Optimizer ?SSCAN ?Index
Plan
IDs
Query Optimization is to find the plan that has
the smallest execution cost
Employee Data
Statistics About Employee
15
Similarity-Based Search
  • Nearest Neighbor
  • Given a set of pattern objects P and a query
    object q, an object pi in P such that sim(pi,q) lt
    sim(pj,q), for any j?i.
  • k-nearest neighbors
  • Near Neighbors
  • Given a set of pattern objects P, a query object
    q and a threshold TH, all object pi in P such
    that sim(pi,q) lt TH.
  • Both nearest and near neighbors

16
Example time series case
  • Similarity Euclidean Distance, Correlation
    Coefficient
  • Similarity-based Search

similar
dissimilar
t
t
pi
Nearest
q
P
pi1
Near
pi2
17
Example not similarity-based search!
18
Similarity-Based Queries in SQL
  • Current DBMS support UDT and UDF.
  • (Uuser, D defined, T type and F function)
  • Express the similarity predicate as UDP
    (user-defined-predicate)
  • NN(query object, one given object, pattern set,
    K, TH) ? T/F?

PatternSet
Select P.ID From PatternSet P Where NN(q,
P.ID, Set1, 1, 0.2) And P.ID lt 1000
19
Optimizing
Select P.ID From PatternSet P Where NN(q,
P.ID, Set1, 1, 0.2) And P.ID lt 1000
Only a few P.ID lt 1000
Plan A 1) P.ID lt 1000 2) NN(q, P.ID,
Set1, 1, 0.2)
Many P.ID lt 1000
Plan B 1) NN(q, P.ID, Set1, 1, 0.2) 2)
P.ID lt 1000
Write a Comment
User Comments (0)
About PowerShow.com