Keyword Search - PowerPoint PPT Presentation

1
Keyword Search
  • Su Zhan
  • suzhan_at_fudan.edu.cn

2
Introduction
  • What is keyword search?
  • Presented works:
    • Effective keyword search in relational databases
    • BLINKS: ranked keyword searches on graphs

3
Effective keyword search in relational databases.
  • Fang Liu
  • Clement T. Yu
  • Weiyi Meng
  • Abdur Chowdhury

4
Outline
  • Introduction
  • Answer generation
  • Background in IR ranking
  • Novel ranking strategy for relational databases
  • Experiment results
  • Conclusion

5
Introduction
6
Introduction
  • Suppose a user is looking for albums titled 'off
    the wall' but he/she cannot remember the exact
    title.
  • Select * from Album B
    where Contains(B.title, 'off wall', 1) > 0
    order by score(1) desc
  • Or simply the keyword query
  • 'off wall'

7
Introduction
  • Query 1: 'off wall'
  • Query 2: 'lyrics how come by D12'
  • Query 3: 'album by D12 and Eminem'
  • Tuple Tree 1: b2
  • Tuple Tree 2: a1 - ab1 - b1 - bs1 - s1
  • Tuple Tree 3: a1 - ab1 - b1 - ab2 - a2

8
Introduction
  • Three key steps for processing a given keyword query:
  • (1) Generate all candidate answers, each of which is
    a tuple tree formed by joining tuples from multiple
    tables.
  • (2) Compute a single score for each answer. The
    scores should be defined so that the most relevant
    answers are ranked as high as possible.
  • (3) Finally, return the answers with their semantics.
  • This paper focuses on search effectiveness, that
    is, step (2).
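The three steps can be illustrated with a toy pipeline. This is a minimal sketch, not the paper's system: the tiny Album/Song/AlbumSong data, the helper names (`generate_candidates`, `score`, `search`), and especially the placeholder score (distinct keywords covered divided by the number of tuples in the tree) are hypothetical.

```python
# Toy sketch of the three steps. Schema, data, and scoring are hypothetical
# placeholders, NOT the method of the paper.
ALBUMS = {"b1": "off the wall", "b2": "the eminem show"}   # Album: id -> title
SONGS = {"s1": "off the wall", "s2": "without me"}         # Song: id -> title
ALBUM_SONG = [("b1", "s1"), ("b2", "s2")]                  # AlbumSong join rows


def terms(text):
    """Naive whitespace tokenizer (assumption)."""
    return set(text.lower().split())


def generate_candidates(query):
    """Step 1: single tuples plus Album-AlbumSong-Song join trees,
    keeping only trees that contain at least one query keyword."""
    q = terms(query)
    trees = [((b,), [t]) for b, t in ALBUMS.items()]
    trees += [((s,), [t]) for s, t in SONGS.items()]
    trees += [((b, f"bs({b},{s})", s), [ALBUMS[b], SONGS[s]])
              for b, s in ALBUM_SONG]
    return [(ids, texts) for ids, texts in trees
            if q & set().union(*(terms(t) for t in texts))]


def score(query, tree):
    """Step 2 (placeholder): distinct keywords covered / tuples in the tree."""
    ids, texts = tree
    covered = terms(query) & set().union(*(terms(t) for t in texts))
    return len(covered) / len(ids)


def search(query):
    """Step 3: return candidate trees ranked by score, best first.
    sorted() is stable, so ties keep their generation order."""
    return sorted(generate_candidates(query),
                  key=lambda tree: score(query, tree), reverse=True)
```

With this toy data, `search("off wall")` ranks the single tuples b1 and s1 above the joined tree (b1, bs(b1,s1), s1). Replacing the placeholder `score` with a well-founded function is exactly the step (2) problem the paper addresses.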

9
Answer Generation
  • Schema Graph
  • Tuple Tree
  • Keyword Query
  • Answer
  • Query Tuple Set $R^Q$
  • Free Tuple Set $R^F$
  • Answer Graph
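The two tuple sets can be sketched over a relation stored as a list of dicts. The definitions are assumptions in the style of earlier systems such as DISCOVER: $R^Q$ holds the tuples of R that contain at least one query keyword, and $R^F$ holds all tuples of R, which can serve as connecting tuples when joining. The function names and the whitespace tokenizer are hypothetical.

```python
def query_tuple_set(relation, text_columns, query):
    """R^Q (assumed definition): tuples with at least one query keyword
    in some text column."""
    keywords = set(query.lower().split())
    return [row for row in relation
            if any(keywords & set(str(row[c]).lower().split())
                   for c in text_columns)]


def free_tuple_set(relation):
    """R^F (assumed definition): every tuple of the relation, usable as
    'glue' between query tuples in a tuple tree."""
    return list(relation)
```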

10
(No Transcript)
11
Background in IR ranking
  • Ranking Model in IR
  • 11-point precision and recall
  • Mean average precision
  • Reciprocal rank
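The evaluation measures listed above can be computed as follows. This is a sketch of the standard textbook definitions (interpolated 11-point precision, average precision averaged over queries, and the reciprocal rank of the first relevant answer); the function names are our own, and the slide's exact conventions were not transcribed.

```python
def average_precision(ranked, relevant):
    """Mean of precision@i over the ranks i of relevant results;
    relevant results that are never retrieved contribute 0."""
    hits, total = 0, 0.0
    for i, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """runs: list of (ranked_list, relevant_set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant result (0 if none is retrieved)."""
    for i, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / i
    return 0.0


def eleven_point_precision(ranked, relevant):
    """Interpolated precision at recall 0.0, 0.1, ..., 1.0: the maximum
    precision reached at any rank whose recall is >= that level."""
    points, hits = [], 0
    for i, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
        points.append((hits / len(relevant), hits / i))  # (recall, precision)
    return [max((p for r, p in points if r >= j / 10), default=0.0)
            for j in range(11)]
```

For example, for the ranked list a, b, c, d with relevant answers {a, c}, average precision is (1/1 + 2/3) / 2 ≈ 0.83 and the reciprocal rank is 1.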

12
Background in IR ranking
  • Ranking Model in IR
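The slide's formulas were not transcribed. As an assumption, a widely used IR ranking model is pivoted normalization weighting (Singhal et al.): Sim(Q, D) sums, over terms shared by Q and D, the query term frequency times a damped document tf, divided by a pivoted length factor, times an idf. A minimal sketch, with the common default slope s = 0.2 also assumed:

```python
import math
from collections import Counter


def pivoted_weight(tf, df, n_docs, doc_len, avg_doc_len, s=0.2):
    """Pivoted-normalization term weight: ntf / ndl * idf, with
    ntf = 1 + ln(1 + ln tf), ndl = (1 - s) + s * dl / avdl,
    idf = ln((N + 1) / df)."""
    if tf == 0:
        return 0.0
    ntf = 1 + math.log(1 + math.log(tf))
    ndl = (1 - s) + s * doc_len / avg_doc_len
    idf = math.log((n_docs + 1) / df)
    return ntf / ndl * idf


def similarity(query_terms, doc_terms, df, n_docs, avg_doc_len, s=0.2):
    """Sim(Q, D) = sum over shared terms k of qtf(k) * weight(k, D)."""
    qtf, dtf = Counter(query_terms), Counter(doc_terms)
    return sum(qtf[k] * pivoted_weight(dtf[k], df[k], n_docs,
                                       len(doc_terms), avg_doc_len, s)
               for k in qtf if dtf[k] > 0)
```

The pivoted factor ndl is 1 for a document of average length, so longer documents are penalized and shorter ones boosted, with s controlling how strongly.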

13
Novel ranking strategy for relational databases
  • Let T be a tuple tree and D1, D2, ..., Dm be all
    text column values in T. We define each text
    column value Di as a document and T as a
    super-document. Then we can compute a similarity
    value between the query Q and the super-document
    T, as shown in Formula 3, to rank tuple trees.
  • Our focus is on weight(k, T), the weight of a
    query keyword k in the super-document T.
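To make the super-document idea concrete, the sketch below treats every text column value of every tuple in T as one document Di and pools term frequencies over all of them, scoring each keyword with a damped tf times a caller-supplied idf. This naive, unnormalized score is an illustrative baseline, not the paper's weight(k, T); the column names and idf values are hypothetical.

```python
import math
from collections import Counter


def super_document(tuple_tree, text_columns):
    """Each text column value Di of each tuple is one document;
    the tuple tree T is the list of all of them (token lists)."""
    return [row[c].lower().split()
            for row in tuple_tree for c in text_columns if c in row]


def naive_tree_score(query, tuple_tree, text_columns, idf):
    """Unnormalized Sim(Q, T): term frequencies pooled over all Di,
    each keyword weighted by (1 + ln(1 + ln tf)) * idf."""
    tf = Counter(tok for doc in super_document(tuple_tree, text_columns)
                 for tok in doc)
    return sum((1 + math.log(1 + math.log(tf[k]))) * idf.get(k, 0.0)
               for k in set(query.lower().split()) if tf[k] > 0)
```

Joining a second tuple that repeats the keywords raises the score (2.0 for the one-tuple tree in the test versus about 3.05 for the two-tuple tree), which illustrates why pooling over a super-document needs normalization.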

14
Novel ranking strategy for relational databases
  • Four Normalizations
  • Tuple Tree Size Normalization
  • Document Length Normalization
  • Document Frequency Normalization
  • Inter-Document Weight Normalization

15
Novel ranking strategy for relational databases
  • Query: 'jojo leave lyrics'
  • b3 and s3 score higher than (b3, bs3, s3)!
  • Tuple Tree Size Normalization

16
Novel ranking strategy for relational databases
  • Query: 'how come'
  • Title and Lyrics score the same!
  • Global average?
  • Document Length Normalization
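One reading of this slide: with the usual pivoted factor ndl = (1 - s) + s·dl/avdl computed per column, a title of typical title length and lyrics of typical lyrics length both get ndl = 1, so a match in either scores the same. A possible remedy, assumed here rather than taken from the transcript, multiplies ndl by (1 + ln avdl) so that columns whose values are long on average count each match less. The function names and the lengths in the test are hypothetical.

```python
import math


def ndl_per_column(doc_len, column_avg_len, s=0.2):
    """Pivoted length factor using the column's own average length."""
    return (1 - s) + s * doc_len / column_avg_len


def ndl_with_column_penalty(doc_len, column_avg_len, s=0.2):
    """Hypothetical variant: an extra (1 + ln avdl) factor, so a match in a
    column of long values (e.g. lyrics) is divided by a larger ndl than a
    match in a column of short values (e.g. titles)."""
    return ndl_per_column(doc_len, column_avg_len, s) * (1 + math.log(column_avg_len))
```

Since term weights are divided by ndl, the larger penalized factor for the lyrics column lowers its weight relative to the title column, which the plain per-column factor cannot do.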

17
Novel ranking strategy for relational databases
  • idf
  • Document Frequency Normalization

18
Novel ranking strategy for relational databases
  • A term tends to appear more frequently in a tuple
    tree T with a larger size.
  • Inter-Document Weight Normalization
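When a keyword occurs in several documents Di of the same tuple tree, simply summing their weights favors larger trees. The sketch below shows one damping scheme, assumed rather than taken from the transcript: the largest per-document weight counts fully, and the remaining weight contributes only through a doubly logarithmic factor of sum/max. The function name is hypothetical.

```python
import math


def combine_document_weights(weights):
    """Combine weight(k, Di) over the documents of T into weight(k, T)
    as max_w * (1 + ln(1 + ln(sum_w / max_w))) (assumed damping)."""
    positive = [w for w in weights if w > 0]
    if not positive:
        return 0.0
    max_w, sum_w = max(positive), sum(positive)
    # sum_w / max_w >= 1, so both logarithms are >= 0.
    return max_w * (1 + math.log(1 + math.log(sum_w / max_w)))
```

A keyword found in a single document keeps its weight unchanged; two equal weights of 2.0 combine to about 3.05 instead of 4.0.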

19
Novel ranking strategy for relational databases
  • Schema terms in query
  • Value terms
  • Schema terms
  • Schema-based document frequency
  • Assign the largest document frequency value among
    all terms to df
  • Assign 1 to tf
  • What if k is both value term and schema term?
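The schema-term rule on this slide (tf = 1, and df = the largest document frequency of any term) can be sketched as follows. The idf form ln((N + 1)/df) and the tf damping 1 + ln(1 + ln tf) are assumptions carried over from standard pivoted weighting, and the term classifier is a hypothetical helper. A term that is both a schema term and a value term is only flagged, since the slide leaves that case open.

```python
import math


def schema_term_weight(all_dfs, n_docs):
    """Schema term: tf = 1 (so the damped tf is 1) and df = the largest df
    among all terms, i.e. the smallest idf. idf = ln((N + 1) / df) is assumed."""
    tf = 1
    df = max(all_dfs.values())
    ntf = 1 + math.log(1 + math.log(tf))
    return ntf * math.log((n_docs + 1) / df)


def classify_terms(query, schema_names, vocabulary):
    """Label each query term as 'value' (occurs in data), 'schema'
    (matches a table/column name), 'both', or 'none'."""
    kinds = {}
    for k in query.lower().split():
        in_schema, in_values = k in schema_names, k in vocabulary
        if in_schema and in_values:
            kinds[k] = "both"  # open question on the slide
        elif in_schema:
            kinds[k] = "schema"
        elif in_values:
            kinds[k] = "value"
        else:
            kinds[k] = "none"
    return kinds
```

For Query 3, 'album by D12 and Eminem', a schema with a table named Album would make 'album' a schema term, while 'd12' and 'eminem' are value terms.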

20
Novel ranking strategy for relational databases
  • Phrase-based Ranking
  • Utilize phrase-based ranking to improve
    effectiveness.
  • If a sub-query of Q, $P = k_i, k_{i+1}, \ldots, k_j$,
    where $i < j$, appears in a document D, and $k_{i-1}$
    does not appear in an adjacent location to $k_i$ in
    this occurrence of P in D, and $k_{j+1}$ does not
    appear in an adjacent location to $k_j$ in this
    occurrence of P in D, then we define it as an
    occurrence of the phrase P in D.
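The definition amounts to finding maximal runs of consecutive query terms in the document. A minimal sketch over token lists (the function name and the tokens k1..k4 and x are hypothetical): a run is extended to the right as long as the next query term keeps following, and skipped if the previous query term immediately precedes it, so every reported occurrence is maximal on both sides.

```python
def phrase_occurrences(query, doc):
    """Maximal occurrences in doc of sub-queries P = k_i..k_j with i < j.
    Returns (phrase, start_position) pairs in document order."""
    found = []
    for p, token in enumerate(doc):
        for i, k in enumerate(query):
            if k != token:
                continue
            # Left-maximal: k_{i-1} must not sit immediately before k_i.
            if i > 0 and p > 0 and doc[p - 1] == query[i - 1]:
                continue
            # Extend right while the next query term follows in the document;
            # stopping here guarantees k_{j+1} does not follow k_j.
            j = i
            while (j + 1 < len(query) and p + (j - i) + 1 < len(doc)
                   and doc[p + (j - i) + 1] == query[j + 1]):
                j += 1
            if j > i:  # a phrase needs at least two terms
                found.append((tuple(query[i:j + 1]), p))
    return found
```

On a document shaped like the overlap example of slide 22, with Q = k1, k2, k3, k4, it reports k1 k2 k3 once, k2 k3 k4 twice, and k1 k2 once; the trailing lone k1 is not a phrase because a phrase needs at least two terms.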

21
Novel ranking strategy for relational databases
22
Novel ranking strategy for relational databases
  • Suppose Q = 1, 2, 3, 4 and a document D in T is
    ... 1, 2, 3 ... 2, 3, 4 ... 2, 3, 4 ... 1, 2 ... 1 ...
  • 1, 2, 3 and 2, 3, 4 overlap
  • Choose the phrase with the highest weight.
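One way to read "choose the phrase with the highest weight" is a greedy selection: visit phrases by decreasing weight and keep a phrase only if it shares no query term with a phrase already kept. The sketch assumes the phrase weights are already computed; the function name and the weights in the test are made-up illustrations.

```python
def resolve_overlaps(phrase_weights):
    """Greedy overlap resolution (assumed reading): keep phrases in order of
    decreasing weight, dropping any that share a query term with one kept."""
    kept, used = [], set()
    for phrase, _ in sorted(phrase_weights.items(),
                            key=lambda kv: kv[1], reverse=True):
        if used.isdisjoint(phrase):
            kept.append(phrase)
            used.update(phrase)
    return kept
```

With hypothetical weights 2.5 for (1, 2, 3), 3.0 for (2, 3, 4) and 1.0 for (1, 2), only (2, 3, 4) survives, because both other phrases share term 2 with it.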

23
(No Transcript)
24
Novel ranking strategy for relational databases
  • concept set(CQ)
  • Phrase model
  • a document that contains only some highly
    weighted terms
  • Concept ranking model
  • concept similarity value Sim(CQ, T).
  • If (1) Sim(CQ, T1) > Sim(CQ, T2),
  • or (2) Sim(CQ, T1) = Sim(CQ, T2) and Sim(Q, T1) >
    Sim(Q, T2), then T1 is ranked higher than T2.
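The rule above is a lexicographic order: compare Sim(CQ, T) first and fall back to Sim(Q, T) only on ties. In Python this is a sort on a tuple key. The function name and the similarity values in the test are hypothetical inputs.

```python
def concept_rank(trees, sim_concept, sim_query):
    """Rank tuple trees by Sim(CQ, T), breaking ties with Sim(Q, T).
    sim_concept / sim_query map each tree to its precomputed similarity."""
    return sorted(trees, key=lambda t: (sim_concept[t], sim_query[t]),
                  reverse=True)
```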

25
Experiment results
26
Experiment results
27
Experiment results
28
Experiment results
29
Experiment results
30
Conclusion
  • Four normalizations:
    • tuple tree size normalization
    • document length normalization
    • document frequency normalization
    • inter-document weight normalization
  • The results show that:
    • all four new normalization factors are
      critical to search effectiveness;
    • phrase-based search and concept-based search
      improve effectiveness significantly;
    • our strategy is significantly better than related
      work and significantly outperforms Google.

31
BLINKS: Ranked Keyword Searches on Graphs
  • Hao He
  • Haixun Wang
  • Jun Yang
  • Philip S. Yu