Mehran Sahami - PowerPoint PPT Presentation

1
A Web-based Kernel Function for Measuring the
Similarity of Short Text Snippets
  • Mehran Sahami
  • Timothy D. Heilman
2
Introduction
  • Wish to determine how similar two short text
    snippets are.
  • High degree of semantic similarity despite
    sharing no terms
  • United Nations Secretary General vs Kofi Annan
  • AI vs Artificial Intelligence
  • Share terms but differ in meaning
  • graphical models vs graphical interface

3
Related Work
  • Query expansion techniques
  • Other means of determining query similarity
  • Set overlap (intersection)
  • SVM for text classification
  • Latent Semantic Kernels (LSK)
  • Semantic Proximity Matrix
  • Cross-lingual techniques

4
A New Similarity Function
  • Let x represent a short text snippet, issued as
    a query to a search engine S
  • Let R(x) be the set of n retrieved documents
  • Compute the TFIDF term vector for each
    document in R(x)
  • Truncate each vector to include its m highest
    weighted terms

5
Normalize
  • Let C(x) be the centroid of the L2 normalized
    vectors
  • Let QE(x) be the L2 normalization of the centroid
    C(x)
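
The truncation and normalization steps on the last two slides can be sketched in Python as follows. This is a minimal illustration, not the presenters' implementation: documents are assumed to arrive as sparse {term: weight} dictionaries that already hold TFIDF weights, and the function names truncate, l2_normalize, and query_expansion are hypothetical.

```python
import math
from collections import defaultdict


def truncate(vec, m):
    """Keep only the m highest-weighted terms of a sparse vector."""
    top = sorted(vec.items(), key=lambda kv: kv[1], reverse=True)[:m]
    return dict(top)


def l2_normalize(vec):
    """Scale a sparse vector to unit Euclidean length (no-op for a zero vector)."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    if norm == 0.0:
        return dict(vec)
    return {t: w / norm for t, w in vec.items()}


def query_expansion(doc_vectors, m):
    """Return QE: the L2-normalized centroid of the truncated,
    L2-normalized document vectors."""
    centroid = defaultdict(float)
    n = len(doc_vectors)
    for vec in doc_vectors:
        for term, w in l2_normalize(truncate(vec, m)).items():
            centroid[term] += w / n
    return l2_normalize(centroid)
```

Calling query_expansion on the TFIDF vectors of the retrieved documents yields a unit-length sparse vector for the snippet.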

6
Kernel Function
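
The formula on this slide did not survive in the transcript. Assuming the kernel is the inner product of the two expanded vectors, K(x, y) = QE(x) · QE(y), a minimal Python sketch over sparse {term: weight} dictionaries could look like this. The name kernel and the dictionary representation are illustrative choices only.

```python
def kernel(qe_x, qe_y):
    """Inner product of two sparse vectors stored as {term: weight} dicts.

    Assumption: both inputs are already unit-length expansion vectors,
    so the result is a cosine similarity.
    """
    # Iterate over the shorter vector to minimise dictionary lookups.
    if len(qe_x) > len(qe_y):
        qe_x, qe_y = qe_y, qe_x
    return sum(w * qe_y.get(term, 0.0) for term, w in qe_x.items())
```

With non-negative weights and unit-length inputs, the score falls between 0 (no shared terms) and 1 (identical expansions).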
7
Initial Results with Kernel
  • Three genres of text snippet matching
  • Acronyms
  • Individuals and their positions
  • Multi-faceted terms

8
Acronyms
9
Individuals and their positions
10
Multi-faceted terms
11
Related Query Suggestion
  • Compute the kernel function K(u, q) for each
    query q in a repository Q
  • u is any newly issued user query
  • A repository Q of approximately 116 million
    popular user queries issued in 2003, determined
    by sampling anonymized web search logs from the
    Google search engine

12
Algorithm
  • Given: user query u and list of matched queries
    from repository Q
  • Output: list of queries to suggest
  • Initialize the suggestion list to empty
  • Sort kernel scores K(u, q) in descending order
    to produce an ordered list of corresponding
    queries
  • MAX is set to the maximum number of suggestions

13
Post-Filter
|q| denotes the number of terms in query q
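
The suggestion algorithm and post-filter can be sketched in Python as follows. The transcript does not show the filter condition itself, so the rule used here is an assumption: a candidate is kept only if the number of its terms not shared with an already accepted query (or with the user query) is at least half that query's term count. The names suggest and differs_enough are hypothetical.

```python
def differs_enough(candidate, kept):
    """ASSUMED post-filter rule (not shown in the transcript):
    |candidate| - |candidate ∩ kept| >= 0.5 * |kept|,
    where |q| counts the distinct whitespace-separated terms of q."""
    c_terms, k_terms = set(candidate.split()), set(kept.split())
    return len(c_terms) - len(c_terms & k_terms) >= 0.5 * len(k_terms)


def suggest(user_query, scored_queries, max_suggestions):
    """Greedy selection of suggestions.

    scored_queries: list of (query, kernel_score) pairs for the
    repository queries matched against user_query.
    """
    suggestions = []  # initialize the suggestion list
    # Sort by kernel score, highest first.
    ranked = sorted(scored_queries, key=lambda qs: qs[1], reverse=True)
    for query, _score in ranked:
        if len(suggestions) >= max_suggestions:  # MAX reached
            break
        # Compare against every accepted suggestion and the user query.
        if all(differs_enough(query, z) for z in suggestions + [user_query]):
            suggestions.append(query)
    return suggestions
```

Swapping in a different differs_enough predicate changes the filter without touching the ranking loop.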
14
Evaluation of Query Suggestion System
  • 1: suggestion is totally off topic.
  • 2: suggestion is not as good as original query.
  • 3: suggestion is basically same as original query.
  • 4: suggestion is potentially better than original
    query.
  • 5: suggestion is fantastic - should suggest this
    query since it might help a user find what
    they're looking for if they issued it instead of
    the original query.

15
Evaluations
16
Average ratings at various kernel thresholds
17
Average ratings versus average number of query
suggestions
18
Application in QA
  • K("Who shot Abraham Lincoln", "John Wilkes
    Booth") = 0.730
  • K("Who shot Abraham Lincoln", "Abraham Lincoln")
    = 0.597

19
Conclusion
  • A new kernel function for measuring the semantic
    similarity between pairs of short text snippets
  • Future work: improve the generation of query
    expansions, with the goal of improving the
    match score for the kernel function

20
Term Weighting Scheme
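  • The weight $w_{i,j}$ associated with the term $t_i$ in
    document $d_j$ is defined to be
    $w_{i,j} = tf_{i,j} \times \log(N / df_i)$
  • Where $tf_{i,j}$ is the frequency of $t_i$ in $d_j$
  • N is the total number of documents, and $df_i$ is
    the total number of documents that contain $t_i$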
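
As an illustration of this weighting, the sketch below computes sparse TFIDF vectors for a small tokenized corpus, assuming the form weight = tf × log(N / df). The natural logarithm is an arbitrary choice, since the slide does not state the base, and the function name tfidf_vectors is hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: list of token lists.
    Returns one sparse {term: weight} dict per document, with
    weight = tf * log(N / df) (natural log, an assumed base)."""
    n_docs = len(docs)
    # df[t]: number of documents that contain term t at least once.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    vectors = []
    for tokens in docs:
        tf = Counter(tokens)  # raw term frequency within this document
        vectors.append(
            {t: f * math.log(n_docs / df[t]) for t, f in tf.items()}
        )
    return vectors
```

A term that occurs in every document gets weight 0 under this scheme, because log(N / N) = 0.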

21
Lp Norm
  • Given by $\|x\|_p = \left( \sum_i |x_i|^p \right)^{1/p}$
  • Most common cases
  • p = 1: this is the L1 norm, which is also called
    the Manhattan distance
  • p = 2: this is the L2 norm, which is also called
    the Euclidean distance
  • p = ∞: this is the L∞ norm, also called the
    infinity norm or the Chebyshev norm
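
The three cases can be checked with a short Python sketch using the standard definition of the Lp norm, (Σ |x_i|^p)^(1/p), with the maximum absolute value as the limit for infinite p. The function name lp_norm is illustrative.

```python
import math


def lp_norm(x, p):
    """Lp norm of a sequence of numbers; p may be math.inf."""
    if p == math.inf:
        # Infinity (Chebyshev) norm: the largest absolute component.
        return max((abs(v) for v in x), default=0.0)
    return sum(abs(v) ** p for v in x) ** (1.0 / p)
```

For the vector (3, -4), the L1 norm is 7, the L2 norm is 5, and the infinity norm is 4.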