Query Models - PowerPoint PPT Presentation

About This Presentation
Title:

Query Models

Description:

Example - Seeking Saturday entertainment. Queries: Dinner AND sports AND symphony ... The 'Holy Grail' of information retrieval. Issues in Natural Language ... – PowerPoint PPT presentation

Slides: 39
Provided by: clgiles
Category:
Tags: models | query


Transcript and Presenter's Notes

Title: Query Models


1
Query Models
  • Use
  • Types
  • What do search engines do

2
What we have covered
  • What is IR
  • Evaluation
  • Tokenization and properties of text
  • Web crawling
  • This time
  • Query models

3
A Typical Web Search Engine
[Figure: architecture diagram with the components Web, Crawler, Indexer, Index, Query Engine, Interface, and Users.]
4
Why the interest in Queries?
  • Queries are ways we interact with IR systems
  • Expression of an information need
  • Nonquery methods?
  • Types of queries?

5
Issues with Query Structures
  • Matching and ranking criteria
  • Given a query, what documents are retrieved?
  • In what order (rank)?

6
Types of Query Structures
  • Query Models (languages): most common
  • Boolean Queries
  • Extended-Boolean Queries
  • Natural Language Queries
  • Vector queries
  • Others?

7
Simple query language: Boolean
  • Earliest query model
  • Terms + Connectors (or operators)
  • terms
  • words
  • normalized (stemmed) words
  • phrases
  • thesaurus terms
  • connectors
  • AND
  • OR
  • NOT
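A minimal sketch of how terms and connectors can map onto set operations over an inverted index. The toy documents, the `build_index` helper, and the whitespace tokenization are assumptions made for illustration, not part of the slides.

```python
# Toy collection; document IDs and texts are invented for illustration.
docs = {
    1: "cat and dog",
    2: "dog with collar",
    3: "cat on leash",
}

def build_index(docs):
    """Map each term to the set of document IDs that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index.setdefault(term, set()).add(doc_id)
    return index

index = build_index(docs)
all_ids = set(docs)

cat = index.get("cat", set())
dog = index.get("dog", set())

cat_and_dog = cat & dog    # AND -> set intersection
cat_or_dog = cat | dog     # OR  -> set union
not_cat = all_ids - cat    # NOT -> complement within the collection
```

Stemming, phrases, and thesaurus terms would change how `build_index` produces terms; the connectors would stay the same.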

8
Simple query language: Boolean
  • Geek-speak
  • Variations are still used in search engines!
  • Ex: X AND Y, Y AND X

9
Truth Tables: Boolean Logic
Presence of P: P = 1. Absence of P: P = 0.
True = 1, False = 0.
10
Problems with Boolean Queries
  • How do you express your need in a Boolean
    query? (geek-speak)
  • No good way to weight terms for significance
  • Want music by Beethoven, preferably a sonata
  • Query?
  • Ranking?
  • Binary

11
Problems with Boolean Queries
  • Incorrect interpretation of Boolean connectives
    AND and OR
  • Example - Seeking Saturday entertainment
  • Queries
  • Dinner AND sports AND symphony
  • Dinner OR sports OR symphony
  • Dinner AND sports OR symphony

12
Order of precedence of operators
  • Example query: is A AND B the same as B AND A?
  • Why?

13
Sample Boolean Queries
  • Cat
  • Cat OR Dog
  • Cat AND Dog
  • (Cat AND Dog)
  • (Cat AND Dog) OR Collar
  • (Cat AND Dog) OR (Collar AND Leash)
  • (Cat OR Dog) AND (Collar OR Leash)

14
Satisfaction of Boolean Query
  • (Cat OR Dog) AND (Collar OR Leash)
  • Each of the following column combinations works
  • Cat x x x x
  • Dog x x x x x
  • Collar x x x x
  • Leash x x x x

Others?
15
Satisfaction of Boolean Query
  • (Cat OR Dog) AND (Collar OR Leash)
  • None of the following column combinations work
  • Cat x x
  • Dog x x
  • Collar x x
  • Leash x x
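The two satisfaction slides above can be checked by brute force. This sketch enumerates all 16 presence/absence combinations of the four terms and sorts them into those that satisfy (Cat OR Dog) AND (Collar OR Leash) and those that do not; the function and variable names are illustrative.

```python
from itertools import product

TERMS = ("cat", "dog", "collar", "leash")

def matches(cat, dog, collar, leash):
    """True when the term combination satisfies the query."""
    return (cat or dog) and (collar or leash)

satisfying, failing = [], []
for combo in product([False, True], repeat=4):
    # Record the combination as the tuple of terms that are present.
    present = tuple(t for t, flag in zip(TERMS, combo) if flag)
    (satisfying if matches(*combo) else failing).append(present)

print(len(satisfying), "combinations work;", len(failing), "do not")
```

Nine combinations satisfy the query (three ways to satisfy each OR group) and seven fail, which answers the "Others?" prompt.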

16
Boolean Logic
[Figure: diagram of two sets, A and B.]
17
Order of Precedence
  • Define order of precedence
  • Ex: a OR b AND c
  • Infix notation
  • Parentheses are evaluated first; operators of
    equal precedence are applied left to right
  • Next NOTs are applied
  • Then ANDs
  • Then ORs
  • a OR b AND c becomes
  • a OR (b AND c)
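One way to make the precedence rules above concrete is a small recursive-descent evaluator: parentheses first, then NOT, then AND, then OR. This is a sketch under assumptions (uppercase operators, single-word terms, no error handling for malformed queries), and the `evaluate` function is illustrative rather than any particular engine's parser.

```python
import re

def tokenize(query):
    """Split a query into parentheses and words."""
    return re.findall(r"\(|\)|[^\s()]+", query)

def evaluate(query, doc_terms):
    """Evaluate an infix Boolean query against one document's term set."""
    tokens = tokenize(query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_or():                 # lowest precedence
        value = parse_and()
        while peek() == "OR":
            advance()
            rhs = parse_and()       # always parse the right side
            value = value or rhs
        return value

    def parse_and():
        value = parse_not()
        while peek() == "AND":
            advance()
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not():                # binds tighter than AND and OR
        if peek() == "NOT":
            advance()
            return not parse_not()
        return parse_atom()

    def parse_atom():
        tok = advance()
        if tok == "(":
            value = parse_or()
            advance()               # consume the closing ")"
            return value
        return tok.lower() in doc_terms

    return parse_or()

doc = {"a"}
print(evaluate("a OR b AND c", doc))    # parsed as a OR (b AND c)
print(evaluate("(a OR b) AND c", doc))  # parentheses change the grouping
```

For a document containing only the term a, the first query matches and the second does not, so the grouping matters.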

18
Infix Notation
  • Usually expressed as INFIX operators in IR
  • ((a AND b) OR (c AND b))
  • NOT is UNARY PREFIX operator
  • ((a AND b) OR (c AND (NOT b)))
  • AND and OR can be n-ary operators
  • (a AND b AND c AND d)
  • Some rules (De Morgan revisited)
  • NOT(a) AND NOT(b) = NOT(a OR b)
  • NOT(a) OR NOT(b) = NOT(a AND b)
  • NOT(NOT(a)) = a
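The De Morgan rules and double negation listed above can be verified exhaustively, since each variable takes only two values. A short sketch, with `check` as an illustrative helper name:

```python
from itertools import product

def check(identity):
    """True if the identity holds for every truth assignment of (a, b)."""
    return all(identity(a, b) for a, b in product([False, True], repeat=2))

# NOT(a) AND NOT(b) = NOT(a OR b)
de_morgan_and = check(lambda a, b: ((not a) and (not b)) == (not (a or b)))
# NOT(a) OR NOT(b) = NOT(a AND b)
de_morgan_or = check(lambda a, b: ((not a) or (not b)) == (not (a and b)))
# NOT(NOT(a)) = a
double_negation = all((not (not a)) == a for a in (False, True))

print(de_morgan_and, de_morgan_or, double_negation)
```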

19
DNFs and CNFs
  • All queries can be rewritten as
  • Disjunctive Normal Forms (DNFs)
  • Conjunctive Normal Forms (CNFs)
  • DNF Constituents
  • Terms (words or phrases)
  • Conjuncts (terms joined by ANDs)
  • Disjuncts (conjuncts joined by ORs)
  • Ex: (A AND B) OR (A AND NOT C)
  • CNF Constituents
  • Terms (words or phrases)
  • Disjuncts (terms joined by ORs)
  • Conjuncts (disjuncts joined by ANDs)
  • Ex: (A OR B) AND (A OR NOT C)
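A sketch of why any Boolean query has a DNF: list every truth assignment that satisfies the query and OR together one full conjunct per assignment. The result is the canonical (unsimplified) DNF, so it is longer than the hand-written example on the slide but logically equivalent. The `to_dnf` helper is an assumption for illustration.

```python
from itertools import product

def to_dnf(variables, predicate):
    """Build the canonical DNF of a predicate from its truth table."""
    conjuncts = []
    for values in product([True, False], repeat=len(variables)):
        if predicate(*values):
            # One literal per variable: the term itself or its negation.
            literals = [name if value else f"NOT {name}"
                        for name, value in zip(variables, values)]
            conjuncts.append("(" + " AND ".join(literals) + ")")
    return " OR ".join(conjuncts)

# A AND (B OR NOT C) is equivalent to (A AND B) OR (A AND NOT C).
dnf = to_dnf(("A", "B", "C"), lambda a, b, c: a and (b or not c))
print(dnf)
```

A CNF can be built the same way from the assignments that fail the query, negating each literal and joining with OR inside and AND outside.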

20
Effect of CNFs
  • All complex Boolean queries can be simplified
  • Why do reference librarians like CNFs?
  • ANDs reduce the size of the set returned and are
    easily expandable
  • So do minuses (NOT terms)

21
Boolean Logic
[Figure: Venn-style diagram of terms t1, t2, t3 with documents D1 to D11 placed in its regions. The regions are labeled as minterms m1 to m8, each a combination of t1, t2, t3 present or negated; the negation marks did not survive extraction.]
22
Boolean Searching
Information need: Measurement of the width of cracks in
prestressed concrete beams
Concepts: Cracks (C), Beams (B), Width measurement (W),
Prestressed concrete (P)
Formal Query: cracks AND beams AND
Width_measurement AND Prestressed_concrete
Relaxed Query: (C AND B AND P) OR (C AND B AND
W) OR (C AND W AND P) OR (B AND W AND P)
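The relaxed query above amounts to "at least three of the four concepts". A minimal sketch, assuming the single-letter abbreviations C, B, W, P for the four concepts; `relaxed_query` and `matches_relaxed` are illustrative names.

```python
from itertools import combinations

CONCEPTS = ("C", "B", "W", "P")  # cracks, beams, width measurement, prestressed concrete

def relaxed_query(concepts, k):
    """OR together every AND-group of k concepts."""
    groups = ["(" + " AND ".join(group) + ")"
              for group in combinations(concepts, k)]
    return " OR ".join(groups)

def matches_relaxed(doc_concepts, concepts, k):
    """Equivalent test: the document contains at least k of the concepts."""
    return sum(1 for c in concepts if c in doc_concepts) >= k

print(relaxed_query(CONCEPTS, 3))
print(matches_relaxed({"C", "B", "P"}, CONCEPTS, 3))
```

The groups come out in a different order than on the slide, but the query is the same. Relaxing further to k = 2 would produce six groups.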
23
Pseudo-Boolean Queries
  • A new notation, from web search
  • cat dog collar leash
  • Does not mean the same thing!
  • Need a way to group combinations.
  • Phrases
  • "stray cat" AND "frayed collar"
  • "stray cat" "frayed collar"

24
[Figure: retrieval flow diagram with the labels "Information need", "Collections", and "text input".]
25
Result Sets
  • Run a query, get a result set
  • Two choices
  • Reformulate query, run on entire collection
  • Reformulate query, run on result set
  • Example: Dialog query
  • (Redford AND Newman)
  • -> S1: 1450 documents
  • (S1 AND Sundance)
  • -> S2: 898 documents
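A sketch of the second choice, running the reformulated query on the saved result set. The posting sets below are invented toy data (the document counts in the example above are not reproduced), and the names `s1` and `s2` simply mirror the set labels.

```python
# Hypothetical postings: term -> set of document IDs.
postings = {
    "redford": {1, 2, 3, 4},
    "newman": {2, 3, 4, 5},
    "sundance": {3, 4, 6},
}

# First query: (Redford AND Newman) -> S1
s1 = postings["redford"] & postings["newman"]

# Refinement run on the result set: (S1 AND Sundance) -> S2
s2 = s1 & postings["sundance"]

# Same refinement run against the entire collection.
full = postings["redford"] & postings["newman"] & postings["sundance"]

print(sorted(s1), sorted(s2), s2 == full)
```

For a pure AND refinement both choices return the same documents; reusing S1 only avoids recomputing the first intersection.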

26
[Figure: retrieval flow diagram with the labels "Information need", "Collections", "text input", and "Reformulated Query".]
27
Ordering (ranking) of Retrieved Documents
  • Pure Boolean has no ordering
  • Term is there or it's not
  • In practice
  • order chronologically
  • order by total number of hits on query terms
  • What if one term has more hits than others?
  • Is it better to have one of each term or many of
    one term?
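The closing question can be made concrete by scoring the same documents two ways: by total hits on the query terms, and by how many distinct query terms appear. Both scoring rules and the toy documents are assumptions for illustration.

```python
from collections import Counter

def rank_by_hits(docs, query_terms):
    """Return two rankings: by total hits, and by distinct terms matched."""
    scored = []
    for doc_id, text in docs.items():
        counts = Counter(text.lower().split())
        total_hits = sum(counts[t] for t in query_terms)
        distinct = sum(1 for t in query_terms if counts[t] > 0)
        scored.append((doc_id, total_hits, distinct))
    # Ties are broken by document ID so the order is deterministic.
    by_total = [d for d, _, _ in sorted(scored, key=lambda s: (-s[1], s[0]))]
    by_distinct = [d for d, _, _ in
                   sorted(scored, key=lambda s: (-s[2], -s[1], s[0]))]
    return by_total, by_distinct

docs = {"d1": "cat cat cat cat", "d2": "cat dog"}
by_total, by_distinct = rank_by_hits(docs, ["cat", "dog"])
print(by_total, by_distinct)
```

With the query terms cat and dog, total hits puts d1 first (four hits on one term), while distinct-term coverage puts d2 first (one hit on each term).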

28
Boolean Query - Summary
  • Advantages
  • simple queries are easy to understand
  • relatively easy to implement
  • Disadvantages
  • difficult to specify what is wanted
  • too much returned, or too little
  • ordering not well determined
  • Dominant language in commercial systems until the
    WWW

29
Vector Space Model
  • Documents and queries are represented as vectors
    in term space
  • Terms are usually stems
  • Documents represented by binary vectors of terms
  • Queries represented the same as documents
  • Query and Document weights are based on length
    and direction of their vector
  • A vector distance measure between the query and
    documents is used to rank retrieved documents
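A minimal sketch of ranking by a vector measure. The slide does not name a specific measure; cosine similarity is assumed here as one common choice, and the three-term vocabulary and document vectors are invented toy data.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_u * norm_v)

# Axes of the term space: (dog, house, white).
documents = {
    "D1": (1, 1, 0),
    "D2": (0, 1, 1),
    "D3": (0, 0, 1),
}
query = (1, 0, 0)  # the query is represented the same way as a document

ranking = sorted(documents,
                 key=lambda d: cosine(query, documents[d]),
                 reverse=True)
print(ranking)
```

Cosine depends only on direction, so a long document and a short one with the same term proportions score the same.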

30
Document Vectors
  • Documents are represented as bags of words
  • Words are terms with no order
  • Represented as vectors when used computationally
  • A vector is like an array of floating point
    values
  • Has direction and magnitude
  • Each vector holds a place for every term in the
    collection
  • Therefore, most vectors are sparse

31
Queries
  • Vocabulary (dog, house, white)
  • Queries
  • dog (1,0,0)
  • house (0,1,0)
  • white (0,0,1)
  • house and dog (1,1,0)
  • dog and house (1,1,0)
  • Show 3-D space plot
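The mapping from query text to vector in the list above can be sketched as follows. Words outside the vocabulary (such as "and") are simply ignored, which is an assumption about how the slide's examples were produced.

```python
VOCABULARY = ("dog", "house", "white")

def to_vector(text, vocabulary=VOCABULARY):
    """Binary vector: 1 where a vocabulary term occurs in the text."""
    words = set(text.lower().split())
    return tuple(1 if term in words else 0 for term in vocabulary)

print(to_vector("dog"))            # (1, 0, 0)
print(to_vector("house and dog"))  # (1, 1, 0)
print(to_vector("dog and house"))  # (1, 1, 0)
```

Word order is lost, which is why "house and dog" and "dog and house" map to the same point.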

32
Documents (queries) in Vector Space
t3
D1
D9
D11
D5
D3
D10
D2
D4
t1
D7
D6
D8
t2
33
Documents in 3D Space
Assumption: Documents that are close together
in space are similar in meaning.
34
Vector Query Problems
  • Significance of queries
  • Can different values be placed on the different
    terms, e.g. 2 dog, 1 house?
  • Scaling size of vectors
  • Number of words in the dictionary?
  • 100,000
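Both problems above have a common practical answer, sketched here under assumptions: store only the non-zero entries of each vector in a dictionary, so a 100,000-term vocabulary costs nothing for absent terms, and let the query carry per-term weights. The weights and documents are invented.

```python
def sparse_dot(u, v):
    """Dot product of two sparse vectors stored as {term: weight}."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the shorter vector
    return sum(weight * v.get(term, 0.0) for term, weight in u.items())

# Weighted query: dog counts twice as much as house.
query = {"dog": 2.0, "house": 1.0}

doc_a = {"dog": 1.0, "white": 1.0}
doc_b = {"house": 1.0, "white": 1.0}

score_a = sparse_dot(query, doc_a)
score_b = sparse_dot(query, doc_b)
print(score_a, score_b)
```

The document about dog outscores the one about house because of the query weights, not because of anything in the documents.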

35
Proximity Searches
  • Proximity: terms occur within K positions of one
    another
  • pen w/5 paper
  • A Near function can be more vague
  • near(pen, paper)
  • Sometimes order can be specified
  • Also, Phrases and Collocations
  • "United Nations", "Bill Clinton"
  • Phrase Variants
  • "retrieval of information", "information
    retrieval"
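A sketch of a proximity test over token positions. The exact meaning of an operator like w/5 varies between systems; here it is assumed to mean that the two terms occur at most K positions apart, with an optional flag for requiring the first term to come first. The `within` function is illustrative.

```python
def positions(tokens, term):
    """All positions at which a term occurs."""
    return [i for i, tok in enumerate(tokens) if tok == term]

def within(tokens, a, b, k, ordered=False):
    """True if a and b occur within k positions of each other.

    With ordered=True, a must appear before b.
    """
    for i in positions(tokens, a):
        for j in positions(tokens, b):
            gap = j - i
            if ordered and 0 < gap <= k:
                return True
            if not ordered and 0 < abs(gap) <= k:
                return True
    return False

tokens = "put the pen down on the paper".split()
print(within(tokens, "pen", "paper", 5))                 # pen w/5 paper
print(within(tokens, "paper", "pen", 5, ordered=True))   # order enforced
```

A phrase match is the special case of an ordered test with k = 1.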

36
Filters
  • Filters: reduce set of candidate docs
  • Often specified simultaneously with query
  • Usually restrictions on metadata
  • restrict by
  • date range
  • internet domain (.edu .com .berkeley.edu)
  • author
  • size
  • limit number of documents returned
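A sketch of filters applied to a candidate set after query matching. The record fields (`domain`, `date`, `size`), the `apply_filters` signature, and the sample results are all hypothetical.

```python
from datetime import date

def apply_filters(results, domain_suffix=None, date_range=None,
                  max_size=None, limit=None):
    """Keep only results that pass every filter that was specified."""
    kept = []
    for r in results:
        if domain_suffix and not r["domain"].endswith(domain_suffix):
            continue
        if date_range and not (date_range[0] <= r["date"] <= date_range[1]):
            continue
        if max_size is not None and r["size"] > max_size:
            continue
        kept.append(r)
    return kept[:limit]  # limit=None keeps everything

results = [
    {"id": 1, "domain": "lib.berkeley.edu", "date": date(2003, 5, 1), "size": 1200},
    {"id": 2, "domain": "example.com", "date": date(2004, 1, 9), "size": 800},
    {"id": 3, "domain": "cs.example.edu", "date": date(1999, 7, 4), "size": 300},
]

edu_recent = apply_filters(
    results,
    domain_suffix=".edu",
    date_range=(date(2000, 1, 1), date(2005, 12, 31)),
)
print([r["id"] for r in edu_recent])
```

Filters test metadata rather than content, so they can run before or after the content match; running them first shrinks the set the query has to examine.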

37
Natural Language Queries
  • The Holy Grail of information retrieval
  • Issues in Natural Language Processing
  • syntax
  • semantics
  • pragmatics
  • speech understanding
  • speech generation

38
What do search engines do?
  • Tags
  • Title
  • Meta
  • Term frequency and location
  • Popularity

39
UC Berkeley Search Engine Guide
http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/SearchEngines.html
41
Search Engine Queries
42
Old Search Engine Query Differences
44
Older Search engine query models
45
Search engine query models
46
Types of Query Structures
  • Query Models (languages): most common
  • Boolean Queries
  • Old model
  • Vector queries
  • Very common
  • Holy grail of search
  • Natural Language Queries
  • Batch lookup - it's all there before you query!