INFO624 - Week 4 Query Languages and Query Operations - PowerPoint PPT Presentation

Slides: 38
Provided by: xlin2

1
INFO624 - Week 4: Query Languages and Query
Operations
  • Dr. Xia Lin
  • Associate Professor
  • College of Information Science and Technology
  • Drexel University

2
Query
  • A query is a representation of the user's
    information needs
  • It may not represent the information needs
    exactly because:
  • Information needs are difficult to describe --
    semantic difficulty
  • The query must be in a format acceptable to the
    retrieval system -- syntactic difficulty

3
Content-based queries
  • Word matching: words, phrases, proximity
  • Pattern matching: prefix/suffix, wildcard search,
    error handling, extended patterns
  • Boolean
  • Vector
  • Natural language

4
Boolean Queries
  • Request
  • What are the likely problems when someone gets
    hurt on his knees when playing basketball?
  • Write your best Boolean query for this request
  • If the query returns zero hits, how do you modify
    the query?
  • If the query returns too many hits, how do you
    modify the query?
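
As a rough illustration of the exercise above, the sketch below evaluates Boolean queries as set operations over a small inverted index. The four documents and the names DOCS, build_index, and term are invented for this example; a real system would also handle stemming, stop words, and query parsing.

```python
from collections import defaultdict

# Hypothetical toy collection; the texts are invented for illustration.
DOCS = {
    1: "knee injury risks in basketball players",
    2: "basketball training drills",
    3: "knee ligament injury treatment",
    4: "ankle sprain in basketball",
}

def build_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

INDEX = build_index(DOCS)

def term(word):
    """Return the posting set for a word (empty if the word is unknown)."""
    return INDEX.get(word, set())

# knee AND basketball
narrow = term("knee") & term("basketball")

# Too few hits: broaden one concept with OR.
# (knee OR ankle) AND basketball
broad = (term("knee") | term("ankle")) & term("basketball")

# Too many hits: restrict with NOT (or with additional AND terms).
# basketball AND NOT training
restricted = term("basketball") - term("training")
```

AND maps to set intersection, OR to union, and NOT to set difference, which is why adding OR terms can only grow the result set and adding AND or NOT terms can only shrink it.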

5
  • How does AskJeeves translate the request?
  • What are the likely problems when someone gets
    hurt on his knees when playing basketball?

6
  • Construct your best Boolean query for this
    request
  • I am doing research on personal space
    boundaries. I want to know if there are any sex
    or race differences in personal space boundaries.

7
Interaction with Queries
  • The user starts with a seed query
  • The system responds with a list of related terms
  • The user adds selected terms from the list to the
    query
  • The system updates the list of related terms
  • Repeat as needed
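
One way to picture this loop is the sketch below. It assumes that "related terms" means terms that co-occur with every current query term, which is only one of several possible definitions; the collection and the function name related_terms are hypothetical.

```python
from collections import Counter

# Hypothetical mini-collection, invented for illustration.
DOCS = [
    "knee injury basketball ligament",
    "knee surgery ligament recovery",
    "basketball rules court",
    "ligament tear knee recovery",
]

def related_terms(query_terms, docs, k=3):
    """Return the k terms that most often co-occur with all query terms."""
    counts = Counter()
    for text in docs:
        words = set(text.split())
        if query_terms <= words:  # the document contains every query term
            counts.update(words - query_terms)
    # Sort by descending count, then alphabetically, for a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]

query = {"knee"}                          # seed query
suggestions = related_terms(query, DOCS)  # system responds with related terms
query.add(suggestions[0])                 # user selects a term (here, the top one)
updated = related_terms(query, DOCS)      # system updates the list
```

Each round narrows the set of matching documents, so the suggested terms shift toward the topic the user is steering to.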

8
Example: MedLine Search Assistant
9
Association-based Queries
  • Find documents similar to this document
  • Find documents that link to this document
  • Explicitly
  • Implicitly

10
Field-based Queries
11
  • Field-based queries will likely improve search
    precision.
  • Field-based queries require that the data source
    has a fixed structure and is indexed by that
    structure.
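
The precision point can be sketched with a small index keyed by (field, word). The records, field names, and helper functions below are assumptions made for this example, not part of any particular system.

```python
from collections import defaultdict

# Hypothetical records with a fixed structure (invented for illustration).
RECORDS = [
    {"id": 1, "author": "smith", "title": "boolean retrieval models"},
    {"id": 2, "author": "jones", "title": "smith charts in engineering"},
    {"id": 3, "author": "smith", "title": "vector space retrieval"},
]

def build_field_index(records, fields):
    """Index every word under the field it appears in."""
    index = defaultdict(set)
    for rec in records:
        for field in fields:
            for word in rec[field].lower().split():
                index[(field, word)].add(rec["id"])
    return index

FIELD_INDEX = build_field_index(RECORDS, ["author", "title"])

def search_field(field, word):
    """Match the word only within the given field."""
    return FIELD_INDEX.get((field, word), set())

def search_any(word):
    """Match the word in any field (no field restriction)."""
    hits = set()
    for (_, indexed_word), ids in FIELD_INDEX.items():
        if indexed_word == word:
            hits |= ids
    return hits

anywhere = search_any("smith")                # also matches a title word
as_author = search_field("author", "smith")   # only records written by smith
```

The unrestricted search returns record 2 because "smith" appears in its title, while the author-field search excludes it.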

12
Citation-based Queries
  • Retrieve all documents that document A cites.
  • Find all documents that cite document A.
  • Find all documents that cite this author.
  • Find all documents that cite both document A and
    document B.
  • Find documents that cite both author A and
    author B.
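
Assuming the citation data is available as a mapping from each document to the set of documents it cites, the document-level queries above reduce to a few set operations. The data and function names in this sketch are hypothetical.

```python
# Hypothetical citation data: each key cites the documents in its set.
CITES = {
    "A": {"X", "Y"},
    "C": {"A", "B"},
    "D": {"A"},
    "E": {"A", "B", "X"},
}

def references_of(doc):
    """All documents that `doc` cites."""
    return CITES.get(doc, set())

def citing(doc):
    """All documents that cite `doc`."""
    return {source for source, refs in CITES.items() if doc in refs}

def citing_both(doc1, doc2):
    """All documents that cite both `doc1` and `doc2`."""
    return citing(doc1) & citing(doc2)
```

Author-level versions would work the same way after mapping each cited document to its authors.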

13
Co-Citation
  • The college has a tradition of more than 20 years
    of co-citation research.
  • Co-citation is the mentioning of any two earlier
    documents in the bibliographic references of a
    later third document.

[Diagram: Document 1 and Document 2 are both cited by a later Document 3; a "?" marks the relationship between Documents 1 and 2.]

14
Co-Citation Analysis
  • The count of mentions may grow over time as new
    writings appear. Thus, co-citation counts can
    reflect citers' changing perceptions of documents
    as more or less strongly related.
  • Documents shown to be related by their
    co-citation counts can be mapped as proximate in
    intellectual space.
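
A minimal sketch of how co-citation counts could be computed, assuming each later paper's reference list is available. The reference lists and the name cocitation_counts are invented for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical reference lists of three later papers.
REFERENCE_LISTS = {
    "paper1": ["doc1", "doc2", "doc3"],
    "paper2": ["doc1", "doc2"],
    "paper3": ["doc2", "doc3"],
}

def cocitation_counts(reference_lists):
    """Count, for each pair of documents, how many papers cite both."""
    counts = Counter()
    for refs in reference_lists.values():
        # Sort so each pair is always stored in the same order.
        for pair in combinations(sorted(set(refs)), 2):
            counts[pair] += 1
    return counts

COUNTS = cocitation_counts(REFERENCE_LISTS)
```

Adding a new paper to REFERENCE_LISTS can only raise the counts, which matches the point that co-citation counts grow as new writings appear.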

15
Co-Citation Mapping
  • Detects patterns in the frequency with which any
    works by any two authors are jointly cited in
    later works.
  • Only recurrent co-citation is significant: the
    more times authors are cited together, the more
    strongly related they are in the eyes of citers.

16
A Map of Information Scientists
17
AuthorLinks
18
Link-Based Queries
  • Hypertext Structure
  • Is a link a query?
  • http://www.google.com/search?hl=en&q=information+retrieval
  • This is called a query-mediated link.
  • It is also called a soft link.
  • Is a query a link?
  • Many pages are dynamically generated from a
    database or a search engine.
  • Your review pages
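
To make the "is a link a query?" point concrete, this sketch builds a query-mediated link from query parameters and then recovers the query from the link, using Python's standard urllib.parse module. The helper name query_link is an assumption for this example.

```python
from urllib.parse import urlencode, urlparse, parse_qs

def query_link(base_url, **params):
    """Encode query parameters into a link (a query-mediated link)."""
    return base_url + "?" + urlencode(params)

# A query packaged as a link.
link = query_link("http://www.google.com/search",
                  hl="en", q="information retrieval")

# The same link unpacked back into a query.
recovered = parse_qs(urlparse(link).query)
```

The round trip shows that the link and the query carry the same information; only the packaging differs.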

19
Queries? Links? Is There a Difference? (SIGCHI 97)
  • An experiment was conducted to compare browsing
    behavior in query- and link-based interfaces.
    Results suggest that query-mediated links are as
    effective as explicit queries, and that
    strategies adopted by users affect performance.
    This work has implications for the design of
    information exploration interfaces.

20
Query Structure
  • Hierarchical Structure
  • What does the user want when searching for
    "substance abuse"?
  • We may not know, but adding narrower terms of
    "substance abuse" will likely get better results
  • Alcohol Abuse
  • Drug Abuse
  • Alcohol-Related Disorders
  • Amphetamine-Related Disorders
  • Cocaine-Related Disorders
  • Marijuana Abuse

21
Automatic Expansion
  • If there is a defined hierarchy, several search
    strategies may be defined to expand the query:
  • Search with the query term only
  • Search with the query term and all the terms in
    its upper hierarchy
  • Search with the query term and all the terms in
    its lower hierarchy
  • Search with the query term and all its sibling
    terms
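
The four strategies can be sketched over a small parent table. The hierarchy below is a hypothetical arrangement of a few terms from the substance abuse example, not an actual thesaurus structure, and the function names are invented.

```python
# Hypothetical hierarchy: each term maps to its parent (None for the root).
PARENT = {
    "substance abuse": None,
    "alcohol abuse": "substance abuse",
    "drug abuse": "substance abuse",
    "cocaine-related disorders": "drug abuse",
    "marijuana abuse": "drug abuse",
}

def ancestors(term):
    """All terms above `term`, nearest first."""
    result = []
    parent = PARENT.get(term)
    while parent is not None:
        result.append(parent)
        parent = PARENT.get(parent)
    return result

def descendants(term):
    """All terms below `term`, at any depth."""
    result = []
    for child, parent in PARENT.items():
        if parent == term:
            result.append(child)
            result.extend(descendants(child))
    return result

def siblings(term):
    """All other terms that share the same parent."""
    parent = PARENT.get(term)
    if parent is None:
        return []
    return [t for t, p in PARENT.items() if p == parent and t != term]

def expand(term, strategy="term"):
    """Expand a query term with one of: term, upper, lower, siblings."""
    extra = {
        "term": [],
        "upper": ancestors(term),
        "lower": descendants(term),
        "siblings": siblings(term),
    }[strategy]
    return [term] + extra
```

The expanded list would typically be combined with OR, so "lower" and "siblings" tend to raise recall while "term" keeps the search narrow.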

23
Query Operations
  • Query execution
  • Query expansion
  • Query translation

24
Query Expansion
  • Improve the initial query by automatically
  • restructuring the query, or
  • adding new terms, or
  • adjusting the weight of each term.

25
  • Restructuring the query
  • Identify key concepts through natural language
    processing
  • Identify any field information that may be
    contained in the query
  • Is this an author?
  • Is this a journal?
  • Reverse the term order in the query

26
  • Adding new terms
  • Synonyms
  • Hierarchical terms
  • Scope terms
  • Does the query "football" retrieve information on
    football or on soccer?
  • Relevant terms
  • Selected terms from relevant documents
  • Terms that co-occur most often with the query
    terms

27
  • Adjusting term weighting
  • If relevant documents are known, increase the
    weights of terms assigned to the relevant
    documents and decrease the weights of terms
    assigned to non-relevant documents.
  • Adjust term weights in a topic tree
  • Fruit
  • fruit 0.9, apple 0.7, orange 0.7, banana 0.6,
    ..., Macintosh 0.1, computer -0.4
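
The slide does not name a formula for the weight adjustment. One common choice is a Rocchio-style update, sketched here as an assumption: the new query is alpha times the old query, plus beta times the average relevant document, minus gamma times the average non-relevant document. The coefficient values and the sample vectors are illustrative only.

```python
from collections import defaultdict

def adjust_weights(query, relevant, non_relevant,
                   alpha=1.0, beta=0.75, gamma=0.25):
    """Rocchio-style update of query term weights.

    query, and each document, is a dict mapping term -> weight.
    alpha, beta, gamma are assumed values, not taken from the slides.
    """
    new_query = defaultdict(float)
    for term, weight in query.items():
        new_query[term] += alpha * weight
    # Add the mean relevant vector, subtract the mean non-relevant vector.
    for docs, coef in ((relevant, beta), (non_relevant, -gamma)):
        for doc in docs:
            for term, weight in doc.items():
                new_query[term] += coef * weight / len(docs)
    return dict(new_query)

query = {"fruit": 1.0}
relevant = [{"fruit": 1.0, "apple": 1.0}, {"apple": 1.0, "orange": 1.0}]
non_relevant = [{"apple": 1.0, "computer": 1.0}]
updated = adjust_weights(query, relevant, non_relevant)
```

Terms seen only in non-relevant documents end up with negative weights, which is consistent with the negative weight for "computer" in the fruit example.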

28
Query Translation
  • From natural language to queries
  • AskJeeves
  • From queries in one system to queries in another
    system
  • From one natural language to another natural
    language
  • Altavista

29
Other types of representation for users' needs?
  • Mind-reading?
  • Non-text queries?
  • Gesture/motion?

30
IBM Visualization Space
  • This information system understands the user.
  • It "hears" users' voice commands and "sees" their
    gestures and body positions. Interactions are
    natural, more like human-to-human interactions.

31
Multimedia Queries
  • Content-based
  • Text indexing
  • Attribute-based
  • Color, size, type, time period, etc.
  • Structure-based
  • Location, shape, layout, etc.
  • Cluster-based
  • Semantic groups, physical groups,
    structure groups, etc.
  • Example: find a photo that has the White House in
    the center.

32
Project Discussion
  • Idea 1: Install and implement an IR system
  • Focus on system and technology
  • Need to have a collection
  • Need to have hands-on experience with systems
  • Idea 2: Conduct an evaluation experiment on one
    or two selected IR systems
  • Focus on interfaces and users
  • Idea 3: Customize an IR system
  • Focus on functionality and customization

33
Project Evaluation
  • Topics
  • Relevance
  • Problems identified
  • Technical difficulties
  • Solutions/ideas
  • The process
  • Design
  • Implementation

34
  • The report
  • Background
  • Written
  • Oral

35
Midterm
  • Concepts
  • What is information retrieval?
  • Data, information, text, and documents
  • Two abstraction principles
  • Users' information needs
  • Queries and query formats
  • Precision and recall
  • Relevance

36
Midterm
  • Procedures: problem solving
  • How to translate a request into a query?
  • How to expand queries for better recall or better
    precision?
  • How to create an inverted index?
  • How to create a vector space?
  • How to calculate similarities of documents?
  • How to match a query to documents in a vector
    space?
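
As a review aid for the vector space questions above, the sketch below ranks documents against a query by cosine similarity. It assumes raw term counts as weights (no tf-idf) and uses an invented three-document collection; the function names are hypothetical.

```python
import math
from collections import Counter

def to_vector(text):
    """Represent a text as a term -> count vector (raw counts assumed)."""
    return Counter(text.lower().split())

def cosine(u, v):
    """Cosine of the angle between two term vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical collection, invented for illustration.
DOCS = {
    "d1": "knee injury basketball",
    "d2": "basketball rules",
    "d3": "cooking recipes",
}

def rank(query, docs):
    """Return ids of documents with non-zero similarity, best first."""
    q = to_vector(query)
    scored = [(cosine(q, to_vector(text)), doc_id)
              for doc_id, text in docs.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in scored if score > 0]

ranking = rank("knee basketball", DOCS)
```

The query is treated as just another short document in the same space, so matching a query to documents and comparing two documents use the same similarity function.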

37
  • Discussions
  • Challenges of IR
  • Advantages and disadvantages of Boolean search
    (vector space, automatic indexing,
    association-based queries, etc.)
  • Evaluation of IR systems
  • With or without using precision/recall.
  • Difference between data retrieval and information
    retrieval