Amanda Spink: Analysis of Web Searching and Retrieval (PowerPoint presentation)


1
Amanda Spink: Analysis of Web Searching and Retrieval
  • Larry Reeve
  • INFO861 - Topics in Information Science
  • Dr. McCain - Winter 2004

2
Background
  • Amanda Spink
  • Self-described areas of work
  • Information Retrieval
  • Web Retrieval
  • Human Information Behavior / Information Seeking
  • Medical Informatics
  • Ph.D. 1993 Rutgers University
  • Thesis - Feedback in Information Retrieval
  • Studied under Tefko Saracevic

3
Background
  • Amanda Spink
  • Over 140 papers published
  • 5th in journal article production
  • 18th in citation production among U.S. IS faculty
  • Institute for Information Science: most highly
    cited paper in Web retrieval
  • Real Life, Real Users, and Real Needs: A Study and
    Analysis of User Queries on the Web (2000)

4
Background
  • Amanda Spink
  • Associate Professor at University of Pittsburgh
  • School of Information Sciences
  • Prior faculty positions
  • Pennsylvania State University
  • School of Information Sciences and Technology
  • Web Research Group
  • University of North Texas
  • School of Library and Information Sciences

5
Background
  • Tefko Saracevic
  • Associate Dean
  • School of Communication, Information and Library
    Studies, Rutgers University
  • Related research
  • Test and Evaluation of IR systems
  • Relevance in Information Science
  • Analysis of web queries

6
Web Searching and Retrieval
  • Analyze user queries
  • Important for building future IR systems on the Web
  • Focus on search terms
  • Failure analysis in query construction
  • Term Relevance Feedback (TRF)
  • Topics / Classification
  • Use of language

7
Studies Conducted
  • U.S. Excite (www.excite.com)
  • 51K study
  • 51,473 queries
  • 18,113 users
  • March 9, 1997
  • 1M study
  • 1,025,910 queries
  • 211,063 users
  • September 16, 1997

8
Studies Conducted
  • European - AllTheWeb.com
  • 1 million queries
  • 200,000 users
  • Logs from two days
  • February 6, 2001
  • May 28, 2002
  • Most users from Norway and Germany

9
Studies Conducted
  • Issues with Web transaction logs
  • Where does session start and end?
  • Temporal boundary: Spink found a 15-minute average
  • Others found 5 min, 12 min, 32 min, and 2 hours
  • Numerical boundary: 100 entries
  • How to eliminate non-individual users
  • Meta-search engines, other agents
  • Logs give no insight into the user's search process
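The session-boundary questions above can be made concrete with a short sketch. It uses a hypothetical log record (`LogEntry` with `user_id`, `timestamp`, `query`) and applies two rules. A gap of more than 15 minutes between a user's consecutive entries starts a new session. Any user with more than 100 entries is dropped as a likely meta-search engine or other agent. The slides do not say exactly how either boundary was applied, so both rules are illustrative assumptions, not the studies' procedure.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class LogEntry:
    # Hypothetical transaction-log record; real Excite/AllTheWeb fields may differ.
    user_id: str
    timestamp: datetime
    query: str


def split_sessions(entries, gap=timedelta(minutes=15), max_entries=100):
    """Group entries into per-user sessions (illustrative rules, not the study's)."""
    by_user = {}
    for e in sorted(entries, key=lambda e: (e.user_id, e.timestamp)):
        by_user.setdefault(e.user_id, []).append(e)

    sessions = []
    for items in by_user.values():
        # Numerical boundary (assumed): too many entries suggests an agent, not a person.
        if len(items) > max_entries:
            continue
        current = [items[0]]
        for prev, cur in zip(items, items[1:]):
            # Temporal boundary (assumed): a pause longer than `gap` starts a new session.
            if cur.timestamp - prev.timestamp > gap:
                sessions.append(current)
                current = []
            current.append(cur)
        sessions.append(current)
    return sessions


t0 = datetime(1997, 9, 16, 10, 0)
log = [
    LogEntry("u1", t0, "news weather"),
    LogEntry("u1", t0 + timedelta(minutes=3), "weather pittsburgh"),
    LogEntry("u1", t0 + timedelta(minutes=40), "movie times"),
    LogEntry("u2", t0, "science and technology"),
]
print([len(s) for s in split_sessions(log)])  # [2, 1, 1]
```

Trying other `gap` values, such as the 5 or 32 minutes other researchers found, can change how many sessions the same log yields. This is one reason session statistics are hard to compare across studies.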

10
Findings
  • Relevance Feedback
  • Advanced Search Techniques
  • Term Characteristics
  • Query Classification
  • American vs. European

11
Findings: Relevance Feedback
  • Term Relevance Feedback (TRF) rarely used
  • 51K study
  • 1,597 queries from 823 users (<5% of queries)
  • Those using TRF had longer sessions
  • Successful 60% of the time
  • Implications
  • Failure rate of 40% may be too high
  • IR designers could automatically perform TRF
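The slide suggests IR systems could perform TRF automatically. The sketch below shows one simple way to do that. It counts terms in documents the user has marked relevant and appends the most frequent new terms to the query. This frequency-based approach is an assumption, not the method used in the studies, and `expand_query`, its stopword list and the `k` parameter are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real systems use larger lists).
STOPWORDS = {"the", "a", "an", "of", "and", "or", "in", "on", "for", "to", "is", "with"}


def expand_query(query, relevant_docs, k=2):
    """Append the k most frequent new terms from relevant documents to the query."""
    query_terms = query.lower().split()
    counts = Counter()
    for doc in relevant_docs:
        for term in re.findall(r"[a-z]+", doc.lower()):
            if term not in STOPWORDS and term not in query_terms:
                counts[term] += 1
    # Counter.most_common keeps first-seen order for ties, so output is deterministic.
    new_terms = [term for term, _ in counts.most_common(k)]
    return " ".join(query_terms + new_terms)


docs = ["Jaguar cars and jaguar engines", "Used cars dealer with engines"]
print(expand_query("jaguar", docs))  # jaguar cars engines
```

Raw frequency is the simplest weighting. Weighting candidate terms by inverse document frequency is a common refinement.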

12
Findings: Relevance Feedback
  • Mediated searching
  • 11% of search terms come from TRF
  • 37% from users, 63% from mediators
  • 2/3 of TRF contributed positively

13
Findings: Relevance Feedback
  • Identified 6 session states
  • Initial Query, Modified Query, Next Page, New Query, Relevance Feedback, Prev Query
  • Identified 4 session patterns
  • Using the 6 session states
  • Implication: IR designers should accommodate
    these states and patterns
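The six session states can be illustrated with a rule-based labeller. The sketch assumes each log entry is a hypothetical `(query, page, is_feedback)` tuple. It also assumes these rules:

- Any term overlap with the previous query counts as a Modified Query.
- Repeating the same terms on a results page beyond the first counts as Next Page.
- Repeating any earlier query counts as Prev Query.

These rules are assumptions for illustration, not the coding scheme used in the studies.

```python
def classify_states(session):
    """Label each (query, page, is_feedback) entry in one session with a state."""
    states = []
    seen = []          # term sets of earlier entries in this session
    prev_terms = None
    for query, page, is_feedback in session:
        terms = set(query.lower().split())
        if prev_terms is None:
            state = "Initial Query"
        elif is_feedback:
            state = "Relevance Feedback"
        elif terms == prev_terms and page > 1:
            state = "Next Page"
        elif terms in seen:
            state = "Prev Query"        # returns to a query already issued
        elif terms & prev_terms:
            state = "Modified Query"    # shares at least one term with the last query
        else:
            state = "New Query"
        states.append(state)
        seen.append(terms)
        prev_terms = terms
    return states


session = [
    ("jaguar", 1, False),
    ("jaguar", 2, False),
    ("jaguar cars", 1, False),
    ("jaguar cars", 1, True),
    ("jaguar", 1, False),
    ("football scores", 1, False),
]
print(classify_states(session))
```

The sample session passes through all six states in the order listed on the slide. Sequences of such labels are the raw material from which session patterns could be counted.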

14
Findings: Relevance Feedback
  • Relevance Feedback Session Patterns

15
Findings: Advanced Search Techniques
  • Includes
  • Boolean operators
  • Modifiers (+, -)
  • Quotes (phrases)
  • Not often used by Web users, but used more in mediated searching
  • Boolean <10%, modifiers 9%, phrases 6%
  • Used incorrectly
  • Boolean AND 50%, OR 28%, AND NOT 19%
  • Modifiers 75% of the time
  • Phrases 8%
  • Users and advanced techniques do not get along!

16
Findings: Advanced Search Techniques
  • Boolean, most common problems
  • Not capitalizing AND
  • Confusing the AND operator with the "and" conjunction
  • e.g. Science and Technology vs. Science AND Technology
  • Modifiers, most common problems
  • Prefix rather than mathematical postfix
  • +news +weather rather than news+weather
  • No space after the modifier, whereas Boolean operators must be surrounded by spaces
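The two problem patterns on this slide can be caught with a small query checker. The sketch makes three assumptions:

- Only a capitalized AND/OR/NOT is treated as an operator.
- Modifiers must be prefixes attached to a term with no space.
- Queries split on whitespace.

`query_problems` is hypothetical, and it can produce false positives (e.g. "C++").

```python
def query_problems(query):
    """Flag likely misuse of Boolean operators and +/- modifiers (heuristic)."""
    problems = []
    tokens = query.split()
    # A lower-case and/or/not between two terms may have been meant as an operator.
    for tok in tokens[1:-1]:
        if tok in {"and", "or", "not"}:
            problems.append(f"lower-case '{tok}' used between terms")
    # Modifiers are assumed to be prefixes (+news), so flag a modifier that
    # stands alone, trails a term, or sits inside one (news+weather).
    for tok in tokens:
        if tok.endswith(("+", "-")) or "+" in tok[1:]:
            problems.append(f"misplaced modifier in '{tok}'")
    return problems


for q in ["Science and Technology", "Science AND Technology", "+news +weather", "news+weather"]:
    print(q, "->", query_problems(q))
```

A system could run a check like this before searching and suggest the corrected form. This is one of the UI-level fixes the later slides hint at.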

17
Findings: Term Characteristics
  • Terms per query
  • 1 term: 26.6%, 2 terms: 31.5%, 3 terms: 18.2%, >7 terms: 1.8%
  • Mediated searching: 7-15 terms
  • Distribution of terms does not quite follow Zipf's law
  • Top terms account for 10% of all terms
  • Single-use terms account for 9% of all terms
  • Not understood why this occurs
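The statistics on this slide can be computed from a raw list of queries. The sketch makes three assumptions:

- Queries are tokenized on whitespace and lower-cased.
- Both shares use all term occurrences as the denominator. The slide does not say whether it counts occurrences or distinct terms.
- `top_n` sets the cut-off for "top terms"; this parameter is hypothetical.

```python
from collections import Counter


def term_stats(queries, top_n=100):
    """Return (terms-per-query %, top-term share %, single-use-term share %)."""
    if not queries:
        return {}, 0.0, 0.0
    n_queries = len(queries)
    lengths = Counter(len(q.split()) for q in queries)
    length_pct = {n: 100 * c / n_queries for n, c in sorted(lengths.items())}

    terms = Counter(t.lower() for q in queries for t in q.split())
    total = sum(terms.values())  # all term occurrences (assumed denominator)
    if total == 0:
        return length_pct, 0.0, 0.0
    top_share = 100 * sum(c for _, c in terms.most_common(top_n)) / total
    single_use_share = 100 * sum(1 for c in terms.values() if c == 1) / total
    return length_pct, top_share, single_use_share


sample = ["weather", "news weather", "cheap flights paris", "weather radar"]
print(term_stats(sample, top_n=1))  # ({1: 25.0, 2: 50.0, 3: 25.0}, 37.5, 62.5)
```

Under Zipf's law a term's frequency is roughly inversely proportional to its rank. Comparing ranked counts from `terms` against that curve is one way to see where a log departs from it.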

18
Findings: Query Classification
Classification of queries based on Rutgers Web Classification

19
Findings: Query Classification
  • What users are looking for is not what is on the Web
  • Distribution of content
  • 83% commercial, 6% educational, 3% health
  • e.g. 10% of searches are for health
  • Searchers find classifications understandable
  • IR system presentation design

20
Findings: American vs. European Searching
  • Commonalities
  • Three or fewer terms
  • American 80%, European 85%
  • Predominantly use English terms
  • Relevance judgments: less than 15 minutes spent viewing
    retrieved documents
  • Information seeking sessions short

21
Findings: American vs. European Searching
  • Differences
  • Categories
  • American Entertainment, Sex, Commerce
  • European People-places-things, Computers,
    Commerce
  • American searchers spent more time searching
    e-commerce sites than European counterparts
  • Did not examine
  • Use of advanced techniques
  • Relevance feedback
  • First in initial set of studies?

22
Findings: Summary
  • Number of query terms is about 2
  • TRF is not used often
  • Boolean operators and modifiers not used often;
    difficulty in using them correctly
  • Users do not spend much time making relevance
    judgments
  • Term frequency distribution: a few terms used
    often, many terms used only once

23
Findings: Summary
  • Most users had a single query only and did not
    follow up with successive queries
  • Average viewing of 2 pages
  • 50% did not access beyond the first page; more than
    75% did not go beyond 2 pages

24
Implications / Further Research
  • Improve use of advanced search techniques
  • UI changes, Venn Diagrams
  • Improve use of relevance feedback
  • Automatic generation of TRF results
  • Improve classification of results
  • UI changes, result overview
  • Improve understanding of language use
  • Adapt IR designs to language
  • Examine cultural differences
  • TRF, advanced search techniques (same or
    different)

25
Amanda Spink - Web Searching and Retrieval
  • Questions