WordSieve: Learning Task Differentiating Keywords Automatically

1
WordSieve: Learning Task Differentiating Keywords Automatically
  • Travis Bauer
  • Sandia National Laboratories
  • (Research discussed today was done at Indiana
    University)

2
Learning Task Contexts: Calvin
  • Learn what characterizes a user's task contexts
  • Unobtrusive Observing
  • Keyword Extraction
  • Index based on Context

3
Currently Used Algorithms
  • TFIDF
  • Latent Semantic Analysis
  • Log-Entropy

4
Currently Used Algorithms
  • TFIDF
      • "One of the most successful and well tested
        techniques in Information Retrieval." - Pazzani
      • Syskill & Webert (Pazzani '96)
      • Hierarchical Feature Map (Merkl '97)
      • Learning in Document Filtering (Callan '98)
      • Topic Detection (Schultz '99)
      • Remembrance Agent (Rhodes '00)
      • Lexical Signatures (Park '02)
  • Latent Semantic Analysis
  • Log-Entropy
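
As a point of reference for the TFIDF baseline named above, here is a minimal sketch of one common variant (term frequency normalized by document length, multiplied by the log of inverse document frequency). The function name `tfidf` and the toy documents are illustrative assumptions; the slides do not say which TFIDF variant was used in the experiments.

```python
import math
from collections import Counter

def tfidf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document.

    Assumed variant: (count / document length) * log(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append(
            {t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return weights

# Hypothetical toy corpus, loosely modeled on the browsing topics.
docs = [
    "gore campaign speech gore".split(),
    "bush campaign speech".split(),
    "thai curry recipe".split(),
]
w = tfidf(docs)
```

Note that this needs document frequencies over the whole corpus before any weight can be computed, which is the "comprehensive statistics" requirement the talk contrasts with WordSieve.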

5
Currently Used Algorithms
  • TFIDF
  • Latent Semantic Analysis
      • Well known, popular, well covered in the
        literature
      • Grading Essay Tests
      • Taking Physics tests
      • Taking synonym exams
      • Cross Linguistic IR (Dumais '97)
      • Assigning papers for peer review (Dumais '92)
      • Information Filtering (Foltz '90)
  • Log-Entropy

6
Currently Used Algorithms
  • TFIDF
  • Latent Semantic Analysis
  • Log-Entropy
      • Not used as much for Personal Information
        Retrieval
      • Higher overhead than TFIDF
      • Indexes based on the distribution of terms across
        documents; potentially better performance
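
To make "distribution of terms across documents" concrete, the following is a minimal sketch of the standard log-entropy weighting: local weight log(1 + tf), global weight 1 + sum_j(p_ij log p_ij) / log N, with p_ij the share of term i's occurrences that fall in document j. The function name `log_entropy` and the toy documents are assumptions for illustration, and the sketch assumes at least two documents.

```python
import math
from collections import Counter

def log_entropy(docs):
    """docs: list of token lists (at least two). Returns {term: weight} per document."""
    n_docs = len(docs)
    counts = [Counter(doc) for doc in docs]
    # Global frequency: total occurrences of each term across the corpus.
    gf = Counter()
    for c in counts:
        gf.update(c)
    # Global weight: near 1 for terms concentrated in one document,
    # near 0 for terms spread evenly over all documents.
    g = {}
    for term, total in gf.items():
        entropy_sum = 0.0
        for c in counts:
            if term in c:
                p = c[term] / total
                entropy_sum += p * math.log(p)
        g[term] = 1.0 + entropy_sum / math.log(n_docs)
    return [{t: math.log(1 + tf) * g[t] for t, tf in c.items()} for c in counts]

# Hypothetical toy corpus: "the" is spread evenly, "gore" is concentrated.
docs = [
    "the gore gore".split(),
    "the bush".split(),
    "the curry".split(),
]
w = log_entropy(docs)
```

The extra pass over every document for every term is one way to see the "higher overhead than TFIDF" point.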

7
Comparison to Current Techniques
  • Current Techniques
      • Static Corpora
      • Comprehensive Statistics
  • WordSieve
      • Neural Network-like processing
      • Stream of data
      • Local learning
      • Competitive Learning
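
The contrast above can be illustrated with a small sketch of competitive learning over a document stream. This is not the WordSieve algorithm itself, whose details are not given in these slides; it is a simplified stand-in in which a fixed pool of term slots is updated one document at a time, using only local information. The class name `TermCompetition` and the parameters `capacity`, `boost`, and `decay` are hypothetical.

```python
class TermCompetition:
    """Fixed pool of term slots; terms in the stream compete to stay in it."""

    def __init__(self, capacity=3, boost=1.0, decay=0.5):
        self.capacity = capacity      # number of slots (hypothetical parameter)
        self.boost = boost            # activation gained per occurrence
        self.decay = decay            # multiplicative decay per document
        self.activation = {}          # term -> current activation

    def observe(self, document_tokens):
        # Local learning: every stored term decays once per document.
        for term in self.activation:
            self.activation[term] *= self.decay
        for term in document_tokens:
            if term in self.activation:
                self.activation[term] += self.boost
            elif len(self.activation) < self.capacity:
                self.activation[term] = self.boost
            else:
                # Competition: a newcomer displaces the weakest stored term
                # only if that term's activation has fallen below the boost.
                weakest = min(self.activation, key=self.activation.get)
                if self.activation[weakest] < self.boost:
                    del self.activation[weakest]
                    self.activation[term] = self.boost

    def top_terms(self):
        return sorted(self.activation, key=self.activation.get, reverse=True)

# Hypothetical stream: the user moves from a politics task to a cooking task.
stream = [
    "gore senate vote".split(),
    "gore campaign vote".split(),
    "thai curry recipe".split(),
    "thai curry noodle".split(),
]
sieve = TermCompetition()
for doc in stream:
    sieve.observe(doc)
```

No corpus-wide statistics are kept: the pool only ever sees the current document and its own activations, so the stored terms shift as the stream moves from one task to the next.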

8
Good Discriminator of Context

9
WordSieve Concept
User Browsing
Attributes Term Activation Priming

10
WordSieve 1
Words Absent in Document Sequences
User Profile
Context Profile
Words Occurring in Document Sequences
Words Currently Occurring Frequently

11
WordSieve 2
User Profile
Words Reflecting Context
Context Profile
Words Currently Occurring Frequently

12
Web Browsing Data Set
  • Sixteen Users
  • Four Topics, 10 minutes Each
  • Political Life Al Gore
  • Political Life George Bush
  • Traditional Indonesian Cooking
  • Traditional Thai Cooking

Categorized Document Set
Automatically Generated Queries

13
Browsing Results

14
Contributions
  • It is possible to extract context differentiating
    terms from document streams using unsupervised
    competitive learning.
  • Comprehensive statistics are not necessary in the
    described situations given an ordering of the
    documents.
  • Performance is comparable to LSI and better than
    Log-Entropy and TFIDF.

15
Potential Next Steps
  • WordSieve
      • Automate Parameter Optimization
      • Co-occurrence of terms
  • Other Domains
      • Multi-dimensional data stream
      • Machine Vision

16
Support
  • This work was conducted under the advisement of
    David Leake at Indiana University.
  • It was sponsored in part by the GAANN fellowship.
  • The original version of the personal information
    agent was designed and written with partial
    support from NASA under award No. NCC 2-1035.

17
For More Information
  • Travis Bauer
  • www.cs.indiana.edu/trbauer/publications.htm

18
Usenet Data Set
  • Three sets of 5 newsgroups
  • alt.atheism, talk.religion.misc,
    soc.religion.christian, rec.sport.baseball,
    rec.sport.hockey
  • comp.os.ms-windows.misc, comp.sys.ibm.pc.hardware,
    comp.sys.mac.hardware, rec.autos, rec.motorcycles
  • talk.politics.guns, talk.politics.misc,
    sci.electronics, sci.med, sci.space

Categorized Document Set
Automatically Generated Queries

19
Usenet Results