Title: Opinion Retrieval from Blogs


1
Opinion Retrieval from Blogs
Wei Zhang (1), Clement Yu (1), Weiyi Meng (2)
wzhang@cs.uic.edu, yu@cs.uic.edu, meng@cs.binghamton.edu
(1) Department of Computer Science, University of Illinois at Chicago
(2) Department of Computer Science, Binghamton University
CIKM 2007
2
Outline
  • Overview of opinion retrieval
  • Topic retrieval
  • Opinion identification
  • Ranking documents by opinion similarity
  • Experimental results

3
Overview of Opinion Retrieval
  • Opinion retrieval
      • Given a query, find documents that have
        subjective opinions about the query
  • Example query: "book"
      • Relevant: "This is a very good book."
      • Irrelevant: "This book has 123 pages."

4
Overview of Opinion Retrieval
  • Introduced at the TREC 2006 Blog Track
      • 14 groups, 57 submitted runs in TREC 2006
      • 20 groups, 104 runs in TREC 2007 (ongoing)
  • Key problems
      • Opinion features
      • Query-related opinions
      • Ranking the retrieved documents

5
Our Algorithm
6
Topic Retrieval
  • Retrieve query-relevant documents
      • No opinion involved
  • Features
      • Phrase recognition
      • Query expansion
      • Two document-query similarities

7
Topic Retrieval - Phrase Recognition
  • Semantic relationship among the words
  • Used for calculating phrase similarity
  • 4 types
      • Proper noun: University of Lisbon
      • Dictionary phrase: computer science
      • Simple phrase: white car
      • Complex phrase: small white car

8
Topic Retrieval - Query Expansion
  • Find the synonyms
      • wto ↔ world trade organization
      • Same importance
  • Add additional terms
      • wto → negotiate, agreements, tariffs, ...
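
The two expansion steps on this slide can be sketched as a weighted term map. This is only an illustration: the function name `expand_query`, the dictionary inputs, and the weight of 0.5 for added terms are assumptions, since the slide states only that synonyms receive the same importance as the original term.

```python
def expand_query(terms, synonyms, related, related_weight=0.5):
    """Return a {term: weight} map for the expanded query.

    Synonyms get the same weight as the original term (as on the slide).
    The weight for additional related terms is an assumed value.
    """
    weighted = {}
    for term in terms:
        weighted[term] = 1.0
        for synonym in synonyms.get(term, []):
            weighted[synonym] = 1.0  # same importance as the original term
        for extra in related.get(term, []):
            # Do not downgrade a term that is already weighted higher.
            weighted.setdefault(extra, related_weight)
    return weighted


# Toy dictionaries built from the slide's example.
synonyms = {"wto": ["world trade organization"]}
related = {"wto": ["negotiate", "agreements", "tariffs"]}
expanded = expand_query(["wto"], synonyms, related)
```

How the synonym and related-term dictionaries are built is not covered by the slide and is left outside this sketch.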

9
Topic Retrieval - Similarity
  • Sim(Query, Doc) = <Sim_P, Sim_T>
  • Phrase similarity (Sim_P)
      • Having or not having a phrase
      • Sim_P = sum( idf(P_i) )
  • Term similarity (Sim_T)
      • Sum of the Okapi scores of all the query terms
  • Document ranking
      • D1 is ranked higher than D2 if
        (Sim_P1 > Sim_P2) OR (Sim_P1 = Sim_P2 AND Sim_T1 > Sim_T2)
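
The ranking rule on this slide amounts to a lexicographic comparison: phrase similarity decides first, and term similarity only breaks ties. A minimal sketch follows, assuming the Okapi term scores are already computed; the `TopicScore` type, the idf values, and the document ids are made up for illustration.

```python
from typing import NamedTuple


class TopicScore(NamedTuple):
    doc_id: str
    sim_p: float  # phrase similarity: sum of idf over matched query phrases
    sim_t: float  # term similarity: sum of Okapi scores (assumed precomputed)


def phrase_similarity(matched_phrases, idf):
    """Sum the idf of every query phrase that the document contains."""
    return sum(idf[phrase] for phrase in matched_phrases)


def rank_by_topic(scores):
    """Sort by Sim_P first; Sim_T is used only when Sim_P values are equal."""
    return sorted(scores, key=lambda s: (s.sim_p, s.sim_t), reverse=True)


# Hypothetical idf values and documents.
idf = {"world trade organization": 3.0, "free trade": 1.5}
docs = [
    TopicScore("d1", phrase_similarity(["free trade"], idf), 9.0),
    TopicScore("d2", phrase_similarity(["world trade organization"], idf), 2.0),
    TopicScore("d3", phrase_similarity(["world trade organization"], idf), 4.0),
]
ranked_ids = [s.doc_id for s in rank_by_topic(docs)]
```

Here d1 has the highest term similarity but ranks last, because its phrase similarity is lower than that of d2 and d3.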

10
Opinion Identification
[Diagram: subjective and objective training data feed feature selection and the SVM classifier; retrieved documents (from topic retrieval) pass through the SVM classifier and come out as opinionative documents (to opinion ranking)]
11
Opinion Identification - Training Data
  • Subjective training data
      • Review web sites
      • Documents having opinionative phrases
  • Objective training data
      • Dictionary entries
      • Documents not having opinionative phrases

12
Opinion Identification - Feature Selection
  • The words expressing opinions
  • Pearson's chi-square test
      • Test of the independence between subjectivity
        label and words via contingency table
      • Count the number of sentences
  • Unigrams and bigrams
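
As a rough illustration of the feature selection step, the sketch below scores each unigram and bigram with Pearson's chi-square statistic on a 2x2 contingency table of sentence counts (feature present or absent, sentence subjective or objective). The toy sentences, the whitespace tokenizer, and the `top_k` cutoff are assumptions. The slides do not give the threshold used, nor whether features that indicate objectivity were kept.

```python
from collections import Counter


def chi_square(a, b, c, d):
    """Pearson's chi-square for a 2x2 table of sentence counts.

    a: subjective sentences containing the feature
    b: objective sentences containing the feature
    c: subjective sentences without it
    d: objective sentences without it
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def features_of(sentence):
    """Unigrams and bigrams of a sentence (naive whitespace tokenizer)."""
    tokens = sentence.lower().split()
    bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    return set(tokens + bigrams)


def sentence_counts(sentences):
    """Number of sentences in which each feature occurs at least once."""
    counts = Counter()
    for sentence in sentences:
        counts.update(features_of(sentence))
    return counts


def select_features(subjective, objective, top_k):
    subj, obj = sentence_counts(subjective), sentence_counts(objective)
    n_subj, n_obj = len(subjective), len(objective)
    scored = []
    for feature in set(subj) | set(obj):
        a, b = subj[feature], obj[feature]
        scored.append((chi_square(a, b, n_subj - a, n_obj - b), feature))
    # Highest statistic first; ties broken alphabetically for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [feature for _, feature in scored[:top_k]]


# Toy training sentences, invented for this example.
subjective = ["this is a very good book", "a very good movie", "i love this book"]
objective = ["this book has 123 pages", "the movie runs two hours",
             "the book has an index"]
top = select_features(subjective, objective, top_k=8)
```

A high statistic only says that a feature is not independent of the label, so a word such as "has" can score as well as "very good" on this toy data.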

13
Opinion Identification - Classifier
  • A support vector machine (SVM) classifier

[Diagram: objective sentences and subjective sentences, features, feature vector representation, training, SVM classifier]
14
Opinion Identification - Classifier
  • Apply the SVM classifier

[Diagram: a document is split into sentences 1 to n, and the SVM classifier assigns a label to each, e.g. sentence 1: objective, sentence 2: subjective, ..., sentence n: objective]
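
Applying the classifier sentence by sentence might look like the sketch below. No SVM is trained here: the `WEIGHTS` and `BIAS` values are hypothetical stand-ins for a trained linear SVM, whose decision function has the same form w . x + b. The regex-based sentence splitter is also a simplification.

```python
import re

# Hypothetical weights standing in for a trained linear SVM (w . x + b).
WEIGHTS = {"good": 1.2, "love": 1.5, "terrible": 1.4, "pages": -0.8, "has": -0.5}
BIAS = -0.3


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def decision_value(sentence):
    """Linear decision function over bag-of-words features."""
    tokens = re.findall(r"[a-z0-9']+", sentence.lower())
    return sum(WEIGHTS.get(token, 0.0) for token in tokens) + BIAS


def label_sentences(document):
    """Return (sentence, label, score) for every sentence in the document."""
    results = []
    for sentence in split_sentences(document):
        score = decision_value(sentence)
        label = "subjective" if score > 0 else "objective"
        results.append((sentence, label, score))
    return results


labeled = label_sentences("This is a very good book. This book has 123 pages.")
labels = [label for _, label, _ in labeled]
```

Keeping the score alongside the label is convenient for later stages that aggregate per-sentence scores rather than counts.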
15
Opinion Similarity - Query-Related Opinions
  • Find the query-related opinions

[Diagram: two documents, each showing the query, an opinionative sentence, and the text window that relates them]
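
One plausible reading of the text-window idea is sketched below: an opinionative sentence counts as query-related when all query terms occur within a window of neighbouring sentences. The window unit (sentences), its size, and the requirement that every query term appears are assumptions, as the slide does not specify them.

```python
import re


def tokens_of(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def query_related_opinions(sentences, labels, query_terms, window=1):
    """Keep subjective sentences whose surrounding window mentions the query.

    `window` is the number of sentences on each side that are inspected;
    both the unit and the default value are assumptions for this sketch.
    """
    related = []
    for i, (sentence, label) in enumerate(zip(sentences, labels)):
        if label != "subjective":
            continue
        start = max(0, i - window)
        end = min(len(sentences), i + window + 1)
        window_tokens = tokens_of(" ".join(sentences[start:end]))
        if all(term.lower() in window_tokens for term in query_terms):
            related.append(sentence)
    return related


# Toy document with invented sentence labels.
sentences = [
    "I bought a book yesterday.",
    "It is very good.",
    "The weather was cold.",
    "Later we ate lunch.",
    "The soup was terrible.",
]
labels = ["objective", "subjective", "objective", "objective", "subjective"]
related = query_related_opinions(sentences, labels, ["book"])
```

The second opinionative sentence is dropped because no query term appears near it, even though it is subjective.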
16
Opinion Similarity - Similarity 1
  • Assumption 1
      • Higher topic relevance → higher rank
  • OSim_ir = Sim(Query, Doc)

17
Opinion Similarity - Similarity 2
  • Assumption 2
      • More query-related opinions → higher rank
  • OSim_stcc = total number of sentences
  • OSim_stcs = total score of sentences
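
The two variants differ only in how the query-related opinionative sentences are aggregated: one counts them, the other sums their scores. A small sketch follows, assuming each such sentence carries a classifier score; the score values are invented.

```python
def osim_stcc(sentence_scores):
    """Count-based similarity: number of query-related opinionative sentences."""
    return len(sentence_scores)


def osim_stcs(sentence_scores):
    """Score-based similarity: total score of those sentences."""
    return sum(sentence_scores)


# Hypothetical classifier scores for three query-related opinionative sentences.
scores = [0.5, 1.5, 0.25]
count_sim = osim_stcc(scores)
score_sim = osim_stcs(scores)
```

The count treats every sentence equally, while the sum lets confidently classified sentences contribute more.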

18
Opinion Similarity - Similarity 3
  • A linear combination of 1 and 2
      • a * OSim_ir + (1 - a) * OSim_stcc
      • b * OSim_ir + (1 - b) * OSim_stcs
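
A minimal sketch of ranking by the combined similarity is given below. It assumes that both component scores have already been brought onto comparable scales. The slides do not say how the scores were normalized or which weights were used, so the values here are illustrative.

```python
def combined_similarity(osim_ir, osim_opinion, weight):
    """weight * OSim_ir + (1 - weight) * opinion similarity."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    return weight * osim_ir + (1 - weight) * osim_opinion


def rank_documents(doc_scores, weight):
    """doc_scores maps doc id -> (OSim_ir, opinion similarity)."""
    return sorted(
        doc_scores,
        key=lambda doc: combined_similarity(*doc_scores[doc], weight),
        reverse=True,
    )


# Hypothetical, already-normalized scores.
doc_scores = {"d1": (0.9, 0.1), "d2": (0.6, 0.8), "d3": (0.7, 0.5)}
balanced = rank_documents(doc_scores, weight=0.5)
topic_only = rank_documents(doc_scores, weight=1.0)
```

With a weight of 1.0 the ranking falls back to pure topic relevance, and lowering the weight moves opinion-rich documents such as d2 upward.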

19
Opinion Similarity - Experimental Results
  • TREC 2006 Blog Track data
      • 50 queries, 3.2 million blog documents
  • UIC at TREC 2006 Blog Track
      • Ranked first among title-only query runs
  • 28% - 32% higher than best TREC 2006 scores
  • Good things learned
      • More training data
      • Combined similarity function
20
Conclusions
  • Designed and implemented an opinion retrieval
    system: IR + text classification for opinion
    retrieval
  • The best known retrieval effectiveness on TREC
    2006 blog data
  • Extend to polarity classification:
    positive/negative/mixed
  • Plan to improve feature selection

21
Questions?
  • wzhang@cs.uic.edu
  • http://www.cs.uic.edu/wzhang/