WIDIT in TREC2008 Blog Track: Leveraging multiple sources of opinion evidence

1
WIDIT in TREC-2008 Blog Track: Leveraging
multiple sources of opinion evidence

Kiduk Yang
WIDIT Laboratory
School of Library and Information Science
Indiana University

2
Blog Track Challenge

Targeted Opinion Detection
Subjective language is context-dependent
Both objective and subjective documents are
composed of a mixture of subjective and objective
language
Must associate opinion to the target
Blogosphere Characteristics
Highly personalized → non-standard use of
language
Interactive → opinion may span a fraction of the
posting
Blogware → embedded noise
Spam

3
Research Questions: Opinion Detection

What are the evidences of opinion?
Opinion Terminology
Words often used in expressing an opinion
e.g., "Skype sucks", "Skype rocks", "Skype is
cool"
Opinion Collocations
Collocations that mark an opinion
e.g., "I think tomato is a fruit", "Tomato is a
vegetable to me"
Opinion Morphology
Word morphing to emphasize an opinion
e.g., "Vista is soooo buggy", "Vista is metacool"
How can they be leveraged?
Opinion classification via Supervised Learning
Document scoring using Opinion Lexicons
How can they be combined to detect opinionated
blogs?
Weighted sum optimized via Dynamic Tuning

4
WIDIT Blog System Architecture

[Figure: system architecture diagram. Components shown: Blog Data, IMDb Data, Wilson's Lexicons, Netlingo Terms, Blogs, Noise Reduction, Blogs w/o Noise, Opinion Lexicons, Document Indexing, Inverted Index, Topics, Query Indexing, Short Query, Long Query, Expanded Query, Retrieval, Initial Results, Fusion, Fusion Result, Topic Reranking, OnTopic Results, Opinion Reranking, Opinion Results, Polarity Detection, Polarity Result, Dynamic Tuning.]

5
WIDIT Approach: Opinion Lexicons

Lexicon-based Opinion Detection
Construct Opinion Lexicons from multiple sources
of opinion evidence
Opinion Terminology, Opinion Collocations,
Opinion Morphology
Score documents using Opinion Lexicons
Opinion Terminology
Wilson's Lexicons
A subset of Wilson's subjectivity terms
4747 strong & 2190 weak subjective terms with
polarity
240 emphasis terms, 88 negation n-grams
High Frequency (HF) Lexicon
For each of IMDb movie & 2006 blog training data
Extract high frequency terms from positive
training data (e.g., movie review)
Exclude terms that occur in negative training
data (e.g., movie plot summary)
Select a set of opinion terms
Combine the IMDb & blog term sets
Assign polarity & strength to each term
Expand with synonyms & antonyms from WordNet
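
The first two HF Lexicon steps (extract frequent terms from positive data, exclude terms seen in negative data) could be sketched as follows. This is a minimal illustration, not the WIDIT implementation: the tokenizer, the `top_n` cutoff, and the function names are assumptions, and the later manual selection, polarity assignment, and WordNet expansion are not covered.

```python
import re
from collections import Counter

def tokenize(text):
    # Assumption: lowercase alphabetic tokens are enough for this sketch.
    return re.findall(r"[a-z']+", text.lower())

def high_frequency_candidates(positive_docs, negative_docs, top_n=1000):
    """Frequent terms of the positive data that never occur in the negative data."""
    pos_counts = Counter(t for d in positive_docs for t in tokenize(d))
    neg_vocab = {t for d in negative_docs for t in tokenize(d)}
    # most_common() orders by descending frequency.
    ranked = [t for t, _ in pos_counts.most_common() if t not in neg_vocab]
    return ranked[:top_n]
```

The returned list is only a candidate pool; the slide's "Select a set of opinion terms" step would still be applied to it.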

6
WIDIT Approach: Opinion Lexicons

Opinion Collocations
I-You (IU) Lexicon
For each of movie review & positive blog training
data
Extract n-grams that begin/end with IU anchors
(e.g., I, You, my, your, me)
Select a set of opinion collocations
Combine the movie & blog term sets
Assign strength & polarity to each collocation
Add verb conjugations & noun plurals
Expand with HF & Wilson terms
Acronym Lexicon
Select opinion collocations from netlingo
acronyms
e.g., afaik (as far as I know), imho (in my
humble opinion)
Assign strength & polarity to each collocation
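
The IU n-gram extraction step could look roughly like the sketch below. The anchor set is taken from the slide's examples; the n-gram sizes and the function name are assumptions made for illustration.

```python
# Anchors listed on the slide; the real lexicon may use more.
IU_ANCHORS = {"i", "you", "my", "your", "me"}

def iu_ngrams(tokens, n_values=(2, 3)):
    """Collect n-grams whose first or last token is an IU anchor."""
    found = []
    for n in n_values:
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if gram[0] in IU_ANCHORS or gram[-1] in IU_ANCHORS:
                found.append(" ".join(gram))
    return found
```

For the tokens of "i think tomato is a fruit" this yields "i think" and "i think tomato", which would then go through the selection and strength/polarity assignment steps listed above.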

7
WIDIT Approach: Opinion Lexicons

Opinion Morphology
When expressing opinion, people become creative
and tend to use uncommon/rare terms
(Wiebe, Wilson, Bruce, Bell, & Martin, 2004)
LF Lexicon & LF Regex
Compile a set of Low Frequency (LF) terms in the
blog collection
Exclude terms that occur frequently in negative
training data
Construct regular expressions (LF regex) to
identify Opinion Morph (OM) terms
Based on examination of HF terms & LF patterns
Compound words (e.g., crazygood, ohmygod)
Repeat-character words (e.g., sooo, fantaaastic)
Morph-spelled words (e.g., luv, hizzarious)
Apply regex to LF term set
Iteratively refine regex based on the examination
of regex results
Exclude regex matches from LF term set
Select OM terms (LF lexicon) from the remaining
set
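
As an illustration of one LF regex, repeat-character words such as "sooo" or "fantaaastic" can be caught with a backreference. The pattern below is an assumed example, not one of the actual WIDIT expressions; compound and morph-spelled words would need separate patterns.

```python
import re

# Three or more identical letters in a row, e.g. "sooo", "fantaaastic".
REPEAT_CHAR = re.compile(r"([a-z])\1{2,}")

def is_repeat_char_word(term):
    return REPEAT_CHAR.search(term.lower()) is not None

def collapse_repeats(term):
    # Heuristic: shrink each run of 3+ letters to a single letter.
    # This is lossy for words with a legitimate double letter ("coool" -> "col").
    return REPEAT_CHAR.sub(r"\1", term.lower())
```

Ordinary double letters ("good", "buggy") are deliberately left unmatched, which is why the run length starts at three.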

8
WIDIT Approach: Opinion Reranking

Opinion Reranking factors
Opinion Terminology
Wilson's lexicon, HF lexicon
Opinion Collocations
AC lexicon, IU lexicon
Opinion Morphology
LF lexicon, LF regex
Opinion Reranking (OR) Method
Compute OR scores for each document
Document-length normalized frequency
Rerank topic-reranked documents using
combined OR score & topic-reranking groups
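
A document-length normalized frequency score for a single lexicon might be computed as in this sketch. Treating the lexicon as a term-to-weight mapping and dividing by the token count are assumptions; the slides do not spell out the exact normalization.

```python
def normalized_lexicon_score(tokens, lexicon):
    """Sum of lexicon weights of matching tokens, divided by document length."""
    if not tokens:
        return 0.0
    return sum(lexicon.get(t, 0.0) for t in tokens) / len(tokens)
```

One such score per lexicon (Wilson's, HF, AC, IU, LF) would give the set of factors that the reranking step combines.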

9
Opinion Reranking

Adjective-Verb (AV) Module
Hypothesis
Opinion blogs have a high density of Opinion
Adjectives & Verbs
Method
Construct AV lexicons
Manually compile an AV seed set
e.g., good, bad, support, against, like, hate
Expand the seed set with synonyms & antonyms from
lexical sources (AV1)
Expand AV1 with similar AV terms using
Distributional Similarity (AV2)
Compute AV scores
AV1 score: Document-length normalized frequency
AV1 terms near query title string in document
AV2 score: AV2 density in document
AV2 term frequency / total adjective & verb
frequency
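
The AV2 density can be sketched as below, assuming the document has already been part-of-speech tagged. The `(word, tag)` input format and the coarse `ADJ`/`VERB` tag names are assumptions for illustration; any tagger with equivalent labels would do.

```python
def av2_density(tagged_tokens, av2_lexicon):
    """AV2 term frequency divided by total adjective and verb frequency."""
    av_tokens = [w.lower() for w, tag in tagged_tokens if tag in {"ADJ", "VERB"}]
    if not av_tokens:
        return 0.0
    hits = sum(1 for w in av_tokens if w in av2_lexicon)
    return hits / len(av_tokens)
```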

10
Opinion Reranking

AV expansion by Distributional Similarity
Objective
Find a cluster of similar words given a seed set
of Opinion AV
Hypothesis
Similar words have similar distributional
(co-occurrence) patterns.
Learning Subjective Language (Wiebe et al.,
2004)
Method
Split the training data into a training set and a
validation set
Find terms that co-occur with seed set terms in
the training set
Refine the expanded term set E(n)
Classify the validation set with E(1)..E(n)
Select E(k), which has the highest classification
performance
Manually filter E(k) to create the final Opinion
AV lexicon
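
The co-occurrence step alone could be approximated as follows. The window size, the minimum count, and the function name are assumptions; the refinement into E(1)..E(n), the validation-set classification, and the manual filtering described above are not shown.

```python
from collections import Counter

def cooccurring_terms(docs, seeds, window=2, min_count=2):
    """Terms that appear within `window` tokens of a seed term at least `min_count` times."""
    counts = Counter()
    for tokens in docs:
        for i, tok in enumerate(tokens):
            if tok in seeds:
                lo = max(0, i - window)
                hi = min(len(tokens), i + window + 1)
                for neighbor in tokens[lo:i] + tokens[i + 1:hi]:
                    if neighbor not in seeds:
                        counts[neighbor] += 1
    return [t for t, c in counts.most_common() if c >= min_count]
```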

11
WIDIT Approach: Polarity Detection

For each opinion-reranked document,
Compute positive & negative polarity scores
Combine polarity scores using D-tuned formula
fsc(p), fsc(n)
Apply polarity detection heuristic
Positive polarity if
most of opinion factors are positive,
fsc(p) - fsc(n) > threshold
fsc(p) >> fsc(n)
Negative polarity if
most of opinion factors are negative,
fsc(n) - fsc(p) > threshold
fsc(n) >> fsc(p)
Mixed polarity otherwise
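
One possible reading of the heuristic is sketched below. The slide does not say whether the three conditions are joined by AND or OR, so the `combine` parameter (Python's built-in `all` or `any`) makes that choice explicit; the `threshold` and `ratio` defaults, and modelling ">>" as a ratio test, are assumptions.

```python
def detect_polarity(factor_scores, fsc_p, fsc_n,
                    threshold=0.1, ratio=2.0, combine=all):
    """factor_scores: list of (positive_score, negative_score), one per opinion factor."""
    n_pos = sum(1 for p, n in factor_scores if p > n)
    n_neg = sum(1 for p, n in factor_scores if n > p)
    half = len(factor_scores) / 2
    positive = combine([n_pos > half,
                        fsc_p - fsc_n > threshold,
                        fsc_p > ratio * fsc_n])
    negative = combine([n_neg > half,
                        fsc_n - fsc_p > threshold,
                        fsc_n > ratio * fsc_p])
    if positive and not negative:
        return "positive"
    if negative and not positive:
        return "negative"
    return "mixed"
```

Passing `combine=any` gives the looser reading in which any single condition is enough.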

12
WIDIT Approach: Dynamic Tuning

Reranking formula
$RS = \alpha \cdot NS_{orig} + \beta \cdot \sum_i (w_i \cdot NS_i)$
$w_i$ = weight of reranking factor $i$
$NS_i$ = normalized score of factor $i$
$= (S_i - S_{min}) / (S_{max} - S_{min})$
$\alpha$ = weight of original score
$\beta$ = weight of overall reranking score
How to determine $\alpha$, $\beta$, $w_i$?
Too many parameters for exhaustive combinations
Linear combination may not suffice
Dynamic Tuning
Real-time display of parameter tuning effect on
performance
To guide the user towards a local optimum
By harnessing both human intelligence (pattern
recognition) and the computational power of the machine
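
For fixed parameter values, the reranking formula on this slide can be evaluated as in the sketch below. It assumes the original score is min-max normalized in the same way as the factor scores; the function and argument names are illustrative, and the interactive tuning of the parameters is not modelled.

```python
def min_max(scores):
    """(S_i - S_min) / (S_max - S_min); all zeros if the scores are constant."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def rerank_scores(orig, factors, weights, alpha, beta):
    """orig: original score per document; factors: one score list per reranking factor."""
    ns_orig = min_max(orig)
    ns_factors = [min_max(f) for f in factors]
    return [
        alpha * ns_orig[d]
        + beta * sum(w * ns[d] for w, ns in zip(weights, ns_factors))
        for d in range(len(orig))
    ]
```

Dynamic Tuning, as described above, would then amount to a person adjusting `alpha`, `beta`, and `weights` while watching the effect on a performance measure.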

13
WIDIT Approach: Dynamic Tuning

Opinion Reranking

14
WIDIT Approach: Dynamic Tuning

Polarity Detection

15
WIDIT Approach: Fusion

Weighted Sum Fusion Formula
$FS = \sum_i (w_i \cdot NS_i)$
Fusion Type
Normalized sum (Min-Max) fusion: $w_i = 1$
MAP fusion: $w_i$ = MAP of training runs
D-tuned fusion
Fusion Combinations
By Query Length
Short, Long, Long w/ nouns
By Term Weight
Okapi, SMART
Fusion Levels
Baseline results
Topic-reranked results
Opinion-reranked results

$w_i$ = weight of system $i$ (relative contribution of each system)
$NS_i$ = normalized score of a document by system $i$
$= (S_i - S_{min}) / (S_{max} - S_{min})$
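
The weighted sum fusion can be sketched as follows, with each system's run given as a non-empty mapping from document ID to score. Scoring a document as 0 for systems that did not retrieve it is an assumption, as are the names used.

```python
def weighted_sum_fusion(runs, weights=None):
    """runs: list of {doc_id: score}; returns (doc_id, fused_score) sorted best first."""
    if weights is None:
        # Normalized sum (Min-Max) fusion: every system has weight 1.
        weights = [1.0] * len(runs)
    fused = {}
    for run, w in zip(runs, weights):
        lo, hi = min(run.values()), max(run.values())
        span = hi - lo
        for doc, s in run.items():
            ns = (s - lo) / span if span else 0.0
            fused[doc] = fused.get(doc, 0.0) + w * ns
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)
```

Passing each system's MAP on training runs as `weights` gives the MAP fusion variant listed above.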
16
Result At a Glance

Opinion Finding

17
Result At a Glance

Polarity Ranking (positive)

18
Result At a Glance

Polarity Ranking (negative)

19
Reranking Effect
20
Reranking & Dynamic Tuning Effect
21
Opinion Reranking Factors
22
Relative Short Query Performance of WIDIT Opinion
Finding System

Improvements over Baseline for Short Query
Opinion Detection Performances by TREC-2007
participants
Good OnTopic retrieval → good opinion retrieval
- but not necessarily due to opinion reranking
23
Concluding Remarks

Noise Reduction
Positive effect on retrieval performance
Reranking & Dynamic Tuning
Most influential components of the system
Fusion
Combining multiple complementary sources is
effective for opinion detection
Future Study
Method fusion: Lexicon, Machine Learning, NLP
Automatic construction of lexicons
Query Expansion optimization
Failure Analysis

24
Questions?

Wilson's lexicon
http://www.cs.pitt.edu/mpqa/opinionfinderrelease/
Movie Review Data
http://www.cs.cornell.edu/people/pabo/movie-review-data/
Movie Plot Summaries
http://www.imdb.com/Sections/Plots/
Netlingo Terms
http://www.netlingo.com/emailsh.cfm
WIDIT Lexicons
http://elvis.slis.indiana.edu/lexlist.htm
