WIDIT in TREC-2008 Blog Track: Leveraging multiple sources of opinion evidence
1
WIDIT in TREC-2008 Blog Track: Leveraging
multiple sources of opinion evidence
  • Kiduk Yang
  • WIDIT Laboratory
  • School of Library and Information Science
  • Indiana University

2
Blog Track Challenge
  • Targeted Opinion Detection
  • Subjective language is context-dependent
  • Both objective and subjective documents are
    composed of a mixture of subjective and objective
    language
  • Must associate opinion to the target
  • Blogosphere Characteristics
  • Highly personalized → non-standard use of
    language
  • Interactive → opinion may span a fraction of the
    posting
  • Blogware → embedded noise
  • Spam

3
Research Questions: Opinion Detection
  • What are the sources of opinion evidence?
  • Opinion Terminology
  • Words often used in expressing an opinion
  • e.g., "Skype sucks", "Skype rocks", "Skype is
    cool"
  • Opinion Collocations
  • Collocations that mark an opinion
  • e.g., "I think tomato is a fruit", "Tomato is a
    vegetable to me"
  • Opinion Morphology
  • Word morphing to emphasize an opinion
  • e.g., "Vista is soooo buggy", "Vista is metacool"
  • How can they be leveraged?
  • Opinion classification via Supervised Learning
  • Document scoring using Opinion Lexicons
  • How can they be combined to detect opinionated
    blogs?
  • Weighted sum optimized via Dynamic Tuning

4
WIDIT Blog System Architecture
  • [Architecture diagram: blog data and IMDb data pass through Noise
    Reduction and Document Indexing to build an inverted index; topics
    are query-indexed into short, long, and expanded queries; Retrieval
    produces initial results that flow through Topic Reranking, Opinion
    Reranking (driven by opinion lexicons built from Wilson's lexicons
    and Netlingo terms), Fusion, Dynamic Tuning, and Polarity Detection,
    yielding on-topic, opinion, fusion, and polarity results.]
5
WIDIT Approach: Opinion Lexicons
  • Lexicon-based Opinion Detection
  • Construct Opinion Lexicons from multiple sources
    of opinion evidence
  • Opinion Terminology, Opinion Collocations,
    Opinion Morphology
  • Score documents using Opinion Lexicons
  • Opinion Terminology
  • Wilson's Lexicons
  • A subset of Wilson's subjectivity terms
  • 4747 strong and 2190 weak subjective terms with
    polarity
  • 240 emphasis terms, 88 negation n-grams
  • High Frequency (HF) Lexicon
  • For each of the IMDb movie and 2006 blog training
    data sets
  • Extract high frequency terms from positive
    training data (e.g., movie reviews)
  • Exclude terms that occur in negative training
    data (e.g., movie plot summaries)
  • Select a set of opinion terms
  • Combine the IMDb and blog term sets
  • Assign polarity and strength to each term
  • Expand with synonyms and antonyms from WordNet
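The HF-lexicon construction above can be sketched as follows; the function name, whitespace tokenization, and frequency threshold are illustrative assumptions, not details from the WIDIT system:

```python
from collections import Counter

def build_hf_lexicon(positive_docs, negative_docs, min_freq=5):
    """Build a High Frequency (HF) opinion lexicon: keep terms that are
    frequent in positive (opinionated) training data, e.g. movie reviews,
    and never occur in negative (objective) training data, e.g. plot
    summaries."""
    pos_counts = Counter(t for doc in positive_docs for t in doc.split())
    neg_terms = {t for doc in negative_docs for t in doc.split()}
    return {t for t, c in pos_counts.items()
            if c >= min_freq and t not in neg_terms}
```

In the full system this step would be run once per training corpus (IMDb and blog), the term sets combined, and polarity/strength assigned afterwards.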

6
WIDIT Approach: Opinion Lexicons
  • Opinion Collocations
  • I-You (IU) Lexicon
  • For each of the movie review and positive blog
    training data sets
  • Extract n-grams that begin/end with IU anchors
    (e.g., I, You, my, your, me)
  • Select a set of opinion collocations
  • Combine the movie and blog term sets
  • Assign strength and polarity to each collocation
  • Add verb conjugations and noun plurals
  • Expand with HF and Wilson terms
  • Acronym Lexicon
  • Select opinion collocations from Netlingo
    acronyms
  • e.g., afaik (as far as I know), imho (in my
    humble opinion)
  • Assign strength and polarity to each collocation
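The IU n-gram extraction step can be sketched as below; the anchor set comes from the slide's examples, while the trigram window and tokenization are illustrative assumptions:

```python
IU_ANCHORS = {"i", "you", "my", "your", "me"}

def extract_iu_ngrams(text, n=3):
    """Extract word n-grams that begin or end with an I-You anchor, as
    candidate opinion collocations (e.g., 'I think ...', '... to me')."""
    words = text.lower().split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    return [g for g in ngrams if g[0] in IU_ANCHORS or g[-1] in IU_ANCHORS]
```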

7
WIDIT Approach: Opinion Lexicons
  • Opinion Morphology
  • When expressing opinion, people become creative
    and tend to use uncommon/rare terms (Wiebe,
    Wilson, Bruce, Bell, and Martin, 2004)
  • LF Lexicon and LF Regex
  • Compile a set of Low Frequency (LF) terms in the
    blog collection
  • Exclude terms that occur frequently in negative
    training data
  • Construct regular expressions (LF regex) to
    identify Opinion Morph (OM) terms
  • Based on examination of HF terms and LF patterns
  • Compound words (e.g., crazygood, ohmygod)
  • Repeat-character words (e.g., sooo, fantaaastic)
  • Morph-spelled words (e.g., luv, hizzarious)
  • Apply regex to LF term set
  • Iteratively refine regex based on the examination
    of regex results
  • Exclude regex matches from LF term set
  • Select OM terms (LF lexicon) from the remaining
    set
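Two of the three OM patterns named above (repeat-character and compound words) can be approximated with regexes like the following; the actual WIDIT LF regexes are not published in these slides, so the patterns and prefix list are illustrative only:

```python
import re

# Repeat-character words: three or more of the same character in a row,
# e.g. "sooo", "fantaaastic".
REPEAT_CHAR = re.compile(r"(\w)\1{2,}")

# Compound words built from a small, assumed set of intensifier prefixes,
# e.g. "crazygood", "metacool", "ohmygod".
COMPOUND = re.compile(r"^(?:crazy|mega|meta|oh)\w{3,}$")

def is_opinion_morph(term):
    """Heuristically flag a term as an Opinion Morph (OM) candidate."""
    return bool(REPEAT_CHAR.search(term) or COMPOUND.match(term))
```

As the slide notes, such regexes would be applied to the LF term set and refined iteratively by inspecting their matches.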

8
WIDIT Approach: Opinion Reranking
  • Opinion Reranking factors
  • Opinion Terminology
  • Wilson's lexicon, HF lexicon
  • Opinion Collocations
  • AC lexicon, IU lexicon
  • Opinion Morphology
  • LF lexicon, LF regex
  • Opinion Reranking (OR) Method
  • Compute OR scores for each document
  • Document-length normalized frequency
  • Rerank topic-reranked documents using the
    combined OR score within topic-reranking groups
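A simple reading of "document-length normalized frequency" for a single lexicon is sketched below; the exact WIDIT scoring formula is not given in the slides, so this is an assumption:

```python
def or_score(doc_terms, lexicon):
    """Opinion Reranking (OR) score for one lexicon: the number of
    lexicon matches in the document, normalized by document length."""
    if not doc_terms:
        return 0.0
    hits = sum(1 for t in doc_terms if t in lexicon)
    return hits / len(doc_terms)
```

One such score per lexicon (Wilson, HF, IU, AC, LF) would then be combined into the overall OR score used for reranking.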

9
Opinion Reranking
  • Adjective-Verb (AV) Module
  • Hypothesis
  • Opinion blogs have a high density of Opinion
    Adjectives and Verbs
  • Method
  • Construct AV lexicons
  • Manually compile an AV seed set
  • e.g., good, bad, support, against, like, hate
  • Expand the seed set with synonyms and antonyms
    from lexical sources (AV1)
  • Expand AV1 with similar AV terms using
    Distributional Similarity (AV2)
  • Compute AV scores
  • AV1 score: document-length normalized frequency
    of AV1 terms near the query title string in the
    document
  • AV2 score: AV2 density in the document
  • AV2 term frequency / total adjective and verb
    frequency
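The AV2 density (AV2 term frequency over total adjective and verb frequency) can be sketched as follows, assuming POS tagging happens upstream and produces Penn-style tags (JJ* for adjectives, VB* for verbs):

```python
def av2_density(doc_tags, av2_lexicon):
    """AV2 score: frequency of AV2 lexicon terms divided by the total
    number of adjectives and verbs in the document. doc_tags is a list
    of (token, pos_tag) pairs."""
    adj_verbs = [tok for tok, pos in doc_tags if pos.startswith(("JJ", "VB"))]
    if not adj_verbs:
        return 0.0
    return sum(1 for tok in adj_verbs if tok in av2_lexicon) / len(adj_verbs)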

10
Opinion Reranking
  • AV expansion by Distributional Similarity
  • Objective
  • Find a cluster of similar words given a seed set
    of Opinion AV
  • Hypothesis
  • Similar words have similar distributional
    (co-occurrence) patterns.
  • Learning Subjective Language (Wiebe et al.,
    2004)
  • Method
  • Split the training data into a training set and a
    validation set
  • Find terms that co-occur with seed set terms in
    the training set
  • Refine the expanded term set E(n)
  • Classify the validation set with E(1)..E(n)
  • Select E(k), which has the highest classification
    performance
  • Manually filter E(k) to create the final Opinion
    AV lexicon
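A single expansion step of the distributional-similarity method (finding terms that co-occur with seed terms) might look like this; document-level co-occurrence and the count threshold are illustrative assumptions, and the validation-set selection of E(k) is omitted:

```python
from collections import Counter

def expand_by_cooccurrence(docs, seed, min_count=2):
    """One expansion step: collect terms that co-occur (appear in the
    same document) with any seed term at least min_count times."""
    co = Counter()
    for doc in docs:
        words = set(doc.split())
        if words & seed:        # document contains a seed term
            co.update(words - seed)
    return {t for t, c in co.items() if c >= min_count}
```

Iterating this step yields the expanded sets E(1)..E(n), from which the best-performing E(k) is chosen on the validation set and then manually filtered.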

11
WIDIT Approach: Polarity Detection
  • For each opinion-reranked document,
  • Compute positive and negative polarity scores
  • Combine polarity scores using D-tuned formula
  • fsc(p), fsc(n)
  • Apply polarity detection heuristic
  • Positive polarity if
  • most of the opinion factors are positive,
  • fsc(p) − fsc(n) > threshold
  • fsc(p) >> fsc(n)
  • Negative polarity if
  • most of the opinion factors are negative,
  • fsc(n) − fsc(p) > threshold
  • fsc(n) >> fsc(p)
  • Mixed polarity otherwise
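The score-based part of the heuristic can be sketched as below; the threshold and the ratio standing in for ">>" are illustrative values, and the "most opinion factors agree" condition is folded into the score comparison:

```python
def polarity(fsc_p, fsc_n, threshold=0.1, ratio=2.0):
    """Polarity heuristic: positive if the positive score clearly
    dominates, negative if the negative score clearly dominates,
    mixed otherwise."""
    if fsc_p - fsc_n > threshold and fsc_p > ratio * fsc_n:
        return "positive"
    if fsc_n - fsc_p > threshold and fsc_n > ratio * fsc_p:
        return "negative"
    return "mixed"
```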

12
WIDIT Approach: Dynamic Tuning
  • Reranking formula
  • RS = α·NSorig + β·Σ(wi·NSi)
  • wi = weight of reranking factor i
  • NSi = normalized score of factor i
  •     = (Si − Smin) / (Smax − Smin)
  • α = weight of original score
  • β = weight of overall reranking score
  • How to determine α, β, wi?
  • Too many parameters for exhaustive combinations
  • Linear combination may not suffice
  • Dynamic Tuning
  • Real-time display of parameter tuning effect on
    performance
  • To guide the user towards a local optimum
  • By combining human intelligence (pattern
    recognition) with the computational power of the
    machine
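The reranking formula and min-max normalization above translate directly into code; the interactive part of dynamic tuning (adjusting α, β, and wi while watching performance) is not shown:

```python
def min_max(scores):
    """Min-max normalization: NS_i = (S_i - S_min) / (S_max - S_min)."""
    lo, hi = min(scores), max(scores)
    return [(s - lo) / (hi - lo) if hi > lo else 0.0 for s in scores]

def rerank_score(ns_orig, ns_factors, alpha, beta, weights):
    """Reranking formula: RS = alpha*NS_orig + beta*sum(w_i * NS_i)."""
    return alpha * ns_orig + beta * sum(w * ns for w, ns in zip(weights, ns_factors))
```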

13
WIDIT Approach: Dynamic Tuning
  • Opinion Reranking

14
WIDIT Approach: Dynamic Tuning
  • Polarity Detection

15
WIDIT Approach: Fusion
  • Weighted Sum Fusion Formula
  • FS = Σ(wi·NSi)
  • Fusion Type
  • Normalized sum (min-max) fusion: wi = 1
  • MAP fusion: wi = MAP of training runs
  • D-tuned fusion
  • Fusion Combinations
  • By Query Length
  • Short, Long, Long w/ nouns
  • By Term Weight
  • Okapi, SMART
  • Fusion Levels
  • Baseline results
  • Topic-reranked results
  • Opinion-reranked results

wi = weight of system i (relative
contribution of each system)
NSi = normalized score of a document by system i
    = (Si − Smin) / (Smax − Smin)
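The weighted-sum fusion over multiple systems can be sketched as follows, representing each run as a dict of per-document scores (an illustrative data layout, not the system's actual format):

```python
def weighted_sum_fusion(runs, weights):
    """Fuse several systems' results: for each document,
    FS = sum over systems i of w_i * NS_i, where each run's scores
    are min-max normalized first. 'runs' is a list of {doc_id: score}."""
    fused = {}
    for run, w in zip(runs, weights):
        lo, hi = min(run.values()), max(run.values())
        for doc, s in run.items():
            ns = (s - lo) / (hi - lo) if hi > lo else 0.0
            fused[doc] = fused.get(doc, 0.0) + w * ns
    return fused
```

With wi = 1 for all runs this is the normalized-sum fusion; using each run's training MAP as wi gives the MAP fusion listed above.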
16
Results at a Glance
  • Opinion Finding

17
Results at a Glance
  • Polarity Ranking (positive)

18
Results at a Glance
  • Polarity Ranking (negative)

19
Reranking Effect
20
Reranking and Dynamic Tuning Effect
21
Opinion Reranking Factors
22
Relative Short Query Performance of WIDIT Opinion
Finding System

Improvements over Baseline for Short Query
Opinion Detection Performances by TREC-2007
participants
Good OnTopic retrieval → good opinion retrieval
- but not necessarily due to opinion reranking
23
Concluding Remarks
  • Noise Reduction
  • Positive effect on retrieval performance
  • Reranking and Dynamic Tuning
  • Most influential components of the system
  • Fusion
  • Combining multiple complementary sources is
    effective for opinion detection
  • Future Study
  • Method fusion: Lexicon + Machine Learning + NLP
  • Automatic construction of lexicons
  • Query Expansion optimization
  • Failure Analysis

24
Questions?
  • Wilson's lexicon
  • http://www.cs.pitt.edu/mpqa/opinionfinderrelease/
  • Movie Review Data
  • http://www.cs.cornell.edu/people/pabo/movie-review-data/
  • Movie Plot Summaries
  • http://www.imdb.com/Sections/Plots/
  • Netlingo Terms
  • http://www.netlingo.com/emailsh.cfm
  • WIDIT Lexicons
  • http://elvis.slis.indiana.edu/lexlist.htm