Opinion Observer: Analyzing and Comparing Opinions on the Web

Transcript and Presenter's Notes

1
Opinion Observer: Analyzing and Comparing Opinions on the Web
  • Bing Liu, Minqing Hu, Junsheng Cheng
  • Department of Computer Science
  • University of Illinois at Chicago

WWW '05
2
Abstract
  • This paper focuses on online customer reviews of
    products.
  • Two contributions:
  • We propose a novel framework for analyzing and
    comparing consumer opinions of competing
    products. A prototype system, Opinion Observer,
    is also implemented.
  • We propose a new technique based on language
    pattern mining to extract product features from
    Pros and Cons in a particular type of review.

3
Introduction
  • Opinion Observer
  • We propose a new technique to identify product
    features from Pros and Cons in reviews of this
    format: Pros, Cons, and a detailed review.
  • Pros and Cons tend to be very brief,
    e.g., "heavy", "bad picture quality", "battery
    life too short".
  • We do not analyze the detailed reviews.

4
Related Work
  • Hu, M., and Liu, B. 2004
  • Performs the same tasks based on unsupervised
    itemset mining.
  • Morinaga, S., Yamanishi, K., Tateishi, K., and
    Fukushima, T. 2002
  • Compares information on different products in a
    category through search to find the reputation of
    the products.
  • Bourigault, D. 1995; Daille, B. 1996; Jacquemin,
    C., and Bourigault, D. 2001; Justeson, J., and
    Katz, S. 1995
  • Terminology finding tasks.
  • Using noun phrases is not sufficient for finding
    product features.
  • Bunescu, R., and Mooney, R. 2004; Etzioni et al.
    2004; Freitag, D., and McCallum, A. 2000;
    Lafferty, J., McCallum, A., and Pereira, F. 2001;
    Rosario, B., and Hearst, M. 2004
  • Entity extraction tasks.
  • Product features are usually not named entities.
    Also, our extraction works on short sentence
    segments rather than full sentences.

5
Related Work (cont.)
  • Hearst, M. 1992; Das, S., and Chen, M. 2001;
    Tong, R. 2001; Turney, P. 2002; Pang, B., Lee,
    L., and Vaithyanathan, S. 2002; Dave, K.,
    Lawrence, S., and Pennock, D. 2003; Agrawal, R.,
    Rajagopalan, S., Srikant, R., and Xu, Y. 2003;
    Hatzivassiloglou, V., and Wiebe, J. 2000; Wiebe,
    J., Bruce, R., and O'Hara, T. 1999
  • Sentiment classification tasks.
  • They do not identify the features commented on by
    customers, i.e., what customers praise or
    complain about.

6
System Architecture
7
Visualizing Opinion Comparison
8
Problem Statement
  • P = {P1, P2, ..., Pn}: a set of products.
  • Ri = {r1, r2, ..., rk}: the set of reviews of
    product Pi.
  • Explicit feature: a product feature that appears
    in rj. Implicit feature: one that does not appear
    in rj but is implied.
  • "Battery life is too short" → battery life
    (explicit)
  • "Too big" → size (implicit)
  • In order to visually compare consumer opinions on
    a set of products, we need to analyze the reviews
    in Ri of each product Pi:
  • (1) find all the explicit and implicit product
    features on which reviewers have expressed their
    (positive or negative) opinions.
  • (2) produce the positive opinion set and the
    negative opinion set for each feature.

9
Automated Opinion Analysis
  • Observation: each sentence segment contains at
    most one product feature. Sentence segments are
    separated by ",", ".", "and", and "but" (see the
    sketch below).
  • (figure: example Pros and Cons segments)
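A minimal sketch of this segmentation step (the separator list comes from the slide; the function name is illustrative):

    import re

    def split_segments(text):
        # Split a Pros/Cons line into sentence segments on the
        # separators listed above: comma, period, "and", "but".
        parts = re.split(r",|\.|\band\b|\bbut\b", text.lower())
        return [p.strip() for p in parts if p.strip()]

    # "heavy, bad picture quality, battery life too short"
    # -> ["heavy", "bad picture quality", "battery life too short"]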

10
Prepare a Training Dataset
  • Manually label a large number of reviews.
  • POS tagging; remove digits.
  • <N> Battery <N> usage
  • <V> included <N> MB <V> is <Adj> stingy
  • Replace the actual feature words with [feature].
  • <N> [feature] <N> usage
  • <V> included <N> [feature] <V> is <Adj> stingy
  • Use 3-grams to produce shorter segments.
  • <V> included <N> [feature] <V> is <Adj> stingy
    → <V> included <N> [feature] <V> is,
    <N> [feature] <V> is <Adj> stingy
  • Distinguish duplicate tags:
  • <N1> [feature] <N2> usage
  • Perform word stemming.
  • The resulting sentence (3-gram) segments are
    saved in a transaction file (see the sketch
    below).
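A sketch of this preparation pipeline. NLTK's tagger and Porter stemmer are assumed stand-ins for the authors' (unspecified) tools, and Penn Treebank tags (NN, VB, JJ) stand in for the slide's <N>, <V>, <Adj>:

    from nltk import pos_tag, word_tokenize
    from nltk.stem import PorterStemmer

    stemmer = PorterStemmer()

    def to_transactions(segment, feature_words):
        # POS-tag the segment and drop digit tokens.
        tagged = [(w, t) for w, t in pos_tag(word_tokenize(segment))
                  if not w.isdigit()]
        # Replace known feature words with the [feature] placeholder;
        # stem everything else.
        items = [("[feature]", t) if w.lower() in feature_words
                 else (stemmer.stem(w.lower()), t)
                 for w, t in tagged]
        # Emit word-level 3-grams as transactions. (Renaming duplicate
        # tags to <N1>, <N2>, ... within a 3-gram is omitted here.)
        return [items[i:i + 3] for i in range(len(items) - 2)] or [items]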

11
Rule Generation
  • Association rule mining model
  • I = {i1, ..., in}: a set of items.
  • D: a set of transactions, each consisting of a
    subset of the items in I.
  • Association rule: X → Y, where X ⊂ I, Y ⊂ I, and
    X ∩ Y = Ø.
  • The rule X → Y holds in D with confidence c if
    c% of the transactions in D that support X also
    support Y.
  • The rule has support s in D if s% of the
    transactions in D contain X ∪ Y.
  • We use the association mining system CBA (Liu,
    B., Hsu, W., and Ma, Y. 1998) to mine rules.
  • We use 1% as the minimum support.
  • Some example rules (see the sketch below):
  • <N1>, <N2> → [feature]
  • <V>, easy, to → [feature]
  • <N1> → [feature], <N2>
  • <N1>, [feature] → <N2>
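A minimal sketch of the support and confidence definitions above (itemsets as frozensets; CBA itself is a full classification-rule miner, not reproduced here):

    def support(itemset, D):
        # Fraction of transactions containing every item in itemset.
        return sum(itemset <= t for t in D) / len(D)

    def confidence(X, Y, D):
        # Of the transactions supporting X, the fraction that also
        # support Y.
        return support(X | Y, D) / support(X, D)

    D = [frozenset(t) for t in (
        {"<N1>", "<N2>", "[feature]"},
        {"easy", "to", "<V>", "[feature]"},
        {"<N1>", "[feature]"},
    )]
    X, Y = frozenset({"<N1>"}), frozenset({"[feature]"})
    support(X | Y, D), confidence(X, Y, D)  # -> (0.666..., 1.0)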

12
Post-processing
  • We only need rules that have [feature] on the
    RHS.
  • We need to consider the sequence of items on the
    LHS.
  • e.g., <V>, easy, to → [feature] should be
    easy, to, <V> → [feature]
  • We check each rule against the transaction file
    to find the possible sequences.
  • We remove derived rules with confidence < 50%.
  • Finally, we generate language patterns, e.g.,
    easy to <V> [feature] (see the sketch below).
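A sketch of these filtering steps, assuming rules arrive as (lhs_sequence, rhs, confidence) tuples whose LHS has already been reordered against the transaction file (names illustrative):

    def postprocess(rules, min_conf=0.5):
        # Keep rules that predict [feature] and clear the 50%
        # confidence threshold; everything else is discarded.
        return [r for r in rules
                if r[1] == "[feature]" and r[2] >= min_conf]

    rules = [
        (("easy", "to", "<V>"), "[feature]", 0.80),  # kept
        (("<N1>",), "<N2>", 0.90),                   # dropped: RHS not [feature]
        (("<N1>", "<N2>"), "[feature]", 0.30),       # dropped: low confidence
    ]
    patterns = postprocess(rules)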

13
Extraction of Product Features
  • The resulting patterns are used to match and
    identify candidate features from new reviews
    after POS tagging.
  • A generated pattern does not need to match a part
    of a sentence segment of the same length as the
    pattern.
  • e.g., the pattern <NN1> [feature] <NN2> can match
    the segment "size of printout".
  • If a sentence segment satisfies multiple
    patterns, we normally use the pattern with the
    highest confidence.
  • For sentence segments to which no pattern
    applies, we use nouns or noun phrases as
    features.
  • If a sentence segment has only a single word,
    e.g., "heavy" or "big", we treat that single word
    as a candidate feature (see the sketch below).
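A sketch of the fallback in the last two bullets, assuming NLTK POS tags; the pattern-matching path itself is omitted:

    from nltk import pos_tag, word_tokenize

    def fallback_candidates(segment):
        # Single-word segments ("heavy", "big") are candidates as-is;
        # otherwise fall back to the nouns in the segment.
        words = word_tokenize(segment)
        if len(words) == 1:
            return words
        return [w for w, t in pos_tag(words) if t.startswith("NN")]

    fallback_candidates("bad picture quality")  # -> ["picture", "quality"]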

14
Feature Refinement
  • Two main mistakes are made during extraction.
  • Feature conflict
  • There is a more likely feature in the sentence
    segment, but it is not extracted by any pattern.
  • e.g., in "slight hum from subwoofer when not in
    use", "hum" is found to be the feature, but not
    "subwoofer".
  • How do we detect this? "subwoofer" was found as a
    candidate feature in other reviews, but "hum"
    never was.

15
Feature Refinement (cont.)
  • Refinement strategies (see the sketch below)
  • Frequent-noun
  • 1. The generated product features, together with
    their frequency counts, are saved in a candidate
    feature list.
  • 2. For each sentence segment, if there are two or
    more nouns, we choose the most frequent noun in
    the candidate feature list.
  • Frequent-term
  • For each sentence segment, we simply choose the
    word/phrase (not necessarily a noun) with the
    highest frequency in the candidate feature list.
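A minimal sketch of the frequent-term strategy (the frequent-noun variant would first filter the segment's words down to nouns); candidate_counts is the frequency list from step 1:

    from collections import Counter

    def frequent_term(segment_words, candidate_counts):
        # Among the words of a segment, keep the one seen most often
        # as a candidate feature across all reviews.
        seen = [w for w in segment_words if candidate_counts[w] > 0]
        return max(seen, key=candidate_counts.__getitem__, default=None)

    counts = Counter({"subwoofer": 7})
    frequent_term(["hum", "subwoofer"], counts)  # -> "subwoofer"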

16
Mapping to Implicit Features
  • In tagging the training data for mining rules, we
    also tag the mapping of candidate feature words
    to their actual features.
  • e.g., when we tag "heavy" in the sentence segment
    "too heavy" as a feature word, we also record a
    mapping of "heavy" to <weight>.
  • Rule mining can then be used to generate mapping
    rules, as in the sketch below.
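The simplest realization of such mapping rules is a lookup from feature word to implicit feature; the entries below are illustrative:

    # Mappings recorded during tagging; mined mapping rules would
    # populate this table.
    implicit_map = {"heavy": "<weight>", "big": "<size>"}

    def map_implicit(candidate):
        # "too heavy" -> candidate "heavy" -> implicit feature <weight>.
        return implicit_map.get(candidate, candidate)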

17
Grouping Synonyms
  • Group features with similar meanings.
  • e.g., "photo", "picture", and "image" all refer
    to the same feature in digital camera reviews.
  • Employ WordNet to check whether any synonym
    groups/sets exist among the features (see the
    sketch below).
  • Choose only the top two most frequent senses of a
    word when finding its synonyms.
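A sketch of the WordNet check using NLTK, whose synsets are ordered by sense frequency, matching the top-two-senses restriction above:

    from nltk.corpus import wordnet as wn

    def top_senses_lemmas(word, k=2):
        # Lemma names from the word's k most frequent noun senses.
        lemmas = set()
        for synset in wn.synsets(word, pos=wn.NOUN)[:k]:
            lemmas.update(l.name().lower() for l in synset.lemmas())
        return lemmas

    def same_group(w1, w2):
        # Two features fall into one synonym group if their top
        # senses share any lemma.
        return bool(top_senses_lemmas(w1) & top_senses_lemmas(w2))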

18
Experiments
  • Training and test review data
  • We manually tagged a large collection of reviews
    of 15 electronic products from epinions.com.
  • 10 of them are used as training data to mine
    patterns; the rest are used for testing.
  • Evaluation measures (see the formulas below)
  • recall (r) and precision (p)
  • n: the total number of reviews of a particular
    product.
  • ECi: the number of extracted features from review
    i that are correct.
  • Ci: the number of actual features in review i.
  • Ei: the number of extracted features from review
    i.
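A plausible reconstruction of the measures from the definitions above (per-review averages, consistent with the authors' paper):

    r = (1/n) · Σ_{i=1..n} (ECi / Ci)
    p = (1/n) · Σ_{i=1..n} (ECi / Ei)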

19
Experiments (cont.)
  • The frequent-term strategy gives better results
    than the frequent-noun strategy:
  • some features are not expressed as nouns.
  • the POS tagger makes mistakes.

20
Experiments (cont.)
  • Some adjectives and verbs still appear as
    implicit features.
  • The techniques in FBS (the feature-based
    summarization system of Hu and Liu 2004) are not
    suitable for Pros and Cons, which are mostly
    short phrases or incomplete sentences.

21
Experiments (cont.)
  • The results for Pros are better than those for
    Cons:
  • people tend to use similar words like
    "excellent", "great", and "good" in Pros. In
    contrast, the words people use to complain in
    Cons differ a lot.
  • patterns for Pros: 117; patterns for Cons: 22

22
Conclusions
  • We proposed a novel visual analysis system to
    compare consumer opinions of multiple products.
  • We designed a supervised pattern discovery method
    to automatically identify product features from
    Pros and Cons in reviews.
  • Future work
  • improve the automatic techniques.
  • study the strength of opinions.
  • investigate how to extract useful information
    from other types of opinion sources.