Opinion Observer: Analyzing and Comparing Opinions on the Web

Transcript and Presenter's Notes

1
Opinion Observer: Analyzing and Comparing Opinions on the Web
  • Bing Liu, Minqing Hu, Junsheng Cheng
  • Department of Computer Science
  • University of Illinois at Chicago

WWW '05
2
Abstract
  • This paper focuses on online customer reviews of
    products.
  • Two contributions:
  • We propose a novel framework for analyzing and
    comparing consumer opinions of competing
    products. A prototype system, Opinion Observer,
    is also implemented.
  • We propose a new technique based on language
    pattern mining to extract product features from
    Pros and Cons in a particular type of review.

3
Introduction
  • Opinion Observer
  • We propose a new technique to identify product
    features from Pros and Cons in reviews of this
    format: Pros, Cons, and a detailed review.
  • Pros and Cons tend to be very brief,
    e.g., "heavy", "bad picture quality", "battery
    life too short".
  • We do not analyze the detailed reviews.

4
Related Work
  • Hu, M., and Liu, B. 2004
  • Performs the same tasks based on unsupervised
    itemset mining.
  • Morinaga, S., Yamanishi, K., Tateishi, K., and
    Fukushima, T. 2002
  • Compares information on different products in a
    category through search to find the reputation of
    the products.
  • Bourigault, D. 1995; Daille, B. 1996; Jacquemin,
    C., and Bourigault, D. 2001; Justeson, J., and
    Katz, S. 1995
  • Terminology finding tasks.
  • Using noun phrases is not sufficient for finding
    product features.
  • Bunescu, R., and Mooney, R. 2004; Etzioni et al.
    2004; Freitag, D., and McCallum, A. 2000;
    Lafferty, J., McCallum, A., and Pereira, F. 2001;
    Rosario, B., and Hearst, M. 2004
  • Entity extraction tasks.
  • Product features are usually not named entities.
    Also, our extraction works on short sentence
    segments rather than full sentences.

5
Related Work (cont.)
  • Hearst, M. 1992; Das, S., and Chen, M. 2001;
    Tong, R. 2001; Turney, P. 2002; Pang, B., Lee,
    L., and Vaithyanathan, S. 2002; Dave, K.,
    Lawrence, S., and Pennock, D. 2003; Agrawal, R.,
    Rajagopalan, S., Srikant, R., and Xu, Y. 2003;
    Hatzivassiloglou, V., and Wiebe, J. 2000; Wiebe,
    J., Bruce, R., and O'Hara, T. 1999
  • Sentiment classification tasks.
  • They do not identify the features commented on by
    customers, i.e., what customers praise or
    complain about.

6
System Architecture
7
Visualizing Opinion Comparison
8
Problem Statement
  • P = {P1, P2, ..., Pn}: a set of products.
  • Ri = {r1, r2, ..., rk}: the set of reviews of
    product Pi.
  • Explicit feature: a product feature that appears
    in rj. Implicit feature: one that does not appear
    in rj but is implied.
  • "Battery life is too short" → battery life
    (explicit)
  • "Too big" → size (implicit)
  • In order to visually compare consumer opinions on
    a set of products, we need to analyze the reviews
    in Ri of each product Pi:
  • (1) find all the explicit and implicit product
    features on which reviewers have expressed their
    (positive or negative) opinions.
  • (2) produce the positive opinion set and the
    negative opinion set for each feature.

9
Automated Opinion Analysis
  • Observation: each sentence segment contains at
    most one product feature. Sentence segments are
    separated by ",", ".", "and", and "but" (see the
    sketch below).
  • (figure: example Pros and Cons segments)
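A minimal sketch of this segmentation step (the separator list comes from the slide; the function name is illustrative):

    import re

    def split_segments(text):
        # Split a Pros/Cons line into sentence segments on the
        # separators listed above: comma, period, "and", "but".
        parts = re.split(r",|\.|\band\b|\bbut\b", text.lower())
        return [p.strip() for p in parts if p.strip()]

    # "heavy, bad picture quality, battery life too short"
    # -> ["heavy", "bad picture quality", "battery life too short"]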

10
Prepare a Training Dataset
  • Manually label a large number of reviews.
  • POS tagging; remove digits.
  • <N> Battery <N> usage
  • <V> included <N> MB <V> is <Adj> stingy
  • Replace the actual feature words with [feature].
  • <N> [feature] <N> usage
  • <V> included <N> [feature] <V> is <Adj> stingy
  • Use 3-grams to produce shorter segments.
  • <V> included <N> [feature] <V> is <Adj> stingy
    → <V> included <N> [feature] <V> is,
    <N> [feature] <V> is <Adj> stingy
  • Distinguish duplicate tags:
  • <N1> [feature] <N2> usage
  • Perform word stemming.
  • The resulting sentence (3-gram) segments are
    saved in a transaction file (see the sketch
    below).
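A sketch of this preparation pipeline. NLTK's tagger and Porter stemmer are assumed stand-ins for the authors' (unspecified) tools, and Penn Treebank tags (NN, VB, JJ) stand in for the slide's <N>, <V>, <Adj>:

    from nltk import pos_tag, word_tokenize
    from nltk.stem import PorterStemmer

    stemmer = PorterStemmer()

    def to_transactions(segment, feature_words):
        # POS-tag the segment and drop digit tokens.
        tagged = [(w, t) for w, t in pos_tag(word_tokenize(segment))
                  if not w.isdigit()]
        # Replace known feature words with the [feature] placeholder;
        # stem everything else.
        items = [("[feature]", t) if w.lower() in feature_words
                 else (stemmer.stem(w.lower()), t)
                 for w, t in tagged]
        # Emit word-level 3-grams as transactions. (Renaming duplicate
        # tags to <N1>, <N2>, ... within a 3-gram is omitted here.)
        return [items[i:i + 3] for i in range(len(items) - 2)] or [items]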

11
Rule Generation
  • Association rule mining model
  • I = {i1, ..., in}: a set of items.
  • D: a set of transactions, each consisting of a
    subset of the items in I.
  • Association rule: X → Y, where X ⊂ I, Y ⊂ I, and
    X ∩ Y = Ø.
  • The rule X → Y holds in D with confidence c if
    c% of the transactions in D that support X also
    support Y.
  • The rule has support s in D if s% of the
    transactions in D contain X ∪ Y.
  • We use the association mining system CBA (Liu,
    B., Hsu, W., and Ma, Y. 1998) to mine rules.
  • We use 1% as the minimum support.
  • Some example rules (see the sketch below):
  • <N1>, <N2> → [feature]
  • <V>, easy, to → [feature]
  • <N1> → [feature], <N2>
  • <N1>, [feature] → <N2>
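A minimal sketch of the support and confidence definitions above (itemsets as frozensets; CBA itself is a full classification-rule miner, not reproduced here):

    def support(itemset, D):
        # Fraction of transactions containing every item in itemset.
        return sum(itemset <= t for t in D) / len(D)

    def confidence(X, Y, D):
        # Of the transactions supporting X, the fraction that also
        # support Y.
        return support(X | Y, D) / support(X, D)

    D = [frozenset(t) for t in (
        {"<N1>", "<N2>", "[feature]"},
        {"easy", "to", "<V>", "[feature]"},
        {"<N1>", "[feature]"},
    )]
    X, Y = frozenset({"<N1>"}), frozenset({"[feature]"})
    support(X | Y, D), confidence(X, Y, D)  # -> (0.666..., 1.0)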

12
Post-processing
  • We only need rules that have [feature] on the
    RHS.
  • We need to consider the sequence of items on the
    LHS.
  • e.g., <V>, easy, to → [feature] should be
    easy, to, <V> → [feature]
  • We check each rule against the transaction file
    to find the possible sequences.
  • We remove derived rules with confidence < 50%.
  • Finally, we generate language patterns, e.g.,
    easy to <V> [feature] (see the sketch below).
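A sketch of these filtering steps, assuming rules arrive as (lhs_sequence, rhs, confidence) tuples whose LHS has already been reordered against the transaction file (names illustrative):

    def postprocess(rules, min_conf=0.5):
        # Keep rules that predict [feature] and clear the 50%
        # confidence threshold; everything else is discarded.
        return [r for r in rules
                if r[1] == "[feature]" and r[2] >= min_conf]

    rules = [
        (("easy", "to", "<V>"), "[feature]", 0.80),  # kept
        (("<N1>",), "<N2>", 0.90),                   # dropped: RHS not [feature]
        (("<N1>", "<N2>"), "[feature]", 0.30),       # dropped: low confidence
    ]
    patterns = postprocess(rules)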

13
Extraction of Product Features
  • The resulting patterns are used to match and
    identify candidate features from new reviews
    after POS tagging.
  • A generated pattern does not need to match a part
    of a sentence segment of the same length as the
    pattern.
  • e.g., the pattern <NN1> [feature] <NN2> can match
    the segment "size of printout".
  • If a sentence segment satisfies multiple
    patterns, we normally use the pattern with the
    highest confidence.
  • For sentence segments to which no pattern
    applies, we use nouns or noun phrases as
    features.
  • If a sentence segment has only a single word,
    e.g., "heavy" or "big", we treat that single word
    as a candidate feature (see the sketch below).
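A sketch of the fallback in the last two bullets, assuming NLTK POS tags; the pattern-matching path itself is omitted:

    from nltk import pos_tag, word_tokenize

    def fallback_candidates(segment):
        # Single-word segments ("heavy", "big") are candidates as-is;
        # otherwise fall back to the nouns in the segment.
        words = word_tokenize(segment)
        if len(words) == 1:
            return words
        return [w for w, t in pos_tag(words) if t.startswith("NN")]

    fallback_candidates("bad picture quality")  # -> ["picture", "quality"]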

14
Feature Refinement
  • Two main mistakes are made during extraction.
  • Feature conflict
  • There is a more likely feature in the sentence
    segment, but it is not extracted by any pattern.
  • e.g., in "slight hum from subwoofer when not in
    use", "hum" is found to be the feature, but not
    "subwoofer".
  • How do we detect this? "subwoofer" was found as a
    candidate feature in other reviews, but "hum"
    never was.

15
Feature Refinement (cont.)
  • Refinement strategies (see the sketch below)
  • Frequent-noun
  • 1. The generated product features, together with
    their frequency counts, are saved in a candidate
    feature list.
  • 2. For each sentence segment, if there are two or
    more nouns, we choose the most frequent noun in
    the candidate feature list.
  • Frequent-term
  • For each sentence segment, we simply choose the
    word/phrase (not necessarily a noun) with the
    highest frequency in the candidate feature list.
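A minimal sketch of the frequent-term strategy (the frequent-noun variant would first filter the segment's words down to nouns); candidate_counts is the frequency list from step 1:

    from collections import Counter

    def frequent_term(segment_words, candidate_counts):
        # Among the words of a segment, keep the one seen most often
        # as a candidate feature across all reviews.
        seen = [w for w in segment_words if candidate_counts[w] > 0]
        return max(seen, key=candidate_counts.__getitem__, default=None)

    counts = Counter({"subwoofer": 7})
    frequent_term(["hum", "subwoofer"], counts)  # -> "subwoofer"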

16
Mapping to Implicit Features
  • In tagging the training data for mining rules, we
    also tag the mapping of candidate feature words
    to their actual features.
  • e.g., when we tag "heavy" in the sentence segment
    "too heavy" as a feature word, we also record a
    mapping of "heavy" to <weight>.
  • Rule mining can then be used to generate mapping
    rules, as in the sketch below.
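The simplest realization of such mapping rules is a lookup from feature word to implicit feature; the entries below are illustrative:

    # Mappings recorded during tagging; mined mapping rules would
    # populate this table.
    implicit_map = {"heavy": "<weight>", "big": "<size>"}

    def map_implicit(candidate):
        # "too heavy" -> candidate "heavy" -> implicit feature <weight>.
        return implicit_map.get(candidate, candidate)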

17
Grouping Synonyms
  • Group features with similar meanings.
  • e.g., "photo", "picture", and "image" all refer
    to the same feature in digital camera reviews.
  • Employ WordNet to check whether any synonym
    groups/sets exist among the features (see the
    sketch below).
  • Choose only the top two most frequent senses of a
    word when finding its synonyms.
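A sketch of the WordNet check using NLTK, whose synsets are ordered by sense frequency, matching the top-two-senses restriction above:

    from nltk.corpus import wordnet as wn

    def top_senses_lemmas(word, k=2):
        # Lemma names from the word's k most frequent noun senses.
        lemmas = set()
        for synset in wn.synsets(word, pos=wn.NOUN)[:k]:
            lemmas.update(l.name().lower() for l in synset.lemmas())
        return lemmas

    def same_group(w1, w2):
        # Two features fall into one synonym group if their top
        # senses share any lemma.
        return bool(top_senses_lemmas(w1) & top_senses_lemmas(w2))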

18
Experiments
  • Training and test review data
  • We manually tagged a large collection of reviews
    of 15 electronic products from epinions.com.
  • 10 of them are used as training data to mine
    patterns; the rest are used for testing.
  • Evaluation measures (see the formulas below)
  • recall (r) and precision (p)
  • n: the total number of reviews of a particular
    product.
  • ECi: the number of extracted features from review
    i that are correct.
  • Ci: the number of actual features in review i.
  • Ei: the number of extracted features from review
    i.
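A plausible reconstruction of the measures from the definitions above (per-review averages, consistent with the authors' paper):

    r = (1/n) · Σ_{i=1..n} (ECi / Ci)
    p = (1/n) · Σ_{i=1..n} (ECi / Ei)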

19
Experiments (cont.)
  • The frequent-term strategy gives better results
    than the frequent-noun strategy:
  • some features are not expressed as nouns.
  • the POS tagger makes mistakes.

20
Experiments (cont.)
  • Some adjectives and verbs still appear as
    implicit features.
  • The techniques in FBS (the feature-based
    summarization system of Hu and Liu 2004) are not
    suitable for Pros and Cons, which are mostly
    short phrases or incomplete sentences.

21
Experiments (cont.)
  • The results for Pros are better than those for
    Cons:
  • people tend to use similar words like
    "excellent", "great", and "good" in Pros. In
    contrast, the words people use to complain in
    Cons differ a lot.
  • patterns for Pros: 117; patterns for Cons: 22

22
Conclusions
  • We proposed a novel visual analysis system to
    compare consumer opinions of multiple products.
  • We designed a supervised pattern discovery method
    to automatically identify product features from
    Pros and Cons in reviews.
  • Future work
  • improve the automatic techniques.
  • study the strength of opinions.
  • investigate how to extract useful information
    from other types of opinion sources.