Transcript and Presenter's Notes

Title: Thumbs up? Sentiment Classification using Machine Learning Techniques


1
Thumbs up? Sentiment Classification using Machine Learning Techniques
  • Bo Pang and Lillian Lee
  • Shivakumar Vaithyanathan

2
1. Introduction
  • To Examine the Effectiveness of Applying Machine
    Learning Techniques to the Sentiment
    Classification Problem
  • Sentiment seems to require more understanding
    than the usual topic-based classification.

3
2. Previous Work (on Non-Topic-Based Text Categorization)
  • The Source or Source Style (Biber 1988)
  • - Author, Publisher, Native-Language Background, "Brow" (e.g., high-brow vs. low-brow)
  • - (Mosteller & Wallace 1984; Argamon-Engelson et al. 1998; Tomokiyo & Jones 2001; Kessler et al. 1997)
  • Genre Categorization and Subjectivity Detection
  • - Subjective Genres, such as "editorial"
  • - (Karlgren & Cutting 1994; Kessler et al. 1997; Finn et al. 2002)
  • - To Find Features indicating that Subjective Language is being used
  • - (Hatzivassiloglou & Wiebe 2000; Wiebe et al. 2001)
  • Techniques for these tasks do not address our specific classification task of determining what that opinion actually is.

4
2. Previous Work (on Sentiment-Based Classification)
  • 1) The Semantic Orientation of Individual Words or Phrases
  • - Using Linguistic Heuristics or a Pre-Selected Set of Seed Words
  • - (Hatzivassiloglou & McKeown 1997; Turney & Littman 2002)
  • 2) Sentiment-Based Categorization of Entire Documents
  • - The Use of Models Inspired by Cognitive Linguistics
  • - (Hearst 1992; Sack 1994)
  • - The Manual or Semi-Manual Construction of Discriminant-Word Lexicons
  • - (Huettner & Subasic 2000; Das & Chen 2001; Tong 2001)
  • 3) Turney's (2002) Work on Classification of Reviews
  • - A Specific Unsupervised Learning Technique based on the Mutual Information between Document Phrases and the Words "excellent" and "poor"

5
3. The Movie-Review Domain
  • This domain is experimentally convenient:
  • - There are large on-line collections of such reviews.
  • - Reviews often include a machine-extractable rating indicator (e.g., a number of stars).
  • Data Source
  • - The Internet Movie Database (IMDb) archive of the rec.arts.movies.reviews newsgroup

6
3. The Movie-Review Domain (Cont.)
  • To Select Only Reviews where the Author's Rating was Explicitly Expressed
  • Automatically Extracted Ratings were converted into one of three categories: Positive, Negative, or Neutral.
  • To Impose a Limit of Fewer than 20 Reviews per Author per Sentiment Category
  • Result: a Corpus of 752 Negative and 1301 Positive Reviews, with a total of 144 Reviewers represented

7
4. A Closer Look At the Problem
8
5. Machine Learning Methods
  • The Standard Bag-of-Features Framework:
  • - {f1, ..., fm}: a Predefined set of m features that can appear in a document
  • - ni(d): the number of times fi occurs in document d
  • - Each document d is represented by the vector d := (n1(d), n2(d), ..., nm(d))
  • 1) Naive Bayes
  • 2) Maximum Entropy
  • 3) Support Vector Machines
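A minimal sketch of this bag-of-features representation (the function name and the toy vocabulary below are illustrative, not from the paper):

```python
from collections import Counter

def bag_of_features(tokens, features):
    """Map a tokenized document d to (n1(d), ..., nm(d)) over a
    predefined feature list f1, ..., fm."""
    counts = Counter(tokens)
    return [counts[f] for f in features]  # Counter returns 0 for absent features

# Toy predefined feature set; the real experiments used 16165 unigrams.
vocab = ["excellent", "poor", "boring", "great"]
print(bag_of_features("a great film with an excellent great cast".split(), vocab))
# -> [1, 0, 0, 2]
```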

9
5.1 Naïve Bayes
  • To Assign to a given Document d the Class c* = arg max_c P(c | d)
  • Bayes' rule: P(c | d) = P(c) P(d | c) / P(d)
  • Naïve Bayes (NB) Classifier, treating the fi as conditionally independent given the class:
  • - P_NB(c | d) = P(c) (∏i P(fi | c)^ni(d)) / P(d)
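A minimal Naive Bayes sketch under these assumptions (multinomial model with add-one smoothing; names and structure are illustrative, not the paper's implementation):

```python
import math
from collections import Counter

def train_nb(docs, labels):
    """Estimate log P(c) and smoothed log P(fi | c) from tokenized docs."""
    vocab = {t for d in docs for t in d}
    prior, cond = {}, {}
    for c in set(labels):
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        prior[c] = math.log(len(class_docs) / len(docs))
        counts = Counter(t for d in class_docs for t in d)
        total = sum(counts.values()) + len(vocab)  # add-one smoothing
        cond[c] = {t: math.log((counts[t] + 1) / total) for t in vocab}
    return prior, cond

def classify_nb(doc, prior, cond):
    """c* = arg max_c [log P(c) + sum_i ni(d) log P(fi | c)]."""
    return max(prior, key=lambda c: prior[c] +
               sum(cond[c][t] for t in doc if t in cond[c]))
```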

10
5.2 Maximum Entropy
  • An Alternative Technique which has proven Effective in a number of Natural Language Processing Applications (Berger et al. 1996)
  • P_ME(c | d) := (1 / Z(d)) exp(Σi λi,c Fi,c(d, c))
  • - Z(d): a Normalization Function
  • - Fi,c: a feature/class function for feature fi and class c (1 if ni(d) > 0 and the class is c, 0 otherwise)
  • - λi,c: a Feature-Weight Parameter (a large λi,c means fi is a strong indicator for class c)
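With binary feature/class functions of this form, maximum entropy classification coincides with multinomial logistic regression, so one hedged way to reproduce the setup is scikit-learn's LogisticRegression over presence features (toy data; the paper's own training procedure differed):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

docs = ["an excellent and moving film", "a dull and poor effort"]  # toy corpus
labels = ["pos", "neg"]

vec = CountVectorizer(binary=True)            # Fi,c depends only on presence
X = vec.fit_transform(docs)
maxent = LogisticRegression(max_iter=1000).fit(X, labels)
print(maxent.predict(vec.transform(["an excellent story"])))       # -> ['pos']
```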

11
5.3 Support Vector Machines
  • Large-Margin Classifiers
  • Seek a Hyperplane, represented by vector w, that not only Separates the Document Vectors in one class from those in the other, but for which the separation, or margin, is as large as possible
  • Let cj ∈ {1, -1} be the Correct Class of Document dj; the solution can be written as
  • - w := Σj αj cj dj, with αj ≥ 0
  • The αj's are obtained by solving a dual optimization problem.
  • Those dj such that αj is greater than zero are called support vectors, since they are the only document vectors contributing to w.
  • Classification of test instances consists simply of determining which side of w's hyperplane they fall on.
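A hedged sketch with a linear SVM (the paper used Joachims' SVM_light; scikit-learn's LinearSVC stands in here, and the toy data is illustrative):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

docs = ["excellent film , thumbs up", "poor film , thumbs down"]  # toy data
labels = [1, -1]                                  # cj in {1, -1}

vec = CountVectorizer(binary=True)
X = vec.fit_transform(docs)
svm = LinearSVC().fit(X, labels)                  # finds the max-margin w
print(svm.predict(vec.transform(["an excellent cast"])))          # -> [1]
```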

12
6. Evaluation: 6.1 Experimental Set-up
  • To Create a Data Set with Uniform Class Distribution:
  • - Select 700 Positive-Sentiment and 700 Negative-Sentiment Documents
  • - Divide this Data into Three Equal-Sized Folds, Maintaining Balanced Class Distributions in each Fold
  • To Attempt to Model the Potentially Important Contextual Effect of Negation:
  • - Add the Tag NOT_ to Every Word between a Negation Word ("not", "isn't", "didn't", etc.) and the First Punctuation Mark following the Negation Word (see the sketch below)
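A rough sketch of this negation-tagging step (the tokenizer regex and the negation word list are simplified stand-ins):

```python
import re

NEGATIONS = {"not", "no", "never", "isn't", "didn't", "can't"}  # simplified list

def tag_negation(text):
    """Prefix NOT_ to every word between a negation word and the
    first following punctuation mark, per the slide's description."""
    out, negating = [], False
    for tok in re.findall(r"[\w']+|[.,!?;:]", text.lower()):
        if tok in ".,!?;:":
            negating = False
            out.append(tok)
        elif negating:
            out.append("NOT_" + tok)
        else:
            out.append(tok)
            negating = tok in NEGATIONS
    return out

print(tag_negation("I didn't like this movie, but the cast was good."))
# ['i', "didn't", 'NOT_like', 'NOT_this', 'NOT_movie', ',', 'but', ...]
```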

13
6. Evaluation: 6.1 Experimental Set-up (Cont.)
  • To Focus on Features based on Unigrams (with
    negation tagging) and Bigrams
  • (1) The 16165 Unigrams appearing at least 4 times
    in our 1400-Document Corpus (lower count cutoffs
    did not yield significantly different results)
  • (2) The 16165 Bigrams occurring most Often in the
    Same Data (the selected bigrams all occurred at
    least seven times)
  • We did not Add Negation Tags to the Bigrams, since we Consider Bigrams (and n-grams in general) to be an Orthogonal Way to Incorporate Context (a rough sketch of both feature sets follows).
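A rough sketch of the two feature sets (scikit-learn's min_df counts document frequency rather than the paper's raw occurrence counts, so treat these settings as approximations):

```python
from sklearn.feature_extraction.text import CountVectorizer

# (1) unigrams appearing at least 4 times in the corpus
#     (approximated here by document frequency >= 4)
unigram_vec = CountVectorizer(ngram_range=(1, 1), min_df=4)

# (2) the 16165 most frequent bigrams in the same data
bigram_vec = CountVectorizer(ngram_range=(2, 2), max_features=16165)

# Usage: X = unigram_vec.fit_transform(corpus)  # corpus: list of review strings
```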

14
6. Evaluation: 6.2 Results
  • Initial unigram results
  • - The Random-Choice Baseline of 50%
  • - Two Human-Selected-Unigram Baselines of 58% and 64%
  • - The 69% Baseline Achieved via Limited Access to the Test-Data Statistics

15
6. Evaluation: 6.2 Results (Cont.)
  • Initial unigram results surpass all of the baselines above.
  • Even so, sentiment categorization is more difficult than topic classification.

16
6. Evaluation: 6.2 Results (Cont.)
  • Feature frequency vs. presence
  • The definition of the MaxEnt feature/class functions Fi,c only reflects the presence or absence of a feature.
  • Better performance (much better performance for SVMs) is achieved by accounting only for feature presence, not feature frequency (see the sketch below).
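Switching from frequency to presence just clips each count ni(d) to 1; a one-line sketch:

```python
import numpy as np

def to_presence(X):
    """Turn frequency vectors (n1(d), ..., nm(d)) into binary presence vectors."""
    return (np.asarray(X) > 0).astype(int)

print(to_presence([[3, 0, 1], [0, 2, 0]]))
# [[1 0 1]
#  [0 1 0]]
```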

17
6. Evaluation: 6.2 Results (Cont.)
  • Bigrams
  • Bigram information does not improve performance
    beyond that of unigram presence.
  • Relying just on bigrams causes accuracy to
    decline by as much as 5.8 percentage points.

18
6. Evaluation: 6.2 Results (Cont.)
  • Parts of speech
  • The accuracy improves slightly for Naive Bayes
    but declines for SVMs, and the performance of
    MaxEnt is unchanged.
  • The 2633 adjectives provide less useful
    information than unigram presence.
  • Simply using the 2633 most frequent unigrams is a
    better choice, yielding performance comparable to
    that of using (the presence of) all 16165.

19
6. Evaluation: 6.2 Results (Cont.)
  • Position
  • We tagged each word according to whether it appeared in the first quarter, last quarter, or middle half of the document (see the sketch below).
  • The results didn't differ greatly from those using unigrams alone.
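A rough sketch of this positional tagging (the tag names and boundary handling are illustrative):

```python
def tag_positions(tokens):
    """Mark each word as falling in the first quarter, middle half,
    or last quarter of the document."""
    n = len(tokens)
    tagged = []
    for i, tok in enumerate(tokens):
        if i < n / 4:
            tagged.append(tok + "_FIRST")
        elif i < 3 * n / 4:
            tagged.append(tok + "_MIDDLE")
        else:
            tagged.append(tok + "_LAST")
    return tagged

print(tag_positions("one two three four five six seven eight".split()))
# ['one_FIRST', 'two_FIRST', 'three_MIDDLE', ..., 'eight_LAST']
```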

20
7. Discussion
  • Naive Bayes tends to do the worst and SVMs tend to do the best.
  • Unigram Presence Information turned out to be the most effective feature set.
  • The superiority of Presence Information over Frequency Information in our setting contradicts previous observations made in topic-classification work.
  • A remaining challenge is the "thwarted expectations" narrative, in which a reviewer sets up a deliberate contrast with earlier discussion.
  • Some form of discourse analysis is necessary (using more sophisticated techniques than our positional feature mentioned above), or at least some way of determining the focus of each sentence.