Transcript and Presenter's Notes

Title: Thumbs up? Sentiment Classification using Machine Learning Techniques


1
Thumbs up? Sentiment Classification using Machine Learning Techniques
  • Bo Pang and Lillian Lee
  • Shivakumar Vaithyanathan

2
1. Introduction
  • To Examine the Effectiveness of Applying Machine
    Learning Techniques to the Sentiment
    Classification Problem
  • Sentiment seems to require more understanding
    than the usual topic-based classification.

3
2. Previous Work (on Non-Topic-Based Text Categorization)
  • The Source or Source Style (Biber 1988)
  • - Author, Publisher, Native-Language Background, "Brow" (e.g., high-brow vs. low-brow)
  • - (Mosteller & Wallace 1984; Argamon-Engelson et al. 1998; Tomokiyo & Jones 2001; Kessler et al. 1997)
  • Genre Categorization and Subjectivity Detection
  • - Subjective Genres, such as "editorial"
  • - (Karlgren & Cutting 1994; Kessler et al. 1997; Finn et al. 2002)
  • - To Find Features indicating that Subjective Language is being used
  • - (Hatzivassiloglou & Wiebe 2000; Wiebe et al. 2001)
  • Techniques for these tasks do not address our specific classification task of determining what that opinion actually is.

4
2. Previous Work (on Sentiment-Based Classification)
  • 1) The Semantic Orientation of Individual Words or Phrases
  • - Using Linguistic Heuristics or a Pre-Selected Set of Seed Words
  • - (Hatzivassiloglou & McKeown 1997; Turney & Littman 2002)
  • 2) Sentiment-Based Categorization of Entire Documents
  • - The Use of Models Inspired by Cognitive Linguistics
  • - (Hearst 1992; Sack 1994)
  • - The Manual or Semi-Manual Construction of Discriminant-Word Lexicons
  • - (Huettner & Subasic 2000; Das & Chen 2001; Tong 2001)
  • 3) Turney's (2002) Work on Classification of Reviews
  • - A Specific Unsupervised Learning Technique based on the Mutual Information between Document Phrases and the Words "excellent" and "poor"

5
3. The Movie-Review Domain
  • This domain is experimentally convenient:
  • - There are large on-line collections of such reviews.
  • - Reviews often include a machine-extractable rating indicator (e.g., a number of stars).
  • Data Source
  • - The Internet Movie Database (IMDb) archive of the rec.arts.movies.reviews newsgroup

6
3. The Movie-Review Domain (Cont.)
  • To Select Only Reviews where the Author's Rating was Explicitly Expressed
  • Automatically Extracted Ratings were converted into one of three categories: Positive, Negative, or Neutral.
  • To Impose a Limit of Fewer than 20 Reviews per Author per Sentiment Category
  • Result: a Corpus of 752 Negative and 1301 Positive Reviews, with a total of 144 Reviewers represented

7
4. A Closer Look At the Problem
8
5. Machine Learning Methods
  • The Standard Bag-of-Features Framework:
  • - {f1, ..., fm}: a Predefined set of m features that can appear in a document
  • - ni(d): the number of times fi occurs in document d
  • - Each document d is represented by the vector d := (n1(d), n2(d), ..., nm(d))
  • 1) Naive Bayes
  • 2) Maximum Entropy
  • 3) Support Vector Machines
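A minimal sketch of this bag-of-features representation (the function name and the toy vocabulary below are illustrative, not from the paper):

```python
from collections import Counter

def bag_of_features(tokens, features):
    """Map a tokenized document d to (n1(d), ..., nm(d)) over a
    predefined feature list f1, ..., fm."""
    counts = Counter(tokens)
    return [counts[f] for f in features]  # Counter returns 0 for absent features

# Toy predefined feature set; the real experiments used 16165 unigrams.
vocab = ["excellent", "poor", "boring", "great"]
print(bag_of_features("a great film with an excellent great cast".split(), vocab))
# -> [1, 0, 0, 2]
```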

9
5.1 Naïve Bayes
  • To Assign to a given Document d the Class c* = arg max_c P(c | d)
  • Bayes' rule: P(c | d) = P(c) P(d | c) / P(d)
  • Naïve Bayes (NB) Classifier, treating the fi as conditionally independent given the class:
  • - P_NB(c | d) = P(c) (∏i P(fi | c)^ni(d)) / P(d)
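A minimal Naive Bayes sketch under these assumptions (multinomial model with add-one smoothing; names and structure are illustrative, not the paper's implementation):

```python
import math
from collections import Counter

def train_nb(docs, labels):
    """Estimate log P(c) and smoothed log P(fi | c) from tokenized docs."""
    vocab = {t for d in docs for t in d}
    prior, cond = {}, {}
    for c in set(labels):
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        prior[c] = math.log(len(class_docs) / len(docs))
        counts = Counter(t for d in class_docs for t in d)
        total = sum(counts.values()) + len(vocab)  # add-one smoothing
        cond[c] = {t: math.log((counts[t] + 1) / total) for t in vocab}
    return prior, cond

def classify_nb(doc, prior, cond):
    """c* = arg max_c [log P(c) + sum_i ni(d) log P(fi | c)]."""
    return max(prior, key=lambda c: prior[c] +
               sum(cond[c][t] for t in doc if t in cond[c]))
```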

10
5.2 Maximum Entropy
  • An Alternative Technique which has proven Effective in a number of Natural Language Processing Applications (Berger et al. 1996)
  • P_ME(c | d) := (1 / Z(d)) exp(Σi λi,c Fi,c(d, c))
  • - Z(d): a Normalization Function
  • - Fi,c: a feature/class function for feature fi and class c (1 if ni(d) > 0 and the class is c, 0 otherwise)
  • - λi,c: a Feature-Weight Parameter (a large λi,c means fi is a strong indicator for class c)
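With binary feature/class functions of this form, maximum entropy classification coincides with multinomial logistic regression, so one hedged way to reproduce the setup is scikit-learn's LogisticRegression over presence features (toy data; the paper's own training procedure differed):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

docs = ["an excellent and moving film", "a dull and poor effort"]  # toy corpus
labels = ["pos", "neg"]

vec = CountVectorizer(binary=True)            # Fi,c depends only on presence
X = vec.fit_transform(docs)
maxent = LogisticRegression(max_iter=1000).fit(X, labels)
print(maxent.predict(vec.transform(["an excellent story"])))       # -> ['pos']
```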

11
5.3 Support Vector Machines
  • Large-Margin Classifiers
  • Seek a Hyperplane, represented by vector w, that not only Separates the Document Vectors in one class from those in the other, but for which the separation, or margin, is as large as possible
  • Let cj ∈ {1, -1} be the Correct Class of Document dj; the solution can be written as
  • - w := Σj αj cj dj, with αj ≥ 0
  • The αj's are obtained by solving a dual optimization problem.
  • Those dj such that αj is greater than zero are called support vectors, since they are the only document vectors contributing to w.
  • Classification of test instances consists simply of determining which side of w's hyperplane they fall on.
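A hedged sketch with a linear SVM (the paper used Joachims' SVM_light; scikit-learn's LinearSVC stands in here, and the toy data is illustrative):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

docs = ["excellent film , thumbs up", "poor film , thumbs down"]  # toy data
labels = [1, -1]                                  # cj in {1, -1}

vec = CountVectorizer(binary=True)
X = vec.fit_transform(docs)
svm = LinearSVC().fit(X, labels)                  # finds the max-margin w
print(svm.predict(vec.transform(["an excellent cast"])))          # -> [1]
```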

12
6. Evaluation: 6.1 Experimental Set-up
  • To Create a Data Set with Uniform Class Distribution:
  • - Select 700 Positive-Sentiment and 700 Negative-Sentiment Documents
  • - Divide this Data into Three Equal-Sized Folds, Maintaining Balanced Class Distributions in each Fold
  • To Attempt to Model the Potentially Important Contextual Effect of Negation:
  • - Add the Tag NOT_ to Every Word between a Negation Word ("not", "isn't", "didn't", etc.) and the First Punctuation Mark following the Negation Word (see the sketch below)
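A rough sketch of this negation-tagging step (the tokenizer regex and the negation word list are simplified stand-ins):

```python
import re

NEGATIONS = {"not", "no", "never", "isn't", "didn't", "can't"}  # simplified list

def tag_negation(text):
    """Prefix NOT_ to every word between a negation word and the
    first following punctuation mark, per the slide's description."""
    out, negating = [], False
    for tok in re.findall(r"[\w']+|[.,!?;:]", text.lower()):
        if tok in ".,!?;:":
            negating = False
            out.append(tok)
        elif negating:
            out.append("NOT_" + tok)
        else:
            out.append(tok)
            negating = tok in NEGATIONS
    return out

print(tag_negation("I didn't like this movie, but the cast was good."))
# ['i', "didn't", 'NOT_like', 'NOT_this', 'NOT_movie', ',', 'but', ...]
```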

13
6. Evaluation: 6.1 Experimental Set-up (Cont.)
  • To Focus on Features based on Unigrams (with
    negation tagging) and Bigrams
  • (1) The 16165 Unigrams appearing at least 4 times
    in our 1400-Document Corpus (lower count cutoffs
    did not yield significantly different results)
  • (2) The 16165 Bigrams occurring most Often in the
    Same Data (the selected bigrams all occurred at
    least seven times)
  • We did not Add Negation Tags to the Bigrams, since we Consider Bigrams (and n-grams in general) to be an Orthogonal Way to Incorporate Context (a rough sketch of both feature sets follows).
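A rough sketch of the two feature sets (scikit-learn's min_df counts document frequency rather than the paper's raw occurrence counts, so treat these settings as approximations):

```python
from sklearn.feature_extraction.text import CountVectorizer

# (1) unigrams appearing at least 4 times in the corpus
#     (approximated here by document frequency >= 4)
unigram_vec = CountVectorizer(ngram_range=(1, 1), min_df=4)

# (2) the 16165 most frequent bigrams in the same data
bigram_vec = CountVectorizer(ngram_range=(2, 2), max_features=16165)

# Usage: X = unigram_vec.fit_transform(corpus)  # corpus: list of review strings
```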

14
6. Evaluation: 6.2 Results
  • Initial unigram results
  • - The Random-Choice Baseline of 50%
  • - Two Human-Selected-Unigram Baselines of 58% and 64%
  • - The 69% Baseline Achieved via Limited Access to the Test-Data Statistics

15
6. Evaluation: 6.2 Results (Cont.)
  • Initial unigram results surpass all of the baselines above.
  • Even so, sentiment categorization is more difficult than topic classification.

16
6. Evaluation: 6.2 Results (Cont.)
  • Feature frequency vs. presence
  • The definition of the MaxEnt feature/class functions Fi,c only reflects the presence or absence of a feature.
  • Better performance (much better performance for SVMs) is achieved by accounting only for feature presence, not feature frequency (see the sketch below).
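Switching from frequency to presence just clips each count ni(d) to 1; a one-line sketch:

```python
import numpy as np

def to_presence(X):
    """Turn frequency vectors (n1(d), ..., nm(d)) into binary presence vectors."""
    return (np.asarray(X) > 0).astype(int)

print(to_presence([[3, 0, 1], [0, 2, 0]]))
# [[1 0 1]
#  [0 1 0]]
```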

17
6. Evaluation: 6.2 Results (Cont.)
  • Bigrams
  • Bigram information does not improve performance
    beyond that of unigram presence.
  • Relying just on bigrams causes accuracy to
    decline by as much as 5.8 percentage points.

18
6. Evaluation: 6.2 Results (Cont.)
  • Parts of speech
  • The accuracy improves slightly for Naive Bayes
    but declines for SVMs, and the performance of
    MaxEnt is unchanged.
  • The 2633 adjectives provide less useful
    information than unigram presence.
  • Simply using the 2633 most frequent unigrams is a
    better choice, yielding performance comparable to
    that of using (the presence of) all 16165.

19
6. Evaluation: 6.2 Results (Cont.)
  • Position
  • We tagged each word according to whether it appeared in the first quarter, last quarter, or middle half of the document (see the sketch below).
  • The results didn't differ greatly from those using unigrams alone.
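A rough sketch of this positional tagging (the tag names and boundary handling are illustrative):

```python
def tag_positions(tokens):
    """Mark each word as falling in the first quarter, middle half,
    or last quarter of the document."""
    n = len(tokens)
    tagged = []
    for i, tok in enumerate(tokens):
        if i < n / 4:
            tagged.append(tok + "_FIRST")
        elif i < 3 * n / 4:
            tagged.append(tok + "_MIDDLE")
        else:
            tagged.append(tok + "_LAST")
    return tagged

print(tag_positions("one two three four five six seven eight".split()))
# ['one_FIRST', 'two_FIRST', 'three_MIDDLE', ..., 'eight_LAST']
```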

20
7. Discussion
  • Naive Bayes tends to do the worst and SVMs tend to do the best.
  • Unigram Presence Information turned out to be the most effective feature set.
  • The superiority of Presence Information over Frequency Information in our setting contradicts previous observations made in topic-classification work.
  • A remaining challenge is the "thwarted expectations" narrative, in which a reviewer sets up a deliberate contrast with earlier discussion.
  • Some form of discourse analysis is necessary (using more sophisticated techniques than our positional feature mentioned above), or at least some way of determining the focus of each sentence.