Recognising Emotional and Evaluative Language in Text

1
Recognising Emotional and Evaluative Language in Text
  • Jonathon Read
  • j.l.read@sussex.ac.uk
  • http://www.sussex.ac.uk/Users/jlr24/

2
Jon Read
  • Brighton, England
  • DPhil Student at the University of Sussex,
    supervised by Dr John Carroll
  • Research Interests
  • Evaluative and Emotional language
  • Sentiment Analysis

3
Presentation Outline
  • Recognising emotion in text
  • Examples of dependencies in machine learning
    techniques for sentiment classification
  • Using diverse training data for sentiment
    classification
  • Future work and directions for DPhil thesis

4
Recognising Emotion in Text
  • Master's Project, Summer 2004
  • How can we computationally recognise the
    emotional (affective) states of authors from
    their text?

5
Test Data Acquisition
  • Recently researchers have compiled collections of
    blog posts, labelled by the authors (Mishne 2005)
  • Corpora annotated or labelled with emotion were
    uncommon at the time
  • Built an original collection
  • Fifty-Word Fiction
  • 155 texts, 756 sentences, 7750 words

6
A Two Factor Structure of Affect
[Figure: the Two-Factor Structure of Affect (Watson and
Tellegen 1985), a circular arrangement of mood-word
clusters: aroused/astonished/surprised,
active/enthusiastic/excited, happy/pleased/kindly,
distressed/hostile/nervous, calm/placid/relaxed,
sad/lonely/grouchy, drowsy/dull/sleepy,
quiescent/quiet/still]
7
Affect Annotation Experiment
1 month, 49 coders, 3,301 annotations
8
Affect Annotation Experiment
  • Expert coder
  • Expert choice = the most frequently chosen class
  • Human coders assessed with the Kappa Coefficient
    of Agreement (Carletta 1996)
  • Human coders' annotations were ignored if K fell
    below a threshold
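
For reference, the kappa coefficient corrects raw inter-coder agreement for agreement expected by chance:

```latex
\kappa = \frac{P(A) - P(E)}{1 - P(E)}
```

where P(A) is the observed proportion of agreement among coders and P(E) is the proportion expected if coders chose classes at random.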

9
Sentiment Annotations
10
Affect Annotations
11
Annotation Usefulness
  • Number of classes (s)
  • Number of annotations made (n)
  • Number of annotations made to the most annotated
    class (a)
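
The slides do not reproduce the usefulness formula itself. A minimal sketch, assuming a simple chance-corrected agreement measure built from the three quantities above (the function name and the normalisation are illustrative, not from the talk):

```python
def usefulness(s, n, a):
    """Chance-corrected agreement for one annotated item: 0 when the
    modal class is no more popular than chance, 1 when all annotations
    agree. Illustrative only; not the measure used in the talk."""
    observed = a / n   # share of annotations in the most annotated class
    chance = 1 / s     # expected share under uniform random choice
    return (observed - chance) / (1 - chance)

# Example: 4 classes, 5 annotations, 3 in the modal class
print(usefulness(s=4, n=5, a=3))  # 0.4667
```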

12
Sentiment Annotations
13
Affect Annotations
14
Affect Annotation Experiment
  • Supervised model impractical
  • Base a semi-supervised model on SO-PMI-IR (Turney
    2002)
  • Output represents a location on an axis
  • Paradigm words provide seeds

15
SO-PMI-IR
  • Semantic Orientation using Pointwise Mutual
    Information and Information Retrieval
  • Turney 2002
  • Accuracy of 74.4% in recognising the sentiment
    (positive or negative) of product reviews in a
    variety of domains
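
Turney's measure scores a phrase p by the difference between its association with a positive and a negative paradigm word, estimated from search-engine hit counts using the NEAR operator:

```latex
\mathrm{SO}(p) = \mathrm{PMI}(p, \text{excellent}) - \mathrm{PMI}(p, \text{poor})
              = \log_2 \frac{\mathrm{hits}(p \text{ NEAR excellent}) \cdot \mathrm{hits}(\text{poor})}
                            {\mathrm{hits}(p \text{ NEAR poor}) \cdot \mathrm{hits}(\text{excellent})}
```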

16
AO-PMI-IR
  • Evaluate SO-PMI-IR for each dimension in the
    Two-Factor Structure of Affect
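
A minimal sketch of this extension, assuming each affect axis is given sets of positive-pole and negative-pole paradigm words and scored with the SO-PMI-IR difference; pmi is a stand-in for the corpus-based association lookup:

```python
def axis_score(phrase, pos_seeds, neg_seeds, pmi):
    """SO-PMI-IR along one affect axis: association with the positive
    pole minus association with the negative pole (illustrative)."""
    return (sum(pmi(phrase, w) for w in pos_seeds)
            - sum(pmi(phrase, w) for w in neg_seeds))

def classify_affect(phrase, axes, pmi):
    """Score the phrase on each axis of the Two-Factor Structure of
    Affect and return the pole of the most strongly signalled axis."""
    scores = {name: axis_score(phrase, pos, neg, pmi)
              for name, (pos, neg) in axes.items()}
    name = max(scores, key=lambda k: abs(scores[k]))
    pole = "high" if scores[name] > 0 else "low"
    return name, pole, scores[name]
```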

17
AO-PMI-IR
18
PMI-IR and Very-Large Corpora
  • Turney (2002) used the World-Wide-Web to obtain
    frequency counts, via AltaVista
  • More recently, AltaVista no longer provides a NEAR
    operator in its search engine
  • Waterloo MultiText System (approximately 1 terabyte)

19
Paradigm Word Selection
  • Mood words from the Two-Factor Structure of
    Affect
  • Obviously ambiguous words dropped
  • (active, content, dull, still, strong)
  • Remaining words used as starting points to derive
    a list of synonyms using WordNet
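
An illustrative expansion step using NLTK's WordNet interface (the talk names WordNet but not the exact procedure, so treat this as a sketch):

```python
from nltk.corpus import wordnet as wn

def expand_seed(word):
    """Collect the lemma names of every WordNet synset containing
    the seed word."""
    synonyms = set()
    for synset in wn.synsets(word):
        for lemma in synset.lemmas():
            synonyms.add(lemma.name().replace("_", " "))
    return synonyms

print(sorted(expand_seed("happy")))
```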

20
Experiment Baselines
  • Baseline 1
  • Prior knowledge of the distribution: choosing the
    most frequently occurring type, which is
    "unclassifiable"
  • Baseline 2
  • Choosing a class at random

21
SO-PMI-IR Results
22
AO-PMI-IR Results
23
Misclassifications
  • Distribution of misclassifications inspected
  • SO-PMI-IR fairly uniform
  • AO-PMI-IR algorithm biased towards Low Positive
    Affect
  • This class describes a lack of affect (e.g. being
    asleep)
  • Few mismatches against opposite poles of the same
    axis

24
Accuracy vs. Annotator Agreement
25
AO-PMI-IR Summary
  • An algorithm for the recognition of affect
    (emotion) in text, using point-wise mutual
    information
  • Limited success, but outperforms a naïve baseline
  • Can perhaps inform a more thorough approach to
    recognising affect

26
Sentiment Classification
  • Determining an author's general feeling toward
    their subject; that is, is a unit of text
    generally positive, or generally negative?
  • Filtering flames (Spertus 1997)
  • Recommender systems (Pang et al. 2002)
  • Analysis of market trends (Dave et al. 2002)

27
Supervised Approach
  • Pang et al. 2002
  • Collated a corpus of movie reviews from an IMDb
    archive
  • Naïve Bayes, Maximum Entropy and Support Vector
    Machine classifiers
  • Trained using unigram and bigram features
  • Best result from an SVM at around 83%
  • Pang and Lee 2004
  • Disregarding objective sentences (Wiebe et al.
    2004)
  • Improves to around 87%
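
A minimal re-creation of this kind of set-up with scikit-learn (not the authors' original code): binary unigram-presence features feeding a Naïve Bayes classifier and a linear SVM, on toy data standing in for the review corpus.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy stand-ins for labelled movie reviews (1 = positive, 0 = negative)
texts = ["gripping and moving film", "an excellent, unpredictable plot",
         "dull and lifeless film", "a predictable, tedious plot"]
labels = [1, 1, 0, 0]

for clf in (BernoulliNB(), LinearSVC()):
    # binary=True gives unigram *presence* rather than counts,
    # following the feature choice reported by Pang et al. (2002)
    model = make_pipeline(CountVectorizer(binary=True), clf)
    model.fit(texts, labels)
    print(type(clf).__name__, model.predict(["an excellent film"]))
```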

28
Supervised Approach
  • Engström 2004
  • A bag-of-words approach is topic dependent
  • Turney 2002
  • Movie review: "unpredictable plot" → positive
  • Automobile review: "unpredictable steering" →
    negative

29
Dependencies in Sentiment Classification
  • Experimental Set-up
  • Classification Tools
  • Naïve Bayes
  • SVMlight (Joachims 1999)
  • Feature Selection
  • Unigram presence (Pang et al. 2002)
  • Evaluation
  • 3-fold cross validation
  • Significance determined using paired-sample
    t-test
  • Each experiment involves training on one subset
    and testing on the others
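
For the significance test, the per-fold accuracies of two classifiers can be compared with SciPy's paired-sample t-test (the numbers below are toy values, not reported results):

```python
from scipy.stats import ttest_rel

# Accuracy of two classifiers on the same three cross-validation folds
nb_acc = [0.61, 0.63, 0.60]
svm_acc = [0.70, 0.69, 0.72]

t_stat, p_value = ttest_rel(nb_acc, svm_acc)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")  # small p -> significant difference
```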

30
Dependencies in Sentiment Classification
  • Cross training/testing by topic
  • Datasets from business news (Newswire)
  • Finance (FIN)
  • Mergers and Acquisitions (MA)
  • Mixed (MIX)

31
Dependencies in Sentiment Classification
32
Dependencies in Sentiment Classification
  • Cross training/testing by domain
  • Datasets
  • Business news (Newswire)
  • Movie Reviews (Polarity 1.0) (Pang et al. 2002)

33
Dependencies in Sentiment Classification
34
Dependencies in Sentiment Classification
  • Cross training/testing by time-period
  • Datasets
  • Movie Reviews before 2002 (Polarity 1.0)
  • Movie Reviews after 2002 (Polarity 2004)
  • Available for download at
    http://www.sussex.ac.uk/Users/jlr24/data

35
Dependencies in Sentiment Classification
36
Dependencies in Sentiment Classification
  • The performance of machine-learning techniques
    for sentiment classification is dependent on a
    good match between the training and test data,
    with respect to:
  • Topic,
  • Domain, and
  • Time-period

37
Using Emoticons for Sentiment Classification
  • Dependency can perhaps be solved by acquiring a
    large and diverse collection of general text
    annotated for sentiment
  • Emoticons can perhaps be assumed to mark up text
    according to its sentiment, if we assume
  • :-) is positive
  • :-( is negative

38
Using Emoticons for Sentiment Classification
  • Usenet articles were downloaded if they contained
    one of a list of smile and frown emoticons

39
Using Emoticons for Sentiment Classification
  • Extracted a paragraph from an article if it
    contained a smile or a frown, and was English
    text
  • 26,000 article extracts
  • 50/50 split between positive and negative
  • 748,685 words
  • Available for download at
    http://www.sussex.ac.uk/Users/jlr24/data
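
A sketch of the labelling step, assuming a simple regular-expression test for smiles and frowns (the full emoticon list and the English-language filter used in the talk are not shown here):

```python
import re

SMILE = re.compile(r"[:;]-?\)")  # e.g. :-) :) ;-)  (illustrative subset)
FROWN = re.compile(r":-?\(")     # e.g. :-( :(

def label_paragraph(paragraph):
    """Label a paragraph positive or negative by the emoticons it
    contains; discard it if it contains both kinds or neither."""
    has_smile = bool(SMILE.search(paragraph))
    has_frown = bool(FROWN.search(paragraph))
    if has_smile and not has_frown:
        return "positive"
    if has_frown and not has_smile:
        return "negative"
    return None  # ambiguous or emoticon-free: skip

print(label_paragraph("Happy thanksgiving everybody :-)"))  # positive
```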

40
Optimisation on Emoticons
  • Emoticon corpus optimised for sentiment
    classification task
  • 4,000 articles held-out
  • Increasing articles in training set from 2,000 to
    22,000 in increments of 500
  • Increasing context from 10 to 1,000 tokens in
    increments of 10
  • Window around an emoticon
  • Before an emoticon
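
This optimisation amounts to an exhaustive two-dimensional grid search for each context strategy. A minimal sketch, where train_and_score is a hypothetical stand-in for training a classifier on n articles with the given context and scoring it on the 4,000 held-out articles:

```python
def grid_search(train_and_score):
    """Search over training-set size and context length for both
    context strategies (window around / tokens before an emoticon)."""
    best_acc, best_params = 0.0, None
    for strategy in ("window", "before"):
        for n_articles in range(2000, 22001, 500):
            for context in range(10, 1001, 10):
                acc = train_and_score(n_articles, context, strategy)
                if acc > best_acc:
                    best_acc, best_params = acc, (n_articles, context, strategy)
    return best_acc, best_params
```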

41
Optimisation on Emoticons
  • Optimal parameters
  • Naïve Bayes
  • Training: 22,000 articles
  • Context: 130 tokens in a window
  • SVM
  • Training: 20,000 articles
  • Context: 150 tokens in a window

42
Initial Results
  • Predicting sentiment of article extracts (10-fold
    cross-validation)
  • Naïve Bayes: 61.5%
  • SVM: 70.1%
  • Predicting sentiment of movie reviews
  • Naïve Bayes: 59.1%
  • SVM: 52.1%

43
Optimisation on Reviews
  • Optimisation repeated using held-out movie
    reviews from Polarity 1.0
  • Naïve Bayes
  • Training: 21,000 articles
  • Context: 50 tokens in a window
  • SVM
  • Training: 20,000 articles
  • Context: 510 tokens before

44
Experiments and Results
  • Accuracy of Emoticon-trained classifiers across
    business news topics

45
Experiments and Results
  • Accuracy of Emoticon-trained classifiers across
    the domains of news articles and movie reviews

46
Experiments and Results
  • Accuracy of Emoticon-trained classifiers across
    movie reviews from different time periods

47
Performance Summary
  • Good at predicting Usenet article extracts
  • Okay at predicting movie reviews
  • Bad at predicting newswire articles
  • Performance reasonably consistent over time
    periods

48
Coverage of Emoticons Classifier
  • Coverage of unique token types is low
  • More training texts may improve coverage
  • Other sources
  • Online bulletin boards
  • Chat forums
  • Web logs
  • Google Groups
  • Usenet

49
Noise in Emoticons Training Data
Optimising the SVM Classifier against Movie
Reviews
50
Noise in Emoticons Training Data
  • Mixed sentiment: "Sorry about venting my
    frustration here but I just lost it. :-( Happy
    thanks giving everybody :-)"
  • Sarcasm: "Thank you so much, that's really
    encouraging :-("
  • Spelling mistakes: "The movies where for me a
    major desapointment :-("

51
Future work and directions
  • Collect more examples of text marked up with
    emoticons
  • Experiment with techniques to automatically
    remove noisy examples from the data
  • Investigate the nature of dependency in sentiment
    classification

52
Future work and directions
  • What is the nature of these dependencies?
  • It seems classifiers may be learning authors'
    sentiment toward concepts, rather than the
    language associated with communicating emotion
    and evaluation
  • Classifiers are not learning authors' sentiment
    toward named entities
  • Perhaps classifiers learn the words associated
    with the sentiment of named entities: the
    "ice-axe effect"?

53
Future work and directions
  • Refined dependency experiments
  • Tag movie reviews for precise year and perform
    cross training/testing based on year
  • Remove named entities from training data
  • Remove all-but-one review from each author
  • Remove all-but-one review of a given movie
  • If accuracy is reduced this can be taken as
    evidence of dependencies

54
Future work and directions
  • Feature Engineering for Machine Learning
  • OddsRatio can be employed to mark features with
    temporal senses (Liebscher and Belew 2005)
  • Richly-engineered features based on linguistic
    theory
  • Automatic feature induction, maximising
    performance whilst minimising dependency
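
For reference, the usual OddsRatio feature score for a word w and class c has the form below; how Liebscher and Belew adapt it to temporally tagged features is not detailed in the slides:

```latex
\mathrm{OR}(w, c) = \log \frac{P(w \mid c)\,\bigl(1 - P(w \mid \bar{c})\bigr)}
                              {\bigl(1 - P(w \mid c)\bigr)\,P(w \mid \bar{c})}
```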

[Diagram: "SCHWARZENEGGER" tracked across the topics
SPORTS, MOVIES and POLITICS over TIME]
55
Future work and directions
  • Improving the automatic acquisition of sentiment
    lexicons
  • SO-PMI-IR
  • An independent measure, but performs with varying
    success in different domains (Turney 2002)
  • Identify the topic of the text to be classified,
    and supplement the paradigm words with these
    topic keywords?
  • Distributional Similarity
  • Distributional similarity (see a survey by Weeds
    (2003)) has been shown to be a reasonable
    approximation of semantic similarity (Curran and
    Moens 2002)

56
Future work and directions
  • Finding a metric
  • SO-PMI-IR and Distributional Similarity do not
    describe metric spaces
  • A true metric may increase performance

negative / neutral / positive
worst → worse → okay → better → best
57
Appraisal Theory
  • "An approach to exploring, describing and
    explaining the way language is used to evaluate,
    to adopt stances, to construct textual personas
    and to manage interpersonal positionings and
    relationships."
  • http://www.grammatics.com/appraisal/
  • J. R. Martin and P. R. R. White. 2005. The
    Language of Evaluation: Appraisal in English.

58
Appraisal Theory
59
Thank you!
  • Email: j.l.read@sussex.ac.uk
  • Homepage: www.sussex.ac.uk/Users/jlr24