Title: Experimental Study on Sentiment Classification of Chinese Review using Machine Learning Techniques
Slide 1: Experimental Study on Sentiment Classification of Chinese Review using Machine Learning Techniques
- Jun Li and Maosong Sun
- Department of Computer Science and Technology
- Tsinghua University, Beijing, China
- IEEE NLP-KE 2007
Slide 2: Outline
- Introduction
- Corpus
- Features
- Performance Comparison
- Analysis and Conclusion
Slide 3: Introduction
- Why do we perform this task?
  - Much of the attention has centered on feature-based sentiment extraction
  - Sentence-level analysis is useful, but it involves complex processing and is usually format-dependent (Liu et al., WWW'05)
  - Sentiment classification using machine learning techniques is based on the overall sentiment of a text
  - It transfers easily to new domains given a training set
- Applications
  - Split reviews into positive and negative sets
  - Monitor bloggers' mood trends
  - Filter subjective web pages
Slide 4: Corpus
- From www.ctrip.com
- Average length 69.6 words, with std 89.0
- 90% of the reviews are shorter than 155 words
- Includes some English words
Slide 5: Review rating distribution and score threshold
- Reviews rated 4.5 and up are considered positive; 2.0 and below are considered negative
- 12,000 reviews as training set, 4,000 reviews as test set
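The rating thresholds above can be sketched as a small labeling helper (a minimal sketch; the function name is illustrative, and treating mid-range scores as discarded follows from the two thresholds given on the slide):

```python
def label_review(score):
    """Map a ctrip.com review score to a sentiment label.

    Scores >= 4.5 are positive, scores <= 2.0 are negative;
    scores in between are treated as ambiguous and excluded.
    """
    if score >= 4.5:
        return "positive"
    if score <= 2.0:
        return "negative"
    return None  # ambiguous, not used for training or testing
```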
Slide 6: Features - text representation
- Text representation schemes:
  - Word-Based Unigram (WBU), widely used
  - Word-Based Bigram (WBB)
  - Chinese Character-Based Bigram (CBB)
  - Chinese Character-Based Trigram (CBT)

  Scheme  Unique features  Total features  Avg-Len/Std
  WBU     21,074           830,826         69.3/88.9
  WBB     251,289          818,832         68.3/88.9
  CBB     128,049          1,053,860       88.0/112.8
  CBT     340,501          918,841         76.8/99.8

Table 1. Statistics of the training set under the four text representation schemes
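The four schemes can be sketched as follows (a minimal sketch; the function names are illustrative, WBU/WBB assume word segmentation has already been applied, and the character-based schemes work directly on the raw string):

```python
def word_unigrams(tokens):
    # WBU: each segmented word is one feature
    return list(tokens)

def word_bigrams(tokens):
    # WBB: each pair of adjacent segmented words is one feature
    return [tokens[i] + tokens[i + 1] for i in range(len(tokens) - 1)]

def char_ngrams(text, n):
    # CBB (n=2) / CBT (n=3): overlapping character n-grams,
    # requiring no word segmentation at all
    return [text[i:i + n] for i in range(len(text) - n + 1)]
```

For example, `char_ngrams("酒店很好", 2)` yields the character bigrams `["酒店", "店很", "很好"]`.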
Slide 7: Features - representation in a graph model
- Feature representation (n^2) in a graph model
[Figure: graph model linking a document node D through features f1 ... fk-1 to units x1 ... xk]
Slide 8: Features - weight
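The slide's weighting formulas are not reproduced in this text, but standard definitions of the weights named later in the deck might look like the sketch below. Treating tfidf-c as cosine-normalized tf-idf is an assumption on my part; the slide's exact formula is not given here.

```python
import math

def bool_weight(counts):
    # boolean weighting: 1 if the feature occurs in the document at all
    return {f: 1.0 for f in counts}

def tfidf_weights(counts, df, n_docs):
    # plain tf-idf; df maps a feature to the number of
    # training documents that contain it
    return {f: c * math.log(n_docs / df[f]) for f, c in counts.items()}

def cosine_normalize(weights):
    # "tfidf-c" is assumed here to mean tf-idf followed by cosine
    # (unit-length) normalization of the document vector
    norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
    return {f: w / norm for f, w in weights.items()}
```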
Slide 9: Performance Comparison - methods
- Support Vector Machines (SVM)
- Naïve Bayes (NB)
- Maximum Entropy (ME)
- Artificial Neural Network (ANN)
  - two-layer feed-forward
- Baseline: Naive Counting
  - Predicts by comparing the numbers of positive and negative sentiment words
  - Heavily depends on the sentiment dictionary
  - Micro-averaging F1: 0.7931; macro-averaging F1: 0.7573
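The naive-counting baseline can be sketched as below. The tiny sentiment dictionary and the tie-breaking policy are illustrative assumptions, not the authors' actual dictionary or rule:

```python
# Hypothetical, tiny sentiment dictionary; the real baseline's quality
# depends heavily on the dictionary used
POS_WORDS = {"好", "满意", "干净"}
NEG_WORDS = {"差", "脏", "失望"}

def naive_counting(tokens):
    """Baseline: predict by comparing counts of positive vs. negative
    sentiment words in the review."""
    pos = sum(1 for t in tokens if t in POS_WORDS)
    neg = sum(1 for t in tokens if t in NEG_WORDS)
    # Ties default to positive here; the tie policy is an assumption
    return "positive" if pos >= neg else "negative"
```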
Slide 10: Performance Comparison - WBU
SVM, NB, ME, ANN using WBU as features with
different feature weights
Slide 11: Performance Comparison - WBU
Four methods using WBU as features
Slide 12: Performance Comparison - WBB
Four methods using WBB as features
Slide 13: Performance Comparison - CBB and CBT
Four methods using CBB as features
Four methods using CBT as features
Slide 14: Performance Comparison
Slide 15: Analysis
- On average, NB outperforms all the other classifiers when using WBB and CBT
- N-gram based features relax the conditional independence assumption of the Naive Bayes model
  - They capture integral semantic content
  - People tend to use combinations of words to express positive and negative sentiment
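A minimal multinomial Naive Bayes over pre-extracted n-gram features illustrates how bigram features plug into NB; this is a sketch with add-one smoothing, not the paper's implementation:

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """Train multinomial NB on (feature_list, label) pairs,
    with add-one (Laplace) smoothing."""
    class_docs = defaultdict(int)
    class_feats = defaultdict(Counter)
    vocab = set()
    for feats, label in docs:
        class_docs[label] += 1
        class_feats[label].update(feats)
        vocab.update(feats)
    n = sum(class_docs.values())
    model = {}
    for label in class_docs:
        total = sum(class_feats[label].values())
        model[label] = (
            math.log(class_docs[label] / n),            # log prior
            {f: math.log((class_feats[label][f] + 1) / (total + len(vocab)))
             for f in vocab},                           # log likelihoods
            math.log(1 / (total + len(vocab))),         # unseen-feature fallback
        )
    return model

def classify_nb(model, feats):
    best, best_lp = None, float("-inf")
    for label, (prior, likelihood, unseen) in model.items():
        lp = prior + sum(likelihood.get(f, unseen) for f in feats)
        if lp > best_lp:
            best, best_lp = label, lp
    return best
```

The feature lists fed in can come from any of the four schemes (WBU, WBB, CBB, CBT); NB itself is agnostic to how the features were extracted.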
Slide 16: Conclusion
- (1) On average, NB outperforms all the other classifiers when using WBB or CBT as the text representation scheme with boolean weighting, under different feature dimensionalities reduced by chi-max, and it is more stable than the others
- (2) Compared with WBU, WBB and CBB carry stronger meaning as semantic units for the classifiers
- (3) Most of the time, tfidf-c is much better for SVM and ME
- (4) Considering that SVM achieves the best performance under all conditions and is the most popular method, we recommend representing text with WBB or CBB and weighting features with tfidf-c to obtain better performance than with WBU
Slide 17: Thank you!
Dataset and software are available at http://nlp.csai.tsinghua.edu.cn/lj/