Title: Criticism Mining
1Criticism Mining
- Exploring Book, Movie, and Music Reviews Using
Text Mining Techniques
- Xiao Hu, J. Stephen Downie, and M. Cameron Jones
- Graduate School of Library and Information
Science
- University of Illinois at Urbana-Champaign
2Emerging Domain
- Many networked resources now provide critical
consumer-generated reviews of humanities
materials
- Public and private blogs, mailing lists, and
wikis
- Online stores
- Review websites
3Emerging Opportunity
- Many of these reviews are quite detailed,
covering not only the reviewers' personal
opinions but also important background and
contextual information about the works under
discussion.
4Emerging Need
- Humanities scholars should be able to easily
gather and then analytically examine these
reviews to determine, for example, how users are
affected and influenced by humanities materials.
5Addressing the Need
- The authors have conducted a series of
promising large-scale experiments that bring
powerful text mining techniques to bear on the
problem of criticism analysis.
6A Possible Solution to the Need
- Our experimental results concerning the
application of the Naïve Bayes text mining
technique to the criticism analysis domain
indicate that criticism mining is not only
feasible but also worthy of further exploration
and refinement.
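To make the technique concrete, the following is a minimal sketch of a Naïve Bayes text classifier, assuming a multinomial word-count model with add-one (Laplace) smoothing. It is not the authors' implementation; the slides do not specify the variant, the features, or the toolkit used. The function names (`tokenize`, `train`, `predict`) and the four toy reviews are invented purely for illustration.

```python
import math
from collections import Counter, defaultdict

PUNCT = ".,!?;:\"'()"

def tokenize(text):
    # Hypothetical tokenizer: split on whitespace, strip punctuation, lowercase.
    return [w.strip(PUNCT).lower() for w in text.split() if w.strip(PUNCT)]

def train(docs):
    """docs is a list of (text, label) pairs; returns the counts the model needs."""
    label_counts = Counter()            # number of training reviews per label
    word_counts = defaultdict(Counter)  # word frequencies per label
    vocab = set()
    for text, label in docs:
        label_counts[label] += 1
        for w in tokenize(text):
            word_counts[label][w] += 1
            vocab.add(w)
    return label_counts, word_counts, vocab

def predict(model, text):
    """Return the label with the highest log posterior under the model."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for w in tokenize(text):
            if w not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothing keeps unseen (word, label) pairs from zeroing the score.
            score += math.log((word_counts[label][w] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Invented toy data, not taken from the epinions.com collection.
TOY_REVIEWS = [
    ("A gripping novel with vivid chapters", "book"),
    ("The author writes beautiful prose", "book"),
    ("The director shot stunning scenes", "movie"),
    ("Great acting and a fine screenplay", "movie"),
]
model = train(TOY_REVIEWS)
print(predict(model, "The prose in this novel"))
```

The same two functions apply unchanged to any of the prediction targets (genre, rating, book vs. movie); only the labels attached to the training reviews differ.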
7Experimental Goals
- Our principal experimental goal was to build and
then evaluate a prototype criticism mining system
that could automatically predict
- the genre of the work being reviewed
- the quality rating assigned to the reviewed item
- the difference between book reviews and movie
reviews, especially for items in the same genre
- the difference between fiction and non-fiction
book reviews
8Data Sets
- Source: epinions.com
- Book reviews: 1800
- Movie reviews: 1650
- Music reviews: 1800
- Each review contains a quality rating of 1-5
stars
- Each review is associated with a genre
10An example of a review from epinions.com
11Genre Experiments
16Quality Rating Experiments
- Three levels of granularity
- Fine: 1, 2, 3, 4, and 5 stars
- Medium: 1 and 2 vs. 4 and 5 stars
- Large: 1 vs. 5 stars
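The three granularities can be read as three ways of turning a star rating into a class label. The sketch below assumes that the medium level groups 1 and 2 stars against 4 and 5 stars, and that reviews falling outside the compared groups (for example, 3-star reviews at the medium level) are left out of that experiment. Both points are assumptions, as are the function name `rating_label` and the label names "low" and "high".

```python
def rating_label(stars, granularity):
    """Map a 1-5 star rating to a class label, or None if the review is excluded."""
    if stars not in (1, 2, 3, 4, 5):
        raise ValueError("stars must be an integer from 1 to 5")
    if granularity == "fine":
        # Five-way classification: each star value is its own class.
        return str(stars)
    if granularity == "medium":
        # Assumed grouping: 1-2 stars vs. 4-5 stars; 3-star reviews are dropped.
        if stars in (1, 2):
            return "low"
        if stars in (4, 5):
            return "high"
        return None
    if granularity == "large":
        # Only the extremes are compared: 1 star vs. 5 stars.
        if stars == 1:
            return "low"
        if stars == 5:
            return "high"
        return None
    raise ValueError("granularity must be 'fine', 'medium', or 'large'")

# Labels for every star value at each granularity.
for level in ("fine", "medium", "large"):
    print(level, [rating_label(s, level) for s in range(1, 6)])
```

Coarser levels give the classifier an easier task, at the cost of discarding the reviews in the middle of the scale.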
19Book vs. Movie Experiments
- Two levels of interest
- All genres
- Pairing by similar genres
21Fiction vs. Non-Fiction Experiments
23Conclusions (1)
- Consumer-generated reviews of humanities
materials represent a valuable research resource
for humanities scholars.
- Our series of experiments on the automated
classification of reviews verifies that important
information about the materials being reviewed
can be found using text mining techniques.
24Conclusions (2)
- All our experiments were highly successful in
terms of both classification accuracy and the
logical placement of confusion in the confusion
matrices.
- Thus, the development of criticism mining
techniques based upon the relatively simple Naïve
Bayes model has been shown to be simultaneously
viable and robust.
- This finding promises to make the ever-growing
consumer-generated review resources useful to
humanities scholars.
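For readers unfamiliar with the evaluation terms used above, here is a small sketch of how a confusion matrix and classification accuracy can be computed from true and predicted labels. The helper names and the six toy label pairs are invented for illustration and are not results from the experiments.

```python
from collections import Counter

def confusion_matrix(true_labels, predicted_labels):
    """Rows are true labels, columns are predicted labels, both in sorted order."""
    counts = Counter(zip(true_labels, predicted_labels))
    labels = sorted(set(true_labels) | set(predicted_labels))
    matrix = [[counts[(t, p)] for p in labels] for t in labels]
    return labels, matrix

def accuracy(matrix):
    """Fraction of items on the diagonal, i.e. correctly classified."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0

# Invented toy star ratings: every error lands on an adjacent rating.
true_stars = ["1", "2", "2", "5", "5", "4"]
pred_stars = ["1", "1", "2", "5", "4", "4"]
labels, matrix = confusion_matrix(true_stars, pred_stars)
print(labels)
for row in matrix:
    print(row)
print(accuracy(matrix))
```

In this toy case the off-diagonal counts fall only between neighboring ratings (2 mistaken for 1, 5 mistaken for 4), which is the kind of pattern the phrase "logical placement of confusion" describes.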
25Future Work (1)
- A broadening of our understanding by exploring
the application of text mining techniques beyond
the Naïve Bayes model
- decision trees
- neural nets
- support vector machines
- We will also work towards the development of a
system to automatically mine arbitrary bodies of
critical review text such as blogs, mailing
lists, and wikis.
26Future Work (2)
- We also hope to construct content and
ethnographic analyses to help answer the "why"
questions that pertain to the results.
- Final comment
- In the end, it is all about the "why" questions!