Orientation Mining Driven Approach to Analyze Web Public Sentiment - PowerPoint PPT Presentation

Slides: 29
Provided by: jsy2

1
Orientation Mining Driven Approach to Analyze Web
Public Sentiment
2
Abstract
  • The Internet provides a unique opportunity to
    express and spread public sentiment.
  • Since web public sentiment reflects people's
    attitudes toward society and politics, the
    orientation of public opinion is significant to
    decision-makers.
  • The research uses VSM (vector space model) to
    represent the text orientation of web information
    and offers data-mining approaches to analyze the
    orientation of public opinion.

3
Abstract
  • To achieve the goal of text orientation analysis,
    two methods are proposed:
  • A novel text orientation analysis method is
    described to analyze the orientation of original
    web postings and their replies.
  • An improved single-pass clustering algorithm is
    introduced to cluster the subjects of web
    discussion and discover hot topics.
  • The research constructs a prototype system, named
    WPSAS (web public sentiment analysis system), as
    an experimental platform to validate the presented
    methodology.

4
Introduction
  • Public opinion may be partial or impartial, fair
    or prejudiced.
  • Once public sentiment becomes public opinion, it
    can have a powerful impact on society.
  • The goal of web public sentiment analysis is to
    discover the orientation of public sentiment
    before it turns into negative public opinion or
    a public emergency, either of which can greatly
    influence real-world society.

5
Introduction
  • Web public sentiment activity can be divided into
    Internet news and netizens' discussion.
  • The Bulletin Board System (BBS) has become the
    most important channel for netizens to express
    their sentiment on the Internet.
  • The research takes BBS as the data source to
    analyze web public sentiment.

6
Conferences for web public sentiment analysis
  • Text Retrieval Conference (TREC), Special
    Interest Group on Information Retrieval (SIGIR),
    and Topic Detection and Tracking (TDT). The most
    well-known study related to web public sentiment
    analysis systems is the TDT project [2].
  • TDT has five basic tasks: story segmentation,
    topic tracking, topic detection, first-story
    detection and link detection.

7
VSM
  • The vector space model (or term vector model) is
    an algebraic model for representing text documents
    (and objects in general) as vectors of
    identifiers, such as index terms.
  • The vector space model was first proposed by
    G. Salton in 1975.

8
VSM
  • VSM is used in information filtering, information
    retrieval, indexing and relevancy ranking.
  • VSM was first used in the SMART Information
    Retrieval System.
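
As a minimal sketch of the vector space idea (not code from the presentation), the Python below maps two short texts to term-count vectors over a shared vocabulary and compares them with cosine similarity. The helper names to_vector and cosine and the sample sentences are illustrative assumptions.

```python
import math
from collections import Counter

def to_vector(text, vocabulary):
    """Represent a text as a list of term counts, one entry per vocabulary term."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocabulary]

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical sample texts; whitespace splitting stands in for real tokenization.
doc_a = "the forum supports the new policy"
doc_b = "the forum opposes the new policy"
vocabulary = sorted(set(doc_a.split()) | set(doc_b.split()))

vec_a = to_vector(doc_a, vocabulary)
vec_b = to_vector(doc_b, vocabulary)
similarity = cosine(vec_a, vec_b)
print(vocabulary, vec_a, vec_b, similarity)
```

Plain term counts make the two sentences look very similar even though they express opposite stances, which is one way to see why raw TF alone is a weak signal for orientation.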

9
VSM for analyzing web public sentiment
  • However, for analyzing web public sentiment, TF
    alone is too simple a measure of the weight of
    Internet opinion.
  • The weight of a term is determined by three
    factors:
  • how often the term d_i occurs in topic j (the
    term frequency tf_ji),
  • how often it occurs in the whole document
    collection,
  • how strongly it expresses emotion.

10
VSM for analyzing web public sentiment
  • The weight of a term d_i of topic T_j is given
    by a formula that is not preserved in this
    transcript; its parameters are as follows.
  • M is the total number of topics, and m_i is the
    number of topics that contain d_i.
  • e_ji ∈ (0, 1) is the parameter describing
    emotional strength, and T is the number of
    original postings contained in topic T_j.
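
The slide's weight formula did not survive in the transcript, so the sketch below is only one plausible way to combine the three named factors, not the authors' formula. It assumes a product of a normalized term frequency (tf_ji / T), an inverse topic frequency log(M / m_i), and the emotional strength e_ji. The function name term_weight and the sample numbers are hypothetical.

```python
import math

def term_weight(tf_ji, T, M, m_i, e_ji):
    """Hypothetical weight of term d_i in topic T_j.

    ASSUMPTION: the three factors are combined as a product,
    (tf_ji / T) * log(M / m_i) * e_ji. The presentation's actual
    formula is not available, so this form is only illustrative.
    """
    if not 0 < e_ji < 1:
        raise ValueError("e_ji must lie in the open interval (0, 1)")
    if T <= 0 or m_i <= 0 or M < m_i:
        raise ValueError("need T > 0 and 0 < m_i <= M")
    frequency = tf_ji / T             # how often the term occurs in the topic
    rarity = math.log(M / m_i)        # fewer topics containing it -> larger value
    return frequency * rarity * e_ji  # scaled by emotional strength

# Same term statistics, different emotional strength.
w_strong = term_weight(tf_ji=6, T=3, M=100, m_i=10, e_ji=0.9)
w_weak = term_weight(tf_ji=6, T=3, M=100, m_i=10, e_ji=0.3)
print(w_strong, w_weak)
```

Under this assumed form, a term that appears in every topic (m_i = M) gets weight 0 regardless of its emotional strength, which mirrors the usual behavior of inverse-frequency weighting.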

11
Flow of WPSAS
  • Initialization stage
  • Computation stage
  • Encapsulation stage
  • Analysis stage

12
Text Orientation Mining
13
Orientation vector matching
  • Replies fall into five types according to the
    matching result:
  • Fully support: the orientation vector of the
    replying posting is fully consistent with the
    orientation vector of the original posting.
  • Partially support: the orientation vector of the
    replying posting is partially consistent with
    that of the original posting, and no item carries
    a contrary weight (such as -1 against 1).
  • Partially object: at least one item carries a
    contrary weight in the orientation vectors of the
    replying posting and the original posting.

14
Orientation vector matching
  • Fully object: every item weight in the
    orientation vector of the replying posting is
    opposite to that in the orientation vector of the
    original posting.
  • Neutral: every item weight in the orientation
    vector of the replying posting is 0.
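
The five matching types can be sketched as a small classifier. This is an illustrative reading of the slide text, not the presentation's algorithm: it assumes each orientation vector holds one weight per item taken from -1, 0 and 1, checks the neutral case first, and treats any reply that is neither identical, fully opposite, nor contrary on some item as partial support. The function name classify_reply is hypothetical.

```python
def classify_reply(original, reply):
    """Classify a reply's orientation vector against the original posting's.

    ASSUMPTION: both vectors hold one weight per item, each -1, 0 or 1.
    """
    original, reply = list(original), list(reply)
    if len(original) != len(reply):
        raise ValueError("orientation vectors must have the same length")
    if all(r == 0 for r in reply):
        return "neutral"
    if reply == original:
        return "fully support"
    if all(r == -o for o, r in zip(original, reply)):
        return "fully object"
    # A contrary weight is an item where the signs differ (e.g. -1 against 1).
    if any(o * r < 0 for o, r in zip(original, reply)):
        return "partially object"
    return "partially support"

original = [1, -1, 1]
for reply in ([1, -1, 1], [1, 0, 1], [1, 1, 1], [-1, 1, -1], [0, 0, 0]):
    print(reply, classify_reply(original, reply))
```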

15
Orientation Mining Algorithm
16
Orientation Mining Algorithm
17
Hot Topic Clustering

18
Single-pass clustering
  • The first text is used as the basis of the first
    cluster.
  • The remaining texts are compared sequentially
    with the existing clusters by similarity.
  • If the similarity reaches the specified
    threshold, the text is added to that cluster, and
    the cluster's features are recalculated as the
    basis for matching with later texts.
  • Otherwise, the text's features are extracted
    directly as the basis of a new topic.
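
The steps above can be sketched in Python as follows. This is a generic single-pass clusterer, not the improved algorithm from the presentation. It assumes texts are already numeric feature vectors, that similarity is cosine similarity, and that a cluster's features are the running mean of its members. The names single_pass and cosine and the sample vectors are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def single_pass(vectors, threshold):
    """Cluster vectors in one pass; returns clusters as lists of input indices."""
    centroids = []  # running-mean feature vector of each cluster
    clusters = []   # member indices of each cluster
    for index, vec in enumerate(vectors):
        # Find the most similar existing cluster, if any.
        best, best_sim = None, -1.0
        for c, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = c, sim
        if best is not None and best_sim >= threshold:
            # Join the cluster and recalculate its features.
            clusters[best].append(index)
            n = len(clusters[best])
            centroids[best] = [(cv * (n - 1) + x) / n
                               for cv, x in zip(centroids[best], vec)]
        else:
            # Start a new topic seeded by this text.
            centroids.append(list(vec))
            clusters.append([index])
    return clusters

texts = [[1, 1, 0, 0], [1, 0.8, 0, 0], [0, 0, 1, 1], [0, 0.1, 1, 0.9]]
groups = single_pass(texts, threshold=0.8)
print(groups)
```

Each text is compared only with the current cluster centroids, which is where the O(nk) cost for n texts and k clusters comes from.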

19
Single-pass clustering
  • The computational complexity of single-pass
    clustering is O(nk), where n is the number of
    texts and k is the number of clusters.
  • However, for web public sentiment analysis, it
    has two shortcomings:
  • The method depends heavily on input order:
    clustering the same texts in a different order
    can give a different result.
  • It tends to be biased toward large categories
    during clustering, because categories are
    unevenly distributed.
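
The order dependence can be seen in a toy example. The sketch below is a deliberately simplified stand-in: it clusters plain numbers with a distance threshold rather than texts with a similarity threshold, and the function name single_pass_1d and the sample points are hypothetical.

```python
def single_pass_1d(points, max_distance):
    """Single-pass clustering of numbers.

    A point joins the nearest cluster whose running mean lies within
    max_distance; otherwise it starts a new cluster.
    """
    clusters = []  # each cluster is the list of its member points
    for p in points:
        best, best_dist = None, None
        for members in clusters:
            dist = abs(p - sum(members) / len(members))
            if dist <= max_distance and (best is None or dist < best_dist):
                best, best_dist = members, dist
        if best is None:
            clusters.append([p])
        else:
            best.append(p)
    # Sort so that groupings can be compared independently of discovery order.
    return sorted(sorted(c) for c in clusters)

# The same three points, presented in two different orders.
result_a = single_pass_1d([0, 1, 2], max_distance=1.2)
result_b = single_pass_1d([1, 2, 0], max_distance=1.2)
print(result_a, result_b)
```

In the first order, 1 is absorbed by the cluster seeded at 0 and pulls its mean to 0.5, so 2 is left alone. In the second order, 1 and 2 merge first and 0 is left alone.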

20
New clustering algorithm based on multi-comparing

21
Data Flow of WPSAS
22
Chinese word segmentation
  • WPSAS adopts the well-known ICTCLAS (Institute of
    Computing Technology, Chinese Lexical Analysis
    System) for Chinese word segmentation. ICTCLAS
    was developed by the Institute of Computing
    Technology of the Chinese Academy of Sciences.
  • The main features of ICTCLAS include Chinese word
    segmentation, part-of-speech tagging, named
    entity recognition, new word identification and
    user dictionary support.
  • The latest version of ICTCLAS is ICTCLAS3.0.

23
Experimental Data
  • Two web forums are selected for the public
    sentiment analysis:
  • Tencent forum (腾讯)
    (http://bbs.news.qq.com/b-1000083746)
  • Tianya forum (天涯) (http://www.tianya.cn)

24
Emotional-word statistics-based vs VSM-based
orientation
  • Recall of supporting postings (Sup_recall)
  • Accuracy of supporting postings (Sup_accuracy)
  • Recall of negative postings (Neg_recall)
  • Accuracy of negative postings (Neg_accuracy)
  • Overall accuracy
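
The transcript does not define these measures, so the sketch below shows one common way to compute them and should be read as an assumption. In particular, it interprets "accuracy of supporting/negative postings" as precision, that is, the share of postings labeled with a class that truly belong to it. The function name orientation_metrics and the label strings "support" and "negative" are hypothetical.

```python
def orientation_metrics(actual, predicted):
    """Per-class recall and precision plus overall accuracy.

    ASSUMPTION: Sup_accuracy and Neg_accuracy are read as precision.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty label lists of equal length")
    pairs = list(zip(actual, predicted))

    def recall(label):
        # Of the postings that truly have this label, how many were found?
        relevant = [p for a, p in pairs if a == label]
        return sum(p == label for p in relevant) / len(relevant) if relevant else 0.0

    def precision(label):
        # Of the postings predicted to have this label, how many were right?
        retrieved = [a for a, p in pairs if p == label]
        return sum(a == label for a in retrieved) / len(retrieved) if retrieved else 0.0

    return {
        "Sup_recall": recall("support"),
        "Sup_accuracy": precision("support"),
        "Neg_recall": recall("negative"),
        "Neg_accuracy": precision("negative"),
        "overall_accuracy": sum(a == p for a, p in pairs) / len(pairs),
    }

# Made-up labels for five postings, purely to exercise the function.
actual = ["support", "support", "support", "negative", "negative"]
predicted = ["support", "support", "negative", "negative", "support"]
metrics = orientation_metrics(actual, predicted)
print(metrics)
```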

25
WPSAS vs machine-learning-based
26
Time cost
27
Conclusion
  • The research presents data mining approaches to
    discover network public opinion.

28
Conclusion
  • The contributions:
  • a novel text orientation analysis method, to
    analyze the orientation of original web postings
    and their replies
  • an improved single-pass clustering algorithm, to
    cluster the subjects of web discussion and
    discover hot topics
  • a web public sentiment analysis system, named
    WPSAS, to analyze the public opinion of web
    forums and validate the presented methodology