CoCQA: CoTraining Over Questions and Answers with an Application to Predicting Question Subjectivity


Slides: 32

Provided by: baol4

Learn more at: http://www.mathcs.emory.edu


1
CoCQA: Co-Training Over Questions and Answers
with an Application to Predicting Question
Subjectivity Orientation

Baoli Li, Yandong Liu, and Eugene Agichtein
Emory University

2
Community Question Answering

An effective way of seeking information from
other users
Resolved questions can be searched

3
Community Question Answering (CQA)

Yahoo! Answers
Users
Askers post questions
Answerers post answers
Voters vote for existing answers
Questions
Subject
Detail
Answers
Answer text
Votes
Archive millions of questions and answers

4
Lifecycle of a Question in CQA

Choose a category
Compose the question
Open question
Answers are posted (Answer, Answer, Answer)
Asker examines the answers: Find the answer?
Yes: Close question, choose best answers, give ratings
No: Question is closed by system. Best answer is
chosen by voters
5
Problem Statement

How can we exploit structure of CQA to improve
question classification?
Case Study: Question Subjectivity Prediction
Subjective questions seek answers containing
private states such as personal opinion,
judgment, and experience
Objective questions are expected to be answered
with reliable or authoritative information

6
Example Questions

Subjective
Has anyone got one of those home blood pressure
monitors? and if so what make is it and do you
think they are worth getting?
Objective
What is the difference between chemotherapy and
radiation treatments?

7
Motivation

Guiding the CQA engine to process questions more
intelligently
Some Applications
Ranking/filtering answers
Improving question archive search
Evaluating answers provided by users
Inferring user intent

8
Challenges

Some challenges in analyzing real online questions
Typically complex and subjective
Can be ill-phrased and vague
Not enough annotated data

9
Key Observations

Can we utilize the inherent structure of the CQA
interactions, and use the unlimited amounts of
unlabeled data to improve classification
performance?

10
Natural Approach: Co-Training

Introduced in "Combining Labeled and Unlabeled Data with
Co-Training" (Blum and Mitchell, 1998)
Two views of the data
E.g. content and hyperlinks in web pages
Provide complementary information for each other
Iteratively construct additional labeled data
Can often significantly improve accuracy

11
Questions and Answers: Two Views

Example
Q: Has anyone got one of those home blood
pressure monitors? and if so what make is it and
do you think they are worth getting?
A: My mom has one as she is diabetic so its
important for her to monitor it she finds it
useful.
Answers usually match/fit the question
(e.g. "My mom ...", "she finds ...")
Askers can usually identify matching answers by
selecting the best answer

12
CoCQA: A Co-Training Framework over Questions and
Answers
[Diagram: classifiers CQ (question view, Q) and CA (answer view, A) are trained on the labeled data and used to classify the unlabeled data; validation on holdout training data decides when to stop.]
13
Details of CoCQA implementation

Base classifier: LibSVM
Term weight: term frequency
Also tried Binary, TFIDF
Select top K examples with highest confidence
Confidence: margin value in SVM
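The loop described on this slide and the previous one can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy linear scorer `train`/`score` stands in for LibSVM, the magnitude of its score stands in for the SVM margin, and a fixed iteration count replaces the holdout validation step. All function and parameter names are hypothetical.

```python
from collections import Counter

def train(texts, labels):
    """Toy linear scorer built from per-class relative term frequencies.
    Assumption: a stand-in for the LibSVM base classifier."""
    counts = {+1: Counter(), -1: Counter()}
    for text, label in zip(texts, labels):
        counts[label].update(text.lower().split())
    totals = {c: max(1, sum(counts[c].values())) for c in counts}
    vocab = set(counts[+1]) | set(counts[-1])
    return {w: counts[+1][w] / totals[+1] - counts[-1][w] / totals[-1]
            for w in vocab}

def score(model, text):
    """Signed score: the sign is the predicted class, the magnitude plays
    the role of the margin. Repeated words add up (term-frequency weight)."""
    return sum(model.get(w, 0.0) for w in text.lower().split())

def co_train(labeled, unlabeled, k=1, iterations=3):
    """labeled: list of (question, answer, label) with label in {+1, -1};
    unlabeled: list of (question, answer)."""
    labeled, unlabeled = list(labeled), list(unlabeled)
    for _ in range(iterations):
        if not unlabeled:
            break
        labels = [y for _, _, y in labeled]
        model_q = train([q for q, _, _ in labeled], labels)  # question view
        model_a = train([a for _, a, _ in labeled], labels)  # answer view
        picked = {}
        for view, model in ((0, model_q), (1, model_a)):
            # Rank unlabeled examples by confidence in this view.
            ranked = sorted(range(len(unlabeled)),
                            key=lambda i: abs(score(model, unlabeled[i][view])),
                            reverse=True)
            for i in ranked[:k]:
                s = score(model, unlabeled[i][view])
                # If both views pick the same example, the question view wins.
                if s != 0 and i not in picked:
                    picked[i] = 1 if s > 0 else -1
        if not picked:
            break
        labeled += [(unlabeled[i][0], unlabeled[i][1], y)
                    for i, y in picked.items()]
        unlabeled = [ex for i, ex in enumerate(unlabeled) if i not in picked]
    labels = [y for _, _, y in labeled]
    return (train([q for q, _, _ in labeled], labels),
            train([a for _, a, _ in labeled], labels),
            labeled)
```

Each view labels its K most confident unlabeled examples, and those examples join the shared labeled pool with both their question and answer text, so each classifier can learn from the other's confident predictions.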

14
Feature Set

Character 3-grams
has, any, nyo, yon, one
Words
Has, anyone, got, mom, she, finds
Word with Character 3-grams
Word n-grams (n <= 3, i.e. W_i, W_i W_{i+1},
W_i W_{i+1} W_{i+2})
Has anyone got, anyone got one, she finds it
Word and POS n-grams (n <= 3, i.e. W_i, W_i W_{i+1},
W_i POS_{i+1}, POS_i W_{i+1}, POS_i POS_{i+1}, etc.)
NP VBP, She PRP, VBP finds
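The character and word n-gram features listed on this slide could be extracted along these lines. This is a sketch under assumptions: the slide's examples (has, any, nyo, yon, one) suggest that character 3-grams are taken inside each lowercased word, and the tokenizer below is a simplification. The function names are hypothetical, and the POS-based features are omitted because they require a part-of-speech tagger.

```python
import re

def tokenize(text):
    """Lowercase the text and keep alphabetic word tokens (a simplification)."""
    return re.findall(r"[a-z']+", text.lower())

def char_ngrams(text, n=3):
    """Character n-grams taken inside each word,
    e.g. 'anyone' yields any, nyo, yon, one."""
    grams = []
    for word in tokenize(text):
        grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams

def word_ngrams(text, max_n=3):
    """All word n-grams with n <= max_n, joined by single spaces."""
    words = tokenize(text)
    grams = []
    for n in range(1, max_n + 1):
        grams.extend(" ".join(words[i:i + n])
                     for i in range(len(words) - n + 1))
    return grams
```

For example, `char_ngrams("Has anyone")` gives the five 3-grams shown on the slide, and `word_ngrams("Has anyone got one")` includes "has anyone got" and "anyone got one".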

15
Overview of Experimental Setup

Datasets
From Yahoo! Answers
Data manually labeled via Amazon Mechanical Turk
Metrics
Compare CoCQA to a state-of-the-art semi-supervised method

16
Dataset

1,000 Labeled Questions from Yahoo! Answers
5 categories (Arts, Education, Science, Health,
and Sports)
200 questions from each category
10,000 Unlabeled Questions from Yahoo! Answers
2,000 questions from each category
Data available at
http://ir.mathcs.emory.edu/shared

17
Manual Labeling

Annotated using Amazon's Mechanical Turk service
Each question was judged by 5 Mechanical Turk
workers
25 questions included in each HIT
Workers must pass a qualification test
Majority vote to derive gold standard
Discarded a small fraction (22 out of 1000) of
nonsensical questions, such as "Upward Soccer
Shorts?" and "11?fdgdgdfg", by manual inspection
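Deriving a gold-standard label by majority vote over the five worker judgments might look like the sketch below. The function name and the label strings are assumptions for illustration; with five judges and two labels a tie cannot occur, but the helper returns None for ties to stay safe with other inputs.

```python
from collections import Counter

def majority_label(judgments):
    """Return the label chosen by most workers, or None if there is
    no judgment or the top two labels are tied."""
    if not judgments:
        return None
    counts = Counter(judgments).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]
```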

18
Example HIT task
19
Subjectivity Statistics by Category
[Chart: objective vs. subjective questions per category]
20
Evaluation Metric

Macro-Averaged F-1
Prediction performance on both subjective
questions and objective questions is equally
important
F-1 averaged over subjective and objective classes
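Macro-averaged F-1 computes F-1 separately for each class and then takes the unweighted mean, so the subjective and objective classes count equally regardless of their sizes. A minimal sketch, with hypothetical function names and label strings:

```python
def f1_for_class(gold, predicted, label):
    """F-1 for one class, treating `label` as the positive class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def macro_f1(gold, predicted, labels=("subjective", "objective")):
    """Unweighted mean of the per-class F-1 scores."""
    return sum(f1_for_class(gold, predicted, l) for l in labels) / len(labels)
```

With gold labels S, S, S, O and predictions S, S, O, O, the subjective class has precision 1 and recall 2/3 (F-1 = 0.8), the objective class has precision 1/2 and recall 1 (F-1 = 2/3), and the macro average is about 0.733.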

21
Experimental Settings

5-fold cross-validation
Methods Compared
Supervised: LibSVM (Chang and Lin, 2001)
Generalized Expectation (GE) (Mann and McCallum, 2007)
CoCQA: our method
Base classifier: LibSVM
View 1: question text; View 2: answer text
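A 5-fold split of the labeled questions can be sketched as below. This is an illustration only: the function name is hypothetical, the folds are contiguous index ranges (in practice the data would be shuffled first), and the slides do not say whether the folds were stratified by category or class.

```python
def k_fold_splits(n_items, k=5):
    """Yield (train_indices, test_indices) for k contiguous folds.
    Fold sizes differ by at most one item."""
    fold_sizes = [n_items // k + (1 if i < n_items % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_items))
        yield train, test
        start += size
```

Each item appears in exactly one test fold, and the reported score would be the average of the per-fold scores.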

22
F1 for Supervised Learning
F1 with different sets of features
23
Semi-Supervised Learning: Adding unlabeled data
Comparison between Supervised, GE and CoCQA
24
CoCQA with varying K (number of new examples added in
each iteration)
25
CoCQA for varying iterations
26
CoCQA for varying amount of labeled data
27
Conclusions and Future Work

Problem: Non-topical text classification in CQA
CoCQA: a co-training framework that can exploit
information from both questions and answers
Case study: subjectivity classification for real
questions in CQA
We plan to explore
more sophisticated features
related variants of semi-supervised learning
other applications (sentiment classification)

28
Thank you!
Baoli Li: csblli_at_gmail.com
Yandong Liu: yandong.liu_at_emory.edu
Eugene Agichtein: eugene_at_mathcs.emory.edu
29
Performance of Subjective vs. Objective classes