Question-Answering on Yahoo!Answers: Preliminary Results

1
Question-Answering on Yahoo!Answers: Preliminary Results
  • Rong Tang
  • Sheila Denn
  • OCLC/ALISE LIS Research Grant Presentation
  • ALISE 2009
  • January 23, 2009

2
Background
  • Yahoo!Answers
  • Social Q&A
  • 25 pre-defined categories
  • Users post questions, answer questions, rate
    answers, provide comments
  • One best answer chosen by the asker or through
    vote
  • Users may provide comments

5
Rating/Voting/Commenting
6
Our Research Project
  • Funded by the OCLC/ALISE Grant Program and the
    Simmons College President's Fund for Research
  • Project Staff
      • Rong Tang (PI)
      • Sheila Denn (Co-PI)
      • Sam Kalat (technology consultant, programmer)
      • Laura Saunders (Research Assistant)
  • The project wiki page documents the relevant
    literature and project progression, with
    extensive meeting notes on coding decisions

7
Research Questions
  • Are existing question taxonomies (such as those
    in Graesser et al. (1994) and Freed (1994)) valid
    in a social Q&A environment?
  • What are the relationships between the linguistic
    characteristics, functional properties, and
    subject content of the questions and the kinds of
    responses that they receive?
  • What are the characteristics of answers that are
    chosen as best answers?
  • What is the role of the social function vs. the
    information function in social Q&A?
  • What are the implications of the above for
    provision of library and information services?

8
Previous Research
  • Question classification
      • Wh- questions (Robinson & Rackstraw, 1972)
      • Conceptual question categories (Lehnert, 1978)
      • Content-based question categories (Graesser et
        al., 1994)
      • Reference question classification (Pomerantz,
        2005)
      • Questions in Dynamic Semantics (Aloni, Butler, &
        Dekker, 2007)
  • Answer classification
      • Much less research here than on question
        classification
      • Answer selection rules (Lehnert, 1978)
      • Criteria based on Yahoo!Answers comments (Kim et
        al., 2007)

9
Previous Research (cont.)
  • Formal studies of online Q&A
      • Answerers: specialists vs. synthesists
        (Gazan, 2006)
      • Questioners: seekers vs. sloths (Gazan, 2007)
  • Question purpose (Graesser et al., 1994)
      • Filling knowledge gaps
      • Establishing and monitoring common ground
      • Coordinating social action
      • Directing the conversation and controlling
        attention

10
Research Plan
  • Data collection and sampling
      • Gathered a stratified random sample of 3,000
        question-answer sets, including any comments
      • Stratified by 25 top-level categories assigned by
        Yahoo!Answers
  • Data coding
      • Content analysis at multiple levels
          • Syntactic
          • Semantic
          • Pragmatic
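
A minimal sketch of how a stratified random sample like the one described above could be drawn. The slides do not say how the 3,000 sets were allocated across the 25 categories, so equal allocation (120 per category) is an assumption here, and the record layout, category names, and the `stratified_sample` helper are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(records, total, seed=0):
    """Draw an equal-allocation stratified random sample.

    records: list of (category, question_id) pairs (hypothetical layout).
    total:   overall sample size, split evenly across the categories found.
    """
    rng = random.Random(seed)  # fixed seed so the draw is repeatable

    # Group question ids by their top-level category (the strata).
    by_category = defaultdict(list)
    for category, question_id in records:
        by_category[category].append(question_id)

    # ASSUMPTION: equal allocation; the project may have used another scheme.
    per_stratum = total // len(by_category)

    sample = {}
    for category, ids in sorted(by_category.items()):
        # Never request more items than a stratum contains.
        k = min(per_stratum, len(ids))
        sample[category] = rng.sample(ids, k)
    return sample


# Toy population: 25 categories with 500 questions each (made-up data).
population = [
    (f"category_{c:02d}", f"q{c:02d}_{i}")
    for c in range(25)
    for i in range(500)
]
sample = stratified_sample(population, total=3000)
print(len(sample), sum(len(ids) for ids in sample.values()))
```

Sampling within each category separately keeps small categories from being crowded out by popular ones, which is the usual reason to stratify.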

11
Research Plan (cont.)
  • Data Analysis
      • Descriptive statistics will be produced for:
          • Frequency of answers provided per question
          • Average length of time to first answer
          • Distribution of subject categories
          • Distribution of question and answer types
          • Distribution of chosen answer types
      • Correlation analysis will be performed for:
          • Linguistic characteristics of questions and
            answers
          • Functional categories of questions and answers
          • Subject categories of questions and answers
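
As an illustration of the descriptive statistics listed above, the sketch below computes answers per question, average time to first answer, and the category distribution. The record structure, field names, and all values are invented for the example and are not the project's data.

```python
from collections import Counter
from datetime import datetime
from statistics import mean

# Made-up records; the field names are assumptions for illustration only.
questions = [
    {
        "category": "Health",
        "asked": datetime(2008, 10, 1, 12, 0),
        "answers": [datetime(2008, 10, 1, 12, 5), datetime(2008, 10, 1, 13, 0)],
    },
    {
        "category": "Pets",
        "asked": datetime(2008, 10, 2, 9, 0),
        "answers": [datetime(2008, 10, 2, 9, 20)],
    },
    {
        "category": "Health",
        "asked": datetime(2008, 10, 3, 18, 30),
        "answers": [],  # unanswered question
    },
]

# Frequency of answers provided per question.
answers_per_question = [len(q["answers"]) for q in questions]

# Minutes from posting to the earliest answer; unanswered questions are skipped.
minutes_to_first = [
    (min(q["answers"]) - q["asked"]).total_seconds() / 60
    for q in questions
    if q["answers"]
]

# Distribution of subject categories.
category_counts = Counter(q["category"] for q in questions)

print(mean(answers_per_question))
print(mean(minutes_to_first))
print(category_counts.most_common())
```

Whether unanswered questions should be excluded from the time-to-first-answer average, as done here, is a design choice the slides do not address.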

12
Progress to Date
  • Sample has been collected
  • Preliminary coding has begun
  • Syntactic coding of questions is complete
      • Wh- questions
      • Inversion questions
      • Other questions
      • Multiparts
      • Double coding
  • Syntactic coding of question descriptions is
    complete
      • Number of questions included in description text
      • Type of questions
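
The syntactic categories above were assigned by human coders. Purely as an illustration of the distinction between wh-, inversion, and other questions, here is a rough first-word heuristic. The word lists and the `syntactic_type` function are assumptions made for this sketch and are not the project's coding scheme.

```python
import re

# Hypothetical word lists; a real coding manual would be more nuanced.
WH_WORDS = {
    "what", "why", "how", "who", "whom", "whose", "which", "when", "where",
}
AUXILIARIES = {
    "is", "are", "was", "were", "am", "do", "does", "did", "can", "could",
    "will", "would", "should", "shall", "may", "might", "must", "has",
    "have", "had",
}


def syntactic_type(question):
    """Classify a question by its first word only (a crude approximation)."""
    words = re.findall(r"[a-z']+", question.lower())
    if not words:
        return "other"
    first = words[0]
    if first in WH_WORDS:
        return "wh:" + first
    if first in AUXILIARIES:
        # Subject-auxiliary inversion, e.g. "Is there ...?", "Can you ...?"
        return "inversion"
    return "other"


for text in [
    "Why do husbands feel they have to lie to other women about being married?",
    "Is there anywhere you can listen to citizen band radio online?",
    "WTF",
]:
    print(syntactic_type(text), "<-", text)
```

A first-word rule like this cannot detect multipart questions or cases that call for double coding, which is one reason manual coding with consensus is used.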

13
Data Coding
  • Two coders perform coding individually then go
    over the coding to reach consensus on final
    coding of each question
  • Use of informal language presents a challenge for
    coding
  • Is it a question if it doesn't include a question
    mark? Is it a question simply because it has a
    question mark at the end?
  • Should "WTF" be coded as a "what" question or an
    "other" question? Or not at all?
  • Coding multiparts of a question, e.g., "Why do
    husbands feel they have to lie to other women
    about being married, and when the other woman
    finds out?"
  • Double coding questions such as "Is there
    anywhere you can listen to citizen band radio
    online?"
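
The two-coder procedure described on this slide can be supported by a small script that lists the questions on which the independent codings differ, so the coders know which ones to discuss. This is a sketch with made-up codes and a hypothetical `compare_codings` helper; simple percent agreement is used only for illustration, and the slides do not say which agreement measure, if any, the project reports.

```python
def compare_codings(coder_a, coder_b):
    """Return (percent agreement, question ids needing discussion).

    coder_a, coder_b: dicts mapping question id -> assigned code.
    Only questions coded by both coders are compared.
    """
    shared = sorted(set(coder_a) & set(coder_b))
    disagreements = [q for q in shared if coder_a[q] != coder_b[q]]
    agreement = 1 - len(disagreements) / len(shared) if shared else 0.0
    return agreement, disagreements


# Made-up independent codings for four questions.
coder_a = {"q1": "what", "q2": "inversion", "q3": "other", "q4": "why"}
coder_b = {"q1": "what", "q2": "inversion", "q3": "what", "q4": "why"}

agreement, to_discuss = compare_codings(coder_a, coder_b)
print(agreement, to_discuss)  # 0.75 ['q3']
```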

14
Preliminary Results
15
Number of Answers Per Question
16
Length of Time to Receive First Answer
17
Wh-question frequency
  • What Questions

18
Wh-question frequency
  • Why Questions

19
Wh-question frequency
  • How Questions

20
Wh-question frequency
  • Inversion Questions

21
Next Steps
  • Start semantic and pragmatic analysis of
    questions
  • Start answer analysis
  • Start comment coding
  • Explore the association and features of Q and A
    and C
  • Develop a conceptual and analytical model for
    social Q&A

22
Questions?