An Exploratory Study of the W3C Mailing List Test Collection for Retrieval of Emails with Pro/Con Arguments

1
An Exploratory Study of the W3C Mailing List
Test Collection for Retrieval of Emails with
Pro/Con Arguments
  • Yejun Wu, Douglas W. Oard
  • University of Maryland, College Park
  • Ian Soboroff
  • National Institute of Standards and
    Technology

July 27-28, 2006
CEAS, Mountain View, CA
2
Outline
  • Build the test collection
  • Evaluate the test collection (intrinsic
    evaluation)
  • Use the test collection (extrinsic evaluation)
  • Next steps to improve the test collection

3
W3C Mailing List Corpus
w3c.org
NIST (6/2004)
Mailing lists: html-tidy_at_w3c.org, semantic-web_at_w3c.org, w3c-news_at_w3c.org, w3c-rdfcore-wg_at_w3c.org
Webpages
Parsing
Unique DocIDs: lists-000-9978864, lists-001-0094883, lists-003-9630221
174,311 emails, 515 MB
4
IR Test Collection Design
Query Formulation
Information seeking: documents, information needs, interactive process
Automatic Search
Docs
Interactive Selection
Measure system: 2 variations (system, user)
5
IR Test Collection Design
Topic Statement (by Assessors)
Freeze user.
Query Formulation
Test collection: documents, topic statements, relevance judgments, metric
Automatic Search
Docs
Ranked Lists
Relevance Judgments (by Assessors)
Evaluation
Evaluation Metric (Mean Average Precision)
6
DOCNO="lists-000-9978864"
RECEIVED="Sat Mar 18 08:56:28 2000"
ISORECEIVED="20000318135628"
SENT="Fri, 10 Mar 2000 13:26:29 -0500 (EST)"
ISOSENT="20000310182629"
NAME="Kerri Golden"
EMAIL="KGolden_at_Hynet.com"
SUBJECT="RTF Word 2000 spec?"
ID="C14D28BA032AD3118BE000104B87DDEC18A39A_at_solomon.hynet.com"
EXPIRES="-1"
To: html-tidy_at_w3.org

We are trying to convert Word 2000 docs to XML. Our converter worked fine for
W97 documents, but W2000 has a much different RTF format (tables especially).
Does anyone know where I can get a hold of a spec for this version of RTF?
thanks
Kerri Golden
kgolden_at_hynet.com
7
Topic Statement
  • TopicID: DS8
  • Query: html vs. xhtml
  • Narrative: A relevant message will compare the
    advantages/disadvantages of the two standards.

8
Pool Top 50 Docs/Run for Relevance Judgments

Rank: 1, 2, 3, 4, ..., 50
Team1 Run1: lists-000-9978864, lists-000-7643767, lists-011-6087388, lists-012-1019722, ..., lists-008-2365001
Team2 Run3: lists-009-8065221, lists-006-2570023, lists-000-9978864, lists-012-2365001, ..., lists-005-5500248
Team 12 Run2: lists-000-7643767, lists-012-2365001, lists-004-0205442, lists-003-6603021, ..., lists-009-8065221
12 teams × 3 runs = 36 runs
Average 529 emails/topic
Researchers as assessors
Relevance Judgments
lists-000-9978864: Topic? Pro/Con?
lists-000-7643767: Topic? Pro/Con?
lists-008-2365001: Topic? Pro/Con?
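The pooling step above can be sketched as a set union over the top-ranked documents of every submitted run. This is a minimal illustration, not the procedure's actual implementation: the function name `build_pool`, the run labels, and the toy depth of 3 are assumptions, and the document IDs are reused from the slide only as sample values.

```python
def build_pool(runs, depth=50):
    """Return the union of the top-`depth` document IDs from every run.

    `runs` maps a run label to its ranked list of document IDs.
    Documents retrieved by several runs appear in the pool only once.
    """
    pool = set()
    for ranked_docs in runs.values():
        pool.update(ranked_docs[:depth])
    return pool


# Toy runs (hypothetical contents); depth=3 only keeps the example small.
runs = {
    "Team1-Run1": ["lists-000-9978864", "lists-000-7643767", "lists-012-1019722"],
    "Team2-Run3": ["lists-009-8065221", "lists-006-2570023", "lists-000-9978864"],
    "Team12-Run2": ["lists-000-7643767", "lists-003-6603021", "lists-009-8065221"],
}
pool = build_pool(runs, depth=3)
print(len(pool), "documents to judge")
```

Because runs overlap, the pool is smaller than runs × depth, which is why 36 runs at depth 50 can average far fewer than 1,800 emails per topic.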
9
Use of Test Collection
Measure systems of ranked retrieval
Prec.: 1.00 1.00 0.60 0.57 0.50
Rel?: ? ? ? ? ?
Avg. Prec. (AP): 0.73
Difference is not significant (two-tailed, p < 0.05)
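The AP value on this slide is the mean of the precision values measured at each relevant document. The sketch below reproduces it under one assumption: the relevant documents sit at ranks 1, 2, 5, 7 and 10, which is the placement consistent with the listed precisions (1.00, 1.00, 0.60, 0.57, 0.50). The function name and the optional `num_relevant` argument are illustrative.

```python
def average_precision(relevance, num_relevant=None):
    """Average precision for one ranked list.

    `relevance` is a list of booleans in rank order. Precision is taken at
    each relevant document and averaged over `num_relevant` (defaults to the
    number of relevant documents actually retrieved).
    """
    hits = 0
    precisions = []
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precisions.append(hits / rank)
    total = num_relevant if num_relevant is not None else hits
    return sum(precisions) / total if total else 0.0


# Assumed relevant ranks: 1, 2, 5, 7, 10
ranked = [True, True, False, False, True, False, True, False, False, True]
ap = average_precision(ranked)
print(round(ap, 2))  # 0.73
```

Mean Average Precision (MAP) is then the mean of this value over all topics.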
10
Emerging Topic Types
  • Type/Category: Method, tip, solution
  • Example 1
  • Query: Annotea installation
  • Narrative: A relevant message will provide at
    least a tip on Annotea installation.
  • Example 2
  • Query: file upload http
  • Narrative: A relevant message will discuss
    methods of doing file uploads using http.

11
Topic Type Analysis
Find categories amenable to pro/con classification
12
Measuring Agreement
Judge1: lists-000-9874732, lists-001-0683001, lists-003-0000221, lists-004-8436200, lists-002-8833514
Judge2: lists-000-9874732, lists-001-0683001, lists-003-0000221, lists-004-8436200, lists-002-8833514

             Judge1 R   Judge1 NR   Total
Judge2 R     a          b           a+b
Judge2 NR    c          d           c+d
Total        a+c        b+d         a+b+c+d = N

Overlap = a / (a + b + c)
Kappa (Cohen's Kappa): chance-corrected overlap
Overlap scale: Perfect = 1, None = 0
Kappa scale: Perfect = 1, Chance = 0, Inverse = -1
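Both agreement measures can be computed directly from the four cells of the 2×2 table. The sketch below uses the slide's cell names (a = both judges say relevant, d = both say non-relevant, b and c = the disagreements). The overlap is the shared relevant set over the union of relevant sets; the kappa expression is the standard Cohen's kappa definition, written out here as an assumption since the slide's formula did not survive. The counts in the example are made up.

```python
def overlap(a, b, c):
    """Documents both judges marked relevant, over documents either marked relevant."""
    return a / (a + b + c)


def cohens_kappa(a, b, c, d):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = a + b + c + d
    p_observed = (a + d) / n
    # Chance agreement from the row and column marginals of the 2x2 table.
    p_chance = ((a + b) * (a + c) + (c + d) * (b + d)) / (n * n)
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical judgment counts for one topic.
a, b, c, d = 20, 5, 10, 15
print(round(overlap(a, b, c), 3), round(cohens_kappa(a, b, c, d), 3))
```

With b = c = 0 kappa reaches 1 (perfect agreement); with a = d = 0 and b = c it reaches -1 (inverse agreement), matching the endpoints of the scale on the slide.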
13
Assessor Agreement by Category
Overlap
Kappa
Correlation between Overlap and Kappa > 0.9,
significant at p < 0.01
14
Effect of Disagreement on Ranking
Primary Judge: 1 2 3 4 5
Secondary Judge: 1 2 3 4 5
Kendall's Tau example: 1 - 3/5 = 0.4
W3C Tau
Topical relevance: 0.763 (significant)
Pro/con relevance: 0.776 (significant)
Typical text retrieval: rankings identical if Tau > 0.9
Important difference in relevance judgment
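Kendall's tau compares the system rankings produced under the two judges' relevance judgments by counting pairs of systems ordered the same way (concordant) or the opposite way (discordant). The sketch below is a plain pairwise implementation for rankings without ties. The two five-system rankings are hypothetical, chosen so that 3 of the 10 pairs are discordant, which gives the 0.4 shown in the slide's worked example (1 - 2·3/10 = 1 - 3/5).

```python
from itertools import combinations


def kendall_tau(ranking_a, ranking_b):
    """Kendall's tau for two rankings of the same items (no ties)."""
    position_b = {item: i for i, item in enumerate(ranking_b)}
    concordant = discordant = 0
    # combinations() yields pairs (x, y) with x ranked above y in ranking_a.
    for x, y in combinations(ranking_a, 2):
        if position_b[x] < position_b[y]:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical system rankings under the primary and secondary judge.
primary = [1, 2, 3, 4, 5]
secondary = [3, 1, 2, 5, 4]
tau = kendall_tau(primary, secondary)
print(tau)  # 0.4
```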
15
Outline
  • Intrinsic evaluation
  • -- topic type analysis
  • -- inter-assessor agreement analysis
  • Extrinsic evaluation
  • -- Use W3C to evaluate a topic pro/con
    system

16
Experiment Design: Round Robin
48 Training Topics
Pro/Con
Non-Pro/Con
Pro/Con feature
Topic
48-fold Cross-Validation
Top N terms (N = 100)
INQUERY Query
1 Evaluation Topic
Search
Ranked List
Query relevance set (Relevance Judgments)
Evaluation
MAP
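The round-robin design reads as a leave-one-topic-out loop: each topic in turn is held out for evaluation while the remaining topics supply the training data for the pro/con features. The sketch below shows only the fold construction. The function name and topic IDs are hypothetical, and the training, INQUERY search and MAP steps are left out.

```python
def leave_one_topic_out(topic_ids):
    """Yield (training_topics, evaluation_topic) pairs, one per topic."""
    for held_out in topic_ids:
        training = [t for t in topic_ids if t != held_out]
        yield training, held_out


# Hypothetical topic IDs; the real collection has many more topics.
topics = ["T1", "T2", "T3", "T4"]
folds = list(leave_one_topic_out(topics))
for training, evaluation in folds:
    print(f"train on {training}, evaluate on {evaluation}")
```

Keeping the evaluation topic out of the training set ensures the pro/con terms are never learned from the judgments they are later scored against.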
17
Compare Two Systems
Topic Retrieval (Baseline). Query: 100 topic terms (browser, technology, support, incompatibility). MAP: 0.2743
Topic Pro/Con Retrieval (Rocchio). Query: 30 topic terms (browser, technology, support, incompatibility) + 70 pro/con terms (advantage, strength, weakness). MAP: 0.2857
4.2% relative improvement. Significant (p < 0.05, Wilcoxon signed-rank test)
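The relative improvement is the MAP difference divided by the baseline MAP. Recomputing it from the two rounded MAP values on this slide gives about 4.2%, the figure quoted on the conclusion slide. The variable names are illustrative.

```python
baseline_map = 0.2743  # Topic Retrieval (Baseline)
procon_map = 0.2857    # Topic Pro/Con Retrieval (Rocchio)

relative_improvement = (procon_map - baseline_map) / baseline_map
print(f"{relative_improvement:.1%}")  # 4.2%
```

The significance test itself needs the paired per-topic AP values, which the slide does not list. Given those, a paired test such as `scipy.stats.wilcoxon` could be applied to the two per-topic lists.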
18
All Topics
19
Topic Type A
20
Topic Type B
21
Topic Type C
22
Topic Type D
23
Topic Type E
24
Effects of Topic and Topic Types
Overlap
Kappa
  • Two-way ANOVA
  • Topic difficulty levels: 27 improved, 16 hurt, 7
    unused
  • Topic types: A, B, C, D, E, F

Topic, Topic Type
Pro/Con Relevance Agreement: Sig. (p < 0.05)
Topical Relevance Agreement
25
Conclusion Test Collection Evaluation
  • Test collection generally useful
  • Important differences in judgments
  • Relevance judgments could be improved
  • Topic type is a factor in agreement on pro/con
    relevance
  • Some categories are less of a pro/con nature
  • -- B (method, tip, solution) does not lead to
    pro/con
  • -- C (discuss an issue) is vague
  • Rocchio-style system: 4.2% improvement in MAP
  • Major improvements in A and E
  • Pro/con relevance judgments useful.

26
Future Work Better Test Collection Design
  • Balance topic types
  • -- half of the topics are in A.
  • -- F (reason, design rationale) has 1 topic.
  • Study information needs and search process
  • Improve the process
  • -- e.g., better defining topics for pro/con
  • Use within-category topics for training
  • -- examine the quality of training data by
    category
  • Other classification methods: SVM, Naïve Bayes
  • Separate models for detecting pros and cons.
  • THANKS!

27
Pro/Con Feature Selection

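Training topics: Topic1, Topic2, ..., Topic48
Pro/Con docs: 20, 15, 8
Non-Pro/Con docs: 18, 30, 5
Topic weight: log(15+1), log(20+1), log(5+1)
Term "advantage", TF in Pro/Con docs: 38+1, 40+1, 30+1
Term "advantage", TF in Non-Pro/Con docs: 10+1, 10+1, 28+1
Per-topic weights for "advantage":
log 21 × log[(39/20) / (11/30)]
log 6 × log[(31/8) / (29/5)]
log 16 × log[(41/15) / (11/18)]
Candidate terms: strength, Microsoft, Html, opinion
Ranked pro/con terms (1, 2, 3, 4, 5, ..., 100): advantage, strength, weakness, hate, opinion, wow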
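One reading of the worked numbers on this slide is that each topic contributes a topic weight times the log ratio of the add-one-smoothed term frequency per document in pro/con versus non-pro/con documents. The sketch below follows that reading. Summing the per-topic contributions, the natural logarithm, and the function and field names are all assumptions. The three tuples are taken from the legible expressions (39/20 vs. 11/30, 41/15 vs. 11/18, 31/8 vs. 29/5).

```python
import math


def topic_contribution(tf_pos, n_pos, tf_neg, n_neg, weight_count):
    """One topic's contribution to a term's pro/con score.

    tf_pos / tf_neg: term frequency in pro/con / non-pro/con documents.
    n_pos / n_neg:   number of pro/con / non-pro/con documents.
    weight_count:    count used for the topic weight log(count + 1).
    """
    ratio = ((tf_pos + 1) / n_pos) / ((tf_neg + 1) / n_neg)
    return math.log(weight_count + 1) * math.log(ratio)


def procon_score(per_topic_counts):
    """Assumed aggregation: sum the contributions over training topics."""
    return sum(topic_contribution(*counts) for counts in per_topic_counts)


# (tf_pos, n_pos, tf_neg, n_neg, weight_count) for the term "advantage".
advantage_counts = [
    (38, 20, 10, 30, 20),  # log 21 x log[(39/20) / (11/30)]
    (40, 15, 10, 18, 15),  # log 16 x log[(41/15) / (11/18)]
    (30, 8, 28, 5, 5),     # log 6  x log[(31/8) / (29/5)]
]
score = procon_score(advantage_counts)
print(round(score, 3))
```

A positive score marks a term that is relatively more frequent in pro/con documents. Ranking all candidate terms by this score and keeping the top N would give a list like the one on the slide.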
28
Feature Selection
  • Pro/con feature vector term weight: log odds ratio
Pos: Pro/Con relevant documents
Neg: Non-Pro/Con relevant documents
29
Rocchio-style Implementation
  • Appropriate for topic and pro/con retrieval.
  • Baseline classifier to test the utility of test
    collection
  • Expanded query
  • Q0: initial query; Q1: expanded query.
  • Ri: vectors from positive docs
  • Si: vectors from negative docs
  • ?, ? parameters
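The expansion formula itself did not survive in the transcript. A minimal sketch, assuming the textbook Rocchio form Q1 = Q0 + beta · mean(Ri) - gamma · mean(Si), is given below. The parameter names `beta` and `gamma`, their values, and the toy term vectors are assumptions, and the study's system built INQUERY queries rather than running this code.

```python
def rocchio_expand(q0, positive_docs, negative_docs, beta=0.5, gamma=0.5):
    """Return Q1 = Q0 + beta * mean(positive) - gamma * mean(negative).

    Queries and documents are sparse term-weight vectors stored as dicts.
    """
    q1 = dict(q0)
    for docs, factor in ((positive_docs, beta), (negative_docs, -gamma)):
        if not docs:
            continue  # nothing to add for an empty document set
        for vector in docs:
            for term, weight in vector.items():
                q1[term] = q1.get(term, 0.0) + factor * weight / len(docs)
    return q1


# Hypothetical vectors: Ri from pro/con documents, Si from non-pro/con documents.
q0 = {"browser": 1.0}
r = [{"advantage": 2.0, "browser": 1.0}, {"advantage": 1.0}]
s = [{"install": 2.0}]
q1 = rocchio_expand(q0, r, s)
print(q1)
```

Terms frequent in positive documents gain weight, terms frequent in negative documents lose it, and the top-weighted terms can then serve as the expanded query.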