Ranking Users for Intelligent Message Addressing - PowerPoint PPT Presentation

About This Presentation
Title:

Ranking Users for Intelligent Message Addressing

Description:

Relation to Expert Finding. Email message → (long) query. Email addresses → experts ... Instead...write-then-address behavior. Related Work. Expert finding in Email ... – PowerPoint PPT presentation

Number of Views:50
Avg rating:3.0/5.0
Slides: 42
Provided by: vitorrocha
Learn more at: http://www.cs.cmu.edu
Transcript and Presenter's Notes



1
Ranking Users for Intelligent Message Addressing
  • Vitor R. Carvalho and William Cohen
  • Carnegie Mellon University
  • Glasgow, April 2nd 2008

2
Outline
  • Intelligent Message Addressing
  • Models
  • Data Experiments
  • Email Auto-completion
  • Mozilla Thunderbird Extension
  • Learning to Rank Results

3
(No Transcript)
4
Ramesh Nallapati <ramesh@cs.cmu.edu>  Add
William Cohen <wcohen@cs.cmu.edu>  Add
Akiko Matsui <akiko@cs.cmu.edu>  Add
Yifen Huang <hyfen@andrew.cmu.edu>  Add
5
Ramesh Nallapati <ramesh@cs.cmu.edu>  Add
William Cohen <wcohen@cs.cmu.edu>  Add
Akiko Matsui <akiko@cs.cmu.edu>  Add
Yifen Huang <hyfen@andrew.cmu.edu>  Add
6
Ramesh Nallapati <ramesh@cs.cmu.edu>  Add
Akiko Matsui <akiko@cs.cmu.edu>  Add
Yifen Huang <hyfen@andrew.cmu.edu>  Add
7
einat <einat@cs.cmu.edu>  Add
Ramesh Nallapati <ramesh@cs.cmu.edu>  Add
Jon Elsas <jelsas@cs.cmu.edu>  Add
Andrew Arnold <aard@andrew.cmu.edu>  Add
8
einat <einat@cs.cmu.edu>  Add
Ramesh Nallapati <ramesh@cs.cmu.edu>  Add
Jon Elsas <jelsas@cs.cmu.edu>  Add
Andrew Arnold <aard@andrew.cmu.edu>  Add
9
Ramesh Nallapati <ramesh@cs.cmu.edu>  Add
Jon Elsas <jelsas@cs.cmu.edu>  Add
Andrew Arnold <aard@andrew.cmu.edu>  Add
10
Tom Mitchell <tom@cs.cmu.edu>  Add
Andrew Arnold <aard@andrew.cmu.edu>  Add
Jon Elsas <jelsas@cs.cmu.edu>  Add
Frank Lin <frank@cs.cmu.edu>  Add
11
Tom Mitchell <tom@cs.cmu.edu>  Add
Andrew Arnold <aard@andrew.cmu.edu>  Add
Jon Elsas <jelsas@cs.cmu.edu>  Add
Frank Lin <frank@cs.cmu.edu>  Add
12
The Task: Intelligent Message Addressing
  • Predicting likely recipients of email messages,
    given
  • (1) the contents of the message being composed
  • (2) other recipients already specified
  • (3) a few initial letters of the intended
    recipient's contact (intelligent auto-completion).

13
What for?
  • Identifying people related to specific topics (or
    who have specific relevant skills)
  • Relation to Expert Finding [Dom et al., 03;
    Campbell et al., 03]
  • Email message → (long) query
  • Email addresses → experts
  • Improved Email Address Auto-completion
  • Preventing high-cost management errors,
    particularly in large corporations
  • People simply forget to add important recipients
  • Preventing costly misunderstandings,
    communication delays, and
    missed opportunities.
14
How Frequent are These Errors?
  • Grep for "forgot", "sorry" or "accident"
  • in the Enron Email corpus: half a million real
    email messages from a large corporation.
  • "Sorry, I forgot to CC you his final offer"
  • "Oops, I forgot to send it to Vince."
  • "Adding John to the discussion... (sorry John)"
  • "Sorry... missed your name on the cc list!"
  • More frequent than expected
  • At least 9.27% of the users forgot to add a
    desired email recipient.
  • At least 20.52% of the users were not included as
    recipients (even though they were intended
    recipients) in at least one received message.
  • These are lower bounds.

15
Two Ranking Tasks
TO+CC+BCC Prediction (predict all recipients of a message)
CC+BCC Prediction (predict remaining recipients, given those
already specified in the TO field)
16
Models
  • Non-textual Models
  • Frequency only
  • Recency only
  • Expert Finding Models [Balog et al., 2006]
  • M1: Candidate Model
  • M2: Document Model
  • Rocchio (TFIDF)
  • K-Nearest Neighbors (KNN)
  • Rank Aggregation of the above
17
Non-Textual Models
  • Frequency model
  • Rank by total number of messages in the training
    set
  • Recency Model
  • Exponential decay on chronologically ordered
    messages.
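The two non-textual baselines can be sketched as follows. The exact decay form `exp(-age/beta)` is an assumption; the slide only states an exponential decay (with β = 100, per the parameters slide) over chronologically ordered messages.

```python
from math import exp

def frequency_rank(messages):
    """Frequency model: rank candidates by how many training
    messages were addressed to them."""
    counts = {}
    for _, recipients in messages:
        for r in recipients:
            counts[r] = counts.get(r, 0) + 1
    return sorted(counts, key=counts.get, reverse=True)

def recency_rank(messages, beta=100.0):
    """Recency model: like frequency, but each message's vote
    decays exponentially with its age.  `messages` are
    (timestamp, recipients) pairs in chronological order; the
    decay form exp(-age/beta) is an assumption."""
    n = len(messages)
    scores = {}
    for i, (_, recipients) in enumerate(messages):
        weight = exp(-(n - 1 - i) / beta)  # newest message weighs 1.0
        for r in recipients:
            scores[r] = scores.get(r, 0.0) + weight
    return sorted(scores, key=scores.get, reverse=True)
```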
18
Expert Search Models
  • M1: Candidate Model [Balog et al., 2006]
  • M2: Document Model [Balog et al., 2006]

f(doc, ca) is estimated as user-centric (UC) or
document-centric (DC)
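A minimal sketch of the M2 (document) model applied to recipient ranking: each candidate ca is scored by the sum over documents of p(query | doc) · f(doc, ca). The Jelinek-Mercer smoothing and the binary stand-in for f(doc, ca) are assumptions; the slide only names the model's components.

```python
import math

def m2_scores(query_terms, docs, assoc, lam=0.1):
    """M2 document model sketch [Balog et al., 2006]:
    score(ca) = sum over docs of p(query | doc) * f(doc, ca).
    docs: doc_id -> token list; assoc: doc_id -> set of candidate
    addresses (a binary f(doc, ca))."""
    # collection term counts for smoothing
    coll, total = {}, 0
    for toks in docs.values():
        for t in toks:
            coll[t] = coll.get(t, 0) + 1
            total += 1
    scores = {}
    for doc_id, toks in docs.items():
        n = len(toks)
        lp = 0.0  # log p(query | doc), Jelinek-Mercer smoothed
        for t in query_terms:
            p_doc = toks.count(t) / n if n else 0.0
            p_coll = coll.get(t, 0) / total if total else 0.0
            lp += math.log((1 - lam) * p_doc + lam * p_coll + 1e-12)
        for ca in assoc.get(doc_id, ()):
            scores[ca] = scores.get(ca, 0.0) + math.exp(lp)
    return sorted(scores, key=scores.get, reverse=True)
```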
19
Other Models
  • Rocchio (TFIDF) [Joachims, 1997; Salton &
    Buckley, 1988]
  • K-Nearest Neighbors [Yang & Liu, 1999]
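The KNN model can be sketched as follows: take the K training messages most similar to the message under composition and credit each of their recipients with the similarity score. Sparse term→weight dicts stand in for TFIDF vectors; summing raw cosine scores is an assumption about the exact weighting.

```python
import math

def cosine(a, b):
    """Cosine similarity of two sparse term->weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_rank(query, train, k=30):
    """Rank candidate recipients by summed similarity of the k
    nearest training messages (KNN model sketch).  `train` is a
    list of (vector, recipients) pairs."""
    neigh = sorted(train, key=lambda p: cosine(query, p[0]),
                   reverse=True)[:k]
    scores = {}
    for vec, recipients in neigh:
        sim = cosine(query, vec)
        for r in recipients:
            scores[r] = scores.get(r, 0.0) + sim
    return sorted(scores, key=scores.get, reverse=True)
```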

20
Model Parameters
  • Chosen from preliminary tests.
  • Recency: β = 100 (from {10, 20, 50, 100, 200, 500})
  • KNN: K = 30 (from {3, 5, 10, 20, 30, 40, 50, 100})
  • Rocchio's β = 0 (from {0, 0.1, 0.25, 0.5})

21
Data: Enron Email Collection
  • Some good reasons
  • Large: half a million messages
  • Natural work-related email, not email lists
  • Public and free
  • Different roles: managers, assistants, etc.
  • Downsides
  • No clear message thread information
  • No complete Address Book information
  • no first/last/full names of many recipients

22
Enron Data Preprocessing
  • Set up a realistic temporal split (per user)
  • For each user, the 10 most recent sent messages
    are used as test
  • 36 users
  • All users had their Address Books (AB) extracted

CC+BCC
TO+CC+BCC
23
Enron Data Preprocessing
  • Bag-of-words representation
  • Messages were represented as the union of the BOW
    of the body and the BOW of the subject
  • Removed inconsistencies and repeated messages
  • Disambiguated several Enron addresses
  • Stop words removed; no stemming
  • Self-addressed messages were removed

24
Threading
  • No explicit thread information in Enron; try to
    reconstruct it.
  • Build the Message Thread Set MTS(msg)
  • the set of messages with the same subject as the
    current one.
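The thread reconstruction can be sketched as below. Treating "Re:"/"Fw:" variants as the "same subject" is an assumption beyond the slide's wording, which only mentions matching subjects.

```python
import re

def norm_subject(s):
    # Strip reply/forward prefixes and case; an assumption about
    # what counts as the "same subject".
    return re.sub(r'^(\s*(re|fw|fwd)\s*:\s*)+', '',
                  s.strip(), flags=re.I).lower()

def message_thread_set(msg, corpus):
    """MTS(msg): all other messages in the corpus whose subject
    matches the current message's subject."""
    subj = norm_subject(msg["subject"])
    return [m for m in corpus
            if m is not msg and norm_subject(m["subject"]) == subj]
```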

25
Results
26
Results
27
Results
28
Rank Aggregation
Ranking combined by Reciprocal Rank
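Combining by reciprocal rank can be sketched as below: each base ranking contributes 1/rank for every candidate it returns. The exact normalization used in the talk is not specified, so this is a minimal version.

```python
def reciprocal_rank_fusion(rankings):
    """Aggregate base rankings by summed reciprocal rank: a
    candidate at position r in a list earns 1/r from that list."""
    scores = {}
    for ranking in rankings:
        for rank, cand in enumerate(ranking, start=1):
            scores[cand] = scores.get(cand, 0.0) + 1.0 / rank
    return sorted(scores, key=scores.get, reverse=True)
```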
29
Rank Aggregation Results
30
Observations
  • Threading improves MAP for all models
  • KNN seems to be the best choice overall
  • a document model focused on a few top docs
  • The Data Fusion method for rank aggregation
    improved performance significantly
  • Base systems make different types of mistakes

31
Intelligent Email Auto-completion
TO+CC+BCC
CC+BCC
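Intelligent auto-completion keeps the model's ranked order and filters it by the letters typed so far. Matching on the start of either the display name or the address is an assumption; the slides only state that a few initial letters of the intended contact are given.

```python
def intelligent_autocomplete(prefix, ranked_contacts):
    """Filter a model-ranked contact list by a typed prefix,
    preserving the ranking order.  Each contact is a dict with
    'name' and 'email' keys (representation assumed)."""
    p = prefix.lower()
    return [c for c in ranked_contacts
            if c["name"].lower().startswith(p)
            or c["email"].lower().startswith(p)]
```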
32
Intelligent Email Auto-completion
33
Mozilla Thunderbird extension (Cut Once)
Suggestions Click to add
34
Mozilla Thunderbird extension (Cut Once)
  • Interested?
  • Just google
  • mozilla extension carnegie mellon
  • User Study using Cut Once
  • Instead, users adopted a write-then-address
    behavior

35
Related Work
  • Expert finding in Email
  • Dom et al. (SIGMOD-03), Campbell et al. (CIKM-03)
  • Soboroff, Craswell, de Vries (TREC Enterprise
    2005-06-07...): Expert finding task on the W3C
    corpus
  • CC Prediction
  • Short paper with the initial idea: a single user,
    limited evaluation, non-public data [Pal &
    McCallum, 06]

36
Can we do better ranking?
  • Learning to Rank: machine learning to improve
    ranking
  • Feature-based ranking function
  • Many recently proposed methods
  • RankSVM [Joachims, KDD-02]
  • ListNet [Cao et al., ICML-07]
  • RankBoost [Freund et al., 2003]
  • Perceptron Variations [Elsas, Carvalho &
    Carbonell, WSDM-08]
  • Online, scalable.
37
Learning to Rank Recipients
  • Ranking scores as features
  • Textual Scores (KNN)
  • Network Scores
  • Frequency score
  • Recency score
  • Co-Occurrence Features

Combine the textual feature (KNN scores) with
other network features
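A pairwise online ranking-perceptron update, in the spirit of the perceptron variations cited above, can be sketched as follows: if a true recipient's feature vector does not outscore a non-recipient's, move the weights toward their difference. Margins and weight averaging are omitted in this sketch.

```python
def perceptron_pair_update(w, pos, neg, lr=1.0):
    """One pairwise update of an online ranking perceptron.
    w: weight vector; pos: feature vector of a true recipient;
    neg: feature vector of a non-recipient.  Update only when the
    pair is mis-ordered (or tied)."""
    score = lambda f: sum(wi * fi for wi, fi in zip(w, f))
    if score(pos) <= score(neg):
        w = [wi + lr * (p - n) for wi, p, n in zip(w, pos, neg)]
    return w
```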
38
Learning to Rank Recipients Results
39
Conclusions
  • Problem: Predicting recipients of email messages
  • Useful for email auto-completion, finding related
    people, and managing addressing errors
  • Evidence from a large email collection
  • Two subtasks: TO+CC+BCC and CC+BCC
  • Various models; KNN the best model in general
  • Rank Aggregation improved performance
  • Improvements in email auto-completion
  • Thunderbird Extension (Cut Once)
  • Promising results on learning to rank recipients

40
Thank you
41
Thank you
42
Comments
(Thanks, reviewers!)
  • No account for email structural info (body ≠
    subject ≠ quoted text)
  • Identifying named entities (Dear Mr. X, etc.)
  • Implicitly doing this, but could be better
  • Enron did not provide many first/last names
  • Fair estimation of f(doc, ca) on email?
  • Might explain the weaker performance of M2 models.