Ranking Users for Intelligent Message Addressing - PowerPoint PPT Presentation

About This Presentation
Title:

Ranking Users for Intelligent Message Addressing

Description:

Relation to Expert Finding. Email message → (long) query. Email addresses → experts ... Instead...write-then-address behavior. Related Work. Expert finding in Email ... – PowerPoint PPT presentation

Number of Views:50
Avg rating:3.0/5.0
Slides: 42
Provided by: vitorrocha
Learn more at: http://www.cs.cmu.edu
Transcript and Presenter's Notes



1
Ranking Users for Intelligent Message Addressing
  • Vitor R. Carvalho and William Cohen
  • Carnegie Mellon University
  • Glasgow, April 2nd 2008

2
Outline
  • Intelligent Message Addressing
  • Models
  • Data Experiments
  • Email Auto-completion
  • Mozilla Thunderbird Extension
  • Learning to Rank Results

3
(No Transcript)
4
Ramesh Nallapati <ramesh@cs.cmu.edu>  Add
William Cohen <wcohen@cs.cmu.edu>  Add
Akiko Matsui <akiko@cs.cmu.edu>  Add
Yifen Huang <hyfen@andrew.cmu.edu>  Add
5
Ramesh Nallapati <ramesh@cs.cmu.edu>  Add
William Cohen <wcohen@cs.cmu.edu>  Add
Akiko Matsui <akiko@cs.cmu.edu>  Add
Yifen Huang <hyfen@andrew.cmu.edu>  Add
6
Ramesh Nallapati <ramesh@cs.cmu.edu>  Add
Akiko Matsui <akiko@cs.cmu.edu>  Add
Yifen Huang <hyfen@andrew.cmu.edu>  Add
7
einat <einat@cs.cmu.edu>  Add
Ramesh Nallapati <ramesh@cs.cmu.edu>  Add
Jon Elsas <jelsas@cs.cmu.edu>  Add
Andrew Arnold <aard@andrew.cmu.edu>  Add
8
einat <einat@cs.cmu.edu>  Add
Ramesh Nallapati <ramesh@cs.cmu.edu>  Add
Jon Elsas <jelsas@cs.cmu.edu>  Add
Andrew Arnold <aard@andrew.cmu.edu>  Add
9
Ramesh Nallapati <ramesh@cs.cmu.edu>  Add
Jon Elsas <jelsas@cs.cmu.edu>  Add
Andrew Arnold <aard@andrew.cmu.edu>  Add
10
Tom Mitchell <tom@cs.cmu.edu>  Add
Andrew Arnold <aard@andrew.cmu.edu>  Add
Jon Elsas <jelsas@cs.cmu.edu>  Add
Frank Lin <frank@cs.cmu.edu>  Add
11
Tom Mitchell <tom@cs.cmu.edu>  Add
Andrew Arnold <aard@andrew.cmu.edu>  Add
Jon Elsas <jelsas@cs.cmu.edu>  Add
Frank Lin <frank@cs.cmu.edu>  Add
12
The Task: Intelligent Message Addressing
  • Predicting likely recipients of email messages,
    given
  • (1) the contents of the message being composed
  • (2) other recipients already specified
  • (3) a few initial letters of the intended
    recipient's contact (intelligent auto-completion).

13
What for?
  • Identifying people related to specific topics (or
    who have specific relevant skills)
  • Relation to Expert Finding [Dom et al., 03;
    Campbell et al., 03]
  • Email message → (long) query
  • Email addresses → experts
  • Improved Email Address Auto-completion
  • Preventing high-cost management errors,
    particularly in large corporations
  • People simply forget to add important recipients
  • Preventing costly misunderstandings,
    communication delays, and
    missed opportunities.
14
How Frequent are These Errors?
  • Grep for "forgot", "sorry" or "accident"
  • in the Enron Email corpus: half a million real
    email messages from a large corporation.
  • "Sorry, I forgot to CC you his final offer"
  • "Oops, I forgot to send it to Vince."
  • "Adding John to the discussion... (sorry John)"
  • "Sorry... missed your name on the cc list!"
  • More frequent than expected
  • At least 9.27% of the users forgot to add a
    desired email recipient.
  • At least 20.52% of the users were not included as
    recipients (even though they were intended
    recipients) in at least one received message.
  • These are lower bounds.

15
Two Ranking Tasks
TO+CC+BCC Prediction (predict all recipients of a message)
CC+BCC Prediction (predict remaining recipients, given those
already specified in the TO field)
16
Models
  • Non-textual Models
  • Frequency only
  • Recency only
  • Expert Finding Models [Balog et al., 2006]
  • M1: Candidate Model
  • M2: Document Model
  • Rocchio (TFIDF)
  • K-Nearest Neighbors (KNN)
  • Rank Aggregation of the above
17
Non-Textual Models
  • Frequency model
  • Rank by total number of messages in the training
    set
  • Recency Model
  • Exponential decay on chronologically ordered
    messages.
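The two non-textual baselines can be sketched as follows. The exact decay form `exp(-age/beta)` is an assumption; the slide only states an exponential decay (with β = 100, per the parameters slide) over chronologically ordered messages.

```python
from math import exp

def frequency_rank(messages):
    """Frequency model: rank candidates by how many training
    messages were addressed to them."""
    counts = {}
    for _, recipients in messages:
        for r in recipients:
            counts[r] = counts.get(r, 0) + 1
    return sorted(counts, key=counts.get, reverse=True)

def recency_rank(messages, beta=100.0):
    """Recency model: like frequency, but each message's vote
    decays exponentially with its age.  `messages` are
    (timestamp, recipients) pairs in chronological order; the
    decay form exp(-age/beta) is an assumption."""
    n = len(messages)
    scores = {}
    for i, (_, recipients) in enumerate(messages):
        weight = exp(-(n - 1 - i) / beta)  # newest message weighs 1.0
        for r in recipients:
            scores[r] = scores.get(r, 0.0) + weight
    return sorted(scores, key=scores.get, reverse=True)
```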
18
Expert Search Models
  • M1: Candidate Model [Balog et al., 2006]
  • M2: Document Model [Balog et al., 2006]

f(doc, ca) is estimated as user-centric (UC) or
document-centric (DC)
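A minimal sketch of the M2 (document) model applied to recipient ranking: each candidate ca is scored by the sum over documents of p(query | doc) · f(doc, ca). The Jelinek-Mercer smoothing and the binary stand-in for f(doc, ca) are assumptions; the slide only names the model's components.

```python
import math

def m2_scores(query_terms, docs, assoc, lam=0.1):
    """M2 document model sketch [Balog et al., 2006]:
    score(ca) = sum over docs of p(query | doc) * f(doc, ca).
    docs: doc_id -> token list; assoc: doc_id -> set of candidate
    addresses (a binary f(doc, ca))."""
    # collection term counts for smoothing
    coll, total = {}, 0
    for toks in docs.values():
        for t in toks:
            coll[t] = coll.get(t, 0) + 1
            total += 1
    scores = {}
    for doc_id, toks in docs.items():
        n = len(toks)
        lp = 0.0  # log p(query | doc), Jelinek-Mercer smoothed
        for t in query_terms:
            p_doc = toks.count(t) / n if n else 0.0
            p_coll = coll.get(t, 0) / total if total else 0.0
            lp += math.log((1 - lam) * p_doc + lam * p_coll + 1e-12)
        for ca in assoc.get(doc_id, ()):
            scores[ca] = scores.get(ca, 0.0) + math.exp(lp)
    return sorted(scores, key=scores.get, reverse=True)
```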
19
Other Models
  • Rocchio (TFIDF) [Joachims, 1997; Salton &
    Buckley, 1988]
  • K-Nearest Neighbors [Yang & Liu, 1999]
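The KNN model can be sketched as follows: take the K training messages most similar to the message under composition and credit each of their recipients with the similarity score. Sparse term→weight dicts stand in for TFIDF vectors; summing raw cosine scores is an assumption about the exact weighting.

```python
import math

def cosine(a, b):
    """Cosine similarity of two sparse term->weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_rank(query, train, k=30):
    """Rank candidate recipients by summed similarity of the k
    nearest training messages (KNN model sketch).  `train` is a
    list of (vector, recipients) pairs."""
    neigh = sorted(train, key=lambda p: cosine(query, p[0]),
                   reverse=True)[:k]
    scores = {}
    for vec, recipients in neigh:
        sim = cosine(query, vec)
        for r in recipients:
            scores[r] = scores.get(r, 0.0) + sim
    return sorted(scores, key=scores.get, reverse=True)
```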

20
Model Parameters
  • Chosen from preliminary tests.
  • Recency: β = 100 (from {10, 20, 50, 100, 200, 500})
  • KNN: K = 30 (from {3, 5, 10, 20, 30, 40, 50, 100})
  • Rocchio's β = 0 (from {0, 0.1, 0.25, 0.5})

21
Data: Enron Email Collection
  • Some good reasons
  • Large: half a million messages
  • Natural work-related email, not email lists
  • Public and free
  • Different roles: managers, assistants, etc.
  • Downsides
  • No clear message thread information
  • No complete Address Book information
  • no first/last/full names of many recipients

22
Enron Data Preprocessing
  • Set up a realistic temporal split (per user)
  • For each user, the 10 most recent sent messages
    are used as test
  • 36 users
  • All users had their Address Books (AB) extracted

CC+BCC
TO+CC+BCC
23
Enron Data Preprocessing
  • Bag-of-words representation
  • Messages were represented as the union of the BOW
    of the body and the BOW of the subject
  • Removed inconsistencies and repeated messages
  • Disambiguated several Enron addresses
  • Stop words removed; no stemming
  • Self-addressed messages were removed

24
Threading
  • No explicit thread information in Enron; try to
    reconstruct it.
  • Build the Message Thread Set MTS(msg)
  • the set of messages with the same subject as the
    current one.
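The thread reconstruction can be sketched as below. Treating "Re:"/"Fw:" variants as the "same subject" is an assumption beyond the slide's wording, which only mentions matching subjects.

```python
import re

def norm_subject(s):
    # Strip reply/forward prefixes and case; an assumption about
    # what counts as the "same subject".
    return re.sub(r'^(\s*(re|fw|fwd)\s*:\s*)+', '',
                  s.strip(), flags=re.I).lower()

def message_thread_set(msg, corpus):
    """MTS(msg): all other messages in the corpus whose subject
    matches the current message's subject."""
    subj = norm_subject(msg["subject"])
    return [m for m in corpus
            if m is not msg and norm_subject(m["subject"]) == subj]
```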

25
Results
26
Results
27
Results
28
Rank Aggregation
Ranking combined by Reciprocal Rank
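Combining by reciprocal rank can be sketched as below: each base ranking contributes 1/rank for every candidate it returns. The exact normalization used in the talk is not specified, so this is a minimal version.

```python
def reciprocal_rank_fusion(rankings):
    """Aggregate base rankings by summed reciprocal rank: a
    candidate at position r in a list earns 1/r from that list."""
    scores = {}
    for ranking in rankings:
        for rank, cand in enumerate(ranking, start=1):
            scores[cand] = scores.get(cand, 0.0) + 1.0 / rank
    return sorted(scores, key=scores.get, reverse=True)
```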
29
Rank Aggregation Results
30
Observations
  • Threading improves MAP for all models
  • KNN seems to be the best choice overall
  • a document model focused on a few top docs
  • The Data Fusion method for rank aggregation
    improved performance significantly
  • Base systems make different types of mistakes

31
Intelligent Email Auto-completion
TO+CC+BCC
CC+BCC
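Intelligent auto-completion keeps the model's ranked order and filters it by the letters typed so far. Matching on the start of either the display name or the address is an assumption; the slides only state that a few initial letters of the intended contact are given.

```python
def intelligent_autocomplete(prefix, ranked_contacts):
    """Filter a model-ranked contact list by a typed prefix,
    preserving the ranking order.  Each contact is a dict with
    'name' and 'email' keys (representation assumed)."""
    p = prefix.lower()
    return [c for c in ranked_contacts
            if c["name"].lower().startswith(p)
            or c["email"].lower().startswith(p)]
```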
32
Intelligent Email Auto-completion
33
Mozilla Thunderbird extension (Cut Once)
Suggestions Click to add
34
Mozilla Thunderbird extension (Cut Once)
  • Interested?
  • Just google
  • mozilla extension carnegie mellon
  • User Study using Cut Once
  • Instead, users adopted a write-then-address
    behavior

35
Related Work
  • Expert finding in Email
  • Dom et al. (SIGMOD-03), Campbell et al. (CIKM-03)
  • Soboroff, Craswell, de Vries (TREC Enterprise
    2005-06-07...): Expert finding task on the W3C
    corpus
  • CC Prediction
  • Short paper with the initial idea: a single user,
    limited evaluation, non-public data [Pal &
    McCallum, 06]

36
Can we do better ranking?
  • Learning to Rank: machine learning to improve
    ranking
  • Feature-based ranking function
  • Many recently proposed methods
  • RankSVM [Joachims, KDD-02]
  • ListNet [Cao et al., ICML-07]
  • RankBoost [Freund et al., 2003]
  • Perceptron Variations [Elsas, Carvalho &
    Carbonell, WSDM-08]
  • Online, scalable.
37
Learning to Rank Recipients
  • Ranking scores as features
  • Textual Scores (KNN)
  • Network Scores
  • Frequency score
  • Recency score
  • Co-Occurrence Features

Combine the textual feature (KNN scores) with
other network features
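A pairwise online ranking-perceptron update, in the spirit of the perceptron variations cited above, can be sketched as follows: if a true recipient's feature vector does not outscore a non-recipient's, move the weights toward their difference. Margins and weight averaging are omitted in this sketch.

```python
def perceptron_pair_update(w, pos, neg, lr=1.0):
    """One pairwise update of an online ranking perceptron.
    w: weight vector; pos: feature vector of a true recipient;
    neg: feature vector of a non-recipient.  Update only when the
    pair is mis-ordered (or tied)."""
    score = lambda f: sum(wi * fi for wi, fi in zip(w, f))
    if score(pos) <= score(neg):
        w = [wi + lr * (p - n) for wi, p, n in zip(w, pos, neg)]
    return w
```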
38
Learning to Rank Recipients Results
39
Conclusions
  • Problem: Predicting recipients of email messages
  • Useful for email auto-completion, finding related
    people, and managing addressing errors
  • Evidence from a large email collection
  • Two subtasks: TO+CC+BCC and CC+BCC
  • Various models; KNN the best model in general
  • Rank Aggregation improved performance
  • Improvements in email auto-completion
  • Thunderbird Extension (Cut Once)
  • Promising results on learning to rank recipients

40
Thank you
41
Thank you
42
Comments
(Thanks, reviewers!)
  • No account for email structural info (body ≠
    subject ≠ quoted text)
  • Identifying named entities (Dear Mr. X, etc.)
  • Implicitly doing this, but could be better
  • Enron did not provide many first/last names
  • Fair estimation of f(doc, ca) on email?
  • Might explain the weaker performance of M2 models.