Title: Fast Learning of Document Ranking Functions with the Committee Perceptron

Slide 1: Fast Learning of Document Ranking Functions with the Committee Perceptron
- Jonathan Elsas
- LTI Student Research Symposium
- Sept. 14, 2007
Slide 2: Briefly
- Joint work with Vitor Carvalho and Jaime Carbonell
- Submitted to the Web Search and Data Mining conference (WSDM 2008)
- http://wsdm2008.org
Slide 3: Evolution of Features in IR
- In the beginning, there was TF
- It became clear that other features were needed for effective document ranking: IDF, document length
- Along came HTML
  - document structure, link network features
- Now, we have collective annotation
  - social bookmarking features
Slide 4: Challenges
- Which features are important? How do we best choose the weight for each feature?
- With just a few features, manual tuning or parameter sweeps sufficed.
- This approach becomes impractical with more than 5-6 features.
Slide 5: Learning Approach to Setting Feature Weights
- Goal: utilize existing relevance judgments to learn an optimal weight setting
- "Learning to Rank" has recently become a hot research area in IR
- (See the SIGIR 2007 Learning to Rank workshop: http://research.microsoft.com/users/LR4IR-2007/)
Slide 6: Pair-wise Preference Learning
- Learning a document scoring function: score(d) = w · x_d, where x_d is the feature vector of document d
- Treated as a classification problem on pairs of documents: a pair with d_i preferred over d_j is classified correctly when w · (x_i - x_j) > 0
- The resulting scoring function is used as the learned document ranker.
- (Diagram: correctly vs. incorrectly ranked document pairs)
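The pair-wise formulation can be sketched in a few lines (a minimal illustration of the standard setup; the function and variable names are mine, not from the talk):

```python
import numpy as np

def pair_instance(x_i, x_j):
    # A preference "d_i should rank above d_j" becomes one binary
    # classification instance with feature vector x_i - x_j and label +1.
    return x_i - x_j

def pair_correct(w, x_i, x_j):
    # Under a linear scoring function score(d) = w . x_d, the pair is
    # ranked correctly exactly when w . (x_i - x_j) > 0.
    return float(np.dot(w, pair_instance(x_i, x_j))) > 0.0
```

For example, with w = [1, 0] the pair ([2, 1], [1, 1]) is ranked correctly, and the reversed pair is not.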
Slide 7: Perceptron Algorithm
- Proposed in 1958 by Rosenblatt
- Online algorithm (one instance at a time)
- Whenever a ranking mistake is made, update the hypothesis
- Provable mistake bounds and convergence
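The mistake-driven update on preference pairs can be sketched as follows (my own simplified rendering of the classic algorithm, not the talk's exact pseudocode):

```python
import numpy as np

def perceptron_rank(pairs, n_features, n_epochs=10):
    """Train a linear ranker on preference pairs.

    pairs: list of (x_pref, x_nonpref) feature-vector tuples, where
    x_pref belongs to the document that should rank higher.
    """
    w = np.zeros(n_features)
    for _ in range(n_epochs):
        for x_pref, x_nonpref in pairs:
            # Ranking mistake: the preferred document does not score higher.
            if np.dot(w, x_pref - x_nonpref) <= 0:
                w = w + (x_pref - x_nonpref)  # standard perceptron update
    return w
```

On separable training pairs, the learned w ranks every training pair correctly.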
Slide 8: Perceptron Algorithm Variants
- Pocket Perceptron (Gallant, 1990)
  - Keep the single best hypothesis
- Voted Perceptron (Freund & Schapire, 1999)
  - Keep all the intermediate hypotheses and combine them at the end
  - In practice, the hypotheses are often averaged
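The averaging variant can be sketched by accumulating the hypothesis after every training instance (a simplified illustration of the idea, not Freund & Schapire's exact voted scheme):

```python
import numpy as np

def averaged_perceptron_rank(pairs, n_features, n_epochs=10):
    # Like the plain perceptron on preference pairs, but the returned
    # hypothesis is the average of w over every training step, which
    # in practice approximates voting over all intermediate hypotheses.
    w = np.zeros(n_features)
    w_sum = np.zeros(n_features)
    seen = 0
    for _ in range(n_epochs):
        for x_pref, x_nonpref in pairs:
            if np.dot(w, x_pref - x_nonpref) <= 0:
                w = w + (x_pref - x_nonpref)
            w_sum += w
            seen += 1
    return w_sum / seen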
Slide 9: Committee Perceptron Algorithm
- Ensemble method
- Selectively keeps the N best hypotheses encountered during training
- Significant advantages over previous perceptron variants
- Many ways to combine the hypotheses' outputs
  - Voting, score averaging, hybrid approaches
  - Weight by a retrieval performance metric
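A rough sketch of the committee idea follows. The selection and combination rules here are my simplifying assumptions (scoring hypotheses by training-pair accuracy rather than a retrieval metric such as MAP), not the paper's exact algorithm:

```python
import numpy as np

def committee_perceptron(pairs, n_features, committee_size=3, n_epochs=10):
    def n_correct(w):
        # Stand-in quality score: number of training pairs ranked correctly
        # (the paper weights by a retrieval performance metric instead).
        return sum(1 for a, b in pairs if np.dot(w, a - b) > 0)

    w = np.zeros(n_features)
    committee = []  # (score, hypothesis) pairs, best first
    for _ in range(n_epochs):
        for x_pref, x_nonpref in pairs:
            if np.dot(w, x_pref - x_nonpref) <= 0:
                w = w + (x_pref - x_nonpref)
                # Keep only the committee_size best hypotheses seen so far.
                committee.append((n_correct(w), w.copy()))
                committee.sort(key=lambda t: -t[0])
                committee = committee[:committee_size]
    # Combine committee members by performance-weighted score averaging.
    total = sum(s for s, _ in committee) or 1
    return sum(s * h for s, h in committee) / total
```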
Slides 10-13: Committee Perceptron Training
- (Animated diagram, built up over four slides, showing the Training Data, the Current Hypothesis, and the Committee)
Slide 14: Evaluation
- Compared the Committee Perceptron to
  - RankSVM (Joachims, 2002)
  - RankBoost (Freund et al., 2003)
- Learning to Rank (LETOR) dataset
  - http://research.microsoft.com/users/tyliu/LETOR/default.aspx
  - Provides three test collections, standardized feature sets, and train/validation/test splits
Slide 15: Committee Perceptron Learning Curves (figure)
Slide 16: Committee Perceptron Performance (figure)
Slide 17: Committee Perceptron Performance (OHSUMED) (figure)
Slide 18: Committee Perceptron Performance (TD2004) (figure)
Slide 19: Committee Perceptron Training Time
- Much faster than other rank-learning algorithms
- Training time on the OHSUMED dataset:
  - CP: 450 seconds for 50 iterations
  - RankSVM: > 21,000 seconds
- A 45-fold reduction in training time with comparable performance
Slide 20: Committee Perceptron Summary
- CP is a fast perceptron-based learning algorithm, applied here to document ranking.
- Significantly outperforms the pocket and averaged perceptron variants at learning document ranking functions.
- Performs comparably to two strong baseline rank-learning algorithms, but trains in much less time.
Slide 21: Future Directions
- Performance of the Committee Perceptron is good, but it could be better
- What are we really optimizing?
  - (not MAP or NDCG)
Slide 22: Loss Functions for Pairwise Preference Learners
- These learners minimize the number of mis-ranked document pairs
- This only loosely corresponds to rank-based evaluation measures
- Problem: all rank positions are treated the same
Slide 23: Problems with Optimizing the Wrong Metric (figure)
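To illustrate the mismatch, here is a small worked example (my own, not from the slides): two rankings with the same pairwise loss but different Average Precision, because pair-counting ignores where in the ranking the errors occur.

```python
def misranked_pairs(ranking):
    # ranking: relevance labels (1 = relevant, 0 = not) in rank order.
    # Counts (non-relevant above relevant) pairs, i.e. the pairwise loss.
    return sum(1 for i in range(len(ranking))
                 for j in range(i + 1, len(ranking))
                 if ranking[i] == 0 and ranking[j] == 1)

def average_precision(ranking):
    hits, total = 0, 0.0
    for k, rel in enumerate(ranking, start=1):
        if rel:
            hits += 1
            total += hits / k
    return total / max(hits, 1)

# A ranking error at the top (a) vs. lower down (b):
a = [0, 1, 1, 0, 0]
b = [1, 0, 0, 1, 0]
# Both have pairwise loss 2, but AP(a) = 7/12 while AP(b) = 3/4.
```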
Slides 24-25: Ranked Retrieval Pairwise-Preference Loss Functions
- Average Precision places more emphasis on higher-ranked documents.
- Re-writing AP as a pairwise loss function
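The decomposition can be sketched as follows: each relevant document's precision contribution depends only on pairwise score comparisons, i.e. how many documents (and how many relevant documents) outscore it. This is an illustrative rewriting assuming no score ties, not the paper's exact loss function:

```python
def ap_from_pairwise_comparisons(scores, labels):
    # AP written purely in terms of pairwise comparisons: for each
    # relevant document i, its precision term is
    #   (1 + #relevant docs scoring above i) / (1 + #docs scoring above i).
    # Assumes no tied scores.
    rel = [i for i, l in enumerate(labels) if l == 1]
    if not rel:
        return 0.0
    total = 0.0
    for i in rel:
        above = sum(1 for j in range(len(scores)) if scores[j] > scores[i])
        rel_above = sum(1 for j in rel if scores[j] > scores[i])
        total += (1 + rel_above) / (1 + above)
    return total / len(rel)
```

Because every term is a pairwise comparison, a pairwise learner can in principle weight pair violations to track AP rather than raw pair counts.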
Slide 26: Preliminary Results
- (Figure: results using MAP-loss vs. using pairs-loss)