Title: Interactively Optimizing Information Retrieval Systems as a Dueling Bandits Problem
1. Interactively Optimizing Information Retrieval Systems as a Dueling Bandits Problem
- ICML 2009
- Yisong Yue
- Thorsten Joachims
- Cornell University
2. Learning To Rank
- Supervised learning problem
- Extension of classification/regression
- Relatively well understood
- High applicability in Information Retrieval
- Requires explicitly labeled data
- Expensive to obtain
- Do expert-judged labels match search users' utility?
- Doesn't generalize to other search domains.
3. Our Contribution
- Learn from implicit feedback (users' clicks)
- Reduces labeling cost
- More representative of end users' information needs
- Learn using pairwise comparisons
- Humans are more adept at making pairwise judgments
- Via interleaving [Radlinski et al., 2008]
- Online framework (the Dueling Bandits Problem)
- We leverage users when exploring new retrieval functions
- Exploration vs. exploitation tradeoff (regret)
4. Team-Game Interleaving
(u = thorsten, q = "svm")
f1(u,q) → r1
f2(u,q) → r2
Ranking r1:
1. Kernel Machines - http://svm.first.gmd.de/
2. SVM-Light Support Vector Machine - http://ais.gmd.de/thorsten/svm_light/
3. Support Vector Machine and Kernel ... References - http://svm.research.bell-labs.com/SVMrefs.html
4. Lucent Technologies SVM demo applet - http://svm.research.bell-labs.com/SVT/SVMsvt.html
5. Royal Holloway Support Vector Machine - http://svm.dcs.rhbnc.ac.uk

Ranking r2:
1. Kernel Machines - http://svm.first.gmd.de/
2. Support Vector Machine - http://jbolivar.freeservers.com/
3. An Introduction to Support Vector Machines - http://www.support-vector.net/
4. Archives of SUPPORT-VECTOR-MACHINES ... - http://www.jiscmail.ac.uk/lists/SUPPORT...
5. SVM-Light Support Vector Machine - http://ais.gmd.de/thorsten/svm_light/
Interleaving(r1, r2):
1. Kernel Machines [T2] - http://svm.first.gmd.de/
2. Support Vector Machine [T1] - http://jbolivar.freeservers.com/
3. SVM-Light Support Vector Machine [T2] - http://ais.gmd.de/thorsten/svm_light/
4. An Introduction to Support Vector Machines [T1] - http://www.support-vector.net/
5. Support Vector Machine and Kernel ... References [T2] - http://svm.research.bell-labs.com/SVMrefs.html
6. Archives of SUPPORT-VECTOR-MACHINES ... [T1] - http://www.jiscmail.ac.uk/lists/SUPPORT...
7. Lucent Technologies SVM demo applet [T2] - http://svm.research.bell-labs.com/SVT/SVMsvt.html
Invariant: for all k, in expectation the top k contains the same number of team members from each team.
Interpretation: (r2 ≻ r1) → clicks(T2) > clicks(T1)
[Radlinski, Kurup, Joachims, CIKM 2008]
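To make the interleaving step concrete, here is a minimal Python sketch in the spirit of team-draft interleaving [Radlinski et al., 2008]; the function name, the (result, team) output format, and the tie-breaking details are our own illustration, not the authors' code:

```python
import random

def team_draft_interleave(r1, r2, rng=random):
    """Sketch of team-draft interleaving (after Radlinski et al., 2008).

    r1, r2: rankings (lists of result ids), best first.
    Returns a list of (result, team) pairs; clicks on team-1 results
    are credited to r1, clicks on team-2 results to r2.
    """
    pool = set(r1) | set(r2)
    interleaved, seen = [], set()
    count = {1: 0, 2: 0}                  # picks credited to each team
    while len(seen) < len(pool):
        # Team with fewer picks goes next; ties are broken by coin flip.
        if count[1] < count[2]:
            team = 1
        elif count[2] < count[1]:
            team = 2
        else:
            team = rng.choice([1, 2])
        src = r1 if team == 1 else r2
        # The chosen team contributes its highest-ranked unseen result.
        pick = next((d for d in src if d not in seen), None)
        if pick is None:                  # this team's list is exhausted
            team = 2 if team == 1 else 1
            src = r1 if team == 1 else r2
            pick = next(d for d in src if d not in seen)
        interleaved.append((pick, team))
        seen.add(pick)
        count[team] += 1
    return interleaved
```

Run on the two rankings above, this produces a merged list with per-result team labels (like the T1/T2 tags shown), and clicks are then credited to teams as in the Interpretation line.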
5-6. Dueling Bandits Problem
- Continuous space of bandits F
- E.g., the parameter space of retrieval functions (i.e., weight vectors)
- Each time step compares two bandits
- E.g., an interleaving test on two retrieval functions
- Comparisons are noisy and independent
- Choose the pair (f_t, f_t') to minimize regret
- (the fraction of users who prefer the best bandit over the chosen ones)
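The examples on the next slide match the paper's notion of regret; as a reconstruction in our notation, with f* the best bandit and (f_t, f_t') the pair compared at step t:

```latex
\varepsilon(f, f') = P(f \succ f') - \tfrac{1}{2},
\qquad
\Delta_T = \sum_{t=1}^{T} \left[ \varepsilon(f^*, f_t) + \varepsilon(f^*, f_t') \right]
```

For instance, Example 1 below incurs per-step regret (0.9 - 1/2) + (0.8 - 1/2) = 0.7.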
7. Examples
- Example 1
- P(f* > f) = 0.9
- P(f* > f') = 0.8
- Incurred regret: 0.4 + 0.3 = 0.7
- Example 2
- P(f* > f) = 0.7
- P(f* > f') = 0.6
- Incurred regret: 0.2 + 0.1 = 0.3
- Example 3
- P(f* > f) = 0.51
- P(f* > f') = 0.55
- Incurred regret: 0.01 + 0.05 = 0.06
8. Modeling Assumptions
- Each bandit f ∈ F has an intrinsic value v(f)
- Never observed directly
- Assume v(f) is strictly concave (so f* is unique)
- Comparisons are based on v(f)
- P(f > f') = σ( v(f) - v(f') )
- P is L-Lipschitz
- For example, the logistic link σ(x) = 1/(1 + exp(-x))

9. Probability Functions [figure: example link functions σ]
10. Dueling Bandit Gradient Descent
- Maintain a current point f_t
- Compare it with a candidate f_t' (close to f_t, at a distance defined by the explore step size)
- Update towards f_t' if f_t' wins the comparison
- Expectation of the update is close to the gradient of P(f_t > f)
- Builds on Bandit Gradient Descent [Flaxman et al., 2005]
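As a concrete rendering of this update rule, here is a minimal Python sketch, not the authors' implementation: it assumes F is a Euclidean ball, `compare` stands in for an interleaving test, and all names and default values are ours.

```python
import numpy as np

def dbgd(compare, dim, T, delta=0.5, gamma=0.1, radius=10.0, seed=0):
    """Minimal sketch of Dueling Bandit Gradient Descent.

    compare(f, g) -> True iff candidate g wins the (noisy) duel against f.
    `delta` is the explore step size, `gamma` the exploit step size;
    F is taken to be the Euclidean ball of the given radius (an assumption).
    """
    rng = np.random.default_rng(seed)

    def project(f):                       # keep iterates inside F
        n = np.linalg.norm(f)
        return f if n <= radius else f * (radius / n)

    f = np.zeros(dim)                     # current point f_t
    for _ in range(T):
        u = rng.standard_normal(dim)
        u /= np.linalg.norm(u)            # random unit direction
        g = project(f + delta * u)        # explore: candidate f_t'
        if compare(f, g):                 # an interleaving test in practice
            f = project(f + gamma * u)    # exploit: step toward the winner
    return f
```

If the candidate loses, the current point is kept unchanged, so in expectation the update points along the gradient of the comparison probability, as the slide states.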
11-19. Dueling Bandit Gradient Descent [animation]
- δ: explore step size; γ: exploit step size
- Legend: current point, losing candidate, winning candidate
20. Analysis (Sketch)
- Dueling Bandit Gradient Descent
- Sequence of partially convex functions c_t(f) = P(f_t > f)
- Random binary updates (expectation close to gradient)
- Bandit Gradient Descent [Flaxman et al., SODA 2005]
- Sequence of convex functions
- Uses a randomized update (expectation close to gradient)
- Can be extended to our setting (assumes more information)
21. Analysis (Sketch)
- Convex functions satisfy c(y) ≥ c(x) + ∇c(x)·(y - x)
- Here we have both additive and multiplicative error
- These depend on the exploration step size δ
- Main analytical contribution: bounding the multiplicative error
22. Regret Bound
- Regret grows as O(T^{3/4})
- Average regret shrinks as O(T^{-1/4})
- In the limit, we do as well as knowing f* in hindsight
- Step sizes: δ = O(T^{-1/4}), γ = O(T^{-1/2})
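Spelled out, the relation between the cumulative bound and the average regret stated above:

```latex
\Delta_T = O(T^{3/4})
\quad\Longrightarrow\quad
\frac{\Delta_T}{T} = O(T^{-1/4}) \;\longrightarrow\; 0
\quad \text{as } T \to \infty
```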
23. Practical Considerations
- Need to set the step size parameters
- They depend on P(f > f')
- Cannot be set optimally
- We don't know the specifics of P(f > f')
- The algorithm should be robust to parameter settings
- We set parameters approximately in experiments
24. Synthetic Experiment
- 50-dimensional parameter space
- Value function v(x) = -x^T x
- Logistic transfer function
- A random point has regret of almost 1
More experiments in the paper.
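As a quick sanity check of the "regret almost 1" claim under the stated model (v(x) = -x^T x with a logistic link), a hypothetical snippet; the sampling distribution for the random point is our assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 50
sigma = lambda x: 1.0 / (1.0 + np.exp(-x))   # logistic link

f_star = np.zeros(d)              # argmax of v(x) = -x^T x
v = lambda x: -x @ x

x = rng.uniform(-1, 1, size=d)    # a random point (assumed distribution)
p = sigma(v(f_star) - v(x))       # P(f* beats x)
# Per-step regret of playing the pair (x, x): 2 * (P(f* > x) - 1/2).
print(2 * (p - 0.5))              # close to 1, since v(x) is far below v(f*)
```

In 50 dimensions, x^T x is large for almost any random point, so σ(x^T x) is near 1 and the per-step regret is near its maximum of 1.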
25. Web Search Simulation
- Leverage a web search dataset
- 1000 training queries, 367 dimensions
- Simulate users issuing queries
- Value function based on NDCG@10 (a ranking measure)
- Use a logistic link to make probabilistic comparisons
- Use a linear ranking function
- Not intended to compete with supervised learning
- A feasibility check for online learning with users
- Supervised labels are difficult to acquire in the wild
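A hypothetical end-to-end comparison oracle matching the simulation described above; the dataset here is a synthetic stand-in (the real experiments use a web search dataset), and all names are ours:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in data: n_queries queries, each with n_docs candidate documents,
# a feature vector per document, and a graded relevance label.
n_queries, n_docs, dim = 50, 20, 10
X = rng.standard_normal((n_queries, n_docs, dim))
scores_true = X @ rng.standard_normal(dim)
rel = (scores_true > 0.5).astype(int) + (scores_true > 1.5).astype(int)

def ndcg10(w, q):
    """NDCG@10 of the linear ranker w on query q (exponential gain)."""
    order = np.argsort(-(X[q] @ w))[:10]
    discounts = np.log2(np.arange(2, 2 + len(order)))
    dcg = ((2.0 ** rel[q][order] - 1) / discounts).sum()
    ideal = np.sort(rel[q])[::-1][:10]
    idcg = ((2.0 ** ideal - 1) / discounts[:len(ideal)]).sum()
    return dcg / idcg if idcg > 0 else 0.0

def value(w):
    """v(w): mean NDCG@10 over all training queries."""
    return np.mean([ndcg10(w, q) for q in range(n_queries)])

def compare(w1, w2):
    """Noisy duel via the logistic link: True iff w2 beats w1."""
    p = 1.0 / (1.0 + np.exp(-(value(w2) - value(w1))))
    return rng.random() < p
```

Plugged into the dbgd sketch after slide 10, e.g. dbgd(compare, dim=dim, T=1000), this reproduces the shape of the simulated online loop.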
26. Simulation Results [figure]
- Parameters chosen with the best final performance
- Curves are basically identical for validation and test sets (no over-fitting)
- Sampling multiple queries makes no difference
27. What Next?
- Better simulation environments
- More realistic user modeling assumptions
- DBGD is simple and extensible
- Incorporate pairwise document preferences
- Deal with ranking discontinuities
- Test on real search systems
- Varying scales of user communities
- Sheds insight on / guides future development
28. Extra Slides
29. Active vs. Passive Learning
- Passive data collection (offline)
- Biased by the current retrieval function
- Point-wise evaluation
- Design the retrieval function offline
- Evaluate online
- Active learning (online)
- Automatically propose new rankings to evaluate
- Our approach
30. Relative vs. Absolute Metrics
- Our framework is based on relative metrics
- E.g., comparing pairs of results or rankings
- A relatively recent development
- Absolute metrics
- E.g., absolute click-through rate
- More common in the literature
- Suffer from presentation bias
- Less robust to the many different sources of noise
31. What Results do Users View/Click?
[Joachims et al., TOIS 2007]
33. Analysis (Sketch)
- Convex functions satisfy c(y) ≥ c(x) + ∇c(x)·(y - x)
- We have both multiplicative and additive error
- These depend on the exploration step size δ
- Main technical contribution: bounding the multiplicative error
- Existing results yield sub-linear bounds in the purely convex setting
34. Analysis (Sketch)
- We know how to bound the regret of the underlying convex optimization problem
- Regret of the dueling game: we can bound it via the above, using the Lipschitz property and the symmetry of σ
35. More Simulation Experiments
- Logistic transfer function σ(x) = 1/(1 + exp(-x))
- 4 choices of value functions
- δ, γ set approximately
37. NDCG
- Normalized Discounted Cumulative Gain
- Supports multiple levels of relevance
- DCG: contribution of the i-th rank position is (2^{rel_i} - 1) / log2(i + 1)
- Example: DCG score of a sample ranking [figure]
- NDCG is normalized DCG
- The best possible ranking achieves NDCG = 1
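A small Python sketch of NDCG@k, assuming the exponential-gain variant of DCG given above; the helper names are illustrative:

```python
import math

def dcg_at_k(rels, k):
    """DCG@k with exponential gain: sum_i (2^rel_i - 1) / log2(i + 1)."""
    return sum((2 ** rel - 1) / math.log2(i + 1)
               for i, rel in enumerate(rels[:k], start=1))

def ndcg_at_k(rels, k):
    """NDCG@k: DCG of the ranking divided by DCG of the ideal ordering."""
    ideal = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k(rels, k) / ideal if ideal > 0 else 0.0

# Example: relevance grades of the top results, in ranked order.
print(ndcg_at_k([2, 0, 1, 2, 0], k=5))  # equals 1.0 only for the ideal ordering
```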
38. Considerations
- NDCG is discontinuous w.r.t. the function parameters
- Try larger values of δ, γ
- Try sampling multiple queries per update
- Homogeneous user values
- NDCG@10
- Not an optimization concern
- A modeling limitation
- Not intended to compete with supervised learning
- A sanity check of feasibility for online learning with users