Title: Interactively Optimizing Information Retrieval Systems as a Dueling Bandits Problem
1. Interactively Optimizing Information Retrieval Systems as a Dueling Bandits Problem
- ICML 2009
- Yisong Yue
- Thorsten Joachims
- Cornell University
2. Learning To Rank
- Supervised learning problem
- Extension of classification/regression
- Relatively well understood
- High applicability in Information Retrieval
- Requires explicitly labeled data
- Expensive to obtain
- Do expert-judged labels match search users' utility?
- Doesn't generalize to other search domains.
3. Our Contribution
- Learn from implicit feedback (users' clicks)
- Reduces labeling cost
- More representative of end users' information needs
- Learn using pairwise comparisons
- Humans are more adept at making pairwise judgments
- Via interleaving [Radlinski et al., 2008]
- Online framework (the Dueling Bandits Problem)
- We leverage users when exploring new retrieval functions
- Exploration vs. exploitation tradeoff (regret)
4. Team-Game Interleaving
(u = thorsten, q = "svm")
f1(u,q) → r1
f2(u,q) → r2
Ranking r1:
1. Kernel Machines - http://svm.first.gmd.de/
2. SVM-Light Support Vector Machine - http://ais.gmd.de/thorsten/svm_light/
3. Support Vector Machine and Kernel ... References - http://svm.research.bell-labs.com/SVMrefs.html
4. Lucent Technologies SVM demo applet - http://svm.research.bell-labs.com/SVT/SVMsvt.html
5. Royal Holloway Support Vector Machine - http://svm.dcs.rhbnc.ac.uk

Ranking r2:
1. Kernel Machines - http://svm.first.gmd.de/
2. Support Vector Machine - http://jbolivar.freeservers.com/
3. An Introduction to Support Vector Machines - http://www.support-vector.net/
4. Archives of SUPPORT-VECTOR-MACHINES ... - http://www.jiscmail.ac.uk/lists/SUPPORT...
5. SVM-Light Support Vector Machine - http://ais.gmd.de/thorsten/svm_light/
Interleaving(r1, r2):
1. Kernel Machines [T2] - http://svm.first.gmd.de/
2. Support Vector Machine [T1] - http://jbolivar.freeservers.com/
3. SVM-Light Support Vector Machine [T2] - http://ais.gmd.de/thorsten/svm_light/
4. An Introduction to Support Vector Machines [T1] - http://www.support-vector.net/
5. Support Vector Machine and Kernel ... References [T2] - http://svm.research.bell-labs.com/SVMrefs.html
6. Archives of SUPPORT-VECTOR-MACHINES ... [T1] - http://www.jiscmail.ac.uk/lists/SUPPORT...
7. Lucent Technologies SVM demo applet [T2] - http://svm.research.bell-labs.com/SVT/SVMsvt.html
Invariant: for all k, in expectation the top k contains the same number of team members from each team.
Interpretation: (r2 ≻ r1) → clicks(T2) > clicks(T1)
[Radlinski, Kurup, Joachims, CIKM 2008]
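To make the interleaving step concrete, here is a minimal Python sketch in the spirit of team-draft interleaving [Radlinski et al., 2008]; the function name, the (result, team) output format, and the tie-breaking details are our own illustration, not the authors' code:

```python
import random

def team_draft_interleave(r1, r2, rng=random):
    """Sketch of team-draft interleaving (after Radlinski et al., 2008).

    r1, r2: rankings (lists of result ids), best first.
    Returns a list of (result, team) pairs; clicks on team-1 results
    are credited to r1, clicks on team-2 results to r2.
    """
    pool = set(r1) | set(r2)
    interleaved, seen = [], set()
    count = {1: 0, 2: 0}                  # picks credited to each team
    while len(seen) < len(pool):
        # Team with fewer picks goes next; ties are broken by coin flip.
        if count[1] < count[2]:
            team = 1
        elif count[2] < count[1]:
            team = 2
        else:
            team = rng.choice([1, 2])
        src = r1 if team == 1 else r2
        # The chosen team contributes its highest-ranked unseen result.
        pick = next((d for d in src if d not in seen), None)
        if pick is None:                  # this team's list is exhausted
            team = 2 if team == 1 else 1
            src = r1 if team == 1 else r2
            pick = next(d for d in src if d not in seen)
        interleaved.append((pick, team))
        seen.add(pick)
        count[team] += 1
    return interleaved
```

Run on the two rankings above, this produces a merged list with per-result team labels (like the T1/T2 tags shown), and clicks are then credited to teams as in the Interpretation line.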
5-6. Dueling Bandits Problem
- Continuous space of bandits F
- E.g., the parameter space of retrieval functions (i.e., weight vectors)
- Each time step compares two bandits
- E.g., an interleaving test on two retrieval functions
- Comparisons are noisy and independent
- Choose the pair (f_t, f_t') to minimize regret
- (the fraction of users who prefer the best bandit over the chosen ones)
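The examples on the next slide match the paper's notion of regret; as a reconstruction in our notation, with f* the best bandit and (f_t, f_t') the pair compared at step t:

```latex
\varepsilon(f, f') = P(f \succ f') - \tfrac{1}{2},
\qquad
\Delta_T = \sum_{t=1}^{T} \left[ \varepsilon(f^*, f_t) + \varepsilon(f^*, f_t') \right]
```

For instance, Example 1 below incurs per-step regret (0.9 - 1/2) + (0.8 - 1/2) = 0.7.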
7. Examples
- Example 1
- P(f* > f) = 0.9
- P(f* > f') = 0.8
- Incurred regret: 0.4 + 0.3 = 0.7
- Example 2
- P(f* > f) = 0.7
- P(f* > f') = 0.6
- Incurred regret: 0.2 + 0.1 = 0.3
- Example 3
- P(f* > f) = 0.51
- P(f* > f') = 0.55
- Incurred regret: 0.01 + 0.05 = 0.06
8. Modeling Assumptions
- Each bandit f ∈ F has an intrinsic value v(f)
- Never observed directly
- Assume v(f) is strictly concave (so f* is unique)
- Comparisons are based on v(f)
- P(f > f') = σ( v(f) - v(f') )
- P is L-Lipschitz
- For example, the logistic link σ(x) = 1/(1 + exp(-x))

9. Probability Functions [figure: example link functions σ]
10. Dueling Bandit Gradient Descent
- Maintain a current point f_t
- Compare it with a candidate f_t' (close to f_t, at a distance defined by the explore step size)
- Update towards f_t' if f_t' wins the comparison
- Expectation of the update is close to the gradient of P(f_t > f)
- Builds on Bandit Gradient Descent [Flaxman et al., 2005]
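As a concrete rendering of this update rule, here is a minimal Python sketch, not the authors' implementation: it assumes F is a Euclidean ball, `compare` stands in for an interleaving test, and all names and default values are ours.

```python
import numpy as np

def dbgd(compare, dim, T, delta=0.5, gamma=0.1, radius=10.0, seed=0):
    """Minimal sketch of Dueling Bandit Gradient Descent.

    compare(f, g) -> True iff candidate g wins the (noisy) duel against f.
    `delta` is the explore step size, `gamma` the exploit step size;
    F is taken to be the Euclidean ball of the given radius (an assumption).
    """
    rng = np.random.default_rng(seed)

    def project(f):                       # keep iterates inside F
        n = np.linalg.norm(f)
        return f if n <= radius else f * (radius / n)

    f = np.zeros(dim)                     # current point f_t
    for _ in range(T):
        u = rng.standard_normal(dim)
        u /= np.linalg.norm(u)            # random unit direction
        g = project(f + delta * u)        # explore: candidate f_t'
        if compare(f, g):                 # an interleaving test in practice
            f = project(f + gamma * u)    # exploit: step toward the winner
    return f
```

If the candidate loses, the current point is kept unchanged, so in expectation the update points along the gradient of the comparison probability, as the slide states.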
11-19. Dueling Bandit Gradient Descent [animation]
- δ: explore step size; γ: exploit step size
- Legend: current point, losing candidate, winning candidate
20. Analysis (Sketch)
- Dueling Bandit Gradient Descent
- Sequence of partially convex functions c_t(f) = P(f_t > f)
- Random binary updates (expectation close to gradient)
- Bandit Gradient Descent [Flaxman et al., SODA 2005]
- Sequence of convex functions
- Uses a randomized update (expectation close to gradient)
- Can be extended to our setting (assumes more information)
21. Analysis (Sketch)
- Convex functions satisfy c(y) ≥ c(x) + ∇c(x)·(y - x)
- Here we have both additive and multiplicative error
- These depend on the exploration step size δ
- Main analytical contribution: bounding the multiplicative error
22. Regret Bound
- Regret grows as O(T^{3/4})
- Average regret shrinks as O(T^{-1/4})
- In the limit, we do as well as knowing f* in hindsight
- Step sizes: δ = O(T^{-1/4}), γ = O(T^{-1/2})
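Spelled out, the relation between the cumulative bound and the average regret stated above:

```latex
\Delta_T = O(T^{3/4})
\quad\Longrightarrow\quad
\frac{\Delta_T}{T} = O(T^{-1/4}) \;\longrightarrow\; 0
\quad \text{as } T \to \infty
```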
23. Practical Considerations
- Need to set the step size parameters
- They depend on P(f > f')
- Cannot be set optimally
- We don't know the specifics of P(f > f')
- The algorithm should be robust to parameter settings
- We set parameters approximately in experiments
24. Synthetic Experiment
- 50-dimensional parameter space
- Value function v(x) = -x^T x
- Logistic transfer function
- A random point has regret of almost 1
More experiments in the paper.
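As a quick sanity check of the "regret almost 1" claim under the stated model (v(x) = -x^T x with a logistic link), a hypothetical snippet; the sampling distribution for the random point is our assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 50
sigma = lambda x: 1.0 / (1.0 + np.exp(-x))   # logistic link

f_star = np.zeros(d)              # argmax of v(x) = -x^T x
v = lambda x: -x @ x

x = rng.uniform(-1, 1, size=d)    # a random point (assumed distribution)
p = sigma(v(f_star) - v(x))       # P(f* beats x)
# Per-step regret of playing the pair (x, x): 2 * (P(f* > x) - 1/2).
print(2 * (p - 0.5))              # close to 1, since v(x) is far below v(f*)
```

In 50 dimensions, x^T x is large for almost any random point, so σ(x^T x) is near 1 and the per-step regret is near its maximum of 1.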
25. Web Search Simulation
- Leverage a web search dataset
- 1000 training queries, 367 dimensions
- Simulate users issuing queries
- Value function based on NDCG@10 (a ranking measure)
- Use a logistic link to make probabilistic comparisons
- Use a linear ranking function
- Not intended to compete with supervised learning
- A feasibility check for online learning with users
- Supervised labels are difficult to acquire in the wild
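A hypothetical end-to-end comparison oracle matching the simulation described above; the dataset here is a synthetic stand-in (the real experiments use a web search dataset), and all names are ours:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in data: n_queries queries, each with n_docs candidate documents,
# a feature vector per document, and a graded relevance label.
n_queries, n_docs, dim = 50, 20, 10
X = rng.standard_normal((n_queries, n_docs, dim))
scores_true = X @ rng.standard_normal(dim)
rel = (scores_true > 0.5).astype(int) + (scores_true > 1.5).astype(int)

def ndcg10(w, q):
    """NDCG@10 of the linear ranker w on query q (exponential gain)."""
    order = np.argsort(-(X[q] @ w))[:10]
    discounts = np.log2(np.arange(2, 2 + len(order)))
    dcg = ((2.0 ** rel[q][order] - 1) / discounts).sum()
    ideal = np.sort(rel[q])[::-1][:10]
    idcg = ((2.0 ** ideal - 1) / discounts[:len(ideal)]).sum()
    return dcg / idcg if idcg > 0 else 0.0

def value(w):
    """v(w): mean NDCG@10 over all training queries."""
    return np.mean([ndcg10(w, q) for q in range(n_queries)])

def compare(w1, w2):
    """Noisy duel via the logistic link: True iff w2 beats w1."""
    p = 1.0 / (1.0 + np.exp(-(value(w2) - value(w1))))
    return rng.random() < p
```

Plugged into the dbgd sketch after slide 10, e.g. dbgd(compare, dim=dim, T=1000), this reproduces the shape of the simulated online loop.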
26. Simulation Results [figure]
- Parameters chosen with the best final performance
- Curves are basically identical for validation and test sets (no over-fitting)
- Sampling multiple queries makes no difference
27. What Next?
- Better simulation environments
- More realistic user modeling assumptions
- DBGD is simple and extensible
- Incorporate pairwise document preferences
- Deal with ranking discontinuities
- Test on real search systems
- Varying scales of user communities
- Sheds insight on / guides future development
28. Extra Slides
29. Active vs. Passive Learning
- Passive data collection (offline)
- Biased by the current retrieval function
- Point-wise evaluation
- Design the retrieval function offline
- Evaluate online
- Active learning (online)
- Automatically propose new rankings to evaluate
- Our approach
30. Relative vs. Absolute Metrics
- Our framework is based on relative metrics
- E.g., comparing pairs of results or rankings
- A relatively recent development
- Absolute metrics
- E.g., absolute click-through rate
- More common in the literature
- Suffer from presentation bias
- Less robust to the many different sources of noise
31. What Results do Users View/Click?
[Joachims et al., TOIS 2007]
33. Analysis (Sketch)
- Convex functions satisfy c(y) ≥ c(x) + ∇c(x)·(y - x)
- We have both multiplicative and additive error
- These depend on the exploration step size δ
- Main technical contribution: bounding the multiplicative error
- Existing results yield sub-linear bounds in the purely convex setting
34. Analysis (Sketch)
- We know how to bound the regret of the underlying convex optimization problem
- Regret of the dueling game: we can bound it via the above, using the Lipschitz property and the symmetry of σ
35. More Simulation Experiments
- Logistic transfer function σ(x) = 1/(1 + exp(-x))
- 4 choices of value functions
- δ, γ set approximately
37. NDCG
- Normalized Discounted Cumulative Gain
- Supports multiple levels of relevance
- DCG: contribution of the i-th rank position is (2^{rel_i} - 1) / log2(i + 1)
- Example: DCG score of a sample ranking [figure]
- NDCG is normalized DCG
- The best possible ranking achieves NDCG = 1
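A small Python sketch of NDCG@k, assuming the exponential-gain variant of DCG given above; the helper names are illustrative:

```python
import math

def dcg_at_k(rels, k):
    """DCG@k with exponential gain: sum_i (2^rel_i - 1) / log2(i + 1)."""
    return sum((2 ** rel - 1) / math.log2(i + 1)
               for i, rel in enumerate(rels[:k], start=1))

def ndcg_at_k(rels, k):
    """NDCG@k: DCG of the ranking divided by DCG of the ideal ordering."""
    ideal = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k(rels, k) / ideal if ideal > 0 else 0.0

# Example: relevance grades of the top results, in ranked order.
print(ndcg_at_k([2, 0, 1, 2, 0], k=5))  # equals 1.0 only for the ideal ordering
```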
38. Considerations
- NDCG is discontinuous w.r.t. the function parameters
- Try larger values of δ, γ
- Try sampling multiple queries per update
- Homogeneous user values
- NDCG@10
- Not an optimization concern
- A modeling limitation
- Not intended to compete with supervised learning
- A sanity check of feasibility for online learning with users