Interactively Optimizing Information Retrieval Systems as a Dueling Bandits Problem



1
Interactively Optimizing Information Retrieval
Systems as a Dueling Bandits Problem
  • ICML 2009
  • Yisong Yue
  • Thorsten Joachims
  • Cornell University

2
Learning To Rank
  • Supervised Learning Problem
  • Extension of classification/regression
  • Relatively well understood
  • High applicability in Information Retrieval
  • Requires explicitly labeled data
  • Expensive to obtain
  • Expert-judged labels ≠ search user utility?
  • Doesn't generalize to other search domains

3
Our Contribution
  • Learn from implicit feedback (users' clicks)
  • Reduce labeling cost
  • More representative of end user information needs
  • Learn using pairwise comparisons
  • Humans are more adept at making pairwise
    judgments
  • Via interleaving (Radlinski et al., 2008)
  • On-line framework (Dueling Bandits Problem)
  • We leverage users when exploring new retrieval
    functions
  • Exploration vs exploitation tradeoff (regret)

4
Team-Game Interleaving
(u = "thorsten", q = "svm")
f1(u,q) → r1
f2(u,q) → r2

Ranking r1:
1. Kernel Machines - http://svm.first.gmd.de/
2. SVM-Light Support Vector Machine - http://ais.gmd.de/~thorsten/svm_light/
3. Support Vector Machine and Kernel ... References - http://svm.research.bell-labs.com/SVMrefs.html
4. Lucent Technologies SVM demo applet - http://svm.research.bell-labs.com/SVT/SVMsvt.html
5. Royal Holloway Support Vector Machine - http://svm.dcs.rhbnc.ac.uk

Ranking r2:
1. Kernel Machines - http://svm.first.gmd.de/
2. Support Vector Machine - http://jbolivar.freeservers.com/
3. An Introduction to Support Vector Machines - http://www.support-vector.net/
4. Archives of SUPPORT-VECTOR-MACHINES ... - http://www.jiscmail.ac.uk/lists/SUPPORT...
5. SVM-Light Support Vector Machine - http://ais.gmd.de/~thorsten/svm_light/

Interleaving(r1, r2):
1. Kernel Machines (T2) - http://svm.first.gmd.de/
2. Support Vector Machine (T1) - http://jbolivar.freeservers.com/
3. SVM-Light Support Vector Machine (T2) - http://ais.gmd.de/~thorsten/svm_light/
4. An Introduction to Support Vector Machines (T1) - http://www.support-vector.net/
5. Support Vector Machine and Kernel ... References (T2) - http://svm.research.bell-labs.com/SVMrefs.html
6. Archives of SUPPORT-VECTOR-MACHINES ... (T1) - http://www.jiscmail.ac.uk/lists/SUPPORT...
7. Lucent Technologies SVM demo applet (T2) - http://svm.research.bell-labs.com/SVT/SVMsvt.html

Invariant: for all k, in expectation the same number of team members appear in the top k from each team.
Interpretation: (r2 ≻ r1) ↔ clicks(T2) > clicks(T1)
Radlinski, Kurup, Joachims, CIKM 2008
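The team-draft interleaving scheme behind this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation; all names are mine, and ties in team size are broken by a coin flip as in Radlinski et al. (2008).

```python
import random

def team_draft_interleave(r1, r2, rng=None):
    """Team-draft interleaving: teams alternate picks so that, in
    expectation, each ranking contributes equally to every top-k prefix."""
    rng = rng or random.Random()
    result, seen = [], set()
    team1, team2 = [], []  # documents credited to r1 and r2
    while True:
        rem1 = [d for d in r1 if d not in seen]
        rem2 = [d for d in r2 if d not in seen]
        if not rem1 and not rem2:
            break
        # the team with fewer picks goes next; ties are a coin flip
        pick1 = (len(team1) < len(team2)
                 or (len(team1) == len(team2) and rng.random() < 0.5))
        if pick1 and rem1:
            doc = rem1[0]
            team1.append(doc)
        elif rem2:
            doc = rem2[0]
            team2.append(doc)
        else:  # r2 exhausted, r1 still has documents
            doc = rem1[0]
            team1.append(doc)
        seen.add(doc)
        result.append(doc)
    return result, team1, team2
```

A click on a shown document is then credited to the team that contributed it, and r2 ≻ r1 is inferred when clicks(T2) > clicks(T1).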
5
Dueling Bandits Problem
  • Continuous space of bandits F
  • E.g., the parameter space of retrieval functions
    (i.e., weight vectors)
  • Each time step compares two bandits
  • E.g., an interleaving test on two retrieval
    functions
  • Comparisons are noisy and independent
  • Choose the pair (ft, ft′) to minimize regret
  • (fraction of users who prefer the best bandit over the chosen ones)

7
  • Example 1
  • P(f* > f1) = 0.9
  • P(f* > f2) = 0.8
  • Incurred regret = 0.4 + 0.3 = 0.7
  • Example 2
  • P(f* > f1) = 0.7
  • P(f* > f2) = 0.6
  • Incurred regret = 0.2 + 0.1 = 0.3
  • Example 3
  • P(f* > f1) = 0.51
  • P(f* > f2) = 0.55
  • Incurred regret = 0.01 + 0.05 = 0.06
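The three examples all follow one formula: each comparison of a pair incurs the sum of the margins (above the fair-coin value 1/2) by which users prefer the best bandit f* over each chosen bandit. A tiny helper makes this concrete (the function name is mine):

```python
def incurred_regret(p_best_beats_f1, p_best_beats_f2):
    """Per-comparison regret of dueling (f1, f2): how much the
    fraction of users preferring the best bandit f* exceeds 1/2,
    summed over both chosen bandits."""
    return (p_best_beats_f1 - 0.5) + (p_best_beats_f2 - 0.5)
```

So comparing two near-optimal bandits (Example 3) incurs far less regret than comparing two clearly inferior ones (Example 1).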

8
Modeling Assumptions
  • Each bandit f ∈ F has an intrinsic value v(f)
  • Never observed directly
  • Assume v(f) is strictly concave (hence a unique optimum f*)
  • Comparisons are based on v(f)
  • P(f > f') = σ( v(f) - v(f') )
  • σ is L-Lipschitz
  • For example
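With the logistic link used later in the talk, the comparison model looks like the following sketch (function names are mine; the slides' "Probability Functions" figure shows other admissible links as well):

```python
import math

def logistic(x):
    """sigma(x) = 1 / (1 + exp(-x)), one example of an L-Lipschitz link."""
    return 1.0 / (1.0 + math.exp(-x))

def prob_f_beats_g(v_f, v_g):
    """P(f > g) = sigma(v(f) - v(g)); equal values give a fair coin."""
    return logistic(v_f - v_g)
```

Note the symmetry P(f > g) + P(g > f) = 1, which the analysis later relies on.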

9
Probability Functions
10
Dueling Bandit Gradient Descent
  • Maintain a current point ft
  • Compare it with a candidate ft' (close to ft, as
    defined by the explore step size)
  • Update toward ft' if ft' wins the comparison
  • Expectation of the update is close to the gradient
    of P(ft > f)
  • Builds on Bandit Gradient Descent (Flaxman et
    al., 2005)
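The loop above can be sketched as follows. This is a minimal illustration under the talk's setup, not the authors' code: the projection back onto F is omitted, and the names and default step sizes are my own.

```python
import numpy as np

def dbgd(compare, w0, T, delta=0.5, gamma=0.05, rng=None):
    """Dueling Bandit Gradient Descent (sketch).

    compare(w, cand) -> True iff the candidate wins the noisy duel
    (e.g., via an interleaving test between the two ranking functions).
    delta: explore step size; gamma: exploit step size.
    """
    rng = rng or np.random.default_rng(0)
    w = np.asarray(w0, dtype=float).copy()
    for _ in range(T):
        u = rng.normal(size=w.shape)
        u /= np.linalg.norm(u)       # random unit direction
        candidate = w + delta * u    # exploratory point near w
        if compare(w, candidate):    # candidate won the comparison
            w = w + gamma * u        # small step toward the winner
    return w
```

Simulating a hidden utility v(w) = -wᵀw with a logistic link, as in the simulation experiments later in the talk, drives w from an arbitrary start toward the optimum at the origin.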

11
δ = explore step size, γ = exploit step size
(figure: current point, losing candidate, winning candidate)
Dueling Bandit Gradient Descent
20
Analysis (Sketch)
  • Dueling Bandit Gradient Descent
  • Sequence of partially convex functions ct(f) =
    P(ft > f)
  • Random binary updates (expectation close to
    gradient)
  • Bandit Gradient Descent (Flaxman et al., SODA
    2005)
  • Sequence of convex functions
  • Use randomized update
  • (expectation close to gradient)
  • Can be extended to our setting

(Assumes more information)
21
Analysis (Sketch)
  • Convex functions satisfy a bound with both
    additive and multiplicative error terms
  • The error depends on the exploration step size δ
  • Main analytical contribution: bounding the
    multiplicative error

22
Regret Bound
  • Regret grows as O(T^(3/4))
  • Average regret shrinks as O(T^(-1/4))
  • In the limit, we do as well as knowing f* in
    hindsight

δ = O(T^(-1/4)), γ = O(T^(-1/2))
23
Practical Considerations
  • Need to set the step size parameters
  • Optimal values depend on P(f > f')
  • Cannot be set optimally
  • We don't know the specifics of P(f > f')
  • The algorithm should be robust to parameter settings
  • Parameters set approximately in experiments

24
  • 50-dimensional parameter space
  • Value function v(x) = -x^T x
  • Logistic transfer function
  • A random point has regret of almost 1

More experiments in paper.
25
Web Search Simulation
  • Leverage web search dataset
  • 1000 Training Queries, 367 Dimensions
  • Simulate users issuing queries
  • Value function based on NDCG@10 (a ranking measure)
  • Use a logistic link to make probabilistic comparisons
  • Use a linear ranking function
  • Not intended to compete with supervised learning
  • Feasibility check for online learning w/ users
  • Supervised labels difficult to acquire in the
    wild

26
  • Chose parameters with best final performance
  • Curves basically identical for validation and
    test sets (no over-fitting)
  • Sampling multiple queries makes no difference

27
What Next?
  • Better simulation environments
  • More realistic user modeling assumptions
  • DBGD is simple and extensible
  • Incorporate pairwise document preferences
  • Deal with ranking discontinuities
  • Test on real search systems
  • Varying scales of user communities
  • Sheds insight on and guides future development

28
Extra Slides
29
Active vs Passive Learning
  • Passive Data Collection (offline)
  • Biased by current retrieval function
  • Point-wise Evaluation
  • Design retrieval function offline
  • Evaluate online
  • Active Learning (online)
  • Automatically propose new rankings to evaluate
  • Our approach

30
Relative vs Absolute Metrics
  • Our framework based on relative metrics
  • E.g., comparing pairs of results or rankings
  • Relatively recent development
  • Absolute Metrics
  • E.g., absolute click-through rate
  • More common in literature
  • Suffers from presentation bias
  • Less robust to the many different sources of noise

31
What Results do Users View/Click?
Joachims et al., TOIS 2007
33
Analysis (Sketch)
  • Convex functions satisfy a bound with both
    multiplicative and additive error terms
  • The error depends on the exploration step size δ
  • Main technical contribution: bounding the
    multiplicative error

Existing results yield sub-linear bounds on the
additive error term.
34
Analysis (Sketch)
  • We know how to bound
  • Regret
  • We can show this using the Lipschitz property and symmetry of σ

35
More Simulation Experiments
  • Logistic transfer function σ(x) = 1/(1 + exp(-x))
  • 4 choices of value functions
  • δ, γ set approximately

37
NDCG
  • Normalized Discounted Cumulative Gain
  • Multiple levels of relevance
  • DCG sums a contribution from the ith rank position
  • NDCG is the normalized DCG
  • The best possible ranking has score NDCG = 1
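A common way to compute it is sketched below. The exact gain and discount on the slide are not shown in this transcript; the (2^rel - 1) gain with a log2 discount is one standard convention, and the function names are mine.

```python
import math

def dcg_at_k(rels, k):
    """DCG@k: sum of (2^rel - 1) / log2(rank + 1) over the top k.

    rels[i] is the graded relevance of the document at rank i+1.
    """
    return sum((2 ** rel - 1) / math.log2(i + 2)
               for i, rel in enumerate(rels[:k]))

def ndcg_at_k(rels, k):
    """NDCG@k: DCG normalized so the best possible ranking scores 1."""
    ideal = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k(rels, k) / ideal if ideal > 0 else 0.0
```

Placing the most relevant documents first maximizes the score, which is exactly the discontinuity issue the next slide raises: small parameter changes that reorder documents change NDCG in jumps.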

38
Considerations
  • NDCG is discontinuous w.r.t. function parameters
  • Try larger values of δ, γ
  • Try sampling multiple queries per update
  • Homogeneous user values
  • NDCG@10
  • Not an optimization concern
  • Modeling limitation
  • Not intended to compete with supervised learning
  • Sanity check of feasibility for online learning
    w/ users