BBM: Bayesian Browsing Model from Petabyte-scale Data

1
BBM: Bayesian Browsing Model from Petabyte-scale Data
  • Chao Liu, MSR-Redmond
  • Fan Guo, Carnegie Mellon University
  • Christos Faloutsos, Carnegie Mellon University

2
Massive Log Streams
  • Search log
  • 10 terabytes each day (and still increasing!)
  • Involves billions of distinct (query, url)s
  • Questions
  • Can we infer user-perceived relevance for each
    (query, url) pair?
  • How many passes of the data are needed? Is one
    enough?
  • Can the inference be parallel?
  • Our answer: Yes, Yes, and Yes!

3
BBM: Bayesian Browsing Model
[Figure: a query and its click-throughs]
4
Dependencies in BBM

[Figure: dependency graph over nodes S1, S2, ..., Si; E1, E2, ..., Ei; C1, C2, ..., Ci]

5
Road Map
  • Exact Model Inference
  • Algorithms through an Example
  • Experiments
  • Conclusions

6
Notations
  • For a given query
  • Top-M positions, usually M = 10
  • Positional relevance
  • M(M+1)/2 combinations of (r, d)
  • n search instances
  • N documents impressed in total
  • Document relevance

7
Model Inference
  • Ultimate goal
  • Observation: conditional independence

8
P(C|S) by Chain Rule
  • Likelihood of search instance
  • From S to R
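
The chain-rule step can be written out as a sketch. The formulas on this slide did not survive in the transcript, so the form below is an assumption: it takes the examination probability to depend only on the position r of the preceding click and the distance d from it (the (r, d) pairs from the Notations slide), and uses the hypothetical symbol \beta_{r,d} for that probability.

```latex
% Assumed notation: \beta_{r,d} is the examination probability for a position
% whose preceding click is at position r (r = 0 if none) and which lies
% d positions after it; S_i is the relevance of the document at position i.
\begin{align}
P(C \mid S) &= \prod_{i=1}^{M} P(C_i \mid C_{1:i-1}, S_i) \\
            &= \prod_{i=1}^{M} \left(\beta_{r_i,d_i} S_i\right)^{C_i}
               \left(1 - \beta_{r_i,d_i} S_i\right)^{1 - C_i}
\end{align}
```

Going "from S to R" then amounts to replacing each positional relevance S_i with the relevance R_j of the document shown at position i.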

9
Putting things together
  • Posterior with
  • Re-organize by Rj's

10
What It Tells Us
  • Exact inference with joint posterior in closed
    form
  • Joint posterior factorizes, hence the Rj's are mutually
    independent
  • At most M(M+1)/2 + 1 numbers to fully
    characterize each posterior
  • Count vector
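
As an illustration of how a count vector could characterize a posterior, here is a minimal sketch. It assumes a flat prior and a posterior of the form p(R) ∝ R^(clicks) · ∏ (1 − β[r,d]·R)^(non-clicks at (r,d)); that form, the function names, and the β values are assumptions for illustration, not taken from the slides. The density is evaluated on a grid rather than in closed form.

```python
import math

def posterior_weights(n_clicks, nonclick_counts, beta, grid_size=1001):
    """Discretize p(R) ∝ R^n_clicks * prod_k (1 - beta[k] * R)^nonclick_counts[k].

    nonclick_counts and beta are dicts keyed by (r, d). Returns (grid, weights),
    where the weights sum to 1 over the grid points in [0, 1].
    """
    grid = [k / (grid_size - 1) for k in range(grid_size)]
    log_p = []
    for r in grid:
        if r == 0.0 and n_clicks > 0:
            log_p.append(-math.inf)  # R = 0 is impossible once a click was seen
            continue
        lp = n_clicks * math.log(r) if n_clicks > 0 else 0.0
        for key, count in nonclick_counts.items():
            if count == 0:
                continue
            term = 1.0 - beta[key] * r
            if term <= 0.0:
                lp = -math.inf
                break
            lp += count * math.log(term)
        log_p.append(lp)
    top = max(log_p)
    weights = [math.exp(lp - top) for lp in log_p]  # subtract max for stability
    total = sum(weights)
    return grid, [w / total for w in weights]

def posterior_mean(n_clicks, nonclick_counts, beta):
    """Posterior mean of R under the discretized density."""
    grid, weights = posterior_weights(n_clicks, nonclick_counts, beta)
    return sum(r * w for r, w in zip(grid, weights))
```

For example, `posterior_mean(1, {}, {})` is close to 2/3, the mean of a density proportional to R, and adding non-click counts pulls the mean down.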

11
Road Map
  • Exact Model Inference
  • Algorithms through an Example
  • Experiments
  • Conclusions

12
LearnBBM: One-Pass Counting
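
A minimal sketch of what one-pass counting could look like, not the authors' code. It assumes each search instance is a (query, urls, clicks) triple, that slot 0 of a count vector holds the click count, and that the remaining M(M+1)/2 slots hold non-click counts indexed by (r, d), where r is the position of the preceding click (0 if none) and d is the distance from it. The names `rd_index` and `learn_bbm` and the slot layout are hypothetical.

```python
from collections import defaultdict

def rd_index(r, d, m):
    """Map (r, d) to a slot in 1..m(m+1)/2; slot 0 is reserved for clicks.

    r: position of the preceding click (0 if none), 0 <= r <= m - 1
    d: distance from that click, 1 <= d <= m - r
    """
    offset = r * m - r * (r - 1) // 2  # number of (r', d') pairs with r' < r
    return 1 + offset + (d - 1)

def learn_bbm(instances, m=10):
    """Build one count vector per (query, url) in a single pass over the log."""
    size = m * (m + 1) // 2 + 1
    counts = defaultdict(lambda: [0] * size)
    for query, urls, clicks in instances:
        last_click = 0
        for pos, (url, clicked) in enumerate(zip(urls, clicks), start=1):
            vec = counts[(query, url)]
            if clicked:
                vec[0] += 1
                last_click = pos
            else:
                vec[rd_index(last_click, pos - last_click, m)] += 1
    return counts

# Usage with M = 3: the second of three results is clicked.
example = learn_bbm([("q", ["a", "b", "c"], [0, 1, 0])], m=3)
```

Each instance touches only the count vectors of the documents it shows, so no instance needs to be revisited.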
13
An Example
  • Compute
  • Count vector for R4

14
LearnBBM on MapReduce
  • Map: emit((q,u), idx)
  • Reduce: construct the count vector
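
The two steps above can be simulated locally as a sketch. This is plain Python, not a real MapReduce or SCOPE job; the function names, the slot layout (slot 0 for clicks, slots 1..M(M+1)/2 for non-clicks by (r, d)), and the in-memory shuffle are all assumptions made for illustration.

```python
from collections import defaultdict

def rd_index(r, d, m):
    """Slot for a non-click whose preceding click is at r (0 if none), d positions back."""
    return 1 + r * m - r * (r - 1) // 2 + (d - 1)

def map_instance(query, urls, clicks, m):
    """Map: emit ((q, u), idx) for every position of one search instance."""
    last_click = 0
    for pos, (url, clicked) in enumerate(zip(urls, clicks), start=1):
        if clicked:
            yield (query, url), 0
            last_click = pos
        else:
            yield (query, url), rd_index(last_click, pos - last_click, m)

def reduce_indices(indices, m):
    """Reduce: turn all indices emitted for one (q, u) key into its count vector."""
    vec = [0] * (m * (m + 1) // 2 + 1)
    for idx in indices:
        vec[idx] += 1
    return vec

def run_local(instances, m):
    """Stand-in for the shuffle phase: group emitted indices by key, then reduce."""
    shuffled = defaultdict(list)
    for query, urls, clicks in instances:
        for key, idx in map_instance(query, urls, clicks, m):
            shuffled[key].append(idx)
    return {key: reduce_indices(idxs, m) for key, idxs in shuffled.items()}
```

Because the reducer only increments counters, the order in which indices arrive does not matter, which is what makes the job easy to parallelize.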

15
Example on MapReduce
16
Road Map
  • Exact Model Inference
  • Algorithms through an Example
  • Experiments
  • Conclusions

17
Experiments
  • Compare with the User Browsing Model (Dupret and
    Piwowarski, SIGIR'08)
  • The same dependence structure
  • But point-estimation of document relevance rather
    than Bayesian
  • Approximate inference through iterations
  • Data
  • Collected in August and September 2008
  • Top-10 algorithmic results only
  • Split into training/test sets according to time
    stamps for each query
  • 51 million search instances of 1.15 million
    distinct queries, 10X larger than the SIGIR'08
    study

18
Overall Comparison on Log-Likelihood
  • Experiments in 20 batches
  • LL Improvement Ratio

19
Comparison w.r.t. Frequency
  • Intuition
  • Hard to predict clicks for infrequent queries
  • Easy for frequent ones

20
Model Comparison on Efficiency
57 times faster
21
Petabyte-Scale Experiment
  • Setup
  • 8 weeks data, 8 jobs
  • Job k takes first k-week data
  • Experiment platform
  • SCOPE: Easy and Efficient Parallel Processing of
    Massive Data Sets (Chaiken et al., VLDB'08)

22
Scalability of BBM
  • Increasing computation load
  • more queries, more urls, more impressions
  • Near-constant elapsed time

23
Road Map
  • Exact Model Inference
  • Algorithms through an Example
  • Experiments
  • Conclusions

24
Conclusions
  • Bayesian Browsing Model for Search streams
  • Exact Bayesian inference
  • Joint posterior in closed form
  • A single pass suffices
  • Map-Reducible for Parallelism
  • Admissible to incremental updates
  • Perfect for mining click streams
  • Models for other stream data
  • Browsing, twittering, Web 2.0, etc?
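
The incremental-update claim can be illustrated with a short sketch. It assumes count vectors are stored as equal-length lists keyed by (query, url); the function name `merge_count_vectors` is hypothetical. Since the vectors are plain counts, folding in a new batch of log data is element-wise addition.

```python
def merge_count_vectors(old, new):
    """Return count vectors covering both batches; inputs are left unmodified."""
    merged = {key: list(vec) for key, vec in old.items()}
    for key, vec in new.items():
        if key in merged:
            merged[key] = [a + b for a, b in zip(merged[key], vec)]
        else:
            merged[key] = list(vec)
    return merged

# Usage: week 1 counts plus week 2 counts give the counts for both weeks.
week1 = {("q", "a"): [1, 0, 2]}
week2 = {("q", "a"): [0, 1, 1], ("q", "b"): [3, 0, 0]}
both = merge_count_vectors(week1, week2)
```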

25
  • Thanks!