Title: Cumulative distribution networks: Graphical models for cumulative distribution functions
1 Cumulative distribution networks: Graphical models for cumulative distribution functions
- Jim C. Huang and Brendan J. Frey
- Probabilistic and Statistical Inference Group, Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON, Canada
2 Motivation
- Problems where density models may be intractable/unsuitable
- e.g. Models with latent variables: unidentifiability, intractability
- e.g. Learning to rank
- Cumulative distribution network (CDN)
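3 Cumulative distribution functions (CDFs)
- Negative convergence: F(x) → 0 as any argument x_i → −∞
- Positive convergence: F(x) → 1 as all arguments x_i → +∞
- Monotonicity: F(x) ≤ F(y) whenever x ≤ y elementwise
- Marginalization → maximization
- Conditioning → differentiation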
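The two correspondences above can be checked numerically on a small example. The sketch below is illustrative only: the bivariate logistic CDF `F`, the constant `BIG` standing in for +∞, and all helper names are assumptions chosen for this example, not taken from the slides.

```python
import math

BIG = 50.0  # stands in for +infinity; exp(-50) is negligible here


def F(x, y):
    """Bivariate logistic CDF, used only as a convenient example."""
    return 1.0 / (1.0 + math.exp(-x) + math.exp(-y))


def marginal_x(x):
    # Marginalization: maximize the CDF over y, i.e. let y -> +infinity.
    return F(x, BIG)


def density(x, y, h=1e-3):
    # Differentiation: mixed partial d^2 F / dx dy by central differences.
    return (F(x + h, y + h) - F(x + h, y - h)
            - F(x - h, y + h) + F(x - h, y - h)) / (4.0 * h * h)


def cdf_x_given_y(x, y, h=1e-3):
    # Conditioning: F(x | y) = (dF(x, y)/dy) / (dF(+inf, y)/dy).
    num = (F(x, y + h) - F(x, y - h)) / (2.0 * h)
    den = (F(BIG, y + h) - F(BIG, y - h)) / (2.0 * h)
    return num / den
```

No sum or integral over y is needed in either case: the marginal comes from evaluating the CDF at a large value, and the conditional comes from a derivative followed by normalization.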
4 Cumulative distribution networks (CDNs)
- Bipartite graph for representing CDFs
- Example
- Sufficient for the CDN functions to be CDFs (Huang and Frey, 2008)
- e.g. Multivariate Gaussian CDFs, multivariate sigmoids, copulas
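A minimal sketch of how such a bipartite graph could be evaluated, assuming (as in Huang and Frey, 2008) that the joint CDF is the product of the functions attached to the function nodes. The three-variable chain, the node names and the choice of a bivariate sigmoid for every function are hypothetical.

```python
import math


def sigmoid2(u, v):
    """Bivariate sigmoid CDF, one of the function families named above."""
    return 1.0 / (1.0 + math.exp(-u) + math.exp(-v))


# Hypothetical CDN: each function node lists the variable nodes it touches.
FUNCTIONS = {
    "phi_1": (sigmoid2, ("x", "z")),
    "phi_2": (sigmoid2, ("z", "y")),
}


def joint_cdf(values):
    """Joint CDF as the product of the CDN functions over their scopes."""
    result = 1.0
    for phi, scope in FUNCTIONS.values():
        result *= phi(*(values[name] for name in scope))
    return result
```

For instance, `joint_cdf({"x": 0.0, "z": 0.0, "y": 0.0})` multiplies two values of 1/3; sending any single variable to a large negative value drives the product towards 0, and sending all of them to large positive values drives it towards 1.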
5 Necessary/sufficient conditions on CDN functions
- Negative convergence (necessity and sufficiency): for each variable node a, at least one neighboring function → 0
- Positive convergence (sufficiency): all functions → 1
6 Necessary/sufficient conditions on CDN functions
- Monotonicity lemma (sufficiency): all functions monotonically non-decreasing (assuming derivatives exist!)
- Sufficient condition for a valid joint CDF: each CDN function can be a CDF of its arguments
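The sufficient condition can be sanity-checked numerically for two variables. The sketch below is an assumption-laden illustration, not a proof: it tests the limits at a large finite value and checks that every grid rectangle receives non-negative probability mass. The function names, the grid and the tolerances are all hypothetical choices.

```python
import math


def gauss_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def sigmoid2(u, v):
    return 1.0 / (1.0 + math.exp(-u) + math.exp(-v))


def product_of_cdfs(x, y):
    # Every factor is itself a CDF of its arguments.
    return sigmoid2(x, y) * gauss_cdf(x) * gauss_cdf(y)


def rectangle_mass(F, a, b):
    # Probability assigned to the rectangle (a1, b1] x (a2, b2].
    return F(b[0], b[1]) - F(a[0], b[1]) - F(b[0], a[1]) + F(a[0], a[1])


def looks_like_cdf(F, grid, big=50.0, tol=1e-12):
    neg_ok = all(F(-big, t) < 1e-9 and F(t, -big) < 1e-9 for t in grid)
    pos_ok = abs(F(big, big) - 1.0) < 1e-9
    mass_ok = all(
        rectangle_mass(F, (a1, a2), (b1, b2)) >= -tol
        for a1, b1 in zip(grid, grid[1:])
        for a2, b2 in zip(grid, grid[1:])
    )
    return neg_ok and pos_ok and mass_ok


GRID = [i * 0.5 for i in range(-8, 9)]
```

A function such as `lambda x, y: gauss_cdf(x + y)` has the right limits on this grid but assigns negative mass to some rectangles, so `looks_like_cdf` rejects it, while `product_of_cdfs` passes.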
7 Example: Bivariate CDN distributions
- Using Gaussian bivariate CDFs
8 Example: Bivariate CDN distributions
- Using Gumbel copulas with marginal t-CDFs and Gaussian CDFs
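A minimal sketch of a CDN function built from a Gumbel copula, using only Gaussian marginals so that the standard library suffices (t-CDF marginals would need an external library). The dependence parameter `theta` and the function names are assumptions made for illustration.

```python
import math


def gauss_cdf(t, mean=0.0, std=1.0):
    return 0.5 * (1.0 + math.erf((t - mean) / (std * math.sqrt(2.0))))


def gumbel_copula(u, v, theta):
    """Gumbel copula C(u, v) for theta >= 1; theta = 1 gives independence."""
    if u <= 0.0 or v <= 0.0:
        return 0.0
    # max(0.0, ...) guards against a negative zero when u or v equals 1.
    a = max(0.0, -math.log(u))
    b = max(0.0, -math.log(v))
    return math.exp(-((a ** theta + b ** theta) ** (1.0 / theta)))


def cdn_function(x, y, theta=2.0):
    # Couple two Gaussian marginal CDFs through the copula.
    return gumbel_copula(gauss_cdf(x), gauss_cdf(y), theta)
```

Sending one argument to a large value recovers the Gaussian marginal of the other, which is the marginalization-by-maximization property of CDFs.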
9 Conditional independence and graph separation in CDNs
- For any disjoint variable node sets in which one set separates the other two with respect to the graph, a conditional independence relation holds
10 Conditional independence in CDNs
- For any disjoint variable node sets in which one set separates the other two with respect to the graph, a conditional independence relation holds
- e.g. X and Y are conditionally dependent given Z
- e.g. X and Y are conditionally independent given Z
11 Conditional independence and graph separation in CDNs
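The "conditionally dependent given Z" case can be illustrated on a small chain. The sketch assumes a hypothetical CDN in which X and Z share one function and Z and Y share another; the slides' own example graphs are not recoverable, so the graph, the bivariate sigmoid functions and the constant `BIG` are all assumptions.

```python
import math

BIG = 50.0  # stands in for +infinity


def phi(u, z):
    """Bivariate sigmoid CDN function; z is the shared variable."""
    return 1.0 / (1.0 + math.exp(-u) + math.exp(-z))


def dphi_dz(u, z):
    # Analytic partial derivative of phi with respect to z.
    return math.exp(-z) * phi(u, z) ** 2


def joint_cdf(x, z, y):
    # Chain CDN: F(x, z, y) = phi(x, z) * phi(y, z).
    return phi(x, z) * phi(y, z)


def cdf_xy_given_z(x, y, z):
    """F(x, y | z): differentiate the joint CDF in z, then normalize."""
    def dF_dz(a, b):
        return dphi_dz(a, z) * phi(b, z) + phi(a, z) * dphi_dz(b, z)
    return dF_dz(x, y) / dF_dz(BIG, BIG)


# Marginally (z maximized out) the CDF of (X, Y) factorizes.
marginal_gap = abs(joint_cdf(0.0, BIG, 0.0)
                   - joint_cdf(0.0, BIG, BIG) * joint_cdf(BIG, BIG, 0.0))

# Given Z = 0 the conditional CDF does not factorize.
conditional_gap = abs(cdf_xy_given_z(0.0, 0.0, 0.0)
                      - cdf_xy_given_z(0.0, BIG, 0.0) * cdf_xy_given_z(BIG, 0.0, 0.0))
```

Here `marginal_gap` is numerically zero while `conditional_gap` is about 1/81: X and Y are marginally independent because they share no function, and observing Z couples them through the product rule.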
12 Connection to bi-directed graphs
- Graphs for representing marginal independence, e.g.
- Covariance graphs (Kauermann, 1996)
- Binary models for marginal independence (Drton and Richardson, 2008)
- Factorial mixture models (Silva and Ghahramani, 2009)
13 Null-dependence in CDNs
14 Null-dependence in CDNs
15 Mapping between CDNs and factor graphs
- Equivalence between bi-directed graph and directed graph
- Equivalence between CDN and factor graph
16 Inference by message passing
- Conditioning → differentiation
- Replace sum in sum-product with differentiation
- Recursively apply product rule via message-passing with messages μ, λ
- Derivative-Sum-Product algorithm (Huang and Frey, 2008)
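The product-rule idea can be seen on a tiny tree-structured example. This is not the Derivative-Sum-Product algorithm itself, only a sketch of the recursion it organizes: for a hypothetical chain CDN with functions phi(x, z) and phi(y, z), the joint density is obtained by differentiating each function locally and combining the pieces with the product rule. The function family and all names are assumptions.

```python
import itertools
import math


def phi(u, z):
    return 1.0 / (1.0 + math.exp(-u) + math.exp(-z))


def d_u(u, z):
    # d phi / du
    return math.exp(-u) * phi(u, z) ** 2


def d_uz(u, z):
    # d^2 phi / du dz
    return 2.0 * math.exp(-u - z) * phi(u, z) ** 3


def joint_cdf(x, z, y):
    return phi(x, z) * phi(y, z)


def density_by_product_rule(x, z, y):
    # d/dx and d/dy each touch one function; d/dz touches both,
    # so the product rule yields a sum of two locally computed terms.
    return d_uz(x, z) * d_u(y, z) + d_u(x, z) * d_uz(y, z)


def density_by_finite_differences(x, z, y, h=1e-2):
    # Brute-force third-order mixed central difference of the joint CDF.
    total = 0.0
    for sx, sz, sy in itertools.product((1.0, -1.0), repeat=3):
        total += sx * sz * sy * joint_cdf(x + sx * h, z + sz * h, y + sy * h)
    return total / (2.0 * h) ** 3
```

Both routes agree up to discretization error. The local route never forms the full joint, which is the point of passing derivatives as messages.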
17 The derivative-sum-product algorithm
- In a CDN
- In a factor graph
18 Derivative-Sum-Product
- Message from function to variable
19 Derivative-Sum-Product
- Message from variable to function
20 Application: Ranking in multiplayer gaming
- e.g. Halo 2 game with 7 players, 3 teams
- Given game outcomes, update player skills as a function of all player/team performances
21 Ranking in multiplayer gaming
- Local cumulative model linking team rank r_n with player performances x_n
- e.g. Team 2 has rank 2
22 Ranking in multiplayer gaming
- Pairwise model of team ranks r_n, r_{n+1}
- Enforce stochastic orderings between teams via h
23 Application: Ranking in multiplayer gaming
- CDN functions: Gaussian CDFs
- Skill updates
- Prediction
24 Interpretation of skill updates
- For any given player, consider the outcomes of games he/she has played previously
- Then the skill function corresponds to
25 Results
- Previous methods for ranking players
- Elo (Elo, 1978)
- TrueSkill (Herbrich, Minka and Graepel, 2006)
- After message-passing
26 Factor graph and CDN for multiplayer games
27 Factor graph and CDN for multiplayer games
28 Factor graph and CDN for multiplayer games
- Dual factor graph
- TrueSkill factor graph
29 Learning to rank from observations
- GOAL: Learn a ranking function which minimizes probability of misranking on test queries
- Training data → learning → predict on test data
30 Structured ranking learning
- Define structured loss functional as negative log-likelihood of generating order graphs
- Use stochastic gradients to minimize structured loss functional given independent observations
31 Converting from an order graph to a CDN
- Edge in order graph → preference variable node in CDN
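A minimal sketch of the conversion, under an assumption the slides do not state: two preference variables are joined by a CDN function node whenever their order-graph edges share an object. The function name, the edge representation and the toy graph are hypothetical.

```python
import itertools


def order_graph_to_cdn(edges):
    """Map each order-graph edge to a preference variable node.

    Assumption (not stated in the slides): two preference variables share
    a CDN function node whenever their order-graph edges share an object.
    """
    variable_nodes = [tuple(edge) for edge in edges]
    function_nodes = [
        (e1, e2)
        for e1, e2 in itertools.combinations(variable_nodes, 2)
        if set(e1) & set(e2)
    ]
    return variable_nodes, function_nodes


# Toy order graph: a is preferred to b, b to c, and d to e.
edges = [("a", "b"), ("b", "c"), ("d", "e")]
variables, functions = order_graph_to_cdn(edges)
```

In this toy graph the edges (a, b) and (b, c) share object b and are therefore linked by one function node, while (d, e) remains an isolated preference variable.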
32 Probabilistic models for rank data as CDNs
- Bradley-Terry model, e.g. RankNet (Burges et al., 2005), RankMotif (Chen, Hughes and Morris, 2007)
- Plackett-Luce model, e.g. ListNet (Cao et al., 2007), ListMLE (Xia et al., 2008)
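For reference, the two models in their standard textbook form can be sketched as follows. How they are expressed as CDNs is not recoverable from the slides, so the code only shows the probabilities themselves; the parameterization by real-valued scores with strengths exp(score) is an assumption.

```python
import math


def bradley_terry(score_i, score_j):
    """P(item i is preferred to item j) with strengths exp(score)."""
    return 1.0 / (1.0 + math.exp(score_j - score_i))


def plackett_luce(scores_in_rank_order):
    """Probability of an observed ordering, best item first."""
    prob = 1.0
    for k in range(len(scores_in_rank_order)):
        remaining = scores_in_rank_order[k:]
        denom = sum(math.exp(s) for s in remaining)
        # Pick the k-th item among those not yet ranked.
        prob *= math.exp(scores_in_rank_order[k]) / denom
    return prob
```

With only two items the Plackett-Luce probability reduces to the Bradley-Terry probability, which is one way to see the pairwise model as a special case of the listwise one.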
33 Ranking documents for information retrieval
- Loss functional
- Multivariate sigmoids
34 Ranking documents for information retrieval
- Ranking function
- Nadaraya-Watson estimator with Gaussian kernel
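A minimal sketch of a Nadaraya-Watson estimator with a Gaussian kernel, written as a generic kernel-weighted average. The argument names, the `bandwidth` parameter and the fallback for vanishing weights are assumptions; in a ranking setting the values attached to the training points would typically be learned rather than given.

```python
import math


def nadaraya_watson(query, train_x, train_y, bandwidth=1.0):
    """Kernel-weighted average of train_y, weighted by closeness to query."""
    weights = []
    for x in train_x:
        sq_dist = sum((q - xi) ** 2 for q, xi in zip(query, x))
        weights.append(math.exp(-sq_dist / (2.0 * bandwidth ** 2)))
    total = sum(weights)
    if total == 0.0:
        # All kernel weights underflowed; fall back to the plain mean.
        return sum(train_y) / len(train_y)
    return sum(w * y for w, y in zip(weights, train_y)) / total
```

For example, `nadaraya_watson((0.5,), [(0.0,), (1.0,)], [0.0, 1.0])` returns 0.5 by symmetry, and shrinking `bandwidth` makes the estimate follow the nearest training point.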
35 Ranking documents for information retrieval
- Performance metrics
- Precision
- Average precision
- Normalized Discounted Cumulative Gain (NDCG) for a ranked list of documents with labels r(j)
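The three metrics can be sketched as follows. The exact conventions used in the experiments are not given in the slides; the gain 2^r − 1 and the discount log2(1 + j) used for NDCG below are one common choice and should be read as an assumption, as should the function names.

```python
import math


def precision_at_k(relevant, k):
    """Fraction of the top-k documents that are relevant (0/1 flags)."""
    return sum(relevant[:k]) / k


def average_precision(relevant):
    """Mean of precision@j over the positions j of relevant documents."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(relevant, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def ndcg_at_k(labels, k):
    """NDCG@k for graded labels r(j) listed in ranked order."""
    def dcg(rs):
        return sum((2 ** r - 1) / math.log2(j + 1)
                   for j, r in enumerate(rs[:k], start=1))
    ideal = dcg(sorted(labels, reverse=True))
    return dcg(labels) / ideal if ideal > 0 else 0.0
```

A perfectly ordered list has NDCG equal to 1, and any ordering that places lower-graded documents first scores strictly less.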
36 Application: Information retrieval
- OHSUMED dataset (LETOR 2.0)
37 Application: Information retrieval
- OHSUMED dataset (LETOR 2.0)
38 Application: Information retrieval
- OHSUMED dataset (LETOR 2.0)
39 Application: Computational systems biology
40 Ranking transcription factor binding sites
- Learn from protein binding microarray data (Berger et al., 2006)
41 Ranking transcription factor binding sites
- Ranking function depends on position weight matrix M
- Figure: position weight matrix (axes: position, probability of occurrence)
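The slides do not say how the ranking function depends on M. One common way to score a sequence with a position weight matrix is a sum of per-position log-odds, maximized over all windows, sketched below. The uniform background, the toy matrix and the function names are all assumptions.

```python
import math

BASES = "ACGT"


def pwm_log_odds(pwm, site, background=0.25):
    """Sum of per-position log-odds for a site as long as the PWM."""
    return sum(math.log(column[BASES.index(base)] / background)
               for column, base in zip(pwm, site))


def best_site_score(pwm, sequence):
    """Highest log-odds score over all windows of the sequence.

    Assumes the sequence is at least as long as the PWM.
    """
    width = len(pwm)
    return max(pwm_log_odds(pwm, sequence[i:i + width])
               for i in range(len(sequence) - width + 1))


# Hypothetical 3-position PWM; each row holds P(A), P(C), P(G), P(T).
M = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.7, 0.1],
    [0.1, 0.1, 0.1, 0.7],
]
```

Ranking sequences by `best_site_score(M, sequence)` would place those containing a close match to the motif AGT ahead of those without one.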
42 Ranking transcription factor binding sites
43 Ranking transcription factor binding sites
- Learn to rank microRNA targets using diverse datasets
44 Ranking microRNA targets
- Combine quantitative features and sequence data
- Quantitative features can be obtained from diverse experimental data and computational prediction methods
45 Ranking microRNA targets
46 Ranking microRNA targets
- PITA score (Kertesz et al., 2007) measures degree to which microRNA target site is accessible due to RNA secondary structure
47 Ranking microRNA targets
- MicroRNA activity is associated with decreased target mRNA and protein abundance
48 Discussion
- Maximum-likelihood learning in CDNs
- Message-passing in graphs with loops
- Approximations for DSP messages in continuous models
- Refinements to structured ranking learning framework
- Optimization algorithms
- Choice of CDN functions
- Choice of ranking function
- Further applications
- Vision
- Collaborative prediction
- Genomics, proteomics, immunology
- Problems with partial ordering of variables