Density Ratio Estimation: A New Versatile Tool for Machine Learning

1
Density Ratio Estimation: A New Versatile Tool for Machine Learning
  • Department of Computer Science
  • Tokyo Institute of Technology
  • Masashi Sugiyama
  • sugi@cs.titech.ac.jp
  • http://sugiyama-www.cs.titech.ac.jp/sugi

2
Overview of My Talk (1)
  • Consider the ratio of two probability densities.
  • If the ratio is estimated, various machine
    learning problems can be solved!
  • Non-stationarity adaptation, domain adaptation,
    multi-task learning, outlier detection, change
    detection in time series, feature selection,
    dimensionality reduction, independent component
    analysis, conditional density estimation,
    classification, two-sample test

3
Overview of My Talk (2)
Vapnik said: "When solving a problem of
interest, one should not solve a more general
problem as an intermediate step."
Knowing densities ⇒ Knowing ratio
  • Estimating density ratio is substantially easier
    than estimating densities!
  • Various direct density ratio estimation methods
    have been developed recently.

4
Organization of My Talk
  • Usage of Density Ratios
  • Covariate shift adaptation
  • Outlier detection
  • Mutual information estimation
  • Conditional probability estimation
  • Methods of Density Ratio Estimation

5
Covariate Shift Adaptation
Shimodaira (JSPI2000)
  • Training/test input distributions are different,
    but target function remains unchanged
  • extrapolation

[Figure: target function with training and test samples drawn from different input densities.]
6
Adaptation Using Density Ratios
  • Ordinary least-squares is not consistent.
  • Density-ratio weighted least-squares is
    consistent.

Applicable to any likelihood-based methods!
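As a concrete (illustrative) sketch of the weighting idea, density-ratio-weighted least squares can be written in NumPy as below; the function name is ours, and the importance weights w_i = p_test(x_i)/p_train(x_i) are assumed to be given:

```python
import numpy as np

def iw_least_squares(X, y, w):
    """Weighted least squares: minimize sum_i w_i * (y_i - x_i @ theta)**2.
    With w_i = p_test(x_i) / p_train(x_i), the estimator is consistent
    under covariate shift."""
    W = np.diag(w)
    # Solve the weighted normal equations (X^T W X) theta = X^T W y.
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

# Toy data: y = 2 + 3x + noise; uniform weights reduce to ordinary LS.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(size=50)])
y = 2.0 + 3.0 * X[:, 1] + 0.1 * rng.normal(size=50)
theta = iw_least_squares(X, y, np.ones(50))
```

With w_i ≡ 1 this reduces to ordinary least squares; under covariate shift, plugging in estimated ratio values restores consistency.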
7
Model Selection
  • Controlling bias-variance trade-off is important
    in covariate shift adaptation.
  • No weighting: low variance, high bias
  • Density ratio weighting: low bias, high variance
  • Density ratio weighting plays an important role
    in unbiased model selection:
  • Akaike information criterion (regular models)
  • Subspace information criterion (linear models)
  • Cross-validation (arbitrary models)

Shimodaira (JSPI2000)
Sugiyama & Müller (StatDec.2005)
Sugiyama, Krauledat & Müller (JMLR2007)
8
Active Learning
  • Covariate shift naturally occurs in active
    learning scenarios
  • The training input distribution is designed by the user.
  • The test input distribution is determined by the environment.
  • Density ratio weighting plays an important role
    for low-bias active learning.
  • Population-based methods
  • Pool-based methods

Wiens (JSPI2000), Kanamori & Shimodaira
(JSPI2003), Sugiyama (JMLR2006),
Kanamori (NeuroComp2007), Sugiyama & Nakajima
(MLJ2009)
9
Real-world Applications
  • Age prediction from faces (with NECsoft)
  • Illumination and angle change (KLSCV)
  • Speaker identification (with Yamaha)
  • Voice quality change (KLRCV)
  • Brain-computer interface
  • Mental condition change (LDACV)

Ueki, Sugiyama & Ihara (ICPR2010)
Yamada, Sugiyama & Matsui (SP2010)
Sugiyama, Krauledat & Müller (JMLR2007)
Li, Kambara, Koike & Sugiyama (IEEE-TBE2010)
10
Real-world Applications (cont.)
  • Japanese text segmentation (with IBM)
  • Domain adaptation from general conversation to
    medicine (CRFCV)
  • Semiconductor wafer alignment (with Nikon)
  • Optical marker allocation (AL)
  • Robot control
  • Dynamic motion update (KLSCV)
  • Optimal exploration (AL)

Tsuboi, Kashima, Hido, Bickel & Sugiyama (JIP2008)
Sugiyama & Nakajima (MLJ2009)
Hachiya, Akiyama, Sugiyama & Peters (NN2009)
Hachiya, Peters & Sugiyama (ECML2009)
Akiyama, Hachiya & Sugiyama (NN2010)
11
Organization of My Talk
  • Usage of Density Ratios
  • Covariate shift adaptation
  • Outlier detection
  • Mutual information estimation
  • Conditional probability estimation
  • Methods of Density Ratio Estimation

12
Inlier-based Outlier Detection
Hido, Tsuboi, Kashima, Sugiyama & Kanamori
(ICDM2008); Smola, Song & Teo (AISTATS2009)
  • Test samples whose tendency differs from that of
    the regular samples are regarded as outliers.

13
Real-world Applications
  • Steel plant diagnosis (with JFE Steel)
  • Printer roller quality control (with Canon)
  • Loan customer inspection (with IBM)
  • Sleep therapy from biometric data

Takimoto, Matsugu & Sugiyama (DMSS2009)
Hido, Tsuboi, Kashima, Sugiyama & Kanamori
(KAIS2010)
Kawahara & Sugiyama (SDM2009)
14
Organization of My Talk
  • Usage of Density Ratios
  • Covariate shift adaptation
  • Outlier detection
  • Mutual information estimation
  • Conditional probability estimation
  • Methods of Density Ratio Estimation

15
Mutual Information Estimation
  • Mutual information (MI):
    MI(X,Y) = ∬ p(x,y) log [ p(x,y) / (p(x) p(y)) ] dx dy
  • MI as an independence measure:
    MI(X,Y) = 0 if and only if x and y are
    statistically independent.
  • MI can be approximated using the density ratio
    r(x,y) = p(x,y) / (p(x) p(y)).
Suzuki, Sugiyama & Tanaka (ISIT2009)
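As an illustrative sketch (not from the slides), the approximation is simply the sample average of the log-ratio at paired samples; the closed-form Gaussian log-ratio below is used only to sanity-check the estimator:

```python
import numpy as np

def mi_from_ratio(r_vals):
    """MI(X,Y) ≈ (1/n) * sum_i log r(x_i, y_i), where
    r = p(x, y) / (p(x) p(y)) is evaluated at paired samples."""
    return float(np.mean(np.log(r_vals)))

# Sanity check with a bivariate Gaussian (unit variances, correlation rho),
# where the log-ratio is known in closed form and MI = -log(1 - rho^2) / 2.
rho = 0.8
rng = np.random.default_rng(0)
n = 20000
x = rng.normal(size=n)
y = rho * x + np.sqrt(1.0 - rho**2) * rng.normal(size=n)
log_r = (-(x**2 - 2.0 * rho * x * y + y**2) / (2.0 * (1.0 - rho**2))
         + (x**2 + y**2) / 2.0 - 0.5 * np.log(1.0 - rho**2))
mi_hat = mi_from_ratio(np.exp(log_r))
```

In practice the ratio values would come from a direct density-ratio estimator rather than a closed form.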
16
Usage of MI Estimator
  • Independence between input and output
  • Feature selection
  • Sufficient dimension reduction
  • Independence between inputs
  • Independent component analysis
  • Independence between input and residual
  • Causal inference

Suzuki, Sugiyama, Sese & Kanamori (BMC
Bioinformatics 2009)
Suzuki & Sugiyama (AISTATS2010)
Suzuki & Sugiyama (ICA2009)
Yamada & Sugiyama (AAAI2010)
[Diagram: input x and output y (feature selection, sufficient dimension reduction); inputs x and x (independent component analysis); input and residual e (causal inference).]
17
Organization of My Talk
  • Usage of Density Ratios
  • Covariate shift adaptation
  • Outlier detection
  • Mutual information estimation
  • Conditional probability estimation
  • Methods of Density Ratio Estimation

18
Conditional Probability Estimation
  • When the output y is continuous:
  • Conditional density estimation
  • e.g., transition estimation for mobile robots
  • When the output y is categorical:
  • Probabilistic classification

Sugiyama, Takeuchi, Suzuki, Kanamori, Hachiya &
Okanohara (AISTATS2010)
[Figure: a pattern classified as Class 1 (80) vs. Class 2 (20).]
Sugiyama (Submitted)
19
Organization of My Talk
  • Usage of Density Ratios
  • Methods of Density Ratio Estimation
  • Probabilistic Classification Approach
  • Moment matching Approach
  • Ratio matching Approach
  • Comparison
  • Dimensionality Reduction

20
Density Ratio Estimation
  • Density ratios are shown to be versatile.
  • In practice, however, the density ratio should be
    estimated from data.
  • Naïve approach: estimate the two densities
    separately and take their ratio.
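A minimal one-dimensional sketch of this naive plug-in (all names and parameter choices are ours):

```python
import numpy as np

def kde(data, sigma):
    """One-dimensional Gaussian kernel density estimator."""
    def p(x):
        d = x[:, None] - data[None, :]
        return np.exp(-d**2 / (2.0 * sigma**2)).mean(axis=1) / (sigma * np.sqrt(2.0 * np.pi))
    return p

def kde_ratio(x_nu, x_de, sigma=0.3):
    """Naive plug-in: fit each density separately, then divide.
    Errors in the denominator estimate are magnified in the ratio,
    and the approach degrades quickly with dimensionality."""
    p_nu, p_de = kde(x_nu, sigma), kde(x_de, sigma)
    return lambda x: p_nu(x) / p_de(x)

# Numerator N(1,1), denominator N(0,1): the true ratio is exp(x - 1/2).
rng = np.random.default_rng(0)
r = kde_ratio(rng.normal(1.0, 1.0, 2000), rng.normal(0.0, 1.0, 2000))
```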

21
Vapnik's Principle
"When solving a problem, don't solve more
difficult problems as an intermediate step."
Knowing densities ⇒ Knowing ratio
  • Estimating density ratio is substantially easier
    than estimating densities!
  • We estimate density ratio without going through
    density estimation.

22
Organization of My Talk
  • Usage of Density Ratios
  • Methods of Density Ratio Estimation
  • Probabilistic Classification Approach
  • Moment Matching Approach
  • Ratio Matching Approach
  • Comparison
  • Dimensionality Reduction

23
Probabilistic Classification
  • Separate numerator and denominator samples by a
    probabilistic classifier
  • Logistic regression achieves the minimum
    asymptotic variance when the model is correctly specified.

Qin (Biometrika1998)
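A hedged sketch of this approach with a hand-rolled logistic regression (the quadratic features and gradient-ascent settings are our illustrative choices):

```python
import numpy as np

def ratio_by_classification(x_nu, x_de, n_iter=2000, lr=0.1):
    """Label numerator samples y=1 and denominator samples y=0, fit a
    logistic regression, and convert the posterior into a ratio:
    r(x) = (n_de / n_nu) * P(y=1|x) / P(y=0|x)."""
    X = np.concatenate([x_nu, x_de])
    y = np.concatenate([np.ones(len(x_nu)), np.zeros(len(x_de))])
    feats = lambda x: np.column_stack([np.ones(len(x)), x, x**2])
    Phi = feats(X)
    w = np.zeros(Phi.shape[1])
    for _ in range(n_iter):  # gradient ascent on the mean log-likelihood
        p = 1.0 / (1.0 + np.exp(-Phi @ w))
        w += lr * Phi.T @ (y - p) / len(y)
    def r(x):
        p = 1.0 / (1.0 + np.exp(-feats(x) @ w))
        return (len(x_de) / len(x_nu)) * p / (1.0 - p)
    return r

# Numerator N(1,1), denominator N(0,1): the true ratio is exp(x - 1/2).
rng = np.random.default_rng(0)
r = ratio_by_classification(rng.normal(1.0, 1.0, 500), rng.normal(0.0, 1.0, 500))
```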
24
Organization of My Talk
  • Usage of Density Ratios
  • Methods of Density Ratio Estimation
  • Probabilistic Classification Approach
  • Moment Matching Approach
  • Ratio Matching Approach
  • Comparison
  • Dimensionality Reduction

25
Moment Matching
  • Match the moments of the numerator density and
    the weighted denominator density.
  • Ex) Matching the mean
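A toy sketch of the idea (not KMM itself): choose nonnegative weights on the denominator samples so that their weighted first two moments match the numerator's, via projected gradient descent; the feature map and step size are our assumptions:

```python
import numpy as np

def mean_matching_weights(x_de, x_nu, n_iter=2000, lr=50.0):
    """Choose nonnegative weights w on the denominator samples so that
    the weighted first and second moments of x_de match those of x_nu,
    via projected gradient descent on the squared moment discrepancy."""
    phi = lambda x: np.column_stack([x, x**2])   # first two moments
    target = phi(x_nu).mean(axis=0)
    P = phi(x_de)
    w = np.ones(len(x_de))
    for _ in range(n_iter):
        diff = (w[:, None] * P).mean(axis=0) - target
        w -= lr * (P @ diff) / len(x_de)         # gradient step
        w = np.clip(w, 0.0, None)                # project onto w >= 0
    return w

# Denominator N(0,1), numerator N(1,1).
rng = np.random.default_rng(0)
x_de = rng.normal(0.0, 1.0, 500)
x_nu = rng.normal(1.0, 1.0, 500)
w = mean_matching_weights(x_de, x_nu)
```

Matching finitely many moments does not pin down the true ratio, which is the motivation for kernel mean matching on the next slide.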

26
Kernel Mean Matching
Huang, Smola, Gretton, Borgwardt Schölkopf
(NIPS2006)
  • Matching a finite number of moments does not
    yield the true ratio even asymptotically.
  • Kernel mean matching: all the moments are
    efficiently matched using a Gaussian RKHS.
  • Ratio values at the samples can
    be estimated via a quadratic program.
  • Variants for learning the entire ratio function
    and for other loss functions are also available.

Kanamori, Suzuki & Sugiyama (arXiv2009)
27
Organization of My Talk
  • Usage of Density Ratios
  • Methods of Density Ratio Estimation
  • Probabilistic Classification Approach
  • Moment Matching Approach
  • Ratio Matching Approach
  • Kullback-Leibler Divergence
  • Squared Distance
  • Comparison
  • Dimensionality Reduction

28
Ratio Matching
Sugiyama, Suzuki & Kanamori (RIMS2010)
  • Match a ratio model r(x) with the true ratio
    w(x) = p_nu(x)/p_de(x) under the Bregman divergence:
    BR_f(w ∥ r) = ∫ p_de(x) [ f(w(x)) − f(r(x))
      − f′(r(x)) (w(x) − r(x)) ] dx
  • Its empirical approximation (ignoring constants
    in w) is given by
    (1/n_de) Σ_j [ f′(r(x_j^de)) r(x_j^de) − f(r(x_j^de)) ]
      − (1/n_nu) Σ_i f′(r(x_i^nu))
29
Organization of My Talk
  • Usage of Density Ratios
  • Methods of Density Ratio Estimation
  • Probabilistic Classification Approach
  • Moment Matching Approach
  • Ratio Matching Approach
  • Kullback-Leibler Divergence
  • Squared Distance
  • Comparison
  • Dimensionality Reduction

30
Kullback-Leibler Divergence (KL)
Sugiyama, Nakajima, Kashima, von Bünau & Kawanabe
(NIPS2007); Nguyen, Wainwright & Jordan (NIPS2007)
  • Put f(t) = t log t − t. Then BR yields the
    (unnormalized) Kullback-Leibler divergence, whose
    empirical version is
    (1/n_de) Σ_j r(x_j^de) − (1/n_nu) Σ_i log r(x_i^nu)
  • For a linear model r(x) = Σ_l α_l K(x, x_l)
    (e.g., a Gaussian kernel),
    minimization of KL is convex.
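A minimal KLIEP-style sketch under these definitions (kernel centers, width, and the projected-gradient settings are illustrative choices):

```python
import numpy as np

def kliep(x_nu, x_de, centers, sigma=1.0, n_iter=500, lr=0.1):
    """Direct KL-based estimation sketch: model r(x) = sum_l a_l K(x, c_l)
    with Gaussian kernels, maximize (1/n_nu) sum_i log r(x_i^nu) subject
    to a >= 0 and the constraint (1/n_de) sum_j r(x_j^de) = 1."""
    K = lambda x: np.exp(-(x[:, None] - centers[None, :])**2 / (2.0 * sigma**2))
    K_nu, K_de = K(x_nu), K(x_de)
    b = K_de.mean(axis=0)
    a = np.ones(len(centers))
    a /= a @ b                                   # start on the constraint surface
    for _ in range(n_iter):
        grad = K_nu.T @ (1.0 / (K_nu @ a)) / len(x_nu)
        a = np.clip(a + lr * grad, 0.0, None)    # ascend, keep a >= 0
        a /= a @ b                               # re-impose the normalization
    return lambda x: K(x) @ a

# Numerator N(1,1), denominator N(0,1): the true ratio is exp(x - 1/2).
rng = np.random.default_rng(0)
x_nu = rng.normal(1.0, 1.0, 500)
x_de = rng.normal(0.0, 1.0, 500)
r = kliep(x_nu, x_de, centers=x_nu[:20])
```

The normalization constraint makes the estimated ratio average to one over the denominator samples, as the KL objective requires.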
31
Properties
Nguyen, Wainwright & Jordan (NIPS2007); Sugiyama,
Suzuki, Nakajima, Kashima, von Bünau & Kawanabe
(AISM2008)
  • Parametric case:
  • The learned parameter converges to the optimal
    value at order 1/√n.
  • Non-parametric case:
  • The learned function converges to the optimal
    function at a rate slightly slower than 1/√n
    (depending on the covering number or the
    bracketing entropy of the function class).

32
Experiments
  • Setup: d-dimensional Gaussians with identity
    covariance:
  • Denominator mean: (0,0,0,…,0)
  • Numerator mean: (1,0,0,…,0)
  • Ratio of kernel density estimators (RKDE):
  • Estimate the two densities separately and take
    their ratio.
  • The Gaussian width is chosen by CV.
  • KL method:
  • Estimate the density ratio directly.
  • The Gaussian width is chosen by CV.

33
Accuracy as a Functionof Input Dimensionality
[Plot: normalized MSE vs. input dimensionality for RKDE and KL.]
  • RKDE: suffers from the curse of dimensionality
  • KL: works well

34
Variations
  • EM algorithms for KL are also available for:
  • Log-linear models
  • Gaussian mixture models
  • Probabilistic PCA mixture models

Tsuboi, Kashima, Hido, Bickel & Sugiyama (JIP2009)
Yamada & Sugiyama (IEICE2009)
Yamada, Sugiyama, Wichern & Jaak (Submitted)
35
Organization of My Talk
  • Usage of Density Ratios
  • Methods of Density Ratio Estimation
  • Probabilistic Classification Approach
  • Moment Matching Approach
  • Ratio Matching Approach
  • Kullback-Leibler Divergence
  • Squared Distance
  • Comparison
  • Dimensionality Reduction

36
Squared Distance (SQ)
Kanamori, Hido & Sugiyama (JMLR2009)
  • Put f(t) = (t − 1)²/2. Then BR yields the squared
    distance, whose empirical version is
    (1/(2 n_de)) Σ_j r(x_j^de)² − (1/n_nu) Σ_i r(x_i^nu)
  • For a linear model r(x) = Σ_l α_l K(x, x_l),
    the minimizer of SQ is given in closed form.
  • Computationally efficient!
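A minimal sketch of the closed-form (uLSIF-style) solution a = (H + λI)⁻¹h under a Gaussian kernel model; the regularizer λ and kernel width are our illustrative choices:

```python
import numpy as np

def ulsif(x_nu, x_de, centers, sigma=1.0, lam=0.1):
    """Squared-distance fit of r(x) = sum_l a_l K(x, c_l):
    a = (H + lam*I)^{-1} h, where H is the denominator second-moment
    matrix of the kernel features and h is their numerator mean."""
    K = lambda x: np.exp(-(x[:, None] - centers[None, :])**2 / (2.0 * sigma**2))
    H = K(x_de).T @ K(x_de) / len(x_de)
    h = K(x_nu).mean(axis=0)
    a = np.linalg.solve(H + lam * np.eye(len(centers)), h)
    return lambda x: np.clip(K(x) @ a, 0.0, None)   # ratios are nonnegative

# Numerator N(1,1), denominator N(0,1): the true ratio is exp(x - 1/2).
rng = np.random.default_rng(0)
x_nu = rng.normal(1.0, 1.0, 500)
x_de = rng.normal(0.0, 1.0, 500)
r = ulsif(x_nu, x_de, centers=x_nu[:20])
```

Unlike the KL method, no iterative optimization is needed: a single linear solve gives the estimate, which is what makes SQ fast.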

37
Properties
Kanamori, Hido & Sugiyama (JMLR2009)
Kanamori, Suzuki & Sugiyama (arXiv2009)
  • Optimal parametric/non-parametric rates of
    convergence are achieved.
  • Leave-one-out cross-validation score can be
    computed analytically.
  • SQ has the smallest condition number.
  • The analytic solution allows computation of the
    derivative of the mutual information estimator:
  • Sufficient dimension reduction
  • Independent component analysis
  • Causal inference

38
Organization of My Talk
  • Usage of Density Ratios
  • Methods of Density Ratio Estimation
  • Probabilistic Classification Approach
  • Moment Matching Approach
  • Ratio Matching Approach
  • Comparison
  • Dimensionality Reduction

39
Density Ratio Estimation Methods
Method | Density estimation | Domains of de/nu | Model selection | Computation time
------ | ------------------ | ---------------- | --------------- | ----------------
RKDE   | Involved           | Could differ     | Possible        | Very fast
LR     | Free               | Same?            | Possible        | Slow
KMM    | Free               | Same?            | ???             | Slow
KL     | Free               | Could differ     | Possible        | Slow
SQ     | Free               | Could differ     | Possible        | Fast
  • SQ would be preferable in practice.

40
Organization of My Talk
  • Usage of Density Ratios
  • Methods of Density Ratio Estimation
  • Probabilistic Classification Approach
  • Moment Matching Approach
  • Ratio Matching Approach
  • Comparison
  • Dimensionality Reduction

41
Direct Density Ratio Estimationwith
Dimensionality Reduction (D3)
  • Direct density ratio estimation without going
    through density estimation is promising.
  • However, for high-dimensional data, density
    ratio estimation is still challenging.
  • We combine direct density ratio estimation with
    dimensionality reduction!

42
Hetero-distributional Subspace (HS)
  • Key assumption: the numerator and denominator
    densities differ only in a subspace (called the HS).
  • This allows us to estimate the density ratio only
    within the low-dimensional HS!

[Figure: decomposition of the input space by a full-rank, orthogonal transformation into the HS and its complement.]
43
HS Search Based onSupervised Dimension Reduction
Sugiyama, Kawanabe & Chui (NN2010)
  • Within the HS, the numerator and denominator
    densities are different, so samples from the two
    distributions would be separable.
  • We may use any supervised dimension reduction
    method for HS search.
  • Local Fisher discriminant analysis (LFDA)
  • Can handle within-class multimodality
  • No limitation on reduced dimensions
  • Analytic solution available

Sugiyama (JMLR2007)
44
Pros and Cons of Supervised
Dimension Reduction Approach
  • Pros
  • SQ + LFDA gives an analytic solution for D3.
  • Cons
  • Rather restrictive implicit assumption that the
    components within and outside the HS are
    independent.
  • Separability does not always lead to the HS
    (e.g., two overlapping but different densities).
45
Characterization of HS
Sugiyama, Hara, von Bünau, Suzuki, Kanamori &
Kawanabe (SDM2010)
  • HS is given as the maximizer of the Pearson
    divergence
  • Thus, we can identify the HS by finding the
    maximizer of PE with respect to the projection
    matrix.
  • PE can be analytically approximated by SQ!

46
HS Search
  • Gradient descent
  • Natural gradient descent
  • Utilize the Stiefel-manifold structure for
    efficiency
  • Givens rotation
  • Choose a pair of axes and rotate in the subspace
  • Subspace rotation
  • Ignore rotation within subspaces for efficiency

Amari (NeuralComp1998)
Plumbley (NeuroComp2005)

[Figure: rotation across vs. within the hetero-distributional subspace.]
47
Experiments
[Figure: true ratio (2d), samples (2d), and plain-SQ estimate (2d); comparison of plain SQ and D3-SQ as dimensionality is increased by adding noisy dimensions.]
48
Conclusions
  • Many ML tasks can be solved via density ratio
    estimation
  • Non-stationarity adaptation, domain adaptation,
    multi-task learning, outlier detection, change
    detection in time series, feature selection,
    dimensionality reduction, independent component
    analysis, conditional density estimation,
    classification, two-sample test
  • Directly estimating density ratios without going
    through density estimation is the key.
  • LR, KMM, KL, SQ, and D3.

49
Books
  • Quiñonero-Candela, Sugiyama, Schwaighofer &
    Lawrence (Eds.), Dataset Shift in Machine
    Learning, MIT Press, 2009.
  • Sugiyama, von Bünau, Kawanabe & Müller, Covariate
    Shift Adaptation in Machine Learning, MIT Press
    (coming soon!)
  • Sugiyama, Suzuki & Kanamori,
    Density Ratio Estimation in Machine Learning,
    Cambridge University Press (coming soon!)

50
The World of Density Ratios
Real-world applications: brain-computer interface, robot control, image understanding, speech recognition, natural language processing, bioinformatics
Machine learning algorithms: covariate shift adaptation, outlier detection, conditional probability estimation, mutual information estimation
Density ratio estimation: fundamental algorithms (LR, KMM, KL, SQ); large-scale, high-dimensionality, stabilization, robustification
Theoretical analysis: convergence, information criteria, numerical stability