Transcript and Presenter's Notes

Title: Learning with Limited Supervision by Input and Output Coding


1
Learning with Limited Supervision by Input and
Output Coding
  • Yi Zhang
  • Machine Learning Department
  • Carnegie Mellon University
  • April 30th, 2012

2
Thesis Committee
  • Jeff Schneider, Chair
  • Geoff Gordon
  • Tom Mitchell
  • Xiaojin (Jerry) Zhu, University of
    Wisconsin-Madison

3
Introduction
(x1, y1), ..., (xn, yn)
  • Learning a prediction system, usually based on
    examples
  • Training examples are usually limited
  • Cost of obtaining high-quality examples
  • Complexity of the prediction problem

[Diagram: learning a mapping from the input space X to the output space Y]
4
Introduction
(x1, y1), ..., (xn, yn)

  • Solution: exploit extra information about the
    input and output spaces
  • Improve the prediction performance
  • Reduce the cost for collecting training examples

5
Introduction
(x1, y1), ..., (xn, yn)
  • Solution: exploit extra information about the
    input and output spaces
  • Representation and discovery?
  • Incorporation?

6
Outline
Part I Encoding Input Information by
Regularization
Learning with word correlation
A matrix-normal penalty for multi-task learning
Learn compressible models
Projection penalties
Part II Encoding Output Information by Output
Codes
Composite likelihood for pairwise coding
Multi-label output codes with CCA
Maximum-margin output coding
7
Regularization
  • The general formulation
  • Ridge regression
  • Lasso
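To make the two penalties concrete, here is a minimal sketch using scikit-learn on synthetic data (the data and hyperparameter values are illustrative, not from the talk):

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

# Toy regression data: only the first 3 of 20 features matter.
rng = np.random.RandomState(0)
X = rng.randn(50, 20)
w_true = np.zeros(20)
w_true[:3] = [2.0, -1.5, 1.0]
y = X @ w_true + 0.1 * rng.randn(50)

# Ridge: squared-L2 penalty, shrinks all coefficients smoothly.
ridge = Ridge(alpha=1.0).fit(X, y)
# Lasso: L1 penalty, drives many coefficients exactly to zero.
lasso = Lasso(alpha=0.1).fit(X, y)

print("ridge nonzero coefficients:", np.sum(np.abs(ridge.coef_) > 1e-6))
print("lasso nonzero coefficients:", np.sum(np.abs(lasso.coef_) > 1e-6))
```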

8
Outline
Part I Encoding Input Information by
Regularization
Learning with word correlation
A matrix-normal penalty for multi-task learning
Learn compressible models
Projection penalties
Part II Encoding Output Information by Output
Codes
Composite likelihood for pairwise coding
Multi-label output codes with CCA
Maximum-margin output coding
9
Learning with unlabeled text
  • For a text classification task
  • Plenty of unlabeled text on the Web
  • Seemingly unrelated to the task
  • What can we gain from such unlabeled text?

Yi Zhang, Jeff Schneider and Artur Dubrawski.
Learning the Semantic Correlation: An Alternative
Way to Gain from Unlabeled Text. NIPS 2008
10
A motivating example for text learning
  • Humans learn text classification effectively!
  • Two training examples
  • + gasoline, truck
  • - vote, election
  • Query
  • gallon, vehicle
  • Seems very easy! But why?

11
A motivating example for text learning
  • Humans learn text classification effectively!
  • Two training examples
  • + gasoline, truck
  • - vote, election
  • Query
  • gallon, vehicle
  • Seems very easy! But why?
  • Gasoline ↔ gallon, truck ↔ vehicle

12
A covariance operator for regularization
  • Covariance structure of model coefficients
  • Usually unknown -- learn from unlabeled text?

13
Learning with unlabeled text
  • Infer the covariance operator
  • Extract latent topics from unlabeled text (with
    resampling)
  • Observe the contribution of words in each topic
  • gas 0.3, gallon 0.2, truck 0.2, safety
    0.2,
  • Estimate the correlation (covariance) of
    words

14
Learning with unlabeled text
  • Infer the covariance operator
  • Extract latent topics from unlabeled text (with
    resampling)
  • Observe the contribution of words in each topic
  • gas 0.3, gallon 0.2, truck 0.2, safety
    0.2,
  • Estimate the correlation (covariance) of
    words
  • For a new task, we learn with regularization
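A minimal sketch of this pipeline, assuming a topic-word matrix is already available from the unlabeled text (e.g., from a topic model). The covariance of word weights across topics plays the role of the operator above, and the new task is learned with the penalty w'Σ⁻¹w instead of the usual w'w. All names and data below are illustrative:

```python
import numpy as np

def word_covariance_from_topics(topic_word, jitter=1e-3):
    """Estimate a word-word covariance from per-topic word contributions.
    topic_word: (n_topics, n_words) array, e.g. topic-word weights extracted
    from unlabeled text (a stand-in for the talk's estimator)."""
    return np.cov(topic_word, rowvar=False) + jitter * np.eye(topic_word.shape[1])

def regularized_least_squares(X, y, Sigma, lam=1.0):
    """Closed-form minimizer of ||Xw - y||^2 + lam * w' Sigma^{-1} w."""
    return np.linalg.solve(X.T @ X + lam * np.linalg.inv(Sigma), X.T @ y)

# Toy vocabulary: [gasoline, gallon, vote, election]; the first two words
# co-occur in topics, so their coefficients are encouraged to agree.
topic_word = np.array([[0.30, 0.20, 0.00, 0.00],
                       [0.40, 0.30, 0.05, 0.00],
                       [0.00, 0.05, 0.50, 0.40]])
Sigma = word_covariance_from_topics(topic_word)
X = np.array([[1.0, 0.0, 0.0, 0.0],    # document containing only "gasoline" (+)
              [0.0, 0.0, 1.0, 0.0]])   # document containing only "vote"     (-)
y = np.array([1.0, -1.0])
print(regularized_least_squares(X, y, Sigma))  # weight should leak to "gallon"
```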

15
Experiments
  • Empirical results on 20 newsgroups
  • 190 1-vs-1 classification tasks, 2 labeled
    examples
  • For any task, the majority of the unlabeled text (18 of
    the 20 newsgroups) is irrelevant
  • Similar results on logistic regression and least
    squares

[1] V. Sindhwani and S. Keerthi. Large scale
semi-supervised linear SVMs. In SIGIR, 2006
16
Outline
Part I Encoding Input Information by
Regularization
Multi-task generalization
Learning with word correlation
A matrix-normal penalty for multi-task learning
Learn compressible models
Projection penalties
Part II Encoding Output Information by Output
Codes
Composite likelihood for pairwise coding
Multi-label output codes with CCA
Maximum-margin output coding
17
Multi-task learning
  • Different but related prediction tasks
  • An example
  • Landmine detection using radar images
  • Multiple tasks: different landmine fields
  • Geographic conditions
  • Landmine types
  • Goal: information sharing among tasks

18
Regularization for multi-task learning
  • Our approach: view MTL as estimating a parameter
    matrix

W
19
Regularization for multi-task learning
  • Our approach: view MTL as estimating a parameter
    matrix
  • A covariance operator for regularizing a matrix?
  • Vector w
  • Matrix W

W
(Gaussian prior)
Yi Zhang and Jeff Schneider. Learning Multiple
Tasks with a Sparse Matrix-Normal Penalty. NIPS
2010
20
Matrix-normal distributions
  • Consider a 2-by-3 matrix W
  • The full covariance is the Kronecker product of
    the row covariance and the column covariance

21
Matrix-normal distributions
  • Consider a 2-by-3 matrix W
  • The full covariance is the Kronecker product of
    the row covariance and the column covariance
  • The matrix-normal density offers a compact form
    for this structured full covariance
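The Kronecker structure is easy to check numerically: if W = A Z B' with i.i.d. standard normal Z and AA', BB' equal to the row and column covariances, then the covariance of the row-stacked vec(W) is their Kronecker product. A small verification sketch, not from the talk:

```python
import numpy as np

q, d = 2, 3                                  # 2 rows (tasks), 3 columns (features)
Omega = np.array([[1.0, 0.6],                # row covariance (q x q)
                  [0.6, 1.0]])
Sigma = np.array([[1.0, 0.3, 0.0],           # column covariance (d x d)
                  [0.3, 1.0, 0.2],
                  [0.0, 0.2, 1.0]])

full_cov = np.kron(Omega, Sigma)             # covariance of vec(W): (qd x qd) = 6 x 6

# Sample W ~ MN(0, Omega, Sigma) by coloring i.i.d. noise and compare.
A, B = np.linalg.cholesky(Omega), np.linalg.cholesky(Sigma)
rng = np.random.RandomState(0)
samples = np.stack([(A @ rng.randn(q, d) @ B.T).ravel() for _ in range(100000)])
print(np.allclose(np.cov(samples, rowvar=False), full_cov, atol=0.03))  # ~True
```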

22
Learning with a matrix-normal penalty
  • Joint learning of multiple tasks
  • Alternating optimization

Matrix-normal prior
23
Learning with a matrix-normal penalty
  • Joint learning of multiple tasks
  • Alternating optimization
  • Other recent work can be viewed as variants of special cases
  • Multi-task feature learning [Argyriou et al., NIPS 06]:
    learning with the feature covariance
  • Clustered multi-task learning [Jacob et al., NIPS 08]:
    learning with the task covariance and spectral constraints
  • Multi-task relationship learning [Zhang et al., UAI 10]:
    learning with the task covariance

Matrix-normal prior
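A rough sketch of the alternating scheme under squared loss (my own simplified formulation of the objective and covariance updates, not the exact algorithm from the paper): with the task covariance Ω and feature covariance Σ fixed, W is a single regularized least-squares solve; with W fixed, the two covariances are re-estimated from W, with a small ridge to keep them invertible.

```python
import numpy as np
from scipy.linalg import block_diag

def multitask_matrix_normal(Xs, ys, lam=1.0, n_iters=10, eps=1e-2):
    """Sketch of alternating optimization for
        min_W  sum_t ||X_t w_t - y_t||^2 + lam * tr(Omega^{-1} W Sigma^{-1} W'),
    where row t of W is the coefficient vector of task t."""
    q, d = len(Xs), Xs[0].shape[1]
    Omega, Sigma = np.eye(q), np.eye(d)
    XtX = block_diag(*[X.T @ X for X in Xs])                 # one block per task
    Xty = np.concatenate([X.T @ y for X, y in zip(Xs, ys)])
    for _ in range(n_iters):
        # W-step: the penalty is quadratic in vec(W), so one linear solve suffices.
        penalty = lam * np.kron(np.linalg.inv(Omega), np.linalg.inv(Sigma))
        W = np.linalg.solve(XtX + penalty, Xty).reshape(q, d)
        # Covariance step: flip-flop re-estimation from W, ridged for stability.
        Omega = W @ np.linalg.inv(Sigma) @ W.T / d + eps * np.eye(q)
        Sigma = W.T @ np.linalg.inv(Omega) @ W / q + eps * np.eye(d)
    return W, Omega, Sigma
```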
24
Sparse covariance selection
  • Sparse covariance selection in matrix-normal
    penalties
  • Sparsity of the inverse row and column covariances
  • Conditional independence between rows (tasks) and
    between columns (feature dimensions) of W

25
Sparse covariance selection
  • Sparse covariance selection in matrix-normal
    penalties
  • Sparsity of the inverse row and column covariances
  • Conditional independence between rows (tasks) and
    between columns (feature dimensions) of W
  • Alternating optimization
  • Estimating W: same as before
  • Estimating the row and column covariances: L1-penalized
    covariance estimation
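The L1-penalized covariance estimation step can be sketched with an off-the-shelf graphical lasso solver (used here as a stand-in for the estimator in the thesis); zeros in the resulting precision matrix correspond to conditional independences between tasks:

```python
import numpy as np
from sklearn.covariance import graphical_lasso

# Toy parameter matrix W: 5 tasks x 40 features, with tasks 0 and 1 made similar.
rng = np.random.RandomState(0)
W = rng.randn(5, 40)
W[1] = W[0] + 0.1 * rng.randn(40)

emp_task_cov = np.cov(W) + 1e-3 * np.eye(5)        # empirical task (row) covariance
cov, precision = graphical_lasso(emp_task_cov, alpha=0.1)

# Zero entries of the precision (inverse covariance) encode conditional
# independence between the corresponding tasks.
print(np.round(precision, 2))
```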

26
Results on multi-task learning
  • Landmine detection: multiple landmine fields
  • Face recognition: multiple 1-vs-1 tasks

[1] Jacob, Bach, and Vert. Clustered multi-task
learning: a convex formulation. NIPS, 2008
[2] Argyriou, Evgeniou, and Pontil. Multi-task
feature learning. NIPS, 2006
27
Outline
Part I Encoding Input Information by
Regularization
Multi-task generalization
Learning with word correlation
A matrix-normal penalty for multi-task learning
Go beyond covariance and correlation structures
Learn compressible models
Projection penalties
Part II Encoding Output Information by Output
Codes
Composite likelihood for pairwise coding
Multi-label output codes with CCA
Maximum-margin output coding
28
Learning compressible models
  • Learning compressible models
  • A compression operator P in the penalty, instead of
    penalizing the coefficients w directly
  • Bias toward model compressibility

Yi Zhang, Jeff Schneider and Artur Dubrawski.
Learning Compressible Models. SDM 2010
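A minimal sketch of the resulting learning problem (my own proximal-gradient formulation, not the thesis code): penalize ||Pw||₁ with P an orthonormal DCT, so the estimate is biased toward coefficient vectors whose energy concentrates in a few frequencies rather than a few coordinates.

```python
import numpy as np
from scipy.fft import dct

def compressible_lasso(X, y, lam=5.0, n_iters=1000):
    """Proximal-gradient sketch for  min_w ||Xw - y||^2 + lam * ||P w||_1,
    with P an orthonormal 1-D DCT used as the compression operator."""
    d = X.shape[1]
    P = dct(np.eye(d), norm='ortho', axis=0)        # orthonormal DCT matrix
    lr = 0.9 / (2 * np.linalg.norm(X, 2) ** 2)      # safe step size for the smooth part
    z = np.zeros(d)                                 # optimize z = P w
    for _ in range(n_iters):
        w = P.T @ z                                 # w = P^{-1} z (P is orthonormal)
        z -= lr * (P @ (2 * X.T @ (X @ w - y)))     # gradient step on the data term
        z = np.sign(z) * np.maximum(np.abs(z) - lr * lam, 0.0)   # soft-threshold
    return P.T @ z

# Toy use: a smooth coefficient vector is sparse in the DCT domain.
rng = np.random.RandomState(0)
d = 64
w_true = np.sin(np.linspace(0, 3 * np.pi, d))
X = rng.randn(200, d)
y = X @ w_true + 0.1 * rng.randn(200)
w_hat = compressible_lasso(X, y)
print(np.sum(np.abs(dct(w_hat, norm='ortho')) > 1e-3), "active DCT coefficients")
```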
29
Energy compaction
  • Image energy is concentrated at a few frequencies

JPEG (2D-DCT), 46:1 compression
30
Energy compaction
  • Image energy is concentrated at a few frequencies
  • Models need to operate at relevant frequencies

JPEG (2D-DCT), 46:1 compression
2D-DCT
31
Digit recognition
  • Sparse vs. compressible
  • Model coefficients w

[Figure: sparse vs. compressible model coefficients w, the compressed coefficients Pw, and w displayed as an image]
32
Outline
Part I Encoding Input Information by
Regularization
Multi-task generalization
Learning with word correlation
A matrix-normal penalty for multi-task learning
Go beyond covariance and correlation structures
Encode a dimension reduction
Learn compressible models
Projection penalties
Part II Encoding Output Information by Output
Codes
Composite likelihood for pairwise coding
Multi-label output codes with CCA
Maximum-margin output coding
33
Dimension reduction
  • Dimension reduction conveys information about the
    input space
  • Feature selection → importance
  • Feature clustering → granularity
  • Feature extraction → more general structures

34
How to use a dimension reduction?
  • However, any reduction loses certain information
  • May be relevant to a prediction task
  • Goal of projection penalties
  • Encode useful information from a dimension
    reduction
  • Control the risk of potential information loss

Yi Zhang and Jeff Schneider. Projection Penalties:
Dimension Reduction without Loss. ICML 2010
35
Projection penalties the basic idea
  • The basic idea
  • Observation: reducing the feature space restricts
    the model search to a model subspace MP
  • Solution: still search in the full model space M,
    and penalize the projection distance to the model
    subspace MP

36
Projection penalties linear cases
  • Learn with a (linear) dimension reduction P

37
Projection penalties linear cases
  • Learn with projection penalties
  • Optimization

projection distance
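A rough sketch of the linear case under squared loss (my own closed-form variant, with the Euclidean norm as the projection distance): fit w in the full space, but add λ·min_v ||w − P'v||², the squared distance of w from the model subspace that the reduction P would allow.

```python
import numpy as np

def projection_penalty_least_squares(X, y, P, lam=1.0):
    """min_w ||Xw - y||^2 + lam * ||(I - Pi) w||^2, where Pi is the orthogonal
    projector onto the row space of the reduction P (p x d).  Sketch only."""
    d = X.shape[1]
    Pi = P.T @ np.linalg.pinv(P.T)          # projector onto the reduced model subspace
    R = np.eye(d) - Pi                      # residual (distance-to-subspace) operator
    return np.linalg.solve(X.T @ X + lam * R, X.T @ y)   # R is symmetric idempotent

# lam -> infinity recovers learning inside the reduced space only;
# lam -> 0 recovers ordinary least squares in the full model space.
```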
38
Projection penalties nonlinear cases
[Diagram: full model space M, model subspace MP induced by the dimension reduction P (mapping Rd to Rp), the projected model wP, and a feature map Φ from X to a feature space F for the nonlinear case]

Yi Zhang and Jeff Schneider. Projection Penalties:
Dimension Reduction without Loss. ICML 2010
39
Projection penalties nonlinear cases
[Diagram: the same construction carried out in the kernel-induced feature space F: model space M, model subspace MP, reduction P, and projected model wP]

Yi Zhang and Jeff Schneider. Projection Penalties:
Dimension Reduction without Loss. ICML 2010
40
Empirical results
  • Text classification (20 newsgroups), using
    logistic regression
  • Dimension reduction: latent Dirichlet allocation

[Chart: classification errors for the Original features, the Reduction alone, and the Projection Penalty]
41
Empirical results
  • Text classification (20 newsgroups), using
    logistic regression
  • Dimension reduction: latent Dirichlet allocation

[Chart: classification errors]
Similar results on face recognition, using SVM
(poly-2); dimension reduction: KPCA, KDA,
OLaplacianFace. Similar results on house price
prediction, using regression; dimension reduction:
PCA and partial least squares.
42
Outline
Part I Encoding Input Information by
Regularization
Multi-task generalization
Learning with word correlation
A matrix-normal penalty for multi-task learning
Go beyond covariance and correlation structures
Encode a dimension reduction
Learn compressible models
Projection penalties
Part II Encoding Output Information by Output
Codes
Composite likelihood for pairwise coding
Multi-label output codes with CCA
Maximum-margin output coding
43
Outline
Part I Encoding Input Information by
Regularization
Multi-task generalization
Learning with word correlation
A matrix-normal penalty for multi-task learning
Go beyond covariance and correlation structures
Encode a dimension reduction
Learn compressible models
Projection penalties
Part II Encoding Output Information by Output
Codes
Composite likelihood for pairwise coding
Multi-label output codes with CCA
Maximum-margin output coding
44
Multi-label classification
  • Multi-label classification
  • Existence of certain label dependency
  • Example: classify an image into scenes (desert,
    river, forest, etc.)
  • The multi-class problem is a special case: only one
    class is true

[Diagram: predict labels y1, ..., yq from x; the labels exhibit dependency]
45
Output coding
  • d < q: compression, i.e., source coding
  • d > q: error-correcting codes, i.e., channel
    coding
  • Use the redundancy to correct prediction
    (transmission) errors

[Diagram: encode the labels y = (y1, ..., yq) into a code z = (z1, ..., zd), learn to predict z from x, then decode the prediction back to y]
46
Error-correcting output codes (ECOCs)
  • Multi-class ECOCs [Dietterich & Bakiri, 1994;
    Allwein, Schapire & Singer, 2001]
  • Encode into a (redundant) set of binary problems
  • Learn to predict the code
  • Decode the predictions
  • Our goal: design ECOCs for multi-label
    classification

[Diagram: binary subproblems such as y1, y2 vs. y3, and y3,y4 vs. y7 define the code bits z1, ..., zt; learn to predict them from x and decode back to y1, ..., yq]
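A small sketch of this multi-class ECOC recipe (toy data and a hand-picked code matrix; nothing here is from the thesis): one binary classifier per code bit, with decoding by nearest codeword in Hamming distance.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_ecoc(X, y, code):
    """code: (n_classes, n_bits) binary matrix; train one classifier per bit."""
    return [LogisticRegression(max_iter=1000).fit(X, code[y, b])
            for b in range(code.shape[1])]

def decode_ecoc(X, classifiers, code):
    """Predict every bit, then return the class whose codeword is nearest."""
    bits = np.column_stack([clf.predict(X) for clf in classifiers])
    hamming = np.abs(bits[:, None, :] - code[None, :, :]).sum(axis=2)
    return hamming.argmin(axis=1)

# Toy 4-class problem with a redundant 6-bit code (pairwise Hamming distance 4),
# so a single wrongly predicted bit can still be corrected at decoding time.
code = np.array([[0, 0, 0, 1, 1, 1],
                 [0, 1, 1, 0, 0, 1],
                 [1, 0, 1, 0, 1, 0],
                 [1, 1, 0, 1, 0, 0]])
rng = np.random.RandomState(0)
X = rng.randn(400, 5)
y = X[:, :4].argmax(axis=1)
classifiers = train_ecoc(X, y, code)
print("accuracy:", (decode_ecoc(X, classifiers, code) == y).mean())
```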
47
Outline
Part I Encoding Input Information by
Regularization
Multi-task generalization
Learning with word correlation
A matrix-normal penalty for multi-task learning
Go beyond covariance and correlation structures
Encode a dimension reduction
Learn compressible models
Projection penalties
Part II Encoding Output Information by Output
Codes
Composite likelihood for pairwise coding
Multi-label output codes with CCA
Maximum-margin output coding
48
Composite likelihood
  • The composite likelihood (CL): a partial
    specification of the likelihood as the product of
    simple component likelihoods
  • e.g., pairwise likelihood
  • e.g., full conditional likelihood
  • Estimation using composite likelihoods
  • Computational and statistical efficiency
  • Robustness under model misspecification

49
Multi-label problem decomposition
  • Problem decomposition methods
  • Decomposition into subproblems (encoding)
  • Decision making by combining subproblem
    predictions (decoding)
  • Examples: 1-vs-all, 1-vs-1, 1-vs-1 + 1-vs-all,
    etc.





50
1-vs-All (Binary Relevance)
  • Classify each label independently
  • The composite likelihood view

51
Pairwise label ranking [1]

[Diagram: pairwise subproblems y1 vs. y2, y1 vs. y3, ..., yq-1 vs. yq, each learned from x]
  • 1-vs-1 method (a.k.a. pairwise label ranking)
  • Subproblems: pairwise label comparisons
  • Decision making: label ranking by counting the
    number of winning comparisons, then thresholding

[1] Hüllermeier et al. Artif. Intell., 2008
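A sketch of the 1-vs-1 decomposition for multi-label data (the thresholding here is a plain cutoff on the normalized vote count, a simplification of the calibrated version on the later slides; all names are mine):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_pairwise(X, Y):
    """For each label pair (i, j), train on examples where exactly one of the
    two labels is relevant; the target says whether label i 'beats' label j."""
    q, models = Y.shape[1], {}
    for i in range(q):
        for j in range(i + 1, q):
            mask = Y[:, i] != Y[:, j]
            if len(np.unique(Y[mask, i])) == 2:            # need both outcomes present
                models[(i, j)] = LogisticRegression(max_iter=1000).fit(X[mask], Y[mask, i])
    return models

def predict_pairwise(X, models, q, threshold=0.5):
    """Count the comparisons each label wins, then threshold the vote fraction."""
    votes = np.zeros((X.shape[0], q))
    for (i, j), clf in models.items():
        wins_i = clf.predict(X)                            # 1 means label i beats label j
        votes[:, i] += wins_i
        votes[:, j] += 1 - wins_i
    return votes / max(q - 1, 1) >= threshold
```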
52
Pairwise label ranking [1]

[Diagram: pairwise subproblems y1 vs. y2, y1 vs. y3, ..., yq-1 vs. yq, each learned from x]
  • 1-vs-1 method (a.k.a. pairwise label ranking)
  • Subproblems: pairwise label comparisons
  • Decision making: label ranking by counting the
    number of winning comparisons, then thresholding
  • The composite likelihood view

[1] Hüllermeier et al. Artif. Intell., 2008
53
Calibrated label ranking [2]

[Diagram: pairwise subproblems plus per-label (1-vs-all) subproblems, each learned from x]
  • 1-vs-1 + 1-vs-all (a.k.a. calibrated label
    ranking)
  • Subproblems: 1-vs-1 + 1-vs-all
  • Decision making: label ranking, and a smart
    thresholding based on 1-vs-1 and 1-vs-all
    predictions

[2] Fürnkranz et al. MLJ, 2008
54
Calibrated label ranking [2]

[Diagram: pairwise subproblems plus per-label (1-vs-all) subproblems, each learned from x]
  • 1-vs-1 + 1-vs-all (a.k.a. calibrated label
    ranking)
  • Subproblems: 1-vs-1 + 1-vs-all
  • Decision making: label ranking, and a smart
    thresholding based on 1-vs-1 and 1-vs-all
    predictions
  • The composite likelihood view

[2] Fürnkranz et al. MLJ, 2008
55
A composite likelihood view
  • A composite likelihood view for problem
    decomposition
  • Choice of subproblems → specification of a
    composite likelihood?
  • Decision making → inference on the composite
    likelihood?





56
A composite pairwise coding
  • Subproblems: individual and pairwise label
    densities
  • A pairwise density conveys more information
    than the two marginal densities

[Table: the four joint configurations of a label pair yi, yj: (0,0), (0,1), (1,0), (1,1)]
Yi Zhang and Jeff Schneider. A Composite
Likelihood View for Multi-Label Classification.
AISTATS 2012
57
A composite pairwise coding
  • Decision making: a robust mean-field
    approximation
  • The standard mean-field objective is not robust to
    underestimation of label densities

Yi Zhang and Jeff Schneider. A Composite
Likelihood View for Multi-Label Classification.
AISTATS 2012
58
A composite pairwise coding
  • Decision making: a robust mean-field
    approximation
  • The standard mean-field objective is not robust to
    underestimation of label densities
  • A composite divergence, robust and efficient to
    optimize

Yi Zhang and Jeff Schneider. A Composite
Likelihood View for Multi-Label Classification.
AISTATS 2012
59
Data sets
  • The Scene data
  • Image → scenes (beach, sunset, fall foliage,
    field, mountain and urban)

  • [Example image] → beach, urban

Boutell et al., Pattern Recognition 2004
60
Data sets
  • The Emotion data
  • Music → emotions (amazed, happy, relaxed, sad,
    etc.)
  • The Medical data
  • Clinical text → medical categories (ICD-9-CM
    codes)
  • The Yeast data
  • Gene → functional categories
  • The Enron data
  • Email → tags on topics, attachment types, and
    emotional tones

61
Empirical results
  • Similar results on other data sets (emotions,
    medical, etc)

[1] Hüllermeier et al. Label ranking by learning
pairwise preferences. Artif. Intell., 2008
[2] Fürnkranz et al. Multi-label classification
via calibrated label ranking. MLJ, 2008
[3] Read et al. Classifier chains for
multi-label classification. ECML, 2009
[4] Tsoumakas et al. Random k-labelsets: an
ensemble method for multilabel classification.
ECML, 2007
[5] Zhang et al. Multi-label learning by
exploiting label dependency. KDD, 2010
62
Outline
Part I Encoding Input Information by
Regularization
Multi-task generalization
Learning with word correlation
A matrix-normal penalty for multi-task learning
Go beyond covariance and correlation structures
Encode a dimension reduction
Learn compressible models
Projection penalties
Part II Encoding Output Information by Output
Codes
problem-dependent coding and code predictability
Composite likelihood for pairwise coding
Multi-label output codes with CCA
Maximum-margin output coding
63
Multi-label output coding
  • Design output coding for multi-label problems
  • Problem-dependent encodings to exploit label
    dependency
  • Code predictability
  • Propose multi-label ECOCs via CCA

64
Canonical correlation analysis
  • Given paired inputs and labels, CCA finds
    projection directions in the two spaces
  • with maximum correlation between the projections

65
Canonical correlation analysis
  • Given paired inputs and labels, CCA finds
    projection directions in the two spaces
  • with maximum correlation between the projections
  • Also known as the most predictable criterion
  • CCA finds the most predictable directions v in the
    label space
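A minimal sketch of this criterion with scikit-learn's CCA on toy multi-label data (illustrative only): the correlation of each pair of canonical variates measures how predictable that direction of the label space is from the input.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

# Toy data: 4 labels, three of which are driven by the same input signal.
rng = np.random.RandomState(0)
X = rng.randn(300, 10)
signal = X[:, 0] + 0.5 * X[:, 1]
Y = (np.column_stack([signal, signal, -signal, rng.randn(300)])
     + 0.5 * rng.randn(300, 4) > 0).astype(float)

cca = CCA(n_components=2).fit(X, Y)
Xc, Yc = cca.transform(X, Y)
for k in range(2):
    # High correlation = a highly predictable direction in the label space.
    print("component", k, "correlation:",
          round(float(np.corrcoef(Xc[:, k], Yc[:, k])[0, 1]), 3))
```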

66
Multi-label ECOCs using CCA
  • Encoding and learning
  • Perform CCA
  • Code includes both original labels and label
    projections

Yi Zhang and Jeff Schneider. Multi-label Output
Codes using Canonical Correlation Analysis.
AISTATS 2011
67
Multi-label ECOCs using CCA
  • Encoding and learning
  • Perform CCA
  • Code includes both original labels and label
    projections
  • Learn classifiers for original labels
  • Learn regression for label projections

Yi Zhang and Jeff Schneider. Multi-label Output
Codes using Canonical Correlation Analysis.
AISTATS 2011
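A sketch of the encoding and learning stage as described on these slides (the probabilistic decoding with Bernoulli and Gaussian potentials on the following slides is omitted; helper names are mine):

```python
import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.linear_model import LogisticRegression, Ridge

def fit_cca_output_code(X, Y, d=2):
    """Code = original labels plus d CCA label projections.
    Classifiers predict the labels; regressors predict the projections."""
    cca = CCA(n_components=d).fit(X, Y)
    Z = cca.transform(X, Y)[1]                       # label-space projections
    label_clfs = [LogisticRegression(max_iter=1000).fit(X, Y[:, j])
                  for j in range(Y.shape[1])]        # assumes each label takes both values
    proj_regs = [Ridge(alpha=1.0).fit(X, Z[:, k]) for k in range(d)]
    return label_clfs, proj_regs

def predict_codeword(X, label_clfs, proj_regs):
    """Predicted codeword: per-label probabilities plus projection estimates.
    A decoder would then reconcile the two parts (e.g., by mean-field inference)."""
    probs = np.column_stack([c.predict_proba(X)[:, 1] for c in label_clfs])
    projs = np.column_stack([r.predict(X) for r in proj_regs])
    return probs, projs
```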
68
Multi-label ECOCs using CCA
  • Decoding
  • Classifiers: Bernoulli models of the q
    original labels
  • Regression: Gaussian models of the d
    label projections

Yi Zhang and Jeff Schneider. Multi-label Output
Codes using Canonical Correlation Analysis.
AISTATS 2011
69
Multi-label ECOCs using CCA
  • Decoding
  • Classifiers: Bernoulli models of the q
    original labels
  • Regression: Gaussian models of the d
    label projections
  • Mean-field approximation

Yi Zhang and Jeff Schneider. Multi-label Output
Codes using Canonical Correlation Analysis.
AISTATS 2011
70
Empirical results
  • Similar results on other criteria (macro/micro
    F-1 scores)
  • Similar results on other data (emotions)
  • Similar results on other base learners (decision
    trees, SVMs)

[1] Fürnkranz et al. Multi-label classification
via calibrated label ranking. MLJ, 2008
[2] D. Hsu et al. Multi-label prediction via
compressed sensing. NIPS, 2009
[3] Zhang and Schneider. A composite likelihood
view for multi-label classification. AISTATS 2012
71
Outline
Part I Encoding Input Information by
Regularization
Multi-task generalization
Learning with word correlation
A matrix-normal penalty for multi-task learning
Go beyond covariance and correlation structures
Encode a dimension reduction
Learn compressible models
Projection penalties
Part II Encoding Output Information by Output
Codes
problem-dependent coding and code predictability
Composite likelihood for pairwise coding
Discriminative and predictable codes
Multi-label output codes with CCA
Maximum-margin output coding
72
Recall coding with CCA
  • CCA finds label projections z that are most
    predictable
  • Low transmission errors in channel coding

73
A recent paper [1]: coding with PCA
  • Label projections z obtained by PCA
  • z has maximum sample variance, i.e., the codewords
    are far away from each other
  • Minimum code distance?

[1] Tai and Lin, 2010
74
Goal: predictable and discriminative codes
  • Predictable: the prediction is close to the correct
    codeword
  • Discriminative: the prediction is far away from
    incorrect codewords

75
Maximum margin output coding
  • A max-margin formulation

76
Maximum margin output coding
  • A max-margin formulation
  • Assume M is the best linear predictor (in closed form
    in terms of X, Y, V)
  • Reformulate using metric learning
  • Deal with the exponentially large number of
    constraints
  • The cutting plane method
  • Overgenerating

77
Maximum margin output coding
  • A max-margin formulation
  • Assume M is the best linear predictor, and define

78
Maximum margin output coding
  • A max-margin formulation
  • Metric learning formulation: define the
    Mahalanobis metric
  • and the notation

79
Maximum margin output coding
  • The metric learning problem
  • An exponentially large number of constraints
  • Cutting plane method? No polynomial-time
    separation oracle!

80
Maximum margin output coding
  • The metric learning problem
  • An exponentially large number of constraints
  • Cutting plane method? No polynomial-time
    separation oracle!
  • Cutting plane method with overgenerating
    (relaxation)
  • Relax the discrete domain into a continuous one
  • Linearize for the relaxed domain
  • New separation oracle: a box-constrained QP

81
Empirical results
  • Similar results on other data (emotions and
    medical)

[1] Fürnkranz et al. Multi-label classification
via calibrated label ranking. MLJ, 2008
[2] Zhang et al. Multi-label learning by
exploiting label dependency. KDD, 2010
[3] D. Hsu et al. Multi-label prediction via
compressed sensing. NIPS, 2009
[4] Tai and Lin. Multi-label Classification with
Principal Label Space Transformation. Neural Computation.
[5] Zhang and Schneider. Multi-label output codes
via canonical correlation analysis. AISTATS 2011
82
Conclusion
  • Regularization to exploit input information
  • Semi-supervised learning with word correlation
  • Multi-task learning with a matrix-normal penalty
  • Learning compressible models
  • Projection penalties for dimension reduction
  • Output coding to exploit output information
  • Composite pairwise coding
  • Coding via CCA
  • Coding via max-margin formulation
  • Future

83
Thank you! Questions?
Part I Encoding Input Information by
Regularization
Multi-task generalization
Learning with word correlation
A matrix-normal penalty for multi-task learning
Go beyond covariance and correlation structures
Encode a dimension reduction
Learn compressible models
Projection penalties
Part II Encoding Output Information by Output
Codes
problem-dependent coding and code predictability
Composite likelihood for pairwise coding
Discriminative and predictable codes
Multi-label output codes with CCA
Maximum-margin output coding
84
(No Transcript)
85
(No Transcript)
86
Local smoothness
  • Smoothness of model coefficients
  • Key property: a certain order of derivatives is
    sparse

Differentiation operator
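A minimal sketch of such an operator (my own construction): a discrete first-difference matrix D, so that penalizing ||Dw||₁ (fused-lasso style) makes the derivatives of the coefficient sequence sparse and the coefficients piecewise constant; composing D gives higher-order smoothness.

```python
import numpy as np

def difference_operator(d, order=1):
    """(d - order) x d discrete differentiation operator: (Dw)[i] = w[i+1] - w[i],
    composed with itself for higher orders."""
    D = (np.eye(d, k=1) - np.eye(d))[:-1]
    for _ in range(order - 1):
        m = D.shape[0]
        D = (np.eye(m, k=1) - np.eye(m))[:-1] @ D
    return D

# A piecewise-constant-then-linear coefficient vector has sparse derivatives.
w = np.concatenate([np.zeros(5), np.ones(5), np.linspace(1, 0, 5)])
D1 = difference_operator(len(w), order=1)
print(np.sum(np.abs(D1 @ w) > 1e-8), "nonzero first differences out of", D1.shape[0])
```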
87
Brain-computer interaction
  • Classify electroencephalography (EEG) signals
  • Sparse models vs. piecewise
    smooth models

88
Projection penalties linear cases
  • Learn a linear model with a given linear
    reduction P

89
Projection penalties linear cases
  • Learn a linear model with a given linear
    reduction P

90
Projection penalties linear cases
  • Learn a linear model with projection penalties

projection distance
91
Projection penalties RKHS cases
  • Learning in RKHS with projection penalties
  • Primal
  • Solve for w in the dual (see the next page)
  • Solve for v and b in the primal

92
Projection penalties RKHS cases
  • Representer theorem for w
  • Dual

93
Projection penalties nonlinear cases
  • Learning linear models
  • Learning RKHS models

94
Empirical results
  • Face recognition (Yale), using SVM (poly-2)
  • Dimension reduction: KPCA, KDA, OLaplacianFace

Classification Errors
95
Empirical results
  • Face recognition (Yale), using SVM (poly-2)
  • Dimension reduction: KPCA, KDA, OLaplacianFace

Classification Errors
96
Empirical results
  • Face recognition (Yale), SVM (poly-2)
  • Dimension reduction: KPCA, KDA, OLaplacianFace

Classification Errors
97
Empirical results
  • Price forecasting (Boston housing), using ridge
    regression
  • Dimension reduction: partial least squares

1-R2
98
Binary relevance
  • Binary relevance (a.k.a. 1-vs-all)
  • Subproblems: classify each label independently
  • Decision making: same
  • Assume no label dependency

99
Binary relevance
  • Binary relevance (a.k.a. 1-vs-all)
  • Subproblems: classify each label independently
  • Decision making: same
  • Assume no label dependency
  • The composite likelihood view

100
Empirical results
  • Emotion data (classify music into different
    emotions)
  • Evaluation measure: subset accuracy

[1] Fürnkranz et al. Multi-label classification
via calibrated label ranking. MLJ, 2008
[2] D. Hsu et al. Multi-label prediction via
compressed sensing. NIPS, 2009