Slide 1: Learning to Predict Combinatorial Structures
Shankar Vembu
Joint work with Thomas Gärtner
Slide 2: Learning Structured Prediction Models
- Exact vs. approximate inference
Slides 3-4: Learning Structured Prediction Models
- Is it possible to learn structured prediction models without using any inference algorithm?
- Yes, for combinatorial structures.
Slide 5: Combinatorial Structures
- Partially ordered sets
- Permutations (label ranking)
- Directed cycles
- Graphs
- Multiclass, multilabel, ordinal, and hierarchical classification
Slide 6: Outline
- Structured prediction and limitations of existing models
- Training combinatorial structures
- Constructing combinatorial structures
- Application settings
Slide 7: Structured Prediction
- Input space $\mathcal{X}$ and output space $\mathcal{Y}$
- Training data $\{(x_i, y_i)\}_{i=1}^m \subseteq \mathcal{X} \times \mathcal{Y}$
- Joint scoring function $h: \mathcal{X} \times \mathcal{Y} \to \mathbb{R}$
- Prediction: $\hat{y} = \operatorname{argmax}_{y \in \mathcal{Y}} h(x, y)$
Slide 8: Discriminative Structured Prediction
- Discriminative training asks for $h(x_i, y_i) > h(x_i, z)$ for all $z \in \mathcal{Y} \setminus \{y_i\}$: an exponential number of constraints. How can these be handled?
Slide 9: Assumptions
- Decoding (polynomial time): given $h, x$, find $\hat{y} = \operatorname{argmax}_{y \in \mathcal{Y}} h(x, y)$
- Separation (polynomial time): given $h, x, y$, find any $z \in \mathcal{Y}$ with $h(x, z) > h(x, y)$, or prove that none exists
- Optimality (in NP): given $h, x, y$, decide whether $h(x, y) \ge h(x, z)$ for all $z \in \mathcal{Y}$
- Decoding is the strongest assumption and optimality is the weakest (a brute-force illustration follows).
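For intuition, here is a minimal brute-force sketch of the three problems on a toy, explicitly enumerable output space. The names OUTPUTS and score, and the linear form of h, are illustrative assumptions; for genuinely combinatorial spaces these exhaustive loops are exactly what is infeasible:

```python
from itertools import product

OUTPUTS = list(product([0, 1], repeat=4))  # toy output space Y

def score(h, x, y):
    """Joint scoring function h(x, y); here h is just a weight vector."""
    return sum(hi * xi * yi for hi, xi, yi in zip(h, x, y))

def decode(h, x):
    """Decoding: find argmax over y in Y of h(x, y)."""
    return max(OUTPUTS, key=lambda y: score(h, x, y))

def separate(h, x, y):
    """Separation: return any z with h(x, z) > h(x, y), else None."""
    target = score(h, x, y)
    return next((z for z in OUTPUTS if score(h, x, z) > target), None)

def optimal(h, x, y):
    """Optimality: decide whether no z scores strictly higher than y."""
    return separate(h, x, y) is None

h, x = [1.0, -2.0, 0.5, 0.3], [1.0, 1.0, 1.0, 1.0]
y_hat = decode(h, x)        # (1, 0, 1, 1): keep positive contributions
assert optimal(h, x, y_hat)
```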
Slide 10: Optimality vs. Non-optimality
- Optimality: given $h, x, y$, decide whether $h(x, y) \ge h(x, z)$ for all $z \in \mathcal{Y}$
- Non-optimality: given $h, x, y$, decide whether there exists $z \in \mathcal{Y}$ with $h(x, z) > h(x, y)$
Slides 11-12: Dicycles - Optimality vs. Non-optimality
- Optimality (is there no longer cycle?): given $h, x, y$, decide whether no directed cycle $z$ satisfies $h(x, z) > h(x, y)$
- Non-optimality (is there any longer cycle?): given $h, x, y$, decide whether some directed cycle $z$ satisfies $h(x, z) > h(x, y)$
- Proposition: The non-optimality problem for directed cycles is NP-complete.
Slide 13: Optimality vs. Non-optimality
- Optimality is the complement of non-optimality; if non-optimality is NP-complete, then optimality is coNP-complete.
- We are interested mostly in problems where non-optimality is NP-complete.
Slide 14: Training Combinatorial Structures
Slides 15-17: Loss Functions
- AUC-loss: $\ell_{\mathrm{auc}}(h, x, y) = \sum_{z \in \mathcal{Y} \setminus \{y\}} \mathbb{1}\left[h(x, z) - h(x, y) \ge 0\right]$, the number of incorrect structures scored at least as high as the correct one
- Exponential loss: $\ell_{\exp}(h, x, y) = \sum_{z \in \mathcal{Y} \setminus \{y\}} \exp\left(h(x, z) - h(x, y)\right)$, a convex upper bound on the AUC-loss
- 2nd-order Taylor expansion at 0: $\exp(a) \approx 1 + a + a^2/2$, turning the loss into a quadratic in the scores (see the numeric sketch below)
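A small numeric sketch of the surrogate (the function names and the toy scores are mine): the quadratic expansion stays close to the exponential loss when score differences are near zero, while being a polynomial in the scores:

```python
import math

def exp_loss(scores, correct):
    """Exponential upper bound on the AUC-loss: sum over wrong outputs z
    of exp(h(x, z) - h(x, y))."""
    return sum(math.exp(s - scores[correct])
               for z, s in enumerate(scores) if z != correct)

def quad_loss(scores, correct):
    """The same sum with exp(a) replaced by its 2nd-order Taylor
    expansion at 0: 1 + a + a^2 / 2. Quadratic in the scores."""
    total = 0.0
    for z, s in enumerate(scores):
        if z == correct:
            continue
        a = s - scores[correct]
        total += 1.0 + a + a * a / 2.0
    return total

scores = [0.3, -0.1, 0.2, 0.1]          # h(x, z) for four toy outputs
print(exp_loss(scores, correct=0))      # ~2.39
print(quad_loss(scores, correct=0))     # ~2.41, close to the exact loss
```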
Slide 18: Regularised Risk Minimisation
- Assumption 1: the hypothesis space is a tensor product of Hilbert spaces, $\mathcal{H} = \mathcal{H}_{\mathcal{X}} \otimes \mathcal{H}_{\mathcal{Y}}$, so that a representer theorem applies
- Assumption 2: $\mathcal{H}_{\mathcal{Y}}$ has a finite basis, i.e., there is a finite-dimensional output embedding $\psi: \mathcal{Y} \to \mathbb{R}^d$
Slide 19: Regularised Risk Minimisation
- Minimise the regularised risk $\lambda \lVert h \rVert^2_{\mathcal{H}} + \sum_{i=1}^m \ell(h, x_i, y_i)$ over $h \in \mathcal{H}$
Slide 20: Optimisation with Finite Output Embedding
- Let $\{e_1, \ldots, e_d\}$ be the canonical orthonormal basis of $\mathbb{R}^d$. Writing $h(x, y) = \sum_{l=1}^d \langle \psi(y), e_l \rangle \, f_l(x)$ with $f_l \in \mathcal{H}_{\mathcal{X}}$, optimise the regularised risk over $f_1, \ldots, f_d$.
Slides 21-24: Recipe for Training Combinatorial Structures
- Finite-dimensional output embedding $\psi: \mathcal{Y} \to \mathbb{R}^d$
- Polynomial-time computation of the sums $\sum_{z \in \mathcal{Y}} \psi(z)$ and $\sum_{z \in \mathcal{Y}} \psi(z)\psi(z)^\top$ over the exponentially large output space (a multilabel instance is sketched below)
- Polynomially-sized unconstrained quadratic program
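To make the second ingredient concrete, here is a sketch for the multilabel case, under the assumption that $\psi$ is the 0/1 label-indicator embedding and $\mathcal{Y}$ contains all $2^d$ label subsets: both sums then have closed forms, verified below against brute-force enumeration for a small d:

```python
import numpy as np
from itertools import product

def multilabel_sums(d):
    """Closed-form sums over all 2^d label subsets with the 0/1
    indicator embedding psi: each label appears in 2^(d-1) subsets,
    and each pair of labels co-occurs in 2^(d-2) subsets."""
    first = np.full(d, 2 ** (d - 1), dtype=float)
    second = np.full((d, d), 2 ** (d - 2), dtype=float)
    np.fill_diagonal(second, 2 ** (d - 1))
    return first, second

# Brute-force check for a small d -- this enumeration is the
# exponential sum the closed form avoids.
d = 4
psi = np.array(list(product([0, 1], repeat=d)), dtype=float)
assert np.allclose(psi.sum(axis=0), multilabel_sums(d)[0])
assert np.allclose(psi.T @ psi, multilabel_sums(d)[1])
```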
Slide 25: Constructing Combinatorial Structures
Slide 26: Approximation Algorithms
- Decoding: given $h, x$, find $\hat{y} = \operatorname{argmax}_{y \in \mathcal{Y}} h(x, y)$
- Approximate decoding: given $h, x$, find any $\hat{y} \in \mathcal{Y}$ whose score is provably close to $\max_{y \in \mathcal{Y}} h(x, y)$
Slide 27: Approximation Measure
- Maximisation problem with approximation factor 0.65: the algorithm returns a solution $s$ with $c(s) \ge 0.65 \cdot c(s_{\max})$.
[Figure: a number line from $c(s_{\min})$ at 0 to $c(s_{\max})$ at 1, with the returned solution's value $c(s)$ marked at 0.65 and a reference mark at 0.5.]
Slide 28: z-approximation
- A z-approximation algorithm returns a solution $s$ with $c(s_{\max}) - c(s) \le z \cdot (c(s_{\max}) - c(s_{\min}))$, i.e., quality is measured relative to the full range of solution values.
- z-approximation is better suited when solutions with negative values are possible. It is invariant to (i) constant offsets, (ii) changing from maximisation to minimisation, and (iii) using the complement of binary variables (sketched below).
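A hedged sketch of this measure (the function names are mine, and the formulas follow the differential-approximation reading above): 0 means optimal, 1 means worst possible, and the asserted identities illustrate invariances (i) and (ii):

```python
def z_quality_max(c_s, c_worst, c_best):
    """Differential quality of a solution for a maximisation problem:
    distance from the optimum, relative to the full value range."""
    return (c_best - c_s) / (c_best - c_worst)

def z_quality_min(c_s, c_best, c_worst):
    """The same measure for a minimisation problem."""
    return (c_s - c_best) / (c_worst - c_best)

# (i) Invariance to constant offsets: shifting all values changes nothing.
assert z_quality_max(65, 0, 100) == z_quality_max(165, 100, 200)
# (ii) Invariance to max <-> min: negating the objective swaps the roles
# of best and worst but preserves the measure.
assert z_quality_max(65, 0, 100) == z_quality_min(-65, -100, 0)
```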
Slides 29-30: Decoding Sibling Systems
- Consider a set system $(\Sigma, \mathcal{Y})$
- with a sibling function $r: \mathcal{Y} \to \mathcal{Y}$ and
- an output map $\psi$ such that $\psi(y) + \psi(r(y))$ is the same constant vector for all $y \in \mathcal{Y}$
- Theorem: There is a 1/2-factor z-approximation algorithm for decoding sibling systems (illustrated below).
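The following sketch shows how such a guarantee can arise, under my reading of the sibling property (all 0/1 vectors as $\mathcal{Y}$, complementation as the sibling map, the identity as $\psi$); it is not presented as the authors' algorithm:

```python
import numpy as np

def decode_sibling(w, y):
    """Return the better of y and its sibling r(y) = complement of y.
    Since psi(y) + psi(r(y)) is the all-ones vector for every y,
    h(x, y) + h(x, r(y)) equals the same constant c for every y; the
    better of the two therefore scores at least c / 2, which lies at
    least halfway between the minimum and the maximum score: a
    1/2-factor z-approximation."""
    sib = 1 - y
    return max(y, sib, key=lambda z: float(w @ z))

w = np.array([0.5, -1.0, 2.0])
print(decode_sibling(w, np.array([1, 1, 0])))  # picks the complement [0 0 1]
```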
Slides 31-32: Decoding Independence Systems
- Consider a set system $(\Sigma, \mathcal{Y})$
- with $\mathcal{Y}$ closed under taking subsets: $z' \subseteq z \in \mathcal{Y} \Rightarrow z' \in \mathcal{Y}$
- Theorem: There is a [...]-factor z-approximation algorithm for decoding independence systems.
Slide 33: Application Settings
Slide 34: Multiclass
- Example: $\psi(y) = e_y$, the canonical basis vector of the predicted class
- Decoding is trivial: enumerate the classes and pick the highest-scoring one
Slide 35: Multilabel
- Output embedding: $\psi(y) \in \{0, 1\}^d$ indicates which of the $d$ labels are present
- Decoding decomposes over labels: keep every label with a positive score (see the sketch below)
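A minimal sketch of this decoder, assuming per-label scores obtained from the 0/1 indicator embedding:

```python
import numpy as np

def decode_multilabel(scores):
    """With the 0/1 embedding, the score of a label subset is the sum
    of its per-label scores, so the argmax over all 2^d subsets simply
    keeps every label whose score is positive."""
    return (scores > 0).astype(int)

print(decode_multilabel(np.array([0.7, -0.2, 1.3, -0.5])))  # [1 0 1 0]
```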
Slides 36-37: Dicycles
- Digraphs
- $(-1, 0, 1)$ adjacency matrix as the output embedding (one reading is sketched below)
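A hedged sketch of one way to realise that embedding (the function name and the vertex-tour representation are mine):

```python
import numpy as np

def dicycle_embedding(tour, n):
    """One reading of the (-1, 0, 1) adjacency matrix for a directed
    cycle given as a vertex tour: +1 where the cycle traverses the edge
    (u, v), -1 for the reverse edge (v, u), 0 elsewhere. The last vertex
    closes the cycle back to the first."""
    A = np.zeros((n, n))
    for u, v in zip(tour, tour[1:] + tour[:1]):
        A[u, v], A[v, u] = 1.0, -1.0
    return A

print(dicycle_embedding([0, 2, 1], n=3))
```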
Slide 38: And Others
- Ordinal regression
- Hierarchical classification
- Partially ordered sets
- Permutations
- Graphs
Slide 39: Experiments
Slide 40: Multilabel Classification
- Yeast dataset: 1500 training examples, 917 test examples, 14 labels
- Comparison with the multi-label SVM
Slide 41: Hierarchical Classification
- WIPO-alpha dataset: 1352 training examples, 358 test examples
- Number of nodes: 188; maximum depth: 3

Loss    | SVM   | H-SVM | H-RLS | H-M3 (Hamming) | H-M3 (Tree) | CSOP
0-1     | 87.2  | 76.2  | 72.1  | 70.9           | 65          | 51.1
Hamming | 1.84  | 1.74  | 1.69  | 1.67           | 1.73        | 1.84
Tree    | 0.053 | 0.051 | 0.05  | 0.05           | 0.048       | 0.046
Slide 42: Dicycle Policy Estimation
- Artificial setting: predicting the cyclic tour taken by different people
- Hidden policy for each person: s/he takes the route that maximises his/her reward
- Goal: estimate the hidden policy
Slide 43: Dicycle Policy Estimation
- Comparison with SVM-Struct using approximate inference
Slide 44: Ongoing Work
- Probabilistic models
  - Negative log-likelihood as the loss function
- Sampling techniques for combinatorial structures
  - Markov chain Monte Carlo methods (a minimal sketch follows below)
  - Provable guarantees for mixing time
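To fix ideas, here is a minimal Metropolis sketch for sampling label subsets (an assumption-laden toy, not the authors' sampler):

```python
import math
import random

def sample_subset(scores, steps=10000):
    """Sample a label subset y with probability proportional to
    exp(sum of the scores of included labels), proposing one coordinate
    flip per step and accepting with the Metropolis rule. Provable
    mixing-time guarantees are the open question on the slide; none are
    claimed here."""
    d = len(scores)
    y = [0] * d
    for _ in range(steps):
        i = random.randrange(d)
        delta = scores[i] if y[i] == 0 else -scores[i]  # score change of the flip
        if delta >= 0 or random.random() < math.exp(delta):
            y[i] ^= 1  # accept the proposed flip
    return y

print(sample_subset([0.7, -0.2, 1.3, -0.5]))
```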