1
Learning to Predict Combinatorial Structures
Shankar Vembu
Joint work with Thomas Gärtner
2
Learning Structured Prediction Models
  • Exact vs. Approximate inference

3
Learning Structured Prediction Models
  • Is it possible to learn structured prediction
    models without using any inference algorithm?

4
Learning Structured Prediction Models
  • Is it possible to learn structured prediction
    models without using any inference algorithm?
  • Yes, for combinatorial structures

5
Combinatorial Structures
  • Partially ordered sets
  • Permutations (label ranking)
  • Directed cycles
  • Graphs
  • Multiclass, multilabel, ordinal, hierarchical
    classification

6
Outline
  • Structured prediction and limitations of existing
    models
  • Training combinatorial structures
  • Constructing combinatorial structures
  • Application settings

7
Structured Prediction
  • Input and output spaces
  • Training data
  • Joint scoring function
  • Prediction (see the sketch below)
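A hedged LaTeX sketch of the standard setup behind these bullets; the symbols (input space \mathcal{X}, output space \mathcal{Y}, scoring function h) are assumptions based on common structured-prediction notation, since the formulas on the original slide are images:

  \mathcal{X} \text{ (inputs)}, \quad \mathcal{Y} \text{ (outputs)}, \quad \{(x_i, y_i)\}_{i=1}^{m} \subseteq \mathcal{X} \times \mathcal{Y} \text{ (training data)},
  h : \mathcal{X} \times \mathcal{Y} \to \mathbb{R} \text{ (joint scoring function)}, \qquad \hat{y}(x) = \operatorname*{argmax}_{y \in \mathcal{Y}} h(x, y) \text{ (prediction)}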

8
Discriminative Structured Prediction
exponential number of constraints?
9
Assumptions
  • Decoding (polynomial time)
  • given h, x find
  • Separation (polynomial time)
  • given h, x, y find any
    or prove that none exists
  • Optimality (in NP)
  • given h, x, y decide

Decoding is the strongest assumption and optimality is the weakest; the three problems are restated below.
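A hedged restatement of the three problems in the notation sketched earlier; the exact formulas on the slide are images, so these are assumptions based on standard usage:

  \text{Decoding: given } h, x, \text{ find } \operatorname*{argmax}_{y \in \mathcal{Y}} h(x, y)
  \text{Separation: given } h, x, y, \text{ find any } y' \in \mathcal{Y} \text{ with } h(x, y') > h(x, y), \text{ or prove that none exists}
  \text{Optimality: given } h, x, y, \text{ decide whether } h(x, y) \geq h(x, y') \text{ for all } y' \in \mathcal{Y}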
10
Optimality vs. Non-optimality
  • optimality
  • given h, x, y decide
  • non-optimality
  • given h, x, y decide

11
Dicycles - Optimality vs. Non-optimality
  • optimality - is there no longer cycle?
  • given h, x, y decide
  • non-optimality - is there any longer cycle?
  • given h, x, y decide

12
Dicycles - Optimality vs. Non-optimality
  • optimality - is there no longer cycle?
  • given h, x, y decide
  • non-optimality - is there any longer cycle?
  • given h, x, y decide

Proposition: The non-optimality problem for directed cycles is NP-complete.
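Writing w(C) for the score of a dicycle C under h and x (a notational assumption), the two questions differ only in a quantifier:

  \text{optimality: } \forall C' : w(C') \leq w(C) \qquad \text{non-optimality: } \exists C' : w(C') > w(C)

A witness for non-optimality is simply a better cycle, so the problem lies in NP; the proposition states that it is in fact NP-complete, which fits the intuition that finding long directed cycles is hard (the longest-dicycle problem contains the directed Hamiltonian cycle problem).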
13
Optimality vs. Non-optimality
Optimality
We are mostly interested in problems where non-optimality is NP-complete.
14
Training Combinatorial Structures
15
Loss Functions
  • AUC-loss

16
Loss Functions
  • AUC-loss
  • Exponential loss

17
Loss Functions
  • AUC-loss
  • Exponential loss
  • 2nd-order Taylor expansion at 0 (sketched below)
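A hedged sketch of the three quantities named above; the slide's own formulas are images, so the forms below are assumptions based on standard usage (a pairwise AUC-style loss, its exponential upper bound, and the second-order Taylor expansion of the exponential at 0):

  \ell_{\mathrm{AUC}}(h, x, y) = \sum_{z \neq y} [\![\, h(x, z) \geq h(x, y) \,]\!], \qquad \ell_{\exp}(h, x, y) = \sum_{z \neq y} \exp\bigl(h(x, z) - h(x, y)\bigr),
  \exp(t) \approx 1 + t + \tfrac{1}{2} t^{2} \quad \text{near } t = 0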

18
Regularised Risk Minimisation
  • Assumption 1: the joint scoring function lives in a
    tensor product of Hilbert spaces
  • a representer theorem applies
  • Assumption 2: the output embedding space
    has a finite basis

19
Regularised Risk Minimisation
20
Optimisation with Finite Output Embedding
Let ... . Using the canonical orthonormal bases of ..., optimise ...
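A hedged sketch of the kind of objective these slides set up; the notation (finite-dimensional output embedding \psi, weight matrix W, input features \phi) is an assumption, since the formulas on the original slides are images:

  h(x, y) = \langle \psi(y), W \phi(x) \rangle, \qquad \psi : \mathcal{Y} \to \mathbb{R}^{d}

With the quadratic surrogate from the loss-function slides, the empirical risk involves the output space only through \sum_{z \in \mathcal{Y}} \psi(z) and \sum_{z \in \mathcal{Y}} \psi(z)\psi(z)^{\top}, so whenever these two sums can be computed in polynomial time the training problem reduces to a polynomially-sized unconstrained quadratic program in W.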
21
Recipe for Training Combinatorial Structures
22
Recipe for Training Combinatorial Structures
  • Finite dimensional output embedding

23
Recipe for Training Combinatorial Structures
  • Finite dimensional output embedding
  • Polynomial-time computation of

24
Recipe for Training Combinatorial Structures
  • Finite dimensional output embedding
  • Polynomial-time computation of
  • Polynomially-sized unconstrained quadratic program (see the sketch below)
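A minimal Python sketch of the recipe for one concrete case, a hypothetical multilabel embedding psi(y) in {-1, +1}^d; the embedding and the closed forms are assumptions used for illustration, and are checked against brute force for small d:

import itertools
import numpy as np

d = 4  # number of labels, so |Y| = 2**d possible outputs

# Brute force: enumerate every sign vector psi(y) in {-1, +1}^d.
all_psi = np.array(list(itertools.product([-1.0, 1.0], repeat=d)))

sum_psi_bf = all_psi.sum(axis=0)        # sum over Y of psi(y)
sum_outer_bf = all_psi.T @ all_psi      # sum over Y of psi(y) psi(y)^T

# Closed forms, computable in polynomial time: the signs cancel in the
# first sum, and off-diagonal products cancel in the second.
sum_psi_closed = np.zeros(d)
sum_outer_closed = (2 ** d) * np.eye(d)

assert np.allclose(sum_psi_bf, sum_psi_closed)
assert np.allclose(sum_outer_bf, sum_outer_closed)

# With these two statistics, the quadratic surrogate summed over all of Y
# can be evaluated without enumerating Y, which is what makes the
# resulting unconstrained quadratic program polynomially sized.
print(np.diag(sum_outer_closed))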

25
Constructing Combinatorial Structures
26
Approximation Algorithms
  • Decoding
  • given h, x find
  • Approximate decoding
  • given h, x find any

27
Approximation Measure
  • Maximisation problem with approximation factor 0.65

[Figure: cost scale normalised from c(s_min) = 0 to c(s_max) = 1, with marks at 0.5 and at the approximation factor 0.65 for c(s)]
28
z-approximation

z-approximation is better suited when negative
solutions are possible. It is invariant to (i)
constant offsets, (ii) changing from max to
min, and (iii) using the complement of binary
variables
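A hedged statement of the z-approximation (differential approximation) guarantee described above for a maximisation problem, with s_min and s_max the worst and best feasible solutions as in the previous slide's figure:

  \frac{c(s) - c(s_{\min})}{c(s_{\max}) - c(s_{\min})} \geq z

Normalising the worst solution to 0 and the best to 1 is what makes the guarantee unaffected by constant offsets, by switching between maximisation and minimisation (which swaps the roles of best and worst), and by complementing binary variables.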
29
Decoding Sibling Systems
  • Consider a set system
  • with a sibling function and
  • an output map such that

30
Decoding Sibling Systems
  • Consider a set system
  • with a sibling function and
  • an output map such that

Theorem: There is a 1/2-factor z-approximation algorithm for decoding sibling systems.
31
Decoding Independence Systems
  • Consider a set system
  • with

32
Decoding Independence Systems
  • Consider a set system
  • with
  • Theorem: There is a
    factor z-approximation algorithm for decoding
    independence systems

33
Application settings
34
Multiclass
Example
Decoding is trivial
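A minimal Python sketch, assuming the usual one-of-k (one-hot) output embedding for multiclass problems; with that embedding, decoding is just an argmax over the k per-class scores:

import numpy as np

def decode_multiclass(class_scores: np.ndarray) -> int:
    """Return the index of the class with the largest score h(x, y)."""
    return int(np.argmax(class_scores))

print(decode_multiclass(np.array([0.2, 1.5, -0.3])))  # prints 1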
35
Multilabel
Decoding
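A minimal sketch under the same hypothetical {-1, +1}^d embedding assumed earlier: the score decomposes over labels, so decoding picks the sign of each per-label score independently:

import numpy as np

def decode_multilabel(per_label_scores: np.ndarray) -> np.ndarray:
    """Choose +1 for every label with positive score, -1 otherwise."""
    return np.where(per_label_scores > 0, 1, -1)

print(decode_multilabel(np.array([0.7, -0.2, 0.1])))  # prints [ 1 -1  1]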
36
Dicycles
  • Digraphs
  • (-1,0,1) adjacency matrix
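One possible reading of the (-1, 0, 1) adjacency-matrix encoding, stated purely as an assumption for illustration: +1 for arcs on the cycle, -1 for their reversals, 0 elsewhere, so that reversing a tour negates its embedding:

import numpy as np

def dicycle_embedding(tour: list, n: int) -> np.ndarray:
    """Signed adjacency matrix of a directed cycle over vertices 0..n-1."""
    A = np.zeros((n, n))
    for i, u in enumerate(tour):
        v = tour[(i + 1) % len(tour)]   # successor of u on the tour
        A[u, v] = 1.0                   # arc on the cycle
        A[v, u] = -1.0                  # its reversal
    return A

print(dicycle_embedding([0, 2, 1], n=3))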

37
Dicycles
38
and others
  • Ordinal regression
  • Hierarchical classification
  • Partially ordered sets
  • Permutations
  • Graphs

39
Experiments
40
Multilabel Classification
  • Yeast dataset: 1500 training examples, 917 test examples, 14 labels
  • Comparison with multi-label SVM

41
Hierarchical Classification
  • WIPO-alpha dataset: 1352 training examples, 358 test examples
  • Number of nodes: 188, max. depth: 3

LOSS     SVM    H-SVM  H-RLS  H-M3 Hamming  H-M3 Tree  CSOP
0-1      87.2   76.2   72.1   70.9          65         51.1
Hamming  1.84   1.74   1.69   1.67          1.73       1.84
Tree     0.053  0.051  0.05   0.05          0.048      0.046
42
Dicycle Policy Estimation
  • Artificial setting
  • Predicting the cyclic tour of different people
  • Hidden policy: each person takes the route that
    maximises his/her reward
  • Goal is to estimate the hidden policy

43
Dicycle Policy Estimation
  • Comparison with SVM-Struct using approximate
    inference

44
Ongoing Work
  • Probabilistic models
  • Negative log likelihood as loss
    function
  • Sampling techniques for combinatorial structures
  • Markov chain Monte Carlo methods
  • Provable guarantees for mixing time
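A generic Metropolis-Hastings sketch of the kind of sampler the last two bullets point to, here over permutations with a symmetric swap proposal; the toy score function is an assumption and nothing below is specific to the authors' ongoing work:

import math
import random

def metropolis_permutations(score, n, steps=10000, temperature=1.0):
    """Sample a permutation approximately proportional to exp(score / T)."""
    pi = list(range(n))
    for _ in range(steps):
        i, j = random.sample(range(n), 2)              # propose a swap
        proposal = pi[:]
        proposal[i], proposal[j] = proposal[j], proposal[i]
        # Metropolis acceptance rule for a symmetric proposal.
        if random.random() < math.exp(min(0.0, (score(proposal) - score(pi)) / temperature)):
            pi = proposal
    return pi

# Toy target: prefer permutations close to the identity.
print(metropolis_permutations(lambda p: -sum(abs(v - k) for k, v in enumerate(p)), n=6))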

45
  • THANKS!