Kernel Matching Reduction Algorithms

1
  • Kernel Matching Reduction Algorithms
  • for Classification
  • Jianwu Li and Xiaocheng Deng
  • Beijing Institute of Technology

2
Introduction
  • Kernel-based pattern classification techniques
  • Support vector machines (SVM)
  • Kernel linear discriminant analysis (KLDA)
  • Kernel Perceptrons

3
Introduction
  • Support vector machines (SVM)
  • Structural risk minimization (SRM)
  • Maximum margin classification
  • Quadratic optimization problem
  • Kernel trick

4
Introduction
  • Support vector machines (SVM)
  • Support vectors (SV)
  • Sparse solutions

5
Introduction
  • Kernel matching pursuit (KMP)
  • Starting from an initially empty basis, KMP sequentially appends functions from a redundant dictionary in order to approximate a classification function under a given loss criterion.
  • KMP can produce much sparser models than SVMs.

6
Introduction
  • Kernel Matching Reduction Algorithms (KMRAs)
  • Inspired by KMP and SVMs, we propose kernel
    matching reduction algorithms.
  • Different from KMP, the KMRAs proposed in this paper perform the reverse procedure.

7
Introduction
  • Kernel Matching Reduction Algorithms (KMRAs)
  • Firstly, all training examples are selected to
    construct a function dictionary.
  • Then the function dictionary is reduced
    iteratively by linear support vector machines
    (SVMs).
  • During the reduction process, the parameters of
    the functions in the dictionary can be adjusted
    dynamically.

8
Kernel Matching Reduction Algorithms
  • Constructing a Kernel-Based Dictionary
  • For a binary classification problem, assume there exist l training examples, which form the training set S = {(x1, y1), (x2, y2), . . . , (xl, yl)},
  • where xi ∈ R^d, yi ∈ {-1, 1}, and yi represents the class label of the point xi, i = 1, 2, . . . , l.

9
Kernel Matching Reduction Algorithms
  • Constructing a Kernel-Based Dictionary
  • Given a kernel function K : R^d × R^d → R, similar to KMP, we use kernel functions, centered on the training points, as our dictionary
  • D = {K(x, xi) | i = 1, . . . , l}.

10
Kernel Matching Reduction Algorithms
  • Constructing a Kernel-Based Dictionary
  • Here, the Gaussian kernel function is selected

11
Kernel Matching Reduction Algorithms
  • Constructing a Kernel-Based Dictionary
  • The value of si should be set so as to keep the influence of the local domain around xi, while preventing the kernel centered at xi from having a high activation in regions far from xi.

12
Kernel Matching Reduction Algorithms
  • Constructing a Kernel-Based Dictionary
  • Therefore, we adopt the heuristic method of equation (2), which computes each si from the p nearest neighbors of xi (a sketch of one such heuristic follows this slide).
  • In this way, the receptive width of each point is determined to cover a certain region in the sample space.

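A minimal sketch of such a width heuristic. Since equation (2) is not reproduced on this slide, the sketch assumes the common choice of setting each si to the mean distance from xi to its p nearest neighbors; the function name and exact formula are illustrative, not the paper's definition.

    import numpy as np

    def kernel_widths(X, p=2):
        """Receptive widths s_i: mean distance from each training point to its
        p nearest neighbors (an assumed form of the slide's equation (2))."""
        dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
        np.fill_diagonal(dists, np.inf)          # a point is not its own neighbor
        nearest = np.sort(dists, axis=1)[:, :p]  # p smallest distances per row
        return nearest.mean(axis=1)              # one width s_i per point x_i
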
13
Kernel Matching Reduction Algorithms
  • Reducing the Kernel-Based Dictionary by Linear
    SVMs
  • Using all the kernel functions from the kernel-based dictionary D = {K(x, xi) | i = 1, . . . , l}, we construct a mapping from the original space to a feature space.
  • Any training example xi in S is mapped to a corresponding point zi in S', where zi = (K(xi, x1), K(xi, x2), . . . , K(xi, xl)).
  • The training set S = {(x1, y1), (x2, y2), . . . , (xl, yl)} in the original space is mapped to S' = {(z1, y1), (z2, y2), . . . , (zl, yl)} in the feature space (see the sketch below).

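A sketch of this mapping, assuming Gaussian kernels of the form exp(-||x - xi||^2 / (2 si^2)) with the per-point widths from the previous sketch; the names are illustrative.

    import numpy as np

    def gaussian_dictionary_map(X, centers, widths):
        """Map each example x to z = (K(x, x1), ..., K(x, xl)) using Gaussian
        kernels centered on the dictionary points, with per-center widths."""
        sq = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-sq / (2.0 * widths[None, :] ** 2))

    # Z_train = gaussian_dictionary_map(X_train, X_train, kernel_widths(X_train, p=2))
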
14
Kernel Matching Reduction Algorithms
  • Reducing the Kernel-Based Dictionary by Linear
    SVMs
  • We design a linear decision function g(z) = sign(f(z)) in the feature space, with
  • f(z) = w · z + b,    (3)
  • which corresponds to the nonlinear form in the original space
  • f(x) = w1 K(x, x1) + w2 K(x, x2) + . . . + wl K(x, xl) + b,    (4)
  • where w = (w1, w2, . . . , wl) represents the weights of every dimension of z (see the sketch below).

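For concreteness, a small sketch of evaluating this decision function; the exact forms of (3) and (4) above are reconstructed from the surrounding text, so treat them as assumptions.

    import numpy as np

    def decision_values(X, centers, widths, w, b):
        """f(x) = sum_i wi * K(x, xi) + b; the predicted label is sign(f(x))."""
        sq = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        K = np.exp(-sq / (2.0 * widths[None, :] ** 2))   # Gaussian dictionary features
        return K @ w + b
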
15
Kernel Matching Reduction Algorithms
  • Reducing the Kernel-Based Dictionary by Linear
    SVMs
  • We can decide which kernel functions are important for classification, and which are not, according to their weight magnitudes |wi| in (3) or (4), where |wi| denotes the absolute value of wi. Those redundant kernel functions, which have the lowest weight magnitudes, can be deleted from the dictionary to reduce the model.

16
Kernel Matching Reduction Algorithms
  • Reducing the Kernel-Based Dictionary by Linear
    SVMs
  • Using the usual least-squares error criterion to find this function is not practical, since at the beginning the number of training examples is equal, or close, to the dimensionality of the feature space S', and we confront the problem of a non-invertible matrix.

17
Kernel Matching Reduction Algorithms
  • Reducing the Kernel-Based Dictionary by Linear
    SVMs
  • In fact, support vector machines (SVMs), based on structural risk minimization, are well suited to solving high-dimensional supervised classification problems. We therefore adopt linear SVMs to find the classification function in (3) or (4) on S'.

18
Kernel Matching Reduction Algorithms
  • Reducing the Kernel-Based Dictionary by Linear
    SVMs
  • The optimization objective of linear SVMs is to minimize
  • (1/2) ||w||^2 + C (ξ1 + ξ2 + . . . + ξl),    (5)
  • subject to the constraints
  • yi (w · zi + b) ≥ 1 - ξi, and ξi ≥ 0, i = 1, 2, . . . , l (a training sketch follows this slide).

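A sketch of this training step. The paper only specifies a linear SVM on S'; using scikit-learn's LIBSVM-based SVC with a linear kernel here is an assumption of convenience, and C would be chosen by v-fold cross-validation as in Step 5.

    from sklearn.svm import SVC

    def train_linear_svm(Z, y, C):
        """Soft-margin linear SVM on the mapped training set S' = {(zi, yi)};
        returns the weight vector w and bias b of f(z) = w . z + b."""
        clf = SVC(kernel="linear", C=C)
        clf.fit(Z, y)
        return clf.coef_.ravel(), float(clf.intercept_[0])
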
19
Kernel Matching Reduction Algorithms
  • Reducing the Kernel-Based Dictionary by Linear
    SVMs
  • |wi| denotes the contribution of zi to the classifier in (3): the larger the value of |wi|, the greater the contribution of zi to the model.
  • Consequently, we can rank the zi according to the values of |wi| (i = 1, 2, . . . , l) from large to small. We can also rank the xi by |wi|, because xi is the preimage of zi in the original space.

20
Kernel Matching Reduction Algorithms
  • Reducing the Kernel-Based Dictionary by Linear
    SVMs
  • The xi with the smallest |wi| can be deleted from the dictionary D, reducing D to a smaller dictionary D'.
  • Then we can continue this procedure on the new dictionary D'. Thus, the process can be performed iteratively until a given stop criterion is satisfied.
  • Note that each si should be computed again on the new dictionary D', according to (2), every time D is reduced to D', so that the receptive widths of the kernel functions in D' always cover the whole sample space (see the sketch below).

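One reduction step might look like the following sketch; it reuses the illustrative kernel_widths helper from slide 12, so the names and the width formula are assumptions.

    import numpy as np

    def reduce_dictionary(centers, w, p=2):
        """Delete the dictionary point with the smallest |wi|, then recompute
        the receptive widths on the reduced dictionary."""
        drop = np.argmin(np.abs(w))                  # index of the smallest |wi|
        keep = np.delete(np.arange(len(w)), drop)
        new_centers = centers[keep]
        return new_centers, kernel_widths(new_centers, p)   # widths re-adjusted on the new D'
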
21
Kernel Matching Reduction Algorithms
  • Reducing the Kernel-Based Dictionary by Linear
    SVMs
  • We can set a tolerated minimum accuracy d on the training examples as the termination criterion of this procedure.
  • We expect to obtain the simplest model that still guarantees a satisfactory classification accuracy on all training examples.
  • This idea accords with the principles of minimum description length and Occam's Razor.
  • Therefore, this algorithm can be expected to have good generalization ability.

22
Kernel Matching Reduction Algorithms
  • Reducing the Kernel-Based Dictionary by Linear
    SVMs
  • Different from KMP, which appends kernel functions to the final model gradually, this reduction strategy can be expected to avoid local optima, precisely because it deletes redundant functions from the function dictionary iteratively.

23
Kernel Matching Reduction Algorithms
  • The Detailed Procedure of KMRAs
  • Step 1: Set the parameter p in (2), the cross-validation fold number v for determining C in (5), and the required classification accuracy d on the training examples.
  • Step 2: Input the training examples S = {(x1, y1), (x2, y2), . . . , (xl, yl)}.
  • Step 3: Compute each si by equation (2), and construct the kernel-based dictionary D = {K(x, xi) | i = 1, . . . , l}.

24
Kernel Matching Reduction Algorithms
  • The Detailed Procedure of KMRAs
  • Step 4: Transform S to S' by the dictionary D.
  • Step 5: Determine C by v-fold cross-validation.
  • Step 6: Train the linear SVM with the penalty factor C on S', and obtain the classification model, including wi, i = 1, 2, . . . , l.
  • Step 7: Rank the xi by their weight magnitudes |wi|, i = 1, 2, . . . , l.

25
Kernel Matching Reduction Algorithms
  • The Detailed Procedure of KMRAs
  • Step 8: If the classification accuracy of this model on the training data is higher than d, delete from D the K(x, xi) that has the smallest |wi|, then adjust each si for the new D by (2), and go to Step 4; otherwise go to Step 9.
  • Step 9: Output the classification model, which satisfies the accuracy d with the simplest structure.

26
Kernel Matching Reduction Algorithms
  • The Detailed Procedure of KMRAs
  • The reduction in Step 8 can be generalized to remove more than one basis function per iteration, to improve the training speed (see the sketch below).

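Putting Steps 1-9 together, a compact driver loop might look like the sketch below. It reuses the illustrative helpers from the earlier slides, assumes labels in {-1, +1}, keeps C fixed instead of re-running the v-fold cross-validation of Step 5 at every iteration, and exposes a batch parameter for removing several basis functions per iteration as suggested above.

    import numpy as np

    def kmra_fit(X, y, d=0.9, p=2, C=1.0, batch=1):
        """Iteratively reduce the kernel-based dictionary with linear SVMs,
        stopping once the training accuracy is no longer higher than d."""
        centers = X.copy()                                    # Steps 2-3: dictionary = all examples
        widths = kernel_widths(centers, p)
        best = None
        while len(centers) > batch:
            Z = gaussian_dictionary_map(X, centers, widths)   # Step 4
            w, b = train_linear_svm(Z, y, C)                  # Steps 5-6 (C fixed here)
            acc = np.mean(np.sign(Z @ w + b) == y)            # training accuracy
            if acc <= d:                                      # Step 8: accuracy no longer above d
                break
            best = (centers, widths, w, b)                    # last model that met the accuracy d
            drop = np.argsort(np.abs(w))[:batch]              # Steps 7-8: smallest-|wi| atoms
            centers = np.delete(centers, drop, axis=0)
            widths = kernel_widths(centers, p)                # re-adjust widths by (2)
        return best                                           # Step 9: simplest model found
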
27
Comparing with Other Machine Learning Algorithms
  • Although KMRAs, KMP, SVMs, HSSVMs, and RBFNNs can all generate decision functions with a shape similar to equation (4), KMRAs have essentially distinct characteristics compared with these other algorithms.

28
Comparing with Other Machine Learning Algorithms
  • Differences from KMP
  • Both KMRAs and KMP build kernel-based dictionaries, but they adopt different ways of selecting basis functions for the final solutions. KMP appends kernel functions iteratively to the classification model. By contrast, KMRAs reduce the size of the dictionary step by step, by deleting redundant kernel functions.
  • Moreover, different from KMP, KMRAs utilize linear SVMs to find solutions in the feature space.

29
Comparing with Other Machine Learning Algorithms
  • KMRA Versus SVM
  • The main difference between KMRAs and SVMs lies in how they produce feature spaces: KMRAs create the feature space by a kernel-based dictionary, whereas SVMs do so by kernel functions.
  • Kernel functions in SVMs must satisfy Mercer's theorem, while KMRAs place no restrictions on the kernel functions in the dictionary. The comparison between KMRAs and SVMs is similar to that between KMP and SVM. In fact, we select Gaussian kernel functions in this paper, which can have different kernel widths obtained by equation (2), whereas the Gaussian kernel functions for all support vectors of an SVM share the same kernel width.

30
Comparing with Other Machine Learning Algorithms
  • Linking with HSSVMs
  • Hidden space support vector machines (HSSVMs) also map input patterns into a high-dimensional hidden space by a set of nonlinear functions, and then train linear SVMs in the hidden space. From this viewpoint of constructing feature spaces and performing linear SVMs, KMRAs are similar to HSSVMs. However, we adopt an iterative procedure to eliminate redundant kernel functions, until obtaining a condensed solution.
  • KMRAs can be considered an improved version of HSSVMs.

31
Comparing with Other Machine Learning Algorithms
  • Relation with RBFNNs
  • Although RBFNNs also build feature spaces, usually with Gaussian kernel functions, they create discrimination functions in the least-squares sense. KMRAs, however, use linear SVMs, i.e. the idea of structural risk minimization, to find solutions.
  • In a broad sense, we can think of KMRAs as a special model of RBFNNs with a new configuration-design strategy.

32
Experiments
  • Description of Data Sets and Parameter Settings
  • We compare KMRAs with SVMs on four datasets: Wisconsin Breast Cancer, Pima Indians Diabetes, Heart, and Australian, of which the former two are from the UCI machine learning databases, and the latter two from the Statlog database.
  • We directly use the LIBSVM software package to run the standard SVM.

33
Experiments
  • Description of Data Sets and Parameter Settings
  • Throughout the experiments:
  • 1. All training data and test data are normalized to [-1, 1].
  • 2. Two-thirds of the examples are randomly selected as training examples, and the remaining one-third as test examples (a preprocessing sketch follows this list).
  • 3. Gaussian kernel functions are chosen for SVMs, in which the kernel width s and the penalty parameter C are decided by ten-fold cross-validation on the training set.
  • 4. p = 2 is adopted in equation (2).
  • 5. v = 5 is set in Step 5 of algorithm KMRA.
  • 6. For each dataset, the SVM is trained first, and then, according to the classification accuracy of the SVM, we determine the stop accuracy d for the KMRA.

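A sketch of the preprocessing in items 1 and 2, assuming scikit-learn utilities and arrays X, y already holding a dataset's features and labels; the paper does not specify a toolkit for this step, and in this sketch the scaling is fitted on the training split only.

    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import MinMaxScaler

    # hold out one third of the examples for testing, then scale every feature to [-1, 1]
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3, random_state=0)
    scaler = MinMaxScaler(feature_range=(-1, 1)).fit(X_train)
    X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)
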
34
Experiments
  • Experimental Results
  • We first report the results of standard SVMs: their parameters C and s in Table 1, and their numbers of support vectors (SVs) and prediction accuracies in Table 2.

35
Experiments
  • Experimental Results
  • We set the termination accuracy d = 0.97, 0.8, 0.8, and 0.9 in KMRAs for these four datasets, respectively, according to the classification accuracies of the SVMs in Table 2.
  • We run KMRAs on these datasets and record the classification accuracy on the test set at each iteration as the algorithm runs. The results are shown in Fig. 1.

36
Experiments
  • Experimental Results
  • In Fig. 1, the accuracies of SVMs on the test examples are shown as thick straight lines, and the thin curves represent the classification performance of KMRAs. The horizontal axis denotes the iteration number of KMRAs; that is, the number of kernel functions in the dictionary decreases gradually from left to right.

37
Experiments
  • Experimental Results
  • For Diabetes and Australian, the prediction accuracies of KMRAs improve gradually as the kernel functions in the dictionary are reduced. We can conclude that overfitting happens at the beginning of the KMRA runs. Before the KMRAs terminate, their performance approaches, and even exceeds, that of the SVMs.
  • For Breast and Heart, from beginning to end, the curves of KMRAs fluctuate up and down around the accuracy lines of the SVMs.

38
Experiments
  • Experimental Results
  • Table 2 further lists the numbers of kernel functions (i.e. SVs) that appear in the final classification functions when KMRAs terminate, as well as the corresponding prediction accuracies.
  • Moreover, we record the best performance during the iterative process of KMRAs and also list it in Table 2.
  • Table 2 shows that, compared with SVMs, KMRAs use far fewer support vectors while obtaining comparable results.

39
Experiments
  • Experimental Results
  • (Slides 39-41 present the result tables and Fig. 1; their graphical content is not reproduced in this transcript.)
42
Conclusions
  • We propose KMRAs, which iteratively delete redundant kernel functions from a kernel-based dictionary. We therefore expect KMRAs to avoid local optima and to have good generalization ability.
  • Experimental results demonstrate that, compared with SVMs, KMRAs achieve comparable accuracies, but typically with much sparser representations. This means that KMRAs can classify test examples faster than SVMs.
  • In addition, analogously to SVMs, KMRAs can be extended to solve multi-class classification problems, though we only consider the two-class situation in this paper.

43
Conclusions
  • We also find that KMRAs gain sparser models at the expense of a longer training time. Consequently, future work should explore how to reduce the training cost.
  • In conclusion, KMRAs provide a new problem-solving approach for classification.

44
Thanks!