1
Optimizing Pattern Recognition Systems
Professor Mike Manry, University of Texas at Arlington
2
Outline
  • Introduction
  • Feature Selection
  • Complexity Minimization
  • Recent Classification Technologies: Neural Nets, Support Vector Machines, Boosting
  • A Maximal Margin Classifier
  • Growing and Pruning
  • Software Examples
  • Conclusions
3
Intro -- Problems In Pattern Recognition Systems
(Block diagram of a pattern recognition system)
4
Intro -- Problems In Pattern Recognition Systems
Problems: We usually don't know which block(s) are bad. Leftmost blocks may be more expensive and difficult to change.
Solution: Using the system illustrated on the next slide, we can quickly optimize the rightmost blocks and narrow the list of potential problems.
5
Intro -- IPNNL Optimization System (
www-ee.uta.edu/eeweb/ip/ )
(Diagram: a Pattern Recognition Application feeds the IPNNL Optimization System; Dim(z) = N, Dim(x) = N', with N' << N)
6
Intro -- Presentation Goals
  • Examine blocks in the Optimization System
  • Feature Selection
  • Recent Classification Technologies
  • Growing and Pruning of Neural Net Classifiers
  • Demonstrate the Optimization System on
  • Data from Bell Helicopter Textron
  • Data from UTA Bioengineering
  • Data from UTA Civil Engineering

7
Example System Text Reader
8
Feature Selection - Combinatorial Explosion
The number of size-N' subsets of N features is NS = C(N, N') = N! / (N'! (N - N')!)

  • Scanning: Generate candidate subsets
  • Subset Evaluation: Calculate subset goodness, and save the best ones

9
Feature Selection Example of Combinatorial
Explosion
  • Given data for a classification problem with N = 90 candidate features,
  • there are 9.344 x 10^17 subsets of size N' = 17,
  • and a total of 2^90 ≈ 1.2379 x 10^27 subsets

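As a rough check of these counts, here is a minimal sketch (Python, standard library only; N = 90 and N' = 17 are the values from the slide above):

```python
from math import comb

N = 90        # candidate features (slide 9)
N_prime = 17  # subset size considered on slide 9

# Number of size-17 subsets of 90 features: about 9.3 x 10^17
print(f"C({N},{N_prime}) = {comb(N, N_prime):.4e}")

# Total number of subsets of 90 features: 2^90, about 1.24 x 10^27
print(f"2^{N} = {2 ** N:.4e}")
```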
10
Feature Selection - Scanning Methods
Available Methods:
  • Brute Force (BF) Scanning: Examine every subset (see previous slide)
  • Branch and Bound (BB) [1]: Avoids examining subsets known to be poor
  • Plus L minus R (L-R) [1]: Adds L good features and eliminates the R worst features
  • Floating Search (FS) [2]: Faster than BB
  • Feature Ordering (FO) [1]: Given the empty subset, repeatedly add the best additional feature to the subset. Also called forward selection
  • Ordering Based Upon Individual Feature Goodness (FG)
Let G measure the goodness of a subset scanning method. Then G(FG) < G(FO) < G(L-R) < G(FS) < G(BB) = G(BF)
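A minimal sketch of the Feature Ordering (FO) / forward-selection scan (Python with NumPy; `subset_error` is a hypothetical stand-in for whichever SEM is used, here a simple least-squares MSE for illustration only):

```python
import numpy as np

def forward_selection(X, y, subset_error, n_select):
    """Feature Ordering (FO): greedily grow the subset one feature at a time."""
    remaining = list(range(X.shape[1]))
    chosen = []
    for _ in range(n_select):
        # Evaluate every one-feature extension of the current subset
        errors = {k: subset_error(X[:, chosen + [k]], y) for k in remaining}
        best = min(errors, key=errors.get)
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Illustration only: a trivial SEM (MSE of a linear least-squares fit)
def subset_error(Xs, y):
    w, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    return float(np.mean((Xs @ w - y) ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = (X[:, 3] - X[:, 7] > 0).astype(float)
print(forward_selection(X, y, subset_error, n_select=3))
```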
11
Feature Selection Scanning Methods Feature
Goodness vs. Brute Force
  • Suppose that the available features are
  • z1 = x + n, z2 = x + n, z3 = y + 2n, and z4 = n,
  • where x and y are useful and n is noise.
  • FG: Nested subsets {z1}, {z1, z2}, {z1, z2, z3}
  • BF: Optimal subsets {z2}, {z1, z4}, {z1, z3, z4}, since x = z1 - z4 and y = z3 - 2·z4
  • An optimal subset may include features that are
    useless, according to the FG approach

12
Feature Selection - Subset Evaluation Metrics
(SEMs)
  • Let x be a feature subset, and let xk be a candidate feature for possible inclusion into x.
  • Requirements for an SEM f():
  • f(x U {xk}) ≤ f(x)   (U denotes union)
  • f() related to classification success (Pe, for example)
  • Example SEMs:
  • Brute Force Subset Evaluation: Design a good classifier and measure Pe
  • Scatter Matrices [1]

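As an illustration of the scatter-matrix family of SEMs [1], a minimal sketch computing the trace criterion J = tr(Sw^-1 Sb) for a candidate subset (Python/NumPy; larger J means a better subset, so an error-style SEM could use, e.g., f = 1/J):

```python
import numpy as np

def scatter_criterion(X, labels):
    """J = trace(Sw^-1 Sb) for the feature subset in X (one pattern per row)."""
    classes = np.unique(labels)
    overall_mean = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for c in classes:
        Xc = X[labels == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - overall_mean).reshape(-1, 1)
        Sb += len(Xc) * (diff @ diff.T)
    return float(np.trace(np.linalg.solve(Sw, Sb)))
```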
13
Feature Selection - A New Approach [3]
Scanning Approach: Floating Search. SEM: Pe for a piecewise-linear classifier.
At the output, the absence of groups 1, 3, and 5 reveals problems.
14
Feature Selection Example 1
Classification of Numeral Images: N = 16 features and Nc = 10
Chosen Subsets: {6}, {6, 9}, {6, 9, 14}, {6, 9, 14, 13}, {6, 9, 14, 13, 3}, {6, 9, 14, 13, 3, 15}, {6, 9, 14, 13, 11, 15, 16}, {6, 9, 14, 13, 11, 15, 16, 3}, {6, 9, 14, 13, 11, 15, 16, 3, 4}, {6, 9, 14, 13, 11, 15, 16, 3, 4, 1}, {6, 9, 14, 13, 11, 15, 16, 3, 4, 1, 12}, {6, 9, 14, 13, 11, 15, 16, 3, 4, 1, 12, 7}, {6, 9, 14, 13, 11, 15, 16, 3, 4, 1, 12, 7, 8}, {6, 9, 14, 13, 11, 15, 16, 3, 4, 1, 12, 7, 8, 5}, {6, 9, 14, 13, 11, 15, 16, 3, 4, 1, 12, 7, 8, 5, 2}, {6, 9, 14, 13, 11, 15, 16, 3, 4, 1, 12, 7, 8, 5, 2, 10}
15
Feature Selection Example 2
Classification of Sleep Apnea Data (Mohammad Al-Abed and Khosrow Behbehani): N = 90 features and Nc = 2
Chosen Subsets: {11}, {11, 56}, {11, 56, 61}, {11, 28, 55, 63}, {11, 28, 55, 63, 27}, {11, 28, 55, 63, 62, 17}, {11, 28, 55, 63, 53, 17, 20}, {11, 28, 55, 63, 53, 17, 20, 62}, {11, 28, 55, 63, 53, 17, 20, 4, 40}, {11, 28, 55, 63, 53, 26, 20, 4, 40, 8}, {11, 28, 22, 19, 53, 26, 20, 4, 40, 30, 80}, {11, 28, 22, 19, 53, 26, 20, 4, 40, 30, 80, 85}, {11, 28, 22, 19, 53, 26, 20, 4, 40, 30, 80, 85, 8}, {11, 28, 22, 19, 53, 26, 20, 4, 40, 30, 80, 85, 8, 48}, {11, 28, 22, 19, 53, 26, 20, 4, 40, 30, 80, 85, 8, 48, 38}, {11, 28, 22, 19, 53, 26, 20, 4, 40, 30, 80, 85, 8, 48, 45, 87}, {11, 28, 13, 19, 53, 26, 65, 4, 40, 30, 80, 85, 8, 48, 75, 87, 18}
16
Complexity Minimization
  • Complexity: Defined as the number of free parameters or coefficients in a processor
  • Complexity Minimization (CM) Procedure:
  • Minimize training-set Pe for each classifier size, with respect to all weights or coefficients. Nw is the number of weights or coefficients.
  • Measure Pe for a validation data set.
  • Choose the network that minimizes the validation set's Pe.
  • CM leads to smaller, better classifiers, if it can be performed. It is related to SRM [4], [5].
  • Implication: To perform SRM, we need methods for quickly varying network size during training (growing) or after training (pruning)

17
Complexity Minimization
(Plot: validation error E versus the number of weights Nw)
Vary Nw until validation error is minimized.
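A minimal sketch of this CM loop (Python; `train_classifier` and `error_rate` are hypothetical stand-ins for the actual training and Pe-measurement routines):

```python
def complexity_minimization(sizes, train_data, val_data,
                            train_classifier, error_rate):
    """Slide 16's CM procedure: for each candidate size Nw, minimize training
    Pe, then keep the network with the smallest validation Pe."""
    best_net, best_val_pe = None, float("inf")
    for nw in sizes:                                       # vary Nw
        net = train_classifier(train_data, n_weights=nw)   # minimize training-set Pe
        val_pe = error_rate(net, val_data)                 # measure validation-set Pe
        if val_pe < best_val_pe:
            best_net, best_val_pe = net, val_pe
    return best_net, best_val_pe
```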
18
Recent Classification Technologies Neural Nets
Minimize the MSE E(w), where tpi = 1 for the correct class (i = ic) and tpi = 0 for an incorrect class (i = id). yi(x) is the ith output discriminant of the trained classifier. A neural net support vector x satisfies yic(x) = max yid(x) + 1.
The MSE between yi(x) and bi(x) = P(i|x) is denoted e(w).
Theorem 1 [5], [6]: As the number of training patterns Nv increases, the training error E(w) approaches e(w) + C, where C is a constant.
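With the 0/1 target coding above, E(w) is the mean squared error between the network outputs and the class-coded targets; a minimal sketch (Python/NumPy; the discriminant matrix Y is assumed given by the trained network):

```python
import numpy as np

def training_mse(Y, class_labels, n_classes):
    """E(w): MSE between discriminants Y[p, i] and targets tpi,
    where tpi = 1 for the correct class and 0 otherwise."""
    Nv = Y.shape[0]
    T = np.zeros((Nv, n_classes))
    T[np.arange(Nv), class_labels] = 1.0  # tpi = 1 only for i = ic
    return float(np.sum((T - Y) ** 2) / Nv)
```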
19
Recent Classification Technologies Neural Nets
  • Advantages
  • Neural net outputs approximate the Bayes discriminant P(i|x)
  • Training modifies all network weights
  • CM easily performed via growing and pruning methods
  • Accommodates any size training data file
  • Problems
  • E(w) and e(w) are not proportional to Pe.
  • From Theorem 1, yi = bi + ei, where ei is random zero-mean noise. Noise degrades performance, leaving room for improvement via SVM training and boosting.

20
Recent Classification Technologies Support
Vector Machines [4], [5]
  • SVM: A neural net structure with these properties
  • Output weights form hyperplane decision boundaries
  • Input vectors xp satisfying yp(ic) = b and max yp(id) = -b are called support vectors
  • Correctly classified input vectors xp outside the decision margins do not adversely affect training
  • Incorrectly classified xp do not strongly affect training
  • In some SVMs, Nh (the number of hidden units) initially equals Nv (the number of training patterns)
  • Training may involve quadratic programming

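For reference, a two-class SVM of this kind can be trained with an off-the-shelf library; a minimal sketch on synthetic data (assuming scikit-learn is available; the kernel and penalty parameters are the ones that a later slide notes must be found by hit or miss):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=600, n_features=16, random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.3, random_state=0)

# Kernel parameter and penalty C are chosen by trial and error
svm = SVC(kernel="rbf", C=1.0, gamma="scale")
svm.fit(X_tr, y_tr)
print("number of support vectors:", int(svm.n_support_.sum()))
print("validation error (%):", 100.0 * (1.0 - svm.score(X_va, y_va)))
```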
21
Recent Classification Technologies Support Vector
Machines
(Figure: two-class data in the (x1, x2) plane; support vectors (SV) lie on the margins at distance b on either side of the SVM discriminant, which is compared with the LMS discriminant.)
Correctly classified patterns distort the LMS discriminant.
22
Recent Classification Technologies Support Vector
Machines
  • Advantage: Good classifiers from small data sets
  • Problems
  • SVM methods are practical only for small data sets
  • Training is difficult when there are many classes
  • Kernel parameter found by hit or miss
  • Fails to minimize complexity (far too many hidden units required, and hidden unit weights don't adapt)
  • Current Work
  • Developing an SVM pruning approach
  • Developing regression-based maximal margin classifiers

23
Recent Classification Technologies Boosting [5]
yik(x) is the ith class discriminant for the kth classifier. K is the number of classifiers being fused. In discriminant fusion, we calculate the ak so that the weighted average discriminant, yi(x) = Σk ak yik(x), has better performance than the yik(x).
Adaboost [5] sequentially picks a training subset {xp, tpk} from the available data and designs yik(x) and ak so they are functions of the previous (k-1) classifiers.
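A minimal sketch of the discriminant-fusion step for a single input x (Python/NumPy; the per-classifier discriminants yik(x) and the weights ak are assumed to come from the boosting procedure):

```python
import numpy as np

def fused_class(Y_k, a):
    """Discriminant fusion: decide with y_i(x) = sum_k a_k * y_ik(x).

    Y_k : (K, Nc) array of discriminants y_ik(x) for one input x
    a   : (K,) array of fusion weights a_k
    """
    y = a @ Y_k                    # fused discriminants, shape (Nc,)
    return int(np.argmax(y))       # class with the largest fused discriminant

# Hypothetical example with K = 3 classifiers and Nc = 2 classes
Y_k = np.array([[0.9, 0.1],
                [0.4, 0.6],
                [0.8, 0.2]])
a = np.array([0.5, 0.2, 0.3])
print(fused_class(Y_k, a))         # prints 0
```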
24
Recent Classification Technologies Boosting
  • Advantage
  • The final classifier can be used to process video signals in real time when step function activations are used
  • Problems
  • Works best for the two-class case; uses huge data files.
  • The final classifier is a large, highly redundant neural net.
  • Training can take days; CM is not performed.
  • Future Work
  • Pruning should be tried to reduce redundancy.
  • Feature selection should be tried to speed up training

25
Recent Classification Technologies Problems with
Adaboost
N = 3, Nc = 2, K = 3 (number of classifiers being fused)
Future Work: Prune Adaboosted networks
26
A Maximal Margin Classifier Problems With MSE
Type Training
The ith class neural net discriminant can be modeled as yp(i) = tp(i) + ep(i). This additive noise ep(i) degrades the performance of regression-based classifiers, as mentioned earlier. Correctly classified patterns contribute to the MSE, and can adversely affect training (see the following page). In other words, regression-based training tries to force all training patterns to be support vectors.
27
A Maximal Margin Classifier Problems With MSE
Type Training
(1) The standard regression approach tries to force all training vectors to be support vectors. (2) Red lines are counted as errors, even though those patterns are classified more correctly than desired. (3) Outliers can distort decision boundary locations.
28
A Maximal Margin Classifier [5] Existence of Regression-Based Optimal Classifiers
Let X be the basis vector (hidden units and inputs) of a neural net. The output vector is y = WX. The minimum MSE is found by solving RW = C (1), where R = E[XX^T] and C = E[Xt^T] (2). If an optimal coefficient matrix Wopt exists, then Copt = RWopt from (1), so Copt exists. From (2), we can find Copt if the desired output vector t is defined correctly. Regression-based training can mimic other approaches.
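A minimal sketch of solving (1) for the output weights from training data (Python/NumPy; X holds one basis vector per row and T the desired outputs, both assumed given):

```python
import numpy as np

def solve_output_weights(X, T):
    """Minimum-MSE output weights for y = W x.

    X : (Nv, Nu) matrix, one basis (hidden-unit/input) vector per row
    T : (Nv, Nc) matrix of desired outputs t
    Returns W of shape (Nc, Nu), solving R W^T = C with
    R = E[X X^T] and C = E[X t^T] estimated from the data.
    """
    Nv = X.shape[0]
    R = X.T @ X / Nv   # autocorrelation of the basis vector
    C = X.T @ T / Nv   # cross-correlation with the desired outputs
    return np.linalg.solve(R, C).T
```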
29
A Maximal Margin Classifier Regression Based
Classifier Design 7
Consider the empirical risk (MSE), E = (1/Nv) Σp Σi [tp(i) - yp(i)]^2.
If yp(ic) > tp(ic) (correct class discriminant large) or yp(id) < tp(id) (incorrect class discriminant small), the classification error decreases but E increases.
30
A Maximal Margin Classifier Regression Based
Classifier Design-Continued
The discrepancy is fixed by re-defining the empirical risk as follows:
If yp(ic) > tp(ic), set tp(ic) = yp(ic). If yp(id) < tp(id) for an incorrect class id, set tp(id) = yp(id). In both cases, tp(i) is set equal to yp(i), so no error is counted. This algorithm partially mimics SVM training, since correctly classified patterns do not affect the MSE. (Related to Ho-Kashyap procedures, [5] pp. 249-256.)
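A minimal sketch of this target-resetting rule for one pass over the data (Python/NumPy; the current outputs Y, floating-point 0/1 targets T, and correct-class indices are assumed given):

```python
import numpy as np

def reset_targets(Y, T, correct_class):
    """Target resetting for the redefined empirical risk: wherever the correct-class
    output exceeds its target, or an incorrect-class output is below its target,
    copy the output into the target so that entry contributes no error."""
    T_new = T.copy()
    is_correct = np.zeros(T.shape, dtype=bool)
    is_correct[np.arange(Y.shape[0]), correct_class] = True

    over = is_correct & (Y > T)        # yp(ic) > tp(ic)
    under = (~is_correct) & (Y < T)    # yp(id) < tp(id)
    T_new[over] = Y[over]
    T_new[under] = Y[under]
    return T_new
```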
31
A Maximal Margin Classifier Problems With MSE
Type Training
  • Recall that E counts all non-support vectors as
    erroneous

32
A Maximal Margin Classifier Errors Contributing
to E
  • In E, only errors (green lines) inside the
    margins are minimized.
  • Outliers can be eliminated.

33
A Maximal Margin Classifier Comments
  • The proposed algorithm:
  • (1) Adapts to any number of training patterns
  • (2) Allows for any number of hidden units
  • (3) Makes CM straightforward
  • (4) Is used to train the MLP. The resulting classifier is called a maximal margin classifier (MMC)
  • Questions
  • Does this really work?
  • How do the MMC and SVM approaches compare?

34
A Maximal Margin Classifier Two-Class Example
  • Numeral Classification: N = 16, Nc = 2, Nv = 600
  • Goal: Discriminate numerals 4 and 9.
  • SVM: Nh > 150, Et = 4.33 and Ev = 5.33
  • MMC: Nh = 1, Et = 1.67 and Ev = 5.83
  • Comments: The 2-class SVM seems better, but the price is too steep.

35
A Maximal Margin Classifier Multi-Class Examples
  • Numeral Classification: N = 16, Nc = 10, Nv = 3,000
  • SVM: Nh > 608, Ev = 14.53
  • MMC: Nh = 32, Ev = 8.1
  • Bell Flight Condition Recognition: N = 24, Nc = 39, Nv = 3,109
  • SVM: Training fails
  • MMC: Nh = 20, Ev = 6.97
  • Nh - number of hidden units
  • Nv - number of training patterns
  • Ev - validation error percentage
  • Conclusion: SVMs may not work for medium and large size multi-class problems.

36
Growing and Pruning Candidate MLP Training Block
If complexity minimization (CM) is used, the resulting Ef(Nh) curve is monotonic.
Practical ways of approximating CM are growing and pruning.
37
Growing and Pruning Growing
(Plot: validation error Ef(w) versus the number of weights Nw)
Growing: Starting with no hidden units, repeatedly add Na units and train the network some more.
Advantages: Creates a monotonic Ef(Nh) curve. Usefulness is concentrated in the first few units added.
Disadvantage: Hidden units are not optimally ordered.
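A minimal sketch of the growing loop (Python; `add_hidden_units`, `train_some_more`, and `validation_error` are hypothetical stand-ins for the actual IPNNL routines):

```python
def grow_network(net, n_max, n_add, add_hidden_units, train_some_more,
                 validation_error):
    """Growing: start with no hidden units, repeatedly add n_add units and
    keep training, recording validation error Ef at each size Nh."""
    history = [(net.n_hidden, validation_error(net))]
    while net.n_hidden < n_max:
        net = add_hidden_units(net, n_add)
        net = train_some_more(net)
        history.append((net.n_hidden, validation_error(net)))
    return net, history
```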
38
Growing and Pruning Pruning [8]
(Plot: validation error Ef(w) versus the number of weights Nw)
Pruning: Train a large network. Then repeatedly remove a less useful unit using OLS.
Advantages: Creates a monotonic Ef(Nh) curve. Hidden units are optimally ordered.
Disadvantage: Usefulness is not concentrated in the first few units.
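A minimal sketch of a pruning loop (Python; this greedy variant removes whichever hidden unit hurts training error least, a simplified stand-in for the OLS-based ordering of [8]; `remove_unit`, `training_error`, and `validation_error` are hypothetical helpers):

```python
def prune_network(net, remove_unit, training_error, validation_error, n_min=1):
    """Greedy pruning: repeatedly delete the hidden unit whose removal increases
    training error the least, recording validation error Ef at each size Nh."""
    history = [(net.n_hidden, validation_error(net))]
    while net.n_hidden > n_min:
        candidates = [remove_unit(net, j) for j in range(net.n_hidden)]
        net = min(candidates, key=training_error)
        history.append((net.n_hidden, validation_error(net)))
    return net, history
```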
39
Growing and Pruning Pruning a Grown Network [9]
The data set is for inversion of radar scattering from bare soil surfaces. It has 20 inputs and 3 outputs.
Training error for Inversion of Radar Scattering
40
Growing and Pruning Pruning a Grown Network
Validation error for Radar Scattering Inversion
41
Growing and Pruning Pruning a Grown Network
Prognostics data set for onboard flight load
synthesis (FLS) in helicopters, where we estimate
mechanical loads on critical parts, using
measurements available in the cockpit. There are
17 inputs and 9 outputs.
Training error for Prognostics data
42
Growing and Pruning Pruning a Grown Network
Validation error plots for Prognostics data
43
Growing and Pruning Pruning a Grown Network
The data set, for estimating phoneme likelihood functions in speech, has 39 inputs and 117 outputs.
Training error for Speech data
44
Growing and Pruning Pruning a Grown Network
Validation error for Speech data
45
Growing and Pruning
  • Remaining Work: Insert growing and pruning into the IPNNL Optimization System

46
IPNNL Software Motivation
Theorem 2 (No Free Lunch Theorem [5]): In the absence of assumptions concerning the training data, no training algorithm is inherently better than another.
Implication: We usually make some assumptions. However, given training data, several classifiers should be tried after feature selection.
47
IPNNL Software Block Diagram
(Block diagram: Data → Analyze Your Data (Feature Selection) → Select Network Type (MLP, PLN, FLN, RBF, LVQ, SOM, SVM) → Size, Train, Prune, Validate → Produce Final Network)
48
IPNNL Software Examples
  • The IPNNL Optimization system is demonstrated on
  • Flight condition recognition data from Bell
    Helicopter Textron (Prognostics Problem)
  • Sleep apnea data from UTA Bioengineering (Prof.
    Khosrow Behbehani and Mohammad Al-Abed)
  • Traveler characteristics data from UTA Civil
    Engineering (Prof. Steve Mattingly and Isaradatta
    Rasmidatta)

49
Examples Bell Helicopter Textron
  • Flight condition recognition (prognostics) data
    from Bell Helicopter Textron
  • Features: N = 24 cockpit measurements
  • Patterns: 4,745
  • Classes: Nc = 39 helicopter flight categories

50
Run Feature selection, and save new training and
validation files with only 18 features
51
Run MLP sizing, decide upon 12 hidden units
52
Run MLP training, save the network
53
Run MLP pruning, with validation. Final network
has 10 hidden units.
54
Examples Behbehani and Al-Abed
  • Classification of Sleep Apnea Data (Mohammad Al-Abed and Khosrow Behbehani)
  • Features: N = 90 features, from co-occurrence features applied to the STDFT
  • Patterns: 136
  • Classes: Nc = 2 (Yes/No)
  • Previous Software: Matlab Neural Net Toolbox

55
  • Run Feature selection, and save new training and
    validation files with only 17 features. The curve
    is ragged because of the small number of patterns.

56
Run MLP sizing, decide upon 5 hidden units
57
Run MLP training, save the network
58
Run MLP pruning, with validation. Final network
has 3 hidden units.
59
Examples - Mattingly and Rasmidatta
  • Classification of traveler characteristics data
    (Isaradatta Rasmidatta and Steve Mattingly)
  • Features: N = 22 features
  • Patterns: 7,325
  • Classes: Nc = 3 (car, air, bus/train)
  • Previous Software: NeuroSolutions by NeuroDimension

60
Run Feature selection, and save new training and
validation files with only 4 features. The flat
curve means few features are needed.
61
Run MLP sizing, decide upon 2 hidden units. Flat
curve means few hidden units, if any, are needed.
62
Run MLP training, save the network
63
Run MLP pruning, with validation. Final network
has 1 hidden unit.
64
Conclusions
  • An effective feature selection algorithm has been
    developed
  • Regression-based networks are compatible with CM
  • Regression-based training can extend maximal
    margin concepts to many nonlinear networks
  • Several existing and potential blocks in the
    IPNNL Optimization System have been discussed
  • The system has been demonstrated on three pattern
    recognition applications
  • A similar Optimization System is available for
    approximation/regression applications

65
References
  • [1] K. Fukunaga, Introduction to Statistical Pattern Recognition, 2nd edition, Academic Press, 1990.
  • [2] P. Pudil, J. Novovicova, and J. Kittler, "Floating Search Methods in Feature Selection," Pattern Recognition Letters, vol. 15, pp. 1119-1125, 1994.
  • [3] Jiang Li, Michael T. Manry, Pramod Lakshmi Narasimha, and Changhua Yu, "Feature Selection Using a Piecewise Linear Network," accepted by IEEE Trans. on Neural Networks.
  • [4] Vladimir N. Vapnik, Statistical Learning Theory, John Wiley & Sons, 1998.
  • [5] Duda, Hart, and Stork, Pattern Classification, 2nd edition, John Wiley & Sons, 2001.
  • [6] Dennis W. Ruck et al., "The Multilayer Perceptron as an Approximation to a Bayes Optimal Discriminant Function," IEEE Trans. on Neural Networks, vol. 1, no. 4, 1990.
  • [7] R.G. Gore, Jiang Li, Michael T. Manry, Li-Min Liu, Changhua Yu, and John Wei, "Iterative Design of Neural Network Classifiers through Regression," International Journal on Artificial Intelligence Tools, vol. 14, nos. 1-2, pp. 281-301, 2005.
  • [8] F. J. Maldonado and M. T. Manry, "Optimal Pruning of Feedforward Neural Networks Using the Schmidt Procedure," Conference Record of the Thirty-Sixth Annual Asilomar Conference on Signals, Systems, and Computers, November 2002, pp. 1024-1028.
  • [9] Pramod Lakshmi Narasimha, Walter Delashmit, Michael Manry, Jiang Li, and Fransisco Maldonado, "Fast Generation of a Sequence of Trained and Validated Feed-Forward Networks," accepted by the Nineteenth International Conference of the Florida AI Research Society, May 2006.