Title: Business Data Solution Using Clustering, Linear Programming, and Neural Net
1Business Data Solution Using Clustering, Linear
Programming, and Neural Net
- A presentation to
- El Paso del Norte Software Association
- Somnath (Shom) Mukhopadhyay
- Information and Decision Sciences Department
- The University of Texas at El Paso
- August 27th 2003
2Outline of Presentation
- Data Mining Definition
- Introduction of Neural Net
- - Physiological flavor
- - General framework
- - Classes of PDP models
- - Sigma-PI units
- - Conclusion
3Outline of Presentation (Continued)
- Examples of real-world application problems
- Organization of theoretical concepts
- - Three methods used for classification
- - A new LP based method for the classification problem
- - Application to a fictitious problem with four classes
- - Comparing LP method results with the results from a neural network method
- - Q & A
4Data Mining - definition
- - Exploring relationships in large amounts of data
- - Should generalize
- - Should be empirically validated
- Examples
- - Customer Relationship Management (CRM)
- - Credit Scoring
- - Clinical decision support
5PDP Models and Brain
- Physiological Flavor
- Representation and Learning in PDP models
- Origins of PDP
- - Jackson (1869) and Luria (1966)
- - Hebb (1950)
- - Rosenblatt (1959)
- - Grossberg (1970)
- - Rumelhart (1977)
6General Framework for PDP
- A set of processing units
- A state of activation
- An output function of each unit
- A pattern of connectivity among units
- A propagation rule
- An activation rule
- A learning rule
- An operating environment
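The components listed above can be made concrete with a minimal sketch. The specific choices here (a weighted-sum propagation rule, a logistic activation rule, a threshold output function, and a Hebbian learning rule) are illustrative assumptions, not choices stated in the presentation; all function names are hypothetical.

```python
import math

def propagate(weights, outputs):
    """Propagation rule: net input to each unit is the weighted sum of incoming outputs."""
    return [sum(w * o for w, o in zip(row, outputs)) for row in weights]

def activate(net):
    """Activation rule: logistic squashing of the net input (an assumed choice)."""
    return [1.0 / (1.0 + math.exp(-n)) for n in net]

def output(activations, threshold=0.5):
    """Output function: a unit emits 1 when its activation exceeds the threshold."""
    return [1.0 if a > threshold else 0.0 for a in activations]

def hebbian_update(weights, pre, post, rate=0.1):
    """Learning rule: strengthen w[i][j] when receiving unit i and sending unit j are both active."""
    return [[w + rate * post[i] * pre[j] for j, w in enumerate(row)]
            for i, row in enumerate(weights)]

# Operating environment: one input pattern presented to two sending units.
pattern = [1.0, 0.0]
# Pattern of connectivity: 2 receiving units x 2 sending units.
weights = [[0.8, -0.3], [-0.5, 0.4]]

net = propagate(weights, pattern)   # net input per receiving unit
acts = activate(net)                # state of activation
outs = output(acts)                 # output of each unit
new_weights = hebbian_update(weights, pattern, outs)
```

Only the connection between the active sending unit and the active receiving unit is strengthened; every other weight is left unchanged.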
7The Basic Components of a PDP system
8Classes of PDP models
- Simple Linear Models
- Linear Threshold Units
- Brain State in a Box (BSB) by J. A. Anderson
- Thermodynamic models
- Grossberg
- Connectionist modeling
9Sigma-PI Units
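In the PDP literature, a sigma-pi unit takes as its net input a weighted sum (sigma) of products (pi) of the outputs of other units. The sketch below illustrates that definition; the function name, the `conjuncts` data layout, and the example numbers are assumptions made for illustration only.

```python
from math import prod

def sigma_pi_net(weights, conjuncts, activations):
    """Net input of one sigma-pi unit.

    conjuncts[j] lists the indices of the sending units whose
    activations are multiplied together in term j.
    """
    return sum(w * prod(activations[k] for k in idx)
               for w, idx in zip(weights, conjuncts))

activations = [1.0, 0.5, 0.0]
conjuncts = [(0, 1), (1, 2), (0,)]   # three product terms
weights = [2.0, 3.0, 0.5]            # one weight per product term

net = sigma_pi_net(weights, conjuncts, activations)
```

A product term acts as a gate: the second term contributes nothing because one of its units has activation 0.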
10A few real-world applications of interest to
organizations and individuals
- Breast cancer detection
- Heart disease diagnosis
- Enemy submarine detection
- Mortgage delinquency prediction
- Stock market prediction
- Japanese Character recognition and conversion
12What is classification?
- Identification of a set of certain mutually exclusive classes
- Identify a set of meaningful attributes that discriminate among the classes
- Illustrations
- - Using a meaningful set of attributes, can we differentiate between frequent and infrequent occurrence?
13Decision Boundaries of a typical classification
problem
14Three Methods for Classification
- Identifying decision boundaries for each class region
- Linear discriminant (Glover et al., 1988)
- Linear programming (Roy and Mukhopadhyay, 1991)
- Neural Networks (Rumelhart, 1986)
15A new LP based method for classification problem
- Step 1. Identify and discard outliers using clustering
- Step 2. Form decision boundaries for each class region by using LP
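Step 1 can be sketched as follows. The presentation does not say which clustering method is used, so this example assumes a simple single-linkage grouping by distance and treats any cluster smaller than `min_size` as outliers. The function names, `radius`, and `min_size` are hypothetical.

```python
from math import dist

def cluster_by_radius(points, radius):
    """Single-linkage clusters: points joined by a chain of neighbours within `radius`."""
    unassigned = set(range(len(points)))
    clusters = []
    while unassigned:
        seed = unassigned.pop()
        cluster, frontier = [seed], [seed]
        while frontier:
            current = frontier.pop()
            near = [j for j in unassigned
                    if dist(points[current], points[j]) <= radius]
            for j in near:
                unassigned.remove(j)
            cluster.extend(near)
            frontier.extend(near)
        clusters.append(sorted(cluster))
    return clusters

def discard_outliers(points, radius, min_size):
    """Keep only points that belong to a cluster with at least `min_size` members."""
    kept = []
    for cluster in cluster_by_radius(points, radius):
        if len(cluster) >= min_size:
            kept.extend(cluster)
    return [points[i] for i in sorted(kept)]

# Four nearby patterns of one class plus one isolated pattern.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (0.1, 0.1)]
cleaned = discard_outliers(points, radius=0.5, min_size=2)
```

The isolated pattern forms a cluster of size one and is dropped before any boundary is fitted in Step 2.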
16Step 2 Form Decision Boundaries
- Development of Boundary Functions
- Use convex functions to calibrate the boundary. One example function:
$f(x) = \sum_i a_i X_i + \sum_i b_i X_i^2 + \sum_i \sum_j c_{ij} X_i X_j + d$, where $j = i + 1$
17Step 2 Form Decision Boundaries (Contd.)
- One instance of the general function:
$f_A(x) = a_1 X_1 + a_2 X_2 + b_1 X_1^2 + b_2 X_2^2 + d$
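The boundary function can be evaluated with a few lines of code. This is a minimal sketch: the function name `boundary_value`, the dictionary layout for the cross-term coefficients, and the coefficient values in the example are hypothetical and chosen only to show the arithmetic.

```python
def boundary_value(x, a, b, d, c=None):
    """Evaluate f(x) = sum a_i x_i + sum b_i x_i^2 + sum c_ij x_i x_j + d.

    `c` maps index pairs (i, j) to cross-term coefficients; omit it
    for the two-variable instance that has no cross terms.
    """
    value = d
    value += sum(ai * xi for ai, xi in zip(a, x))
    value += sum(bi * xi * xi for bi, xi in zip(b, x))
    for (i, j), cij in (c or {}).items():
        value += cij * x[i] * x[j]
    return value

x = (2.0, 1.0)
# Instance without cross terms: -3 + (2 - 1) + (0.5 * 4 + 0) = 0
no_cross = boundary_value(x, a=(1.0, -1.0), b=(0.5, 0.0), d=-3.0)
# Same coefficients plus one cross term: 0 + 0.25 * 2 * 1 = 0.5
with_cross = boundary_value(x, a=(1.0, -1.0), b=(0.5, 0.0), d=-3.0,
                            c={(0, 1): 0.25})
```

Because the function is linear in the coefficients a, b, c, and d, each known pattern x turns into one linear constraint on those coefficients, which is what makes an LP formulation possible.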
18Step 2 Form Decision Boundaries (Contd.)
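- LP formulation of the previous problem instance
Minimize $\epsilon$ s.t.
$f_A(x_1) \ge \epsilon, \ldots, f_A(x_8) \ge \epsilon$
$f_A(x_9) \le -\epsilon, \ldots, f_A(x_{18}) \le -\epsilon$
$\epsilon \ge$ a small positive constant.
Written out for the individual patterns:
Minimize $\epsilon$ s.t.
$a_2 + b_2 + d \ge \epsilon$ for pattern $x_1$
$a_1 + b_1 + d \ge \epsilon$ for pattern $x_2$
$-a_2 + b_2 + d \ge \epsilon$ for pattern $x_3$
$-a_1 + b_1 + d \ge \epsilon$ for pattern $x_4$
$\ldots$
$a_1 + a_2 + b_1 + b_2 + d \le -\epsilon$ for pattern $x_{15}$
$a_1 - a_2 + b_1 + b_2 + d \le -\epsilon$ for pattern $x_{16}$
$-a_1 - a_2 + b_1 + b_2 + d \le -\epsilon$ for pattern $x_{17}$
$-a_1 + a_2 + b_1 + b_2 + d \le -\epsilon$ for pattern $x_{18}$
$\epsilon \ge$ a small positive constant.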
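Each constraint above is the boundary function evaluated at one pattern. The sketch below shows how such rows could be generated. The pattern coordinates are inferred from the coefficients of the listed constraints (for example, (0, 1) for pattern x1 and (1, 1) for pattern x15) and are not stated in the presentation; the helper names are hypothetical, and `e` stands for the small positive constant.

```python
NAMES = ("a1", "a2", "b1", "b2", "d")

def constraint_row(x1, x2):
    """Coefficients of (a1, a2, b1, b2, d) when f_A is evaluated at (x1, x2)."""
    return (x1, x2, x1 * x1, x2 * x2, 1)

def format_constraint(row, in_class):
    """Render one LP constraint: >= e for class A patterns, <= -e for the rest."""
    parts = []
    for coef, name in zip(row, NAMES):
        if coef == 0:
            continue  # the variable drops out of this constraint
        sign = "-" if coef < 0 else "+"
        size = "" if abs(coef) == 1 else f"{abs(coef)}*"
        parts.append(f"{sign} {size}{name}")
    lhs = " ".join(parts).lstrip("+ ")
    return f"{lhs} >= e" if in_class else f"{lhs} <= -e"

inside = format_constraint(constraint_row(0, 1), in_class=True)
below = format_constraint(constraint_row(0, -1), in_class=True)
outside = format_constraint(constraint_row(1, 1), in_class=False)
```

The resulting strings match the constraints listed for patterns x1, x3, and x15.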
19Step 2 Form Decision Boundaries (Contd.)
- Solution of this LP formulation gives decision boundaries.
Specifically we get $a_1 = 0$, $a_2 = 0$, $b_1 = -1$, $b_2 = -1$, $d = 1 + \epsilon$.
Therefore, the boundary function
$f_A(x) = a_1 X_1 + a_2 X_2 + b_1 X_1^2 + b_2 X_2^2 + d$
translates into
$f_A(x) = 1 - X_1^2 - X_2^2 + \epsilon$
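The stated solution can be checked against the eight constraints that the formulation lists explicitly. In this sketch the pattern coordinates are inferred from those constraints rather than given in the presentation, and the value chosen for the small positive constant (`EPS = 0.01`) is an assumption.

```python
EPS = 0.01   # assumed value for the small positive constant
TOL = 1e-12  # tolerance for floating-point round-off

def f_A(x1, x2, eps=EPS):
    """Boundary function obtained from the LP solution: 1 - x1^2 - x2^2 + eps."""
    return 1 + eps - x1 ** 2 - x2 ** 2

# Patterns x1..x4 (class A) and x15..x18 (other classes), inferred coordinates.
class_a = [(0, 1), (1, 0), (0, -1), (-1, 0)]
others = [(1, 1), (1, -1), (-1, -1), (-1, 1)]

class_a_ok = all(f_A(*p) >= EPS - TOL for p in class_a)
others_ok = all(f_A(*p) <= -EPS + TOL for p in others)
```

The class A patterns sit exactly on the constraint limit (the function equals the constant there), while the four corner patterns fall well below the negative limit, so the boundary is a circle of radius slightly greater than 1 around the origin.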
20Step 2 Form Decision Boundaries (Contd.)
- Putting this result into picture we have the
following decision boundary
21Step 2 Form Multiple Decision Boundaries
- A class does not have to be neatly packed within one boundary.
- For problems requiring multiple decision boundaries, the algorithm can find multiple disjoint regions for the same class. For example, a class called "corner seats" in a soccer stadium is scattered into four disjoint regions.
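A class made of several disjoint regions can be represented as a union of boundary functions: a pattern belongs to the class if at least one of them is non-negative. The sketch below illustrates this with the corner-seat example. The unit-square stadium, the circular regions, the radius, and all names are assumptions made purely for illustration.

```python
CORNERS = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
RADIUS = 0.2  # assumed size of each corner region

def make_boundary(cx, cy, r=RADIUS):
    """Quadratic boundary that is non-negative within distance r of (cx, cy)."""
    def f(x1, x2):
        return r * r - (x1 - cx) ** 2 - (x2 - cy) ** 2
    return f

# One boundary function per disjoint region of the same class.
boundaries = [make_boundary(cx, cy) for cx, cy in CORNERS]

def is_corner_seat(x1, x2):
    """A seat is in the class if any one of the region boundaries contains it."""
    return any(f(x1, x2) >= 0 for f in boundaries)

near_origin_corner = is_corner_seat(0.05, 0.1)
near_far_corner = is_corner_seat(0.95, 0.9)
midfield = is_corner_seat(0.5, 0.5)
```

Each region keeps its own simple boundary function, so no single function has to enclose all four corners at once.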
22An example of a decision space of a fictitious
problem (It has four classes A, B, C, D)
23Decision Boundary Identification Process for
Class D only
24Six Decision Boundaries found for Class B
25Constructing MLP from masks
- Masking functions are put on a network to exploit parallelism.
26Neural Networks Method for Classification
- Neural networks
- - develop non-linear functions to associate inputs with outputs
- - make no assumptions about the distribution of the data
- - handle missing data well (graceful degradation)
- Supervised neural networks
- Estimating and testing the model
- - Construct a training sample and a holdout sample
- - Estimate model parameters using the training sample
- - Test the estimated model's classification ability using the holdout sample
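The estimate-and-test procedure above can be sketched in a few lines. To keep the example short, a single linear threshold unit trained with the perceptron rule stands in for the neural network; this is a substitution for illustration, not the network used in the presentation. The synthetic data, the 70/30 split, and all function names are likewise assumptions.

```python
import random

def split(samples, holdout_fraction, seed=0):
    """Shuffle the samples and split them into training and holdout samples."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_holdout = round(len(shuffled) * holdout_fraction)
    cut = len(shuffled) - n_holdout
    return shuffled[:cut], shuffled[cut:]

def predict(model, x1, x2):
    """Linear threshold unit: output 1 when the weighted sum is positive."""
    w, bias = model
    return 1 if w[0] * x1 + w[1] * x2 + bias > 0 else 0

def train_perceptron(train, epochs=50, rate=0.1):
    """Estimate model parameters from the training sample only."""
    w, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for (x1, x2), label in train:
            err = label - predict((w, bias), x1, x2)
            w[0] += rate * err * x1
            w[1] += rate * err * x2
            bias += rate * err
    return w, bias

def accuracy(model, samples):
    """Fraction of samples whose predicted class matches the true class."""
    hits = sum(predict(model, x1, x2) == label for (x1, x2), label in samples)
    return hits / len(samples)

# Synthetic two-class data (assumed): class 1 when x1 + x2 > 0.
rng = random.Random(1)
data = []
for _ in range(100):
    x1, x2 = rng.uniform(-1, 1), rng.uniform(-1, 1)
    data.append(((x1, x2), 1 if x1 + x2 > 0 else 0))

train, holdout = split(data, holdout_fraction=0.3)
model = train_perceptron(train)
holdout_accuracy = accuracy(model, holdout)
```

The holdout sample is never seen during training, so `holdout_accuracy` estimates how well the model generalizes rather than how well it memorized the training sample.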
27Comparison between LP and NN performance for three real-world problems
28Future Research
- Autonomous Learning
- - learns without outside intervention
- - does class-dependent feature selection
- - derives simple if-then type classification rules that humans can understand
- - develops non-linear functions to associate inputs with outputs
29Q & A