
Data Mining and Gated Expert Neural Networks for Prognostics of Systems Health Monitoring

- Mo Jamshidi, Ph.D., DEgr., Dr. H.C.
- F-IEEE, F-ASME, F-AAAS, F-NYAS, F-HAE, F-TWAS
- Regents Professor, Electrical and Computer Engineering Department
- Director, Autonomous Control Engineering (ACE) Center
- University of New Mexico, Albuquerque, NM, USA
- Advisor, NASA JPL (1991-93), Headquarters (1996-2003)
- Sr. Research Advisor, US AF Research Lab. (1984-90, 2001-present)
- Consultant, US DOE Oak Ridge NL (1988-92), Office of Renewable Energy (2001-2003)
- Vice President, IEEE Systems, Man and Cybernetics Society
- http://ace.unm.edu  www.vlab.unm.edu
- moj_at_wacong.org
- Fairbanks, Alaska, USA, May 24, 2005

OUTLINE

- Definition of Prognostics
- History of Prognostics
- Approaches of Prognostics
- Principal Component Analysis (PCA)
- PCA via Neural Network Architecture
- Prognostics via Neural Networks
- Gated Approach to Hardware Prognostics
- Applications: Health and Industry
- Conclusion and Future Efforts

Prognostics vs. Diagnostics vs. Health Monitoring

Are They the Same?

- Health Monitor (v.): to keep track of current status systematically with a view to collecting information.
- Diagnosis (n.): identifying the nature or cause of some phenomenon.
- Prognosis (n.): a prediction about how something (such as the weather) will develop; forecasting.
- Conclusion: they are not the same.
- Source: Webster's New World Dictionary.

So How Are They Related?

- Health monitoring uses instrumentation to collect information about the subject system.
- Diagnostics uses the information in real time to detect abnormal operation or outright faults.
- Prognostics uses the information to predict the onset of abnormal conditions and faults prior to the actual failure, allowing the operators to gracefully plan for shutdown or, if required, operate the system in a degraded but safe-to-use mode until a shutdown and maintenance can be accomplished.

A Brief History of Automated Diagnostics and Prognostics

- Before the advent of inexpensive computing, diagnosis was ad hoc, manual, and depended on human experts.
- With the advent of accessible digital computers, early expert systems attempted diesel locomotive engine diagnostics based on oil analysis. Humans were still required for prognostics.
- The 1970s saw the start of equipment health monitoring for high-value systems (e.g., nuclear power plants) and on-line diagnostics using minicomputers. Human interpretation was still required.
- The 1980s saw the use of personal computers and digital analyzers to do equipment health monitoring. Some automatic shutdown on extreme exception was included, but human involvement was still required.

A Brief History (Contd.)

- The 1990s saw built-in test and real-time diagnostics added to military electronics and high-value civilian systems. Health monitoring/diagnostics at this point were evolving into decision support systems for the operator.
- NOW: Diagnostics are pervasive.
- Automobiles (OnStar, OBD-II, heavy equipment, trucks, etc.)
- Electronics/electro-mechanical devices (copiers, complex manufacturing equipment, etc.)

A Brief History (Contd.)

- Aviation (Boeing 777, Airbus, etc.)
- Prognostics at the component/subsystem level start to appear for the first time.
- Still no system-wide prognostics! By and large, prognostics are still done by the human operators deciding how much further they can go before stopping.

Literature Survey

- Diagnostics are well developed.
- Prognostics are not!
- Logical next step: intelligent system-level prognostics.

Approaches to Diagnostics and Prognostics

- Data Driven Methods
- Analytical Methods
- Knowledge-Based Methods

Data Signatures

- A library of predictive algorithms based on a number of advanced pattern recognition techniques, such as multivariate statistics, neural networks, and signal analysis.
- Identify the partitions that separate the early signatures of functioning systems from the signatures of malfunctioning systems.

Predictive indicators of failures

- A viable prognostic system should be able to provide an accurate picture of faults, component degradation, and predictive indicators of failures.
- This allows our operators to take preventive maintenance actions to avoid costly damage to critical parts and to maintain availability/readiness rates for the system.

Data Driven Methods

- The huge amount of data has to be reduced intelligently for any careful fault diagnosis.
- Reduce the superficial dimensionality of the data to its intrinsic dimensionality (i.e., the number of independent variables with significant contributions to nonrandom variations in the observations).

Data Driven Methods

- Feature extraction
- Partial Least Squares (PLS)
- Fisher Discriminant Analysis
- Canonical Variate Analysis
- Principal Component Analysis
- We will focus only on PCA and its nonlinear relative (NLPCA).

Principal Component Analysis

- What is PCA?
- It is a way of identifying patterns in data and expressing the data in such a way as to highlight their similarities and differences. This matters because patterns can be hard to find in data of high dimension, where the luxury of graphical representation is not available.

Principal Component Analysis

- PCA is a powerful tool for analyzing data.
- The other main advantage of PCA is that once you have found these patterns in the data, you can compress the data, i.e., reduce the number of dimensions, with little loss of information.

PCA

- The feature variables in PCA (also referred to as factors) are linear combinations of the original problem variables.

Classical Statistics based PCA steps

- Get data
- Subtract the mean
- Calculate the covariance matrix
- Calculate the eigenvalues and eigenvectors of the covariance matrix
- Choose the feature vector (data compression begins from here)
- Derive the new (reduced) data set
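A minimal NumPy sketch of these steps; the function and variable names and the example data are illustrative, not from the presentation:

```python
import numpy as np

def pca_reduce(data, n_components):
    """Reduce an (n_samples x m_variables) array to n_components features."""
    # Steps 1-2: get data and subtract the mean of each variable (column)
    centered = data - data.mean(axis=0)
    # Step 3: covariance matrix of the variables (m x m)
    cov = np.cov(centered, rowvar=False)
    # Step 4: eigenvalues/eigenvectors (eigh suits symmetric matrices)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Step 5: choose the feature vector -- eigenvectors with the largest
    # eigenvalues (eigh returns ascending order, so sort descending)
    order = np.argsort(eigvals)[::-1][:n_components]
    loadings = eigvecs[:, order]        # (m x f) loading matrix
    # Step 6: derive the reduced data set (the scores, n x f)
    scores = centered @ loadings
    return scores, loadings

# Illustrative usage on random stand-in data
X = np.random.randn(100, 6)
T, P = pca_reduce(X, n_components=2)
```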

Principal Component Analysis (PCA)

- Assuming a data set X containing n observations and m variables (i.e., an n × m matrix), PCA divides X into two matrices, the scores matrix T of dimension (n × f) and the loading matrix P of dimension (m × f), plus a matrix of residuals E of dimension (n × m):

  X = T Pᵀ + E

Principal Component Analysis (PCA)

- It is known that PCA optimizes the process by minimizing the Euclidean norm of the residual matrix E.
- To satisfy this condition, the columns of P are the eigenvectors corresponding to the f largest eigenvalues of the covariance matrix of X.

Principal Component Analysis (PCA)

- In other words, PCA transforms our data from m to f dimensions by providing a linear mapping

  t = Pᵀ x

  where x represents a row of the original data set X and t represents the corresponding row of T.

Non-Linear PCA (NLPCA)

- In Kramer's NLPCA, the linear transformation in PCA is generalized to an arbitrary nonlinear function G such that

  t = G(x)

  where G is a nonlinear vector function composed of f individual nonlinear functions, analogous to the columns of P.
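A minimal sketch of the autoassociative idea behind Kramer's NLPCA, using scikit-learn's MLPRegressor as a stand-in network; the layer sizes and training settings are assumptions, not the presentation's actual configuration:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

m, f = 6, 2                        # original and intrinsic dimensions
X = np.random.randn(500, m)        # stand-in for centered sensor data

# Mapping layer -> bottleneck of f units -> demapping layer, trained
# to reproduce its own input (X -> X); the bottleneck activations play
# the role of the f nonlinear principal components.
aann = MLPRegressor(hidden_layer_sizes=(10, f, 10),
                    activation="tanh", max_iter=2000)
aann.fit(X, X)

X_hat = aann.predict(X)                # reconstruction from f factors
residual = np.linalg.norm(X - X_hat)   # analog of the PCA residual E
# Note: extracting the bottleneck activations themselves would require a
# manual forward pass through aann.coefs_ / aann.intercepts_.
```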


Analytical Methods

- The analytical methods generate features using detailed mathematical models.
- Based on the measured input u and output y, it is common to generate residuals r, parameter estimates, and state estimates.
- The residuals are the outcomes of consistency checks between the plant observations and a mathematical model.
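A minimal sketch of residual generation, assuming a toy first-order discrete-time model and an illustrative fault threshold (neither is from the presentation):

```python
import numpy as np

def model_output(u, a=0.9, b=0.1):
    """Toy discrete-time model: y[k+1] = a*y[k] + b*u[k]."""
    y = np.zeros_like(u)
    for k in range(len(u) - 1):
        y[k + 1] = a * y[k] + b * u[k]
    return y

u = np.ones(100)                                             # measured input
y_measured = model_output(u) + 0.01 * np.random.randn(100)   # plant data
# Consistency check between plant observations and the model: r = y - y_hat
residuals = y_measured - model_output(u)
fault_suspected = np.abs(residuals).max() > 0.05             # illustrative threshold
```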

Integrated Method for Fault Diagnostics and

Prognostics (IFDP)

- Based on:
- NLPCA for dimensionality reduction
- A society of experts (E-AANN, KSOM, RBFC)
- Gated Experts
- All developed in MATLAB, with Simulink for model simulations

Extended Auto-Associative Neural Networks (E-AANN)

Kohonen Self-Organizing Maps (KSOM)

- KSOM defines a mapping from the input data space ℝⁿ onto a regular two-dimensional array of nodes.
- In the System, a KSOM input is a vector combining both the inputs and the outputs of a certain System component.
- Every node i is defined by a prototype vector mᵢ ∈ ℝⁿ. An input vector x ∈ ℝⁿ is compared with every mᵢ, and the best match m_b is selected.
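A minimal sketch of this matching step, assuming a 10×10 map and a winner-only update for brevity (a full SOM also updates the winner's neighbors with a decaying neighborhood function):

```python
import numpy as np

rows, cols, n = 10, 10, 3
prototypes = np.random.rand(rows * cols, n)   # one m_i per node of the map

def best_match(x, prototypes):
    """Return the index b of the prototype m_b closest to input x."""
    return np.argmin(np.linalg.norm(prototypes - x, axis=1))

def train_step(x, prototypes, lr=0.1):
    """Move the winning prototype toward x (neighbors omitted for brevity)."""
    b = best_match(x, prototypes)
    prototypes[b] += lr * (x - prototypes[b])
    return b

x = np.random.rand(n)          # e.g. an RGB color vector, as in the example
b = train_step(x, prototypes)
```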

Kohonen Self-Organizing Maps (KSOM)

Example: three-dimensional input data in which each sample vector x consists of the RGB (red-green-blue) values of a color vector.

Radial Basis Function based Clustering (RBFC)

- The RBF rulebase is identified by our clustering algorithm.
- We will consider a specific case of a rulebase with n inputs and a single output. The inputs to the rulebase are assumed to be normalized to fall within the range [0, 1].
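A minimal sketch of such a rulebase, assuming Gaussian basis functions with hand-picked centers, widths, and consequents; in the actual system these would come from the clustering algorithm:

```python
import numpy as np

centers = np.array([[0.2, 0.3], [0.5, 0.5], [0.8, 0.7]])  # rule centers in [0,1]^n
widths = np.full(len(centers), 0.15)                       # one spread per rule
consequents = np.array([0.0, 0.5, 1.0])                    # single-output rule values

def rbf_output(x):
    """Evaluate the rulebase at a normalized input x (components in [0, 1])."""
    # Gaussian activation of each rule around its center
    act = np.exp(-np.sum((centers - x) ** 2, axis=1) / (2 * widths ** 2))
    # Normalized activation-weighted average of the rule consequents
    return np.sum(act * consequents) / np.sum(act)

y = rbf_output(np.array([0.4, 0.45]))
```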

Gated Experts for Combining Predictions of Different Methods

- The Gated Experts (GE) architecture [Weigend et al., 1995] was developed as a method for adaptively combining the predictions of multiple experts operating in an environment with changing hidden regimes.
- The predictions are combined using a gate block, which dynamically assigns probabilities to the forecast of each expert being correct, based on how closely the current regime in the data fits the area of expertise of that expert.

Gated Experts for Combining Predictions of Different Methods

- The training process for the GE architecture uses the expectation-maximization (EM) algorithm, which combines both supervised and unsupervised learning.
- The supervised component in the experts learns to predict the conditional mean of the next observed value, and the unsupervised component in the gate learns to discover hidden regimes and assign probabilities to the experts' forecasts accordingly.

Gated Experts for Combining Predictions of Different Methods

- The unsupervised component is also present in the experts in the form of a variance parameter, which each expert adjusts to match the variance of the data for which the gate found it most responsible. A sketch of the gate's combination step follows.
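A minimal sketch of the combination step only, assuming a softmax gate and two hand-written experts; the real GE architecture learns both the gate and the experts jointly via EM, which is omitted here:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

experts = [lambda x: 0.9 * x,          # expert tuned to one regime
           lambda x: 0.2 * x + 1.0]    # expert tuned to another regime

def gated_forecast(x, gate_weights, gate_bias):
    """Combine expert predictions with gate probabilities p(expert | x)."""
    p = softmax(gate_weights * x + gate_bias)   # gate output for this input
    preds = np.array([f(x) for f in experts])
    return float(p @ preds), p                  # weighted forecast + mixture

# Illustrative gate parameters: regime 1 dominates for large positive x
y_hat, p = gated_forecast(1.5, gate_weights=np.array([2.0, -2.0]),
                          gate_bias=np.array([0.0, 0.0]))
```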


Prototype Hardware Implementations

- A chiller at Texas A&M University (with Langari and his team)
- A laser pointing system prototype at the University of New Mexico (Jamshidi and the ACE team)
- A COIL laser at AFRL, USAF (Jamshidi & Stone)
- A flash memory line at Intel Corp. (Jamshidi & Stone)

Chiller Model at Texas A&M University

Training Data and Test Data

Whole data with 1000 samples

Training Data and Test Data

Normalized training data with 2% noise (sorted)

Training Data and Test Data

Normalized test data with 2% noise (sorted)

One Sensor with Drift Error

Test data with 2% noise; sensor 3 has a drift error

One Sensor with Drift Error

Drift error and sensor 3 data

One Sensor with Shift Error

Test data with 2% noise; sensor 3 has a shift error

One Sensor with Shift Error

E-AANN output; the input data had 2% noise and a shift error

One Sensor with Shift Error

Shift error and sensor 3 data

One Sensor with Shift Error

The difference between E-AANN input and output; the input data had 2% noise and a shift error

PCA Application to Cardiac Output

- Cardiac output is defined by two factors:
- Stroke volume
- Heart rate
- Cardiac Output (ml/min) = Heart Rate (beats/min) × Stroke Volume (ml/beat)
- CO at the basal metabolic rate is about 5.5 L/min.
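As a quick worked example with typical resting values (illustrative textbook figures, not data from this presentation):

```latex
\mathrm{CO} = \mathrm{HR} \times \mathrm{SV}
            = 72~\tfrac{\text{beats}}{\text{min}} \times 70~\tfrac{\text{ml}}{\text{beat}}
            = 5040~\tfrac{\text{ml}}{\text{min}} \approx 5~\tfrac{\text{L}}{\text{min}}
```

which is consistent with the quoted basal figure of about 5.5 L/min.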

The human heart

Prognostics of CO using PCA Analysis

- PCA is used to identify patterns in data and express the data in such a way as to highlight their similarities and differences.
- PCA assists us in making an accurate prognostic analysis of a patient's cardiac output performance and hence in predicting possible heart failures.

Good data representation

- By taking several measurements of CO, one is able to predict the possibility of heart failure; this is what makes PCA so useful in the prognostics of cardiac output.
- PCA takes these millions of output measurements and condenses them into a graph representation, from which we can easily visualize CO defects.


Why prognostics ?

- In medicine, the cheapest way to cure a disease is to prevent it. This is done with early diagnostics, medicines, vaccines, etc.
- With an accurate prognostics approach, conditions like heart attack and heart failure can be greatly minimized.
- PCA enables us to arrive at such prognostics.

Parkinson's Disease Tremors

- a) No medication and no brain stimulation
- b) Brain stimulation, no medication
- c) Medication, no brain stimulation
- d) Brain stimulation and medication

Test 1: differences and similarities between patients with both medication and brain stimulation on vs. medication off and brain stimulation on.

Test 2: differences and similarities between patients with both medication and brain stimulation on vs. medication on and brain stimulation off.

Test 3: differences and similarities between patients with both medication and brain stimulation on vs. medication off and brain stimulation off.

Test 4: differences and similarities between patients with medication on and brain stimulation off vs. medication off and brain stimulation on.

PCA Image Processing - ORIGINAL vs. REDUCED (10 EIGENVECTORS)

PCA - ORIGINAL vs. REDUCED (20 EIGENVECTORS)

PCA - ORIGINAL vs. REDUCED (30 EIGENVECTORS)

PCA - ORIGINAL vs. REDUCED (40 EIGENVECTORS)

PCA - ORIGINAL vs. REDUCED (54 EIGENVECTORS)

USING ALL 325 EIGENVECTORS

- With all 325 eigenvectors, we can see that this image looks the same as our image with only 54 eigenvectors.

PCA PERCENTAGES

Eigenvectors | % of Eigenvectors Used
10           | 5.20
20           | 10.42
30           | 15.63
40           | 20.83
54           | 28.10
325          | 100
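A minimal sketch of the compression experiment, assuming the eigenvectors come from the covariance of the image columns; the image here is a random stand-in with 325 columns to match the slides:

```python
import numpy as np

def pca_compress(img, k):
    """Reconstruct img (rows x cols) keeping only k principal components."""
    mean = img.mean(axis=0)
    centered = img - mean
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    top = eigvecs[:, np.argsort(eigvals)[::-1][:k]]   # top-k eigenvectors
    return centered @ top @ top.T + mean              # project and reconstruct

img = np.random.rand(240, 325)           # stand-in for the 325-column image
for k in (10, 20, 30, 40, 54, 325):      # eigenvector counts from the slides
    err = np.linalg.norm(img - pca_compress(img, k))  # drops as k grows
```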

Laser Pointing System at UNM

[Block diagram of the laser pointing system: LabVIEW controller algorithm with ADC/DAC interfaces, X/Y motors, mirror, filter, quadrant detector, and laser.]

Prognostics: Possible Test Beds

- Chemical Laser System: ATL (Advanced Tactical Laser)

Prognostics: Possible Test Beds

- Large gimbal hardware system: NOP (North Oscura Peak) system

HARDWARE Prognostic System

[Architecture diagram: NOP Diagnostic & Prognostic System. Original data from the NOP subsystem's inputs and outputs passes through a Data Reduction Expert System (PCA), guided by a knowledge base of NOP senior engineers, to yield reduced dominant data; that relevant data feeds the E-AANN, KSOM, and RBFC experts, whose outputs are combined by a GE-NN.]

The Intel Flash Memory Assembly Line

- The Intel flash memory assembly line is a state-of-the-art system that uses many sensors to monitor operating conditions.

PCA

- Hundreds of sensors produce thousands of signal inputs per minute on the assembly line. Most of the incoming data is irrelevant. Principal component analysis finds the relevant information among the explosion of data and provides it to a computer for analysis.

Feature Extraction

- PCA is used to reduce the dimensionality of the sensor data and extract features (or characteristic attributes). The features are fed to the computer for analysis, as in the sketch below.
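A rough sketch of this feature-extraction step; scikit-learn's PCA, the sensor counts, and the retained-variance target are all illustrative assumptions, not Intel figures:

```python
import numpy as np
from sklearn.decomposition import PCA

readings = np.random.randn(5000, 200)   # stand-in: 5000 samples, 200 sensors
extractor = PCA(n_components=0.95)      # keep components for 95% of variance
features = extractor.fit_transform(readings)
# `features` is (5000 x f) with f << 200 -- the characteristic attributes
# that would be handed to the analysis computer.
```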

Alternate Method

- Alternately, the data can be fuzzified and similarities found through this process. A neural network is then trained on the different data sets to determine a good data signature against which to judge all incoming streams of data. A sketch of the fuzzification step follows.
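A minimal sketch of the fuzzification idea, assuming triangular membership functions over normalized readings; the breakpoints and the downstream network are assumptions, since the original does not specify them:

```python
import numpy as np

def triangular(x, a, b, c):
    """Triangular membership function peaking at b with support [a, c]."""
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

def fuzzify(x):
    """Map a normalized reading in [0, 1] to (low, medium, high) memberships."""
    return np.array([triangular(x, -0.5, 0.0, 0.5),
                     triangular(x, 0.0, 0.5, 1.0),
                     triangular(x, 0.5, 1.0, 1.5)])

memberships = fuzzify(0.65)   # -> mostly "medium", partly "high"; these
                              # memberships would become the network inputs
```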

Decision Making

- Distilled signal information is handed to a computer for analysis. The computer can quickly recognize changing trends leading to a failure and alert an operator before the failure actually occurs.

Conclusions

- Due to the huge number of sensors on many systems, our approach to fault diagnostics and prognostics must be capable of intelligent data reduction (PCA) in such a way that no important data is lost and all the crucial data is used for smart prognosis with minimum false alarms.
- In its final configuration, it is expected that the library of these strong methods, which is under development, will benefit the System program, ATL, the Intel system, bio-medical cases, etc.

- THANK YOU!