Title: Condition Based Maintenance of Critical Machinery Assets: An Intelligent Architecture
1Condition Based Maintenance of Critical Machinery Assets: An Intelligent Architecture
- Dr. George Vachtsevanos
- Georgia Institute of Technology
- School of Electrical and Computer Engineering
- Atlanta GA 30332-0250
- (404) 894-6252 Voice
- (404) 894-7583 Fax
- gjv_at_ece.gatech.edu
- http://icsl.marc.gatech.edu
- Presented at the Workshop on Automated Machinery
Maintenance - The University of Texas at Arlington
- July 17, 2003
2Topical Outline
- Introduction: What is Condition Based Maintenance?
- Elements of a CBM Architecture
- Example Demonstration
- An Intelligent Agent Based CBM Paradigm
- Future R&D Directions / Concluding Comments
3Condition-Based Maintenance
The Opportunity
Condition Based Maintenance (CBM) promises to
deliver improved maintainability and operational
availability of naval systems while reducing
life-cycle costs
The Challenge
Prognostics is the Achilles heel of CBM systems -
predicting the time to failure of critical
machines requires new and innovative
methodologies that will effectively integrate
diagnostic results with maintenance scheduling
practices
4Condition Based Maintenance
- Objective
- Determine the optimum time to perform maintenance
- Problem Definition
- A scheduling problem: schedule maintenance timing to meet specified objective criteria under certain constraints
5Condition Based Maintenance
- Major Objective
- Extend system life cycle as much as possible without endangering its integrity
- Enabling Technologies
- Various Optimization Tools
- Genetic Algorithms
- Evolutionary Computing
6A Maintenance Management Architecture
Enabling Technologies
- Genetic Algorithms for Optimum Maintenance Scheduling
- Case-Based Reasoning and Induction
- Cost-Benefit Analysis Studies
7CBM Performance Assessment
- Objective
- To assess the technical and economic feasibility of various prognostic algorithms
- Technical Measures
- Accuracy, Speed, Complexity, Scalability
- Overall Performance Measure
- $w_1 \cdot \text{Accuracy} + w_2 \cdot \text{Complexity} + w_3 \cdot \text{Speed}$ ($w_i$: weighting factors)
Performance Assessment Matrix
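A minimal sketch of the overall performance measure above, assuming each technical measure has already been normalized to [0, 1] with higher meaning better (so the complexity score rewards low complexity). The weights, algorithm names, and scores are hypothetical placeholders, not results from the presentation.

```python
def overall_performance(accuracy, complexity, speed, weights=(0.5, 0.2, 0.3)):
    """Weighted sum w1*Accuracy + w2*Complexity + w3*Speed.

    Scores are assumed normalized to [0, 1], higher is better.
    The default weights are illustrative only.
    """
    w1, w2, w3 = weights
    return w1 * accuracy + w2 * complexity + w3 * speed


# Hypothetical (accuracy, complexity, speed) scores for three candidates.
candidates = {
    "kalman": (0.80, 0.90, 0.95),
    "arima": (0.75, 0.85, 0.90),
    "dwnn": (0.92, 0.40, 0.60),
}

scores = {name: overall_performance(*m) for name, m in candidates.items()}
best = max(scores, key=scores.get)
```

Changing the weights changes the ranking, which is the point of the weighting factors: they encode how much the application values accuracy relative to computational cost.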
8Prognostics
- Objective
- Determine the time window over which maintenance must be performed without compromising the system's operational integrity
9Prognostics
- Enabling Technologies
- Multi-Step Adaptive Kalman Filtering
- Auto-Regressive Moving Average Models
- Stochastic Auto-Regressive Integrated Moving Average Models (ARIMA)
- Forecasting by Pattern and Cluster Search
- Variants Analysis
- Parameter Estimation Methods
- Others
10Prognostics
- Enabling Technologies (cont'd)
- AI Techniques
- Case-Based Reasoning
- Intelligent Decision-Based Models
- Min-Max Graphs
- Petri Nets
- Soft Computing Tools
- Neural Nets
- Fuzzy Systems
- Neuro-Fuzzy
11PEDS Software System Architecture
(Stand-alone)
12The Navy Centrifugal Chiller
13Chiller Failure Modes
14Timing sequence (1): No fault detected
[Timing diagram along time axis t (t0, t1, t2): Data Collection, Mode Identification, Feature Extraction (extract features for diagnostics only), FuzzyDS and WNN. No fault detected: prognostic routines will not run. Start next cycle.]
1. The timing sequence is managed by the Task Manager.
2. Algorithm modules are started by FEATURE READY events.
3. Each diagnostic module decides upon the presence or absence of a fault.
4. The diagnostic modules report their conclusions to the database.
5. Each diagnostic module runs its routine and responds back to the Task Manager.
6. The Task Manager receives the events and decides which module or algorithm should be started.
7. The diagnostic decision (or "No fault") is displayed on the GUI; the GUI receives the result from the database.
8. All prognostic routines are initiated when a fault has been detected.
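The event-driven sequence above can be sketched as a small dispatcher. This is an illustrative toy, not the PEDS implementation: the class, event names, and module callables are assumptions, the database is a plain list, and the fault confirmation step is omitted for brevity.

```python
from collections import deque


class TaskManager:
    """Toy event dispatcher; all names here are hypothetical."""

    def __init__(self, diagnostics, prognostics):
        self.diagnostics = diagnostics  # name -> callable(features) -> bool
        self.prognostics = prognostics  # name -> callable(features) -> result
        self.database = []              # stand-in for the results database
        self.events = deque()

    def post(self, event, payload=None):
        self.events.append((event, payload))

    def run(self):
        while self.events:
            event, payload = self.events.popleft()
            if event == "FEATURE_READY":
                fault = False
                # Each diagnostic module decides and reports to the database.
                for name, module in self.diagnostics.items():
                    decision = module(payload)
                    self.database.append((name, decision))
                    fault = fault or decision
                if fault:
                    self.post("FAULT_DETECTED", payload)
            elif event == "FAULT_DETECTED":
                # Prognostic routines start only after a fault is detected.
                for name, module in self.prognostics.items():
                    self.database.append((name, module(payload)))


tm = TaskManager(
    diagnostics={"fuzzy_ds": lambda f: f["temp"] > 53, "wnn": lambda f: False},
    prognostics={"dwnn": lambda f: "ttf-estimate"},
)
tm.post("FEATURE_READY", {"temp": 40})
tm.run()
```

With the healthy reading posted here, only the two diagnostic decisions reach the database and no prognostic routine runs.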
15Timing sequence (2): Fault confirmed
[Timing diagram along time axis t (t0 to t4): Data Collection, Mode Identification, Feature Extraction, FuzzyDS and WNN. Fault detected by FuzzyDS or WNN: start confirmation, continue confirmation. Fault confirmed: start prognosis, collect data for prognosis, extract features for prognosis, run DWNN and CPNN.]
16Timing sequence (3): Fault not confirmed
[Timing diagram along time axis t (t0 to t4): Data Collection, Feature Extraction, FuzzyDS and WNN. Fault detected by FuzzyDS or WNN: start confirmation, continue confirmation. No fault detected during confirmation: start normal cycle again.]
17Software Design
- Development platform: Microsoft Visual C++, Visual Basic, SQL Server 2000, Access 2000. Software runs under the Windows NT platform.
- Component-based open system architecture. All the system components are implemented as Microsoft COM objects.
- Event-based distributed communication, capable of transmitting events at different priorities.
- Provides a database for storage of collected data, configuration information, and diagnostic and prognostic results.
18Conceptual Model of PEDS: 3-Tier Client-Server Architecture
[Diagram of three tiers (Tier 1, Tier 2, Tier 3) labeled User Services, Prognostic Services, and Persistent Services. Components shown: Operator User Interface, Administrator User Interface, Feature Extraction, Diagnostics, Prognostics, ICAS Database Interface, Features Repository Interface.]
19Software Diagram
[Flowchart of four event-driven modules. Task Manager: Time Out?, Enough Data?, and Feature Ready? decisions that start data sampling, the feature extractor, diagnosing, and prognosing. Data sampling module: wait for event, sample data, save to database. Feature extractor: wait for event, get data from database, do extraction, save features to database. Diagnosis and Prognosis: wait for event, get features from database, do diagnosis or prognosis, save results to database.]
20Database relationships
21Mode Diagram: Example Testbed AC-Plant
22Mode Identification
- Modes are characterized by the dynamics, set-points, and controller.
- Modes switch due to events.
[Block diagram: Sensors feed a Fuzzy Petri Net (mode changes due to events) and a Dynamics Classifier (mode due to dynamics); a Mode Decision block outputs the Current Mode.]
23Fuzzy Petri Net
[Petri net diagram: Normal Mode, Overload Mode, and other modes. Events are characterized by membership functions, here with breakpoints at 53 and 44.]
If Chilled Water Inlet Temperature is above 53 degrees and Chilled Water Outlet Temperature is above 44 degrees, then Switch to Overload Mode.
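A minimal sketch of how the overload rule above might be evaluated with fuzzy membership functions. The 53 and 44 degree breakpoints come from the slide; the ramp shape, its width, the use of min for the fuzzy AND, and the firing level are assumptions made for illustration.

```python
def above(value, threshold, width=2.0):
    """Membership in 'above threshold'.

    0 below threshold - width, 1 above threshold + width, linear in
    between. The ramp shape and width are assumptions.
    """
    return min(1.0, max(0.0, (value - (threshold - width)) / (2.0 * width)))


def overload_degree(inlet_temp, outlet_temp):
    # Fuzzy AND taken as the minimum of the two memberships.
    return min(above(inlet_temp, 53.0), above(outlet_temp, 44.0))


def next_mode(current_mode, inlet_temp, outlet_temp, fire_level=0.5):
    """Fire the Normal -> Overload transition when the rule is strong enough."""
    degree = overload_degree(inlet_temp, outlet_temp)
    if current_mode == "Normal" and degree >= fire_level:
        return "Overload"
    return current_mode
```

For example, `next_mode("Normal", 56.0, 47.0)` switches to `"Overload"`, while a high inlet temperature with a low outlet temperature leaves the mode unchanged because the minimum of the two memberships is zero.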
24Fuzzy Petri Net Simulation
[Simulation plots: sensors, mode, and fuzzy Petri net marking.]
SENSORS: Pre-rotation vane position, Chilled water inlet temperature, Chilled water outlet temperature
MODE CHANGES: Normal Load Mode, Full Load Mode, Overload Mode
25Feature Selection and Extraction
- Motivation: Data-driven diagnostic/prognostic algorithms require, for fault detection and classification, a feature vector whose elements comprise the best fault signature indicators
- Intelligent distinguishability and identifiability metrics must be defined for selecting the best features
- Time and frequency domain analysis techniques must be employed to extract the selected features
[Pipeline: Sensors (Data), Pre-Processing, Feature Extraction, Fault Classification, Prediction of Fault Evolution]
26Overall Procedures for Feature Selection and
Diagnostic Rule Generation
[Flow diagram within the PEDS Database: Raw Database, Preprocessing, Feature Extraction, Featurebase, Feature Preparation, Rough Set Feature Selection, Feature Vector Table, Rough Set Rule Generation, Diagnostic Rulebase.]
Rough set theory is a popular data mining
methodology, which provides mathematical methods
to remove redundancies and to discover
hidden relationships in a database.
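As a rough illustration of how rough set theory removes redundant features, the sketch below computes the dependency degree (the share of records whose decision is fully determined by a feature subset) and greedily drops features that do not change it. The function names, the greedy strategy, and the tiny discretized table are assumptions for illustration, not the procedure used in PEDS.

```python
from collections import defaultdict


def dependency(rows, features, decision):
    """Fraction of rows whose decision is determined by `features`.

    This is the size of the rough-set positive region divided by the
    number of rows.
    """
    decisions = defaultdict(set)
    counts = defaultdict(int)
    for row in rows:
        key = tuple(row[f] for f in features)
        decisions[key].add(row[decision])
        counts[key] += 1
    consistent = sum(counts[k] for k, d in decisions.items() if len(d) == 1)
    return consistent / len(rows)


def greedy_reduct(rows, features, decision):
    """Drop features one at a time while the dependency is preserved."""
    full = dependency(rows, features, decision)
    kept = list(features)
    for f in features:
        trial = [g for g in kept if g != f]
        if trial and dependency(rows, trial, decision) == full:
            kept = trial
    return kept


# Hypothetical discretized feature table.
table = [
    {"temp": "high", "vib": "low", "press": "ok", "fault": 1},
    {"temp": "high", "vib": "high", "press": "ok", "fault": 1},
    {"temp": "low", "vib": "low", "press": "ok", "fault": 0},
    {"temp": "low", "vib": "high", "press": "ok", "fault": 0},
]
selected = greedy_reduct(table, ["temp", "vib", "press"], "fault")
```

In this toy table only `temp` is needed to determine the decision, so the other two features are removed as redundant.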
27Feature Preparation
[Flow diagram: Fault Symptoms from ONR Report and Available Measurements from York Test feed Heuristic Feature Pre-selection, which yields Feature Candidates. Raw Data from the York Test Database pass through Preprocessing and Feature Extraction into the Featurebase.]
Featurebase (Fault Mode 1)
Time | Feature 1 | Feature 2 | Decision
t1 | 20.5 | 11.5 | 0
t2 | 23.5 | 9.5 | 1
Preprocessing includes removing unreasonable
objects from the database and assigning the
operational status.
28Feature Extraction Architecture
[Block diagram. Off-line: historical feature values in the Featurebase feed a Rough Set Data Miner (Feature Selector and Rule Generator), which fills the Feature Vector Table and the Diagnostic Rule Table; a Classifier Designer supplies classifier parameters. On-line: the Sensor Suite delivers raw data through Data Calibration and Preprocessing to the Feature Extractor; on-line feature values go to the Diagnostor and Prognostor of the Diagnostic/Prognostic Module; diagnostic and prognostic results go to the Database and the GUI, which gives the Operator alarms, reports, and maintenance actions.]
Figure 1-2. Industrial Diagnostic/Prognostic
Framework based on Data Mining Feature Selection.
29The Diagnostic Module
A Two-Prong Approach
- High-frequency failure modes (engine stall, etc.): The Wavelet Neural Net Approach
- Low-frequency events (Temperature, RPM sensor, etc.): The Fuzzy Logic Approach
30[Block diagram: Sensor Data, Preprocessing and Feature Extraction, Features, Failure Templates, Fuzzify Features, Inference Engine, (Defuzzify), Failure Mode]
Fuzzy Rule Base: (1) If symptom A is high and symptom B is low then failure mode is F1 (2) ...
31Fuzzy Logic Diagnostic Architecture
[Block diagram with two stages. Fuzzy Logic Classification: Features, Fuzzification, Rulebase, Fuzzy Inference Engine, Defuzzification. Dempster-Shafer Theory of Evidence: Construct Mass Functions from Possibilities, Combine Evidence, Calculate Degree of Certainty, Threshold for Fault Declaration. Outputs: Fault Declaration and Degree of Certainty.]
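The "Combine Evidence" step of Dempster-Shafer theory is commonly done with Dempster's rule of combination. The sketch below is a generic implementation of that rule; the two mass functions, the fault names, and the declaration threshold are hypothetical, and how PEDS builds its mass functions from fuzzy possibilities is not shown here.

```python
def combine(m1, m2):
    """Dempster's rule for two mass functions keyed by frozenset hypotheses."""
    combined = {}
    conflict = 0.0
    for a, wa in m1.items():
        for b, wb in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + wa * wb
            else:
                conflict += wa * wb  # mass assigned to the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: evidence cannot be combined")
    # Renormalize by the non-conflicting mass.
    return {h: w / (1.0 - conflict) for h, w in combined.items()}


F1, F2 = frozenset({"F1"}), frozenset({"F2"})
ANY = F1 | F2  # the whole frame of discernment

# Hypothetical evidence from two sources.
m_a = {F1: 0.6, ANY: 0.4}
m_b = {F1: 0.5, F2: 0.3, ANY: 0.2}

fused = combine(m_a, m_b)
declare_f1 = fused[F1] >= 0.7  # threshold is an illustrative assumption
```

Because both sources lean toward F1, the fused mass on F1 is higher than either source assigned on its own, which is the behavior the degree-of-certainty threshold relies on.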
32Prognostics
Two Prognostic Scenarios
Two Prognostic Outcomes
33Prognostics
Supervised
Unsupervised
34The Prognostic Module
QUESTION: Once an impending failure is detected and identified, how can we predict the time window during which maintenance must be performed?
[Block diagram: Sensor Data, Diagnostic Module, Prognostic Module (T?), CBM]
APPROACH
- Employ a recurrent neuro-fuzzy model to predict time window T
- Update prediction continuously as more information becomes available from the diagnostic module
35Wavelet Neural Network (WNN)
36Dynamic Wavelet Neural Network (DWNN)
$Y(t+1) = \mathrm{WNN}\big(Y(t), \ldots, Y(t-M),\; U(t), \ldots, U(t-n)\big)$
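A one-step model of this form can be iterated on its own outputs to look several steps ahead, and the step at which the prediction reaches a failure threshold gives a time-to-failure estimate. The sketch below assumes exactly that usage; the function names are hypothetical, the external input U is omitted for brevity, and a linear extrapolator stands in for a trained DWNN.

```python
def predict_time_to_failure(one_step, history, threshold, max_steps=1000):
    """Iterate a one-step predictor until the failure threshold is reached.

    Returns the number of steps ahead at which the predicted value first
    reaches `threshold`, or None if it never does within `max_steps`.
    """
    window = list(history)
    for step in range(1, max_steps + 1):
        y_next = one_step(window)
        if y_next >= threshold:
            return step
        # Slide the window: drop the oldest value, append the prediction.
        window = window[1:] + [y_next]
    return None


def linear_model(window):
    """Stand-in for a trained network: extrapolate the last two points."""
    return 2 * window[-1] - window[-2]


ttf = predict_time_to_failure(linear_model, [1.0, 1.5, 2.0], threshold=6.0)
```

Because each prediction is fed back as an input, errors accumulate with the horizon, which is one reason the estimate should be refreshed as new measurements arrive.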
37On Virtual Sensors
- Many failure modes are difficult or impossible to monitor
- Question: How do we build a fault meter?
- Answer: Virtual Sensor
- The Notion: Use available sensor data to map known measurements to a fault measurement
- Potential Problem Areas: How do we train the neural net? Laboratory or controlled experiments required
38Dynamic Virtual Sensor
39Implementation of The Prognostic System
40Application Examples
- A defective bearing with a crack causes the machine to vibrate abnormally
- Vibrations can be caught with accelerometers, which translate mechanical movement into electrical signals
- Bearing crack faults may be prognosed by examining and predicting their vibration signals
41An Experimental Setup
42- Cracks on the races or balls
- Particles in the lubricant
- Gaps in between moving parts, etc.
43A Sample Database
44Bearing Fault Diagnosis
For the good bearing, features = [0.3960 0.1348]
For the defective bearing, features = [4.9120 9.2182]
[0 1] = WNN([0.3960 0.1348]) => The bearing is good!
[1 0] = WNN([4.9120 9.2182]) => The bearing is defective!
46Bearing Fault Prognosis
TTF = 19 time units
47[Two plots of Power Spectrum Area versus Time Window (0 to 100), showing Real Data, WNN Output, the Failure Condition level, current time, prediction time, finish time, and predicted time to failure.]
Prediction up to 98 time windows using the trained WNN
Prediction of time-to-failure using the trained WNN: time-to-failure = 38 time windows
48Prognosis in the frequency domain
50[Plots: Width and Depth]
51Mounting Bolt Faults
52Problem Definition
- Challenges in Technology Prediction
- Long-term prediction (e.g. 30 years) faces growing uncertainty
- Current trends can be misleading
- External causal factors are numerous and difficult to quantify
- Data is sparse, so it is difficult to extract a trend
53Confidence Prediction Neural Network
54Time Series Prediction Architecture
55The Confidence Prediction Neural Network (CPNN)
- For CPNN, each node assigns a weight (degree of confidence) for an input X and a candidate output Yi.
- Final output is the weighted sum of all candidate outputs.
- In addition to the final output, the confidence distribution of that output can be computed.
[Figure: CPNN]
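The weighted-sum idea above can be sketched as follows. The slide does not give the weighting function in this transcript, so the Gaussian kernel on the distance between the input and each stored pattern, the `sigma` parameter, and the scalar inputs are all assumptions made for illustration.

```python
import math


def cpnn_predict(x, patterns, sigma=1.0):
    """Weighted-sum prediction with a confidence for each candidate output.

    patterns: list of (X_i, Y_i) pairs with scalar inputs. Each node
    weights its candidate output Y_i by a Gaussian kernel of the distance
    between x and X_i (the kernel choice is an assumption).
    """
    weights = [
        math.exp(-((x - xi) ** 2) / (2.0 * sigma ** 2)) for xi, _ in patterns
    ]
    total = sum(weights)
    confidences = [w / total for w in weights]
    output = sum(c * yi for c, (_, yi) in zip(confidences, patterns))
    distribution = [(yi, c) for c, (_, yi) in zip(confidences, patterns)]
    return output, distribution


# Hypothetical stored patterns: input value -> candidate output.
stored = [(0.0, 10.0), (2.0, 20.0)]
prediction, confidence = cpnn_predict(1.0, stored)
```

The returned `confidence` list pairs every candidate output with its normalized weight, so a spread-out list signals an uncertain prediction and a concentrated one signals a confident prediction.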
56Prognostic Results
Without reinforcement learning
[Plot: historical data, prediction, real failure time, and distribution of prognostic failure time]
57Prognostic Results
With reinforcement learning
58New Challenges for CBM/PHM Systems
- Diagnostic and prognostic systems have so far been
- designed in an ad hoc manner
- static - in terms of performance
- passive - in that they only respond to events and never initiate actions
- centralized - so that either the knowledge-base, control, or model is centralized even for distributed frameworks
- tightly coupled - resulting in frameworks with poor error tolerance
- non-scalable
- non-portable - since they are very system specific, and
- require expert personnel to upgrade/update
59Centralized Control & KB Architectures
[Block diagram: Sensors on each unit under test (UUT) send Events to a central chain of Preprocessing, Feature Extraction, Data-mining, Diagnostic Algorithm (Diagnosis), and Prognostic Algorithm (Prognosis), served by a single Knowledge Base and Control block.]
A Generic Central Control and Knowledge Base Framework
60Distributed Control & KB Architectures
[Block diagram: each UUT has Sensors, Events, Local Diagnostic Algorithms, Local Prognostic Algorithms, Local Control, and a Local KB. Knowledge Fusion links them to Central Control, the Central Knowledge Base, and central Diagnostic and Prognostic Algorithms that produce Diagnosis and Prognosis.]
Distributed Control and Knowledge Base Framework
61Model-Based Software Architecture
62The System Modeling Approach
- I. Concept of Hybrid System
63The System Modeling Approach (continued)
- II. The Object-Oriented Hybrid Model Architecture
64Object-Oriented Modeling - Physics-Based or
Physical Modeling
- System Concept - An Example JSF Propulsion System
- Components
- Components
- Interconnections/ Couplings
- Semi-empirical models
- Deterministic-stochastic models
- Finite-element models
65Physical Modeling in Prognosis
The concept of Virtual Sensor
Problem: Fault dimensions are difficult or impossible to monitor in an operational state.
Solution: Train a physical model (semi-empirical, physics-based) to map measurable quantities (vibration, temperature, etc.) into fault dimensions (crack length, wear, etc.).
66Control Architecture for Intelligent Agent
Paradigm
67Important Features of Agenthood
- Adaptation
- Learning, Knowledge-Discovery
- Communication
- Self-Organization
- Cooperation
- Agent Communication Language
- Cooperate with other Software Agents/User Agents
- Form Agent Communities - MAS
- Autonomy
- Proactive and Reactive
- Goal-directed
- Multi-threaded
The IA Paradigm: Agent-Oriented Programming (AOP), Agent-Oriented Design (AOD), Agent-Oriented Software Engineering (AOSE)
68Intelligence of Intelligent Agents
[Four agent diagrams, each linking Sensors and Effectors to the ENVIRONMENT. A simple reflex agent: what the world is like, condition-action rules, what actions I should do now. A reflex agent with internal states: adds State, how the world evolves, and what my actions do. An agent with explicit goals: adds what the world will be after I do action A, and Goals. An agent with utility: adds how happy I will be in this state, and Utility.]
- Varying degrees of intelligence
- Recognize goals and intentions
- React to unexpected situations in a robust manner
- Focus of Research: Concepts of Learning, Self-Organization, and Active Diagnosis
69Learning Issues & Approaches
- Issues in Learning
- Incremental Learning Ability
- Order Independence
- Unknown Attribute Handling
- Noisy Data Tolerance
- Source Combination
- Open Questions
- What should be learned?
- When to learn?
- How to learn?
- Possible Solutions
- Learn diagnostic rules via Episodic Information (Experiences)
- Learn when Episodic Information is not enough to handle current scenario
- Aggregate learning episodes using some concept learning technique
70Case-Based Reasoning Learning
- CBR - an episodic memory of past experiences
- CBR - initial cases by examples
- CBR Methodology
- Indexing (generate indices for classification and categorization)
- Retrieval (retrieve the best past cases from the memory)
- Adaptation (modify old solution to conform to new situation)
- Testing (did the proposed solution work?)
- Learning (explain failed solutions, store successful solutions)
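The retrieval and learning steps of the CBR methodology above can be sketched with a nearest-neighbor case library. This is a simplified illustration: the class name, the Euclidean similarity measure, and the stored solutions are assumptions, and the indexing and adaptation steps are left out.

```python
import math


class CaseLibrary:
    """Toy episodic memory of (feature_vector, solution) cases."""

    def __init__(self):
        self.cases = []

    def retrieve(self, query):
        """Return the stored case nearest to the query, or None if empty."""
        return min(
            self.cases,
            key=lambda case: math.dist(case[0], query),
            default=None,
        )

    def learn(self, features, solution, worked):
        """Store successful solutions; failed ones are only reported back."""
        if worked:
            self.cases.append((tuple(features), solution))
        return worked


library = CaseLibrary()
# Feature vectors follow the bearing example; the solutions are hypothetical.
library.learn((0.3960, 0.1348), "no action", worked=True)
library.learn((4.9120, 9.2182), "replace bearing", worked=True)

best_case = library.retrieve((4.5, 8.0))
```

A fuller implementation would adapt the retrieved solution to the new situation and record an explanation when a proposed solution fails the testing step.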
71The Dynamic Case-Based Reasoning Architecture
72Structure of Static and Dynamic Case Library
73Concept Learning
- Memory Aggregate (MA) approach
- systematic learning from examples and counter-examples
- incremental and order independent
- combines sources
- counter examples handled similarly by building an MA called C_MA
[Figure: attribute trees for Object A in Example 1 and Example 2, and the MA based on Examples 1 & 2. Links such as Has_A, Has_B, and Has_C and attribute values such as attrA = a, attrB = m, and attrC = 16 are annotated with occurrence fractions in the MA (e.g. Has_A, 2/5; Has_B, 2/5; Has_C, 1/5).]
74Concept CBR
- Learning Concepts as Case Library
- Storing Memory Aggregates that are sufficiently different from each other as unique cases
- Memory Aggregates are also used for Case Retrieval, Adaptation, Testing, and Learning
- For Diagnostic Framework, Concept CBR can learn
- concept of Normal Mode(s) of operation given the sensor and event readings as attributes
- concept of unique Fault Modes given that the modes differ sufficiently in their signatures
75Multiagent Systems (MAS)
- An MAS is a loosely coupled network of problem solvers that interact to solve problems that are beyond their individual capabilities or knowledge.
- Characteristics of MAS
- each agent has incomplete information
- there is no system global control
- data is decentralized and
- computation is asynchronous.
- Multiagent Software Engineering (MaSE)
76A Multiagent Diagnostic & Prognostic Framework
- Global Perspective
- Analysis and knowledge-distribution at multiple levels of abstraction.
- The MAS framework becomes part of a Decision Support System.
77Engineering a Multiagent System - I
- Goal Hierarchy Diagram
- to specify goals and sub-goals
- Sequence Diagram
- to establish possible roles and how communicating agents will achieve them
- Induced Agent Classes
- mapping from roles to agent definitions
- aggregation of roles/agent classes according to available resources and similarity of roles
78Engineering Multiagent Systems - II
- Mapping Roles to Agent Classes
- Diagnostic Agents
- Sensor Agents
- Black Board Agents
- Risk & Reliability Assessment Agents
79Active Diagnosis
- Extends the offline ideas of Probing or Testing
- It is biased to monitor normal conditions
- Active Diagnosis Monitors consistency among data
- Active Diagnosis of DES - A Design Time Approach
- the system itself is not diagnosable
- design a controller called Diagnostic Controller that will make the system diagnosable
- Active Diagnosis Possibilities
- In line with Intelligent Agent paradigm
- Collaboration in Multiagent Systems can be directed to achieve Active Diagnosis
80Active vs. Passive Diagnosis
- Passive Diagnosis
- Diagnoser FSM that monitors events and sensors to generate diagnosis.
- A Diagnosable Plant generates a language from which unobservable failure conditions can be uniquely inferred by the Diagnoser FSM.
- Design-Time Active Diagnosis
- Design a controller that will make an otherwise non-diagnosable plant generate a language that is diagnosable.
81Active Diagnosis - Agent Perspective
- Given an anomalous situation, Diagnostic Agent Plans, Learns, and Coordinates.
- Learning takes place between distributed agents that share their experiences
- Coordination helps search, retrieval, and adaptation activities
- Planning is required to determine if learning and coordination are possible within the given expected time-to-failure
- Run-time Active Diagnosis
- non-intrusive
- autonomous and rational
82Performance Measures(How to Compare and
)
83Innovative Thrusts
- Intelligent Agent-based Distributed PHM Architecture
- Error-tolerant, flexible, and scalable diagnostic/prognostic framework
- Automated Prescription
- What is wrong and how do I fix it?
- When do I fix it?
- How much confidence do I have that the system will not fail during the execution of a mission?
- Object-oriented modeling framework/physics-based models
- Case-based reasoning paradigm - archive case studies - reason about new situations
- Open Systems Architecture - OSA CBM and ICAS compatible
- Active Diagnosis / Prognosis notions
- Performance Metrics / Performance based PHM modules
84The Candidate Application Domains
- Shipboard Processes - gas turbines, AC plants, elevators, main fire systems, etc.
- Propulsion / Drivetrain Components - clutch, seals, pumps, bearings, blades, etc.
- Power Systems - generation, shipboard distribution
- Radar tracing and other communication-related systems
- Other
85Contributions
- Development of a new methodology for the design of a diagnostic and prognostic framework for large-scale distributed systems
- Development of Run-time Active Diagnosis approach / extension to Prognosis
- Development of Learning Strategies for diagnostic / prognostic problem domain
- Development of performance measures for the comparison of centralized vs. distributed diagnostic and prognostic frameworks
86Implementation Issues: Embedded Distributed Diagnostic Platform (EDDP)
- Hardware
- Modular I/O (e.g. NI's FieldPoint System, or MAX-IO).
- Embedded PC (e.g. MPC - Matchbox PC of TIQIT or MAX-PC of Strategic-Test).
- Network (e.g. Ethernet, PROFIBUS, CAN).
- Software
- Windows CE, Linux, QNX, VxWorks, or OsX operating systems.
- Embedded databases (like Polyhedra).
- RAD tools (like eMbedded Visual Studio of Microsoft).
87A Possible Agent Node
An Operator Interface (LCD Display)
A Small PC (MPC, MAX-PC)
Network (Ethernet, CAN, Profibus)
Distributed I/O System (FieldPoint)
Sensors
Sensors
Sensors
88CBM Performance Assessment
- Objective
- To assess the technical and economic feasibility of various prognostic algorithms
- Technical Measures
- Accuracy, Speed, Complexity, Scalability
- Overall Performance Measure
- $w_1 \cdot \text{Accuracy} + w_2 \cdot \text{Complexity} + w_3 \cdot \text{Speed}$ ($w_i$: weighting factors)
Performance Assessment Matrix
89CBM Performance Assessment
- Target Measure
- Behavior Measure
- Mean and Variance Measures
[Plot: output y(n) versus discrete time n, showing the real output yr(n) and the predicted output yp(n), with tpf marked on the time axis]
90Complexity/Cost-benefit Analysis
- Complexity Measure
- Cost/Benefit Analysis
- frequency of maintenance
- downtime for maintenance
- dollar cost
- etc.
- Overall Performance
91Cost/Benefit Analysis
- Establish Baseline Condition - estimate cost of breakdown or time-based preventive maintenance from maintenance logs
- A good percentage of Breakdown Maintenance costs may be counted as CBM benefits
- If preventive maintenance is practiced, estimate how many of these maintenance events may be avoided.
- The cost of such avoided maintenance events is counted as benefit to CBM.
92Cost/Benefit Analysis (cont'd)
- Intangible benefits - Assign severity index to impact of BM on system operations
- Estimate the projected cost of CBM, i.e. cost of instrumentation, computing, etc.
- Aggregate life-cycle costs and benefits from the information detailed above
93CINCLANTFLT Study
- Question: What is the value of prognostics?
- Summary of findings
- (1) Notional Development and Implementation for Predictive CBM Based on CINCLANTFLT I&D Maintenance Cost Savings
- (2) Assumptions
- CINCLANTFLT Annual $2.6B (FY96) I&D Maintenance Cost
- Fully Integrated CBM yields 30% reduction
- Full Realization Occurs in 2017, S&T sunk cost included
- Full Implementation Costs 1% of Asset Acquisition Cost
- IT 21 or Equivalent in place Prior to CBM Technology
- (3) Financial Factors
- Inflation rate: 4%
- Investment Hurdle Rate: 10%
- Technology Maintenance Cost: 10% of Installed Cost
- (4) Financial Metrics
Metric | 15 year | 20 year
NPV | $337M | $1,306M
IRR | 22% | 30%
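For readers unfamiliar with the two financial metrics above, the sketch below shows how net present value (NPV) and internal rate of return (IRR) are computed from a stream of yearly cash flows. The 10 percent discount rate mirrors the hurdle rate listed above; the cash flows are made-up placeholders and do not reproduce the study's figures.

```python
def npv(rate, cash_flows):
    """Net present value; cash_flows[0] occurs now, cash_flows[t] in year t."""
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows))


def irr(cash_flows, lo=-0.99, hi=10.0, tol=1e-9):
    """Internal rate of return by bisection.

    Assumes the NPV changes sign exactly once on [lo, hi], which holds
    for an initial outlay followed by positive savings.
    """
    f_lo = npv(lo, cash_flows)
    for _ in range(200):
        mid = (lo + hi) / 2.0
        f_mid = npv(mid, cash_flows)
        if abs(f_mid) < tol:
            break
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical example: invest 100 now, save 60 in each of two years.
flows = [-100.0, 60.0, 60.0]
value_at_hurdle = npv(0.10, flows)
rate_of_return = irr(flows)
```

An investment clears the hurdle when its NPV at the hurdle rate is positive, or equivalently when its IRR exceeds the hurdle rate.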
94Concluding Remarks
- CBM/PHM are relatively new technologies - sufficient historical data are not available
- CBM benefits currently based on avoided costs
- Cost of on-board embedded diagnostics primarily associated with computing requirements
- Advances in prognostic technologies (embedded diagnostics, distributed architectures, etc.) and lower hardware costs (sensors, computing, interfacing, etc.) promise to bring CBM system costs within 1-2% of a typical Army platform cost