Dependability Theory and Methods Part 1: Introduction and definitions

1
Dependability Theory and Methods, Part 1
Introduction and definitions

Andrea Bobbio
Dipartimento di Informatica
Università del Piemonte Orientale, A. Avogadro
15100 Alessandria (Italy)
bobbio_at_unipmn.it - http://www.mfn.unipmn.it/bobbio

Bertinoro, March 10-14, 2003
2
Dependability Definition
Dependability is the property of a system to be
dependable in time, i.e. such that reliance can
justifiably be placed on the service it delivers.
Dependability extends interest in the system from
the design and construction phases to the
operational phase (the whole life cycle).
3
What dependability theory and practice want to avoid
4
Dependability Taxonomy
Dependability measures: reliability, availability, maintainability, safety, security
5
Quantitative analysis
Quantitative analysis aims at numerically
evaluating measures that characterize the
dependability of an item. It is relevant to:

Risk assessment and safety
Design specifications
Technical assistance and maintenance
Life cycle cost
Market competition

6
Risk assessment and safety
The risk associated with an activity is
proportional to the probability of occurrence of
the undesired event and to the magnitude of its
consequences:
$R = P \times M$
A safety-critical system is a system whose
incorrect behavior may cause a risk to occur,
with undesirable consequences for the item, the
operators, the population, or the environment.
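A minimal Python sketch of the risk formula $R = P \times M$. The hazard names and the probability/magnitude pairs are hypothetical, chosen only to show that a rare event with severe consequences can carry more risk than a frequent minor one.

```python
def risk(probability: float, magnitude: float) -> float:
    """Risk of an undesired event: occurrence probability times magnitude of consequences."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return probability * magnitude


# Hypothetical hazards: (probability of occurrence, magnitude of consequences)
hazards = {
    "frequent_minor": (1e-2, 1_000.0),
    "rare_severe": (1e-5, 10_000_000.0),
}

# Rank the hazards from highest to lowest risk
ranking = sorted(hazards, key=lambda name: risk(*hazards[name]), reverse=True)
for name in ranking:
    print(name, risk(*hazards[name]))
```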
7
Design specifications

Technological items must be dependable.
Sometimes dependability requirements (both
qualitative and quantitative) are part of the
design specifications, for example:
Mean time between failures
Total down time

8
Technical assistance and maintenance
The planning of all the activities related to
technical assistance and maintenance is linked to
the system dependability (expected number of
failures in time):

Planning of spare parts and maintenance crews
Cost of technical assistance (warranty
period)
Preventive vs. reactive maintenance

9
Market competition

The choice of consumers is strongly
influenced by perceived dependability:
Advertising messages stress
dependability.
The image of a product or of a brand may depend
on its dependability.

10
Purpose of evaluation

Understanding a system
Observation
Operational environment
Reasoning
Predicting the behavior of a system
Need a model
A model is a convenient abstraction
Accuracy based on degree of extrapolation

11
Methods of evaluation

Measurement-Based
Most believable, most expensive
Not always possible or cost-effective during
system design

Model-Based
Less believable, less expensive
Analytic vs. discrete-event simulation
Combinatorial vs. state-space methods

12
Measurement-Based

Most believable, most expensive
Data are obtained by observing the behavior of
physical objects:
field observations
measurements on prototypes
measurements on components (accelerated tests).

13
Models
Analytic: closed-form answers or numerical solution
Simulation
All models are wrong; some models are useful.
14
Methods of evaluation

Measurements, models, data bank

15
The probabilistic approach
The mechanisms that lead a technological object
to failure are very complex and depend on many
physical, chemical, technical, human, and
environmental factors.
The time to failure cannot be expressed by a
deterministic law.
We are forced to treat the time to failure as a
random variable: quantitative dependability
analysis is based on a probabilistic approach.
16
Reliability
Reliability is a measurable attribute of
dependability, defined as follows:
The reliability R(t) of an item at time t is the
probability that the item performs the required
function in the interval (0, t), given the stress
and environmental conditions in which it operates.
17
Basic Definitions: cdf

Let X be the random variable representing the
time to failure of an item.

The cumulative distribution function (cdf) F(t)
of the r.v. X is given by
$F(t) = \Pr\{X \le t\}$
F(t) represents the probability that the item has
already failed at time t (unreliability).
18
Basic Definitions: cdf

Equivalent terminology for F(t):
CDF (cumulative distribution function)
Probability distribution function
Distribution function

19
Basic Definitions: cdf
(figure: F(t) versus t, rising from 0 toward 1, with F(a) ≤ F(b) for a < b)
$F(0) = 0, \quad \lim_{t \to \infty} F(t) = 1, \quad F(t)$ non-decreasing
20
Basic Definitions: Reliability

Let X be the random variable representing the
time to failure of an item.

The survivor function (sf) R(t) of the r.v. X is
given by
$R(t) = \Pr\{X > t\} = 1 - F(t)$
R(t) represents the probability that the item is
working correctly at time t and gives the
reliability function.
21
Basic Definitions

Equivalent terminology for $R(t) = 1 - F(t)$:
Reliability
Complementary distribution function
Survivor function

22
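Basic Definitions: Reliability
(figure: R(t) versus t, decreasing from 1 toward 0, with the value R(a) marked at t = a)
$R(0) = 1, \quad \lim_{t \to \infty} R(t) = 0, \quad R(t)$ non-increasing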
23
Basic Definitions: Density

Let X be the random variable representing the
time to failure of an item and let F(t) be a
differentiable cdf.

The density function f(t) is defined as
$f(t) = \frac{dF(t)}{dt}$
$f(t)\,dt = \Pr\{t \le X < t + dt\}$
24
Basic Definitions: Density
(figure: density f(t) versus t, with the interval from a to b marked)
$\int_a^b f(x)\,dx = \Pr\{a < X \le b\} = F(b) - F(a)$
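A quick numerical check of $\int_a^b f(x)\,dx = F(b) - F(a)$ and of $f(t) = dF(t)/dt$, sketched in Python for an example exponential lifetime. The rate 0.5 and the interval (1, 3] are assumptions chosen for illustration, and the integral uses a hand-written trapezoidal rule.

```python
import math

LAM = 0.5  # assumed failure rate of the example exponential lifetime


def F(t: float) -> float:
    """cdf F(t) = Pr{X <= t} of the example lifetime."""
    return 1.0 - math.exp(-LAM * t) if t > 0 else 0.0


def f(t: float) -> float:
    """density f(t) = dF(t)/dt of the example lifetime."""
    return LAM * math.exp(-LAM * t) if t >= 0 else 0.0


def integrate(g, a: float, b: float, n: int = 10_000) -> float:
    """Composite trapezoidal rule for the integral of g over [a, b]."""
    h = (b - a) / n
    inner = sum(g(a + k * h) for k in range(1, n))
    return h * (0.5 * g(a) + inner + 0.5 * g(b))


a, b = 1.0, 3.0
print(integrate(f, a, b), F(b) - F(a))  # Pr{a < X <= b}, computed two ways
dt = 1e-6
print((F(2.0 + dt) - F(2.0 - dt)) / (2 * dt), f(2.0))  # dF/dt versus f
```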
25
Basic Definitions: Density
(figure: a density f(t) versus t)
26
Basic Definitions

Equivalent terminology for f(t):
pdf (probability density function)
Density function
Density

For a non-negative random variable
27
Quiz 1: The higher the MTTF is, the higher the
item reliability is.

Correct
Wrong

The correct answer is: Wrong!
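One counterexample, sketched in Python with two hypothetical items (recall that for a non-negative lifetime, $\mathrm{MTTF} = E[X] = \int_0^\infty R(t)\,dt$): A has an exponential lifetime with MTTF = 10, while B always fails exactly at t = 5, so its MTTF is 5. For t < 5, B is more reliable than A although its MTTF is half as large; the comparison reverses only after t = 5.

```python
import math


def reliability_exponential(t: float, mttf: float) -> float:
    """R(t) = exp(-t / MTTF) for an exponentially distributed lifetime."""
    return math.exp(-t / mttf)


def reliability_fixed_life(t: float, life: float) -> float:
    """R(t) of a hypothetical item that always fails exactly at time `life` (MTTF = life)."""
    return 1.0 if t < life else 0.0


MTTF_A = 10.0  # item A: exponential lifetime
LIFE_B = 5.0   # item B: fixed lifetime, so MTTF_B = 5

for t in (1.0, 4.0, 6.0):
    print(t, reliability_exponential(t, MTTF_A), reliability_fixed_life(t, LIFE_B))
```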
28
Hazard (failure) rate

$h(t)\,\Delta t$: conditional probability that the system will fail in
$(t, t + \Delta t)$, given that it has survived until
time t
$f(t)\,\Delta t$: unconditional probability that the system will fail in
$(t, t + \Delta t)$

29
The Failure Rate of a Distribution

$h(t)\,\Delta t$ is the conditional probability that
the unit will fail in the interval $(t, t + \Delta t)$,
given that it is functioning at time t.
$f(t)\,\Delta t$ is the unconditional probability that
the unit will fail in the interval $(t, t + \Delta t)$.
The difference between the two statements is like that between:
the probability that someone will die between 90 and
91, given that he lives to 90;
the probability that someone will die between 90 and
91.
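A Python sketch contrasting the two probabilities with a hypothetical increasing-hazard lifetime model, $R(t) = e^{-(t/80)^6}$ with t in years (an assumption, not a demographic fit). It uses the standard relation $h(t) = f(t)/R(t)$: the conditional probability of failing in (90, 91] is close to h(90) × 1 year, while the unconditional probability is much smaller because few units survive to 90.

```python
import math

# Hypothetical lifetime model with increasing hazard (t in years):
# R(t) = exp(-(t / ETA) ** BETA)
ETA, BETA = 80.0, 6.0


def reliability(t: float) -> float:
    return math.exp(-((t / ETA) ** BETA))


def density(t: float) -> float:
    """f(t) = -dR/dt for the model above."""
    return (BETA / ETA) * (t / ETA) ** (BETA - 1) * reliability(t)


def hazard(t: float) -> float:
    """Failure rate h(t) = f(t) / R(t)."""
    return density(t) / reliability(t)


unconditional = reliability(90.0) - reliability(91.0)  # Pr{90 < X <= 91}
conditional = unconditional / reliability(90.0)        # Pr{90 < X <= 91 | X > 90}
print(unconditional, conditional, hazard(90.0))
```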

30
Bathtub curve
(figure: failure rate h(t) versus t in three phases: decreasing failure rate (DFR) during infant mortality (burn-in), constant failure rate (CFR) during the useful life, increasing failure rate (IFR) during the wear-out phase)
31
Infant mortality (DFR)
Also called the infant mortality or reliability
growth phase. The failure rate decreases with
time.

Caused by undetected hardware/software defects
Can cause significant prediction errors if
steady-state failure rates are used
A Weibull model can be used

32
Useful life (CFR)
The failure rate remains constant in time (age-independent).

The failure rate is much lower than in the early-life
period.
Failures are caused by random effects (such as
environmental shocks).

33
Wear-out phase (IFR)
The failure rate increases with age.
It is characteristic of irreversible aging
phenomena (deterioration, wear-out, fatigue,
corrosion, etc.) and applies to mechanical and
other systems. (Properly qualified electronic
parts do not exhibit wear-out failures during their
intended service life.) A Weibull failure model can
be used.
34
Exponential Distribution
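The failure rate is age-independent (constant).

Cumulative distribution function: $F(t) = 1 - e^{-\lambda t}$
Reliability: $R(t) = e^{-\lambda t}$
Density function: $f(t) = \lambda e^{-\lambda t}$
Failure rate (CFR): $h(t) = \frac{f(t)}{R(t)} = \lambda$
Mean time to failure: $\mathrm{MTTF} = \frac{1}{\lambda}$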
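A Python sketch of the exponential closed forms, with an assumed rate λ = 1. It also recomputes MTTF as the integral of R(t) over (0, ∞), truncated where R(t) is negligible, to confirm MTTF = 1/λ.

```python
import math


def exponential_measures(lam: float, t: float) -> dict:
    """Closed-form dependability measures of an exponential lifetime with rate lam at time t."""
    r = math.exp(-lam * t)
    return {
        "F": 1.0 - r,       # unreliability F(t)
        "R": r,             # reliability R(t)
        "f": lam * r,       # density f(t)
        "h": lam,           # failure rate, constant (CFR)
        "MTTF": 1.0 / lam,  # mean time to failure
    }


def mttf_numeric(lam: float, n: int = 100_000) -> float:
    """MTTF as the trapezoidal integral of R(t), truncated at t = 50 / lam."""
    upper = 50.0 / lam
    step = upper / n
    inner = sum(math.exp(-lam * k * step) for k in range(1, n))
    return step * (0.5 + inner + 0.5 * math.exp(-lam * upper))


lam = 1.0
m = exponential_measures(lam, t=1.0 / lam)  # evaluated at t = MTTF
print(m)
print(mttf_numeric(lam))
```

Evaluated at t = MTTF, the reliability is $e^{-1} \approx 0.368$: only about 37% of the items survive to the mean time to failure.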

35
The Cumulative Distribution Function of an
Exponentially Distributed Random Variable with
Parameter $\lambda = 1$
(figure: $F(t) = 1 - e^{-\lambda t}$ plotted for t from 0 to 5, rising from 0 toward 1.0)
36
The Reliability Function of an Exponentially
Distributed Random Variable with Parameter $\lambda = 1$
(figure: R(t) plotted for t from 0 to 5, decreasing from 1.0 toward 0)
37
Exponential Density Function (pdf)
(figure: density f(t) versus t)
$\mathrm{MTTF} = 1/\lambda$
38
Memoryless Property of the Exponential
Distribution

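Assume $X > t$: we have observed that the
component has not failed until time t.
Let $Y = X - t$ be the remaining (residual) lifetime, with conditional cdf
$G_t(y) = \Pr\{Y \le y \mid X > t\} = \frac{F(t+y) - F(t)}{1 - F(t)} = \frac{e^{-\lambda t} - e^{-\lambda (t+y)}}{e^{-\lambda t}} = 1 - e^{-\lambda y}$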

39
Memoryless Property of the Exponential
Distribution (cont.)

Thus $G_t(y)$ is independent of t and identical
to the original exponential distribution of X.
The distribution of the remaining life does not
depend on how long the component has been
operating.
An observed failure is the result of a
suddenly appearing fault, not of gradual
deterioration.
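A Monte Carlo check of the memoryless property in Python, with an assumed rate λ = 0.2, age t = 5 and horizon y = 3: among simulated components that survive to t, the residual-lifetime distribution matches that of new components, both close to $1 - e^{-\lambda y}$.

```python
import math
import random

LAM = 0.2  # assumed failure rate
AGE = 5.0  # time t already survived
y = 3.0    # residual-life horizon

rng = random.Random(42)
lifetimes = [rng.expovariate(LAM) for _ in range(200_000)]

# G_t(y) = Pr{X - t <= y | X > t}, estimated on the components still alive at t = AGE
survivors = [x for x in lifetimes if x > AGE]
g_t = sum(1 for x in survivors if x - AGE <= y) / len(survivors)

# Pr{X <= y}, estimated on all components (new at time 0)
cdf_new = sum(1 for x in lifetimes if x <= y) / len(lifetimes)

exact = 1.0 - math.exp(-LAM * y)
print(g_t, cdf_new, exact)
```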

40
Quiz 3: If two components (say, A and B) have
independent, identically exponentially distributed
times to failure, which of the following is true
by the memoryless property?

1. They will always fail at the same time.
2. They have the same probability of failing at
time t during operation.
3. When the two components operate
simultaneously, the one that has been
operational for a shorter time will
survive longer.

41
Weibull Distribution

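Distribution function: $F(t) = 1 - e^{-\lambda t^{\alpha}}$
Density function: $f(t) = \lambda \alpha t^{\alpha - 1} e^{-\lambda t^{\alpha}}$
Reliability: $R(t) = e^{-\lambda t^{\alpha}}$

42
Weibull Distribution
$\alpha$: shape parameter; $\lambda$: scale parameter.
Failure rate: $h(t) = \lambda \alpha t^{\alpha - 1}$
$\alpha < 1$: DFR
$\alpha = 1$: CFR
$\alpha > 1$: IFR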
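A sketch of the Weibull measures in Python, assuming the parametrization $F(t) = 1 - e^{-\lambda t^{\alpha}}$ (α shape, λ scale), under which $h(t) = \lambda \alpha t^{\alpha-1}$. Evaluating the failure rate at a few times shows the DFR, CFR and IFR cases for α < 1, α = 1 and α > 1.

```python
import math


def weibull_reliability(t: float, lam: float, alpha: float) -> float:
    """R(t) = exp(-lam * t**alpha)."""
    return math.exp(-lam * t ** alpha)


def weibull_density(t: float, lam: float, alpha: float) -> float:
    """f(t) = lam * alpha * t**(alpha - 1) * exp(-lam * t**alpha)."""
    return lam * alpha * t ** (alpha - 1) * weibull_reliability(t, lam, alpha)


def weibull_hazard(t: float, lam: float, alpha: float) -> float:
    """h(t) = f(t) / R(t) = lam * alpha * t**(alpha - 1)."""
    return lam * alpha * t ** (alpha - 1)


def failure_rate_class(alpha: float) -> str:
    """DFR for alpha < 1, CFR for alpha = 1, IFR for alpha > 1."""
    if alpha < 1:
        return "DFR"
    if alpha == 1:
        return "CFR"
    return "IFR"


times = (0.5, 1.0, 2.0)
for alpha in (0.5, 1.0, 2.0):
    rates = [weibull_hazard(t, 1.0, alpha) for t in times]
    print(alpha, failure_rate_class(alpha), rates)
```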
43
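Failure Rate of the Weibull Distribution with
Various Values of $\alpha$
(figure: failure rate h(t) for several values of the shape parameter)
44
Weibull Distribution for Various Values of $\alpha$
(figure: cdf and density for several values of the shape parameter)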
45
Failure Rate Models

We use a truncated Weibull model:
the infant mortality phase is modeled by a DFR Weibull distribution and
the steady-state phase by the exponential distribution.

Figure 2.34: Weibull Failure-Rate Model (failure-rate multiplier, 0 to 7, versus operating time, 0 to 17,520 hrs)
46
Failure Rate Models (cont.)

In this model the failure rate is the
steady-state failure rate $\lambda_{ss}$ multiplied by a
time-dependent failure-rate multiplier, whose
shape is governed by the Weibull shape parameter $\alpha$.

47
Failure Rate Models (cont.)

There are several ways to incorporate time-dependent
failure rates in availability models.
The easiest way is to approximate a continuous
function by a piecewise constant step function.
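A minimal Python sketch of the piecewise constant approach, assuming λ(t) = λ_ss × multiplier(t) and using the general relation $R(t) = \exp\left(-\int_0^t \lambda(u)\,du\right)$, whose integral is exact and easy to evaluate for a step function. The 2,190-hour breakpoints echo the axis ticks of the failure-rate figures, but the multiplier values and λ_ss are hypothetical.

```python
import bisect
import math

# Hypothetical step model: interval start times (hours) and the multiplier on each interval
BREAKS = [0.0, 2190.0, 4380.0, 6570.0, 8760.0]  # the last interval extends to infinity
MULTIPLIERS = [4.0, 2.0, 1.5, 1.2, 1.0]
LAMBDA_SS = 1e-5  # assumed steady-state failure rate (1/h)


def failure_rate(t: float) -> float:
    """Piecewise constant failure rate lambda(t) = LAMBDA_SS * multiplier(t), for t >= 0."""
    i = bisect.bisect_right(BREAKS, t) - 1
    return LAMBDA_SS * MULTIPLIERS[i]


def cumulative_hazard(t: float) -> float:
    """Integral of lambda(u) over [0, t], summed interval by interval."""
    total = 0.0
    for i, start in enumerate(BREAKS):
        if t <= start:
            break
        end = BREAKS[i + 1] if i + 1 < len(BREAKS) else math.inf
        total += LAMBDA_SS * MULTIPLIERS[i] * (min(t, end) - start)
    return total


def reliability(t: float) -> float:
    return math.exp(-cumulative_hazard(t))


print(failure_rate(100.0), reliability(8760.0))
```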

Discrete Failure-Rate Model (figure: piecewise constant failure-rate multiplier, 0 to 7, versus operating time, 0 to 17,520 hrs)
48
Failure Rate Models (cont.)

Here the discrete failure-rate model is defined
by a failure-rate multiplier held constant over successive intervals of operating time.

49
A lifetime experiment
(figure: components 1, 2, 3, 4, ..., N start operating at t = 0 and fail at times $X_1, X_2, X_3, X_4, \ldots, X_N$)
N i.i.d. components are put in a life-test
experiment.
50
A lifetime experiment
(figure: the same N components, each with its observed time to failure $X_1, \ldots, X_N$)
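Given the observed failure times $X_1, \ldots, X_N$ of such an experiment, a natural estimate of R(t) is the fraction of components still working at t, and the sample mean estimates MTTF. A minimal Python sketch, assuming every component is observed until failure (no censoring); the 1,000 exponential lifetimes with rate 0.01 per hour are simulated stand-ins for real test data.

```python
import random


def empirical_reliability(failure_times, t):
    """Fraction of the N tested components still working at time t: N_s(t) / N."""
    survivors = sum(1 for x in failure_times if x > t)
    return survivors / len(failure_times)


def empirical_mttf(failure_times):
    """Sample mean of the observed times to failure."""
    return sum(failure_times) / len(failure_times)


# Hypothetical experiment: N = 1000 components with exponential lifetimes (rate 0.01 per hour)
rng = random.Random(1)
times = [rng.expovariate(0.01) for _ in range(1000)]
print(empirical_reliability(times, 100.0), empirical_mttf(times))
```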
51
Repairable systems: Availability
52
Repairable systems
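(figure: a system alternating between UP and DOWN along the time axis t, with up periods $X_1, X_2, X_3$ separated by down periods $Y_1, Y_2$)
$X_1, X_2, \ldots, X_n$: successive UP times; $Y_1, Y_2, \ldots, Y_n$: successive DOWN times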
53
Repairable systems

The usual hypotheses in modeling repairable
systems are:
the successive UP times $X_1, X_2, \ldots, X_n$ are
i.i.d. random variables, i.e. samples from a
common cdf F(t);
the successive DOWN times $Y_1, Y_2, \ldots, Y_n$ are
i.i.d. random variables, i.e. samples from a
common cdf G(t).

54
Repairable systems
(figure: UP/DOWN timeline with up times $X_1, X_2, X_3$ and down times $Y_1, Y_2$)

The dynamic behaviour of a repairable system is
characterized by
the r.v. X of the successive up times
the r.v. Y of the successive down times

55
Maintainability

Let Y be the r.v. of the successive down times.
$G(t) = \Pr\{Y \le t\}$ (maintainability)
$g(t) = \frac{dG(t)}{dt}$ (density)
$h_g(t) = \frac{g(t)}{1 - G(t)}$ (repair rate)
$\mathrm{MTTR} = \int_0^\infty t\, g(t)\, dt$ (Mean Time To Repair)
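A minimal numerical sketch of these definitions in Python, assuming exponentially distributed repair times with a hypothetical rate μ = 0.5 per hour. The repair rate $h_g(t)$ is then constant, and MTTR can be computed either as $\int_0^\infty t\,g(t)\,dt$ or, equivalently for a non-negative r.v., as $\int_0^\infty (1 - G(t))\,dt$; both should give 1/μ = 2 hours.

```python
import math

MU = 0.5  # assumed constant repair rate (1/h): exponential repair times


def G(t: float) -> float:
    """Maintainability: Pr{Y <= t}."""
    return 1.0 - math.exp(-MU * t)


def g(t: float) -> float:
    """Repair-time density dG/dt."""
    return MU * math.exp(-MU * t)


def repair_rate(t: float) -> float:
    """h_g(t) = g(t) / (1 - G(t))."""
    return g(t) / (1.0 - G(t))


def trapezoid(fn, a: float, b: float, n: int = 100_000) -> float:
    h = (b - a) / n
    return h * (0.5 * fn(a) + 0.5 * fn(b) + sum(fn(a + k * h) for k in range(1, n)))


upper = 60.0 / MU  # truncation point; the neglected tail is negligible
mttr_density = trapezoid(lambda t: t * g(t), 0.0, upper)
mttr_survivor = trapezoid(lambda t: 1.0 - G(t), 0.0, upper)
print(mttr_density, mttr_survivor, 1.0 / MU, repair_rate(3.0))
```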
56
Availability
The measure to characterize a repairable system
is the availability (unavailability)
The availability A(t) of an item at time t is the
probability that the item is working correctly at
time t.
57
Availability

The measure to characterize a repairable system
is the availability (unavailability)
$A(t) = \Pr\{\text{system UP at time } t\}$
$U(t) = \Pr\{\text{system DOWN at time } t\}$
$A(t) + U(t) = 1$

58
Definition of Availability

An important difference between reliability and
availability is that:
reliability refers to failure-free operation
during an interval (0, t);
availability refers to failure-free operation at
a given instant of time t (the time when a
device or system is accessed to provide a
required function), independently of the number
of failure/repair cycles.

59
Definition of Availability
(figure: indicator function I(t), equal to 1 while the system is operating and providing the required function, and 0 while it is failed and being restored)
System Failure and Restoration Process
60
Availability evaluation

In the special case when times to failure and
times to restoration are both exponentially
distributed, the alternating process can be
viewed as a two-state homogeneous continuous-time
Markov chain with:

time-independent failure rate $\lambda$
time-independent repair rate $\mu$
61
2-State Markov Availability Model

Transient availability analysis:
for each state, we apply a flow-balance equation
rate of buildup = rate of flow in - rate of flow out
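For the two-state model, with state UP left at rate λ and state DOWN left at rate μ, the flow-balance equations read $dP_{UP}/dt = \mu P_{DOWN} - \lambda P_{UP}$ and $dP_{DOWN}/dt = \lambda P_{UP} - \mu P_{DOWN}$. The Python sketch below integrates them with explicit Euler steps, starting from UP, and compares $A(t) = P_{UP}(t)$ with the standard closed form $A(t) = \frac{\mu}{\lambda+\mu} + \frac{\lambda}{\lambda+\mu} e^{-(\lambda+\mu)t}$. The rates λ = 0.01/h and μ = 0.5/h and the step size are assumptions.

```python
import math

LAM, MU = 0.01, 0.5  # assumed failure and repair rates (1/h)


def availability_closed_form(t: float) -> float:
    """A(t) of the 2-state CTMC starting UP."""
    s = LAM + MU
    return MU / s + (LAM / s) * math.exp(-s * t)


def availability_euler(t_end: float, dt: float = 1e-3) -> float:
    """Integrate the flow-balance equations with explicit Euler steps."""
    p_up, p_down = 1.0, 0.0  # the system starts in the UP state
    for _ in range(round(t_end / dt)):
        d_up = MU * p_down - LAM * p_up    # buildup of UP = flow in - flow out
        d_down = LAM * p_up - MU * p_down  # buildup of DOWN = flow in - flow out
        p_up += d_up * dt
        p_down += d_down * dt
    return p_up


for t in (1.0, 5.0, 20.0):
    print(t, availability_euler(t), availability_closed_form(t))
print("A_ss =", MU / (LAM + MU))
```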

62
2-State Markov Availability Model
63
2-State Markov Availability Model
(figure: A(t) starting at 1 and decreasing toward the steady-state value $A_{ss}$)
64
2-State Markov Model
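1) Pointwise availability A(t)
2) Steady-state availability: the limiting value of A(t) as $t \to \infty$,
$A_{ss} = \lim_{t \to \infty} A(t) = \frac{\mu}{\lambda + \mu}$

If there is no restoration ($\mu = 0$), the
availability becomes the reliability:
$A(t) = R(t) = e^{-\lambda t}$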

65
Steady-state Availability

Steady-state availability
In many system models, the limit $A_{ss} = \lim_{t \to \infty} A(t)$
exists and is called the steady-state availability.

The steady-state availability represents the
probability of finding the system operational after
many fail-and-restore cycles.
66
Steady-state Availability
(figure: alternating UP/DOWN process over time t)
Expected UP time: $E[U] = \mathrm{MUT} = \mathrm{MTTF}$
Expected DOWN time: $E[D] = \mathrm{MDT} = \mathrm{MTTR}$
67
Availability Example (I)
Let a system have a steady-state availability $A_{ss} = 0.95$.
This means that, given a mission time T,
the system is expected to work correctly
for a total time of 0.95 T or, alternatively,
to be out of service
for a total time $U_{ss} T = (1 - A_{ss}) T$.
68
Availability Example (II)
Let a system have a rated productivity of W
per year. The loss due to the system being out of service can
be estimated as $U_{ss} W = (1 - A_{ss}) W$. The
availability (unavailability) is an index for
estimating the real productivity, given the rated
productivity.
Alternatively, if the goal is a net
productivity of W per year, the plant must be
designed so that its rated productivity W′
satisfies $W' - U_{ss} W' = A_{ss} W' = W$, i.e. $W' = W / A_{ss}$.
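The arithmetic of both examples, sketched in Python with assumed numbers: a one-year mission of 8,760 hours, $A_{ss} = 0.95$, and a hypothetical net productivity target W = 1,000,000 per year. The rated-productivity sizing assumes that output is proportional to up time.

```python
HOURS_PER_YEAR = 8760.0


def expected_downtime(a_ss: float, mission_time: float) -> float:
    """Expected total out-of-service time U_ss * T = (1 - A_ss) * T."""
    return (1.0 - a_ss) * mission_time


def required_rated_productivity(net_target: float, a_ss: float) -> float:
    """Rated productivity W' such that A_ss * W' equals the net target W."""
    return net_target / a_ss


a_ss = 0.95
print(expected_downtime(a_ss, HOURS_PER_YEAR))          # expected hours of downtime per year
print(required_rated_productivity(1_000_000.0, a_ss))   # W' for the hypothetical target W
```

With these numbers the system is expected to be down about 438 hours per year, and the plant must be rated roughly 5.3% above the net target.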
69
Availability
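We can show that
$A_{ss} = \frac{\mathrm{MTTF}}{\mathrm{MTTF} + \mathrm{MTTR}}$
This result is valid without making any assumptions on the form of the
distributions of the times to failure and times to repair. Also
$U_{ss} = 1 - A_{ss} = \frac{\mathrm{MTTR}}{\mathrm{MTTF} + \mathrm{MTTR}}$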
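A Monte Carlo sketch of the distribution-free result $A_{ss} = \mathrm{MTTF}/(\mathrm{MTTF} + \mathrm{MTTR})$ for an alternating UP/DOWN process with non-exponential times. The Weibull up times (scale 100 h, shape 2, in the scale/shape parametrization of Python's `random.weibullvariate(alpha, beta)`) and the uniform repair times between 2 and 8 h are hypothetical; the long-run fraction of time spent UP should approach the formula.

```python
import math
import random

rng = random.Random(7)
SCALE_UP, SHAPE_UP = 100.0, 2.0    # hypothetical Weibull up times (hours)
REPAIR_MIN, REPAIR_MAX = 2.0, 8.0  # hypothetical uniform repair times (hours)

total_up = 0.0
total_down = 0.0
for _ in range(100_000):  # 100,000 complete fail-and-restore cycles
    total_up += rng.weibullvariate(SCALE_UP, SHAPE_UP)
    total_down += rng.uniform(REPAIR_MIN, REPAIR_MAX)

simulated = total_up / (total_up + total_down)  # long-run fraction of time UP

mttf = SCALE_UP * math.gamma(1.0 + 1.0 / SHAPE_UP)  # mean of the Weibull up time
mttr = (REPAIR_MIN + REPAIR_MAX) / 2.0              # mean of the uniform repair time
formula = mttf / (mttf + mttr)
print(simulated, formula)
```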
70
Motivation: High Availability
71
Maintainability

MDT (Mean Down Time, or MTTR: mean time to
restoration).
The total down time (Y) consists of:
Failure detection time
Alarm notification time
Dispatch and travel time of the repair person(s)
Repair or replacement time
Reboot time

72
Maintainability

The total down time (Y) consists of:
Logistic time
  Administrative times
  Dispatch and travel time of the repair person(s)
  Waiting time for spares and tools
Effective restoration time
  Access and diagnosis time
  Repair or replacement time
  Test and reboot time

73
Maintenance Costs

The total cost of a maintenance action consists
of
Cost of spares and replaced parts
Cost of person-hours for repair
Down-time cost (loss of productivity)
The down-time cost (due to a loss of
productivity) can be the most relevant cost
factor.

74
Maintenance Policy