Title: Some thoughts for the industry session

1
Some thoughts for the industry session
Cochin Conference Dec 18, 2002

Prof. Kishor S. Trivedi
Department of Electrical and Computer Engineering
Duke University
Durham, NC 27708-0291
Phone (919)660-5269
e-mail kst_at_ee.duke.edu
At present Visiting Professor, IIT Kanpur, CSE Dept.

2
What does industry want?

Well trained students
Short term research problems solved
Short courses on timely topics

3
What do faculty want?

Funding for their research
Place their students in good company labs
Hope to get their research results transferred to
industry
To get to know important and difficult problems
that can drive their research

4
Some lessons learned

Student placement should be guided by the advisor
Start early with summer internship
Patience is needed in listening to problems from
industry
Patience is needed in getting the IP problems
resolved
Expect to do at least 50% more work than the
funding provided
Tech transfer is a double-edged sword
Practical problems can give rise to respectable
research papers
Short courses are ideal entry points

5
Characteristics of the Systems being Studied
Dependability (Reliability, Availability, Safety)

Redundancy: Hardware (Static, Dynamic),
Information, Time
Fault Types: Permanent, Intermittent, Transient,
Design
Fault Detection, Automated Reconfiguration
Imperfect Coverage
Maintenance: scheduled, unscheduled

6
Characteristics of the Systems being Studied

Performance
Resource Contention, Concurrency and
Synchronization
Timeliness (Have to Meet Deadlines)
Composite Performance and Dependability
Degradable Levels of Performance
Need Techniques and Tools that can Evaluate
Systems with All the Characteristics Above
Explicitly Address Complexity

7
MEASURES TO BE EVALUATED

Dependability
Reliability: R(t), System MTTF
Availability: Steady-state, Transient, Interval
Safety
"Does it work, and for how long?"
Performance
Throughput, Loss Probability, Response Time
"Given that it works, how well does it work?"

8
MEASURES TO BE EVALUATED

Composite Performance and Dependability
"How much work will be done (lost) in a given
interval, including the effects of
failure/repair/contention?"
Need Techniques and Tools That Can Evaluate
Performance, Dependability and Their Combinations

9
PURPOSE OF EVALUATION

Understanding a System
Observation
Operational Environment
Controlled Environment
Reasoning
A Model is a Convenient Abstraction

10
PURPOSE OF EVALUATION

Predicting Behavior of a System
Need a Model
Accuracy Based on Degree of Extrapolation
All Models are Wrong; Some Models are Useful
Prediction is fine as long as it is not about the
future

11
Methods of Quantitative EVALUATION

Measurement-Based
Most believable, most expensive
Not always possible or cost effective during
system design

12
Methods of Quantitative Evaluation (Continued)

Model-Based
Less believable, Less expensive
1. Discrete-Event Simulation vs. Analytic
2. State-Space Methods vs. Non-State-Space
Methods
3. Hybrid: Simulation + Analytic (SPNP)
4. State Space + Non-State Space (SHARPE)

13
Why MODEL?

Provides a framework for gathering, organizing,
understanding and evaluating information about a
system (e.g. Zitel, USS, HP)
A cost-effective means to evaluate a system
(e.g. Boeing, USS, HP, IBM, Motorola,
Cisco, SUN)

14
Why MODEL? (continued)

Provides a means of evaluating a set of
alternatives in a structured and quantitative
manner (e.g. Zitel, DEC, HP)
Sometimes needed due to legal and contractual
obligations (e.g. FAA)
Sometimes needed for business reasons (e.g. Motorola,
SUN, Cisco)

15
Compare two CLIENT-SERVER Architectures

[Figures: Architecture 1 and Architecture 2]

16
Compare Connection Reliabilities

Connection reliability R(t) is the probability
that throughout the interval [0, t) at least one
path exists from the client to the server on which
all components are operational.
From R(t), system mean time to failure can be
computed
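
As a rough illustration of the definition above, the sketch below evaluates R(t) and the MTTF (the integral of R(t) from 0 to infinity) for a made-up connection. The two disjoint client-to-server paths, the independent exponential component lifetimes, and all failure rates are assumptions chosen for illustration; they are not the two architectures compared in these slides.

```python
import math


def path_reliability(failure_rates, t):
    """A path works only if all of its components work (series structure)."""
    return math.exp(-sum(failure_rates) * t)


def connection_reliability(paths, t):
    """Probability that at least one of the disjoint paths is fully operational."""
    prob_all_paths_failed = 1.0
    for failure_rates in paths:
        prob_all_paths_failed *= 1.0 - path_reliability(failure_rates, t)
    return 1.0 - prob_all_paths_failed


def mttf(paths, horizon, steps=20000):
    """Approximate the integral of R(t) over [0, horizon] with the trapezoid rule.

    The horizon must be long enough that R(horizon) is negligible.
    """
    h = horizon / steps
    total = 0.5 * (connection_reliability(paths, 0.0)
                   + connection_reliability(paths, horizon))
    for k in range(1, steps):
        total += connection_reliability(paths, k * h)
    return total * h


# Hypothetical failure rates (per hour) for the components on each path.
paths = [[0.001, 0.002], [0.004]]
print(connection_reliability(paths, 100.0))
print(mttf(paths, horizon=20000.0))
```

For two disjoint paths with total rates a and b, the numerical result can be checked against the closed form 1/a + 1/b - 1/(a + b). Paths that share components are not handled by this sketch.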

17
Compare Connection Reliabilities
18
Compare Connection Availabilities

Connection (instantaneous, transient or point)
availability A(t) is the probability that at time
t at least one path exists from the client to
server on which all components are operational.
A(t) ≥ R(t), and the limiting or steady-state
availability is the limit of A(t) as t → ∞
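
To make the relation between availability and reliability concrete, here is a minimal sketch for the simplest repairable case: one component with exponential failure rate lam and repair rate mu, starting in the up state. This single-component model and its parameter values are assumptions for illustration, not the client-server connection studied here.

```python
import math


def reliability(lam, t):
    """R(t): probability of no failure at all during [0, t)."""
    return math.exp(-lam * t)


def availability(lam, mu, t):
    """A(t): probability of being up at time t, allowing failure and repair."""
    total = lam + mu
    return mu / total + (lam / total) * math.exp(-total * t)


def steady_state_availability(lam, mu):
    """Limit of A(t) as t grows large."""
    return mu / (lam + mu)


lam, mu = 0.01, 1.0  # hypothetical rates per hour
for t in (0.0, 1.0, 10.0, 100.0):
    print(t, reliability(lam, t), availability(lam, mu, t))
print(steady_state_availability(lam, mu))
```

With repair, A(t) never drops below R(t): the reliability decays to zero while the availability settles at mu / (lam + mu).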

19
Compare Connection Availabilities
20
MODELING THROUGHOUT SYSTEM LIFECYCLE

System Specification/Design Phase
Answer "What-if" Questions
Compare design alternatives (Zitel, HP, Motorola)
Performance-Dependability Trade-offs (DEC)
Design Optimization (wireless handoff)

21
MODELING THROUGHOUT SYSTEM LIFECYCLE

Design Verification Phase
Use Measurements + Models
E.g. Fault Injection + Reliability Model
Union Switch and Signals, Boeing, Draper
Configuration Selection Phase: DEC
System Operational Phase: Lucent

It is fun!

22
CASE STUDY ZITEL

Comparison of two different fault-tolerant
RAMdisks.
Stochastic Petri Net Package (SPNP) was used to
model the two systems for their reliability.

23
CASE STUDY ZITEL

Trivedi worked with the designers directly
Model Validation was done using face validation
and sanity checks.
Parameterization was easy due to the experience
of the designers.
One difficult research problem originated from
the study; it was subsequently solved and published in
the Microelectronics and Reliability journal.

24
CASE STUDY VAXCLUSTER

Developed three models of Processor Subsystem
Two-Level Decomposition (IEEE-TR, Apr 89)
Inner level: 9-state Markov model
Outer level: n parallel diodes
A Detailed SPN Model (PNPM 89)
A Detailed SPN model for Heterogeneous Cluster
(Averesky book)

25
CASE STUDY VAXCLUSTER

Storage Subsystem Model: A fixed-point iteration
over a set of Markov submodels (IEEE-TR, to
appear)
Observed that availability is maximized with 2
processors (HCSS 90)
Many interesting reliability, availability,
performability measures computed

26
Case Study HP

Cluster Availability Modeling
Server Availability
Mass Storage Arrays Availability Modeling
Started with Markov chains via SHARPE
Progressed toward Stochastic Petri Nets
and Stochastic Reward nets via SPNP

27
CASE STUDY LUCENT

A Validated Model of Hardware-Software
Availability.
Worked with V. Mendiratta of Naperville.
Model is semi-Markov solved using SHARPE.
Parameters collected from field data.
Model results validated against actual
measurements.

28
CASE STUDY LUCENT, IBM, Motorola, SUN

Software Rejuvenation
A technique to counter software aging and
increase its availability to clients.
Evaluated optimum rejuvenation interval which
maximizes steady state availability (minimizes
expected cost) for IBM cluster, Motorola CMTS
cluster
Collected data from real systems to show aging
and to determine proactive fault management
strategies. Worked in our lab, with SUN
Microsystems

29
CASE STUDY MOTOROLA

Availability and Performability Modeling
Modeled several configurations of Communication
Enterprise Common Platform.
Practical approaches for approximating steady
state measures in large, repairable, and highly
dependable systems: model decomposition, state
space truncation, etc.
Both SHARPE and SPNP used.

30
CASE STUDY MOTOROLA

Recovery strategies in wireless handoff
proposed and modeled several strategies
a patent being filed by Motorola
SPNP was used
Hierarchy of two-level models used
Fixed-point iteration was used

31
CASE STUDY BELLCORE

Architecture-based software reliability
proposed a methodology
applied the methodology to SHARPE
used Bellcore's test coverage tool, ATAC, to
parameterize the model
Bellcore is currently enhancing ATAC to
incorporate our methodology

32
CASE STUDY DRAPER LAB

Overall aim was Verification of system with very
high reliability/availability specifications.
Prototype under consideration was FTPP cluster
3.
Hybrid approach proposed
Fault injection based measurements.
Statistical analysis of measured data to enable
parameterization of analytical models.

33
CASE STUDY DRAPER LAB

Reliability modeling of the prototype done
Parameterization done with the aid of existing
reliability databases.
Analytical solution provided exact closed form
expressions
Markov model solved using SHARPE
Petri net model solved using SPNP
Reliability bottlenecks found

34
CASE STUDY AT&T

GSHARPE
A Preprocessor to SHARPE developed at Bell Labs
by a Duke Student.
User can specify Weibull Failure times and
lognormal and other repair time distributions.
GSHARPE fits these to phase type distributions
and generates a Markov model for
processing by SHARPE

35
CASE STUDY BOEING

An Integrated Reliability Environment
A working prototype
Developed a high-level modeling language (SDM)
Designed and implemented an intelligent
interpreter

36
CASE STUDY BOEING (Continued)

Interpreter determines which solution method is
applicable
Five different modeling engines are integrated:
CAFTA, SETS, EHARP, SHARPE and SPNP.

37
QUANTITATIVE EVALUATION TAXONOMY
Closed-form solution
Numerical solution using a tool
38
MODELING TAXONOMY
39
STATE SPACE MODELING TAXONOMY
40
ANALYTIC MODELING TAXONOMY

NON-STATE SPACE MODELING TECHNIQUES

Product form queuing models
Series-parallel (SP) reliability block diagrams
Non-SP reliability block diagrams
41
State Space Modeling Taxonomy
State space methods
  Markovian modeling
    discrete-time Markov chains
    continuous-time Markov chains
    Markov reward models
  non-Markovian modeling
    Semi-Markov models
    Markov regenerative models
    Non-Homogeneous Markov
42
State-Space Based Models

Transition label
Probability: (homogeneous) discrete-time Markov
chain (DTMC)
Time-independent Rate: homogeneous
continuous-time Markov chain
Time-dependent Rate: non-homogeneous
continuous-time Markov chain
Distribution function: semi-Markov process
Two Dist. Functions: Markov Regenerative Process

43
IN ORDER TO FULFILL OUR GOALS OF

Modeling Performance, Dependability and
Performability
Modeling Complex Systems
We Need
Automatic Generation and Solution of Large Markov
Reward Models
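
A Markov reward model attaches a reward rate to each state of a Markov chain, so one model can yield both dependability and performance measures. The sketch below is a minimal hand-built example, assuming a hypothetical two-processor system with per-processor failure rate lam, a single repair facility with rate mu, and reward rates equal to the number of working processors. It uses dense Gaussian elimination, which only suits very small models; it is not how SHARPE or SPNP generate or solve large models.

```python
def steady_state(Q):
    """Solve pi Q = 0 with sum(pi) = 1 for a small irreducible CTMC."""
    n = len(Q)
    # Work on the transposed system and replace the last equation
    # with the normalization condition.
    A = [[Q[j][i] for j in range(n)] for i in range(n)]
    b = [0.0] * n
    A[n - 1] = [1.0] * n
    b[n - 1] = 1.0
    # Forward elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    pi = [0.0] * n
    for i in reversed(range(n)):
        s = b[i] - sum(A[i][c] * pi[c] for c in range(i + 1, n))
        pi[i] = s / A[i][i]
    return pi


# Hypothetical rates (per hour). States: 2 up, 1 up, 0 up.
lam, mu = 0.001, 0.1
Q = [
    [-2 * lam, 2 * lam, 0.0],
    [mu, -(lam + mu), lam],
    [0.0, mu, -mu],
]
reward_rates = [2.0, 1.0, 0.0]   # processing capacity in each state
up_indicator = [1.0, 1.0, 0.0]   # reward 1 in up states gives availability

pi = steady_state(Q)
expected_capacity = sum(p * r for p, r in zip(pi, reward_rates))
availability = sum(p * r for p, r in zip(pi, up_indicator))
print(pi, expected_capacity, availability)
```

Changing only the reward vector switches the measure from availability to expected capacity, which is the sense in which the same chain serves degradable levels of performance.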

44
IN ORDER TO FULFILL OUR GOALS OF

Facility for State Truncation, Hierarchical
composition of Non-State-Space and State-Space
Models, Fixed-Point Iteration
There are Two Tools that Potentially meet these
Goals
Stochastic Petri Net Package (SPNP)
Symbolic Hierarchical Automated Rel. and Perf.
Evaluator (SHARPE)

45
MODELING SOFTWARE PACKAGES

HARP - Hybrid Automated Reliability Predictor
(Duke Univ, funded by NASA
Langley)
SAVE - System Availability Estimator
(Duke Univ. funded by IBM)
SHARPE - Symbolic Hierarchical Automated
Reliability and Performance Evaluator installed
at nearly 280 locations (GUI available)
SPNP - Stochastic Petri Net Package installed at
nearly 120 locations (iSPN - GUI available)
D_RAMP for Union Switch and Signals by Duke, UVA
and CMU
SDM - Boeing Integrated Reliability Modeling
Environment (Jointly developed by Duke Univ.,
Univ. of Wash. and Boeing)
SDDS - Developed by Sohar with the help from K.
Trivedi
SREPT - Software Reliability Estimation and
Prediction Tool

46
Challenges in Modeling
47
COMPLEXITIES OF MODELS

Large State Space
Model construction problem
Model solution problem
Model Stiffness.
Fast and slow rates acting together
Failure And Recovery/Repair
Performance and failure

48
COMPLEXITIES OF MODELS

Modeling Non-Exponential Distributions
Combining performance and reliability
Believability/Understandability/Usability
Incorporation in the design process
Connection between measurements and models
Parameterization
Validation

49
LARGENESS TOLERANCE

Automated Model Construction
Stochastic Petri nets (GreatSPN, SPNP, SHARPE,
DSPNexpress, ULTRASAN)
High level languages (SAVE, QNAP, ASSIST, SDM)
Fault-Tree Recovery Info (HARP)
Object-Oriented Approaches (TANGRAM)
Loops in the specification of CTMC (SHARPE)

50
LARGENESS TOLERANCE

Efficient numerical solution techniques
Sparse Storage
Accurate and Efficient Solution Methods
We have Generated and Solved Models
with 1,000,000 states (has gone up
considerably recently)
Steady-State: Near-Optimal SOR
Transient: Modified Jensen's method
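
The steady-state side can be sketched with plain successive over-relaxation (SOR) applied to pi Q = 0. This is a minimal illustration with a fixed, hand-picked relaxation factor omega and a dense matrix. The near-optimal choice of omega and the sparse storage mentioned on this slide are not reproduced here, and the three-state generator matrix is a hypothetical example.

```python
def sor_steady_state(Q, omega=1.2, tol=1e-12, max_sweeps=10000):
    """Iterate on pi Q = 0, renormalizing after every sweep."""
    n = len(Q)
    pi = [1.0 / n] * n
    for _ in range(max_sweeps):
        previous = list(pi)
        for i in range(n):
            # Probability flow into state i, using already-updated entries.
            inflow = sum(pi[j] * Q[j][i] for j in range(n) if j != i)
            gauss_seidel = inflow / (-Q[i][i])
            # Blend the old value with the Gauss-Seidel update.
            pi[i] = (1.0 - omega) * pi[i] + omega * gauss_seidel
        total = sum(pi)
        pi = [p / total for p in pi]
        if max(abs(a - b) for a, b in zip(pi, previous)) < tol:
            return pi
    raise RuntimeError("SOR did not converge; try a different omega")


# Hypothetical birth-death chain: failure rate 1, repair rate 10.
Q = [
    [-2.0, 2.0, 0.0],
    [10.0, -11.0, 1.0],
    [0.0, 10.0, -10.0],
]
print(sor_steady_state(Q))
```

Setting omega to 1.0 reduces the update to Gauss-Seidel. Convergence is not guaranteed for every chain and every omega, which is why the sketch raises an error instead of returning an unconverged vector.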

51
MODEL SPECIFICATION LANGUAGES

Different languages can be used to specify a
single model type
SAVE, QNAP, SPNP all appear very different, but the
underlying model type is Markov
Same language can be used to specify different
model types: RESQ input language used for PFQN or
EQN

52
LARGENESS AVOIDANCE

Non-State-Space methods
Reliability block diagrams
Fault-trees
Product-Form Queuing Networks
Approximate solutions
State Truncation
SAVE, SPNP, ASSIST (Kantz and Trivedi PNPM91)

53
LARGENESS AVOIDANCE

Approximate solutions
Hierarchical Decomposition (Chapter 11)
and Fixed-Point Iteration among submodels
Heidelberger and Trivedi, IEEE-TC, 1983
(Queueing Models)
Ciardo and Trivedi PNPM91 (SPN Models)
Tomek and Trivedi (Availability Models)
Singhal (IEEE-TPDS, 1992)
Chapter 11 of Sahner et al.
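
The mechanics of fixed-point iteration among submodels can be sketched with a toy decomposition. Everything in it is a hypothetical assumption for illustration: two subsystems share one repair crew, each subsystem is a two-state (up/down) submodel, and the crew's effective repair rate for one subsystem is scaled by the probability that the other subsystem is not down. It is not taken from any of the papers cited on this slide.

```python
def downtime_probability(failure_rate, repair_rate):
    """Steady-state probability that a two-state submodel is down."""
    return failure_rate / (failure_rate + repair_rate)


def solve_by_fixed_point(lam1, lam2, mu, tol=1e-12, max_iter=1000):
    """Alternate between the two submodels until their outputs stop changing."""
    u1, u2 = 0.0, 0.0  # initial guess: neither subsystem is ever down
    for iteration in range(1, max_iter + 1):
        # Submodel 1 sees a repair rate reduced by the time the crew
        # is assumed to spend on subsystem 2, and vice versa.
        new_u1 = downtime_probability(lam1, mu * (1.0 - u2))
        new_u2 = downtime_probability(lam2, mu * (1.0 - new_u1))
        change = max(abs(new_u1 - u1), abs(new_u2 - u2))
        u1, u2 = new_u1, new_u2
        if change < tol:
            return u1, u2, iteration
    raise RuntimeError("fixed-point iteration did not converge")


u1, u2, iterations = solve_by_fixed_point(lam1=0.01, lam2=0.02, mu=1.0)
print(u1, u2, iterations)
```

Each submodel stays small, and the coupling is carried only by the exchanged probabilities. Existence and uniqueness of the fixed point, and convergence of the iteration, have to be argued separately for a real decomposition.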

54
LARGENESS AVOIDANCE

Approximate solutions
Time-Scale Decomposition
Bobbio and Trivedi (IEEE-TC, 1986), Section 11.2
Fluid Approximation
Mitra, Kulkarni, Ciardo, Nicol, and Trivedi
FSPN
Performability (Chapters 6 and 12)

55
Difficulties in Modeling Using MRMs

Stiffness
Causes numerical difficulties in solution
Stiffness Tolerance
Develop stiffness tolerant numerical
solution methods
Stiffness Avoidance
Avoid generating stiff models through
decomposition

56
STIFFNESS TOLERANCE

Automatic Detection of Stiffness (HARP)
Special Stable ODE Solver
Reibman and Trivedi (TR-BDF2)
Computers and Operations Research, 1988.
Malhotra and Trivedi (Pade, Implicit RK)

57
STIFFNESS TOLERANCE

Uniformization for Stiff Markov Chains
Muppala and Trivedi
We can solve models with rate ratios of 10^8 or
higher
Implemented in SHARPE and SPNP
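
For reference, textbook uniformization (also known as Jensen's method or randomization) can be sketched as follows. This is the standard, unmodified algorithm, not the stiffness-tolerant variant referred to on this slide: for stiff models the product q * t becomes very large, the number of terms grows with it, and exp(-q * t) can underflow. The two-state example and its rates are hypothetical.

```python
import math


def transient_probabilities(Q, p0, t, eps=1e-12, max_terms=100000):
    """State probabilities at time t of a CTMC with generator Q, starting from p0."""
    n = len(Q)
    q = max(-Q[i][i] for i in range(n))  # uniformization rate
    if q == 0.0 or t == 0.0:
        return list(p0)
    # Transition matrix of the uniformized discrete-time chain: P = I + Q / q.
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / q for j in range(n)]
         for i in range(n)]
    weight = math.exp(-q * t)  # Poisson probability of 0 jumps
    vector = list(p0)          # p0 * P^k, starting with k = 0
    result = [weight * x for x in vector]
    accumulated = weight
    for k in range(1, max_terms + 1):
        # Stop once the neglected Poisson tail is below eps.
        if 1.0 - accumulated <= eps:
            break
        vector = [sum(vector[i] * P[i][j] for i in range(n)) for j in range(n)]
        weight *= q * t / k
        accumulated += weight
        result = [r + weight * x for r, x in zip(result, vector)]
    return result


# Hypothetical component: failure rate 0.01, repair rate 1.0, initially up.
Q = [[-0.01, 0.01], [1.0, -1.0]]
print(transient_probabilities(Q, [1.0, 0.0], 2.0))
```

The first entry of the result is the point availability A(2) for this component and can be compared with the closed form mu / (lam + mu) + lam / (lam + mu) * exp(-(lam + mu) * t).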

58
STIFFNESS AVOIDANCE

Model-level decomposition
Behavioral Decomposition (HARP, Bobbio and Trivedi)
Fault-Occurrence vs. Fault/Error Handling
Hierarchical Composition (SHARPE): Composition of
Submodel solutions without generating a single
one-level overall model
Fixed-Point Iteration (Ciardo and Trivedi SPNP)

59
Non-Exponential Behavior

Non-state-space models (Fault Trees, Reliability
Graphs, RBDs): no problem

60
Non-Exponential Behavior in State Space Models
61
NON-EXPONENTIAL DISTRIBUTIONS

Phase-Type Expansions
Malhotra and Reibman (GSHARPE)
See Figure 9.38 on p. 191 (Red Book)
Non-Homogeneous Markov Chains
CARE III, HARP
Soft Reliability model with imperfect repairs
solved using SHARPE

62
NON-EXPONENTIAL DISTRIBUTIONS

Semi-Markov Chains
Ciardo et al, IEEE-TC Oct. 90
Markov Regenerative Processes
Choi, Logothetis, Kulkarni, Trivedi
DSPN and MRSPN
Choi, Kulkarni, Trivedi
Discrete-Event Simulation
Now in SPNP (FSPN and Non-Markovian SPN
Simulation), RESQ, QNAP

63
BELIEVABILITY/UNDERSTANDABILITY

Integration of Measurements and Models
Measurements Provide Parameters to Models
Models Provide Guidelines For Measurements
Models Validated Against Measurements
Integration of Different Modeling Tools
Boeing SDM project
IDEAS project at Duke

64
BELIEVABILITY/UNDERSTANDABILITY

Many Case-Studies of Validations Needed
Vaxcluster Availability Model: Wein and Sathaye
Hsueh, Iyer and Trivedi, IEEE-TC, Apr. 1988
AT&T Validation of ESS
Technology Transfer
Seminars and Workshops
Development and Dissemination of Tools
Application of the Techniques and Tools

65
MODELING AND MEASUREMENTS INTERFACES

Measurements supply Input Parameters to Models
(Model Calibration or Parameterization)
Confidence Intervals should be obtained
Boeing, Draper, Union Switch projects
Model Sensitivity Analysis can suggest which
Parameters to Measure More Accurately (Blake,
Reibman and Trivedi, SIGMETRICS 1988).
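
One simple way to see how sensitivity analysis ranks parameters is to compute scaled sensitivities, that is, each parameter value times the partial derivative of the measure with respect to it. The sketch below uses central finite differences on a hypothetical model, two repairable components in series with made-up rates; it is not the method of the cited paper.

```python
def system_availability(params):
    """Two components in series; each is a two-state (up/down) Markov model."""
    a1 = params["mu1"] / (params["lam1"] + params["mu1"])
    a2 = params["mu2"] / (params["lam2"] + params["mu2"])
    return a1 * a2


def scaled_sensitivities(model, params, rel_step=1e-6):
    """Return value * d(model)/d(value) for every parameter."""
    result = {}
    for name, value in params.items():
        step = rel_step * value
        up = dict(params)
        up[name] = value + step
        down = dict(params)
        down[name] = value - step
        derivative = (model(up) - model(down)) / (2.0 * step)
        result[name] = value * derivative
    return result


# Hypothetical failure (lam) and repair (mu) rates per hour.
params = {"lam1": 0.001, "mu1": 1.0, "lam2": 0.01, "mu2": 0.5}
sensitivities = scaled_sensitivities(system_availability, params)
ranking = sorted(sensitivities, key=lambda name: abs(sensitivities[name]),
                 reverse=True)
print(sensitivities)
print(ranking)
```

Parameters near the top of the ranking are the ones whose measurement error moves the result most, so they are the natural candidates for more careful measurement.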

66
MODELING AND MEASUREMENTS INTERFACES

Model Validation
1. Face Validation
2. Input-Output Validation
3. Validation of Model Assumptions
(Hypothesis Testing)
Rejection of a hypothesis regarding model
assumption based on measurement data leads to an
improved model

67
MODELING AND MEASUREMENTS INTERFACES