Some thoughts for the industry session

1
Some thoughts for the industry session
Cochin Conference Dec 18, 2002
  • Prof. Kishor S. Trivedi
  • Department of Electrical and Computer Engineering
  • Duke University
  • Durham, NC 27708-0291
  • Phone (919)660-5269
  • e-mail kst_at_ee.duke.edu
  • At present visiting Professor IIT Kanpur, CSE
    Dept.

2
What does industry want?
  • Well trained students
  • Short term research problems solved
  • Short courses on timely topics

3
What do faculty want?
  • Funding for their research
  • Place their students in good company labs
  • Hope to get their research results transferred to
    industry
  • To get to know important and difficult problems
    that can drive their research

4
Some lessons learned
  • Student placement should be guided by the advisor
  • Start early with summer internship
  • Patience is needed in listening to problems from
    industry
  • Patience is needed in getting the IP problems
    resolved
  • Expect to do at least 50% more work than the
    funding provides
  • Tech transfer is a double edged sword
  • Practical problems can give rise to respectable
    research papers
  • Short courses are ideal entry points

5
Characteristics of the Systems being Studied
Dependability (Reliability, Availability, Safety)
  • Redundancy: Hardware (Static, Dynamic),
    Information, Time
  • Fault Types: Permanent, Intermittent, Transient,
    Design
  • Fault Detection, Automated Reconfiguration
  • Imperfect Coverage
  • Maintenance: scheduled, unscheduled

6
Characteristics of the Systems being Studied
  • Performance
  • Resource Contention, Concurrency and
    Synchronization
  • Timeliness (Have to Meet Deadlines)
  • Composite Performance and Dependability
  • Degradable Levels of Performance
  • Need Techniques and Tools that can Evaluate
  • Systems with All the Characteristics Above
  • Explicitly Address Complexity

7
MEASURES TO BE EVALUATED
  • Dependability
  • Reliability R(t), System MTTF
  • Availability: Steady-state, Transient, Interval
  • Safety
  • "Does it work, and for how long?"
  • Performance
  • Throughput, Loss Probability, Response Time
  • "Given that it works, how well does it work?"

8
MEASURES TO BE EVALUATED
  • Composite Performance and Dependability
  • "How much work will be done (lost) in a given
    interval, including the effects of
    failure/repair/contention?"
  • Need Techniques and Tools That Can Evaluate
    Performance, Dependability and Their Combinations

9
PURPOSE OF EVALUATION
  • Understanding a System
  • Observation
  • Operational Environment
  • Controlled Environment
  • Reasoning
  • A Model is a Convenient Abstraction

10
PURPOSE OF EVALUATION
  • Predicting Behavior of a System
  • Need a Model
  • Accuracy Based on Degree of Extrapolation
  • All Models are Wrong; Some Models are Useful
  • Prediction is fine as long as it is not about the
    future

11
Methods of Quantitative EVALUATION
  • Measurement-Based
  • Most believable, most expensive
  • Not always possible or cost effective during
    system design

12
Methods of Quantitative Evaluation (Continued)
  • Model-Based
  • Less believable, less expensive
  • 1. Discrete-Event Simulation vs. Analytic
  • 2. State-Space Methods vs. Non-State-Space
    Methods
  • 3. Hybrid: Simulation + Analytic (SPNP)
  • 4. Hybrid: State Space + Non-State Space (SHARPE)

13
Why MODEL?
  • Provides a framework for gathering, organizing,
    understanding and evaluating information about a
    system, e.g. Zitel, USS, HP
  • A cost-effective means to evaluate a system,
    e.g. Boeing, USS, HP, IBM, Motorola, Cisco, SUN

14
Why MODEL? (continued)
  • Provides a means of evaluating a set of
    alternatives in a structured and quantitative
    manner, e.g. Zitel, DEC, HP
  • Sometimes needed due to legal and contractual
    obligations, e.g. FAA
  • Sometimes needed for business reasons, e.g.
    Motorola, SUN, Cisco

15
Compare two CLIENT-SERVER Architectures
[Figure: Architecture 1 and Architecture 2]
16
Compare Connection Reliabilities
  • Connection reliability R(t) is the probability
    that throughout the interval [0, t) at least one
    path exists from the client to the server on which
    all components are operational.
  • From R(t), system mean time to failure can be
    computed

17
Compare Connection Reliabilities
18
Compare Connection Availabilities
  • Connection (instantaneous, transient or point)
    availability A(t) is the probability that at time
    t at least one path exists from the client to the
    server on which all components are operational.
  • A(t) ≥ R(t); the limit of A(t) as t → ∞ is the
    limiting or steady-state availability
19
Compare Connection Availabilities
20
MODELING THROUGHOUT SYSTEM LIFECYCLE
  • System Specification/Design Phase
  • Answer "What-if" Questions
  • Compare design alternatives (Zitel, HP, Motorola)
  • Performance-Dependability Trade-offs (DEC)
  • Design Optimization (wireless handoff)

21
MODELING THROUGHOUT SYSTEM LIFECYCLE
  • Design Verification Phase
  • Use Measurements + Models
  • E.g. Fault Injection + Reliability Model
  • Union Switch and Signals, Boeing, Draper
  • Configuration Selection Phase: DEC
  • System Operational Phase: Lucent
  • It is fun!

22
CASE STUDY ZITEL
  • Comparison of two different fault-tolerant
    RAMdisks.
  • Stochastic Petri Net Package (SPNP) was used to
    model the two systems for their reliability.

23
CASE STUDY ZITEL
  • Trivedi worked with the designers directly
  • Model Validation was done using face validation
    and sanity checks.
  • Parameterization was easy due to the experience
    of the designers.
  • One difficult research problem originated from
    the study; it was subsequently solved and
    published in the journal Microelectronics and
    Reliability.

24
CASE STUDY VAXCLUSTER
  • Developed three models of Processor Subsystem
  • Two-Level Decomposition (IEEE-TR, Apr 89):
    inner level is a 9-state Markov chain,
    outer level is n parallel diodes
  • A Detailed SPN Model (PNPM 89)
  • A Detailed SPN model for Heterogeneous Cluster
    (Averesky book)

25
CASE STUDY VAXCLUSTER
  • Storage Subsystem Model: a fixed-point iteration
    over a set of Markov submodels (IEEE-TR, to
    appear)
  • Observed that availability is maximized with 2
    processors (HCSS 90)
  • Many interesting reliability, availability,
    performability measures computed

26
Case Study HP
  • Cluster Availability Modeling
  • Server Availability
  • Mass Storage Arrays Availability Modeling
  • Started with Markov chains via SHARPE
  • Progressed toward Stochastic Petri Nets
    and Stochastic Reward Nets via SPNP

27
CASE STUDY LUCENT
  • A Validated Model of Hardware-Software
    Availability.
  • Worked with V. Mendiratta of Naperville.
  • Model is semi-Markov solved using SHARPE.
  • Parameters collected from field data.
  • Model results validated against actual
    measurements.

28
CASE STUDY LUCENT, IBM, Motorola, SUN
  • Software Rejuvenation
  • A technique to counter software aging and
    increase its availability to clients.
  • Evaluated optimum rejuvenation interval which
    maximizes steady state availability (minimizes
    expected cost) for IBM cluster, Motorola CMTS
    cluster
  • Collected data from real systems to show aging
    and to determine proactive fault management
    strategies. Worked in our lab, with SUN
    Microsystems

29
CASE STUDY MOTOROLA
  • Availability and Performability Modeling
  • Modeled several configurations of Communication
    Enterprise Common Platform.
  • Practical approaches for approximating steady
    state measures in large, repairable, and highly
    dependable systems: model decomposition, state
    space truncation, etc.
  • Both SHARPE and SPNP used.

30
CASE STUDY MOTOROLA
  • Recovery strategies in wireless handoff
  • proposed and modeled several strategies
  • a patent being filed by Motorola
  • SPNP was used
  • Hierarchy of two-level models used
  • Fixed-point iteration was used

31
CASE STUDY BELLCORE
  • Architecture-based software reliability
  • proposed a methodology
  • applied the methodology to SHARPE
  • used Bellcore's test coverage tool, ATAC, to
    parameterize the model
  • Bellcore is currently enhancing ATAC to
    incorporate our methodology

32
CASE STUDY DRAPER LAB
  • Overall aim was Verification of system with very
    high reliability/availability specifications.
    Prototype under consideration was FTPP cluster
    3.
  • Hybrid approach proposed
  • Fault injection based measurements.
  • Statistical analysis of measured data to enable
    parameterization of analytical models.

33
CASE STUDY DRAPER LAB
  • Reliability modeling of the prototype done;
    parameterization done with the aid of existing
    reliability databases.
  • Analytical solution provided exact closed form
    expressions
  • Markov model solved using SHARPE
  • Petri net model solved using SPNP
  • Reliability bottlenecks found

34
CASE STUDY AT&T
  • GSHARPE
  • A Preprocessor to SHARPE developed at Bell Labs
    by a Duke Student.
  • User can specify Weibull failure times and
    lognormal and other repair time distributions.
  • GSHARPE fits these to phase-type distributions
    and generates a Markov model for processing by
    SHARPE

35
CASE STUDY BOEING
  • An Integrated Reliability Environment
  • A working prototype
  • Developed a high-level modeling language (SDM)
  • Designed and implemented an intelligent
    interpreter

36
CASE STUDY BOEING (Continued)
  • Interpreter determines which solution method is
    applicable
  • Five different modeling engines are integrated
  • CAFTA, SETS, EHARP, SHARPE and SPNP.

37
QUANTITATIVE EVALUATION TAXONOMY
Closed-form solution
Numerical solution using a tool
38
MODELING TAXONOMY
39
STATE SPACE MODELING TAXONOMY
40
ANALYTIC MODELING TAXONOMY
  • NON-STATE SPACE MODELING TECHNIQUES

Product form queuing models
SP reliability block diagrams
Non-SP reliability block diagrams
41
State Space Modeling Taxonomy
  • State space methods
  • Markovian modeling: discrete-time Markov chains,
    continuous-time Markov chains, Markov reward
    models
  • Non-Markovian modeling: Semi-Markov models,
    Markov regenerative models, Non-Homogeneous
    Markov
42
State-Space Based Models
  • Transition label:
  • Probability: (homogeneous) discrete-time Markov
    chain (DTMC)
  • Time-independent rate: homogeneous
    continuous-time Markov chain
  • Time-dependent rate: non-homogeneous
    continuous-time Markov chain
  • Distribution function: semi-Markov process
  • Two distribution functions: Markov regenerative
    process
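To make the difference between rate labels and probability labels concrete, the sketch below builds the generator matrix of a small homogeneous continuous-time Markov chain from its transition rates and then derives the embedded jump chain, whose labels are probabilities. The three-state model, its rates, and the function names are hypothetical.

```python
def generator_matrix(n, rates):
    """Build the CTMC generator Q from {(i, j): rate}; each row sums to zero."""
    Q = [[0.0] * n for _ in range(n)]
    for (i, j), rate in rates.items():
        Q[i][j] = rate
    for i in range(n):
        Q[i][i] = -sum(Q[i][j] for j in range(n) if j != i)
    return Q

def embedded_dtmc(Q):
    """Jump-chain probabilities: P[i][j] = q_ij / (-q_ii) for i != j."""
    n = len(Q)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        out_rate = -Q[i][i]
        if out_rate == 0.0:
            P[i][i] = 1.0  # absorbing state: stays put with probability 1
            continue
        for j in range(n):
            if j != i:
                P[i][j] = Q[i][j] / out_rate
    return P

# Hypothetical model: state 0 = both units up, 1 = one up, 2 = system down.
rates = {(0, 1): 0.002, (1, 0): 0.1, (1, 2): 0.001, (2, 1): 0.1}
Q = generator_matrix(3, rates)
P = embedded_dtmc(Q)
print(Q)
print(P)
```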

43
IN ORDER TO FULFILL OUR GOALS OF
  • Modeling Performance, Dependability and
    Performability
  • Modeling Complex Systems
  • We Need
  • Automatic Generation and Solution of Large Markov
    Reward Models
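A Markov reward model attaches a reward rate to each state of a Markov chain, so that one model yields performance, dependability, and composite measures. The sketch below is a hand-built toy, assuming a two-processor system with one shared repair facility and hypothetical rates; automatic generation of large models is what the tools are for and is not attempted here.

```python
def birth_death_steady_state(fail, repair):
    """Steady state of a birth-death CTMC.

    State i means i processors have failed; fail[i] is the rate i -> i+1
    and repair[i] is the rate i+1 -> i.
    """
    weights = [1.0]
    for f, r in zip(fail, repair):
        weights.append(weights[-1] * f / r)  # local balance equation
    total = sum(weights)
    return [w / total for w in weights]

def expected_reward_rate(pi, rewards):
    """Expected steady-state reward rate: sum of r_i * pi_i."""
    return sum(p * r for p, r in zip(pi, rewards))

lam, mu = 0.001, 0.1  # hypothetical failure and repair rates per hour
pi = birth_death_steady_state([2 * lam, lam], [mu, mu])

# Reward = number of working processors gives expected capacity.
capacity = expected_reward_rate(pi, [2, 1, 0])
# Reward = 1 in up states and 0 in the down state gives availability.
availability = expected_reward_rate(pi, [1, 1, 0])
print(pi, capacity, availability)
```

Changing only the reward vector switches the measure from a pure dependability measure (availability) to a composite one (expected capacity).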

44
IN ORDER TO FULFILL OUR GOALS OF
  • Facility for State Truncation, Hierarchical
    composition of Non-State-Space and State-Space
    Models, Fixed-Point Iteration
  • There are Two Tools that Potentially meet these
    Goals
  • Stochastic Petri Net Package (SPNP)
  • Symbolic Hierarchical Automated Rel. and Perf.
    Evaluator (SHARPE)

45
MODELING SOFTWARE PACKAGES
  • HARP - Hybrid Automated Reliability Predictor
    (Duke Univ., funded by NASA Langley)
  • SAVE - System Availability Estimator
    (Duke Univ., funded by IBM)
  • SHARPE - Symbolic Hierarchical Automated
    Reliability and Performance Evaluator installed
    at nearly 280 locations (GUI available)
  • SPNP - Stochastic Petri Net Package installed at
    nearly 120 locations (iSPN - GUI available)
  • D_RAMP for Union Switch and Signals by Duke, UVA
    and CMU
  • SDM - Boeing Integrated Reliability Modeling
    Environment (Jointly developed by Duke Univ.,
    Univ. of Wash. and Boeing)
  • SDDS - Developed by Sohar with the help from K.
    Trivedi
  • SREPT - Software Reliability Estimation and
    Prediction Tool

46
Challenges in Modeling
47
COMPLEXITIES OF MODELS
  • Large State Space
  • Model construction problem
  • Model solution problem
  • Model Stiffness.
  • Fast and slow rates acting together
  • Failure And Recovery/Repair
  • Performance and failure

48
COMPLEXITIES OF MODELS
  • Modeling Non-Exponential Distributions
  • Combining performance and reliability
  • Believability/Understandability/Usability
  • Incorporation in the design process
  • Connection between measurements and models
  • Parameterization
  • Validation

49
LARGENESS TOLERANCE
  • Automated Model Construction
  • Stochastic Petri nets (GreatSPN, SPNP, SHARPE,
    DSPNexpress, ULTRASAN)
  • High level languages (SAVE, QNAP, ASSIST, SDM)
  • Fault-Tree Recovery Info (HARP)
  • Object-Oriented Approaches (TANGRAM)
  • Loops in the specification of CTMC (SHARPE)

50
LARGENESS TOLERANCE
  • Efficient numerical solution techniques
  • Sparse Storage
  • Accurate and Efficient Solution Methods
  • We have Generated and Solved Models
    with 1,000,000 states (has gone up
    considerably recently)
  • Steady-State: NEAR-Optimal SOR
  • Transient: Modified Jensen's method
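A minimal sketch of successive over-relaxation (SOR) for the steady-state equations pi Q = 0 with the probabilities summing to one. It uses dense lists for brevity, whereas a large model would need sparse storage, and it takes the relaxation factor omega as a plain argument (omega = 1 is Gauss-Seidel); choosing a near-optimal omega is not shown. The three-state generator is hypothetical.

```python
def sor_steady_state(Q, omega=1.0, tol=1e-12, max_sweeps=10_000):
    """Solve pi Q = 0, sum(pi) = 1 by SOR sweeps with renormalization."""
    n = len(Q)
    pi = [1.0 / n] * n
    for _ in range(max_sweeps):
        old = pi[:]
        for j in range(n):
            # Balance equation for state j: inflow equals outflow.
            inflow = sum(pi[i] * Q[i][j] for i in range(n) if i != j)
            pi[j] = (1.0 - omega) * pi[j] + omega * inflow / (-Q[j][j])
        total = sum(pi)
        pi = [p / total for p in pi]
        if max(abs(a - b) for a, b in zip(pi, old)) < tol:
            return pi
    raise RuntimeError("SOR did not converge within max_sweeps")

# Hypothetical generator: states 0, 1, 2 = number of failed units.
Q = [
    [-0.002, 0.002, 0.0],
    [0.1, -0.101, 0.001],
    [0.0, 0.1, -0.1],
]
pi = sor_steady_state(Q)
print(pi)
```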

51
MODEL SPECIFICATION LANGUAGES
  • Different languages can be used to specify a
    single model type
  • SAVE, QNAP, SPNP all appear very different, yet
    the underlying model type is Markov
  • Same language can be used to specify different
    model types: RESQ input language used for PFQN or
    EQN

52
LARGENESS AVOIDANCE
  • Non-State-Space methods
  • Reliability block diagrams
  • Fault-trees
  • Product-Form Queuing Networks
  • Approximate solutions
  • State Truncation
  • SAVE, SPNP, ASSIST (Kantz and Trivedi PNPM91)

53
LARGENESS AVOIDANCE
  • Approximate solutions
  • Hierarchical Decomposition (Chapter 11)
    and Fixed-Point Iteration among submodels
  • Heidelberger and Trivedi, IEEE-TC, 1983
    (Queueing Models)
  • Ciardo and Trivedi, PNPM91 (SPN Models)
  • Tomek and Trivedi (Availability Models)
  • Singhal (IEEE-TPDS, 1992)
  • Chapter 11 of Sahner et al.
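The idea of fixed-point iteration among submodels can be sketched with two toy submodels that each need the other's output as an input. Everything here is hypothetical: a performance submodel that turns availability into a utilization, a dependability submodel whose failure rate grows with utilization, and the parameter values. Successive substitution is used, which is one simple way to iterate and not necessarily the scheme of the cited papers.

```python
def utilization(avail, arrival=5.0, service=10.0):
    """Performance submodel: M/M/1 utilization with capacity scaled by availability."""
    return arrival / (avail * service)

def availability(rho, base_fail=0.01, repair=1.0):
    """Dependability submodel: failure rate increases with load (assumed coupling)."""
    fail = base_fail * (1.0 + rho)
    return repair / (repair + fail)

def fixed_point(update, x0, tol=1e-12, max_iter=1000):
    """Successive substitution x <- update(x) until the change is below tol."""
    x = x0
    for sweep in range(1, max_iter + 1):
        x_new = update(x)
        if abs(x_new - x) < tol:
            return x_new, sweep
        x = x_new
    raise RuntimeError("fixed-point iteration did not converge")

# Start by assuming a perfectly available server, then iterate.
avail, sweeps = fixed_point(lambda a: availability(utilization(a)), 1.0)
print(avail, utilization(avail), sweeps)
```

Neither submodel ever sees the other's state space; only scalar results are exchanged, which is what keeps the overall model small.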

54
LARGENESS AVOIDANCE
  • Approximate solutions
  • Time-Scale Decomposition
  • Bobbio and Trivedi (IEEE-TC, 1986), Section 11.2
  • Fluid Approximation
  • Mitra, Kulkarni, Ciardo, Nicol, and Trivedi
  • FSPN
  • Performability (Chapters 6 and 12)

55
Difficulties in Modeling Using MRMs
  • Stiffness
  • Causes numerical difficulties in solution
  • Stiffness Tolerance
  • Develop stiffness tolerant numerical
    solution methods
  • Stiffness Avoidance
  • Avoid generating stiff models through
    decomposition

56
STIFFNESS TOLERANCE
  • Automatic Detection of Stiffness (HARP)
  • Special Stable ODE Solver
  • Reibman and Trivedi (TR-BDF2),
    Computers and Operations Research, 1988.
  • Malhotra and Trivedi (Padé, Implicit RK)

57
STIFFNESS TOLERANCE
  • Uniformization for Stiff Markov Chains
  • Muppala and Trivedi
  • We can solve models with rate ratios of 10^8 or
    higher
  • Implemented in SHARPE and SPNP
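For reference, a sketch of standard uniformization (Jensen's method) for the transient state probabilities of a continuous-time Markov chain. This is the textbook form, not the stiffness-adapted variant referred to on the slide: the number of terms grows with q*t, so it becomes slow, and here simply refuses to run, when fast rates and long horizons combine. The two-state example and its rates are hypothetical.

```python
import math

def uniformization(Q, p0, t, eps=1e-12):
    """Transient probabilities p(t) = sum_k Poisson(k; q*t) * p0 * P^k."""
    n = len(Q)
    q = 1.02 * max(-Q[i][i] for i in range(n))  # uniformization rate
    P = [[Q[i][j] / q + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    weight = math.exp(-q * t)  # Poisson probability of zero jumps
    if weight == 0.0:
        raise ValueError("q*t is too large for this simple sketch")
    v = p0[:]  # holds p0 * P^k
    result = [weight * x for x in v]
    mass, k = weight, 0
    while 1.0 - mass > eps:  # stop once the Poisson tail is below eps
        k += 1
        v = [sum(v[i] * P[i][j] for i in range(n)) for j in range(n)]
        weight *= q * t / k
        mass += weight
        result = [r + weight * x for r, x in zip(result, v)]
    return result

lam, mu = 0.001, 0.1  # hypothetical failure and repair rates
Q = [[-lam, lam], [mu, -mu]]
p_t = uniformization(Q, [1.0, 0.0], 50.0)
print(p_t)
```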

58
STIFFNESS AVOIDANCE
  • Model-level decomposition
  • Behavioral Decomposition (HARP, Bobbio and
    Trivedi): Fault-Occurrence vs. Fault/Error
    Handling
  • Hierarchical Composition (SHARPE): Composition of
    submodel solutions without generating a single
    one-level overall model
  • Fixed-Point Iteration (Ciardo and Trivedi SPNP)

59
Non-Exponential Behavior
  • Non-state-space models (Fault Trees, Reliability
    Graphs, RBDs): no problem

60
Non-Exponential Behavior in State Space Models
61
NON-EXPONENTIAL DISTRIBUTIONS
  • Phase-Type Expansions
  • Malhotra and Reibman (GSHARPE)
  • See Figure 9.38 on p. 191 (Red Book)
  • Non-Homogeneous Markov Chains
  • CARE III, HARP
  • Soft Reliability model with imperfect repairs
    solved using SHARPE
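One simple way to see what a phase-type expansion does is two-moment matching: replace a non-exponential distribution by a chain of exponential stages with the same mean and roughly the same squared coefficient of variation (SCV). The sketch below is a generic textbook fit, not the fitting procedure used by GSHARPE; it assumes SCV > 0, uses an Erlang distribution when SCV is at most 1, and a two-phase hyperexponential with balanced means otherwise. The Weibull parameters are hypothetical.

```python
import math

def weibull_moments(shape, scale):
    """Mean and squared coefficient of variation of a Weibull distribution."""
    mean = scale * math.gamma(1.0 + 1.0 / shape)
    second = scale ** 2 * math.gamma(1.0 + 2.0 / shape)
    return mean, second / mean ** 2 - 1.0

def fit_phase_type(mean, scv):
    """Two-moment phase-type fit (Erlang matches the SCV only approximately)."""
    if scv <= 1.0:
        k = max(1, math.ceil(1.0 / scv - 1e-9))  # number of stages
        return ("erlang", k, k / mean)           # every stage has rate k / mean
    p1 = 0.5 * (1.0 + math.sqrt((scv - 1.0) / (scv + 1.0)))
    p2 = 1.0 - p1
    return ("hyperexp", (p1, p2), (2.0 * p1 / mean, 2.0 * p2 / mean))

mean, scv = weibull_moments(shape=2.0, scale=1000.0)  # increasing failure rate
print(fit_phase_type(mean, scv))

mean2, scv2 = weibull_moments(shape=0.5, scale=1000.0)  # decreasing failure rate
print(fit_phase_type(mean2, scv2))
```

Each fitted stage becomes an extra state in the expanded Markov chain, which is why phase-type expansion trades non-exponential behavior for a larger state space.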

62
NON-EXPONENTIAL DISTRIBUTIONS
  • Semi-Markov Chains
  • Ciardo et al, IEEE-TC Oct. 90
  • Markov Regenerative Processes
  • Choi, Logothetis, Kulkarni, Trivedi
  • DSPN and MRSPN
  • Choi, Kulkarni, Trivedi
  • Discrete-Event Simulation
  • Now in SPNP (FSPN and Non-Markovian SPN
    Simulation), RESQ, QNAP

63
BELIEVABILITY/UNDERSTANDABILITY
  • Integration of Measurements and Models
  • Measurements Provide Parameters to Models
  • Models Provide Guidelines For Measurements
  • Models Validated Against Measurements
  • Integration of Different Modeling Tools
  • Boeing SDM project
  • IDEAS project at Duke

64
BELIEVABILITY/UNDERSTANDABILITY
  • Many Case-Studies of Validations Needed
  • Vaxcluster Availability Model: Wein and Sathaye
  • Hsueh, Iyer and Trivedi, IEEE-TC, Apr. 1988
  • AT&T: Validation of ESS
  • Technology Transfer
  • Seminars and Workshops
  • Development and Dissemination of Tools
  • Application of the Techniques and Tools

65
MODELING AND MEASUREMENTS INTERFACES
  • Measurements supply Input Parameters to Models
    (Model Calibration or Parameterization)
  • Confidence Intervals should be obtained
  • Boeing, Draper, Union Switch projects
  • Model Sensitivity Analysis can suggest which
    Parameters to Measure More Accurately (Blake,
    Reibman and Trivedi, SIGMETRICS 1988)

66
MODELING AND MEASUREMENTS INTERFACES
  • Model Validation
  • 1. Face Validation
  • 2. Input-Output Validation
  • 3. Validation of Model Assumptions
    (Hypothesis Testing)
  • Rejection of a hypothesis regarding model
    assumption based on measurement data leads to an
    improved model

67
MODELING AND MEASUREMENTS INTERFACES
  • Model Structure Based on Measurement Data: Hsueh,
    Iyer and Trivedi, IEEE-TC, April 1988; Gokhale et
    al., IPDS 98