1
Probability for Computer Science
IIT Kanpur
  • Kishor S. Trivedi
  • Visiting Prof. of Computer Science and
    Engineering, IITK
  • Prof., Department of Electrical and Computer
    Engineering
  • Duke University
  • Durham, NC 27708-0291
  • Phone: 7576
  • e-mail: kst_at_ee.duke.edu
  • URL: www.ee.duke.edu/kst

2
Outline
  • Introduction
  • Preliminaries: Sample Space, Probability Axioms,
    Independence, Conditioning, Binomial Trials
  • Random Variables: Binomial, Poisson, Exponential,
    Weibull, Erlang, Hyperexponential,
    Hypoexponential, Pareto, Defective
  • Reliability, Hazard Rate
  • Average Case Analysis of Program Performance
  • Reliability Analysis Using Block Diagrams and
    Fault Trees
  • Reliability of Standby Systems
  • Statistical Inference Including Confidence
    Intervals
  • Hypothesis Testing
  • Regression

3
Schedule & Textbook
  • Schedule: Jan 21, 23, 28 and Feb 6, 18, 25, 27
  • Textbook: Probability & Statistics with
    Reliability, Queuing, and Computer Science
    Applications, K. S. Trivedi, second edition,
    John Wiley & Sons, 2001 (Indian paperback).

4
Program Performance Evaluation
  • Worst-case vs. Average case analysis
  • Data-structure-oriented vs. Control
    structure-oriented
  • Sequential vs. Concurrent
  • Centralized vs. Distributed
  • Structured vs. with unrestricted transfer of
    control
  • Unlimited (hardware) resources vs. limited
    resources
  • Software architecture: modules, their
    characteristics (execution time) and interactions
    (branching, looping)
  • Characteristics of hardware on which the software
    is run
  • Measures: completion time (mean, variance &
    dist.), throughput
  • Measurements or Models (simulation vs. analytic)
  • Analytic models: combinatorial, DTMC, SMP,
    CTMC, SPN

5
System Performance Evaluation
  • Workload: traffic arrivals, service time
    distributions, pattern of resource requests
  • Hardware architecture and software architecture
  • Resource Contention, Scheduling & Allocation
  • Concurrency, Synchronization, distributed
    processing
  • Timeliness (Have to Meet Deadlines)
  • Measures: Throughput, Goodput, loss probability,
    response time or delay (mean, variance & dist.)
  • Low-level (cache, memory interference: ch. 7)
  • System-level (CPU-I/O, multiprocessing: ch. 8, 9)
  • Network-level (protocols, handoff in wireless:
    ch. 7, 8)
  • Measurements or models (simulation or analytic)
  • Analytic models: DTMC, CTMC, PFQN, SPN

6
System Performance Evaluation
  • Workload
  • Single vs. multiple types of requests (classes,
    chains); in the latter case, the following three
    items are needed for each type of request
  • Traffic arrivals: one time vs. a stream
  • Stream: Poisson (Bernoulli), general renewal,
    IPP (IBP), MMPP (MMBP), MAP, BMAP, NHPP,
    self-similar
  • Service time distributions: exponential
    (geometric), deterministic, uniform, Erlang,
    hyperexponential, hypoexponential, phase-type,
    general (with finite mean and variance), Pareto
  • Pattern of resource requests: service time
    distribution (or the mean) at each resource per
    visit, plus branching probabilities; often
    described as a DTMC (discrete-time Markov chain),
    which can also be seen as the behavior of an
    individual program
  • All this information should be collected from
    actual measurements (if possible) followed by
    statistical inference

7
Software Reliability
  • Black-box (measurements & statistical inference)
    vs. Architecture-based approach (models)
  • Black-box approaches treat software as a
    monolithic whole, considering only its
    interactions with external environment, without
    an attempt to model its internal structure
  • With growing emphasis on reuse, software
    development process moves toward component-based
    software design
  • White-box approach may be better to analyze a
    system with many software components and how they
    fit together

8
Software Architecture
  • Software behavior with respect to the manner in
    which different components interact
  • May include the information about the execution
    time of each component
  • Use control flow graph to represent architecture
  • Sequential program architecture modeled by
  • Discrete Time Markov Chain (DTMC)
  • Continuous Time Markov Chain (CTMC)
  • Semi-Markov process (SMP)

9
Failure Behavior of Components and Interfaces
  • Failure can happen
  • during the execution of any component or
  • during the transfer of control between components
  • Failure behavior can be specified in terms of
  • reliability
  • constant failure rate
  • time-dependent failure intensity

10
System Reliability/Availability
  • Faultload: fault types, fault arrivals,
    repair/recovery procedures and delay time
    distributions
  • Hardware architecture and software architecture
  • Minimum Resource Requirements
  • Dynamic failures
  • Performance/Reliability interdependence
  • Measures: Reliability, Availability, MTTF,
    Downtime
  • Low-level (physics of failures, chip level)
  • System-level (CPU-I/O, multiprocessing: ch. 8, 9)
  • Software and Hardware combined together
  • Network-level
  • Measurements or models (simulation or analytic)
  • Analytic models: RBD, FTREE, CTMC, SPN

11
Definition of Reliability
  • Recommendation E.800 of the International
    Telecommunication Union (ITU-T) defines
    reliability as follows:
  • "The ability of an item to perform a required
    function under given conditions for a given time
    interval."
  • In this definition, an item may be a circuit
    board, a component on a circuit board, a module
    consisting of several circuit boards, a base
    transceiver station with several modules, a
    fiber-optic transport-system, or a mobile
    switching center (MSC) and all its subtending
    network elements. The definition includes systems
    with software.

12
Definition of Availability
  • Availability is closely related to reliability,
    and is also defined in ITU-T Recommendation E.800
    as follows:
  • "The ability of an item to be in a state to
    perform a required function at a given instant of
    time or at any instant of time within a given
    time interval, assuming that the external
    resources, if required, are provided."
  • An important difference between reliability and
    availability is that reliability refers to
    failure-free operation during an interval, while
    availability refers to failure-free operation at
    a given instant of time, usually the time when a
    device or system is first accessed to provide a
    required function or service.

13
High Reliability/Availability/Safety
  • Traditional applications
  • (long-life/life-critical/safety-critical)
  • Space missions, aircraft control, defense,
    nuclear systems
  • New applications
  • (non-life-critical/non-safety-critical,
    business critical)
  • Banking, airline reservation, e-commerce
    applications, web-hosting, telecommunication
  • Scientific applications
  • (non-critical)

14
Motivation: High Availability
  • Scott McNealy, Sun Microsystems Inc.
  • "We're paying people for uptime. The only thing
    that really matters is uptime, uptime, uptime,
    uptime and uptime. I want to get it down to a
    handful of times you might want to bring a Sun
    computer down in a year. I'm spending all my time
    with employees to get this design goal."
  • Sun Microsystems: SunUP & RASCAL program for
    high availability
  • Motorola: 5NINES Initiative
  • HP, Cisco, Oracle, SAP: 5nines5minutes Alliance
  • IBM: Cornhusker clustering technology for
    high availability, eLiza, autonomic computing
  • Microsoft: Trustworthy Computing initiative
  • John Hennessy in IEEE Computer
  • Microsoft: regular full-page ad on 99.999%
    availability in USA Today
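
The "five nines" and "5nines5minutes" figures above follow from simple arithmetic. A minimal sketch, assuming a 365-day year and treating the quoted percentage as a steady-state availability; the function name is illustrative, not from the slides:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def downtime_minutes_per_year(availability: float) -> float:
    """Expected downtime per year implied by a steady-state availability."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must lie in [0, 1]")
    return (1.0 - availability) * MINUTES_PER_YEAR


for label, a in [("three nines", 0.999), ("four nines", 0.9999), ("five nines", 0.99999)]:
    print(f"{label}: {downtime_minutes_per_year(a):.2f} minutes/year")
```

Under these assumptions, 99.999% availability corresponds to roughly five minutes of downtime per year, which is presumably where the alliance name comes from.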

15
Motivation: High Availability
16
Need for a new term
  • Reliability is used in a generic sense
  • Reliability used as a precisely defined
    mathematical function
  • To remove confusion, IFIP WG 10.4 has proposed
    Dependability as an umbrella term

17
Dependability: Umbrella term
Trustworthiness of a computer system such that
reliance can justifiably be placed on the service
it delivers
18
IFIP WG10.4
  • Failure occurs when the delivered service no
    longer complies with the specification
  • Error is that part of the system state which is
    liable to lead to subsequent failure
  • Fault is the adjudged or hypothesized cause of an
    error

Faults are the cause of errors that may lead to
failures
Fault → Error → Failure
19
Dependability: Reliability, Availability, Safety,
Security
  • Redundancy: Hardware (Static, Dynamic),
    Information, Time, Software
  • Fault Types: Permanent (needs repair or
    replacement), Intermittent (reboot/restart or
    replacement), Transient (retry), Design
    (Bohrbugs, Heisenbugs, aging-related bugs)
  • Fault Detection, Automated Reconfiguration
  • Imperfect Coverage
  • Maintenance: scheduled, unscheduled

20
Software Fault Classification
  • Bohrbugs: many software bugs are reproducible,
    easily found and fixed during the testing and
    debugging phase
  • Heisenbugs: other bugs that are hard to find and
    fix remain in the software during the operational
    phase
  • These bugs may never be fixed, but if the
    operation is retried or the system is rebooted,
    the bugs may not manifest themselves as failures
  • Their manifestation is non-deterministic and
    dependent on the software reaching very rare
    states
21
Software Fault Classification
22
Failure Classification (Cristian)
  • Failures
  • Omission failures (Send/receive failures)
  • Crash failures
  • Infinite loop
  • Timing failures
  • Early
  • Late (performance or dynamic failures)
  • Response failures
  • Value failures
  • State-transition failures

23
Security
  • Security intrusions cause a system to fail
  • Security Failure
  • Integrity: Destruction/Unauthorized modification
    of information
  • Confidentiality: Theft of information
  • Availability: e.g., Denial of Service (DoS)
  • Similarity (as well as differences) between
  • Malicious vs. accidental faults
  • Security vs. reliability/availability
  • Intrusion tolerance vs. fault tolerance

24
The Need of Performability Modeling
  • New technologies, services & standards need new
    modeling methodologies
  • Pure performance modeling: too optimistic!
  • Outage-and-recovery behavior not considered
  • Pure dependability modeling: too conservative!
  • Different levels of performance not considered

25
"-ilities" besides performance
Performability: measures of the system's ability to
perform designated functions
R.A.S.-ability concerns grow. High R.A.S. is not
only a selling point for equipment vendors and
service providers: the regulatory outage reports
required by the FCC for public switched telephone
networks (PSTN) may soon apply to wireless.
26
Evaluation vs. Optimization
  • Evaluation of system for desired measures given a
    set of parameters
  • Sensitivity Analysis
  • Bottleneck analysis
  • Reliability importance
  • Optimization
  • Static: linear, integer, geometric, nonlinear,
    multi-objective; constrained or unconstrained
  • Dynamic: dynamic programming, Markov decision
    process, semi-Markov decision process

27
PURPOSE OF EVALUATION
  • Understanding a system
  • Observation
  • Operational environment
  • Controlled environment
  • Reasoning
  • A model is a convenient abstraction
  • Predicting behavior of a system
  • Need a model
  • Accuracy based on degree of extrapolation

28
PURPOSE OF EVALUATION(Continued)
  • These famous quotes bring out the difficulty of
    prediction based on models
  • "All models are wrong; some models are useful"
    (George Box)
  • "Prediction is fine as long as it is not about
    the future" (Mark Twain)

29
Basic Definitions
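  • Reliability: $R(t) = P(X > t) = 1 - F(t)$
  • X: time to failure of a system
  • F(t): distribution function of system lifetime
  • Mean Time To system Failure:
    $\mathrm{MTTF} = E[X] = \int_0^\infty t\,f(t)\,dt = \int_0^\infty R(t)\,dt$
  • f(t): density function of system lifetime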
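
As a numerical illustration of these definitions, the sketch below assumes an exponentially distributed lifetime with a hypothetical failure rate `lam` (not a value from the slides) and checks that integrating R(t) recovers MTTF = 1/lam. The helper names are illustrative.

```python
import math


def reliability_exponential(t: float, lam: float) -> float:
    """R(t) = P(X > t) = 1 - F(t) for an exponentially distributed lifetime."""
    return math.exp(-lam * t)


def mttf_numeric(reliability, upper: float, steps: int = 100_000) -> float:
    """Approximate MTTF as the integral of R(t) over [0, upper] (trapezoidal rule)."""
    h = upper / steps
    total = 0.5 * (reliability(0.0) + reliability(upper))
    for k in range(1, steps):
        total += reliability(k * h)
    return total * h


lam = 0.001  # assumed failure rate, per hour
approx = mttf_numeric(lambda t: reliability_exponential(t, lam), upper=20_000.0)
print(f"numeric MTTF = {approx:.3f} h, closed form 1/lam = {1.0 / lam:.3f} h")
```

The upper integration limit truncates the tail of R(t), so it has to be chosen large relative to 1/lam.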

30
Availability (Continued)
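  • Instantaneous (point) Availability A(t)
  • $A(t) = P(\text{system working at } t)$
  • Let H(t) be the convolution of F and G, where
    G(t) is the distribution function of system
    repair time
  • g(t): density function of system repair time
  • Then $A(t) = R(t) + \int_0^t A(t - x)\,dH(x)$
  • Instantaneous availability is never less than
    reliability: $A(t) \ge R(t)$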

31
Availability
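  • System working at time t, in one of two mutually
    exclusive ways:
  • Never failed in (0, t), prob. $R(t)$
  • First failed and got repaired at time x < t, and
    UP at end of interval (x, t), prob.
    $dH(x)\,A(t - x)$
[Figure: time axis from 0 to t, with the first repair
completed in the interval (x, x + dx)]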
32
Availability (Continued)
  • MTTR: Mean Time To Repair, $\mathrm{MTTR} = E[Y]$
  • Y: repair period of the system
  • Availability and Reliability are related but
    different!

33
Availability (Continued)
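  • Steady-State Availability:
    $A_{ss} = \lim_{t \to \infty} A(t)$
  • We can show that for systems without redundancy
    $A_{ss} = \mathrm{MTTF} / (\mathrm{MTTF} + \mathrm{MTTR})$
  • For a system with redundancy
    $A_{ss} = \mathrm{MTTF}_{eq} / (\mathrm{MTTF}_{eq} + \mathrm{MTTR}_{eq})$
  • where $\mathrm{MTTF}_{eq}$ and $\mathrm{MTTR}_{eq}$ must be
    carefully defined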
  • Also
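
A small sketch of the steady-state formula for a system without redundancy. The inputs borrow the 10-day process MTTF and 30-second mean restart time quoted later in the deck, purely as illustrative numbers; treating the restart time as the whole repair period is an assumption that ignores detection delay.

```python
def steady_state_availability(mttf: float, mttr: float) -> float:
    """A_ss = MTTF / (MTTF + MTTR); both arguments in the same time unit."""
    if mttf <= 0 or mttr < 0:
        raise ValueError("mttf must be positive and mttr non-negative")
    return mttf / (mttf + mttr)


mttf_seconds = 10 * 24 * 3600  # 10 days
mttr_seconds = 30              # 30 seconds
a_ss = steady_state_availability(mttf_seconds, mttr_seconds)
print(f"A_ss = {a_ss:.7f}")
```

Only the ratio MTTR/MTTF matters here: halving the repair time buys the same availability as doubling the time to failure.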

34
MEASURES TO BE EVALUATED
  • Dependability
  • Reliability R(t), System MTTF
  • Availability Steady-state, Transient
  • Downtime
  • Performance
  • Throughput, Blocking Probability, Response Time

"Does it work, and for how long?"
"Given that it works, how well does it work?"
35
MEASURES TO BE EVALUATED (Continued)
  • Composite Performance and Dependability
  • Need Techniques and Tools That Can Evaluate
  • Performance, Dependability and Their Combinations

"How much work will be done (lost) in a given
interval, including the effects of
failure/repair/contention?"
36
Methods of EVALUATION
  • Measurement-Based
  • Most believable, most expensive
  • Not always possible or cost effective during
    system design
  • Statistical techniques are very important here
  • Model-Based

37
Methods of EVALUATION(Continued)
  • Model-Based
  • Less believable, Less expensive
  • 1. Discrete-Event Simulation vs. Analytic
  • 2. State-Space Methods vs. Non-State-Space
    Methods
  • 3. Hybrid: Simulation + Analytic (SPNP)
  • 4. State Space + Non-State Space (SHARPE)

38
Methods of EVALUATION(Continued)
  • Measurements + Models
  • Vaidyanathan et al., ISSRE 99

39
QUANTITATIVE EVALUATION TAXONOMY
Closed-form solution
Numerical solution using a tool
40
Note that
  • Both measurements & simulations imply statistical
    analysis of outputs (ch. 10, 11)
  • Statistical inference
  • Hypothesis testing
  • Design of experiments
  • Analysis of variance
  • Regression (linear, nonlinear)
  • Distribution driven simulation requires
    generation of random deviates (variates) (ch. 3,
    4, 5)
  • Probability and Statistics are different yet
    highly related
  • Probability models need inputs that generally
    come from measurement data (followed by
    statistical inference)
  • Statistics in turn uses probability theory

41
MODELING THROUGHOUT SYSTEM LIFECYCLE
  • System Specification/Design Phase
  • Answer "What-if" Questions
  • Compare design alternatives (Bedrock, Wireless
    handoff)
  • Performance-Dependability Trade-offs (Wireless
    Handoff)
  • Design Optimization (optimizing the number of
    guard channels)

42
MODELING THROUGHOUT SYSTEM LIFECYCLE (Continued)
  • Design Verification Phase
  • Use Measurements + Models
  • E.g., Fault Injection + Availability Model
  • Union Switch and Signals, Boeing, Draper
  • Configuration Selection Phase: DEC, HP
  • System Operational Phase: IDEN Project
  • Workload based adaptive rejuvenation
  • It is fun!

43
MODELING TAXONOMY
44
MODELER'S DILEMMA
  • Should I Use Discrete-Event Simulation?
  • Point Estimates and Confidence Intervals
  • How many simulation runs are sufficient?
  • What Specification Language to use?
  • C, SIMULA, SIMSCRIPT, MODSIM, GPSS, RESQ, SPNP
    v6, Bones, SES workbench

45
MODELER'S DILEMMA (Continued)
  • Simulation
  • + Detailed system behavior, including
    non-exponential distributions or non-Poisson
    processes
  • + Performance, Availability and Performability
    Modeling Possible
  • - Long Execution Time (Variance Reduction
    Possible)
  • Importance sampling, importance splitting,
    regenerative simulation
  • Parallel and Distributed Simulation
  • - Many users in practice do not realize the need
    to calculate confidence intervals
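
To make the point about confidence intervals concrete, here is a minimal sketch that reports a point estimate together with an approximate 95% interval from independent replications. The simulated system (two exponential components in series, each with an assumed rate of 0.5) and the normal-approximation factor 1.96 are illustrative choices, not taken from the slides.

```python
import math
import random
import statistics


def mean_confidence_interval(samples, z: float = 1.96):
    """Return (mean, half-width) of an approximate 95% CI (normal approximation)."""
    n = len(samples)
    mean = statistics.fmean(samples)
    half_width = z * statistics.stdev(samples) / math.sqrt(n)
    return mean, half_width


def one_replication(rng: random.Random, lam: float = 0.5) -> float:
    """Hypothetical replication: lifetime of a two-component series system."""
    return min(rng.expovariate(lam), rng.expovariate(lam))


rng = random.Random(12345)  # fixed seed so the run is repeatable
runs = [one_replication(rng) for _ in range(10_000)]
mean, half_width = mean_confidence_interval(runs)
print(f"estimated MTTF = {mean:.3f} +/- {half_width:.3f}")
```

The half-width shrinks only with the square root of the number of replications, which is one way to approach the "how many runs are sufficient?" question: keep adding runs until the interval is narrow enough for the decision at hand.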

46
MODELER'S DILEMMA (Continued)
Should I Use Non-State-Space Methods?
  • Model Solved Without Generating State Space
  • Also Known as Combinatorial Models
  • Use Order Statistics, Mixing, Convolution
  • Common Dependability Model Types
  • also called Combinatorial Models
  • Series-Parallel Reliability Block Diagrams (RBD)
  • Non-Series-Parallel Block Diagrams (or
    Reliability Graphs)
  • Fault-Trees Without Repeated Events
  • Fault-Trees With Repeated Events

47
RBD example
48
RELIABILITY GRAPH Example
49
Fault tree without repeated events
50
FAULT TREE WITH REPEATED EVENTS
EXAMPLE
52
Combinatorial Models
  • These techniques are easy to use and solve for
  • Mincuts
  • System Availability (steady-state, inst.)
  • Downtime in minutes/year
  • System Reliability, System MTTF
  • Each component can have attached to it
  • A probability of failure
  • A failure rate
  • A distribution of time to failure
  • A failure rate and a repair rate
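
When each block carries a probability of working and the blocks are independent, a series-parallel block diagram reduces to products. A minimal sketch with hypothetical component values (a CPU in series with two redundant disks):

```python
from math import prod


def series(*reliabilities: float) -> float:
    """All blocks must work: multiply the block reliabilities."""
    return prod(reliabilities)


def parallel(*reliabilities: float) -> float:
    """At least one block must work: one minus the product of unreliabilities."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


r_cpu, r_disk = 0.99, 0.95  # assumed values, for illustration only
r_system = series(r_cpu, parallel(r_disk, r_disk))
print(f"system reliability = {r_system:.6f}")
```

The same two functions work unchanged on steady-state availabilities, provided each component is repaired independently of the others.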

53
Combinatorial Modeling (Continued)
  • These models can be solved using fast algorithms
    assuming stochastic independence between system
    components. Systems with several hundred
    components can be handled.
  • For series-parallel RBDs & fault trees without
    repeated events
  • Series-parallel composition algorithms
  • For fault trees with repeated events and
    reliability graphs
  • Factoring (conditioning) algorithms
  • Sum of disjoint products (SDP) algorithms after
    first finding all mincuts
  • Binary decision diagrams (BDD) algorithms
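
The factoring (conditioning) idea can be sketched on the classic five-edge bridge, a reliability graph that is not series-parallel. The topology below is an assumption made for illustration (source to A via a, source to B via b, A to sink via c, B to sink via d, and a bidirectional bridge edge e between A and B); conditioning on e leaves two series-parallel subproblems. A brute-force enumeration over all 32 edge states is included as a cross-check.

```python
from itertools import product


def _parallel(x: float, y: float) -> float:
    return 1.0 - (1.0 - x) * (1.0 - y)


def bridge_by_factoring(pa, pb, pc, pd, pe):
    """Condition on the bridge edge e and combine the two conditional results."""
    e_up = _parallel(pa, pb) * _parallel(pc, pd)  # e works: nodes A and B merge
    e_down = _parallel(pa * pc, pb * pd)          # e failed: two independent paths
    return pe * e_up + (1.0 - pe) * e_down


def bridge_by_enumeration(pa, pb, pc, pd, pe):
    """Sum the probabilities of all edge states in which source reaches sink."""
    probs = (pa, pb, pc, pd, pe)
    total = 0.0
    for state in product((0, 1), repeat=5):
        a, b, c, d, e = state
        if (a and c) or (b and d) or (a and e and d) or (b and e and c):
            weight = 1.0
            for up, p in zip(state, probs):
                weight *= p if up else 1.0 - p
            total += weight
    return total


print(bridge_by_factoring(0.9, 0.9, 0.9, 0.9, 0.9))
```

Enumeration grows as 2^n in the number of edges, which is why factoring, SDP and BDD methods are preferred for larger graphs.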

54
Combinatorial Modeling (Continued)
  • + Easy specification, fast computation, no
    distributional assumption
  • + Can easily solve models with 100s of
    components
  • - Failure/repair dependencies are often present;
    RBDs and FTREEs cannot easily handle these
    (e.g., shared repair, warm/cold spares, imperfect
    coverage, non-zero switching time, travel time of
    repair person, reliability with repair)

55
COMBINATORIAL MODELING TAXONOMY
SP reliability block diagrams
Non-SP reliability block diagrams
56
Markov chain
  • To model more complicated interactions between
    components, use other kinds of models like Markov
    chains or more generally state space models.
  • Many examples of dependencies among system
    components have been observed in practice and
    captured by Markov models.

57
State-Space-Based Models
  • States and labeled state transitions
  • State can keep track of
  • Number of functioning resources of each type
  • States of recovery for each failed resource
  • Number of tasks of each type waiting at each
    resource
  • Allocation of resources to tasks
  • A transition
  • Can occur from any state to any other state
  • Can represent a simple or a compound event

58
State-Space-Based Models (Continued)
  • Transitions between states represent the change
    of the system state due to the occurrence of an
    event
  • Drawn as a directed graph
  • Transition label
  • Probability: homogeneous discrete-time Markov
    chain (DTMC)
  • Rate: homogeneous continuous-time Markov chain
    (CTMC)
  • Time-dependent rate: non-homogeneous CTMC
  • Distribution function: semi-Markov process (SMP)
  • Two distribution functions: Markov regenerative
    process (MRGP)

59
MODELER'S DILEMMA (Continued)
  • Should I Use Markov Models?
  • State-Space-Based Methods
  • Model Fault-Tolerance and Recovery/Repair
  • Combined Modeling of hardware and software
  • Model Dependencies
  • Model Contention for Resources
  • Model Concurrency and Timeliness

60
Condition-Based Maintenance
  • Failure model is stage type with k stages
  • Inspection carried out randomly to determine
    degradation stage
  • Determine optimal inspection interval
  • Many extensions to this model are available

61
Condition-Based Maintenance
Availability
  • Mean time between inspections

62
Webserver Availability Model with Warm Replication
  • Two nodes for hardware redundancy
  • Each node has a copy of the webserver (software
    redundancy: replication)
  • Primary node can fail
  • Secondary node can fail
  • Primary process can fail
  • Secondary process can fail
  • Failures may have imperfect coverage
  • Time delay for fault detection
  • Model of a real system developed at Avaya Labs
  • Both hardware & software faults included

63
Markov Model with Software and Hardware Faults
"Performance and Reliability Evaluation of
Passive Replication Schemes in Application Level
Fault-Tolerance," S. Garg, Y. Huang, C. Kintala,
K. S. Trivedi and S. Yagnik, Proc. of the 29th
Intl. Symp. on Fault-Tolerant Computing, FTCS-29,
June 1999.
64
Parameters
  • Process MTTF: 10 days ($1/\lambda_p$)
  • Node MTTF: 20 days ($1/\lambda_n$)
  • Process polling interval: 2 seconds ($1/\delta_p$)
  • Mean process restart time: 30 seconds ($1/\mu_p$)
  • Mean process failover time: 2 minutes ($1/\mu_n$)
  • Switching time with mean $1/\mu_s$
  • c = 0.95

65
Solution for Warm replication
66
MULTIPROCESSOR AVAILABILITY MODEL
  • n processors, at least 1 needed for system to be
    UP
  • Each processor fails at rate $\gamma$
  • Each processor is repaired at rate $\tau$
  • Coverage probability c
  • Average reconfiguration delay after a covered
    failure: $1/\delta$
  • Average reboot delay after an uncovered failure:
    $1/\beta$
  • Not possible to capture these realistic aspects
    in a combinatorial model
  • Model System Availability Using a Markov Chain
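
For n = 2 the Markov chain is small enough to solve by hand from the balance equations. The sketch below is one reading of the model, not the author's exact chain: the symbol names `gamma` (failure rate), `tau` (repair rate), `delta` (reconfiguration rate) and `beta` (reboot rate) are assumed, reconfiguration and reboot states are counted as down, a failure with one processor left leads straight to the all-down state, and repair restores one processor at a time. The numeric parameter values are hypothetical.

```python
def multiprocessor_availability(gamma, tau, delta, beta, c):
    """Steady-state availability of a 2-processor system with imperfect coverage.

    States: 2 (both up), x (reconfiguring after a covered failure),
    y (rebooting after an uncovered failure), 1 (one up), 0 (none up).
    Unnormalized weights come from cut-based balance equations.
    """
    w2 = 1.0
    wx = 2 * gamma * c / delta        # delta * pi_x = 2 * gamma * c * pi_2
    wy = 2 * gamma * (1 - c) / beta   # beta * pi_y = 2 * gamma * (1 - c) * pi_2
    w1 = 2 * gamma / tau              # tau * pi_1 = 2 * gamma * pi_2
    w0 = gamma * w1 / tau             # tau * pi_0 = gamma * pi_1
    total = w2 + wx + wy + w1 + w0
    return (w2 + w1) / total          # up states are 2 and 1


# Assumed rates per hour: MTTF 6000 h, repair 1 h, reconfiguration 10 s, reboot 10 min.
for c in (0.90, 0.95, 0.99, 1.00):
    a = multiprocessor_availability(1 / 6000, 1.0, 360.0, 6.0, c)
    print(f"c = {c:.2f}: downtime = {(1 - a) * 525_600:.1f} minutes/year")
```

Running the loop shows how strongly the downtime depends on the coverage probability when reboots are much slower than reconfigurations.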

67
MULTIPROCESSOR AVAILABILITY MODEL
[Figure: Markov chain for the multiprocessor model,
with states n, n-1, n-2, ..., 1, 0 and intermediate
states Dn, Dn-1, ... and Bn, Bn-1, ...]
70
LESSONS
  • To Realize Availability Benefits of
    Multiprocessing
  • Coverage Must be Near-Perfect
  • Reconfiguration Delay Must be Very Small
  • Must Consider Different Levels of (Degradable)
    Performance

71
Markov Reward Models (MRMs)
  • Modeling any system with a pure reliability /
    availability model can lead to incomplete, or, at
    least, less precise results.
  • Gracefully degrading systems may be able to
    survive the failure of one or more of their
    active components and continue to provide service
    at a reduced level.
  • The Markov reward model is a commonly used
    technique for modeling gracefully degradable
    systems

72
Markov Reward Models (MRMs)
  • Continuous Time Markov Chains are useful models
    for performance as well as availability
    prediction
  • Extension of CTMCs to Markov reward models makes
    them even more useful
  • Attach a reward rate $r_i$ to state i of the CTMC
  • X(t) is the instantaneous reward rate of the CTMC

73
Markov Reward Models (MRMs) (Continued)
  • Expected instantaneous reward rate at time t:
    $E[X(t)] = \sum_i r_i\,\pi_i(t)$
  • this generalizes instantaneous availability
  • where $\pi_i(t)$ is the prob. that the Markov
    chain is in state i at time t
  • Expected steady-state reward rate:
    $E[X] = \sum_i r_i\,\pi_i$
  • this generalizes steady-state availability
  • where $\pi_i$ is the prob. that the Markov chain
    is in state i in steady-state
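
The expected steady-state reward rate is a probability-weighted sum over states. The sketch below uses a hypothetical three-state chain (two, one, or zero processors up) with made-up probabilities, and shows how different reward assignments turn the same solution into different measures.

```python
def expected_reward_rate(probabilities, reward_rates):
    """Sum of r_i * pi_i over all states i."""
    if len(probabilities) != len(reward_rates):
        raise ValueError("one reward rate is needed per state")
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("state probabilities must sum to 1")
    return sum(p * r for p, r in zip(probabilities, reward_rates))


pi = [0.90, 0.08, 0.02]       # assumed probabilities of states 2, 1, 0
binary_rewards = [1, 1, 0]    # 1 in up states: yields availability
capacity_rewards = [2, 1, 0]  # processors up: yields expected capacity

print("availability     :", expected_reward_rate(pi, binary_rewards))
print("expected capacity:", expected_reward_rate(pi, capacity_rewards))
```

With 0/1 rewards the measure collapses to plain availability, which is the sense in which reward models generalize availability models.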

74
Performance model
  • Use a Finite Buffer Queuing Model To Determine
    The Prob. Task is Rejected Due to Buffer Full
  • Task arrival rate $\lambda$, task service rate
    $\mu$
  • Number of buffers b
  • Buffer-full prob. $q_b(i)$ with i processors
  • Results from the lower level performance model
    used to assign reward rates to the upper level
    availability model
  • Queuing model
  • M/M/i/b

[Figure: M/M/i/b queue with b buffers feeding
processors 1, ..., i]
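
The buffer-full probability of an M/M/i/b queue has a closed form. In the sketch below `lam` and `mu` stand for the task arrival and service rates, and `capacity` is assumed to count every task in the system, including those in service, so it must be at least the number of servers; whether the slides count buffers this way is not stated.

```python
from math import factorial


def blocking_probability(lam: float, mu: float, servers: int, capacity: int) -> float:
    """Probability that an arriving task finds an M/M/servers/capacity queue full."""
    if servers < 1 or capacity < servers:
        raise ValueError("need servers >= 1 and capacity >= servers")
    rho = lam / mu
    weights = []
    for n in range(capacity + 1):
        if n <= servers:
            weights.append(rho ** n / factorial(n))
        else:
            weights.append(rho ** n / (factorial(servers) * servers ** (n - servers)))
    return weights[-1] / sum(weights)


# Hypothetical load: 100 tasks/s arriving, 60 tasks/s served per processor, 10 buffers.
for i in (1, 2, 3):
    print(f"q_b({i}) = {blocking_probability(100.0, 60.0, i, 10):.6f}")
```

Evaluating this once per number of working processors gives the values that the upper-level availability model can use as reward rates.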
75
TOTAL BLOCKING PROBABILITY
  • $r_i = 1$ if i is a down state
  • $r_i = q_b(i)$ if i is an up state

76
TOTAL BLOCKING PROBABILITY
78
MODELER'S DILEMMA (Continued)
  • Should I Use Markov Models?
  • Generalize to Markov Reward Models for Modeling
    Degradable Performance
  • Generalize to Markov Regenerative Models for
    Allowing Generally Distributed Event Times
  • Generalize to Non-Homogeneous Markov Chains for
    Allowing Weibull Failure Distributions
  • + Performance, Availability and Performability
    Modeling Possible
  • - Large (Exponential) State Space

79
State Space Modeling Taxonomy
State space methods
  • Markovian modeling: discrete-time Markov chains,
    continuous-time Markov chains, Markov reward
    models
  • Non-Markovian modeling: semi-Markov models,
    Markov regenerative models, non-homogeneous
    Markov
80
State Space Explosion
  • State space explosion can be handled in two ways
  • Largeness tolerance
  • Model specification: use a more concise (and
    smaller) model specification (GSPN and SRN
    models)
  • Automatically generate & solve the underlying
    Markov (reward) model
  • Largeness avoidance
  • Hierarchical model composition & fixed-point
    iteration: combine results from different
    kinds of models
  • Possible to use state-space methods for those
    parts of a system that require them, and use
    non-state-space methods for the more well-behaved
    parts of the system
  • State Truncation

81
LARGENESS TOLERANCE
  • The Markov chains tend to be large and complex,
    leading to a
  • Model generation problem
  • Use automated means of generating the Markov
    chains: Stochastic Petri Nets, Stochastic Reward
    Nets

82
LARGENESS TOLERANCE(Continued)
  • Model solution problem
  • Use sparse storage for the matrices
  • Use sparsity preserving solution methods
  • Successive Overrelaxation,
  • Gauss-Seidel,
  • Uniformization,
  • ODE-solution methods
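
Of the methods listed, uniformization is compact enough to sketch. The version below uses dense lists for readability (a real tool would use sparse storage) and is only suitable when the product of the uniformization rate and t is modest, since the leading Poisson term exp(-q t) underflows otherwise. The two-state up/down chain and its rates are assumptions chosen so that the result can be compared with the known closed form.

```python
import math


def uniformization(Q, p0, t, eps=1e-12, max_terms=10_000):
    """Transient state probabilities p(t) of a CTMC with generator matrix Q."""
    n = len(Q)
    q = 1.02 * max(-Q[i][i] for i in range(n))  # uniformization rate
    # DTMC obtained by uniformizing the CTMC: P = I + Q / q
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / q for j in range(n)] for i in range(n)]
    term = list(p0)                # p0 * P^k, starting with k = 0
    weight = math.exp(-q * t)      # Poisson(k; q t), starting with k = 0
    result = [weight * x for x in term]
    accumulated = weight
    for k in range(1, max_terms + 1):
        if 1.0 - accumulated <= eps:  # remaining Poisson mass bounds the error
            break
        term = [sum(term[i] * P[i][j] for i in range(n)) for j in range(n)]
        weight *= q * t / k
        accumulated += weight
        result = [r + weight * x for r, x in zip(result, term)]
    return result


lam, mu, t = 0.01, 1.0, 2.0        # assumed failure rate, repair rate, time
Q = [[-lam, lam], [mu, -mu]]       # state 0 = up, state 1 = down
probs = uniformization(Q, [1.0, 0.0], t)
closed_form = mu / (lam + mu) + lam / (lam + mu) * math.exp(-(lam + mu) * t)
print(probs[0], closed_form)
```

Every operation is a vector-matrix product with a stochastic matrix, so sparsity is preserved and no subtraction of nearly equal numbers occurs, which is part of the method's appeal.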

83
Stochastic Petri Net (SPN)
  • Introduced in 1980s by Natkin, Florin, Molloy,
    Ajmone Marsan, Balbo, Conte, Bobbio, Trivedi,
    others
  • A modeling formalism for the automated generation
    and solution of Markovian stochastic systems
  • Many extensions to the original formalism: GSPN,
    SRN, DSPN, MRSPN, FSPN

84
GSPN Model for Multiprocessor
GSPN Model of a Multiprocessor; note that the
GSPN is the same for all n
85
ERG for Multiprocessor Model (n = 2)
[Figure: Extended Reachability Graph for the
multiprocessor model, with markings (2,0,0,0,0),
(1,1,0,0,0), (1,0,1,0,0), (1,0,0,1,0), (1,0,0,0,1),
(0,1,0,0,1), (0,0,0,0,2) and transitions Tfail, tcov,
Tuncov, tquick, Trep, Treboot, Trecon]
[Figure: Reduced ERG (Markov chain) for the
multiprocessor model, with states (2,0,0,0,0),
(1,0,1,0,0), (1,0,0,1,0), (1,0,0,0,1), (0,0,0,0,2);
the two arcs leaving (2,0,0,0,0) carry rates
proportional to c and (1-c)]
86
Stochastic Reward Net (SRN)
  • Introduced by Ciardo, Muppala and Trivedi, 1989
  • Structural characteristics
  • Extensive Marking dependency allowed for firing
    rates and firing probabilities
  • Transition Priorities
  • Guards (Enabling functions) for Transitions
  • Variable cardinality arcs

87
Stochastic Reward Net (SRN)
  • Stochastic characteristics
  • Allow definition of reward rates in terms of net
    level entities
  • Automatically generate the reward rates for the
    markings
  • Enables computation of required measures of
    interest

88
Example Reward Rates for Multiprocessor
Availability
  • Reward rate at the net level for steady state
    availability
  • Reward rate at the CTMC level for steady-state
    availability (n = 2)

89
Analysis Procedure of SRN
Stochastic Reward Nets
→ Reachability Analysis
→ Extended Reachability Graphs
→ Eliminate vanishing markings
→ Markov Reward Model
→ Solve MRM (transient or steady-state)
→ Measures of Interest
90
LARGENESS AVOIDANCE
  • Non-State-Space methods
  • Reliability block diagrams
  • Fault-trees
  • Product-Form Queuing Networks
  • Approximate solutions
  • State Truncation
  • SAVE, SPNP (Kantz and Trivedi, PNPM91)

91
Case Study JPL REE System Availability Modeling
in Spacecraft Architecture
92
LARGENESS AVOIDANCE (Cont.)
  • Stochastic Petri Nets (State-space-based
    modeling)
  • State truncation by introducing a guard function
  • Guard g is defined as
  • if (?mark(_dn) > K)
  • return (0)
  • else
  • return (1)

93
SPN MODELING
94
AVAILABILITY MEASURES
95
LARGENESS AVOIDANCE (Continued)
  • Approximate solutions
  • Hierarchical Decomposition
  • and Fixed-Point Iteration among submodels
  • Heidelberger and Trivedi, IEEE-TC, 1983
    (Queueing Models)
  • Ciardo and Trivedi, PNPM91 (SPN Models)
  • Tomek and Trivedi (Availability Models)
  • Lanus, Liang & Trivedi (Bedrock)
  • Wireless handoff work: Ma, Han & Trivedi

96
Hierarchical example
  • Blocks colored red are expanded into submodels 

97
LARGENESS AVOIDANCE (Continued)
  • Approximate solutions
  • Performability
  • Multiprocessor example
  • Fluid Approximation
  • Mitra Kulkarni Ciardo Nicol, and Trivedi
  • FSPN

98
Summary- Modeling Techniques
  • Combinatorial techniques like RBDs and FTREEs are
    easy to represent and solve
  • Combinatorial models cannot represent intricate
    dependencies
  • State space based models like Markov chains can
    handle dependencies
  • State space explosion problem
  • Use automated generation methods: stochastic
    Petri nets
  • Hierarchical models

99
IN ORDER TO FULFILL OUR GOALS
  • Modeling Performance, Availability and
    Performability
  • Modeling Complex Systems
  • We Need
  • Automatic Generation and Solution of Large Markov
    Reward Models

100
IN ORDER TO FULFILL OUR GOALS (Continued)
  • Facility for State Truncation, Hierarchical
    composition of Non-State-Space and State-Space
    Models, Fixed-Point Iteration
  • There are Two Tools that Potentially meet these
    Goals
  • Stochastic Petri Net Package (SPNP)
  • Symbolic Hierarchical Automated Reliability and
    Performance Evaluator (SHARPE)

101
Model-based Availability evaluation
  • Choice of the model type is dictated by
  • Measures of interest
  • Level of detailed system behavior to be
    represented
  • Ease of model specification and solution
  • Representation power of the model type
  • Access to suitable tools or toolkits

102
SPNP Software Package
103
SPNP
  • Installed at over 250 sites: companies &
    universities
  • Ported to Most Architectures and Operating
    Systems
  • Used For Performance, Dependability and
    Performability
  • Steady-State as well as Transient Analysis
  • Analytic-numeric methods for Markovian models.
  • Simulation for non-Markovian and fluid models
  • Written in C Language
  • GUI now available

104
SOME INDUSTRIAL USES
  • HP
  • Cluster Availability Modeling
  • Server Availability
  • Mass Storage Arrays Availability Modeling
  • MOTOROLA
  • Recovery strategies in wireless handoff
  • proposed and modeled several strategies
  • Fixed-point iteration used
  • Software rejuvenation in CMTS
  • IBM
  • Software rejuvenation for a cluster system
  • Boeing, EMC,

105
DISCRETE EVENT SIMULATION ANALYSIS
  • Can be used for
  • Markovian SRN
  • non-Markovian SRN
  • Fluid SPN
  • FSPN (Fluid Stochastic Petri net)
  • Used as a model for
  • Systems involving fluid variables
  • Approx. of models with a large number of tokens
  • No need to generate the reachability graph
  • Possibility to give the number of replications or
    the desired relative error.

106
DISTRIBUTIONS AVAILABLE FOR SIMULATION
  • Exponential
  • Constant (including Immediate)
  • Uniform
  • Truncated normal
  • Weibull
  • Lognormal
  • Geometric
  • Erlang
  • Pareto
  • Cauchy
  • Beta
  • Gamma
  • Poisson
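
Several of these distributions can be sampled by inverting the distribution function, one standard way to generate random deviates for a distribution-driven simulation. The sketch below assumes the parameterizations F(t) = 1 - exp(-lam t) for the exponential, F(t) = 1 - exp(-lam t^alpha) for the Weibull and F(t) = 1 - (k/t)^alpha for the Pareto; it is not a description of how SPNP generates variates.

```python
import math
import random


def exponential_from_uniform(u: float, lam: float) -> float:
    """Invert F(t) = 1 - exp(-lam * t) at probability u."""
    return -math.log(1.0 - u) / lam


def weibull_from_uniform(u: float, lam: float, alpha: float) -> float:
    """Invert F(t) = 1 - exp(-lam * t**alpha) at probability u."""
    return (-math.log(1.0 - u) / lam) ** (1.0 / alpha)


def pareto_from_uniform(u: float, k: float, alpha: float) -> float:
    """Invert F(t) = 1 - (k / t)**alpha, t >= k, at probability u."""
    return k / (1.0 - u) ** (1.0 / alpha)


rng = random.Random(2024)  # fixed seed so the run is repeatable
sample = [weibull_from_uniform(rng.random(), lam=2.0, alpha=1.5) for _ in range(5)]
print(sample)
```

Since `random.random()` returns values in [0, 1), the argument of the logarithm stays strictly positive.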

107
Solution Technique in SPNP
108
An Introduction to SHARPE software tool
109
Overview of SHARPE
  • SHARPE: Symbolic-Hierarchical Automated
    Reliability and Performance Evaluator
  • Well-known modeling tool (installed at over 300
    sites: companies and universities)
  • Combines flexibility of Markov models and
    efficiency of combinatorial models
  • Ported to most architectures and operating
    systems
  • Used for Education, Research, Engineering Practice

110
Overview of SHARPE (cont.)
  • Graphical User Interface is available
  • Used for analysis of performance (traffic),
    dependability and performability
  • Hierarchy facilitates largeness & stiffness
    avoidance
  • Steady-state as well as transient analysis
  • Written in C language
  • Used as an engine by several other tools

111
Architecture of SHARPE interface
[Figure: SHARPE interface architecture]
  • Model types: Fault tree, MRGP, Reliability Block
    Diagrams, Markov chain, Petri net (GSPN & SRN),
    Reliability graph, Task graph, Pfqn, Mfqn
  • Hierarchical & Hybrid Compositions
  • Measures: Reliability/Availability, Performance,
    Performability
112
Modeling Steps
  • Model construction
  • Model calibration or parameterization
  • Model solution
  • Result interpretation
  • Model Validation

113
MODEL CALIBRATION
  • What is $\gamma$?
  • Fault Model for Each Component
  • Design, Manufacturing: Heisenbugs, Bohrbugs
  • Operational: Permanent, Intermittent, Transient
  • Human
  • Fault Arrival Processes (PP, Weibull, NHPP)
  • Failure Rates (Sources: MIL-STD)

114
MODEL CALIBRATION (Continued)
  • What is c?
  • Field Data
  • Fault/Error Injection (FIAT, MESSALINE)
  • Analytic Coverage Model
  • What is $\tau$?
  • Maintenance Model
  • Corrective: dispatch, travel, repair time, dead
    on arrival, imperfect repair
  • Preventive
115
MODEL CALIBRATION (Continued)
  • What is r ?
  • Binary: Up/Down
  • Capacity-Oriented
  • Number of Operational Resources in Each State
  • Performance-Oriented
  • Evaluate Perf. in Each Degraded Level of Syst.
    Config.
  • 1. Measurements
  • 2. Simulation Model
  • 3. Analytic Model -- SHARPE, SPNP

116
VALIDATION & VERIFICATION
  • Validation: Does the conceptual model faithfully
    reflect the behavior of the system?
  • Verification: Has the conceptual model been
    correctly implemented?

117
MODEL VALIDATION (Continued)
  • Three step process outlined by Naylor and Finger
  • Face validation: discussion with the experts
  • Input-output validation: compare results obtained
    from the model with those from measurements
  • Validation of model assumptions: either prove
    that the assumptions are correct or do
    statistical testing
  • Rejection of a hypothesis regarding model
    assumption based on measurement data leads to an
    improved model

118
MODEL ASSUMPTIONS/ERRORS
  • Errors in Model Structure
  • Missing or Extra Arcs
  • Missing or Extra States
  • Use Face Validation to avoid these errors.
  • Errors Due to Non-Independence
  • Distributional Errors
  • Parametric Errors

119
MODEL ASSUMPTIONS/ ERRORS(Continued)
  • Errors Due to Approximations
  • Decomposition/Aggregation/Iteration
  • State Truncation
  • Numerical Solution Errors
  • Discretization Errors
  • Round-Off Errors

120
Model Verification
  • Programming Errors
  • Approximation errors: tight bounds on the errors
    due to approximations are desirable
  • Numerical errors: errors in numerical algorithms
    should be bounded

121
MODELING AND MEASUREMENTS INTERFACES
  • Measurements supply Input Parameters to Models
  • (Model Calibration or Parameterization)
  • Confidence Intervals should be obtained
  • Boeing, Draper, Union Switch projects
  • Model Sensitivity Analysis can suggest which
    Parameters to Measure More Accurately (Blake,
    Reibman and Trivedi, SIGMETRICS 1988)
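
A toy version of such a sensitivity analysis, assuming the simplest possible model: a single repairable component with failure rate `lam` and repair rate `mu`, so that A = mu / (lam + mu). The derivatives are written out analytically and checked against a central finite difference; the parameter values are hypothetical and the approach is only a sketch, not the method of the cited paper.

```python
def availability(lam: float, mu: float) -> float:
    """Steady-state availability of a two-state (up/down) component."""
    return mu / (lam + mu)


def sensitivity_to_lam(lam: float, mu: float) -> float:
    """Partial derivative of availability with respect to the failure rate."""
    return -mu / (lam + mu) ** 2


def sensitivity_to_mu(lam: float, mu: float) -> float:
    """Partial derivative of availability with respect to the repair rate."""
    return lam / (lam + mu) ** 2


lam, mu, h = 0.001, 0.5, 1e-7  # assumed rates per hour and finite-difference step
finite_diff = (availability(lam + h, mu) - availability(lam - h, mu)) / (2 * h)
print("dA/dlam analytic:", sensitivity_to_lam(lam, mu), "numeric:", finite_diff)
# Scaled sensitivities show the effect of a relative change in each parameter.
print("scaled:", lam * sensitivity_to_lam(lam, mu), mu * sensitivity_to_mu(lam, mu))
```

In this toy model the two scaled sensitivities have equal magnitude, so a 1% error in either rate matters equally; in larger models the ranking of parameters is what tells the analyst where measurement effort is best spent.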

122
MODELING AND MEASUREMENTS INTERFACES
  • Model Structure Based on Measurement Data
  • Hsueh, Iyer and Trivedi, IEEE TC, April 1988
  • Gokhale et al., IPDS 98
  • Vaidyanathan et al., ISSRE 99