EGR 518: Performability (Performance and Dependability) Analysis for Computer Systems

Provided by: jame301

1
EGR 518: Performability (Performance and
Dependability) Analysis for Computer Systems

Instructor: Meng-Lai Yin
Office: Bldg. 9, Room 511
Tel: 909-869-2535
Email: myin_at_csupomona.edu

2
Expectation

After taking this class, a student is expected to
Know the terminology and state-of-the-art
technology in reliability, availability,
performance, and performability
Grasp an overall picture of the system being
analyzed
Recognize and determine the type of analysis
needed for a particular task
Construct corresponding models for the analysis
Become familiar with the provided computer-aided
analysis tools and use them to conduct the analysis
Obtain quantitative as well as qualitative
results from the models
Validate the modeling results

3
Course Outline

Basic Concepts about Performability Modeling
Probability Review
Fault Tolerance Techniques
Concepts about Modeling Approaches
Modeling Tools
Reliability Block Diagram
Markov Modeling Technique
Fault Tree Analysis
Performance Analysis: Queuing Models
Integrating Performance and Dependability
Case Studies, Project Presentations

4
Assessment

Homework: 20%
Quizzes: 20%
Midterm: 20%
Final: 20%
Project: 20%

5
Text and References

Kishor S. Trivedi, Probability and Statistics
with Reliability, Queuing, and Computer Science
Applications, second edition, John Wiley & Sons,
Inc., 2002. ISBN 0-471-33341-7.
References
[1] Robin A. Sahner, Kishor S. Trivedi, Antonio
Puliafito, Performance and Reliability Analysis
of Computer Systems: An Example-Based Approach
Using the SHARPE Software Package, Kluwer
Academic Publishers, 1996. ISBN 0-7923-9650-2.
[2] Martin L. Shooman, Reliability of Computer
Systems and Networks: Fault Tolerance, Analysis,
and Design, John Wiley & Sons, Inc., 2002. ISBN
0-471-29342-3.
[3] http://www.crhc.uiuc.edu/PERFORM/home.html
[4] http://www.eecs.umich.edu/jfm/
[5] http://www.ee.duke.edu/kst/

6
OK. So, what is "Performability"?
The need for high-performance, fault-tolerant
computing
7
Fault-Tolerant Computing

Fault-tolerant computing is a generic term for
redundant design techniques, using duplicate
components or repeated computations, that enable
uninterrupted (tolerant) operation in response to
component failures (faults).

8
Links
Performability

http://www.crhc.uiuc.edu/PERFORM/home.html
http://www.eecs.umich.edu/jfm/

Conferences
http://www.dsn.org
http://www.rams.org
9
An Example
The purpose of this example is to show the
existence of performance degradable systems
10
An email received on July 20, 2005, 4:33 PM

We are experiencing problems with the AIX user
account file systems. We need to take the AIX
system off-line immediately to fix the problem.
We expect the AIX file systems to be off line for
approximately an hour and a half. We hope to
have the file systems back on-line by 6:00 PM.
Sorry for any inconvenience.
Sys Admin Team

11
Later that day: July 20, 2005, 6:26 PM

All AIX file systems are back on-line except
wei_snoop which is in a rebuild stage. Wei_snoop
file system will be back on-line by 0600 tomorrow
morning.
Thanks,
Sys Admin Team

12
Observations

The system has not failed completely, even though
an AIX file system failed
The system can operate without the wei_snoop file
system
The system can be upgraded while operating

More and more systems are becoming performance
degradable
13
Performance Degradable Systems

Performance degradable systems have the
capability of continuing to operate failure-free
in the presence of certain faults or errors by
diminishing the level of quality of service [7].

Normal scenario: A system starts with all
components operational and performs at its
maximum capability. When a component fails, the
system reconfigures itself and operates with
degraded performance, and so on.
14
Reasons for Performability Modeling

Two separate measures in the past
Traditional dependability analysis assumes no
performance-degraded states.
Performance measures are always applied to the
fully operational state.
Need an integrated, meaningful metric
For performance degradable systems, where the
system can operate in many different states, how
do you assess the system's performance?
Traditional metrics (performance, reliability,
availability, etc.) and the corresponding
modeling techniques cannot capture the overall
performance of performance degradable
systems.
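
One way to picture such an integrated metric is to weight the performance delivered in each system state by the probability of being in that state. The sketch below is only an illustration under assumptions that are not in the slides: a hypothetical system of n identical, independent processors, each up with probability a, whose delivered capacity equals the number of working processors.

```python
from math import comb

def state_probabilities(n, a):
    """P(exactly k of n independent components are up), for k = 0..n.

    Assumes every component is up with the same probability a.
    """
    return [comb(n, k) * a**k * (1 - a)**(n - k) for k in range(n + 1)]

def expected_capacity(n, a, capacity):
    """Sum over states of P(state) * capacity(state)."""
    probs = state_probabilities(n, a)
    return sum(p * capacity(k) for k, p in enumerate(probs))

# Hypothetical numbers: 2 processors, each up 90% of the time.
n, a = 2, 0.9
probs = state_probabilities(n, a)

# Pure dependability view: the system is "up" if at least one processor works.
availability = 1 - probs[0]

# Integrated view: capacity delivered, counting the degraded state.
perf = expected_capacity(n, a, lambda k: float(k))

print(f"availability = {availability:.4f}")
print(f"expected capacity = {perf:.4f} processors")
```

The availability figure alone hides the fact that the system spends part of its time running on a single processor; the expected-capacity figure reflects it.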

15
The Beginning of Performability

The term Performability was introduced almost
three decades ago [4], by Prof. J. F. Meyer.

John F. Meyer
Professor Emeritus, Electrical Engr & Computer Science
Address: 4111 EECS
Phone: (734) 763-0037
Fax: (734) 763-1503
Degree: Ph.D., U-Michigan
16
A Tribute to M. D. Beaudry

Before Dr. John F. Meyer gave the name
performability to the world, several works
had already addressed the issue of providing
appropriate metrics for performance degradable
systems.
In particular, the work conducted by Danielle
Beaudry [1] has been referenced in many places.
In [1], she addressed the performance-related
reliability measures for gracefully degraded
systems (performance degradable systems).

17
Course Objectives

At the conclusion of this course, a participant
will be able to
Explain the basic concepts of performability
Conduct and evaluate a dependability analysis
using Reliability Block Diagram (RBD) or Markov
techniques
Conduct and evaluate a performance analysis using
queuing models
Conduct and evaluate a performability analysis
using various modeling techniques

18
Approach
19
Performance Analysis

Purpose
To assess workload, traffic arrival rates,
service time distributions, etc.
To evaluate resource contention and scheduling
To assess the effects of concurrency and
synchronization
Measures
Throughput
Response time (mean and distribution)
Others
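
To make the measures above concrete, here is a minimal sketch that computes them for a single-server M/M/1 queue. The choice of M/M/1 (Poisson arrivals, exponential service times) and the rates used are assumptions for illustration; the slide does not name a specific queuing model.

```python
def mm1_metrics(arrival_rate, service_rate):
    """Steady-state measures of an M/M/1 queue (assumed model).

    arrival_rate: mean arrivals per unit time
    service_rate: mean service completions per unit time
    """
    if arrival_rate <= 0 or service_rate <= 0:
        raise ValueError("rates must be positive")
    if arrival_rate >= service_rate:
        raise ValueError("queue is unstable: arrival rate must be below service rate")
    rho = arrival_rate / service_rate  # server utilization
    return {
        "utilization": rho,
        "throughput": arrival_rate,  # in steady state, all arrivals are served
        "mean_response_time": 1.0 / (service_rate - arrival_rate),
        "mean_number_in_system": rho / (1.0 - rho),
    }

# Hypothetical workload: 8 requests/s arriving, server handles 10 requests/s.
m = mm1_metrics(8.0, 10.0)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

As a sanity check, the results satisfy Little's law: the mean number in the system equals throughput times mean response time.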

20
How about dependability?
So many terms have been used in this area, such
as reliability, availability, ...
21
Reliability, Availability, Dependability

They are all probabilities.
What are the differences?

Definition of Reliability: "The probability of an
item to perform a required function under given
conditions for a given time interval." (without
any failure during the interval)
Definition of Availability: "The probability of
an item to be in a state to perform a required
function at a given instant of time, assuming
that the external resources, if required, are
provided." (can have failures with repairs)
22
Picture the Differences
[Figure: a time axis with t0 and τ marked]
Reliability: the probability that the item
survives the duration [t0, τ)
[Figure: a time axis with t0 and τ marked]
Availability: the probability that the item is
working at time τ, given that the item was
working at time t0.
23
Picture the Differences
[Figure: a typical reliability curve (without repair) and a typical availability curve (with repair); the vertical axis is marked at 1.0 and the steady-state availability level is labeled]
24
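Calculating Reliability and Availability

Let $\lambda$ be the failure rate for a component, and $\mu$
be the repair rate for that component.
Assume an exponential distribution for the failures
Then reliability can be calculated as
$R(t) = e^{-\lambda t}$
SS (Steady-State) Availability can be assessed
as
$A_{SS} = \frac{\mu}{\lambda + \mu}$
or
$A_{SS} = \frac{MTTF}{MTTF + MTTR}$, where $MTTF = 1/\lambda$ and $MTTR = 1/\mu$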
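
The calculations on this slide can be sketched in a few lines. The code assumes the standard forms R(t) = exp(-lambda * t) and steady-state availability mu / (lambda + mu), with exponentially distributed repair times as well as failure times. The instantaneous availability function is the usual closed form for a two-state up/down Markov model and is included only to show how availability settles to its steady-state value; the rates are hypothetical.

```python
import math

def reliability(failure_rate, t):
    """R(t) = exp(-lambda * t): probability of no failure in [0, t)."""
    return math.exp(-failure_rate * t)

def steady_state_availability(failure_rate, repair_rate):
    """A_ss = mu / (lambda + mu), equivalently MTTF / (MTTF + MTTR)."""
    return repair_rate / (failure_rate + repair_rate)

def instantaneous_availability(failure_rate, repair_rate, t):
    """A(t) for a two-state Markov model that starts in the working state."""
    total = failure_rate + repair_rate
    return repair_rate / total + (failure_rate / total) * math.exp(-total * t)

# Hypothetical component: MTTF = 1000 hours, MTTR = 10 hours.
lam = 1.0 / 1000.0  # failures per hour
mu = 1.0 / 10.0     # repairs per hour

for t in (0, 10, 100, 1000):
    print(f"t={t:5d} h  R(t)={reliability(lam, t):.4f}  "
          f"A(t)={instantaneous_availability(lam, mu, t):.4f}")
print(f"steady-state availability = {steady_state_availability(lam, mu):.4f}")
```

Reliability keeps decaying toward zero as t grows, while availability levels off at the steady-state value because repairs return the component to service.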

25
Dependability: Umbrella Term
Trustworthiness of a computer system such that
reliance can justifiably be placed on the service
it delivers
Copied from course materials provided by Prof.
Trivedi
26
Modeling Taxonomy
27
Combinatorial Approach

If a system consists of n components, and every
component is either working or failed, then we
can simply list all the possible combinations
and calculate the probability of each
combination.
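
The enumeration described above can be sketched as follows. The component probabilities and the three example system structures are hypothetical, and components are assumed to fail independently so that the probability of a combination is the product of the individual probabilities.

```python
from itertools import product

def system_reliability(p_working, works):
    """Sum the probabilities of all component-state combinations in which
    the system works.

    p_working: probability that each component is working
    works: function taking a tuple of booleans (True = working) and
           returning True if the system as a whole works
    """
    total = 0.0
    # 2**n combinations: every component is either working or failed.
    for state in product([True, False], repeat=len(p_working)):
        prob = 1.0
        for up, p in zip(state, p_working):
            prob *= p if up else (1.0 - p)
        if works(state):
            total += prob
    return total

p = [0.9, 0.8, 0.7]  # hypothetical component probabilities

series_r = system_reliability(p, all)        # needs every component
parallel_r = system_reliability(p, any)      # needs at least one component
two_of_three_r = system_reliability(p, lambda s: sum(s) >= 2)

print(f"series:       {series_r:.4f}")
print(f"parallel:     {parallel_r:.4f}")
print(f"2-out-of-3:   {two_of_three_r:.4f}")
```

The loop visits 2^n states, so this brute-force listing is only practical for small n.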

28
Complexity Concerns

How many possible combinations of the status of
these n components are there?
What can be done to manage the complexity?
During model construction
Need a more intelligent way to describe the
system's failure behavior
Series and parallel RBD (Reliability Block
Diagram) approach
During model solution
Need more efficient ways of calculation, rather
than summing individual probabilities

29
Structured Combinatorial Approach

Reliability block diagrams
Integrate certain probability events into a
module, which contains information such as
A probability of failure
A failure rate
A distribution of time to failure
Steady-state and instantaneous unavailability
Organize the modules in a structured way,
according to the effects of each module's failure
Statistical independence assumption
Independent failures
Independent repairs
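
Under the independence assumption, series and parallel blocks reduce to simple products, which avoids listing every combination. The sketch below uses a hypothetical three-stage system; the block values are made up for illustration.

```python
from math import prod

def series(*reliabilities):
    """Series blocks: the structure works only if every block works."""
    return prod(reliabilities)

def parallel(*reliabilities):
    """Parallel blocks: the structure fails only if every block fails."""
    return 1.0 - prod(1.0 - r for r in reliabilities)

# Hypothetical RBD: one power supply, in series with two redundant CPUs,
# in series with three redundant disks.
power = 0.99
cpus = parallel(0.9, 0.9)
disks = parallel(0.8, 0.8, 0.8)
system = series(power, cpus, disks)

print(f"CPU stage:  {cpus:.4f}")
print(f"disk stage: {disks:.4f}")
print(f"system:     {system:.6f}")
```

Each stage is evaluated once and then treated as a single module, which is the structuring idea the slide describes.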

30
Some Basic Terminology

Redundancy: hardware (static, dynamic),
information, time, software
Fault types: permanent (needs repair or
replacement), intermittent (reboot/restart or
replacement), transient (retry)
Fault, error, failure
Fault detection, imperfect coverage
Maintenance: scheduled (preventive), unscheduled
(corrective)

31
Terminology (Continued)

A failure occurs when the delivered service no
longer complies with the specification
An error is that part of the system state which is
liable to lead to subsequent failure
A fault is the adjudged or hypothesized cause of
an error

Faults are the cause of errors that may lead to
failures
Fault -> Error -> Failure
32
High Availability Intents

Scott McNealy, Sun Microsystems Inc.:
"We're paying people for uptime. The only thing
that really matters is uptime, uptime, uptime,
uptime and uptime. I want to get it down to a
handful of times you might want to bring a Sun
computer down in a year. I'm spending all my time
with employees to get this design goal"
Sun Microsystems: SunUP & RASCAL program for
high availability
Motorola: 5NINES Initiative
HP, Cisco, Oracle, SAP: 5nines:5minutes Alliance
IBM: Cornhusker clustering technology for
high availability, eLiza, autonomic computing
Microsoft: Trustable computing initiative
Microsoft: regular full-page ad on 99.999%
availability in USA Today
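
To see what "five nines" means in practice, an availability target can be converted into allowed downtime per year. This is a small arithmetic sketch; it assumes a 365-day year and treats availability as the long-run fraction of time the system is up.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years

def downtime_minutes_per_year(availability):
    """Expected downtime per year for a given steady-state availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR

for label, a in [("three nines", 0.999),
                 ("four nines", 0.9999),
                 ("five nines", 0.99999)]:
    print(f"{label} ({a}): {downtime_minutes_per_year(a):.2f} minutes/year")
```

At 99.999% availability the budget is a little over five minutes of downtime per year, which is where the "5nines5minutes" name comes from.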