EGR 518 Performability Performance and Dependability Analysis for Computer Systems - PowerPoint PPT Presentation

1
EGR 518: Performability (Performance and
Dependability) Analysis for Computer Systems
  • Instructor: Meng-Lai Yin
  • Office: Bldg. 9, Room 511
  • Tel: 909-869-2535
  • Email: myin_at_csupomona.edu

2
Expectation
  • After taking this class, a student is expected to
  • Know the terminology and state-of-the-art
    technology in reliability, availability,
    performance, and performability
  • Grasp an overall picture of the system being
    analyzed
  • Recognize and determine the type of analysis
    needed for a particular task
  • Construct corresponding models for the analysis
  • Become familiar with the provided computer-aided
    analysis tools to conduct the analysis
  • Obtain quantitative as well as qualitative
    results from the models
  • Validate the modeling results

3
Course Outline
  • Basic Concepts about Performability Modeling
  • Probability Review
  • Fault Tolerance Techniques
  • Concepts about Modeling Approaches
  • Modeling Tools
  • Reliability Block Diagram
  • Markov Modeling Technique
  • Fault Tree Analysis
  • Performance Analysis: Queuing Models
  • Integrating Performance and Dependability
  • Case Studies, Project Presentations

4
Assessment
  • Homework: 20%
  • Quizzes: 20%
  • Midterm: 20%
  • Final: 20%
  • Project: 20%

5
Text & References
  • Kishor S. Trivedi, Probability and Statistics
    with Reliability, Queuing and Computer Science
    Applications, second edition, John Wiley & Sons,
    Inc., 2002. ISBN 0-471-33341-7.
  • References
  • [1] Robin A. Sahner, Kishor S. Trivedi, Antonio
    Puliafito, Performance and Reliability Analysis of
    Computer Systems: An Example-Based Approach Using
    the SHARPE Software Package, Kluwer Academic
    Publishers, 1996. ISBN 0-7923-9650-2.
  • [2] Martin L. Shooman, Reliability of Computer
    Systems and Networks: Fault Tolerance, Analysis,
    and Design, John Wiley & Sons, Inc., 2002. ISBN
    0-471-29342-3.
  • [3] http://www.crhc.uiuc.edu/PERFORM/home.html
  • [4] http://www.eecs.umich.edu/jfm/
  • [5] http://www.ee.duke.edu/kst/

6
Ok. So, what is "Performability"?
The need for high-performance, fault-tolerant
computing
7
Fault-Tolerant Computing
  • Fault-tolerant computing is a generic term
    describing redundant design techniques with
    duplicate components or repeated computations
    enabling uninterrupted (tolerant) operation in
    response to component failure (faults).

8
Links
Performability
  • http://www.crhc.uiuc.edu/PERFORM/home.html
  • http://www.eecs.umich.edu/jfm/

Conferences
  • http://www.dsn.org
  • http://www.rams.org
9
An Example
The purpose of this example is to show the
existence of performance degradable systems
10
An email received on July 20, 2005, 4:33 PM
  • We are experiencing problems with the AIX user
    account file systems. We need to take the AIX
    system off-line immediately to fix the problem.
    We expect the AIX file systems to be off-line for
    approximately an hour and a half. We hope to
    have the file systems back on-line by 6:00 PM.
  • Sorry for any inconvenience.
  • Sys Admin Team

11
Later that day: July 20, 2005, 6:26 PM
  • All AIX file systems are back on-line except
    wei_snoop, which is in a rebuild stage. The wei_snoop
    file system will be back on-line by 0600 tomorrow
    morning.
  • Thanks,
  • Sys Admin Team

12
Observations
  • The system has not failed completely, even though
    the AIX file system failed
  • The system can operate without the wei_snoop file
    system
  • The system can be upgraded while operating

More and more systems are becoming performance
degradable
13
Performance Degradable Systems
  • Performance degradable systems have the
    capability of continuing to operate failure-free
    in the presence of certain faults or errors by
    diminishing the level of quality of service [7].

Normal scenario: A system starts with all
components operational and performs at its
maximum capability. When a component fails, the
system reconfigures itself and operates with
degraded performance, and so on for later failures.
14
Reasons for Performability Modeling
  • Two separate measures in the past
  • Traditional dependability analysis assumes no
    performance-degraded states.
  • Performance measures are always applied to the
    fully operational state.
  • Need an integrated, meaningful metric
  • For performance degradable systems, where the
    system can operate in many different states, how
    do you address the system's performance?
  • Traditional metrics (performance, reliability,
    availability, etc.) and the corresponding
    modeling techniques cannot capture the overall
    performance of performance degradable
    systems.
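
One common way to build such an integrated metric is to attach a performance level (a "reward") to each system state and weight it by the probability of being in that state. The sketch below is only an illustration under stated assumptions: the two-processor system, the per-processor availability of 0.99, the independence of the processors, and the function name expected_reward are all hypothetical and do not come from the slides.

```python
def expected_reward(state_probabilities, rewards):
    """Weight each state's performance level by the probability of that state."""
    if abs(sum(state_probabilities) - 1.0) > 1e-9:
        raise ValueError("state probabilities must sum to 1")
    return sum(p * r for p, r in zip(state_probabilities, rewards))

# Hypothetical system: two identical processors that fail independently.
a = 0.99  # assumed steady-state availability of each processor

# States: both up, exactly one up, none up.
probs = [a * a, 2 * a * (1 - a), (1 - a) ** 2]

# Assumed performance level in each state (relative throughput).
rewards = [2.0, 1.0, 0.0]

expected = expected_reward(probs, rewards)
print(f"Expected throughput = {expected:.4f} processor-equivalents")
```

A plain availability figure would report only whether the system is up; the weighted sum also credits the degraded one-processor state with half the throughput.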

15
The Beginning of Performability
  • The term "performability" was introduced almost
    three decades ago [4] by Prof. J. F. Meyer.

John F. Meyer
Address: 4111 EECS
Phone: (734) 763-0037
Fax: (734) 763-1503
Professor Emeritus, Electrical Engr & Computer Science
Degree: Ph.D., U-Michigan
16
A Tribute to M. D. Beaudry
  • Before Dr. John F. Meyer gave the name
    performability to the world, several works
    had already addressed the issue of providing
    appropriate metrics for performance degradable
    systems.
  • In particular, the work conducted by Danielle
    Beaudry [1] has been referenced in many places.
  • In [1], she addressed performance-related
    reliability measures for gracefully degraded
    systems (performance degradable systems).

17
Course Objectives
  • At the conclusion of this course, a participant
    will be able to:
  • Know the basic concepts of performability
  • Know how to:
  • Conduct and evaluate a dependability analysis
    using Reliability Block Diagram (RBD) or Markov
    techniques
  • Conduct and evaluate a performance analysis using
    queuing models
  • Conduct and evaluate a performability analysis
    using various modeling techniques

18
Approach
19
Performance Analysis
  • Purpose
  • To assess workload, traffic arrival rates,
    service time distributions, etc.
  • To evaluate resource contention and scheduling
  • To assess the effects of concurrency and
    synchronization
  • Measures
  • Throughput
  • Response time (mean and distribution)
  • Others
20
How about dependability?
So many terms have been used in this area, such
as reliability, availability, ...
21
Reliability, Availability, Dependability
  • They are all probabilities.
  • What are the differences?

Definition of Reliability: The probability of an
item to perform a required function under given
conditions for a given time interval (without
any failure).
Definition of Availability: The probability of
an item to be in a state to perform a required
function at a given instant of time, assuming
that the external resources, if required, are
provided (can have failures with repairs).
22
Picture the Differences
[Figure: two timelines, each marked at t0 and at a later time t]
Reliability: the probability that the item
survives the duration [t0, t).
Availability: the probability that the item is
working at time t, given that the item was
working at time t0.
23
Picture the Differences
[Figure: a typical reliability curve (without repair) and a typical availability curve (with repair); axis marks at 1.0 and at the steady-state availability]
24
Calculating Reliability Availability
  • Let λ be the failure rate for a component, and μ
    be the repair rate for that component.
  • Assume an exponential distribution for the failures
  • Then reliability can be calculated as
    R(t) = e^(−λt)
  • SS (steady-state) availability can be assessed
    as A = μ / (λ + μ)
  • or A = MTTF / (MTTF + MTTR)
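
As a rough numerical illustration of this slide, the sketch below evaluates the exponential reliability formula and the standard two-state steady-state availability, A = mu / (lam + mu). The symbol names lam and mu, the function names, and the component figures (one failure per 1000 hours, a 10-hour mean repair time) are assumptions chosen for the example, not values from the course.

```python
import math

def reliability(failure_rate, t):
    """R(t) = exp(-failure_rate * t), assuming exponential time to failure."""
    return math.exp(-failure_rate * t)

def steady_state_availability(failure_rate, repair_rate):
    """A = repair_rate / (failure_rate + repair_rate).

    With MTTF = 1 / failure_rate and MTTR = 1 / repair_rate, this is the
    same value as MTTF / (MTTF + MTTR).
    """
    return repair_rate / (failure_rate + repair_rate)

# Hypothetical component (assumed values).
lam = 1.0 / 1000.0  # failure rate, per hour
mu = 1.0 / 10.0     # repair rate, per hour

r_100 = reliability(lam, 100.0)  # probability of no failure in the first 100 hours
a_ss = steady_state_availability(lam, mu)

print(f"R(100 h) = {r_100:.4f}")
print(f"Steady-state availability = {a_ss:.4f}")
```

Note the contrast between the two numbers: reliability keeps decaying as the mission time grows, while steady-state availability depends only on the ratio of the two rates.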

25
Dependability: Umbrella Term
Trustworthiness of a computer system such that
reliance can justifiably be placed on the service
it delivers
Copied from course materials provided by Prof.
Trivedi
26
Modeling Taxonomy
27
Combinatorial Approach
  • If a system consists of n components, and every
    component is either working or failed, then we
    can simply list all the possible
    combinations and calculate the probability of
    each combination.
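
The enumeration described above can be sketched in a few lines. This is a minimal illustration that assumes independent component failures; the function name system_reliability, the 2-out-of-3 success rule, and the component reliability of 0.9 are hypothetical choices made for the example.

```python
from itertools import product

def system_reliability(component_reliabilities, works):
    """Sum the probabilities of all working/failed combinations in which
    the system as a whole works.

    component_reliabilities: probability that each component is working
    works: function taking a tuple of booleans (True = working) and
           returning True if the system works in that combination
    Assumes the components fail independently.
    """
    total = 0.0
    for state in product([True, False], repeat=len(component_reliabilities)):
        prob = 1.0
        for up, r in zip(state, component_reliabilities):
            prob *= r if up else (1.0 - r)
        if works(state):
            total += prob
    return total

# Hypothetical system: 3 components, works if at least 2 are working.
rel = [0.9, 0.9, 0.9]
num_states = 2 ** len(rel)  # every component is either working or failed
two_of_three = system_reliability(rel, lambda s: sum(s) >= 2)

print(f"{num_states} combinations, system reliability = {two_of_three:.4f}")
```

Because the loop visits 2^n combinations, this brute-force listing is only practical for small n.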

28
Complexity Concerns
  • How many possible combinations of the status of
    these n components are there?
  • What can be done to manage the complexity?
  • During model construction
  • Need a more intelligent way to describe the
    system's failure behavior
  • Series and parallel RBD (Reliability Block
    Diagram) approach
  • During model solution
  • Need more efficient ways of calculating, rather
    than counting individual probabilities

29
Structured Combinatorial Approach
  • Reliability block diagrams
  • Integrate certain probability events into a
    module, which contains one of the following:
  • A probability of failure
  • A failure rate
  • A distribution of time to failure
  • Steady-state and instantaneous unavailability
  • Organize the modules in a structured way,
    according to the effects of each module's failure
  • Statistical independence assumption
  • Failure independence
  • Repair independence
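
Under the independence assumption, series and parallel blocks combine with simple products. The sketch below is a minimal illustration of that idea; the helper names series and parallel and the example system (two redundant servers behind one network link, with made-up reliabilities) are hypothetical.

```python
from math import prod

def series(*reliabilities):
    """All blocks must work: multiply the block reliabilities."""
    return prod(reliabilities)

def parallel(*reliabilities):
    """At least one block must work: one minus the product of the
    block failure probabilities."""
    return 1.0 - prod(1.0 - r for r in reliabilities)

# Hypothetical system: two redundant servers in series with one network link.
servers = parallel(0.9, 0.9)    # redundant pair treated as one module
system = series(servers, 0.99)  # the pair and the link must both work

print(f"Server pair = {servers:.4f}, system = {system:.4f}")
```

Treating the redundant pair as a single module is what keeps the calculation small: each module is evaluated once, with no need to list every combination of component states.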

30
Some Basic Terminology
  • Redundancy: hardware (static, dynamic),
    information, time, software
  • Fault types: permanent (needs repair or
    replacement), intermittent (reboot/restart or
    replacement), transient (retry)
  • Fault, error, failure
  • Fault detection, imperfect coverage
  • Maintenance: scheduled (preventive), unscheduled
    (corrective)

31
Terminology (Continued)
  • A failure occurs when the delivered service no
    longer complies with the specification
  • An error is that part of the system state which is
    liable to lead to subsequent failure
  • A fault is the adjudged or hypothesized cause of an
    error

Faults are the cause of errors that may lead to
failures:
Fault → Error → Failure
32
High Availability Intents
  • Scott McNealy, Sun Microsystems Inc.
  • "We're paying people for uptime. The only thing
    that really matters is uptime, uptime, uptime,
    uptime and uptime. I want to get it down to a
    handful of times you might want to bring a Sun
    computer down in a year. I'm spending all my time
    with employees to get this design goal"
  • Sun Microsystems: SunUP RASCAL program for
    high availability
  • Motorola: 5NINES Initiative
  • HP, Cisco, Oracle, SAP: 5nines:5minutes Alliance
  • IBM: Cornhusker clustering technology for
    high availability, eLiza, autonomic computing
  • Microsoft: Trustworthy Computing initiative
  • Microsoft: regular full-page ad on 99.999%
    availability in USA Today
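
To see what an availability target such as "five nines" means in practice, it helps to convert it into allowed downtime per year. The short sketch below does that arithmetic; the function name and the choice of a 365-day year are assumptions for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years

def downtime_minutes_per_year(availability):
    """Expected downtime per year for a given steady-state availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR

five_nines = downtime_minutes_per_year(0.99999)
three_nines = downtime_minutes_per_year(0.999)

print(f"99.999% availability: about {five_nines:.2f} minutes of downtime per year")
print(f"99.9% availability: about {three_nines:.1f} minutes of downtime per year")
```

At 99.999% the yearly downtime budget is a little over five minutes, which is the link between the "five nines" and "five minutes" in the alliance name above.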
