Latency as a Performability Metric for Internet Services

Provided by: petebro

Learn more at: http://roc.cs.berkeley.edu


1
Latency as a Performability Metric for Internet
Services

Pete Broadwell
pbwell_at_cs.berkeley.edu

2
Outline

Performability background/review
Latency-related concepts
Project status
Initial test results
Current issues

3
Motivation

A goal of the ROC project: develop metrics to evaluate new recovery techniques
Problem: the basic concept of availability assumes that a system is either up or down at any given time
"Nines" describe only the fraction of uptime over a certain interval
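
To make the "nines" arithmetic concrete, here is a small Python sketch (an illustration, not part of the original talk) that converts n nines into the downtime they allow per year. It assumes the usual reading of n nines as availability 1 - 10^-n and a 365-day year. The result says how much downtime is allowed, but nothing about whether it arrives as one outage or many.

```python
# Convert "nines" of availability into the downtime they permit per year.
# Assumes n nines means availability = 1 - 10**-n and a 365-day year.
SECONDS_PER_YEAR = 365 * 24 * 3600


def availability_from_nines(nines: int) -> float:
    """Return the availability fraction for a given number of nines."""
    return 1.0 - 10.0 ** (-nines)


def downtime_per_year_s(availability: float) -> float:
    """Return the seconds of downtime per year allowed by an availability."""
    return (1.0 - availability) * SECONDS_PER_YEAR


if __name__ == "__main__":
    for n in range(2, 6):
        minutes = downtime_per_year_s(availability_from_nines(n)) / 60
        print(f"{n} nines: {minutes:.1f} minutes of downtime per year")
```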

4
Why Is Availability Insufficient?

Availability doesn't describe the durations or frequencies of individual outages
Both can strongly influence users' perception of a service, as well as revenue
Availability doesn't capture a system's capacity to support degraded service:
  degraded performance during failures
  reduced data quality during high load (Web)
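
A quick way to see the first point is to compare two outage profiles with identical availability. In this hypothetical week, one service has a single 60-minute outage and another has sixty 1-minute outages; availability cannot tell them apart, while outage count and longest outage can. The OutageProfile class is illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class OutageProfile:
    """Outages observed over a fixed window (hypothetical data)."""
    name: str
    outage_durations_s: list[float]  # length of each outage, in seconds
    window_s: float                  # length of the observation window, in seconds

    def availability(self) -> float:
        """Fraction of the window during which the service was up."""
        return 1.0 - sum(self.outage_durations_s) / self.window_s

    def outage_count(self) -> int:
        return len(self.outage_durations_s)

    def longest_outage_s(self) -> float:
        return max(self.outage_durations_s, default=0.0)


WEEK_S = 7 * 24 * 3600.0

one_long = OutageProfile("one long outage", [3600.0], WEEK_S)
many_short = OutageProfile("many short outages", [60.0] * 60, WEEK_S)

for p in (one_long, many_short):
    print(f"{p.name}: availability={p.availability():.5f}, "
          f"outages={p.outage_count()}, longest={p.longest_outage_s():.0f} s")
```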

5
What is performability?

A combination of performance and dependability measures
Classical definition: a probabilistic (model-based) measure of a system's ability to perform in the presence of faults [1]
Concept from the traditional fault-tolerant systems community, ca. 1978
Has since been applied to other areas, but is still not in widespread use

[1] J. F. Meyer, "Performability Evaluation: Where It Is and What Lies Ahead," 1994
6
Performability Example
Discrete-time Markov chain (DTMC) model of a RAID-5 disk array [1]

[1] Hannu H. Kari, Ph.D. thesis, Helsinki University of Technology, 1997
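
Kari's actual model is not reproduced in this transcript, so the following is only a minimal Markov reward sketch of the general idea: a DTMC over disk-array states, with each state assigned a throughput "reward". The three states, the per-step transition probabilities, and the I/O rates are all hypothetical. Performability here is taken to be the expected long-run throughput, i.e. the stationary probability of each state weighted by its reward.

```python
# Hypothetical 3-state DTMC for a RAID-5 array (not Kari's model).
STATES = ["normal", "degraded", "failed"]

# P[i][j] = probability of moving from state i to state j in one time step.
P = [
    [0.99, 0.01, 0.00],  # normal: a disk may fail
    [0.20, 0.79, 0.01],  # degraded: rebuild may finish, or a second disk may fail
    [0.05, 0.00, 0.95],  # failed: data eventually restored from backup
]

# Reward = I/O operations/sec delivered while in each state (hypothetical).
REWARD = [1000.0, 600.0, 0.0]


def stationary_distribution(P, max_iter=100_000, tol=1e-12):
    """Power iteration: repeatedly apply pi <- pi P until it stops changing."""
    n = len(P)
    for row in P:
        assert abs(sum(row) - 1.0) < 1e-9, "each row must sum to 1"
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


pi = stationary_distribution(P)
expected_throughput = sum(p * r for p, r in zip(pi, REWARD))
for state, p in zip(STATES, pi):
    print(f"{state:>8}: {p:.4f}")
print(f"expected throughput: {expected_throughput:.1f} I/O ops/sec")
```

Replacing the per-state rewards with, for example, mean response time per state would turn the same chain into a latency-based performability model.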
7
Visualizing Performability
[Figure: throughput (I/O operations/sec) vs. time]
8
Metrics for Web Services

Throughput: requests/sec
Latency: render time, time to first byte
Data quality:
  harvest (response completeness)
  yield (% of queries answered) [1]

[1] E. Brewer, "Lessons from Giant-Scale Internet Services," 2001
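
Brewer's two data-quality measures can be computed from per-query records. The sketch below is one plausible reading, assuming each record notes whether the query was answered and how much of the full data set its answer reflects. Averaging harvest over the answered queries is an assumption of this sketch, and QueryRecord is a hypothetical type.

```python
from dataclasses import dataclass


@dataclass
class QueryRecord:
    answered: bool              # did the service return any response?
    data_returned: float = 0.0  # data reflected in the response
    data_total: float = 1.0     # data a complete response would reflect


def yield_fraction(records):
    """Yield: fraction of offered queries that were answered."""
    return sum(r.answered for r in records) / len(records)


def harvest_fraction(records):
    """Harvest: mean completeness of the answered queries (0.0 if none)."""
    answered = [r for r in records if r.answered]
    if not answered:
        return 0.0
    return sum(r.data_returned / r.data_total for r in answered) / len(answered)


records = [
    QueryRecord(True, 4, 4),   # complete answer
    QueryRecord(True, 3, 4),   # one of four partitions unavailable
    QueryRecord(False),        # query dropped
    QueryRecord(True, 4, 4),
]
print(yield_fraction(records), harvest_fraction(records))
```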
9
Applications of Metrics

Modeling the expected failure-related performance of a system prior to deployment
Benchmarking the performance of an existing system during various recovery phases
Comparing the reliability gains offered by different recovery strategies

10
Related Projects

HP: Automating Data Dependability
  uses time to data access as one objective for storage systems
Rutgers: PRESS/Mendosus
  evaluated the throughput of the PRESS server during injected failures
IBM: Autonomic Storage
Numerous ROC projects

11
Arguments for Using Latency as a Metric

Originally, performability metrics were meant to capture the end-user experience [1]
Latency better describes the experience of an end user of a web site:
  response time > 8 sec leads to site abandonment, which leads to lost income [2]
Throughput describes the raw processing ability of a service:
  best used to quantify expenses

[1] J. F. Meyer, "Performability Evaluation: Where It Is and What Lies Ahead," 1994
[2] Zona Research and Keynote Systems, "The Need for Speed II," 2001
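
One way to turn the 8-second figure into a number is to count the page loads that cross it. This is a deliberately crude sketch: it assumes every load slower than 8 s is abandoned and every faster one is not, which real users will not follow exactly. The function name and sample data are hypothetical.

```python
ABANDONMENT_THRESHOLD_S = 8.0  # threshold cited on the slide


def abandonment_rate(page_load_times_s, threshold_s=ABANDONMENT_THRESHOLD_S):
    """Fraction of page loads slower than the threshold.

    Simplification: treats each slow load as one abandoned visit.
    """
    if not page_load_times_s:
        return 0.0
    slow = sum(1 for t in page_load_times_s if t > threshold_s)
    return slow / len(page_load_times_s)


# Two hypothetical runs that serve the same number of pages in the same
# window (same throughput) but give users very different experiences.
steady = [0.3, 0.4, 0.3, 0.5, 0.4]
bursty = [0.3, 9.5, 0.3, 12.0, 0.4]
print(abandonment_rate(steady), abandonment_rate(bursty))
```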
12
Current Progress

Using the Mendosus fault-injection system on a 4-node PRESS web server (both from Rutgers)
Running latency-based performability tests on the cluster:
  inject faults during a load test
  record page-load times before, during, and after faults
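
The before/during/after comparison can be sketched as a grouping step over timestamped page-load samples. The sample format, the fault and repair timestamps, and the function name are assumptions for illustration; they are not Mendosus or PRESS interfaces.

```python
from statistics import mean


def summarize_by_phase(samples, fault_time_s, repair_time_s):
    """Average page-load time before, during, and after a fault.

    samples: iterable of (timestamp_s, page_load_time_s) pairs.
    A phase with no samples maps to None.
    """
    phases = {"before": [], "during": [], "after": []}
    for t, load_time in samples:
        if t < fault_time_s:
            phases["before"].append(load_time)
        elif t < repair_time_s:
            phases["during"].append(load_time)
        else:
            phases["after"].append(load_time)
    return {name: mean(v) if v else None for name, v in phases.items()}


# Hypothetical samples: fault injected at t=15 s, repaired at t=35 s.
samples = [(0, 0.14), (10, 0.15), (20, 0.9), (30, 1.1), (40, 0.16)]
print(summarize_by_phase(samples, fault_time_s=15, repair_time_s=35))
```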

13
Test Setup
[Figure: test clients and the PRESS web server on Mendosus, connected through an emulated switch]
Normal version: cooperative caching
HA version: cooperative caching + heartbeat monitoring
14
Effect of Component Failure on Performability Metrics
[Figure: performability metrics (throughput, latency) vs. time, with FAILURE and REPAIR marked]
15
Observations

Below saturation, throughput tracks load while latency stays roughly constant
Above saturation, latency rises with load

Throughput   Latency
3/s          0.14 s
6/s          0.14 s
7/s          0.4 s

[Figure: the measurements above plotted against time]
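
The saturation point can be read off such measurements by finding where latency departs from its low-load value. In this sketch the 1.5x factor is an arbitrary assumption, and the three points are the throughput/latency pairs annotated on this slide (throughput taken to be requests/sec).

```python
def saturation_point(points, factor=1.5):
    """Return the first (throughput, latency) point whose latency exceeds
    `factor` times the lowest-throughput latency, or None if latency stays flat."""
    points = sorted(points)  # order by throughput
    baseline = points[0][1]
    for thru, lat in points:
        if lat > factor * baseline:
            return thru, lat
    return None


measured = [(3.0, 0.14), (6.0, 0.14), (7.0, 0.4)]  # values annotated on the slide
print(saturation_point(measured))  # (7.0, 0.4)
```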
16
How to Represent Latency?

Average response time over a given time period
  Make a distinction between render time and time to first byte?
Deviation from baseline latency
  Impose a greater penalty for deviations toward longer wait times?
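
The candidate representations above can be written as small functions over a window of latency samples. This is a sketch under assumptions: baseline_s is the fault-free latency measured beforehand, and the 4:1 weighting of slow versus fast deviations is a hypothetical choice, not a value from the talk.

```python
from statistics import mean


def window_average(latencies_s):
    """Average response time over the window."""
    return mean(latencies_s)


def deviation_from_baseline(latencies_s, baseline_s):
    """Signed mean deviation from the fault-free baseline latency."""
    return mean(l - baseline_s for l in latencies_s)


def asymmetric_penalty(latencies_s, baseline_s, slow_weight=4.0, fast_weight=1.0):
    """Mean penalty per request: each second slower than baseline costs
    slow_weight, each second faster costs fast_weight (weights hypothetical)."""
    total = 0.0
    for l in latencies_s:
        d = l - baseline_s
        total += slow_weight * d if d > 0 else fast_weight * -d
    return total / len(latencies_s)


window = [0.10, 0.14, 0.30]
print(window_average(window),
      deviation_from_baseline(window, 0.14),
      asymmetric_penalty(window, 0.14))
```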

17
Response Time with Load Shedding Policy
[Figure: response time (sec) vs. time across FAILURE and REPAIR, with the abandonment threshold (8 s) and a load-shedding threshold marked]
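
A load-shedding policy like the one in the figure can be sketched as an admission check that refuses requests whose estimated response time would cross the shedding threshold. This assumes a single FIFO queue with a known mean service time and a shedding threshold placed below the 8 s abandonment threshold; the 5 s value and the admit function are hypothetical.

```python
ABANDONMENT_THRESHOLD_S = 8.0  # from the slide
SHED_THRESHOLD_S = 5.0         # hypothetical; placed below the abandonment threshold


def admit(queue_length, mean_service_time_s, shed_threshold_s=SHED_THRESHOLD_S):
    """Decide whether to queue a request or shed it.

    Assumes a single FIFO queue served one request at a time, so the new
    request's estimated response time is (queue_length + 1) * service time.
    Returns True to admit, False to reply "Server too busy" instead.
    """
    estimated_response_s = (queue_length + 1) * mean_service_time_s
    return estimated_response_s <= shed_threshold_s


print(admit(queue_length=10, mean_service_time_s=0.4))  # estimated 4.4 s
print(admit(queue_length=20, mean_service_time_s=0.4))  # estimated 8.4 s
```

Shedding early trades a certain zero-data-quality reply for a likely abandonment, which is the trade-off a demerit system would need to price.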
18
Load Shedding Issues

Load shedding means returning zero data quality: a different kind of performability metric
To combine load shedding and latency, define a demerit system
Such systems quickly lose generality, however

- "Server too busy" msg: 3 demerits
- > 8 sec response time: 1 demerit/sec
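
The example demerit values can be applied to a request log as follows. This sketch assumes the 1 demerit/sec charge applies only to the seconds beyond the 8 s threshold (the slide does not say whether it counts the whole wait), and the (outcome, response_time) log format is hypothetical.

```python
TOO_BUSY_DEMERITS = 3.0         # "Server too busy" message
ABANDONMENT_THRESHOLD_S = 8.0   # slow-response threshold
DEMERITS_PER_SLOW_SECOND = 1.0  # charged per second beyond the threshold (assumption)


def demerits(outcome, response_time_s=0.0):
    """outcome is "shed" for a "Server too busy" reply, otherwise "served"."""
    if outcome == "shed":
        return TOO_BUSY_DEMERITS
    excess_s = max(0.0, response_time_s - ABANDONMENT_THRESHOLD_S)
    return DEMERITS_PER_SLOW_SECOND * excess_s


def total_demerits(requests):
    """Sum demerits over (outcome, response_time_s) pairs."""
    return sum(demerits(outcome, t) for outcome, t in requests)


log = [("served", 0.2), ("served", 10.0), ("shed", 0.0), ("served", 8.0)]
print(total_demerits(log))
```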
19
Further Work

Collect more experimental results!
Compare throughput- and latency-based results of the normal and high-availability versions of PRESS
Evaluate the usefulness of demerit systems for describing the user experience (latency and data quality)
