Transcript and Presenter's Notes

Title: Evaluating Undo: Human-Aware Recovery Benchmarks

1
Evaluating Undo Human-Aware Recovery Benchmarks
  • Aaron Brown
  • with Leonard Chung, Calvin Ling, and William Kakes
  • January 2004 ROC Retreat

2
Recap: ROC Undo
  • We have developed and built a ROC Undo Tool
  • a recovery tool for human operators
  • lets operators take a system back in time to undo
    damage, while preserving end-user work
  • We have evaluated its feasibility via performance
    and overhead benchmarks
  • Now we must answer the key question
  • does Undo-based recovery improve dependability?

3
Approach: Recovery Benchmarks
  • Recovery benchmarks measure the dependability
    impact of recovery
  • behavior of system during recovery period
  • speed of recovery

4
What About the People?
  • Existing recovery/dependability benchmarks ignore
    the human operator
  • inappropriate for undo, where human drives
    recovery
  • To measure Undo, we need benchmarks that capture
    human-driven recovery
  • by including people in the benchmarking process

5
Outline
  • Introduction
  • Methodology
  • overview
  • faultload development
  • managing human subjects
  • Evaluation of Undo
  • Discussion and conclusions

6
Methodology
  • Combine traditional recovery benchmarks with
    human user studies
  • apply workload and faultload
  • measure system behavior during recovery from
    faults
  • run multiple trials with a pool of human subjects
    acting as system operators
  • Benchmark measures system, not humans
  • indirectly captures human aspects of recovery
  • quality of situational awareness, applicability
    of tools, usability/error-proneness of recovery
    procedures
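
A minimal sketch of this benchmark loop, assuming a Python harness; the system/scenario objects and their methods (start_workload, inject, sample_metrics, recovered) are illustrative placeholders, not part of the ROC tool:

    import random
    import time

    def run_session(system, scenario, sample_period_s=10):
        """One session: apply workload, inject a fault, then measure the
        system while the human operator recovers it by hand."""
        workload = system.start_workload()        # reuse a performance-benchmark workload
        scenario.inject(system)                   # fault drawn from the faultload
        started = time.time()
        samples, recovered_at = [], None
        while time.time() - started < 30 * 60:    # 30-minute recovery window (see protocol slide)
            samples.append(system.sample_metrics())   # performance, correctness, availability
            if system.recovered():
                recovered_at = time.time()
                break
            time.sleep(sample_period_s)
        workload.stop()
        return {"samples": samples, "started": started, "recovered_at": recovered_at}

    def run_benchmark(system, faultload, subjects):
        """Multiple trials with a pool of human subjects acting as operators;
        the benchmark scores the system, not the individual subjects."""
        return [run_session(system, random.choice(faultload)) for _ in subjects]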

7
Human-Aware Recovery Benchmarks
  • Key components
  • workload: reuse performance benchmark
  • faultload: survey plus cognitive walkthrough
  • metrics: performance, correctness, and
    availability
  • human operators handle non-self-healing recovery
    tasks/tools

8
Developing the Faultload
  • ROC approach combines surveys and cognitive
    walkthrough
  • surveys to establish common failure modes,
    symptoms, and error-prone administrative tasks
  • domain-specific, system-independent
  • cognitive walkthrough to translate to
    system-specific faultload
  • Faultload specifies generic errors and events
  • provides system-independence, broader
    applicability
  • cognitive walkthrough maps to system-specific
    faults

9
Example E-mail Service Faultload
  • Web-based survey of e-mail admins
  • core questions
  • Describe any incidents in the past 3 months
    where data was lost or the service was
    unavailable.
  • Describe any administrative tasks you performed
    in the past 3 months that were particularly
    challenging.
  • cost: 4 x $50 gift certificates to amazon.com
  • raffled off as incentive for participation
  • response: 68 respondents from SAGE mailing list

10
E-mail Survey Results
  • Results
  • results dominated by
  • configuration errors (e.g., mail filters)
  • botched software/platform upgrades
  • hardware/environmental failures
  • Undo potentially useful for majority of problems

11
From Survey to Faultload
  • Cognitive walkthrough example: SW upgrade
  • platform: sendmail on Linux
  • task: upgrade from sendmail-8.12.9 to
    sendmail-8.12.10
  • approach
  • 1. configure/locate existing sendmail-linux
    system
  • 2. clone system to test machine (or use virtual
    machine)
  • 3. attempt upgrade, identifying possible failure
    points
  • benchmarker must understand system to do this
  • 4. simulate failures and select those that match
    symptom report from task survey
  • sample result: simulate a failed upgrade that
    disables spam filtering by omitting the -DMILTER
    compile-time flag
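
One way to encode the walkthrough's output, continuing the hypothetical Python harness above; the FaultScenario fields and the install_binary hook are illustrative, not the actual injection mechanism used in the study:

    from dataclasses import dataclass
    from typing import Callable

    @dataclass
    class FaultScenario:
        """Generic, system-independent error/event from the survey, paired with
        the system-specific injection derived by cognitive walkthrough."""
        category: str        # survey category, e.g. "botched upgrade"
        symptom: str         # what the operator observes
        inject: Callable     # system-specific fault injection

    failed_upgrade = FaultScenario(
        category="botched software/platform upgrade",
        symptom="spam filtering silently disabled after the upgrade",
        # walkthrough result: swap in a sendmail binary built without -DMILTER
        inject=lambda system: system.install_binary("sendmail-no-milter"),
    )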

12
Human-Aware Recovery Benchmarks
  • Key components
  • workload: reuse performance benchmark
  • faultload: survey plus cognitive walkthrough
  • metrics: performance, correctness, and
    availability
  • human operators handle non-self-healing recovery
    tasks/tools

13
Human Subject Protocol
  • Benchmarks structured as human trials
  • Protocol
  • human subject plays the role of system operator
  • subjects complete multiple sessions
  • in each session
  • apply workload to test system
  • select random scenario and simulate problem
  • give human subject 30 minutes to complete recovery
  • Results reflect statistical average across
    subjects
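
A sketch of how raw measurements might be reduced to per-session metrics and then averaged across the subject pool (field names follow the hypothetical harness sketched earlier):

    from statistics import mean

    def summarize_session(session):
        """Reduce one session's samples to the benchmark metrics."""
        samples = session["samples"]
        recovered_at = session["recovered_at"]
        return {
            "availability": mean(s["available"] for s in samples),   # fraction of probes served
            "correctness": samples[-1]["correct_fraction"],          # e.g. mail handled correctly
            "recovery_time": (recovered_at - session["started"]) if recovered_at else None,
        }

    def benchmark_result(sessions):
        """Benchmark score: statistical average across human subjects."""
        summaries = [summarize_session(s) for s in sessions]
        result = {}
        for metric in ("availability", "correctness", "recovery_time"):
            values = [s[metric] for s in summaries if s[metric] is not None]
            result[metric] = mean(values) if values else None
        return result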

14
The Variability Challenge
  • Must control human variability to get
    reproducible, meaningful results
  • Techniques
  • subject pool selection
  • screening
  • training
  • self-comparison
  • each subject faces same recovery scenario on all
    systems
  • system's score determined by fraction of subjects
    with better recovery behavior
  • powerful, but only works for comparison benchmarks
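
A sketch of the self-comparison score, assuming each subject's sessions have already been reduced to one recovery score per system (higher is better); the comparison rule is an illustrative reading of the scheme, and the numbers are made up:

    def self_comparison_score(per_subject_scores, system, baseline):
        """Fraction of subjects whose recovery behavior was better on
        `system` than on `baseline` (same subject, same scenario)."""
        better = sum(1 for subject in per_subject_scores
                     if subject[system] > subject[baseline])
        return better / len(per_subject_scores)

    # Example: three subjects' correctness with and without Undo (made-up numbers)
    scores = [
        {"undo": 0.95, "no_undo": 0.60},
        {"undo": 0.90, "no_undo": 0.90},
        {"undo": 0.85, "no_undo": 0.70},
    ]
    print(self_comparison_score(scores, "undo", "no_undo"))   # 2 of 3 subjects did better with Undo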

15
Outline
  • Introduction
  • Methodology
  • Evaluation of Undo
  • setup
  • per-subject results
  • aggregate results
  • Discussion and conclusions

16
Evaluating Undo: Setup
  • Faultload scenarios
  • 1. SPAM filter configuration error
  • 2. failed e-mail server upgrade
  • 3. simple software crash (undo not useful here)
  • Subject pool (after screening)
  • 12 UCB Computer Science graduate students
  • Self-comparison protocol
  • each subject given same scenario in each of 2
    sessions
  • undo available in first session only
  • imposes learning bias against undo, but lowers
    variability

17
Sample Single User Result
(Charts: Without Undo vs. With Undo)
  • Undo significantly improves correctness
  • with some (partially-avoidable) availability cost

18
Overall Evaluation
(Chart: sessions where Undo was used)
  • Undo significantly improves correctness
  • and reduces variance across operators
  • statistically-justified, p-value 0.045
  • Undo hurts IMAP availability
  • several possible workarounds exist
  • Overall, Undo has a positive impact on
    dependability
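
The slides do not name the statistical test behind the p-value; as an illustration only, a paired test over per-subject correctness with and without Undo could be run as below (SciPy's Wilcoxon signed-rank test is a common choice for small paired samples; the numbers are made up):

    from scipy.stats import wilcoxon

    # One pair per subject from the self-comparison protocol (made-up numbers).
    with_undo    = [0.98, 0.95, 0.97, 0.99, 0.96, 0.94, 0.98, 0.97, 0.95]
    without_undo = [0.70, 0.85, 0.60, 0.90, 0.75, 0.80, 0.65, 0.88, 0.72]

    stat, p_value = wilcoxon(with_undo, without_undo)
    print(f"p = {p_value:.3f}")   # reject "no difference" at the 0.05 level if p < 0.05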

19
Outline
  • Introduction
  • Methodology
  • Evaluation of Undo
  • Discussion and conclusions

20
Discussion
  • Undo-based recovery improves dependability
  • reduces incorrectly-handled mail in common
    failure cases
  • More can still be done
  • tweaks to Undo implementation will reduce
    availability impact
  • Benchmark methodology is effective at controlling
    human variability
  • self-comparison protocol gives
    statistically-justified results with 9 subjects
    (vs 15 for random design)

21
Future Directions: Controlling Cost
  • Human subject experiments are still costly
  • recruiting and compensating participants
  • extra time spent on training, multiple benchmark
    runs
  • extra demands on benchmark infrastructure
  • less than a user study, more than a perf.
    benchmark
  • A necessary price to pay!
  • Techniques for cost reduction
  • best-case results using best-of-breed operator
  • remote web-based participation
  • avoid human trials: extended cognitive walkthrough

22
Evaluating Undo: Human-Aware Recovery Benchmarks
  • For more info
  • abrown@cs.berkeley.edu
  • http://roc.cs.berkeley.edu/
  • paper:
  • A. Brown, L. Chung et al. Dependability
    Benchmarking of Human-Assisted Recovery
    Processes. Submitted to DSN 2004, June 2004.

23
Backup Slides
24
Example E-mail Service Faultload
  • Results of e-mail task survey

(Chart: e-mail survey reports by category)
  • Configuration problems (25)
  • Hardware/Envt (17)
  • Upgrade-related (17)
  • Lost E-mail (12 reports)
  • Unknown (8)
  • Software error (8)
  • External resource (8)
  • Operator error (8)
  • User error (8)
25
Full Summary Dataset