FIG: A Prototype Tool for On-Line Verification of Recovery Mechanisms

Slides: 22
Provided by: petebro


1
FIG: A Prototype Tool for On-Line Verification of Recovery Mechanisms
  • Naveen Sastry, Pete Broadwell, Jonathan Traupman,
    David Patterson
  • University of California, Berkeley

2
Presentation Outline
  • Introduction
  • Objective/Motivation
  • Background
  • Methods
  • Implementation
  • Test setup
  • Evaluation
  • Test results
  • Conclusions

3
The Berkeley/Stanford ROC Project
  • Purpose: investigating novel techniques for
    building highly-dependable Internet services
  • Example techniques:
      • Advanced support for operator undo
      • Stability through targeted restarts
      • Integrated root cause analysis
      • Online verification of recovery mechanisms

4
FIG Project: Objective/Motivation
  • Objective
      • Develop a lightweight, extensible tool for
        injecting errors to test recovery code/mechanisms
  • Motivation
      • Testing and production environments are always
        different
      • Large systems will require recovery code, which
        should be tested as part of normal operation

5
Software's Invisible Users
[Figure: the layers an application interacts with: user input, user interface, application, other libraries, other apps, system libraries (libc), OS]
Concept: Jim Whittaker, Florida Institute of Technology
6
Related Testing Methods
  • Ballista (DeVale, Koopman, Siewiorek)
      • Top-down testing of POSIX-compliant OS and
        library interfaces
  • Fuzz (Miller, Fredriksen, So)
      • Tested UNIX applications by feeding them random
        input streams
  • Holodeck (Whittaker et al.)
      • Similar approach to ours, but only for Windows
        2000/XP

7
FIG Implementation
  • Thin stub library between the app and its libraries
      • Traps API calls
      • Logs them
      • Inserts faults
  • Can be inserted into any app without modification
      • Uses LD_PRELOAD

[Figure: software stack, top to bottom: Application; libfig.so; libc.so and other libs; OS]
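The trap/log/inject behaviour described on this slide can be illustrated with a small Python analogy. FIG itself is a shared library (libfig.so) loaded through LD_PRELOAD; the sketch below is not FIG's code, and the names `interpose` and `should_fail` are hypothetical. It only shows the shape of a stub that sits between a caller and the real function.

```python
import errno
import logging
import os
from functools import wraps

log = logging.getLogger("fig_sketch")  # hypothetical logger name


def interpose(func, should_fail, err=errno.EIO):
    """Return a stub that logs each call to func and optionally fails it.

    should_fail(n) is asked, for the n-th call (1-based), whether to
    inject a fault instead of calling through to the real function.
    """
    calls = {"n": 0}

    @wraps(func)
    def stub(*args, **kwargs):
        calls["n"] += 1
        log.info("call %d: %s%r", calls["n"], func.__name__, args)
        if should_fail(calls["n"]):
            # Injected fault: the real function is never reached.
            raise OSError(err, os.strerror(err))
        return func(*args, **kwargs)

    return stub
```

For example, `interpose(os.read, lambda n: n == 3)` returns a wrapper whose third call raises an EIO error while every other call reaches the real `os.read`. The caller's code is unchanged, which is the property LD_PRELOAD gives FIG at the shared-library level.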
8
Extensibility
  • API stubs are automatically generated
  • Very easy to add new APIs to log
  • Fault injection is under script control
  • Can simulate multiple fault models (e.g., memory
    pressure)

Sample control file:
    MALLOC_INDEX
      interval 82 to infinity return 0 errno ENOMEM probability 0.03
    OPEN_INDEX
      // device out of space.
      interval 100 to infinity return -1 errno ENOSPC probability 0.001
      // kernel out of memory.
      interval 100 to 120 return -1 errno ENOMEM probability 0.1
      // too many files open.
      callnumber 108 return -1 errno EMFILE probability 1.0
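One way to read the control file is as a list of rules, each covering a range of call numbers and firing with a given probability. The sketch below assumes one rule per line in the form `interval A to B return R errno E probability P` or `callnumber N return R errno E probability P`. That grammar is inferred from the sample above, and `Rule`, `parse_rule` and `pick_fault` are hypothetical names, not part of FIG.

```python
import random
from dataclasses import dataclass


@dataclass
class Rule:
    first: int          # first call number the rule covers
    last: float         # last call number (float("inf") for "infinity")
    retval: int         # value the stub returns instead of calling through
    err: str            # symbolic errno name, e.g. "ENOMEM"
    probability: float  # chance of firing on a covered call


def parse_rule(line):
    """Parse a single 'interval ...' or 'callnumber ...' rule line."""
    tok = line.split()
    if tok[0] == "interval":
        first = int(tok[1])
        last = float("inf") if tok[3] == "infinity" else int(tok[3])
        rest = tok[4:]
    elif tok[0] == "callnumber":
        first = last = int(tok[1])
        rest = tok[2:]
    else:
        raise ValueError(f"unknown rule: {line!r}")
    # The remainder is a sequence of 'keyword value' pairs.
    fields = dict(zip(rest[0::2], rest[1::2]))
    return Rule(first, last, int(fields["return"]), fields["errno"],
                float(fields["probability"]))


def pick_fault(rules, call_number, rng=random):
    """Return the first rule that fires for this call, or None."""
    for rule in rules:
        in_range = rule.first <= call_number <= rule.last
        if in_range and rng.random() < rule.probability:
            return rule
    return None
```

Under this reading, the `callnumber 108` rule fails exactly the 108th call to open(), while the `interval 100 to infinity` rule fails roughly one in a thousand calls from the 100th onward.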

9
Test Setup: Applications
  • GNU file utilities (ls, mv, etc.)
  • Emacs 20.7.1 with and without X
  • Apache 1.3.22
  • Berkeley DB 4.0.14
  • Netscape Navigator 4.76
  • MySQL server 3.23.36

10
Test Setup: Instrumented Calls and Their Errors
  • malloc(): memory exhaustion
  • read(): I/O error, system call was
    interrupted
  • write(): I/O error, no space left on
    device, call interrupted
  • open(): memory exhaustion, no space
    on device, too many files open
  • select(): memory exhaustion

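The errors listed above correspond to standard errno codes, which also appear in the control file and the results tables (ENOMEM, EIO, EINTR, ENOSPC, EMFILE). As a rough illustration, the catalog can be written down as a Python table; `FAULTS` and `describe` are hypothetical names used only for this sketch.

```python
import errno
import os

# Faults listed on the slide, keyed by instrumented call.
FAULTS = {
    "malloc": [errno.ENOMEM],
    "read": [errno.EIO, errno.EINTR],
    "write": [errno.EIO, errno.ENOSPC, errno.EINTR],
    "open": [errno.ENOMEM, errno.ENOSPC, errno.EMFILE],
    "select": [errno.ENOMEM],
}


def describe(call):
    """Return 'NAME: message' strings for the faults injected into a call."""
    return [f"{errno.errorcode[code]}: {os.strerror(code)}"
            for code in FAULTS[call]]
```

Calling `describe("open")` lists the three open() faults with the platform's own error messages, which is what an application's error-handling path would see.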
11
Test Results: Client Apps
              read()  read()  write()  write()  select()    malloc()
              EINTR   EIO     ENOSPC   EIO      ENOMEM      ENOMEM
Emacs, no X   o.k.    exit    warn     warn     o.k.        crash
Emacs w/ X    o.k.    crash   o.k.     crash    crash/exit  crash
Netscape      warn    exit    exit     exit     n/a         exit
12
Test Results: Server Apps
                      read()      read()       write()     write()     select()  malloc()
                      EINTR       EIO          ENOSPC      EIO         ENOMEM    ENOMEM
Berkeley DB, Xact     retry       detect       Xact abort  Xact abort  n/a       Xact abort
Berkeley DB, no Xact  retry       detect       data loss   data loss   n/a       detect, or data loss
MySQL Server          Xact abort  retry, warn  Xact abort  Xact abort  retry     restart process
Apache                o.k.        req. drop    req. drop   req. drop   o.k.      n/a
13
Netscape Reacts
14
Test Results: Overhead
                         Time (s)  Overhead (%)
No FIG                   33.46     N/A
FIG, no logging          34.28     2.5
Logging w/o timestamps   47.83     42.9
Logging w/ timestamps    61.74     84.5
strace (all syscalls)    112.85    237.3
Timing: using Berkeley DB (non-transactional) to
read, sort and write one million words.
  • Note: FIG communicates with a separate logging
    daemon through shared memory to reduce logging
    overhead.
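The overhead figures appear to be the percentage slowdown relative to the "No FIG" run. A short check of that reading, using only the times from the table (the function name `overhead_pct` is ours):

```python
BASELINE_S = 33.46  # "No FIG" run time from the table

timings = {
    "FIG, no logging": 34.28,
    "Logging w/o timestamps": 47.83,
    "Logging w/timestamps": 61.74,
    "strace (all syscalls)": 112.85,
}


def overhead_pct(seconds, baseline=BASELINE_S):
    """Relative slowdown over the baseline, in percent."""
    return (seconds - baseline) / baseline * 100.0


for name, seconds in timings.items():
    print(f"{name}: {overhead_pct(seconds):.1f}%")
```

Rounded to one decimal place, these values match the overhead column: 2.5, 42.9, 84.5 and 237.3.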

15
Strategies for Reliable Services
  • Intelligent retry
      • ls: bounded retry of malloc()
  • Resource preallocation
      • Apache: allocates buffer pool at startup
  • Degraded service
      • Apache: deactivates logging if disk full
  • Process pools
      • Apache and MySQL

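The "intelligent retry" strategy can be sketched as a bounded retry loop: retry only errors that are plausibly transient, and give up after a fixed number of attempts. This is a minimal Python illustration, not the code used by ls; `bounded_retry` and its parameters are hypothetical.

```python
import errno
import time


def bounded_retry(op, attempts=3, delay_s=0.0,
                  retry_on=(errno.ENOMEM, errno.EINTR)):
    """Call op() up to `attempts` times, retrying only on transient errnos."""
    for attempt in range(1, attempts + 1):
        try:
            return op()
        except OSError as exc:
            if exc.errno not in retry_on or attempt == attempts:
                raise  # permanent error, or retry budget exhausted
            time.sleep(delay_s)  # optional pause before the next attempt
```

The bound matters: an unbounded retry on ENOMEM would hang under sustained memory pressure, whereas a bounded one eventually surfaces the error to the caller.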
16
FIG as a Prototype for Online Error Injection
  • Low run-time overhead
  • Easy to enable/disable
  • Easy to configure
  • Extensible
  • Can simulate multiple fault models

17
A Case for Online Error Injection
  • Recovery code is not usually exercised during
    normal operation
  • Deployed environments tend to differ from testing
    environments
  • Can run error injection tests on a subset of
    deployed systems
  • FIG can simulate common environmental errors

18
Conclusions
  • FIG exposed a variety of deficiencies in how our
    test applications handled environmental errors
  • Server apps are generally more robust than client
    applications
  • FIG exhibits low overhead
  • FIG is suitable for online error injection

20
Future Directions
  • Limitations of FIG
      • Only for UNIX-like OSes
      • Limited to app/library interface (proxy for
        app/OS interaction)
  • Make FIG part of a larger test suite
  • Include clock-time and event-based error triggers
  • Greater flexibility in configuration file

21
Other Related Work
  • Xept (Vo et al.)
      • Instruments object code to ensure that error
        handling code exists
  • Processor memory errors
      • DOCTOR, HYBRID, DEFINE
  • Process memory corruption
      • FERRARI, DEFINE