FIG: A Prototype Tool for On-Line Verification of Recovery Mechanisms


Slides: 22

Provided by: petebro

Learn more at: http://roc.cs.berkeley.edu


Transcript and Presenter's Notes


1
FIG: A Prototype Tool for On-Line Verification of
Recovery Mechanisms

Naveen Sastry, Pete Broadwell, Jonathan Traupman,
David Patterson
University of California, Berkeley

2
Presentation Outline

Introduction
Objective/Motivation
Background
Methods
Implementation
Test setup
Evaluation
Test results
Conclusions

3
The Berkeley/Stanford ROC Project

Purpose: investigating novel techniques for
building highly dependable Internet services
Example techniques:
Advanced support for operator undo
Stability through targeted restarts
Integrated root cause analysis
Online verification of recovery mechanisms

4
FIG Project Objective/Motivation

Objective
Develop a lightweight, extensible tool for
injecting errors to test recovery code/mechanisms
Motivation
Testing and production environments are always
different
Large systems will require recovery code, which
should be tested as part of normal operation

5
Software's Invisible Users
User Input
User interface
Application
Other libraries
Other apps
System libraries (libc)
OS
Concept: Jim Whittaker, Florida Institute of
Technology

6
Related Testing Methods

Ballista (DeVale, Koopman, Siewiorek)
Top-down testing of POSIX-compliant OS and
library interfaces
Fuzz (Miller, Fredriksen, So)
Tested UNIX applications by feeding them random
input streams
Holodeck (Whittaker et al.)
Similar approach to ours, but only for Windows
2000/XP

7
FIG Implementation

Thin stub library between app and libraries
Traps API calls
Logs them
Inserts faults
Can be inserted into any app without modification
Uses LD_PRELOAD

Application
libfig.so
libc.so, other libs
OS
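
As a rough illustration of the LD_PRELOAD approach above, the sketch below builds an environment that places an interposition library ahead of the normal libraries when launching an unmodified program. The helper name `preload_env` and the path `/path/to/libfig.so` are hypothetical; the slides do not show how FIG itself is launched.

```python
import os


def preload_env(lib_path, base_env=None):
    """Return a copy of the environment with lib_path first in LD_PRELOAD.

    lib_path is a hypothetical location for the stub library; adjust it to
    wherever the shared object actually lives.
    """
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_PRELOAD", "")
    # The dynamic linker accepts a colon- or space-separated list of
    # libraries, so any entries that are already set are kept after ours.
    env["LD_PRELOAD"] = lib_path if not existing else lib_path + ":" + existing
    return env


# Usage (requires a dynamic linker that honors LD_PRELOAD):
#   import subprocess
#   subprocess.run(["ls", "-l"], env=preload_env("/path/to/libfig.so"))
```

Because the library is named only in the environment, the target binary needs no recompilation or relinking, which matches the "without modification" point on the slide.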
8
Extensibility

API stubs are automatically generated
Very easy to add new APIs to log
Fault injection is under script control
Can simulate multiple fault models (e.g., memory
pressure)

Sample control file:

MALLOC_INDEX
interval 82 to infinity return 0 errno ENOMEM probability 0.03
OPEN_INDEX
// device out of space.
interval 100 to infinity return -1 errno ENOSPC probability 0.001
// kernel out of memory.
interval 100 to 120 return -1 errno ENOMEM probability 0.1
// too many files open.
callnumber 108 return -1 errno EMFILE probability 1.0
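
One way to read the control file is as a list of rules per call: each rule covers a range of call numbers and, when it fires, supplies a return value and an errno. The sketch below models the open() rules under that reading. The `Rule` class, the `pick_fault` function, the first-match ordering, and the use of -1 as the failure return value for open() are assumptions made for illustration, not FIG's actual data structures or semantics.

```python
import errno
import math
import random
from dataclasses import dataclass


@dataclass
class Rule:
    first_call: int      # first call number covered (inclusive)
    last_call: float     # last call number covered; math.inf for "to infinity"
    return_value: int    # value the stub would hand back to the application
    errno_name: str      # symbolic errno, e.g. "ENOSPC"
    probability: float   # chance that the rule fires on a covered call


def pick_fault(rules, call_number, rng):
    """Return (return_value, errno) for the first rule that fires, else None."""
    for rule in rules:
        in_range = rule.first_call <= call_number <= rule.last_call
        if in_range and rng.random() < rule.probability:
            return rule.return_value, getattr(errno, rule.errno_name)
    return None


# Assumed encoding of the OPEN_INDEX rules from the sample control file.
OPEN_RULES = [
    Rule(100, math.inf, -1, "ENOSPC", 0.001),  # device out of space
    Rule(100, 120, -1, "ENOMEM", 0.1),         # kernel out of memory
    Rule(108, 108, -1, "EMFILE", 1.0),         # too many files open
]

if __name__ == "__main__":
    rng = random.Random(0)
    for n in (50, 108, 500):
        print(n, pick_fault(OPEN_RULES, n, rng))
```

A rule with probability 1.0 on a single call number makes a fault deterministic, while the interval rules model intermittent failures over a longer run.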

9
Test Setup: Applications

GNU file utilities (ls, mv, etc.)
Emacs 20.7.1 with and without X
Apache 1.3.22
Berkeley DB 4.0.14
Netscape Navigator 4.76
MySQL server 3.23.36

10
Test Setup: Instrumented Calls and Their Errors

malloc(): memory exhaustion
read(): I/O error, system call was interrupted
write(): I/O error, no space left on device, call interrupted
open(): memory exhaustion, no space on device, too many files open
select(): memory exhaustion

11
Test Results: Client Apps

              read()  read()  write()  write()  select()    malloc()
              EINTR   EIO     ENOSPC   EIO      ENOMEM      ENOMEM
Emacs, no X   o.k.    exit    warn     warn     o.k.        crash
Emacs, w/ X   o.k.    crash   o.k.     crash    crash/exit  crash
Netscape      warn    exit    exit     exit     n/a         exit

12
Test Results: Server Apps

                      read()      read()       write()     write()     select()  malloc()
                      EINTR       EIO          ENOSPC      EIO         ENOMEM    ENOMEM
Berkeley DB, Xact     retry       detect       Xact abort  Xact abort  n/a       Xact abort
Berkeley DB, no Xact  retry       detect       data loss   data loss   n/a       detect, or data loss
MySQL Server          Xact abort  retry, warn  Xact abort  Xact abort  retry     restart process
Apache                o.k.        req. drop    req. drop   req. drop   o.k.      n/a

13
Netscape Reacts

14
Test Results: Overhead

                         Time (s)   Overhead (%)
No FIG                   33.46      N/A
FIG, no logging          34.28      2.5
Logging w/o timestamps   47.83      42.9
Logging w/ timestamps    61.74      84.5
strace (all syscalls)    112.85     237.3
Timing using Berkeley DB (non-transactional) to
read, sort and write one million words.

Note: FIG communicates with a separate logging
daemon through shared memory to reduce logging
overhead.
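
The overhead figures are consistent with the relative slowdown against the "No FIG" baseline, expressed as a percentage. The short check below recomputes them from the reported times; the function and variable names are illustrative only.

```python
BASELINE_S = 33.46  # "No FIG" run time in seconds, from the slide

# Reported run times in seconds for each configuration.
RUNS = {
    "FIG, no logging": 34.28,
    "Logging w/o timestamps": 47.83,
    "Logging w/timestamps": 61.74,
    "strace (all syscalls)": 112.85,
}


def overhead_percent(seconds, baseline=BASELINE_S):
    """Relative slowdown versus the baseline, in percent."""
    return (seconds - baseline) / baseline * 100.0


if __name__ == "__main__":
    for name, seconds in RUNS.items():
        print(f"{name}: {overhead_percent(seconds):.1f}%")
```

For example, (34.28 - 33.46) / 33.46 is about 0.0245, which rounds to the 2.5% reported for FIG without logging.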

15
Strategies for Reliable Services

Intelligent retry
ls: bounded retry of malloc()
Resource preallocation
Apache: allocates buffer pool at startup
Degraded service
Apache: deactivates logging if disk full
Process pools
Apache and MySQL
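
The "intelligent retry" strategy can be sketched as a retry loop with a fixed budget, so that a transient failure is absorbed but a persistent one is still reported. This is a generic illustration in Python, not the code ls uses; `bounded_retry` and its parameters are hypothetical names, and `MemoryError` stands in for a failed malloc().

```python
import time


def bounded_retry(operation, attempts=3, delay_s=0.0, retry_on=(MemoryError,)):
    """Call operation() up to `attempts` times, re-raising the last failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise  # budget exhausted: surface the error to the caller
            time.sleep(delay_s)  # optional pause before the next attempt


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky_allocation():
        # Fails twice, then succeeds, to mimic transient memory pressure.
        state["calls"] += 1
        if state["calls"] < 3:
            raise MemoryError("simulated allocation failure")
        return bytearray(16)

    buffer = bounded_retry(flaky_allocation, attempts=3)
    print(len(buffer), state["calls"])
```

Bounding the number of attempts matters: an unbounded loop would hang forever under sustained memory exhaustion instead of failing visibly.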

16
FIG as a Prototype for Online Error Injection

Low run-time overhead
Easy to enable/disable
Easy to configure
Extensible
Can simulate multiple fault models

17
A Case for Online Error Injection

Recovery code is not usually exercised during
normal operation
Deployed environments tend to differ from testing
environments
Can run error injection tests on a subset of
deployed systems
FIG can simulate common environmental errors

18
Conclusions

FIG exposed a variety of deficiencies in how our
test applications handled environmental errors
Server apps are generally more robust than client
applications
FIG exhibits low overhead
FIG is suitable for online error injection

20
Future Directions