Title: Configuring Debugging as Search: Finding the Needle in the Haystack
- Andrew Whitaker, Richard S. Cox, and Steven D. Gribble
- University of Washington
- Presented by Aditya Y.S.V.
2. What does the paper talk about?
- This work addresses the problem of diagnosing configuration errors that cause a system to function incorrectly.
- The basic idea is to search for the time when the system transitioned to a failed state.
- The paper presents a tool, Chronus, that automates this search.
3. Motivation
[Figure: total ownership cost breakdown in the 1970s vs. the 2000s, labeled "hardware costs" and "people costs" respectively]
4. Existing Approaches
- Prevention: has never been known to work for anything.
- Recovery: e.g., Windows XP System Restore. The problem is that a restore is itself a state transition, so it isn't always safe.
- Expert systems: a static database of known erroneous configurations, from which correction can be automated. A better example than the one given in the paper is an intrusion detection system.
5. The Basic Approach
[Figure: a system failure, followed by the question "Why?"]
6. System Overview
- Chronus reveals when a system failed.
- Chronus proactively logs system states.
[Figure: timeline split into the period when the system was working and the later period when it was NOT working]
7. Problem Formulation
[Figure: timeline of system states]
- Requirements: time travel, testing, search
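The search problem can be written down as follows. This assumes (the slide does not state it) that the probe result P changes only once, from pass to fail, within the searched range. The symbols are introduced here for illustration.

```latex
\begin{aligned}
&\text{Recorded instants: } t_0 < t_1 < \dots < t_N, \qquad P(t_i) \in \{\text{pass}, \text{fail}\} \\
&\text{Given } P(t_0) = \text{pass} \text{ and } P(t_N) = \text{fail}, \text{ find } k \in \{1, \dots, N\} \text{ with } P(t_{k-1}) = \text{pass},\ P(t_k) = \text{fail} \\
&\text{Binary search needs at most } \lceil \log_2 N \rceil \text{ probes; e.g. } N = 10^6 \Rightarrow \lceil \log_2 10^6 \rceil = 20
\end{aligned}
```

The logarithm is what makes fine-grained logging affordable. Recording every block write multiplies the number of instants, but it adds only a few probes to the search.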
8. System Overview
Design components and design choices:
- Time travel: time-travel disks, virtual machines
- Testing: software probes, copy-on-write disks
- Search: binary search
9. Time Travel
- Persistent vs. transient state captures
- Chronus captures only persistent storage.
- Application-layer restarts are not useful when configuration outside the application (e.g., in the OS) also affects how it works.
10. Storage Layer Trade-off
[Figure: storage layers from top to bottom: RDBMS, CVS, file system, disk; higher layers offer more semantics, lower layers more completeness]
11. Time-Travel Disk Overhead
12. Virtual Machines
- Each state is checked by a virtual reboot of the system.
- A virtual reboot is faster than a physical reboot.
- VMs provide a good way to terminate failed tests.
- VMs could potentially check more than one state at a time (the paper does not do this).
13. Disadvantages of VMs
- Performance overhead
- May not be able to expose the latest devices and device drivers
- Cannot diagnose errors within the virtualization layer itself, such as updates to a physical device driver
14. Testing
- Automated diagnosis uses a user-supplied software probe.
- A manual probe is available if all you remember is a series of GUI actions.
- Some errors are non-deterministic and cannot be reproduced reliably.
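Copy-on-write disks are listed among the testing design choices. They let a probe run against a past state without altering the recorded history. The following is a minimal sketch that assumes a dictionary of block contents stands in for the snapshot. `CopyOnWriteDisk` is a hypothetical name, not Chronus's interface.

```python
from typing import Dict


class CopyOnWriteDisk:
    """Hypothetical block-device view used for a single probe run.

    Reads fall through to a read-only base snapshot unless the block was
    written during this test. Writes go to a private overlay that is thrown
    away afterwards, so the recorded history is never modified.
    """

    def __init__(self, base: Dict[int, bytes], block_size: int = 4):
        self.base = base                      # snapshot: block number -> contents
        self.overlay: Dict[int, bytes] = {}   # blocks written during this test
        self.block_size = block_size

    def read(self, block: int) -> bytes:
        if block in self.overlay:
            return self.overlay[block]
        # Blocks never written anywhere read as zeros.
        return self.base.get(block, bytes(self.block_size))

    def write(self, block: int, data: bytes) -> None:
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block long")
        self.overlay[block] = data            # self.base is never touched

    def discard(self) -> None:
        """Drop all test writes, e.g. after a probe finishes or is killed."""
        self.overlay.clear()


if __name__ == "__main__":
    snapshot = {0: b"conf"}
    disk = CopyOnWriteDisk(snapshot)
    disk.write(0, b"test")
    print(disk.read(0))   # b'test'
    disk.discard()
    print(disk.read(0))   # b'conf'
```

Because the overlay is discarded after each test, a probe that itself corrupts configuration cannot affect the result for the next instant examined.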
15. Search
- Binary search
- Spurious errors
- Strategy to overcome spurious errors
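The slide names binary search and a strategy for spurious errors but does not spell the strategy out. The sketch below assumes one simple strategy: repeat the probe at each instant and take a majority vote. This is an illustrative choice, not necessarily what Chronus does. `find_transition`, `robust_probe`, and `make_flaky_probe` are hypothetical names.

```python
import itertools
from typing import Callable

Probe = Callable[[int], bool]   # True means "the system works at instant t"


def robust_probe(probe: Probe, t: int, trials: int = 3) -> bool:
    """Run the probe `trials` times at instant t and take a majority vote.

    Majority voting is an assumption of this sketch, used to outvote
    occasional spurious failures.
    """
    passes = sum(1 for _ in range(trials) if probe(t))
    return passes * 2 > trials


def find_transition(probe: Probe, t_begin: int, t_end: int, trials: int = 3) -> int:
    """Return the first instant in (t_begin, t_end] at which the probe fails.

    Assumes the system works at t_begin, is broken at t_end, and (ignoring
    spurious errors) changes from working to broken only once.
    """
    if t_end <= t_begin:
        raise ValueError("t_end must be later than t_begin")
    lo, hi = t_begin, t_end        # invariant: lo passes, hi fails
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if robust_probe(probe, mid, trials):
            lo = mid
        else:
            hi = mid
    return hi


def make_flaky_probe(broken_at: int, spurious_every: int = 5) -> Probe:
    """Simulated probe: broken from `broken_at` on, plus a spurious
    failure on every `spurious_every`-th call."""
    calls = itertools.count(1)

    def probe(t: int) -> bool:
        if next(calls) % spurious_every == 0:
            return False           # simulated spurious failure
        return t < broken_at

    return probe


if __name__ == "__main__":
    print(find_transition(make_flaky_probe(broken_at=700), 0, 1000))  # 700
```

With `trials=1`, a single spurious failure could send the search into the wrong half. Voting trades extra probes per instant for robustness, but the number of instants visited still grows only logarithmically.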
16. Phase 1: Normal Operation
[Figure: the child virtual machine sends disk requests to the parent virtual machine, which records them on the time-travel disk; both VMs run on the Denali virtual machine monitor]
- The child VM runs normal user programs.
- The parent VM records disk writes to a time-travel disk.
- Each block write represents an instant in time.
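To make "each block write represents an instant in time" concrete, here is a minimal in-memory sketch. It assumes instants are simply a global count of block writes. `TimeTravelDisk`, `write`, and `read_at` are hypothetical names. The slides do not describe how the real time-travel disk persists its log.

```python
import bisect
from typing import Dict, List, Tuple


class TimeTravelDisk:
    """Hypothetical append-only record of block writes; each write is a new instant."""

    def __init__(self, block_size: int = 4):
        self.block_size = block_size
        self.now = 0
        # Per block: parallel lists of write instants and contents,
        # both kept in increasing time order.
        self.history: Dict[int, Tuple[List[int], List[bytes]]] = {}

    def write(self, block: int, data: bytes) -> int:
        self.now += 1                        # every block write is a new instant
        times, values = self.history.setdefault(block, ([], []))
        times.append(self.now)
        values.append(data)
        return self.now

    def read_at(self, block: int, t: int) -> bytes:
        """Contents of `block` as of instant t (after the t-th write)."""
        times, values = self.history.get(block, ([], []))
        i = bisect.bisect_right(times, t) - 1   # last write at or before t
        return values[i] if i >= 0 else bytes(self.block_size)


if __name__ == "__main__":
    disk = TimeTravelDisk()
    t1 = disk.write(0, b"good")   # instant 1
    t2 = disk.write(0, b"bad!")   # instant 2
    print(disk.read_at(0, t1))    # b'good'
    print(disk.read_at(0, t2))    # b'bad!'
```

A sorted list of write instants per block lets `read_at` find the version in effect at any instant with one `bisect_right` call. Reconstructing a past state therefore costs one lookup per block rather than a replay of the whole log.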
17. Phase 2: Debug Mode
[Figure: the user issues the command "search Tbegin Tend" to the parent virtual machine, which runs on the Denali virtual machine monitor]
18. Why?
- Chronus only tells you WHEN a system failed, not WHY.
- Answering why requires other tools.
- The Unix diff utility is mentioned as one of them.
19. Case Study: Mozilla Web Browser
- Mozilla web browser on the NetBSD OS
- Methodology: install several extensions
- Symptom: Mozilla freezes on startup
- Fails to respond to user input
20. Debugging the Mozilla Hang
- Step 1: write a probe that tests the behavior
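The slides do not show the probe used for Mozilla. As an illustration, assume a probe can be any program that reports pass or fail. The hypothetical `probe` function below runs a command and treats a nonzero exit status or a timeout as failure, so a hang is caught by the timeout. For the browser, `command` would have to be a script (not shown) that starts Mozilla, checks that it responds, and then exits. Otherwise a normally running browser would also time out.

```python
import subprocess
import sys
from typing import Sequence


def probe(command: Sequence[str], timeout_s: float) -> bool:
    """Return True if `command` exits with status 0 within `timeout_s` seconds.

    A hang (the symptom in this case study) shows up as a timeout; a crash or
    error shows up as a nonzero exit status. Both count as "not working".
    """
    try:
        result = subprocess.run(
            command,
            timeout=timeout_s,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        return False   # subprocess.run kills the child before raising
    return result.returncode == 0


if __name__ == "__main__":
    # Stand-ins for a real browser check: one command finishes, one hangs.
    print(probe([sys.executable, "-c", "pass"], timeout_s=5))                         # True
    print(probe([sys.executable, "-c", "import time; time.sleep(10)"], timeout_s=1))  # False
```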
21. Mozilla Hang (continued)
- Step 2: invoke search over a time range
22. Mozilla Hang (continued)
- Step 3: compute the change
- attach time-travel-disk 173552 173553
- diff -r /before /after
  file /.mozilla/default/zc1irw5u.slt/chrome/chrome.rdf differs:
  <RDF:Description about="urn:mozilla:package:stockticker" ...
      c:author="Jeremy Gillick"
      c:authorURL="http://jgillick.nettripper.com/"
      c:description="Shows your favorite stocks in a customized ticker."
      c:displayName="StockTicker 0.4.2"
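The `attach` command and the /before and /after trees come from the slide. Presumably they are the two adjacent instants mounted side by side. The Python below is a hypothetical stand-in for only the file-level part of `diff -r`: it reports which files were added, removed, or changed, not the line-by-line output. `changed_files` is an illustrative name.

```python
import filecmp
import os
from typing import List, Set


def _files(root: str) -> Set[str]:
    """Relative paths of all non-directory entries under root."""
    found: Set[str] = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.add(os.path.relpath(os.path.join(dirpath, name), root))
    return found


def changed_files(before: str, after: str) -> List[str]:
    """Report files added, removed, or modified between two directory trees."""
    old, new = _files(before), _files(after)
    report = [f"only in before: {p}" for p in sorted(old - new)]
    report += [f"only in after: {p}" for p in sorted(new - old)]
    for p in sorted(old & new):
        # shallow=False compares contents, not just size and mtime.
        if not filecmp.cmp(os.path.join(before, p), os.path.join(after, p),
                           shallow=False):
            report.append(f"differs: {p}")
    return report
```

Run against the two trees from the slide, such a report would include the modified chrome.rdf. The changed lines themselves would still come from `diff`. Comparing with `shallow=False` avoids missing a change when two versions happen to share a size and timestamp.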