Title: Static Specification Analysis for Termination of Specification-Based Data Structure Repair
1Static Specification Analysis for Termination of
Specification-Based Data Structure Repair
- Brian Demsky
- Martin Rinard
- Laboratory for Computer Science
- Massachusetts Institute of Technology
2Motivation
Broken Data Structure
- Errors
- Missing elements
- Inappropriate sharing
- Dangling references
- Out of bounds array indices
- Inconsistent values
[Figure: example broken data structure; node labels shown: F 20 G 5, F 20 G 10, I 5, J 2]
3Goal
[Figure: a Broken Data Structure is passed through the Repair Algorithm to produce a Consistent Data Structure]
4Goal
[Figure: the Repair Algorithm takes a Broken Data Structure and Consistency Properties From Developer and produces a Consistent Data Structure]
5What Does Repair Algorithm Produce?
- Data structure that
- Satisfies consistency properties, and
- Heuristically close to broken data structure
- Not necessarily the same data structure as (hypothetical) correct program would produce
- But enough to keep program operating successfully
6Precursors
- Data structure repair has historically appeared in systems with extreme reliability goals
- 5ESS switch: hand-coded audit routines
- IBM MVS operating system: hand-coded failure recovery routines
- Key component of these systems
7Where Is This Likely To Be Useful?
- Not for systems with slack - can just reboot
- Cause of error must go away after reboot
- Must be OK to lose volatile state
- Must be OK to wait for reboot
- Persistent data structures
- (file systems, application files)
- Autonomous and/or safety critical systems
- Monitor/control unstable physical phenomena
- Largely independent subcomputations
- Moving time window
8Architecture
[Figure: Broken Bits are mapped by Model Definition Translation to a Broken Abstract Model; Internal Consistency Properties drive its repair into a Repaired Abstract Model; External Consistency Properties translate that model into Repaired Bits]
9Architecture Rationale
- Why go through the abstract model?
- Simple, uniform structure
- Sets of objects
- Relations between objects
- Simplifies both
- Expression of consistency properties
- Repair algorithm
- Enables system to support full range of
efficient, heavily encoded data structures
10File System Example
[Figure: Directory Entries ("abst", "intro") and Disk Blocks; values shown: 0, 2, 1, -5, 1, -1]
struct Disk { Entry dir[NumEntries]; Block block[NumBlocks]; } Disk D;
- struct Entry {
- byte name[Length];
- int firstBlock;
- }
- struct Block {
- int nextBlock;
- byte data[BlockSize];
- }
11Model Definition
- Sets of objects
- set blocks of integer: partition used | free
- Relations between objects: values of object fields, referencing relationships between objects
- relation next: used, used
[Figure: set blocks partitioned into used and free, with relation next among used blocks]
12Model Translation
- Bits translated to sets and relations in abstract model using statements of the form
- Quantifiers, Condition ⇒ Inclusion Constraint
- for i in 0..NumEntries, 0 ≤ D.dir[i].firstBlock and D.dir[i].firstBlock < NumBlocks ⇒ D.dir[i].firstBlock in used
- for b in used, 0 ≤ D.block[b].nextBlock and D.block[b].nextBlock < NumBlocks ⇒ ⟨b,D.block[b].nextBlock⟩ in next
- for ⟨b,n⟩ in next, true ⇒ n in used
- for b in 0..NumBlocks, not (b in used) ⇒ b in free
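The four translation rules can be sketched in Python. This is only an illustrative sketch, not the authors' implementation: the list-based stand-ins for `D.dir` and `D.block`, the function name `build_model`, and the example values (read off the file system figure) are all assumptions.

```python
# Hypothetical in-memory stand-ins for the on-disk structure D.
NUM_BLOCKS = 4
dir_first_block = [0, 2]       # assumed D.dir[i].firstBlock values
block_next = [1, -5, 1, -1]    # assumed D.block[b].nextBlock values


def build_model(dir_first_block, block_next, num_blocks):
    """Build the sets used and free and the relation next from the raw fields."""
    # Rule 1: every in-range firstBlock is in used.
    used = {fb for fb in dir_first_block if 0 <= fb < num_blocks}
    next_rel = set()
    worklist = list(used)
    while worklist:
        b = worklist.pop()
        n = block_next[b]
        if 0 <= n < num_blocks:
            # Rule 2: an in-range nextBlock adds the pair <b, n> to next.
            next_rel.add((b, n))
            if n not in used:
                # Rule 3: the target of any pair in next is in used.
                used.add(n)
                worklist.append(n)
    # Rule 4 uses negation, so it is applied once used has stopped growing.
    free = {b for b in range(num_blocks) if b not in used}
    return used, free, next_rel


used, free, next_rel = build_model(dir_first_block, block_next, NUM_BLOCKS)
print(sorted(used), sorted(free), sorted(next_rel))
```

The worklist is one way to reach the fixed point that rules 2 and 3 imply, since adding a pair to next can add a block to used, which in turn can enable rule 2 again.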
13Model in Example
[Figure: the example file system (Directory Entries, Disk Blocks) and its abstract model: blocks 0, 1, 2, 3 divided between used and free, with next edges]
14Internal Consistency Properties
- Quantifiers, Body
- Body is first-order property of basic propositions
- Inequality constraints on values of numeric fields
- V.R = E, V.R < E, V.R ≤ E, V.R ≥ E, V.R > E
- Presence of required number of objects
- size(S) = C, size(S) ≤ C, size(S) ≥ C
- Topology of region surrounding each object
- size(V.R) = C, size(V.R) ≤ C, size(V.R) ≥ C
- size(R.V) = C, size(R.V) ≤ C, size(R.V) ≥ C
- Inclusion constraints: V in S, V1 in V2.R, ⟨V1,V2⟩ in R
- Example: for b in used, size(next.b) ≤ 1
15Internal Consistency Violations
- Evaluate consistency properties, find violations
- for b in used, size(next.b) ≤ 1 is false for b = 1
[Figure: abstract model (blocks, used, free, next) in which two next edges enter block 1]
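Evaluating the example property amounts to counting, for each used block, how many pairs in next point at it. The sketch below assumes the model is held as plain Python sets; `find_violations` is a hypothetical helper name, not part of the tool.

```python
def find_violations(used, next_rel, limit=1):
    """Return the blocks b in used for which size(next.b) <= limit is false."""
    violations = []
    for b in sorted(used):
        # next.b is the inverse image of b: all blocks whose next is b.
        predecessors = {a for (a, target) in next_rel if target == b}
        if len(predecessors) > limit:
            violations.append(b)
    return violations


# Model assumed from the running file system example.
used = {0, 1, 2}
next_rel = {(0, 1), (2, 1)}
print(find_violations(used, next_rel))
```

Each returned block is a binding for the quantified variable b, which is what the repair step needs as input.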
16Repairing Violations of Internal Consistency Properties
- Violation provides binding for quantified variables
- Convert Body to disjunctive normal form
- (p1 ∧ … ∧ pn) ∨ … ∨ (q1 ∧ … ∧ qm)
- p1 … pn, q1 … qm are basic propositions
- Choose a conjunction to satisfy
- Repair violated basic propositions in conjunction
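One possible way to pick the conjunction is sketched below. The slides do not say how the choice is made, so the policy used here (fewest violated basic propositions) is an assumption, as are the function name and the example body.

```python
def choose_conjunction(dnf, binding):
    """dnf is a list of conjunctions; each conjunction is a list of
    (label, predicate) pairs. Returns (index, labels of violated propositions)
    for the conjunction with the fewest violated propositions."""
    best = None
    for index, conjunction in enumerate(dnf):
        violated = [label for label, pred in conjunction if not pred(binding)]
        if best is None or len(violated) < len(best[1]):
            best = (index, violated)
    return best


# Hypothetical body: (x >= 0 and x <= 10) or (flag == 1 and x == -1)
dnf = [
    [("x >= 0", lambda env: env["x"] >= 0),
     ("x <= 10", lambda env: env["x"] <= 10)],
    [("flag == 1", lambda env: env["flag"] == 1),
     ("x == -1", lambda env: env["x"] == -1)],
]
print(choose_conjunction(dnf, {"x": 12, "flag": 0}))
```

Only the propositions listed as violated in the chosen conjunction then need a repair action.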
17Repairing Violations of Basic Propositions
- Inequality constraints on values of numeric fields
- V.R = E, V.R < E, V.R ≤ E, V.R ≥ E, V.R > E
- Compute value of expression, assign field
- Presence of required number of objects
- size(S) = C, size(S) ≤ C, size(S) ≥ C
- Remove or insert objects from/to set
- Topology of region surrounding each object
- size(V.R) = C, size(V.R) ≤ C, size(V.R) ≥ C
- size(R.V) = C, size(R.V) ≤ C, size(R.V) ≥ C
- Remove or insert pairs from/to relation
- Inclusion constraints: V in S, V1 in V2.R, ⟨V1,V2⟩ in R
- Remove or add the object or pair from/to set or relation
18Repair in Example
for b in used, size(next.b) ≤ 1 is false for b = 1
Must repair size(next.1) ≤ 1
Can remove either ⟨0,1⟩ or ⟨2,1⟩ from next
[Figure: abstract model before repair; two next edges enter block 1]
19Repair in Example
for b in used, size(next.b) ≤ 1 is false for b = 1
Must repair size(next.1) ≤ 1
Can remove either ⟨0,1⟩ or ⟨2,1⟩ from next
[Figure: abstract model after repair; only one next edge enters block 1]
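The repair in this example can be sketched as removing surplus pairs from next. Which pair to drop is not specified on the slide, so the policy below (keep the lowest-numbered predecessors) is purely an assumption, and `repair_in_degree` is a hypothetical name.

```python
def repair_in_degree(next_rel, b, limit=1):
    """Remove pairs <a, b> from next until size(next.b) <= limit holds."""
    repaired = set(next_rel)
    incoming = sorted(pair for pair in repaired if pair[1] == b)
    # Assumed policy: keep the first `limit` incoming pairs, drop the rest.
    for pair in incoming[limit:]:
        repaired.discard(pair)
    return repaired


print(sorted(repair_in_degree({(0, 1), (2, 1)}, b=1)))
```

Either choice satisfies the constraint; a real system would pick using some cost heuristic so that the result stays close to the broken structure.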
20Acyclic Repair Dependences
- Questions
- Isn't it possible for the repair of one constraint to invalidate another constraint?
- What about infinite repair loops?
- What about unsatisfiable specifications?
- Answer
- We require specifications to have no cyclic repair dependences between constraints
- So all repair sequences terminate
- Repair can fail only because of resource limitations
21Formalizing Repair Dependences: Constraint Dependence Graph
- Nodes: Conjuncts from DNF
- Edges
- conjunction to dependent conjunction
- if repairing conjunction could falsify conjunction, or
- if repairing conjunction could increase quantifier scope
[Figure: dependence graph over the conjunctions (a1 ∧ … ∧ an), (b1 ∧ … ∧ bn), (c1 ∧ … ∧ cn), (d1 ∧ … ∧ dn), (e1 ∧ … ∧ en), (f1 ∧ … ∧ fn)]
22Formalizing Repair Dependences: Constraint Dependence Graph
- Absence of cycles implies valid repair schedule
- Conjunction removal for cycle elimination
- (must leave at least one conjunction per constraint)
[Figure: dependence graph over the conjunctions (a1 ∧ … ∧ an), (b1 ∧ … ∧ bn), (c1 ∧ … ∧ cn), (d1 ∧ … ∧ dn), (e1 ∧ … ∧ en), (f1 ∧ … ∧ fn)]
23Formalizing Repair Dependences: Constraint Dependence Graph
- Absence of cycles implies valid repair schedule
- Conjunction removal for cycle elimination
- (must leave at least one conjunction per constraint)
[Figure: dependence graph after removing (f1 ∧ … ∧ fn); remaining conjunctions (a1 ∧ … ∧ an), (b1 ∧ … ∧ bn), (c1 ∧ … ∧ cn), (d1 ∧ … ∧ dn), (e1 ∧ … ∧ en)]
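The acyclicity check and conjunction removal can be illustrated with a small graph sketch. The node names follow the figure, but the edges are invented for illustration, and the use of a topological sort (Kahn's algorithm) to detect cycles is an assumed technique, not a description of the authors' analysis.

```python
from collections import deque


def repair_schedule(nodes, edges):
    """Return a topological order of the dependence graph, or None if cyclic."""
    indegree = {n: 0 for n in nodes}
    successors = {n: [] for n in nodes}
    for src, dst in edges:
        successors[src].append(dst)
        indegree[dst] += 1
    ready = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for m in successors[n]:
            indegree[m] -= 1
            if indegree[m] == 0:
                ready.append(m)
    # Nodes left unscheduled mean at least one cycle remains.
    return order if len(order) == len(nodes) else None


def remove_conjunction(nodes, edges, victim):
    """Drop one conjunction (node) and every edge that touches it."""
    kept_nodes = [n for n in nodes if n != victim]
    kept_edges = [(s, d) for s, d in edges if victim not in (s, d)]
    return kept_nodes, kept_edges


# Hypothetical edges: d -> e -> f -> d forms a cycle.
nodes = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "f"), ("f", "d")]
print(repair_schedule(nodes, edges))
nodes2, edges2 = remove_conjunction(nodes, edges, "f")
print(repair_schedule(nodes2, edges2))
```

This sketch does not enforce the rule that each constraint keeps at least one conjunction; a fuller version would track which constraint each node came from and refuse removals that empty a constraint.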
24External Consistency Constraints
- Quantifiers, Condition ⇒ Body
- Body of form V = E, V.F = E, V.F[I] = E
- Example
- for b in free, true ⇒ D.block[b].nextBlock = -2
- for ⟨i,j⟩ in next, true ⇒ D.block[i].nextBlock = j
- for b in used, size(b.next) = 0 ⇒ D.block[b].nextBlock = -1
- Repair simply performs assignments
- Translates model repairs to bit repairs
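The three example constraints translate directly into assignments. A minimal sketch, assuming the concrete `nextBlock` fields are held in a Python list and the repaired model in sets; `write_back` is a hypothetical name.

```python
def write_back(block_next, used, free, next_rel):
    """Apply the three external consistency constraints as assignments."""
    out = list(block_next)
    # for b in free, true => D.block[b].nextBlock = -2
    for b in free:
        out[b] = -2
    # for <i,j> in next, true => D.block[i].nextBlock = j
    for i, j in next_rel:
        out[i] = j
    # for b in used, size(b.next) = 0 => D.block[b].nextBlock = -1
    for b in used:
        if not any(src == b for src, _ in next_rel):
            out[b] = -1
    return out


# Repaired model assumed from the running example (pair <2,1> removed).
print(write_back([1, -5, 1, -1], used={0, 1, 2}, free={3}, next_rel={(0, 1)}))
```

Because each rule only assigns a field from a value already fixed in the model, no search is needed at this stage.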
25Repair in Example
[Figure: Inconsistent File System and the resulting Repaired File System]
26When to Test for Consistency and Repair
- Persistent data structures
- Repair can be independent activity, or
- Repair when data written out or read in
- Volatile data structures in running program
- Under programmer control
- Transaction-based approach
- Identify transaction start and end
- Repair at start, end, or both
- Failure-based approach
- Wait until program fails
- Repair and restart from latest safe point
27Experience
- We acquired four benchmarks (written in C/C++)
- CTAS (air-traffic control tool)
- Simplified Linux file system
- Freeciv interactive game
- Microsoft Word files
- We developed specifications for all four
- Very little development time (days, not weeks)
- Most of time spent figuring out Freeciv and CTAS
- Each benchmark has
- Workload
- Fault insertion methodology
- Ran benchmarks with and without repair
28CTAS
- Set of air-traffic control tools
- Traffic management
- Arrival planning
- Flow visualization
- Shortcut planning
- Deployed in centers around country (Dallas/Ft. Worth, Los Angeles, Denver, Miami, Minneapolis/St. Paul, Atlanta, Oakland)
- Approximately 1 million lines of C/C++ code
29CTAS Screen Shot
30Results
- Workload: recorded radar feed from DFW
- Fault insertion
- Simulate error in flight plan processing
- Bad airport index in flight plan data structure
- Without repair
- System crashes: segmentation fault
- With repair
- Aircraft has different origin or destination
- System continues to execute
- Anomaly eventually flushed from system
31Aspects of CTAS
- Lots of independent subcomputations
- System processes hundreds of aircraft: problem with one should not affect others
- Multipurpose system (visualization, arrival planning, shortcuts, …): problem in one purpose should not affect others
- Sliding time window: anomalies eventually flushed
- Rebooting ineffective: system will crash again as soon as it sees the problematic flight plan
32Simplified Linux File System
[Figure: disk layout with super block, group block, inode bitmap block, block bitmap block, inode block (inodes), directory block (entry "intro") and disk blocks]
- Some Consistency Properties
- inode bitmap consistent with inode usage
- block bitmap consistent with block usage
- directory entries refer to valid inodes
- files contain valid blocks only
- files do not share blocks
33Results
- Workload: write and verify several files
- Fault insertion: crash file system
- Inode and block bitmap errors
- Partially initialized directory and inode entries
- Without repair
- Incorrect file contents because of inode and disk block sharing
- With repair
- Bitmaps repaired, preventing illegal sharing; correct file contents
34Freeciv
- Consistency Properties
- Tiles have valid terrain values
- Cities are not in the ocean
- Each city has exactly one reference from city location grid
- City locations are consistent in
- City structures and
- tile grid
[Figure: Terrain Grid of tiles (O = Ocean, P = Plain, M = Mountain) and City Structures with locations loc 3,0 and loc 2,3]
35Results
- Workload: Freeciv software plays against itself
- Fault insertion: randomly corrupt terrain values
- Without repair: program fails (seg fault)
- With repair
- Game runs just fine
- But game plays out differently because of the
different terrain values
36Microsoft Word Files
- Files consist of a sequence of streams
- Streams stored using FAT-based data structure
- Consistency Properties
- FAT blocks exist and contain valid entries
- FAT streams are properly terminated
- Free blocks properly marked
- Streams contain valid blocks
- No sharing of blocks between streams
[Figure: Directory Entries ("abst", "intro"), FAT and Disk Blocks; values shown: 1, 7, 0, 1, 9, 2, -1, -1, -2, 1]
37Results
- Workload: several Microsoft Word files
- Fault insertion: scramble FAT
- Without repair
- If blocks containing the FAT were incorrectly marked as free, Word successfully loads file
- Otherwise, "The document name or path is not valid"
- With repair
- Word loads all files
38Recent Work
- External consistency properties translate model repairs to data structure repairs
- Errors may cause data structures to remain inconsistent even after repair
39Recent Work
- Current strategy
- Eliminate external consistency properties
- Analyze model definition rules and internal consistency properties
- Automatically generate data structure repairs
40Recent Work
[Figure: Broken Bits are mapped by Model Definition Translation to a Broken Abstract Model; Abstract Repair yields the Repaired Abstract Model; an Automatically Generated Concrete Repair turns Broken Bits into Repaired Bits]
Result: Repaired bits guaranteed to satisfy consistency constraints
41Recent Work
- Efficient evaluation of consistency properties
- Compilation to remove interpreter overhead (4.7x speedup)
- Fixed point elimination (210x speedup)
- Relation construction elimination (500x speedup)
- Set construction elimination (3900x speedup)
- Model-based error localization
- User study shows benefit from approach
- Users with tool take 11 minutes on average to find and fix a bug
- Users without tool mostly failed to find a bug within the hour allocated
42Related Work
- Hand-coded repair
- Lucent 5ESS switch
- IBM MVS operating system
- Integrity Maintenance in Databases (Ceri, Widom, Urban)
- Self-stabilizing algorithms
- Log-based recovery for database systems
- Recovery-oriented computing
- Recursive restartability
- Undo framework
43Conclusion
- Data structure repair is an interesting way to (potentially) improve reliability
- Specification-based approach promises to make technique more widely applicable
- Moving towards more robust, probabilistic, continuous concept of system behavior
44Consistency Properties
- The FAT blocks exist
- FAT contains valid values only
- -1 terminates FAT streams
- -2 indicates free blocks
- Valid disk block index indicates next block in stream
- FAT streams properly terminated
- Free blocks properly marked
- Streams contain valid blocks only
- Streams do not share blocks
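These properties can be checked with a short walk over the FAT. The sketch below assumes the FAT is a Python list using the -1 and -2 markers from this slide and that the first block of each stream is known; the function name and the example tables are hypothetical.

```python
def fat_violations(fat, stream_starts):
    """Return human-readable descriptions of violated FAT properties."""
    problems = []
    n = len(fat)
    # FAT contains valid values only: -1, -2, or a valid block index.
    for i, value in enumerate(fat):
        if value not in (-1, -2) and not 0 <= value < n:
            problems.append(f"entry {i}: invalid value {value}")
    owner = {}
    for s, start in enumerate(stream_starts):
        block, seen = start, set()
        while block != -1:
            if not 0 <= block < n or block in seen:
                problems.append(f"stream {s}: invalid block or cycle at {block}")
                break
            if fat[block] == -2:
                problems.append(f"stream {s}: uses free block {block}")
                break
            if block in owner:
                problems.append(
                    f"stream {s}: shares block {block} with stream {owner[block]}")
                break
            owner[block] = s
            seen.add(block)
            block = fat[block]
    # Free blocks properly marked: blocks in no stream must hold -2.
    for i, value in enumerate(fat):
        if i not in owner and value != -2:
            problems.append(f"block {i}: unused but not marked free")
    return problems


print(fat_violations([1, -1, 3, 1, -2], stream_starts=[0, 2]))
print(fat_violations([1, -1, 3, -1, -2], stream_starts=[0, 2]))
```

The first call reports that the second stream runs into a block already owned by the first; the second call reports nothing.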
45Pointers
- Sets in model can include
- Primitive types (int, char, …)
- Structs (identified by pointer to struct)
- Standard linked list example
- struct node { int value; node *next; }
- set nodes of node
- relation next: node, node
- for n in nodes, true ⇒ n.next in nodes
- for n in nodes, true ⇒ ⟨n,n.next⟩ in next
46What About Corrupted Pointers?
- System only allows valid structs in model
- struct must be completely in valid memory
- one struct may be nested inside another struct (but must agree on memory format)
- If encounter invalid or null pointer, the (invalid) struct does not appear in model
- Implementation must track operations that affect valid regions of address space
- malloc, free
- mmap, munmap
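The tracking of valid regions can be simulated in a few lines. This is only a sketch with integer addresses: the class `ValidRegions` and its method names are hypothetical, and a real implementation would hook the allocator and mapping calls rather than be called by hand.

```python
class ValidRegions:
    """Tracks which address ranges are currently valid (simulated addresses)."""

    def __init__(self):
        self.regions = {}  # start address -> size in bytes

    def on_alloc(self, addr, size):
        """Record a region made valid by malloc or mmap."""
        self.regions[addr] = size

    def on_free(self, addr):
        """Forget a region released by free or munmap."""
        self.regions.pop(addr, None)

    def struct_is_valid(self, ptr, struct_size):
        """A struct qualifies only if it lies completely inside one valid region."""
        if ptr == 0:  # null pointer
            return False
        return any(start <= ptr and ptr + struct_size <= start + size
                   for start, size in self.regions.items())


regions = ValidRegions()
regions.on_alloc(1000, 64)
print(regions.struct_is_valid(1048, 16))  # fits inside the region
print(regions.struct_is_valid(1056, 16))  # runs past the end of the region
```

A pointer that fails this check simply keeps its target out of the model, so later consistency checks never dereference it.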
47CTAS in Action
FAST at DFW TRACON
TMA at Fort Worth Center
48Usage Scenarios
- Reduced development effort
- Invest less effort in finding and fixing bugs
- Rely on repair to deliver reliable system
- Afraid to fix bug
- Cheap insurance policy
- No good quantitative justification
- But repair seems like a good idea
49Current Work
- Support for recursive data structures
- Support for adding or removing individual elements
- Support for acyclicity constraints
- Repairing back-links
50Issues
- Unclear relationship between repaired bits and bits from correct execution of program
- Identifying results involving repaired data
- Characterizing likely errors
- Data races in multithreaded programs
- Failure to update correlated data structures
- Caching inconsistencies
- Unanticipated failures/exit points
- Constraint language expressivity
- Coverage of desired properties
- Limitations from acyclicity requirement
- When to test for consistency and repair
51What About Corrupted Pointers?
- Sets may contain pointers to structs
- System only allows valid structs in model
- struct must be completely in valid memory
- one struct may be nested inside another struct
(but must agree on memory format)
[Figure: Valid Structs lie entirely within Valid Memory; an Invalid Struct extends into Invalid Memory]
52Interesting Nuggets
- Small specifications
- Global invariant advantages