Title: Towards Automatic Discovery of Deviations in Binary Implementations with Applications to Error Detec
1Towards Automatic Discovery of Deviations in
Binary Implementations with Applications to Error
Detection and Fingerprint Generation
- David Brumley, Juan Caballero, Zhenkai Liang, James Newsome, and Dawn Song
- USENIX Security 2007 Best Paper
2Identifying differences in implementations of a
protocol
GET /index.html HTTP/1.1
Host: 10.0.0.21
A
?
B
3What is a Deviation?
RPC(version,flags, seqnum,getTime)
A
Error
B
1216
4What is a Deviation?
RPC(version,flags, seqnum,getTime)
A
1216
B
1218
5Authors' accomplishments
- Works on binary implementations
- x86
- No source code
- No need to understand the protocol
- (Except for eliminating false positives)
- Requires one (or a few) sample protocol runs.
- Tested
- 2 NTP implementations
- 3 HTTPD implementations
6Classic Approach
g23WsbT1CqcVGzXO2cVWG MHrr8eVG2jtDbHfTYNFy3DMb WJK
i3m6K3JJCf34j7w8BaeHxi Eqd3EK2enCqoFTdJak3NlF01t c
R4w6QsxnJ8QXCbjw9xpUvVL
A
?
B
7Their Approach
Scotch Macaskill
8Their Approach
www.freewebs.com/tigerstemple/furcolors.htm
9Their Approach
John Schwieder / Accent Alaska
10SAT Solver
11Causes of differences
- Coding errors
- Ambiguous specification
- Corner cases
- Bugs
12Differences are useful
GET /index.html HTTP/1.1
Host: 10.0.0.21
A
ABC5EFG
B
ABCDEFG
13Differences are useful
- Avoid formal specification
- Compare to another known good design
- No model necessary
GET /index.html HTTP/1.1
Host: 10.0.0.21
Reference Design
?
A
14Differences are useful
- Check a reimplementation
- Compare to another known good design
- Lost source code
GET /index.html HTTP/1.1
Host: 10.0.0.21
Unknown Design
?
Reimplementation
15This talk
- Intuition
- Details
- Execution tracing of the binary code.
- Simplification and symbolic execution of it.
- Boolean formula generation.
- SAT solver.
- Checking the result.
- Evaluation
16Intuition
- Given two programs P1, P2 from input x to output s
- Find x such that P1(x) ≠ P2(x)
- Translate P1 into a formula f1(x) (likewise f2 for P2) such that
- f1(x) = True when P1(x) = s
- Find x such that
- f1(x) ≠ f2(x) is satisfied
x
P
s
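The intuition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `p1` and `p2` are made-up toy programs standing in for two binary implementations, and an exhaustive search over one input byte stands in for the SAT solver.

```python
# Toy stand-ins for two implementations of the same check.
# Both functions are hypothetical and exist only for illustration.
def p1(x: int) -> str:
    # Accepts version bytes 1 and 2 only.
    return "OK" if x in (1, 2) else "Error"

def p2(x: int) -> str:
    # Accepts any non-zero version byte below 4 (a looser check).
    return "OK" if 0 < x < 4 else "Error"

def make_formula(program, s):
    # f(x) is True exactly when program(x) produces output s.
    return lambda x: program(x) == s

def find_deviations(f1, f2, domain):
    # Exhaustive search over the domain stands in for the SAT solver.
    return [x for x in domain if f1(x) != f2(x)]

f1 = make_formula(p1, "OK")
f2 = make_formula(p2, "OK")
deviations = find_deviations(f1, f2, range(256))
print(deviations)
```

Any value in `deviations` is an input on which one program produces s and the other does not.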
17Intuition
- Weakest precondition
- wp(P,Q)(x) = True when evaluating program P on x terminates in a state where Q is true
All of x
Q may be false
wp(P,Q) Q is true
Q always satisfied
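As a rough illustration of the definition, the sketch below treats a program as a deterministic, terminating Python function, so that wp(P,Q)(x) is simply Q evaluated on the result of P(x). The program `double_mod_256` and the condition `q` are invented for this example.

```python
def wp(program, q):
    # For a deterministic, terminating program, wp(P, Q)(x) holds
    # exactly when Q holds on the state reached by running P on x.
    return lambda x: q(program(x))

def double_mod_256(x: int) -> int:
    # Hypothetical one-byte program: doubles its input with wraparound.
    return (2 * x) % 256

q = lambda result: result < 10          # hypothetical postcondition Q
pre = wp(double_mod_256, q)

inside = [x for x in range(256) if pre(x)]       # inputs where Q ends up true
outside = [x for x in range(256) if not pre(x)]  # inputs where Q ends up false
print(inside)
```

The list `inside` is the region of the input space labelled wp(P,Q) in the diagram, and `outside` is the rest.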
18Intuition
19The algebra
- Weakest precondition
- wp(P,Q)(x) = True when evaluating program P on x terminates in a state where Q is true
- Given f1 = wp(P1,Q1), f2 = wp(P2,Q2)
- Q is true when outputs are equivalent.
- Q1 = Q ∧ ..., Q2 = Q ∧ ...
- We have equivalence when
- (f1 ⇔ f2) is true, checked with a SAT solver.
20Details: The four stages
- Log an execution trace of the binary
- Generate a boolean symbolic formula
- Translate into a simplified IR
- Generate the post-condition Q
- Generate the weakest precondition f
- Invoke a SAT solver
- Verify the difference
- Invoke the application
- Human examination
21 1. Record execution trace, and 2a. translate into IR.
E
C
4
8
2
A
R1 = 3
R2 = 4
R3 = (Rb)
R4 = (R0+R1)
R5 = (R0+R2)
R6 = (R0+R3)
R7 = R4+R3
R8 = R5-R6
(Ra+1) = 0

R1 = 3
R2 = 4
R3 = 5
R4 = input2
R5 = input3
R6 = input4
R7 = R4+R3
R8 = R5-R6
(Ra+1) = 0
22 2b. Finding the postcondition
- The output should be s
- Plus side conditions
- Execution path must follow every jump identically to the trace.
- No data-dependent jumps may be different
- May be weakened
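A minimal sketch of such a postcondition, assuming a toy program that records its own branch outcomes: Q requires both the same output s and the same sequence of jumps as the sample run. The program `traced_program` and the sample input 14 are hypothetical.

```python
def traced_program(x: int):
    """Toy program that records each branch outcome alongside its output."""
    branches = []
    branches.append(x > 10)
    if x > 10:
        x -= 10
    branches.append(x % 2 == 0)
    out = "even" if x % 2 == 0 else "odd"
    return out, branches

def make_postcondition(sample_input: int):
    # Record output s and the branch outcomes of the sample run.
    s, recorded = traced_program(sample_input)
    def q(x: int) -> bool:
        out, branches = traced_program(x)
        # Output must be s AND every jump must match the trace.
        return out == s and branches == recorded
    return q

q = make_postcondition(14)
same_path = [x for x in range(32) if q(x)]

# Weakened variant: only the output has to match, any path is allowed.
s_only = [x for x in range(32) if traced_program(x)[0] == "even"]
print(same_path, s_only)
```

The weakened variant admits more inputs (for example 10, which takes a different first branch), which is the trade-off hinted at by "May be weakened".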
23 2c. Finding the weakest precondition
- Optimize/Simplify the IR
- General compiler techniques.
- SSA form, random other stuff.
- Translate to GCL
lhs := e // lhs is a register or RAM
Assume e // assumes e is true (used for conditionals)
Assert e // e must be true for execution to continue
s1; s2 // statement s1, then s2
s1 □ s2 // choice, used for conditionals
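The five GCL statement forms can be modelled as small Python classes with a toy interpreter. This is a sketch, not the representation used by the authors' tools: the class names, the `run` function, and the use of Python lambdas for expressions are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict

State = Dict[str, int]

@dataclass
class Assign:            # lhs := e
    lhs: str
    expr: Callable[[State], int]

@dataclass
class Assume:            # Assume e
    cond: Callable[[State], bool]

@dataclass
class Assert:            # Assert e
    cond: Callable[[State], bool]

@dataclass
class Seq:               # s1; s2
    first: object
    second: object

@dataclass
class Choice:            # s1 [] s2
    left: object
    right: object

def run(stmt, state):
    """Return all reachable final states; None marks a failed Assert."""
    if isinstance(stmt, Assign):
        return [{**state, stmt.lhs: stmt.expr(state)}]
    if isinstance(stmt, Assume):
        return [state] if stmt.cond(state) else []      # infeasible path dropped
    if isinstance(stmt, Assert):
        return [state] if stmt.cond(state) else [None]  # execution stops
    if isinstance(stmt, Seq):
        out = []
        for mid in run(stmt.first, state):
            out.extend([None] if mid is None else run(stmt.second, mid))
        return out
    if isinstance(stmt, Choice):
        return run(stmt.left, state) + run(stmt.right, state)
    raise TypeError(stmt)

# "if (x > 5) y = 1 else y = 0" as a choice between two assumed branches.
branch = Choice(
    Seq(Assume(lambda s: s["x"] > 5), Assign("y", lambda s: 1)),
    Seq(Assume(lambda s: s["x"] <= 5), Assign("y", lambda s: 0)),
)
result_hi = run(branch, {"x": 9})
result_lo = run(branch, {"x": 2})
print(result_hi, result_lo)
```

The example shows why Assume is "used for conditionals": each arm of the choice assumes its own guard, so only the feasible arm yields a final state.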
24 2c. Finding the weakest precondition
- Translate into GCL
- Add in asserts before every branch
- Assert all of the output bytes
- Compute wp(P,Q) from GCL
- Reverse walk over program.
- Involves converting arithmetic expressions into boolean formulas.
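The reverse walk can be sketched with the textbook weakest-precondition rules for guarded commands. This sketch makes two simplifying assumptions: statements are plain tuples, and predicates are Python closures over a state dictionary rather than boolean formulas over bits, so the final "solving" step is an exhaustive search instead of a SAT query.

```python
def wp(stmt, q):
    """Weakest precondition of a GCL statement with respect to predicate q."""
    kind = stmt[0]
    if kind == "assign":                 # wp(lhs := e, Q) = Q[e/lhs]
        _, lhs, expr = stmt
        return lambda st: q({**st, lhs: expr(st)})
    if kind == "assume":                 # wp(Assume e, Q) = e implies Q
        _, cond = stmt
        return lambda st: (not cond(st)) or q(st)
    if kind == "assert":                 # wp(Assert e, Q) = e and Q
        _, cond = stmt
        return lambda st: cond(st) and q(st)
    if kind == "seq":                    # wp(s1; s2, Q) = wp(s1, wp(s2, Q))
        _, first, second = stmt
        return wp(first, wp(second, q))
    if kind == "choice":                 # wp(s1 [] s2, Q) = wp(s1,Q) and wp(s2,Q)
        _, left, right = stmt
        wl, wr = wp(left, q), wp(right, q)
        return lambda st: wl(st) and wr(st)
    raise ValueError(kind)

# Hypothetical trace: t := x + 3; Assert t < 10; out := t * 2
program = ("seq",
           ("assign", "t", lambda s: s["x"] + 3),
           ("seq",
            ("assert", lambda s: s["t"] < 10),
            ("assign", "out", lambda s: s["t"] * 2)))

q = lambda s: s["out"] == 12             # postcondition: output equals 12
f = wp(program, q)
solutions = [x for x in range(256) if f({"x": x})]
print(solutions)
```

Because the "seq" rule computes the precondition of the second statement first, the formula is built from the end of the trace back to its start.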
25Problem: Memory reads and writes
- HACK: Add a clause to postcondition Q: do not consider executions that access memory outside of set X
- For reads
- Use hack, with X = addresses used in trace
- For writes
- Use hack, with X = addresses used in trace
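A small sketch of the extra clause, assuming a toy program whose read address depends on the input. The base address, the stand-in postcondition `q`, and the helper names are hypothetical.

```python
BASE = 0x1000  # hypothetical buffer base address

def address_read(x: int) -> int:
    # The toy program reads memory at BASE + x, so the address depends on x.
    return BASE + x

def with_memory_clause(q, address_of, allowed):
    # Q' = Q and (the accessed address lies in X)
    return lambda x: q(x) and address_of(x) in allowed

# X holds the single address seen in the recorded trace (sample input x = 2).
X = {address_read(2)}
q = lambda x: x < 8                      # stand-in for the original postcondition
q_restricted = with_memory_clause(q, address_read, X)

kept = [x for x in range(16) if q_restricted(x)]
print(kept)
```

The restriction keeps the formula simple at the cost of ignoring inputs that would touch other addresses.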
26Enjoy some relief
http://www.hp.uab.edu/image_archive/ulg/ulgb.html
27 3. Run a SAT solver
http://www.inventgeek.com/Projects/alpharad/Page1.aspx
28 4. Verify difference
- Feed through the original program
- Confirm it's real
- Needs human to check
- Protocol knowledge
- to make sure it is a real semantic difference
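The automated half of this step can be sketched as follows: send the candidate input to both implementations and compare the responses. The two server functions are invented toy handlers, not the servers evaluated in the talk, and the human check for semantic relevance is not modelled.

```python
def strict_server(request: bytes) -> str:
    # Hypothetical handler that requires the path to start with "/".
    parts = request.split(b" ")
    if len(parts) != 3 or parts[0] != b"GET" or not parts[1].startswith(b"/"):
        return "400 Bad Request"
    return "200 OK"

def lenient_server(request: bytes) -> str:
    # Hypothetical handler that never inspects the first path character.
    parts = request.split(b" ")
    if len(parts) != 3 or parts[0] != b"GET":
        return "400 Bad Request"
    return "200 OK"

def verify_candidate(impl_a, impl_b, candidate: bytes):
    # Replay the solver's candidate on both implementations.
    out_a, out_b = impl_a(candidate), impl_b(candidate)
    return out_a != out_b, out_a, out_b

candidate = b"GET xindex.html HTTP/1.1"   # candidate input from the solver
report = verify_candidate(strict_server, lenient_server, candidate)
print(report)
```

A `True` in the first position confirms that the outputs really differ; deciding whether the difference matters still needs protocol knowledge.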
29Software tools
- BitBlaze for binary analysis, IR and GCL conversion, and weakest precondition.
- STP SAT solver
- Designed for bit vectors
30Evaluation: HTTP
GET /index.html HTTP/1.1
Host: 10.0.0.21
31Bugs found
- Server M
- Does not verify that the "/" is a "/" as required
- Accepts illegal values in the version string
- Server S
- May return File Not Found instead of a
well-formed 404 response
GET /index.html HTTP/1.1
Host: 10.0.0.21
32Evaluation: NTP
- Two implementations
- Two differences
- Unused args were treated differently
- With a mode field set to illegal value
- Server X replied, server Y ignored it
- Both OK. They follow different, inconsistent versions of the spec
- Acts as a fingerprint
- Domain knowledge needed
33Evaluation: Performance
- HTTP
- <60 s for everything
- NTP
- <10 seconds for everything
34Related work
- Symbolic execution
- Static source code analysis
- Protocol error detection
- Protocol fingerprinting
35Future work
- Analyze multi-round protocol interactions
- Does not cover rarely used paths
- Must have sample input for every execution path
- Online formula generation
36Conclusion
37My thoughts
- Fuzz testing would have found the described bugs
- Take an existing message, randomly mutate individual bytes
- Although maybe not as fast
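For comparison, the byte-mutation fuzzing described above might look like the sketch below. The seed message mirrors the sample request from the slides; the function name and the fixed random seed are assumptions made to keep the example reproducible.

```python
import random

def mutate(message: bytes, n_bytes: int, rng: random.Random) -> bytes:
    # Overwrite n_bytes randomly chosen positions with random byte values.
    data = bytearray(message)
    for pos in rng.sample(range(len(data)), n_bytes):
        data[pos] = rng.randrange(256)
    return bytes(data)

seed_message = b"GET /index.html HTTP/1.1\r\nHost: 10.0.0.21\r\n\r\n"
rng = random.Random(0)                    # fixed seed for reproducibility
mutants = [mutate(seed_message, 1, rng) for _ in range(5)]

for m in mutants:
    print(m)
```

Each mutant would then be sent to both servers and the responses compared. Unlike the solver-based approach, nothing guides the mutation toward bytes that actually influence the output.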