Towards Automatic Discovery of Deviations in Binary Implementations with Applications to Error Detection and Fingerprint Generation

1
Towards Automatic Discovery of Deviations in
Binary Implementations with Applications to Error
Detection and Fingerprint Generation
  • David Brumley, Juan Caballero, Zhenkai Liang,
    James Newsome, and Dawn Song
  • Usenix Security 2007 Best Paper

2
Identifying differences in implementations of a
protocol
GET /index.html HTTP/1.1
Host: 10.0.0.21
A
?
B
3
What is a Deviation?
RPC(version, flags, seqnum, getTime)
A: Error
B: 1216
4
What is a Deviation?
RPC(version, flags, seqnum, getTime)
A: 1216
B: 1218
5
Authors' accomplishments
  • Works on binary implementations (x86, no source code)
  • No need to understand the protocol (except for eliminating false positives)
  • Requires one (or a few) sample protocol runs
  • Tested on 2 NTP implementations and 3 HTTPD implementations

6
Classic Approach
  • Fuzz testing.

g23WsbT1CqcVGzXO2cVWG MHrr8eVG2jtDbHfTYNFy3DMb WJK
i3m6K3JJCf34j7w8BaeHxi Eqd3EK2enCqoFTdJak3NlF01t c
R4w6QsxnJ8QXCbjw9xpUvVL
A
?
B
7
Their Approach
Scotch Macaskill
8
Their Approach
www.freewebs.com/tigerstemple/furcolors.htm
9
Their Approach
John Schwieder / Accent Alaska
10
SAT Solver
11
Causes of differences
  • Coding errors
  • Ambiguous specification
  • Corner cases
  • Bugs

12
Differences are useful
  • Fingerprint applications

GET /index.html HTTP/1.1
Host: 10.0.0.21
A: ABC5EFG
B: ABCDEFG
13
Differences are useful
  • Avoid formal specification
  • Compare to another known good design
  • No model necessary

GET /index.html HTTP/1.1
Host: 10.0.0.21
Reference Design
?
A
14
Differences are useful
  • Check a reimplementation
  • Compare to another known good design
  • Lost source code

GET /index.html HTTP/1.1
Host: 10.0.0.21
Unknown Design
?
Reimplementation
15
This talk
  • Intuition
  • Details
  • Execution tracing of the binary code.
  • Simplification and symbolic execution of it.
  • Boolean formula generation.
  • SAT solver.
  • Checking the result.
  • Evaluation

16
Intuition
  • Given two programs P1, P2 from input x to output
    s
  • Find x such that P1(x) ≠ P2(x)
  • Translate P1 into a formula f1(x) such that
  • f1(x) = True when P1(x) = s (and likewise f2 for P2)
  • Find x such that
  • f1(x) ≠ f2(x) is satisfied

x
P
s
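A minimal sketch of the search stated on this slide, assuming two hypothetical toy implementations `p1` and `p2` of a one-byte "protocol" (neither comes from the talk). Exhaustive enumeration stands in for the solver, which is only feasible here because the input space has 256 values.

```python
def p1(x: int) -> str:
    """Hypothetical implementation A: accepts version bytes 1..4."""
    return "ok" if 1 <= x <= 4 else "error"


def p2(x: int) -> str:
    """Hypothetical implementation B: forgets the lower-bound check."""
    return "ok" if x <= 4 else "error"


def find_deviations(impl_a, impl_b, domain):
    """Return every input on which the two implementations disagree."""
    return [x for x in domain if impl_a(x) != impl_b(x)]


deviations = find_deviations(p1, p2, range(256))
print(deviations)  # [0]
```

Real message formats are far too large to enumerate, which is why the talk turns the programs into formulas and hands the search to a solver.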
17
Intuition
  • Weakest precondition
  • wp(P,Q)(x) = True when evaluating program P on x
    terminates in a state where Q is true

[Diagram: the set of all inputs x. Outside wp(P,Q), Q may be false; inside wp(P,Q), Q is true (always satisfied).]
18
Intuition
19
The algebra
  • Weakest precondition
  • wp(P,Q)(x) = True when evaluating program P on x
    terminates in a state where Q is true
  • Given f1 = wp(P1,Q1), f2 = wp(P2,Q2)
  • Q is true when outputs are equivalent.
  • Q1 = Q + ..., Q2 = Q + ...
  • We have equivalence when
  • (f1 ⇔ f2) is true, which is checked with a SAT solver.

20
Details: The four stages
  • 1. Log an execution trace of the binary
  • 2. Generate a boolean symbolic formula
  • 2a. Translate into a simplified IR
  • 2b. Generate the post-condition Q
  • 2c. Generate the weakest precondition f
  • 3. Invoke a SAT solver
  • 4. Verify the difference (invoke the application, then human examination)

21
1. Record execution trace
2a. And translate into IR.
E
C
4
8
2
A
R1 = 3
R2 = 4
R3 = (Rb)
R4 = (R0+R1)
R5 = (R0+R2)
R6 = (R0+R3)
R7 = R4+R3
R8 = R5-R6
(Ra+1) = 0

R1 = 3
R2 = 4
R3 = 5
R4 = input2
R5 = input3
R6 = input4
R7 = R4+R3
R8 = R5-R6
(Ra+1) = 0
22
2b. Finding postcondition
  • The output should be s
  • Plus side conditions
  • Execution path must take every jump exactly as
    in the trace.
  • No data-dependent jump may go a different way
  • May be weakened
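A minimal sketch of such a postcondition, assuming a hypothetical toy parser `parse` that logs the direction of each data-dependent jump. Unlike the talk, which builds Q as a formula, this sketch evaluates Q by simply re-running the toy program on a candidate input.

```python
def parse(msg: bytes):
    """Hypothetical toy parser; returns (output, branch_log)."""
    branches = []

    def branch(cond: bool) -> bool:
        branches.append(cond)  # record the direction of every data-dependent jump
        return cond

    if branch(len(msg) < 2):
        return "error", branches
    if branch(msg[0] != 0x01):
        return "bad version", branches
    if branch((msg[1] & 0x80) != 0):
        return "flagged", branches
    return "ok", branches


def make_postcondition(sample: bytes):
    """Build Q from one recorded run: same output s and same jumps as the trace."""
    s, trace = parse(sample)

    def q(candidate: bytes) -> bool:
        out, path = parse(candidate)
        return out == s and path == trace

    return q


q = make_postcondition(b"\x01\x05")
print(q(b"\x01\x7f"))  # True: same output, same path
print(q(b"\x01\x80"))  # False: the third branch goes the other way
print(q(b"\x02\x05"))  # False: the second branch differs
```

Weakening Q would correspond to dropping the `path == trace` requirement for some of the jumps.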

23
2c. Finding the weakest precondition
  • Optimize/Simplify the IR
  • General compiler techniques.
  • SSA form, random other stuff.
  • Translate to GCL

lhs := e     // lhs is a register or RAM
assume e     // assumes e is true (used for conditionals)
assert e     // e must be true for execution to continue
s1; s2       // statement s1, then s2
s1 □ s2      // choice; used for conditionals
24
2c. Finding the weakest precondition
  • Translate into GCL
  • Add in asserts before every branch
  • Assert all of the output bytes
  • Compute wp(P,Q) from GCL
  • Reverse walk over program.
  • Involves converting arithmetic expressions into
    boolean formulas.
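The slide does not say how the conversion is done. One common way is to express each arithmetic operation bit by bit; the sketch below does this for 8-bit addition using only and/or/xor. It is an illustration of the idea, with hypothetical helper names, and not a description of the tool's encoding.

```python
def to_bits(value: int, width: int = 8):
    """Little-endian list of booleans for an unsigned integer."""
    return [bool((value >> i) & 1) for i in range(width)]


def from_bits(bits) -> int:
    """Inverse of to_bits."""
    return sum(1 << i for i, b in enumerate(bits) if b)


def add_bits(a, b):
    """Ripple-carry addition using only and/or/xor on individual bits."""
    out, carry = [], False
    for x, y in zip(a, b):
        out.append(x ^ y ^ carry)                   # sum bit
        carry = (x and y) or (carry and (x ^ y))    # carry out
    return out  # final carry dropped: arithmetic is modulo 2**width


result = from_bits(add_bits(to_bits(200), to_bits(100)))
print(result)  # 44, i.e. (200 + 100) % 256
```

With symbolic bits in place of concrete booleans, the same structure yields one boolean formula per output bit.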

25
Problem: Memory reads and writes
  • HACK: Add clause to postcondition Q: "do not
    consider executions that access outside of set X"
  • For reads: use hack, with X = address used in trace
  • For writes: use hack, with X = address used in trace
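A minimal sketch of the extra clause, assuming a hypothetical toy program `run` that uses an input byte as a table index and reports which addresses it touched. The set X is taken from one recorded run, and candidate inputs whose execution leaves X are simply not considered.

```python
def run(msg: bytes, table: bytes):
    """Hypothetical toy program: uses an input byte as an index into a table.

    Returns (output, set of table addresses accessed)."""
    addr = msg[0] % len(table)
    return table[addr], {addr}


TABLE = bytes([10, 20, 30, 40])

# X = addresses used in the recorded trace
_, X = run(b"\x01", TABLE)


def within_x(msg: bytes) -> bool:
    """Extra postcondition clause: ignore executions that leave X."""
    _, accessed = run(msg, TABLE)
    return accessed <= X


print(within_x(b"\x05"))  # True: 5 % 4 == 1, the same address as the trace
print(within_x(b"\x02"))  # False: touches address 2, outside X
```

The price of the hack is visible here: inputs such as `b"\x02"` are excluded from the search even though they are perfectly valid.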

26
Enjoy some relief
http://www.hp.uab.edu/image_archive/ulg/ulgb.html
27
3. Run a SAT solver
http://www.inventgeek.com/Projects/alpharad/Page1.aspx
28
4. Verify difference
  • Feed through the original program
  • Confirm it's real
  • Needs a human to check
  • Protocol knowledge is required to make sure it is a
    real semantic difference

29
Software tools
  • BitBlaze for binary analysis, IR, and GCL
    conversion and weakest precondition.
  • STP SAT solver (designed for bit vectors)

30
Evaluation: HTTP
  • HTTP servers

GET /index.html HTTP/1.1
Host: 10.0.0.21
31
Bugs found
  • Server M
  • Does not verify that the "/" is a "/" as required
  • Accepts illegal values in the version string
  • Server S
  • May return File Not Found instead of a
    well-formed 404 response

GET /index.html HTTP/1.1
Host: 10.0.0.21
32
Evaluation: NTP
  • Two implementations
  • Two differences
  • Unused args were treated differently
  • With the mode field set to an illegal value,
    server X replied and server Y ignored it
  • Both are OK: they follow different, inconsistent
    versions of the spec
  • Acts as a fingerprint
  • Domain knowledge needed

33
Evaluation: Performance
  • HTTP: <60s for everything
  • NTP: <10 seconds for everything

34
Related work
  • Symbolic execution
  • Static source code analysis
  • Protocol error detection
  • Protocol fingerprinting

35
Future work
  • Analyze multi-round protocol interactions
  • Does not cover rarely used paths
  • Must have sample input for every execution path
  • Online formula generation

36
Conclusion
  • It works

37
My thoughts
  • Fuzz testing would have found the described bugs
  • Take existing message, randomly mutate individual
    bytes
  • Although maybe not as fast
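A minimal sketch of the mutation approach suggested above: take an existing message, overwrite one random byte, and compare the two servers' answers. `server_a` and `server_b` are hypothetical stand-ins, loosely modeled on the missing "/" check from the bugs slide, and the fixed seed only makes the run repeatable.

```python
import random


def mutate(msg: bytes, rng: random.Random) -> bytes:
    """Replace one randomly chosen byte with a random value."""
    data = bytearray(msg)
    data[rng.randrange(len(data))] = rng.randrange(256)
    return bytes(data)


def server_a(msg: bytes) -> str:
    """Hypothetical strict server: the request target must start with '/'."""
    return "200" if msg.startswith(b"GET /") else "400"


def server_b(msg: bytes) -> str:
    """Hypothetical lenient server: never checks the '/'."""
    return "200" if msg.startswith(b"GET ") else "400"


def fuzz(seed_msg: bytes, rounds: int, seed: int = 0):
    """Collect mutated messages on which the two servers disagree."""
    rng = random.Random(seed)
    found = set()
    for _ in range(rounds):
        candidate = mutate(seed_msg, rng)
        if server_a(candidate) != server_b(candidate):
            found.add(candidate)
    return found


deviations = fuzz(b"GET /index.html", rounds=5000)
print(len(deviations), "deviating inputs found")
```

Every deviation found this way changes the byte at the "/" position, which a single-byte mutation hits roughly once in fifteen tries for this message. Differences that need several specific bytes at once would be much harder to reach by chance.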