Efficient Dynamic Detection of Input-Related Security Faults - PowerPoint PPT Presentation

About This Presentation

Title:

Efficient Dynamic Detection of Input-Related Security Faults

Description:

Efficient Dynamic Detection of Input-Related Security Faults Eric Larson Dissertation Defense University of Michigan April 29, 2004 Security Faults Keeping computer ... – PowerPoint PPT presentation

Slides: 42

Provided by: EricL60

Learn more at: http://web.eecs.umich.edu

Transcript and Presenter's Notes

Title: Efficient Dynamic Detection of Input-Related Security Faults

1
Efficient Dynamic Detection of Input-Related
Security Faults

Eric Larson
Dissertation Defense
University of Michigan
April 29, 2004

2
Security Faults

Keeping computer data and accesses secure is a
tough problem
Software errors cost companies millions of
dollars
Different types of errors can lead to exploits
Protocol errors
Configuration errors
Implementation errors (most common)
Even with a well-designed security protocol, a
program can be compromised if it contains bugs!

3
Input-Related Software Faults

Common implementation error is to improperly
bound input data
checks are not present in many cases
when checks are present, they can be wrong
especially important for network data
Common security exploit: buffer overflow
array references
string library functions in C
Widespread problem
2/3 of CERT security advisories in 2003 were due
to buffer overflows
buffer overflow bugs have recently been found in
Windows and Linux

4
Example Buffer Overflow Attack

Attacking the program involves two steps

foo
bar
5
Overwriting the Return Address
void bar() {
    char buffer[100];
    gets(buffer);
    printf("String is %s", buffer);
}

Return address
temporary value 1
temporary value 2
buf[99]
buf[98]
...
buf[0]
Stack grows to lower addresses
Data grows to higher addresses
6
Overwriting the Return Address
void bar() {
    char buffer[100];
    gets(buffer);
    printf("String is %s", buffer);
}

0xbadc0de
0xbadc0de
0xbadc0de
buf[99]
buf[98]
...
buf[0]
Stack grows to lower addresses
The location of the return address is not always
known, so overwrite everything!
Data grows to higher addresses
7
Outline of Talk

Background and Related Work (Ch. 2)
Detecting Input-Related Software Faults (Ch. 3)
MUSE Instrumentation Infrastructure (Ch. 4)
Implementation and Results (Ch. 5)
Reducing Performance Overhead (Ch. 6)
Conclusions (Ch. 7)

8
When Should I Look for Software Bugs?

Compile-time (static) bug detection
no dependence on input
can prove that a dangerous operation is safe in
some cases
often computationally infeasible (too many states
or paths)
scope is limited: either high false alarm rate or
low bug finding rate
hard to analyze heap data
Run-time (dynamic) bug detection
can analyze all variables (including those on
the heap)
execution is on a real path, so fewer false
alarms
error may not manifest as an error in the output
depends on program input
impacts performance of program

Our approach is dynamic, addressing its
deficiencies by borrowing ideas from static bug
detection
9
Contributions of this Thesis

Dynamically Detecting Input-Related Software
Faults
Relaxes dependence on input
MUSE Instrumentation Infrastructure
Developed for rapid prototyping of bug detection
tools for this and future research
Removing Unnecessary Instrumentation
Reduces performance overhead
Improved Shadow State Management
Tighter integration with the compiler, improves
performance

10
Selected Related Work

Jones & Kelly: dynamic approach to catching
memory access errors, tracks all valid objects in
memory using a table
Tainted Perl: prevents unsafe actions from
unvalidated input
STOBO: uses allocation sizes rather than string
sizes
CCured: type system used to catch memory access
errors, instrumentation is added when static
analysis fails
BOON: derives and solves a system of integer
range constraints statically to find buffer
overruns
CSSV: model checking system to find buffer
overflows in C, keeps track of potential string
lengths and null termination
MetaCompilation: checks for uses of unbounded
input, does not verify if the checks are correct

11
Detection of Input-Related Software Faults

Program instrumentation tracks data derived from
input
possible range of integer variables
maximum size and termination of strings
Dangerous operations are checked over entire
range of possible values
Found 17 bugs in 9 programs, including 2 known
high security faults in OpenSSH

Relaxes constraint that the user provides an
input that exposes the bug
12
Detecting Array Buffer Overflows

Interval constraint variables are introduced when
external inputs are read
Holds the lower and upper bounds for each input
value
Initial values encompass the entire range
Control points narrow the bounds
Arithmetic operations adjust the bounds
Potentially dangerous operations are checked
Array indexing
Controlling a loop or memory allocation size
Arithmetic operations (overflow)

Code Sequence            Range of x                   Value of x
int x;
int array[5];
x = get_input_int();     -MAX_INT <= x <= MAX_INT     2
if (x < 0 || x > 4)
    fatal("bounds");     0 <= x <= 4                  2
x++;                     1 <= x <= 5                  3
y = array[x];            1 <= x <= 5                  3

ERROR! When x = 5, array reference is out of
bounds!
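
The range tracking on this slide can be sketched in a few lines of C. This is a minimal illustration under stated assumptions, not the thesis implementation: the type interval_t and the functions interval_from_input, interval_narrow, interval_add, and interval_index_ok are hypothetical names, and INT_MIN stands in for the slide's -MAX_INT.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical shadow state for one input-derived integer. */
typedef struct {
    long long lb;  /* lowest value the variable could hold  */
    long long ub;  /* highest value the variable could hold */
} interval_t;

/* A fresh input may be any representable int. */
static interval_t interval_from_input(void) {
    interval_t r = { INT_MIN, INT_MAX };
    return r;
}

/* Control point: execution continued past a check that rejects
   values outside [lo, hi], so the range shrinks to that window. */
static interval_t interval_narrow(interval_t r, long long lo, long long hi) {
    assert(lo <= hi);
    if (r.lb < lo) r.lb = lo;
    if (r.ub > hi) r.ub = hi;
    return r;
}

/* Arithmetic: adding a constant shifts both bounds. */
static interval_t interval_add(interval_t r, long long k) {
    r.lb += k;
    r.ub += k;
    return r;
}

/* Dangerous operation: the index is accepted only if every value
   in the range is inside the array. */
static bool interval_index_ok(interval_t r, long long array_len) {
    return r.lb >= 0 && r.ub < array_len;
}

/* Replays the slide's code sequence on the shadow state alone. */
static bool slide_example_is_safe(void) {
    interval_t x = interval_from_input();  /* x = get_input_int()  */
    x = interval_narrow(x, 0, 4);          /* bounds check passed  */
    x = interval_add(x, 1);                /* x++                  */
    return interval_index_ok(x, 5);        /* y = array[x]         */
}
```

The check fails because the upper bound reaches 5, even though the concrete input value (2, then 3) never touches an out-of-bounds element. That is the sense in which the approach relaxes the dependence on the input.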
14
Detecting Dangerous String Operations

Strings are shadowed by:
max_str_size: largest possible size of the string
known_null: set if string is known to contain a
null character
Checking string operations:
source string will fit into the destination
source strings are guaranteed to be null
terminated
Operations involving a string length can narrow
the maximum string size
our size counts the null character, the strlen
function does not
Integers that store string lengths are shadowed
by:
base address of corresponding string
difference between its value and actual string
length

15
String Fault Detection Example
Code Segment:
char *bad_copy(char *src) {
    char tmp[16];
    char *dst = (char *)malloc(16);
    if (strlen(src) > 16)
        return NULL;
    strncpy(tmp, src, 16);
    strcpy(dst, tmp);
    return dst;
}

Str.   max_str_size   known_null
src    MAX_INT        TRUE
16
String Fault Detection Example
Code Segment:
char *bad_copy(char *src) {
    char tmp[16];
    char *dst = (char *)malloc(16);
    if (strlen(src) > 16)
        return NULL;
    strncpy(tmp, src, 16);
    strcpy(dst, tmp);
    return dst;
}

Str.   max_str_size   known_null
src    MAX_INT        TRUE
tmp    16             FALSE
dst    16             FALSE
17
String Fault Detection Example
Code Segment:
char *bad_copy(char *src) {
    char tmp[16];
    char *dst = (char *)malloc(16);
    if (strlen(src) > 16)
        return NULL;
    strncpy(tmp, src, 16);
    strcpy(dst, tmp);
    return dst;
}

Str.   max_str_size   known_null
src    MAX_INT        TRUE
tmp    16             FALSE
dst    16             FALSE
src    17             TRUE
18
String Fault Detection Example
Code Segment:
char *bad_copy(char *src) {
    char tmp[16];
    char *dst = (char *)malloc(16);
    if (strlen(src) > 16)
        return NULL;
    strncpy(tmp, src, 16);
    strcpy(dst, tmp);
    return dst;
}

Str.   max_str_size   known_null
src    MAX_INT        TRUE
tmp    16             FALSE
dst    16             FALSE
src    17             TRUE
tmp    16             FALSE
19
String Fault Detection Example
Code Segment:
char *bad_copy(char *src) {
    char tmp[16];
    char *dst = (char *)malloc(16);
    if (strlen(src) > 16)
        return NULL;
    strncpy(tmp, src, 16);
    strcpy(dst, tmp);
    return dst;
}

Str.   max_str_size   known_null
src    MAX_INT        TRUE
tmp    16             FALSE
dst    16             FALSE
src    17             TRUE
tmp    16             FALSE

ERROR! tmp may not be null terminated during
strcpy
20
String Fault Detection Example
Code Segment:
char *bad_copy(char *src) {
    char *dst = (char *)malloc(16);
    if (strlen(src) > 16)
        return NULL;
    strcpy(dst, src);
    return dst;
}

Str.   max_str_size   known_null
src    MAX_INT        TRUE
dst    16             FALSE
src    17             TRUE

ERROR! src may not fit into dst during strcpy
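
The two string errors in these examples can be reproduced with a small model of the string shadow state. The sketch below is an assumption-laden illustration: str_shadow_t, shadow_after_strlen_le, shadow_after_strncpy, and check_strcpy are hypothetical names, and the update rules are inferred from the table values on the slides rather than taken from the actual checker.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical shadow record for one string. */
typedef struct {
    size_t max_str_size;  /* largest possible size, counting the null */
    bool   known_null;    /* true if a null terminator is guaranteed  */
} str_shadow_t;

typedef enum { STR_OK, STR_ERR_NOT_TERMINATED, STR_ERR_TOO_LARGE } str_check_t;

/* Execution continued past `if (strlen(s) > limit) return ...`, so
   strlen(s) <= limit and the size including the null is <= limit + 1. */
static str_shadow_t shadow_after_strlen_le(str_shadow_t s, size_t limit) {
    if (s.max_str_size > limit + 1)
        s.max_str_size = limit + 1;
    return s;
}

/* strncpy(dst, src, n): dst holds at most n bytes and is guaranteed
   terminated only if src, including its null, fits in n bytes. */
static str_shadow_t shadow_after_strncpy(str_shadow_t src, size_t n) {
    str_shadow_t dst;
    assert(n > 0);
    dst.max_str_size = n;
    dst.known_null = src.known_null && src.max_str_size <= n;
    return dst;
}

/* Check made before strcpy(dst, src); dst_capacity is the size of
   the destination buffer in bytes. */
static str_check_t check_strcpy(size_t dst_capacity, str_shadow_t src) {
    if (!src.known_null)
        return STR_ERR_NOT_TERMINATED;
    if (src.max_str_size > dst_capacity)
        return STR_ERR_TOO_LARGE;
    return STR_OK;
}
```

Starting from src = { INT_MAX, true }, the strlen check narrows src to 17, strncpy leaves tmp at 16 with known_null false, and check_strcpy then reports the unterminated tmp in the first variant and the oversized src in the second.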
21
MUSE Implementation Infrastructure

Developed for rapid prototyping of bug detection
tools for this and future research
General-purpose instrumentation tool
can also be used to create profilers, coverage
tools, and debugging aids
Implemented in GCC at the abstract syntax tree
(AST) level
Simplification phase breaks up complex C
statements
removes C side effects and other nuances
allows matching in the middle of a complex
expression
Specification consists of pattern-function pairs
patterns match against statements, expressions,
and special events
on a match, call is made to corresponding
external function
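
The idea of pattern-function pairs can be illustrated with a small dispatch table. This is only a rough analogy and not MUSE's specification language or API, which the slides do not show: MUSE inserts calls at compile time inside GCC, whereas this sketch matches a list of already-recorded events at run time. Every name here (event_t, rule_t, profiler_rules, dispatch) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Kinds of program events a pattern can match (illustrative only). */
typedef enum { EV_ASSIGN, EV_ARRAY_REF, EV_CALL } event_kind_t;

typedef struct {
    event_kind_t kind;
    const char  *name;  /* variable or function involved */
} event_t;

/* Counters kept by a toy profiler built from the rules below. */
typedef struct {
    int assigns;
    int array_refs;
} stats_t;

typedef void (*handler_fn)(const event_t *ev, stats_t *stats);

/* One pattern-function pair: when `pattern` matches, call `fn`. */
typedef struct {
    event_kind_t pattern;
    handler_fn   fn;
} rule_t;

static void on_assign(const event_t *ev, stats_t *stats) {
    (void)ev;
    stats->assigns++;
}

static void on_array_ref(const event_t *ev, stats_t *stats) {
    (void)ev;
    stats->array_refs++;
}

static const rule_t profiler_rules[] = {
    { EV_ASSIGN,    on_assign    },
    { EV_ARRAY_REF, on_array_ref },
};

/* For each event, call the function of every rule whose pattern matches. */
static void dispatch(const rule_t *rules, size_t n_rules,
                     const event_t *events, size_t n_events,
                     stats_t *stats) {
    assert(rules != NULL && events != NULL && stats != NULL);
    for (size_t e = 0; e < n_events; ++e)
        for (size_t r = 0; r < n_rules; ++r)
            if (rules[r].pattern == events[e].kind)
                rules[r].fn(&events[e], stats);
}
```

Swapping the rule table changes the tool (profiler, coverage tool, bug checker) without touching the matching logic, which is the property the slide attributes to a general-purpose instrumentation tool.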

22
Testing Process
23
Input Checker Implementation

Shadow state stores checker bookkeeping info
integers: bounds and string length information
arrays: maximum string size, null flag, and
actual size
Stored in hash tables (shadow state table)
hash tables are indexed by address
separate hash tables for integers and arrays
Pointers use the array hash table
Debug tracing mode can help find source of error

Diagram: the address of int x indexes the Shadow State Table;
the entry holds the shadow state for x (lb = 0, ub = 5)
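
A shadow state table indexed by address might look like the following. This is a minimal sketch, assuming a fixed-size open-addressing table with linear probing; the slides say only that hash tables indexed by address are used, so the layout, the hash function, and the names (shadow_table_t, shadow_find, shadow_put, shadow_remove) are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SHADOW_BUCKETS 256  /* power of two, so masking replaces modulo */

typedef enum { SLOT_EMPTY = 0, SLOT_USED, SLOT_DEAD } slot_state_t;

/* Shadow state for one integer: its address plus lower/upper bounds. */
typedef struct {
    const void  *addr;
    long long    lb, ub;
    slot_state_t state;
} int_shadow_t;

/* A zero-initialized table is empty because SLOT_EMPTY is 0. */
typedef struct {
    int_shadow_t slot[SHADOW_BUCKETS];
} shadow_table_t;

static size_t shadow_hash(const void *addr) {
    uintptr_t a = (uintptr_t)addr;
    return (size_t)((a >> 2) * 2654435761u) & (SHADOW_BUCKETS - 1);
}

/* Returns the entry for addr, or NULL if addr has no shadow state. */
static int_shadow_t *shadow_find(shadow_table_t *t, const void *addr) {
    size_t i = shadow_hash(addr);
    for (size_t n = 0; n < SHADOW_BUCKETS; ++n) {
        int_shadow_t *e = &t->slot[i];
        if (e->state == SLOT_EMPTY)
            return NULL;
        if (e->state == SLOT_USED && e->addr == addr)
            return e;
        i = (i + 1) & (SHADOW_BUCKETS - 1);
    }
    return NULL;
}

/* Creates or updates the shadow state; returns 0 if the table is full. */
static int shadow_put(shadow_table_t *t, const void *addr,
                      long long lb, long long ub) {
    assert(addr != NULL);
    int_shadow_t *e = shadow_find(t, addr);
    if (e == NULL) {
        size_t i = shadow_hash(addr);
        for (size_t n = 0; n < SHADOW_BUCKETS && e == NULL; ++n) {
            if (t->slot[i].state != SLOT_USED)
                e = &t->slot[i];
            i = (i + 1) & (SHADOW_BUCKETS - 1);
        }
        if (e == NULL)
            return 0;
    }
    e->addr = addr;
    e->lb = lb;
    e->ub = ub;
    e->state = SLOT_USED;
    return 1;
}

/* Drops the shadow state, e.g. when a constant is assigned. */
static void shadow_remove(shadow_table_t *t, const void *addr) {
    int_shadow_t *e = shadow_find(t, addr);
    if (e != NULL)
        e->state = SLOT_DEAD;  /* tombstone keeps probe chains intact */
}
```

Every lookup costs a hash and at least one probe, which gives a feel for why the later slides describe the table as slow to maintain and access.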
24
Results: Bugs Found
Program    Description                    Defects Found   Add'l False Alarms
anagram    anagram generator              2               0
ft         fast Fourier transform         2               0
ks         graph partitioning             3               0
yacr2      channel router                 2               1
betaftpd   file transfer protocol daemon  2               1
gaim       instant messaging client       1               1
ghttpd     web server                     3               2
openssh    secure shell client / server   2               0
thttpd     web server                     0               1
TOTAL                                     17              6
25
Results: Comparison to Static Approaches

Program    My approach   BOON
anagram    2             0
ft         2             0
ks         3             0
yacr2      2             0
betaftpd   2             0
gaim       1             core dump
ghttpd     3             0
openssh    2             core dump
thttpd     0             0

MetaCompilation: Could not get access to their bug
detection system.
26
Initial Performance Results
27
Eliminating Unnecessary Instrumentation

Many variables do not need shadow state
Variables that never hold input data
Variables that do not produce results used in
dangerous operations
Use static analysis to only apply instrumentation
to variables that need shadow state
At least 83% of instrumentation sites are
useless!
Algorithm is similar to that of constant
propagation in a compiler
Implemented in Dflow, a whole program dataflow
analysis tool we created

28
Example: Removing Unneeded Instrumentation

int a, b, c, d, x[5];
a = get_input_int();
b = get_input_int();
c = 2;
d = b;
x[a] = 3;
x[c] = 6;
printf("%d\n", d);

29
Example: Removing Unneeded Instrumentation

int a, b, c, d, x[5];
create_array_state(x);
a = get_input_int();
create_int_bound_state(a);
b = get_input_int();
create_int_bound_state(b);
c = 2;
remove_int_state(c);
d = b;
copy_int_state(d, b);
check_array_ref(x, a);
x[a] = 3;
check_array_ref(x, c);
x[c] = 6;
printf("%d\n", d);

30
Example: Removing Unneeded Instrumentation

int a, b, c, d, x[5];
create_array_state(x);
a = get_input_int();
create_int_bound_state(a);
b = get_input_int();
create_int_bound_state(b);
c = 2;
remove_int_state(c);
d = b;
copy_int_state(d, b);
check_array_ref(x, a);
x[a] = 3;
check_array_ref(x, c);
x[c] = 6;
printf("%d\n", d);

Unnecessary! c never holds input data
31
Example: Removing Unneeded Instrumentation

int a, b, c, d, x[5];
create_array_state(x);
a = get_input_int();
create_int_bound_state(a);
b = get_input_int();
create_int_bound_state(b);
c = 2;
remove_int_state(c);
d = b;
copy_int_state(d, b);
check_array_ref(x, a);
x[a] = 3;
check_array_ref(x, c);
x[c] = 6;
printf("%d\n", d);

Unnecessary! input value in b never used in
dangerous operation
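
The two "Unnecessary!" conclusions can be derived mechanically. The sketch below is a simplified, flow-insensitive stand-in for the analysis, not Dflow's actual algorithm (which the slides describe only as similar to constant propagation): a variable needs shadow state only if it may hold input data and its value may reach a dangerous operation. The tiny statement encoding (stmt_t, op_t) and the function needs_shadow are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { VAR_A, VAR_B, VAR_C, VAR_D, N_VARS };

typedef enum { OP_INPUT, OP_CONST, OP_COPY, OP_INDEX } op_t;

typedef struct {
    op_t op;
    int  dst;  /* variable written, or the index variable for OP_INDEX */
    int  src;  /* source variable for OP_COPY, otherwise -1            */
} stmt_t;

/* The slide's program: a and b are inputs, c = 2, d = b, x[a], x[c]. */
static const stmt_t slide_program[] = {
    { OP_INPUT, VAR_A, -1 },
    { OP_INPUT, VAR_B, -1 },
    { OP_CONST, VAR_C, -1 },
    { OP_COPY,  VAR_D, VAR_B },
    { OP_INDEX, VAR_A, -1 },
    { OP_INDEX, VAR_C, -1 },
};

/* Iterates to a fixed point over two facts per variable:
   tainted   = may hold data derived from input (flows forward)
   dangerous = value may reach a dangerous operation (flows backward) */
static void needs_shadow(const stmt_t *prog, size_t n, bool out[N_VARS]) {
    bool tainted[N_VARS] = { false };
    bool dangerous[N_VARS] = { false };
    bool changed = true;
    assert(prog != NULL);
    while (changed) {
        changed = false;
        for (size_t i = 0; i < n; ++i) {
            const stmt_t *s = &prog[i];
            if (s->op == OP_INPUT && !tainted[s->dst]) {
                tainted[s->dst] = true;
                changed = true;
            } else if (s->op == OP_INDEX && !dangerous[s->dst]) {
                dangerous[s->dst] = true;
                changed = true;
            } else if (s->op == OP_COPY) {
                if (tainted[s->src] && !tainted[s->dst]) {
                    tainted[s->dst] = true;
                    changed = true;
                }
                if (dangerous[s->dst] && !dangerous[s->src]) {
                    dangerous[s->src] = true;
                    changed = true;
                }
            }
        }
    }
    for (int v = 0; v < N_VARS; ++v)
        out[v] = tainted[v] && dangerous[v];
}
```

On the slide's program only a needs shadow state: c is dangerous but never tainted, while b and d are tainted but never reach an array index, so their instrumentation calls can be dropped.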
32
Results: Removing Unneeded Instrumentation
33
Results: Removing Unneeded Instrumentation
34
Approaches to Shadow State Management

Shadow state table (Example: Jones & Kelly)
Slow to maintain and access
Does not modify the variables within the program
Fat variables (Example: Safe C)
Fast to access, shadow state is contained within
the variable
Variables no longer fit within a register
All variables of a particular type must be
instrumented
Must account for functions that were not compiled
using fat variables

35
Referencing Local Shadow State by Name

Compiler creates separate variable to store
shadowed state for local variables
Quick to access, lookup to table not necessary
Original variable is not modified in any form
Only created for local variables that need
shadowed state
Still need shadow state table for
heap variables
aliased local variables (used in the address-of
(&) operator)
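
Referencing shadow state by name can be pictured as the compiler declaring a second local next to the original one. The following is a hand-written stand-in for what such instrumented code might look like, not output from the actual tool: x_shadow, bounds_t, outcome_t, and run_with_named_shadow are hypothetical, and the bump parameter exists only so the example can show both a safe and a flagged access.

```c
#include <assert.h>
#include <limits.h>

typedef struct { long long lb, ub; } bounds_t;

typedef enum { INPUT_REJECTED, INDEX_FLAGGED, INDEX_SAFE } outcome_t;

static int array[5];

/* x plays the role of an input-derived local; x_shadow is the separate
   variable a compiler could create for it. Every shadow access is a
   plain local access, with no table lookup. */
static outcome_t run_with_named_shadow(int x, int bump) {
    bounds_t x_shadow = { INT_MIN, INT_MAX };  /* created alongside x */
    assert(bump >= 0 && bump <= 4);

    if (x < 0 || x > 4)
        return INPUT_REJECTED;                 /* program would call fatal() */
    x_shadow.lb = 0;                           /* narrowed at control point  */
    x_shadow.ub = 4;

    x += bump;                                 /* original statement         */
    x_shadow.lb += bump;                       /* shadow follows it          */
    x_shadow.ub += bump;

    if (x_shadow.lb < 0 || x_shadow.ub >= 5)
        return INDEX_FLAGGED;                  /* range check on array[x]    */
    array[x] = 1;
    return INDEX_SAFE;
}
```

The original variable x keeps its type and size, which is the contrast with fat variables, and the aliased or heap cases listed on the slide would still have to fall back to a table.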

36
Results: Shadow State by Name (Performance)
37
Results: Shadow State by Name (Integer Shadow
State Table Accesses)
38
Overall Performance Results
39
Conclusion

Our dynamic approach detects input-related faults
while reducing the dependence on the precise input
Shadows variables derived from input with
additional state
Integers: upper and lower bounds
Strings: maximum string size and known null flag
Found 17 bugs in 9 programs
2 known high security faults in OpenSSH
Improved performance by 58%
removing unneeded instrumentation sites
improved shadow state management

40
Future Work

Reduce the dependence on the control path
Improve performance overhead by eliminating
redundant instrumentation
Add symbolic analysis support
Address these common scenarios
pointer walking (manual string handling)
multiple string concatenation into a single
buffer
Add static bug detection work to prove operations
safe
Combine MUSE and Dflow into a single standalone
tool
Explore other correctness properties

41
Questions and Answers
42
Inserting Malicious Code

The injected code is typically very simple
often a lone system call that invokes a shell
Do not know the precise address ahead of time
Keep on guessing until you get it right
Precede code with a sequence of nops to reduce
the number of guesses
Disassembling the code can help
Malicious code need not reside on the stack
(Example: environment variable)
Also possible to exploit a buffer overflow on the
heap

43
Software Verification

Verification determines if a program is
functionally correct
Complete program verification only possible for
trivial programs
Instead, programs are shown to satisfy properties
that are simple
that have well-known behavior
Verification schemes are gauged by:
soundness: every possible error is found
completeness: every reported error is a true error

44
Typical Static Bug Detection Scheme
Diagram stages: Parse, Abstract, Optimize, Translate, Check
Diagram inputs: Program, Correctness Specification
Intermediate form: Program Model
Abstract: remove parts of code not relevant to property
Check: can be done using model checker, theorem
prover, constraint solver, or interpreter
45
Dynamic Bug Detection Systems

Bug prevention schemes
used in the field, needs to be fast
add safety checks around dangerous operations
bugs are still present
Bug detection schemes
designed to be used during testing
finding bugs is more important than speed
high performance overhead
typically use shadow state to find bugs that do
not manifest in an output error

46
Example Static Bug Detection Systems

SLAM: Uses predicate abstraction to create a
Boolean program that is used to verify Windows
device drivers.
PREfix: Traverses the call graph bottom-up using
summary models for analyzed functions.
ARCHER: Uses static analysis and a constraint
solver to find errors in the Linux kernel.
Splint: Uses annotation to analyze programs for
security vulnerabilities.
SPIN: Designed for verifying distributed system
protocols. The protocol must be manually written
using PROMELA.

47
Tainted Data Analysis Algorithm