Efficient Dynamic Detection of Input-Related Security Faults

1
Efficient Dynamic Detection of Input-Related
Security Faults
  • Eric Larson
  • Dissertation Defense
  • University of Michigan
  • April 29, 2004

2
Security Faults
  • Keeping computer data and accesses secure is a
    tough problem
  • Software errors cost companies millions of
    dollars
  • Different types of errors can lead to exploits
  • Protocol errors
  • Configuration errors
  • Implementation errors (most common)
  • Even with a well-designed security protocol, a
    program can be compromised if it contains bugs!

3
Input-Related Software Faults
  • Common implementation error is to improperly
    bound input data
  • checks are not present in many cases
  • when checks are present, they can be wrong
  • especially important for network data
  • Common security exploit: buffer overflow
  • array references
  • string library functions in C
  • Widespread problem
  • 2/3 of CERT security advisories in 2003 were due
    to buffer overflows
  • buffer overflow bugs have recently been found in
    Windows and Linux

4
Example Buffer Overflow Attack
  • Attacking the program involves two steps

[Figure: diagram showing functions foo and bar]
5
Overwriting the Return Address
void bar() {
    char buffer[100];
    gets(buffer);
    printf("String is %s", buffer);
}

Stack layout (highest address first):
    Return address
    temporary value 1
    temporary value 2
    buf[99]
    buf[98]
    ...
    buf[0]
Stack grows to lower addresses
Data grows to higher addresses
6
Overwriting the Return Address
void bar() {
    char buffer[100];
    gets(buffer);
    printf("String is %s", buffer);
}

Stack layout after the overflow (highest address first):
    0xbadc0de
    0xbadc0de
    0xbadc0de
    buf[99]
    buf[98]
    ...
    buf[0]
Stack grows to lower addresses
The location of the return address is not always
known, so overwrite everything!
Data grows to higher addresses
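A minimal sketch of the usual repair for the bar() example: fgets takes the buffer size, while gets has no way to bound its write. The names read_line_bounded and bar_bounded, and the FILE * parameter, are assumptions made for illustration and do not come from the talk.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads at most size-1 characters of one line from `in`
 * into `buf` and always null-terminates it. Returns the stored length,
 * or -1 on end of file or error. */
int read_line_bounded(FILE *in, char *buf, size_t size)
{
    if (size == 0 || fgets(buf, (int)size, in) == NULL)
        return -1;
    size_t len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n')   /* drop the trailing newline */
        buf[--len] = '\0';
    return (int)len;
}

/* Bounded variant of bar(): input longer than the buffer is truncated
 * instead of running past buf[99] toward the return address. */
void bar_bounded(FILE *in)
{
    char buffer[100];
    if (read_line_bounded(in, buffer, sizeof buffer) >= 0)
        printf("String is %s\n", buffer);
}
```

Calling bar_bounded(stdin) mirrors the original bar(); a 300-character line leaves only its first 99 characters in buffer.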
7
Outline of Talk
  • Background and Related Work (Ch. 2)
  • Detecting Input-Related Software Faults (Ch. 3)
  • MUSE Instrumentation Infrastructure (Ch. 4)
  • Implementation and Results (Ch. 5)
  • Reducing Performance Overhead (Ch. 6)
  • Conclusions (Ch. 7)

8
When Should I Look for Software Bugs?
  • Compile-time (static) bug detection
  • no dependence on input
  • can prove that a dangerous operation is safe in
    some cases
  • often computationally infeasible (too many states
    or paths)
  • scope is limited: either high false alarm rate or
    low bug finding rate
  • hard to analyze heap data
  • Run-time (dynamic) bug detection
  • can analyze all variables (including those on
    the heap)
  • execution is on a real path, so fewer false
    alarms
  • error may not manifest as an error in the output
  • depends on program input
  • impacts performance of program

Our approach is dynamic, addressing its
deficiencies by borrowing ideas from static bug
detection
9
Contributions of this Thesis
  • Dynamically Detecting Input-Related Software
    Faults
  • Relaxes dependence on input
  • MUSE Instrumentation Infrastructure
  • Developed for rapid prototyping of bug detection
    tools for this and future research
  • Removing Unnecessary Instrumentation
  • Reduces performance overhead
  • Improved Shadow State Management
  • Tighter integration with the compiler, improves
    performance

10
Selected Related Work
  • Jones & Kelly: dynamic approach to catching
    memory access errors, tracks all valid objects in
    memory using a table
  • Tainted Perl: prevents unsafe actions from
    unvalidated input
  • STOBO: uses allocation sizes rather than string
    sizes
  • CCured: type system used to catch memory access
    errors, instrumentation is added when static
    analysis fails
  • BOON: derives and solves a system of integer
    range constraints statically to find buffer
    overruns
  • CSSV: model checking system to find buffer
    overflows in C, keeps track of potential string
    lengths and null termination
  • MetaCompilation: checks for uses of unbounded
    input, does not verify if the checks are correct

11
Detection of Input-Related Software Faults
  • Program instrumentation tracks data derived from
    input
  • possible range of integer variables
  • maximum size and termination of strings
  • Dangerous operations are checked over entire
    range of possible values
  • Found 17 bugs in 9 programs, including 2 known
    high security faults in OpenSSH

Relaxes constraint that the user provides an
input that exposes the bug
12
Detecting Array Buffer Overflows
  • Interval constraint variables are introduced when
    external inputs are read
  • Holds the lower and upper bounds for each input
    value
  • Initial values encompass the entire range
  • Control points narrow the bounds
  • Arithmetic operations adjust the bounds
  • Potentially dangerous operations are checked
  • Array indexing
  • Controlling a loop or memory allocation size
  • Arithmetic operations (overflow)

13
Code Sequence             Range of x                  Value of x
int x;
int array[5];
x = get_input_int();      -MAX_INT ≤ x ≤ MAX_INT      2
if (x < 0 || x > 4)
    fatal("bounds");      0 ≤ x ≤ 4                   2
x++;                      1 ≤ x ≤ 5                   3
y = array[x];             1 ≤ x ≤ 5                   3

ERROR! When x = 5, the array reference is out of
bounds!
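The bookkeeping in this example can be sketched as a small interval type. This is an illustrative sketch, not the checker's actual code: the type and function names are assumptions, INT_MIN stands in for the slide's -MAX_INT, and bounds are stored in long long so that shifting an int-sized bound by a small constant cannot overflow inside the sketch itself.

```c
#include <limits.h>
#include <stdbool.h>

/* Shadow state for one input-derived integer: the range of values it could
 * hold on this control path, not just the value seen in this run. */
typedef struct {
    long long lb;   /* lower bound */
    long long ub;   /* upper bound */
} interval;

/* A freshly read input may hold any int value. */
interval interval_from_input(void)
{
    interval r = { INT_MIN, INT_MAX };
    return r;
}

/* Control point: execution continued past `if (x < lo || x > hi) fatal(...)`,
 * so x must lie inside [lo, hi]. */
interval interval_narrow(interval r, long long lo, long long hi)
{
    if (r.lb < lo) r.lb = lo;
    if (r.ub > hi) r.ub = hi;
    return r;
}

/* Arithmetic: x + k shifts both bounds by k. */
interval interval_add(interval r, long long k)
{
    r.lb += k;
    r.ub += k;
    return r;
}

/* Dangerous operation: array[x] is safe only if every value in the range
 * is a valid index. */
bool interval_index_ok(interval r, long long array_len)
{
    return r.lb >= 0 && r.ub < array_len;
}
```

Replaying the code sequence: interval_from_input(), then interval_narrow(x, 0, 4), then interval_add(x, 1) yields the range 1 to 5, and interval_index_ok(x, 5) is false even though the value 3 seen in this run is in bounds.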
14
Detecting Dangerous String Operations
  • Strings are shadowed by:
  • max_str_size: largest possible size of the string
  • known_null: set if string is known to contain a
    null character
  • Checking string operations:
  • source string will fit into the destination
  • source strings are guaranteed to be null
    terminated
  • Operations involving a string length can narrow
    the maximum string size
  • our size counts the null character, the strlen
    function does not
  • Integers that store string lengths are shadowed
    by:
  • base address of corresponding string
  • difference between its value and actual string
    length

15
String Fault Detection Example
Code Segment                          Str.   max_str_size   known_null
char *bad_copy(char *src) {           src    MAX_INT        TRUE
    char tmp[16];                     tmp    16             FALSE
    char *dst = (char *)malloc(16);   dst    16             FALSE
    if (strlen(src) > 16)
        return NULL;                  src    17             TRUE
    strncpy(tmp, src, 16);            tmp    16             FALSE
    strcpy(dst, tmp);
    return dst;
}

ERROR! tmp may not be null terminated during
strcpy
20
String Fault Detection Example
Code Segment                          Str.   max_str_size   known_null
char *bad_copy(char *src) {           src    MAX_INT        TRUE
    char *dst = (char *)malloc(16);   dst    16             FALSE
    if (strlen(src) > 16)
        return NULL;                  src    17             TRUE
    strcpy(dst, src);
    return dst;
}

ERROR! src may not fit into dst during strcpy
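The two errors in the bad_copy examples follow from simple rules over max_str_size and known_null. Below is a minimal sketch of such rules; the struct, enum, and function names are assumptions for illustration, and the strncpy rule (the destination is known to be terminated only when the whole source, null included, fits in n bytes) is one plausible reading of the tables, not a statement of the checker's exact logic.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    size_t max_str_size;  /* largest possible size, counting the null */
    bool   known_null;    /* true if a null character is known to be present */
} str_shadow;

typedef enum { COPY_OK, COPY_SRC_UNTERMINATED, COPY_MAY_OVERFLOW } copy_verdict;

/* The path continued with strlen(s) <= limit, so the string plus its null
 * occupies at most limit + 1 bytes. */
str_shadow shadow_after_strlen_le(str_shadow s, size_t limit)
{
    if (limit + 1 < s.max_str_size)
        s.max_str_size = limit + 1;
    return s;
}

/* strncpy(dst, src, n) writes a null only if src (null included) fits in n. */
str_shadow shadow_after_strncpy(str_shadow dst, str_shadow src, size_t n)
{
    dst.known_null = src.known_null && src.max_str_size <= n;
    return dst;
}

/* strcpy(dst, src): the source must be terminated and must fit. */
copy_verdict check_strcpy(str_shadow dst, str_shadow src)
{
    if (!src.known_null)
        return COPY_SRC_UNTERMINATED;
    if (src.max_str_size > dst.max_str_size)
        return COPY_MAY_OVERFLOW;
    return COPY_OK;
}
```

With src narrowed to 17 by the strlen test, strncpy into a 16-byte tmp leaves known_null false, so check_strcpy reports an unterminated source; copying src straight into the 16-byte dst reports a possible overflow. Tightening the test to strlen(src) > 15 makes both copies pass in this sketch.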
21
MUSE Implementation Infrastructure
  • Developed for rapid prototyping of bug detection
    tools for this and future research
  • General-purpose instrumentation tool
  • can also be used to create profilers, coverage
    tools, and debugging aids
  • Implemented in GCC at the abstract syntax tree
    (AST) level
  • Simplification phase breaks up complex C
    statements
  • removes C side effects and other nuances
  • allows matching in the middle of a complex
    expression
  • Specification consists of pattern-function pairs
  • patterns match against statements, expressions,
    and special events
  • on a match, call is made to corresponding
    external function
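The idea of pattern-function pairs can be pictured as a table that maps a kind of program event to a handler. The sketch below is purely illustrative: every name in it is hypothetical, it is not MUSE's specification syntax, and it dispatches at run time, whereas the slides describe MUSE matching patterns inside GCC and inserting the calls into the program.

```c
#include <stddef.h>

/* Hypothetical event kinds a pattern could match. */
typedef enum { EV_ARRAY_REF, EV_ASSIGN, EV_CALL } event_kind;

typedef struct {
    event_kind  kind;
    const char *name;   /* variable or function involved in the event */
} event;

typedef void (*handler_fn)(const event *ev);

/* One pattern-function pair: when `pattern` matches, call `handler`. */
typedef struct {
    event_kind pattern;
    handler_fn handler;
} pattern_function_pair;

/* Example handler: counts array references it was asked to check. */
int array_ref_checks = 0;

void on_array_ref(const event *ev)
{
    (void)ev;
    array_ref_checks++;
}

/* Calls the handler of every pair whose pattern matches the event and
 * returns how many pairs matched. */
size_t dispatch(const pattern_function_pair *spec, size_t n, const event *ev)
{
    size_t matches = 0;
    for (size_t i = 0; i < n; i++) {
        if (spec[i].pattern == ev->kind) {
            spec[i].handler(ev);
            matches++;
        }
    }
    return matches;
}
```

A specification with the single pair { EV_ARRAY_REF, on_array_ref } would fire on an array reference to x and ignore an assignment to c.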

22
Testing Process
23
Input Checker Implementation
  • Shadow state stores checker bookkeeping info
  • integers: bounds and string length information
  • arrays: maximum string size, null flag, and
    actual size
  • Stored in hash tables (shadow state table)
  • hash tables are indexed by address
  • separate hash tables for integers and arrays
  • Pointers use the array hash table
  • Debug tracing mode can help find source of error

[Figure: the address of int x indexes the shadow state table, whose entry holds the shadow state for x (lb = 0, ub = 5)]
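A shadow state table indexed by address can be sketched as a chained hash table keyed on the variable's address. The bucket count, hash function, and all names below are assumptions for illustration; the slides only say that the tables are hash tables indexed by address.

```c
#include <stdint.h>
#include <stdlib.h>

#define SHADOW_BUCKETS 256   /* assumed table size, for illustration only */

typedef struct int_shadow {
    const void *addr;          /* address of the shadowed variable (the key) */
    long long lb, ub;          /* integer bounds */
    struct int_shadow *next;   /* next entry in the same bucket */
} int_shadow;

typedef struct {
    int_shadow *bucket[SHADOW_BUCKETS];
} shadow_table;

static size_t shadow_hash(const void *addr)
{
    return (size_t)(((uintptr_t)addr >> 2) % SHADOW_BUCKETS);
}

/* Returns the shadow state for addr, or NULL if the variable is unshadowed. */
int_shadow *shadow_lookup(shadow_table *t, const void *addr)
{
    for (int_shadow *e = t->bucket[shadow_hash(addr)]; e != NULL; e = e->next)
        if (e->addr == addr)
            return e;
    return NULL;
}

/* Creates or updates the entry for addr; returns NULL if allocation fails. */
int_shadow *shadow_set(shadow_table *t, const void *addr,
                       long long lb, long long ub)
{
    int_shadow *e = shadow_lookup(t, addr);
    if (e == NULL) {
        e = malloc(sizeof *e);
        if (e == NULL)
            return NULL;
        size_t h = shadow_hash(addr);
        e->addr = addr;
        e->next = t->bucket[h];
        t->bucket[h] = e;
    }
    e->lb = lb;
    e->ub = ub;
    return e;
}

/* Drops the entry for addr, e.g. when a constant is assigned to the variable. */
void shadow_remove(shadow_table *t, const void *addr)
{
    int_shadow **p = &t->bucket[shadow_hash(addr)];
    while (*p != NULL) {
        if ((*p)->addr == addr) {
            int_shadow *dead = *p;
            *p = dead->next;
            free(dead);
            return;
        }
        p = &(*p)->next;
    }
}
```

Every access pays for a hash and a chain walk, which is the cost the later slides on shadow state management set out to reduce.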
24
Results Bugs Found
Program    Description                      Defects Found   Additional False Alarms
anagram    anagram generator                2               0
ft         fast Fourier transform           2               0
ks         graph partitioning               3               0
yacr2      channel router                   2               1
betaftpd   file transfer protocol daemon    2               1
gaim       instant messaging client         1               1
ghttpd     web server                       3               2
openssh    secure shell client / server     2               0
thttpd     web server                       0               1
TOTAL                                       17              6
25
Results Comparison to Static Approaches
Program    My approach   BOON
anagram    2             0
ft         2             0
ks         3             0
yacr2      2             0
betaftpd   2             0
gaim       1             core dump
ghttpd     3             0
openssh    2             core dump
thttpd     0             0

MetaCompilation: Could not get access to their bug
detection system.
26
Initial Performance Results
27
Eliminating Unnecessary Instrumentation
  • Many variables do not need shadow state
  • Variables that never hold input data
  • Variables that do not produce results used in
    dangerous operations
  • Use static analysis to only apply instrumentation
    to variables that need shadow state
  • At least 83% of instrumentation sites are
    useless!
  • Algorithm is similar to that of constant
    propagation in a compiler
  • Implemented in Dflow, a whole program dataflow
    analysis tool we created

28
Example Removing Unneeded Instrumentation
    int a, b, c, d, x[5];
    a = get_input_int();
    b = get_input_int();
    c = 2;
    d = b;
    x[a] = 3;
    x[c] = 6;
    printf("%d\n", d);

29
Example Removing Unneeded Instrumentation
    int a, b, c, d, x[5];
    create_array_state(x);
    a = get_input_int();
    create_int_bound_state(a);
    b = get_input_int();
    create_int_bound_state(b);
    c = 2;
    remove_int_state(c);
    d = b;
    copy_int_state(d, b);
    check_array_ref(x, a);
    x[a] = 3;
    check_array_ref(x, c);
    x[c] = 6;
    printf("%d\n", d);

30
Example Removing Unneeded Instrumentation
(Instrumented code repeated from slide 29.)

Unnecessary! c never holds input data
31
Example Removing Unneeded Instrumentation
(Instrumented code repeated from slide 29.)

Unnecessary! input value in b never used in
dangerous operation
32
Results Removing Unneeded Instrumentation
33
Results Removing Unneeded Instrumentation
34
Approaches to Shadow State Management
  • Shadow state table (Example: Jones & Kelly)
  • Slow to maintain and access
  • Does not modify the variables within the program
  • Fat variables (Example: Safe C)
  • Fast to access, shadow state is contained within
    the variable
  • Variables no longer fit within a register
  • All variables of a particular type must be
    instrumented
  • Must account for functions that were not compiled
    using fat variables

35
Referencing Local Shadow State by Name
  • Compiler creates separate variable to store
    shadowed state for local variables
  • Quick to access, lookup to table not necessary
  • Original variable is not modified in any form
  • Only created for local variables that need
    shadowed state
  • Still need shadow state table for
  • heap variables
  • aliased local variables (used in the address-of
    (&) operator)
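Referencing shadow state by name can be pictured at source level as extra locals that travel alongside the original variable. The function below is a hand-written illustration of what instrumented code might look like, not compiler output; the names checked_lookup, x_lb, x_ub, and the flagged out-parameter are assumptions.

```c
#include <limits.h>
#include <stdbool.h>

/* x_lb and x_ub are the separate shadow variables for the local x. They are
 * ordinary locals, so reaching them needs no table lookup, and x itself keeps
 * its original type and size. */
int checked_lookup(const int array[5], int input, bool *flagged)
{
    int x = input;
    long long x_lb = INT_MIN, x_ub = INT_MAX;   /* shadow of x: any input */

    *flagged = false;
    if (x < 0 || x > 4)
        return -1;                       /* stands in for fatal("bounds") */
    if (x_lb < 0) x_lb = 0;              /* control point narrows the bounds */
    if (x_ub > 4) x_ub = 4;

    x++;
    x_lb++;                              /* arithmetic adjusts the bounds */
    x_ub++;

    *flagged = (x_lb < 0 || x_ub > 4);   /* some value in range is out of bounds */
    if (x < 0 || x > 4)
        return -1;                       /* the sketch skips the bad access itself */
    return array[x];
}
```

With input 2 the access array[3] is in bounds, yet flagged is set because the shadow range of x reaches 5. If the program took the address of x, the shadow state would have to live in the table instead, as the last bullet above notes.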

36
Results Shadow State by Name (Performance)
37
Results Shadow State by Name (Integer Shadow
State Table Accesses)
38
Overall Performance Results
39
Conclusion
  • Our dynamic approach detects input-related faults,
    reducing the dependence on the precise input
  • Shadows variables derived from input with
    additional state
  • Integers: upper and lower bounds
  • Strings: maximum string size and known null flag
  • Found 17 bugs in 9 programs
  • 2 known high security faults in OpenSSH
  • Improved performance by 58%
  • removing unneeded instrumentation sites
  • improved shadow state management

40
Future Work
  • Reduce the dependence on the control path
  • Improve performance overhead by eliminating
    redundant instrumentation
  • Add symbolic analysis support
  • Address these common scenarios
  • pointer walking (manual string handling)
  • multiple string concatenation into a single
    buffer
  • Add static bug detection work to prove operations
    safe
  • Combine MUSE and Dflow into a single standalone
    tool
  • Explore other correctness properties

41
Questions and Answers
42
Inserting Malicious Code
  • The injected code is typically very simple:
    often a lone system call that invokes a shell
  • Do not know the precise address ahead of time
  • Keep on guessing until you get it right
  • Precede code with a sequence of nops to reduce
    the number of guesses
  • Disassembling the code can help
  • Malicious code need not reside on the stack
    (Example: environment variable)
  • Also possible to exploit a buffer overflow on the
    heap

43
Software Verification
  • Verification determines if a program is
    functionally correct
  • Complete program verification only possible for
    trivial programs
  • Instead, programs are shown to satisfy properties
  • that are simple
  • that have well-known behavior
  • Verification schemes are gauged by
  • soundness: every possible error is found
  • completeness: every reported error is a true error

44
Typical Static Bug Detection Scheme
[Figure: flow diagram. Program → Parse → Abstract (remove parts of code not relevant to property) → Optimize → Translate → Program Model → Check against the Correctness Specification (can be done using model checker, theorem prover, constraint solver, or interpreter)]
45
Dynamic Bug Detection Systems
  • Bug prevention schemes
  • used in the field, needs to be fast
  • add safety checks around dangerous operations
  • bugs are still present
  • Bug detection schemes
  • designed to be used during testing
  • finding bugs is more important than speed
  • high performance overhead
  • typically use shadow state to find bugs that do
    not manifest in an output error

46
Example Static Bug Detection Systems
  • SLAM: Uses predicate abstraction to create a
    Boolean program that is used to verify Windows
    device drivers.
  • PREfix: Traverses the call graph bottom-up using
    summary models for analyzed functions.
  • ARCHER: Uses static analysis and a constraint
    solver to find errors in the Linux kernel.
  • Splint: Uses annotation to analyze programs for
    security vulnerabilities.
  • SPIN: Designed for verifying distributed system
    protocols. The protocol must be manually written
    using PROMELA.

47
Tainted Data Analysis Algorithm
    // Initialization
    Tainted = ∅
    InputFunctionCalls = { stmts that call input-producing functions }
    foreach stmt s
        if (s ∈ InputFunctionCalls) then
            Tainted = Tainted ∪ Defs(s)
    // Iterate until Tainted set is stable
    do
        LastTainted = Tainted
        foreach stmt s
            if (∃ d ∈ Uses(s) s.t. d ∈ Tainted) then
                Tainted = Tainted ∪ Defs(s)
    while (LastTainted ≠ Tainted)
    // At end, Tainted contains definitions derived from input
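The fixed-point iteration above can be sketched in C with one bit per variable. This is a simplified, flow-insensitive reading: the slide's Tainted set holds definitions, while the sketch tracks whole variables, and the stmt layout and names are assumptions for illustration. It covers only the first filter (variables that never hold input data), not the check for values that never reach a dangerous operation.

```c
#include <stdbool.h>
#include <stddef.h>

typedef unsigned varset;   /* bit i set means variable i is in the set */

typedef struct {
    bool   calls_input;   /* statement calls an input-producing function */
    varset defs;          /* variables the statement defines */
    varset uses;          /* variables the statement reads */
} stmt;

/* Returns the set of variables that may hold data derived from input. */
varset tainted_vars(const stmt *prog, size_t n)
{
    varset tainted = 0;
    varset last;

    /* Initialization: definitions made by input-producing calls. */
    for (size_t i = 0; i < n; i++)
        if (prog[i].calls_input)
            tainted |= prog[i].defs;

    /* Iterate until the tainted set is stable. */
    do {
        last = tainted;
        for (size_t i = 0; i < n; i++)
            if (prog[i].uses & tainted)
                tainted |= prog[i].defs;
    } while (last != tainted);

    return tainted;
}
```

Encoding the slide 28 program (a and b read from input, c = 2, d = b, x[a] = 3, x[c] = 6, printf of d) marks a, b, d, and x as tainted and leaves c untainted, which matches the observation that c never holds input data.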