1
Client-Driven Pointer Analysis
  • Samuel Z. Guyer
  • Calvin Lin
  • June 2003

2
Using Pointer Analysis
  • Pointer analysis is not a stand-alone analysis
  • It supports other client analyses
  • Current practice:
  • Client analysis: we'll focus on error detection
  • Pointer analysis algorithm: choose the precision

(Diagram: Client Analysis / Error Detector, Pointer Analyzer, Memory Model)
3
Motivation
  • Real-life scenario
  • Check for security vulnerabilities in the BlackHole mail filter
  • Manually inspect the reported errors
  • One thing in common: a string processing routine
  • Clone the procedure: ad hoc context sensitivity (sketched in code below)
  • Using CI-FI, all 85 false positives go away
  • Can we automate this process?
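A minimal sketch of this scenario with hypothetical code (the names below are illustrative, not from BlackHole): a shared string helper is called with both tainted and untainted arguments, a context-insensitive analysis merges the two calling contexts, and the caller that only ever passes a constant gets flagged too. Cloning the helper separates the contexts and removes the report.

#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical string helper: under a context-insensitive (CI) analysis,
   the facts flowing through 'src' and 'dst' at the two call sites below
   are merged into a single calling context.                             */
static void copy_str(char *dst, const char *src, size_t n) {
    strncpy(dst, src, n - 1);
    dst[n - 1] = '\0';
}

void handle(int sock) {
    char netbuf[128], logmsg[128], prog[128];

    read(sock, netbuf, sizeof netbuf - 1);        /* tainted: socket data    */
    netbuf[sizeof netbuf - 1] = '\0';

    copy_str(logmsg, netbuf, sizeof logmsg);      /* call 1: tainted source  */
    copy_str(prog, "/usr/bin/true", sizeof prog); /* call 2: constant source */

    fputs(logmsg, stderr);                        /* fine: only logged       */
    execl(prog, prog, (char *)0);
    /* CI false positive: 'prog' is merged with the tainted context of
       call 1 inside copy_str; cloning copy_str (context sensitivity)
       separates the contexts and removes the report.                       */
}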

(Architecture diagram: Error Detector, Pointer Analyzer, Memory Model)
4
Our solution
  • Problems:
  • The cost-benefit tradeoff is severe for pointer analysis
  • Precision choices are too coarse
  • The choice is made a priori by the compiler writer
  • Solution: mixed-precision analysis
  • Apply higher precision where it's needed
  • Use cheap analysis elsewhere
  • Key: let the needs of the client drive precision
  • A customized precision policy is created during the analysis

5
Client-Driven Pointer Analysis
  • Algorithm (a skeleton follows this list)
  • Start with a fast, cheap analysis (FI and CI)
  • Monitor how imprecision causes information loss
  • Adapt: reanalyze with a customized precision policy
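A hypothetical skeleton of this loop in C; every type and helper below is an illustrative stand-in rather than the Broadway implementation.

typedef struct Program Program;                 /* opaque program representation */
typedef struct { int num_cs_procs, num_fs_locs; } PrecisionPolicy;
typedef struct { int maybe_errors; } AnalysisResult;

/* Stub: run the (possibly partially refined) FI/CI analysis while the
   monitor records polluting and complicit assignments.                  */
static AnalysisResult analyze_with_monitor(Program *p, const PrecisionPolicy *pol) {
    (void)p; (void)pol;
    AnalysisResult r = { 0 };
    return r;
}

/* Stub: walk the dependence graph back from the "maybe error" values and
   upgrade the policy (CS for some procedures, FS for some locations).
   Returns the number of precision upgrades made.                         */
static int diagnose_and_adapt(const AnalysisResult *r, PrecisionPolicy *pol) {
    (void)r; (void)pol;
    return 0;
}

void client_driven_analysis(Program *prog) {
    PrecisionPolicy policy = { 0, 0 };           /* start all-CI, all-FI   */
    for (;;) {
        AnalysisResult r = analyze_with_monitor(prog, &policy);
        if (!r.maybe_errors)                     /* client is satisfied    */
            break;
        if (diagnose_and_adapt(&r, &policy) == 0)
            break;                               /* nothing left to refine */
        /* otherwise: reanalyze with the customized precision policy       */
    }
}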

(Architecture diagram: Pointer Analyzer, Client Analysis, Memory Model)
6
Overview
  • Motivation
  • Our algorithm
  • Automatically discover what the client needs
  • Experiments
  • Real programs and challenging error detection
    problems
  • Related work and conclusions

7
False Positives
Remote access vulnerability
  • Example

sock = socket(AF_INET, SOCK_STREAM, 0);
read(sock, buffer, 100);
execl(buffer);            /* ! flagged: remote data may reach execl */
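For contrast, a hedged sketch (hypothetical code, not from the paper) of how the same kind of report can be a false positive: if the analysis models buffer flow-insensitively, the tainted read and the later constant overwrite are merged, and the safe execl is still flagged.

#include <string.h>
#include <unistd.h>

void run_default(int sock) {
    char buffer[100];

    read(sock, buffer, sizeof buffer - 1);   /* buffer: tainted (Remote)     */
    buffer[sizeof buffer - 1] = '\0';
    /* ... the tainted contents are inspected but never executed ...         */

    strcpy(buffer, "/usr/bin/true");         /* buffer: now a constant       */
    execl(buffer, buffer, (char *)0);        /* safe, yet a flow-insensitive
                                                (FI) analysis merges both
                                                assignments to buffer and
                                                still reports a possible
                                                remote access vulnerability  */
}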
8
Client-Driven Pointer Analysis
(Architecture diagram: Pointer Analyzer, Client Analysis, Memory Model)
9
Analysis framework
  • Iterative dataflow analysis
  • Pointer analysis: flow values are points-to sets
  • Client analysis: flow values form a typestate lattice
  • Fine-grained precision policies (illustrated below)
  • Context sensitivity: per procedure
  • CS: clone or inline each procedure invocation
  • CI: merge values from all call sites
  • Flow sensitivity: per memory location
  • FS: build factored use-def chains
  • FI: merge all assignments into a single flow value
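A minimal sketch (hypothetical code) of what these precision choices mean at the level of points-to sets:

/* Flow sensitivity: per memory location. */
int a, b;

void flow_example(void) {
    int *p;
    p = &a;    /* FS: here p --> { a }                                   */
    *p = 1;    /*     the store affects only a                           */
    p = &b;    /* FS: now p --> { b }                                    */
    *p = 2;    /*     the store affects only b                           */
    /* FI: a single merged value p --> { a, b }, so each *p = ...
       may modify either a or b.                                         */
}

/* Context sensitivity: per procedure. */
static int *choose(int *q) { return q; }

void context_example(void) {
    int *pa = choose(&a);    /* CS: pa --> { a }                         */
    int *pb = choose(&b);    /* CS: pb --> { b }                         */
    /* CI: both call sites merge inside choose, so
       pa --> { a, b } and pb --> { a, b }.                              */
    *pa = 1;
    *pb = 2;
}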

10
Client-Driven Pointer Analysis
(Architecture diagram: Pointer Analyzer, Client Analysis, Memory Model)
11
Monitor
  • Runs alongside the main analysis
  • Monitors information loss
  • Detects polluting assignments
  • Merging two accurate flow values → an ambiguous value
  • Tracks complicit assignments
  • Passing an ambiguous value from one variable to another
  • Records them in a dependence graph
  • For both the pointer and client analyses (example below)
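A minimal sketch (hypothetical code) of the two kinds of assignments the monitor records under the FI/CI baseline:

int a, b;

void monitor_example(void) {
    int *p, *q;

    p = &a;
    /* ... */
    p = &b;
    /* Under the flow-insensitive baseline, both assignments merge into
       one value, p --> { a, b }: a POLLUTING assignment.  The monitor
       adds a node for p annotated with a diagnosis ("needs flow
       sensitivity").                                                     */

    q = p;
    /* q --> { a, b }: a COMPLICIT assignment; it only passes the
       ambiguous value along.  The monitor adds an edge from q back to
       p in the dependence graph.                                         */

    *q = 1;    /* an indirect store through the ambiguous pointer         */
}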

12
Dependence graph (I)
  • Polluting assignment:
  • Add a node for the variable; annotate it with a diagnosis
  • Complicit assignment:
  • Add an edge from the left-hand side back to the right-hand side (one possible representation is sketched below)
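One possible shape for the dependence graph, sketched as hypothetical C declarations (the talk does not show Broadway's actual data structures):

typedef enum {
    NEEDS_FLOW_SENSITIVITY,       /* polluting merge under FI              */
    NEEDS_CONTEXT_SENSITIVITY     /* polluting merge across call sites     */
} Diagnosis;

typedef struct DepNode {
    const char       *location;      /* variable or memory location name    */
    int               is_polluting;  /* node added for a polluting assign   */
    Diagnosis         diagnosis;     /* meaningful only when is_polluting   */
    struct DepNode  **deps;          /* complicit edges, left-hand side back
                                        to right-hand side (NULL-terminated) */
} DepNode;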

13
Dependence graph (II)
  • Indirect assignments (example below)

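A minimal sketch (hypothetical code) of the indirect-assignment case, where the written-to location itself depends on an ambiguous points-to set:

int a, b, tainted;

void indirect_example(void) {
    int *p;

    p = &a;
    p = &b;        /* FI baseline: p --> { a, b } (ambiguous)              */

    *p = tainted;  /* indirect assignment: the value may flow into any
                      location p points to, so the monitor records
                      dependence edges for the possible targets (a, b),
                      and the imprecision of p itself can be implicated
                      as well (the two cases in the original figure).      */
}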
14
Adaptor
(Diagram: the dependence graph)
  • After analysis...
  • Start at the "maybe error" variables
  • Find all reachable nodes; collect the diagnoses (sketched below)
  • Often a small subset of all the imprecision
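A hypothetical sketch of that walk in C, using a simplified node type in the spirit of the earlier dependence-graph sketch; none of this is the actual Broadway code.

typedef enum { NEEDS_FLOW_SENSITIVITY, NEEDS_CONTEXT_SENSITIVITY } Diagnosis;

typedef struct DepNode {
    int               visited;
    int               is_polluting;
    Diagnosis         diagnosis;
    struct DepNode  **deps;          /* complicit edges (NULL-terminated)   */
} DepNode;

/* Stub: record one diagnosis in the next run's precision policy. */
static void apply_diagnosis(Diagnosis d) { (void)d; }

/* Depth-first walk from a "maybe error" node: visit everything reachable
   along complicit edges and collect the diagnoses on polluting nodes.     */
static void collect(DepNode *n) {
    if (n == 0 || n->visited)
        return;
    n->visited = 1;
    if (n->is_polluting)
        apply_diagnosis(n->diagnosis);
    for (DepNode **e = n->deps; e && *e; e++)
        collect(*e);
}

void adapt_from(DepNode *maybe_error) { collect(maybe_error); }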

15
In action...
  • Monitor the analysis
  • Polluting assignments
  • Diagnose and apply a fix
  • In this case: make one procedure context-sensitive
  • Reanalyze

16
Programs
  • 18 real C programs
  • Unmodified source: all the issues of production code
  • Many are system tools that run in privileged mode
  • Representative examples:

Name       Description    Priv  Lines of code  Procedures  CFG nodes
muh        IRC proxy      yes   5K (25K)       84          5,191
blackhole  E-mail filter  yes   12K (244K)     71          21,370
wu-ftpd    FTP daemon     yes   22K (66K)      205         23,107
named      DNS server     yes   26K (84K)      210         25,452
nn         News reader    no    36K (116K)     494         46,336
17
Methodology
  • 5 typestate error checkers
  • Represent non-trivial program properties
  • Stress the pointer analyzer
  • Compare client-driven analysis with fixed-precision analyses
  • Goals:
  • First, reduce the number of errors reported
  • Conservative analysis: fewer is better
  • Second, reduce analysis time

18
Results
Remote access vulnerability
(Results chart; "10X")
19
Why it works
Name        Total procs   Procedures made context-sensitive
                          (across the Remote Access, File Access, FSV, RFSV, and FTP clients)
muh         84            6
apache      313           8, 2, 2, 10
blackhole   71            2, 5
wu-ftpd     205           4, 4, 17
named       210           1, 2, 1, 4
cfengine    421           4, 1, 3, 31
nn          494           2, 1, 1, 30
  • Notice
  • Different clients have different precision
    requirements
  • Amount of extra precision is small

20
Related work
  • Pointer analysis and typestate error checking
  • Iterative flow analysis [Plevyak & Chien '94]
  • Demand-driven pointer analysis [Heintze & Tardieu '01]
  • Combined pointer analysis [Zhang, Ryder & Landi '98]
  • Effects of pointer analysis precision [Hind '01], among others
  • More precision is more costly
  • Does it help? Is it worth the cost?

21
Conclusions
  • Client-driven pointer analysis
  • Precision should match the client and program
  • Not all pointers are equal
  • Need fine-grained precision policies
  • Key: knowing where to add more precision, and what kind
  • A roadmap for scalability
  • Use more expensive analyses on small parts of programs

22
(No Transcript)
23
Time
24
Precision policies
Name        Procedures made context-sensitive   Variables made flow-sensitive
            (RA, File, FSV, RFSV, FTP)          (RA, File, FSV, RFSV, FTP)
muh         6                                   0.1, 0.07, 0.31
apache      8, 2, 2, 10                         0.89, 0.18, 0.91, 1.07, 0.83
blackhole   2, 5                                0.24, 0.04, 0.32
wu-ftpd     4, 4, 17                            0.63, 0.09, 0.51, 0.53, 0.23
named       1, 2, 1, 4                          0.14, 0.01, 0.23, 0.20, 0.42
cfengine    4, 1, 3, 31                         0.43, 0.04, 0.46, 0.48, 0.03
nn          2, 1, 1, 30                         1.82, 0.17, 1.99, 2.03, 0.97
25
(No Transcript)
26
Error detection problems
  • Remote access vulnerability: data from an Internet socket should not specify a program to execute
  • File access: files must be open when accessed
  • Format string vulnerability (FSV): a format string may not contain untrusted data
  • Remote FSV: check whether an FSV is remotely exploitable
  • FTP behavior: can this program be tricked into reading and transmitting arbitrary files? (sketches of two of these bugs follow)
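Hedged sketches (hypothetical code, not taken from the benchmark programs) of the bugs two of these checkers look for:

#include <stdio.h>
#include <unistd.h>

/* File access: "files must be open when accessed". */
void file_access_bug(void) {
    FILE *f = fopen("data.txt", "r");    /* State(f) = Open              */
    if (f == NULL) return;
    fclose(f);                           /* State(f) = Closed            */
    char c;
    fread(&c, 1, 1, f);                  /* error: access while Closed   */
}

/* Format string vulnerability: the format string holds untrusted data. */
void fsv_bug(int sock) {
    char buf[128];
    ssize_t n = read(sock, buf, sizeof buf - 1);   /* untrusted (Remote) */
    if (n <= 0) return;
    buf[n] = '\0';
    printf(buf);                         /* error: untrusted format string;
                                            a remote FSV, since the data
                                            comes from a socket           */
}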
27
Annotations (I)
  • Dependence and pointer information
  • Describe pointer structures
  • Indicate which objects are accessed and modified

procedure fopen(pathname, mode)
  on_entry  pathname --> path_string
            mode --> mode_string
  access    path_string, mode_string
  on_exit   return --> new file_stream
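For reference, a call site that this annotation describes, with the facts it supplies shown as comments (the surrounding code is hypothetical):

#include <stdio.h>

void open_config(const char *path) {
    FILE *f = fopen(path, "r");
    /* Per the annotation: 'path' points to a path_string and "r" to a
       mode_string, both of which fopen accesses; on exit the return
       value (now held in f) points to a newly created file_stream.      */
    if (f != NULL)
        fclose(f);
}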
28
Annotations (II)
  • Library-specific properties
  • Dataflow lattices

property State   Open, Closed    initially Open
property Kind    File, Socket  Local, Remote

(Lattice diagrams for the two properties: State = Open, Closed; Kind = File, Socket, Local, Remote)
29
Annotations (III)
  • Library routine effects
  • Dataflow transfer functions

procedure socket(domain, type, protocol)
  analyze Kind   if (domain == AF_UNIX)  IOHandle <- Local
                 if (domain == AF_INET)  IOHandle <- Remote
  analyze State  IOHandle <- Open
  on_exit        return --> new IOHandle
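A hypothetical call site showing the facts this annotation supplies to the client analysis:

#include <sys/socket.h>

void make_handles(void) {
    int local  = socket(AF_UNIX, SOCK_STREAM, 0);
    /* annotation: the new IOHandle has Kind = Local and State = Open     */
    int remote = socket(AF_INET, SOCK_STREAM, 0);
    /* annotation: the new IOHandle has Kind = Remote and State = Open    */
    (void)local; (void)remote;
}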
30
Annotations (IV)
  • Reports and transformations

procedure execl(path, args)
  on_entry  path --> path_string
  report if (Kind : path_string could-be Remote)
         "Error at" callsite "remote access";

procedure slow_routine(first, second)
  when (condition)
  replace-with  quick_check(first);
                fast_routine(first, second);
31
Type Theory
  • Equivalent to dataflow analysis (heresy?)
  • Different in practice
  • Dataflow: flow-sensitive problems, iterative analysis
  • Types: flow-insensitive problems, constraint solver
  • Commonality
  • No magic bullet: same cost for the same precision
  • Extracting the store model is a primary concern

Remember Phil Wadler's talk?
32
Is it correct?
  • Three separate questions
  • Are Sam Guyer's experiments correct?
  • Yes, to the best of our knowledge
  • Checked PLAPACK results
  • Checked detected errors against known errors
  • Is our compiler implemented correctly?
  • Flip answer: whose is?
  • Better answer: testing suites
  • How do we validate a set of annotations?

33
Annotation correctness
  • Not addressed in my dissertation, but...
  • Theoretical approach
  • Does the library implement the domain?
  • Formally verify annotations against
    implementation
  • Practical approach
  • Annotation debugger: interactive
  • Automated assistance in early stages of
    development
  • Middle approach
  • Basic consistency checks

34
Error Checking vs Optimization
  • Error checking
  • Optimistic
  • False positives allowed
  • It can even be unsound
  • Tend to be "may" analyses
  • Correctness is absolute
  • Black and white
  • Certify programs bug-free
  • Cost tolerant
  • Explore costly analyses
  • Optimization
  • Pessimistic
  • Must preserve semantics
  • Soundness mandatory
  • Tend to be "must" analyses
  • Performance is relative
  • Spectrum of results
  • No guarantees
  • Cost sensitive
  • Compile-time is a factor

35
Complexity
  • Pointer analysis
  • Address-taken: linear
  • Steensgaard: almost linear (log log n factor)
  • Andersen: polynomial (cubic)
  • Shape analysis: doubly exponential
  • Dataflow analysis
  • Intraprocedural: polynomial (in the height of the lattice)
  • Context sensitivity: exponential (in the call graph)
  • The worst case is rarely seen in practice

36
Find the error, part 3
  • State-of-the-art compiler

struct __sue_23 *var_72;
struct __sue_25 *new_f =
    (struct __sue_25 *) malloc(sizeof(struct __sue_25));
_IO_no_init(&new_f->fp.file, 1, 0, ((void *) 0), ((void *) 0));
(&new_f->fp)->vtable = &_IO_file_jumps;
_IO_file_init(&new_f->fp);
if (_IO_file_fopen((struct __sue_23 *) new_f, filename, mode, is32)
    != ((void *) 0))
  var_72 = &new_f->fp.file;
if ((var_72->_flags2 & 1) && (var_72->_flags & 8)) {
  if (var_72->_mode <= 0)
    ((struct __sue_23 *) var_72)->vtable = &_IO_file_jumps_maybe_mmap;
  else
    ((struct __sue_23 *) var_72)->vtable = &_IO_wfile_jumps_maybe_mmap;
  var_72->_wide_data->_wide_vtable = &_IO_wfile_jumps_maybe_mmap;
}
if (var_72->_flags & 8192U)
  _IO_un_link((struct __sue_23 *) var_72);
if (var_72->_flags & 8192U)
  status = _IO_file_close_it(var_72);
else
  status = var_72->_flags & 32U ? -1 : 0;
((*(struct _IO_jump_t **) ((void *) &(((struct __sue_23 *) (var_72))->vtable)
                           + (var_72)->_vtable_offset))->__finish)(var_72, 0);
if (var_72->_mode <= 0)
  if (((var_72)->_IO_save_base != ((void *) 0)))
    _IO_free_backup_area(var_72);
if (var_72 != ((struct __sue_23 *) (&_IO_2_1_stdin_)) &&
    var_72 != ((struct __sue_23 *) (&_IO_2_1_stdout_)) &&
    var_72 != ((struct __sue_23 *) (&_IO_2_1_stderr_)))
  var_72->_flags = 0;
free(var_72);
bytes_read = _IO_sgetn(var_72, (char *) var_81, bytes_requested);
37
Challenge 2: Scope
  • Call graph
  • Objects flow throughout the program
  • No scoping constraints
  • Objects are referenced through pointers
  • We need whole-program analysis (see the sketch below the figure)

(Call-graph figure: the socket(AF_INET, SOCK_STREAM, 0), read(sock, buffer, 100), and execl calls are spread across the call graph)
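A hedged sketch (hypothetical code) of why whole-program analysis is needed here: the socket, the read, and the exec occur in different procedures, connected only through pointers and the call graph.

#include <sys/socket.h>
#include <unistd.h>

static int open_conn(void) {
    return socket(AF_INET, SOCK_STREAM, 0);     /* remote data source      */
}

static void fetch(int s, char *buf, size_t n) {
    read(s, buf, n);                            /* taint flows into *buf   */
}

static void launch(const char *cmd) {
    execl(cmd, cmd, (char *)0);                 /* sink: program execution */
}

void serve(void) {
    char buffer[100];
    int sock = open_conn();
    fetch(sock, buffer, sizeof buffer - 1);
    buffer[sizeof buffer - 1] = '\0';
    launch(buffer);   /* the vulnerability is only visible when all of
                         these procedures are analyzed together            */
}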
38
The Broadway Compiler
  • Broadway: a source-to-source C compiler
  • Domain-independent compiler mechanisms
  • Annotations: a lightweight specification language
  • Domain-specific analyses and transformations
  • Many libraries, one compiler

39
Security vulnerabilities
  • How does remote hacking work?
  • Most are not direct attacks (e.g., cracking
    passwords)
  • Idea: trick a program into unintended behavior
  • Automated vulnerability detection
  • How do we define "intended"?
  • Difficult to formalize and check application
    logic
  • Libraries control all critical system
    services
  • Communication, file access, process control
  • Analyze routines to approximate vulnerability

40
End backup slides