FSE 2014 Tutorial String Analysis

About This Presentation
Title:

FSE 2014 Tutorial String Analysis

Description:

FSE 2014 Tutorial String Analysis Tevfik Bultan University of California, Santa Barbara, USA bultan_at_cs.ucsb.edu Fang Yu National Chengchi University, Taiwan

Date added: 12 April 2020
Slides: 103
Provided by: ValuedSon8
Learn more at: http://www.cs.ucsb.edu
Transcript and Presenter's Notes

Title: FSE 2014 Tutorial String Analysis


1
FSE 2014 Tutorial String Analysis
  • Tevfik Bultan
  • University of California, Santa Barbara, USA
  • bultan_at_cs.ucsb.edu
  • Fang Yu
  • National Chengchi University, Taiwan
  • yuf_at_nccu.edu.tw
  • Muath Alkhalaf
  • King Saud University, Saudi Arabia
  • muath_at_ksu.edu.sa

2
What will you learn in this tutorial?
  • Why is string analysis necessary? What is the
    motivation?
  • What does a string analyzer do? What does it
    compute?
  • What are the steps in building a string analyzer?
  • How can I implement those steps if I wanted to
    build an automata based string analyzer?
  • What can I do with an automata based string
    analyzer?
  • Are there other types of string analyzers?

4
Anatomy of a Web Application
[Figure: web form with a Submit button, unsupscribe.php, DB]
5
Inputs to Web Applications are Strings
6
Web Application Inputs are Strings
[Figure: web form with a Submit button, unsupscribe.php, DB]
7
Input Needs to be Validated and/or Sanitized
[Figure: web form with a Submit button, unsupscribe.php, DB]
8
Vulnerabilities in Web Applications
  • Many web applications contain well-known
    security vulnerabilities. Here are some examples:
  • SQL injection: a malicious user executes
    SQL commands on the back-end database by
    providing specially formatted input
  • Cross-site scripting (XSS): an attacker causes
    a malicious script to execute in a user's browser
  • Malicious file execution: a malicious user
    causes the server to execute malicious code
  • These vulnerabilities are typically due to
  • errors in user input validation and sanitization,
    or
  • lack of user input validation and sanitization

9
Web Applications are Full of Bugs
Source: IBM X-Force report
10
Top Web Application Vulnerabilities
  • 2007: Injection Flaws; XSS; Malicious File Execution
  • 2010: Injection Flaws; XSS; Broken Authentication
    and Session Management
  • 2013: Injection Flaws; Broken Authentication
    and Session Management; XSS

11
As Percentage of All Vulnerabilities
  • SQL Injection, XSS, File Inclusion as percentage
    of all computer security vulnerabilities
    (extracted from the CVE repository)

12
Why Is Input Validation Error-prone?
  • Extensive string manipulation
  • Web applications use extensive string
    manipulation
  • To construct html pages, to construct database
    queries in SQL, etc.
  • The user input comes in string form and must be
    validated and sanitized before it can be used
  • This requires the use of complex string
    manipulation functions such as string-replace
  • String manipulation is error-prone

13
String Related Vulnerabilities
  • String related web application vulnerabilities
    occur when
  • a sensitive function is passed a malicious string
    input from the user
  • This input contains an attack
  • It is not properly sanitized before it reaches
    the sensitive function
  • String analysis: discover these vulnerabilities
    automatically

14
Computer Trouble at School
15
SQL Injection
  • A PHP example
  • Access a student's data by name (taken from user
    input).
  • <?php
  • $name = $_GET["name"];
  • $user_data = $db->query("SELECT * FROM students
    WHERE name = '$name'");
  • ?>

16
SQL Injection
  • A PHP example
  • With the input Robert'); DROP TABLE students; --
    the query passed to the database becomes:
  • <?php
  • $name = $_GET["name"];
  • $user_data = $db->query("SELECT * FROM students
    WHERE name = 'Robert'); DROP TABLE students; --'");
  • ?>
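
The effect of the input on the query text can be reproduced outside PHP. The sketch below only mirrors the string concatenation shown on these two slides; `build_query` is a hypothetical helper introduced for illustration, and no database is involved.

```python
def build_query(name: str) -> str:
    # Mirrors the PHP statement: the user input is pasted into the SQL text unchanged.
    return "SELECT * FROM students WHERE name = '" + name + "'"


benign = build_query("Robert")
malicious = build_query("Robert'); DROP TABLE students; --")

print(benign)
print(malicious)
```

The second query text contains a complete DROP TABLE statement, which is exactly the kind of string value a string analysis tries to rule out at the sensitive function.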

17
Motivation for String Analysis
  • Detect Bugs
  • CAUSED BY
  • String filtering and manipulation operations
  • IN
  • Input validation and sanitization code
  • IN
  • Web applications
  • AND
  • Repair them

19
What is a String?
  • Given an alphabet Σ, a string is a finite sequence
    of alphabet symbols
  • <c1, c2, …, cn> where, for all i, ci is a character
    from Σ
  • Σ = English = {a, …, z, A, …, Z}
  • Σ = {a}
  • Σ = {a, b}
  • Σ = ASCII = {NULL, …, !, …, 0, …, 9, …, a, …, z, …}
  • We only consider Σ = ASCII (can be extended)
  • Example strings over each alphabet:
  • Σ = ASCII: Foo, Ldkhklj54, 123
  • Σ = English: Hello, Welcome, good
  • Σ = {a, b}: a, aba, bbb, ababaa, aaa
  • Σ = {a}: a, aa, aaa, aaaa, aaaaa
20
String Manipulation Operations
  • Concatenation
  • "1" + "2" → "12"
  • "Foo" + "bAaR" → "FoobAaR"
  • Replacement
  • replace("a", "A"): "bAaR" → "bAAR"
  • replace("2", ""): "234" → "34"
  • toUpperCase: "abC" → "ABC"
21
String Filtering Operations
  • Branch conditions
  • length < 4 ?
  • tested on "Foo" and "bAaR"
  • match(/^[0-9]+$/) ?
  • tested on "234" and "a3v6"
  • substring(2, 4) == "aR" ?
  • tested on "bAaR" and "Foo"

22
A Simple Example
  • Another PHP example
  • 1: <?php
  • 2: $www = $_GET["www"];
  • 3: $l_otherinfo = "URL";
  • 4: echo "<td>" . $l_otherinfo . ": " . $www .
    "</td>";
  • 5: ?>
  • The echo statement in line 4 is a sensitive
    function
  • It contains a cross-site scripting (XSS)
    vulnerability: an input such as <script ...
    reaches echo unchanged

23
Is It Vulnerable?
  • A simple taint analysis can report this segment
    as vulnerable using taint propagation
  • 1: <?php
  • 2: $www = $_GET["www"];
  • 3: $l_otherinfo = "URL";
  • 4: echo "<td>" . $l_otherinfo . ": " . $www .
    "</td>";
  • 5: ?>
  • $www is tainted, so echo is tainted → the script
    is vulnerable

24
How to Fix it?
  • To fix the vulnerability, we add a sanitization
    routine at line s
  • Taint analysis will assume that $www is untainted
    after line s and report that the segment is NOT
    vulnerable
  • 1: <?php
  • 2: $www = $_GET["www"];
  • 3: $l_otherinfo = "URL";
  • s: $www = ereg_replace("[^A-Za-z0-9 .-@://]", "",
    $www);
  • 4: echo "<td>" . $l_otherinfo . ": " . $www .
    "</td>";
  • 5: ?>
  • $www is marked tainted at line 2 and untainted
    after line s

25
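Is It Really Sanitized?
  • 1: <?php
  • 2: $www = $_GET["www"];
  • 3: $l_otherinfo = "URL";
  • s: $www = ereg_replace("[^A-Za-z0-9 .-@://]", "",
    $www);
  • 4: echo "<td>" . $l_otherinfo . ": " . $www .
    "</td>";
  • 5: ?>
  • The input <script > passes through line s
    unchanged and reaches echo as <script >
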
26
Sanitization Routines can be Erroneous
  • The sanitization statement is not correct!
  • ereg_replace("[^A-Za-z0-9 .-@://]", "", $www);
  • Removes all characters that are not in
    { A-Za-z0-9 .-@/ }
  • .-@ denotes all characters between "." and "@"
    (including "<" and ">")
  • ".-@" should be ".\-@"
  • This example is from a buggy sanitization routine
    used in MyEasyMarket-4.1 (line 218 in file
    trans.php)

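
The range bug can be checked directly. The following is a small Python sketch, assuming Python's `re` character-class syntax behaves like the POSIX bracket expression used by `ereg_replace` for this pattern; `sanitize`, `BUGGY` and `FIXED` are names introduced here for illustration.

```python
import re

BUGGY = r"[^A-Za-z0-9 .-@://]"   # ".-@" is a range from "." (0x2E) to "@" (0x40)
FIXED = r"[^A-Za-z0-9 .\-@://]"  # "\-" makes the hyphen a literal character


def sanitize(pattern: str, value: str) -> str:
    # Delete every character that is not in the allowed class.
    return re.sub(pattern, "", value)


attack = "<script>alert(1)</script>"
buggy_out = sanitize(BUGGY, attack)
fixed_out = sanitize(FIXED, attack)
print(buggy_out)
print(fixed_out)
```

With the buggy class, "<" and ">" fall inside the ".-@" range and survive; with the escaped hyphen they are deleted.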
27
String Analysis
  • String analysis determines all possible values
    that a string expression can take during any
    program execution
  • Using string analysis we can identify all
    possible input values of the sensitive functions
  • Then we can check if inputs of sensitive
    functions can contain attack strings
  • How can we characterize attack strings?
  • Use regular expressions to specify the attack
    patterns
  • Attack pattern for XSS: Σ*<scriptΣ*

28
Vulnerabilities Can Be Tricky
  • Input <!scrip!t ...> does not match the attack
    pattern
  • but it matches the vulnerability signature and it
    can cause an attack
  • 1: <?php
  • 2: $www = $_GET["www"];
  • 3: $l_otherinfo = "URL";
  • s: $www = ereg_replace("[^A-Za-z0-9 .-@://]", "",
    $www);
  • 4: echo "<td>" . $l_otherinfo . ": " . $www .
    "</td>";
  • 5: ?>
  • The input <!scrip!t > becomes <script > after
    line s

29
String Analysis
  • If string analysis determines that the
    intersection of the attack pattern and the possible
    inputs of the sensitive function is empty,
  • then we can conclude that the program is secure
  • If the intersection is not empty, then we can
    again use string analysis to generate a
    vulnerability signature
  • that characterizes all malicious inputs
  • Given Σ*<scriptΣ* as an attack pattern,
  • the vulnerability signature for $_GET["www"] is
  • Σ*<α*sα*cα*rα*iα*pα*tΣ*
  • where α ∉ { A-Za-z0-9 .-@/ }

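
The relation between the attack pattern and the vulnerability signature can be illustrated with ordinary regular expressions. This is a sketch under the assumption that Σ can be modeled by `.` with `re.DOTALL`; the names `ATTACK`, `SIGNATURE` and `sanitize` are introduced here and are not part of any tool mentioned in the tutorial.

```python
import re

KEPT = "A-Za-z0-9 .-@/"        # characters the buggy sanitizer keeps
ALPHA = "[^" + KEPT + "]"      # α: any character the sanitizer deletes

ATTACK = re.compile(r".*<script.*", re.DOTALL)
# Σ*<α*sα*cα*rα*iα*pα*tΣ*
SIGNATURE = re.compile(
    ".*<" + "".join(ALPHA + "*" + ch for ch in "script") + ".*", re.DOTALL
)


def sanitize(value: str) -> str:
    # Same effect as the ereg_replace call: delete every α character.
    return re.sub(ALPHA, "", value)


tricky = "<!scrip!t src=x>"
print(bool(ATTACK.fullmatch(tricky)))            # input itself is not an attack string
print(bool(SIGNATURE.fullmatch(tricky)))         # but it matches the signature
print(bool(ATTACK.fullmatch(sanitize(tricky))))  # and becomes an attack after sanitization
```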
31
Overall Analysis Steps
[Figure: analysis pipeline with labels Web App, Sanitizer Functions, and Symbolic representation of attack strings and vulnerability signatures]
32
Categorizing Validation and Sanitization
  • There are three types of input validation and
    sanitization functions

33
Validation & Sanitization Code is Complex
  • function validate() {
  • ...
  • switch(type) {
  • case "time":
  • var highlight = true;
  • var default_msg = "Please enter a valid
    time.";
  • time_pattern = /^[1-9]\:[0-5][0-9]\s*(\AM|PM|am|pm?)\s*$/;
  • time_pattern2 = /^[1-1][0-2]\:[0-5][0-9]\s*(\AM|PM|am|pm?)\s*$/;
  • time_pattern3 = /^[1-1][0-2]\:[0-5][0-9]\:[0-5][0-9]\s*(\AM|PM|am|pm?)\s*$/;
  • time_pattern4 = /^[1-9]\:[0-5][0-9]\:[0-5][0-9]\s*(\AM|PM|am|pm?)\s*$/;
  • if (field.value != "") {
  • if (!time_pattern.test(field.value)
  • && !time_pattern2.test(field.value)
  • && !time_pattern3.test(field.value)
  • && !time_pattern4.test(field.value)) {
  • error = true;
  • 1) Mixed input validation
    and sanitization for multiple
    input fields
  • 2) Lots of event handling and error reporting
    code
34
Extraction
  • In order to analyze input validation and
    sanitization code, it is necessary to extract the
    input validation and sanitization functions
  • Server-side extraction
  • PHP
  • Static analysis
  • Client-side extraction
  • JavaScript
  • Dynamic analysis

35
Extraction
  • Sources: $_POST["email"], $_POST["username"]
  • Static extraction using Pixy
  • Augmented to handle path conditions
  • Static dependency analysis
  • Output is a dependency graph
  • Contains all validation and
    sanitization operations between
    sources and sink
  • Sinks: mysql_query(), echo
36
Dynamic Extraction for JavaScript
  • Source: a form field (e.g., "Enter email")
  • Run application on a number of inputs
  • Inputs are selected heuristically
  • Instrumented execution
  • HtmlUnit browser simulator
  • Rhino JS interpreter
  • Convert all accesses on objects and arrays to
    accesses on memory locations
  • Dynamic dependency tracking
  • Sinks: submit, xmlhttp.send()
38
Automata-based String Analysis
  • Finite State Automata can be used to characterize
    sets of string values
  • Automata-based string analysis
  • Associate each string expression in the program
    with an automaton
  • The automaton accepts an over-approximation of
    all possible values that the string expression
    can take during program execution
  • Using this automata representation, we
    symbolically execute the program, paying
    attention only to string manipulation operations

39
Forward & Backward Analyses
  • First convert sanitizer functions to dependency
    graphs
  • Combine forward and backward symbolic
    reachability analyses
  • Forward analysis
  • Assume that the user input can be any string
  • Propagate this information on the dependency
    graph
  • When a sensitive function is reached, intersect
    with the attack pattern
  • Backward analysis
  • If the intersection is not empty, propagate the
    result backwards to identify which inputs can
    cause an attack

[Figure: diagram with labels Sanitizer functions, Attack patterns, Forward Analysis, Backward Analysis, Vulnerability Signatures]
41
Dependency Graphs
  • Extract dependency graphs from sanitizer
    functions
  • 1: <?php
  • 2: $www = $_GET["www"];
  • 3: $l_otherinfo = "URL";
  • 4: $www = ereg_replace("[^A-Za-z0-9 .-@://]", "",
    $www);
  • 5: echo $l_otherinfo . ": " . $www;
  • 6: ?>
  • Dependency graph nodes (label, line number):
  • $_GET["www"], 2
  • "[^A-Za-z0-9 .-@://]", 4
  • "", 4
  • $www, 2
  • "URL", 3
  • ": ", 5
  • $l_otherinfo, 3
  • preg_replace, 4
  • str_concat, 5
  • $www, 4
  • str_concat, 5
  • echo, 5

42
Forward Analysis
  • Using the dependency graph, conduct vulnerability
    analysis
  • Automata-based forward symbolic analysis that
    identifies the possible values of each node
  • Each node in the dependency graph is associated
    with a DFA
  • The DFA accepts an over-approximation of the string
    values that the string expression represented by
    that node can take at runtime
  • The DFAs for the input nodes accept Σ*
  • Intersecting the DFA for the sink nodes with the
    DFA for the attack pattern identifies the
    vulnerabilities

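
The intersection and emptiness check at the sink can be sketched with an explicit product construction. This is a minimal illustration over a three-letter alphabet, not the symbolic (BDD-based) representation used by the tools in this tutorial; the `DFA` class, `intersect_witness` and the example automata are assumptions made for the sketch.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class DFA:
    start: str
    accepting: set
    delta: dict  # (state, symbol) -> state; missing entries reject


def intersect_witness(m1: DFA, m2: DFA, alphabet: str):
    """Return a shortest string in L(m1) ∩ L(m2), or None if it is empty."""
    start = (m1.start, m2.start)
    queue = deque([(start, "")])
    seen = {start}
    while queue:
        (s1, s2), word = queue.popleft()
        if s1 in m1.accepting and s2 in m2.accepting:
            return word
        for a in alphabet:
            t1 = m1.delta.get((s1, a))
            t2 = m2.delta.get((s2, a))
            if t1 is None or t2 is None:
                continue
            if (t1, t2) not in seen:
                seen.add((t1, t2))
                queue.append(((t1, t2), word + a))
    return None


ALPHABET = "ab<"
# Attack pattern Σ*<Σ* over the small alphabet.
attack_delta = {("no_lt", c): ("seen_lt" if c == "<" else "no_lt") for c in ALPHABET}
attack_delta.update({("seen_lt", c): "seen_lt" for c in ALPHABET})
attack = DFA("no_lt", {"seen_lt"}, attack_delta)

# Sink values when "<" survives sanitization, and when it does not.
buggy_sink = DFA("q", {"q"}, {("q", c): "q" for c in ALPHABET})
fixed_sink = DFA("q", {"q"}, {("q", c): "q" for c in "ab"})

print(intersect_witness(buggy_sink, attack, ALPHABET))
print(intersect_witness(fixed_sink, attack, ALPHABET))
```

A non-empty intersection comes with a witness string, which is the starting point for the backward analysis.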
43
Forward Analysis
  • Need to implement post-image computations for
    string operations
  • postConcat(M1, M2)
  • returns M, where M = M1.M2
  • postReplace(M1, M2, M3)
  • returns M, where M = replace(M1, M2, M3)
  • Need to handle many specialized string
    operations
  • regmatch, substring, indexof, length, contains,
    trim, addslashes, htmlspecialchars,
    mysql_real_escape_string, tolower, toupper

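
A post-image for concatenation can be sketched on explicit automata. The slides work with DFAs in a symbolic encoding; the sketch below swaps in plain NFAs without ε-transitions because concatenation is simpler to state on them. `NFA`, `literal` and `post_concat` are hypothetical names, and the two automata are assumed to have disjoint state names.

```python
from dataclasses import dataclass


@dataclass
class NFA:
    starts: set
    accepting: set
    delta: dict  # (state, symbol) -> set of successor states

    def accepts(self, word: str) -> bool:
        current = set(self.starts)
        for symbol in word:
            current = {t for s in current for t in self.delta.get((s, symbol), ())}
        return bool(current & self.accepting)


def literal(word: str, tag: str) -> NFA:
    """NFA accepting exactly `word`; `tag` keeps state names of different automata apart."""
    delta = {((tag, i), ch): {(tag, i + 1)} for i, ch in enumerate(word)}
    return NFA({(tag, 0)}, {(tag, len(word))}, delta)


def post_concat(m1: NFA, m2: NFA) -> NFA:
    """Return an NFA for L(m1).L(m2)."""
    delta = {key: set(targets) for key, targets in m1.delta.items()}
    for key, targets in m2.delta.items():
        delta.setdefault(key, set()).update(targets)
    # Whenever m1 could stop (it reaches an accepting state), m2 may start.
    for key, targets in m1.delta.items():
        if targets & m1.accepting:
            delta[key] |= m2.starts
    starts = set(m1.starts)
    if m1.starts & m1.accepting:      # ε ∈ L(m1)
        starts |= m2.starts
    accepting = set(m2.accepting)
    if m2.starts & m2.accepting:      # ε ∈ L(m2)
        accepting |= m1.accepting
    return NFA(starts, accepting, delta)


prefix = literal("URL: ", "p")
value = literal("abc", "q")
combined = post_concat(prefix, value)
print(combined.accepts("URL: abc"))
```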
44
Forward Analysis
  • Attack pattern: Σ*<Σ*
  • Forward results on the dependency graph
    (node label, line number: language):
  • $_GET["www"], 2: Forward = Σ*
  • "[^A-Za-z0-9 .-@://]", 4: Forward = [^A-Za-z0-9 .-@/]
  • "", 4: Forward = ε
  • $www, 2: Forward = Σ*
  • "URL", 3: Forward = URL
  • ": ", 5: Forward = ": "
  • preg_replace, 4: Forward = [A-Za-z0-9 .-@/]*
  • $l_otherinfo, 3: Forward = URL
  • str_concat, 5: Forward = URL:
  • $www, 4: Forward = [A-Za-z0-9 .-@/]*
  • str_concat, 5: Forward = URL: [A-Za-z0-9 .-@/]*
  • echo, 5: Forward = URL: [A-Za-z0-9 .-@/]*
  • L(Σ*<Σ*) ∩ L(URL: [A-Za-z0-9 .-@/]*)
    = L(URL: [A-Za-z0-9 .-;=-@/]*<[A-Za-z0-9 .-@/]*)
    ≠ Ø

45
Result Automaton
[Figure: DFA accepting URL: [A-Za-z0-9 .-;=-@/]*<[A-Za-z0-9 .-@/]*, with edges labeled U, R, L, :, Space, [A-Za-z0-9 .-;=-@/], < and [A-Za-z0-9 .-@/]]
46
Symbolic Automata Representation
  • MONA DFA package for automata manipulation
  • [Klarlund and Møller, 2001]
  • Compact representation
  • Canonical form and
  • shared BDD nodes
  • Efficient MBDD manipulations
  • Union, intersection, and emptiness checking
  • Projection and minimization
  • Cannot handle nondeterminism
  • Use dummy bits to encode nondeterminism

47
Symbolic Automata Representation
[Figure: the same automaton shown in explicit DFA representation and in symbolic DFA representation]
48
Widening
  • String verification problem is undecidable
  • The forward fixpoint computation is not
    guaranteed to converge in the presence of loops
    and recursion
  • Compute a sound approximation
  • During the fixpoint computation, compute an
    over-approximation of the least fixpoint that
    corresponds to the reachable states
  • Use an automata-based widening operation to
    over-approximate the fixpoint
  • The widening operation over-approximates the union
    operations and accelerates the convergence of the
    fixpoint computation

49
Widening
  • Given a loop such as
  • 1: <?php
  • 2: $var = "head";
  • 3: while (. . .) {
  • 4:   $var = $var . "tail";
  • 5: }
  • 6: echo $var;
  • 7: ?>
  • Our forward analysis with widening would compute
    that the value of the variable $var in line 6 is
    (head)(tail)*

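
The need for widening shows up when the loop is unrolled: each extra iteration adds a new possible value, so the plain fixpoint iteration never stabilizes. The sketch below does not implement the automata-based widening operator; it only enumerates the first few iterates (via the hypothetical helper `loop_values`) and checks that they all lie in the widened result (head)(tail)*.

```python
import re


def loop_values(iterations: int) -> set:
    """Values $var can hold at line 6 after at most `iterations` loop iterations."""
    values = {"head"}
    current = "head"
    for _ in range(iterations):
        current += "tail"
        values.add(current)
    return values


WIDENED = re.compile(r"(head)(tail)*")

for k in range(4):
    print(k, sorted(loop_values(k), key=len))

# Every iterate is covered by the over-approximation, for any unrolling depth tried.
covered = all(WIDENED.fullmatch(v) for v in loop_values(50))
print(covered)
```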
50
Backward Analysis
  • A vulnerability signature is a characterization
    of all malicious inputs that can be used to
    generate attack strings
  • Identify vulnerability signatures using an
    automata-based backward symbolic analysis
    starting from the sink node
  • Need to implement pre-image computations on
    string operations
  • preConcatPrefix(M, M2)
  • returns M1, where M = M1.M2
  • preConcatSuffix(M, M1)
  • returns M2, where M = M1.M2
  • preReplace(M, M2, M3)
  • returns M1, where M = replace(M1, M2, M3)

51
Backward Analysis
  • Forward and backward results on the dependency
    graph (node label, line number):
  • $_GET["www"], 2: Forward = Σ*, Backward = [^<]*<Σ*
  • "[^A-Za-z0-9 .-@://]", 4: Forward = [^A-Za-z0-9 .-@/],
    Backward = Do not care
  • "", 4: Forward = ε, Backward = Do not care
  • $www, 2: Forward = Σ*, Backward = [^<]*<Σ*
  • "URL", 3: Forward = URL, Backward = Do not care
  • preg_replace, 4: Forward = [A-Za-z0-9 .-@/]*,
    Backward = [A-Za-z0-9 .-;=-@/]*<[A-Za-z0-9 .-@/]*
  • ": ", 5: Forward = ": ", Backward = Do not care
  • $l_otherinfo, 3: Forward = URL, Backward = Do not
    care
  • str_concat, 5: Forward = URL: , Backward = Do not
    care
  • $www, 4: Forward = [A-Za-z0-9 .-@/]*,
    Backward = [A-Za-z0-9 .-;=-@/]*<[A-Za-z0-9 .-@/]*
  • str_concat, 5: Forward = URL: [A-Za-z0-9 .-@/]*,
    Backward = URL: [A-Za-z0-9 .-;=-@/]*<[A-Za-z0-9 .-@/]*
  • echo, 5: Forward = URL: [A-Za-z0-9 .-@/]*,
    Backward = URL: [A-Za-z0-9 .-;=-@/]*<[A-Za-z0-9 .-@/]*
  • Vulnerability signature: [^<]*<Σ*
  • (The figure also labels some nodes as node 3,
    node 6, node 10, node 11 and node 12.)

52
Vulnerability Signature Automaton
[Figure: DFA for the vulnerability signature [^<]*<Σ*, with edges labeled [^<], < and Σ, and one edge labeled Non-ASCII]
55
Recap
  • Given an automata-based string analyzer:
  • Vulnerability analysis: we can do a forward
    analysis to detect all the strings that reach the
    sink and that match the attack pattern
  • We can compute an automaton that accepts all such
    strings
  • If there is any such string, the application might
    be vulnerable to the type of attack specified by
    the attack pattern
  • Vulnerability signature: we can do a backward
    analysis to compute the vulnerability signature
  • The vulnerability signature is the set of all input
    strings that can generate a string value at the
    sink that matches the attack pattern
  • We can compute an automaton that accepts all such
    strings
  • What else can we do?
  • Can we automatically repair a vulnerability if we
    detect one?

56
Vulnerability Signatures
  • The vulnerability signature is the result of the
    input node, which includes all possible malicious
    inputs
  • An input that does not match this signature
    cannot exploit the vulnerability
  • After generating the vulnerability signature:
  • Can we generate a patch based on the
    vulnerability signature?
  • The vulnerability signature automaton for
    the running example accepts [^<]*<Σ*

[Figure: vulnerability signature automaton with edges labeled [^<], < and Σ]
57
Patches from Vulnerability Signatures
  • Main idea
  • Given a vulnerability signature automaton, find a
    cut that separates initial and accepting states
  • Remove the characters in the cut from the user
    input to sanitize
  • This means that if we just delete "<" from the
    user input, then the vulnerability can be removed

[Figure: vulnerability signature automaton with edges labeled [^<], < and Σ; the min-cut is {<}]
58
Patches from Vulnerability Signatures
  • Ideally, we want to modify the input (as little
    as possible) so that it does not match the
    vulnerability signature
  • Given a DFA, an alphabet cut is
  • a set of characters such that, after removing the
    edges that are associated with the characters in
    the set, the modified DFA does not accept any
    non-empty string
  • Finding a minimum alphabet cut of a DFA is an
    NP-hard problem (one can reduce the vertex cover
    problem to this problem)
  • We use a min-cut algorithm instead
  • The set of characters that are associated with
    the edges of the min-cut is an alphabet cut,
  • but not necessarily the minimum alphabet cut

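
The definition of an alphabet cut can be checked mechanically. The sketch below verifies a candidate cut on an explicit DFA; it does not compute a min-cut. The functions `is_alphabet_cut` and `apply_patch` and the two-letter example alphabet are assumptions introduced for illustration.

```python
def is_alphabet_cut(start, accepting, delta, cut) -> bool:
    """True if removing all edges labeled with characters in `cut`
    leaves a DFA that accepts no non-empty string."""
    remaining = {(s, a): t for (s, a), t in delta.items() if a not in cut}
    # States reachable from the start state by at least one remaining edge.
    frontier = [t for (s, _), t in remaining.items() if s == start]
    seen = set(frontier)
    while frontier:
        state = frontier.pop()
        if state in accepting:
            return False
        for (s, _), t in remaining.items():
            if s == state and t not in seen:
                seen.add(t)
                frontier.append(t)
    return True


def apply_patch(value: str, cut) -> str:
    # The patch deletes every character of the alphabet cut from the input.
    return "".join(ch for ch in value if ch not in cut)


# Signature [^<]*<Σ* over the alphabet {a, <}: state 0 = no "<" seen yet, state 1 = accepting.
SIGNATURE_DELTA = {(0, "a"): 0, (0, "<"): 1, (1, "a"): 1, (1, "<"): 1}

print(is_alphabet_cut(0, {1}, SIGNATURE_DELTA, {"<"}))
print(is_alphabet_cut(0, {1}, SIGNATURE_DELTA, {"a"}))
print(apply_patch("a<a", {"<"}))
```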
59
Automatically Generated Patch
  • The automatically generated patch makes sure that
    no string that matches the attack pattern reaches
    the sensitive function
  • <?php
  • if (preg_match('/[^<]*<.*/', $_GET["www"]))
  •   $_GET["www"] = preg_replace('/</', "",
    $_GET["www"]);
  • $www = $_GET["www"];
  • $l_otherinfo = "URL";
  • $www = ereg_replace("[^A-Za-z0-9 .-@://]", "",
    $www);
  • echo "<td>" . $l_otherinfo . ": " . $www .
    "</td>";
  • ?>

60
Experiments
  • We evaluated our approach on five vulnerabilities
    from three open source web applications
  • (1) MyEasyMarket-4.1: a shopping cart program
  • (2) BloggIT-1.0: a blog engine
  • (3) proManager-0.72: a project management system
  • We used the following XSS attack pattern:
  • Σ*<scriptΣ*

61
Forward Analysis Results
  • The dependency graphs of these benchmarks are
    simplified based on the sinks
  • Unrelated parts are removed using slicing

Input                          Results
nodes  edges  sinks  inputs    Time(s)  Mem (kb)  states/bdds
21     20     1      1         0.08     2599      23/219
29     29     1      1         0.53     13633     48/495
25     25     1      2         0.12     1955      125/1200
23     22     1      1         0.12     4022      133/1222
25     25     1      1         0.12     3387      125/1200
62
Backward Analysis Results
  • We use the backward analysis to generate the
    vulnerability signatures
  • Backward analysis starts from the vulnerable
    sinks identified during forward analysis

Input                          Results
nodes  edges  sinks  inputs    Time(s)  Mem (kb)  states/bdds
21     20     1      1         0.46     2963      9/199
29     29     1      1         41.03    1859767   811/8389
25     25     1      2         2.35     5673      20/302, 20/302
23     22     1      1         2.33     32035     91/1127
25     25     1      1         5.02     14958     20/302
63
Alphabet Cuts
  • We generate alphabet cuts from the vulnerability
    signatures using a min-cut algorithm
  • Problem: when there are two user inputs, the patch
    will block everything and delete everything
  • It overlooks the relations among input variables
    (e.g., the concatenation of two inputs contains
    <SCRIPT)

Input                          Results
nodes  edges  sinks  inputs    Alphabet cut
21     20     1      1         {<}
29     29     1      1         {S, ', "}
25     25     1      2         Σ, Σ (vulnerability signature depends on two inputs)
23     22     1      1         {<, ', "}
25     25     1      1         {<, ', "}
64
Relational String Analysis
  • Instead of using multiple single-track DFAs, use
    one multi-track DFA
  • Each track represents the values of one string
    variable
  • Using multi-track DFAs
  • Identifies the relations among string variables
  • Generates relational vulnerability signatures for
    multiple user inputs of a vulnerable application
  • Improves the precision of the path-sensitive
    analysis
  • Proves properties that depend on relations among
    string variables, e.g., $file = $usr.txt

65
Multi-track Automata
  • Let X (the first track) and Y (the second track) be
    two string variables
  • λ is a padding symbol
  • A multi-track automaton that encodes X = Y.txt

[Figure: two-track automaton with edges labeled (a,a), (b,b), … and (t,λ), (x,λ), (t,λ)]
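
The two-track encoding can be simulated directly. The sketch below assumes the tracks are aligned by padding the shorter one on the right with λ, and that λ does not occur in the strings themselves; `tracks` and `accepts` are hypothetical helpers, and the automaton is hard-coded for the relation X = Y.txt.

```python
from itertools import zip_longest

PAD = "λ"          # padding symbol; assumed not to occur in the input strings
SUFFIX = ".txt"


def tracks(x: str, y: str):
    """Align the two strings as a sequence of (X symbol, Y symbol) pairs."""
    return list(zip_longest(x, y, fillvalue=PAD))


def accepts(x: str, y: str) -> bool:
    """Run the two-track automaton for X = Y.txt on the aligned tracks."""
    state = 0  # number of suffix characters already read on track X
    for cx, cy in tracks(x, y):
        if state == 0 and cx == cy and cx != PAD:
            continue                      # (c, c) self-loop: both tracks agree
        if state < len(SUFFIX) and cy == PAD and cx == SUFFIX[state]:
            state += 1                    # (suffix char, λ) transition
        else:
            return False
    return state == len(SUFFIX)


print(tracks("ab.txt", "ab"))
print(accepts("ab.txt", "ab"))
print(accepts("a.txt", "b"))
```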
66
Relational Vulnerability Signature
  • We perform forward analysis using multi-track
    automata to generate relational vulnerability
    signatures
  • Each track represents one user input
  • An auxiliary track represents the values of the
    current node
  • We intersect the auxiliary track with the attack
    pattern upon termination

67
Relational Vulnerability Signature
  • Consider a simple example having multiple user
    inputs
  • <?php
  • 1: $www = $_GET["www"];
  • 2: $url = $_GET["url"];
  • 3: echo $url . $www;
  • ?>
  • Let the attack pattern be Σ*<Σ*

68
Relational Vulnerability Signature
  • A multi-track automaton (url, www, aux)
  • Identifies the fact that the concatenation of two
    inputs contains "<"

[Figure: three-track automaton with edges labeled (a,λ,a), (b,λ,b), … and (<,λ,<) for the url track, and (λ,a,a), (λ,b,b), … and (λ,<,<) for the www track]
69
Relational Vulnerability Signature
  • Project away the auxiliary variable
  • Find the min-cut
  • This min-cut identifies the alphabet cuts {<} for
    the first track (url) and {<} for the second
    track (www)

[Figure: two-track automaton with edges labeled (a,λ), (b,λ), … and (<,λ) for the url track, and (λ,a), (λ,b), … and (λ,<) for the www track; the min-cut is {<}, {<}]
70
Patch for Multiple Inputs
  • Patch: if the inputs match the signature, delete
    the characters in its alphabet cut
  • <?php
  • if (preg_match('/[^<]*<.*/', $_GET["url"] .
    $_GET["www"])) {
  •   $_GET["url"] = preg_replace('/</', "",
    $_GET["url"]);
  •   $_GET["www"] = preg_replace('/</', "",
    $_GET["www"]);
  • }
  • 1: $www = $_GET["www"];
  • 2: $url = $_GET["url"];
  • 3: echo $url . $www;
  • ?>

71
Conservative Approximations
  • To conduct relational string analysis, we need to
    compute the intersection of multi-track automata
  • Aligned multi-track automata are closed under
    intersection
  • λs are right-justified in all tracks, e.g., abλλ
    instead of aλbλ
  • However, there exist unaligned multi-track
    automata that cannot be described by aligned
    ones
  • We propose an alignment algorithm that constructs
    aligned automata which over- or under-approximate
    unaligned ones

72
Conservative Approximations
  • Modeling word equations
  • Intractability of X = cZ
  • The number of states of the corresponding aligned
    multi-track DFA is exponential in the length of
    c.
  • Irregularity of X = YZ
  • X = YZ is not describable by an aligned
    multi-track automaton
  • We propose a conservative analysis
  • We construct multi-track automata that over- or
    under-approximate the word equations

73
Composite Analysis
  • What I have talked about so far focuses only on
    string contents
  • It does not handle constraints on string lengths
  • It cannot handle comparisons among integer
    variables and string lengths
  • We extended our string analysis techniques to
    analyze systems that have unbounded string and
    integer variables
  • We proposed a composite static analysis approach
    that combines string analysis and size analysis

74
Size Analysis
  • Size analysis: the goal of size analysis is to
    provide properties about string lengths
  • It can be used to discover buffer overflow
    vulnerabilities
  • Integer analysis: at each program point,
    statically compute the possible states of the
    values of all integer variables.
  • These infinite states are symbolically
    over-approximated as linear arithmetic
    constraints that can be represented as an
    arithmetic automaton
  • Integer analysis can be used to perform size
    analysis by representing lengths of string
    variables as integer variables.

75
An Example
  • Consider the following segment
  • 1: <?php
  • 2: $www = $_GET["www"];
  • 3: $l_otherinfo = "URL";
  • 4: $www = ereg_replace("[^A-Za-z0-9 ./-@://]", "",
    $www);
  • 5: if (strlen($www) < $limit)
  • 6:   echo "<td>" . $l_otherinfo . ": " . $www .
    "</td>";
  • 7: ?>
  • If we perform size analysis only, after line 4,
    we do not know the length of $www
  • If we perform string analysis only, at line 5, we
    cannot check/enforce the branch condition.

76
Composite Analysis
  • We need a composite analysis that combines string
    analysis with size analysis.
  • Challenge: how to transfer information between
    string automata and arithmetic automata?
  • A string automaton is a single-track DFA that
    accepts a regular language, and the lengths of the
    strings in that language form a semi-linear set
  • For example: {4, 6} ∪ {2 + 3k | k ≥ 0}
  • The unary encoding of a semi-linear set is
    uniquely identified by a unary automaton
  • The unary automaton can be constructed by
    replacing the alphabet of a string automaton with
    a unary alphabet

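
Replacing the alphabet with a unary one amounts to forgetting edge labels and tracking only how many steps were taken. The sketch below computes the accepted lengths of an explicit DFA up to a bound by simulating that unary automaton; `accepted_lengths` and the example DFA for head(tail)* are assumptions made for illustration, and the conversion algorithms between unary and binary automata mentioned in this tutorial are not shown.

```python
def accepted_lengths(start, accepting, delta, bound: int) -> set:
    """Lengths n <= bound such that the DFA accepts some string of length n."""
    lengths = set()
    current = {start}          # states reachable after exactly n steps
    for n in range(bound + 1):
        if current & accepting:
            lengths.add(n)
        # Unary projection: follow every edge, ignoring its label.
        current = {t for (s, _), t in delta.items() if s in current}
    return lengths


# DFA for head(tail)*: states 0..4 spell "head", states 4..7 loop on "tail".
HEAD_TAIL = {
    (0, "h"): 1, (1, "e"): 2, (2, "a"): 3, (3, "d"): 4,
    (4, "t"): 5, (5, "a"): 6, (6, "i"): 7, (7, "l"): 4,
}

print(sorted(accepted_lengths(0, {4}, HEAD_TAIL, 12)))
```

Here the lengths form the semi-linear set {4 + 4k | k ≥ 0}, truncated at the chosen bound.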
77
Arithmetic Automata
  • An arithmetic automaton is a multi-track DFA,
    where each track represents the value of one
    variable over a binary alphabet
  • If the language of an arithmetic automaton
    satisfies a Presburger formula, the value of each
    variable forms a semi-linear set
  • The semi-linear set is accepted by the binary
    automaton that projects away all other tracks
    from the arithmetic automaton

78
Connecting the Dots
  • We developed novel algorithms to convert unary
    automata to binary automata and vice versa
  • Using these conversion algorithms we can conduct
    a composite analysis that subsumes size analysis
    and string analysis

[Figure: conversions between String Automata, Unary Length Automata, Binary Length Automata and Arithmetic Automata]
79
Case Study
  • Schoolmate 1.5.4
  • Number of PHP files: 63
  • Lines of code: 8181
  • Forward analysis results:

Time        Memory  Number of XSS sensitive sinks  Number of XSS vulnerabilities
22 minutes  281 MB  898                            153

  • After manual inspection we found the following:

Actual vulnerabilities  False positives
105                     48
80
Case Study: False Positives
  • Why false positives?
  • Path insensitivity: 39
  • Path to vulnerable program point is not feasible
  • We extended our approach with path sensitivity,
    so this issue is resolved
  • Un-modeled built-in PHP functions: 6
  • Unfound user-written functions: 3
  • PHP programs have more than one execution entry
    point
  • We can remove all these false positives by
    extending our analysis to a path-sensitive
    analysis and modeling more PHP functions

81
Case Study - Sanitization
  • We patched all actual vulnerabilities by adding
    sanitization routines
  • We ran Stranger a second time
  • Stranger proved that our patches are correct with
    respect to the attack pattern we are using

82
Client-side Analysis for Input Validation
  • Client-side input validation analysis
  • String analysis for JavaScript
  • Dynamic slicing to extract the validation code
  • Use regular expressions to specify the
    input validation policy: max and min policies

[Figure: flowchart with labels Policy (regular expression), Sanitizer Function, Conforms to Policy?, Yes, No, Counter Example, Generate Patch]
83
Min & Max Policies
[Figure: diagram over Σ* with labels Under Constrained, Max Policy, Over Constrained, Min Policy]
84
Differential String Analysis
[Figure: web form with a Submit button, unsupscribe.php, DB]
85
Why Differential Analysis? Verification without
Specification
[Figure: diagram with labels Server-side and Client-side]
86
Differential Analysis Overview
  • Analyze and compare client-side and
    server-side input validation functions
  • General security policy
  • Server-side input validation should always be
    stronger than client-side input validation
  • Semantic differential repair

[Figure: flowchart with labels Target Sanitizer, Reference Sanitizer, Generate Patch, Yes, No]
87
Stranger & LibStranger: String Analysis Toolset
Available at https://github.com/vlab-cs-ucsb
  • Uses Pixy [Jovanovic et al., 2006] as a PHP front
    end
  • Uses the MONA [Klarlund and Møller, 2001] automata
    package for automata manipulation

[Figure: tool architecture with labels PHP program, Attack patterns, Pixy Front End (Parser, Dependency Analyzer), CFG, Dependency Graphs, Symbolic String Analysis (String Analyzer), String/Automata Operations, LibStranger: Automata Based String Analysis Library, Stranger Automata, DFAs, MONA Automata Package, Vulnerability Signatures & Patches]
88
SemRep: A Differential Repair Tool
  • Available at https://github.com/vlab-cs-ucsb
  • A recent paper [Kausler, Sherman, ASE'14] that
    compares sound string constraint solvers (JSA,
    LibStranger, Z3-Str, ECLIPSE-Str) reports that
    LibStranger is the best!

90
String Analysis Bibliography
  • Automata based string analysis
    • A static analysis framework for detecting SQL injection vulnerabilities [Fu et al., COMPSAC'07]
    • Saner: Composing Static and Dynamic Analysis to Validate Sanitization in Web Applications [Balzarotti et al., S&P 2008]
    • Symbolic String Verification: An Automata-based Approach [Yu et al., SPIN'08]
    • Symbolic String Verification: Combining String Analysis and Size Analysis [Yu et al., TACAS'09]
    • Rex: Symbolic Regular Expression Explorer [Veanes et al., ICST'10]
    • Stranger: An Automata-based String Analysis Tool for PHP [Yu et al., TACAS'10]
    • Relational String Verification Using Multi-Track Automata [Yu et al., CIAA'10, IJFCS'11]
    • Path- and index-sensitive string analysis based on monadic second-order logic [Tateishi et al., ISSTA'11]

91
String Analysis Bibliography
  • Automata based string analysis, continued
    • An Evaluation of Automata Algorithms for String Analysis [Hooimeijer et al., VMCAI'11]
    • Fast and Precise Sanitizer Analysis with BEK [Hooimeijer et al., Usenix'11]
    • Symbolic finite state transducers: algorithms and applications [Veanes et al., POPL'12]
    • Static Analysis of String Encoders and Decoders [D'Antoni et al., VMCAI'13]
    • Applications of Symbolic Finite Automata [Veanes, CIAA'13]
    • Automata-Based Symbolic String Analysis for Vulnerability Detection [Yu et al., FMSD'14]

92
String Analysis Bibliography
  • String analysis based on context free grammars
    • Precise Analysis of String Expressions [Christensen et al., SAS'03]
    • Java String Analyzer (JSA) [Møller et al.]
    • Static approximation of dynamically generated Web pages [Minamide, WWW'05]
    • PHP String Analyzer [Minamide]
    • Grammar-based analysis of string expressions [Thiemann, TLDI'05]

93
String Analysis Bibliography
  • String analysis based on symbolic execution/symbolic analysis
    • Abstracting symbolic execution with string analysis [Shannon et al., MUTATION'07]
    • Path Feasibility Analysis for String-Manipulating Programs [Bjørner et al., TACAS'09]
    • A Symbolic Execution Framework for JavaScript [Saxena et al., S&P 2010]
    • Symbolic execution of programs with strings [Redelinghuys et al., ITC21]

94
String Analysis Bibliography
  • String analysis and abstraction/widening
    • A Practical String Analyzer by the Widening Approach [Choi et al., APLAS'06]
    • String Abstractions for String Verification [Yu et al., SPIN'11]
    • A Suite of Abstract Domains for Static Analysis of String Values [Costantini et al., SPE'13]

95
String Analysis Bibliography
  • String constraint solving
    • Reasoning about Strings in Databases [Grahne et al., JCSS'99]
    • Constraint Reasoning over Strings [Golden et al., CP'03]
    • A decision procedure for subset constraints over regular languages [Hooimeijer et al., PLDI'09]
    • Strsolve: solving string constraints lazily [Hooimeijer et al., ASE'10, ASE'12]
    • An SMT-LIB Format for Sequences and Regular Expressions [Bjørner et al., SMT'12]
    • Z3-Str: A Z3-Based String Solver for Web Application Analysis [Zheng et al., ESEC/FSE'13]
    • Word Equations with Length Constraints: What's Decidable? [Ganesh et al., HVC'12]
    • (Un)Decidability Results for Word Equations with Length and Regular Expression Constraints [Ganesh et al., ADDCT'13]

96
String Analysis Bibliography
  • String constraint solving, continued
    • A DPLL(T) Theory Solver for a Theory of Strings and Regular Expressions [Liang et al., CAV'14]
    • String Constraints for Verification [Abdulla et al., CAV'14]
    • S3: A Symbolic String Solver for Vulnerability Detection in Web Applications [Trinh et al., CCS'14]
    • A model counter for constraints over unbounded strings [Luu et al., PLDI'14]
    • Evaluation of String Constraint Solvers in the Context of Symbolic Execution [Kausler et al., ASE'14]

97
String Analysis Bibliography
  • Bounded string constraint solvers
    • HAMPI: a solver for string constraints [Kiezun et al., ISSTA'09]
    • HAMPI: A String Solver for Testing, Analysis and Vulnerability Detection [Ganesh et al., CAV'11]
    • HAMPI: A solver for word equations over strings, regular expressions, and context-free grammars [Kiezun et al., TOSEM'12]
    • Kaluza [Saxena et al.]
    • PASS: String Solving with Parameterized Array and Interval Automaton [Li & Ghosh, HVC'14]

98
String Analysis Bibliography
  • String analysis for vulnerability detection
    • AMNESIA: analysis and monitoring for NEutralizing SQL-injection attacks [Halfond et al., ASE'05]
    • Preventing SQL injection attacks using AMNESIA [Halfond et al., ICSE'06]
    • Sound and precise analysis of web applications for injection vulnerabilities [Wassermann et al., PLDI'07]
    • Static detection of cross-site scripting vulnerabilities [Su et al., ICSE'08]
    • Generating Vulnerability Signatures for String Manipulating Programs Using Automata-based Forward and Backward Symbolic Analyses [Yu et al., ASE'09]
    • Verifying Client-Side Input Validation Functions Using String Analysis [Alkhalaf et al., ICSE'12]

99
String Analysis Bibliography
  • String Analysis for Test Generation
    • Dynamic test input generation for database applications [Emmi et al., ISSTA'07]
    • Dynamic test input generation for web applications [Wassermann et al., ISSTA'08]
    • JST: an automatic test generation tool for industrial Java applications with strings [Ghosh et al., ICSE'13]
    • Automated Test Generation from Vulnerability Signatures [Aydin et al., ICST'14]

100
String Analysis Bibliography
  • String Analysis for Interface Discovery
    • Improving Test Case Generation for Web Applications Using Automated Interface Discovery [Halfond et al., FSE'07]
    • Automated Identification of Parameter Mismatches in Web Applications [Halfond et al., FSE'08]
  • String Analysis for Specification Analysis
    • Lightweight String Reasoning for OCL [Büttner et al., ECMFA'12]
    • Lightweight String Reasoning in Model Finding [Büttner et al., SSM'13]

101
String Analysis Bibliography
  • String Analysis for Program Repair
    • Patching Vulnerabilities with Sanitization Synthesis [Yu et al., ICSE'11]
    • Automated Repair of HTML Generation Errors in PHP Applications Using String Constraint Solving [Samimi et al., 2012]
    • Patcher: An Online Service for Detecting, Viewing and Patching Web Application Vulnerabilities [Yu et al., HICSS'14]
  • Differential String Analysis
    • Automatic Blackbox Detection of Parameter Tampering Opportunities in Web Applications [Bisht et al., CCS'10]
    • Waptec: Whitebox Analysis of Web Applications for Parameter Tampering Exploit Construction [Bisht et al., CCS'11]
    • ViewPoints: Differential String Analysis for Discovering Client and Server-Side Input Validation Inconsistencies [Alkhalaf et al., ISSTA'12]
    • Semantic Differential Repair for Input Validation and Sanitization [Alkhalaf et al., ISSTA'14]

102
Coming Soon
  • A book on String Analysis!