1
Program Analysis via Graph Reachability
  • Thomas Reps
  • University of Wisconsin

Joint work with S. Horwitz, M. Sagiv, G. Rosay,
and D. Melski
2
[Timeline figure: 1987, 1993, 1994, 1995, 1996, 1997, 1998]
3
More Recently
  • Flow-insensitive points-to analysis
  • An undecidability result
  • context-sensitive structure-transmitted
    data-dependence analysis
  • Model checking of recursive hierarchical
    finite-state machines
  • infinite-state systems
  • CFL-reachability/circularity queries
  • linear-, quadratic-, and cubic-time algorithms

4
Other Applications of CFL-Reachability
  • Analysis of attribute grammars
  • CFL-recognition
      • w ∈ L(G)?
  • 2DPDA- and 2NPDA-recognition
      • w ∈ L(M)?
  • String-matching problems
  • Ping-pong protocols in distributed systems
    [Dolev, Even, Karp 83]

5
Outline
  • Interprocedural slicing
  • Interprocedural dataflow analysis
  • Demand-driven analysis
  • (Model-checking of recursive HFSMs)

6
Program Slicing
  • The backward slice w.r.t. variable v at program
    point p: the program subset that may influence
    the value of variable v at point p.
  • The forward slice w.r.t. variable v at program
    point p: the program subset that may be influenced
    by the value of variable v at point p.

7
Backward Slice
int main() {
  int sum = 0;
  int i = 1;
  while (i < 11) {
    sum = sum + i;
    i = i + 1;
  }
  printf("%d\n", sum);
  printf("%d\n", i);
}
8
Backward Slice
int main() {
  int sum = 0;
  int i = 1;
  while (i < 11) {
    sum = sum + i;
    i = i + 1;
  }
  printf("%d\n", sum);
  printf("%d\n", i);
}
Backward slice with respect to statement printf("%d\n", i)
9
Forward Slice
int main() {
  int sum = 0;
  int i = 1;
  while (i < 11) {
    sum = sum + i;
    i = i + 1;
  }
  printf("%d\n", sum);
  printf("%d\n", i);
}
10
Forward Slice
int main() {
  int sum = 0;
  int i = 1;
  while (i < 11) {
    sum = sum + i;
    i = i + 1;
  }
  printf("%d\n", sum);
  printf("%d\n", i);
}
Forward slice with respect to statement sum = 0
11
Who Cares About Slices?
  • Understanding programs
  • Restructuring Programs
  • Program Specialization and Reuse
  • Program Differencing
  • Testing (and Retesting)
  • Year 2000 Problem
  • Automatic Differentiation

12
What Are Slices Useful For?
  • Understanding Programs
      • What is affected by what?
  • Restructuring Programs
      • Isolation of separate computational threads
  • Program Specialization and Reuse
      • Slices = specialized programs
      • Only reuse needed slices
  • Program Differencing
      • Compare slices to identify changes
  • Testing
      • What new test cases would improve coverage?
      • What regression tests must be rerun after a
        change?

13
Line-Character-Count Program
void line_char_count(FILE *f) {
  int lines = 0;
  int chars;
  BOOL eof_flag = FALSE;
  int n;
  extern void scan_line(FILE *f, BOOL *bptr, int *iptr);
  scan_line(f, &eof_flag, &n);
  chars = n;
  while (eof_flag == FALSE) {
    lines = lines + 1;
    scan_line(f, &eof_flag, &n);
    chars = chars + n;
  }
  printf("lines = %d\n", lines);
  printf("chars = %d\n", chars);
}
14
Character-Count Program
void char_count(FILE *f) {
  int lines = 0;
  int chars;
  BOOL eof_flag = FALSE;
  int n;
  extern void scan_line(FILE *f, BOOL *bptr, int *iptr);
  scan_line(f, &eof_flag, &n);
  chars = n;
  while (eof_flag == FALSE) {
    lines = lines + 1;
    scan_line(f, &eof_flag, &n);
    chars = chars + n;
  }
  printf("lines = %d\n", lines);
  printf("chars = %d\n", chars);
}
16
Line-Count Program
void line_count(FILE *f) {
  int lines = 0;
  int chars;
  BOOL eof_flag = FALSE;
  int n;
  extern void scan_line2(FILE *f, BOOL *bptr, int *iptr);
  scan_line2(f, &eof_flag, &n);
  chars = n;
  while (eof_flag == FALSE) {
    lines = lines + 1;
    scan_line2(f, &eof_flag, &n);
    chars = chars + n;
  }
  printf("lines = %d\n", lines);
  printf("chars = %d\n", chars);
}
17
How are Slices Computed?
  • Reachability in a Dependence Graph
  • Program Dependence Graph (PDG)
      • Dependences within one procedure
      • Intraprocedural slicing is reachability in one
        PDG
  • System Dependence Graph (SDG)
      • Dependences within entire system
      • Interprocedural slicing is reachability in the SDG
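
As a minimal sketch of "slicing is reachability", the snippet below computes backward and forward slices by graph search over a hand-built dependence graph for the running sum program. The edge list PDG_EDGES and the function names are illustrative assumptions, not taken from the slides; an edge (u, v) means that v depends on u.

```python
from collections import deque

# Hypothetical, hand-written PDG for the sum program.
# An edge (u, v) means "v is control- or flow-dependent on u".
PDG_EDGES = [
    ("Enter", "sum = 0"), ("Enter", "i = 1"), ("Enter", "while (i < 11)"),
    ("Enter", "printf(sum)"), ("Enter", "printf(i)"),
    ("while (i < 11)", "sum = sum + i"), ("while (i < 11)", "i = i + 1"),
    ("sum = 0", "sum = sum + i"), ("sum = 0", "printf(sum)"),
    ("sum = sum + i", "sum = sum + i"), ("sum = sum + i", "printf(sum)"),
    ("i = 1", "while (i < 11)"), ("i = 1", "sum = sum + i"),
    ("i = 1", "i = i + 1"), ("i = 1", "printf(i)"),
    ("i = i + 1", "while (i < 11)"), ("i = i + 1", "sum = sum + i"),
    ("i = i + 1", "i = i + 1"), ("i = i + 1", "printf(i)"),
]


def backward_slice(edges, criterion):
    """All nodes from which `criterion` is reachable (including itself)."""
    preds = {}
    for u, v in edges:
        preds.setdefault(v, set()).add(u)
    seen = {criterion}
    work = deque([criterion])
    while work:
        node = work.popleft()
        for p in preds.get(node, ()):
            if p not in seen:
                seen.add(p)
                work.append(p)
    return seen


def forward_slice(edges, criterion):
    """All nodes reachable from `criterion`: a backward slice on reversed edges."""
    return backward_slice([(v, u) for u, v in edges], criterion)
```

With this edge list, the backward slice from printf(i) contains only the statements that mention i plus Enter, which matches the extracted slice shown later in the deck.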

18
How is a PDG Created?
  • Control Flow Graph (CFG)
  • PDG is union of
      • Control Dependence Graph
      • Flow Dependence Graph
  • Both are computed from the CFG

19
Control Flow Graph
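int main() {
  int sum = 0;
  int i = 1;
  while (i < 11) {
    sum = sum + i;
    i = i + 1;
  }
  printf("%d\n", sum);
  printf("%d\n", i);
}
[Figure: control-flow graph with nodes Enter, sum = 0, i = 1, while (i < 11), sum = sum + i, i = i + 1, printf(sum), printf(i); the while node has outgoing edges labeled T and F.]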
20
Control Dependence Graph
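int main() {
  int sum = 0;
  int i = 1;
  while (i < 11) {
    sum = sum + i;
    i = i + 1;
  }
  printf("%d\n", sum);
  printf("%d\n", i);
}
Control dependence: q is reached from p if condition p is true (T), not otherwise; drawn as an edge from p to q labeled T. Similar for false (F), with an edge labeled F.
[Figure: control dependence graph with T-labeled edges over the nodes Enter, sum = 0, i = 1, while (i < 11), sum = sum + i, i = i + 1, printf(sum), printf(i).]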
21
Flow Dependence Graph
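int main() {
  int sum = 0;
  int i = 1;
  while (i < 11) {
    sum = sum + i;
    i = i + 1;
  }
  printf("%d\n", sum);
  printf("%d\n", i);
}
Flow dependence: the value of a variable assigned at p may be used at q; drawn as an edge from p to q.
[Figure: flow dependence graph over the nodes Enter, i = 1, sum = 0, while (i < 11), sum = sum + i, i = i + 1, printf(sum), printf(i).]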
22
Program Dependence Graph (PDG)
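int main() {
  int sum = 0;
  int i = 1;
  while (i < 11) {
    sum = sum + i;
    i = i + 1;
  }
  printf("%d\n", sum);
  printf("%d\n", i);
}
Edge kinds: control dependence, flow dependence
[Figure: PDG with T-labeled control-dependence edges and flow-dependence edges over the nodes Enter, sum = 0, i = 1, while (i < 11), sum = sum + i, i = i + 1, printf(sum), printf(i).]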
23
Program Dependence Graph (PDG)
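int main() {
  int i = 1;
  int sum = 0;
  while (i < 11) {
    sum = sum + i;
    i = i + 1;
  }
  printf("%d\n", sum);
  printf("%d\n", i);
}
[Figure: PDG with T-labeled control-dependence edges over the nodes Enter, sum = 0, i = 1, while (i < 11), sum = sum + i, i = i + 1, printf(sum), printf(i).]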
24
Backward Slice
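int main() {
  int sum = 0;
  int i = 1;
  while (i < 11) {
    sum = sum + i;
    i = i + 1;
  }
  printf("%d\n", sum);
  printf("%d\n", i);
}
[Figure: PDG of the sum program, with nodes Enter, sum = 0, i = 1, while (i < 11), sum = sum + i, i = i + 1, printf(sum), printf(i), on which the backward slice is traced.]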
28
Slice Extraction
int main() {
  int i = 1;
  while (i < 11) {
    i = i + 1;
  }
  printf("%d\n", i);
}
[Figure: the sliced PDG, with nodes Enter, i = 1, while (i < 11), i = i + 1, printf(i) and T-labeled control-dependence edges.]
29
CodeSurfer
36
Interprocedural Slice
int main() {
  int sum = 0;
  int i = 1;
  while (i < 11) {
    sum = add(sum, i);
    i = add(i, 1);
  }
  printf("%d\n", sum);
  printf("%d\n", i);
}
int add(int x, int y) {
  return x + y;
}
37
Interprocedural Slice
int main() {
  int sum = 0;
  int i = 1;
  while (i < 11) {
    sum = add(sum, i);
    i = add(i, 1);
  }
  printf("%d\n", sum);
  printf("%d\n", i);
}
int add(int x, int y) {
  return x + y;
}
Backward slice with respect to statement printf("%d\n", i)
38
Interprocedural Slice
int main() {
  int sum = 0;
  int i = 1;
  while (i < 11) {
    sum = add(sum, i);
    i = add(i, 1);
  }
  printf("%d\n", sum);
  printf("%d\n", i);
}
int add(int x, int y) {
  return x + y;
}
Superfluous components included by Weiser's slicing algorithm [TSE 84].
Left out by the algorithm of Horwitz, Reps, Binkley [PLDI 88; TOPLAS 90].
39
How is an SDG Created?
  • Each PDG has nodes for
      • entry point
      • procedure parameters and function result
  • Each call site has nodes for
      • call
      • arguments and function result
  • Appropriate edges
      • entry node to parameters
      • call node to arguments
      • call node to entry node
      • arguments to parameters

40
System Dependence Graph (SDG)
[Figure: SDG with the PDG of main (Enter main) containing two Call p sites, both linked to the PDG of p (Enter p).]
41
SDG for the Sum Program
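[Figure: SDG for the sum program. main: Enter main, sum = 0, i = 1, while (i < 11), printf(sum), printf(i), and two Call add sites. First call site: x_in = sum, y_in = i, sum = x_out. Second call site: x_in = i, y_in = 1, i = x_out. add: Enter add, x = x_in, y = y_in, x = x + y, x_out = x.]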
42
Interprocedural Backward Slice
[Figure: SDG with Enter main, two Call p sites, and Enter p, on which the interprocedural backward slice is traced.]
48
Matched-Parenthesis Path
51
Slice Extraction
[Figure: the extracted slice, with Enter main, one Call p site, and Enter p.]
52
Slice of the Sum Program
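[Figure: slice of the SDG for the sum program. main: Enter main, while (i < 11), i = 1, printf(i), and one Call add site with x_in = i, y_in = 1, i = x_out. add: Enter add, x = x_in, y = y_in, x = x + y, x_out = x.]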
53
CFL-Reachability [Yannakakis 90]
  • G: Graph
  • L: A context-free language
  • L-path from s to t iff the edge labels along some
    path from s to t spell a word in L
  • Running time: O(N^3)
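
A minimal worklist sketch of CFL-reachability follows. It assumes the grammar has been normalized so that every production has at most two symbols on its right-hand side; the function name cfl_reach, its parameters, and the matched-parenthesis grammar DYCK_BINARY are illustrative choices, not notation from the slides. The sketch favors brevity and is not tuned to meet the cubic bound.

```python
from collections import defaultdict, deque


def cfl_reach(nodes, edges, unary, binary, epsilon=()):
    """Saturate a labeled graph under a normalized grammar.

    edges:   iterable of (u, label, v)
    unary:   productions A -> B, given as (A, B)
    binary:  productions A -> B C, given as (A, B, C)
    epsilon: nonterminals A with A -> empty
    Returns every derivable fact (u, label, v).
    """
    nodes = list(nodes)
    facts = set()
    out_idx = defaultdict(set)  # (u, label) -> targets
    in_idx = defaultdict(set)   # (v, label) -> sources
    work = deque()

    def add(u, label, v):
        if (u, label, v) not in facts:
            facts.add((u, label, v))
            out_idx[(u, label)].add(v)
            in_idx[(v, label)].add(u)
            work.append((u, label, v))

    for u, label, v in edges:
        add(u, label, v)
    for a in epsilon:
        for n in nodes:
            add(n, a, n)

    while work:
        u, x, v = work.popleft()
        for a, b in unary:
            if b == x:
                add(u, a, v)
        for a, b, c in binary:
            if b == x:  # popped edge is the left part: u -B-> v -C-> w
                for w in list(out_idx[(v, c)]):
                    add(u, a, w)
            if c == x:  # popped edge is the right part: w -B-> u -C-> v
                for w in list(in_idx[(u, b)]):
                    add(w, a, v)
    return facts


# Matched parentheses: M -> empty | M M | "(" X ;  X -> M ")"
DYCK_BINARY = [("M", "M", "M"), ("M", "(", "X"), ("X", "M", ")")]
```

For example, on the path 0 -(-> 1 -(-> 2 -)-> 3 -)-> 4, calling cfl_reach(range(5), edges, [], DYCK_BINARY, epsilon=["M"]) derives an M-edge from 0 to 4 and from 1 to 3, but not from 0 to 3.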

55
Degenerate Case: CFL-Recognition
56
CFL-Reachability via Dynamic Programming
[Figure: a graph with adjacent edges labeled B and C, shown next to a grammar.]
57
Interprocedural Slicing via CFL-Reachability
  • Graph: System dependence graph
  • L: L(matched)
  • Node m is in the slice w.r.t. n iff there is an
    L(matched)-path from m to n
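
To illustrate what an L(matched)-path rules out, the sketch below checks whether the call and return labels along one path are properly matched. The label encoding is an assumption made for this example: "(k" marks entering a procedure from call site k, ")k" marks returning to call site k, and any other label is an intraprocedural edge.

```python
def is_matched_path(labels):
    """True iff every ")k" closes the most recent open "(k" and none stay open."""
    stack = []
    for lab in labels:
        if lab.startswith("("):
            stack.append(lab[1:])          # enter callee from call site k
        elif lab.startswith(")"):
            if not stack or stack.pop() != lab[1:]:
                return False               # return to a different call site
    return not stack
```

In the sum program, a path that enters add from the call in sum = add(sum, i) and returns to the call in i = add(i, 1) has labels like ["(1", "e", ")2"] and is rejected; paths of that kind appear to be the source of the superfluous components attributed to Weiser's algorithm earlier in the deck.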

58
Asymptotic Running Time [Reps, Horwitz, Sagiv,
Rosay 94]
  • CFL-reachability
      • System dependence graph: N nodes, E edges
      • Running time: O(N^3)
  • System dependence graph: special structure

Running time: O(E + CallSites × MaxParams^3)
59
pointer analysis? points-to analysis? shape
analysis? alias analysis?
60
Cross-Cutting Issues
  • Context-sensitive/context-insensitive analysis
  • interprocedural slicing
  • interprocedural dataflow analysis
  • Pointers and heap-allocated storage
  • Flow-sensitive/flow-insensitive analysis
  • Andersen's points-to analysis
  • Scalability

61
The Need for Pointer Analysis
int main() {
  int sum = 0;
  int i = 1;
  int *p = &sum;
  int *q = &i;
  int (*f)(int, int) = add;
  while (*q < 11) {
    *p = (*f)(*p, *q);
    *q = (*f)(*q, 1);
  }
  printf("%d\n", *p);
  printf("%d\n", *q);
}
int add(int x, int y) {
  return x + y;
}
62
The Need for Pointer Analysis
int main() {
  int sum = 0;
  int i = 1;
  int *p = &sum;
  int *q = &i;
  int (*f)(int, int) = add;
  while (i < 11) {
    sum = add(sum, i);
    i = add(i, 1);
  }
  printf("%d\n", sum);
  printf("%d\n", i);
}
int add(int x, int y) {
  return x + y;
}
63
Flow-Sensitive Points-To Analysis
p = &q;
p = q;
p = *q;
*p = q;
64
Flow-Sensitive → Flow-Insensitive
65
Flow-Insensitive Points-To Analysis [Andersen 94,
Shapiro & Horwitz 97]
p = &q;
p = q;
p = *q;
*p = q;
66
Flow-Insensitive Points-To Analysis
a = &e; b = a; c = &f; *b = c; d = *a;
[Figure: points-to graph over the variables a, e, b, c, f, d]
67
Flow-Insensitive Points-To Analysis
  • Andersen [Thesis 94]
      • Formulated using set constraints
      • Cubic-time algorithm
  • Shapiro & Horwitz (1995) [POPL 97]
      • Re-formulated as a graph-grammar problem
  • Reps (1995) [unpublished]
      • Re-formulated as a Horn-clause program
  • Melski (1996) [see Reps, IST 98]
      • Re-formulated via CFL-reachability

68
CFL-Reachability via Dynamic Programming
[Figure: a graph with adjacent edges labeled B and C, shown next to a grammar.]
69
CFL-Reachability Chain Programs
[Figure: a graph with adjacent edges labeled B and C, shown next to a grammar.]
a(X,Z) :- b(X,Y), c(Y,Z).
70
Base Facts for Points-To Analysis
p = &q;
assignAddr(p,q).
p = q;
assign(p,q).
p = *q;
assignStar(p,q).
*p = q;
starAssign(p,q).
71
Rules for Points-To Analysis (I)
pointsTo(P,Q) :- assignAddr(P,Q).
pointsTo(P,R) :- assign(P,Q), pointsTo(Q,R).
72
Rules for Points-To Analysis (II)
pointsTo(P,S) :- assignStar(P,Q), pointsTo(Q,R), pointsTo(R,S).
pointsTo(R,S) :- starAssign(P,Q), pointsTo(P,R), pointsTo(Q,S).
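
The four Horn clauses can be evaluated bottom-up to a fixpoint. The sketch below is a naive evaluation over sets of pairs, written for clarity rather than for the cubic bound; the function name points_to and the small test program in the usage note are assumptions for illustration.

```python
def points_to(assign_addr, assign, assign_star, star_assign):
    """Naive fixpoint of the four pointsTo rules; each argument is a set of (p, q)."""
    pts = set(assign_addr)                 # pointsTo(P,Q) :- assignAddr(P,Q).
    changed = True
    while changed:
        new = set()
        for p, q in assign:                # p = q
            new |= {(p, r) for (q2, r) in pts if q2 == q}
        for p, q in assign_star:           # p = *q
            for (q2, r) in pts:
                if q2 == q:
                    new |= {(p, s) for (r2, s) in pts if r2 == r}
        for p, q in star_assign:           # *p = q
            for (p2, r) in pts:
                if p2 == p:
                    new |= {(r, s) for (q2, s) in pts if q2 == q}
        changed = not new <= pts
        pts |= new
    return pts
```

For a hypothetical program a = &e; b = a; c = &f; *b = c; d = *a; the base facts are assign_addr = {("a","e"), ("c","f")}, assign = {("b","a")}, assign_star = {("d","a")}, star_assign = {("b","c")}, and the fixpoint adds the pairs (b, e), (e, f), and (d, f).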
73
Rules for Points-To Analysis (II)
pointsTo(P,S) :- assignStar(P,Q), pointsTo(Q,R), pointsTo(R,S).
pointsTo(R,S) :- starAssign(P,Q), pointsTo(P,R), pointsTo(Q,S).
pointsTo(R,S) :- pointsTo(P,R), starAssign(P,Q), pointsTo(Q,S).
74
Creating a Chain Program
pointsTo(R,S) :- starAssign(P,Q), pointsTo(P,R), pointsTo(Q,S).
pointsTo(R,S) :- pointsTo(P,R), starAssign(P,Q), pointsTo(Q,S).
76
Creating a Chain Program
pointsTo(P,Q) :- assignAddr(P,Q).
pointsTo(P,R) :- assign(P,Q), pointsTo(Q,R).
pointsTo(P,S) :- assignStar(P,Q), pointsTo(Q,R), pointsTo(R,S).
77
. . . and now to CFL-Reachability
78
Points-To Analysis as CFL-Reachability
Consequences
  • Points-to analysis solvable in time cubic in the
    number of variables
      • Known previously [Andersen 94]
  • Demand algorithms
      • What does variable p point to?
          • Issue query: ?- pointsTo(p, Q).
          • Solve single-source L(pointsTo)-reachability
            problem
      • What variables point to q?
          • Issue query: ?- pointsTo(P, q).
          • Solve single-target L(pointsTo)-reachability
            problem

80
Structure-Transmitted Dependences [Reps 1995]
McCarthy's equations:
car(cons(x,y)) = x
cdr(cons(x,y)) = y
w = cons(x,y); v = car(w);
81
Set Constraints
Semantics of Set Constraints
82
CFL-Reachability versus Set Constraints
  • Lazy languages: CFL-reachability is more natural
      • car(cons(X,Y)) = X
  • Strict languages: set constraints are more
    natural
      • car(cons(X,Y)) = X, provided I(Y) ≠ ∅
  • But . . . SC and CFL-reachability are equivalent!
      • [Melski & Reps 97]

83
Solving Set Constraints
84
Simulating Inhabited
W
85
Simulating Inhabited
86
Simulating "Provided I(Y) ≠ ∅"
87
SC = CFL-Reachability: Consequences
  • Demand algorithm for SC
  • SC is log-space complete for PTIME
  • Limitations on ability to parallelize algorithms
    for solving set-constraint problems

88
Outline
  • Interprocedural slicing
  • Interprocedural dataflow analysis
  • Demand-driven analysis
  • (Model-checking of recursive HFSMs)

90
Dataflow Analysis
  • Goal: for each point in the program, determine a
    superset of the facts that could possibly hold
    during execution
  • Examples
      • Constant propagation
      • Reaching definitions
      • Live variables
      • Possibly uninitialized variables
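
As a rough intraprocedural sketch of the last example, the worklist analysis below computes, for each node, a superset of the variables that may be uninitialized on entry. The transfer rule is one common formulation and is an assumption here: after x = e, x is possibly uninitialized only if some variable used in e is. The tiny CFG and all names are hypothetical.

```python
def possibly_uninitialized(cfg, entry, variables):
    """cfg: node -> (assigned variable or None, used variables, successors).

    Returns a dict mapping each node to the set of variables that may be
    uninitialized just before the node executes.
    """
    before = {n: set() for n in cfg}
    before[entry] = set(variables)      # nothing is initialized at entry
    work = [entry]
    while work:
        n = work.pop()
        target, uses, succs = cfg[n]
        after = set(before[n])
        if target is not None:
            if uses & before[n]:
                after.add(target)       # value derived from a suspect variable
            else:
                after.discard(target)   # definitely initialized here
        for s in succs:
            if not after <= before[s]:  # join over paths is set union
                before[s] |= after
                work.append(s)
    return before


# Hypothetical program: read(x); if (...) y = x; z = y + w;
EXAMPLE_CFG = {
    "read(x)":   ("x",  set(),      ["if"]),
    "if":        (None, set(),      ["y = x", "z = y + w"]),
    "y = x":     ("y",  {"x"},      ["z = y + w"]),
    "z = y + w": ("z",  {"y", "w"}, ["exit"]),
    "exit":      (None, set(),      []),
}
```

On this example, y is still reported as possibly uninitialized at z = y + w because the branch that skips y = x reaches it.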

91
Useful For . . .
  • Optimizing compilers
  • Parallelizing compilers
  • Tools that detect possible logical errors
  • Tools that show the effects of a proposed
    modification

92
Possibly Uninitialized Variables

[Figure: program points annotated with sets of possibly uninitialized variables: {w,x,y}, {w,y}, {w,y}, {w,y}, {w}, {w,y}]

93
Precise Intraprocedural Analysis
95
Precise Interprocedural Analysis
[Figure: paths from start to a node n, passing through a call site C and its return point ret]
[Sharir & Pnueli 81]
96
Representing Dataflow Functions
[Figure: graph representations, over the facts a, b, c, of the identity function and of a constant function]
97
Representing Dataflow Functions
[Figure: graph representations, over the facts a, b, c, of a gen/kill function and of a non-gen/kill function]
99
Composing Dataflow Functions
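
One way to make the graph representation concrete, sketched here under the assumption that the dataflow functions distribute over set union, is to store each function as a relation over the facts plus a special node 0 and to compose functions by relational composition. The names ZERO, represent, apply_rel, compose, and the two sample functions are illustrative, not from the slides.

```python
ZERO = "0"  # special node standing for "holds regardless of the input set"


def represent(f, domain):
    """Relation encoding of a distributive function f from sets to sets."""
    base = set(f(frozenset()))                     # facts generated from nothing
    rel = {(ZERO, ZERO)} | {(ZERO, y) for y in base}
    for x in domain:
        rel |= {(x, y) for y in f(frozenset([x])) if y not in base}
    return rel


def apply_rel(rel, facts):
    """Evaluate the encoded function on a set of facts."""
    sources = set(facts) | {ZERO}
    return {y for (x, y) in rel if x in sources and y != ZERO}


def compose(r1, r2):
    """Relation for 'apply r1, then r2' (relational composition)."""
    return {(x, z) for (x, y) in r1 for (y2, z) in r2 if y == y2}


D = {"a", "b", "c"}


def gen_kill(s):
    """Gen/kill example: kill a, generate b."""
    return (set(s) - {"a"}) | {"b"}


def copy_a_to_b(s):
    """Non-gen/kill example: b holds afterwards iff a held before."""
    return (set(s) - {"b"}) | ({"b"} if "a" in s else set())
```

With these definitions, apply_rel(compose(represent(gen_kill, D), represent(copy_a_to_b, D)), X) agrees with copy_a_to_b(gen_kill(X)) for the subsets X of D tried in the accompanying checks.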
101
Interprocedural Dataflow Analysis via
CFL-Reachability
  • Graph: Exploded control-flow graph
  • L: L(matched)
  • Fact d holds at n iff there is an L(matched)-path
    from the start node of the exploded graph to the
    node for fact d at n

102
Asymptotic Running Time [Reps, Horwitz, Sagiv
95]
  • CFL-reachability
      • Exploded control-flow graph: ND nodes
      • Running time: O(N^3 D^3)
  • Exploded control-flow graph: special
    structure

Running time: O(E D^3)
Typically E ≈ N
Gen/kill problems: O(E D)
103
Why Bother? "We're only interested in
million-line programs"
  • Know thy enemy!
  • Any algorithm must do these operations
  • Avoid pitfalls (e.g., claiming quadratic-time
    algorithm)
  • Special cases
  • Gen/kill problems O(ED)
  • Compression techniques
  • Basic blocks
  • SSA form
  • Demand algorithms

104
Outline
  • Interprocedural slicing
  • Interprocedural dataflow analysis
  • Demand-driven analysis
  • (Model-checking of recursive HFSMs)

105
Exhaustive Versus Demand Analysis
  • Exhaustive analysis: all facts at all points
  • Optimization: concentrate on inner loops
  • Program-understanding tools: only some facts are
    of interest
  • Demand analysis
      • Does a given fact hold at a given point?
      • Which facts hold at a given point?
      • At which points does a given fact hold?

106
Exhaustive Versus Demand Analysis
  • Exhaustive analysis: all facts at all points
  • Optimization: concentrate on inner loops
  • Program-understanding tools: only some facts are
    of interest

107
Demand Analysis and LP Queries (I)
  • Flow-insensitive analysis
  • Does variable x point to y?
  • ?- pointsTo(x, y).
  • What does variable x point to?
  • ?- pointsTo(x, Y).
  • What variables point to y?
  • ?- pointsTo(X, y).
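
A demand query such as ?- pointsTo(x, Y) need not compute all points-to facts. The sketch below answers it for a restricted setting, assumed here for brevity: only the address-of and copy rules (assignAddr and assign) are handled, so the search just walks copy edges starting from x. The function name and the returned "visited" set are illustrative.

```python
def what_does_point_to(x, assign_addr, assign):
    """Demand query ?- pointsTo(x, Y) using only the assignAddr and assign rules.

    Returns (targets, visited): the answers for Y, and the variables the
    search had to look at.
    """
    copies_from = {}                      # p -> {q : p = q}
    for p, q in assign:
        copies_from.setdefault(p, set()).add(q)
    visited, stack, targets = {x}, [x], set()
    while stack:
        v = stack.pop()
        # pointsTo(v, Q) :- assignAddr(v, Q).
        targets |= {q for (p, q) in assign_addr if p == v}
        # pointsTo(v, R) :- assign(v, Q), pointsTo(Q, R).
        for q in copies_from.get(v, ()):
            if q not in visited:
                visited.add(q)
                stack.append(q)
    return targets, visited
```

The visited set shows the point of a demand algorithm: variables unrelated to x are never examined.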

108
Demand Analysis and LP Queries (II)
  • Flow-sensitive analysis
  • Does a given fact f hold at a given point p?
  • ?- dfFact(p, f).
  • Which facts hold at a given point p?
  • ?- dfFact(p, F).
  • At which points does a given fact f hold?
  • ?- dfFact(P, f).
  • E.g., flow-sensitive points-to analysis
  • ?- dfFact(p, pointsTo(x, Y)).
  • ?- dfFact(P, pointsTo(x, y)).
  • etc.

110
Experimental Results [Horwitz, Reps, Sagiv
1995]
  • 53 C programs (200-6,700 lines)
  • For a single fact of interest
  • Demand algorithm always better than exhaustive
    algorithm
  • "All appropriate demands" beats exhaustive when
    percentage of "yes" answers is high
  • Live variables
  • Truly live variables
  • Constant predicates
  • . . .

111
Path Problems
  • Static analysis
  • context-free reachability
  • Path profiling
  • path counting
  • Model checking
  • reachability
  • cyclicity
  • Testing
  • identifying non-executable paths

112
Outline
  • Interprocedural slicing
  • Interprocedural dataflow analysis
  • Demand-driven analysis
  • (Model-checking of recursive HFSMs)

113
Model-Checking of Recursive HFSMs [Benedikt,
Godefroid, Reps (in prep.)]
  • Non-recursive HFSMs [Alur & Yannakakis 98]
  • Ordinary FSMs
  • T-reachability/circularity queries
  • Recursive HFSMs
  • Matched-parenthesis T-reachability/circularity
  • Key observation: linear-time algorithms for
    matched-parenthesis T-reachability/cyclicity
  • Single-entry/multi-exit or multi-entry/single-exit
  • Deterministic, multi-entry/multi-exit

114
Recursive HFSMs Data Complexity
115
Recursive HFSMs Data Complexity
116
But . . . ?
  • Model checking
      • Huge graphs (10^100 reachable states)
      • Reachability/circularity queries
      • Represent implicitly (OBDDs)
  • Dataflow analysis
      • Large graphs
      • e.g., Stmts × Vars (≈ 10^11)
      • CFL-reachability queries [Reps, Horwitz, Sagiv 95]
      • OBDDs blew up [Siff & Reps 95 (unpub.)]
      • . . . yes, we tried the usual tricks . . .

117
CFL-Reachability Scope of Applicability
  • Static analysis
  • Slicing, DFA, structure-transmitted dep.,
    points-to analysis
  • Formal-language theory
  • CF-, 2DPDA-, 2NPDA-recognition
  • Attribute-grammar analysis
  • Verification
  • Model-checking recursive HFSMs
  • Ping-pong protocols [Dolev, Even, Karp 83]

118
CFL-Reachability Benefits
  • Algorithms
  • Demand and exhaustive
  • Complexity
  • Linear-, quadratic-, cubic-time algorithms
  • PTIME-completeness
  • Variants that are undecidable
  • Complementary to
  • Equations
  • Set constraints
  • Types
  • . . .

119
Most Significant Contributions 1987-2000
  • Asymptotically fastest algorithms
  • Interprocedural slicing
  • Interprocedural dataflow analysis
  • Demand algorithms
  • All appropriate demands may beat exhaustive
  • Tool for slicing and browsing ANSI C
  • Slices programs as large as 60,000 lines
  • University research distribution
  • Commercial product CodeSurfer (GrammaTech, Inc.)
  • CFL-reachability as unifying conceptual model
  • [Kou 77], [Holley & Rosen 81], [Cooper & Kennedy 88],
    [Callahan 88], [Horwitz, Reps, Binkley 88], . . .
  • Identifies fundamental bottlenecks (e.g.,
    cubic-time barrier)

120
Path Problems
  • Static analysis
  • context-free reachability
  • Path profiling
  • path counting
  • Model checking
  • reachability
  • cyclicity
  • Testing
  • identifying non-executable paths

121
Ball-Larus Intraprocedural Path Profiling
NumPathsToExit(Exit) = 1
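
Only the base case survives in this transcript. The sketch below fills in the rest under the usual reading of Ball-Larus path numbering on an acyclic CFG, which is an assumption here: the count at a node is the sum of the counts at its successors, and each edge gets an increment so that the sums along distinct paths are distinct. The function names and the sample DAG are illustrative.

```python
def num_paths_to_exit(dag, exit_node):
    """dag: node -> ordered list of successors (acyclic). Returns node -> path count."""
    memo = {exit_node: 1}                 # NumPathsToExit(Exit) = 1

    def count(v):
        if v not in memo:
            memo[v] = sum(count(w) for w in dag[v])
        return memo[v]

    for v in dag:
        count(v)
    return memo


def edge_increments(dag, exit_node):
    """Assign each edge a value so that every entry-to-exit path has a unique sum."""
    n = num_paths_to_exit(dag, exit_node)
    inc = {}
    for v, succs in dag.items():
        offset = 0
        for w in succs:
            inc[(v, w)] = offset          # paths through earlier successors come first
            offset += n[w]
    return inc


# Hypothetical acyclic CFG with six entry-to-exit paths.
SAMPLE_DAG = {
    "A": ["B", "C"], "B": ["C", "D"], "C": ["D"],
    "D": ["E", "F"], "E": ["F"], "F": [],
}
```

On SAMPLE_DAG the six paths from A to F receive the sums 0 through 5, so a single counter incremented along the taken edges identifies the executed path.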
122
Melski-Reps Interprocedural Path Profiling
123
Automatic Differentiation
124
Automatic Differentiation
double F(double x) {
  int i;
  double ans = 1.0;
  for (i = 1; i <= n; i++) {
    ans = ans * f_i(x);
  }
  return ans;
}
double delta = . . .;  /* small constant */
double F'(double x) {
  return (F(x + delta) - F(x)) / delta;
}
125
Automatic Differentiation
double F(double x) {
  int i;
  double ans = 1.0;
  for (i = 1; i <= n; i++) {
    ans = ans * f_i(x);
  }
  return ans;
}
126
Automatic Differentiation
double F'(double x) {
  int i;
  double ans' = 0.0;
  double ans = 1.0;
  for (i = 1; i <= n; i++) {
    ans' = ans * f_i'(x) + ans' * f_i(x);
    ans = ans * f_i(x);
  }
  return ans';
}
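
The contrast between the finite-difference version and the transformed program can be sketched in runnable form. In the snippet below the functions f_i and their derivatives are passed as a list of (f, df) pairs, which is an assumption made for this example, as are the names F, F_prime, and F_prime_fd.

```python
import math


def F(x, fs):
    """Product of f_i(x) over all (f_i, f_i') pairs in fs."""
    ans = 1.0
    for f, _df in fs:
        ans = ans * f(x)
    return ans


def F_prime(x, fs):
    """Derivative of F, carried along with F via the product rule."""
    d_ans = 0.0
    ans = 1.0
    for f, df in fs:
        d_ans = ans * df(x) + d_ans * f(x)   # uses the old value of ans
        ans = ans * f(x)
    return d_ans


def F_prime_fd(x, fs, delta=1e-6):
    """Finite-difference approximation, subject to truncation and round-off error."""
    return (F(x + delta, fs) - F(x, fs)) / delta
```

For fs = [(math.sin, math.cos), (lambda x: x * x, lambda x: 2 * x)], F_prime(1.0, fs) equals cos(1) + 2 sin(1) up to floating-point rounding, while F_prime_fd only approximates it.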
127
Automatic Differentiation
[Figure: inputs x_1, ..., x_i, ..., x_m and outputs y_1, ..., y_j, ..., y_n]