Title: Data Structures and Algorithms for Efficient Shape Analysis
1Data Structures and Algorithms for Efficient
Shape Analysis
- by Roman Manevich
- Prepared under the supervision of Dr. Shmuel
(Mooly) Sagiv
2Motivation
- TVLA is a powerful and general abstract interpretation system
- Abstract interpretation in TVLA
- Operational semantics is expressed with first-order logic with transitive closure (TC) formulae
- Program states are represented as sets of evolving first-order structures
3Outline
- Shape Analysis quick intro
- Compactly representing structures
- Tuning abstraction to improve performance
4What is Shape Analysis
- Determines Shape Invariants for imperative programs
- Can be used to verify a wide range of properties over different programming languages
5reverse Example
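/* list.h */
typedef struct node {
    struct node *n;
    int data;
} *List;

/* print.c */
#include "list.h"
List reverse(List x) {
    List y, t;
    y = NULL;
    while (x != NULL) {
        t = y;          /* remember the already reversed prefix */
        y = x;          /* y is the cell being moved */
        x = x->n;       /* advance in the unreversed suffix */
        y->n = t;       /* link the cell in front of the prefix */
    }
    return y;
}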
6reverse Example
[Figure: shape before: x points to a list of cells linked by n; shape after: y points to a list of cells linked by n]
7Definition of a First-Order Logical Structure
- S = ⟨U, ι⟩
- U: a set of individuals (node set)
- ι: a mapping p^(r) → (U^r → {0, 1}), the interpretation of p
8Three-Valued Logic
- 1 True
- 0 False
- 1/2 Unknown
- A join semi-lattice: 0 ⊔ 1 = 1/2
[Figure: the three values ordered by information, with 1/2 as the join of 0 and 1]
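A minimal Python sketch of the three values and their join. Encoding 1/2 as the float 0.5 and the helper names (`join`, `k_and`, `k_or`, `k_not`) are illustrative assumptions, not TVLA's API.

```python
# Kleene values: 0 (false), 1 (true), 0.5 (unknown). The encoding is an assumption.
FALSE, TRUE, UNKNOWN = 0, 1, 0.5

def join(a, b):
    """Information-order join: equal values stay, differing values give 1/2."""
    return a if a == b else UNKNOWN

def k_and(a, b):
    """Kleene conjunction: the minimum in the order 0 < 1/2 < 1."""
    return min(a, b)

def k_or(a, b):
    """Kleene disjunction: the maximum in the order 0 < 1/2 < 1."""
    return max(a, b)

def k_not(a):
    """Kleene negation swaps 0 and 1 and leaves 1/2 unchanged."""
    return 1 - a
```

With this encoding, joining the two definite values loses information (`join(FALSE, TRUE)` is `UNKNOWN`), while a conjunction with a false operand stays definite.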
9Canonical Abstraction
- Partition the individuals into equivalence classes based on the values of their unary predicates
- Collapse other predicates via ⊔
- p^S(u'1, ..., u'k) = ⊔ { p^B(u1, ..., uk) | f(u1) = u'1, ..., f(uk) = u'k }
- At most 3^n abstract individuals
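The abstraction step can be sketched in Python as follows. This is an illustration under assumptions, not TVLA's implementation: structures are plain dictionaries, missing entries mean 0, 1/2 is encoded as 0.5, only binary predicates are collapsed, and the function name is hypothetical.

```python
from itertools import product

UNKNOWN = 0.5

def join(a, b):
    return a if a == b else UNKNOWN

def canonical_abstraction(nodes, unary, binary):
    """nodes: list of node ids.
    unary: {pred: {node: value}}, binary: {pred: {(u, v): value}};
    missing entries mean 0. Returns (f, abs_unary, abs_binary, summary)."""
    preds = sorted(unary)
    # f maps each node to its canonical name: the tuple of its unary values.
    f = {u: tuple(unary[p].get(u, 0) for p in preds) for u in nodes}
    names = set(f.values())
    abs_unary = {p: {name: name[i] for name in names}
                 for i, p in enumerate(preds)}
    abs_binary = {}
    for p, table in binary.items():
        out = {}
        # Join the values of all concrete pairs mapped to the same abstract pair.
        for u, v in product(nodes, repeat=2):
            key = (f[u], f[v])
            val = table.get((u, v), 0)
            out[key] = join(out[key], val) if key in out else val
        abs_binary[p] = out
    # A canonical name with more than one preimage becomes a summary node.
    counts = {}
    for u in nodes:
        counts[f[u]] = counts.get(f[u], 0) + 1
    summary = {name: (UNKNOWN if c > 1 else 0) for name, c in counts.items()}
    return f, abs_unary, abs_binary, summary
```

For a three-cell list whose first cell is pointed to by x, the two remaining cells share the canonical name `(0,)`, so they collapse into one summary node and the n edges into it become 1/2.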
10Canonical Abstraction Example
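[Figure: individuals u1, u2, u3, each labeled r[n,x], connected by n edges and pointed to by x, shown before and after canonical abstraction]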
11Compactly Representing First-Order Logical
Structures
- Space is a major bottleneck
- Analysis explores many logical structures
- Reduce space by sharing information across
structures
12Desired Properties
- Sparse data structures
- Share common sub-structures
- Inherited sharing
- Incidental sharing due to program invariants
- But feasible time performance
- Phase sensitive data structures
13Chapter Outline
- Background
- First-order structure representations
- Base representation (TVLA 0.91)
- BDD representation
- Empirical evaluation
- Conclusion
14First-Order Logical Structures
- Generalize shape graphs
- Arbitrary set of individuals
- Arbitrary set of predicates on individuals
- Dynamically evolving
- Usually small changes
- Properties are extracted by evaluating first-order formulae, e.g. ∃v1, v: x(v1) ∧ n(v1, v)
- Join operator requires isomorphism testing
15First-Order Structure ADT
- Structure new() /* empty structure */
- SetOfNodes nodeSet(Structure)
- Node newNode(Structure)
- removeNode(Structure, node)
- Kleene eval(Structure, p(r), ⟨u1, ..., ur⟩)
- update(Structure, p(r), ⟨u1, ..., ur⟩, Kleene)
- Structure copy(Structure)
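One way to picture this ADT is a sparse dictionary-backed class. The sketch below is an assumption-laden illustration (Python method names, integer node ids, 0.5 for 1/2, a deep copy in `copy`), not the TVLA interface.

```python
import copy as _copy

class Structure:
    """Sparse first-order structure: only non-zero predicate values are stored."""

    def __init__(self):
        self._nodes = set()
        self._next = 0
        self._values = {}  # {(pred, node_tuple): Kleene value}

    def node_set(self):
        return frozenset(self._nodes)

    def new_node(self):
        node = self._next
        self._next += 1
        self._nodes.add(node)
        return node

    def remove_node(self, node):
        self._nodes.discard(node)
        # Drop every stored value that mentions the removed node.
        self._values = {k: v for k, v in self._values.items()
                        if node not in k[1]}

    def eval(self, pred, nodes):
        return self._values.get((pred, tuple(nodes)), 0)

    def update(self, pred, nodes, value):
        key = (pred, tuple(nodes))
        if value == 0:
            self._values.pop(key, None)  # keep the map sparse
        else:
            self._values[key] = value

    def copy(self):
        return _copy.deepcopy(self)
```

An assignment such as x = y can then be applied by copying the structure and, for every node u, calling `update(copy, "x", (u,), eval(original, "y", (u,)))`.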
16print_all Example
/* list.h */
typedef struct node {
    struct node *n;
    int data;
} *L;

/* print.c */
#include <stdio.h>
#include "list.h"
void print_all(L y) {
    L x;
    x = y;
    while (x != NULL) {
        /* assert(x != NULL) */
        printf("elem=%d", x->data);
        x = x->n;
    }
}
17print_all Example
[Figure: structure S0 with u1 (y=1) and summary node u (sm=½), connected by n=½ edges]
x = y    x'(v) = y(v)
copy(S0) : S1
nodeSet(S0) : {u1, u}
eval(S0, y, u1) : 1
update(S1, x, u1, 1)
eval(S0, y, u) : 0
update(S1, x, u, 0)
18print_all Example
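[Figure: structure S1 with u1 (x=1, y=1) and summary node u (sm=½), connected by n=½ edges]
while (x != NULL)    precondition: ∃v x(v)
x = x->n    focus: ∃v1 x(v1) ∧ n(v1, v)    update: x'(v) = ∃v1 x(v1) ∧ n(v1, v)
[Figure: the resulting structures S2.0, S2.1 and S2.2, over u1 (y=1), a node with x=1 (u or u.1), and summary nodes (u or u.0, sm=½), with n=1 and n=½ edges]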
19Overview and Main Results
- Two novel representations of first-order structures
- New BDD representation
- New representation using functional maps
- Implementation techniques
- Empirical evaluation
- Comparison of different representations
- Space is reduced by a factor of 4-10
- New representations scale better
20Base Representation (Tal Lev-Ami SAS 2000)
- Two-level map: Predicate → (Node Tuple → Kleene)
- Sparse representation
- Limited inherited sharing by Copy-On-Write
21BDDs in a Nutshell (Bryant 86)
- Ordered Binary Decision Diagrams
- Data structure for Boolean functions
- Functions are represented as (unique) DAGs
[Figure: a decision diagram over the variables x1, x2, x3 with 0 and 1 terminals]
22BDDs in a Nutshell (Bryant 86)
- Ordered Binary Decision Diagrams
- Data structure for Boolean functions
- Functions are represented as (unique) DAGs
- Also achieve sharing across functions
[Figure: decision diagrams over x1, x2, x3, reduced by eliminating duplicate terminals, duplicate nonterminals, and redundant tests]
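The three reductions named on the slide can be sketched with a unique table (hash-consing). This Python class is a simplified illustration, not a production BDD package; the names `BDD`, `mk` and `from_function` are assumptions, and the construction by exhaustive expansion is exponential and only meant for small examples.

```python
class BDD:
    """Reduced ordered BDD nodes built through a unique table."""

    def __init__(self):
        self.unique = {}                 # (var, low, high) -> node id
        self.nodes = {0: None, 1: None}  # ids 0 and 1 are the only terminals

    def mk(self, var, low, high):
        if low == high:                  # redundant test: skip the node
            return low
        key = (var, low, high)
        if key not in self.unique:       # duplicate nonterminal: reuse it
            node = len(self.nodes)
            self.nodes[node] = key
            self.unique[key] = node
        return self.unique[key]

    def from_function(self, fn, num_vars, var=1, env=()):
        """Expand fn over x1..xn by cases on each variable in order."""
        if var > num_vars:
            return 1 if fn(*env) else 0
        low = self.from_function(fn, num_vars, var + 1, env + (False,))
        high = self.from_function(fn, num_vars, var + 1, env + (True,))
        return self.mk(var, low, high)
```

Because every node goes through `mk`, two Boolean functions that are equal end up with the same node id, which is also what gives sharing across functions stored in the same table.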
23Encoding Structures Using Integers
- Static encoding of
- Predicates
- Kleene values
- Dynamic encoding of nodes
- 0, 1, ..., n-1
- Encode predicate p's values as
- ep(p) . en(u1) . en(u2) . ... . en(un) . ek(Kleene)
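A small Python sketch of this concatenated encoding. The bit widths, the big-endian bit order, and the two-bit Kleene codes (0 as 00, 1 as 01, 1/2 as 10) are assumptions chosen for illustration; the slide does not specify them.

```python
def bits(value, width):
    """Fixed-width big-endian bit tuple of a non-negative integer."""
    return tuple((value >> i) & 1 for i in reversed(range(width)))

def encode_entry(pred_index, node_indices, kleene, pred_bits, node_bits):
    """Concatenate ep(p) . en(u1) ... en(uk) . ek(value) into one bit tuple."""
    kleene_code = {0: 0, 1: 1, 0.5: 2}[kleene]  # assumed Kleene encoding
    out = bits(pred_index, pred_bits)
    for u in node_indices:
        out += bits(u, node_bits)
    return out + bits(kleene_code, 2)
```

Each resulting bit tuple is one assignment to the Boolean variables, so a structure becomes a set of such tuples and can be stored as the characteristic function of that set.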
24BDD Representation of Integer Sets
- Characteristic function
- S = {1, 5}: 1 = <001>, 5 = <101>
- χS = (¬x1 ∧ ¬x2 ∧ x3) ∨ (x1 ∧ ¬x2 ∧ x3)
26BDD Representation Example
[Figure: structure S0 (u1 with y=1, summary node u with sm=½, n=½ edges) and its BDD]
27BDD Representation Example
[Figure: structures S0 and S1 (S1 obtained from S0 by x = y, so u1 has x=1 and y=1) and their BDDs]
28BDD Representation Example
[Figure: structures S0, S1 (after x = y) and S2.2 (after x = x->n, with u1 y=1, u.1 x=1, summary node u.0 sm=½, and n=1 and n=½ edges) and their BDDs]
30Improved BDD Representation
- Using this representation directly doesn't save space: canonicity doesn't carry over from propositional to first-order logic
- Observation: node names can be arbitrarily remapped without affecting the ADT semantics
- Our heuristics: use canonic node names to encode nodes and obtain a canonic representation
- Increases incidental sharing
- Reduces isomorphism test to pointer comparison
- Space reduced by a factor of 4-10
31Reducing Time Overhead
- Current implementation not optimized
- Expensive formula evaluation
- Hybrid representation
- Distinguish between phases: mutable phase → Join → immutable phase
- Dynamically switch representations
32Functional Representation
- Alternative representation for first-order structures
- Structures represented by maps from integers to Kleene values
- Tailored for representing first-order structures
- Achieves better results than BDDs
- Techniques similar to the BDD representation
- More details in the thesis
33Introduction to Functional Maps
34Introduction to Functional Maps
35Introduction to Functional Maps
36Introduction to Functional Maps
37Functional Representation Example
[Figure: functional-map representation of a structure with u1 (y=1), summary node u (sm=½), and n=½ edges]
38Functional Representation Example
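[Figure: functional-map representations of two structures, one with u1 (y=1) and one with u1 (x=1, y=1), each with summary node u (sm=½) and n=½ edges]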
39Functional Representation Example
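[Figure: functional-map representations of three structures: u1 (y=1) with summary node u (sm=½); u1 (x=1, y=1) with summary node u (sm=½); and u1 (y=1), u.1 (x=1), summary node u.0 (sm=½) with n=1 and n=½ edges]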
40Reducing Time Overhead
- Lazy normalization is used to balance
time/space performance
41Empirical Evaluation
- Benchmarks
- Cleanness Analysis (SAS 2000)
- Garbage Collector
- CMP (PLDI 2002) of Java Front-End and Kernel Benchmarks
- Mobile Ambients (ESOP 2000)
- Stress testing the representations
- We use relational analysis
- Save structures in every CFG location
42Space Results
43Space Results
44Abstract Counters
- Ignore language/implementation details
- A more reliable measurement technique
- Count only crucial space information
- Independent of C/Java
45Abstract Counters Results
46Trends in the Cleanness Analysis Benchmark
47Conclusions
- Two novel representations of first-order structures
- New BDD representation
- New representation using functional maps
- Implementation techniques
- Substantially better than inherited sharing
- Structure canonization is crucial
- Normalization via hash-consing is the key
technique
48Conclusions
- The use of BDDs for static analysis is not a panacea for space saving
- Domain-specific encoding is crucial for saving space
- Failed attempts
- Original implementation of Veith's encoding
- PAG
49Tuning Abstraction for Improved Performance
- Analysis can be very costly
- Explores many structures: the GC example explores >180,000 structures
50Existing Analysis Modes
- Relational analysis
- Doubly-exponential in worst case
- Our most precise method
- Single-structure analysis (Tal Lev-Ami SAS 2000)
- Singly-exponential in worst case
- Can be very efficient
- Can be very imprecise
- Sometimes very inefficient
51Single-Structure Analysis
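[Figure: structure S0 (x points to u1, with an n edge to u) and structure S1 (x points to u1 only) are joined into the single structure S0 ⊔ S1, in which u may exist]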
52Single-Structure Analysis
- Active property
- ac = 0: doesn't exist in every concrete structure
- ac = 1: exists in every concrete structure
- ac = 1/2: may exist in some concrete structure
[Figure: S0 with u1 (ac=1) and u (ac=1), an n edge, and x; S1 with u1 (ac=1) and x; their join S0 ⊔ S1 with u1 (ac=1) and u (ac=1/2)]
53Single-Structure Analysis
- Sometimes overly imprecise
- Refine analysis by using nullary predicates to
distinguish between different structures
54Is there a sweet spot?
[Figure: efficiency versus precision, with relational analysis marked]
55Chapter Outline
- Removing embedded structures
- Merging structures with same set of canonical names
- Staged analysis to localize abstraction
- Merging pseudo-embedded structures
56Order Relations on Structures and Sets of
Structures
- S, S' ∈ 3-STRUCT: S ⊑ S' (via a mapping f) if for every predicate p
- p^S(u1, ..., uk) ⊑ p^S'(f(u1), ..., f(uk))
- (|{u | f(u) = u'}| > 1) ⊑ sm^S'(u')
- X, X' ∈ 2^3-STRUCT: X ⊑ X' if
- every S ∈ X has S' ∈ X' with S ⊑ S'
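For a given mapping f, the two embedding conditions can be checked directly. The Python sketch below assumes that predicate values are supplied as callables, that 1/2 is encoded as 0.5, and that f is already known; the function and parameter names are hypothetical.

```python
from itertools import product

def leq(a, b):
    """Information order on Kleene values: a is below b iff a == b or b == 1/2."""
    return a == b or b == 0.5

def embeds(f, nodes, preds, arity, val_s, val_t, sm_t):
    """Check that S embeds in S' under f (dict: node of S -> node of S').
    val_s(p, tuple) and val_t(p, tuple) evaluate p in S and S';
    sm_t(u) is the summary value of node u in S'."""
    for p in preds:
        for us in product(nodes, repeat=arity[p]):
            image = tuple(f[u] for u in us)
            if not leq(val_s(p, us), val_t(p, image)):
                return False
    for target in set(f.values()):
        # A target with several preimages must be marked as a summary node.
        many = sum(1 for u in nodes if f[u] == target) > 1
        if not leq(1 if many else 0, sm_t(target)):
            return False
    return True
```

The check is polynomial once f is fixed; the hard part in general is finding f.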
57Compacting Transformations
- We look for a transformation T: 2^3-STRUCT → 2^3-STRUCT with the following properties
- Compacting: |T(X)| ≤ |X|
- Conservative: T(X) ⊒ X
- Without sacrificing precision
58Removing Embedded Structures
[Figure: structures S1 and S0 with variables x, y, t, n edges, and an individual u1 labeled r[n,t], r[n,y]]
59Removing Embedded Structures
Reversing a list with exactly 3 cells
Reversing a list with at least 3 cells
[Figure: structures S1 and S0 with variables x, y, t, n edges, and an individual u1 labeled r[n,t], r[n,y]]
60Detecting Embedding is hard
- In general, as hard as GRAPH ISOMORPHISM
- Conditions for a unique mapping
- Canonical abstraction
- Definite values
- Polynomial time check
61Results (structures explored)
62Results (structures explored)
63Canonical Names Method
- Canonical abstraction merges individuals with same canonical names (unary abstraction predicate values)
- Merge structures with same set of canonical names
- Both transformations preserve definiteness of abstraction predicates
- But ignores precision of non-abstraction predicates
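Merging by canonical-name set can be sketched as grouping plus a pointwise join. The Python below is an illustration under assumptions: each structure is a pair of a frozenset of canonical names and a sparse value map, missing entries mean 0, and 1/2 is encoded as 0.5.

```python
def join(a, b):
    return a if a == b else 0.5

def merge_by_canonical_names(structures):
    """structures: list of (names, values) pairs, where names is a frozenset
    of canonical names and values maps (pred, name_tuple) to a Kleene value.
    Structures with equal name sets are joined into one structure."""
    merged = {}
    for names, values in structures:
        if names not in merged:
            merged[names] = dict(values)
            continue
        acc = merged[names]
        # Pointwise join over every entry stored in either structure.
        for key in set(acc) | set(values):
            acc[key] = join(acc.get(key, 0), values.get(key, 0))
    return list(merged.items())
```

Since the merged structures agree on their canonical names, the abstraction predicates keep their definite values; only other predicates, such as n in the test case, can drop to 1/2.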
64Canonical Abstraction Example
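[Figure: individuals u1, u2, u3, each labeled r[n,x], connected by n edges and pointed to by x, shown before and after canonical abstraction]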
65Merging Structures with Same Canonical Names
Example
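[Figure: structures S0 and S1 with the same set of canonical names (a node u labeled r[n,x], with x and n edges) and the merged structure S0 ⊔ S1]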
66Merging Structures with Same Canonical Names
Example
[Figure: structures S0 and S1 over individuals u0 and u, with x and n edges, and the merged structure S0 ⊔ S1]
67Results (structures explored)
68Localizing Abstraction
- Find an appropriate subset of abstraction predicates for every CFG node
- Observation: programs contain dead variables; exploit this to make the corresponding predicates dead
- Compute predicate liveness to determine the subset of abstraction predicates
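Variable liveness, from which predicate liveness would be derived, is the classic backward dataflow analysis. The Python sketch below runs it on a hand-written CFG for the reverse function from this deck; the node labels, the use/def sets, and the function name are assumptions made for the example, and this is plain variable liveness rather than the predicate-level analysis.

```python
def liveness(succ, use, define):
    """Backward may-liveness; returns the live-in set of every CFG node."""
    live_in = {n: set() for n in succ}
    changed = True
    while changed:
        changed = False
        for n in succ:
            live_out = set().union(*(live_in[s] for s in succ[n]))
            new_in = use[n] | (live_out - define[n])
            if new_in != live_in[n]:
                live_in[n] = new_in
                changed = True
    return live_in

# CFG of reverse: L1 y=NULL; L2 while(x!=NULL); L3 t=y; L4 y=x;
# L5 x=x->n; L6 y->n=t; L7 return y
succ = {"L1": ["L2"], "L2": ["L3", "L7"], "L3": ["L4"], "L4": ["L5"],
        "L5": ["L6"], "L6": ["L2"], "L7": []}
use = {"L1": set(), "L2": {"x"}, "L3": {"y"}, "L4": {"x"},
       "L5": {"x"}, "L6": {"y", "t"}, "L7": {"y"}}
define = {"L1": {"y"}, "L2": set(), "L3": {"t"}, "L4": {"y"},
          "L5": {"x"}, "L6": set(), "L7": set()}
live = liveness(succ, use, define)
```

A variable missing from `live[n]` is dead on entry to n, so the unary predicate for that variable need not be an abstraction predicate at that CFG node.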
69reverse Example
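List reverse(List x) {
    /* L0 */ List y, t;
    /* L1 */ y = NULL;
    /* L2 */ while (x != NULL) {
    /* L3 */     t = y;
    /* L4 */     y = x;
    /* L5 */     x = x->n;
    /* L6 */     y->n = t;
             }
    /* L7 */ return y;
}
[Slide annotations next to the code: y dead, t dead, all dead]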
70Results (structures explored)
71Compaction via Pseudo-Embedding
- Pseudo-embedding: similar to embedding, with respect to abstraction predicates
- S, S' ∈ 3-STRUCT: S ⊑ S' (via a mapping f) if for every abstraction predicate p
- p^S(u) ⊑ p^S'(f(u))
- (|{u | f(u) = u'}| > 1) ⊑ sm^S'(u')
72Modified blur
- Order relation on nodes: u1 ⊑ u2 if for every abstraction predicate p, p^S(u1) ⊑ p^S(u2)
- blur merges u1 with u2 if u1 ⊑ u2
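The node order used by the modified blur can be written as a short predicate. This Python sketch assumes unary predicate values are stored as nested dictionaries with missing entries meaning 0 and 1/2 encoded as 0.5; the function names are hypothetical.

```python
def leq(a, b):
    """Information order on Kleene values: a is below b iff a == b or b == 1/2."""
    return a == b or b == 0.5

def node_leq(u1, u2, abstraction_preds, unary):
    """u1 is below u2 iff every abstraction predicate is at least as
    precise on u1 as it is on u2."""
    return all(leq(unary[p].get(u1, 0), unary[p].get(u2, 0))
               for p in abstraction_preds)
```

Unlike the standard blur, which merges only nodes whose abstraction predicate values are equal, this order also lets a node with definite values be merged into a node where some of those values are 1/2.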
73blur Example
[Figure: a structure with nodes u0 and u, both labeled r[n,x], with x and n edges; after blur only u remains]
74Merging Pseudo-Embedded Structures Example
Abstraction predicates: x, y
Non-abstraction predicates: r[n,x], r[n,y], n
[Figure: structures S0 and S1 and their merge S0 ⊔ S1; node labels include u0 with r[n,x] and u with r[n,y], r[n,x]; in one structure u has r[n,y] = 1/2]
75Results (structures explored)
76Empirical Evaluation
- Benchmarks
- Garbage Collector
- Mobile Ambients (ESOP 2000)
- Sorting procedures (ISSTA 2000)
- MA J2 completed without instrumentation
predicates and without messages
77Results (structures explored)
Out of memory
Out of time
False alarms
78Conclusion
- New method is usually much more efficient (by orders of magnitude)
- Doesn't lose precision on benchmarks
- Performance more stable than other methods
79Future and Ongoing Work
- Time optimizations
- Symbolic (BDD) execution of TVLA operations
- Compactly represent sets of structures
- Improving abstraction locality
- Truly live predicates
- Analyzing liveness for core predicates and deriving for instrumentation predicates
- Experiment with other compacting transformations
- Achieve polynomial complexity
80The End