# Data Structures and Algorithms for Efficient Shape Analysis

By Roman Manevich. Prepared under the supervision of Dr. Shmuel (Mooly) Sagiv.

1
Data Structures and Algorithms for Efficient Shape Analysis
• by Roman Manevich
• Prepared under the supervision of Dr. Shmuel (Mooly) Sagiv

2
Motivation
• TVLA is a powerful and general abstract interpretation system
• Abstract interpretation in TVLA
• Operational semantics is expressed with first-order logic formulae with transitive closure (TC)
• Program states are represented as sets of evolving first-order structures
• Efficiency is an issue

3
Outline
• Shape Analysis quick intro
• Compactly representing structures
• Tuning abstraction to improve performance

4
What is Shape Analysis
• Determines shape invariants for imperative programs
• Can be used to verify a wide range of properties over different programming languages

5
reverse Example
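/* list.h */
typedef struct node {
    struct node *n;
    int data;
} *List;
/* print.c */
#include <stddef.h>
#include "list.h"
List reverse(List x) {
    List y, t;
    y = NULL;
    while (x != NULL) {
        t = y;      /* remember the already reversed prefix */
        y = x;      /* take the current head of the input list */
        x = x->n;   /* advance in the input list */
        y->n = t;   /* link the taken cell in front of the prefix */
    }
    return y;
}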
6
reverse Example
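[Figure: shape before, x points to the head of a list whose cells are linked by n; shape after, y points to the head of a list whose cells are linked by n.]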
7
Definition of a First-Order Logical Structure
• $S = \langle U, \iota \rangle$
• $U$: a set of individuals (node set)
• $\iota$: a mapping $p^{(r)} \mapsto (U^r \to \{0, 1\})$, the interpretation of $p$

8
Three-Valued Logic
• 1: True
• 0: False
• 1/2: Unknown
• A join semi-lattice: $0 \sqcup 1 = 1/2$
[Figure: information order, with 0 and 1 each below 1/2.]
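The three-valued operations can be sketched in a few lines of Python. This is only an illustration, not TVLA code: the function names are made up here, and the values 0, 0.5, and 1 stand for false, unknown, and true.

```python
# Kleene values (illustrative encoding): 0 = false, 0.5 = unknown, 1 = true.
FALSE, HALF, TRUE = 0, 0.5, 1

def k_and(a, b):
    """Three-valued conjunction: the minimum in the order 0 < 1/2 < 1."""
    return min(a, b)

def k_or(a, b):
    """Three-valued disjunction: the maximum in the order 0 < 1/2 < 1."""
    return max(a, b)

def k_not(a):
    """Negation swaps 0 and 1 and leaves 1/2 unchanged."""
    return 1 - a

def k_join(a, b):
    """Information-order join: equal values are kept, conflicting ones become 1/2."""
    return a if a == b else HALF

def k_leq(a, b):
    """Information order: a is below b if they are equal or b is unknown."""
    return a == b or b == HALF
```

Note that two different orders are in play: the truth order (used by `k_and` and `k_or`) and the information order (used by `k_join` and `k_leq`), in which 1/2 is the top element.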
9
Canonical Abstraction
• Partition the individuals into equivalence classes based on the values of their unary predicates
• Collapse other predicates via $\sqcup$
• $p^{S}(u'_1, \ldots, u'_k) = \bigsqcup \{\, p^{B}(u_1, \ldots, u_k) \mid f(u_1) = u'_1, \ldots, f(u_k) = u'_k \,\}$
• At most $3^n$ abstract individuals
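As a rough illustration of the definition above, the following Python sketch abstracts a small concrete list. The data layout (dictionaries keyed by predicate name) and all function names are assumptions made for this example, not TVLA's API; missing entries are taken to be 0.

```python
from itertools import product

HALF = 0.5

def join(values):
    """Information-order join: a single common value is kept, otherwise 1/2."""
    vals = set(values)
    return vals.pop() if len(vals) == 1 else HALF

def canonical_abstraction(nodes, unary, binary):
    """unary[p][u] and binary[p][(u, v)] hold predicate values; missing means 0."""
    preds = sorted(unary)
    # Canonical name of a node: its vector of unary predicate values.
    name = {u: tuple(unary[p].get(u, 0) for p in preds) for u in nodes}
    classes = {}
    for u in nodes:
        classes.setdefault(name[u], []).append(u)
    abs_nodes = sorted(classes)
    # A class with more than one member becomes a summary node (sm = 1/2).
    sm = {c: (HALF if len(classes[c]) > 1 else 0) for c in abs_nodes}
    # Binary predicates are collapsed with the join over all merged pairs.
    abs_binary = {
        p: {(c1, c2): join(binary[p].get((u, v), 0)
                           for u in classes[c1] for v in classes[c2])
            for c1, c2 in product(abs_nodes, repeat=2)}
        for p in binary
    }
    return abs_nodes, abs_binary, sm

# A three-cell list: x points to u1, and n links u1 -> u2 -> u3.
nodes = ["u1", "u2", "u3"]
unary = {"x": {"u1": 1}}
binary = {"n": {("u1", "u2"): 1, ("u2", "u3"): 1}}
abs_nodes, abs_binary, sm = canonical_abstraction(nodes, unary, binary)
```

Here u2 and u3 share the canonical name (0,) and are merged into one summary node, so the n-edges into and within that node become 1/2.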

10
Canonical Abstraction Example
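[Figure: a list with individuals u1, u2, u3, each labeled r[n,x], linked by n-edges and pointed to by x, shown together with its canonical abstraction.]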
11
Compactly Representing First-Order Logical Structures
• Space is a major bottleneck
• Analysis explores many logical structures
• Reduce space by sharing information across structures

12
Desired Properties
• Sparse data structures
• Share common sub-structures
• Inherited sharing
• Incidental sharing due to program invariants
• But feasible time performance
• Phase sensitive data structures

13
Chapter Outline
• Background
• First-order structure representations
• Base representation (TVLA 0.91)
• BDD representation
• Empirical evaluation
• Conclusion

14
First-Order Logical Structures
• Generalize shape graphs
• Arbitrary set of individuals
• Arbitrary set of predicates on individuals
• Dynamically evolving
• Usually small changes
• Properties are extracted by evaluating first-order formulae, e.g. $\exists v_1, v : x(v_1) \land n(v_1, v)$
• Join operator requires isomorphism testing

15
• Structure new() /* empty structure */
• SetOfNodes nodeSet(Structure)
• Node newNode(Structure)
• removeNode(Structure, node)
• Kleene eval(Structure, p(r), ⟨u1, ..., ur⟩)
• update(Structure, p(r), ⟨u1, ..., ur⟩, Kleene)
• Structure copy(Structure)
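One way to picture this interface is a sparse two-level map from predicates to node tuples to Kleene values. The Python class below is a minimal sketch under that assumption; the method names mirror the list above, but the class itself is hypothetical and is not TVLA's implementation.

```python
import copy

class Structure:
    """Sparse two-level map: predicate -> (node tuple -> Kleene value); 0 is implicit."""

    def __init__(self):
        self.nodes = set()
        self.preds = {}
        self._next = 0

    def new_node(self):
        u = self._next
        self._next += 1
        self.nodes.add(u)
        return u

    def remove_node(self, u):
        self.nodes.discard(u)
        for table in self.preds.values():
            for tup in [t for t in table if u in t]:
                del table[tup]

    def eval(self, p, tup):
        return self.preds.get(p, {}).get(tuple(tup), 0)

    def update(self, p, tup, value):
        table = self.preds.setdefault(p, {})
        if value == 0:
            table.pop(tuple(tup), None)   # storing only non-zero values keeps the map sparse
        else:
            table[tuple(tup)] = value

    def copy(self):
        return copy.deepcopy(self)

# Replaying an assignment x = y: every node gets x's value from y in the copy.
s0 = Structure()
u1, u = s0.new_node(), s0.new_node()
s0.update("y", (u1,), 1)
s0.update("sm", (u,), 0.5)
s1 = s0.copy()
for node in sorted(s0.nodes):
    s1.update("x", (node,), s0.eval("y", (node,)))
```

The deep copy here is the simplest possible choice; a copy-on-write scheme would share unchanged predicate tables between `s0` and `s1`.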

16
print_all Example
/* list.h */
typedef struct node {
    struct node *n;
    int data;
} *L;
/* print.c */
#include <stdio.h>
#include "list.h"
void print_all(L y) {
    L x;
    x = y;
    while (x != NULL) {
        /* assert(x != NULL) */
        printf("elem=%d", x->data);
        x = x->n;
    }
}
17
print_all Example

• S0: u1 (y=1), u (sm=½)
• Statement x = y, with predicate update x(v) = y(v)
• copy(S0) = S1
• nodeset(S0) = {u1, u}
• eval(S0, y, u1) = 1, so update(S1, x, u1, 1) sets x=1 on u1
• eval(S0, y, u) = 0, so update(S1, x, u, 0)
18
print_all Example

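• S1: u1 (x=1, y=1), u (sm=½)
• while (x != NULL): precondition $\exists v : x(v)$
• x = x->n: focus formula $\exists v_1 : x(v_1) \land n(v_1, v)$; update $x(v) = \exists v_1 : x(v_1) \land n(v_1, v)$
• S2.0: u1 (y=1), u (sm=½)
• S2.1: u1 (y=1), u (x=1), with an n-edge of value 1
• S2.2: u1 (y=1), u.0 (sm=½), u.1 (x=1), with an n-edge of value 1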
19
Overview and Main Results
• Two novel representations of first-order
structures
• New BDD representation
• New representation using functional maps
• Implementation techniques
• Empirical evaluation
• Comparison of different representations
• Space is reduced by a factor of 4-10
• New representations scale better

20
Base Representation (Tal Lev-Ami SAS 2000)
• Two-level map: Predicate → (Node Tuple → Kleene)
• Sparse Representation
• Limited inherited sharing by Copy-On-Write

21
BDDs in a Nutshell (Bryant 86)
• Ordered Binary Decision Diagrams
• Data structure for Boolean functions
• Functions are represented as (unique) DAGs

f  x3 x2 x1
0  0  0  0
0  1  0  0
0  0  1  0
1  1  1  0
0  0  0  1
1  1  0  1
0  0  1  1
1  1  1  1
[Figure: decision tree for f that tests x1, then x2, then x3, with 0/1 terminals at the leaves.]
22
BDDs in a Nutshell (Bryant 86)
• Ordered Binary Decision Diagrams
• Data structure for Boolean functions
• Functions are represented as (unique) DAGs
• Also achieve sharing across functions

[Figure: decision diagram over x1, x2, x3 reduced in three steps, labeled Duplicate Terminals, Duplicate Nonterminals, and Redundant Tests.]
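The reduction steps can be sketched with a small "unique table" in Python. This is a toy construction by exhaustive Shannon expansion, meant only to show why the result is unique and shared; the class and its methods are invented for this example and do not correspond to any BDD library. The function f = (x1 ∨ x2) ∧ x3 is used because it agrees with the truth table on the earlier slide.

```python
class BDD:
    """Toy reduced ordered BDD; node ids 0 and 1 are the two terminals."""

    def __init__(self):
        self.unique = {}                  # (var, low, high) -> node id
        self.nodes = {0: None, 1: None}

    def mk(self, var, low, high):
        if low == high:                   # redundant test: skip the node
            return low
        key = (var, low, high)
        if key not in self.unique:        # duplicate nonterminals map to one id
            node_id = len(self.nodes)
            self.nodes[node_id] = key
            self.unique[key] = node_id
        return self.unique[key]

    def build(self, f, num_vars, var=1, env=()):
        """Shannon expansion of f over x1..xn in that fixed order."""
        if var > num_vars:
            return int(bool(f(*env)))     # only two terminals ever exist
        low = self.build(f, num_vars, var + 1, env + (0,))
        high = self.build(f, num_vars, var + 1, env + (1,))
        return self.mk(var, low, high)

bdd = BDD()
f = lambda x1, x2, x3: (x1 or x2) and x3
h = lambda x1, x2, x3: (x1 and x3) or (x2 and x3)   # same function, written differently
g = lambda x1, x2, x3: x2 and x3
root_f = bdd.build(f, 3)
root_h = bdd.build(h, 3)
root_g = bdd.build(g, 3)
```

Because equal sub-diagrams always receive the same id, equivalent functions end up with the same root, and `g` reuses a node already created for `f` instead of allocating new ones.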
23
Encoding Structures Using Integers
• Static encoding of
• Predicates
• Kleene values
• Dynamic encoding of nodes
• 0, 1, ..., n-1
• Encode predicate p's values as
• ep(p) . en(u1) . en(u2) . ... . en(un) . ek(Kleene)
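The concatenation scheme can be illustrated as follows. The bit widths, the padding of short tuples with node 0, and the 2-bit Kleene code are all assumptions chosen for this Python sketch; the slides do not specify them.

```python
def encode(pred_id, nodes, kleene, pred_bits, node_bits, arity):
    """Concatenate fixed-width binary codes: predicate, each node, then the Kleene value."""
    kleene_code = {0: "00", 0.5: "01", 1: "10"}        # assumed 2-bit code
    padded = list(nodes) + [0] * (arity - len(nodes))  # assumed padding for lower arities
    bits = format(pred_id, f"0{pred_bits}b")
    bits += "".join(format(u, f"0{node_bits}b") for u in padded)
    bits += kleene_code[kleene]
    return int(bits, 2)

# Predicate 2 holds (value 1) on node 1, with 2-bit fields and maximal arity 2:
# "10" . "01" . "00" . "10"
code = encode(2, (1,), 1, pred_bits=2, node_bits=2, arity=2)
```

Each (predicate, tuple, value) fact thus becomes one integer, so a whole structure becomes a set of integers.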

24
BDD Representation of Integer Sets
• Characteristic function
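• $S = \{1, 5\}$, $1 = \langle 001 \rangle$, $5 = \langle 101 \rangle$, $\chi_S = (\lnot x_1 \land \lnot x_2 \land x_3) \lor (x_1 \land \lnot x_2 \land x_3)$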

25
BDD Representation of Integer Sets
• Characteristic function
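• $S = \{1, 5\}$, $1 = \langle 001 \rangle$, $5 = \langle 101 \rangle$, $\chi_S = (\lnot x_1 \land \lnot x_2 \land x_3) \lor (x_1 \land \lnot x_2 \land x_3)$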
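A short Python check of this characteristic function, assuming x1 is the most significant bit; the helper names are made up for the example.

```python
from itertools import product

def bits_of(value, width):
    """Most-significant-bit-first binary digits, so 5 with width 3 is (1, 0, 1)."""
    return tuple((value >> (width - 1 - i)) & 1 for i in range(width))

def characteristic(int_set, width):
    """Return the characteristic function of int_set as a predicate over `width` bits."""
    members = {bits_of(v, width) for v in int_set}
    return lambda *xs: tuple(xs) in members

def formula(x1, x2, x3):
    """The formula from the slide, written with Python Boolean operators."""
    return bool((not x1 and not x2 and x3) or (x1 and not x2 and x3))

chi = characteristic({1, 5}, 3)
agree = all(chi(*xs) == formula(*xs) for xs in product((0, 1), repeat=3))
```

A BDD for `chi` is then a compact representation of the set {1, 5} itself.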

26
BDD Representation Example

[Figure: BDD encoding structure S0, which has u1 (y=1) and u (sm=½).]
27
BDD Representation Example

[Figure: BDDs for S0 (u1 with y=1, u with sm=½) and for S1 (u1 with x=1 and y=1, u with sm=½), obtained from S0 by x = y.]
28
BDD Representation Example
[Figure: BDDs for S0, for S1 (after x = y), and for S2.2 (after x = x->n; u1 with y=1, u.0 with sm=½, u.1 with x=1, n=1).]
29
BDD Representation Example
[Figure: BDDs for S0, for S1 (after x = y), and for S2.2 (after x = x->n; u1 with y=1, u.0 with sm=½, u.1 with x=1, n=1).]
30
Improved BDD Representation
• Using this representation directly doesn't save space: canonicity doesn't carry over from propositional to first-order logic
• Observation
• Node names can be arbitrarily remapped without
• Our heuristics
• Use canonic node names to encode nodes and obtain a canonic representation
• Increases incidental sharing
• Reduces isomorphism test to pointer comparison
• 4-10x space reduction

31
• Current implementation not optimized
• Expensive formula evaluation
• Hybrid representation
• Distinguish between phases: mutable phase → Join → immutable phase
• Dynamically switch representations

32
Functional Representation
• Alternative representation for first-order
structures
• Structures represented by maps from integers to
Kleene values
• Tailored for representing first-order structures
• Achieves better results than BDDs
• Techniques similar to the BDD representation
• More details in the thesis

33
Introduction to Functional Maps
• A mapping $\mathbb{N} \to \{0, 1/2, 1\}$
[Figure: example map with 0 ↦ ½, 1 ↦ 0, 2 ↦ 1.]
34
Introduction to Functional Maps
• Sparse maps
[Figure: tree of maps with nodes labeled size 27 and size 9; the leaves hold keys 0-2 with values ½, 0, 1, keys 3-5 with values 0, 0, 0, and keys 6-8 with values ½, 0, 1.]
35
Introduction to Functional Maps
• Share unique sub-maps
[Figure: the same tree with nodes labeled size 27 and size 9; the leaves shown hold keys 0-2 with values ½, 0, 1 and keys 6-8 with values ½, 0, 1.]
36
Introduction to Functional Maps
• Share unique sub-maps
[Figure: the same tree with nodes labeled size 27 and size 9; a single shared leaf holds the values ½, 0, 1 for keys 0-2.]
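A functional map with shared sub-maps can be sketched as a hash-consed ternary tree. The Python below is an assumption-laden illustration (nested tuples as nodes, sizes that are powers of 3, invented function names), using the key/value data from the figures above for a map of size 9.

```python
_table = {}   # hash-consing table for sub-maps

def node(children):
    """Hash-consing constructor: structurally equal sub-maps become one shared object."""
    key = tuple(children)
    return _table.setdefault(key, key)

def build(values):
    """Build a map over keys 0..len(values)-1; len(values) must be a power of 3."""
    if len(values) == 3:
        return node(values)
    third = len(values) // 3
    return node(build(values[i * third:(i + 1) * third]) for i in range(3))

def lookup(m, key, size):
    while size > 3:
        size //= 3
        m, key = m[key // size], key % size
    return m[key]

def update(m, key, value, size):
    """Functional update: returns a new map and shares every untouched sub-map."""
    if size == 3:
        return node(m[:key] + (value,) + m[key + 1:])
    size //= 3
    i = key // size
    return node(m[:i] + (update(m[i], key % size, value, size),) + m[i + 1:])

# Keys 0-2 and 6-8 hold (1/2, 0, 1); keys 3-5 hold (0, 0, 0).
m = build([0.5, 0, 1, 0, 0, 0, 0.5, 0, 1])
m2 = update(m, 4, 1, 9)
```

The two equal leaves of `m` are one object, and `m2` reuses the untouched leaves of `m`, which is the inherited and incidental sharing the slides describe.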
37
Functional Representation Example

[Figure: functional representation of S0 (u1 with y=1, u with sm=½), split into binary, unary, and nullary maps with nodes of size 27 and size 9; unary leaves (y, x, sm) = (1, 0, 0) and (0, 0, ½); binary leaf n = ½.]
38
Functional Representation Example

[Figure: functional representations of S0 (u1 with y=1, u with sm=½) and S1 (u1 with x=1 and y=1, u with sm=½), each split into binary, unary, and nullary maps with nodes of size 27 and size 9; unary leaves (y, x, sm) = (1, 0, 0), (0, 0, ½), and (1, 1, 0); binary leaf n = ½.]
39
Functional Representation Example

[Figure: functional representations of S0, S1, and S2.2 (u1 with y=1, u.0 with sm=½, u.1 with x=1, n=1), each split into binary, unary, and nullary maps with nodes of size 81, 27, and 9; unary leaves (y, x, sm) = (1, 0, 0), (0, 0, ½), (0, 1, 0), and (1, 1, 0); binary leaves n = ½ and n = 1.]
40
• Lazy normalization is used to balance time/space performance

41
Empirical Evaluation
• Benchmarks
• Cleanness Analysis (SAS 2000)
• Garbage Collector
• CMP (PLDI 2002) of Java Front-End and Kernel
Benchmarks
• Mobile Ambients (ESOP 2000)
• Stress testing the representations
• We use relational analysis
• Save structures in every CFG location

42
Space Results
43
Space Results
44
Abstract Counters
• Ignore language/implementation details
• A more reliable measurement technique
• Count only crucial space information
• Independent of C/Java

45
Abstract Counters Results
46
Trends in the Cleanness Analysis Benchmark
47
Conclusions
• Two novel representations of first-order
structures
• New BDD representation
• New representation using functional maps
• Implementation techniques
• Substantially better than inherited sharing
• Structure canonization is crucial
• Normalization via hash-consing is the key
technique

48
Conclusions
• The use of BDDs for static analysis is not a
panacea for space saving
• Domain-specific encoding crucial for saving space
• Failed attempts
• Original implementation of Veith's encoding
• PAG

49
Tuning Abstraction for Improved Performance
• Analysis can be very costly
• Explores many structures: the GC example explores >180,000 structures

50
Existing Analysis Modes
• Relational analysis
• Doubly-exponential in worst case
• Our most precise method
• Single-structure analysis (Tal Lev-Ami SAS 2000)
• Singly-exponential in worst case
• Can be very efficient
• Can be very imprecise
• Sometimes very inefficient

51
Single-Structure Analysis
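[Figure: S0 with nodes u1 and u, pointer x, and an n-edge; S1 with node u1 and pointer x; and their join S0 ⊔ S1 with nodes u1 and u, annotated "May exist".]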
52
Single-Structure Analysis
• Active property
• ac = 0: doesn't exist in any concrete structure
• ac = 1: exists in every concrete structure
• ac = 1/2: may exist in some concrete structure
[Figure: S0 with u1 (ac=1) and u (ac=1), S1 with u1 (ac=1), and the join S0 ⊔ S1 with u1 (ac=1) and u (ac=1/2); x and n label the pointers.]
53
Single-Structure Analysis
• Sometimes overly imprecise
• Refine analysis by using nullary predicates to
distinguish between different structures

54
Is there a sweet spot?
[Figure: efficiency versus precision axes, with Relational Analysis marked.]
55
Chapter Outline
• Removing embedded structures
• Merging structures with same set of canonical
names
• Staged analysis to localize abstraction
• Merging pseudo-embedded structures

56
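Order Relations on Structures and Sets of Structures
• $S, S' \in \text{3-STRUCT}$: $S \sqsubseteq^{f} S'$ if for every predicate $p$
• $p^{S}(u_1, \ldots, u_k) \sqsubseteq p^{S'}(f(u_1), \ldots, f(u_k))$
• $(|\{u \mid f(u) = u'\}| > 1) \sqsubseteq sm^{S'}(u')$
• $X, X' \in 2^{\text{3-STRUCT}}$: $X \sqsubseteq X'$ if
• every $S \in X$ has $S' \in X'$ with $S \sqsubseteq S'$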
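Given a candidate mapping f, the two embedding conditions can be checked directly. The Python sketch below assumes a simple dictionary layout for structures (missing entries mean 0) and treats the summary predicate sm separately; the names are illustrative only.

```python
from itertools import product

def leq(a, b):
    """Information order on Kleene values: a is below b if equal or b is 1/2."""
    return a == b or b == 0.5

def is_embedding(s, s2, f, arities):
    """Check that f embeds s into s2 for the predicates listed in `arities`."""
    for p, k in arities.items():
        for tup in product(s["nodes"], repeat=k):
            lhs = s["preds"].get(p, {}).get(tup, 0)
            rhs = s2["preds"].get(p, {}).get(tuple(f[u] for u in tup), 0)
            if not leq(lhs, rhs):
                return False
    for target in s2["nodes"]:
        # A node that absorbs more than one individual must be a summary node.
        merged = 1 if sum(1 for u in s["nodes"] if f[u] == target) > 1 else 0
        if not leq(merged, s2["preds"].get("sm", {}).get((target,), 0)):
            return False
    return True

# A concrete three-cell list and an abstract list with a summary tail.
concrete = {"nodes": ["a", "b", "c"],
            "preds": {"x": {("a",): 1}, "n": {("a", "b"): 1, ("b", "c"): 1}}}
abstract = {"nodes": ["h", "t"],
            "preds": {"x": {("h",): 1}, "sm": {("t",): 0.5},
                      "n": {("h", "t"): 0.5, ("t", "t"): 0.5}}}
not_summary = {"nodes": ["h", "t"],
               "preds": {"x": {("h",): 1},
                         "n": {("h", "t"): 0.5, ("t", "t"): 0.5}}}
f = {"a": "h", "b": "t", "c": "t"}
arities = {"x": 1, "n": 2}
```

Finding f is the hard part in general; this check only verifies a mapping that is already known.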

57
Compacting Transformations
• We look for a transformation $T : 2^{\text{3-STRUCT}} \to 2^{\text{3-STRUCT}}$ with the following properties
• Compacting: $|T(X)| \le |X|$
• Conservative: $T(X) \sqsupseteq X$
• Without sacrificing precision

58
Removing Embedded Structures
[Figure: structures S1 and S0 over pointers x, y, t and n-edges, with node u1 labeled r[n,t] and r[n,y].]
59
Removing Embedded Structures
[Figure: structures S1 and S0 over pointers x, y, t and n-edges, with node u1 labeled r[n,t] and r[n,y]; captions: "Reversing a list with exactly 3 cells" and "Reversing a list with at least 3 cells".]
60
Detecting Embedding is hard
• In general, as hard as GRAPH ISOMORPHISM
• Conditions for a unique mapping
• Canonical abstraction
• Definite values
• Polynomial time check

61
Results (structures explored)
62
Results (structures explored)
63
Canonical Names Method
• Canonical abstraction merges individuals with the same canonical names (unary abstraction predicate values)
• Merge structures with the same set of canonical names
• Both transformations preserve definiteness of abstraction predicates
• But ignores precision of non-abstraction predicates

64
Canonical Abstraction Example
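[Figure: a list with individuals u1, u2, u3, each labeled r[n,x], linked by n-edges and pointed to by x, shown together with its canonical abstraction.]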
65
Merging Structures with Same Canonical Names Example
[Figure: structures S0 and S1 over pointer x and n-edges, with node u labeled r[n,x], and their merge S0 ⊔ S1.]
66
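Merging Structures with Same Canonical Names Example
[Figure: structures S0 and S1, each with nodes u0 and u and pointer x, and their merge S0 ⊔ S1 with nodes u0 and u; n-edges appear in S0 and in the merge.]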
67
Results (structures explored)
68
Localizing Abstraction
• Find an appropriate subset of abstraction
predicates for every CFG node
• Observation: programs contain dead variables; exploit this to make the corresponding predicates dead
• Compute predicate liveness to determine the subset of abstraction predicates

69
reverse Example
List reverse(List x) {
    /* L0 */ List y, t;
    /* L1 */ y = NULL;
    /* L2 */ while (x != NULL) {
    /* L3 */     t = y;
    /* L4 */     y = x;
    /* L5 */     x = x->n;
    /* L6 */     y->n = t;
             }
    /* L7 */ return y;
}
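To make the liveness idea concrete, here is a standard backward live-variable analysis in Python, applied to the labeled reverse program. The control-flow graph and the use/def sets are written out by hand for this example (L0, a bare declaration, is omitted), and the mapping from live variables to abstraction predicates is only the intuition from the previous slide, not the tool's actual algorithm.

```python
def live_variables(succ, use, define):
    """Iterate live_in[n] = use[n] | (live_out[n] - define[n]) to a fixpoint."""
    live_in = {n: set() for n in succ}
    changed = True
    while changed:
        changed = False
        for n in succ:
            out = set().union(*(live_in[s] for s in succ[n]))
            new_in = use[n] | (out - define[n])
            if new_in != live_in[n]:
                live_in[n], changed = new_in, True
    return live_in

# Hand-written CFG of reverse: L2 is the loop test, L6 jumps back to L2.
succ = {"L1": ["L2"], "L2": ["L3", "L7"], "L3": ["L4"], "L4": ["L5"],
        "L5": ["L6"], "L6": ["L2"], "L7": []}
use = {"L1": set(), "L2": {"x"}, "L3": {"y"}, "L4": {"x"},
       "L5": {"x"}, "L6": {"y", "t"}, "L7": {"y"}}
define = {"L1": {"y"}, "L2": set(), "L3": {"t"}, "L4": {"y"},
          "L5": {"x"}, "L6": set(), "L7": set()}
live = live_variables(succ, use, define)
```

In this run t is dead on entry to L2 and L3, and y is dead on entry to L4, so the predicates for those variables would not need to act as abstraction predicates at those locations.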
70
Results (structures explored)
71
Compaction via Pseudo-Embedding
• Pseudo-embedding: similar to embedding, but with respect to abstraction predicates
• $S, S' \in \text{3-STRUCT}$: $S \sqsubseteq^{f} S'$ if for every abstraction predicate $p$
• $p^{S}(u) \sqsubseteq p^{S'}(f(u))$
• $(|\{u \mid f(u) = u'\}| > 1) \sqsubseteq sm^{S'}(u')$

72
Modified blur
• Order relation on nodes: $u_1 \sqsubseteq u_2$ if for every abstraction predicate $p$, $p^{S}(u_1) \sqsubseteq p^{S}(u_2)$
• blur merges $u_1$ with $u_2$ if $u_1 \sqsubseteq u_2$

73
blur Example
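[Figure: before blur, nodes u0 and u, each labeled r[n,x], with pointer x and n-edges; after blur, a single node u labeled r[n,x] with pointer x and an n-edge.]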
74
Merging Pseudo-Embedded Structures Example
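• Abstraction predicates: x, y
• Non-abstraction predicates: r[n,x], r[n,y], n
[Figure: S0 (u0 with r[n,x]; u with r[n,y] and r[n,x]), S1 (u with r[n,y] and r[n,x]), and the merge S0 ⊔ S1, in which u has r[n,y]=1/2 and r[n,x]; x, y, and n label the pointers.]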
75
Results (structures explored)
76
Empirical Evaluation
• Benchmarks
• Garbage Collector
• Mobile Ambients (ESOP 2000)
• Sorting procedures (ISSTA 2000)
• MA J2 completed without instrumentation
predicates and without messages

77
Results (structures explored)
Out of memory
Out of time
False alarms
78
Conclusion
• New method is usually much more efficient (by
orders of magnitude)
• Doesn't lose precision on benchmarks
• Performance more stable than other methods

79
Future and Ongoing Work
• Time optimizations
• Symbolic (BDD) execution of TVLA operations
• Compactly represent sets of structures
• Improving abstraction locality
• Truly live predicates
• Analyzing liveness for core predicates and
deriving for instrumentation predicates
• Experiment with other compacting transformations
• Achieve polynomial complexity

80
The End