Data Structures and Algorithms for Efficient Shape Analysis

1
Data Structures and Algorithms for Efficient
Shape Analysis
  • by Roman Manevich
  • Prepared under the supervision of Dr. Shmuel
    (Mooly) Sagiv

2
Motivation
  • TVLA is a powerful and general abstract
    interpretation system
  • Abstract interpretation in TVLA
  • Operational semantics is expressed with
    first-order logic with transitive closure (TC) formulae
  • Program states are represented as sets of
    Evolving First-Order Structures
  • Efficiency is an issue

3
Outline
  • Shape Analysis quick intro
  • Compactly representing structures
  • Tuning abstraction to improve performance

4
What is Shape Analysis
  • Determines Shape Invariants for imperative
    programs
  • Can be used to verify a wide range of properties
    over different programming languages

5
reverse Example
/* list.h */
typedef struct node { struct node *n; int data; } *List;
/* print.c */
#include "list.h"
List reverse(List x) {
    List y, t;
    y = NULL;
    while (x != NULL) {
        t = y;
        y = x;
        x = x->n;
        y->n = t;
    }
    return y;
}
6
reverse Example
[Figure: shape before, a list pointed to by x whose cells are linked by n; shape after, a list pointed to by y whose cells are linked by n]
7
Definition of a First-Order Logical Structure
  • $S = \langle U, \iota \rangle$
  • $U$: a set of individuals (node set)
  • $\iota$: a mapping $p^{(r)} \mapsto (U^r \to \{0, 1\})$, the
    interpretation of $p$

8
Three-Valued Logic
  • 1: True
  • 0: False
  • 1/2: Unknown
  • A join semi-lattice: $0 \sqcup 1 = 1/2$

[Figure: information order, with 1/2 above both 0 and 1]
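The join on this slide can be written down directly. The following is a minimal sketch, assuming Kleene values are encoded as the Python numbers 0, 0.5, and 1 (an encoding chosen here for illustration). The connectives k_and, k_or, and k_not are the standard Kleene ones; the slide itself only shows the join.

```python
from functools import reduce

# Kleene values: 0 (false), 1 (true), 0.5 (unknown).
FALSE, UNKNOWN, TRUE = 0, 0.5, 1

def join(a, b):
    """Least upper bound in the information order: equal values stay, otherwise 1/2."""
    return a if a == b else UNKNOWN

def join_all(values):
    """Join of a non-empty iterable of Kleene values."""
    return reduce(join, values)

def k_and(a, b):
    """Kleene conjunction: the minimum in the logical order 0 < 1/2 < 1."""
    return min(a, b)

def k_or(a, b):
    """Kleene disjunction: the maximum in the logical order 0 < 1/2 < 1."""
    return max(a, b)

def k_not(a):
    """Kleene negation: swaps 0 and 1, leaves 1/2 unchanged."""
    return 1 - a
```

Note that the join uses the information order (1/2 on top), while conjunction and disjunction use the logical order (1/2 in the middle).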
9
Canonical Abstraction
  • Partition the individuals into equivalence
    classes based on the values of their unary
    predicates
  • Collapse other predicates via $\sqcup$
  • $p^S(u'_1, \ldots, u'_k) = \bigsqcup \{ p^B(u_1, \ldots, u_k) \mid f(u_1) = u'_1, \ldots, f(u_k) = u'_k \}$
  • At most $3^n$ abstract individuals, for $n$ unary abstraction predicates
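The two steps above (partition by unary predicate values, then join the remaining predicates over each equivalence class) can be sketched as follows. This is an illustration only: the dictionary layout, the function name canonical_abstraction, and the predicate name r_n_x are assumptions made here, and all unary predicates are treated as abstraction predicates.

```python
from itertools import product

def join(a, b):
    """Kleene join: equal values stay, otherwise 1/2."""
    return a if a == b else 0.5

def canonical_abstraction(nodes, unary, binary):
    """nodes: list of concrete individuals.
    unary: {pred: {node: value}}, binary: {pred: {(n1, n2): value}}; missing entries mean 0.
    Returns (classes, abs_binary, summary), all keyed by canonical names."""
    preds = sorted(unary)
    # f maps each individual to its canonical name: its vector of unary predicate values.
    f = {u: tuple(unary[p].get(u, 0) for p in preds) for u in nodes}
    classes = {}
    for u in nodes:
        classes.setdefault(f[u], []).append(u)
    # Collapse each binary predicate by joining over all pre-images of an abstract pair.
    abs_binary = {}
    for p, table in binary.items():
        abs_binary[p] = {}
        for a, b in product(classes, repeat=2):
            vals = [table.get((u, v), 0) for u in classes[a] for v in classes[b]]
            value = vals[0]
            for x in vals[1:]:
                value = join(value, x)
            abs_binary[p][(a, b)] = value
    # An abstract individual standing for more than one concrete one is a summary node.
    summary = {name: 0.5 if len(members) > 1 else 0 for name, members in classes.items()}
    return classes, abs_binary, summary

# A three-cell list u1 -> u2 -> u3 with x pointing to u1; every cell is reachable from x.
nodes = ["u1", "u2", "u3"]
unary = {"x": {"u1": 1}, "r_n_x": {"u1": 1, "u2": 1, "u3": 1}}
binary = {"n": {("u1", "u2"): 1, ("u2", "u3"): 1}}
classes, abs_n, summary = canonical_abstraction(nodes, unary, binary)
```

In this example u2 and u3 agree on all unary predicates, so they collapse into one summary individual, and the n edges touching it become 1/2.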

10
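Canonical Abstraction Example
[Figure: a list with nodes u1, u2, u3, each with r[n,x], linked by n edges, with x pointing into the list, and its canonical abstraction]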
11
Compactly Representing First-Order Logical
Structures
  • Space is a major bottleneck
  • Analysis explores many logical structures
  • Reduce space by sharing information across
    structures

12
Desired Properties
  • Sparse data structures
  • Share common sub-structures
  • Inherited sharing
  • Incidental sharing due to program invariants
  • But feasible time performance
  • Phase sensitive data structures

13
Chapter Outline
  • Background
  • First-order structure representations
  • Base representation (TVLA 0.91)
  • BDD representation
  • Empirical evaluation
  • Conclusion

14
First-Order Logical Structures
  • Generalize shape graphs
  • Arbitrary set of individuals
  • Arbitrary set of predicates on individuals
  • Dynamically evolving
  • Usually small changes
  • Properties are extracted by evaluating first-order
    formulae such as $\exists v_1, v : x(v_1) \land n(v_1, v)$
  • Join operator requires isomorphism testing

15
First-Order Structure ADT
  • Structure new() /* empty structure */
  • SetOfNodes nodeSet(Structure)
  • Node newNode(Structure)
  • removeNode(Structure, node)
  • Kleene eval(Structure, p(r), <u1, . . . , ur>)
  • update(Structure, p(r), <u1, . . . , ur>, Kleene)
  • Structure copy(Structure)
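To make the ADT concrete, here is a minimal sketch of one way to implement these operations over sparse dictionaries. It is not TVLA's implementation; the class layout and the snake_case method names are assumptions made for illustration, and Kleene values are the numbers 0, 0.5, and 1.

```python
class Structure:
    """Sparse first-order structure: only non-zero predicate values are stored."""

    def __init__(self):
        self.nodes = set()
        self.values = {}      # predicate name -> {node tuple -> Kleene value}
        self._next_id = 0

    def node_set(self):
        return set(self.nodes)

    def new_node(self):
        node = self._next_id
        self._next_id += 1
        self.nodes.add(node)
        return node

    def remove_node(self, node):
        self.nodes.discard(node)
        for table in self.values.values():
            for tup in [t for t in table if node in t]:
                del table[tup]

    def eval(self, pred, tup):
        return self.values.get(pred, {}).get(tuple(tup), 0)

    def update(self, pred, tup, value):
        table = self.values.setdefault(pred, {})
        if value == 0:
            table.pop(tuple(tup), None)   # keep the map sparse
        else:
            table[tuple(tup)] = value

    def copy(self):
        other = Structure()
        other.nodes = set(self.nodes)
        other.values = {p: dict(t) for p, t in self.values.items()}
        other._next_id = self._next_id
        return other

# The statement x = y, i.e. x'(v) = y(v), applied to a structure with
# u1 (y=1) and a summary node u (sm=1/2).
s0 = Structure()
u1, u = s0.new_node(), s0.new_node()
s0.update("y", (u1,), 1)
s0.update("sm", (u,), 0.5)
s1 = s0.copy()
for v in s0.node_set():
    s1.update("x", (v,), s0.eval("y", (v,)))
```

The copy here is a full copy, which is exactly the cost that the sharing-based representations later in the deck try to avoid.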

16
print_all Example
/* list.h */
typedef struct node { struct node *n; int data; } *L;
/* print.c */
#include <stdio.h>
#include "list.h"
void print_all(L y) {
    L x;
    x = y;
    while (x != NULL) {
        /* assert(x != NULL) */
        printf("elem=%d", x->data);
        x = x->n;
    }
}
17
print_all Example
S0: u (sm=½), u1 (y=1)
Statement x = y, with update formula x'(v) = y(v):
  • copy(S0) = S1
  • nodeset(S0) = {u1, u}
  • eval(S0, y, u1) = 1
  • update(S1, x, u1, 1), so u1 has x=1 in S1
  • eval(S0, y, u) = 0
  • update(S1, x, u, 0)
18
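print_all Example
S1: u1 (x=1, y=1), u (sm=½)
  • while (x != NULL): precondition $\exists v : x(v)$
  • x = x->n: focus formula $\exists v_1 : x(v_1) \land n(v_1, v)$; update formula $x'(v) = \exists v_1 : x(v_1) \land n(v_1, v)$
Resulting structures:
  • S2.0: u (sm=½), u1 (y=1)
  • S2.1: u1 (y=1), u (x=1), n=1
  • S2.2: u.0 (sm=½), u1 (y=1), u.1 (x=1), n=1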
19
Overview and Main Results
  • Two novel representations of first-order
    structures
  • New BDD representation
  • New representation using functional maps
  • Implementation techniques
  • Empirical evaluation
  • Comparison of different representations
  • Space is reduced by a factor of 4-10
  • New representations scale better

20
Base Representation (Tal Lev-Ami SAS 2000)
  • Two-level map: Predicate → (Node Tuple →
    Kleene)
  • Sparse Representation
  • Limited inherited sharing by Copy-On-Write
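The following sketch illustrates what a two-level map with copy-on-write sharing could look like. It is a hypothetical illustration, not the TVLA 0.91 code: the class name CowStructure and the per-predicate granularity of the copying are assumptions made here.

```python
class CowStructure:
    """Two-level map predicate -> (node tuple -> Kleene value) with per-predicate copy-on-write."""

    def __init__(self, tables=None):
        # The outer dict is private; the inner dicts may be shared with other structures.
        self.tables = dict(tables or {})
        self.owned = set()    # predicates whose inner map this structure may mutate in place

    def copy(self):
        # Cost is proportional to the number of predicates: inner maps are shared, not duplicated.
        self.owned.clear()    # after sharing, neither side may mutate in place
        return CowStructure(self.tables)

    def eval(self, pred, tup):
        return self.tables.get(pred, {}).get(tup, 0)

    def update(self, pred, tup, value):
        if pred not in self.owned:
            # First write since the last copy: duplicate only this predicate's map.
            self.tables[pred] = dict(self.tables.get(pred, {}))
            self.owned.add(pred)
        if value == 0:
            self.tables[pred].pop(tup, None)   # keep the map sparse
        else:
            self.tables[pred][tup] = value

s0 = CowStructure()
s0.update("y", (1,), 1)
s1 = s0.copy()
shared_before_write = s1.tables["y"] is s0.tables["y"]
s1.update("x", (1,), 1)      # touches only the map of x
s1.update("y", (2,), 0.5)    # now the map of y is duplicated for s1
```

Sharing of this kind is inherited only: two structures share a predicate map when one was copied from the other and neither has written to that predicate since. Structures that happen to be equal but were built independently share nothing, which is the limitation the slide points at.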

21
BDDs in a Nutshell (Bryant 86)
  • Ordered Binary Decision Diagrams
  • Data structure for Boolean functions
  • Functions are represented as (unique) DAGs

[Figure: ordered decision tree over the variables x1, x2, x3 with terminal values 0 and 1]
22
BDDs in a Nutshell (Bryant 86)
  • Ordered Binary Decision Diagrams
  • Data structure for Boolean functions
  • Functions are represented as (unique) DAGs
  • Also achieve sharing across functions

[Figure: decision diagrams over x1, x2, x3 reduced by eliminating duplicate terminals, duplicate nonterminals, and redundant tests]
23
Encoding Structures Using Integers
  • Static encoding of
  • Predicates
  • Kleene values
  • Dynamic encoding of nodes
  • 0, 1, ..., n-1
  • Encode predicate p's values as
  • ep(p) . en(u1) . en(u2) . ... . en(un) . ek(Kleene)
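As a small illustration of the concatenation above, the sketch below packs a predicate code, node numbers, and a Kleene value into one integer. The field widths and the 2-bit Kleene codes are assumptions chosen for the example; the slides do not state the actual encoding.

```python
# Hypothetical 2-bit encoding ek of the three Kleene values.
KLEENE_CODE = {0: 0b00, 0.5: 0b01, 1: 0b10}

def encode(pred_code, node_ids, node_bits, kleene):
    """Concatenate ep(p) . en(u1) . ... . en(uk) . ek(value) into one integer."""
    word = pred_code
    for node in node_ids:
        if not 0 <= node < (1 << node_bits):
            raise ValueError("node number does not fit in node_bits")
        word = (word << node_bits) | node
    return (word << 2) | KLEENE_CODE[kleene]

# Static predicate codes (hypothetical) and a fact n(u1, u0) = 1 with 2-bit node numbers.
PRED_CODE = {"x": 0, "y": 1, "n": 2, "sm": 3}
fact = encode(PRED_CODE["n"], [1, 0], node_bits=2, kleene=1)
```

With such an encoding a whole structure becomes a set of integers, which is what allows it to be stored as a BDD or as an integer-keyed map.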

24
BDD Representation of Integer Sets
  • Characteristic function
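  • $S = \{1, 5\}$, with $1 = \langle 001 \rangle$ and $5 = \langle 101 \rangle$:
    $\chi_S = (\lnot x_1 \land \lnot x_2 \land x_3) \lor (x_1 \land \lnot x_2 \land x_3)$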
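A small reduced ordered BDD builder makes the characteristic-function idea, and the sharing across functions mentioned earlier, concrete. This is a teaching sketch with a hypothetical BDD class, not a real BDD package: it enumerates all assignments, so it is exponential in the number of variables.

```python
class BDD:
    """Reduced ordered BDD nodes, hash-consed through a unique table."""

    def __init__(self, num_vars):
        self.num_vars = num_vars
        self.unique = {}          # (var, low, high) -> node id
        self.nodes = {}           # node id -> (var, low, high)

    def mk(self, var, low, high):
        if low == high:               # redundant test: both branches agree
            return low
        key = (var, low, high)
        if key not in self.unique:    # duplicate nonterminal: reuse the existing node
            node = len(self.unique) + 2   # ids 0 and 1 are the two terminals
            self.unique[key] = node
            self.nodes[node] = key
        return self.unique[key]

    def build(self, func, var=0, env=()):
        """Build the BDD of a Python predicate over num_vars booleans."""
        if var == self.num_vars:
            return 1 if func(*env) else 0
        low = self.build(func, var + 1, env + (False,))
        high = self.build(func, var + 1, env + (True,))
        return self.mk(var, low, high)

def set_bdd(bdd, members):
    """BDD of the characteristic function of an integer set; x1 is the most significant bit."""
    def chi(*bits):
        value = 0
        for b in bits:
            value = value * 2 + int(b)
        return value in members
    return bdd.build(chi)

bdd = BDD(3)
s15 = set_bdd(bdd, {1, 5})   # chi simplifies to (not x2) and x3, so the test on x1 disappears
s1 = set_bdd(bdd, {1})       # reuses the whole BDD of {1, 5} as its x1 = 0 branch
```

Both sets live in the same unique table, so the second BDD adds a single new node on top of the two nodes already created for the first.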

26
BDD Representation Example
[Figure: BDD for structure S0, which has u (sm=½) and u1 (y=1)]
27
BDD Representation Example
[Figure: shared BDD for S0 and S1 after x = y; S0 has u (sm=½) and u1 (y=1), S1 has u1 (x=1, y=1) and u (sm=½)]
28
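BDD Representation Example
[Figure: shared BDD for S0, S1, and S2.2 after x = y and x = x->n; S0 has u (sm=½) and u1 (y=1), S1 has u1 (x=1, y=1) and u (sm=½), S2.2 has u.0 (sm=½), u1 (y=1), u.1 (x=1), and n=1]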
30
Improved BDD Representation
  • Using this representation directly doesn't save
    space: canonicity doesn't carry over from
    propositional to first-order logic
  • Observation
  • Node names can be arbitrarily remapped without
    affecting the ADT semantics
  • Our heuristics
  • Use canonic node names to encode nodes and obtain
    a canonic representation
  • Increases incidental sharing
  • Reduces isomorphism test to pointer comparison
  • Space reduced by a factor of 4-10
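One plausible reading of "canonic node names" is sketched below: rename every node according to its canonical name (its vector of unary predicate values), then intern the resulting set of facts so that isomorphic structures become the same object. The function canonize and the fact-set layout are assumptions made for illustration; the heuristics used in the thesis may differ.

```python
_INTERNED = {}

def canonize(nodes, unary, binary):
    """Rename nodes to their rank in the sorted order of canonical names, then intern the result.
    Assumes canonical abstraction was applied, so no two nodes share a canonical name."""
    preds = sorted(unary)
    name = {u: tuple(unary[p].get(u, 0) for p in preds) for u in nodes}
    if len(set(name.values())) != len(name):
        raise ValueError("canonical names must be unique")
    rank = {u: i for i, u in enumerate(sorted(nodes, key=name.get))}
    facts = set()
    for p, table in unary.items():
        facts.update((p, (rank[u],), v) for u, v in table.items() if v != 0)
    for p, table in binary.items():
        facts.update((p, (rank[u], rank[w]), v) for (u, w), v in table.items() if v != 0)
    key = frozenset(facts)
    # Hash-consing: the first structure with these facts becomes the unique representative.
    return _INTERNED.setdefault(key, key)

# Two isomorphic structures whose nodes were named differently.
a = canonize(["p", "q"], {"y": {"p": 1}, "sm": {"q": 0.5}}, {"n": {("p", "q"): 0.5}})
b = canonize([7, 3], {"y": {3: 1}, "sm": {7: 0.5}}, {"n": {(3, 7): 0.5}})
# Same nodes and unary values, but no n edge.
c = canonize(["p", "q"], {"y": {"p": 1}, "sm": {"q": 0.5}}, {"n": {}})
```

After canonization the isomorphism test needed by the join operator is a single identity comparison (a is b), and equal structures produced at different program points share storage.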

31
Reducing Time Overhead
  • Current implementation not optimized
  • Expensive formula evaluation
  • Hybrid representation
  • Distinguish between phases: mutable phase → Join
    → immutable phase
  • Dynamically switch representations

32
Functional Representation
  • Alternative representation for first-order
    structures
  • Structures represented by maps from integers to
    Kleene values
  • Tailored for representing first-order structures
  • Achieves better results than BDDs
  • Techniques similar to the BDD representation
  • More details in the thesis

33
Introduction to Functional Maps
  • A mapping $\mathbb{N} \to \{0, 1/2, 1\}$

34
Introduction to Functional Maps
  • Sparse maps

35
Introduction to Functional Maps
  • Share unique sub-maps
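The three properties on these slides (a map from integers to Kleene values, sparse storage, and unique shared sub-maps) can be illustrated with a persistent binary trie whose nodes are hash-consed. This is a sketch under assumptions made here: the trie layout, the fixed 8-bit key width, and the names put and get are not taken from the thesis.

```python
_TABLE = {}

def _intern(node):
    """Hash-consing: return the unique stored copy of a trie node."""
    return _TABLE.setdefault(node, node)

EMPTY = None   # the all-zero map; zero entries are never stored (sparse)

def put(node, key, value, bits=8):
    """Return a new map equal to `node` except that key -> value; the input map is unchanged."""
    if bits == 0:
        return value if value != 0 else EMPTY
    low, high = node if node is not None else (EMPTY, EMPTY)
    if (key >> (bits - 1)) & 1:
        high = put(high, key, value, bits - 1)
    else:
        low = put(low, key, value, bits - 1)
    if low is EMPTY and high is EMPTY:
        return EMPTY
    return _intern((low, high))

def get(node, key, bits=8):
    """Look up a key; keys that were never stored map to 0."""
    while node is not None and bits > 0:
        bits -= 1
        node = node[(key >> bits) & 1]
    return 0 if node is None else node

m0 = put(put(EMPTY, 3, 1), 200, 0.5)
m1 = put(m0, 4, 1)                                   # m0 is still valid and unchanged
m2 = put(put(put(EMPTY, 200, 0.5), 4, 1), 3, 1)      # same contents as m1, built in another order
```

Because every node is interned, maps with equal contents are the same object regardless of how they were built, and an update copies only the path from the root to the changed key. Interning Python tuples rehashes whole sub-maps, so a real implementation would cache hash codes; that detail is omitted here.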

37
Functional Representation Example
[Figure: functional map for the structure with u (sm=½) and u1 (y=1)]
38
Functional Representation Example
[Figure: functional maps for two structures: one with u1 (x=1, y=1) and u (sm=½), the other with u (sm=½) and u1 (y=1)]
39
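Functional Representation Example
[Figure: functional maps for three structures: one with u1 (x=1, y=1) and u (sm=½); one with u.0 (sm=½), u1 (y=1), u.1 (x=1), and n=1; one with u (sm=½) and u1 (y=1)]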
40
Reducing Time Overhead
  • Lazy normalization is used to balance
    time/space performance

41
Empirical Evaluation
  • Benchmarks
  • Cleanness Analysis (SAS 2000)
  • Garbage Collector
  • CMP (PLDI 2002) of Java Front-End and Kernel
    Benchmarks
  • Mobile Ambients (ESOP 2000)
  • Stress testing the representations
  • We use relational analysis
  • Save structures in every CFG location

42
Space Results
43
Space Results
44
Abstract Counters
  • Ignore language/implementation details
  • A more reliable measurement technique
  • Count only crucial space information
  • Independent of C/Java

45
Abstract Counters Results
46
Trends in the Cleanness Analysis Benchmark
47
Conclusions
  • Two novel representations of first-order
    structures
  • New BDD representation
  • New representation using functional maps
  • Implementation techniques
  • Substantially better than inherited sharing
  • Structure canonization is crucial
  • Normalization via hash-consing is the key
    technique

48
Conclusions
  • The use of BDDs for static analysis is not a
    panacea for space saving
  • Domain-specific encoding crucial for saving space
  • Failed attempts
  • Original implementation of Veith's encoding
  • PAG

49
Tuning Abstraction for Improved Performance
  • Analysis can be very costly
  • Explores many structures: the GC example explores
    >180,000 structures

50
Existing Analysis Modes
  • Relational analysis
  • Doubly-exponential in worst case
  • Our most precise method
  • Single-structure analysis (Tal Lev-Ami SAS 2000)
  • Singly-exponential in worst case
  • Can be very efficient
  • Can be very imprecise
  • Sometimes very inefficient

51
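Single-Structure Analysis
[Figure: S0 with nodes u1 and u, an n edge, and variable x; S1 with node u1 and variable x; their merge (join of S0 and S1) with nodes u1 and u, an n edge, and x, with a part marked "May exist"]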
52
Single-Structure Analysis
  • Active property
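  • ac=0: absent from every concrete structure
  • ac=1: exists in every concrete structure
  • ac=1/2: may exist in some concrete structure

[Figure: S0 with u1 (ac=1) and u (ac=1), an n edge, and x; S1 with u1 (ac=1) and x; their merge (join of S0 and S1) with u1 (ac=1), u (ac=1/2), an n edge, and x]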
53
Single-Structure Analysis
  • Sometimes overly imprecise
  • Refine analysis by using nullary predicates to
    distinguish between different structures

54
Is there a sweet spot?
Efficiency
Relational Analysis
Precision
55
Chapter Outline
  • Removing embedded structures
  • Merging structures with same set of canonical
    names
  • Staged analysis to localize abstraction
  • Merging pseudo-embedded structures

56
Order Relations on Structures and Sets of
Structures
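  • $S, S' \in \text{3-STRUCT}$: $S \sqsubseteq^f S'$ if for every predicate $p$
  • $p^S(u_1, \ldots, u_k) \sqsubseteq p^{S'}(f(u_1), \ldots, f(u_k))$
  • $(|\{u \mid f(u) = u'\}| > 1) \sqsubseteq sm^{S'}(u')$
  • $X, X' \in 2^{\text{3-STRUCT}}$: $X \sqsubseteq X'$ if
  • every $S \in X$ has $S' \in X'$ with $S \sqsubseteq S'$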
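Given a candidate mapping f, the embedding conditions above can be checked directly. The sketch below assumes structures are given as dictionaries from predicate names to an arity and a sparse table; that layout and the function names are choices made here. It checks a supplied f only and does not search for one.

```python
from collections import Counter
from itertools import product

def leq(a, b):
    """Information order on Kleene values: a <= b iff they are equal or b is 1/2."""
    return a == b or b == 0.5

def embeds(nodes, preds, preds_prime, f):
    """preds, preds_prime: {name: (arity, {node tuple: value})}; missing tuples mean 0.
    f: dict mapping each node of S to a node of S' (assumed surjective)."""
    for name, (arity, table) in preds.items():
        table_prime = preds_prime[name][1]
        for tup in product(nodes, repeat=arity):
            image = tuple(f[u] for u in tup)
            if not leq(table.get(tup, 0), table_prime.get(image, 0)):
                return False
    # A node of S' that is the image of several nodes of S must be a summary node.
    sm_prime = preds_prime.get("sm", (1, {}))[1]
    for target, count in Counter(f.values()).items():
        if count > 1 and sm_prime.get((target,), 0) != 0.5:
            return False
    return True

# Concrete list 1 -> 2 -> 3 with x on node 1, and an abstract structure with
# a (x=1) and a summary node b.
concrete = {"x": (1, {(1,): 1}), "sm": (1, {}), "n": (2, {(1, 2): 1, (2, 3): 1})}
abstract = {"x": (1, {("a",): 1}), "sm": (1, {("b",): 0.5}),
            "n": (2, {("a", "b"): 0.5, ("b", "b"): 0.5})}
ok = embeds([1, 2, 3], concrete, abstract, {1: "a", 2: "b", 3: "b"})
bad = embeds([1, 2, 3], concrete, abstract, {1: "b", 2: "b", 3: "b"})
```

The check is polynomial once f is fixed; the hard part in general is finding f.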

57
Compacting Transformations
  • We look for a transformation $T : 2^{\text{3-STRUCT}} \to 2^{\text{3-STRUCT}}$
    with the following properties
  • Compacting: $|T(X)| \le |X|$
  • Conservative: $T(X) \sqsupseteq X$
  • Without sacrificing precision

58
Removing Embedded Structures
[Figure: structures S1 and S0 over the variables x, y, and t with n edges; node u1 has r[n,t] and r[n,y]]
59
Removing Embedded Structures
Reversing a list with exactly 3 cells
Reversing a list with at least 3 cells
[Figure: structures S1 and S0 over the variables x, y, and t with n edges; node u1 has r[n,t] and r[n,y]]
60
Detecting Embedding is hard
  • In general, as hard as GRAPH ISOMORPHISM
  • Conditions for a unique mapping
  • Canonical abstraction
  • Definite values
  • Polynomial time check

61
Results (structures explored)
62
Results (structures explored)
63
Canonical Names Method
  • Canonical abstraction merges individuals with
    same canonical names (unary abstraction
    predicate values)
  • Merge structures with same set of canonical names
  • Both transformations preserve definiteness of
    abstraction predicates
  • But ignores precision of non-abstraction
    predicates

64
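Canonical Abstraction Example
[Figure: a list with nodes u1, u2, u3, each with r[n,x], linked by n edges, with x pointing into the list, and its canonical abstraction]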
65
Merging Structures with Same Canonical Names Example
[Figure: S0 and S1, each with variable x and n edges, and their merge (join of S0 and S1); node u has r[n,x]]
66
Merging Structures with Same Canonical Names Example
[Figure: S0 and S1, each with nodes u0 and u and variable x, and their merge (join of S0 and S1), which also has u0, u, and x]
67
Results (structures explored)
68
Localizing Abstraction
  • Find an appropriate subset of abstraction
    predicates for every CFG node
  • Observation programs contain dead variables
    exploit to make corresponding predicates dead
  • Compute predicate liveness to determine subset
    of abstraction predicates

69
reverse Example
    List reverse(List x) {
L0      List y, t;
L1      y = NULL;
L2      while (x != NULL) {
L3          t = y;
L4          y = x;
L5          x = x->n;
L6          y->n = t;
        }
L7      return y;
    }
Liveness annotations: y dead, t dead, all dead
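The liveness information for reverse can be reproduced with a standard backward dataflow analysis over program variables. This sketch is a stand-in for the predicate liveness computation, not the analysis from the thesis: the CFG table is transcribed by hand from the labelled program, and the step from variable liveness to predicate liveness (x dead implies the predicate x(v) is dead) is the idea stated on the previous slide.

```python
# CFG of reverse: label -> (defined variables, used variables, successor labels).
CFG = {
    "L1": ({"y"}, set(), ["L2"]),           # y = NULL
    "L2": (set(), {"x"}, ["L3", "L7"]),     # while (x != NULL)
    "L3": ({"t"}, {"y"}, ["L4"]),           # t = y
    "L4": ({"y"}, {"x"}, ["L5"]),           # y = x
    "L5": ({"x"}, {"x"}, ["L6"]),           # x = x->n
    "L6": (set(), {"y", "t"}, ["L2"]),      # y->n = t
    "L7": (set(), {"y"}, []),               # return y
}

def liveness(cfg):
    """Backward dataflow: live_in = use | (live_out - def), iterated to a fixpoint."""
    live_in = {label: set() for label in cfg}
    changed = True
    while changed:
        changed = False
        for label, (defs, uses, succs) in cfg.items():
            live_out = set().union(*(live_in[s] for s in succs))
            new_in = uses | (live_out - defs)
            if new_in != live_in[label]:
                live_in[label] = new_in
                changed = True
    return live_in

live = liveness(CFG)
```

In the result y is not live on entry to L4 and t is not live on entry to L2 or L3, so at those points the corresponding predicates could be dropped from the set of abstraction predicates.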
70
Results (structures explored)
71
Compaction via Pseudo-Embedding
  • Pseudo-embedding: similar to embedding, with
    respect to abstraction predicates
  • $S, S' \in \text{3-STRUCT}$: $S \sqsubseteq^f S'$ if for every abstraction
    predicate $p$
  • $p^S(u) \sqsubseteq p^{S'}(f(u))$
  • $(|\{u \mid f(u) = u'\}| > 1) \sqsubseteq sm^{S'}(u')$

72
Modified blur
  • Order relation on nodes: $u_1 \sqsubseteq u_2$ if for every
    abstraction predicate $p$, $p^S(u_1) \sqsubseteq p^S(u_2)$
  • blur merges $u_1$ with $u_2$ if $u_1 \sqsubseteq u_2$

73
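blur Example
[Figure: before blur, nodes u0 and u, both with r[n,x], with an n edge and variable x; after blur, a single node u with r[n,x], an n edge, and variable x]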
74
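Merging Pseudo-Embedded Structures Example
  • Abstraction predicates: x, y
  • Non-abstraction predicates: r[n,x], r[n,y], n
[Figure: S0 with u (r[n,y], r[n,x]) and u0 (r[n,x]), variables x and y, and n edges; S1 with u (r[n,y], r[n,x]) and variables x and y; their merge (join of S0 and S1) with u (r[n,y]=1/2, r[n,x]), variables x and y, and an n edge]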
75
Results (structures explored)
76
Empirical Evaluation
  • Benchmarks
  • Garbage Collector
  • Mobile Ambients (ESOP 2000)
  • Sorting procedures (ISSTA 2000)
  • MA J2 completed without instrumentation
    predicates and without messages

77
Results (structures explored)
Out of memory
Out of time
False alarms
78
Conclusion
  • New method is usually much more efficient (by
    orders of magnitude)
  • Doesn't lose precision on benchmarks
  • Performance more stable than other methods

79
Future and Ongoing Work
  • Time optimizations
  • Symbolic (BDD) execution of TVLA operations
  • Compactly represent sets of structures
  • Improving abstraction locality
  • Truly live predicates
  • Analyzing liveness for core predicates and
    deriving for instrumentation predicates
  • Experiment with other compacting transformations
  • Achieve polynomial complexity

80
The End