Data Structures and Algorithms for Efficient Shape Analysis

- by Roman Manevich
- Prepared under the supervision of Dr. Shmuel (Mooly) Sagiv

Motivation

- TVLA is a powerful and general abstract interpretation system
- Abstract interpretation in TVLA
- Operational semantics is expressed with first-order logic with transitive closure (TC) formulae
- Program states are represented as sets of Evolving First-Order Structures
- Efficiency is an issue

Outline

- Shape Analysis quick intro
- Compactly representing structures
- Tuning abstraction to improve performance

What is Shape Analysis

- Determines Shape Invariants for imperative programs
- Can be used to verify a wide range of properties over different programming languages

reverse Example

    /* list.h */
    typedef struct node {
        struct node *n;
        int data;
    } *List;

    /* print.c */
    #include "list.h"

    List reverse(List x) {
        List y, t;
        y = NULL;
        while (x != NULL) {
            t = y;
            y = x;
            x = x->n;
            y->n = t;
        }
        return y;
    }

reverse Example

[Figure: shape before, x points to a list linked by n-fields; shape after, y points to a list linked by n-fields]

Definition of a First-Order Logical Structure

- $S = \langle U, \iota \rangle$
- $U$: a set of individuals (node set)
- $\iota$: a mapping $p^{(r)} \mapsto (U^r \to \{0, 1\})$, the interpretation of $p$

Three-Valued Logic

- 1: True
- 0: False
- 1/2: Unknown
- A join semi-lattice: $0 \sqcup 1 = 1/2$

[Figure: the information order, with 1/2 above both 0 and 1]
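A minimal sketch of the three values and their join, assuming 1/2 is stored as the float 0.5. The names `join`, `k_and`, `k_or` and `k_not` are illustrative and not taken from TVLA; the connectives follow the usual Kleene semantics (minimum, maximum, and 1 minus the value).

```python
# Three-valued (Kleene) logic, with 0.5 standing for the unknown value 1/2.
FALSE, UNKNOWN, TRUE = 0, 0.5, 1

def join(a, b):
    """Information-order join: equal values stay, disagreeing values become 1/2."""
    return a if a == b else UNKNOWN

def k_and(a, b):
    """Kleene conjunction: the minimum under the order 0 < 1/2 < 1."""
    return min(a, b)

def k_or(a, b):
    """Kleene disjunction: the maximum under the order 0 < 1/2 < 1."""
    return max(a, b)

def k_not(a):
    """Kleene negation: swaps 0 and 1 and leaves 1/2 unchanged."""
    return 1 - a
```

For instance, `join(FALSE, TRUE)` gives `UNKNOWN`, which is the equation on the slide, while `k_and(UNKNOWN, FALSE)` is still a definite `FALSE`.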

Canonical Abstraction

- Partition the individuals into equivalence classes based on the values of their unary predicates
- Collapse other predicates via $\sqcup$
- $p^S(u'_1, \ldots, u'_k) = \bigsqcup \{\, p^B(u_1, \ldots, u_k) \mid f(u_1) = u'_1, \ldots, f(u_k) = u'_k \,\}$
- At most $3^n$ abstract individuals (for $n$ unary predicates)
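The steps above can be sketched in a few lines. This is an illustration under stated assumptions, not TVLA's implementation: structures are plain dictionaries, every unary predicate is treated as an abstraction predicate, missing entries read as 0, and the function name `canonical_abstraction` is made up for the example.

```python
from itertools import product

UNKNOWN = 0.5  # stands for the Kleene value 1/2

def join(a, b):
    return a if a == b else UNKNOWN

def canonical_abstraction(nodes, unary, binary):
    """nodes:  list of concrete node ids.
    unary:  {pred: {node: value}}, all used as abstraction predicates.
    binary: {pred: {(u, v): value}}, missing pairs default to 0."""
    preds = sorted(unary)
    # f maps each concrete node to its canonical name: its vector of unary values.
    f = {u: tuple(unary[p].get(u, 0) for p in preds) for u in nodes}
    abs_nodes = sorted(set(f.values()))
    # A class with more than one member becomes a summary node (sm = 1/2).
    sm = {a: (UNKNOWN if sum(1 for u in nodes if f[u] == a) > 1 else 0)
          for a in abs_nodes}
    abs_binary = {}
    for p, table in binary.items():
        out = {}
        for u, v in product(nodes, repeat=2):
            key = (f[u], f[v])
            val = table.get((u, v), 0)
            # Collapse all concrete values that land on the same abstract pair.
            out[key] = val if key not in out else join(out[key], val)
        abs_binary[p] = out
    return abs_nodes, sm, abs_binary

# A three-cell list: x points to u1, and n links u1 -> u2 -> u3.
abs_nodes, sm, abs_binary = canonical_abstraction(
    ["u1", "u2", "u3"],
    {"x": {"u1": 1}},
    {"n": {("u1", "u2"): 1, ("u2", "u3"): 1}},
)
```

Here u2 and u3 share the canonical name `(0,)` and collapse into one summary node, and the n-edges into and within that node become 1/2.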

Canonical Abstraction Example

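[Figure: a concrete list of nodes u1, u2, u3, each labelled r[n,x], pointed to by x and linked by n-edges, shown together with its canonical abstraction]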

Compactly Representing First-Order Logical Structures

- Space is a major bottleneck
- Analysis explores many logical structures
- Reduce space by sharing information across structures

Desired Properties

- Sparse data structures
- Share common sub-structures
- Inherited sharing
- Incidental sharing due to program invariants
- But feasible time performance
- Phase sensitive data structures

Chapter Outline

- Background
- First-order structure representations
- Base representation (TVLA 0.91)
- BDD representation
- Empirical evaluation
- Conclusion

First-Order Logical Structures

- Generalize shape graphs
- Arbitrary set of individuals
- Arbitrary set of predicates on individuals
- Dynamically evolving
- Usually small changes
- Properties are extracted by evaluating first-order formulae, e.g. $\exists v_1, v : x(v_1) \wedge n(v_1, v)$
- Join operator requires isomorphism testing

First-Order Structure ADT

- Structure new() /* empty structure */
- SetOfNodes nodeSet(Structure)
- Node newNode(Structure)
- removeNode(Structure, node)
- Kleene eval(Structure, p(r), ⟨u1, ..., ur⟩)
- update(Structure, p(r), ⟨u1, ..., ur⟩, Kleene)
- Structure copy(Structure)
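To make the ADT concrete, here is a small sketch of one possible implementation. It is an assumption for illustration only: a sparse two-level dictionary (predicate name, then node tuple) in which only non-zero values are stored, with `copy` done eagerly rather than by copy-on-write. The class and method names mirror the operations listed above but are otherwise hypothetical.

```python
import copy

class Structure:
    """Sparse three-valued structure: only non-zero predicate values are stored."""

    def __init__(self):                      # new(): the empty structure
        self._nodes = set()
        self._next_id = 0
        # Two-level map: predicate name -> (node tuple -> Kleene value).
        self._preds = {}

    def node_set(self):
        return frozenset(self._nodes)

    def new_node(self):
        node = self._next_id
        self._next_id += 1
        self._nodes.add(node)
        return node

    def remove_node(self, node):
        self._nodes.discard(node)
        # Drop every stored tuple that mentions the removed node.
        for table in self._preds.values():
            for tup in [t for t in table if node in t]:
                del table[tup]

    def eval(self, pred, nodes=()):
        return self._preds.get(pred, {}).get(tuple(nodes), 0)

    def update(self, pred, nodes, value):
        table = self._preds.setdefault(pred, {})
        if value == 0:
            table.pop(tuple(nodes), None)    # keep the map sparse
        else:
            table[tuple(nodes)] = value

    def copy(self):
        return copy.deepcopy(self)

# Replaying the statement x = y: copy the structure, then set x(v) = y(v).
s0 = Structure()
u1, u = s0.new_node(), s0.new_node()
s0.update("y", (u1,), 1)
s0.update("sm", (u,), 0.5)
s1 = s0.copy()
for node in s0.node_set():
    s1.update("x", (node,), s0.eval("y", (node,)))
```

After the loop, `s1` has x set on `u1` while `s0` is unchanged, which is the behaviour the ADT's `copy` is expected to provide.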

print_all Example

    /* list.h */
    typedef struct node {
        struct node *n;
        int data;
    } *L;

    /* print.c */
    #include <stdio.h>
    #include "list.h"

    void print_all(L y) {
        L x;
        x = y;
        while (x != NULL) {
            /* assert(x != NULL) */
            printf("elem=%d", x->data);
            x = x->n;
        }
    }

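print_all Example

[Figure: structure S0 with node u1 (y=1) and summary node u (sm=½), connected by n=½ edges]

Statement x = y, with update formula x'(v) = y(v):

    S1 = copy(S0)
    nodeSet(S0) = {u1, u}
    eval(S0, y, u1) = 1    update(S1, x, u1, 1)    (u1 now has x=1)
    eval(S0, y, u) = 0     update(S1, x, u, 0)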

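print_all Example

while (x != NULL): precondition $\exists v : x(v)$

x = x->n: focus formula $\exists v_1 : x(v_1) \wedge n(v_1, v)$, update formula $x'(v) = \exists v_1 : x(v_1) \wedge n(v_1, v)$

[Figure: structure S1 (u1 with x=1 and y=1; summary node u with sm=½; n=½ edges) and the three structures obtained from it for x = x->n: S2.0 (u1 with y=1; u with sm=½), S2.1 (u1 with y=1; u with x=1; an n=1 edge) and S2.2 (u1 with y=1; u.1 with x=1; u.0 with sm=½; an n=1 edge and n=½ edges)]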

Overview and Main Results

- Two novel representations of first-order structures
- New BDD representation
- New representation using functional maps
- Implementation techniques
- Empirical evaluation
- Comparison of different representations
- Space is reduced by a factor of 4 to 10
- New representations scale better

Base Representation (Tal Lev-Ami SAS 2000)

- Two-Level Map: Predicate → (Node Tuple → Kleene)
- Sparse Representation
- Limited inherited sharing by Copy-On-Write

BDDs in a Nutshell (Bryant 86)

- Ordered Binary Decision Diagrams
- Data structure for Boolean functions
- Functions are represented as (unique) DAGs

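Truth table for f:

    x1 x2 x3 | f
     0  0  0 | 0
     0  0  1 | 0
     0  1  0 | 0
     0  1  1 | 1
     1  0  0 | 0
     1  0  1 | 1
     1  1  0 | 0
     1  1  1 | 1

[Figure: the decision tree for f, testing x1, then x2, then x3, with one 0/1 leaf per row of the table]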

BDDs in a Nutshell (Bryant 86)

- Also achieve sharing across functions

[Figure: the decision tree reduced in three steps, labelled Duplicate Terminals, Duplicate Nonterminals and Redundant Tests]
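The three reduction steps can be sketched with a tiny BDD manager. This is a teaching sketch, not a production BDD package and not the library used in the work presented here: the class `BDD`, its `mk` and `build` methods, and the brute-force Shannon expansion are all assumptions made for the example. The function used is (x1 ∨ x2) ∧ x3, which agrees with the truth table on the earlier slide.

```python
class BDD:
    """Minimal reduced ordered BDD manager with a unique (hash-consing) table."""

    def __init__(self):
        self.unique = {}                    # (var, low, high) -> node id
        self.nodes = {0: None, 1: None}     # ids 0 and 1 are the only terminals

    def mk(self, var, low, high):
        if low == high:                     # redundant test: skip the node
            return low
        key = (var, low, high)
        if key not in self.unique:          # duplicate nonterminal: reuse it
            node_id = len(self.nodes)
            self.nodes[node_id] = key
            self.unique[key] = node_id
        return self.unique[key]

    def build(self, f, num_vars, var=1, env=()):
        """Shannon-expand f over x1..xn; f takes a tuple of 0/1 values."""
        if var > num_vars:
            return 1 if f(env) else 0       # duplicate terminals: shared 0 and 1
        low = self.build(f, num_vars, var + 1, env + (0,))
        high = self.build(f, num_vars, var + 1, env + (1,))
        return self.mk(var, low, high)

bdd = BDD()
f_root = bdd.build(lambda e: (e[0] or e[1]) and e[2], 3)
nodes_after_f = len(bdd.unique)
# A second function built in the same manager reuses existing nodes.
g_root = bdd.build(lambda e: e[2], 3)
nodes_after_g = len(bdd.unique)
```

The seven internal nodes of the full decision tree shrink to three, and building the second function x3 adds no new nodes because its single node already exists in the table.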

Encoding Structures Using Integers

- Static encoding of
- Predicates
- Kleene values
- Dynamic encoding of nodes
- 0, 1, ..., n-1
- Encode predicate p's values as
- ep(p).en(u1).en(u2). ... .en(un).ek(Kleene)
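One way to read the concatenation above is as fixed-width bit fields packed into a single integer. The sketch below assumes exactly that; the code tables `PRED_CODE` and `KLEENE_CODE`, the two-bit Kleene field and the function `encode` are hypothetical choices for illustration, and the sketch does not pad words of different arity to a common width.

```python
PRED_CODE = {"x": 0, "y": 1, "n": 2}          # hypothetical static encoding ep
KLEENE_CODE = {0: 0b00, 1: 0b01, 0.5: 0b10}   # hypothetical static encoding ek

def encode(pred, nodes, value, node_bits):
    """Pack ep(p) . en(u1) ... en(uk) . ek(value) into one integer.

    Nodes are assumed to be numbered 0..n-1 and to fit in node_bits bits."""
    word = PRED_CODE[pred]
    for u in nodes:
        if u >> node_bits:
            raise ValueError("node id does not fit in node_bits")
        word = (word << node_bits) | u
    return (word << 2) | KLEENE_CODE[value]

# n(1, 0) = 1/2 with two bits per node: fields 10 | 01 | 00 | 10.
n_word = encode("n", (1, 0), 0.5, node_bits=2)
# y(1) = 1 with two bits per node: fields 1 | 01 | 01.
y_word = encode("y", (1,), 1, node_bits=2)
```

A structure then becomes a set of such integers, which is the form the next slides feed into a BDD.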

BDD Representation of Integer Sets

- Characteristic function
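- $S = \{1, 5\}$, with $1 = \langle 001 \rangle$ and $5 = \langle 101 \rangle$
- $\chi_S = (\neg x_1 \wedge \neg x_2 \wedge x_3) \vee (x_1 \wedge \neg x_2 \wedge x_3)$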
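As a quick check of the formula, the following sketch evaluates the characteristic function on every 3-bit integer. It assumes x1 is the most significant bit, as in the encodings 001 and 101 above; the helper names `bits` and `chi_s` are illustrative.

```python
def bits(value, width=3):
    """Most-significant-bit-first tuple (x1, x2, x3) of value."""
    return tuple((value >> (width - 1 - i)) & 1 for i in range(width))

def chi_s(x1, x2, x3):
    """Characteristic function of S = {1, 5} over three bits."""
    return bool((not x1 and not x2 and x3) or (x1 and not x2 and x3))

members = [v for v in range(8) if chi_s(*bits(v))]
```

The list `members` contains exactly the integers on which the formula is true, so it recovers the set S.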

BDD Representation Example

[Figure, built up over four slides: the print_all structures S0, S1 (after x = y) and S2.2 (after x = x->n), each drawn next to its BDD]


Improved BDD Representation

- Using this representation directly doesn't save space: canonicity doesn't carry over from propositional to first-order logic
- Observation
- Node names can be arbitrarily remapped without affecting the ADT semantics
- Our heuristics
- Use canonic node names to encode nodes and obtain a canonic representation
- Increases incidental sharing
- Reduces isomorphism test to pointer comparison
- Space reduced by a factor of 4 to 10

Reducing Time Overhead

- Current implementation not optimized
- Expensive formula evaluation
- Hybrid representation
- Distinguish between phases: mutable phase → Join → immutable phase
- Dynamically switch representations

Functional Representation

- Alternative representation for first-order structures
- Structures represented by maps from integers to Kleene values
- Tailored for representing first-order structures
- Achieves better results than BDDs
- Techniques similar to the BDD representation
- More details in the thesis

Introduction to Functional Maps

- A mapping $\mathbb{N} \to \{0, 1/2, 1\}$

    index  0  1  2
    value  ½  0  1

- Sparse maps
- Share unique sub-maps

[Figure, built up over four slides: a map stored as a tree of sub-maps (nodes labelled size 27 and size 9) over leaf blocks 0..2 = (½, 0, 1), 3..5 = (0, 0, 0) and 6..8 = (½, 0, 1). The all-zero block is dropped first, then the two identical blocks are shared.]
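The two ideas, sparseness and sharing of unique sub-maps, can be sketched with a persistent ternary tree. Everything here is an assumption for illustration (the fan-out of 3, the use of `None` for an all-zero sub-map, the hash-consing table `_table`, and the names `get` and `put`); the thesis is the reference for the actual data structure.

```python
_table = {}   # hash-consing table: one stored copy per distinct sub-map

def intern(node):
    """Return the unique stored copy of an immutable sub-map."""
    return _table.setdefault(node, node)

def get(m, size, i):
    """Look up index i in a map covering size entries; missing parts read as 0."""
    while m is not None and size > 1:
        size //= 3
        m, i = m[i // size], i % size
    return 0 if m is None else m

def put(m, size, i, value):
    """Return a new map equal to m except at index i; m itself is unchanged."""
    if size == 1:
        return None if value == 0 else value
    children = list(m) if m is not None else [None, None, None]
    sub = size // 3
    children[i // sub] = put(children[i // sub], sub, i % sub, value)
    if all(c is None for c in children):
        return None                      # all-zero sub-maps are not stored
    return intern(tuple(children))

# The nine-entry map from the slides: (1/2, 0, 1), (0, 0, 0), (1/2, 0, 1).
m = None
for index, value in [(0, 0.5), (2, 1), (6, 0.5), (8, 1)]:
    m = put(m, 9, index, value)
contents = [get(m, 9, i) for i in range(9)]
m2 = put(m, 9, 4, 1)                     # functional update: m is untouched
```

In the result, the middle block is absent (`m[1] is None`) and the first and last blocks are the same object (`m[0] is m[2]`), while `m2` still shares its unchanged blocks with `m`.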

Functional Representation Example

[Figure, built up over three slides: the print_all structures S0, S1 and S2.2, each stored as three maps (binary, unary, nullary) built from sub-maps labelled size 81, size 27 and size 9. The unary leaves hold (y, x, sm) triples such as (1, 0, 0), (0, 0, ½), (1, 1, 0) and (0, 1, 0); the binary leaves hold n = ½ and n = 1.]

Reducing Time Overhead

- Lazy normalization is used to balance time/space performance

Empirical Evaluation

- Benchmarks
- Cleanness Analysis (SAS 2000)
- Garbage Collector
- CMP (PLDI 2002) of Java Front-End and Kernel Benchmarks
- Mobile Ambients (ESOP 2000)
- Stress testing the representations
- We use relational analysis
- Save structures in every CFG location

Space Results

Abstract Counters

- Ignore language/implementation details
- A more reliable measurement technique
- Count only crucial space information
- Independent of C/Java

Abstract Counters Results

Trends in the Cleanness Analysis Benchmark

Conclusions

- Two novel representations of first-order structures
- New BDD representation
- New representation using functional maps
- Implementation techniques
- Substantially better than inherited sharing
- Structure canonization is crucial
- Normalization via hash-consing is the key technique

Conclusions

- The use of BDDs for static analysis is not a panacea for space saving
- Domain-specific encoding crucial for saving space
- Failed attempts
- Original implementation of Veith's encoding
- PAG

Tuning Abstraction for Improved Performance

- Analysis can be very costly
- Explores many structures: the GC example explores more than 180,000 structures

Existing Analysis Modes

- Relational analysis
- Doubly-exponential in worst case
- Our most precise method
- Single-structure analysis (Tal Lev-Ami SAS 2000)
- Singly-exponential in worst case
- Can be very efficient
- Can be very imprecise
- Sometimes very inefficient

Single-Structure Analysis

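[Figure: structure S0 (x points to u1, with an n-edge to u), structure S1 (x points to u1 only), and their join S0 ⊔ S1, in which u is marked "May exist"]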

Single-Structure Analysis

- Active property
- ac = 0: doesn't exist in every concrete structure
- ac = 1: exists in every concrete structure
- ac = 1/2: may exist in some concrete structure

[Figure: S0 with u1 (ac=1) and u (ac=1), S1 with u1 (ac=1) only, and their join S0 ⊔ S1, in which u has ac=1/2]

Single-Structure Analysis

- Sometimes overly imprecise
- Refine analysis by using nullary predicates to distinguish between different structures

Is there a sweet spot?

[Figure: efficiency plotted against precision, with Relational Analysis marked]

Chapter Outline

- Removing embedded structures
- Merging structures with same set of canonical names
- Staged analysis to localize abstraction
- Merging pseudo-embedded structures

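Order Relations on Structures and Sets of Structures

- $S, S' \in$ 3-STRUCT: $S \sqsubseteq^f S'$ if for every predicate $p$
- $p^S(u_1, \ldots, u_k) \sqsubseteq p^{S'}(f(u_1), \ldots, f(u_k))$
- $(|\{u \mid f(u) = u'\}| > 1) \sqsubseteq sm^{S'}(u')$
- $X, X' \in 2^{3\text{-STRUCT}}$: $X \sqsubseteq X'$ if
- Every $S \in X$ has $S' \in X'$ with $S \sqsubseteq S'$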
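The embedding conditions can be checked directly once a candidate mapping f is given. The sketch below is illustrative: structures are pairs of a node list and a dictionary of predicate tables, missing entries read as 0, and the function name `is_embedding` is made up. It also requires f to be onto, which is part of the usual definition of embedding although the slide does not spell it out.

```python
from itertools import product

UNKNOWN = 0.5  # stands for the Kleene value 1/2

def leq(a, b):
    """Information order on Kleene values: a is below b if equal or b is 1/2."""
    return a == b or b == UNKNOWN

def is_embedding(f, s, t, arity):
    """Check that f embeds s into t; each structure is (nodes, {pred: table})."""
    s_nodes, s_preds = s
    t_nodes, t_preds = t
    if {f[u] for u in s_nodes} != set(t_nodes):        # f must be onto
        return False
    for p, k in arity.items():
        for tup in product(s_nodes, repeat=k):
            image = tuple(f[u] for u in tup)
            lhs = s_preds.get(p, {}).get(tup, 0)
            rhs = t_preds.get(p, {}).get(image, 0)
            if not leq(lhs, rhs):
                return False
    # A target node with several pre-images must be a summary node.
    for v in t_nodes:
        merged = sum(1 for u in s_nodes if f[u] == v) > 1
        if not leq(1 if merged else 0, t_preds.get("sm", {}).get((v,), 0)):
            return False
    return True

concrete = ([1, 2, 3], {"x": {(1,): 1}, "n": {(1, 2): 1, (2, 3): 1}})
abstract = (["a", "b"], {"x": {("a",): 1},
                         "n": {("a", "b"): 0.5, ("b", "b"): 0.5},
                         "sm": {("b",): 0.5}})
f = {1: "a", 2: "b", 3: "b"}
ok = is_embedding(f, concrete, abstract, {"x": 1, "n": 2})
# Without the summary bit on b, two nodes may not be mapped onto it.
no_sm = (abstract[0], {k: v for k, v in abstract[1].items() if k != "sm"})
bad = is_embedding(f, concrete, no_sm, {"x": 1, "n": 2})
```

The three-cell list embeds into the two-node abstract structure, and the check fails as soon as the target node b is no longer marked as a summary node.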

Compacting Transformations

- We look for a transformation $T : 2^{3\text{-STRUCT}} \to 2^{3\text{-STRUCT}}$ with the following properties
- Compacting: $|T(X)| \le |X|$
- Conservative: $T(X) \sqsupseteq X$
- Without sacrificing precision

Removing Embedded Structures

[Figure, shown on two slides: two structures, S0 and S1, over the pointers x, y and t with n-edges; a node u1 carries r[n,t] and r[n,y]. Captions: "Reversing a list with exactly 3 cells" and "Reversing a list with at least 3 cells".]

Detecting Embedding is hard

- In general, as hard as GRAPH ISOMORPHISM
- Conditions for a unique mapping
- Canonical abstraction
- Definite values
- Polynomial time check

Results (structures explored)

Canonical Names Method

- Canonical abstraction merges individuals with same canonical names (unary abstraction predicate values)
- Merge structures with same set of canonical names
- Both transformations preserve definiteness of abstraction predicates
- But ignores precision of non-abstraction predicates

Canonical Abstraction Example

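[Figure: a concrete list of nodes u1, u2, u3, each labelled r[n,x], pointed to by x and linked by n-edges, shown together with its canonical abstraction]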

Merging Structures with Same Canonical Names Example

[Figure, two slides: in each, structures S0 and S1 and their merge S0 ⊔ S1. The first uses a node u labelled r[n,x] with x and n-edges; the second uses nodes u0 and u with x, where S0 and S1 differ in their n-edges.]

Results (structures explored)

Localizing Abstraction

- Find an appropriate subset of abstraction predicates for every CFG node
- Observation: programs contain dead variables; exploit to make corresponding predicates dead
- Compute predicate liveness to determine subset of abstraction predicates

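reverse Example

    List reverse(List x) {
    L0:   List y, t;
    L1:   y = NULL;
    L2:   while (x != NULL) {
    L3:       t = y;
    L4:       y = x;
    L5:       x = x->n;
    L6:       y->n = t;
          }
    L7:   return y;
    }

[Slide annotations mark the program points where y is dead, where t is dead, and where all variables are dead]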
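The dead-variable annotations can be reproduced with a standard backward liveness analysis. The sketch below is an illustration and makes two assumptions: it computes liveness of the pointer variables (standing in for the predicates x, y and t), and it uses a hand-written CFG table `REVERSE` keyed by the labels L1 to L7 of the reverse procedure. The function name `liveness` is hypothetical.

```python
def liveness(cfg):
    """cfg: {label: (defs, uses, successors)}. Returns the live-in set per label."""
    live_in = {n: set() for n in cfg}
    changed = True
    while changed:                       # iterate to a fixed point
        changed = False
        for n, (defs, uses, succs) in cfg.items():
            live_out = set().union(*(live_in[s] for s in succs))
            new_in = set(uses) | (live_out - set(defs))
            if new_in != live_in[n]:
                live_in[n], changed = new_in, True
    return live_in

REVERSE = {
    "L1": ({"y"}, set(), ["L2"]),            # y = NULL
    "L2": (set(), {"x"}, ["L3", "L7"]),      # while (x != NULL)
    "L3": ({"t"}, {"y"}, ["L4"]),            # t = y
    "L4": ({"y"}, {"x"}, ["L5"]),            # y = x
    "L5": ({"x"}, {"x"}, ["L6"]),            # x = x->n
    "L6": (set(), {"y", "t"}, ["L2"]),       # y->n = t
    "L7": (set(), {"y"}, []),                # return y
}

live = liveness(REVERSE)
```

In the result, y is not live on entry to L4 (it is about to be overwritten) and t is not live on entry to L2 and L3, so the corresponding predicates could be dropped from the abstraction at those points.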

Results (structures explored)

Compaction via Pseudo-Embedding

- Pseudo-Embedding: similar to embedding with respect to abs. predicates
- $S, S' \in$ 3-STRUCT: $S \sqsubseteq^f S'$ if for every abstraction predicate $p$
- $p^S(u) \sqsubseteq p^{S'}(f(u))$
- $(|\{u \mid f(u) = u'\}| > 1) \sqsubseteq sm^{S'}(u')$

Modified blur

- Order relation on nodes: $u_1 \sqsubseteq u_2$ if for every abstraction predicate $p$, $p^S(u_1) \sqsubseteq p^S(u_2)$
- blur merges $u_1$ with $u_2$ if $u_1 \sqsubseteq u_2$

blur Example

[Figure: nodes u0 (r[n,x]) and u (r[n,x]) with x and an n-edge; after blur, a single node u (r[n,x]) with x and an n-edge]

Merging Pseudo-Embedded Structures Example

Abstraction predicates: x, y. Non-abstraction predicates: r[n,x], r[n,y], n.

[Figure: structures S0 and S1 over nodes u0 and u with pointers x and y, and their merge S0 ⊔ S1, in which u has r[n,y] = 1/2]

Results (structures explored)

Empirical Evaluation

- Benchmarks
- Garbage Collector
- Mobile Ambients (ESOP 2000)
- Sorting procedures (ISSTA 2000)
- MA J2 completed without instrumentation predicates and without messages

Results (structures explored)

[Chart legend: Out of memory, Out of time, False alarms]

Conclusion

- New method is usually much more efficient (by orders of magnitude)
- Doesn't lose precision on benchmarks
- Performance more stable than other methods

Future and Ongoing Work

- Time optimizations
- Symbolic (BDD) execution of TVLA operations
- Compactly represent sets of structures
- Improving abstraction locality
- Truly live predicates
- Analyzing liveness for core predicates and deriving for instrumentation predicates
- Experiment with other compacting transformations
- Achieve polynomial complexity

The End