Introduction to Abstract Interpretation

1
Introduction to Abstract Interpretation

Neil Kettle, Andy King and Axel Simon
a.m.king@kent.ac.uk
http://www.cs.kent.ac.uk/amk
Acknowledgments: much of this material has been adapted from surveys by Patrick and Radhia Cousot

2
Applications of abstract interpretation

Verification: can a concurrent program deadlock? Is termination assured?
Parallelisation: are two or more tasks independent? What is the worst/best-case running time of a function?
Transformation: can a definition be unfolded? Will unfolding terminate?
Implementation: can an operation be specialised with knowledge of its (global) calling context?
Applications and players are incredibly diverse

3
House-keeping
4
Computing Lab Xmas Party

Located in Origins, the restaurant in Darwin
A buffet lunch will be served courtesy of the department
The department will supply some wine (which last year lasted 10 minutes)
The bar will be open afterwards if some wine is not enough wine
Send an e-mail to Deborah Sowrey (D.J.Sowery@kent.ac.uk) if you want to attend
Come along and meet other post-grads

5
Casting out nines algorithm

Which of the following multiplications are correct?
2173 × 38 = 81574 or
2173 × 38 = 82574
Casting out nines is a checking technique that is really a form of abstract interpretation
Sum the digits in the multiplicand n1, the multiplier n2 and the product n to obtain s1, s2 and s
Divide s1, s2 and s by 9 to compute the remainders, that is, r1 = s1 mod 9, r2 = s2 mod 9 and r = s mod 9
If (r1 × r2) mod 9 ≠ r then the multiplication is incorrect
The algorithm returns "incorrect" or "don't know"

6
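Running the numbers for 2173 × 38 = 81574

Compute r1 = (2+1+7+3) mod 9 = 13 mod 9 = 4
Compute r2 = (3+8) mod 9 = 11 mod 9 = 2
Calculate (r1 × r2) mod 9 = 8 mod 9 = 8
Calculate r = (8+1+5+7+4) mod 9 = 25 mod 9 = 7
Check ((r1 × r2) mod 9 = r), which fails since 8 ≠ 7
Deduce that 2173 × 38 = 81574 is incorrect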
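The check is small enough to run. The following Python sketch (the function names are illustrative, not from the slides) computes r1, r2 and r from digit sums and answers "incorrect" or "don't know", as the algorithm prescribes:

```python
def digit_sum(n: int) -> int:
    """Sum of the decimal digits of a non-negative integer."""
    return sum(int(d) for d in str(n))


def casting_out_nines(n1: int, n2: int, n: int) -> str:
    """Check the claim n1 * n2 == n; the answer is "incorrect" or "don't know"."""
    r1 = digit_sum(n1) % 9   # abstract multiplicand
    r2 = digit_sum(n2) % 9   # abstract multiplier
    r = digit_sum(n) % 9     # abstract product
    if (r1 * r2) % 9 != r:
        return "incorrect"
    return "don't know"


print(casting_out_nines(2173, 38, 81574))  # incorrect
print(casting_out_nines(2173, 38, 82574))  # don't know
```

A "don't know" answer is not a proof of correctness: the wrong claim 2173 × 38 = 85274 also passes, because digit sums are blind to the order of the digits.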

7
Abstract interpretation is a theory of
relationships

The computational domain for multiplication (concrete domain):
N, the set of non-negative integers
The computational domain of remainders used in the checking algorithm (abstract domain):
R = {0, 1, …, 8}
Key question: what is the relationship between an element n ∈ N which is used in the real algorithm and its analog r ∈ R in the check?

8
What is the relationship?

When the multiplicand is n1 = 456, say, then the check uses r1 = (4+5+6) mod 9 = 6
Observe that
456 mod 9
= (4·100 + 56) mod 9
= (4·90 + 4·10 + 56) mod 9
= (4·10 + 56) mod 9
= ((4+5)·10 + 6) mod 9
= ((4+5)·9 + (4+5) + 6) mod 9
= (4+5+6) mod 9
More generally, induction can show r1 = n1 mod 9 and r2 = n2 mod 9

9
Correctness is the preservation of relationships

The check simulates the concrete multiplication and, in effect, is an abstract multiplication
Concrete multiplication is n = n1 × n2
Abstract multiplication is r = (r1 × r2) mod 9
where r1 describes n1 and r2 describes n2
For brevity, write r ∝ n iff r = n mod 9
Then abstract multiplication preserves ∝ iff whenever r1 ∝ n1 and r2 ∝ n2 it follows that r ∝ n

10
Correctness argument

Suppose r1 ∝ n1 and r2 ∝ n2
If n = n1 × n2 then
n mod 9 = (n1 × n2) mod 9, hence
n mod 9 = ((n1 mod 9) × (n2 mod 9)) mod 9, whence
n mod 9 = (r1 × r2) mod 9 = r, therefore
r ∝ n
Consequently if ¬(r ∝ n) then n ≠ n1 × n2
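The correctness argument can be sanity-checked (not proved) by brute force. This sketch, with hypothetical helper names, enumerates pairs n1, n2 below a bound and confirms that abstract multiplication always yields an r with r ∝ n1 × n2:

```python
def describes(r: int, n: int) -> bool:
    """r ∝ n iff r = n mod 9."""
    return r == n % 9


def abstract_mul(r1: int, r2: int) -> int:
    """Abstract multiplication on the remainders R = {0, ..., 8}."""
    return (r1 * r2) % 9


def preserves(limit: int) -> bool:
    """Check that r1 ∝ n1 and r2 ∝ n2 imply abstract_mul(r1, r2) ∝ n1 * n2."""
    for n1 in range(limit):
        for n2 in range(limit):
            r1, r2 = n1 % 9, n2 % 9   # the unique r1, r2 with r1 ∝ n1 and r2 ∝ n2
            if not describes(abstract_mul(r1, r2), n1 * n2):
                return False
    return True


print(preserves(200))  # True
```

Because ∝ is a function here (each n has exactly one r with r ∝ n), the loop only needs to try that one r1 and r2 for each pair.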

11
Summary

Formalise the relationship between the data
Check that the relationship is preserved by the
abstract analogues of the concrete operations
The relational framework [Acta Informatica, 30(2):103-129, 1993] not only emphasises the theory of relations but is very general

12
Numeric approximation and widening

Abstract interpretation does not require a domain
to be finite

13
Interval approximation

Consider the following Pascal-like program
SYNTOX [PLDI'90] inferred the invariants annotated in the program
Invariants occur between consecutive lines in the program
i ∈ [0,15] asserts 0 ≤ i ≤ 15 whereas i ∈ [0,0] means i = 0

begin
  i := 0;
  {1: i ∈ [0,0]}
  while (i < 16) do
    {2: i ∈ [0,15]}
    i := i + 1
    {3: i ∈ [1,16]}
end
{4: i ∈ [16,16]}

14
Compilation versus (classic) interpretation

Abstract compilation: compile the concrete program into an abstract program (equation system) and execute the abstract program
good separation of concerns that aids debugging
the particulars of the domain can be exploited to reorder operations, specialise operations, etc
Abstract interpretation: run the concrete program but on-the-fly interpret its concrete operations as abstract operations
ideal for a generic framework (toolkit) which is parameterised by abstract domain plugins

15
Abstract domain that is used in interval analysis

Domain of intervals includes
[l,u] where l ≤ u and l,u ∈ Z, for bounded sets, i.e., [0,5] describes {0,1,4} since {0,1,4} ⊆ [0,5]
⊥ to represent the empty set of numbers, that is, ⊥ describes ∅
[l,∞] for sets which are bounded below such as {l, l+2, l+4, …}
[-∞,u] to represent sets which are bounded above such as {…, u-5, u-3, u}

16
Weakening intervals
if … then
  {1: i ∈ [0,2]}
else
  {2: i ∈ [3,5]}
endif
{3: i ∈ [0,5]}

Join (path merge) is defined by
d1 ⊔ d2 = d1 if d2 = ⊥
d1 ⊔ d2 = d2 else if d1 = ⊥
d1 ⊔ d2 = [min(l1,l2), max(u1,u2)] otherwise
whenever d1 = [l1,u1] and d2 = [l2,u2]

17
Strengthening intervals

Meet is defined by
d1 ⊓ d2 = ⊥ if (d1 = ⊥) ∨ (d2 = ⊥)
d1 ⊓ d2 = ⊥ else if max(l1,l2) > min(u1,u2)
d1 ⊓ d2 = [max(l1,l2), min(u1,u2)] otherwise
whenever d1 = [l1,u1] and d2 = [l2,u2]

{3: i ∈ [0,5]}
if (2 < i) then
  {4: i ∈ [3,5]}
else
  {5: i ∈ [0,2]}

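Join and meet translate directly into code. A minimal sketch, assuming an interval is a Python pair (l, u) with math.inf for unbounded ends and None for ⊥; it reproduces the weakening at point (3) and the strengthening at points (4) and (5), reading the integer guard 2 < i as i ∈ [3,∞]:

```python
import math
from typing import Optional, Tuple

# An interval [l, u] is the pair (l, u) with l <= u; None stands for ⊥.
# Unbounded ends are written -math.inf and math.inf.
Interval = Optional[Tuple[float, float]]


def join(d1: Interval, d2: Interval) -> Interval:
    """Path merge: the smallest interval containing both arguments."""
    if d2 is None:
        return d1
    if d1 is None:
        return d2
    (l1, u1), (l2, u2) = d1, d2
    return (min(l1, l2), max(u1, u2))


def meet(d1: Interval, d2: Interval) -> Interval:
    """Strengthening: the intersection of both arguments, or ⊥ if it is empty."""
    if d1 is None or d2 is None:
        return None
    (l1, u1), (l2, u2) = d1, d2
    l, u = max(l1, l2), min(u1, u2)
    return (l, u) if l <= u else None


point3 = join((0, 2), (3, 5))          # merge at endif
print(point3)                          # (0, 5)
print(meet(point3, (3, math.inf)))     # (3, 5): the branch where 2 < i
print(meet(point3, (-math.inf, 2)))    # (0, 2): the branch where not (2 < i)
```

The meet returns ⊥ when the bounds cross, so an infeasible branch, such as meeting [0,1] with [3,4], is recognised as unreachable rather than producing an ill-formed interval.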
18
Meet and join are the basic primitives for compilation

I1 = [0,0] since program point (1) immediately follows the i := 0
I2 = (I1 ⊔ I3) ⊓ [-∞,15] since
control from program points (1) and (3) flows into (2)
point (2) is reached only if i < 16 holds
I3 = {n+1 | n ∈ I2} since (3) is only reachable from (2) via the increment
I4 = (I1 ⊔ I3) ⊓ [16,∞] since
control from (1) and (3) flows into (4)
point (4) is reached only if ¬(i < 16) holds

19
Interval iteration
20
Jacobi versus Gauss-Seidel iteration

With Jacobi, the new vector ⟨I1′,I2′,I3′,I4′⟩ of intervals is calculated from the old ⟨I1,I2,I3,I4⟩
With Gauss-Seidel iteration
I1′ is calculated from ⟨I1,I2,I3,I4⟩
I2′ is calculated from ⟨I1′,I2,I3,I4⟩
I3′ is calculated from ⟨I1′,I2′,I3,I4⟩
I4′ is calculated from ⟨I1′,I2′,I3′,I4⟩
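The two strategies can be compared on the equations for I1 to I4 of the increment loop. The sketch below encodes intervals as pairs (l, u) with None for ⊥ (all names are illustrative), starts from ⊥ everywhere, iterates until nothing changes, and counts the rounds, including the final round that confirms stability:

```python
import math
from typing import Callable, List, Optional, Tuple

Interval = Optional[Tuple[float, float]]   # (l, u) with l <= u; None is ⊥
Vector = List[Interval]                    # [I1, I2, I3, I4]


def join(d1: Interval, d2: Interval) -> Interval:
    if d1 is None:
        return d2
    if d2 is None:
        return d1
    return (min(d1[0], d2[0]), max(d1[1], d2[1]))


def meet(d1: Interval, d2: Interval) -> Interval:
    if d1 is None or d2 is None:
        return None
    l, u = max(d1[0], d2[0]), min(d1[1], d2[1])
    return (l, u) if l <= u else None


def shift(d: Interval, k: int) -> Interval:
    """Abstract increment: {n + k | n in d}."""
    return None if d is None else (d[0] + k, d[1] + k)


# One right-hand side per program point of the increment loop.
EQUATIONS: List[Callable[[Vector], Interval]] = [
    lambda I: (0, 0),                                   # I1
    lambda I: meet(join(I[0], I[2]), (-math.inf, 15)),  # I2
    lambda I: shift(I[1], 1),                           # I3
    lambda I: meet(join(I[0], I[2]), (16, math.inf)),   # I4
]


def jacobi() -> Tuple[Vector, int]:
    """Every new value is computed from the old vector only."""
    I: Vector = [None] * 4
    rounds = 0
    while True:
        rounds += 1
        new = [eq(I) for eq in EQUATIONS]
        if new == I:
            return I, rounds
        I = new


def gauss_seidel() -> Tuple[Vector, int]:
    """Each equation sees the values already updated in the same round."""
    I: Vector = [None] * 4
    rounds = 0
    while True:
        rounds += 1
        changed = False
        for k, eq in enumerate(EQUATIONS):
            value = eq(I)
            if value != I[k]:
                I[k] = value
                changed = True
        if not changed:
            return I, rounds


print(jacobi())
print(gauss_seidel())
```

Both strategies reach I1 = [0,0], I2 = [0,15], I3 = [1,16], I4 = [16,16]; Gauss-Seidel needs fewer rounds because I3 sees the value of I2 computed in the same round.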

21
Gauss-Seidel versus chaotic iteration

Observe that I4 might change if either I1 or I3 changes, hence evaluate I4 after I1 and I3 stabilise
This suggests waiting until stability is achieved at one level before starting on the next

[Figure: the dependencies between I1, I2, I3 and I4, grouped into the levels I1; then I2, I3; then I4]

22
Gauss-Seidel versus chaotic iteration

Chaotic iteration can postpone evaluating Ii for a bounded number of iterations
I1 is calculated from ⟨I1,-,-,-⟩
I2 and I3 are calculated Gauss-Seidel style from ⟨I1,I2,I3,-⟩
I4 is calculated from ⟨I1,I2,I3,I4⟩
Fast and (incremental) fixpoint solvers [TOPLAS 22(2):187-223, 2000] apply chaotic iteration

23
Research challenge

Compiling to equations and iteration is
well-understood (albeit not well-known)
The implicit assumption is that source is
available
With the advent of component and multi-linguistic
programming, the problem is how to generate the
equations from
A specification of the algorithm or the API
The types of the algorithm or component
In the interim, environments with support for
modularity either
Equip the programmer with an equation language
Or make worst-case assumptions about behaviour

24
Suppose i was decremented rather than incremented
begin
  i := 0;
  {1: i ∈ [0,0]}
  while (i < 16) do
    {2: i ∈ [-∞,0]}
    i := i - 1
    {3: i ∈ [-∞,-1]}
end
{4: i ∈ ⊥}

I1 = [0,0]
I2 = (I1 ⊔ I3) ⊓ [-∞,15]
I3 = {n-1 | n ∈ I2}
I4 = (I1 ⊔ I3) ⊓ [16,∞]

25
Ascending chain condition

A domain D is ACC iff it does not contain an infinite strictly increasing chain d1 < d2 < d3 < … where d < d′ iff d ⊑ d′ and d ≠ d′ (see below)
The interval domain D is ordered by
⊥ ⊑ d for all d ∈ D and
[l1,u1] ⊑ [l2,u2] iff l2 ≤ l1 ∧ u1 ≤ u2
and is not ACC since [0,0] < [-1,0] < [-2,0] < …

[Figure: a lattice with ⊤ at the top, the integers …, -4, …, 0, …, 4, … in the middle and ⊥ at the bottom]

26
Some very expressive relational domains are ACC

Common sub-expression elimination relies on detecting duplicated expression evaluation
Karr [Acta Informatica, 6, 133-151] noticed that detecting an invariance such as y = x/2 + 7 was key to this optimisation

begin
  x := sin(a) * 2;
  y := sin(a) + 7
end

27
The affine domain

The domain of affine equations over n variables is
D = {⟨A,B⟩ | A is an m×n dimensional matrix and B is an m dimensional column vector}
D is ordered by
⟨A1,B1⟩ ⊑ ⟨A2,B2⟩ iff for all x, (if A1x = B1 then A2x = B2)

28
Pre-orders versus posets

A pre-order ⟨D, ⊑⟩ is a set D ordered by a binary relation ⊑ such that
d ⊑ d for all d ∈ D
if d1 ⊑ d2 and d2 ⊑ d3 then d1 ⊑ d3
A poset is a pre-order ⟨D, ⊑⟩ such that
if d1 ⊑ d2 and d2 ⊑ d1 then d1 = d2

29
The affine domain is a pre-order (so it is not
ACC)

Observe ⟨A1,B1⟩ ⊑ ⟨A2,B2⟩ but also ⟨A2,B2⟩ ⊑ ⟨A1,B1⟩
[Figure: the two systems ⟨A1,B1⟩ and ⟨A2,B2⟩]
To build a poset from a pre-order
define d ≡ d′ iff d ⊑ d′ and d′ ⊑ d
define [d]≡ = {d′ ∈ D | d ≡ d′} and D≡ = {[d]≡ | d ∈ D}
define [d]≡ ⊑ [d′]≡ iff d ⊑ d′
The poset ⟨D≡, ⊑⟩ is ACC since chain length is bounded by the number of variables n

30
Inducing termination for non-ACC (and huge ACC)
domains

Enforce convergence for intervals with a widening operator ∇: D × D → D
⊥ ∇ d = d
d ∇ ⊥ = d
[l1,u1] ∇ [l2,u2] = [if l2 < l1 then -∞ else l1, if u1 < u2 then ∞ else u1]
Examples
[1,2] ∇ [1,2] = [1,2]
[1,2] ∇ [1,3] = [1,∞] but [1,3] ∇ [1,2] = [1,3]
Safe since [li,ui] ⊑ ([l1,u1] ∇ [l2,u2]) for i ∈ {1,2}
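The widening operator is only a few lines of code. A sketch that encodes intervals as pairs (l, u), with math.inf for unbounded ends and None for ⊥; the printed values are the three examples above:

```python
import math
from typing import Optional, Tuple

Interval = Optional[Tuple[float, float]]   # None stands for ⊥


def widen(d1: Interval, d2: Interval) -> Interval:
    """Interval widening: any bound that moves in d2 jumps to infinity."""
    if d1 is None:
        return d2
    if d2 is None:
        return d1
    (l1, u1), (l2, u2) = d1, d2
    lower = -math.inf if l2 < l1 else l1
    upper = math.inf if u1 < u2 else u1
    return (lower, upper)


print(widen((1, 2), (1, 2)))   # (1, 2)
print(widen((1, 2), (1, 3)))   # (1, inf)
print(widen((1, 3), (1, 2)))   # (1, 3)
```

∇ is deliberately not commutative: the first argument is the old approximation, and only a bound that grows in the second argument is pushed to infinity.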

31
Chaotic iteration with widening

To terminate it is necessary to traverse each
loop a finite number of times
It is sufficient to pass through I2 or I3 a finite number of times [Bourdoncle, 1990]
Thus widen at I3 since it is simpler

[Figure: dependency graph over I1, I2, I3 and I4]
32
Termination for the decrement

I1 = [0,0]
I2 = (I1 ⊔ I3) ⊓ [-∞,15]
I3 = I3 ∇ {n-1 | n ∈ I2}   (note the fix)
I4 = (I1 ⊔ I3) ⊓ [16,∞]
When I2 = [-1,0] and I3 = [-1,0], then
I3 ∇ {n-1 | n ∈ I2} = [-1,0] ∇ [-2,-1] = [-∞,0]
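Putting the pieces together, the following sketch iterates the decrement equations Gauss-Seidel style with the widening placed at I3, as suggested above. Intervals are pairs with math.inf for unbounded ends and None for ⊥; the helper names are illustrative:

```python
import math
from typing import List, Optional, Tuple

Interval = Optional[Tuple[float, float]]   # (l, u) with l <= u; None is ⊥


def join(d1: Interval, d2: Interval) -> Interval:
    if d1 is None:
        return d2
    if d2 is None:
        return d1
    return (min(d1[0], d2[0]), max(d1[1], d2[1]))


def meet(d1: Interval, d2: Interval) -> Interval:
    if d1 is None or d2 is None:
        return None
    l, u = max(d1[0], d2[0]), min(d1[1], d2[1])
    return (l, u) if l <= u else None


def widen(d1: Interval, d2: Interval) -> Interval:
    if d1 is None:
        return d2
    if d2 is None:
        return d1
    return (-math.inf if d2[0] < d1[0] else d1[0],
            math.inf if d1[1] < d2[1] else d1[1])


def shift(d: Interval, k: int) -> Interval:
    """{n + k | n in d}; infinite bounds stay infinite."""
    return None if d is None else (d[0] + k, d[1] + k)


def solve_decrement() -> List[Interval]:
    """Gauss-Seidel iteration for the decrement loop, widening at I3."""
    I: List[Interval] = [None] * 4
    while True:
        old = list(I)
        I[0] = (0, 0)
        I[1] = meet(join(I[0], I[2]), (-math.inf, 15))
        I[2] = widen(I[2], shift(I[1], -1))    # I3 = I3 ∇ {n-1 | n ∈ I2}
        I[3] = meet(join(I[0], I[2]), (16, math.inf))
        if I == old:
            return I


print(solve_decrement())   # [(0, 0), (-inf, 0), (-inf, -1), None]
```

Without the widening at I3 the same loop would not terminate, since the lower bound of I2 would decrease by one on every round.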

33
Widening dynamic data-structures
[Figure: successive approximations of p drawn as trees built from cons, or, 0, 1, 2 and nil nodes]
begin
  i := 0;
  p := nil;
  while (i < 16) do
    i := i + 1;
    p := new cons(i, p)
    {1: p ∈ cons(i, cons(0, nil))}
end

34
Depth-2 versus type-graph widening
[Figure: depth-2 and type-graph approximations of the list, built from cons, or, 0, 1, 2, nil and any nodes]

Type-graph widening is more compact
Type-graph widening becomes difficult when a list
contains lists as its elements
In constraint-based analysis, widening is
dispensed with altogether

35
(Malicious) research challenge

Read a survey paper to find an abstract domain that is ACC but has a maximal chain length of O(2^n)
Construct a program with O(n) symbols that iterates through all O(2^n) abstractions
Publish the program in IPL

36
Not all numeric domains are convex

A set S ⊆ R^n is convex iff for all x,y ∈ S it follows that {λx + (1-λ)y | 0 ≤ λ ≤ 1} ⊆ S
The 2 leftmost sets in R^2 are convex but the 2 rightmost sets are not.

37
Are intervals or affine equations convex?

Suppose the values of n variables are represented by n intervals [l1,u1], …, [ln,un]
Suppose x = ⟨x1,…,xn⟩, y = ⟨y1,…,yn⟩ ∈ R^n are described by the intervals
Then each li ≤ xi ≤ ui and each li ≤ yi ≤ ui
Let 0 ≤ λ ≤ 1 and observe z = λx + (1-λ)y = ⟨λx1 + (1-λ)y1, …, λxn + (1-λ)yn⟩
Therefore li ≤ min(xi, yi) ≤ λxi + (1-λ)yi ≤ max(xi, yi) ≤ ui and convexity follows

38
Arithmetic congruences are not convex

Elements of the arithmetic congruence (AC) domain take the form x + 2y = 1 (mod 3), which describes integral values of x and y
More exactly, the AC domain consists of conjunctions of equations of the form c1x1 + … + cmxm = (c mod n) where ci, c ∈ Z and n ∈ N
Incredibly, AC is ACC [IJCM, 30, 165-190, 1989]
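Non-convexity is easy to witness concretely. Assuming the example congruence reads x + 2y = 1 (mod 3), the following sketch finds two solutions whose connecting segment passes through integer points that are not solutions:

```python
def satisfies(x: int, y: int) -> bool:
    """Membership test for the congruence x + 2y = 1 (mod 3)."""
    return (x + 2 * y) % 3 == 1


# (1, 0) and (4, 0) are both solutions ...
assert satisfies(1, 0) and satisfies(4, 0)

# ... but no integer point strictly between them on the segment is,
# so the set of solutions is not convex.
between = [(x, 0) for x in range(2, 4)]
print([p for p in between if satisfies(*p)])   # []
```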

39
Research challenge

Søndergaard [FSTTCS'95] introduced the concept of an immediate fixpoint
Consider the following (groundness) dependency equations over the domain of Boolean functions ⟨Bool, ⊨, ∨⟩
f1 = x ↔ (y ∧ z)
f2 = ∃t(∃x(∃z((u ↔ (t ∧ x)) ∧ (v ↔ (t ∧ z)) ∧ f4)))
f3 = ∃u(∃v((x ↔ u) ∧ (z ↔ v) ∧ f2))
f4 = f1 ∨ f3
where ∃x(f) = f[x ↦ true] ∨ f[x ↦ false], thus ∃x(x ↔ y) = true and ∃x(x ∧ y) = y
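The definition of ∃x can be checked by brute force over truth tables. In this sketch a Boolean function is represented, for illustration only, as a Python predicate over variable assignments; both identities are confirmed by comparing the two sides on every assignment:

```python
from itertools import product
from typing import Callable, Dict, List

Env = Dict[str, bool]
BoolFn = Callable[[Env], bool]


def exists(x: str, f: BoolFn) -> BoolFn:
    """∃x(f) = f[x ↦ true] ∨ f[x ↦ false]."""
    return lambda env: f({**env, x: True}) or f({**env, x: False})


def equivalent(f: BoolFn, g: BoolFn, variables: List[str]) -> bool:
    """Compare two Boolean functions on every assignment of the given variables."""
    for values in product([False, True], repeat=len(variables)):
        env = dict(zip(variables, values))
        if f(env) != g(env):
            return False
    return True


iff = lambda env: env["x"] == env["y"]       # x ↔ y
conj = lambda env: env["x"] and env["y"]     # x ∧ y

print(equivalent(exists("x", iff), lambda env: True, ["x", "y"]))       # True
print(equivalent(exists("x", conj), lambda env: env["y"], ["x", "y"]))  # True
```

Truth tables grow exponentially with the number of variables, so this representation is only suitable for tiny examples like these.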

40
The alternative tactic

The standard tactic is to apply iteration
Søndergaard found that the system can be solved
symbolically (like a quadratic)
This would be very useful for infinite domains
for improved precision and predictability

41
Combining analyses

Verifiers and optimisers are often multi-pass,
built from several separate analyses
Should the analyses be performed in parallel or
in sequence?
Analyses can interact to improve one another (the problem is in the complexity of the interaction [Pratt])

42
Pruning combined domains

Suppose that ∝1 ⊆ D1 × C and ∝2 ⊆ D2 × C; then how is D = D1 × D2 interpreted?
Then ⟨d1,d2⟩ ∝ c iff d1 ∝1 c ∧ d2 ∝2 c
Ideally, many ⟨d1,d2⟩ ∈ D will be redundant, that is, ¬∃c ∈ C . d1 ∝1 c ∧ d2 ∝2 c

43
Time versus precision from [TOPLAS 17(1):28-44, 1993]
44
The Galois framework

Abstract interpretation is often presented in
terms of Galois connections

45
Lattices: a prelude to Galois connections

Suppose ⟨S, ⊑⟩ is a poset
A mapping ⊔: S × S → S is a join (least upper bound) iff
a ⊔ b is an upper bound of a and b, that is, a ⊑ a ⊔ b and b ⊑ a ⊔ b for all a,b ∈ S
a ⊔ b is the least upper bound, that is, if c ∈ S is an upper bound of a and b, then a ⊔ b ⊑ c
The definition of the meet ⊓: S × S → S (the greatest lower bound) is analogous

46
Complete lattices

A lattice ⟨S, ⊑, ⊔, ⊓⟩ is a poset ⟨S, ⊑⟩ equipped with a join ⊔ and a meet ⊓
The join concept can often be lifted to sets by defining ⊔: ℘(S) → S iff
t ⊑ (⊔T) for all T ⊆ S and for all t ∈ T
if t ⊑ s for all t ∈ T then (⊔T) ⊑ s
If both join and meet can be lifted to arbitrary subsets in this way, then the lattice is complete
A lattice that contains a finite number of elements is always complete

47
A lattice that is not complete

A hyperplane in 2-d space is a line and in 3-d space is a plane
A hyperplane in R^n is any space that can be defined by {x ∈ R^n | c1x1 + … + cnxn = c} where c1,…,cn,c ∈ R
A halfspace in R^n is any space that can be defined by {x ∈ R^n | c1x1 + … + cnxn ≤ c}
A polyhedron is the intersection of a finite number of half-spaces

48
Examples and non-examples in planar space
49
Join for polyhedra

The join of polyhedra P1 and P2 in R^n coincides with (the topological closure of) the convex hull of P1 ∪ P2
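For bounded polyhedra in the plane the join can be computed from vertices. This sketch assumes each polygon is given by its vertex list (rather than by half-spaces) and uses Andrew's monotone chain algorithm, a standard convex hull method chosen here for brevity; unbounded polyhedra, where the topological closure matters, are not handled:

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """z-component of (a - o) × (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]   # drop the endpoints shared by both chains


def join_polytopes(p1: List[Point], p2: List[Point]) -> List[Point]:
    """Join of two bounded polygons given by their vertices."""
    return convex_hull(p1 + p2)


square1 = [(0, 0), (1, 0), (1, 1), (0, 1)]
square2 = [(2, 2), (3, 2), (3, 3), (2, 3)]
print(join_polytopes(square1, square2))
```

The result is a hexagon: the inner corners (1, 1) and (2, 2) disappear, and the join also contains points, such as (1.5, 1.5), that lie in neither square.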

50
The join of an infinite set of polyhedra

Consider the following infinite chain of regular polyhedra
The smallest closed convex space that contains all these polyhedra is a circle (disc), yet this is not polyhedral

51
⟨A, α, C, γ⟩ is a Galois connection whenever

⟨A, ⊑A⟩ and ⟨C, ⊑C⟩ are complete lattices
The mappings α: C → A and γ: A → C are monotonic, that is,
if c1 ⊑C c2 then α(c1) ⊑A α(c2)
if a1 ⊑A a2 then γ(a1) ⊑C γ(a2)
The compositions γ∘α: C → C and α∘γ: A → A are extensive and reductive respectively, that is,
c ⊑C (γ∘α)(c) for all c ∈ C
(α∘γ)(a) ⊑A a for all a ∈ A

52
A classic Galois connection example

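The concrete domain ⟨C, ⊑C, ⊔C, ⊓C⟩ is ⟨℘(Z), ⊆, ∪, ∩⟩
The abstract domain ⟨A, ⊑A, ⊔A, ⊓A⟩ where
A = {⊥, +, -, ⊤}
⊥ ⊑A a ⊑A ⊤ for all a ∈ A
join ⊔A and meet ⊓A are the least upper and greatest lower bounds with respect to ⊑A, so that, for example, + ⊔A - = ⊤ and + ⊓A - = ⊥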

53
The relationship between A and C

The concretisation mapping γ: A → C is defined
γ(⊥) = Ø
γ(+) = {n ∈ Z | n > 0}
γ(-) = {n ∈ Z | n < 0}
γ(⊤) = Z
The abstraction mapping α: C → A is defined
α(S) = ⊥ if S = Ø
α(S) = + else if n > 0 for all n ∈ S
α(S) = - else if n < 0 for all n ∈ S
α(S) = ⊤ otherwise

54
Avoiding repetition

Can define α with γ and vice versa
α(S) = ⊓A{a ∈ A | S ⊆ γ(a)}
And dually γ(a) = ∪{S ⊆ Z | α(S) ⊑A a}
As an example consider α({1,2})
{1,2} ⊆ γ(⊤) ✓
{1,2} ⊆ γ(+) ✓
{1,2} ⊆ γ(-) ✗
{1,2} ⊆ γ(⊥) ✗
Therefore α({1,2}) = ⊓A{+, ⊤} = +
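The construction of α from γ can be run directly for finite sets S. In this sketch γ(a) is represented by a membership test, since γ(+), γ(-) and γ(⊤) are infinite; the encoding of A as strings and all function names are assumptions for illustration:

```python
from typing import Callable, Dict, Iterable, Set

# The abstract domain A = {⊥, +, -, ⊤}, written as strings.
BOT, POS, NEG, TOP = "⊥", "+", "-", "⊤"
A = [BOT, POS, NEG, TOP]

# γ(a) is infinite for +, - and ⊤, so it is represented by its membership test.
GAMMA: Dict[str, Callable[[int], bool]] = {
    BOT: lambda n: False,
    POS: lambda n: n > 0,
    NEG: lambda n: n < 0,
    TOP: lambda n: True,
}


def leq(a: str, b: str) -> bool:
    """⊥ ⊑ a ⊑ ⊤ for all a; + and - are incomparable."""
    return a == b or a == BOT or b == TOP


def meet_all(elems: Iterable[str]) -> str:
    """Greatest lower bound (⊓A) of a non-empty collection of elements of A."""
    elems = list(elems)
    lower_bounds = [b for b in A if all(leq(b, e) for e in elems)]
    for a in lower_bounds:
        if all(leq(b, a) for b in lower_bounds):
            return a
    raise ValueError("no greatest lower bound")


def alpha(S: Set[int]) -> str:
    """α(S) = ⊓A{a ∈ A | S ⊆ γ(a)} for a finite set S of integers."""
    return meet_all(a for a in A if all(GAMMA[a](n) for n in S))


print(alpha({1, 2}))     # +
print(alpha({-3}))       # -
print(alpha({-1, 1}))    # ⊤
print(alpha(set()))      # ⊥
```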

55
Collecting domains and semantics

Observe that C is not that concrete: programs include operations such as ×: Z × Z → Z
C = ℘(Z) is a collecting domain which is easier to abstract than Z since it is already a lattice
To abstract ×: Z × Z → Z, say, we synthesise a collecting version ×C: ℘(Z) × ℘(Z) → ℘(Z) and then abstract that
Put S1 ×C S2 = {n1 × n2 | n1 ∈ S1 and n2 ∈ S2}

56
Safety and optimality requirements

Safety requires α(γ(a1) ×C γ(a2)) ⊑A a1 ×A a2 for all a1,a2 ∈ A
Optimality [POPL, 269-282, 1979] also requires a1 ×A a2 ⊑A α(γ(a1) ×C γ(a2))
Arguing optimality is harder than safety since rare-case approximation can simplify a tricky argument [JLP]

57
Abstract multiplication

Consider safety for α(γ(+) ×C γ(+)) ⊑A + ×A +
Recall γ(+) = {n ∈ Z | n > 0}
Thus γ(+) ×C γ(+) = {n1 × n2 | n1, n2 > 0} ⊆ γ(+)
Hence α(γ(+) ×C γ(+)) ⊑A + = + ×A +
Need + ×A + ⊑A α(γ(+) ×C γ(+)) for optimality
Recall α(γ(+) ×C γ(+)) ⊑A +
Hence α(γ(+) ×C γ(+)) ∈ {⊥, +}
But γ(+) ≠ Ø, thus γ(+) ×C γ(+) ≠ Ø
Therefore α(γ(+) ×C γ(+)) ≠ ⊥, so it is +
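Safety and optimality of a sign multiplication can also be probed by sampling. The table for ×A below is an assumption (only + ×A + is discussed above), and γ(a) is approximated by its members in the window -3..3. Sampling can only refute safety; however, if safety holds, α(sample) = a1 ×A a2 forces optimality, because the sample lies inside γ(a1) ×C γ(a2) and α is monotonic:

```python
from itertools import product
from typing import Dict, Set

BOT, POS, NEG, TOP = "⊥", "+", "-", "⊤"
A = [BOT, POS, NEG, TOP]


def leq(a: str, b: str) -> bool:
    return a == b or a == BOT or b == TOP


def alpha(S: Set[int]) -> str:
    """The abstraction mapping α for a finite set of integers."""
    if not S:
        return BOT
    if all(n > 0 for n in S):
        return POS
    if all(n < 0 for n in S):
        return NEG
    return TOP


def times_a(a1: str, a2: str) -> str:
    """A candidate abstract multiplication ×A on signs (an assumed table)."""
    if BOT in (a1, a2):
        return BOT
    if TOP in (a1, a2):
        return TOP
    return POS if a1 == a2 else NEG


def times_c(S1: Set[int], S2: Set[int]) -> Set[int]:
    """Collecting multiplication S1 ×C S2 = {n1 * n2 | n1 ∈ S1, n2 ∈ S2}."""
    return {n1 * n2 for n1 in S1 for n2 in S2}


# Finite samples of γ(a), drawn from the window -3..3 (an assumption of this sketch).
WINDOW = range(-3, 4)
SAMPLE: Dict[str, Set[int]] = {
    BOT: set(),
    POS: {n for n in WINDOW if n > 0},
    NEG: {n for n in WINDOW if n < 0},
    TOP: set(WINDOW),
}

for a1, a2 in product(A, A):
    observed = alpha(times_c(SAMPLE[a1], SAMPLE[a2]))
    assert leq(observed, times_a(a1, a2)), (a1, a2)   # no counterexample to safety
    assert observed == times_a(a1, a2), (a1, a2)      # witnesses optimality
print("no counterexample to safety; optimal on every pair")
```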

58
Exotic applications of abstract interpretation

Recovering programmer intentions for
understanding undocumented or third-party code
Verifying that a buffer overflow cannot occur, or pin-pointing where one might occur in a C program
Inferring the environment in which a system of synchronising agents will not deadlock
Lower-bound time-complexity analysis for
granularity throttling
Binding-time analysis for inferring off-line
unfolding decisions which avoid code-bloat

59
Pointers to the literature

SAS, POPL, ESOP, ICLP, ICFP, …
Useful review articles and books
Patrick and Radhia Cousot, Comparing the Galois Connection and Widening/Narrowing Approaches to Abstract Interpretation, PLILP, LNCS 631, 269-295, 1992. Available from LIX library.
Patrick and Radhia Cousot, Abstract Interpretation and Application to Logic Programs, JLP, 13(2-3):103-179, 1992.
Flemming Nielson, Hanne Riis Nielson and Chris Hankin, Principles of Program Analysis, Springer, 1999.
Patrick has a database of abstract interpretation researchers and regularly writes tutorials, see [CC'02].

60
Appendix: SAT solving

SAT is not a form of abstract interpretation, but abstraction and abstract interpretation are often used to reduce a verification problem to a satisfiability checking problem
Acknowledgments: much of this material is adapted from the review article, The Quest for Efficient Boolean Satisfiability Solvers, by Zhang and Malik, 2002.

61
The SAT problem

Given an arbitrary propositional formula, f say, does there exist a variable assignment (a model) under which f evaluates to true?
One model for f = (x ∧ y) is {x ↦ true, y ↦ true}
SAT is the stereotypic NP-complete problem, but this does not preclude the existence of efficient SAT algorithms for certain SAT instances
Stålmarck [US Patent N527689, 1995] and applications in AI planning, software verification and circuit testing have promoted a resurgence of interest in SAT

62
The other type of completeness

A SAT algorithm is said to be complete iff (given
enough resource) it will either
compute a satisfying variable assignment or
verify that no such assignment exists
A SAT algorithm is incomplete (stochastic) iff
unsatisfiability cannot always be detected
Trade incompleteness for speed when a solution is
very likely to exist (planning applications).
In program verification (partial) correctness
often follows by proving unsatisfiability

63
The Davis-Logemann-Loveland (DPLL) approach

1st generation solvers such as POSIT, 2cl, CSAT, etc are based on DPLL, as are the 2nd generation solvers such as SATO and zChaff which tune DPLL
Davis and Putnam [JACM, 7:201-215, 1960] proposed resolution for Boolean SAT
DLL [CACM, 5:394-397, 1962] replaced resolution with search to improve memory usage (special case)
CNF is used to simplify unsatisfiability checking
conversion is polynomial [JSC, 2:293-304, 1986]
CNF is a conjunction of clauses, for example, (x ↔ y) = (x → y) ∧ (y → x) = (x ∨ ¬y) ∧ (¬x ∨ y)

64
The Davis-Logemann-Loveland (DPLL) algorithm

bool function DPLL(f, θ)
begin
  ⟨fail, θ⟩ := unit(f, θ);
  if (fail) return false;
  if (satisfied(f, θ)) return true
  else if (unsatisfied(f, θ)) return false
  else
  begin
    let x ∈ var(f) - var(θ);
    if (DPLL(f, θ ∪ {x ↦ true})) return true
    else return DPLL(f, θ ∪ {x ↦ false})
  end
end

unit applies unit propagation, possibly detecting unsatisfiability
satisfied returns true if at least one literal in each clause is true
unsatisfied returns true if there exists a clause with every literal false
non-determinacy is in the choice of variable
a stack is used for the search
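A direct Python transcription of the pseudocode, as a sketch: clauses are lists of DIMACS-style integer literals (3 for x3, -3 for ¬x3), θ is a dict from variables to booleans, and unlike the pseudocode the function returns the model (or None) rather than a bool. It picks the first unassigned variable it finds, which is one arbitrary resolution of the non-determinacy:

```python
from typing import Dict, List, Optional, Tuple

Clause = List[int]             # DIMACS-style literals: 3 means x3, -3 means ¬x3
Assignment = Dict[int, bool]


def value(lit: int, theta: Assignment) -> Optional[bool]:
    """Truth value of a literal under a partial assignment (None if unassigned)."""
    v = theta.get(abs(lit))
    if v is None:
        return None
    return v if lit > 0 else not v


def unit(f: List[Clause], theta: Assignment) -> Tuple[bool, Assignment]:
    """Unit propagation; returns (fail, extended assignment)."""
    theta = dict(theta)
    changed = True
    while changed:
        changed = False
        for clause in f:
            values = [value(l, theta) for l in clause]
            if True in values:
                continue
            unassigned = [l for l, v in zip(clause, values) if v is None]
            if not unassigned:
                return True, theta            # every literal false: conflict
            if len(unassigned) == 1:          # unit clause rule
                lit = unassigned[0]
                theta[abs(lit)] = lit > 0
                changed = True
    return False, theta


def satisfied(f: List[Clause], theta: Assignment) -> bool:
    return all(any(value(l, theta) is True for l in c) for c in f)


def unsatisfied(f: List[Clause], theta: Assignment) -> bool:
    return any(all(value(l, theta) is False for l in c) for c in f)


def dpll(f: List[Clause], theta: Assignment) -> Optional[Assignment]:
    """Return a model extending theta, or None if there is none."""
    fail, theta = unit(f, theta)
    if fail or unsatisfied(f, theta):
        return None
    if satisfied(f, theta):
        return theta
    x = next(abs(l) for c in f for l in c if abs(l) not in theta)
    for choice in (True, False):
        model = dpll(f, {**theta, x: choice})
        if model is not None:
            return model
    return None


# (x ↔ y) in CNF, with x = 1 and y = 2, plus the unit clause ¬y.
print(dpll([[1, -2], [-1, 2], [-2]], {}))   # {2: False, 1: False}
print(dpll([[1], [-1]], {}))                # None
```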

65
Unit propagation

Unit clause rule: if all the literals but one are false, then the remaining literal is set to true
Many SAT solvers use a counter scheme [Crawford, AAAI, 1993] that uses
one counter per clause to track the number of false literals in each clause
if a count reaches the total number of literals, then unsatisfiability has been detected
otherwise, if it is one less, then the remaining literal is set
Each assignment updates many counts, so pointer-based schemes are used within SATO and zChaff [Gu et al, DIMACS series DMTCS, 1997]
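The counter scheme can be sketched as follows. The class and method names are hypothetical, clauses are assumed to contain no repeated literals, and there is no trail, so assignments cannot be undone on backtracking:

```python
from collections import defaultdict
from typing import Dict, List, Optional


class CounterPropagator:
    """Counter-based unit propagation; a sketch without undo for backtracking."""

    def __init__(self, clauses: List[List[int]]) -> None:
        self.clauses = clauses
        self.false_count = [0] * len(clauses)    # one counter per clause
        self.assignment: Dict[int, bool] = {}
        self.occurs: Dict[int, List[int]] = defaultdict(list)
        for i, clause in enumerate(clauses):
            for lit in clause:
                self.occurs[lit].append(i)       # clauses containing each literal

    def value(self, lit: int) -> Optional[bool]:
        v = self.assignment.get(abs(lit))
        return None if v is None else (v if lit > 0 else not v)

    def assign(self, lit: int) -> bool:
        """Make lit true and propagate; return False if a conflict is found."""
        queue = [lit]
        while queue:
            lit = queue.pop()
            current = self.value(lit)
            if current is True:
                continue
            if current is False:
                return False                     # forced both ways
            self.assignment[abs(lit)] = lit > 0
            for i in self.occurs[-lit]:          # clauses where a literal just became false
                self.false_count[i] += 1
                clause = self.clauses[i]
                if self.false_count[i] == len(clause):
                    return False                 # every literal false
                if self.false_count[i] == len(clause) - 1:
                    (rest,) = [l for l in clause if self.value(l) is not False]
                    if self.value(rest) is None:
                        queue.append(rest)       # unit clause rule
        return True


p = CounterPropagator([[1, 2], [-2, 3], [-3, -1]])
print(p.assign(-1), p.assignment)   # True {1: False, 2: True, 3: True}
```

Only the clauses containing the newly falsified literal are touched, but every one of them is, which is the cost that pointer-based schemes aim to avoid.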

66
Choices, choices

If variables remain uninstantiated after propagation, then resort to random binding
Better to rank variables by the number of times they occur in clauses which are not (yet) true
But a variable in 128 clauses each with 2 uninstantiated variables is a better candidate than another in 128 clauses each with 32 uninstantiated variables
But what about the overhead of ranking, especially with learnt clauses?
But what about trailing for backtracking?
But what about intelligent back-jumping?
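The ranking idea can be made concrete with a weighting in the spirit of the Jeroslow-Wang heuristic: each occurrence in a not-yet-true clause contributes 2^-k, where k is the number of uninstantiated literals in that clause. This particular weighting is an assumption for illustration, not necessarily what any solver mentioned here uses:

```python
from collections import defaultdict
from typing import Dict, List, Optional


def choose_variable(clauses: List[List[int]], theta: Dict[int, bool]) -> Optional[int]:
    """Pick the unassigned variable with the highest weighted occurrence count.

    Only clauses that are not (yet) true are counted, and an occurrence in a
    clause with k uninstantiated literals contributes 2 ** -k.
    """
    score: Dict[int, float] = defaultdict(float)
    for clause in clauses:
        # None for an unassigned literal, otherwise its truth value under theta
        values = [theta[abs(l)] == (l > 0) if abs(l) in theta else None for l in clause]
        if True in values:
            continue                             # clause already true: ignore it
        unassigned = [l for l, v in zip(clause, values) if v is None]
        for lit in unassigned:
            score[abs(lit)] += 2.0 ** -len(unassigned)
    return max(score, key=score.__getitem__) if score else None


fresh = iter(range(3, 100))                      # fresh variable numbers
short = [[1, next(fresh)] for _ in range(3)]     # x1 in three 2-literal clauses
long_clauses = [[2] + [next(fresh) for _ in range(4)] for _ in range(3)]  # x2 in three 5-literal clauses
print(choose_variable(short + long_clauses, {}))  # 1
```

Variables 1 and 2 occur equally often, yet variable 1, which lives in short clauses, outranks variable 2, mirroring the 128-clause comparison above.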