Introduction to Abstract Interpretation - PowerPoint PPT Presentation

1
Introduction to Abstract Interpretation
  • Neil Kettle, Andy King and Axel Simon
  • a.m.king_at_kent.ac.uk
  • http://www.cs.kent.ac.uk/amk
  • Acknowledgments: much of this material has been
    adapted from surveys by Patrick and Radhia Cousot

2
Applications of abstract interpretation
  • Verification: can a concurrent program deadlock?
    Is termination assured?
  • Parallelisation: are two or more tasks
    independent? What is the worst/best-case running
    time of a function?
  • Transformation: can a definition be unfolded?
    Will unfolding terminate?
  • Implementation: can an operation be specialised
    with knowledge of its (global) calling context?
  • Applications and players are incredibly diverse

3
House-keeping
4
Computing Lab Xmas Party
  • Located in Origins, the restaurant in Darwin
  • A buffet lunch will be served courtesy of the
    department
  • The department will supply some wine (which last
    year lasted 10 minutes)
  • The bar will be open afterwards if the wine is
    not enough
  • Send an e-mail to Deborah Sowrey
    D.J.Sowery_at_kent.ac.uk if you want to attend
  • Come along and meet other post-grads

5
Casting out nines algorithm
  • Which of the following multiplications is
    correct?
  • 2173 × 38 = 81574 or
  • 2173 × 38 = 82574
  • Casting out nines is a checking technique that is
    really a form of abstract interpretation
  • Sum the digits in the multiplicand n1, multiplier
    n2 and the product n to obtain s1, s2 and s.
  • Divide s1, s2 and s by 9 to compute the
    remainders, that is, r1 = s1 mod 9, r2 = s2 mod 9
    and r = s mod 9.
  • If (r1 × r2) mod 9 ≠ r then the multiplication is
    incorrect
  • The algorithm returns either incorrect or don't know
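The check can be sketched in Python as follows. The names `digit_sum` and `cast_out_nines` are hypothetical. The function deliberately answers only "incorrect" or "don't know" and never "correct", mirroring the two outcomes of the algorithm.

```python
def digit_sum(n: int) -> int:
    """Sum the decimal digits of a non-negative integer."""
    return sum(int(d) for d in str(n))


def cast_out_nines(n1: int, n2: int, n: int) -> str:
    """Check the claim n1 * n2 == n by casting out nines.

    Returns "incorrect" when the remainders refute the claim and
    "don't know" otherwise; the check can never confirm a product.
    """
    r1 = digit_sum(n1) % 9  # abstraction of the multiplicand
    r2 = digit_sum(n2) % 9  # abstraction of the multiplier
    r = digit_sum(n) % 9    # abstraction of the claimed product
    if (r1 * r2) % 9 != r:
        return "incorrect"
    return "don't know"


print(cast_out_nines(2173, 38, 81574))  # incorrect
print(cast_out_nines(2173, 38, 82574))  # don't know
```

Swapping two digits of a product leaves its digit sum unchanged. For example, 82475 also yields "don't know" even though 2173 × 38 = 82574. This is exactly the imprecision that the "don't know" answer admits.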

6
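Running the numbers for 2173 × 38 = 81574
  • Compute r1 = (2 + 1 + 7 + 3) mod 9 = 13 mod 9 = 4
  • Compute r2 = (3 + 8) mod 9 = 11 mod 9 = 2
  • Calculate (r1 × r2) mod 9 = 8
  • Calculate r = (8 + 1 + 5 + 7 + 4) mod 9 = 25 mod 9 = 7
  • Check ((r1 × r2) mod 9 = r), that is, 8 = 7, which fails
  • Deduce that 2173 × 38 = 81574 is incorrect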

7
Abstract interpretation is a theory of
relationships
  • The computational domain for multiplication
    (concrete domain)
  • ℕ, the set of non-negative integers
  • The computational domain of remainders used in
    the checking algorithm (abstract domain)
  • R = {0, 1, …, 8}
  • The key question is: what is the relationship
    between an element n ∈ ℕ which is used in the real
    algorithm and its analog r ∈ R in the check?

8
What is the relationship?
  • When the multiplicand is n1 = 456, say, then the
    check uses r1 = (4 + 5 + 6) mod 9 = 6
  • Observe that
  • 456 mod 9
  • = (4·100 + 56) mod 9
  • = (4·90 + 4·10 + 56) mod 9
  • = (4·10 + 56) mod 9
  • = ((4 + 5)·10 + 6) mod 9
  • = ((4 + 5)·9 + (4 + 5) + 6) mod 9
  • = (4 + 5 + 6) mod 9
  • More generally, induction can show r1 = n1 mod 9
    and r2 = n2 mod 9

9
Correctness is the preservation of relationships
  • The check simulates the concrete multiplication
    and, in effect, is an abstract multiplication
  • Concrete multiplication is n = n1 × n2
  • Abstract multiplication is r = (r1 × r2) mod 9
  • where r1 describes n1 and r2 describes n2
  • For brevity, write r ∝ n iff r = n mod 9
  • Then abstract multiplication preserves ∝ iff
    whenever r1 ∝ n1 and r2 ∝ n2 it follows that r ∝ n

10
Correctness argument
  • Suppose r1 ∝ n1 and r2 ∝ n2
  • If
  • n = n1 × n2 then
  • n mod 9 = (n1 × n2) mod 9, hence
  • n mod 9 = ((n1 mod 9) × (n2 mod 9)) mod 9, whence
  • n mod 9 = (r1 × r2) mod 9 = r, therefore
  • r ∝ n
  • Consequently if ¬(r ∝ n) then n ≠ n1 × n2
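The argument above is the proof. As a complement, a brute-force test of the same preservation property on small numbers can be written in Python. All names are hypothetical and the bound `limit` is arbitrary, so passing the test is evidence, not a proof.

```python
from typing import Callable

Op = Callable[[int, int], int]


def describes(r: int, n: int) -> bool:
    """The relation r ∝ n, i.e. r == n mod 9."""
    return r == n % 9


def preserves(abstract_op: Op, concrete_op: Op, limit: int = 60) -> bool:
    """Test, for all n1, n2 < limit and all remainders r1, r2, that
    r1 ∝ n1 and r2 ∝ n2 imply abstract_op(r1, r2) ∝ concrete_op(n1, n2)."""
    for n1 in range(limit):
        for n2 in range(limit):
            for r1 in range(9):
                for r2 in range(9):
                    if describes(r1, n1) and describes(r2, n2):
                        if not describes(abstract_op(r1, r2), concrete_op(n1, n2)):
                            return False
    return True


def abstract_mul(r1: int, r2: int) -> int:
    return (r1 * r2) % 9


def abstract_add(r1: int, r2: int) -> int:
    return (r1 + r2) % 9


def concrete_mul(n1: int, n2: int) -> int:
    return n1 * n2


def concrete_add(n1: int, n2: int) -> int:
    return n1 + n2


print(preserves(abstract_mul, concrete_mul))  # True
print(preserves(abstract_add, concrete_add))  # True
print(preserves(abstract_add, concrete_mul))  # False
```

The last call pairs remainder addition with concrete multiplication. It breaks the relation, which is the kind of unsound abstract operation such a test would expose.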

11
Summary
  • Formalise the relationship between the data
  • Check that the relationship is preserved by the
    abstract analogues of the concrete operations
  • The relational framework [Acta Informatica,
    30(2):103-129, 1993] not only emphasises the theory
    of relations but is very general

12
Numeric approximation and widening
  • Abstract interpretation does not require a domain
    to be finite

13
Interval approximation
  • Consider the following Pascal-like program
  • SYNTOX [PLDI'90] inferred the invariants that
    annotate the program below
  • Invariants occur between consecutive lines in the
    program
  • i ∈ [0,15] asserts 0 ≤ i ≤ 15 whereas i ∈ [0,0] means i = 0

begin
  i := 0;              { (1) i ∈ [0,0] }
  while (i < 16) do
                       { (2) i ∈ [0,15] }
    i := i + 1         { (3) i ∈ [1,16] }
end                    { (4) i ∈ [16,16] }
14
Compilation versus (classic) interpretation
  • Abstract compilation: compile the concrete
    program into an abstract program (equation
    system) and execute the abstract program
  • good separation of concerns that aids debugging
  • the particulars of the domain can be exploited to
    reorder operations, specialise operations, etc.
  • Abstract interpretation: run the concrete
    program but interpret its concrete operations
    on-the-fly as abstract operations
  • ideal for a generic framework (toolkit) which is
    parameterised by abstract domain plugins

15
Abstract domain that is used in interval analysis
  • The domain of intervals includes
  • [l,u] where l ≤ u and l,u ∈ ℤ for bounded sets, i.e.
    [0,5] ∝ {0,1,4} since {0,1,4} ⊆ [0,5]
  • ⊥ to represent the empty set of numbers, that is,
    ⊥ ∝ ∅
  • [l,∞] for sets which are bounded below such as
    {l, l+2, l+4, …}
  • [-∞,u] to represent sets which are bounded above
    such as {…, u-5, u-3, u}

16
Weakening intervals
if … then
  { (1) i ∈ [0,2] }
else
  { (2) i ∈ [3,5] }
endif
{ (3) i ∈ [0,5] }
  • Join (path merge) is defined by putting
  • d1 ⊔ d2 = d1 if d2 = ⊥
  • d1 ⊔ d2 = d2 else if d1 = ⊥
  • d1 ⊔ d2 = [min(l1,l2), max(u1,u2)]
    otherwise
  • whenever d1 = [l1,u1] and d2 = [l2,u2]

17
Strengthening intervals
  • Meet is defined by putting
  • d1 ⊓ d2 = ⊥ if (d1 = ⊥) ∨ (d2 = ⊥) or if
    max(l1,l2) > min(u1,u2)
  • d1 ⊓ d2 = [max(l1,l2), min(u1,u2)] otherwise
  • whenever d1 = [l1,u1] and d2 = [l2,u2]

{ (3) i ∈ [0,5] }
if (2 < i) then
  { (4) i ∈ [3,5] }
else
  { (5) i ∈ [0,2] }
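Join and meet on intervals can be sketched in Python as below. Two representation choices are assumptions, not taken from the slides: ⊥ is represented by `None`, and infinite bounds are represented by `math.inf`. The sketch replays the if-statement just shown.

```python
import math
from typing import Optional, Tuple

# An interval is a pair (l, u) with l <= u; None stands for ⊥ (the empty set).
Interval = Optional[Tuple[float, float]]


def join(d1: Interval, d2: Interval) -> Interval:
    """Path merge: the smallest interval containing both arguments."""
    if d2 is None:
        return d1
    if d1 is None:
        return d2
    return (min(d1[0], d2[0]), max(d1[1], d2[1]))


def meet(d1: Interval, d2: Interval) -> Interval:
    """Intersection of two intervals, collapsing to ⊥ when it is empty."""
    if d1 is None or d2 is None:
        return None
    lo, hi = max(d1[0], d2[0]), min(d1[1], d2[1])
    return (lo, hi) if lo <= hi else None


i3 = join((0, 2), (3, 5))               # program point (3)
then_branch = meet(i3, (3, math.inf))   # 2 < i, so i >= 3 over the integers
else_branch = meet(i3, (-math.inf, 2))  # not (2 < i), so i <= 2
print(i3, then_branch, else_branch)     # (0, 5) (3, 5) (0, 2)
```

The `lo <= hi` guard in `meet` is what returns ⊥ for disjoint intervals such as [0,2] and [3,5].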
18
Meet and join are the basic primitives for
compilation
  • I1 = [0,0] since program point (1) immediately
    follows i := 0
  • I2 = (I1 ⊔ I3) ⊓ [-∞,15] since
  • control from program points (1) and (3) flows
    into (2)
  • point (2) is reached only if i < 16 holds
  • I3 = {n+1 | n ∈ I2} since (3) is only reachable
    from (2) via the increment
  • I4 = (I1 ⊔ I3) ⊓ [16,∞] since
  • control from (1) and (3) flows into (4)
  • point (4) is reached only if ¬(i < 16) holds

19
Interval iteration
20
Jacobi versus Gauss-Seidel iteration
  • With Jacobi iteration, the new vector
    ⟨I1',I2',I3',I4'⟩ of intervals is calculated from
    the old ⟨I1,I2,I3,I4⟩
  • With Gauss-Seidel iteration
  • I1' is calculated from ⟨I1,I2,I3,I4⟩
  • I2' is calculated from ⟨I1',I2,I3,I4⟩
  • I3' is calculated from ⟨I1',I2',I3,I4⟩
  • I4' is calculated from ⟨I1',I2',I3',I4⟩
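Here is a minimal Python sketch of both iteration strategies on the interval equations for the increment program, starting from ⊥ everywhere. The helper names and the ⊥-as-`None` encoding are assumptions. Both strategies should reach the invariants inferred by SYNTOX. Gauss-Seidel needs fewer sweeps here because each equation already sees the values updated earlier in the same sweep.

```python
import math
from typing import Callable, List, Optional, Tuple

Interval = Optional[Tuple[float, float]]  # None is ⊥
Equation = Callable[[List[Interval]], Interval]


def join(a: Interval, b: Interval) -> Interval:
    if a is None:
        return b
    if b is None:
        return a
    return (min(a[0], b[0]), max(a[1], b[1]))


def meet(a: Interval, b: Interval) -> Interval:
    if a is None or b is None:
        return None
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


def shift(a: Interval, k: int) -> Interval:
    """{n + k | n ∈ a}"""
    return None if a is None else (a[0] + k, a[1] + k)


# The equations for the increment program; v[0]..v[3] hold I1..I4.
EQUATIONS: List[Equation] = [
    lambda v: (0, 0),
    lambda v: meet(join(v[0], v[2]), (-math.inf, 15)),
    lambda v: shift(v[1], 1),
    lambda v: meet(join(v[0], v[2]), (16, math.inf)),
]


def jacobi(eqs: List[Equation]) -> Tuple[List[Interval], int]:
    """Every new Ii' is computed from the old vector only."""
    current: List[Interval] = [None] * len(eqs)
    sweeps = 0
    while True:
        sweeps += 1
        new = [eq(current) for eq in eqs]
        if new == current:
            return current, sweeps
        current = new


def gauss_seidel(eqs: List[Equation]) -> Tuple[List[Interval], int]:
    """Each Ii' is computed from the entries already updated in this sweep."""
    current: List[Interval] = [None] * len(eqs)
    sweeps = 0
    while True:
        sweeps += 1
        changed = False
        for k, eq in enumerate(eqs):
            value = eq(current)
            if value != current[k]:
                current[k] = value
                changed = True
        if not changed:
            return current, sweeps


print(jacobi(EQUATIONS))
print(gauss_seidel(EQUATIONS))
```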

21
Gauss-Seidel versus chaotic iteration
  • Observe that I4 might change if either I1 or I3
    changes, hence evaluate I4 after I1 and I3
    stabilise
  • This suggests waiting until stability is achieved
    at one level before starting on the next

[Figure: dependency graph over I1, I2, I3 and I4, and its levels I1; I2, I3; I4]
22
Gauss-Seidel versus chaotic iteration
  • Chaotic iteration can postpone evaluating Ii for
    a bounded number of iterations
  • I1' is calculated from ⟨I1,-,-,-⟩
  • I2' and I3' are calculated Gauss-Seidel style
    from ⟨I1',I2,I3,-⟩
  • I4' is calculated from ⟨I1',I2',I3',I4⟩
  • Fast and (incremental) fixpoint solvers [TOPLAS,
    22(2):187-223, 2000] apply chaotic iteration

23
Research challenge
  • Compiling to equations and iteration is
    well-understood (albeit not well-known)
  • The implicit assumption is that the source is
    available
  • With the advent of component-based and
    multi-language programming, the problem is how to
    generate the equations from
  • a specification of the algorithm or the API
  • the types of the algorithm or component
  • In the interim, environments with support for
    modularity either
  • equip the programmer with an equation language
  • or make worst-case assumptions about behaviour

24
Suppose i was decremented rather than incremented
begin
  i := 0;              { (1) i ∈ [0,0] }
  while (i < 16) do
                       { (2) i ∈ [-∞,0] }
    i := i - 1         { (3) i ∈ [-∞,-1] }
end                    { (4) i ∈ ⊥ }
  • I1 = [0,0]
  • I2 = (I1 ⊔ I3) ⊓ [-∞,15]
  • I3 = {n-1 | n ∈ I2}
  • I4 = (I1 ⊔ I3) ⊓ [16,∞]

25
Ascending chain condition
  • A domain D is ACC iff it does not contain an
    infinite strictly increasing chain d1 < d2 < d3 < …
    where d < d' iff d ⊑ d' and d' ⋢ d (see below)
  • The interval domain D is ordered by
  • ⊥ ⊑ d for all d ∈ D and
  • [l1,u1] ⊑ [l2,u2] iff l2 ≤ l1 ∧ u1 ≤ u2
  • and is not ACC since [0,0] < [-1,0] < [-2,0] < …

[Figure: Hasse diagram of the interval domain, from ⊤ down to ⊥, drawn over the integers -4 to 4]
26
Some very expressive relational domains are ACC
  • Common sub-expression elimination relies on
    detecting duplicated expression evaluation
  • Karr [Acta Informatica, 6, 133-151] noticed that
    detecting an invariant such as
  • y = x/2 + 7 was key to this optimisation

begin
  x := sin(a) × 2;
  y := sin(a) + 7
end
27
The affine domain
  • The domain of affine equations over n variables
    is
  • D = {⟨A,B⟩ | A is an m × n dimensional matrix and
  • B is an m dimensional column vector}
  • D is ordered by
  • ⟨A1,B1⟩ ⊑ ⟨A2,B2⟩ iff (if A1x = B1 then A2x = B2)
    for all x

28
Pre-orders versus posets
  • A pre-order ⟨D, ⊑⟩ is a set D ordered by a binary
    relation ⊑ such that
  • d ⊑ d for all d ∈ D
  • if d1 ⊑ d2 and d2 ⊑ d3 then d1 ⊑ d3
  • A poset is a pre-order ⟨D, ⊑⟩ such that
  • if d1 ⊑ d2 and d2 ⊑ d1 then d1 = d2

29
The affine domain is a pre-order (so it is not
ACC)
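  • Observe that ⟨A1,B1⟩ ⊑ ⟨A2,B2⟩ and ⟨A2,B2⟩ ⊑ ⟨A1,B1⟩
    can hold although ⟨A1,B1⟩ ≠ ⟨A2,B2⟩, for instance
    when A1x = B1 is x = 1 and A2x = B2 is 2x = 2
  • To build a poset from a pre-order
  • define d ≡ d' iff d ⊑ d' and d' ⊑ d
  • define [d]≡ = {d' ∈ D | d ≡ d'} and D≡ = {[d]≡ | d ∈ D}
  • define [d]≡ ⊑ [d']≡ iff d ⊑ d'
  • The poset ⟨D≡, ⊑⟩ is ACC since chain length is
    bounded by the number of variables n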

30
Inducing termination for non-ACC (and huge ACC)
domains
  • Enforce convergence for intervals with a widening
    operator ∇ : D × D → D
  • ⊥ ∇ d = d
  • d ∇ ⊥ = d
  • [l1,u1] ∇ [l2,u2] = [if l2 < l1 then -∞ else l1,
    if u1 < u2 then ∞ else u1]
  • Examples
  • [1,2] ∇ [1,2] = [1,2]
  • [1,2] ∇ [1,3] = [1,∞] but [1,3] ∇ [1,2] = [1,3]
  • Safe since [li,ui] ⊑ ([l1,u1] ∇ [l2,u2]) for i ∈ {1,2}
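The widening operator just defined can be sketched in Python as follows. The function name `widen`, the use of `None` for ⊥ and the use of `math.inf` for unbounded ends are assumptions. The sketch reproduces the three examples and shows the non-ACC chain [0,0] < [-1,0] < … stabilising after a single widening step.

```python
import math
from typing import Optional, Tuple

Interval = Optional[Tuple[float, float]]  # None is ⊥


def widen(d1: Interval, d2: Interval) -> Interval:
    """d1 ∇ d2: keep each bound of d1 that d2 does not exceed,
    and push every bound that d2 exceeds to infinity."""
    if d1 is None:
        return d2
    if d2 is None:
        return d1
    (l1, u1), (l2, u2) = d1, d2
    lo = -math.inf if l2 < l1 else l1
    hi = math.inf if u1 < u2 else u1
    return (lo, hi)


print(widen((1, 2), (1, 2)))  # (1, 2)
print(widen((1, 2), (1, 3)))  # (1, inf)
print(widen((1, 3), (1, 2)))  # (1, 3)

# The non-ACC chain [0,0] < [-1,0] < [-2,0] < ... stabilises after one widening.
acc: Interval = (0, 0)
for k in range(1, 5):
    acc = widen(acc, (-k, 0))
print(acc)  # (-inf, 0)
```

Note that ∇ is not commutative, as the second and third examples show. The first argument is the previous approximation and the second is the new one.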

31
Chaotic iteration with widening
  • To terminate it is necessary to traverse each
    loop a finite number of times
  • It is sufficient to pass through I2 or I3 a
    finite number of times Bourdoncle, 1990
  • Thus widen at I3 since it is simpler

[Figure: dependency graph over I1, I2, I3 and I4, with the loop through I2 and I3]
32
Termination for the decrement
  • I1 = [0,0]
  • I2 = (I1 ⊔ I3) ⊓ [-∞,15]
  • I3 = I3 ∇ {n-1 | n ∈ I2} (note the fix)
  • I4 = (I1 ⊔ I3) ⊓ [16,∞]
  • When I2 = [-1,0] and I3 = [-1,-1], then
  • I3 ∇ {n-1 | n ∈ I2} = [-1,-1] ∇ [-2,-1] = [-∞,-1]
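The decrement equations, with widening applied at I3, can be solved in Python using chaotic (Gauss-Seidel) iteration from ⊥. This is a minimal sketch: the helper names, the ⊥-as-`None` encoding and the `max_sweeps` cut-off are assumptions. With widening the iteration stabilises on the invariants of the decrementing program. Without widening it is still changing when the cut-off is reached.

```python
import math
from typing import Callable, List, Optional, Tuple

Interval = Optional[Tuple[float, float]]  # None is ⊥
Equation = Callable[[List[Interval]], Interval]


def join(a: Interval, b: Interval) -> Interval:
    if a is None:
        return b
    if b is None:
        return a
    return (min(a[0], b[0]), max(a[1], b[1]))


def meet(a: Interval, b: Interval) -> Interval:
    if a is None or b is None:
        return None
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


def shift(a: Interval, k: int) -> Interval:
    return None if a is None else (a[0] + k, a[1] + k)


def widen(d1: Interval, d2: Interval) -> Interval:
    if d1 is None:
        return d2
    if d2 is None:
        return d1
    lo = -math.inf if d2[0] < d1[0] else d1[0]
    hi = math.inf if d1[1] < d2[1] else d1[1]
    return (lo, hi)


def solve(eqs: List[Equation], max_sweeps: int = 1000) -> Optional[List[Interval]]:
    """Chaotic iteration from ⊥; None if no fixpoint within max_sweeps."""
    current: List[Interval] = [None] * len(eqs)
    for _ in range(max_sweeps):
        changed = False
        for k, eq in enumerate(eqs):
            value = eq(current)
            if value != current[k]:
                current[k], changed = value, True
        if not changed:
            return current
    return None


def decrement_equations(use_widening: bool) -> List[Equation]:
    """Equations for the decrementing loop, optionally widening at I3."""
    def i3(v: List[Interval]) -> Interval:
        step = shift(v[1], -1)
        return widen(v[2], step) if use_widening else step

    return [
        lambda v: (0, 0),
        lambda v: meet(join(v[0], v[2]), (-math.inf, 15)),
        i3,
        lambda v: meet(join(v[0], v[2]), (16, math.inf)),
    ]


print(solve(decrement_equations(use_widening=True)))
print(solve(decrement_equations(use_widening=False)))  # None: no convergence
```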

33
Widening dynamic data-structures
[Figure: successive abstractions of p, drawn as trees of cons, or, 0, 1, 2 and nil nodes]
begin
  i := 0; p := nil;
  while (i < 16) do
    i := i + 1;
    p := new cons(i, p)   { (1) p ∝ cons(i, cons(0, nil)) }
end
34
Depth-2 versus type-graph widening
[Figure: depth-2 abstraction versus type-graph abstraction of the list, built from cons, or, 0, 1, 2, nil and any nodes]
  • Type-graph widening is more compact
  • Type-graph widening becomes difficult when a list
    contains lists as its elements
  • In constraint-based analysis, widening is
    dispensed with altogether

35
(Malicious) research challenge
  • Read a survey paper to find an abstract domain
    that is ACC but has a maximal chain length of
    O(2^n)
  • Construct a program with O(n) symbols that
    iterates through all O(2^n) abstractions
  • Publish the program in IPL

36
Not all numeric domains are convex
  • A set S ⊆ ℝⁿ is convex iff for all x,y ∈ S it follows
    that {λx + (1-λ)y | 0 ≤ λ ≤ 1} ⊆ S
  • The 2 leftmost sets in ℝ² are convex but the 2
    rightmost sets are not.

37
Are intervals or affine equations convex?
  • Suppose the values of n variables are represented
    by n intervals [l1,u1], …, [ln,un]
  • Suppose x = ⟨x1,…,xn⟩, y = ⟨y1,…,yn⟩ ∈ ℝⁿ are
    described by the intervals
  • Then each li ≤ xi ≤ ui and each li ≤ yi ≤ ui
  • Let 0 ≤ λ ≤ 1 and observe z = λx + (1-λ)y =
    ⟨λx1 + (1-λ)y1, …, λxn + (1-λ)yn⟩
  • Therefore li ≤ min(xi, yi) ≤ λxi + (1-λ)yi ≤
    max(xi, yi) ≤ ui and convexity follows

38
Arithmetic congruences are not convex
  • Elements of the arithmetic congruence (AC) domain
    take the form x + 2y ≡ 1 (mod 3), which describes
    integral values of x and y
  • More exactly, the AC domain consists of
    conjunctions of equations of the form
  • c1x1 + … + cmxm ≡ c (mod n) where ci, c ∈ ℤ and n ∈ ℕ
  • Incredibly, AC is ACC [IJCM, 30, 165-190, 1989]

39
Research challenge
  • Søndergaard [FSTTCS'95] introduced the concept of
    an immediate fixpoint
  • Consider the following (groundness) dependency
    equations over the domain of Boolean functions
    ⟨Bool, ⊨, ∨⟩
  • f1 = x ↔ (y ∧ z)
  • f2 = ∃t(∃x(∃z((u ↔ (t ∧ x)) ∧ (v ↔ (t ∧ z)) ∧ f4)))
  • f3 = ∃u(∃v((x ↔ u) ∧ (z ↔ v) ∧ f2))
  • f4 = f1 ∨ f3
  • where ∃x(f) = f[x ↦ true] ∨ f[x ↦ false], thus
    ∃x(x ∨ y) = true and ∃x(x ∧ y) = y
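To contrast with Søndergaard's symbolic solution, here is the standard iterative tactic sketched in Python. It is not his method. Several choices are assumptions: the connectives (↔ and ∧) read into the equations above, the representation of each Boolean function as its set of models over the six variables, and Jacobi sweeps starting from false. Termination follows because the domain is finite and ∃, ∧ and ∨ are monotonic.

```python
from itertools import product
from typing import Callable, Dict, FrozenSet, Tuple

VARS = ("t", "u", "v", "x", "y", "z")
Model = Tuple[bool, ...]
BoolFn = FrozenSet[Model]  # a Boolean function, represented by its set of models

ALL: BoolFn = frozenset(product((False, True), repeat=len(VARS)))
FALSE: BoolFn = frozenset()


def fn(pred: Callable[[Dict[str, bool]], bool]) -> BoolFn:
    """The Boolean function whose models satisfy pred."""
    return frozenset(m for m in ALL if pred(dict(zip(VARS, m))))


def exists(var: str, f: BoolFn) -> BoolFn:
    """∃var(f) = f[var ↦ true] ∨ f[var ↦ false]: free var in every model."""
    i = VARS.index(var)
    return frozenset(m[:i] + (b,) + m[i + 1:] for m in f for b in (False, True))


F1 = fn(lambda e: e["x"] == (e["y"] and e["z"]))
U_TX = fn(lambda e: e["u"] == (e["t"] and e["x"]))
V_TZ = fn(lambda e: e["v"] == (e["t"] and e["z"]))
X_U = fn(lambda e: e["x"] == e["u"])
Z_V = fn(lambda e: e["z"] == e["v"])


def step(f2: BoolFn, f3: BoolFn, f4: BoolFn) -> Tuple[BoolFn, BoolFn, BoolFn]:
    """One Jacobi sweep of the equations for f2, f3 and f4 (f1 is constant)."""
    new_f2 = exists("t", exists("x", exists("z", U_TX & V_TZ & f4)))
    new_f3 = exists("u", exists("v", X_U & Z_V & f2))
    new_f4 = F1 | f3
    return new_f2, new_f3, new_f4


def solve() -> Tuple[Tuple[BoolFn, BoolFn, BoolFn], int]:
    """Kleene iteration from false up to the least fixpoint."""
    state = (FALSE, FALSE, FALSE)
    sweeps = 0
    while True:
        sweeps += 1
        nxt = step(*state)
        if nxt == state:
            return state, sweeps
        state = nxt


(f2, f3, f4), sweeps = solve()
print(sweeps, len(f2), len(f3), len(f4))
```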

40
The alternative tactic
  • The standard tactic is to apply iteration
  • Søndergaard found that the system can be solved
    symbolically (like a quadratic)
  • This would be very useful for infinite domains,
    improving both precision and predictability

41
Combining analyses
  • Verifiers and optimisers are often multi-pass,
    built from several separate analyses
  • Should the analyses be performed in parallel or
    in sequence?
  • Analyses can interact to improve one another
    (the problem lies in the complexity of the
    interaction [Pratt])

42
Pruning combined domains
  • Suppose that ∝1 ⊆ D1 × C and ∝2 ⊆ D2 × C; then how
    is D = D1 × D2 interpreted?
  • Then ⟨d1,d2⟩ ∝ c iff d1 ∝1 c ∧ d2 ∝2 c
  • Ideally, many ⟨d1,d2⟩ ∈ D will be redundant, that
    is, ¬∃c ∈ C . d1 ∝1 c ∧ d2 ∝2 c

43
Time versus precision (from TOPLAS 17(1):28-44, 1993)
44
The Galois framework
  • Abstract interpretation is often presented in
    terms of Galois connections

45
Lattices: a prelude to Galois connections
  • Suppose ⟨S, ⊑⟩ is a poset
  • A mapping ⊔ : S × S → S is a join (least upper
    bound) iff
  • a ⊔ b is an upper bound of a and b, that is,
    a ⊑ a ⊔ b and b ⊑ a ⊔ b for all a,b ∈ S
  • a ⊔ b is the least upper bound, that is, if c ∈ S is
    an upper bound of a and b, then a ⊔ b ⊑ c
  • The definition of the meet ⊓ : S × S → S (the
    greatest lower bound) is analogous

46
Complete lattices
  • A lattice ⟨S, ⊑, ⊔, ⊓⟩ is a poset ⟨S, ⊑⟩ equipped
    with a join ⊔ and a meet ⊓
  • The join concept can often be lifted to sets by
    defining ⊔ : ℘(S) → S such that
  • t ⊑ (⊔T) for all T ⊆ S and for all t ∈ T
  • if t ⊑ s for all t ∈ T then (⊔T) ⊑ s
  • If join and meet can both be lifted in this way,
    then the lattice is complete
  • A lattice that contains a finite number of
    elements is always complete

47
A lattice that is not complete
  • A hyperplane in 2-d space is a line and in 3-d
    space is a plane
  • A hyperplane in ℝⁿ is any space that can be
    defined by {x ∈ ℝⁿ | c1x1 + … + cnxn = c} where
    c1,…,cn,c ∈ ℝ
  • A halfspace in ℝⁿ is any space that can be
    defined by {x ∈ ℝⁿ | c1x1 + … + cnxn ≤ c}
  • A polyhedron is the intersection of a finite
    number of half-spaces

48
Examples and non-examples in planar space
49
Join for polyhedra
  • The join of polyhedra P1 and P2 in ℝⁿ coincides
    with (the topological closure of) the convex hull
    of P1 ∪ P2

50
The join of an infinite set of polyhedra
  • Consider the following infinite chain of regular
    polyhedra
  • The smallest convex space that contains all these
    polyhedra is a circle (a disc), yet this is not
    polyhedral

51
⟨A, γ, C, α⟩ is a Galois connection whenever
  • ⟨A, ⊑A⟩ and ⟨C, ⊑C⟩ are complete lattices
  • The mappings α : C → A and γ : A → C are monotonic,
    that is,
  • if c1 ⊑C c2 then α(c1) ⊑A α(c2)
  • if a1 ⊑A a2 then γ(a1) ⊑C γ(a2)
  • The compositions γ∘α : C → C and α∘γ : A → A are
    extensive and reductive respectively, that is,
  • c ⊑C (γ∘α)(c) for all c ∈ C
  • (α∘γ)(a) ⊑A a for all a ∈ A

52
A classic Galois connection example
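  • The concrete domain ⟨C, ⊑C, ⊔C, ⊓C⟩ is
    ⟨℘(ℤ), ⊆, ∪, ∩⟩
  • The abstract domain ⟨A, ⊑A, ⊔A, ⊓A⟩ where
  • A = {⊥, +, -, ⊤}
  • ⊥ ⊑A a ⊑A ⊤ for all a ∈ A
  • join ⊔A and meet ⊓A are defined by this ordering,
    in which + and - are incomparable, so that
    + ⊔A - = ⊤ and + ⊓A - = ⊥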

53
The relationship between A and C
  • The concretisation mapping γ : A → C is defined by
  • γ(⊥) = ∅
  • γ(+) = {n ∈ ℤ | n > 0}
  • γ(-) = {n ∈ ℤ | n < 0}
  • γ(⊤) = ℤ
  • The abstraction mapping α : C → A is defined by
  • α(S) = ⊥ if S = ∅
  • α(S) = + else if n > 0 for all n ∈ S
  • α(S) = - else if n < 0 for all n ∈ S
  • α(S) = ⊤ otherwise

54
Avoiding repetition
  • α can be defined with γ and vice versa
  • α(S) = ⊓A{a ∈ A | S ⊆ γ(a)}
  • and dually γ(a) = ∪{S ⊆ ℤ | α(S) ⊑A a}
  • As an example consider α({1,2})
  • {1,2} ⊆ γ(⊤) holds
  • {1,2} ⊆ γ(+) holds
  • {1,2} ⊆ γ(-) fails
  • {1,2} ⊆ γ(⊥) fails
  • Therefore α({1,2}) = ⊓A{+, ⊤} = +
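Both definitions of α for the sign domain can be compared in Python. This sketch makes two assumptions: γ(a) is represented by a membership test, because γ(+), γ(-) and γ(⊤) are infinite, and α is only applied to finite sets. The element names are plain strings chosen for illustration.

```python
from typing import Callable, Dict, Iterable, List

ELEMENTS: List[str] = ["⊥", "+", "-", "⊤"]

# γ(a) is infinite for +, - and ⊤, so represent it by a membership test.
GAMMA: Dict[str, Callable[[int], bool]] = {
    "⊥": lambda n: False,
    "+": lambda n: n > 0,
    "-": lambda n: n < 0,
    "⊤": lambda n: True,
}


def leq(a: str, b: str) -> bool:
    """The order ⊑A: ⊥ below everything, ⊤ above everything."""
    return a == b or a == "⊥" or b == "⊤"


def alpha(s: Iterable[int]) -> str:
    """Direct definition of α for a finite set of integers."""
    members = set(s)
    if not members:
        return "⊥"
    if all(n > 0 for n in members):
        return "+"
    if all(n < 0 for n in members):
        return "-"
    return "⊤"


def alpha_via_gamma(s: Iterable[int]) -> str:
    """α(S) = ⊓A {a ∈ A | S ⊆ γ(a)}: the least element whose γ covers S."""
    members = set(s)
    candidates = [a for a in ELEMENTS if all(GAMMA[a](n) for n in members)]
    # ⊤ is always a candidate, and in this domain the candidates have a least element.
    return next(a for a in candidates if all(leq(a, b) for b in candidates))


print(alpha({1, 2}), alpha_via_gamma({1, 2}))  # + +
```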

55
Collecting domains and semantics
  • Observe that C is not that concrete: programs
    include operations such as × : ℤ × ℤ → ℤ
  • C = ℘(ℤ) is a collecting domain which is easier to
    abstract than ℤ since it is already a lattice
  • To abstract × : ℤ × ℤ → ℤ, say, we synthesise a
    collecting version ×C : ℘(ℤ) × ℘(ℤ) → ℘(ℤ) and then
    abstract that
  • Put S1 ×C S2 = {n1 × n2 | n1 ∈ S1 and n2 ∈ S2}

56
Safety and optimality requirements
  • Safety requires α(γ(a1) ×C γ(a2)) ⊑A a1 ×A a2 for
    all a1,a2 ∈ A
  • Optimality [POPL, 269-282, 1979] also requires
    a1 ×A a2 ⊑A α(γ(a1) ×C γ(a2))
  • Arguing optimality is harder than safety since
    rare-case approximation can simplify a tricky
    argument [JLP]

57
Abstract multiplication
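  • Consider safety for α(γ(+) ×C γ(+)) ⊑A + ×A +,
    where + ×A + = +
  • Recall γ(+) = {n ∈ ℤ | n > 0}
  • Thus γ(+) ×C γ(+) = {n1 × n2 | n1 > 0 and n2 > 0},
    every element of which is positive
  • Hence α(γ(+) ×C γ(+)) ⊑A + ×A +
  • Need + ×A + ⊑A α(γ(+) ×C γ(+)) for optimality
  • Recall α(γ(+) ×C γ(+)) ⊑A + ×A + = +
  • Hence α(γ(+) ×C γ(+)) ∈ {⊥, +}
  • But γ(+) ≠ ∅, thus γ(+) ×C γ(+) ≠ ∅
  • Therefore α(γ(+) ×C γ(+)) ≠ ⊥, so it equals + and
    + ×A + = + is optimal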
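The safety and optimality conditions can be sanity-checked for the whole abstract multiplication table, not only for + ×A +. This Python sketch assumes that the finite window -5..5 stands in for ℤ. Because the window only samples γ(a), the check is consistent with the arguments above but does not replace them. `mul_a` encodes the rule of signs; the names are hypothetical.

```python
from itertools import product
from typing import Iterable, Set

ELEMENTS = ["⊥", "+", "-", "⊤"]
WINDOW = range(-5, 6)  # assumption: a finite slice of Z standing in for Z


def leq(a: str, b: str) -> bool:
    return a == b or a == "⊥" or b == "⊤"


def gamma_sample(a: str) -> Set[int]:
    """The members of γ(a) that lie inside WINDOW."""
    test = {"⊥": lambda n: False, "+": lambda n: n > 0,
            "-": lambda n: n < 0, "⊤": lambda n: True}[a]
    return {n for n in WINDOW if test(n)}


def alpha(s: Iterable[int]) -> str:
    members = set(s)
    if not members:
        return "⊥"
    if all(n > 0 for n in members):
        return "+"
    if all(n < 0 for n in members):
        return "-"
    return "⊤"


def mul_c(s1: Set[int], s2: Set[int]) -> Set[int]:
    """Collecting multiplication S1 ×C S2."""
    return {n1 * n2 for n1 in s1 for n2 in s2}


def mul_a(a1: str, a2: str) -> str:
    """Abstract multiplication ×A (rule of signs)."""
    if "⊥" in (a1, a2):
        return "⊥"
    if "⊤" in (a1, a2):
        return "⊤"
    return "+" if a1 == a2 else "-"


for a1, a2 in product(ELEMENTS, repeat=2):
    best = alpha(mul_c(gamma_sample(a1), gamma_sample(a2)))
    # Safety: best ⊑A a1 ×A a2; optimality (on this sample): equality.
    print(a1, a2, mul_a(a1, a2), best, leq(best, mul_a(a1, a2)))
```

The row ⊤ × + gives ⊤ only because 0 ∈ γ(⊤). Dropping 0 from the window would make the sampled best approximation look sharper than ⊤.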

58
Exotic applications of abstract interpretation
  • Recovering programmer intentions for
    understanding undocumented or third-party code
  • Verifying that a buffer overflow cannot occur, or
    pin-pointing where one might occur in a C program
  • Inferring the environment in which a system of
    synchronising agents will not deadlock
  • Lower-bound time-complexity analysis for
    granularity throttling
  • Binding-time analysis for inferring off-line
    unfolding decisions which avoid code-bloat

59
Pointers to the literature
  • SAS, POPL, ESOP, ICLP, ICFP, …
  • Useful review articles and books
  • Patrick and Radhia Cousot, Comparing the Galois
    connection and Widening/Narrowing approaches to
    Abstract Interpretation, PLILP, LNCS 631,
    269-295, 1992. Available from the LIX library.
  • Patrick and Radhia Cousot, Abstract
    Interpretation and Application to Logic Programs,
    JLP, 13(2-3):103-179, 1992
  • Flemming Nielson, Hanne Riis Nielson and Chris
    Hankin, Principles of Program Analysis, Springer,
    1999.
  • Patrick has a database of abstract interpretation
    researchers and regularly writes tutorials, see,
    for example, [CC'02].

60
Appendix: SAT solving
  • SAT is not a form of abstract interpretation, but
    abstraction and abstract interpretation are often
    used to reduce a verification problem to a
    satisfiability checking problem
  • Acknowledgments: much of this material is adapted
    from the review article, The Quest for Efficient
    Boolean Satisfiability Solvers, by Zhang and
    Malik, 2002.

61
The SAT problem
  • Given an arbitrary propositional formula, f say,
    does there exist a variable assignment (a model)
    under which f evaluates to true?
  • One model for f = (x ∨ y) is {x ↦ true, y ↦ true}
  • SAT is the archetypal NP-complete problem, but
    this does not preclude the existence of efficient
    SAT algorithms for certain SAT instances
  • Stålmarck [US Patent N527689, 1995] and
    applications in AI planning, software
    verification and circuit testing have promoted a
    resurgence of interest in SAT

62
The other type of completeness
  • A SAT algorithm is said to be complete iff (given
    enough resources) it will either
  • compute a satisfying variable assignment or
  • verify that no such assignment exists
  • A SAT algorithm is incomplete (stochastic) iff
    unsatisfiability cannot always be detected
  • Trade incompleteness for speed when a solution is
    very likely to exist (planning applications)
  • In program verification, (partial) correctness
    often follows by proving unsatisfiability

63
The Davis-Logemann-Loveland (DPLL) approach
  • 1st generation solvers such as POSIT, 2cl, CSAT,
    etc. are based on DPLL, as are the 2nd generation
    solvers such as SATO and zChaff which tune DPLL
  • Davis and Putnam [JACM, 7:201-215, 1960] proposed
    resolution for Boolean SAT; DLL [CACM,
    5:394-397, 1962] replaced resolution with
    search to improve memory usage (special case)
  • CNF is used to simplify unsatisfiability checking;
    conversion is polynomial [JSC, 2:293-304, 1986]
  • CNF is a conjunction of clauses, for example,
    (x ↔ y) = (x → y) ∧ (y → x) = (¬x ∨ y) ∧ (x ∨ ¬y)

64
The Davis-Logemann-Loveland (DPLL) algorithm
bool function DPLL(f, σ)
begin
  ⟨fail, σ⟩ := unit(f, σ);
  if (fail) return false
  if (satisfied(f, σ)) return true
  else if (unsatisfied(f, σ)) return false
  else
  begin
    let x ∈ var(f) - var(σ)
    if (DPLL(f, σ ∪ {x ↦ true})) return true
    else return DPLL(f, σ ∪ {x ↦ false})
  end
end
  • unit applies unit propagation, possibly detecting
    unsatisfiability
  • satisfied returns true if one literal in each
    clause is true
  • unsatisfied returns true if there exists a
    clause with every literal false
  • non-determinacy is in the choice of variable
  • the recursion provides the stack for search
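Here is a runnable Python sketch of the DPLL pseudocode above. It assumes that clauses are lists of DIMACS-style integer literals (k for a variable, -k for its negation) and that σ is a dictionary from variable numbers to booleans. It makes two deliberate simplifications. Unit propagation naively rescans every clause rather than using the counter or pointer schemes. The choice point simply takes the smallest unassigned variable.

```python
from typing import Dict, List, Optional, Tuple

Clause = List[int]           # literal k is variable |k|, negated when k < 0
Assignment = Dict[int, bool]


def value(lit: int, sigma: Assignment) -> Optional[bool]:
    """Truth value of a literal under a partial assignment (None if unassigned)."""
    v = sigma.get(abs(lit))
    if v is None:
        return None
    return v if lit > 0 else not v


def unit(f: List[Clause], sigma: Assignment) -> Tuple[bool, Assignment]:
    """Unit propagation to a fixpoint; returns (fail, extended assignment)."""
    sigma = dict(sigma)
    changed = True
    while changed:
        changed = False
        for clause in f:
            vals = [value(lit, sigma) for lit in clause]
            if True in vals:
                continue                      # clause already satisfied
            free = [lit for lit, v in zip(clause, vals) if v is None]
            if not free:
                return True, sigma            # every literal false: conflict
            if len(free) == 1:                # unit clause rule
                sigma[abs(free[0])] = free[0] > 0
                changed = True
    return False, sigma


def satisfied(f: List[Clause], sigma: Assignment) -> bool:
    return all(any(value(lit, sigma) is True for lit in c) for c in f)


def unsatisfied(f: List[Clause], sigma: Assignment) -> bool:
    return any(all(value(lit, sigma) is False for lit in c) for c in f)


def dpll(f: List[Clause], sigma: Optional[Assignment] = None) -> Optional[Assignment]:
    """Return a model of f extending sigma, or None if there is none."""
    fail, sigma = unit(f, sigma or {})
    if fail:
        return None
    if satisfied(f, sigma):
        return sigma
    if unsatisfied(f, sigma):
        return None
    # Choice point: here simply the smallest unassigned variable.
    x = min({abs(lit) for c in f for lit in c} - set(sigma))
    model = dpll(f, {**sigma, x: True})
    if model is not None:
        return model
    return dpll(f, {**sigma, x: False})


print(dpll([[1, -2], [-1, 2]]))                    # a model of x1 ↔ x2
print(dpll([[1, 2], [-1, 2], [1, -2], [-1, -2]]))  # None: unsatisfiable
```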

65
Unit propagation
  • Unit clause rule: if all the literals but one are
    false, then the remaining literal is set to true
  • Many SAT solvers use a counter scheme [Crawford,
    AAAI, 1993] that uses
  • one counter per clause to track the number of
    false literals in each clause
  • If a count reaches the total number of literals,
    then unsatisfiability has been detected
  • Otherwise, if it is one less, then the remaining
    literal is set
  • Each assignment updates many counts, so
    pointer-based schemes are used within SATO and
    zChaff [Gu et al., DIMACS series DMTCS, 1997]

66
Choices, choices
  • If variables remain uninstantiated after
    propagation, then resort to random binding
  • Better to rank variables by the number of times
    they occur in clauses which are not (yet) true
  • But a variable in 128 clauses each with 2
    uninstantiated variables is a better candidate
    than another in 128 clauses each with 32
    uninstantiated variables
  • But what about the overhead of ranking, especially
    with learnt clauses?
  • But what about trailing for backtracking?
  • But what about intelligent back-jumping?
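The preference for variables in short, not-yet-true clauses can be encoded by weighting each occurrence by 2 raised to minus the number of free literals in its clause. This resembles the Jeroslow-Wang heuristic, which the slides do not name. The exact weights, like the function names, are assumptions of this Python sketch. On the 128-clause example, plain occurrence counts tie the two variables, but the weighting prefers the one in two-literal clauses.

```python
from collections import defaultdict
from typing import Dict, List

Clause = List[int]            # literal k is variable |k|, negated when k < 0
Assignment = Dict[int, bool]


def is_true(lit: int, sigma: Assignment) -> bool:
    v = sigma.get(abs(lit))
    return v is not None and v == (lit > 0)


def scores(f: List[Clause], sigma: Assignment) -> Dict[int, float]:
    """Weight each occurrence in a not-yet-true clause by 2^-(free literals)."""
    result: Dict[int, float] = defaultdict(float)
    for clause in f:
        if any(is_true(lit, sigma) for lit in clause):
            continue                              # clause already true
        free = [lit for lit in clause if abs(lit) not in sigma]
        for lit in free:
            result[abs(lit)] += 2.0 ** -len(free)
    return dict(result)


def choose(f: List[Clause], sigma: Assignment) -> int:
    ranking = scores(f, sigma)
    return max(ranking, key=lambda var: ranking[var])


# Variable 1 sits in 128 two-literal clauses, variable 2 in 128 clauses of 32 literals.
clauses: List[Clause] = []
fresh = 3
for _ in range(128):
    clauses.append([1, fresh])
    fresh += 1
for _ in range(128):
    clauses.append([2] + list(range(fresh, fresh + 31)))
    fresh += 31
print(choose(clauses, {}))  # 1
```

Recomputing these scores from scratch at every choice point is exactly the ranking overhead the slide asks about. Practical solvers maintain such scores incrementally.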