Modeling Data in Formal Verification: Bits, Bit Vectors, or Words - PowerPoint PPT Presentation

About This Presentation
Slides: 58
Provided by: RandalE9
Learn more at: http://www.cs.cmu.edu

Transcript and Presenter's Notes



1
Modeling Data in Formal Verification: Bits, Bit Vectors, or Words
Randal E. Bryant, Carnegie Mellon University
http://www.cs.cmu.edu/bryant
2
Overview
  • Issue
    • How should data be modeled in formal analysis?
    • Verification, test generation, security analysis, …
  • Approaches
    • Bits: Every bit is represented individually
      • Basis for most CAD, model checking
    • Words: View each word as an arbitrary value
      • E.g., unbounded integers
      • Historic program verification work
    • Bit Vectors: Finite-precision words
      • Capture true semantics of hardware and software
      • More opportunities for abstraction than with bits

3
Bit-Level Modeling
(Figure: control logic)
  • Represent Every Bit of State Individually
    • Behavior expressed as Boolean next-state functions over the current state
  • Historic method for most CAD, testing, and verification tools
    • E.g., model checkers

4
Bit-Level Modeling in Practice
  • Strengths
    • Allows precise modeling of the system
    • Well-developed technology
      • BDDs and SAT for Boolean reasoning
  • Limitations
    • Every state bit introduces two Boolean variables
      • Current state and next state
    • Overly detailed modeling of system functions
      • Don't want to capture full details of an FPU
  • Making It Work
    • Use extensive abstraction to reduce bit count
    • Hard to abstract functionality

5
Word-Level Abstraction #1: Bits → Integers
(Figure: bits x0, x1, x2, …, xn-1 grouped into one word-level value)
  • View Data as Symbolic Words
    • Arbitrary integers
    • No assumptions about size or encoding
    • Classic model for reasoning about software
    • Can store in memories and registers

6
Abstracting Data Bits
(Figure: control logic with abstracted data bits)
7
Word-Level Abstraction #2: Uninterpreted Functions
(Figure: data block replaced by function f)
  • For any Block that Transforms or Evaluates Data
    • Replace with a generic, unspecified function
    • Only assumed property is functional consistency:
      $a = x \land b = y \Rightarrow f(a, b) = f(x, y)$

8
Abstracting Functions
(Figure: control logic and data path containing combinational logic blocks)
  • For Any Block that Transforms Data
    • Replace by an uninterpreted function
    • Ignore detailed functionality
    • Conservative approximation of the actual system

9
Word-Level Modeling History
  • Historic
    • Used by theorem provers
  • More Recently
    • Burch & Dill, CAV '94
      • Verify that pipelined processor has same behavior as unpipelined reference model
      • Use word-level abstractions of data paths and memories
      • Use decision procedure to determine equivalence
    • Bryant, Lahiri, Seshia, CAV '02
      • UCLID verifier
      • Tool for describing and verifying systems at word level

10
Pipeline Verification Example
(Figure: pipelined processor compared against reference model)
11
Abstracted Pipeline Verification
(Figure: abstracted pipelined processor compared against abstracted reference model)
12
Experience with Word-Level Modeling
  • Powerful Abstraction Tool
    • Allows focus on control of large-scale system
    • Can model systems with very large memories
  • Hard to Generate Abstract Model
    • Hand-generated: how to validate?
    • Automatic abstraction: limited success
      • Andraus & Sakallah, DAC 2004
  • Realistic Features Break Abstraction
    • E.g., set ALU function to A+0 to pass operand to output
  • Desire
    • Should be able to mix detailed bit-level representation with abstracted word-level representation

13
Bit Vectors: Motivating Example #1
  int abs(int x) {
    int mask = x>>31;
    return (x ^ mask) + ~mask + 1;
  }

  int test_abs(int x) {
    return (x < 0) ? -x : x;
  }
  • Do these functions produce identical results?
  • Strategy
    • Represent and reason about bit-level program behavior
    • Specific to machine word size, integer representations, and operations

14
Motivating Example #2
  void fun() {
    char fmt[16];
    fgets(fmt, 16, stdin);
    fmt[15] = '\0';
    printf(fmt);
  }
  • Is there an input string that causes the value 234 to be written to address a4a3a2a1?
  • Answer
    • Yes: "\xa1\xa2\xa3\xa4%230g%n"
      • Depends on details of compilation
    • But no exploit for buffer size less than 8
  • Ganapathy, Seshia, Jha, Reps, Bryant, ICSE '05

15
Motivating Example #3
  bit[W] popSpec(bit[W] x) {
    int cnt = 0;
    for (int i = 0; i < W; i++) {
      if (x[i]) cnt++;
    }
    return cnt;
  }

  bit[W] popSketch(bit[W] x) {
    loop (??) {
      x = (x & ??) + ((x >> ??) & ??);
    }
    return x;
  }
  • Is there a way to expand the program sketch to make it match the spec?
  • Answer
    • Yes; for W = 16 the completed sketch is:
      x = (x & 0x5555) + ((x >> 1) & 0x5555);
      x = (x & 0x3333) + ((x >> 2) & 0x3333);
      x = (x & 0x0077) + ((x >> 8) & 0x0077);
      x = (x & 0x000f) + ((x >> 4) & 0x000f);
  • Solar-Lezama, et al., ASPLOS '06
16
Motivating Example #4
(Figure: sequential reference model and pipelined microprocessor)
  • Is the pipelined microprocessor identical to the sequential reference model?
  • Strategy
    • Represent machine instructions, data, and state as bit vectors
      • Compatible with hardware description language representation
    • Verifier finds abstractions automatically

17
Bit Vector Formulas
  • Fixed-width data words
  • Arithmetic operations
    • Add/subtract/multiply/divide, …
    • Two's complement, unsigned, …
  • Bit-wise logical operations
    • Bitwise and/or/xor, shift/extract, concatenate
  • Predicates
    • =, <
  • Task
    • Is formula satisfiable?
    • E.g., $a > 0 \land a \cdot a < 0$
      • 50000 * 50000 = -1794967296 (on 32-bit machine)
18
Decision Procedures
  • Core technology for formal reasoning
  • Boolean SAT
    • Pure Boolean formula
  • SAT Modulo Theories (SMT)
    • Support additional logic fragments
    • Example theories
      • Linear arithmetic over reals or integers
      • Functions with equality
      • Bit vectors
      • Combinations of theories

19
Recent Progress in SAT Solving
20
BV Decision Procedures: Some History
  • B.C. (Before Chaff)
    • String operations (concatenate, field extraction)
    • Linear arithmetic with bounds checking
    • Modular arithmetic
  • Limitations
    • Cannot handle full range of bit-vector operations

21
BV Decision Procedures: Using SAT
  • SAT-Based Bit Blasting
    • Generate Boolean circuit based on bit-level behavior of operations
    • Convert to Conjunctive Normal Form (CNF) and check with best available SAT checker
    • Handles arbitrary operations
  • Effective in Many Applications
    • CBMC (Clarke, Kroening, Lerda, TACAS '04)
    • Microsoft Cogent + SLAM (Cook, Kroening, Sharygina, CAV '05)
    • CVC-Lite (Dill, Barrett, Ganesh), Yices (de Moura, et al.)

22
Bit-Vector Challenge
  • Is there a better way than bit blasting?
  • Requirements
    • Provide same functionality as with bit blasting
    • Find abstractions based on word-level structure
    • Improve on performance of bit blasting
  • Observation
    • Must have bit blasting at core
      • Only approach that covers full functionality
    • Want to exploit special cases
      • Formula satisfied by small values
      • Simple algebraic properties imply unsatisfiability
      • Small unsatisfiable core
      • Solvable by modular arithmetic

23
Some Recent Ideas
  • Iterative Approximation
    • UCLID (Bryant, Kroening, Ouaknine, Seshia, Strichman, Brady, TACAS '07)
    • Use bit blasting as core technique
    • Apply to simplified versions of formula
    • Successive approximations until solved or shown unsatisfiable
  • Using Modular Arithmetic
    • STP (Ganesh & Dill, CAV '07)
    • Algebraic techniques to solve special-case forms
  • Layered Approach
    • MathSAT (Bruttomesso, Cimatti, Franzen, Griggio, Hanna, Nadel, Palti, Sebastiani, CAV '07)
    • Use successively more detailed solvers

24
Iterative Approach Background: Approximating Formula
(Figure: original formula $\phi$ and its approximations)
  • Example Approximation Techniques
    • Underapproximating
      • Restrict word-level variables to smaller ranges of values
    • Overapproximating
      • Replace subformula with Boolean variable

25
Starting Iterations
(Figure: initial underapproximation $\phi_1^-$ of $\phi$)
  • Initial Underapproximation
    • (Greatly) restrict ranges of word-level variables
    • Intuition: a satisfiable formula often has a small-domain solution

26
First Half of Iteration
(Figure: underapproximation $\phi_1^-$ of $\phi$)
  • SAT Result for $\phi_1^-$
    • Satisfiable
      • Then have found solution for $\phi$
    • Unsatisfiable
      • Use UNSAT proof to generate overapproximation $\phi_1^+$
      • (Described later)

27
Second Half of Iteration
(Figure: $\phi_1^- \Rightarrow \phi \Rightarrow \phi_1^+$)
  • SAT Result for $\phi_1^+$
    • Unsatisfiable
      • Then have shown $\phi$ unsatisfiable
    • Satisfiable
      • Solution indicates variable ranges that must be expanded
      • Generate refined underapproximation

28
Iterative Behavior
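(Figure: underapproximations $\phi_1^- \Rightarrow \phi_2^- \Rightarrow \cdots \Rightarrow \phi_k^- \Rightarrow \phi$, with overapproximations $\phi_1^+, \phi_2^+, \ldots, \phi_k^+$)
  • Underapproximations
    • Successively more precise abstractions of $\phi$
    • Allow wider variable ranges
  • Overapproximations
    • No predictable relation
    • UNSAT proof not unique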
29
Overall Effect
  • Soundness
    • Only terminate with solution on underapproximation
    • Only terminate as UNSAT on overapproximation
  • Completeness
    • Successive underapproximations approach $\phi$
    • Finite variable ranges guarantee termination
      • In worst case, get $\phi_k^- \Leftrightarrow \phi$

30
Generating Overapproximation
  • Given
    • Underapproximation $\phi_1^-$
    • Bit-blasted translation of $\phi_1^-$ into Boolean formula
    • Proof that Boolean formula is unsatisfiable
  • Generate
    • Overapproximation $\phi_1^+$
    • If $\phi_1^+$ satisfiable, must lead to refined underapproximation
      • Generate $\phi_2^-$ such that $\phi_1^- \Rightarrow \phi_2^- \Rightarrow \phi$

31
Bit-Vector Formula Structure
  • DAG representation to allow shared subformulas
(Figure: formula $\phi$ as a DAG)
32
Structure of Underapproximation
(Figure: underapproximation $\phi^-$)
  • Linear-complexity translation to CNF
    • Each word-level variable encoded as set of Boolean variables
    • Additional Boolean variables represent subformula values

33
Encoding Range Constraints
  • Explicit
    • View as additional predicates in formula, e.g., $0 \le w < 8 \land -4 \le x < 4$
  • Implicit
    • Reduce number of variables in encoding
      Constraint            Encoding
      $0 \le w < 8$         $0\,0\,0\,0\,w_2\,w_1\,w_0$
      $-4 \le x < 4$        $x_s\,x_s\,x_s\,x_s\,x_s\,x_1\,x_0$
    • Yields smaller SAT encodings
34
UNSAT Proof
  • Subset of clauses that is unsatisfiable
    • Clause variables define portion of DAG
    • Subgraph that cannot be satisfied with given range constraints
(Figure: formula DAG whose leaves are bit-vector atoms over x, y, z, v, w and a Boolean variable a, combined by ∧ and ∨)
35
Extracting Circuit from UNSAT Proof
  • Subgraph that cannot be satisfied with given range constraints
    • Even when the rest of the graph is replaced with unconstrained variables
(Figure: UNSAT subgraph, with the rest of the DAG replaced by unconstrained variables b1 and b2)
36
Generated Overapproximation
  • Remove range constraints on word-level variables
    • Creates overapproximation
    • Ignores correlations between values of subformulas
(Figure: overapproximation $\phi_1^+$, the extracted subgraph with inputs b1 and b2)
37
Refinement Property
  • Claim
    • $\phi_1^+$ has no solutions that satisfy $\phi_1^-$'s range constraints
    • Because $\phi_1^+$ contains the portion of $\phi_1^-$ that was shown to be unsatisfiable under those range constraints
(Figure: $\phi_1^+$ containing the UNSAT subgraph, with inputs b1 and b2)
38
Refinement Property (Cont.)
  • Consequence
    • Solving $\phi_1^+$ will expand the range of some variables
    • Leading to a more exact underapproximation $\phi_2^-$
(Figure: overapproximation $\phi_1^+$ with inputs b1 and b2)
39
Effect of Iteration
(Figure: UNSAT proof for $\phi_1^-$ used to generate overapproximation $\phi_1^+$ of $\phi$)
  • Each Complete Iteration
    • Expands ranges of some word-level variables
    • Creates refined underapproximation

40
Approximation Methods
  • So Far
    • Range constraints
      • Underapproximate by constraining values of word-level variables
    • Subformula elimination
      • Overapproximate by assuming subformula value arbitrary
  • General Requirements
    • Systematic under- and over-approximations
    • Way to connect from one to another
  • Goal: Devise Additional Approximation Strategies

41
Function Approximation Example
(Table: approximated value of x · y)
              x = 0   x = 1   x else
    y = 0       0       0       0
    y = 1       0       1       x
    y else      0       y       f(x,y)
  • Motivation
    • Multiplication (and division) are difficult cases for SAT
  • Prohibit Via Additional Range Constraints
    • Gives underapproximation
    • Restricts values of (possibly intermediate) terms
  • Abstract as f(x,y)
    • Overapproximate as uninterpreted function f
    • Value constrained only by functional consistency

42
Results: UCLID BV vs. Bit-Blasting
(Figure: results on 2.8 GHz Xeon, 2 GB RAM)
  • UCLID always better than bit blasting
  • Generally better than other available procedures
  • SAT time is the dominating factor

43
Challenges with Iterative Approximation
  • Formulating Overall Strategy
    • Which abstractions to apply, when and where
    • How quickly to relax constraints in iterations
      • Which variables to expand and by how much?
      • Too conservative: each call to SAT solver incurs cost
      • Too lenient: devolves to complete bit blasting
  • Predicting SAT Solver Performance
    • Hard to predict time required by call to SAT solver
    • Will a particular abstraction simplify or complicate SAT?
  • Combination Especially Difficult
    • Multiple iterations with unpredictable inner loop

44
STP Linear Equation Solving
  • Ganesh & Dill, CAV '07
  • Solve linear equations over integers mod $2^w$
    • Capture range of solutions with Boolean variables
  • Example Problem
    • Variables: 3-bit unsigned integers
      $x = [x_2\,x_1\,x_0]$, $y = [y_2\,y_1\,y_0]$, $z = [z_2\,z_1\,z_0]$
    • Linear equations: conjunction of linear constraints
      $3x + 4y + 2z \equiv 0 \pmod{8}$
      $2x + 2y + 2 \equiv 0 \pmod{8}$
      $2x + 4y + 2z \equiv 0 \pmod{8}$
  • General Form
    • $A\mathbf{x} \equiv \mathbf{b} \pmod{2^w}$
45
Solution Method
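  • Equations
      $3x + 4y + 2z \equiv 0 \pmod{8}$
      $2x + 2y + 2 \equiv 0 \pmod{8}$
      $2x + 4y + 2z \equiv 0 \pmod{8}$
  • Some Number Theory
    • Odd number has multiplicative inverse mod $2^w$
      • Mod 8: $3^{-1} = 3$ (since $3 \cdot 3 = 9 \equiv 1$)
    • Additive inverse mod $2^w$: $-x = 2^w - x$
      • Mod 8: $-4 = 4$, $-2 = 6$
  • Solve first equation for x
      $3x \equiv 4y + 6z \pmod{8}$
      $3 \cdot 3x \equiv 3 \cdot 4y + 3 \cdot 6z \pmod{8}$
      $x \equiv 4y + 2z \pmod{8}$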
46
Solution Method (cont.)
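  • Substitutions
      $2x + 2y + 2 \equiv 0 \pmod{8}$
      $2x + 4y + 2z \equiv 0 \pmod{8}$
    • With $x \equiv 4y + 2z \pmod{8}$, these become
      $2(4y + 2z) + 2y + 2 \equiv 0 \pmod{8}$
      $2(4y + 2z) + 4y + 2z \equiv 0 \pmod{8}$
    • Which simplify to
      $2y + 4z + 2 \equiv 0 \pmod{8}$
      $4y + 6z \equiv 0 \pmod{8}$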
47
What if All Coefficients Even?
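  • Result of Substitutions
      $2y + 4z + 2 \equiv 0 \pmod{8}$
      $4y + 6z \equiv 0 \pmod{8}$
    • Even numbers do not have multiplicative inverses mod 8
  • Observation
    • Can divide through and reduce modulus
      $y + 2z + 1 \equiv 0 \pmod{4}$
      $2y + 3z \equiv 0 \pmod{4}$
    • Solve the first for y, then substitute into the second: $2(2z + 3) + 3z \equiv 3z + 2 \equiv 0$, so $z \equiv 3 \cdot 2 \equiv 2 \pmod{4}$
      $y \equiv 2z + 3 \pmod{4}$
      $z \equiv 2 \pmod{4}$
      $y \equiv 3 \pmod{4}$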
48
General Solutions
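  • Original variables: 3-bit unsigned integers
      $x = [x_2\,x_1\,x_0]$, $y = [y_2\,y_1\,y_0]$, $z = [z_2\,z_1\,z_0]$
  • Solutions
      $y \equiv 3 \pmod{4}$, $z \equiv 2 \pmod{4}$
    • Constrained variables: $y = [y_2\,1\,1]$, $z = [z_2\,1\,0]$
  • Back Substitution
      $x \equiv 4y + 2z \pmod{8}$
      $x \equiv 0 \pmod{8}$  (y is odd, so $4y \equiv 4$; $z \equiv 2 \pmod{4}$, so $2z \equiv 4$)
    • Constrained variables: $x = [0\,0\,0]$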
49
Linear Equation Solutions
  • Equations
      $3x + 4y + 2z \equiv 0 \pmod{8}$
      $2x + 2y + 2 \equiv 0 \pmod{8}$
      $2x + 4y + 2z \equiv 0 \pmod{8}$
  • Encoding All Possible Solutions
      $x = [0\,0\,0]$, $y = [y_2\,1\,1]$, $z = [z_2\,1\,0]$
    • $y_2$, $z_2$ arbitrary Boolean variables
    • 4 possible solutions (out of original 512)
  • General Principle
    • Form of LU decomposition
    • Polynomial-time algorithm
    • Boolean variables in the solution express the set of solutions
    • Only works when have conjunction of linear constraints
50
Layered Solver
  • Bruttomesso, et al., CAV '07
    • Part of MathSAT project
  • DPLL(T) Framework
    • SAT solver coupled with solver for mathematical theory T
    • BV theory solver works with conjunctions of constraints

51
DPLL(T) Formula Structure
(Figure: formula $\phi$ as Boolean structure over atoms)
  • Atoms
    • Predicates applied to bit-vector expressions
    • Boolean variables

52
DPLL(T) Operation
  • Actions
    • DPLL engine satisfies Boolean portion
    • Theory solver determines whether resulting conjunction of atoms is satisfiable
(Figure: truth values assigned to bit-vector atoms over x, y, z, v, w, such as x << 1 > y)
  • Solver provides information to DPLL engine to aid search
    • Nonchronological backtracking
    • Conflict clause generation
  • Successful approach for other decision procedures

53
MathSAT Layers
  • Uses increasingly detailed solver layers
    • Only progress if can't find conflict using more abstract rules
  • Layers
    • Equality with uninterpreted functions
      • Treats all bit-level functions and operators as uninterpreted
    • Simple handling of concatenations, extractions, and transitivity
    • Full solver using linear arithmetic and SAT

54
Summary: Modeling Levels
  • Bits
    • Limited ability to scale
    • Hard to apply functional abstractions
  • Words
    • Allows abstracting data while precisely representing control
    • Overlooks finite word-size effects
  • Bit Vectors
    • Realistic semantic model for hardware and software
    • Captures all details of actual operation
      • Detects errors related to overflow and other artifacts of finite representation
    • Can apply abstractions found at word level

55
Areas of Agreement
  • SAT-Based Framework Is Only Logical Choice
    • SAT solvers are good and getting better
  • Want to Automatically Exploit Abstractions
    • Function structure
    • Arithmetic properties
      • E.g., associativity, commutativity
    • Arithmetic reductions
      • E.g., LU decomposition
  • Base Level Should Be SAT
    • Only semantically complete approach

56
Choices
  • Optimize for Special Formula Classes
    • E.g., STP optimized for conjunctions of constraints
    • Common in software verification and testing
  • Iterative Abstraction
    • Natural framework for attempting different abstractions
    • Having SAT solver in inner loop makes performance tuning difficult
  • DPLL(T) Framework
    • Theory solver only deals with conjunctions
    • May need to invoke SAT solver in inner loop
    • Hard to coordinate outer and inner search procedures
  • Others?

57
Observations
  • Bit-Vector Modeling Gaining in Popularity
    • Recognition of importance
    • Benchmarks and competitions
  • Just Now Improving on Bit Blasting + SAT
  • Lots More Work to be Done