Title: Modeling Data in Formal Verification: Bits, Bit Vectors, or Words
1 Modeling Data in Formal Verification: Bits, Bit Vectors, or Words
Randal E. Bryant, Carnegie Mellon University
http://www.cs.cmu.edu/~bryant
2 Overview
- Issue
  - How should data be modeled in formal analysis?
  - Verification, test generation, security analysis, ...
- Approaches
  - Bits: Every bit is represented individually
    - Basis for most CAD, model checking
  - Words: View each word as arbitrary value
    - E.g., unbounded integers
    - Historic program verification work
  - Bit Vectors: Finite precision words
    - Captures true semantics of hardware and software
    - More opportunities for abstraction than with bits
3 Bit-Level Modeling
[Figure: Control Logic]
- Represent Every Bit of State Individually
  - Behavior expressed as Boolean next-state function over current state
  - Historic method for most CAD, testing, and verification tools
    - E.g., model checkers
4 Bit-Level Modeling in Practice
- Strengths
  - Allows precise modeling of system
  - Well developed technology
    - BDDs & SAT for Boolean reasoning
- Limitations
  - Every state bit introduces two Boolean variables
    - Current state & next state
  - Overly detailed modeling of system functions
    - Don't want to capture full details of FPU
- Making It Work
  - Use extensive abstraction to reduce bit count
  - Hard to abstract functionality
5 Word-Level Abstraction 1: Bits → Integers
[Figure: bits x0, x1, x2, ..., xn-1]
- View Data as Symbolic Words
  - Arbitrary integers
    - No assumptions about size or encoding
    - Classic model for reasoning about software
  - Can store in memories & registers
6 Abstracting Data Bits
[Figure: Control Logic]
7 Word-Level Abstraction 2: Uninterpreted Functions
[Figure: function block f]
- For any Block that Transforms or Evaluates Data
  - Replace with generic, unspecified function
  - Only assumed property is functional consistency
    - a = x ∧ b = y ⇒ f(a, b) = f(x, y)
8 Abstracting Functions
[Figure: Control Logic and Data Path, with blocks labeled Com. Log. 1]
- For Any Block that Transforms Data
  - Replace by uninterpreted function
  - Ignore detailed functionality
  - Conservative approximation of actual system
9 Word-Level Modeling: History
- Historic
  - Used by theorem provers
- More Recently
  - Burch & Dill, CAV '94
    - Verify that pipelined processor has same behavior as unpipelined reference model
    - Use word-level abstractions of data paths and memories
    - Use decision procedure to determine equivalence
  - Bryant, Lahiri, Seshia, CAV '02
    - UCLID verifier
    - Tool for describing & verifying systems at word level
10 Pipeline Verification Example
[Figure: Pipelined Processor and Reference Model]
11 Abstracted Pipeline Verification
[Figure: Pipelined Processor and Reference Model]
12 Experience with Word-Level Modeling
- Powerful Abstraction Tool
  - Allows focus on control of large-scale system
  - Can model systems with very large memories
- Hard to Generate Abstract Model
  - Hand-generated: how to validate?
  - Automatic abstraction: limited success
    - Andraus & Sakallah, DAC 2004
- Realistic Features Break Abstraction
  - E.g., Set ALU function to A+0 to pass operand to output
- Desire
  - Should be able to mix detailed bit-level representation with abstracted word-level representation
13 Bit Vectors: Motivating Example 1
    int abs(int x) {
        int mask = x >> 31;
        return (x ^ mask) + ~mask + 1;
    }
    int test_abs(int x) {
        return (x < 0) ? -x : x;
    }
- Do these functions produce identical results?
- Strategy
  - Represent and reason about bit-level program behavior
  - Specific to machine word size, integer representations, and operations
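The question above can be settled by brute force at small word sizes. The following is a minimal sketch, not the method from the talk: it models both functions on w-bit two's complement words using unsigned arithmetic, so that the arithmetic shift and the negation of the most negative value are well defined in C. The names `abs_bits`, `abs_ref`, and `abs_equiv` are illustrative assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Mask selecting the low w bits (1 <= w <= 32). */
static uint32_t low_mask(int w) {
    assert(w >= 1 && w <= 32);
    return (w == 32) ? 0xFFFFFFFFu : ((1u << w) - 1u);
}

/* Branch-free abs, modeled on w-bit two's complement words.
   x is assumed to fit in the low w bits. */
uint32_t abs_bits(uint32_t x, int w) {
    uint32_t m = low_mask(w);
    uint32_t sign = (x >> (w - 1)) & 1u;
    uint32_t mask = sign ? m : 0u;           /* models arithmetic x >> (w-1) */
    return ((x ^ mask) + (~mask & m) + 1u) & m;
}

/* Reference: (x < 0) ? -x : x, with negation taken modulo 2^w. */
uint32_t abs_ref(uint32_t x, int w) {
    uint32_t m = low_mask(w);
    uint32_t sign = (x >> (w - 1)) & 1u;
    return sign ? ((0u - x) & m) : (x & m);
}

/* Returns 1 if both functions agree on every w-bit input.
   Exhaustive, so only practical for small w. */
int abs_equiv(int w) {
    uint32_t m = low_mask(w);
    for (uint32_t x = 0; ; x++) {
        if (abs_bits(x, w) != abs_ref(x, w)) return 0;
        if (x == m) break;
    }
    return 1;
}
```

Calling `abs_equiv(8)` or `abs_equiv(16)` enumerates every input. Enumeration does not scale to wider words or to several variables, which is where a bit-vector decision procedure is needed.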
14 Motivating Example 2
    void fun() {
        char fmt[16];
        fgets(fmt, 16, stdin);
        fmt[15] = '\0';
        printf(fmt);
    }
- Is there an input string that causes value 234 to be written to address a4a3a2a1?
- Answer
  - Yes: "a1a2a3a4%230g%n"
    - Depends on details of compilation
  - But no exploit for buffer size less than 8
  - Ganapathy, Seshia, Jha, Reps, Bryant, ICSE '05
15 Motivating Example 3
    bit[W] popSpec(bit[W] x) {
        int cnt = 0;
        for (int i = 0; i < W; i++) {
            if (x[i]) cnt++;
        }
        return cnt;
    }
    bit[W] popSketch(bit[W] x) {
        loop (??) {
            x = (x & ??) + ((x >> ??) & ??);
        }
        return x;
    }
- Is there a way to expand the program sketch to make it match the spec?
- Answer
  - W = 16:
    x = (x & 0x5555) + ((x >> 1) & 0x5555);
    x = (x & 0x3333) + ((x >> 2) & 0x3333);
    x = (x & 0x0077) + ((x >> 8) & 0x0077);
    x = (x & 0x000f) + ((x >> 4) & 0x000f);
  - Solar-Lezama, et al., ASPLOS '06
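The W = 16 answer can be checked against the specification by enumerating all 65536 inputs. This is a small sketch in plain C rather than in the sketching language of the slide; the function names `pop_spec`, `pop_sketch`, and `pop_equiv` are assumptions made for illustration.

```c
#include <stdint.h>

/* Specification: count the set bits one at a time. */
unsigned pop_spec(uint16_t x) {
    unsigned cnt = 0;
    for (int i = 0; i < 16; i++)
        if ((x >> i) & 1u) cnt++;
    return cnt;
}

/* The four completed sketch steps for W = 16, in the order listed above. */
unsigned pop_sketch(uint16_t x0) {
    unsigned x = x0;
    x = (x & 0x5555u) + ((x >> 1) & 0x5555u);   /* 2-bit field counts */
    x = (x & 0x3333u) + ((x >> 2) & 0x3333u);   /* 4-bit field counts, each 0..4 */
    x = (x & 0x0077u) + ((x >> 8) & 0x0077u);   /* add high byte into low byte */
    x = (x & 0x000fu) + ((x >> 4) & 0x000fu);   /* add the two remaining nibbles */
    return x;
}

/* Returns 1 if the sketch matches the spec on every 16-bit input. */
int pop_equiv(void) {
    for (unsigned v = 0; v <= 0xFFFFu; v++)
        if (pop_spec((uint16_t)v) != pop_sketch((uint16_t)v)) return 0;
    return 1;
}
```

The mask 0x0077 is enough in the third step because each 4-bit field holds a count of at most 4, which fits in three bits.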
16 Motivating Example 4
[Figure: Sequential Reference Model and Pipelined Microprocessor]
- Is pipelined microprocessor identical to sequential reference model?
- Strategy
  - Represent machine instructions, data, and state as bit vectors
    - Compatible with hardware description language representation
  - Verifier finds abstractions automatically
17 Bit Vector Formulas
- Fixed width data words
- Arithmetic operations
  - Add/subtract/multiply/divide, ...
  - Two's complement, unsigned, ...
- Bit-wise logical operations
  - Bitwise and/or/xor, shift/extract, concatenate
- Predicates
  - ==, <=
- Task
  - Is formula satisfiable?
  - E.g., a > 0 && a*a < 0
    - 50000 * 50000 = -1794967296 (on 32-bit machine)
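The example value can be reproduced without relying on signed overflow, which is undefined in C. The sketch below computes the product in 64 bits and then reinterprets the low 32 bits as a two's complement number; `wrap32` and `satisfies` are hypothetical helper names.

```c
#include <stdint.h>

/* Interpret the low 32 bits of v as a two's complement value. */
int64_t wrap32(int64_t v) {
    uint32_t low = (uint32_t)v;   /* conversion to unsigned reduces modulo 2^32 */
    return (low & 0x80000000u) ? (int64_t)low - 4294967296LL : (int64_t)low;
}

/* The formula a > 0 && a*a < 0 evaluated under 32-bit semantics. */
int satisfies(int32_t a) {
    int64_t sq = wrap32((int64_t)a * (int64_t)a);
    return a > 0 && sq < 0;
}
```

Here `wrap32(2500000000LL)` gives -1794967296, so `satisfies(50000)` is true, while no unbounded integer satisfies the same formula.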
18 Decision Procedures
- Core technology for formal reasoning
- Boolean SAT
  - Pure Boolean formula
- SAT Modulo Theories (SMT)
  - Support additional logic fragments
  - Example theories
    - Linear arithmetic over reals or integers
    - Functions with equality
    - Bit vectors
    - Combinations of theories
19 Recent Progress in SAT Solving
20 BV Decision Procedures: Some History
- B.C. (Before Chaff)
  - String operations (concatenate, field extraction)
  - Linear arithmetic with bounds checking
  - Modular arithmetic
- Limitations
  - Cannot handle full range of bit-vector operations
21 BV Decision Procedures: Using SAT
- SAT-Based "Bit Blasting"
  - Generate Boolean circuit based on bit-level behavior of operations
  - Convert to Conjunctive Normal Form (CNF) and check with best available SAT checker
  - Handles arbitrary operations
- Effective in Many Applications
  - CBMC: Clarke, Kroening, Lerda, TACAS '04
  - Microsoft Cogent + SLAM: Cook, Kroening, Sharygina, CAV '05
  - CVC-Lite: Dill, Barrett, Ganesh; Yices: deMoura, et al.
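To illustrate what "bit-level behavior of operations" means for one operator, here is a sketch of w-bit addition expressed only through single-bit AND, OR, and XOR gates (a ripple-carry adder). A bit-blasting procedure would emit CNF clauses for each gate instead of evaluating it; this example only evaluates the circuit and compares it with word-level addition. The function names are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* w-bit addition built from single-bit gates only (ripple-carry adder). */
uint32_t add_bit_blasted(uint32_t a, uint32_t b, int w) {
    assert(w >= 1 && w <= 32);
    uint32_t sum = 0, carry = 0;
    for (int i = 0; i < w; i++) {
        uint32_t ai = (a >> i) & 1u, bi = (b >> i) & 1u;
        uint32_t si = ai ^ bi ^ carry;                     /* sum bit */
        carry = (ai & bi) | (ai & carry) | (bi & carry);   /* carry out */
        sum |= si << i;
    }
    return sum;   /* final carry dropped: arithmetic modulo 2^w */
}

/* Returns 1 if the gate-level adder matches word-level addition
   on all pairs of 8-bit inputs. */
int adder_matches_word_level(void) {
    for (uint32_t a = 0; a < 256; a++)
        for (uint32_t b = 0; b < 256; b++)
            if (add_bit_blasted(a, b, 8) != ((a + b) & 0xFFu)) return 0;
    return 1;
}
```

An adder needs a number of gates linear in w. A multiplier built the same way grows quadratically, which is one reason multiplication is a hard case for SAT.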
22 Bit-Vector Challenge
- Is there a better way than bit blasting?
- Requirements
  - Provide same functionality as with bit blasting
  - Find abstractions based on word-level structure
  - Improve on performance of bit blasting
- Observation
  - Must have bit blasting at core
    - Only approach that covers full functionality
  - Want to exploit special cases
    - Formula satisfied by small values
    - Simple algebraic properties imply unsatisfiability
    - Small unsatisfiable core
    - Solvable by modular arithmetic
23 Some Recent Ideas
- Iterative Approximation
  - UCLID: Bryant, Kroening, Ouaknine, Seshia, Strichman, Brady, TACAS '07
  - Use bit blasting as core technique
  - Apply to simplified versions of formula
  - Successive approximations until solve or show unsatisfiable
- Using Modular Arithmetic
  - STP: Ganesh & Dill, CAV '07
  - Algebraic techniques to solve special case forms
- Layered Approach
  - MathSAT: Bruttomesso, Cimatti, Franzen, Griggio, Hanna, Nadel, Palti, Sebastiani, CAV '07
  - Use successively more detailed solvers
24 Iterative Approach Background: Approximating Formula
[Figure: original formula φ]
- Example Approximation Techniques
  - Underapproximating
    - Restrict word-level variables to smaller ranges of values
  - Overapproximating
    - Replace subformula with Boolean variable
25 Starting Iterations
[Figure: φ and underapproximation φ1-]
- Initial Underapproximation
  - (Greatly) restrict ranges of word-level variables
  - Intuition: Satisfiable formula often has small-domain solution
26 First Half of Iteration
[Figure: φ and underapproximation φ1-]
- SAT Result for φ1-
  - Satisfiable
    - Then have found solution for φ
  - Unsatisfiable
    - Use UNSAT proof to generate overapproximation φ1+
    - (Described later)
27 Second Half of Iteration
[Figure: overapproximation φ1+, φ, and underapproximation φ1-]
- SAT Result for φ1+
  - Unsatisfiable
    - Then have shown φ unsatisfiable
  - Satisfiable
    - Solution indicates variable ranges that must be expanded
    - Generate refined underapproximation
28 Iterative Behavior
[Figure: overapproximations φ1+, φ2+, ..., φk+; original formula φ; underapproximations φ1-, φ2-, ..., φk-]
- Underapproximations
  - Successively more precise abstractions of φ
  - Allow wider variable ranges
- Overapproximations
  - No predictable relation
  - UNSAT proof not unique
29 Overall Effect
- Soundness
  - Only terminate with solution on underapproximation
  - Only terminate as UNSAT on overapproximation
- Completeness
  - Successive underapproximations approach φ
  - Finite variable ranges guarantee termination
    - In worst case, get φk- ≡ φ
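The underapproximation half of the loop can be sketched in a few lines. This is a deliberately simplified illustration, not the UCLID algorithm: it widens the range 0 <= a < 2^k uniformly instead of using an UNSAT proof and an overapproximation to decide which ranges to expand, and a brute-force search stands in for the SAT solver call. The example formula is a > 0 && a*a < 0 on w-bit signed words, and all function names are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Example formula over a w-bit two's complement word: a > 0 && a*a < 0. */
static int formula_holds(uint32_t a, int w) {
    uint32_t m = (1u << w) - 1u;          /* w <= 16, so no shift overflow */
    uint32_t sign = 1u << (w - 1);
    uint32_t sq = (a * a) & m;
    int a_positive = (a & m) != 0 && (a & sign) == 0;
    int sq_negative = (sq & sign) != 0;
    return a_positive && sq_negative;
}

/* Try ranges 0 <= a < 2^k for k = 1..w. Returns the first k that yields a
   solution (stored in *solution when solution is not NULL), or 0 if the
   formula has no solution even over the full range (k = w). */
int solve_by_widening(int w, uint32_t *solution) {
    assert(w >= 2 && w <= 16);
    for (int k = 1; k <= w; k++) {
        uint32_t limit = 1u << k;
        for (uint32_t a = 0; a < limit; a++) {
            if (formula_holds(a, w)) {
                if (solution != NULL) *solution = a;
                return k;
            }
        }
    }
    return 0;
}

/* Convenience wrapper: the solution found, or 0 if unsatisfiable. */
uint32_t widening_solution(int w) {
    uint32_t s = 0;
    return solve_by_widening(w, &s) ? s : 0;
}
```

For w = 16 the search succeeds once the range reaches 8 bits, with a = 182 (182 * 182 = 33124, which has bit 15 set). The worst case k = w corresponds to the underapproximation coinciding with φ.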
30 Generating Overapproximation
- Given
  - Underapproximation φ1-
  - Bit-blasted translation of φ1- into Boolean formula
  - Proof that Boolean formula unsatisfiable
- Generate
  - Overapproximation φ1+
  - If φ1+ satisfiable, must lead to refined underapproximation
    - Generate φ2- such that φ1- ⇒ φ2- ⇒ φ
31 Bit-Vector Formula Structure
- DAG representation to allow shared subformulas
[Figure: DAG for φ]
32 Structure of Underapproximation
[Figure: DAG for φ-]
- Linear complexity translation to CNF
  - Each word-level variable encoded as set of Boolean variables
  - Additional Boolean variables represent subformula values
33 Encoding Range Constraints
- Explicit
  - View as additional predicates in formula
- Implicit
  - Reduce number of variables in encoding
  - Constraint: Encoding
    - 0 ≤ w < 8: 0 0 0 ... 0 w2 w1 w0
    - -4 ≤ x < 4: xs xs xs ... xs xs x1 x0
  - Yields smaller SAT encodings
[Figure: formula under range constraints 0 ≤ w < 8 ∧ -4 ≤ x < 4]
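The implicit encoding amounts to zero extension for unsigned ranges and sign extension for signed ranges: only the low k bits are free, and the remaining bits are constants or copies of the sign bit. A minimal sketch, with the hypothetical helpers `zero_extend` and `sign_extend`:

```c
#include <assert.h>
#include <stdint.h>

/* 0 <= w < 2^k: keep k free low-order bits, force all higher bits to 0. */
uint32_t zero_extend(uint32_t free_bits, int k) {
    assert(k >= 1 && k <= 32);
    return (k == 32) ? free_bits : (free_bits & ((1u << k) - 1u));
}

/* -2^(k-1) <= x < 2^(k-1): keep k free bits and copy the sign bit xs
   into every higher position of a width-bit word. */
uint32_t sign_extend(uint32_t free_bits, int k, int width) {
    assert(k >= 1 && k <= width && width <= 32);
    uint32_t word_mask = (width == 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    uint32_t low = zero_extend(free_bits, k);
    uint32_t xs = (low >> (k - 1)) & 1u;
    /* Bits k..width-1 of the word. */
    uint32_t high_bits = word_mask ^ zero_extend(0xFFFFFFFFu, k);
    return xs ? (low | high_bits) : low;
}
```

With k = 3 and an 8-bit word, the free bits 100 encode -4 as 0xFC and 011 encodes 3, matching the range -4 ≤ x < 4. The SAT encoding then needs only three Boolean variables for x instead of eight.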
34 UNSAT Proof
- Subset of clauses that is unsatisfiable
- Clause variables define portion of DAG
- Subgraph that cannot be satisfied with given range constraints
[Figure: formula DAG with atoms x = y, x + 2z ≥ 1, w & 0xFFFF = x, x % 26 = v, Boolean variable a, and ∧/∨ nodes]
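35 Extracting Circuit from UNSAT Proof
- Subgraph that cannot be satisfied with given range constraints
  - Even when replace rest of graph with unconstrained variables
[Figure: UNSAT subgraph with atoms x = y, x + 2z ≥ 1, Boolean variable a, and an ∧ node; rest of graph replaced by variables b1, b2]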
36 Generated Overapproximation
- Remove range constraints on word-level variables
- Creates overapproximation
  - Ignores correlations between values of subformulas
[Figure: overapproximation φ1+ with atoms x = y, x + 2z ≥ 1, Boolean variable a, an ∧ node, and variables b1, b2]
37 Refinement Property
- Claim
  - φ1+ has no solutions that satisfy φ1-'s range constraints
    - Because φ1+ contains portion of φ1- that was shown to be unsatisfiable under range constraints
[Figure: φ1+ with the UNSAT subgraph (atoms x = y, x + 2z ≥ 1, Boolean variable a, ∧ node) and variables b1, b2]
38 Refinement Property (Cont.)
- Consequence
  - Solving φ1+ will expand range of some variables
  - Leading to more exact underapproximation φ2-
[Figure: φ1+ with atoms x = y, x + 2z ≥ 1, Boolean variable a, ∧ node, and variables b1, b2]
39 Effect of Iteration
[Figure: φ1-, φ, and φ1+, with step labeled "UNSAT proof: generate overapproximation"]
- Each Complete Iteration
  - Expands ranges of some word-level variables
  - Creates refined underapproximation
40 Approximation Methods
- So Far
  - Range constraints
    - Underapproximate by constraining values of word-level variables
  - Subformula elimination
    - Overapproximate by assuming subformula value arbitrary
- General Requirements
  - Systematic under- and over-approximations
  - Way to connect from one to another
- Goal: Devise Additional Approximation Strategies
41 Function Approximation Example
Cases for x * y:
            x = 0    x = 1    x else
  y = 0       0        0        0
  y = 1       0        1        x
  y else      0        y        *
  (* remaining case: handled by one of the two options below)
- Motivation
  - Multiplication (and division) are difficult cases for SAT
- Prohibit Via Additional Range Constraints
  - Gives underapproximation
  - Restricts values of (possibly intermediate) terms
- Abstract as f(x, y)
  - Overapproximate as uninterpreted function f
  - Value constrained only by functional consistency
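The case split can be written down directly. In this sketch the easy cases are evaluated exactly and the remaining case is delegated to a caller-supplied function standing in for the uninterpreted f. The names `mul_approx`, `uninterp_fn`, and `exact_mul` are hypothetical; `exact_mul` is just one of the many interpretations that f is allowed to take.

```c
#include <stdint.h>

/* Stand-in for an uninterpreted function: any function of (x, y) is allowed,
   as long as equal arguments always give equal results. */
typedef uint32_t (*uninterp_fn)(uint32_t, uint32_t);

/* Partially interpreted product: exact when either operand is 0 or 1,
   otherwise abstracted as f(x, y). */
uint32_t mul_approx(uint32_t x, uint32_t y, uninterp_fn f) {
    if (x == 0 || y == 0) return 0;
    if (x == 1) return y;
    if (y == 1) return x;
    return f(x, y);
}

/* One possible interpretation of f: true multiplication modulo 2^32. */
uint32_t exact_mul(uint32_t x, uint32_t y) {
    return x * y;
}
```

With `exact_mul` plugged in, `mul_approx` agrees with ordinary multiplication. A solver treating f as uninterpreted may pick any value in the last case, which is what makes the abstraction an overapproximation.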
42 Results: UCLID BV vs. Bit-blasting
[Table: results on 2.8 GHz Xeon, 2 GB RAM]
- UCLID always better than bit blasting
- Generally better than other available procedures
- SAT time is the dominating factor
43 Challenges with Iterative Approximation
- Formulating Overall Strategy
  - Which abstractions to apply, when and where
  - How quickly to relax constraints in iterations
    - Which variables to expand and by how much?
    - Too conservative: Each call to SAT solver incurs cost
    - Too lenient: Devolves to complete bit blasting
- Predicting SAT Solver Performance
  - Hard to predict time required by call to SAT solver
  - Will particular abstraction simplify or complicate SAT?
- Combination Especially Difficult
  - Multiple iterations with unpredictable inner loop
44 STP: Linear Equation Solving
- Ganesh & Dill, CAV '07
- Solve linear equations over integers mod 2^w
- Capture range of solutions with Boolean variables
- Example Problem
  - Variables: 3-bit unsigned integers
    - x = [x2 x1 x0], y = [y2 y1 y0], z = [z2 z1 z0]
  - Linear equations: conjunction of linear constraints
      3x + 4y + 2z = 0 mod 8
      2x + 2y + 2 = 0 mod 8
      2x + 4y + 2z = 0 mod 8
- General Form
  - A x = b mod 2^w
45 Solution Method
- Equations
    3x + 4y + 2z = 0 mod 8
    2x + 2y + 2 = 0 mod 8
    2x + 4y + 2z = 0 mod 8
- Some Number Theory
  - Odd number has multiplicative inverse mod 2^w
    - Mod 8: 3^-1 = 3
  - Additive inverse mod 2^w: -x = 2^w - x
    - Mod 8: -4 = 4, -2 = 6
- Solve first equation for x
    3x = 4y + 6z mod 8
    3·3x = 3·4y + 3·6z mod 8
    x = 4y + 2z mod 8
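46 Solution Method (cont.)
- Substitute x = 4y + 2z mod 8 into the remaining equations
    2x + 2y + 2 = 0 mod 8
    2x + 4y + 2z = 0 mod 8
- After substitution
    2(4y + 2z) + 2y + 2 = 0 mod 8
    2(4y + 2z) + 4y + 2z = 0 mod 8
- After simplification
    2y + 4z + 2 = 0 mod 8
    4y + 6z = 0 mod 8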
47 What if All Coefficients Even?
- Result of Substitutions
    2y + 4z + 2 = 0 mod 8
    4y + 6z = 0 mod 8
  - Even numbers do not have multiplicative inverses mod 8
- Observation
  - Can divide through and reduce modulus
    y + 2z + 1 = 0 mod 4
    2y + 3z = 0 mod 4
  - Solve the first for y, substitute into the second
    y = 2z + 3 mod 4
    z = 2 mod 4
    y = 3 mod 4
48 General Solutions
- Original variables: 3-bit unsigned integers
  - x = [x2 x1 x0], y = [y2 y1 y0], z = [z2 z1 z0]
- Solutions
    y = 3 mod 4
    z = 2 mod 4
  - Constrained variables
    - y = [y2 1 1], z = [z2 1 0]
- Back Substitution
    x = 4y + 2z mod 8
    x = 0 mod 8
  - Constrained variables
    - x = [0 0 0]
49 Linear Equation Solutions
- Equations
    3x + 4y + 2z = 0 mod 8
    2x + 2y + 2 = 0 mod 8
    2x + 4y + 2z = 0 mod 8
- Encoding All Possible Solutions
  - x = [0 0 0], y = [y2 1 1], z = [z2 1 0]
  - y2, z2: arbitrary Boolean variables
  - 4 possible solutions (out of original 512)
- General Principle
  - Form of LU decomposition
    - Polynomial time algorithm
  - Boolean variables in solution to express set of solutions
  - Only works when have conjunction of linear constraints
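The worked example is small enough to confirm by enumeration. The sketch below is not STP's algorithm. It shows one standard way to compute the inverse of an odd number mod 2^w (Newton iteration, an assumption about how one might implement that step), and it enumerates all 512 assignments to check that exactly the four solutions of the stated form exist. Function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Multiplicative inverse of an odd number modulo 2^w, by Newton iteration. */
uint32_t inv_mod_pow2(uint32_t a, int w) {
    assert((a & 1u) == 1u && w >= 1 && w <= 32);
    uint32_t inv = a;                 /* a*a = 1 (mod 8): correct to 3 bits */
    for (int i = 0; i < 5; i++)
        inv *= 2u - a * inv;          /* each step doubles the correct bits */
    return (w == 32) ? inv : (inv & ((1u << w) - 1u));
}

/* Enumerate all 3-bit (x, y, z) and count the solutions of the three
   equations. Returns the count, or -1 if some solution is not of the form
   x = [0 0 0], y = [y2 1 1], z = [z2 1 0]. */
int count_solutions(void) {
    int count = 0;
    for (unsigned x = 0; x < 8; x++)
        for (unsigned y = 0; y < 8; y++)
            for (unsigned z = 0; z < 8; z++) {
                int ok = (3 * x + 4 * y + 2 * z) % 8 == 0
                      && (2 * x + 2 * y + 2) % 8 == 0
                      && (2 * x + 4 * y + 2 * z) % 8 == 0;
                if (!ok) continue;
                if (x != 0 || (y & 3u) != 3u || (z & 3u) != 2u) return -1;
                count++;
            }
    return count;
}
```

`inv_mod_pow2(3, 3)` returns 3, matching the slide's 3^-1 = 3 mod 8, and `count_solutions()` returns 4, one for each choice of y2 and z2.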
50 Layered Solver
- Bruttomesso, et al., CAV '07
  - Part of MathSAT project
- DPLL(T) Framework
  - SAT solver coupled with solver for mathematical theory T
  - BV theory solver works with conjunctions of constraints
51 DPLL(T): Formula Structure
[Figure: formula φ with Boolean structure over atoms]
- Atoms
  - Predicates applied to bit-vector expressions
  - Boolean variables
52 DPLL(T): Operation
[Figure: conjunction of atoms passed to the theory solver, including x + 2z >= 1, x % 26 = v, x = y, x << 1 > y, and a constraint relating w & 0xFFFF to x]
- Actions
  - DPLL engine satisfies Boolean portion
  - Theory solver determines whether resulting conjunction of atoms satisfiable
  - Solver provides information to DPLL engine to aid search
    - Nonchronological backtracking
    - Conflict clause generation
- Successful approach for other decision procedures
53 MathSAT: Layers
- Uses increasingly detailed solver layers
  - Only progress if can't find conflict using more abstract rules
- Layers
  - Equality with uninterpreted functions
    - Treats all bit-level functions and operators as uninterpreted
  - Simple handling of concatenations, extractions, and transitivity
  - Full solver using linear arithmetic + SAT
54 Summary: Modeling Levels
- Bits
  - Limited ability to scale
  - Hard to apply functional abstractions
- Words
  - Allows abstracting data while precisely representing control
  - Overlooks finite word-size effects
- Bit Vectors
  - Realistic semantic model for hardware & software
  - Captures all details of actual operation
    - Detects errors related to overflow and other artifacts of finite representation
  - Can apply abstractions found at word-level
55 Areas of Agreement
- SAT-Based Framework Is Only Logical Choice
  - SAT solvers are good & getting better
- Want to Automatically Exploit Abstractions
  - Function structure
  - Arithmetic properties
    - E.g., associativity, commutativity
  - Arithmetic reductions
    - E.g., LU decomposition
- Base Level Should Be SAT
  - Only semantically complete approach
56 Choices
- Optimize for Special Formula Classes
  - E.g., STP optimized for conjunctions of constraints
  - Common in software verification & testing
- Iterative Abstraction
  - Natural framework for attempting different abstractions
  - Having SAT solver in inner loop makes performance tuning difficult
- DPLL(T) Framework
  - Theory solver only deals with conjunctions
  - May need to invoke SAT solver in inner loop
    - Hard to coordinate outer and inner search procedures
- Others?
57 Observations
- Bit-Vector Modeling Gaining in Popularity
  - Recognition of importance
  - Benchmarks and competitions
- Just Now Improving on Bit Blasting + SAT
- Lots More Work to be Done