Modeling Data in Formal Verification: Bits, Bit Vectors, or Words

Randal E. Bryant, Carnegie Mellon University

http://www.cs.cmu.edu/~bryant

Overview

- Issue
- How should data be modeled in formal analysis?
- Verification, test generation, security analysis,

- Approaches
- Bits: Every bit is represented individually
- Basis for most CAD, model checking
- Words: View each word as an arbitrary value
- E.g., unbounded integers
- Historic program verification work
- Bit Vectors: Finite-precision words
- Captures true semantics of hardware and software
- More opportunities for abstraction than with bits

Bit-Level Modeling

Control Logic

- Represent Every Bit of State Individually
- Behavior expressed as Boolean next-state function over current state
- Historic method for most CAD, testing, and verification tools
- E.g., model checkers

Bit-Level Modeling in Practice

- Strengths
- Allows precise modeling of system
- Well developed technology
- BDDs and SAT for Boolean reasoning
- Limitations
- Every state bit introduces two Boolean variables
- Current state and next state
- Overly detailed modeling of system functions
- Don't want to capture full details of FPU
- Making It Work
- Use extensive abstraction to reduce bit count
- Hard to abstract functionality

Word-Level Abstraction 1: Bits → Integers

[Figure: bits x0, x1, x2, ..., xn-1 grouped into a single word]

- View Data as Symbolic Words
- Arbitrary integers
- No assumptions about size or encoding
- Classic model for reasoning about software
- Can store in memories and registers

Abstracting Data Bits

Control Logic

Word-Level Abstraction 2: Uninterpreted Functions

[Figure: functional block replaced by function f]

- For any Block that Transforms or Evaluates Data
- Replace with generic, unspecified function
- Only assumed property is functional consistency
- $a = x \land b = y \Rightarrow f(a, b) = f(x, y)$
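Functional consistency can be illustrated with a small sketch. The class `UninterpretedFunction` below is hypothetical (it is not taken from any verifier): it models an unspecified function as a table that hands out a fresh opaque value for each new argument tuple, so the only property that holds by construction is that equal arguments give equal results.

```python
class UninterpretedFunction:
    """Hypothetical model of an uninterpreted function symbol.

    Each distinct argument tuple gets a fresh opaque token, so the only
    guaranteed property is functional consistency.
    """

    def __init__(self, name):
        self.name = name
        self._table = {}

    def __call__(self, *args):
        if args not in self._table:
            # First time these arguments are seen: invent a fresh value.
            self._table[args] = f"{self.name}_{len(self._table)}"
        return self._table[args]


f = UninterpretedFunction("f")
a, b = 3, 7
x, y = 3, 7
same = f(a, b) == f(x, y)    # a = x and b = y, so the results must agree
other = f(a, b) == f(b, a)   # different arguments: this model keeps them apart
print(same, other)
```

Making distinct argument tuples map to distinct tokens is just one possible interpretation; functional consistency alone would also allow them to coincide.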

Abstracting Functions

[Figure: Control Logic and Data Path, with combinational logic blocks (Com. Log.) in the data path]

- For Any Block that Transforms Data
- Replace by uninterpreted function
- Ignore detailed functionality
- Conservative approximation of actual system

Word-Level Modeling History

- Historic
- Used by theorem provers
- More Recently
- Burch & Dill, CAV '94
- Verify that pipelined processor has same behavior as unpipelined reference model
- Use word-level abstractions of data paths and memories
- Use decision procedure to determine equivalence
- Bryant, Lahiri, Seshia, CAV '02
- UCLID verifier
- Tool for describing and verifying systems at word level

Pipeline Verification Example

Pipelined Processor

Reference Model

Abstracted Pipeline Verification

Pipelined Processor

Reference Model

Experience with Word-Level Modeling

- Powerful Abstraction Tool
- Allows focus on control of large-scale system
- Can model systems with very large memories
- Hard to Generate Abstract Model
- Hand-generated: how to validate?
- Automatic abstraction: limited success
- Andraus & Sakallah, DAC 2004
- Realistic Features Break Abstraction
- E.g., set ALU function to A+0 to pass operand to output
- Desire
- Should be able to mix detailed bit-level representation with abstracted word-level representation

Bit Vectors: Motivating Example 1

    int abs(int x) {
        int mask = x >> 31;
        return (x ^ mask) + ~mask + 1;
    }

    int test_abs(int x) {
        return (x < 0) ? -x : x;
    }

- Do these functions produce identical results?
- Strategy
- Represent and reason about bit-level program behavior
- Specific to machine word size, integer representations, and operations
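The equivalence question can be explored by brute force at a reduced word size. The sketch below is only an illustration: the helper names are made up, arithmetic is assumed to wrap in two's complement (C itself leaves signed overflow undefined), and the right shift is assumed to be arithmetic.

```python
def to_signed(v, w):
    """Interpret the low w bits of v as a two's-complement integer."""
    v &= (1 << w) - 1
    return v - (1 << w) if v >> (w - 1) else v


def abs_bits(x, w):
    """Branch-free abs on a w-bit pattern, mirroring (x ^ mask) + ~mask + 1."""
    m = (1 << w) - 1
    mask = m if to_signed(x, w) < 0 else 0      # models x >> (w - 1)
    return to_signed(((x ^ mask) + (~mask & m) + 1) & m, w)


def abs_ref(x, w):
    """Reference abs, with the result wrapped to w bits."""
    s = to_signed(x, w)
    return to_signed(-s if s < 0 else s, w)


def equivalent(w):
    """Exhaustively compare both functions over all w-bit inputs."""
    return all(abs_bits(x, w) == abs_ref(x, w) for x in range(1 << w))


print(equivalent(8), abs_bits(0x80, 8))
```

Under these assumptions the two functions agree on every input, including the most negative value, where both wrap back to it. Exhaustive checking stops being practical at 32 or 64 bits, which is where a bit-vector decision procedure comes in.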

Motivating Example 2

    void fun() {
        char fmt[16];
        fgets(fmt, 16, stdin);
        fmt[15] = '\0';
        printf(fmt);
    }

- Is there an input string that causes value 234 to be written to address a4a3a2a1?
- Answer
- Yes: "a1a2a3a4%230g%n"
- Depends on details of compilation
- But no exploit for buffer size less than 8
- Ganapathy, Seshia, Jha, Reps, Bryant, ICSE '05

Motivating Example 3

    bit[W] popSpec(bit[W] x) {
        int cnt = 0;
        for (int i = 0; i < W; i++) {
            if (x[i]) cnt++;
        }
        return cnt;
    }

    bit[W] popSketch(bit[W] x) {
        loop (??) {
            x = (x & ??) + ((x >> ??) & ??);
        }
        return x;
    }

- Is there a way to expand the program sketch to make it match the spec?
- Answer
- W = 16
- Solar-Lezama, et al., ASPLOS '06

    x = (x & 0x5555) + ((x >> 1) & 0x5555);
    x = (x & 0x3333) + ((x >> 2) & 0x3333);
    x = (x & 0x0077) + ((x >> 8) & 0x0077);
    x = (x & 0x000f) + ((x >> 4) & 0x000f);
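For W = 16 the completed sketch can be checked against the specification by enumerating all inputs. This is a quick sanity check in Python with made-up function names, not the synthesis procedure itself.

```python
def pop_spec(x, w=16):
    """Count set bits one position at a time, as in the specification."""
    return sum((x >> i) & 1 for i in range(w))


def pop_sketch(x):
    """The four-step completion of the sketch for 16-bit words."""
    x = (x & 0x5555) + ((x >> 1) & 0x5555)   # counts per 2-bit field
    x = (x & 0x3333) + ((x >> 2) & 0x3333)   # counts per 4-bit field
    x = (x & 0x0077) + ((x >> 8) & 0x0077)   # fold upper byte into lower byte
    x = (x & 0x000F) + ((x >> 4) & 0x000F)   # add the two remaining nibbles
    return x


def matches():
    """Compare sketch and spec on every 16-bit value."""
    return all(pop_sketch(x) == pop_spec(x) for x in range(1 << 16))


print(matches())
```

The mask 0x0077 is sufficient in the third step because each nibble holds a count of at most 4 at that point, which fits in three bits.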

Motivating Example 4

Sequential Reference Model

Pipelined Microprocessor

- Is pipelined microprocessor identical to sequential reference model?
- Strategy
- Represent machine instructions, data, and state as bit vectors
- Compatible with hardware description language representation
- Verifier finds abstractions automatically

Bit Vector Formulas

- Fixed width data words
- Arithmetic operations
- Add/subtract/multiply/divide, ...
- Two's complement, unsigned, ...
- Bit-wise logical operations
- Bitwise and/or/xor, shift/extract, concatenate
- Predicates
- ==, <=
- Task
- Is formula satisfiable?
- E.g., $a > 0 \land a \cdot a < 0$

50000 * 50000 = -1794967296 (on 32-bit machine)
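The overflow example can be reproduced by emulating 32-bit two's-complement arithmetic. The helper `wrap32` is a made-up name for this sketch, and enumeration stands in for a real satisfiability check.

```python
def wrap32(v):
    """Reduce an integer to a signed 32-bit two's-complement value."""
    v &= 0xFFFFFFFF
    return v - (1 << 32) if v & 0x80000000 else v


a = 50000
product = wrap32(a * a)          # 2500000000 does not fit in 31 bits

# Smallest positive a satisfying a > 0 and a*a < 0 under 32-bit wraparound.
smallest = next(v for v in range(1, 1 << 16) if wrap32(v * v) < 0)
print(product, smallest)
```

With unbounded integers the formula is unsatisfiable, so a word-level model would miss this case entirely.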

Decision Procedures

- Core technology for formal reasoning
- Boolean SAT
- Pure Boolean formula
- SAT Modulo Theories (SMT)
- Support additional logic fragments
- Example theories
- Linear arithmetic over reals or integers
- Functions with equality
- Bit vectors
- Combinations of theories

Recent Progress in SAT Solving

BV Decision Procedures: Some History

- B.C. (Before Chaff)
- String operations (concatenate, field extraction)
- Linear arithmetic with bounds checking
- Modular arithmetic
- Limitations
- Cannot handle full range of bit-vector operations

BV Decision Procedures Using SAT

- SAT-Based Bit Blasting
- Generate Boolean circuit based on bit-level behavior of operations
- Convert to Conjunctive Normal Form (CNF) and check with best available SAT checker
- Handles arbitrary operations
- Effective in Many Applications
- CBMC: Clarke, Kroening, Lerda, TACAS '04
- Microsoft Cogent + SLAM: Cook, Kroening, Sharygina, CAV '05
- CVC-Lite: Dill, Barrett, Ganesh; Yices: deMoura, et al.
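As a rough illustration of bit blasting, the sketch below expresses word-level addition purely with Boolean operations on individual bits and then searches over bit assignments. It omits the CNF conversion, and plain enumeration stands in for the SAT solver; all names are assumptions made for this example.

```python
from itertools import product


def to_bits(v, w):
    """Little-endian list of w Booleans for the integer v."""
    return [bool((v >> i) & 1) for i in range(w)]


def from_bits(bits):
    """Inverse of to_bits."""
    return sum(1 << i for i, b in enumerate(bits) if b)


def add_bits(xs, ys):
    """Ripple-carry adder built only from Boolean operations."""
    out, carry = [], False
    for x, y in zip(xs, ys):
        out.append(x ^ y ^ carry)
        carry = (x and y) or (carry and (x ^ y))
    return out                      # final carry dropped: addition mod 2**w


def solve_double(w, target):
    """Find x with x + x == target (mod 2**w) by trying every bit assignment."""
    for bits in product([False, True], repeat=w):
        xs = list(bits)
        if add_bits(xs, xs) == to_bits(target, w):
            return from_bits(xs)
    return None                     # no assignment works: unsatisfiable


print(solve_double(4, 6), solve_double(4, 5))
```

A real bit blaster would emit clauses for each gate and pass them to a SAT solver, which prunes the search instead of enumerating all 2^w assignments.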

Bit-Vector Challenge

- Is there a better way than bit blasting?
- Requirements
- Provide same functionality as with bit blasting
- Find abstractions based on word-level structure
- Improve on performance of bit blasting
- Observation
- Must have bit blasting at core
- Only approach that covers full functionality
- Want to exploit special cases
- Formula satisfied by small values
- Simple algebraic properties imply unsatisfiability
- Small unsatisfiable core
- Solvable by modular arithmetic

Some Recent Ideas

- Iterative Approximation
- UCLID: Bryant, Kroening, Ouaknine, Seshia, Strichman, Brady, TACAS '07
- Use bit blasting as core technique
- Apply to simplified versions of formula
- Successive approximations until solve or show unsatisfiable
- Using Modular Arithmetic
- STP: Ganesh & Dill, CAV '07
- Algebraic techniques to solve special case forms
- Layered Approach
- MathSAT: Bruttomesso, Cimatti, Franzen, Griggio, Hanna, Nadel, Palti, Sebastiani, CAV '07
- Use successively more detailed solvers

Iterative Approach Background: Approximating Formula

[Figure: Original Formula $\varphi$]

- Example Approximation Techniques
- Underapproximating
- Restrict word-level variables to smaller ranges of values
- Overapproximating
- Replace subformula with Boolean variable

Starting Iterations

[Figure: $\varphi$ and initial underapproximation $\underline{\varphi}_1$]

- Initial Underapproximation
- (Greatly) restrict ranges of word-level variables
- Intuition: Satisfiable formula often has small-domain solution

First Half of Iteration

[Figure: $\varphi$ and $\underline{\varphi}_1$]

- SAT Result for $\underline{\varphi}_1$
- Satisfiable
- Then have found solution for $\varphi$
- Unsatisfiable
- Use UNSAT proof to generate overapproximation $\overline{\varphi}_1$
- (Described later)

Second Half of Iteration

[Figure: $\overline{\varphi}_1$, $\varphi$, and $\underline{\varphi}_1$]

- SAT Result for $\overline{\varphi}_1$
- Unsatisfiable
- Then have shown $\varphi$ unsatisfiable
- Satisfiable
- Solution indicates variable ranges that must be expanded
- Generate refined underapproximation

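Iterative Behavior

[Figure: overapproximations $\overline{\varphi}_1, \overline{\varphi}_2, \ldots, \overline{\varphi}_k$, original formula $\varphi$, and underapproximations $\underline{\varphi}_1, \underline{\varphi}_2, \ldots, \underline{\varphi}_k$]

- Underapproximations
- Successively more precise abstractions of $\varphi$
- Allow wider variable ranges
- Overapproximations
- No predictable relation
- UNSAT proof not unique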

Overall Effect

- Soundness
- Only terminate with solution on underapproximation
- Only terminate as UNSAT on overapproximation
- Completeness
- Successive underapproximations approach $\varphi$
- Finite variable ranges guarantee termination
- In worst case, get $\underline{\varphi}_k \equiv \varphi$
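The underapproximation half of the loop can be sketched as follows. This is a deliberately simplified illustration: it only widens variable ranges, it omits the overapproximation step driven by UNSAT proofs, and brute-force enumeration stands in for the SAT solver. The function name and its parameters are assumptions made for this example.

```python
from itertools import product


def solve_with_widening(pred, nvars, width):
    """Search for a model of pred using successively wider value ranges.

    pred:  function of nvars unsigned integers returning a bool
    width: full bit width of every variable
    Returns (model, bits_used), or (None, width) if pred is unsatisfiable.
    """
    for bits in range(1, width + 1):
        # Underapproximation: every variable restricted to [0, 2**bits).
        for vals in product(range(1 << bits), repeat=nvars):
            if pred(*vals):
                # A solution of the restricted formula also solves the original.
                return vals, bits
    # The last iteration used the full range, so the formula is unsatisfiable.
    return None, width


sat_case = solve_with_widening(
    lambda x, y: x > 1 and y > 1 and ((x * y) & 0xFF) == 15, nvars=2, width=8)
unsat_case = solve_with_widening(
    lambda x: ((2 * x) & 0xF) == 1, nvars=1, width=4)
print(sat_case, unsat_case)
```

The satisfiable case is found with 3-bit ranges even though the variables are 8 bits wide, matching the intuition that satisfiable formulas often have small-domain solutions. The unsatisfiable case shows the worst-case behavior: without an overapproximation to cut the search short, the loop only stops once the underapproximation equals the original formula.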

Generating Overapproximation

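- Given
- Underapproximation $\underline{\varphi}_1$
- Bit-blasted translation of $\underline{\varphi}_1$ into Boolean formula
- Proof that Boolean formula unsatisfiable
- Generate
- Overapproximation $\overline{\varphi}_1$
- If $\overline{\varphi}_1$ satisfiable, must lead to refined underapproximation
- Generate $\underline{\varphi}_2$ such that
- $\underline{\varphi}_1 \Rightarrow \underline{\varphi}_2 \Rightarrow \varphi$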

Bit-Vector Formula Structure

- DAG representation to allow shared subformulas

[Figure: DAG for $\varphi$]

Structure of Underapproximation

[Figure: DAG for $\underline{\varphi}$]

- Linear complexity translation to CNF
- Each word-level variable encoded as set of Boolean variables
- Additional Boolean variables represent subformula values

Encoding Range Constraints

- Explicit
- View as additional predicates in formula
- Implicit
- Reduce number of variables in encoding
- Constraint Encoding
- $0 \le w < 8$: 0 0 0 ... 0 w2 w1 w0
- $-4 \le x < 4$: xs xs xs ... xs xs x1 x0
- Yields smaller SAT encodings

[Figure: formula with range constraints $0 \le w < 8 \land -4 \le x < 4$]
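The implicit encoding can be illustrated by building the bit patterns directly. In this sketch (helper names are made up, bits are listed least significant first), only the low-order bits are free; the upper bits are either fixed to 0 or tied to the sign bit, so no extra Boolean variables are needed for them.

```python
from itertools import product


def zero_extend(low_bits, width):
    """Encode 0 <= w < 2**k with k free bits; upper bits fixed to 0."""
    return low_bits + [0] * (width - len(low_bits))


def sign_extend(low_bits, width):
    """Encode -2**(k-1) <= x < 2**(k-1); upper bits copy the sign bit."""
    return low_bits + [low_bits[-1]] * (width - len(low_bits))


def value(bits, signed):
    """Integer value of a little-endian bit list."""
    v = sum(b << i for i, b in enumerate(bits))
    if signed and bits[-1]:
        v -= 1 << len(bits)
    return v


# Three free bits each, embedded in 8-bit words.
ws = sorted(value(zero_extend(list(b), 8), False)
            for b in product([0, 1], repeat=3))
xs = sorted(value(sign_extend(list(b), 8), True)
            for b in product([0, 1], repeat=3))
print(ws, xs)
```

Enumerating the three free bits reaches exactly the values allowed by each range constraint.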

UNSAT Proof

- Subset of clauses that is unsatisfiable
- Clause variables define portion of DAG
- Subgraph that cannot be satisfied with given

range constraints

[Figure: formula DAG with atoms x = y, x + 2z >= 1, w & 0xFFFF = x, and x % 26 = v, combined by AND and OR, with Boolean variable a]

Extracting Circuit from UNSAT Proof

- Subgraph that cannot be satisfied with given range constraints
- Even when replace rest of graph with unconstrained variables

[Figure: UNSAT subgraph with atoms x = y and x + 2z >= 1 under AND, with Boolean variables a, b1, b2]

Generated Overapproximation

- Remove range constraints on word-level variables
- Creates overapproximation
- Ignores correlations between values of subformulas

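[Figure: overapproximation $\overline{\varphi}_1$ with atoms x = y and x + 2z >= 1 under AND, with Boolean variables a, b1, b2]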

Refinement Property

- Claim
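- $\overline{\varphi}_1$ has no solutions that satisfy $\underline{\varphi}_1$'s range constraints
- Because $\overline{\varphi}_1$ contains portion of $\underline{\varphi}_1$ that was shown to be unsatisfiable under range constraints

[Figure: $\overline{\varphi}_1$ containing the UNSAT subgraph with atoms x = y and x + 2z >= 1 under AND, with Boolean variables a, b1, b2]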

Refinement Property (Cont.)

- Consequence
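- Solving $\overline{\varphi}_1$ will expand range of some variables
- Leading to more exact underapproximation $\underline{\varphi}_2$

[Figure: $\overline{\varphi}_1$ with atoms x = y and x + 2z >= 1 under AND, with Boolean variables a, b1, b2]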

Effect of Iteration

[Figure: $\overline{\varphi}_1$, $\varphi$, and $\underline{\varphi}_1$; UNSAT proof: generate overapproximation]

- Each Complete Iteration
- Expands ranges of some word-level variables
- Creates refined underapproximation

Approximation Methods

- So Far
- Range constraints
- Underapproximate by constraining values of word-level variables
- Subformula elimination
- Overapproximate by assuming subformula value arbitrary
- General Requirements
- Systematic under- and over-approximations
- Way to connect from one to another
- Goal: Devise Additional Approximation Strategies

Function Approximation Example

    x * y    | x = 0 | x = 1 | x else
    y = 0    |   0   |   0   |   0
    y = 1    |   0   |   1   |   x
    y else   |   0   |   y   |   x * y

- Motivation
- Multiplication (and division) are difficult cases for SAT
- Prohibit Via Additional Range Constraints
- Gives underapproximation
- Restricts values of (possibly intermediate) terms
- Abstract as f(x, y)
- Overapproximate as uninterpreted function f
- Value constrained only by functional consistency
Results: UCLID BV vs. Bit-blasting

results on 2.8 GHz Xeon, 2 GB RAM

- UCLID always better than bit blasting
- Generally better than other available procedures
- SAT time is the dominating factor

Challenges with Iterative Approximation

- Formulating Overall Strategy
- Which abstractions to apply, when and where
- How quickly to relax constraints in iterations
- Which variables to expand and by how much?
- Too conservative: Each call to SAT solver incurs cost
- Too lenient: Devolves to complete bit blasting
- Predicting SAT Solver Performance
- Hard to predict time required by call to SAT solver
- Will particular abstraction simplify or complicate SAT?
- Combination Especially Difficult
- Multiple iterations with unpredictable inner loop

STP: Linear Equation Solving

- Ganesh & Dill, CAV '07
- Solve linear equations over integers mod $2^w$
- Capture range of solutions with Boolean variables
- Example Problem
- Variables: 3-bit unsigned integers
- x = [x2 x1 x0], y = [y2 y1 y0], z = [z2 z1 z0]
- Linear equations: conjunction of linear constraints
- General Form
- $A x = b \bmod 2^w$

$3x + 4y + 2z = 0 \bmod 8$

$2x + 2y + 2 = 0 \bmod 8$

$2x + 4y + 2z = 0 \bmod 8$

Solution Method

- Equations

$3x + 4y + 2z = 0 \bmod 8$

$2x + 2y + 2 = 0 \bmod 8$

$2x + 4y + 2z = 0 \bmod 8$

- Some Number Theory
- Odd number has multiplicative inverse mod $2^w$
- Mod 8: $3^{-1} = 3$
- Additive inverse mod $2^w$: $-x = 2^w - x$
- Mod 8: $-4 = 4$, $-2 = 6$
- Solve first equation for x

$3 \cdot 3x = 3 \cdot 4y + 3 \cdot 6z \bmod 8$

$x = 4y + 2z \bmod 8$
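The number-theory facts used in this step are easy to check numerically. The sketch below uses a made-up helper `inverse_mod_pow2` that finds the inverse by search, then confirms that the derived expression for x satisfies the first equation for every 3-bit y and z.

```python
def inverse_mod_pow2(a, w):
    """Return b with a*b == 1 (mod 2**w); only odd a have such an inverse."""
    m = 1 << w
    for b in range(1, m, 2):           # the inverse of an odd number is odd
        if (a * b) % m == 1:
            return b
    raise ValueError("no inverse: a must be odd")


inv3 = inverse_mod_pow2(3, 3)          # 3 * 3 = 9 = 1 (mod 8)
neg4, neg2 = (-4) % 8, (-2) % 8        # additive inverses mod 8


def solve_x(y, z):
    """Solve 3x + 4y + 2z = 0 (mod 8) for x: x = 3^-1 * (-4y - 2z)."""
    return (inv3 * (neg4 * y + neg2 * z)) % 8


ok = all(solve_x(y, z) == (4 * y + 2 * z) % 8
         and (3 * solve_x(y, z) + 4 * y + 2 * z) % 8 == 0
         for y in range(8) for z in range(8))
print(inv3, neg4, neg2, ok)
```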

Solution Method (cont.)

- Substitutions

$2x + 2y + 2 = 0 \bmod 8$

$2x + 4y + 2z = 0 \bmod 8$

$x = 4y + 2z \bmod 8$

$2(4y + 2z) + 2y + 2 = 0 \bmod 8$

$2(4y + 2z) + 4y + 2z = 0 \bmod 8$

$2y + 4z + 2 = 0 \bmod 8$

$4y + 6z = 0 \bmod 8$

What if All Coefficients Even?

- Result of Substitutions
- Even numbers do not have multiplicative inverses mod 8
- Observation
- Can divide through and reduce modulus

$2y + 4z + 2 = 0 \bmod 8$

$4y + 6z = 0 \bmod 8$

$y + 2z + 1 = 0 \bmod 4$

$2y + 3z = 0 \bmod 4$

$y = 2z + 3 \bmod 4$

$z = 2 \bmod 4$

$y = 3 \bmod 4$
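That dividing through by 2 and halving the modulus preserves the solution set can be confirmed by enumeration over the 3-bit values of y and z. This is a check of the worked example only, not a general solver.

```python
vals = range(8)

# Solutions of the two equations with all-even coefficients, mod 8.
before = {(y, z) for y in vals for z in vals
          if (2 * y + 4 * z + 2) % 8 == 0 and (4 * y + 6 * z) % 8 == 0}

# Solutions after dividing through by 2 and reducing the modulus to 4.
after = {(y, z) for y in vals for z in vals
         if (y + 2 * z + 1) % 4 == 0 and (2 * y + 3 * z) % 4 == 0}

# Solved form: y = 3 mod 4, z = 2 mod 4.
solved = {(y, z) for y in vals for z in vals if y % 4 == 3 and z % 4 == 2}

print(sorted(before))
```

All three sets coincide, and the top bits of y and z stay unconstrained because the reduced modulus only fixes the low two bits.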

General Solutions

- Original variables: 3-bit unsigned integers
- x = [x2 x1 x0], y = [y2 y1 y0], z = [z2 z1 z0]
- Solutions
- Constrained variables
- y = [y2 1 1], z = [z2 1 0]
- Back Substitution
- Constrained variables
- x = [0 0 0]

$y = 3 \bmod 4$

$z = 2 \bmod 4$

$x = 4y + 2z \bmod 8$

$x = 0 \bmod 8$

Linear Equation Solutions

- Equations

$3x + 4y + 2z = 0 \bmod 8$

$2x + 2y + 2 = 0 \bmod 8$

$2x + 4y + 2z = 0 \bmod 8$

- Encoding All Possible Solutions
- x = [0 0 0], y = [y2 1 1], z = [z2 1 0]
- y2, z2 arbitrary Boolean variables
- 4 possible solutions (out of original 512)
- General Principle
- Form of LU decomposition
- Polynomial time algorithm
- Boolean variables in solution to express set of solutions
- Only works when have conjunction of linear constraints
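The claimed solution set can be cross-checked against a brute-force search over all 512 assignments. The sketch below enumerates the original system and compares it with the encoding in which y2 and z2 are the only free Boolean variables.

```python
from itertools import product

# All (x, y, z) over 3-bit unsigned values satisfying the three equations.
brute = {(x, y, z) for x, y, z in product(range(8), repeat=3)
         if (3 * x + 4 * y + 2 * z) % 8 == 0
         and (2 * x + 2 * y + 2) % 8 == 0
         and (2 * x + 4 * y + 2 * z) % 8 == 0}

# Encoded form: x = [0 0 0], y = [y2 1 1], z = [z2 1 0].
encoded = {(0, 4 * y2 + 3, 4 * z2 + 2)
           for y2, z2 in product((0, 1), repeat=2)}

print(sorted(brute))
```

The two sets are identical, so the two Boolean variables capture every solution without enumerating the full space.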

Layered Solver

- Bruttomesso, et al., CAV '07
- Part of MathSAT project
- DPLL(T) Framework
- SAT solver coupled with solver for mathematical theory T
- BV theory solver works with conjunctions of constraints

DPLL(T) Formula Structure

[Figure: formula $\varphi$ as Boolean structure over atoms]

- Atoms
- Predicates applied to bit-vector expressions
- Boolean Variables

DPLL(T) Operation

- Actions
- DPLL engine satisfies Boolean portion
- Theory solver determines whether resulting conjunction of atoms satisfiable

[Figure: Boolean structure over atoms x + 2z >= 1, x % 26 = v, w & 0xFFFF = x, x = y, and x << 1 > y]

- Solver provides information to DPLL engine to aid search
- Nonchronological backtracking
- Conflict clause generation
- Successful approach for other decision procedures

MathSAT Layers

- Uses increasingly detailed solver layers
- Only progress if can't find conflict using more abstract rules
- Layers
- Equality with uninterpreted functions
- Treats all bit-level functions and operators as uninterpreted
- Simple handling of concatenations, extractions, and transitivity
- Full solver using linear arithmetic + SAT

Summary: Modeling Levels

- Bits
- Limited ability to scale
- Hard to apply functional abstractions
- Words
- Allows abstracting data while precisely representing control
- Overlooks finite word-size effects
- Bit Vectors
- Realistic semantic model for hardware and software
- Captures all details of actual operation
- Detects errors related to overflow and other artifacts of finite representation
- Can apply abstractions found at word-level

Areas of Agreement

- SAT-Based Framework Is Only Logical Choice
- SAT solvers are good and getting better
- Want to Automatically Exploit Abstractions
- Function structure
- Arithmetic properties
- E.g., associativity, commutativity
- Arithmetic reductions
- E.g., LU decomposition
- Base Level Should Be SAT
- Only semantically complete approach

Choices

- Optimize for Special Formula Classes
- E.g., STP optimized for conjunctions of constraints
- Common in software verification and testing
- Iterative Abstraction
- Natural framework for attempting different abstractions
- Having SAT solver in inner loop makes performance tuning difficult
- DPLL(T) Framework
- Theory solver only deals with conjunctions
- May need to invoke SAT solver in inner loop
- Hard to coordinate outer and inner search procedures
- Others?

Observations

- Bit-Vector Modeling Gaining in Popularity
- Recognition of importance
- Benchmarks and competitions
- Just Now Improving on Bit Blasting + SAT
- Lots More Work to be Done