The Benefit of Concurrent Model Checking


Transcript and Presenter's Notes

1
The Benefit of Concurrent Model Checking

Berkeley Verification and Synthesis Research Center (BVSRC), UC Berkeley
Baruch Sterin, A. Mishchenko, N. Een, Robert Brayton
Thanks to NSF, SRC, NSA, and industrial sponsors: IBM, Intel, Synopsys, Mentor, Magma, Altera, Atrenta, Microsemi, Jasper, Oasys, Real Intent, Tabula, Verific

2
Overview

Overview
Model checking engines
Example
Non-concurrent
Hybrid approach
Concurrent verify and refine.
Flow
Example
Why more powerful
Questions and objections addressed
Future work

3
Concurrent Model Checking

Overview
Employ multiple MC engines using hybrid
concurrency on a multi-core server
Benefits
Faster
almost linear speedup
plus does not waste time making a wrong decision.
More powerful
can solve harder problems
Makes sequential approach obsolete
No reason not to use concurrency
even for 1 core
simpler
Concurrency controlled by Python front end.
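
The idea of racing several engines and keeping the first definitive answer can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual front end: the engines here are toy functions that sleep, the names `make_engine` and `race` are hypothetical, and threads with a stop flag stand in for the separate engine processes that a real flow would kill.

```python
import threading
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def make_engine(name, delay, verdict):
    """Build a toy engine that 'works' for `delay` seconds, then returns `verdict`.

    The engine polls `stop` so it can give up once another engine has won.
    """
    def engine(stop):
        deadline = time.monotonic() + delay
        while time.monotonic() < deadline:
            if stop.is_set():
                return name, "undecided"
            time.sleep(0.01)
        return name, verdict
    return engine


def race(engines):
    """Run all engines concurrently and return the first definitive (name, verdict)."""
    stop = threading.Event()
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        pending = {pool.submit(engine, stop) for engine in engines}
        while pending:
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                name, verdict = future.result()
                if verdict in ("SAT", "UNSAT"):
                    stop.set()  # tell the losing engines to stop
                    return name, verdict
    return None, "undecided"


# Toy usage: the fastest engine with a definitive answer wins the race.
winner = race([
    make_engine("BMC", 0.5, "undecided"),
    make_engine("PDR", 0.05, "UNSAT"),
    make_engine("INTRP", 1.0, "UNSAT"),
])
print(winner)
```

Because the slowest engines are stopped as soon as one engine answers, the wall-clock time is that of the best engine for the problem, which is the "does not waste time making a wrong decision" benefit listed above.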

4
Model Checking Engines

1. Random simulation
2. Semi-formal simulation
3. Bounded model checking (BMC) [15]
4. BDD-based reachability [7, 25]
5. Property directed reachability (PDR) [4]
6. Interpolation [14]
7. Synthesis
  rewriting [10]
  retiming [13]
  sequential signal correspondence [26]
    with constraint extraction
  phase abstraction [27]
  temporal decomposition [23]
8. Abstraction [8]
  counterexample-based (CB) [19]
  proof-based (PB) [20, 21]
9. Speculation [23]

Verification engines
  1-3 incomplete
  4-6 complete
Transformation engines
  7 equivalence preserving
  8-9 abstracting

5
Example of non-concurrent MC
Read_file test_lru_consist_miss_slbc.sixth_sense_style_1sif_prop2_fixed2
PIs = 532, POs = 1, FF = 2389, ANDs = 12049
prove
quick_verify (try many engines to see if one can prove)
Simplifying
Number of constraints = 3
Forward retiming, quick_simp, scorr_constr, trm: PIs = 532, POs = 1, FF = 2342, ANDs = 11054
Simplify: PIs = 532, POs = 1, FF = 2335, ANDs = 10607
Phase abstraction: PIs = 283, POs = 2, FF = 1460, ANDs = 8911
quick_verify (try many engines to see if one can prove)
Abstracting
Initial abstraction: PIs = 1624, POs = 2, FF = 119, ANDs = 1716, max depth = 39
Testing with BMC
bmc3 -C 100000 -T 50 -F 78: No CEX found in 51 frames
Latches reduced from 1460 to 119
Simplify: PIs = 1624, POs = 2, FF = 119, ANDs = 1687, max depth = 51
Trimming: PIs = 158, POs = 2, FF = 119, ANDs = 734, max depth = 51
Simplify: PIs = 158, POs = 2, FF = 119, ANDs = 731, max depth = 51
quick_verify (try many engines to see if one can prove)
Speculating
Initial speculation: PIs = 158, POs = 26, FF = 119, ANDs = 578, max depth = 51
Fast interpolation reduced POs to 24
Testing with BMC
bmc3 -C 150000 -T 75: No CEX found in 1999 frames
PIs = 158, POs = 24, FF = 119, ANDs = 578, max depth = 1999
Simplify: PIs = 158, POs = 24, FF = 119, ANDs = 535, max depth = 1999
Trimming: PIs = 86, POs = 24, FF = 119, ANDs = 513, max depth = 1999
Verifying (try many engines to see if one can prove)
Running reach -v -B 1000000 -F 10000 -T 75
BDD reachability aborted
RUNNING interpolation with 20000 conflicts, 50 sec, max 100 frames
'UNSAT'
Elapsed time: 457.87 seconds, total: 458.52 seconds
6

NOTES
The file IE1.aig is first read in and its
statistics are reported as 532 primary inputs, 1
output, 2389 flip-flops, and 12049 AIG nodes.
3 implicit constraints were found, but they were
only mildly useful in simplifying the problem.
Phase abstraction found a cycle of length 2 and
this was useful for simplifying the problem to
1460 FF from 2335 FF. Note that the number of
outputs increased to 2 because the problem was
unrolled 2 time frames.
Abstraction was very successful in reducing the
FF count to 119. This was proved valid out to 39
time frames.
BMC verified that the abstraction produced is
actually valid at least to 51 frames, which gives
us good confidence that the abstraction is valid
for all time.
Trimming reduced the inputs relevant to the
abstraction from 1624 to 158 and simplify reduced
the number of AIG nodes to 731.
Speculate produced a speculative reduced model
(SRM) with 24 new outputs to be proved, and
low-resource interpolation proved 2 of them. The
SRM is simpler and has only 578 AIG nodes. It was
tested with BMC and found valid out to 1999
frames.
Subsequent trimming and simplification reduced
the PIs to 86 and the AIG nodes to 513.
The final verification step first tried BDD
reachability, allowing it 75 sec. and up to 1M
BDD nodes. It could not converge within these
resources, so it was aborted. Interpolation was
then able to prove UNSAT, and hence all 24
outputs are proved.
Although quick_verify was applied between
simplification and abstraction, and between
abstraction and speculation, it was not able to
prove anything, so its output is not shown.
The total time for this proof was 457 sec. run on
a Lenovo X301 laptop.
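
The size reductions described in these notes can be checked with a little arithmetic. The sketch below uses only flip-flop and AIG node counts quoted in the notes; the helper name `reduction` is ours and is not part of any tool.

```python
def reduction(before, after):
    """Fraction of objects removed when going from `before` to `after`."""
    return 1.0 - after / before


# Flip-flop counts after each step, as quoted in the notes.
ff_chain = [
    ("read", 2389),
    ("simplify", 2335),
    ("phase abstraction", 1460),
    ("abstraction", 119),
]

# Print the reduction contributed by each step.
for (prev_name, prev_ff), (name, ff) in zip(ff_chain, ff_chain[1:]):
    print(f"{prev_name} -> {name}: {reduction(prev_ff, ff):.1%} of FFs removed")

# Overall reductions: flip-flops (2389 -> 119) and AIG nodes (12049 -> 513).
ff_overall = reduction(2389, 119)
and_overall = reduction(12049, 513)
print(f"overall: {ff_overall:.1%} of FFs, {and_overall:.1%} of AND nodes removed")
```

Most of the flip-flop reduction comes from the abstraction step, which matches the remark that abstraction was very successful on this example.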

7
Same example with concurrent MC, without PDR
test_lru_consist_miss_slbc.sixth_sense_style_1sif_prop2_fixed2.aig
PIs=532,POs=1,FF=2389,ANDs=12049
Executing super_prove
'INTRP', 'BMC', 'pre_simp'
For_Retime: PIs=532,POs=1,FF=2365,ANDs=11064
Number of constraints = 2, frames = 1
PIs=529,POs=1,FF=2342,ANDs=10611
Simplify: PIs=529,POs=1,FF=2265,ANDs=10068
Trying temporal decomposition - for max 15.0 sec.
No reduction
Trying phase abstraction - Max phase = 2
1, 2
Reparam: PIs 1056 => 264
Simplify with 2 phases: PIs=264,POs=2,FF=1462,ANDs=8319
Method pre_simp ended first in 89 sec.
PIs=264,POs=2,FF=1462,ANDs=8319
Running abstract
'INTRP', 'BMC3', 'initial_abstract'
Method initial_abstract ended first in 106 sec.
Initial abstraction: PIs=1621,POs=2,FF=105,ANDs=1427,max depth=42
Iterating abstraction refinement
PIs=1621,POs=2,FF=105,ANDs=1427,max depth=42
Latches reduced from 1462 to 105
Running pre_simp
Reparam: PIs 330 => 328
PIs=328,POs=2,FF=105,ANDs=1184,max depth=42
Min_Retime: PIs=328,POs=2,FF=98,ANDs=1164,max depth=42
Reparam: PIs 328 => 299
Simplify: PIs=299,POs=2,FF=98,ANDs=1064,max depth=42
Reparam: PIs 299 => 266
Trying temporal decomposition - for max 15.0 sec.
No reduction
Reparam: PIs 266 => 261
Running speculate
'INTRP', 'BMC3', 'initial_speculate'
Method initial_speculate ended first in 38 sec.
Initial speculation: PIs=261,POs=38,FF=96,ANDs=833,max depth=42
Iterating speculation refinement
BMC3 -- cex in 0.17 sec. at depth 22 => PIs=261,POs=37,FF=96,ANDs=830,max depth=42
INTRP: UNSAT in 1.4 sec.
Total clock time taken by super_prove = 366.549089 sec.
8
Same example with concurrent MC, but with PDR
test_lru_consist_miss_slbc.sixth_sense_style_1sif_prop2_fixed2
PIs=532,POs=1,FF=2389,ANDs=12049
Executing super_prove
'PDR', 'INTRP', 'BMC', 'PDRm', 'pre_simp'
PIs=532,POs=1,FF=2389,ANDs=12049
For_Retime: PIs=532,POs=1,FF=2365,ANDs=11064
Number of constraints = 2, frames = 1
Reparam: PIs 532 => 529
PIs=529,POs=1,FF=2342,ANDs=10611
Simplify: PIs=529,POs=1,FF=2265,ANDs=10068
PDRm proved UNSAT in 42 sec.
Total clock time taken by super_prove = 42.384159 sec.
9
Hybrid Approach
c_verify
REACH and REACHm optional depending on size
(PIs, FFs)
c_refine
refine
10
c_prove
11
Concurrent Prover Flow - hybrid
[Flow diagram. Recoverable labels: Start; c_prove; two c_refine stages; stage outcomes UNSAT, SAT, undecided, and CEX; actions backup, kill, and pause; a legend marking the stages that run concurrently; final result SAT (c_prove output k).]
End with a definitive answer
12
Multiple output variation on c_refine

If there are more than X outputs:
group the outputs and use poor man's concurrency (PMC)
repeatedly take a group of X outputs at a time
start with a time-out of 2 sec.
after all output groups are done, double the time-out and repeat
if a cex is found, refine and restart at the last time-out value and at the last group of X where the cex was found.
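
The doubling time-out sweep over output groups can be sketched as follows. This is a minimal illustration, not the tool's code: `pmc_sweep` and the `try_group` callback are hypothetical names, the cex/refine restart is left out, and regrouping the still-open outputs on every pass is our assumption.

```python
def pmc_sweep(outputs, group_size, try_group, start_timeout=2.0, max_timeout=64.0):
    """Sweep over groups of outputs, doubling the time-out after each full pass.

    `try_group(group, timeout)` is a hypothetical callback that returns the
    outputs of `group` it managed to prove within `timeout` seconds.
    Returns (proved, unresolved).
    """
    remaining = list(outputs)
    proved = []
    timeout = start_timeout
    while remaining and timeout <= max_timeout:
        still_open = []
        # Take X outputs at a time, all with the current (small) time-out.
        for i in range(0, len(remaining), group_size):
            group = remaining[i:i + group_size]
            solved = set(try_group(group, timeout))
            proved.extend(o for o in group if o in solved)
            still_open.extend(o for o in group if o not in solved)
        remaining = still_open
        timeout *= 2  # all groups done: double the time-out and repeat
    return proved, remaining


# Toy model: each output needs a fixed number of seconds to be proved.
needs = {"o1": 1, "o2": 3, "o3": 7, "o4": 100}


def toy_try_group(group, timeout):
    return [o for o in group if needs[o] <= timeout]


proved, unresolved = pmc_sweep(["o1", "o2", "o3", "o4"], 2, toy_try_group,
                               start_timeout=2.0, max_timeout=8.0)
print(proved, unresolved)
```

Easy outputs are settled in the early, cheap passes, so the long time-outs are spent only on the outputs that actually need them.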

13
Example of Concurrent Flow
l2snfsm_prop11_fixed2
PIs=38,POs=1,FF=372,ANDs=2150
Executing super_prove
Initial: PIs=38,POs=1,FF=372,ANDs=2150
Running Simplification
'PDR', 'INTRP', 'BMC', 'PDRm', 'pre_simp' (these run in parallel)
PIs=38,POs=1,FF=371,ANDs=2150
Fwd_Retime: PIs=38,POs=1,FF=349,ANDs=2056
No constraints found
Simplify: PIs=38,POs=1,FF=336,ANDs=1951
Trying temporal decomposition - for max 15.0 sec.
No reduction
Method pre_simp ended first in 9 sec.
PIs=38,POs=1,FF=336,ANDs=1951
14
Running abstract
Start: PIs=38,POs=1,FF=336,ANDs=1951
'PDR', 'INTRP', 'BMC3', 'PDRm', 'initial_abstract'
Running initial_abstract with bob=10,stable=6,time=100,depth=20
Method initial_abstract ended first in 103 sec.
PIs=38,POs=1,FF=336,ANDs=1951,max depth=11
Initial abstraction: PIs=116,POs=1,FF=258,ANDs=1576,max depth=11
Iterating abstraction refinement
Verify time set to 125
PIs=116,POs=1,FF=258,ANDs=1576,max depth=11
Reparam: PIs 116 => 59 (changes inputs to a smaller number)
... (many iterations here)
SIM -- cex in 41.48 sec. at depth 104 => cex_po = 0
PIs=45,POs=1,FF=329,ANDs=1925,max depth=11
Reparam: PIs 45 => 39
Latches reduced from 336 to 329
simplify: PIs=39,POs=1,FF=329,ANDs=1924,max depth=11
Min_Retime: PIs=39,POs=1,FF=329,ANDs=1914,max depth=11
No constraints found
Simplify: PIs=39,POs=1,FF=328,ANDs=1900,max depth=11
Trying temporal decomposition - for max 15.0 sec.
No reduction
15
Running speculate
'PDR', 'INTRP', 'BMC3', 'PDRm', 'initial_speculate'
Method initial_speculate ended first in 39 sec.
Initial speculation: PIs=39,POs=241,FF=178,ANDs=1335,max depth=11
Iterating speculation refinement
PDRM -- cex in 5.64 sec. at depth 40 => PIs=39,POs=239,FF=178,ANDs=1332,max depth=11
BMC3 -- cex in 1.84 sec. at depth 22 => PIs=39,POs=235,FF=178,ANDs=1326,max depth=22
... (many iterations here)
BMC3 -- cex in 11.91 sec. at depth 25 => PIs=39,POs=204,FF=191,ANDs=1350,max depth=25
BMC3 -- cex in 17.77 sec. at depth 25 => PIs=39,POs=203,FF=195,ANDs=1381,max depth=25
BMC -- cex in 29.44 sec. at depth 25 => PIs=39,POs=204,FF=195,ANDs=1390,max depth=25
BMC -- cex in 37.03 sec. at depth 26 => PIs=39,POs=203,FF=195,ANDs=1389,max depth=25
Find_cex_par turned on (poor man's concurrency turned on here)
Verify time set to 148
Number of POs: 203 => 69
t_poor = 2
PDRM: UNSAT in 0.08 sec.
PDRM: UNSAT in 0.07 sec.
... (many iterations here)
PDR: UNSAT in 0.25 sec.
PDRM: UNSAT in 0.02 sec.
all outputs processed => 69 outputs proved
Number of POs reduced to 0
Total clock time taken by super_prove = 483.238051 sec.
Out[7]: 'UNSAT'
16
Why is concurrent more powerful?

Example of Iterating speculation refinement
verify time set to 50
Initial size: PIs=171,POs=41,FF=255,ANDs=2275
SIMULATION cex 4.268283 sec, frame 911
SIMULATION cex 0.096659 sec, frame 17
BMC cex 6.534474 sec, frame 17
SIMULATION cex 0.726484 sec, frame 1363
SIMULATION cex 5.740357 sec, frame 391
BMC cex 9.506526 sec, frame 17
SIMULATION cex 6.436064 sec, frame 984
SIMULATION cex 1.212145 sec, frame 444
PDRM cex 4.335237 sec, frame 18
BMC cex 9.853237 sec, frame 17
SIMULATION cex 6.335866 sec, frame 81
SIMULATION cex 4.595637 sec, frame 22
SIMULATION cex 4.594522 sec, frame 40
SIMULATION cex 9.182059 sec, frame 58
PDRM cex 5.637425 sec, frame 20
BMC cex 9.861210 sec, frame 17
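
A quick tally of the list above shows why no single engine would have done as well. The sketch copies the sixteen entries from the slide and assumes each line names the engine that found the counterexample first in that refinement iteration.

```python
from collections import Counter

# (engine, seconds, frame) for each refinement iteration, copied from the slide.
cex_log = [
    ("SIMULATION", 4.268283, 911),
    ("SIMULATION", 0.096659, 17),
    ("BMC", 6.534474, 17),
    ("SIMULATION", 0.726484, 1363),
    ("SIMULATION", 5.740357, 391),
    ("BMC", 9.506526, 17),
    ("SIMULATION", 6.436064, 984),
    ("SIMULATION", 1.212145, 444),
    ("PDRM", 4.335237, 18),
    ("BMC", 9.853237, 17),
    ("SIMULATION", 6.335866, 81),
    ("SIMULATION", 4.595637, 22),
    ("SIMULATION", 4.594522, 40),
    ("SIMULATION", 9.182059, 58),
    ("PDRM", 5.637425, 20),
    ("BMC", 9.861210, 17),
]

# How many iterations each engine won.
wins = Counter(engine for engine, _, _ in cex_log)

# Slowest counterexample, to compare with the 50 sec. verify time.
slowest = max(seconds for _, seconds, _ in cex_log)

for engine, count in wins.most_common():
    print(f"{engine}: first in {count} of {len(cex_log)} iterations")
print(f"slowest cex: {slowest:.2f} sec.")
```

Three different engines win at least twice, and every counterexample arrives well inside the 50 sec. verify time, so a flow committed to any one engine would have stalled on some iterations.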

17
Why is concurrent more powerful?
[Figure: a chain of refine steps, each triggered by a cex, leading from the initial abstraction/speculation to the final abstraction/speculation.]
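
The chain from the initial to the final abstraction/speculation is a refinement loop: look for a cex, refine on it, and repeat until no cex is found. A minimal sketch, assuming hypothetical `find_cex` and `refine` callbacks and a toy model in which an abstraction is just the set of latches kept; checking whether a cex is real on the original design is left out.

```python
def refine_loop(initial, find_cex, refine, max_iters=100):
    """Refine `initial` until `find_cex` reports no counterexample.

    `find_cex(model)` returns a counterexample or None (no cex found);
    `refine(model, cex)` returns a more precise model.
    Returns (final_model, number_of_refinements).
    """
    model = initial
    for steps in range(max_iters):
        cex = find_cex(model)
        if cex is None:
            return model, steps
        model = refine(model, cex)
    raise RuntimeError("no stable abstraction within max_iters")


# Toy model: the property needs latches 3, 5 and 8; a cex names a missing one.
needed = frozenset({3, 5, 8})


def toy_find_cex(kept):
    return min(needed - kept, default=None)


def toy_refine(kept, cex):
    return kept | {cex}


final, refinements = refine_loop(frozenset({1}), toy_find_cex, toy_refine)
print(sorted(final), refinements)
```

In the concurrent flow, `find_cex` is where several engines race, so every step of the chain is taken at the speed of whichever engine is best for that particular step.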
18
Hard examples - academic
Hard HWMCC10 Examples
Name           Prim. inputs  Flip flops  And nodes  Result  Time (sec.)
bobsmhdlc0     61            291         1647       Unsat   434
bobsmhdlc10    61            290         1628       Unsat   450
bobsmhdlc20    61            289         1612       Unsat   1002
bobsmhdlc30    61            300         1574       Unsat   1245
Pdtrod6x8p21   9             84          4318       Unsat   1224
Pdtpmsudc122   16            36          553        Unsat   48
Bobpcihm0      304           1422        9627       none    -
Bobsminiuart0  16            114         571        none    -
Bobsmcodic0    34            1850        18762      none    -
Nusmvqueue1    82            84          2376       none    -
Pdtpmsudc161   20            48          741        none    -
Notes: 0 = not solved by anyone; 1 = solved only by pdtrav; 2 = solved only by pdtrav and ABC
19
Hard examples - Industrial
A subset of the IBM benchmarks, not solved by SixthSense using its default Expert System flow in two hours.
Name          Primary inputs  Flip flops  And nodes  Result  Time (sec.)
bypass33      856             781         11945      Unsat   84
GCT_38        266             607         14308      Unsat   188
pmu_wr_11     74              1072        7155       Unsat   875
tp_p_w_0      35              208         1228       Unsat   601
KML_M_21      155             3795        20098      Unsat   353
test_hit_4    1570            3107        16701      Unsat   153
two_back62    144             1660        13411      Sat     173
bypass_28_0   156             68          3504       Unsat   9
MCS_MCS_13    247             2654        9985       Unsat   30
sc_sc_0       249             5609        31029      none    -
DA_DA_11      168             429         4771       Unsat   37
p3_d_n_0      17              197         1355       Sat     180
pclem_0       77              1564        9460       Unsat   193
assert_p_7_0  207             157         3549       Unsat   396
MCA_MCA_0     131             1718        6615       Unsat   24
MCS_rand5     144             2707        10239      Unsat   441
mcx_z_10      4               2269        9974       none    -
sc_ver2_0     19              959         3274       Sat     433
symm_0        34              815         4101       Sat     56
Erat_0        86              396         3016       Unsat   720
Had multiple outputs; all but the first were folded in as constraints.
At the time, the IBM SixthSense program did not have a PDR engine, so we eliminated those problems that were made easier because of PDR in our code.
20
Multiple output variation on c_refine

How long does it take?
Let O = number of POs, E = number of MC engines used concurrently, C = number of cores, T = final time-out, X = number of outputs grouped together
Final sweep (with no cexs and assuming no memory conflicts)
with full concurrency: time = T(OE)/C
with grouping and full concurrency: time = T(O/X)(XE)/C = T(OE)/C
with grouping and PMC: time is about 2T(O/X)(XE)/C = 2T(OE)/C, because the doubled time-outs 2, 4, ..., T sum to just under 2T
Why not do full concurrency and no grouping?
Grouping is done to lessen memory conflicts.
at most XE processes are concurrent on the server
choose X so that there is little memory conflict (why not choose X = C/E?)
PMC is done to find cexs early when grouping.
easy cexs across all outputs are found early
When cexs are found (some heuristics)
refine and start PMC at the last time-out value (instead of 2 sec.)
heuristic: the next cex is expected to take at least that long to find
first try the last set of X where a cex was found.
heuristic: the last group where a cex was found is expected to be the most likely to yield the next cex.

Number of concurrent engines running per core
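
The three running-time estimates on this slide can be compared numerically. In the sketch below, O is the number of POs, E the number of engines, C the number of cores, T the final time-out, and X the group size; the function names and the sample values are ours, and the model assumes, as the slide does for the final sweep, that every engine runs to its time-out and that there are no memory conflicts.

```python
def full_concurrency_time(T, O, E, C):
    """O*E jobs of T seconds each, spread evenly over C cores."""
    return T * O * E / C


def grouped_time(T, O, E, C, X):
    """O/X groups, each running X*E jobs of T seconds on C cores."""
    return T * (O / X) * (X * E) / C


def pmc_time(T, O, E, C, X, start=2.0):
    """Grouped sweeps repeated with time-outs start, 2*start, ... up to T."""
    total, timeout = 0.0, start
    while timeout <= T:
        total += grouped_time(timeout, O, E, C, X)
        timeout *= 2
    return total


# Sample values (hypothetical): 200 outputs, 4 engines, 8 cores, groups of 2.
T, O, E, C, X = 64.0, 200, 4, 8, 2
full = full_concurrency_time(T, O, E, C)
grouped = grouped_time(T, O, E, C, X)
pmc = pmc_time(T, O, E, C, X)
print(full, grouped, pmc, pmc / full)
```

Grouping by itself costs nothing in this model, and the doubling schedule costs at most a factor of two, which is the price paid for finding easy cexs early.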
21
Questions addressed

Memory Use and Conflicts?
experiments run on a server with 2 processors (4 cores each), 24 GB, 64K L1, 256K L2, 4 MB
grouping is designed to alleviate severe memory conflicts.
did not observe slowdown due to memory conflicts, but more experiments need to be done
Run-time speedup?
linear up to the number of cores
concurrency avoids wasting time on wrong decisions
solves problems not solved by the sequential flow
Wasting processor power by trying many things but throwing away all but one?
wastage if some cores are sitting idle
the alternative is to run the wrong engine for a longer time
Use the SOTA algorithm?
too many MC algorithms
an expert system has been proposed that learns which algorithms are best for a given design project (Z. Nevo - IBM)

22
Future Work

More and better engines
Improved BDD reachability engine (we hope)
We have 4
We had a quite weak one in '08 (HWMCC08)
We now have two reasonably good ones.
We may have a much better one in a few months.
Improved circuit-based SAT solver
Currently used in signal correspondence to
simplify larger circuits
Faster but sometimes limited quality
Will be improved to see if it can compete with
MiniSat 1.14c
New specialized techniques for SEC
More use of concurrency
e.g. exchange information between engines.
will not work on parallelizing individual engines

23
To Learn More

Recent papers: http://www.eecs.berkeley.edu/~alanmi/publications
IWLS
N. Een, A. Mishchenko, and R. Brayton, "Efficient implementation of property directed reachability". IWLS'11.
B. Sterin, N. Een, A. Mishchenko, and R. Brayton, "The Benefit of Concurrency in Model Checking". IWLS'11.
S. Ray and R. Brayton, "Proving Stabilization Using Liveness-to-Safety Conversion". IWLS'11.
Other
R. Brayton and A. Mishchenko, "ABC: An academic industrial-strength verification tool". Proc. CAV'10, LNCS 6174, pp. 24-40.
N. Een, A. Mishchenko, and N. Amla, "A single-instance incremental SAT formulation of proof- and counterexample-based abstraction". Proc. FMCAD'10.
H. Savoj, D. Berthelot, A. Mishchenko, and R. Brayton, "Combinational techniques for sequential equivalence checking". Proc. FMCAD'10, pp. 158-162.
Send email
alanmi_at_eecs.berkeley.edu
brayton_at_eecs.berkeley.edu
een_at_eecs.berkeley.edu
Visit BVSRC webpage: www.bvsrc.org

26
Why is concurrent more powerful?

Iterating speculation refinement
verify time set to 50
SIMULATION cex 4.26 sec, frame 911 => PIs=171,POs=41,FF=255,ANDs=2275,max depth=28
SIMULATION cex 0.09 sec, frame 17 => PIs=171,POs=43,FF=255,ANDs=2280,max depth=28
BMC cex 9.50 sec, frame 17 => PIs=171,POs=43,FF=255,ANDs=2282,max depth=28
SIMULATION cex 6.43 sec, frame 984 => PIs=171,POs=47,FF=255,ANDs=2292,max depth=28
SIMULATION cex 1.21 sec, frame 444 => PIs=171,POs=49,FF=255,ANDs=2302,max depth=28
PDRM cex 4.33 sec, frame 18 => PIs=171,POs=48,FF=255,ANDs=2304,max depth=28
BMC cex 9.85 sec, frame 17 => PIs=171,POs=55,FF=256,ANDs=2346,max depth=28
SIMULATION cex 6.33 sec, frame 81 => PIs=171,POs=55,FF=256,ANDs=2347,max depth=28
SIMULATION cex 4.59 sec, frame 22 => PIs=171,POs=55,FF=257,ANDs=2366,max depth=28
SIMULATION cex 4.59 sec, frame 40 => PIs=171,POs=54,FF=257,ANDs=2363,max depth=28
BMC cex 6.96 sec, frame 17 => PIs=171,POs=51,FF=258,ANDs=2377,max depth=28
PDRM cex 5.84 sec, frame 22 => PIs=171,POs=51,FF=259,ANDs=2385,max depth=28
BMC cex 7.11 sec, frame 17 => PIs=171,POs=47,FF=259,ANDs=2377,max depth=28
PDRM cex 3.58 sec, frame 19 => PIs=171,POs=46,FF=259,ANDs=2374,max depth=28
PDRM cex 6.04 sec, frame 19 => PIs=171,POs=45,FF=259,ANDs=2371,max depth=28
PDRM cex 8.89 sec, frame 20 => PIs=171,POs=44,FF=259,ANDs=2372,max depth=28
BMC cex 7.50 sec, frame 17 => PIs=171,POs=41,FF=260,ANDs=2366,max depth=28
