Title: Scaling Formal Methods toward Hierarchical Protocols in Shared Memory Processors: Annual Review Presentation
Scaling Formal Methods toward Hierarchical Protocols in Shared Memory Processors
Annual Review Presentation, April 2007
Intel SRC Customization Award 2005-TJ-1318
- Presenters
- Ganesh Gopalakrishnan
- Xiaofang Chen
- School of Computing, University of Utah
- Salt Lake City, UT
Project Personnel
- IBM Mentor: Dr. Steven M. German
- Intel Mentor: Dr. Ching-Tsun Chou
- Primary Student
  - Xiaofang Chen
  - Summer internship planned at IBM T.J. Watson (6/07), where the research discussed here in Project 2 will be furthered
- Other SRC Student
  - Robert Palmer (work involving TLA modeling of communication libraries)
  - Defense on May 10; expected to join Intel (6/07)
- 3 other PhD students, 1 MS student, and 2 undergraduates in FV
  - All working on FV of threading / message-passing software
Multicores are the future! Their caches are visibly central
- > 80% of chips shipped will be multi-core
(Photo courtesy of Intel Corporation.)
...and the number of possible organizations of multiprocessor caches is mind-boggling (e.g., imagine 80 cores and deeper hierarchies).
[Figure: cache organizations varying along the Shared / Private and Inclusive / Exclusive axes]
Protocol design happens in the thick of things (many interfaces, and constraints on performance, power, and testability).
[Figure from "High-throughput coherence control and hardware messaging in Everest," by Nanda et al., IBM J. Res. & Dev. 45(2), 2001.]
Future Coherence Protocols
- "Cache coherence protocols that are tuned for the contexts in which they are operating can significantly increase performance and reduce power consumption" (Liqun Cheng)
- Producer-consumer sharing pattern-aware protocol [Cheng, HPCA'07]
  - 21% speedup and 15% reduction in network traffic
- Interconnect-aware coherence protocols [Cheng, ISCA'06]
  - Heterogeneous interconnect
  - Improve performance AND reduce power
  - 11% speedup and 22% wire power savings
- Bottom line: protocols are going to get more complex!
Designers have poor conceptual tools (e.g., informal MSC drawings). Better notations and tools are needed.
[Figure: informal MSC among GDir, LDir, L1-1 and L1-2, with messages Req_S, Fwd_Req, Broadcast, Swap, NAck and Gnt_S, and state annotations (S), (I), (S L1-1), (S L1-2)]
Design Abstractions in More Modern Flows
- An interleaving protocol model (Murphi or TLA are the languages of choice here)
  - FV here eliminates concurrency bugs
- A detailed HDL model
  - FV here eliminates implementation bugs; however,
  - Correspondence with the interleaving model is lost
- More detailed models are needed anyhow
  - Interleaving models are very abstract
  - Monolithic verification of HDL code does not scale
- Design optimizations are captured at the HDL level
  - The interleaving model becomes increasingly obsolete
- An integrated flow is needed
  - Interleaving -> High-level HW view -> Final HDL
Related Work in Formal HW Design
- Bluespec
  - High-level design is expressed using atomic transactions
  - Synthesizes high-level designs into hardware implementations
  - Automatically schedules high-level design steps in hardware
  - May not meet performance goals
- Malik et al., "Formal Architecture and Microarchitecture Modeling for Verification"
  - Meant for instruction-set processors
- Needed: a formal theory of refinement from interleaving models to high-level HW models
Our Goals
- Develop a methodology to verify realistic interleaving models
  - Useful benchmarks for others
  - Our particular contributions are toward hierarchical protocols
  - Largely inspired by Chou et al.'s work (FMCAD'04)
  - Xiaofang Chen's PhD is wrapping up a nice story here!
- Develop a language and formal theory for higher-level HW specification refinement
  - Ideas largely due to German and Janssen
  - Xiaofang Chen's PhD work is taking these ideas from initial proposal all the way to practical realization!
A Summary of Our Work over Y1-2
- Three progressively better approaches to verify hierarchical cache coherence protocols at the interleaving level
  - A/G method of complementary abstractions (FMCAD'06)
  - Extensions to non-inclusive hierarchies (TR 06-014)
  - Abstracting each level separately (to be submitted)
  - Error-trace checking (to be submitted)
- A theory of transaction-based design and verification (writeup finished; initial experiments finished)
- Modular verification of transactions (writeup in progress; initial experiments finished)
- We number these projects 1.1, 1.2, 1.3, 1.4, 2, and 3
Projects 1.1-1.4 Timeline
- 1.1 FMCAD'06 results
- 1.2 Another hierarchical benchmark (non-inclusive)
- 1.3 Abstraction per level (more scalable)
- 1.4 Automatic recognition of spurious/real bugs
Projects 1.1-1.4: Hierarchical Protocols
[Figure: a home cluster and two remote clusters, each with two L1 caches, an L2 cache with local directory, and a RAC; the clusters connect to a global directory and main memory]
Abstracted Protocol 1
[Figure: the home cluster keeps its two L1 caches; Remote Clusters 1 and 2 appear only as their L2 cache/local directory and RAC; all connect to the global directory and main memory]
Abstracted Protocol 2
[Figure: Remote Cluster 1 keeps its two L1 caches; the home cluster and Remote Cluster 2 appear only as their L2 cache/local directory and RAC; all connect to the global directory and main memory]
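Non-Circular Assume/Guarantee
- We can't verify this directly due to state explosion:
  - $h \parallel r_1 \parallel r_2 \models \mathit{Coh}$
- Instead (lowercase = concrete cluster, uppercase = its abstraction):
  - Check-1: $h \parallel R_1 \parallel R_2 \models \mathit{Coh}_1 \wedge \mathit{Guarant}_1$
  - Check-2: $H \parallel r_1 \parallel R_2 \models \mathit{Coh}_2 \wedge \mathit{Guarant}_2$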
Project 1.2: We applied the non-circular A/G method to a non-inclusive hierarchical protocol
- Protocol features
  - Broadcast channels
  - Non-imprecise local directory
- Verification challenges
  - A/G cannot infer the local directory state from intra-cluster information alone
  - Coherence may involve multiple L1 caches
Verifying Non-Inclusive Protocols
- Inferring L2.State = Excl from
  - Outside the cluster
  - Inside the cluster
- Use history variables to turn non-inclusive protocols into inclusive ones
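A toy Python sketch of the general history-variable idea, assuming a hypothetical single-cluster model (`Cluster`, `grant_excl`, `l2_evict`, `hist_excl` are illustrative names, not taken from the authors' Murphi models). Because no rule ever reads `hist_excl`, adding it cannot change the protocol's behavior; yet once the L1 caches are projected away, it still reveals that the cluster holds the line exclusively even after a non-inclusive L2 eviction, which is the inclusive view the check needs.

```python
from dataclasses import dataclass, replace
from typing import Tuple

@dataclass(frozen=True)
class Cluster:
    l1: Tuple[str, ...]       # per-L1 line state: "I" or "E" (toy model, no S state)
    l2: str                   # L2 line state; non-inclusive, so it may be "I" while an L1 holds "E"
    hist_excl: bool = False   # hypothetical history variable: written by rules, never read by them

def grant_excl(c: Cluster, i: int) -> Cluster:
    """Grant exclusive ownership to L1 cache i; the history variable records it."""
    l1 = tuple("E" if j == i else "I" for j in range(len(c.l1)))
    return replace(c, l1=l1, l2="E", hist_excl=True)

def l2_evict(c: Cluster) -> Cluster:
    """Non-inclusive eviction: L2 drops the line, L1 copies survive, history is untouched."""
    return replace(c, l2="I")

def invalidate_cluster(c: Cluster) -> Cluster:
    """Invalidate every copy in the cluster; the history variable is cleared to match."""
    return replace(c, l1=tuple("I" for _ in c.l1), l2="I", hist_excl=False)

def abstract_view(c: Cluster) -> Tuple[str, bool]:
    """Projection that hides the L1 caches but keeps L2 and the history variable."""
    return (c.l2, c.hist_excl)

def history_invariant(c: Cluster) -> bool:
    """The history variable agrees with the concrete L1 contents."""
    return c.hist_excl == any(s == "E" for s in c.l1)

c = l2_evict(grant_excl(Cluster(l1=("I", "I"), l2="I"), 0))
print(abstract_view(c), history_invariant(c))
```

Here the abstract view shows L2 as `I` while the history variable still reports an exclusive copy inside the cluster.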
Experimental Results
Protocol  | # of States      | Mem (GB) | Model Checked
Hierarchy | > 1,521,900,000  | 20       | No
Abs-1     | 234,478,105      | 20       | Yes
Abs-2     | 283,124,383      | 20       | Yes
Reduction is over 65%.
Project 1.3: We then tried splitting the hierarchy per level when applying non-circular A/G
[Figure: the hierarchy split per level into three abstracted protocols, ABS 1, ABS 2 and ABS 3, over the L2 caches/local directories, RACs, global directory and main memory]
A Sample Scenario
[Figure: message exchange among the Home Cluster and Remote Clusters 1 and 2, with line states Excl and Invld: 1. Req_Ex, 2. Fwd Req_Ex, 3. Fwd Req_Ex, 4. Fwd Req_Ex, 5. Grant, 6. Grant]
Map to Abstracted Protocols
[Figure: the same scenario mapped onto the abstracted protocols for Remote Clusters 1 and 2, with line states Invld and Excl: 1. Req_Ex, 2. Fwd Req_Ex, 3. Fwd Req_Ex, 4. Fwd Req_Ex, 5. Grant, 6. Grant]
Experimental Results
Protocol  | # of States    | Exec Time (sec) | Mem (GB) | Model Checked
Hierarchy | > 438,120,000  | > 125,799       | 18       | No
Inter     | 1,500,621      | 269             | 2        | Yes
Intra-1   | 564,878        | 48              | 2        | Yes
Intra-2   | 188,842        | 18              | 2        | Yes
Reduction is over 95%!
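The two reduction figures can be reproduced with a short calculation, assuming (this is our reading, not stated on the slides) that each compares the summed state counts of all abstracted protocols against the state count of the full hierarchy. Since the full-hierarchy runs did not finish, those counts are lower bounds, so the true reductions are at least these values.

```python
from typing import List

def reduction_pct(full_states: int, abstract_states: List[int]) -> float:
    """Percent reduction of the summed abstract state counts versus the full protocol."""
    return 100.0 * (1 - sum(abstract_states) / full_states)

# Project 1.2 (Abs-1 + Abs-2) and Project 1.3 (Inter + Intra-1 + Intra-2).
nonincl = reduction_pct(1_521_900_000, [234_478_105, 283_124_383])
per_level = reduction_pct(438_120_000, [1_500_621, 564_878, 188_842])
print(f"{nonincl:.1f}% {per_level:.1f}%")
```

Under this reading the two results come to roughly 66% and 99.5%, consistent with "over 65%" and "over 95%".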
Project 1.4: Automatic Recognition of Spurious / Real Bugs in These Approaches
- Problem statement
  - Given an error trace of an ABS protocol,
  - is it a real bug of the original protocol?
- Solution
  - In the original protocol, use BFS guided by the error trace to drive model checking toward a matching trace
  - This works because our abstraction is just a projection
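Basic Idea of Automatic Recognition
[Figure: the error trace of the abstracted protocol, (v1=0, v2=0) -> (v1=1, v2=2) -> (v1=6, v2=8), guides a directed BFS of the original protocol from (v1=0, v2=0, v3=0); its successors (v1=1, v2=2, v3=1), (v1=3, v2=1, v3=0) and (v1=0, v2=0, v3=3) are each marked keep or drop]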
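A minimal Python sketch of trace-directed BFS, assuming abstraction is pure projection onto a subset of variables (per "our abstraction is just projection") and an explicit successor function; `project`, `match_error_trace` and the toy graph are hypothetical. A successor is kept if its projection advances to the next abstract state or repeats the current one (a step on hidden variables only); otherwise it is dropped. Finding a full match means the bug is real; exhausting the queue means it is spurious.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List, Optional, Sequence, Tuple

State = Tuple[int, ...]
Node = Tuple[State, int]   # (concrete state, matched position in the abstract trace)

def project(state: State, observed: Sequence[int]) -> State:
    """Keep only the variables visible in the abstracted protocol."""
    return tuple(state[i] for i in observed)

def match_error_trace(init: State, successors: Callable[[State], Iterable[State]],
                      abs_trace: Sequence[State],
                      observed: Sequence[int]) -> Optional[List[State]]:
    """Return a concrete path whose projection follows abs_trace, or None if spurious."""
    if project(init, observed) != abs_trace[0]:
        return None
    last = len(abs_trace) - 1
    start: Node = (init, 0)
    parent: Dict[Node, Optional[Node]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        state, k = node
        if k == last:                      # whole abstract trace matched: real bug
            path: List[State] = []
            cur: Optional[Node] = node
            while cur is not None:
                path.append(cur[0])
                cur = parent[cur]
            return path[::-1]
        for nxt in successors(state):
            p = project(nxt, observed)
            if p == abs_trace[k + 1]:
                child = (nxt, k + 1)       # keep: advances along the abstract trace
            elif p == abs_trace[k]:
                child = (nxt, k)           # keep: hidden-variable (stuttering) step
            else:
                continue                   # drop: projection leaves the abstract trace
            if child not in parent:
                parent[child] = node
                queue.append(child)
    return None                            # no concrete match: spurious error

# Hypothetical toy graph over (v1, v2, v3), loosely reusing the figure's values.
TOY = {
    (0, 0, 0): [(1, 2, 1), (3, 1, 0), (0, 0, 3)],
    (1, 2, 1): [(2, 2, 1)],
    (0, 0, 3): [(1, 2, 4)],
    (1, 2, 4): [(1, 2, 5)],
    (1, 2, 5): [(6, 8, 5)],
}

def succ(s: State) -> List[State]:
    return TOY.get(s, [])

print(match_error_trace((0, 0, 0), succ, [(0, 0), (1, 2), (6, 8)], (0, 1)))
```

The BFS frontier holds (state, trace position) pairs rather than bare states, so the same concrete state can be revisited at a later point in the trace.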
Y3 Plans for Project 1
- Considerable experience gained
  - Three large benchmark protocols (each is 3000 lines of Murphi code), available on the web
  - Have reduced the verification complexity of hierarchical protocols by 90%
  - Can identify spurious errors automatically
- All finite-state
  - Not parameterized
  - No plans for parameterized verification
- Y3 plans: build a tool to support this methodology
Summary of Projects 2 and 3
- Three progressively better approaches to verify hierarchical cache coherence protocols at the interleaving level
  - A/G method of complementary abstractions (FMCAD'06)
  - Extensions to deeper and non-inclusive hierarchies (TR 06-014)
  - Latest method, which abstracts each level separately (to be submitted)
  - Error-trace checking (to be submitted)
- A theory of transaction-based design and verification (writeup finished)
- Modular verification of transactions (writeup in progress)
Transaction-Level HW Modeling
- The problem addressed: bridge the gap between high-level specifications and RTL implementations
  - Global properties cannot be formally verified at the RTL level!
  - Specifications can be verified, but do they correctly represent the implementations?
Driving Design Benchmark (due to German and Geert Janssen)
What changes when moving from a spec to an implementation?
- Atomicity
- Concurrency
- Granularity in modeling
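[Figure: a single spec transition 1 between home and client, versus implementation transitions 1.1, 1.2 and 1.3 between home and client via a router and buffer]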
General Mappings between High-Level Transitions and the Transactions that Help Implement Them
- High-level transition 1 takes some non-zero unit of time (conceptually)
- Low-level transitions 1.1, 1.2 and 1.3 help realize transition 1
- Each low-level transition takes one clock cycle
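High-Level and Low-Level Computations
[Figure: a high-level computation of transitions 1, 2, 3, and the corresponding low-level computation built from transitions 1.1-1.3, 2.1-2.2 and 3.1-3.3]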
Specification of High and Low Levels
- High-level transition 1: written in Murphi as a guard -> action rule
- Low-level transitions 1.1, 1.2, 1.3: written in HMurphi as multiple guard -> action rules enclosed in Begin Transaction / End Transaction
  - The guards decide when each low-level transition can fire
  - The maximal set of low-level transitions enabled in any state fires concurrently within each clock tick
Transaction
- A transaction is a set of transitions in Impl that together correspond to one transition in Spec:
    Transaction
      Rule 1
      ...
      Rule n
    Endtransaction
Executions
- Spec: interleaving
  - One enabled transition fires at each step
- Impl: concurrent
  - All enabled transitions fire at each step
[Figure: Spec steps 1, 2, 3 versus Impl steps {1.1, 2.1}, {1.2}, {2.2, 3.1, 3.2}]
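A small Python sketch contrasting the two execution semantics, assuming rules are (name, guard, action) triples over a dictionary state; this is a hypothetical toy, not HMurphi's actual implementation. In the interleaving view each enabled rule yields a separate successor; in the concurrent view every enabled rule fires in the same clock tick, with all guards and actions reading the pre-tick state.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, int]
# (name, guard, action); the action returns only the variables it updates.
Rule = Tuple[str, Callable[[State], bool], Callable[[State], Dict[str, int]]]

def interleaving_successors(s: State, rules: List[Rule]) -> List[Tuple[str, State]]:
    """Spec semantics: one enabled rule fires per step, so each yields its own successor."""
    return [(name, {**s, **action(s)}) for name, guard, action in rules if guard(s)]

def concurrent_step(s: State, rules: List[Rule]) -> Tuple[List[str], State]:
    """Impl semantics: every rule enabled in s fires in the same clock tick.
    Guards and actions all read the old state s; the updates are then merged."""
    fired: List[str] = []
    updates: Dict[str, int] = {}
    for name, guard, action in rules:
        if guard(s):
            upd = action(s)
            clash = upd.keys() & updates.keys()
            if clash:
                raise ValueError(f"write/write conflict on {sorted(clash)}")
            fired.append(name)
            updates.update(upd)
    return fired, {**s, **updates}

# Hypothetical rules named after low-level transitions in the figure.
RULES: List[Rule] = [
    ("1.1", lambda s: s["x"] == 0, lambda s: {"x": 1}),
    ("2.1", lambda s: s["y"] == 0, lambda s: {"y": 1}),
    ("1.2", lambda s: s["x"] == 1, lambda s: {"z": 1}),
]
s0 = {"x": 0, "y": 0, "z": 0}
print(interleaving_successors(s0, RULES))   # alternative Spec-style successors
fired1, s1 = concurrent_step(s0, RULES)     # first Impl clock tick
fired2, s2 = concurrent_step(s1, RULES)     # second Impl clock tick
print(fired1, fired2, s2)
```

With these toy rules the first tick fires 1.1 and 2.1 together and the second fires 1.2, echoing the grouping in the figure. Raising on overlapping writes mirrors why write/write conflicts must be ruled out for concurrent firing to be well defined.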
A Few Notations
- Observable variables $V_H$
  - These are variables used in both Spec and Impl
  - Impl also has additional internal variables
- A variable v is inactive at a state s if all transactions in Impl that can write to v are quiescent at s
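A Formal Notion of Simulation
- For every concurrent execution $l_0, l_1, l_2, \ldots$ of Impl, there exists an interleaving execution $h_0, h_1, h_2, \ldots$ of Spec such that the variables in $V_H \cap \mathit{inactive}(l_i)$ match between $l_i$ and $h_i$
[Figure: Impl states l0, l1, l2 and Spec states h0, h1, h2 aligned at times t0, t1, t2]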
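A minimal Python sketch of the per-state matching condition, assuming the set of Impl transactions that can write each variable (`writers`) and the set of quiescent transactions at $l_i$ (`quiescent`) are given; both inputs and all names are hypothetical, and how they are computed in the actual framework is not shown here.

```python
from typing import Mapping, Set

def inactive_vars(variables: Set[str], writers: Mapping[str, Set[str]],
                  quiescent: Set[str]) -> Set[str]:
    """v is inactive if every Impl transaction that can write v is quiescent."""
    return {v for v in variables if writers.get(v, set()) <= quiescent}

def observably_match(impl: Mapping[str, int], spec: Mapping[str, int],
                     observable: Set[str], writers: Mapping[str, Set[str]],
                     quiescent: Set[str]) -> bool:
    """Compare an Impl state l_i and a Spec state h_i on V_H ∩ inactive(l_i) only."""
    return all(impl[v] == spec[v] for v in inactive_vars(observable, writers, quiescent))

# Hypothetical example: T1 writes `state`; T2 writes `data` and the internal `buf`.
WRITERS = {"state": {"T1"}, "data": {"T1", "T2"}, "buf": {"T2"}}
V_H = {"state", "data"}
impl = {"state": 2, "data": 5, "buf": 7}
spec = {"state": 2, "data": 9}
print(observably_match(impl, spec, V_H, WRITERS, quiescent={"T1"}))        # T2 in flight
print(observably_match(impl, spec, V_H, WRITERS, quiescent={"T1", "T2"}))  # all quiescent
```

While T2 is still in flight, `data` is excluded from the comparison, so the states match; once T2 is quiescent, the mismatch on `data` is exposed.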
Simulation Checks
- The guard of the Spec transition must hold
- Observable variables changed by either Spec or Impl must match
- I is a reachable state where the commit guard is true
[Figure: from state I, the Spec transition produces Spec(I) while the Impl transaction runs from I; the two outcomes are compared]
Model Checking Approaches
- Monolithic
- Cross product construction
- Compositional
- Abstraction
- Assume/Guarantee
Compositional Approach
- Abstraction
  - Change a read into an access of an input variable
  - Self-sourced read
  - Add all transitions that write to the variable
- Assume/Guarantee
  - Require all writes to the variable to guarantee property P
  - Assume P holds on all reads
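A toy Python sketch of the input-variable abstraction and the A/G restriction on top of it, assuming a small integer domain and a hypothetical property P (`is_even`); all names are illustrative. Turning a read into a free input forces the reader to be checked against every value; the assume side narrows that input to values satisfying P, which is sound only if the guarantee side holds for the initial value and every write.

```python
from typing import Callable, Iterable, List

Prop = Callable[[int], bool]

def abstract_read(domain: Iterable[int]) -> List[int]:
    """Abstraction: a read of a variable written by other transactions becomes a
    free input, so the reader must be checked against every value in the domain."""
    return list(domain)

def assumed_read(domain: Iterable[int], p: Prop) -> List[int]:
    """Assume side of A/G: the free input ranges only over values satisfying P."""
    return [d for d in domain if p(d)]

def writes_guarantee(initial: int, written: Iterable[int], p: Prop) -> bool:
    """Guarantee side of A/G: the initial value and every write must satisfy P,
    which is what justifies the assumption on the read side."""
    return p(initial) and all(p(v) for v in written)

def is_even(d: int) -> bool:   # hypothetical property P
    return d % 2 == 0

print(abstract_read(range(4)))           # reader explores 0, 1, 2, 3
print(assumed_read(range(4), is_even))   # reader explores 0, 2 only
print(writes_guarantee(0, [2, 0], is_even), writes_guarantee(0, [3], is_even))
```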
Example of Abstraction
  Transaction 1:
    Transaction
      Rule (v1 = d1) ==> ...
    Endtransaction
  Transaction 2
  ...
  Transaction n
Example of Assume/Guarantee
[Figure: Transaction 1 (Request granted) and Transaction 2 (Update Cache), annotated with the properties State = Excl, Impl.State = Spec.State, and Data = d]
Benchmarks
- High level: from the FMCAD'04 tutorial
- Low level: provided by German and Janssen
- Sizes
  - 1 home node, 1 remote node
Sizes are constrained by the accessible VHDL tools!
Implementations
- Muv: HMurphi -> VHDL translator
  - Written by German
- Mud
  - Static analyzer for possible conflicts / dependencies
- VHDL verifier
  - IBM RuleBase
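A generic Python sketch of a syntactic write/write conflict check, the kind of analysis the W/W rows in the results below refer to. This is not Mud's actual algorithm, which is not described here; rule names and write sets are hypothetical. It flags every pair of rules whose write sets overlap, an over-approximation that ignores whether both guards can hold in the same clock tick.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

def ww_conflicts(write_sets: Dict[str, Set[str]]) -> List[Tuple[str, str, Set[str]]]:
    """Report every pair of rules whose write sets overlap. This is a purely
    syntactic over-approximation: it ignores whether the two guards can ever
    hold in the same clock tick."""
    conflicts = []
    for (a, wa), (b, wb) in combinations(sorted(write_sets.items()), 2):
        shared = wa & wb
        if shared:
            conflicts.append((a, b, shared))
    return conflicts

# Hypothetical write sets for three low-level rules.
print(ww_conflicts({"1.1": {"x"}, "2.1": {"y"}, "2.2": {"x", "z"}}))
```

A more precise analyzer would then check, for each reported pair, whether the two guards can be simultaneously true in a reachable state.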
Preliminary Results
Approach                   | Flip-Flops | Gates | Time (min)
Monolithic                 | 212        | 8574  | 17
Decomposed (W/W conflicts) | 108        | 5763  | 11
Decomposed (closures)      | 89         | 2194  | 3
These results are for a 1-bit datapath, on an Intel Xeon CPU at 3.0 GHz with 2 GB of memory.
When Datapath > 1 bit
- The monolithic approach cannot be checked
  - RuleBase academic license is restricted to 300 flip-flops
- Decomposed approach
  - W/W checks are not affected
Datapath bits | # of F-F | # of Gates
1             | 89       | 2194
2             | 97       | 2380
26            | 289      | 6659
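A quick arithmetic check on the table above, extrapolating linearly from the 1-bit and 2-bit rows (our own reading of the numbers, not a claim from the slides): flip-flops grow by exactly 8 per additional datapath bit up to 26 bits, while the measured gate count at 26 bits stays below the linear extrapolation.

```python
# Rows of the table above: datapath bits -> (flip-flops, gates).
ROWS = {1: (89, 2194), 2: (97, 2380), 26: (289, 6659)}

def extrapolate(col: int, bits: int) -> int:
    """Linear extrapolation of column `col` from the 1-bit and 2-bit rows."""
    y1, y2 = ROWS[1][col], ROWS[2][col]
    return y1 + (y2 - y1) * (bits - 1)

ff_pred, gate_pred = extrapolate(0, 26), extrapolate(1, 26)
print(ff_pred, ROWS[26][0])    # 89 + 8 * 25 vs measured flip-flops
print(gate_pred, ROWS[26][1])  # 2194 + 186 * 25 vs measured gates
```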
Future Work
- Reduce the cost of W/W conflict checking
- Localized reasoning
- Apply to pipelines
- More benchmarks
- Try other VHDL tools
- SixthSense, etc.
Publications, Software, Models
- FMCAD 2006 paper
- Presentation at Intel
- Journal version of hierarchical coherence protocol verification (in preparation)
- TR on the theory of transaction-based specification and verification (in preparation)
- Detailed VHDL-level German protocol developed
- Analysis framework for HMurphi developed
- Preliminary verification experiments using Cadence IFV, IBM RuleBase, and IBM SixthSense
- Xiaofang Chen's summer internship at IBM T.J. Watson Research Center
- Robert's SRC poster
- Techcon 2007 submission
- There will be more publications during 2007-8, following a hiatus due to infrastructure build-up (many delays!)