Scaling Formal Methods toward Hierarchical Protocols in Shared Memory Processors: Annual Review Presentation

1
Scaling Formal Methods toward Hierarchical
Protocols in Shared Memory Processors
Annual Review Presentation, April 2007
Intel SRC Customization Award 2005-TJ-1318
  • Presenters
  • Ganesh Gopalakrishnan
  • Xiaofang Chen
  • School of Computing, University of Utah
  • Salt Lake City, UT

2
Project Personnel
  • IBM Mentor: Dr. Steven M. German
  • Intel Mentor: Dr. Ching-Tsun Chou
  • Primary Student
  • Xiaofang Chen
  • Summer internship planned - IBM T.J. Watson
    (6/07) where the research discussed here in
    Project 2 will be furthered
  • Other SRC Student
  • Robert Palmer (work involving TLA modeling of
    communication libraries)
  • Defense: May 10. Expected to join Intel (6/07)
  • 3 other PhD students, 1 MS student, 2 UGs in FV
  • all working on FV of threading / msg-passing
    software

3
Multicores are the future! Their caches are
visibly central
> 80% of chips shipped will be multi-core
(photo courtesy of Intel Corporation.)
4
and the number of organizations of
multiprocessor caches is mindboggling (e.g.
imagine 80 cores and deeper hierarchies).
Shared / Private
Inclusive / Exclusive
5
Protocol design happens in the thick of things
(many interfaces, constraints of performance,
power, testability).
From "High-throughput coherence control and
hardware messaging in Everest," by Nanda et al.,
IBM J. R&D 45(2), 2001.
6
Future Coherence Protocols
  • Cache coherence protocols that are tuned for the
    contexts in which they are operating can
    significantly increase performance and reduce
    power consumption (Liqun Cheng)
  • Producer-consumer sharing pattern-aware protocol
    (Cheng, HPCA07)
  • 21% speedup and 15% reduction in network traffic
  • Interconnect-aware coherence protocols
    (Cheng, ISCA06)
  • Heterogeneous Interconnect
  • Improve performance AND reduce power
  • 11% speedup and 22% wire power savings
  • Bottom line: protocols are going to get more
    complex!

7
Designers have poor conceptual tools (e.g.,
Informal MSC drawings). Need better notations
and tools.
[Figure: informal message sequence chart among GDir, L1-1, L1-2, and LDir. Messages shown: Swap, Req_S, Broadcast, Fwd_Req, NAck, Gnt_S, Gnt_S. States shown: (S), (I), (S: L1-1), (S: L1-2).]
8
Design Abstractions in More Modern Flows
  • An Interleaving Protocol Model (Murphi or TLA
    are the languages of choice here)
  • FV here eliminates concurrency bugs
  • Detailed HDL model
  • FV here eliminates implementation bugs however
  • Correspondence with Interleaving Model is lost
  • Need more detailed models anyhow
  • Interleaving Models are very abstract
  • Monolithic Verification of HDL Code Does not
    Scale
  • Design optimizations captured at HDL level
  • Interleaving model becomes more obsolete
  • Need an Integrated Flow
  • Interleaving -> High level HW View -> Final HDL

9
Related Work in Formal HW Design
  • BlueSpec
  • High level design is expressed using atomic
    transactions
  • Synthesizes high level designs into hardware
    implementations
  • Automatic scheduling of high level design steps
    in hardware
  • May not meet performance goals
  • Malik et al.: Formal Architecture and
    Microarchitecture Modeling for Verification
  • Meant for Instruction Set Processors
  • Need Formal theory of Refinement from
    Interleaving to High level HW Models

10
Our Goals
  • Develop Methodology to Verify Realistic
    Interleaving Models
  • Useful Benchmarks for others
  • Our particular contributions are towards
    Hierarchical protocols
  • Largely inspired by Chou et al.'s work (FMCAD04)
  • Xiaofang Chen's PhD is wrapping up a nice story
    here!
  • Develop Language and Formal Theory for Higher
    Level HW Specification Refinement
  • Ideas largely due to German and Janssen
  • Xiaofang Chen's PhD work is taking ideas from
    initial proposal all the way to practical
    realization!

11
A summary of our work over Y1-2
  • Three progressively better approaches to verify
    hierarchical cache coherence protocols at the
    interleaving level
  • A/G method of complementary abstractions
    (FMCAD06)
  • Extensions to Non-inclusive hierarchies (TR
    06-014)
  • Abstract each level separately (to be submitted)
  • Error-trace checking (to be submitted)
  • A theory of transaction based design and
    verification (writeup finished; initial
    experiments finished)
  • Modular verification of transactions (writeup in
    progress; initial experiments finished)
  • Number the projects 1.1, 1.2, 1.3, 1.4,
    2, and 3

12
Project 1.1-4 Timeline
1.3 Abstraction per level (more scalable)
1.1 FMCAD06 results
1.2 Another hierarchical benchmark
(non-inclusive)
1.4 Automatic Recognition of spurious/real bugs
13
1.1-4 Hierarchical Protocols
[Figure: a Home Cluster and two Remote Clusters (1 and 2), each containing two L1 caches, an L2 cache with local directory, and a RAC; the clusters connect to the Global Dir and Main Memory.]
14
Abstracted Protocol 1
[Figure: the Home Cluster keeps its two L1 caches; Remote Clusters 1 and 2 are reduced to their L2 cache with local directory and RAC; Global Dir and Main Memory are retained.]
15
Abstracted Protocol 2
[Figure: Remote Cluster 1 keeps its two L1 caches; the Home Cluster and Remote Cluster 2 are reduced to their L2 cache with local directory and RAC; Global Dir and Main Memory are retained.]
16
Non-Circular Assume/Guarantee
  • We can't verify this due to state explosion
  • h || r1 || r2 ⊨ Coh
  • Instead
  • Check-1: h || R1 || R2 ⊨ Coh1 ∧ Guarant1
  • Check-2: H || r1 || R2 ⊨ Coh2 ∧ Guarant2

17
1.2 We applied the non-circular A/G method to a
Non-Inclusive Hierarchical Protocol.
  • Protocol features
  • Broadcast channels
  • Non-imprecise local dir
  • Verification challenges
  • A/G cannot infer local dir from just
    intra-clusters
  • Coherence may involve multiple L1 caches

18
Verifying Non-Inclusive Protocols
  • Inferring L2.State = Excl from
  • Outside the cluster
  • Inside the cluster
  • Use history variables to change non-inclusive to
    inclusive protocols

19
Experimental Results
Protocols   # of States        Mem (GB)   Model Check
Hierarchy   > 1,521,900,000    20         No
Abs-1       234,478,105        20         Yes
Abs-2       283,124,383        20         Yes
Reduction is over 65%
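
The reduction quoted above can be reproduced from the table with a short calculation. This is a sketch under one assumption that the slide does not state: that the reduction compares the combined state count of the two abstracted protocols against the lower bound reported for the full hierarchy. Because the full hierarchy did not finish model checking, the result is itself only a lower bound.

```python
# State counts copied from the table above.
HIERARCHY_LOWER_BOUND = 1_521_900_000  # search did not complete; true count is larger
ABS_1 = 234_478_105
ABS_2 = 283_124_383


def reduction(original: int, abstracted_total: int) -> float:
    """Fraction of states saved by checking the abstractions instead."""
    return 1.0 - abstracted_total / original


# Assumption: both abstracted protocols must be checked, so their costs add up.
abstracted_total = ABS_1 + ABS_2
saved = reduction(HIERARCHY_LOWER_BOUND, abstracted_total)

print(f"abstracted total: {abstracted_total:,} states")
print(f"reduction: at least {saved:.1%}")
```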
20
1.3 We then tried a Split Hierarchy Per Level
Approach to using non-circular A/G
[Figure: the hierarchy split per level into ABS 1, ABS 2, and ABS 3; the inter-cluster part shows three L2 cache/local directory units, three RACs, the Global Dir, and Main Memory.]
21
A Sample Scenario
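[Figure: an exclusive-request scenario across the Home Cluster, Remote Cluster 1, and Remote Cluster 2, with cache states Excl and Invld. Messages in order: 1. Req_Ex; 2. Fwd Req_Ex; 3. Fwd Req_Ex; 4. Fwd Req_Ex; 5. Grant; 6. Grant.]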
22
Map to Abstracted Protocols
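[Figure: the same scenario mapped onto the abstracted protocols, showing Remote Cluster 1 and Remote Cluster 2 with cache states Invld and Excl. Messages in order: 1. Req_Ex; 2. Fwd Req_Ex; 3. Fwd Req_Ex; 4. Fwd Req_Ex; 5. Grant; 6. Grant.]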
23
Experimental Results
Protocols   # of States      Exec time (sec)   Mem (GB)   Model Check
Hierarchy   > 438,120,000    > 125,799         18         No
Inter       1,500,621        269               2          Yes
Intra-1     564,878          48                2          Yes
Intra-2     188,842          18                2          Yes
Reduction is over 95%!
24
Project 1.4 Automatic Recognition of Spurious /
Real Bugs in these approaches
  • Problem statement
  • Given an error trace of ABS protocol
  • Is it a real bug of the original protocol?
  • Solution
  • In the original protocol, use BFS to guide the
    model checking to match the error trace

Reason: our abstraction is just a projection
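
A minimal sketch of this idea in Python follows. It is not the authors' tool: the function names, the tuple encoding of states, and the toy transition graph are all hypothetical, and allowing steps that leave the projected state unchanged is an assumption about how concrete and abstract steps line up. The search explores the original protocol breadth-first but keeps only successors whose projection agrees with the abstract error trace.

```python
from collections import deque


def project(state, kept):
    """Abstraction as projection: keep only the components at indices `kept`."""
    return tuple(state[i] for i in kept)


def find_concrete_trace(init, successors, kept, abs_trace):
    """Return a concrete trace matching abs_trace, or None if none exists.

    Search nodes are (concrete state, position in abs_trace). A successor is
    kept if its projection equals the next abstract state (progress) or the
    current one (a step invisible to the abstraction); otherwise it is dropped.
    """
    if project(init, kept) != abs_trace[0]:
        return None
    start = (init, 0)
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        state, pos = node
        if pos == len(abs_trace) - 1:
            path = []
            while node is not None:  # walk parents back to the initial state
                path.append(node[0])
                node = parent[node]
            return path[::-1]
        for nxt in successors(state):
            p = project(nxt, kept)
            children = []
            if p == abs_trace[pos + 1]:
                children.append((nxt, pos + 1))
            if p == abs_trace[pos]:
                children.append((nxt, pos))
            for child in children:  # empty list means the successor is dropped
                if child not in parent:
                    parent[child] = node
                    queue.append(child)
    return None


# Hypothetical toy protocol; the abstraction keeps the first two components.
GRAPH = {
    ("v10", "v20", "v30"): [("v11", "v22", "v31"),
                            ("v13", "v21", "v30"),
                            ("v10", "v20", "v33")],
    ("v10", "v20", "v33"): [("v11", "v22", "v31")],
}


def toy_successors(state):
    return GRAPH.get(state, [])


real = find_concrete_trace(("v10", "v20", "v30"), toy_successors, (0, 1),
                           [("v10", "v20"), ("v11", "v22")])
spurious = find_concrete_trace(("v10", "v20", "v30"), toy_successors, (0, 1),
                               [("v10", "v20"), ("v16", "v28")])
print(real)
print(spurious)
```

A returned path means the abstract error corresponds to a real behavior of the toy model; `None` means no concrete run projects onto the abstract trace, so the error is spurious in this sketch.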
25
Basic Idea of Automatic Recognition
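[Figure: the error trace of the abstracted protocol, with states (v10, v20), (v11, v22), ..., (v16, v28), shown next to a directed BFS of the original protocol starting from (v10, v20, v30). The successors (v11, v22, v31), (v13, v21, v30), and (v10, v20, v33) are each marked keep or drop.]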
26
Y3 Plans for Project 1
  • Considerable Experience Gained
  • Three Large Benchmark Protocols (each is 3000
    lines of Murphi Code)
  • on the web
  • Have Reduced Verif Complexity of Hier Protocols
    by 90%
  • Can Identify Spurious Errors Automatically
  • All Finite-state
  • Not Parameterized
  • No plans for Parameterized
  • Y3 Plans: Build a tool to support this methodology

27
Summary of Projects 2 and 3
  • Three progressively better approaches to verify
    hierarchical cache coherence protocols at the
    interleaving level
  • A/G method of complementary abstractions
    (FMCAD06)
  • Extensions to deeper, and non-inclusive
    hierarchies (TR 06-014)
  • Latest method that abstracts each level
    separately (to be submitted)
  • Error-trace checking (to be submitted)
  • A theory of transaction based design and
    verification (writeup finished)
  • Modular verification of transactions (writeup in
    progress)

28
Transaction Level HW Modeling
  • The problem addressed: bridge the gap between
    high-level specifications and RTL implementations
  • Global properties cannot be formally verified at
    RTL Level!
  • Specifications can be verified, but do they
    correctly represent the implementations?

29
Driving Design Benchmark due to German and Geert
Janssen
30
What changes when moving from a spec to an
implementation?
  • Atomicity
  • Concurrency
  • Granularity in modeling

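[Figure: at the spec level, transition 1 between home and client; at the implementation level, transitions 1.1, 1.2, and 1.3 among home, router, buffer, and client.]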
31
General Mappings between high level transitions
and transactions that help implement them
[Figure: High Level Transition 1 is realized by Low Level Transitions 1.1, 1.2, and 1.3.]
  • High Level Transitions take some non-zero unit
    of time (conceptual)
  • Each Low Level Transition takes One Clock Cycle
32
High-Level and Low-Level Computations
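[Figure: a high-level computation with steps 1, 2, 3 and the low-level transitions that realize them: 1.1, 1.2, 1.3; 2.1, 2.2; 3.1, 3.2, 3.3.]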
33
Specification of High and Low Levels
[Figure: high-level transition 1 and low-level transitions 1.1, 1.2, 1.3.]
  • High level: in Murphi, as a Guard ==> Action Rule
  • Low level: in HMurphi, as multiple Guard ==> Action
    Rules enclosed in a Begin Transaction / End
    Transaction
  • The Guards decide when each low level transition
    can fire
  • The maximal number of low level transitions
    enabled in any state are concurrently fired
    within each clock tick
34
Transaction
  • A transaction is a set of transitions in Impl
    that correspond to a transition in Spec

Transaction
  Rule 1
  ...
  Rule n
Endtransaction
35
Executions
  • Spec: interleaving
  • One enabled transition fires at each step
  • Impl: concurrent
  • All enabled transitions fire at each step


Spec: 1, 2, 3
Impl: {1.1, 2.1}, {1.2}, {2.2, 3.1, 3.2}
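
The difference between the two execution models can be sketched in a few lines of Python. This is a toy illustration, not HMurphi: the rule names `inc_a` and `inc_b`, the dictionary-based state, and the treatment of two writes to one variable as an error are all assumptions made for the example.

```python
# Toy model: a rule is (guard, action); an action returns the variables it writes.
# The rules "inc_a" and "inc_b" are hypothetical, not taken from the benchmark.
RULES = {
    "inc_a": (lambda s: s["a"] == 0, lambda s: {"a": 1}),
    "inc_b": (lambda s: s["b"] == 0, lambda s: {"b": 1}),
}


def enabled(rules, state):
    """Names of the rules whose guard holds in `state`."""
    return [name for name, (guard, _) in rules.items() if guard(state)]


def interleaving_step(rules, state, pick):
    """Spec semantics: exactly one enabled rule fires."""
    guard, action = rules[pick]
    if not guard(state):
        raise ValueError(f"rule {pick} is not enabled")
    return {**state, **action(state)}


def concurrent_step(rules, state):
    """Impl semantics: every enabled rule fires in the same step.

    All rules read the old state. Two rules writing the same variable in one
    step is reported as an error (an assumed write/write conflict policy).
    """
    updates = {}
    for name in enabled(rules, state):
        writes = rules[name][1](state)
        clash = set(updates).intersection(writes)
        if clash:
            raise ValueError(f"write/write conflict on {sorted(clash)}")
        updates.update(writes)
    return {**state, **updates}


start = {"a": 0, "b": 0}
after_spec_step = interleaving_step(RULES, start, "inc_a")
after_impl_step = concurrent_step(RULES, start)
print(after_spec_step)
print(after_impl_step)
```

In this toy model one concurrent step reaches a state that the interleaving model needs two steps to reach, which is the gap the simulation notion has to bridge.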

36
A Few Notations
  • Observable variables VH
  • These are Variables used in both Spec and Impl
  • Impl has additional internal variables also
  • A variable v is inactive at a state s if all
    transactions in Impl that can write to v are
    quiescent at s

37
A Formal Notion of Simulation
  • For every concurrent execution of Impl, there
    exists an interleaving execution of Spec such that
    the variables in VH ∩ inactive(l_i) match
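
As a rough illustration, the matching condition can be written as a check over two finite traces. This sketch makes several assumptions that go beyond the slide: the traces are aligned index by index, each Impl state comes with the set of transactions that are currently not quiescent, and the names `executions_match`, `var_writers`, and `T1` are hypothetical.

```python
def inactive(observable, var_writers, active):
    """Observable variables whose writing transactions are all quiescent."""
    return {v for v in observable
            if not (var_writers.get(v, set()) & active)}


def executions_match(impl_trace, spec_trace, observable, var_writers):
    """Compare an Impl trace with a Spec trace on VH ∩ inactive(l_i).

    impl_trace: list of (state, active_transactions) pairs.
    spec_trace: list of states, aligned with impl_trace (an assumption).
    """
    if len(impl_trace) != len(spec_trace):
        return False
    for (l, active), h in zip(impl_trace, spec_trace):
        for v in inactive(observable, var_writers, active):
            if l[v] != h[v]:
                return False
    return True


# Toy data: transaction T1 is the only writer of the observable variable "state".
VH = {"state"}
WRITERS = {"state": {"T1"}}
spec = [{"state": "I"}, {"state": "E"}, {"state": "E"}]
impl_ok = [({"state": "I", "buf": 0}, set()),
           ({"state": "I", "buf": 1}, {"T1"}),   # T1 in flight: "state" not compared
           ({"state": "E", "buf": 0}, set())]
impl_bad = [({"state": "I", "buf": 0}, set()),
            ({"state": "I", "buf": 1}, set()),   # T1 quiescent, yet values differ
            ({"state": "E", "buf": 0}, set())]

print(executions_match(impl_ok, spec, VH, WRITERS))
print(executions_match(impl_bad, spec, VH, WRITERS))
```

The first trace is accepted because the mismatch occurs while the writer of `state` is still in flight; the second is rejected because the same mismatch occurs at a point where `state` is inactive.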



[Figure: an Impl execution l0, l1, l2 (with steps t0, t1, t2) related to a Spec execution h0, h1, h2.]
38
Simulation Checks
[Figure: an Impl transaction from state I to I' shown alongside the Spec transition from Spec(I) to Spec(I').]
  • I is a reachable state where the commit guard is
    true
  • Guard for Spec transition must hold
  • Observable vars changed by either Spec or Impl
    must match
39
Model Checking Approaches
  • Monolithic
  • Cross product construction
  • Compositional
  • Abstraction
  • Assume/Guarantee

40
Compositional Approach
  • Abstraction
  • Change read to an access of an input var
  • Self-sourced read
  • Add all transitions that write to a var
  • Assume/Guarantee
  • Require all writes to var guarantee prop P
  • Assume P holds on all reads

41
Example of Abstraction
Transaction 1
  Transaction
    Rule (v1 = d1) ==> ...
  Endtransaction
Transaction 2
...
Transaction n
42
Example of Assume/Guarantee
Transaction 1: Request granted
  State = Excl
  Impl.State = Spec.State
  Data = d
Transaction 2: Update Cache
43
Benchmarks
  • High level in FMCAD04 tutorial
  • Low level provided by German and Janssen
  • Sizes
  • 1 Home node, 1 remote node

Sizes are constrained by accessible VHDL tools!
44
Implementations
  • Muv: HMurphi -> VHDL
  • Written by German
  • Mud
  • Static analyzer for possible conflicts /
    dependencies
  • VHDL verifier
  • IBM RuleBase

45
Preliminary Results
Approach                     Flip-Flops   Gates   Time (min)
Monolithic                   212          8574    17
Decomposed: W/W conflicts    108          5763    11
Decomposed: closures         89           2194    3
This is for a 1-bit datapath. Intel Xeon CPU
3.0GHz, 2GB memory
46
When Datapath > 1 bit
  • Cannot check monolithic approach
  • RuleBase: 300 F-F academic license restriction
  • Decomposed approach
  • W/W checks not affected

Datapath bits   # of F-F   # of Gates
1               89         2194
2               97         2380
26              289        6659
47
Future Work
  • Reduce the cost of W/W conflicts checking
  • Localized reasoning
  • Apply to pipeline
  • More benchmarks
  • Try other VHDL tools
  • SixthSense etc.

48
Publications, Software, Models
  • FMCAD 2006 paper
  • Presentation at Intel
  • Journal version of hierarchical coherence
    protocol verification (under prep)
  • TR on Theory of Transaction Based Specification
    and Verification (under prep)
  • Detailed VHDL-level German Protocol developed
  • Analysis Framework for HMurphi Developed
  • Preliminary Verification Experiments using
    Cadence IFV, IBM RuleBase, and IBM SixthSense
  • Xiaofang Chen's Summer Internship at IBM T.J.
    Watson Res. Ctr.
  • Robert's SRC Poster
  • Techcon 2007 submission
  • There will be more publications during 2007-8
    following a hiatus due to infrastructure build-up
    (many delays!)