Modeling and Validation of Programmable Embedded Systems


1
Modeling and Validation of Programmable Embedded
Systems
  • Prabhat Mishra
  • Dept. of Computer and Information Science and
    Engineering
  • University of Florida

2
Outline
  • Ongoing research
  • Modeling and Validation of Programmable Embedded
    Systems
  • Programmable embedded systems
  • Motivation
  • Traditional validation techniques
  • Language-driven validation methodology
  • Conclusion

3
Design Automation of Embedded Systems
[Flow diagram: from Concept through HW/SW Partitioning to Specification; hardware
components are designed via synthesis and layout, software components via
compilation; estimation and exploration guide the process; the result undergoes
validation and evaluation (area, power, performance, ...).]
9
Ongoing Research
  • Validation of Specification
  • ACM TECS 2004, Kluwer DAES 2003, DATE 2002,
    ASPDAC 2002
  • Design Space Exploration
  • ACM TECS 2004, VLSI 2004, RSP 2003, ISSS 2001,
    VLSI 2001
  • Instruction-Set Simulation
  • CODES+ISSS 2005, DAC 2003, CODES+ISSS 2003
  • Functional Test Generation
  • DATE 2005, DATE 2004, HLDVT 2002
  • Equivalence Checking
  • IEEE Design & Test 2004, IJES 2005


12
Programmable Embedded Systems
  • Computing is an integral part of daily life
  • Two classes of computing systems
  • Desktop-based systems
  • PCs, laptops, workstations, servers, ...
  • Embedded systems
  • handheld and household items, military and
    medical equipment
  • Difference
  • Application specific versus general purpose
  • Commonality
  • Use processor, co-processor, and memories to
    execute application programs
  • Programmable Embedded Systems
  • Programmable Architectures

13
Programmable Embedded Systems
[Block diagram: a processor core with coprocessors, ASIC/FPGA, memory subsystem,
DMA controller, A2D/D2A converters, and sensors/actuators together constitute the
embedded system.]

15
Technology and Demand
Technology: the number of transistors doubles every 2 years
Demand: communication, multimedia, entertainment, networking
Exponential growth of design complexity → verification complexity
16
North America Re-spin Statistics
[Bar chart, y-axis 0-100%: 1st-silicon success rates of 48% (1999), 44% (2002), and
39% (2004).]
Source: 2002 Collett International Research and Synopsys
71% of SOC re-spins are due to logic bugs
17
Functional Verification of SOC Designs
[Chart: verification effort versus design size. Roughly 1M logic gates (1995)
required 20 engineer years and 100M simulation vectors; 10M gates (2001), 200
engineer years and 10B vectors; 100M gates (2007, projected), 2000 engineer years
and 1000B vectors.]
Source: Synopsys; G. Spirakis, keynote address at DATE 2004
18
Functional Validation of Microprocessors
  • Functional validation is a major bottleneck
  • Deeply pipelined complex micro-architectures
  • Logic bugs increase at 3-4 times per generation
  • The (exponential) increase in bugs is linear in the
    (exponential) growth of design complexity.

19
Outline
  • Motivation
  • Traditional validation techniques
  • Language-driven validation methodology
  • Complements existing techniques
  • Conclusion
  • Future research directions

20
Traditional Validation Approach
[Flow diagram: the architecture specification (an English document) is manually
refined into an RTL design, which is then validated by simulation.]
Manual Process
22
Bottlenecks of Functional Verification
  • Bottom-up methodology
  • Lack of a golden reference model
  • Difficult to find micro-architectural bugs
  • Uses reverse-engineering (abstraction) methods
  • Specification has all the details
  • Lack of a suitable functional coverage metric
  • Code coverage, toggle coverage not sufficient
  • Cannot determine if all pipeline interactions
    (with hazards/exceptions) are considered.
  • Approach
  • A top-down validation methodology
  • Complements existing bottom-up techniques

23
Proposed Top-down Validation Methodology
http://www.ics.uci.edu/express
ADL: Architecture Description Language
24
Test Generation
25
Functional Validation of Pipelined Processors
[Diagram: a test generator (TestGen) produces a test program, which runs on the
pipelined processor; the result is then checked.]
Test Program:
  MOV R1, 011
  MOV R2, 010
  ADD R3, R1, R2
Check Result: R3 = 101 ?
Verifies the functionality of the processor using
assembly programs
26
Functional Validation of Pipelined Processors
Test generation is considered in this work.
27
Related Work: Test Generation
  • Directed test program generation
  • Aharon et al., DAC 1995
  • Shen et al., DAC 1999
  • Pipeline behavior is not considered
  • Test generation for pipelined processors
  • Ur and Yadin, DAC 1999
  • Iwashita et al., ICCAD 1994
  • Campenhout et al., DAC 1999
  • No coverage metric for pipeline interactions
  • Functional test program generation
  • Chen et al., DAC 2003, Lai et al., DAC 2001
  • Applied in the context of manufacturing testing

28
Functional Test Program Generation
  • Processor model
  • Graph model for pipelined processors
  • Functional fault model
  • Pipeline interactions based on graph coverage
  • Coverage-directed test generation technique
  • Model checker to generate test programs
  • write the negation of the property to be verified
  • model checker generates example to disprove
    (counter-example)

29
Pipelined Processor Model
Graph Model
Graph = (Nodes, Edges)
Nodes = units ∪ storages
Edges = data-transfer edges ∪ pipeline edges
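This graph model can be sketched in code as below; the concrete node and edge names are illustrative stand-ins, since the actual model is generated from the ADL with opcode, timing, and latch annotations.

```python
from dataclasses import dataclass, field

@dataclass
class ProcessorGraph:
    """Graph = (Nodes, Edges) model of a pipelined processor."""
    units: set = field(default_factory=set)           # functional units
    storages: set = field(default_factory=set)        # register files, memories
    pipeline_edges: set = field(default_factory=set)  # (unit, unit) pairs
    transfer_edges: set = field(default_factory=set)  # unit <-> storage pairs

    @property
    def nodes(self):
        # Nodes = units U storages
        return self.units | self.storages

    @property
    def edges(self):
        # Edges = data-transfer edges U pipeline edges
        return self.pipeline_edges | self.transfer_edges

# Illustrative fragment of a DLX-like pipeline (names are hypothetical).
g = ProcessorGraph(
    units={"Fetch", "Decode", "ALU", "LdSt", "WB"},
    storages={"RF", "MEM"},
    pipeline_edges={("Fetch", "Decode"), ("Decode", "ALU"), ("Decode", "LdSt")},
    transfer_edges={("Decode", "RF"), ("LdSt", "MEM")},
)
```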
30
Functional Fault Model
[Pipeline figure: Fetch → Decode → {ALU, AddrCalc → LdSt} → WB, with register file
(RF) and memory (MEM) storages.]
1. Node Fault: a node does not execute correctly
   (active, stalled, exception, flushed)
2. Edge Fault: an edge does not transfer
   instructions/data correctly
   (active, stalled, flushed)
31
Coverage-directed Test Generation
  • Algorithm
  • Inputs:
  • 1. Graph model of the processor, G
  • 2. List of possible faults, faultList
  • Output: Test programs for detecting all the
    faults in the fault model
  • begin
  •   TestProgramList = {}
  •   for each fault in the faultList
  •     testprog = createTestProgram(fault, G)
  •     TestProgramList = TestProgramList ∪ testprog
  •   endfor
  •   return TestProgramList
  • end
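The loop above translates directly to Python; createTestProgram is a stub here, since in the actual methodology it wraps property negation and SMV model checking, which are not reproduced in this sketch.

```python
def create_test_program(fault, graph):
    # Stub: stands in for writing the negated property, running the model
    # checker, and turning the counterexample into a test program.
    return f"test for {fault}"

def generate_tests(graph, fault_list):
    """Coverage-directed test generation: one test program per fault."""
    test_program_list = []                    # TestProgramList = {}
    for fault in fault_list:                  # for each fault in faultList
        testprog = create_test_program(fault, graph)
        test_program_list.append(testprog)    # TestProgramList U testprog
    return test_program_list

tests = generate_tests(graph=None,
                       fault_list=["node ALU active", "edge LdSt-ALU stalled"])
```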

32
Coverage-directed Test Generation
[Pipeline figure: Fetch, Decode, AddrCalc, ALU, LdSt (memory latency 1), WB, RF]
  • Example: generate a test to make edge LdSt-ALU
    active
  • Two properties need to be generated
  • Make the node LdSt active at clock cycle t
  • Make the node ALU active at clock cycle (t+1)
  • Test Program
  • LOAD R1, R5, 0x1
  • NOP
  • MOV R3, R1
33
Test Generation Methodology
[Flow: the architecture specification is captured manually in an ADL; from the ADL
specification, a simulator and SMV properties are generated automatically. SMV
counterexamples become test programs; simulating them produces a coverage report,
which feeds back more properties until coverage is sufficient.]
ADL: Architecture Description Language
34
Test Generation Example
  • Initialize registers Ain and Bin with values 2
    and 3 at cycle 9

One property: assert G ((cycle = 8) -> X ((DIV.Ain
= 2) & (DIV.Bin = 3)))
Applied at the processor level: needs 375.98 sec.
and 1928568 BDD nodes using a 333 MHz Sun with 128M RAM
Problem: test generation is limited by the
capacity restrictions of the tool.
Solution: apply properties at the module level
35
Modified Test Generation Methodology
Properties are applied at the module level
[Flow: for each node N, an SMV description and a property are given to SMV; the
counterexample yields input assignments. If these are primary inputs, they become a
test program (simulated to produce the coverage report); otherwise they become
output requirements for the parent of N, and the process repeats at the parent.]
36
Test Generation Example
  • Example: Initialize Ain and Bin with values 2 and
    3 at cycle 9
  • Apply to DIV unit
  • assert G ((cycle = 8) -> X((Ain = 2) & (Bin = 3)))
  • input assignments: divInst.src1 = 2, divInst.src2
    = 3
  • Apply to Decode unit
  • assert G((cycle = 7) -> X((divInst.src1 = 2) &
    (divInst.src2 = 3)))
  • input assignments: oper = "DIV R3 R1 R2", RF[1] = 2,
    RF[2] = 3
  • Apply to Fetch unit
  • assert G((cycle = 6) -> X((oper.opcode = DIV)
    & (oper.src1 = 1) & (oper.src2 = 2)))
  • input assignments: PC = 5, Memory[5] = "DIV R3 R1 R2"

37
Final Test Program Example
Fetch Cycle   Opcode   Dest   Src1   Src2
    1         NOP
    2         ADDI     R1     R0     2
    3         ADDI     R2     R0     3
    4         NOP
    5         NOP
    6         NOP
    7         DIV      R3     R1     R2

  • Using our modified methodology
  • requires 1 sec. and 5600 BDD nodes
  • 333 MHz Sun UltraSparc II with 128M RAM
  • When applied at the processor level
  • requires 375.98 sec. and 1928568 BDD nodes
  • An order-of-magnitude improvement in time and space

38
Functional Coverage
  • When to end the verification effort?
  • Code coverage, toggle coverage, fault coverage
  • No direct relation with the device functionality
  • Proposed a functional fault model for pipelined
    processors
  • Register read/write, operation execution,
    execution path, and pipeline execution
  • Used to define functional coverage
  • Developed coverage-driven test generation
    algorithms
  • Generates test programs to detect all the faults
    in the fault model

39
Functional Fault Models
  • Register Read/Write
  • All registers are written and read.
  • Operation Execution
  • All operations are executable.
  • Execution Path
  • Each execution path (taken by an operation) works
    correctly
  • Consists of one pipeline path and multiple
    data-transfer paths
  • Pipeline Execution
  • All pipeline interactions are activated.

40
Register Read/Write Faults
  • The fault can be due to an error in
  • reading
  • register decoding
  • register storage
  • prior writing
  • Whatever may be the reason, the outcome is an
    unexpected value.

41
Operation Execution Faults
  • The fault can be due to an error in
  • Operation decoding
  • erroneous decoding returns incorrect opcode
  • Control generation
  • incorrect execution unit gets selected
  • Final implementation
  • execution unit can be faulty
  • The outcome is an unexpected result.

42
Faults in Execution Path
  • Execution path
  • During execution of an operation, one pipeline
    path and one/more data-transfer paths get
    selected
  • these activated paths are defined as execution
    path
  • The fault can be due to an error in any of the
    paths
  • A path is faulty if any of its nodes or edges are
    faulty
  • A node is faulty if it does not execute correctly
  • An edge is faulty if it does not transfer
    data/inst. correctly
  • The outcome is an unexpected result.

43
Pipeline Execution Faults
  • The fault can be due to an incorrect
    implementation of the pipeline controller
  • Erroneous hazard detection
  • Incorrect stalling
  • Erroneous flushing
  • Wrong exception handling
  • The outcome is an unexpected result.

44
Test Generation for Register Read/Write
  • Algorithm 1
  • Input: Graph model of the architecture, G
  • Output: Test programs for detecting faults in
    register read/write
  • begin
  •   TestProgramList = {}
  •   for each register reg in architecture G
  •     value_reg = GenerateUniqueValue(reg)
  •     writeInst = an instruction that writes value_reg
        in reg
  •     testprog_reg = createTestProgram(writeInst)
  •     TestProgramList = TestProgramList ∪ testprog_reg
  •   endfor
  •   return TestProgramList
  • end

CreateTestProgram:
1. Assigns values to unspecified locations
2. Creates initialization instructions for sources
3. Creates instructions for reading destinations
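A minimal Python sketch of Algorithm 1; the MOVI/READ mnemonics and the unique-value scheme are hypothetical placeholders for the real instruction encodings and for the full createTestProgram expansion described in the note above.

```python
def generate_unique_value(index):
    # Stub: any scheme giving each register a distinct value works.
    return index + 1

def create_test_program(write_inst, reg):
    # Per the note above: after writing, read the destination back.
    return [write_inst, f"READ {reg}"]        # READ is a hypothetical mnemonic

def register_rw_tests(registers):
    """One test program per register: write a unique value, then read it."""
    test_programs = []
    for i, reg in enumerate(registers):
        value = generate_unique_value(i)
        write_inst = f"MOVI {reg}, {value}"   # hypothetical write instruction
        test_programs.append(create_test_program(write_inst, reg))
    return test_programs

progs = register_rw_tests(["R0", "R1", "R2"])
```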
45
A Case Study
  • Applied to two pipelined architectures
  • VLIW implementation of DLX
  • RISC implementation of Sparc V8 (LEON)
  • Architecture Specification
  • Using Architecture Description Language (ADL)
  • EXPRESSION ADL
  • Test generation and coverage estimation
  • Random/Constrained-random test generation
  • Using Specman Elite framework
  • Coverage-driven test generation
  • Using our test generation algorithms

46
Test Generation and Coverage Estimation
[Flow: from the architecture specification (ADL description), an ISA specification
(in e), a pipelined implementation (in e), and a coverage specification are derived
(manual step), and a simulator is generated (automatic). Specman Elite performs
random and directed test generation and coverage estimation; external test programs
generated by our algorithms are incorporated.]
47
Coverage Estimation
  • Instruction definition is used
  • opcode, dest, src1, src2
  • Register read/write
  • coverage of src1 and src2 indicates reads.
  • coverage of dest indicates writes.
  • Operation execution
  • coverage of opcode field
  • Pipeline execution
  • use a variable for each stall/exception
  • cross-coverage is used to estimate coverage of
    multiple exception scenarios.
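The field-coverage bookkeeping described above (reads via the src fields, writes via dest, operation execution via opcode) can be sketched in plain Python; this is an illustrative stand-in for the Specman e coverage items, not their actual syntax.

```python
from collections import defaultdict

class FieldCoverage:
    """Tracks which legal values of each instruction field have been exercised."""
    def __init__(self, domains):
        self.domains = domains            # field name -> set of legal values
        self.seen = defaultdict(set)      # field name -> values observed so far

    def sample(self, instruction):
        # Record each field value of a simulated instruction.
        for fld, val in instruction.items():
            if val in self.domains.get(fld, ()):
                self.seen[fld].add(val)

    def coverage(self, fld):
        # Fraction of the field's domain that has been exercised.
        return len(self.seen[fld]) / len(self.domains[fld])

cov = FieldCoverage({
    "opcode": {"ADD", "SUB", "DIV"},   # operation execution coverage
    "dest": {"R0", "R1"},              # register write coverage
    "src1": {"R0", "R1"},              # register read coverage
})
cov.sample({"opcode": "ADD", "dest": "R0", "src1": "R1"})
cov.sample({"opcode": "DIV", "dest": "R1", "src1": "R1"})
```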

48
Validation Flow
49
Test Generation for VLIW DLX
[Table: each entry gives the number of test programs; parenthesized values give the
fault coverage achieved by those test programs for that fault model.]
Random and constrained-random techniques could not
activate any multiple-exception scenarios, hence the
low coverage in pipeline execution.
50
Test Generation for LEON2 Processor
[Table: each entry gives the number of test programs; parenthesized values give the
fault coverage achieved by those test programs for that fault model.]
  • The trend is similar in both architectures.
  • Due to its bigger pipeline structure (more
    interactions), the VLIW DLX has lower fault
    coverage than the LEON2 model.

51
The Framework is Available
  • https://www.verificationvault.com
  • It includes
  • VLIW DLX models
  • e specification for reference (ISA model)
  • Pipelined implementation in e.
  • Components for random/directed test generation
    and incorporation of external tests
  • Components for data/temporal checking and
    coverage estimation.

52
Conclusion
  • Functional validation is a major bottleneck
  • Existing methods employ bottom-up approach
  • Developed a top-down validation methodology
  • Uses an architecture specification
  • Validate the ADL specification
  • Serves as a golden reference model
  • Specification-driven design automation
  • Design space exploration
  • Implementation validation using equivalence
    checking
  • Functional test program generation
  • Complements existing verification techniques

53
More on this topic
  • Publications are available online
  • http://www.cise.ufl.edu/prabhat
  • Contact me
  • prabhat@cise.ufl.edu
  • Recent Book
  • Functional Verification of Programmable Embedded
    Architectures: A Top-Down Approach
  • P. Mishra and N. Dutt, Springer, June 2005

54
Future Research Directions
  • Architecture Specification
  • Completeness criteria for specification
    validation
  • Design Space Exploration
  • Generate high quality models from the
    specification
  • Design Verification
  • Model generation without implementation knowledge
  • Test Generation
  • Confluence of validation and manufacturing
    testing
  • Extend current methodology for validation of
  • Architectures with multiple processor cores
  • Embedded Systems

55
  • Thank you

56
Specification of the DLX Processor
Structure
[Pipeline figure: PC and Memory feed Fetch; Fetch → Decode; Decode issues to IALU,
MUL1..MUL7, FADD1..FADD4, or DIV; all execution paths join at MEM → WriteBack; the
Register File is read and written through data-transfer edges.]
( ARCHITECTURE_SECTION ..........
  (FetchUnit Fetch (CAPACITY 4) (TIMING (all 1))
    (OPCODES all)
    (LATCHES (OTHER PCLatch) (OUT DLatch))
  )
( PIPELINE_SECTION
  (PIPELINE Fetch Decode Execute MEM WB)
  (Execute (ALTERNATE ALU MUL FADD DIV))
  (FADD (PIPELINE FADD1 .. FADD3 FADD4))
  (DTPATHS
    (TYPE UNI (RF Decode P7 C4 P8) (WB RF P5 C3 P6))
    (TYPE BI (MEM MEMORY P4 C2 P3))
  )
)
63
Specification of the DLX Processor
Structure
PC
Memory
Fetch
Decode
Register File
DIV
FADD1
IALU
MUL1
FADD2
MUL2
Behavior
Mapping
(OPCODE ADD (OPERANDS (SRC1 rf) (SRC2 imm)
(DEST rf)) (BEHAVIOR DEST SRC1 SRC2)
(FORMAT ) )
FADD3
FADD4
MUL7
MEM
WriteBack
64
Validation of Static Behavior
  • Graph based modeling of architectures
  • Verify properties
  • Connectedness
  • False pipeline and data-transfer paths
  • Completeness
  • Finiteness
  • Validated ADL specifications
  • DLX, MIPS R10K, TI C6x, and PowerPC
  • Validation time is on the order of seconds

65
Validation of Dynamic Behavior
  • FSM based modeling of pipelined processors
  • Verify properties
  • Determinism
  • In-order execution
  • Validated DLX processor specification
  • Developed two frameworks
  • Equation solver based using Espresso
  • Model checker based using SMV

66
False Pipeline Path
[Pipeline figure: Fetch → Read1 → {ALU, MUL} → Read2 → {Shift, ACC} → WB, with the
Register File connected through data-transfer edges.]
Supports two operations: alus (ALU-shift) and mac
(multiply-accumulate)
71
False Pipeline Path
[Same pipeline figure: Fetch → Read1 → {ALU, MUL} → Read2 → {Shift, ACC} → WB]
Four pipeline paths:
  {Fetch, Read1, ALU, Read2, Shift, WB} : alus
  {Fetch, Read1, MUL, Read2, ACC, WB} : mac
  {Fetch, Read1, ALU, Read2, ACC, WB} : X
  {Fetch, Read1, MUL, Read2, Shift, WB} : X
The last two are false pipeline paths.
72
False Pipeline Path
Algorithm
[Pipeline figure: Fetch → Read1 → {ALU, MUL} → Read2 → {Shift, ACC} → WB]
Inputs: 1. Graph model of the architecture
        2. Each unit has a list of supported opcodes
Output: True if the property is satisfied, else false.
1. Traverse each node of the graph starting from the root:
     if node is root
       OutL(root) = SopL(root)        /* supported opcodes */
     else
       InL(node) = OutL(parent)       /* recently visited parent */
       OutL(node) = SopL(node) ∩ InL(node)
     endif
     If OutL(node) is NULL, report a false pipeline path.
2. Return true if there are no false pipeline paths.
InL: input list; SopL: supported opcode list; OutL: output list
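The traversal above can be run on the alus/mac example; the children and sop tables below encode that pipeline, and an empty OutL reports the path that no opcode can exercise. This is a sketch of the idea, not the framework's actual implementation.

```python
# Example pipeline: Fetch -> Read1 -> {ALU, MUL} -> Read2 -> {Shift, ACC} -> WB
children = {"Fetch": ["Read1"], "Read1": ["ALU", "MUL"],
            "ALU": ["Read2"], "MUL": ["Read2"],
            "Read2": ["Shift", "ACC"], "Shift": ["WB"], "ACC": ["WB"]}
sop = {"Fetch": {"alus", "mac"}, "Read1": {"alus", "mac"},
       "ALU": {"alus"}, "MUL": {"mac"},
       "Read2": {"alus", "mac"}, "Shift": {"alus"}, "ACC": {"mac"},
       "WB": {"alus", "mac"}}

def find_false_paths(children, sop, root):
    """Report each path prefix whose OutL becomes empty (a false pipeline path)."""
    false_paths = []

    def visit(node, in_list, path):
        out_list = sop[node] & in_list      # OutL(node) = SopL(node) ∩ InL
        path = path + [node]
        if not out_list:                    # OutL is NULL: no opcode fits
            false_paths.append(path)
            return
        for child in children.get(node, ()):
            visit(child, out_list, path)

    visit(root, sop[root], [])              # OutL(root) = SopL(root)
    return false_paths

false_paths = find_false_paths(children, sop, "Fetch")
```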
73
A Fragment of the Processor Pipeline
[Figure: between Stage i-1, Stage i, and Stage i+1 sit pipeline latches (instruction
registers); latch j of stage i is IRi,j. IRi,j receives instructions from p parent
units and sends them to q children units.]
74
Processor Pipeline Flow Conditions
[Figure: flow conditions for a pipeline latch between times t and t+1: normal flow
(the instruction advances from stage i to stage i+1), nop insertion (a bubble enters
stage i+1), and stall (the instruction is held). Flow conditions for the Program
Counter (PC): sequential execution (pc advances), branch taken (pc gets the new
target), and stall (pc is unchanged).]
75
FSM Model of Processor Pipelines
  • Define the state of an n-stage pipeline as the values of
  • Program Counter
  • Pipeline Latches / Instruction Registers
  • S(t) = < PC(t), IR1,1(t), IR1,2(t), ...,
    IRn-1,n(n-1)(t) >
  • where stage i has ni pipeline latches.
  • Modeling flow conditions in the FSM
  • A latch IRi,j (the j-th latch in stage i) is
    stalled
  • due to a stall of its children, or
  • due to hazards, exceptions, etc. on that latch
  • condST(IRi,j): ST(IRi,j) = STchild(IRi,j) ∨
    STself(IRi,j)

76
Modeling state transition functions
  • S(t): state of the pipeline; I(t): set of external signals
  • PC(t+1) = fNS_PC(S(t), I(t))
  •   = PC(t) + L   if condSE_PC(S(t), I(t)) = 1
  •   = target      if condBT_PC(S(t), I(t)) = 1
  •   = PC(t)       if condST_PC(S(t), I(t)) = 1
  • IRi,j(t+1) = fNS_IRi,j(S(t), I(t))
  •   = IRi-1,j(t)  if condNF_IRi,j(S(t), I(t)) = 1
  •   = IRi,j(t)    if condST_IRi,j(S(t), I(t)) = 1
  •   = nop         if condNI_IRi,j(S(t), I(t)) = 1
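The IR next-state function above can be sketched as follows; the cond flags are illustrative stand-ins for the real hazard and branch logic, and the one-hot assertion anticipates the determinism requirement of the next slide.

```python
def next_ir(ir_prev_stage, ir_current, cond):
    """Next-state of one instruction register IRi,j, per the case split above."""
    # Exactly one flow condition must hold (completeness + disjointness).
    assert sum(cond.values()) == 1, "conditions must be complete and disjoint"
    if cond["NF"]:
        return ir_prev_stage   # normal flow: instruction advances from stage i-1
    if cond["ST"]:
        return ir_current      # stall: the latch holds its instruction
    return "nop"               # nop insertion: a bubble enters the stage

advanced = next_ir("ADD R1,R2,R3", "SUB R4,R5,R6",
                   {"NF": True, "ST": False, "NI": False})
stalled = next_ir("ADD R1,R2,R3", "SUB R4,R5,R6",
                  {"NF": False, "ST": True, "NI": False})
```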

77
Verification of Determinism
  • All state registers must be deterministic
  • The three state functions must cover all possible
    combinations
  • condSE_PC ∨ condST_PC ∨ condBT_PC = 1
  • condNF_IRi,j ∨ condST_IRi,j ∨ condNI_IRi,j = 1
  • Any two conditions are disjoint for each next-state
    function
  • condx_PC · condy_PC = 0   (x ≠ y)
  • condu_IRi,j · condv_IRi,j = 0   (u ≠ v)
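Both determinism checks can be tested exhaustively for a toy PC controller; the three cond functions over two inputs (branch, stall) are illustrative stand-ins for the real flow conditions, which Espresso or SMV would check symbolically.

```python
from itertools import product

def cond_st(branch, stall):
    return stall                          # stall: PC holds its value

def cond_bt(branch, stall):
    return (not stall) and branch         # branch taken

def cond_se(branch, stall):
    return (not stall) and (not branch)   # sequential execution

conds = [cond_se, cond_bt, cond_st]
states = list(product([False, True], repeat=2))

# Completeness: cond_se v cond_bt v cond_st = 1 in every state.
complete = all(any(c(b, s) for c in conds) for b, s in states)

# Disjointness: cond_x . cond_y = 0 for every distinct pair.
disjoint = all(not (c1(b, s) and c2(b, s))
               for b, s in states
               for i, c1 in enumerate(conds)
               for c2 in conds[i + 1:])
```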

78
Verification of In-Order Execution
  • State transitions of adjacent instruction
    registers must depend on each other.
  • An instruction register cannot be in normal flow
    if all the parent instruction registers (adjacent
    ones) are stalled.
  • condST_IRi-1,j · condNF_IRi,k = 0   (for all i,
    j, k)
  • If such a combination were allowed, the instruction
    would be duplicated and stored into both IRi-1,j and
    IRi,k in the next cycle.

[Figure: IRi-1,j stalled while IRi,k is in normal flow: the forbidden combination]
79
Automatic Verification Framework
[Flow: the processor core is captured in EXPRESSION ADL; an FSM model is derived and
its property equations are checked with Eqntott and Espresso; a failure is analyzed,
a success validates the specification.]
80
A Case Study
  • Applied this methodology to the DLX processor
  • ADL Specification
  • (DecodeUnit Decode
  •   (CONDITIONS
  •     (NF ANY ANY)
  •     (ST ALL)
  •     (NI ALL ANY)
  •     (SELF )
  •   )
  • )
  • Flow Conditions
  • condST_DEC = ST_EX · ST_M1 · ST_A1 · ST_DIV

[Figure: a fragment of the DLX pipeline]
81
A Case Study
  • A small trace of the property checking in our
    validation framework
  • condNF_DEC ∨ condST_DEC ∨ condNI_DEC
  • = ¬ST_PC · ¬(ST_EX · ST_M1 · ST_A1 · ST_DIV)
    ∨ (ST_EX · ST_M1 · ST_A1 · ST_DIV)
    ∨ ST_PC · ¬(ST_EX · ST_M1 · ST_A1 · ST_DIV)
  • = ¬(ST_EX · ST_M1 · ST_A1 · ST_DIV) · (¬ST_PC ∨ ST_PC)
    ∨ (ST_EX · ST_M1 · ST_A1 · ST_DIV)
  • = 1

82
Model Generation from Specification
  • Model generation is a major challenge
  • Wide varieties of architectures
  • RISC, DSP, VLIW, and superscalar
  • Simulator, hardware, and validation models
  • Developed a functional abstraction scheme
  • Compose abstraction primitives to generate new
    architectures
  • Retargetable simulator generation (ISSS 2001)
  • Synthesizable RTL generation (RSP 2003, VLSI 2004)

83
Functional Abstraction
  • Similarities
  • Computation units connected using ports, buses,
    and latches
  • Structures Behaviors
  • Differences
  • Same unit with different parameters
  • Same functionality in different unit
  • New architectural features
  • Define generic functions and sub-functions
  • Compose functions to create new architecture

84
Functional Abstraction of Architectures
  • Structure of a generic processor
  • functions for units (fetch, decode, issue, )
  • sub-functions for computations (read, write, )

Example: A Fetch Unit

FetchUnit (read_per_cycle n, res_station size, ........)
{
  address = ReadPC()
  instructions = ReadInstMemory(address, n)
  WriteToReservationStation(instructions, n)
  outInst = ReadFromReservationStation(m)
  WriteLatch(decode_latch, outInst)
  pred = QueryPredictor(address)
  if (pred)
    nextPC = QueryBTB(address)
    SetPC(nextPC)
  else
    IncrementPC(x)
}
86
Functional Abstraction of Architectures
  • Define generic functions and sub-functions
  • Structure of a generic processor
  • functions for units (fetch, decode, issue,
    res-station,)
  • sub-functions for computations (read, write, )
  • Behavior of a generic processor
  • functions for each operation (add, sub, mul, div,
    )
  • Generic memory subsystem
  • functions for each component (cache, SRAM, SB, )
  • Generic controller
  • Interrupts and exceptions
  • DMA, Co-processors etc.
  • Compose functions to create new architecture

87
Step 1: Read ADL Specification
Structure
( ARCHITECTURE_SECTION ..........
  (FetchUnit Fetch (CAPACITY 4) (TIMING (all 1))
    (OPCODES all)
    (LATCHES (OTHER PCLatch) (OUT DLatch))
  )
( PIPELINE_SECTION
  (PIPELINE Fetch Decode Execute MEM WB)
  (Execute (ALTERNATE ALU MUL FADD DIV))
  (FADD (PIPELINE FADD1 .. FADD3 FADD4))
  ..........
)
Behavior
(OPCODE ADD
  (OPERANDS (SRC1 rf) (SRC2 imm) (DEST rf))
  (BEHAVIOR DEST = SRC1 + SRC2)
  (FORMAT 0101 dest(27-23), src1(22-18), )
)
Mapping
88
Step 2: Compose Structure
  • DLX ( .. )
  • FetchUnit ( 4, 0, ... )
  • (arguments: fetches per cycle = 4, reservation
    station size = 0, input/output ports, ...)
  • DecodeUnit ( )
  • ..
  • Controller ( )
89
Step 3: Compose Behavior
  • DLX ( .. )
  • FetchUnit (4, 0, , )
  • -- No reservation station (instruction buffer)
    processing
  • DecodeUnit (.)
  • -- Use binary description and operation mapping
    to
  • -- decide where to send the current operation.
  • ..
  • ..
  • Controller ( . )
  • -- Use control table to stall/unstall/flush the
    pipeline .

90
TI C6x Memory Exploration using GSR
91
Hardware Generation Exploration
Config 1: IF → ID → EX1 → MEM → WB
Config 2: IF → ID → {EX1, EX2} → MEM → WB
Config 3: IF → ID → {EX1, EX2, EX3} → MEM → WB
Config 4: IF → ID → {EX1, EX2, EX3, EX4} → MEM → WB
  • Schedule length improves due to the addition of
    pipeline paths (area and power increase)
  • The fourth configuration is interesting since both
    area and performance improve
92
Exploration Experiments
Exploration varying MIPS R10K processor features
93
Exploration Experiments
Co-processor based Exploration using TI C6x
94
Energy Performance Tradeoff for Compress
95
Addition of Pipeline Stages
  • Clock frequency improves due to the addition of
    pipeline stages
  • The 4th configuration gave a 30% speed improvement
    at the cost of a 13% area increase

Cfg 1: 1-stage multiplier; Cfg 2: 2-stage
multiplier; Cfg 3: 3-stage multiplier; Cfg 4:
4-stage multiplier
96
Addition of Operations
  • Schedule length improves due to addition of
    operations in units
  • Third configuration generated the best possible
    schedule length

97
RTL Design Validation
[Flow: the RTL design (implementation) is checked against the architecture
specification in two ways: symbolic simulation of properties from a reference model
(success/failure), and equivalence checking against a complete reference design
(equivalent/different).]
98
Property Checking using Symbolic Simulation
  • Design: Carry Lookahead Adder
  • Three inputs: in0, in1, in2
  • One output: out
  • One simple property
  • assign out = in0 + in1 + in2
  • Verification failed
  • Incomplete specification of in2
  • With clear and set logic
  • assign temp = (in2 & ~clear) | set
  • assign out = in0 + in1 + temp

[Flow: the architecture specification (an English document) yields properties in
Verilog; the RTL design (Verilog) yields a state machine and Boolean model; symbolic
simulation checks the properties against the model.]
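The failure described above can be reproduced without a symbolic simulator by brute-force enumeration over 1-bit inputs; the clear/set gating (and in particular the ~clear operator placement) is an assumption reconstructed from the slide, not the actual design.

```python
from itertools import product

def spec(in0, in1, in2):
    # The simple property: out = in0 + in1 + in2
    return in0 + in1 + in2

def impl(in0, in1, in2, clear, set_):
    # Assumed clear/set gating on in2 (operator placement is an assumption).
    temp = ((in2 & ~clear) | set_) & 1
    return in0 + in1 + temp

# With clear/set held inactive, the simple property holds ...
holds_inactive = all(spec(a, b, c) == impl(a, b, c, 0, 0)
                     for a, b, c in product([0, 1], repeat=3))

# ... but over all clear/set values, in2 is incompletely specified and
# counterexamples appear, mirroring the failed verification.
counterexamples = [(a, b, c, cl, st)
                   for a, b, c, cl, st in product([0, 1], repeat=5)
                   if spec(a, b, c) != impl(a, b, c, cl, st)]
```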
99
Property Checking Experiments
  • TLB miss detection
  • assign input = {1'b1, vsid[0:23], ea[4:9], ea[10:13]}
  • assign out0 = {valid0, data0[0:23], data0[24:29], data0[54:57]}
  • assign out1 = {valid1, data1[0:23], data1[24:29], data1[54:57]}
  • assign hit0 = (input == out0)
  • assign hit1 = (input == out1)
  • assign miss = ~(hit0 | hit1)
  • Applicable to BAT array miss detection

[TLB figure]
100
A Case Study
  • The Architecture
  • DLX Processor
  • 20 nodes
  • 24 edges
  • 91 instructions

[Graph figure legend: unit, storage, pipeline edge, data-transfer edge]
101
Experiments
  • DLX processor 20 nodes, 24 edges, 91
    instructions
  • 223 test programs needed to cover all single
    faults
  • Reduction possible - 43 test programs
  • Random/constrained-random techniques require an
    order of magnitude more test programs to cover
    these faults.

102
Functional Fault Models
  • Fault model for register read/write
  • Registers should be written and read correctly
  • Fault model for operation execution
  • Operations must execute correctly
  • Fault model for execution path
  • An operation must execute correctly in all
    supported paths (pipeline and data-transfer).
  • Fault model for pipeline execution
  • The pipeline should produce correct results in
    the presence of multiple interactions

103
Test Generation for Register Read/Write
  • Algorithm
  • Input: Graph model of the processor, G
  • Output: Test programs for detecting faults in
    the register read/write function
  • begin
  • TestProgramList = { }
  • for each register reg in processor G
  • value_reg = generateUniqueValue(reg)
  • writeInst = an instruction that writes value_reg
    into register reg
  • testprog_reg = createTestProgram(writeInst)
  • TestProgramList = TestProgramList ∪ testprog_reg
  • endfor
  • return TestProgramList
  • end
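The algorithm above can be prototyped directly. Below is a hedged Python sketch in which the graph model is reduced to a list of register names and instructions and test programs are stubbed out as dictionaries; all helper names and encodings are assumptions, not the actual framework's API.

```python
# Hedged sketch of register read/write test generation: for each
# register, generate a unique value, create a write instruction, and
# wrap it into a test program that reads the value back. The dict-based
# instruction encoding is an illustrative assumption.

def generate_unique_value(index, width=32):
    # A simple unique-per-register pattern; real generators may differ.
    return (0xDEAD0000 + index) & ((1 << width) - 1)

def create_test_program(write_inst):
    # A test program: perform the write, then read the register back
    # and compare against the expected value.
    reg, value = write_inst["dest"], write_inst["value"]
    return [
        write_inst,
        {"opcode": "read_and_compare", "src": reg, "expected": value},
    ]

def generate_register_rw_tests(registers):
    test_program_list = []
    for i, reg in enumerate(registers):
        value = generate_unique_value(i)
        write_inst = {"opcode": "movi", "dest": reg, "value": value}
        test_program_list.append(create_test_program(write_inst))
    return test_program_list

tests = generate_register_rw_tests([f"R{i}" for i in range(4)])
print(len(tests))           # 4: one test program per register
print(tests[0][0]["dest"])  # R0
```

Using a value unique to each register lets the read-back step detect not only stuck bits but also writes that land in the wrong register.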

104
Publications
[Timeline figure, 2001-2004, mapping the
publications onto architecture specification,
specification validation, model generation, and
design validation: exploration and static
behavior (ACM TECS, IEEE Design & Test, DATE);
simulator generation and dynamic behavior
(CODES+ISSS, DAC, Kluwer DAES); equivalence
checking (MTV); hardware generation (RSP, VLSI
Design); test generation (HLDVT, DATE, ASP-DAC);
symbolic simulation (MTV); memory specification
(VLSI Design, HLDVT, ISSS); coprocessor
specification (SASIMI).]
http://www.ics.uci.edu/~pmishra
105
Publications
  • JOURNALS
  • P. Mishra et al., Processor-memory co-exploration
    using an architecture description language. ACM
    Transactions on Embedded Computing Systems
    (TECS), 3(1), 2004.
  • P. Mishra et al., Modeling and validation of
    pipeline specifications. ACM TECS, 3(1), 2004.
  • P. Mishra et al., A top-down methodology for
    validation of microprocessors. IEEE Design & Test
    of Computers, 2004.
  • P. Mishra et al., Towards automatic validation of
    dynamic behavior in pipeline specifications.
    Kluwer Design Automation for Embedded Systems
    (DAES), 8(2), 2003.
  • P. Mishra et al., Functional abstraction driven
    design space exploration of programmable embedded
    systems, Under revision in ACM TODAES.
  • P. Mishra et al., A Methodology for Validation of
    Microprocessors using Symbolic Simulation,
    Inderscience International Journal of Embedded
    Systems (IJES), 2004. (Invited paper)
  • BOOK CHAPTER
  • P. Mishra et al., Modeling and verification of
    pipelined embedded processors in the presence of
    hazards and exceptions, in Design and Analysis of
    Distributed Embedded Systems, Bernd Kleinjohann
    et al., Editors, Kluwer Academic Publishers,
    2002.
  • CONFERENCES
  • P. Mishra et al., Graph-based functional test
    program generation for pipelined processors,
    DATE, 2004.
  • P. Mishra et al., Synthesis-driven exploration of
    pipelined embedded processors, VLSI Design, 2004.
  • M. Reshadi, P. Mishra, and N. Dutt, Instruction
    set compiled simulation: a technique for fast and
    flexible instruction set simulation, DAC, 2003.

106
Publications
  • M. Reshadi, N. Bansal, P. Mishra, and N. Dutt, An
    efficient retargetable framework for
    instruction-set simulation, CODES+ISSS, 2003.
  • P. Mishra et al., Automatic verification of
    in-order execution in microprocessors with
    fragmented pipelines and multi-cycle functional
    units. DATE, 2002.
  • P. Mishra et al., Automatic Modeling and
    Validation of Pipeline Specifications driven by
    an Architecture Description Language. ASP-DAC /
    VLSI Design, 2002.
  • P. Mishra et al., Processor-Memory Co-Exploration
    driven by a Memory-Aware Architecture Description
    Language, VLSI Design, 2001.
  • P. Mishra et al., Functional Abstraction driven
    Design Space Exploration of Heterogeneous
    Programmable Architectures, ISSS, 2001.
  • WORKSHOPS
  • P. Mishra et al., Rapid Exploration of Pipelined
    Processors through Automatic Generation of
    Synthesizable RTL Models, IEEE Workshop on Rapid
    System Prototyping (RSP), 2003.
  • P. Mishra et al., A Methodology for Validation of
    Microprocessors using Equivalence Checking, IEEE
    Workshop on Microprocessor Test and Verification
    (MTV), 2003.
  • P. Mishra et al., A Property Checking Approach to
    Microprocessor Verification using Symbolic
    Simulation, Microprocessor Test and Verification
    (MTV), 2002.
  • P. Mishra et al., Automatic Functional Test
    Program Generation for Pipelined Processors using
    Model Checking, IEEE High Level Design Validation
    and Test (HLDVT), 2002.
  • P. Mishra et al., Automatic Validation of
    Pipeline Specifications. HLDVT, 2001.
  • P. Mishra et al., ADL driven Design Space
    Exploration in the Presence of Coprocessors,
    Synthesis and System Integration of Mixed
    Technologies (SASIMI), 2001.

107
Architecture Description Languages
  • Behavior-Centric ADLs
  • ISPS, nML, ISDL, SCP/ValenC, ...
  • primarily capture Instruction Set (IS)
  • good for regular architectures; provide the
    programmer's view
  • tedious for irregular architectures; hard to
    specify pipelining
  • Structure-Centric ADLs
  • MIMOLA, ...
  • primarily capture architectural structure
  • specify pipelining; drive code generation and
    architecture synthesis
  • hard to extract the IS view
  • Mixed-Level ADLs
  • LISA, RADL, FLEXWARE, MDes, EXPRESSION,
  • combine benefits of both
  • generate simulator and/or compiler

108
An Example Embedded System
Digital Camera Block Diagram
[Block diagram: processor, memory, and
coprocessors]
109
Design Complexity
Design complexity is increasing at an exponential
rate.
110
(No Transcript)
111
Use of Silicon Power