Title: Automatic Translation of Behavioral Testbench for Fully Accelerated Simulation
- November 8, 2004
- Young-Il Kim and Chong-Min Kyung
- Dept. of EECS, KAIST
- ICCAD 2004
V1.4
2. Outline
- Introduction
- Motivation
- Previous Work
- Basic Principle
- Simulation Reference Model
- Hardware Architecture
- Automatic Translation Process
- Statement Translation Table
- Equivalent Hardware Generation
- Conclusion
3. Introduction
- Simulation acceleration
- Offloads the software simulator and speeds up overall simulation performance using dedicated hardware.
- The DUT is executed on the hardware accelerator.
- The testbench still resides in the software simulator.
- Communication overhead becomes a new bottleneck.
[Figure: <Simulation>: testbench and DUT both in the software simulator. <Simulation acceleration>: testbench in the software simulator, DUT in the hardware accelerator, with communication overhead between them.]
4. Previous Work
- Fully accelerated testbench
- The whole testbench is moved into the hardware accelerator.
- The testbench is described as behavioral commands, represented as instructions stored in ROM.
- A behavioral testbench becomes applicable to the accelerator while attaining the performance of a synthesizable testbench.
- The transactor (PGU) has to be produced manually by the designer.
[Figure: block diagram with DUT, Synchronization Unit, Transaction Unit, Protocol Generator Unit (PGU), Micro Sequencer Unit, ROM, and RAM.]
5. Previous Work
- Multiple processor-based accelerator
- To achieve speedup, multiple processors perform the simulation in parallel.
- Behavioral HDL syntax is applicable to the accelerator.
- Performance limitation
- As the number of processors increases, the communication overhead among processors becomes the bottleneck.
[Figure: Global Programmable Interconnect, two Local Interconnects with FPGA arrays, and two Emulation Modules, each with RAM, CPU, and FPGA.]
6. Objective
- Mapping of a general HDL testbench to FPGA
- The synthesizable HDL subset is not sufficient to build a testbench.
- Unsynthesizable syntax is inevitable:
- Time delay
- Initial statement
- Loop with a dynamic number of iterations
- We propose to execute the behavioral testbench in FPGA without losing compatibility with the HDL software simulator.
- An automatic translator converts the testbench into a synthesizable one.
7. Tool Flow of Proposed Method
- The DUT is synthesizable, but the testbench is not.
- The testbench is translated into a synthesizable one.
- The DUT and the translated testbench are merged and synthesized by an RTL synthesis tool.
- The whole test environment finally runs in an FPGA-based hardware accelerator.
[Figure: tool flow. HDL Testbench -> Proposed Translator -> Synthesizable Testbench; Synthesizable Testbench and DUT -> Synthesis -> Map and PR -> FPGA.]
8. Verilog Simulation Reference Model
    while (there are events) {
        if (no active events) {
            if (there are inactive events) {
                activate all inactive events;
            } else if (there are nonblocking assign update events) {
                activate all nonblocking assign update events;
            } else if (there are monitor events) {
                activate all monitor events;
            } else {
                advance T to the next event time;
                activate all inactive events for time T;
            }
        }
        E = any active event;
        if (E is an update event) {
            update the modified object;
            add evaluation events for sensitive processes to event queue;
        } else {
            evaluate the process;
            add update events to the event queue;
        }
    }
Source: IEEE Std 1364-2001, IEEE Standard Verilog Hardware Description Language
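As a rough illustration of the queue priorities in the reference model, the sketch below drains a toy event queue in Python. The class name `Scheduler`, its queue attributes, and the string event labels are all hypothetical; the evaluate/update bodies are omitted, so only the order in which the queues are activated is shown.

```python
from collections import deque


class Scheduler:
    """Toy event queue; events are plain labels (hypothetical, for illustration)."""

    def __init__(self):
        self.time = 0
        self.active = deque()
        self.inactive = deque()
        self.nonblocking = deque()
        self.monitor = deque()
        self.future = {}  # simulation time -> non-empty list of events

    def has_events(self):
        return bool(self.active or self.inactive or self.nonblocking
                    or self.monitor or self.future)

    def _activate(self, queue):
        # Move every event of the given queue into the active queue.
        self.active.extend(queue)
        queue.clear()

    def run(self):
        processed = []
        while self.has_events():
            if not self.active:
                if self.inactive:
                    self._activate(self.inactive)
                elif self.nonblocking:
                    self._activate(self.nonblocking)
                elif self.monitor:
                    self._activate(self.monitor)
                else:
                    # Advance T to the next event time.
                    self.time = min(self.future)
                    self.active.extend(self.future.pop(self.time))
            event = self.active.popleft()
            # A full simulator would evaluate or update here and enqueue
            # the resulting events; this sketch only records the order.
            processed.append((self.time, event))
        return processed


sched = Scheduler()
sched.monitor.append("monitor")
sched.nonblocking.append("nonblocking update")
sched.inactive.append("inactive (#0)")
sched.active.append("active")
sched.future[5] = ["event at T=5"]
order = sched.run()
print(order)
```

Events are processed in the order active, inactive, nonblocking update, monitor, and only then does the time advance.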
9. Basic Principle
- The simulation procedure is an iteration between evaluation and update.
- We introduce an emulation clock to drive the simulation procedure.
- 1E. Evaluate RHS of statement 1
- 1U. Update LHS of statement 1
- 2E. Evaluate RHS of statement 2
- 2U. Update LHS of statement 2
- 3E. Evaluate RHS of statement 3
- 3U. Update LHS of statement 3
Example Verilog code (statements 1, 2, 3):
    forever clk = #1 ~clk;
    always @(posedge clk) a = a + 1;
    assign b = a;
[Figure: timing diagram of clk, a, and b against simulation time, showing the evaluate/update steps (1E, 1U, 2E, 2U, 3E, 3U) executed in each time step, separated by "No event" markers.]
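The evaluate/update iteration can be sketched in Python for a three-statement example like the one on this slide: a clock toggled every time unit, a counter `a` incremented on each rising clock edge, and `b` continuously assigned from `a`. This is a simplified model under stated assumptions, not the authors' implementation: the delay is modeled as one toggle per time step, each E or U step is taken to cost one emulation clock cycle, and the function name `simulate` is hypothetical.

```python
def simulate(num_time_steps):
    """Record the evaluate (E) and update (U) steps of each time step."""
    values = {"clk": 0, "a": 0, "b": 0}
    # (statement id, trigger test, target signal, RHS function)
    triggered = [
        (2, lambda sig, old, new: sig == "clk" and old == 0 and new == 1,
         "a", lambda v: v["a"] + 1),   # always @(posedge clk) a = a + 1
        (3, lambda sig, old, new: sig == "a",
         "b", lambda v: v["a"]),       # assign b = a
    ]
    log = []
    for t in range(1, num_time_steps + 1):
        # Statement 1 toggles clk once per time unit (simplified delay model).
        steps = ["1E"]
        pending = [(1, "clk", 1 - values["clk"])]
        while pending:
            stid, target, new = pending.pop(0)
            old = values[target]
            values[target] = new       # update step
            steps.append(f"{stid}U")
            if new != old:
                for sid, fires, tgt, rhs in triggered:
                    if fires(target, old, new):
                        # evaluate step of the triggered statement
                        pending.append((sid, tgt, rhs(values)))
                        steps.append(f"{sid}E")
        log.append((t, steps, dict(values)))
    return log


for t, steps, values in simulate(3):
    print(t, " ".join(steps), values)
```

In this model a time step with a rising clock edge takes six steps (1E 1U 2E 2U 3E 3U), while a falling edge takes only two (1E 1U), since no other statement is sensitive to it.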
10. Basic Principle
- Hardware implementation
- Evaluate and update are performed alternately.
- Each HDL assignment is translated into a 2-bit shift register:
- A chain of two enabled registers in cascade
- The left register is used for the evaluate function.
- The right register is used for the update function.
[Figure: translation of the Verilog code "LHS = RHS" into the equivalent assignment hardware: RHS -> Evaluate Register (EN driven by evaluate) -> Update Register (EN driven by update) -> LHS, both registers clocked by eclk.]
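The two-register chain can be modeled behaviorally in Python as a minimal sketch. The class name `AssignmentHardware` and the `clock` method are hypothetical names chosen for illustration; the model assumes both registers sample on the same emulation clock edge, as in an ordinary shift register.

```python
class AssignmentHardware:
    """Two enabled registers in cascade, modelling one HDL assignment."""

    def __init__(self, initial=0):
        self.evaluate_reg = initial  # holds the evaluated RHS value
        self.update_reg = initial    # holds the visible LHS value

    def clock(self, rhs, evaluate_en, update_en):
        """Apply one rising edge of the emulation clock (eclk)."""
        # Both registers sample on the same edge, so the update register
        # sees the evaluate register's value from before this edge.
        next_update = self.evaluate_reg if update_en else self.update_reg
        next_evaluate = rhs if evaluate_en else self.evaluate_reg
        self.evaluate_reg = next_evaluate
        self.update_reg = next_update
        return self.update_reg


hw = AssignmentHardware()
after_evaluate = hw.clock(rhs=5, evaluate_en=True, update_en=False)
after_update = hw.clock(rhs=9, evaluate_en=False, update_en=True)
print(after_evaluate, after_update)
```

The LHS value changes only on the update cycle, one emulation clock after the RHS was captured, which is what lets evaluate and update alternate.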
11. Example
Example Verilog code:
    forever clk = #1 ~clk;
    always @(posedge clk) a = a + 1;
    assign b = a;
[Figure: equivalent assignment hardware. Each of clk, a, and b is held in an Evaluate Register followed by an Update Register, each with its own EN input.]
12. Translated Hardware Architecture
[Figure: block diagram of the translated hardware. Blocks: Evaluate Logic, Evaluate Trigger, Update Trigger, Local Interconnection, and Event Detect, joined by a Value Interconnection and an Event Interconnection.]
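13. Evaluate Logic Block
- Evaluate logic is in charge of producing the RHS value.
Example Verilog code:
    forever clk = #1 ~clk;
    always @(posedge clk) a = a + 1;
    assign b = a;
[Figure: Evaluate Register and Update Register pairs for clk, a, and b; the evaluate logic in front of a's Evaluate Register computes a + 1 from a and the constant 1.]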
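14. Evaluate Trigger Block
- The evaluate trigger determines when to enable the evaluate register.
Example Verilog code:
    always @(clk) b = a;
    always @(b) c = b;
    always @(b or c) if (a == 1) d = c;
[Figure: Evaluate Register and Update Register pairs for b, c, and d, with the EN inputs of the evaluate registers driven by the evaluate triggers.]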
15. Update Trigger Block
- The update trigger determines when to enable the update register.
- Blocking assignment, e.g. b = a
- Nonblocking assignment, e.g. b <= a: the update waits and is activated when there are no more active events.
[Figure: Evaluate Register and Update Register pair from a to b for each assignment type; in the nonblocking case a wait element holds the update enable.]
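The difference between the two update triggers can be written as a small decision function. This is a sketch under an assumption: the slide only states the rule for nonblocking assignments (wait until there are no more active events), so the blocking case is assumed here to update as soon as its evaluate step has completed. The function name `update_enable` and its parameters are hypothetical.

```python
def update_enable(assign_type, evaluated, active_events_pending):
    """Return True if the update register should be enabled this cycle."""
    if not evaluated:
        # Nothing has been captured in the evaluate register yet.
        return False
    if assign_type == "blocking":
        # Assumption: a blocking assignment updates right after evaluation.
        return True
    if assign_type == "nonblocking":
        # Wait until no active events remain in the current time step.
        return not active_events_pending
    raise ValueError(f"unknown assignment type: {assign_type}")


print(update_enable("blocking", True, True))
print(update_enable("nonblocking", True, True))
print(update_enable("nonblocking", True, False))
```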
16. Event Detect Block
- The event detector generates an event signal from a value signal.
- The event signal is high for only one emulation clock cycle.
- We implement this block using an XOR gate with two inputs:
- A value signal
- The same signal delayed by one emulation clock cycle
[Figure: event detector. The Value Signal and a copy delayed by one eclk period feed an XOR gate whose output is the Event Signal.]
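The XOR-with-delay scheme is easy to check with a cycle-by-cycle model. The sketch below treats the value signal as a list of single-bit samples, one per emulation clock cycle; the function name `detect_events` and the choice to initialize the delay element to the first sample are assumptions made for illustration.

```python
def detect_events(values):
    """Return one event bit per emulation clock cycle.

    The event bit is the XOR of the current value and the value from the
    previous cycle, so it is 1 only in the cycle where the value changes.
    """
    events = []
    delayed = values[0]  # assumption: delay element starts at the first sample
    for value in values:
        events.append(value ^ delayed)
        delayed = value  # the delay element holds the value for one cycle
    return events


value_signal = [0, 0, 1, 1, 1, 0, 0]
event_signal = detect_events(value_signal)
print(event_signal)
```

Each change of the value signal produces an event pulse that lasts exactly one cycle, regardless of how long the new value is held.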
17. Local Interconnection
- Routes signals from evaluate registers to the update register.
- When an LHS variable is driven by multiple statements, a single update register is shared via the local interconnection.
Example Verilog code:
    a = a + 1;
    initial begin
        clk = 0;
        forever clk = ~clk;
    end
[Figure: for a single driver, Evaluate Logic, Evaluate Register, and Evaluate Trigger feed one Update Register through the Local Interconnection and Update Trigger; for clk, two sets of Evaluate Logic, Evaluate Register, Evaluate Trigger, and Update Trigger share one Update Register through the Local Interconnection.]
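One way to picture a shared update register is as a multiplexer selected by the update triggers of the driving statements. The sketch below is only an illustration of that idea: the function name `shared_update_register` is hypothetical, and the rule that the first active trigger wins is an assumption, since the slides do not say how simultaneous triggers are resolved.

```python
def shared_update_register(current, sources):
    """Compute the next value of an update register shared by several drivers.

    sources is a list of (update_trigger, evaluate_register_value) pairs,
    one pair per statement that assigns to the same LHS variable.
    """
    for trigger, value in sources:
        if trigger:
            return value  # assumption: the first active trigger wins
    return current        # no trigger active: the register holds its value


# clk is driven by an initializing statement and by a toggling statement.
clk = 0
clk = shared_update_register(clk, [(False, 0), (True, 1)])
print(clk)
```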
18. Automatic Translation Process
- First step: statement information table
- From each statement, we build one row of the statement information table.
- The table is composed of four main fields:
- Statement ID (STID)
- Preceding statement ID (PRE_STID): indicates when the current statement starts to operate
- Statement type: assignment, time control, event control
- Contents: LHS, RHS, assign type, delay value
19. Automatic Translation Process
- Second step: split the statement information table into the following three tables according to statement type.
- The assignment table holds information on assignment statements, e.g. a = b
- The time control table holds information on time control statements, e.g. #1
- The event control table holds information on event control, e.g. @(posedge clk)
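The first two translation steps can be sketched as plain data handling in Python. The row layout follows the four fields named in the slides (STID, PRE_STID, statement type, contents); the class name `StatementRow`, the type strings, and the keys inside `contents` are hypothetical choices, and the three example rows are illustrative rather than taken from the authors' tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StatementRow:
    stid: int                 # statement ID
    pre_stid: Optional[int]   # preceding statement ID, None for the first
    stmt_type: str            # "assignment", "time_control", "event_control"
    contents: dict            # LHS, RHS, assign type, delay value, ...


def split_tables(rows):
    """Split the statement information table by statement type."""
    tables = {"assignment": [], "time_control": [], "event_control": []}
    for row in rows:
        tables[row.stmt_type].append(row)
    return tables


# Illustrative rows for the sequence: @(posedge clk), then #1, then a = b
rows = [
    StatementRow(1, None, "event_control", {"event": "posedge clk"}),
    StatementRow(2, 1, "time_control", {"delay": 1}),
    StatementRow(3, 2, "assignment",
                 {"lhs": "a", "rhs": "b", "assign_type": "blocking"}),
]
tables = split_tables(rows)
for name, table in tables.items():
    print(name, [row.stid for row in table])
```

Keeping PRE_STID in every row preserves the ordering between statements after the split, so each table entry still records which statement must finish before it starts.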
20. Automatic Translation Process
- Third step: equivalent hardware generation
21. Emulation Cycle Consumption
- Example of delta delay graph
Example Verilog code:
    forever clk <= #1 ~clk;
    always @(posedge clk) begin a = a + 1; b <= b + 1; end
    always @(b) c <= b;
    always @(posedge clk) begin d <= c; end
    always @(d) e <= d;
    always @(e) f <= e;
    always @(posedge clk) begin g <= f; end
    always @(g) h = g;
[Figure: delta delay graph starting from Clk, with evaluate/update pairs 1E/1U through 8E/8U and delta steps numbered 1 to 8.]
Δ1 + Δ2 + Δ3 + Δ4 = max(3,1,1) + max(2,2,2) + max(1,2,1) + 1 = 3 + 2 + 2 + 1 = 8
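The arithmetic on this slide appears to add up four terms, each the maximum over a group of delta counts, giving 3 + 2 + 2 + 1 = 8. The sketch below reproduces that sum; reading each group as a set of concurrently running branches whose slowest member sets the cost is an interpretation, and the function name `emulation_cycles` is hypothetical.

```python
def emulation_cycles(delta_groups):
    """Sum, over all groups, the largest delta count in each group."""
    return sum(max(group) for group in delta_groups)


# The four groups as they appear on the slide.
groups = [(3, 1, 1), (2, 2, 2), (1, 2, 1), (1,)]
total = emulation_cycles(groups)
print(total)
```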
22. Experimental Results
- We applied typical testbenches, which generate stimulus patterns for the DUT and check the responses.
- The testbenches have different numbers of interconnections between testbench and DUT.
23. Conclusion
- Performance enhancement of hardware-accelerated simulation is critically limited by communication overhead and testbench execution.
- Testbenches are commonly written with unsynthesizable HDL constructs such as time delay, event control, and loops.
- To move the testbench into the accelerator, we proposed an automated procedure that translates an unsynthesizable testbench into a synthesizable one.
- In experiments, simulation time is reduced by a factor of up to 1000 compared to the conventional accelerated-simulation method.
25. Event Scheduling
26. Event Scheduling
- Proposed hardware emulator
- Event processing
- Schedule event
- No more event
- Simulation time update
[Figure: event scheduling in the proposed hardware emulator, showing the Simulation Time value (0, 2) and the scheduled events.]
27. Simulator/Accelerator Partition
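28. Statement Translation
Example Verilog code:
    always @(posedge clk) a = #1 a + 1;
[Figure: the statement's RHS, assignment type, and event control are mapped onto the hardware blocks: Evaluate Logic, Evaluate Register, Update Register, Evaluate Trigger, Local Interconnection, Event Detect, and Update Trigger.]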