1
Automatic Translation of Behavioral Testbench for
Fully Accelerated Simulation
  • November 8, 2004
  • Young-Il Kim and Chong-Min Kyung
  • Dept. of EECS, KAIST
  • ICCAD 2004

V1.4
2
Outline
  • Introduction
  • Motivation
  • Previous Work
  • Basic Principle
  • Simulation Reference Model
  • Hardware Architecture
  • Automatic Translation Process
  • Statement Translation Table
  • Equivalent Hardware Generation
  • Conclusion

3
Introduction
  • Simulation acceleration
  • Offloads the software simulator and speeds up
    overall simulation performance using dedicated
    hardware.
  • The DUT is executed on the hardware accelerator.
  • The testbench still resides in the software simulator.
  • Communication overhead becomes the new bottleneck.

[Figure: <Simulation>: testbench and DUT both run in the software simulator. <Simulation acceleration>: the testbench runs in the software simulator and the DUT in the hardware accelerator, with communication overhead between them.]
4
Previous Work
  • Fully accelerated testbench
  • The whole testbench is moved into the hardware
    accelerator.
  • The testbench is described as behavioral commands,
    represented as instructions stored in ROM.
  • A behavioral testbench is applicable to the
    accelerator while attaining the performance of a
    synthesizable testbench.
  • The transactor (PGU) has to be produced manually by
    the designer.

[Figure: fully accelerated testbench. Blocks shown: DUT, Synchronization Unit, two Transaction Units, Protocol Generator Unit (PGU), Micro Sequencer Unit, ROM, RAM.]
5
Previous Work
  • Multiple processor-based accelerator
  • To achieve speedup, multiple processors perform
    simulation in parallel.
  • Behavioral HDL syntax is applicable to the
    accelerator.
  • Performance limitation
  • As the number of processors increases, the
    communication overhead among processors becomes
    the bottleneck.

[Figure: multiple processor-based accelerator. Blocks shown: two Emulation Modules (each with CPU, RAM, FPGA, FPGA array, and Local Interconnect) and a Global Programmable Interconnect.]
6
Objective
  • Mapping of a general HDL testbench to FPGA
  • The synthesizable HDL subset is not sufficient to
    build a testbench.
  • Unsynthesizable syntax is inevitable:
  • Time delay
  • Initial statement
  • Loop with a dynamic number of iterations
  • We propose executing the behavioral testbench in the
    FPGA without losing compatibility with the HDL
    software simulator.
  • An automatic translator converts the testbench into a
    synthesizable one.

7
Tool Flow of Proposed Method
  • The DUT is synthesizable, but the testbench is not.
  • The testbench is translated into a synthesizable one.
  • The DUT and the translated testbench are merged and
    synthesized by an RTL synthesis tool.
  • The whole test environment finally runs in an FPGA-based
    hardware accelerator.

[Figure: tool flow. HDL Testbench -> Proposed Translator -> Synthesizable Testbench; Synthesizable Testbench + DUT -> Synthesis -> Map and PR -> FPGA.]
8
Verilog Simulation Reference Model
while (there are events) {
    if (no active events) {
        if (there are inactive events) {
            activate all inactive events;
        } else if (there are nonblocking assign update events) {
            activate all nonblocking assign update events;
        } else if (there are monitor events) {
            activate all monitor events;
        } else {
            advance T to the next event time;
            activate all inactive events for time T;
        }
    }
    E = any active event;
    if (E is an update event) {
        update the modified object;
        add evaluation events for sensitive processes to event queue;
    } else {
        evaluate the process;
        add update events to the event queue;
    }
}

Source: IEEE 1364-2001, IEEE Standard Verilog Hardware Description Language
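
The ordering rules of the reference model can be tried out in a few lines of Python. This is only a rough sketch, not the authors' tool or a real simulator: the `Scheduler` class, its queue names, and the `demo` function are hypothetical, and events are reduced to plain callables instead of distinct update and evaluation events.

```python
import heapq
from collections import deque


class Scheduler:
    """Toy stratified event queue (hypothetical names, illustration only)."""

    def __init__(self):
        self.time = 0
        self.active = deque()
        self.inactive = deque()
        self.nonblocking = deque()
        self.monitor = deque()
        self.future = []          # heap of (time, sequence number, event)
        self._seq = 0
        self.log = []

    def schedule(self, delay, event):
        """Queue an event for a later simulation time."""
        heapq.heappush(self.future, (self.time + delay, self._seq, event))
        self._seq += 1

    def pending(self):
        return bool(self.active or self.inactive or self.nonblocking
                    or self.monitor or self.future)

    def run(self, until):
        while self.pending():
            if not self.active:
                if self.inactive:
                    self.active.extend(self.inactive)
                    self.inactive.clear()
                elif self.nonblocking:
                    self.active.extend(self.nonblocking)
                    self.nonblocking.clear()
                elif self.monitor:
                    self.active.extend(self.monitor)
                    self.monitor.clear()
                else:
                    # Advance T to the next event time and activate its events.
                    if self.future[0][0] > until:
                        break
                    self.time = self.future[0][0]
                    while self.future and self.future[0][0] == self.time:
                        self.active.append(heapq.heappop(self.future)[2])
            event = self.active.popleft()
            event(self)           # an event may add further events


def demo():
    s = Scheduler()

    def nba_update(s):
        s.log.append((s.time, "nonblocking update"))

    def monitor(s):
        s.log.append((s.time, "monitor"))

    def process(s):
        s.log.append((s.time, "evaluate"))
        s.monitor.append(monitor)
        s.nonblocking.append(nba_update)

    def later(s):
        s.log.append((s.time, "evaluate"))

    s.active.append(process)
    s.schedule(2, later)
    s.run(until=10)
    return s.log
```

Calling `demo()` shows the priority order within one time step (active, then nonblocking updates, then monitor events) before the time advances to the next scheduled event.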
9
Basic Principle
  • The simulation procedure iterates between
    evaluation and update.
  • We introduce an emulation clock to drive the
    simulation procedure.

Example statements (numbered 1 to 3):
1: forever clk = #1 ~clk;
2: always @(posedge clk) a = a + 1;
3: assign b = a;

Steps:
1E. Evaluate RHS of statement 1
1U. Update LHS of statement 1
2E. Evaluate RHS of statement 2
2U. Update LHS of statement 2
3E. Evaluate RHS of statement 3
3U. Update LHS of statement 3

[Figure: timing diagram of the evaluate (E) and update (U) steps of statements 1 to 3 against simulation time, with waveforms for clk, a, and b; intervals without pending events are marked "No event".]
10
Basic Principle
  • Hardware implementation
  • Evaluate and update are performed
    alternately.
  • Each HDL assignment is translated into a 2-bit
    shift register:
  • A chain of two enabled registers in cascade
  • The left register performs the evaluate function.
  • The right register performs the update function.

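[Figure: translation of the Verilog assignment LHS = RHS into Equivalent Assignment Hardware: RHS feeds the Evaluate Register (enabled by evaluate), which feeds the Update Register (enabled by update), whose output is LHS; both registers are clocked by eclk.]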
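
The two-register cascade can be modeled in software to see how one assignment consumes an evaluate cycle and an update cycle. The following is a minimal sketch under stated assumptions: the class name `AssignmentHardware`, the method `eclk`, and the helper `run_increment` are hypothetical, and both registers are assumed to sample on the same emulation clock edge, as in a shift register.

```python
class AssignmentHardware:
    """Evaluate register feeding an update register (illustrative model)."""

    def __init__(self, init=0):
        self.evaluate_reg = init   # holds the evaluated RHS
        self.update_reg = init     # drives the LHS

    def eclk(self, rhs, evaluate_en, update_en):
        """Apply one emulation clock edge and return the LHS value."""
        # Compute both next values first so the registers sample simultaneously.
        next_eval = rhs if evaluate_en else self.evaluate_reg
        next_upd = self.evaluate_reg if update_en else self.update_reg
        self.evaluate_reg, self.update_reg = next_eval, next_upd
        return self.update_reg


def run_increment(times):
    """Execute a = a + 1 repeatedly: one evaluate cycle, then one update cycle."""
    hw = AssignmentHardware(init=0)
    trace = []
    for _ in range(times):
        hw.eclk(rhs=hw.update_reg + 1, evaluate_en=True, update_en=False)
        trace.append(hw.eclk(rhs=0, evaluate_en=False, update_en=True))
    return trace
```

Here `run_increment(3)` returns the LHS value after each update cycle; the LHS only changes on cycles where the update enable is high.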
11
Example
Example Verilog Code
forever clk = #1 ~clk;
always @(posedge clk) a = a + 1;
assign b = a;

[Figure: Equivalent Assignment Hardware: one Evaluate Register / Update Register pair, each register with an enable (EN), for each of clk, a, and b.]
12
Translated Hardware Architecture
[Figure: translated hardware architecture. Blocks shown: Evaluate Logic, Evaluate Trigger, Update Trigger, Local Interconnection, and Event Detect, linked by a Value Interconnection and an Event Interconnection.]
13
Evaluate Logic Block
  • Evaluate logic is in charge of producing RHS
    value

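forever clk = #1 ~clk;
always @(posedge clk) a = a + 1;
assign b = a;

[Figure: evaluate logic for each statement of the example, feeding that statement's Evaluate Register and Update Register (each with an enable, EN); for the second statement the evaluate logic computes a + 1 from a and 1.]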
14
Evaluate Trigger Block
  • The evaluate trigger determines when to enable the
    evaluate register.

always @(clk) b = a;
always @(b) c = b;
always @(b or c) if (a == 1) d = c;

[Figure: Evaluate Register / Update Register pairs, each register with an enable (EN), for b, c, and d.]
15
Update Trigger Block
  • The update trigger determines when to enable the
    update register.
  • Blocking assignment (b = a)
  • Nonblocking assignment (b <= a): the update waits and is
    activated when there are no more active events.

[Figure: Evaluate Register / Update Register pair, each register with an enable (EN), for b = a and for b <= a; in the nonblocking case the update enable waits.]
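
The difference between the two update triggers shows up in the classic swap example. The sketch below is a software analogy, not the generated hardware: the function `execute`, its tuple format, and the variable names are hypothetical. It assumes a blocking update fires right after its evaluate step, while nonblocking updates are held until no active events remain.

```python
def execute(statements, values):
    """Run (lhs, rhs, kind) statements; kind is 'blocking' or 'nonblocking'."""
    values = dict(values)
    waiting = []                         # evaluated values held for later update
    for lhs, rhs, kind in statements:
        evaluated = values[rhs]          # evaluate step: sample the RHS
        if kind == "blocking":
            values[lhs] = evaluated      # update step follows immediately
        else:
            waiting.append((lhs, evaluated))
    # No more active events: release all waiting nonblocking updates.
    for lhs, evaluated in waiting:
        values[lhs] = evaluated
    return values


START = {"a": 1, "b": 2}
BLOCKING = execute([("a", "b", "blocking"), ("b", "a", "blocking")], START)
NONBLOCKING = execute([("a", "b", "nonblocking"), ("b", "a", "nonblocking")], START)
```

With blocking assignments both variables end up holding the old value of b, whereas the nonblocking version swaps them because every RHS is sampled before any update is applied.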
16
Event Detect Block
  • The event detector generates an event signal from a
    value signal.
  • The event signal is high for only one emulation
    clock cycle.
  • We implement this block using an XOR gate with
    two inputs:
  • The value signal
  • The same signal delayed by one emulation clock
    cycle

[Figure: Event Detector: the value signal and a copy delayed by 1 eclk period feed an XOR gate, whose output is the event signal.]
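
The XOR-with-delay scheme is easy to check in software. This is a minimal sketch; the names `EventDetector`, `eclk`, and `detect` are hypothetical, and the delay register is assumed to start at the first sample so that no event is reported on the first cycle.

```python
class EventDetector:
    """XOR of a value signal with its one-cycle-delayed copy."""

    def __init__(self, init=0):
        self.delayed = init              # value delayed by one eclk period

    def eclk(self, value):
        """Return True for exactly the cycle in which the value changes."""
        event = (value ^ self.delayed) != 0
        self.delayed = value             # the delay register captures the value
        return event


def detect(values):
    """Feed one sample per emulation clock cycle and collect the event signal."""
    detector = EventDetector(init=values[0])
    return [detector.eclk(v) for v in values]
```

For the sample sequence `[0, 0, 1, 1, 1, 0]` the event signal is high only on the two cycles where the value changes, for both the rising and the falling transition.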
17
Local Interconnection
  • Routes signals from the evaluate registers to the
    update register
  • When an LHS variable is driven by multiple
    statements,
  • a single update register is shared via the local
    interconnection.

a = a + 1;
initial begin clk = 0; forever clk = ~clk; end

[Figure: Evaluate Logic, Evaluate Register, and Evaluate Trigger blocks connected through Local Interconnection and Update Trigger blocks to Update Registers.]
18
Automatic Translation Process
  • First step: statement information table
  • From each statement, we build a row of the statement
    information table.
  • The table is composed of four main fields:
  • Statement ID (STID)
  • Preceding statement ID (PRE_STID)
  • Indicates when the current statement starts to
    operate
  • Statement type: assignment, time control, event
    control
  • Contents: LHS, RHS, assign type, delay value

19
Automatic Translation Process
  • Second step: split the statement information table
    into the following three tables according to
    statement type.
  • The assignment table includes information on
    assignment statements.
  • e.g. a = b
  • The time control table includes information on time
    control statements.
  • e.g. #1
  • The event control table includes information on event
    controls.
  • e.g. @(posedge clk)
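
The first two translation steps can be illustrated with a small data-structure sketch. Everything here is an assumption made for illustration: the `StatementRow` class, the `split_tables` function, the dictionary keys, and the example statement `always @(posedge clk) #1 a = b;` (assembled from the three examples above) are hypothetical, and the translator's actual table layout is not given on the slides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StatementRow:
    stid: int                  # statement ID (STID)
    pre_stid: Optional[int]    # preceding statement ID (PRE_STID), None if first
    stype: str                 # "assignment", "time control" or "event control"
    contents: dict             # LHS, RHS, assign type, delay value, ...


def split_tables(rows):
    """Second step: split the statement information table by statement type."""
    tables = {"assignment": [], "time control": [], "event control": []}
    for row in rows:
        tables[row.stype].append(row)
    return tables


# First step for the hypothetical statement: always @(posedge clk) #1 a = b;
ROWS = [
    StatementRow(0, None, "event control", {"event": "posedge clk"}),
    StatementRow(1, 0, "time control", {"delay": 1}),
    StatementRow(2, 1, "assignment",
                 {"lhs": "a", "rhs": "b", "assign_type": "blocking"}),
]
TABLES = split_tables(ROWS)
```

The PRE_STID chain (0, then 1, then 2) records that the delay starts after the clock edge and the assignment starts after the delay, which is the ordering the triggers need later on.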

20
  • Third step: equivalent hardware generation

21
Emulation Cycle Consumption
  • Example of delta delay graph

forever clk <= #1 ~clk;
always @(posedge clk) begin a = a + 1; b <= b + 1; end
always @(b) c <= b;
always @(posedge clk) begin d <= c; end
always @(d) e <= d;
always @(e) f <= e;
always @(posedge clk) begin g <= f; end
always @(g) h = g;

[Figure: delta delay graph of the evaluate (E) and update (U) steps 1E/1U through 8E/8U, triggered by Clk.]

Δ1 + Δ2 + Δ3 + Δ4 = max(3,1,1) + max(2,2,2) + max(1,2,1) + 1 = 3 + 2 + 2 + 1 = 8
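
The arithmetic on this slide can be reproduced with a short helper. This is a sketch under an assumption: each of the four terms is read as the longest of several concurrent delta chains, and the total cycle count as the sum of those maxima. The function name `emulation_cycles` and the `GROUPS` list are hypothetical; the numbers are the ones shown on the slide.

```python
def emulation_cycles(delta_groups):
    """Sum, over successive stages, the longest concurrent delta chain."""
    return sum(max(group) for group in delta_groups)


# Values taken from the slide: max(3,1,1), max(2,2,2), max(1,2,1), and 1.
GROUPS = [(3, 1, 1), (2, 2, 2), (1, 2, 1), (1,)]
TOTAL = emulation_cycles(GROUPS)
```

Under this reading `TOTAL` is 8, matching the total on the slide: chains that run concurrently cost only as much as the longest one.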
22
Experimental Results
  • We applied typical testbenches that generate
    stimulus patterns for the DUT and check its responses.
  • The testbenches differ in the number of
    interconnections between testbench and DUT.

23
Conclusion
  • The performance gain of hardware-accelerated
    simulation is critically limited by communication
    overhead and testbench execution.
  • Testbenches are commonly written with
    unsynthesizable HDL constructs such as time
    delays, event controls, and loops.
  • To move the testbench into the accelerator, we proposed
    an automated procedure that translates an
    unsynthesizable testbench into a synthesizable one.
  • In experiments, simulation time is reduced by a
    factor of up to 1000 compared to the
    conventional accelerated-simulation method.

25
Event Scheduling
  • Software Simulator

26
Event Scheduling
  • Proposed Hardware Emulator

[Figure: event scheduling in the proposed hardware emulator: event processing and event scheduling repeat until there are no more events, after which the simulation time is updated.]
27
Simulator/Accelerator Partition
28
Statement Translation
always @(posedge clk) a = #1 a + 1;

[Figure: the statement's RHS, assignment type, and event control are translated into the hardware blocks Evaluate Logic, Evaluate Register, Update Register, Evaluate Trigger, Local Interconnection, Event Detect, and Update Trigger.]