Fast Trace-Driven HW/SW Co-simulation Using Virtual Synchronization Technique

1
Fast Trace-Driven HW/SW Co-simulation Using
Virtual Synchronization Technique
  • Dohyung Kim, Youngmin Yi and Soonhoi Ha
  • Seoul National University, Seoul, Korea

2
Content
  • Motivation
  • Related Work
  • Virtual Synchronization Technique
  • The Proposed Approach
  • Experiment Results
  • Conclusion

3
Hardware/Software Co-simulation
  • HW/SW cosimulation evaluates performance of
    system variations, which helps to make design
    decisions

Figure: system design flow, from algorithm to implementation
4
Synchronization in Co-simulation
  • All simulators are synchronized to one global
    time at every cycle
  • Functions can receive data in legal sequence
  • Resource conflicts in architecture can be
    resolved properly
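
The lock-step scheme on this slide can be sketched as follows. This is a minimal illustration, assuming toy simulators that only track a local clock; `ToySimulator` and `run_lockstep` are hypothetical names, not part of any real co-simulation tool.

```python
class ToySimulator:
    """Hypothetical stand-in for a component simulator; it only tracks local time."""

    def __init__(self, name):
        self.name = name
        self.local_time = 0

    def step(self):
        # Advance this simulator by exactly one clock cycle.
        self.local_time += 1


def run_lockstep(simulators, cycles):
    """Advance every simulator one cycle at a time and count synchronizations."""
    global_time = 0
    sync_count = 0
    for _ in range(cycles):
        for sim in simulators:
            sim.step()
        global_time += 1
        # Synchronization point: every local clock must equal the global clock.
        assert all(sim.local_time == global_time for sim in simulators)
        sync_count += 1
    return global_time, sync_count


sims = [ToySimulator("simulator 1"), ToySimulator("simulator 2")]
final_time, syncs = run_lockstep(sims, cycles=1000)
print(final_time, syncs)
```

One synchronization per simulated cycle is what makes the scheme safe, and also what makes it expensive once the simulators themselves are fast.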

Figure: func_A on simulator 1 and func_B on simulator 2, both synchronized to one global time
5
Performance Bottleneck in Cosimulation
  • As simulator speed increases, synchronization
    overhead becomes a major performance bottleneck
    of co-simulation
  • Synchronization overheads from different
    implementations
    • Remote TCP/IP: 200 us
    • Local TCP/IP: 30 us (18 us using POSIX thread)
    • Function call: 0.5 us
  • Measured on Linux 2.4, dual Pentium 1.8 GHz,
    100M LAN

Figure: synchronization overhead (0 to 100%) versus simulator speed (axis ticks from 0 to 10M). Labeled points, given as simulator speed (overhead %):
  • 555 (10), 5K (50), 45K (90)
  • 3.7K (10), 33K (50), 303K (90)
  • 222K (10), 2M (50), 18M (90)
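
The relation between simulator speed and synchronization overhead can be worked through numerically. The sketch below is not taken from the slides: it assumes one synchronization per simulated cycle, that simulator speed is measured in cycles per second, and that overhead is the share of wall-clock time spent synchronizing. The function names are hypothetical.

```python
def overhead_fraction(sync_seconds, cycles_per_second):
    """Share of wall-clock time spent synchronizing, one sync per cycle (assumed)."""
    cycle_seconds = 1.0 / cycles_per_second
    return sync_seconds / (sync_seconds + cycle_seconds)


def speed_at_overhead(sync_seconds, fraction):
    """Simulator speed (cycles/s) at which the overhead reaches the given fraction."""
    return fraction / (sync_seconds * (1.0 - fraction))


# Per-synchronization costs quoted on the slide.
SYNC_COST = {
    "remote TCP/IP": 200e-6,
    "local TCP/IP": 30e-6,
    "function call": 0.5e-6,
}

for name, cost in SYNC_COST.items():
    speeds = [round(speed_at_overhead(cost, f)) for f in (0.10, 0.50, 0.90)]
    print(name, speeds)
```

Under these assumptions the 200 us case reaches 10%, 50% and 90% overhead at roughly 555, 5K and 45K cycles per second, which is close to the points labeled on the chart.
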
6
Formulation of Cosimulation Time
  • Co-simulation time is composed of
    • T0: simulator time, to execute simulators
    • T1: data exchange time, to deliver data between
      simulators
    • T2: context change time, to change context
      between simulators
  • Synchronization time (T1 + T2)
    • data exchange time + context change time

Figure: timeline of one clock cycle (simulation time n to n+1) across Simulators 1 to 3, showing T0, T1 and T2
7
Formulation of Cosimulation Time
  • Co-simulation time is composed of
    • T0: simulator time, to execute simulators
    • T1: data exchange time, to deliver data between
      simulators
    • T2: context change time, to change context
      between simulators
  • Synchronization time (T1 + T2)
    • data exchange time + context change time
  • Simulation time for each simulator
    • (simulator time + synchronization time) ×
      total simulated cycles
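
As a worked instance of this formulation, the sketch below evaluates the lock-step co-simulation time. The per-cycle values are illustrative assumptions, not measurements from the presentation, and `lockstep_cosim_time` is a hypothetical helper name.

```python
def lockstep_cosim_time(t0, t1, t2, total_cycles):
    """(simulator time + synchronization time) x total simulated cycles.

    t0: simulator time per cycle, t1: data exchange time per cycle,
    t2: context change time per cycle (all in seconds).
    """
    sync_time = t1 + t2
    return (t0 + sync_time) * total_cycles


# Assumed values: 1 us of simulation, 20 us of data exchange and
# 10 us of context change per cycle, over one million cycles.
total = lockstep_cosim_time(t0=1e-6, t1=20e-6, t2=10e-6, total_cycles=1_000_000)
print(f"{total:.1f} s")
```

With these assumed numbers, synchronization accounts for 30 of every 31 microseconds spent per cycle.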

8
Related Work
  • Optimized approach [Sung1998]
    • Utilizes the next event time from simulators
    • It is hard to acquire the exact next event time
  • Optimistic approach [Yoo1998]
    • Advances clocks without synchronization and
      supports roll-back
    • Performance enhancement depends on the frequency
      of data exchanges
  • Seamless CVE [Bailey2002]
    • Synchronizes only when simulators exchange
      shared data
    • It does not handle resource conflicts and is
      still slow
  • Transaction level modeling [Grotker2002]
    • Changes the accuracy level to reduce simulator time
    • Synchronization overhead is still a problem

9
Virtual Synchronization [Kim2002]
  • Predict the next synchronization point based on
    a computation model which defines algorithm
    behavior precisely
  • At those points, take relative times (t1, t2, t3)
    from simulators
  • Then, transform those times to the global times
    • t0, t0 + t1, t0 + t1 + t2, t0 + t1 + t2 + t3
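
The transformation from relative times to global times is a running sum that starts at t0. A minimal sketch, with hypothetical values and a hypothetical helper name `to_global_times`:

```python
from itertools import accumulate


def to_global_times(t0, relative_times):
    """Turn relative times (t1, t2, t3, ...) into global times starting at t0."""
    # accumulate(..., initial=t0) yields t0, t0+t1, t0+t1+t2, ...
    return list(accumulate(relative_times, initial=t0))


# Assumed example: global start time 100, relative times 5, 7 and 3 cycles.
global_times = to_global_times(100, [5, 7, 3])
print(global_times)  # [100, 105, 112, 115]
```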

Figure: local times of simulator 1 and simulator 2 mapped onto one global time axis starting at t0
10
Limitation in Virtual Synchronization
  • Assumes that there is no resource conflict on the
    communication architecture
  • Otherwise, simulators must be synchronized at
    every cycle to calculate delays caused by
    resource conflicts
  • Example: func_A and func_B are executed concurrently
    on different processors and access a memory
    through a shared bus

Figure: func_A (simulator 1, proc 1) and func_B (simulator 2, proc 2) access a shared resource (e.g. memory, MEM), which introduces a delay from resource conflict
11
The Proposed Approach
  • Propose a new technique to reduce synchronization
    overhead
  • Predict the next data exchange time based on
    computation model
  • Reconstruct the resource conflicts later using
    trace-driven co-simulation
  • Over 99.95% of synchronization points are removed
  • Synchronization overhead falls to 3.8% to 6.4%
  • Simulation time = (simulator time × total
    simulated cycles) + (synchronization time ×
    synchronization count)
  • Limitation: assumes a computation model
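
To see why the synchronization count dominates, the simulation-time formula can be evaluated for a lock-step run and for a run with 99.95% of the synchronization points removed. The per-cycle and per-synchronization costs below are illustrative assumptions, not results from the presentation, and `cosim_time` is a hypothetical helper.

```python
def cosim_time(sim_time_per_cycle, total_cycles, sync_time, sync_count):
    """(simulator time x total cycles) + (synchronization time x sync count)."""
    return sim_time_per_cycle * total_cycles + sync_time * sync_count


TOTAL_CYCLES = 10_000_000
SIM_TIME_PER_CYCLE = 1e-6   # assumed: 1 us of simulator time per cycle
SYNC_TIME = 30e-6           # assumed: 30 us per synchronization

# Lock-step: one synchronization per simulated cycle.
lockstep = cosim_time(SIM_TIME_PER_CYCLE, TOTAL_CYCLES, SYNC_TIME, TOTAL_CYCLES)

# Reduced: keep 0.05% of the synchronization points (99.95% removed).
remaining_syncs = TOTAL_CYCLES * 5 // 10_000
reduced = cosim_time(SIM_TIME_PER_CYCLE, TOTAL_CYCLES, SYNC_TIME, remaining_syncs)

print(f"lock-step: {lockstep:.2f} s, reduced: {reduced:.2f} s")
```

With these assumed costs the synchronization term shrinks from 300 s to 0.15 s, while the simulator term stays at 10 s.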

12
First Part: Trace Generation
  • Like virtual synchronization, execute simulators
    assuming no resource conflicts, but store all
    accesses to architecture components (resources)
    during the execution
  • Stored traces carry relative times between traces,
    so that virtual synchronization can be applied in
    the second part
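
A resource access trace with relative times could be recorded roughly as below. This is a sketch under assumptions: the `Trace` fields, the `TraceRecorder` class, and the choice to measure each relative time from the end of the previous access are all hypothetical, since the slides do not specify the trace format.

```python
from dataclasses import dataclass


@dataclass
class Trace:
    resource: str   # accessed architecture component, e.g. "MEM"
    delta: int      # local cycles since the previous access finished
    duration: int   # cycles the access takes when there is no conflict


class TraceRecorder:
    """Collects the accesses of one simulator while it runs conflict-free."""

    def __init__(self):
        self.traces = []
        self._last_end = 0

    def record(self, local_start, resource, duration):
        # Store the time relative to the previous trace, not the absolute time.
        self.traces.append(Trace(resource, local_start - self._last_end, duration))
        self._last_end = local_start + duration


rec = TraceRecorder()
rec.record(local_start=10, resource="MEM", duration=4)
rec.record(local_start=20, resource="MEM", duration=4)
print(rec.traces)
```

Storing relative rather than absolute times means a later stage can shift an access without rewriting every trace that follows it.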

Figure: the simulation engine (1) sends input data, (2) executes a simulator, and (3) receives output data and resource access traces; function_A runs on simulator 0 (proc0), function_B on simulator 1 (proc1), and function_C on simulator 2 (proc2)
13
Second Part: Trace-driven Co-simulator
  • Transform the relative times in the resource
    access traces to the global times by considering
    conflicts on architecture resources
    • Operating system model resolves conflicts on a
      processor
    • Communication architecture model resolves
      conflicts on a memory
  • Request new traces if it consumes all traces or
    cannot determine the next trace to evaluate
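
The replay step can be illustrated with a small sketch that turns relative trace times into global times while resolving conflicts on a shared resource. It assumes first-come-first-served arbitration and a `(delta, resource, duration)` trace tuple. Both are simplifications chosen for illustration, not the operating system or communication architecture models of the presentation, and the request for new traces is not modeled.

```python
import heapq


def replay(traces_per_proc):
    """Replay traces and return (proc, resource, request, start, end) tuples.

    traces_per_proc maps a processor name to a list of
    (delta, resource, duration) tuples, where delta is the local time
    between the end of the previous access and this request.
    """
    resource_free = {}                      # resource -> time it becomes free
    next_index = {p: 0 for p in traces_per_proc}
    pending = []                            # heap of (request_time, proc)
    for proc, traces in traces_per_proc.items():
        if traces:
            heapq.heappush(pending, (traces[0][0], proc))

    schedule = []
    while pending:
        request_time, proc = heapq.heappop(pending)
        _, resource, duration = traces_per_proc[proc][next_index[proc]]
        # A busy resource delays the access: this is the resource conflict.
        start = max(request_time, resource_free.get(resource, 0))
        end = start + duration
        resource_free[resource] = end
        schedule.append((proc, resource, request_time, start, end))
        next_index[proc] += 1
        if next_index[proc] < len(traces_per_proc[proc]):
            next_delta = traces_per_proc[proc][next_index[proc]][0]
            heapq.heappush(pending, (end + next_delta, proc))
    return schedule


schedule = replay({
    "proc0": [(10, "MEM", 4), (6, "MEM", 4)],
    "proc1": [(12, "MEM", 4)],
})
for entry in schedule:
    print(entry)
```

In this example proc1 requests the memory at time 12 while proc0 still holds it until 14, so its access is delayed by two cycles.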

Figure: the simulation engine (4) passes resource access traces to the trace-driven co-simulator, which (5) requests further resource access traces
14
Example Scenario
Figure: example scenario. Upper part: the simulation engine with function_A (proc0), function_B (proc1) and function_C (proc2) on simulators 1 to 3, plotted against simulation time. Lower part: the trace-driven co-simulator with processors 0 to 2, MEM and INTR, plotted against simulated cycles
15
Experiment Environment
  • Example: DIVX Player (H.263 decoder + MP3
    decoder)
  • Machine: Linux 2.4 kernel, dual Xeon 2.6 GHz
    CPUs, 1 GB RAM
  • Simulators: ADS 1.2 from ARM, ModelSim from
    Mentor Graphics
  • PeaCE framework automatically generates different
    co-simulation environments for different
    architectures

Figure: simplified view of DIVX Player, with blocks AVI Reader, H.263 Decoder, Header decoder, DQ, IDCT, MC, Display and MP3 Decoder
16
Candidate Architectures
17
Co-simulation Result
18
Performance Comparison with Other Approaches
  • Target architecture: hardware IDCT + ARM
    processor

result comes from a slower machine
19
Conclusion and Future Work
  • Overcome the synchronization problem in HW/SW
    co-simulation by combining two different approaches
    • Virtual synchronization predicts when functions
      can receive data in legal sequence, based on a
      computation model
    • Trace-driven co-simulation guarantees that
      resource conflicts in the architecture can be
      resolved properly
  • Future work
  • Implement distributed execution of multiple
    simulators using reduced synchronization overhead
  • Focus on modeling of real systems and compare
    accuracy
  • Please visit the PeaCE demonstration at the
    University Booth (3 to 4 PM)