Simulation Overview - PowerPoint PPT Presentation

About This Presentation
Title:

Simulation Overview

Description:

Simulation Overview Multifacet Group University of Wisconsin-Madison – PowerPoint PPT presentation

Slides: 43
Provided by: cmau8

Transcript and Presenter's Notes

Title: Simulation Overview


1
Simulation Overview
  • Multifacet Group
  • University of Wisconsin-Madison

2
Overview
  • Technical introduction to Simics, Ruby, Opal
  • Simics Full-System Simulator
  • Ruby Memory Timing Simulation
  • Opal Out-of-order Micro-architecture Simulator

3
Outline
  • I. Overview
  • II. Simics introduction
  • III. Ruby
  • A. Simics interfaces
  • B. Software architecture
  • IV. Opal
  • A. Software architecture

4
Simics
  • Full system multi-processor simulator
  • Simulated target: SPARC V9 (E 15k-like)
  • Nice Features
    • documentation (./simics/doc)
    • checkpoints (./simics/checkpoints)
    • disk images
    • Scripting in Python

5
Simics Devices (sun4u)
UltraSparc iii
RAM
Memory Bus
6
Simics Timing Model
  • One instruction, one cycle
    • Modulo interrupts, traps
  • Cycle time is determined by clock frequency

simics> print-event-queue (peq cpu0)
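
As a rough illustration of this timing model, the sketch below converts an instruction count into simulated time. It assumes exactly one cycle per instruction (ignoring interrupts and traps) plus any stall cycles added by an attached timing module. The function name and the 100 MHz figure are hypothetical and are not Simics defaults.

```python
def simulated_time_ns(instructions: int, clock_mhz: float, stall_cycles: int = 0) -> float:
    """Return simulated time in nanoseconds.

    Assumes one cycle per instruction, plus stall cycles reported by a
    timing module (for example a memory timing simulator).
    """
    cycle_time_ns = 1000.0 / clock_mhz   # cycle time follows from the clock frequency
    return (instructions + stall_cycles) * cycle_time_ns


# One million instructions at an assumed 100 MHz clock, without and with stalls.
print(simulated_time_ns(1_000_000, clock_mhz=100.0))
print(simulated_time_ns(1_000_000, clock_mhz=100.0, stall_cycles=500_000))
```

Without a timing module attached, simulated time depends only on the instruction count and the configured clock frequency.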
7
Outline
  • I. Overview
  • II. Simics introduction
  • III. Ruby
  • A. Simics interfaces
  • B. Software architecture
  • C. CMP overview
  • IV. Opal
  • A. Software architecture
  • V. Bibtex

8
Ruby Introduction
  • Models timing for caches, memory, interconnect
    and directories
  • Implements multiple cache coherence protocols
  • Uses event-driven simulation

Interconnect
Ruby
9
Ruby Timing Model
  • Queues act as delay centers
  • 1 cycle between queues

[Diagram labels: CPU, tryCacheAccess, hitCallback, L1 Cache Controller, TBE, Network (L2 Cache, Directory)]
TBE: Transaction Buffer Entries, one per outstanding memory transaction
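
The idea of queues acting as delay centers in an event-driven simulation can be sketched as follows. This is not Ruby's code: the `EventQueue` class, the stage names, and the latencies are all hypothetical, chosen only to show a message paying each queue's delay plus one cycle per hop.

```python
import heapq


class EventQueue:
    """Minimal event-driven scheduler: events fire in timestamp order."""

    def __init__(self):
        self.now = 0
        self._events = []   # heap of (time, sequence number, callback)
        self._seq = 0       # tie-breaker keeps same-cycle events in FIFO order

    def schedule(self, delay, callback):
        heapq.heappush(self._events, (self.now + delay, self._seq, callback))
        self._seq += 1

    def run(self):
        while self._events:
            self.now, _, callback = heapq.heappop(self._events)
            callback()


def send_through(eq, stages, message, log):
    """Pass a message through a chain of queues.

    Each hop costs the queue's own delay plus 1 cycle between queues.
    """
    def hop(i):
        if i == len(stages):
            log.append((eq.now, message))   # message delivered
            return
        _name, delay = stages[i]
        eq.schedule(delay + 1, lambda: hop(i + 1))

    hop(0)


eq = EventQueue()
log = []
# Hypothetical per-queue latencies, for illustration only.
stages = [("L1 request queue", 0), ("network", 5), ("directory", 10)]
send_through(eq, stages, "GETS 0x40", log)
eq.run()
print(log)   # delivery time = (0 + 1) + (5 + 1) + (10 + 1) = 18 cycles
```

Because time only advances when an event fires, idle cycles cost nothing to simulate, which is the usual reason for choosing an event-driven design.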
10
How to Drive Ruby
  • Three ways to run Ruby
    • Random Tester
    • Simics (only)
    • Simics + Opal

11
Ruby Random Tester
  • Stand-alone executable
  • Action-Check pairs
    • Massive false sharing
    • Action: write a set of values in a block
    • Check: validate the values are correct
  • Invaluable when developing protocols
  • Other testers available
    • Lock contention
    • Deterministic behavior
    • Etc.
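
The action-check idea can be sketched in a few lines. This is a single-threaded toy, not the Ruby tester: the real tester drives a coherence protocol from multiple processors, whereas here `memory` is just a byte store. `BLOCK_SIZE`, `run_tester`, and `StuckMemory` are hypothetical names.

```python
import random

BLOCK_SIZE = 8  # bytes per block in this toy model (hypothetical)


def run_tester(memory, num_pairs, seed=0):
    """Run action-check pairs against `memory`; return True if all checks pass."""
    rng = random.Random(seed)
    num_blocks = len(memory) // BLOCK_SIZE
    for _ in range(num_pairs):
        base = rng.randrange(num_blocks) * BLOCK_SIZE
        start = rng.randrange(256)
        # Action: write a known sequence of values into one block.
        for offset in range(BLOCK_SIZE):
            memory[base + offset] = (start + offset) % 256
        # Check: read the block back and validate every value.
        for offset in range(BLOCK_SIZE):
            if memory[base + offset] != (start + offset) % 256:
                return False
    return True


class StuckMemory(bytearray):
    """Faulty memory that silently drops every write."""

    def __setitem__(self, index, value):
        pass


# Only four blocks, so successive pairs hit the same blocks often.
print(run_tester(bytearray(4 * BLOCK_SIZE), num_pairs=100))
print(run_tester(StuckMemory(4 * BLOCK_SIZE), num_pairs=100))
```

Keeping the address space tiny is a deliberate choice: it forces frequent reuse of the same blocks, which is where protocol bugs tend to surface.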

12
Ruby-Simics Interfaces
  • Timing-Model interface
  1. Simics encounters a memory instruction
  2. Simics creates memory_trans structure
  3. timing_model interface is called
  4. Ruby returns stall time
  5. (opt) Ruby changes stall time
  6. Simics commits instruction
  7. snoop_device called to read memory

stall time
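
The stall-time handshake in the steps above can be sketched as a toy model. None of these names come from the Simics API: `MemoryTransaction`, `ToyTimingModel`, and `operate` are hypothetical stand-ins, and the 100-cycle miss latency is an arbitrary choice. The snoop step is omitted.

```python
from dataclasses import dataclass


@dataclass
class MemoryTransaction:
    """Hypothetical stand-in for the memory reference structure."""
    address: int
    is_store: bool
    size: int
    is_io: bool = False


class ToyTimingModel:
    """Returns a stall time (in cycles) for each access; hits stall 0 cycles."""

    def __init__(self, miss_latency=100, block_bytes=64):
        self.miss_latency = miss_latency
        self.block_bytes = block_bytes
        self.cached_blocks = set()

    def operate(self, trans):
        if trans.is_io:
            return 0                        # I/O is not modelled in this sketch
        block = trans.address // self.block_bytes
        if block in self.cached_blocks:
            return 0                        # hit: no stall
        self.cached_blocks.add(block)
        return self.miss_latency            # miss: stall the processor


def execute_memory_instruction(timing_model, trans):
    """Return the cycles charged: the stall time plus one cycle to commit."""
    stall = timing_model.operate(trans)     # timing model returns the stall time
    return stall + 1                        # the instruction then commits


model = ToyTimingModel()
print(execute_memory_instruction(model, MemoryTransaction(0x1000, False, 8)))  # miss
print(execute_memory_instruction(model, MemoryTransaction(0x1008, False, 8)))  # same block: hit
```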
13
Simics Memory Interfaces
  • Timing-Model interface
    • Provides memory reference structure
      • Address, Ld/St, Size, I/O
    • Ruby returns stall time
  • Polling interface
    • Ruby is called every N steps

14
SLICC Introduction
  • SLICC: Specification Language for Implementing
    Cache Coherence protocols
  • Models multiple coherence protocols

15
Ruby SW Architecture
  • System
    • Driver (common/Driver.h)
      • Tester (tester/Tester.h)
      • Simics Interface (simics/SimicsDriver.h)
      • Opal Interface (interface/OpalInterface.h)
    • Node (generated/Node.h)
      • L1Cache (generated/L1Cache_)
      • L2Cache (generated/L2Cache_)
      • Directory/Memory (generated/Directory_)
    • Profiler (profiler/Profiler.h)
    • Network (network/)
16
Ruby SW Architecture
  • Node
    • Directory (system/DirectoryMemory.h)
    • Caches (system/NewCacheMemory.h)
    • Sequencer (Sequencer.h)
  • SLICC
    • Cache Line (generated/L?Cache_Entry.h)
    • Directory Controller State (generated/L1Directory_Entry.h)
    • L1 Cache Controller
    • L2 Cache Controller
    • Directory Controller
17
Where's Waldo?
  • Describes the FSM in the cache controller
  • Data Structures
    • L1_CacheEntry.h
    • L2_CacheEntry.h
    • Directory_Entry.h
    • Node.h
  • Control
    • L1_Transitions.h
    • L2_Transitions.h
    • Directory_Transitions.h
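
A cache controller FSM of this kind is essentially a table from (state, event) to (next state, actions). The sketch below is a simplified MSI-style L1 controller written for illustration. The state, event, and action names are hypothetical and are not taken from the generated files, and the table is written in plain Python rather than SLICC syntax.

```python
# (state, event) -> (next_state, actions)
TRANSITIONS = {
    ("I", "Load"):   ("IS", ["issue_GETS"]),
    ("I", "Store"):  ("IM", ["issue_GETX"]),
    ("IS", "Data"):  ("S", ["write_data", "hit_callback"]),
    ("IM", "Data"):  ("M", ["write_data", "hit_callback"]),
    ("S", "Load"):   ("S", ["hit_callback"]),
    ("S", "Store"):  ("IM", ["issue_GETX"]),
    ("S", "Inv"):    ("I", ["send_ack"]),
    ("M", "Load"):   ("M", ["hit_callback"]),
    ("M", "Store"):  ("M", ["hit_callback"]),
    ("M", "Inv"):    ("I", ["send_data"]),
}


def step(state, event):
    """Look up one transition; an undefined pair is a protocol error."""
    if (state, event) not in TRANSITIONS:
        raise ValueError(f"invalid transition: {event} in state {state}")
    return TRANSITIONS[(state, event)]


state = "I"
trace = []
for event in ["Load", "Data", "Store", "Data", "Inv"]:
    state, actions = step(state, event)
    trace.append((event, state, actions))

for entry in trace:
    print(entry)
```

Treating a missing table entry as an error is useful during protocol development, since an unexpected event in a transient state usually points to a race the protocol did not anticipate.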

18
Day in the life of a Request
simics
simics/src/extensions/ruby/ruby.c
ruby/simics/SimicsDriver.C: makeRequest()
ruby/system/STD_Sequencer.C: makeRequest(), doRequest(), node->L1Cache->tryCacheAccess()
ruby/system/CacheMemory.h: tryCacheAccess()
issueRequest()
hitCallback()
timing_model
19
Ruby Configuration
  • ruby/config/config.include
    • All parameters defined here
  • ruby/config/rubyconfig.defaults
    • Defines parameters for the ruby module
    • All parameters can be adjusted at runtime
  • ruby/config/tester.defaults
    • Defines parameters for the tester

20
CMP Overview
  • Node contains
    • Exactly one Processor and L1 I/D Cache
      • 1-16 in the system
      • Partitioned across 1-16 chips
    • 0 to N L2 Cache Banks
      • At least one per chip
    • 0 to N Directories
      • At least one per system
  • Network
    • One network connects all components in the system
    • Composed of switches and point-to-point links
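
The constraints listed on this slide can be written down as a small validity check. This is only a sketch: `CmpConfig` and its field names are hypothetical and do not correspond to Ruby's configuration parameters.

```python
from dataclasses import dataclass


@dataclass
class CmpConfig:
    processors: int         # one processor (with its L1 cache) per node
    chips: int              # processors are partitioned across the chips
    l2_banks_per_chip: int
    directories: int        # counted for the whole system

    def validate(self):
        """Return a list of violated constraints (empty if the config is valid)."""
        errors = []
        if not 1 <= self.processors <= 16:
            errors.append("processors must be in 1..16")
        if not 1 <= self.chips <= 16:
            errors.append("chips must be in 1..16")
        if self.l2_banks_per_chip < 1:
            errors.append("need at least one L2 bank per chip")
        if self.directories < 1:
            errors.append("need at least one directory per system")
        return errors


print(CmpConfig(processors=16, chips=4, l2_banks_per_chip=2, directories=1).validate())
print(CmpConfig(processors=32, chips=4, l2_banks_per_chip=0, directories=0).validate())
```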

21
Outline
  • I. Overview
  • II. Simics introduction
  • III. Ruby SW architecture
  • IV. Opal SW architecture

22
Processor Simulator Opal
  • Models an R10000-like out-of-order processor
  • SPARC V9 instruction set
  • Timing-First Organization

23
Timing-First Simulation
  • Timing Simulator
    • does functional execution of user and privileged operations
    • does speculative, out-of-order multiprocessor timing simulation
    • does NOT implement functionality of full instruction set or any devices
  • Functional Simulator
    • does full-system multiprocessor simulation
    • does NOT model detailed micro-architectural timing

[Diagram labels: System, CPU, Network, RAM, Opal, Simics]
24
Timing-First Operation
  • As each instruction retires, step the CPU in the functional simulator
  • Verify the instruction's execution
  • Reload state if the timing simulator deviates from the functional simulator
    • Instructions with unidentified side-effects
    • NOT loads/stores to I/O devices

[Diagram labels: System, CPU, Network, RAM, Opal, Simics]
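
The retire, verify, reload loop can be sketched with two toy simulators. This is an illustration under simplifying assumptions, not Opal's code: the state is a single register, the instruction tuples are invented, and `"unmodelled"` stands for any instruction whose effect the timing simulator does not implement.

```python
class FunctionalSim:
    """Trusted reference: executes every instruction correctly."""

    def __init__(self):
        self.regs = {"r1": 0}

    def step(self, instr):
        op, reg, val = instr
        if op == "add":
            self.regs[reg] += val
        elif op == "unmodelled":
            self.regs[reg] = val     # effect only the functional simulator knows


class TimingSim:
    """Timing model: implements 'add' functionally, but not 'unmodelled'."""

    def __init__(self):
        self.regs = {"r1": 0}
        self.reloads = 0

    def retire(self, instr):
        op, reg, val = instr
        if op == "add":
            self.regs[reg] += val

    def reload(self, regs):
        self.regs = dict(regs)       # copy architected state from the reference
        self.reloads += 1


def run(program):
    timing, functional = TimingSim(), FunctionalSim()
    for instr in program:
        timing.retire(instr)                 # timing simulator retires first
        functional.step(instr)               # then the functional simulator steps
        if timing.regs != functional.regs:   # verify the retired instruction
            timing.reload(functional.regs)   # on deviation, reload state
    return timing


result = run([("add", "r1", 5), ("unmodelled", "r1", 42), ("add", "r1", 1)])
print(result.regs, result.reloads)
```

The timing simulator stays in the lead, and the functional simulator is consulted only at retirement, so a functional bug in the timing model costs a reload rather than a wrong result.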
25
Benefits of Timing-First
  • Supports speculative multi-processor timing
    models
  • Leverages existing simulators
  • Software development advantages
  • Increases flexibility and reduces code complexity
  • Immediate, precise check on timing simulator

26
Conclusions
  • Simics
    • Functional simulator
    • Attach timing modules to control execution time
  • Ruby
    • Uses generated and non-generated code to simulate the memory system
    • Extended to simulate CMPs
  • Opal
    • Timing-first out-of-order processor model
    • Drives execution

27
Backup Slides
  • More Opal Details

28
Top Level Interfaces
Simics API
29
Pipeline Overview
[Diagram labels: Branch Predictors, squash, Fetch, Decode, Schedule, Execute, Retire, Complete, Input Wait, LSQ Wait, Cache Miss]
30
System
31
Sequencer
  • Instruction Window
  • Register Files
  • Caches / LSQ / MSHR (or ruby cache intf)
  • Branch Predictors
  • Simics / Checking Routines
  • Micro-architectural checkpointing
  • Instruction / Memory / Branch Tracing

32
Static Instruction
  • One static instruction per physical address
  • Can be cached in instruction pages
  • Fields of interest
    • opcode, type, source / dest registers

33
Dynamic Instructions
  • One dynamic instruction per in-flight instruction
  • data renamed registers, events
  • functional execution
  • predict actual program counter

34
Instruction Window
  • All in-flight instructions are tracked
  • Markers delimit pipeline progress
  • Implemented using rotating buffer

[Diagram: rotating buffer of in-flight instructions, with markers last_retired, last_scheduled, last_decoded, last_fetched]
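
A rotating buffer with progress markers can be sketched as below. The marker names follow the slide, but the class itself is hypothetical. Storing each marker as a running count and taking it modulo the buffer size is a design choice of this sketch, not necessarily how Opal implements it.

```python
class InstructionWindow:
    """Rotating buffer; four markers delimit pipeline progress."""

    def __init__(self, size):
        self.size = size
        self.slots = [None] * size
        # Each marker is a running count; the slot index is count % size.
        self.last_fetched = 0
        self.last_decoded = 0
        self.last_scheduled = 0
        self.last_retired = 0

    def in_flight(self):
        return self.last_fetched - self.last_retired

    def fetch(self, instr):
        if self.in_flight() == self.size:
            return False                      # window is full
        self.slots[self.last_fetched % self.size] = instr
        self.last_fetched += 1
        return True

    def decode(self):
        if self.last_decoded < self.last_fetched:
            self.last_decoded += 1

    def schedule(self):
        if self.last_scheduled < self.last_decoded:
            self.last_scheduled += 1

    def retire(self):
        if self.last_retired == self.last_scheduled:
            return None                       # nothing is ready to retire
        index = self.last_retired % self.size
        instr, self.slots[index] = self.slots[index], None
        self.last_retired += 1
        return instr


window = InstructionWindow(size=2)
print(window.fetch("ld"), window.fetch("add"), window.fetch("st"))  # third fetch fails
window.decode()
window.schedule()
print(window.retire())        # oldest instruction leaves the window
print(window.fetch("st"))     # the freed slot is reused
```

The invariant is last_retired <= last_scheduled <= last_decoded <= last_fetched, so each marker can only advance up to the one ahead of it.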
35
Abstract Register File (arf)
  • Instructions treat registers uniformly

36
Statistics
  • pseq statistics
  • observer functions
    • observe instruction
    • observe static instruction
    • observe thread switch
    • observe transaction complete

37
Branch Predictor Overview
bts - branch trace start
btt - branch trace take
btf - branch trace finish
38
BP Classes
Predict, update
Fetch
nextPC
May rollback FixupState()
Execute
Retire
setTarget
Retire
39
Configuration Files
  • Files define all micro-architectural parameters
    • imported as global ALL CAPS variables
    • name-value pairs
    • found in opal/config
  • Must load file before running opal!
    • load-module opal
    • opal0.conf filename
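
Loading name-value pairs into ALL CAPS parameters might look like the sketch below. The file syntax is an assumption: the slide does not show it, so this sketch guesses one `NAME: value` pair per line with `#` comments. The parameter names in the sample are invented.

```python
def parse_config(text):
    """Parse name-value pairs into a dict keyed by ALL CAPS names.

    Assumed syntax (not confirmed by the slides): 'NAME: value' per line,
    with '#' starting a comment.
    """
    params = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        name, value = line.split(":", 1)
        name = name.strip().upper()
        value = value.strip()
        try:
            params[name] = int(value)       # numeric parameter
        except ValueError:
            params[name] = value            # keep anything else as a string
    return params


SAMPLE = """
# hypothetical micro-architectural parameters
fetch_width: 4
WINDOW_SIZE: 64
predictor: gshare
"""

config = parse_config(SAMPLE)
print(config)
```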

40
Adding global variables
  • config.include
  • config.defaults

41
Template for stand-alone opal
read-conf ../../checkpoints/oltp/oltp-warm-2p.check
cpu0.print-time
@import mfacet
@from mfacet import
@magic_enable_cmd()
@mfacet.setup_run_for_n_transactions( 100000 )
module-list-refresh
@SIM_get_attribute( SIM_get_object( "sim" ), "cpu_switch_time" )
@SIM_set_attribute( SIM_get_object( "sim" ), "cpu_switch_time", 1 )
@SIM_get_attribute( SIM_get_object( "sim" ), "cpu_switch_time" )
load-module opal
load-module ruby
opal0.init
opal0.start /scratch.local/warm-2p.log
opal0.s 10000000
opal0.stats
opal0.stop
ruby0.dump-stats /scratch.local/warm-2p-ruby.log
42
Makefile Defines
  • PIPELINE_VIS: pipeline visualization output
  • MODINIT_VERBOSE: startup debugging
  • VERIFY_SIMICS: once per new version of simics
  • REDECODE_EACH: disables static instruction caching
  • USE_MINI_TLB: increases performance
  • Most defines should be variables! Not compile-time options.