Title: Design of On-Chip Software and Hardware Under Real World Constraints
1. Design of On-Chip Software and Hardware Under Real World Constraints
- Rajesh K. Gupta
- iESAG Members: Ali Dasdan, Jian Li, Sumit Gupta, Dinesh Ramnathan, Samir Agrawal, Derek Taubert
- Information and Computer Science, University of California, Irvine, California 92697.
Research sponsored by NSF CAREER, NSF/ECD, NSF-ASC/DARPA/NASA, AT&T, IEC, Intel.
2. A System Architecture Today
[Figure: Bridge Architecture block diagram. Labels: Processor, Cache, Cache/DRAM Controller, PCI Bus, Audio, Motion Video, Graphics, LAN, SCSI, Base I/O, Add-in board, Exp Bus Xface, ISA/EISA - MicroChannel.]
- Courtesy, Shispal Rawat, Intel Corporation.
3. A System Architecture Tomorrow
[Figure: Hub Architecture block diagram. Labels: Processor Core, DSP Processor Core, MEMORY (Cache/SRAM or even DRAM), VRAM, PCI Interface, I/O Interface, EISA Interface, LAN Interface, SCSI, Graphics, Video, Motion, Encryption/Decryption, Glue.]
4. Embedded Computing Application Classes
- Time-constrained computing systems.
5. Outline
- Embedded Computing Systems
- Problem Areas and Co-Design Deliverables
- Specification, Modeling and Analysis
- Pre-synthesis Optimizations
- Architectural Validation
- Partitioning and Synthesis Subtasks
- Constraint Specification and Analysis
- Language-level Modeling and Validation
- Summary
http://www.ics.uci.edu/rgupta
6. System Design Problem Areas
- 1. Design environment, co-simulation, constraint analysis.
- 2. HDL modeling, architectural synthesis, logic synthesis, physical synthesis.
- 3. Software synthesis, optimization, retargetable code generation, debugging, programming environment.
- 4. Test issues.
[Figure: target system block diagram. Labels: Processor, ASIC, Memory, DMA, Analog I/O, Interface (twice).]
7. System Design Problems
- Specification, Modeling and Analysis
  - How to capture designer intent efficiently in a design language?
  - HDL optimizations (pre-synthesis), C/C++ specifications
  - Constraint modeling and analysis
- System Validation
  - How to use the description in building a (computational) prototype capable of running actual applications?
  - Co-simulation, formal verification
- System Design and Synthesis
  - Delayed partitioning of hardware and software
  - Software synthesis and optimizations
  - Interface design and optimizations.
8. Co-synthesis
- Builds upon compilation and synthesis techniques
- Joint optimization of hardware and software
- Leads to rapid exploration of design alternatives
- On-going projects
  - 1. Constraint satisfiability and debugging of violations
  - 2. Design modeling and optimizations using Don't Cares
  - 3. Architectural validation using program-driven simulations
  - 4. System partitioning into hardware and software
  - 5. Hardware/software interface resolution and synthesis
  - 6. Software synthesis optimization
9. Embedded Computing Systems
- Closed systems
  - execution indeterminacy confined to one source
  - causal relations are easily established.
- Open systems
  - indeterminacy from multiple sources, not controllable or observable by the programmer
  - not possible to infer causal relations.
[Figure labels: Transformational, Physical Processes, REACTIVE]
- Constraints are an important part of system functionality in building embedded computing systems.
10. 1. Timing Constraints Analysis
- Source of timing constraints
  - Time-constrained interactions between the system's components and its environment
- Types of constraints
  - Delay and interval constraints (latency-type)
  - Rate constraints (throughput-type)
  - End-to-end and intermediate timing constraints.
- Modeling for timing analysis
- Constraint derivation
- Constraint satisfiability
  - Are constraints satisfied for a given implementation?
  - Given an implementation, re-synthesize to satisfy a given set of constraints.
- Rate-based static timing analysis for validation
11. Example
[Figure: VEHICLE CRUISE CONTROLLER. Labels: RUNTIME SYSTEM, ROUTINE (twice), DISPLAY INFO, CALIBRATION, GET INFO, CLOCK, STATE, speed, ave_speed, CurFuel, consumption, RotClk, maintenance, InstVel, AveVel, SecPulse, SecClk, valve. Annotations: 1/sec (twice), 1000/sec, < 1 ms.]
12. External Timing Constraints
13. Internal Timing Constraints
14. A Two-level System Model for Rate Analysis
- Process Level
  - a set of concurrent processes
  - processes interact using enable signals
  - modeled as a Process Graph
    - vertices as processes
    - edges as inter-process synchronization
    - edge weight indicates delay in invoking process enable from the start of the enabling process
- Operation Level
  - a set of concurrent and conditional operations
  - modeled as a Sequencing Graph
    - polar, bilogic, hierarchical
[Figure: example graph with edge weights 2, 10, 9, 3, 4.]
15. Rate Analysis at Operation Level
- No inter-process interaction, no pipelining.
- From bounds on operation invocation intervals, determine bounds on process invocation intervals
- Recursively propagate bounds over the sequencing graph hierarchy
Implemented in the VULCAN Co-Synthesis System.
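The recursive propagation of bounds can be sketched roughly as follows. This is an illustration under assumptions that the slide does not state: each leaf operation carries a [lo, hi] delay bound, operations in series add their bounds, and a conditional takes the smallest lower and largest upper bound among its branches. The types `Bounds` and `Node` and the function `propagate` are hypothetical names, not part of VULCAN.

```cpp
#include <algorithm>
#include <vector>

struct Bounds { double lo, hi; };  // hypothetical [lo, hi] bound on an interval

// Hypothetical hierarchical node: a leaf operation, a series of
// sub-graphs, or a conditional that executes exactly one sub-graph.
struct Node {
    enum Kind { Leaf, Series, Conditional } kind;
    Bounds leaf;                 // used only when kind == Leaf
    std::vector<Node> children;  // used by Series and Conditional
};

// Recursively combine bounds bottom-up over the hierarchy.
// Assumes every Conditional node has at least one child.
Bounds propagate(const Node& n) {
    if (n.kind == Node::Leaf) return n.leaf;
    Bounds acc = (n.kind == Node::Series) ? Bounds{0.0, 0.0}
                                          : propagate(n.children.front());
    for (const Node& child : n.children) {
        Bounds b = propagate(child);
        if (n.kind == Node::Series) {
            acc.lo += b.lo;  // series: bounds add up
            acc.hi += b.hi;
        } else {
            acc.lo = std::min(acc.lo, b.lo);  // conditional: fastest branch
            acc.hi = std::max(acc.hi, b.hi);  // conditional: slowest branch
        }
    }
    return acc;
}
```

With no pipelining and no inter-process interaction, a process can be re-invoked only after its previous invocation completes, which is why a bound on its delay is used here as a bound on its invocation interval.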
16. Rate Analysis at Process Level
- Delay analysis on the process graph
  - delay of a path, delay of a cycle
- A k-critical path for a process p is a path that determines the k-th invocation time of p.
- A k-critical path for p is a maximum-delay path with exactly k edges ending at p.
- A cycle with maximum mean delay is called a critical cycle.
- Theorem 1: A k-critical path for p that includes a vertex of a critical cycle $C_j$ has fewer than $\mathrm{lcm}(|C|, |C_j|)/|C|$ occurrences of any non-critical cycle $C$.
  - => This bounds the number of non-critical cycles in a k-critical path.
Let $x_i(k)$ be the time at which process $p_i$ starts executing for the $k$-th time. Then
$x_i(k) = \max_{p_j \in \mathrm{pred}(p_i)} \left[ x_j(k-1) + A_{ij} \right]$
=> $X(k) = A \otimes X(k-1) = A^{k} \otimes X(0)$, where $\otimes$ is the matrix product in (max, +) algebra and $[A^{l}]_{ij}$ is the length of the longest path from $p_j$ to $p_i$ that goes through exactly $(l-1)$ vertices.
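The recurrence for the invocation times can be evaluated directly. The following is a minimal sketch, assuming a dense matrix in which `A[i][j]` holds the delay on the edge from p_j to p_i and negative infinity marks a missing edge; the names `Matrix`, `maxplus_step`, and `invocation_times` are illustrative and do not come from the slides.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <vector>

// NEG_INF marks "no edge from p_j to p_i" (assumed encoding).
const double NEG_INF = -std::numeric_limits<double>::infinity();
using Matrix = std::vector<std::vector<double>>;

// One step of X(k) = A (x) X(k-1) in (max, +) algebra:
// x_i(k) = max over j of ( A[i][j] + x_j(k-1) ).
std::vector<double> maxplus_step(const Matrix& A, const std::vector<double>& x) {
    std::vector<double> next(x.size(), NEG_INF);
    for (std::size_t i = 0; i < A.size(); ++i)
        for (std::size_t j = 0; j < x.size(); ++j)
            next[i] = std::max(next[i], A[i][j] + x[j]);
    return next;
}

// Start times of the k-th invocation: X(k) = A^k (x) X(0).
std::vector<double> invocation_times(const Matrix& A, std::vector<double> x0, int k) {
    for (int step = 0; step < k; ++step) x0 = maxplus_step(A, x0);
    return x0;
}
```

For two processes that enable each other with delays 2 and 3, `A = {{NEG_INF, 3}, {2, NEG_INF}}` and `X(0) = (0, 0)` give `X(2) = (5, 5)` and `X(4) = (10, 10)`: every two invocations take 5 time units, an average of 2.5 per invocation, which is the mean delay of the only cycle.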
17. Critical Cycles
- A k-critical path can be transformed into the following canonical form w.r.t. any critical cycle that it touches
- For sufficiently large k, one can always find a k-critical path that touches a critical cycle.
- Finally, there exists N such that for k > N the canonical stems of k-critical and (k+L)-critical paths are identical.
[Figure: canonical form of a k-critical path. Label: Stem.]
18. Rate of Process Execution for Single SCC
- Consider a strongly connected process graph and let $x_i(k)$ represent the sequence of time instances for invocation of process $p_i$. There exists an integer N such that
  - 1. The sequence $x_i(k) - x_i(k-1)$ is periodic for k > N,
  - 2. If the period of the sequence of inter-execution times is P, then for l > N [equation not recovered from the slide], where $\lambda$ is the unique eigenvalue of the process adjacency matrix.
$\lambda$ is obtained by computing the maximum cycle mean in the process graph.
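One standard way to compute a maximum cycle mean is Karp's algorithm, adapted from its usual minimum form. The slides do not say which algorithm the tools use, so the following is only a sketch of that textbook technique; the `Edge` record and the function name are hypothetical, and the graph is assumed to be strongly connected with vertices numbered 0 to n-1.

```cpp
#include <algorithm>
#include <limits>
#include <vector>

struct Edge { int from, to; double delay; };  // hypothetical edge record

// Karp's algorithm, adapted to the maximum cycle mean of a strongly
// connected graph with n vertices.
double max_cycle_mean(int n, const std::vector<Edge>& edges) {
    const double NEG_INF = -std::numeric_limits<double>::infinity();
    // D[k][v] = maximum total delay of a walk with exactly k edges from vertex 0 to v.
    std::vector<std::vector<double>> D(n + 1, std::vector<double>(n, NEG_INF));
    D[0][0] = 0.0;
    for (int k = 1; k <= n; ++k)
        for (const Edge& e : edges)
            if (D[k - 1][e.from] != NEG_INF)
                D[k][e.to] = std::max(D[k][e.to], D[k - 1][e.from] + e.delay);

    // Maximum cycle mean = max over v of min over k of (D[n][v] - D[k][v]) / (n - k).
    double best = NEG_INF;
    for (int v = 0; v < n; ++v) {
        if (D[n][v] == NEG_INF) continue;
        double worst = std::numeric_limits<double>::infinity();
        for (int k = 0; k < n; ++k)
            if (D[k][v] != NEG_INF)
                worst = std::min(worst, (D[n][v] - D[k][v]) / (n - k));
        best = std::max(best, worst);
    }
    return best;
}
```

For a two-process cycle with delays 2 and 3 the function returns 2.5. Adding a self loop of delay 4 to a three-process ring with unit delays makes the self loop the critical cycle, and the result becomes 4.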
19. Rate Analysis on Process Graphs with Multiple SCCs
- Find strongly connected components
  - component graph (in linear time)
- For each SCC determine achievable lower and upper bounds on the rate of execution
- Compute effective rate intervals using the following result
  - Let P and C be two SCCs in a process graph with enabling edge(s) from P to C. Let $[p_l, p_u]$ and $[c_l, c_u]$ be rate intervals for P and C respectively. The effective rate interval for C is then given as
  - $c_l = \min\{p_l, c_l\}$ and $c_u = \min\{p_u, c_u\}$
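The effective-interval rule is a pair of minimum operations. A minimal sketch, with `RateInterval` and `effective_rate` as hypothetical names introduced only for illustration:

```cpp
#include <algorithm>

struct RateInterval { double lower, upper; };  // hypothetical type for [lower, upper]

// Effective rate interval of SCC C when SCC P has enabling edge(s) into C:
// both bounds of C are capped by the corresponding bounds of P.
RateInterval effective_rate(const RateInterval& p, const RateInterval& c) {
    return { std::min(p.lower, c.lower), std::min(p.upper, c.upper) };
}
```

For example, a producer interval [2, 5] and a consumer interval [3, 4] yield the effective consumer interval [2, 4]. One reading of the rule, offered here as interpretation rather than as a statement from the slides, is that an enabled component cannot sustain a higher rate than the component that enables it.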
20. Interactive Analysis Framework [Dasdan, Mathur, Gupta, EDTC97]
[Figure: analysis flowchart. Input: Process graph, Rate constraints. Decisions: Constraints consistent?, Constraints satisfied? (twice), Self loops in crit. cycles? Actions: Rate Analysis, Redesign processes in critical cycles, Pipeline processes. Branch labels: YES, NO. Exits: STOP.]
Prototype tool available: RATAN
21. Constraint Consistency
- Rate constraints are specified by an interval within which the rate of execution must lie
  - $I_i = [L_i, U_i]$
- A set of rate constraints that define the constraint intervals $I_i$ for each process $p_i$ in a process graph is inconsistent iff
  - there exists an SCC, $C_j$, such that [condition not recovered from the slide]
  - or there exists a component $C_j$ for which transitive rate interval [condition not recovered from the slide]
22. Analysis Steps
- Step 1: Process graph and rate constraints
- Step 2: Check constraint consistency
- Step 3: Rate analysis to determine effective rates
  - use monotonicity lemma for bounds checking
  - determine critical cycles
  - effective rates in presence of predecessor constraints
- Step 4: Check constraint satisfiability
  - must subsume
- Step 5: Process critical cycles
  - determine self loops.
23. Use of Timing Analysis
- Use timing analysis to drive the embedded system design
  - Determine timing properties of tasks in the system
  - Determine timing properties of a possible implementation
  - Select the right partition between hardware and software
  - Determine how the choices made during the design flow affect the system's timing behavior
- Use timing analysis to validate the embedded system
  - Verify if an implementation is possible under the imposed timing constraints
  - Verify if the choices made during the design flow meet the timing constraints
24. Rate-Based Static Timing Analysis
- Problems addressed
  - Validation of end-to-end timing constraints
  - Derivation of intermediate timing constraints
  - Validation of intermediate timing constraints (for cyclic portions)
- Solution: Rate-based static timing analysis
  - Rate derivation and validation using RADHA
    - Derive individual task rates from input task rates
    - Derive other intermediate constraints from task rates
    - Validate end-to-end timing constraints using intermediate constraints
  - Rate analysis using RATAN
    - Validate intermediate constraints for cyclic portions
25. Rate-Based Design Flow
26. Current Status of Implementation
- RADHA and RATAN are implemented
  - Timing constraint derivation, validation, debugging
  - Identification of critical tasks and timing constraint violations
- Work in progress
  - Develop techniques to tie the RADHA-RATAN framework to the other parts of the design flow
  - Develop techniques for faster simulation, e.g., localized simulation
  - Enhance interaction capabilities of the tools
  - Develop an interface to inputs in VHDL.
27. 2. Design Modeling: Semantic Necessities
- Abstraction
  - provide a mechanism for building larger systems by composing smaller ones
- Reactive programming
  - provide mechanisms to model non-terminating interaction with other components
  - watching (signal) and waiting (condition)
    - separate (else one is an implementation of the other)
  - exception handling
- Determinism
  - provide a predictable simulation behavior
- Simultaneity
  - model hardware parallelism, multiple clocks
28. Adding Reactivity
- Reactivity can be added in one of three ways
- 1. Use annotations, comments
  - commonly used in home-grown C-based HDLs
  - sometimes use semantic overloads that are associated with alternative interpretations.
- 2. Use library assists
  - additional library elements that can be used by the programmer in modeling hardware
  - example: classes in C++ or Java
- 3. Use additional language constructs
  - new constructs require a specific language front-end, new debugging tools.
- Example: divide operations across cycles using next()
29. Pragmatics
- Accept a suitable subset of C input
- Produce synthesizable output
- Package hardware analogues and assists in a library
  - usable type system for a bit-true representation
  - processes, components, wait and watching (W&W), exception handling
  - multiple logic levels (2-state, 4-state)
- Eliminate or minimize interpretation
30. Using C++ For Synthesis and Simulation
- Scenic [Liao, Tjiang, Gupta, DAC97] is a language for designing hardware
  - A set of classes and functions in C++
- Digital hardware consists of a set of processes
  - plus control information (version, library, etc.)
- A Process
  - has an associated clock
  - waits on signals and conditions
  - may have several user-specified exception behaviors
    - pass-through
    - clock-stretch (suspend)
    - power-down
    - initialize
    - scan-chain ...
31. Scenic Concepts
- A module declares a collection of
  - modules and processes
- A process describes a specific type of hardware implementation
  - combinational
  - sequential: synchronous, asynchronous
- Processes may react
  - to synchronization conditions
  - to watching conditions
- Time is logical
  - time associated with execution progress
  - not real time.
32. Wait() and Watching()
- Wait()
  - synchronize to the next clock edge
  - implemented on threads using a mutex variable
- Watching()
  - can be implemented using wait() but would be expensive
  - represents asynchronous signals and events
  - in the synchronous model, these are evaluated by the clock process
  - encapsulated in a lambda object that contains a pointer to the waiting process
33. Methodological Issues: Synthesis
- Watching is restricted to a single signal
  - flip-flop implementation
- Exception behavior may not be synthesizable
  - models (low-level) hardware
- Multiple or two-phase clocks
  - a process can only be operated off one clock edge
34. Going from C to HDL
- Restricted C description
- Refine data types: bit-true, fixed point, saturation arithmetic
- Add reactivity: clock(s), waiting and watching
- HDL description
[Figure labels: CONTROL, DATA]
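As an illustration of the data-type refinement step, the following sketch shows saturation arithmetic on a 16-bit signed value, the kind of bit-true behavior a fixed-point hardware datapath would exhibit in place of C's wrap-around or undefined overflow. The function name `sat_add16` and the 16-bit width are assumptions made for the example and are not taken from the slides.

```cpp
#include <cstdint>
#include <limits>

// Saturating addition on 16-bit signed values (for example, Q15 fixed point):
// results outside the representable range clamp to the nearest limit.
std::int16_t sat_add16(std::int16_t a, std::int16_t b) {
    // Compute in a wider type so the intermediate sum cannot overflow.
    std::int32_t sum = static_cast<std::int32_t>(a) + static_cast<std::int32_t>(b);
    if (sum > std::numeric_limits<std::int16_t>::max())
        return std::numeric_limits<std::int16_t>::max();
    if (sum < std::numeric_limits<std::int16_t>::min())
        return std::numeric_limits<std::int16_t>::min();
    return static_cast<std::int16_t>(sum);
}
```

Here `sat_add16(30000, 10000)` clamps to 32767 and `sat_add16(-30000, -10000)` clamps to -32768, while in-range sums are returned unchanged.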
35. Adding Reactivity
- Use class scenic_process for synthesizable modules
- Add clock(s)
  - Divide operations across cycles using next()
  - BC restrictions: loops, balanced branches
- Distinguish expression types
  - expr and lambda-expr are syntactically different
  - lambda-exprs are delay-evaluated
- wait_until and watching
  - use only lambda-expr
- use expr in control flow
- watching declared in process instantiation
- Example
36. Example: Wait and Watching (W&W)
Blocking:
    scenic_signal<> a;
    wait_until(a == 1);
    block
Non-blocking:
    scenic_signal<> a;
    if (a.read() == 1) block
Concurrent watching:
    watching(a == 1);
    try { normal_block }
    catch (...) { if (a.read() == 1) exception_block }
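The difference between blocking on a condition and watching one can be imitated in plain C++. The sketch below is not the Scenic API: `Process`, `WatchTriggered`, and `run_with_watching` are hypothetical names, the signal follows a scripted per-cycle trace instead of real hardware, and everything runs in a single thread. It only shows the idea that a delay-evaluated condition is re-checked at each clock edge and that a watched condition interrupts the normal block through an exception.

```cpp
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <vector>

struct WatchTriggered {};  // hypothetical exception type for a fired watch

// Hypothetical single-threaded stand-in for a clocked process.
struct Process {
    std::vector<int> trace;         // value of signal a in each clock cycle
    std::size_t cycle = 0;          // current clock cycle
    std::function<bool()> watched;  // delay-evaluated watching condition (may be empty)

    int read() const { return trace[cycle]; }

    // Advance to the next clock edge, then evaluate the watched condition.
    void wait() {
        if (cycle + 1 >= trace.size()) throw std::out_of_range("trace exhausted");
        ++cycle;
        if (watched && watched()) throw WatchTriggered{};
    }

    // Blocking wait: the condition is re-evaluated once per clock cycle.
    void wait_until(const std::function<bool()>& cond) {
        while (!cond()) wait();
    }
};

// Runs `cycles` clock cycles of a normal block; returns the cycle at which
// the watched condition interrupted it, or -1 if it ran to completion.
int run_with_watching(Process& p, int cycles) {
    try {
        for (int i = 0; i < cycles; ++i) p.wait();  // normal_block
        return -1;
    } catch (const WatchTriggered&) {
        return static_cast<int>(p.cycle);           // exception_block
    }
}
```

With the trace `{0, 0, 1, 1}`, `wait_until` on `read() == 1` returns in cycle 2. With the trace `{0, 0, 0, 1, 0}` and the same condition installed as `watched`, a four-cycle normal block is interrupted in cycle 3.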
37. Adding Data Types
- Identify signals
  - storage elements, structured memory blocks
- Size variables: signed, unsigned, std_logic
- Identify expressions
  - lambda_expr: integer, std_logic
  - expr: integer, std_logic, signed, unsigned
- Declare sizes on process state variables on instantiation
  - (var, signal) x (scalar, array, 2d-array)
38. Language Comparisons
- Verilog, VHDL: compiler produces inputs to run a discrete-event simulation (DES) simulator.
- Esterel: compiler produces a single deterministic FSM.
- Scenic: compiler produces (synthesizable) processes and a simulator.
39. Summary
- Co-design is an approach to develop CAD tools for design of embedded computing systems.
  - Language-level modeling and analysis
  - Rate derivation using external timing constraints: RADHA
  - Interactive design explorations using rate analysis: RATAN
  - HDL pre-synthesis optimizations using assertions: PUMPKIN
  - Synthesis into C/C++ models: SCENIC
- Significant challenges remain in system integration
  - probably no silver-bullet solution, but a combination of methodology, libraries and standards will emerge.
Build custom systems using commodity parts. Use customization to achieve competitive price/performance (testability) advantages.
40. Related Efforts
- System models
  - Karp and Miller [SIAM66], Lee and Messerschmitt [IEEE87], Buck and Lee [IEEE93].
  - Jeffay [ACM93], Yen [PhD96], Puchol and Mok [IEEE94].
  - Gillies and Liu [SIAM95], Yakovlev et al. [FMSD96].
  - Petri nets [IEEE89].
- Rate-based analysis
  - Gerber et al. [IEEE95], Seto et al. [IEEE96], Burns et al. [IPL96].
- Interaction
  - Wegner [CACM97].
- Design of complex systems
  - Rechtin [IEEE97].