EE201A Lecture 5 - PowerPoint PPT Presentation

About This Presentation
Title:

EE201A Lecture 5

Slides: 45
Provided by: ingridver
Category:
Tags: ee201a | lecture


Transcript and Presenter's Notes

Title: EE201A Lecture 5


1
EE201A - Lecture 5
Memory management
2
Motivation
  • Modeling of multi-dimensional arrays
  • Memory management

3
Motivation
  • Step 1: model, or representation, of arrays
  • Step 2: given a model,
  • Find a feasible implementation (schedule)
  • Optimize, e.g., memory size, memory access, memory
    latency, power or energy consumption

4
Problem formulation
  • Multi-dimensional periodic scheduling (MPS)
  • Given a signal flow graph G,
  • With, for each operation, lower and upper bounds
    on each of its periods and on its start time
  • With, for each operation, a cost factor
  • Find a schedule s that satisfies
  • Timing constraints
  • PU constraints
  • Precedence constraints
  • MPS is NP-hard

5
Approach
  • Decompose into two stages
  • First stage: Period Assignment
  • Given: SFG G
  • Find: period assignment p
  • Second stage: Fixed-period Multidimensional
    Periodic Scheduling
  • Given: SFG G, fixed period vector,
  • lower and upper bounds on the start time
  • Find: start time assignment s and PU assignment
    (W,h)
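As a rough illustration of the second stage, the sketch below assumes the common affine model in which execution i of an operation starts at s + p·i. The slides do not spell out this formula, so treat it as an assumption. It checks the start-time bounds and one precedence constraint by plain enumeration. All operation names, periods, bounds, and execution times are hypothetical.

```python
from itertools import product

# Hypothetical 2-D example: each operation has a fixed period vector p,
# iteration bounds, an execution time, and a start time s to be validated.
ops = {
    "produce": {"p": (8, 2), "bounds": (2, 4), "exec": 1, "s": 0},
    "consume": {"p": (8, 2), "bounds": (2, 4), "exec": 1, "s": 1},
}
start_bounds = {"produce": (0, 0), "consume": (0, 3)}  # (lower, upper) on s


def start_time(op, i):
    """Assumed affine model: t(i) = s + sum_k p[k] * i[k]."""
    return op["s"] + sum(pk * ik for pk, ik in zip(op["p"], i))


def iterations(op):
    """All iteration vectors within the operation's iteration bounds."""
    return product(*(range(n) for n in op["bounds"]))


def timing_ok():
    """Every start time lies within its lower and upper bound."""
    return all(lo <= ops[name]["s"] <= hi
               for name, (lo, hi) in start_bounds.items())


def precedence_ok(src, dst):
    """Execution i of dst may start only after execution i of src finished."""
    return all(
        start_time(ops[dst], i) >= start_time(ops[src], i) + ops[src]["exec"]
        for i in iterations(ops[src])
    )


feasible = timing_ok() and precedence_ok("produce", "consume")
```

Enumeration only works for small iteration spaces; it is meant to make the constraints concrete, not to suggest how a real MPS solver operates.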

6
More Examples II: DCT
7
Reading
  • F. Catthoor, K. Danckaert, S. Wuytack, N. Dutt,
    "Code transformations for data transfer and
    storage exploration preprocessing in multimedia
    processors," IEEE Design & Test of Computers,
    May-June 2001, pp. 70-82.
  • P. Panda, F. Catthoor, N. Dutt, et al., "Data and
    memory optimization techniques for embedded
    systems," ACM Transactions on Design Automation
    of Electronic Systems, vol. 6, no. 2, April 2001,
    pp. 149-206 (sections 1 and 3 distributed today)
  • Class presentation:
  • P. Murthy, E. A. Lee, "Multidimensional
    synchronous dataflow," IEEE Transactions on
    Signal Processing, vol. 50, no. 7, July 2002
  • (distributed today)

8
Why is memory important?
Memory access is just another instruction...
...so why treat memory differently?
(according to the Hennessy/Patterson book)
9
Memory Performance Bottleneck
10
Impact on Processor Pipeline
Clock cycle determined by slowest pipeline stage
11
Memory Hierarchy
  • To retain a short clock cycle, we keep only a small
    memory in the pipeline
  • This leads to a memory hierarchy

Main Memory
12
Impact of Memory Architecture Decisions
  • Area
  • 50-70% of ASIC/ASIP area may be memory
  • Performance
  • 10-90% of system performance may be memory
    related
  • Power
  • 25-40% of system power may be memory related

13
Power Distribution in CMOS LSIs
Source: Sakurai & Kuroda, EDTC '97, Tut. D, "Low
power Circuit Design Multimedia LSIs"
14
Memory Power Bottleneck
[Figure: block diagram with PROC, DP, MMU, two SRAMs, embedded DRAM, and EXTERNAL MEMORY]
P(Ext. Access) typ. 30 x P(Arithmetic Operations)
P(Int. Memory) typ. 40-60% of P(Chip)
15
Important Memory Decisions in Embedded Systems
  • What is a good memory architecture for an
    application?
  • Total memory requirement
  • Delay due to memory
  • Power dissipation due to memory access
  • Compiler and Synthesis tool (Exploration tools)
    should make informed decisions on
  • Registers and Register files
  • Cache parameters
  • Number and size of memory banks

16
Outline
  • Model of Registers
  • single registers
  • register files
  • number of registers
  • number of register files
  • next on-chip memory
  • off-chip memory

17
Design sequence
[Figure: design flow. Spec -> (multidimensional arrays) -> background memory management -> (scalars) -> foreground memory management as part of HW or SW]
18
Embedded Systems Path to Implementation
Specification/ Program
HW/Software Partitioning
HW
SW
  • Synthesis Flow
  • High Level Synthesis
  • RTL Synthesis
  • Logic Synthesis
  • Compiler Flow
  • Parsing
  • Optimizations
  • Code Generation

19
High Level Synthesis
  • Under Constraints
  • Total Delay
  • Limited Resources

20
High Level Synthesis Scheduling
[Figure: spec computing Y from A and B, Z from C and D, and X from Y and Z, shown as a DFG and as a scheduled DFG]
Assign clock cycles to operations
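To make "assign clock cycles to operations" concrete, here is a minimal sketch of as-soon-as-possible (ASAP) scheduling, one simple scheduling strategy. The DFG reuses the slide's variable names, but the dependency structure (Y from A and B, Z from C and D, X from Y and Z) is an assumed reading of the figure, and every operation is assumed to take one cycle.

```python
# Assumed DFG: each operation maps to the values it consumes.
# A, B, C, D are primary inputs and do not appear as keys.
dfg = {"Y": ["A", "B"], "Z": ["C", "D"], "X": ["Y", "Z"]}


def asap(dfg):
    """Place each operation in the earliest cycle after all its producers."""
    cycle = {}

    def visit(op):
        if op not in dfg:
            return 0  # primary input: available before cycle 1
        if op not in cycle:
            cycle[op] = 1 + max(visit(src) for src in dfg[op])
        return cycle[op]

    for op in dfg:
        visit(op)
    return cycle


schedule = asap(dfg)
```

ASAP ignores resource limits; under a resource constraint a scheduler may have to push an operation into a later cycle.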
21
High Level Synthesis Resource Allocation and
Binding
[Figure: scheduled DFG (inputs A, B, C, D; intermediates Y, Z; output X) mapped through a resource library to an RTL implementation]
Assign resources to operations
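Binding can be sketched as handing out functional-unit instances cycle by cycle. The schedule, the operation types, and the library contents below are hypothetical (the transcript does not preserve the operators), and the first-free-instance policy is just one simple choice.

```python
from collections import defaultdict

schedule = {"Y": 1, "Z": 1, "X": 2}             # op -> clock cycle (assumed)
op_type = {"Y": "add", "Z": "add", "X": "add"}  # hypothetical operation types
library = {"add": 2}                            # hypothetical: two adders available


def bind(schedule, op_type, library):
    """Give each operation the first free unit of its type in its cycle."""
    used = defaultdict(int)  # (cycle, type) -> number of instances taken
    binding = {}
    for op in sorted(schedule, key=lambda o: (schedule[o], o)):
        kind = op_type[op]
        key = (schedule[op], kind)
        if used[key] >= library[kind]:
            raise ValueError(f"cycle {schedule[op]} needs more '{kind}' units")
        binding[op] = f"{kind}{used[key]}"
        used[key] += 1
    return binding


binding = bind(schedule, op_type, library)
```

Here Y and Z run in the same cycle and therefore need two distinct adders, while X can reuse the first one in the next cycle.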
22
Registers in High Level Synthesis
[Figure: scheduled DFG with inputs A, B, C, D, with registers marked]
Resource constraint: 2 adders
23
Register Access Model
Register Read -> Operation -> Register Write
24
Limitation of Registers
  • Complex Interconnect
  • Every register connects to every FU

[Figure: registers R1-R4, each connected to each of the functional units FU1-FU4]
Compare to a VLIW crossbar network
25
Register Files
R1
R2
R3
  • Modular architecture
  • Limited connectivity
  • New optimization opportunities

26
Access Model of Register Files
Register File
27
Life-time of Variables
Register optimization technique
Lifetime: from definition to last use of a variable
[Figure: six-statement code fragment (operands: x y z / a x 1 / p y 2 / q z p / r p 3 / k q r) with lifetime bars for x, y, z, p, q, r]
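The lifetime definition can be sketched in a few lines. The statement list below is an assumed reading of the slide's code fragment, with each statement taken as "destination = operand op operand"; the operators did not survive in the transcript and are not needed for lifetimes. Inputs that are never assigned are treated as defined at step 0, which is also an assumption.

```python
# Assumed reading of the fragment: (destination, operand, operand).
code = [
    ("x", "y", "z"),
    ("a", "x", 1),
    ("p", "y", 2),
    ("q", "z", "p"),
    ("r", "p", 3),
    ("k", "q", "r"),
]


def lifetimes(code):
    """Map each variable to (definition step, last-use step)."""
    life = {}
    for step, (dest, *operands) in enumerate(code, start=1):
        for v in operands:
            if isinstance(v, str):  # skip numeric constants
                defined = life.get(v, (0, step))[0]
                life[v] = (defined, step)
        life.setdefault(dest, (step, step))
    return life


life = lifetimes(code)
```

The sketch assumes each variable is assigned at most once, as in the fragment above.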
28
Conflict Graph of Life-times
[Figure: conflict graph with vertices x, y, z, p, q, r]
29
Coloring the Conflict Graph
Minimum number of registers = chromatic number of
conflict graph
[Figure: colored conflict graph with vertices x, y, z, p, q, r]
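A small sketch of building and coloring a conflict graph follows. The lifetimes are hypothetical (definition step, last-use step) pairs, and two variables are assumed to conflict when their lifetimes share at least one step. Greedy coloring is a heuristic and does not reach the chromatic number in general; for interval lifetimes visited in order of start time it does.

```python
# Hypothetical lifetimes: variable -> (definition step, last-use step).
life = {"x": (1, 2), "y": (0, 3), "z": (0, 4),
        "p": (3, 5), "q": (4, 6), "r": (5, 6)}


def conflict_graph(life):
    """Connect two variables when their lifetimes overlap."""
    graph = {v: set() for v in life}
    names = sorted(life)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if max(life[a][0], life[b][0]) < min(life[a][1], life[b][1]):
                graph[a].add(b)
                graph[b].add(a)
    return graph


def greedy_coloring(graph, order):
    """Give each vertex the lowest color not used by a colored neighbor."""
    color = {}
    for v in order:
        taken = {color[n] for n in graph[v] if n in color}
        color[v] = next(c for c in range(len(graph)) if c not in taken)
    return color


graph = conflict_graph(life)
order = sorted(life, key=lambda v: (life[v][0], v))  # by start time
registers = greedy_coloring(graph, order)
num_registers = len(set(registers.values()))
```

Each color stands for one register, so variables with the same color share a register.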
30
Coloring determines Register Allocation
[Figure: lifetimes of x, y, z, p, q, r before and after register allocation by coloring]
31
Minimizing Register Count
  • Graph Colouring is NP-complete
  • Heuristics (Growing clusters)
  • Polynomial time solution exists for straight line
    code (no branches)
  • Left-edge algorithm
  • Possible to incorporate other factors
  • Interconnect cost annotated as edge-weight
  • Overview paper: L. Stok, J. Jess, "Foreground
    memory management in data path synthesis," Int.
    J. of Circuit Theory and Applications, vol. 20,
    no. 3, pp. 235-255, 1992.
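The left-edge algorithm mentioned above can be sketched as follows: sort lifetimes by their left edge (start), then fill one register at a time with the next lifetime that starts no earlier than the previous one ends. The lifetimes are hypothetical, and a register is assumed to be reusable in the same step in which its previous value is last read.

```python
# Hypothetical lifetimes: variable -> (definition step, last-use step).
life = {"x": (1, 2), "y": (0, 3), "z": (0, 4),
        "p": (3, 5), "q": (4, 6), "r": (5, 6)}


def left_edge(life):
    """Pack lifetimes into registers, one register at a time."""
    pending = sorted(life, key=lambda v: (life[v][0], v))  # by left edge
    registers = []
    while pending:
        track, free_at, rest = [], None, []
        for v in pending:
            start, end = life[v]
            if free_at is None or start >= free_at:
                track.append(v)  # fits after the previous lifetime
                free_at = end
            else:
                rest.append(v)   # overlaps: try the next register
        registers.append(track)
        pending = rest
    return registers


allocation = left_edge(life)
```

For straight-line code the lifetimes are intervals, which is why this polynomial-time packing reaches the minimum register count.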

32
Register Files/Multiport Memories
  • Scalar approach infeasible for 100s of registers
  • interconnect delays dominate
  • Need to store variables in Register Files
  • Limited Bandwidth

Problem: How to do register allocation
to register files efficiently?
33
Which variables go into Multiport Memory?
Problem: Given a schedule and a multiport memory,
which variables should be stored in the memory?
Schedule:
  State1: R1 R2 R3
  State2: R2 R1 R1
Which registers should go into the dual-port memory?
34
ILP Formulation
Maximize x1 + x2 + x3 (maximize regs stored in memory)
Constraints:
  x1 + x2 + x3 <= 2 (State1: max 2 parallel accesses)
  x1 + x2 <= 2 (State2: max 2 parallel accesses)
Solution: x1 = 1, x2 = 1, x3 = 0 (store R1 and R2
in memory)
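For a problem this small the 0/1 program can be checked by brute force. The sketch below enumerates every assignment instead of calling an ILP solver; the variable and state names follow the slide, while the helper names are made up for illustration.

```python
from itertools import product

regs = ["R1", "R2", "R3"]
# Registers accessed in each state, as used in the constraints above.
accesses = {"State1": ["R1", "R2", "R3"], "State2": ["R1", "R2"]}
PORTS = 2  # dual-port memory: at most 2 parallel accesses per state


def feasible(x):
    """Registers placed in the memory must fit the port count in every state."""
    return all(sum(x[r] for r in used) <= PORTS for used in accesses.values())


candidates = [dict(zip(regs, bits))
              for bits in product((0, 1), repeat=len(regs))]
best = max(sum(x.values()) for x in candidates if feasible(x))
optimal = [x for x in candidates if feasible(x) and sum(x.values()) == best]
```

With these constraints the optimum stores two registers, and storing R1 and R2 is one of three equally good choices.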
35
Intermediate conclusion
  • Memory management is important
  • Two main types
  • background memory optimization
    (multidimensional arrays)
  • foreground memory optimization (scalars)
  • Foreground memory
  • registers: graph coloring
  • register files and limited access
  • model of individual read/write operations to SRAM

36
Motivation for SRAM
  • Limitation of Register File
  • OK for scalar variables
  • NOT OK for array variables
  • Need to handle large address space
  • But retain fast access to scalar variables

37
SRAM Access
[Figure: SRAM connected to data buses]
38
SRAM-based Architecture
Address
Data
Similar to a processor, but predictability is necessary
39
Memory Model in HLS
Multicycle operations
[Figure: memory read and memory write, each shown as a multicycle operation with Address and Data]
40
Behavioral Templates
[Figure: template with operations A, B, C, D spread over Stage 1, Stage 2, Stage 3]
1. Defines precedence constraints between stages
2. Templates are used directly by scheduler
41
Templates for Memory Access
[Figure: 3-cycle MEMORY WRITE and 3-cycle MEMORY READ templates, each with Stages 1-3 and Address and Data]
42
Using Memory Templates
[Figure: memory template with Address, placed over Cycle 1 and Cycle 2]
  • Operation can be scheduled into Cycle 1
  • No change to scheduling algorithm
  • Used in Synopsys Behavioral Compiler
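One way to picture a memory template is as a fixed sequence of stages, each reserving some resources, that the scheduler places as a single unit. The sketch below is only an illustration: which bus is busy in which stage is an assumption, the resource names are hypothetical, and it does not describe how Synopsys Behavioral Compiler works internally.

```python
# Hypothetical 3-cycle templates: stage k lists the resources it occupies.
READ = [{"addr_bus"}, set(), {"data_bus"}]
WRITE = [{"addr_bus", "data_bus"}, set(), set()]


def earliest_start(template, busy, not_before=1):
    """First cycle >= not_before at which every stage finds its resources free."""
    start = not_before
    while any(stage & busy.get(start + k, set())
              for k, stage in enumerate(template)):
        start += 1
    return start


def place(template, busy, not_before=1):
    """Reserve the template's resources and return its start cycle."""
    start = earliest_start(template, busy, not_before)
    for k, stage in enumerate(template):
        busy.setdefault(start + k, set()).update(stage)
    return start


busy = {}                        # cycle -> resources already reserved
first_read = place(READ, busy)
second_read = place(READ, busy)  # overlaps with the first read where possible
write = place(WRITE, busy)
```

Because a template is just a set of per-cycle reservations, the two reads overlap in time even though each takes three cycles.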

43
Ordering and bandwidth reduction
[Figure: graph of memory accesses RA, RB, RC, RC, RA, WB, RD, WC, WD, WA with read/write dependencies]
44
Schedule in 6 cycles
[Figure: access order A B B C C D D A A C B over time, using 3 single-port memories]
[Figure: access order A B B C C D D A C B A over time, using 2 single-port memories (B D / A C)]
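The effect of access ordering on the number of memories can be sketched with a conflict graph over arrays. The sketch assumes that each of the two access orders is grouped into six cycles of at most two parallel accesses, which is a guess at how the figure is laid out, and that two arrays accessed in the same cycle cannot share a single-port memory. The search is brute force and only suitable for a handful of arrays.

```python
from itertools import product

# Assumed grouping of the two access orders into 6 cycles each.
order_1 = ["AB", "BC", "CD", "DA", "AC", "B"]
order_2 = ["AB", "BC", "CD", "DA", "CB", "A"]


def conflicts(schedule):
    """Pairs of arrays accessed in the same cycle."""
    edges = set()
    for cycle in schedule:
        arrays = sorted(set(cycle))
        edges.update((a, b) for i, a in enumerate(arrays)
                     for b in arrays[i + 1:])
    return edges


def min_memories(schedule):
    """Smallest number of single-port memories, with one valid assignment."""
    arrays = sorted({a for cycle in schedule for a in cycle})
    edges = conflicts(schedule)
    for k in range(1, len(arrays) + 1):
        for colors in product(range(k), repeat=len(arrays)):
            assign = dict(zip(arrays, colors))
            if all(assign[a] != assign[b] for a, b in edges):
                return k, assign
    return len(arrays), {a: i for i, a in enumerate(arrays)}


result_1 = min_memories(order_1)
result_2 = min_memories(order_2)
```

Under this reading, the first order puts A, B, and C in pairwise conflict and needs three memories, while the reordered accesses only need two, with A and C in one memory and B and D in the other.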