Evaluation of Offset Assignment Heuristics

1
Evaluation of Offset Assignment Heuristics
  • Johnny Huynh, Jose Nelson Amaral, Paul Berube
  • University of Alberta, Canada
  • Sid-Ahmed-Ali Touati
  • Universite de Versailles, France

2
Outline
  • Background
  • Traditional Approach to Offset Assignment
  • Simple Offset Assignment
  • Address-Register Assignment
  • Improving the Problem Model
  • Optimal Address-Code Generation
  • Memory Layout Permutations
  • Evaluating Current Heuristics
  • Methodology
  • Results
  • Conclusions and Future Work

4
Background
  • Digital Signal Processors (DSPs) have few general
    purpose registers
  • Program variables kept in memory
  • Address Registers (AR) used to access variables
  • After a variable is accessed, the AR can be
    auto-incremented (or decremented) by one word in
    the same cycle.

5
Processor Model
  • Texas Instruments TMS320C54X DSP family
  • Accumulator-based DSP
  • 8 Address Registers
  • Initializing an address register requires 2
    cycles of overhead
  • Explicit address computations require 1 cycle of
    overhead
  • Using auto-increment (or auto-decrement) has no
    overhead.
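
The cycle counts above can be turned into a small cost function for a single address register. This is only a sketch of the model as stated on the slide, not of the TMS320C54X instruction set: the name `single_register_overhead`, the list-of-variables layout, and the assumption that re-accessing the same word is free are illustrative choices.

```python
def single_register_overhead(accesses, layout, init_cost=2, explicit_cost=1):
    """Overhead cycles for one address register serving `accesses` in order.

    Assumed model: loading the register costs `init_cost` cycles, a move of
    at most one word (auto-increment/decrement, or no move) is free, and any
    longer jump needs one explicit address computation.
    """
    if not accesses:
        return 0
    offset = {var: i for i, var in enumerate(layout)}  # variable -> word offset
    cost = init_cost
    for prev, cur in zip(accesses, accesses[1:]):
        if abs(offset[cur] - offset[prev]) > 1:
            cost += explicit_cost
    return cost
```

With A and B adjacent the only overhead is the 2-cycle initialization; with one word between them the model charges one extra cycle.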

6
Processor Model: Example (add A and B, store
in accumulator)
Explicit address computation (memory 0x1000-0x1002, B two words after A)
  • AR0 ← &A
  • ACC ← *AR0
  • AR0 ← AR0 + 2
  • ACC ← ACC + *AR0
Auto-increment (memory 0x1000-0x1002, B in the word after A)
  • AR0 ← &A
  • ACC ← *AR0+
  • ACC ← ACC + *AR0
8
The Offset-Assignment Problem
  • Given k address registers and a basic block
    accessing n variables, find a memory layout that
    minimizes address-computation overhead.
  • How should the variables be placed in memory?
  • Which register should access each variable?

10
Traditional Approach to Offset Assignment
[Diagram: Basic Block → Generate Access Sequence → Access Sequence]
11
Traditional Approach: Simple Offset Assignment
(SOA)
  • In 1992, Bartley introduced the simplest form of
    the offset assignment problem
  • Given a single address register and basic block
    with n variables, find a memory layout that
    minimizes overhead.
  • Equivalent to finding a maximum weight path cover
    (NP-complete)
  • Many researchers have proposed heuristics for
    this problem
  • Liao et al. (1996)
  • Leupers and Marwedel (1996)
  • Sugino et al. (1996)
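
Because the exact problem is NP-complete, the heuristics pick path-cover edges greedily. The sketch below is written in the spirit of such greedy edge selection and does not reproduce any one of the cited papers: `greedy_path_cover`, its tie-breaking rule, and the returned triple format are assumptions made for illustration.

```python
from collections import Counter

def greedy_path_cover(accesses):
    """Pick heavy access-graph edges greedily so the result is a set of paths."""
    # Edge weight = number of times two distinct variables are accessed back to back.
    weights = Counter()
    for u, v in zip(accesses, accesses[1:]):
        if u != v:
            weights[frozenset((u, v))] += 1

    parent = {v: v for v in accesses}  # union-find, used to reject cycles
    degree = Counter()                 # a path vertex has at most 2 neighbours

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    chosen = []
    # Heaviest edges first; ties broken by variable name (an arbitrary choice).
    for edge, w in sorted(weights.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
        u, v = sorted(edge)
        if degree[u] < 2 and degree[v] < 2 and find(u) != find(v):
            parent[find(u)] = find(v)
            degree[u] += 1
            degree[v] += 1
            chosen.append((u, v, w))
    return chosen
```

Every consecutive access pair whose edge is chosen becomes free under auto-increment/decrement; the pairs left uncovered each need an explicit address computation.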

12
Simple Offset Assignment (SOA)
  • Fix the access sequence
  • Assume only one address register (k = 1)
  • Find an ordering of variables in memory (memory
    layout) that has minimum overhead.

Ex. Access Sequence: a d b e c f b e c f a d
[Figure: access graph on variables A-F (edges labeled with weight 2) and the resulting memory layout]
13
Simple Offset Assignment (SOA)
  • Create Access Graph G = (V, E)
  • V = variables
  • The weight of an edge is the number of times its
    two variables are accessed consecutively
  • A path defines a memory layout -- Find the
    Maximum Weight Path Cover
  • NP-Complete!
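
The access graph can be built in a few lines. This is a minimal sketch; the function name `access_graph` and the choice to key edges by a sorted pair of variable names are assumptions for illustration.

```python
from collections import Counter

def access_graph(accesses):
    """Return {(u, v): weight} for the undirected access graph of `accesses`."""
    edges = Counter()
    for u, v in zip(accesses, accesses[1:]):
        if u != v:  # a repeated access to the same variable adds no edge
            edges[tuple(sorted((u, v)))] += 1
    return edges
```

For the example sequence a d b e c f b e c f a d, the edges a-d, b-e, c-e and c-f get weight 2 and the remaining three edges get weight 1.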

Ex. Access Sequence: a d b e c f b e c f a d
[Figure: access graph on variables A-F (edges labeled with weight 2) and the resulting memory layout]
15
Traditional Approach: General Offset Assignment
(GOA)
  • Problem presented by Liao et al. in 1996.
  • Given k address registers, and a basic block with
    n variables, find an assignment of variables to
    address registers that minimizes the total
    overhead of all registers.
  • This problem formulation is more accurately
    described as Address-Register Assignment (ARA).
  • Consists of SOA problems, and is at least
    NP-hard.
  • Many researchers have proposed heuristics for
    address-register assignment
  • Leupers and Marwedel (1996)
  • Sugino et al. (1996)
  • Zhuang et al. (2003)

16
General Offset Assignment (GOA)
  • Fix the access sequence
  • Allow multiple address registers (k > 1)
  • Find an ordering of variables in memory (memory
    layout) that has minimum overhead.
  • Assign each variable to an address register to
    form access sub-sequences.
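
Once every variable is tied to one register, the sub-sequences follow mechanically. A minimal sketch, in which `split_by_register` and the dictionary `register_of` are hypothetical names:

```python
def split_by_register(accesses, register_of):
    """Split an access sequence into one sub-sequence per address register.

    `register_of` maps each variable to the register that performs all of
    its accesses (the restriction imposed by address-register assignment).
    """
    subsequences = {}
    for var in accesses:
        subsequences.setdefault(register_of[var], []).append(var)
    return subsequences
```

Assigning a, b, c to one register and d, e, f to another splits a d b e c f b e c f a d into a b c b c a and d e f e f d.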

Ex. Access Sequence: a d b e c f b e c f a d
Sub-sequence 1: a b c b c a
Sub-sequence 2: d e f e f d
[Figure: access graph on variables A-F (edges labeled with weight 2)]
17
General Offset Assignment (GOA)
  • Each sub-sequence can be viewed as an independent
    SOA problem.
  • Solve each sub-sequence as an independent SOA
    problem.
  • More appropriate to call this problem the Address
    Register Assignment (ARA) problem.
  • Requires solving SOA instances, so is at least
    NP-hard.

Ex. Access Sequence: a d b e c f b e c f a d
Sub-sequence 1: a b c b c a
Sub-sequence 2: d e f e f d
[Figure: access graphs of the two sub-sequences]
19
Address-Code Generation
  • Recall that variables are assigned to address
    registers.
  • There is nothing left to decide: each address
    register has a defined sequence of accesses.
  • Imposes a restriction that all accesses to a
    variable are done by a single address register.

Ex. Access Sequence: a d b e c f b e c f a d
[Figure: access graphs and memory layouts of the two sub-sequences, one accessed by AR0 and the other by AR1]
26
Address-Code Generation
Ex. Access Sequence: a d b e c f b e c f a d
[Figure: final step of the animation over the memory layouts accessed by AR0 and AR1, labeled "Requires Explicit Address Computations"]
27
Traditional Approach to Offset Assignment
Access sequence: a d b e c f b e c f a d
Address Register Assignment produces one sub-sequence per address register (AR0 and AR1):
  • a b c b c a
  • d e f e f d
Simple Offset Assignment on each sub-sequence produces the sub-layouts:
  • a, b, c
  • d, e, f
29
Optimal Address-Code Generation
  • Given a fixed access sequence and memory layout,
    it is possible to generate optimal
    addressing-code in polynomial time
  • Minimum-Cost Circulation (Gebotys, 1997)
  • Minimum-Weight Perfect Matching (Udayanarayanan,
    2000)

30
Optimal Address-Code Generation
  • Build a network-flow graph
  • Vertices represent variable accesses
  • For each access ai that occurs before another aj,
    there is an edge (ai,aj) (not all shown in the
    graph).
  • Edges represent an opportunity for a register to
    access variables.
  • Each unit flow represents the accesses performed
    by an address register.
  • Optimal Address-Code is found by finding a
    minimum-cost circulation.
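
For tiny instances the same minimum can be cross-checked without a flow solver. The sketch below is not the minimum-cost circulation formulation; it is a plain dynamic program over register positions, assuming the cost model of the processor slide (2 cycles to initialize a register, 1 cycle for a jump of more than one word, free otherwise). The name `min_overhead` and the `UNINIT` sentinel are illustrative.

```python
UNINIT = -1  # marks an address register that has not been loaded yet

def min_overhead(accesses, layout, k, init_cost=2, explicit_cost=1):
    """Minimum overhead when any of k registers may serve each access."""
    offset = {var: i for i, var in enumerate(layout)}
    # State: sorted tuple of register positions -> cheapest cost to reach it.
    states = {(UNINIT,) * k: 0}
    for var in accesses:
        target = offset[var]
        next_states = {}
        for regs, cost in states.items():
            for i, cur in enumerate(regs):
                if cur == UNINIT:
                    step = init_cost          # first use of this register
                elif abs(cur - target) <= 1:
                    step = 0                  # auto-increment/decrement or no move
                else:
                    step = explicit_cost      # explicit address computation
                moved = tuple(sorted(regs[:i] + (target,) + regs[i + 1:]))
                total = cost + step
                if total < next_states.get(moved, float("inf")):
                    next_states[moved] = total
        states = next_states
    return min(states.values())
```

Under these assumptions, with two registers the layout b, c, a, d, e, f for the running example costs 4 cycles, while a, b, c, d, e, f costs 5. The state space grows quickly with k and the number of variables, which is why the polynomial-time flow formulations matter in practice.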

31
Traditional Approach to Offset Assignment
  • Access Sequence → Address Register Assignment
    (NP-Hard) → Sub-Sequences
  • Sub-Sequences → Simple Offset Assignment
    (NP-Complete) → Sub-Layouts
  • Sub-Layouts → Address-Code Generation (Solved, but
    not used!) → Address-Computation Overhead
32
Memory Layout Permutations (MLP)
  • Since optimal address-code generation algorithms
    exist, they can be applied after a memory layout
    is formed (by traditional approaches).
  • However, the traditional approach generates
    multiple sub-layouts that were originally assumed
    to be independent.
  • How is a single memory layout formed from a set
    of sub-layouts?

33
Memory Layout Permutations
  • Let Mi be a memory sub-layout.
  • Let Mi^r be the reciprocal of Mi (the same
    variables in reverse order).
  • Given an access sequence and m memory
    sub-layouts, arrange (M1 | M1^r), ..., (Mm | Mm^r), such
    that overhead is minimum when the sub-layouts are
    placed contiguously in memory.
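
Enumerating the candidates is straightforward: every ordering of the sub-layouts, each taken forwards or reversed. A minimal sketch, with `memory_layout_permutations` as a hypothetical name:

```python
from itertools import permutations, product

def memory_layout_permutations(sub_layouts):
    """All contiguous layouts built from the sub-layouts and their reversals."""
    layouts = []
    for order in permutations(sub_layouts):
        # Each sub-layout is placed either as given or in reverse order.
        for flips in product((False, True), repeat=len(order)):
            layout = []
            for sub, flip in zip(order, flips):
                layout.extend(reversed(sub) if flip else sub)
            layouts.append(tuple(layout))
    return layouts
```

For m sub-layouts this yields m! * 2^m layouts, so 8 for the two sub-layouts a, b, c and d, e, f. Each layout appears together with its mirror image, which has the same overhead when moves of one word in either direction are free.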

34
Memory Layout Permutations Example
Access sequence: a d b e c f b e c f a d
Address Register Assignment (this is an optimal address register assignment):
  • a b c b c a
  • d e f e f d
Simple Offset Assignment (these are optimal simple offset assignments):
  • a, b, c
  • d, e, f
All possible Memory Layout Permutations (all have cost > 4):
  • a, b, c, d, e, f
  • f, e, d, c, b, a
  • c, b, a, d, e, f
  • f, e, d, a, b, c
  • a, b, c, f, e, d
  • d, e, f, c, b, a
  • c, b, a, f, e, d
  • d, e, f, a, b, c
Optimal Layout b, c, a, d, e, f with cost 4 is not found.
36
Experimental Methodology: Evaluating the Solution
Space
  • Testcases are DSP code kernels from the UTDSP
    benchmark suite.
  • Use gcc to obtain access sequences.
  • The quality of a memory layout is evaluated using
    the minimum-cost circulation technique.
  • The entire solution space is found for each
    access sequence, to be used as a point of
    reference.
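
Exhaustive enumeration of a solution space can be sketched as follows. The slides score layouts with minimum-cost circulation; to stay self-contained this sketch substitutes a simple single-register cost (2 cycles to initialize, 1 per jump of more than one word), and the names `overhead` and `overhead_distribution` are hypothetical.

```python
from collections import Counter
from itertools import permutations

def overhead(accesses, layout):
    """Stand-in cost: one register, 2-cycle init, 1 cycle per non-adjacent move."""
    offset = {v: i for i, v in enumerate(layout)}
    jumps = sum(abs(offset[b] - offset[a]) > 1
                for a, b in zip(accesses, accesses[1:]))
    return 2 + jumps

def overhead_distribution(accesses, cost=overhead):
    """Histogram {overhead: number of layouts} over every possible layout."""
    variables = sorted(set(accesses))
    return Counter(cost(accesses, layout) for layout in permutations(variables))
```

For the sub-sequence a b c b c a, four of the six layouts cost 3 cycles and two cost 5 under this stand-in model. The number of layouts is n!, so this is only feasible for small kernels.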

37
Experimental Methodology: Evaluating Current
Heuristics
  • Identified and implemented three Address-Register
    Assignment heuristic algorithms
  • Leupers
  • Sugino
  • Zhuang

[Diagram: Access Sequence → ARA heuristic (Leupers, Sugino, Zhuang) → Sub-Sequences → SOA heuristic (Liao, Leupers, ALOMA, OFU, BB) → Sub-Layouts → Memory Layout Permutations → Memory Layouts → Compute overhead for each layout via Minimum-Cost Circulation → Distribution of overhead values]
38
Experimental Methodology: Evaluating Current
Heuristics
  • Identified and implemented five Simple Offset
    Assignment heuristic algorithms
  • Liao
  • Leupers
  • ALOMA
  • Order-First Use (OFU)
  • Branch and Bound (BB)

39
Experimental Methodology: Evaluating Current
Heuristics
  • Each combination of ARA and SOA algorithm
    generates a set of sub-layouts.
  • All possible memory layout permutations are
    generated, forming a set of memory layouts.
  • Each memory layout is evaluated using the
    Minimum-Cost Circulation technique.

40
Results
  • The 15 combinations of algorithms produce 15
    distributions of overhead values.
  • The distributions are aggregated into one
    distribution.
  • The aggregate distributions represent the
    solution space of all current algorithms.

41
Results
  • Memory layouts have a significant impact on
    overhead.
  • Some layouts have 100% higher overhead than the
    minimum.
  • Over 99% of all layouts have an overhead that is
    50% higher than the minimum.

42
Results
  • Memory layouts produced by traditional approaches
    have a large range of possible overhead values --
    sometimes the same as the entire solution space
    itself.
  • In some cases, no combination of ARA and SOA
    heuristics can produce an optimal layout.

44
Distribution of Overhead Values: Testcase
iir_arr_swp -- infinite impulse response filter
45
Exhaustive Solution Space: Testcase iir_arr_swp
-- infinite impulse response filter
46
Algorithmic Solution Space: Testcase iir_arr_swp
-- infinite impulse response filter
47
Efficiency of SOA Algorithms
  • For each SOA algorithm, combine with each of the
    3 ARA algorithms to generate 3 distributions of
    overhead values.
  • The distributions can be aggregated to form a
    single distribution.

52
Efficiency of SOA Algorithms: Testcase
iir_arr_swp -- infinite impulse response filter
54
Evaluating SOA Algorithms: Testcase latnrm_ptr --
normalized lattice filter
55
Efficiency of ARA Algorithms
  • For each ARA algorithm, combine with each of the
    5 SOA algorithms to generate 5 distributions of
    overhead values.
  • The distributions can be aggregated to form a
    single distribution.

58
Efficiency of ARA Algorithms: Testcase
iir_arr_swp -- infinite impulse response filter
60
Evaluating ARA Algorithms: Testcase latnrm_ptr --
normalized lattice filter
61
Evaluating Offset Assignment Algorithms
  • There is low variability between SOA algorithms
    -- may be attributed to small problem sizes.
  • The choice of ARA algorithm has more impact on
    overhead. Much of the variability is attributed to
    the different number of address registers used.
  • For all combinations of SOA and ARA algorithms,
    the permutation of sub-layouts affects the
    overhead.

63
Conclusions
  • The objective is to minimize address-computation
    overhead.
  • Given a fixed access sequence and memory layout,
    the minimum-cost circulation (MCC) technique can
    minimize overhead.
  • Offset assignment algorithms should be evaluated
    with MCC.
  • Offset assignment still has a significant impact
    on overhead.
  • To be effective, current offset assignment
    algorithms (ARA,SOA) must address the Memory
    Layout Permutation problem.

64
Future Work
  • A new algorithm is needed to generate memory
    layouts that will minimize overhead as computed
    by the Minimum-Cost Circulation technique.
  • Address-computation overhead must be minimized
    for loop bodies and for variables that are live
    between basic blocks and procedures.

65
References
  • Gebotys, C. DSP address optimization using a
    minimum cost circulation technique. Proceedings
    of the 1997 IEEE/ACM International Conference on
    Computer-Aided Design. 100-103.
  • Leupers, R., Marwedel, P. Algorithms for address
    assignment in DSP code generation. Proceedings of
    the 1996 IEEE/ACM International Conference on
    Computer-Aided Design. 109-112.
  • Liao, S., Devadas, S., Keutzer, K., Tjiang, S.,
    Wang, A. Storage assignment to decrease code
    size. ACM Transactions on Programming Languages
    and Systems 18(3) (1996). 235-253.
  • Sugino, N., Iimuro, S., Nishihara, A., Fujii, N.
    DSP code optimization utilizing memory addressing
    operation. IEICE Transactions on Fundamentals 8
    (1996). 1217-1223.
  • Zhuang, X., Lau, C., Pande, S. Storage
    assignment optimizations through variable
    coalescence for embedded processors. Proceedings
    of the 2003 ACM SIGPLAN Conference on Languages,
    Compilers, and Tools for Embedded Systems.
    220-231.
  • Bartley, D.H. Optimizing stack frame accesses
    for processors with restricted addressing modes.
    Software: Practice and Experience 22(2) (1992).
    158-172.

66
Questions?