A Low Energy Set-Associative I-Cache with Extended BTB

1
A Low Energy Set-Associative I-Cache with
Extended BTB
K. Inoue, V. Moshnyaga, and K. Murakami
2
Introduction
Increase in cache size
Power consumed in on-chip caches (share of total chip power)
  • DEC 21164 CPU: 25%
  • StrongARM SA-110 CPU: 43%
  • Bipolar ECL CPU: 50%
Kamble et al., Analytical Energy Dissipation
Models for Low Power Caches, ISLPED '97
Jouppi et al., A 300-MHz 115-W 32-b Bipolar
ECL Microprocessor, IEEE Journal of Solid-State
Circuits, 1993
3
Problem of Conventional Caches
4
Our Proposal
History-Based Tag-Comparison I-Cache
  • Attempts to reduce cache-access energy without
    performance degradation
  • Reuses tag-check results to eliminate unnecessary
    way activation
  • Can achieve a 62% energy reduction with only
    a 0.2% performance degradation

5
Conventional Tag-Check Scheme
Completely the same tag-check result!
6
History-Based Tag-Comparison (HBTC) Scheme
Attempts to reuse tag-check results produced
earlier within the same cache-miss interval!
A tag-check result can be reused when:
  • The target instruction has been referenced
    before, and
  • No cache miss has occurred since the previous
    reference.

Figure: timeline of one cache-miss interval (the time between two
misses). The first Ref. A performs a tag check; the second Ref. A
reuses the result.
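
The two reuse conditions on this slide can be illustrated with a minimal Python sketch. It assumes a hypothetical reference trace of (address, hit) pairs and uses a plain set in place of the stored tag-check results. The names `count_tag_checks` and `trace` are illustrative only, and the choice not to record a result on the missing reference itself is an assumption, not something the slide states.

```python
def count_tag_checks(trace):
    """Count performed and reused tag checks under the reuse rule.

    trace: iterable of (address, is_hit) pairs (hypothetical input format).
    A tag check is omitted only if the address was referenced before and
    no cache miss has occurred since that reference.
    """
    valid = set()   # addresses whose tag-check result is still reusable
    checks = 0      # tag checks actually performed
    reused = 0      # tag checks omitted thanks to reuse
    for addr, is_hit in trace:
        if addr in valid:
            reused += 1          # both conditions hold: reuse
            continue
        checks += 1              # conventional tag check
        if is_hit:
            valid.add(addr)      # remember the result for later references
        else:
            valid.clear()        # a miss invalidates all stored results
    return checks, reused


# Hypothetical trace: miss on A, two hits on A, miss on B, two hits on A.
trace = [("A", False), ("A", True), ("A", True),
         ("B", False), ("A", True), ("A", True)]
result = count_tag_checks(trace)
print(result)
```

In this trace the miss on B ends the cache-miss interval, so the next reference to A needs a fresh tag check before reuse can resume.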
7
Concept of the HBTC Cache
2. If a cache miss occurs, then we invalidate all
the stored tag-check results
8
Conventional vs. Phased vs. HBTC
Figure: cache-access comparison of the three schemes (Conventional,
Phased, HBTC). Cases shown: Reuse and No Reuse, Cache Hit and Cache Miss.
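
A rough way to compare the three schemes is to count how many tag and data arrays one access activates. The sketch below is an assumption-based illustration, not the slide's figure: it assumes a conventional cache reads all tags and all data ways in parallel, a phased cache checks all tags first and then reads only the hit way, and HBTC reads a single data way with no tag check on reuse and otherwise behaves conventionally. The function name `activations` is hypothetical.

```python
def activations(scheme, n_ways, hit=True, reuse=False):
    """Return (tag_arrays_read, data_arrays_read) for one I-cache access.

    Assumed behavior per scheme (illustrative, see the lead-in text).
    """
    if scheme == "conventional":
        return n_ways, n_ways                 # everything in parallel
    if scheme == "phased":
        return n_ways, (1 if hit else 0)      # data read only after a tag hit
    if scheme == "hbtc":
        if reuse:
            return 0, 1                       # way already known, no tag check
        return n_ways, n_ways                 # falls back to conventional access
    raise ValueError(f"unknown scheme: {scheme}")


for scheme, kwargs in [("conventional", {}), ("phased", {}),
                       ("hbtc", {"reuse": True}), ("hbtc", {"reuse": False})]:
    print(scheme, kwargs, activations(scheme, 4, **kwargs))
```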
9
HBTC SA I-Cache Architecture
Figure: architecture diagram (label: PBAreg)
10
HBTC I-Cache Operation
  • Normal Mode (NM): w/ tag checks
  • Omitting Mode (OM): w/o tag checks (reuse)
  • Tracing Mode (TM): w/ tag checks (tag-check results are
    preserved in the WPRreg and stored into the WP-table
    on the next BTB hit)
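
The three modes can be sketched as a small state machine. This is a minimal sketch under stated assumptions: a BTB hit selects OM when the way pointers are valid and TM when they are invalid, and a GOtoNM signal returns the controller to NM. The exact conditions that raise GOtoNM are not spelled out in this transcript, so it is treated as an opaque input here. `Mode` and `next_mode` are hypothetical names.

```python
from enum import Enum


class Mode(Enum):
    NM = "normal"     # tag checks performed
    OM = "omitting"   # tag checks omitted (reuse)
    TM = "tracing"    # tag checks performed and recorded


def next_mode(mode, btb_hit=False, wps_valid=False, go_to_nm=False):
    """Return the next operation mode (illustrative transition rules)."""
    if go_to_nm:
        return Mode.NM                         # assumed to override everything
    if btb_hit:
        return Mode.OM if wps_valid else Mode.TM
    return mode                                # otherwise stay in the same mode


print(next_mode(Mode.NM, btb_hit=True, wps_valid=False))  # Mode.TM
print(next_mode(Mode.TM, btb_hit=True, wps_valid=True))   # Mode.OM
print(next_mode(Mode.OM, go_to_nm=True))                  # Mode.NM
```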
11
HBTC I-Cache Operation Example
Figure: block diagram of the PC, the Branch Target Buffer (Inst. Addr.,
Target Addr., Pred. (T or N)) extended with a WP Table (Valid/Invalid),
the WPreg, WPRreg, and PBAreg registers, a 4-way I-Cache (ways 0 to 3),
and the Mode Controller (mode transitions among NM, TM, and OM, driven
by BTB Hit and GOtoNM).
12
HBTC I-Cache Operation Example
Figure: same block diagram as on the previous slide, annotated with
address A and the prediction Taken.
13
HBTC I-Cache Operation Example
NO valid WPs are detected!
Figure: same block diagram as on the previous slide, annotated with
address A and the prediction Taken.
14
HBTC I-Cache Operation Example
PC and branch prediction result are saved!
NO valid WPs are detected!
Figure: same block diagram as on the previous slide, annotated with
the saved values A and T.
15
HBTC I-Cache Operation Example
Tag-Comparison result is stored into the WPRreg!
Conventional Accesses!
Figure: same block diagram as on the previous slide, annotated with
the saved values A and T and way number 1.
16
HBTC I-Cache Operation Example
Tag-Comparison result is stored into the WPRreg!
Conventional Accesses!
Figure: same block diagram as on the previous slide, annotated with
the saved values A and T and way number 3.
17
HBTC I-Cache Operation Example
Tag-Comparison result is stored into the WPRreg!
Conventional Accesses!
Figure: same block diagram as on the previous slide, annotated with
the saved values A and T and way number 0.
18
HBTC I-Cache Operation Example
The WPRreg is stored into the WP-Table entry
pointed to by the PBAreg!
BTB Hit!
Figure: same block diagram as on the previous slide, annotated with
the saved values A and T and address B.
19
HBTC I-Cache Operation Example
Figure: same block diagram as on the previous slide, annotated with
address A and the prediction Taken.
20
HBTC I-Cache Operation Example
Valid WPs are detected!
Figure: same block diagram as on the previous slide, annotated with
address A and the prediction Taken.
21
HBTC I-Cache Operation Example
Tag-Comparison Reuse
Figure: same block diagram as on the previous slide, annotated with
way number 1.
22
HBTC I-Cache Operation Example
Tag-Comparison Reuse
Figure: same block diagram as on the previous slide, annotated with
way number 3.
23
HBTC I-Cache Operation Example
Tag-Comparison Reuse
Figure: same block diagram as on the previous slide, annotated with
way number 0.
24
HBTC I-Cache Operation Example
No valid WPs in the WPreg!
Figure: same block diagram as on the previous slide, with the way
number shown as "?".
25
HBTC I-Cache Operation Example
Conventional Accesses!
Figure: same block diagram as on the previous slide.
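
The record-then-reuse sequence in the operation example can be replayed with a toy model of the WP table. This is a sketch under stated assumptions, not the presented hardware: entries are keyed by (branch address, prediction), the trace recorded in the WPRreg is written back on the next BTB hit, and a cache miss clears everything. The class `WayPointerTable` and its method names are hypothetical.

```python
class WayPointerTable:
    """Toy model of the WP table held in the extended BTB (illustrative)."""

    def __init__(self):
        self.entries = {}   # (branch address, prediction) -> recorded ways
        self.pba = None     # PBAreg: key of the branch currently being traced
        self.wpr = []       # WPRreg: ways recorded while tracing

    def btb_hit(self, branch_addr, prediction):
        """Handle a BTB hit; return the stored ways, or None if invalid."""
        if self.pba is not None and self.wpr:
            # Write the finished trace into the entry pointed to by PBAreg.
            self.entries[self.pba] = list(self.wpr)
        key = (branch_addr, prediction)
        wps = self.entries.get(key)
        if wps is None:
            self.pba, self.wpr = key, []     # invalid: start tracing (TM)
            return None
        self.pba, self.wpr = None, []        # valid: reuse (OM)
        return list(wps)

    def record(self, way):
        """Record one tag-check result (the hit way) while tracing."""
        if self.pba is not None:
            self.wpr.append(way)

    def invalidate_all(self):
        """A cache miss invalidates every stored way pointer."""
        self.entries.clear()
        self.pba, self.wpr = None, []


btb = WayPointerTable()
first = btb.btb_hit("A", "T")    # first visit: no valid WPs, tracing starts
for way in (1, 3, 0):            # conventional accesses hit ways 1, 3, 0
    btb.record(way)
btb.btb_hit("B", "T")            # next BTB hit stores the trace for (A, T)
reused = btb.btb_hit("A", "T")   # second visit: valid WPs are detected
print(first, reused)
```

The ways 1, 3, 0 mirror the values shown in the example slides; the addresses "A" and "B" are the slide labels used as plain strings.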
26
Advantages and Disadvantages
Normal Mode (NM) / Tracing Mode (TM)
Omitting Mode (OM)
  • Eliminates unnecessary energy consumption w/o
    performance degradation (during OM)!
  • BTB energy overhead due to WP-table read-accesses
  • BTB access conflict for invalidating all WPs
    (causes 1 stall cycle)
  • BTB access conflict to record WPs (causes 1 stall
    cycle)

27
Evaluation Environment
  • OOO simulation by SimpleScalar
  • 16 KB 4-way I-cache (32 B line size)
  • For others, default parameters were used
  • Cache energy model based on [Kamble97]
    (including the WP-table read-energy overhead)
  • Assume that the BTB is accessed only when branch
    or jump instructions are executed (instructions
    are pre-decoded)

[Kamble97] M. B. Kamble and K. Ghose, Analytical
Energy Dissipation Models for Low Power Caches,
ISLPED '97
28
Evaluation: Energy and Performance
  • 62% Ecache reduction with a 0.2% Exe. Time
    increase
  • Even in the worst case, about 20% Ecache
    reduction

29
Evaluation: Effect of WP Invalidation Penalty
Figure: normalized execution time (cycles) vs. WP invalidation penalty
(cycles) for 099.go, 126.gcc, 132.ijpeg, and mpeg2(d).
  • If the penalty is equal to or smaller than 4
    clock cycles, the performance overhead is
    trivial.
  • The performance overhead grows once the penalty
    exceeds 4 clock cycles.
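
A first-order way to reason about this trend is to charge the full penalty to every WP invalidation. The sketch below is only an estimate under that assumption (no overlap with other stalls), and all numbers in it are hypothetical rather than measured values from the evaluation. The function name `normalized_exec_time` is illustrative.

```python
def normalized_exec_time(base_cycles, invalidations, penalty_cycles):
    """Estimate execution time relative to a run with no invalidation stalls.

    Assumes each WP invalidation stalls the pipeline for the full penalty.
    """
    return (base_cycles + invalidations * penalty_cycles) / base_cycles


# Hypothetical workload: 1,000,000 base cycles and 1,000 invalidations.
for penalty in (1, 2, 4, 8, 16):
    print(penalty, normalized_exec_time(1_000_000, 1_000, penalty))
```

Under this model the overhead is linear in the penalty, so how quickly it becomes visible depends on how often invalidations occur in a given benchmark.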

30
Evaluation: Effect of the Number of WPs
Figure: normalized energy (Joule) for 126.gcc vs. # of way pointers
(1, 2, 4, 8, 16, 32), w/ and w/o pre-decoding, broken down into energy
for cache access and energy overhead of BTB.
  • Increasing the number of WPs makes it possible to
    reuse more tag-check results
  • But it also increases the BTB access energy overhead

31
Evaluation: Effect of Cache Associativity
Figure: energy (Joule) for mpeg2decode vs. associativity
(1, 2, 4, 8, 16, 32, 64), Conventional vs. HBTC, broken down into
Eothers, Etag, Edata,bl, and Edata,prectl.
  • Conv. Ecache grows with the increase in
    associativity
  • HBTC Ecache decreases as associativity increases
    (n < 4); after that, it starts to increase (n > 4)

32
Conclusions
History-Based Tag-Comparison Instruction Cache
  1. Records tag-check results generated by the
    I-cache into the extended BTB
  2. Attempts to reuse them in order to eliminate
    unnecessary way activation
  3. Achieves a 62% I-cache energy reduction with
    only a 0.2% performance degradation!

Future work
  • Analyze energy consumption based on real chip
    design.

33
Backup Slides (History-Based Tag-Comparison
Cache)
34
Evaluation: Comparison with IS Approach
Figure: normalized tag-compare count for the Interline Sequential (IS)
approach, the History-Based Look-up (HBL) cache, and the combination of
IS and HBL, across 099.go, 124.m88ksim, 126.gcc, 129.comp., 130.li,
132.ijpeg, 102.swim, adpcm(d), adpcm(e), mpeg2(d), and mpeg2(e).
35
Evaluation: Effects of Cache Associativity
Figure: energy (Joule) for 099.go vs. associativity
(1, 2, 4, 8, 16, 32, 64), Conventional vs. HBL Cache, broken down into
Eothers, Etag, Edata,bl, and Edata,prectl (0.8 um CMOS).
M. B. Kamble and K. Ghose, Energy-Efficiency of VLSI Caches: A
Comparative Study, 10th Int. Conf. on VLSI Design
S. J. E. Wilton and N. P. Jouppi, An Enhanced Access and Cycle Time
Model for On-Chip Caches, WRL Research Report 93/5
36
Evaluation: Effects of Cache Associativity
Figure: energy (Joule) for 126.gcc vs. associativity
(1, 2, 4, 8, 16, 32, 64), Conventional vs. HBL Cache, broken down into
Eothers, Etag, Edata,bl, and Edata,prectl (0.8 um CMOS).
37
Evaluation: Effects of Cache Associativity
Figure: energy (Joule) for 132.ijpeg vs. associativity
(1, 2, 4, 8, 16, 32, 64), Conventional vs. HBL Cache, broken down into
Eothers, Etag, Edata,bl, and Edata,prectl (0.8 um CMOS).
38
Evaluation: Effects of Cache Associativity
Figure: energy (Joule) for mpeg2decode vs. associativity
(1, 2, 4, 8, 16, 32, 64), Conventional vs. HBL Cache, broken down into
Eothers, Etag, Edata,bl, and Edata,prectl (0.8 um CMOS).
39
Evaluation: Effects of # of WPs
w/ Pre-Decoding (BTB access occurs only at
branch or jump executions)
Figure: normalized energy (Joule) for 126.gcc and 132.ijpeg vs. # of
way pointers (1, 2, 4, 8, 16, 32), broken down into energy for cache
access and energy overhead at BTB.
40
Evaluation: Effects of # of WPs
w/o Pre-Decoding (BTB access occurs for all
instructions)
Figure: normalized energy (Joule) for 126.gcc and 132.ijpeg vs. # of
way pointers (1, 2, 4, 8, 16, 32), broken down into energy for cache
access and energy overhead at BTB.
41
Evaluation: Effect of WP Invalidation Penalty
Figure: normalized execution time (cycles) vs. WP invalidation penalty
(cycles) for 099.go, 126.gcc, 132.ijpeg, and mpeg2(d).
Figure: breakdown of WP invalidations (BTB Replacement vs. Cache Miss)
for 099.go, 124.m88ksim, 126.gcc, 129.comp., 130.li, 132.ijpeg,
102.swim, adpcm(d), adpcm(e), mpeg2(d), and mpeg2(e).
  • If the penalty is equal to or smaller than 4
    clock cycles, the performance overhead is
    trivial.
  • The performance overhead grows once the penalty
    exceeds 4 clock cycles.