1
Power Savings in Embedded Processors through
Decode Filter Cache
  • Weiyu Tang, Rajesh Gupta, Alex Nicolau
  • CECS, University of California, Irvine

2
Overview
  • Introduction
  • Related Work
  • Decode Filter Cache
  • Results and Conclusion

3
Introduction
  • Instruction delivery is a major power consumer in
    embedded systems
  • Instruction fetch
  • 27% of processor power in StrongARM
  • Instruction decode
  • 18% of processor power in StrongARM
  • Goal
  • Reduce power in instruction delivery with minimal
    performance penalty

4
Related Work
  • Architectural approaches to reduce instruction
    fetch power
  • Store instructions in small and power efficient
    storages
  • Examples
  • Line buffers
  • Loop cache
  • Filter cache

5
Related Work
  • Architectural approaches to reduce instruction
    decode power
  • Avoid unnecessary decoding by saving decoded
    instructions in a separate cache
  • Trace cache
  • Store decoded instructions in execution order
  • Fixed cache access order
  • Instruction cache is accessed on trace cache
    misses
  • Targeted for high-performance processors
  • Increase fetch bandwidth
  • Require sophisticated branch prediction
    mechanisms
  • Drawbacks
  • Not power efficient as the cache size is large

6
Related Work
  • Micro-op cache
  • Store decoded instructions in program order
  • Fixed cache access order
  • Instruction cache and micro-op cache are accessed
    in parallel to minimize micro-op cache miss
    penalty
  • Drawbacks
  • Need extra stage in the pipeline, which increases
    misprediction penalty
  • Require a branch predictor
  • Per access power is large
  • Micro-op cache size is large
  • Power consumption from both micro-op cache and
    instruction cache

7
Decode Filter Cache
  • Targeted processors
  • Single-issue, in-order execution
  • Research goal
  • Use a small (and power efficient) cache to save
    decoded instructions
  • Reduce instruction fetch power and decode power
    simultaneously
  • Reduce power without sacrificing performance
  • Problems to deal with
  • What kind of cache organization to use
  • Where to fetch instructions as instructions can
    be provided from multiple sources
  • How to minimize decode filter cache miss latency

8
Decode Filter Cache
(Figure: block diagram of the instruction delivery path; the fetch address drives the line buffer, decode filter cache, and instruction cache lookups.)
9
Decode Filter Cache
  • Decode filter cache organization
  • Problems with traditional cache organization
  • The decoded instruction width varies
  • Saving all decoded instructions would waste
    cache space
  • Our approach
  • Instruction classification
  • Classify instructions into cacheable and
    uncacheable depending on instruction width
    distribution
  • Use a cacheable ratio to balance the cache
    utilization vs. the number of instructions that
    can be cached
  • Sectored cache organization
  • Each instruction can be cached independently of
    neighboring lines
  • Neighboring lines share a tag to reduce cache tag
    store cost (sketched below)
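
Below is a minimal sketch of this organization in C, assuming a direct-mapped layout, 8-instruction sectors, and a 64-bit decoded slot; the names and sizes are illustrative assumptions, not the paper's exact parameters:

```c
#include <stdint.h>
#include <stdbool.h>
#include <string.h>

#define NUM_SECTORS        8   /* direct-mapped sectors                     */
#define INSTRS_PER_SECTOR  8   /* neighboring slots share one tag           */
#define CACHEABLE_BITS    64   /* 8 x 8 x 8B = 512B of decoded storage      */

typedef struct {
    bool    valid;                        /* per-instruction valid bit      */
    uint8_t decoded[CACHEABLE_BITS / 8];  /* decoded control word           */
} DecodedSlot;

typedef struct {
    uint32_t    tag;                      /* one tag for the whole sector   */
    DecodedSlot slot[INSTRS_PER_SECTOR];
} Sector;

static Sector dfc[NUM_SECTORS];

/* Classification: only instructions whose decoded form fits the fixed slot
 * width are cacheable. Raising the threshold caches more instructions but
 * wastes more slot space (the cacheable-ratio tradeoff).                   */
static bool is_cacheable(unsigned decoded_width_bits)
{
    return decoded_width_bits <= CACHEABLE_BITS;
}

static void index_addr(uint32_t pc, uint32_t *tag, unsigned *set, unsigned *off)
{
    uint32_t word = pc >> 2;                         /* 4-byte instructions */
    *off = word % INSTRS_PER_SECTOR;
    *set = (word / INSTRS_PER_SECTOR) % NUM_SECTORS;
    *tag = word / (INSTRS_PER_SECTOR * NUM_SECTORS);
}

/* Fill one decoded instruction. Uncacheable instructions are skipped, so a
 * sector may hold only some of its neighboring instructions.              */
void dfc_fill(uint32_t pc, const uint8_t *decoded, unsigned width_bits)
{
    uint32_t tag; unsigned set, off;
    if (!is_cacheable(width_bits))
        return;
    index_addr(pc, &tag, &set, &off);
    Sector *s = &dfc[set];
    if (s->tag != tag) {                /* new sector: set tag, clear slots */
        s->tag = tag;
        for (unsigned i = 0; i < INSTRS_PER_SECTOR; i++)
            s->slot[i].valid = false;
    }
    memcpy(s->slot[off].decoded, decoded, (width_bits + 7) / 8);
    s->slot[off].valid = true;
}

/* Lookup: a hit needs both the shared sector tag and the slot's valid bit. */
const uint8_t *dfc_lookup(uint32_t pc)
{
    uint32_t tag; unsigned set, off;
    index_addr(pc, &tag, &set, &off);
    Sector *s = &dfc[set];
    return (s->tag == tag && s->slot[off].valid) ? s->slot[off].decoded : NULL;
}
```

On a hit, dfc_lookup supplies the already decoded word, so both the I-cache read and the decoder are bypassed.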

10
Decode Filter Cache
  • Where to fetch instructions
  • Instructions can be provided from one of the
    following sources
  • Line buffer
  • Decode filter cache
  • Instruction cache
  • Predictive order for instruction fetch
  • For power efficiency, either the decode filter
    cache or the line buffer is accessed first when
    an instruction is likely to hit
  • To minimize the decode filter cache miss penalty,
    the instruction cache is accessed directly when
    the decode filter cache is likely to miss (see
    the fetch sketch below)
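
Below is a minimal sketch of the fetch-stage dispatch among the three sources; the types and accessor functions are placeholders assumed for illustration, not interfaces from the paper:

```c
#include <stdint.h>

typedef enum { SRC_LINE_BUFFER, SRC_DECODE_FILTER_CACHE, SRC_ICACHE } FetchSource;

typedef struct {
    uint64_t control_word;          /* decoded control signals (placeholder) */
} DecodedInstr;

/* Stub accessors for the three sources (assumed, for illustration only). */
static DecodedInstr decode(uint32_t raw)        { return (DecodedInstr){ raw }; }
static uint32_t read_line_buffer(uint32_t pc)   { (void)pc; return 0; }
static uint32_t read_icache(uint32_t pc)        { (void)pc; return 0; }
static DecodedInstr read_decode_filter_cache(uint32_t pc)
{
    (void)pc; return (DecodedInstr){ 0 };
}

DecodedInstr fetch(uint32_t pc, FetchSource predicted_source)
{
    switch (predicted_source) {
    case SRC_LINE_BUFFER:
        /* Smallest structure; the raw instruction still needs decoding.   */
        return decode(read_line_buffer(pc));
    case SRC_DECODE_FILTER_CACHE:
        /* Already decoded: both the I-cache access and the decode stage
         * are skipped, saving fetch and decode power at once.             */
        return read_decode_filter_cache(pc);
    default:
        /* Predicted decode filter cache miss: go to the I-cache directly,
         * so there is no extra miss cycle.                                */
        return decode(read_icache(pc));
    }
}
```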

11
Decode Filter Cache
  • Prediction mechanism
  • When the next fetch address and the current
    address map to the same cache line
  • If the current fetch source is the line buffer,
    the next fetch source remains the same
  • If the current fetch source is the decode filter
    cache and the corresponding instruction is valid,
    the next fetch source remains the same
  • Otherwise, the next fetch source is the
    instruction cache
  • When the next fetch address and the current
    address map to different cache lines
  • Predict based on the next fetch prediction table,
    which exploits control flow predictability
  • If the tag of the current fetch address and the
    tag of the predicted next fetch address are the
    same, the next fetch source is the decode filter
    cache
  • Otherwise, the next fetch source is the
    instruction cache (see the prediction sketch
    below)
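
Below is a minimal sketch of these prediction rules; the table size, line size, and 512B-based tag split are assumptions, and only the decision logic follows the bullets above:

```c
#include <stdint.h>
#include <stdbool.h>

typedef enum { SRC_LINE_BUFFER, SRC_DECODE_FILTER_CACHE, SRC_ICACHE } FetchSource;

#define LINE_BYTES      32u     /* assumed I-cache line size                */
#define PRED_ENTRIES    64u     /* next fetch prediction table entries      */
#define DFC_INDEX_BITS   9u     /* tag split for a 512B decode filter cache */

/* For each cache line, remember which line followed it last time
 * (control-flow predictability).                                           */
static uint32_t next_line_table[PRED_ENTRIES];

static uint32_t line_of(uint32_t pc) { return pc / LINE_BYTES; }
static uint32_t dfc_tag(uint32_t pc) { return pc >> DFC_INDEX_BITS; }

FetchSource predict_next_source(uint32_t cur_pc, uint32_t next_pc,
                                FetchSource cur_src, bool dfc_instr_valid)
{
    if (line_of(next_pc) == line_of(cur_pc)) {
        /* Same cache line: keep using the current small structure as long
         * as it can still supply the instruction.                          */
        if (cur_src == SRC_LINE_BUFFER)
            return SRC_LINE_BUFFER;
        if (cur_src == SRC_DECODE_FILTER_CACHE && dfc_instr_valid)
            return SRC_DECODE_FILTER_CACHE;
        return SRC_ICACHE;
    }

    /* Different cache line: consult the next fetch prediction table. If the
     * predicted next address shares the current address's decode filter
     * cache tag, the decoded copy is likely resident.                       */
    uint32_t predicted_next_pc =
        next_line_table[line_of(cur_pc) % PRED_ENTRIES] * LINE_BYTES;
    if (dfc_tag(predicted_next_pc) == dfc_tag(cur_pc))
        return SRC_DECODE_FILTER_CACHE;
    return SRC_ICACHE;      /* predicted miss: go straight to the I-cache */
}

/* Update the table with the line that actually followed. */
void train_next_fetch(uint32_t cur_pc, uint32_t actual_next_pc)
{
    next_line_table[line_of(cur_pc) % PRED_ENTRIES] = line_of(actual_next_pc);
}
```

train_next_fetch would be called once the actual next line is known, so the table tracks the program's most recent control flow.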

12
Results
  • Simulation setup
  • MediaBench benchmarks
  • Cache size
  • 512B decode filter cache, 16KB instruction cache,
    8KB data cache.
  • Configurations investigated (table omitted)

13
Results: reduction in I-cache fetches (chart omitted)
14
Results: reduction in instruction decodes (chart omitted)
15
Results: normalized delay (chart omitted)
16
Results: reduction in processor power (chart omitted)
17
Conclusion
  • There is a basic tradeoff between
  • the number of instructions cached (as in
    instruction caches), and
  • greater savings in power by reducing decode and
    fetch work (as in decode caches)
  • We tip this balance in favor of the decode cache
    through the coordinated operation of
  • instruction classification/selective decoding
    (into smaller widths)
  • sectored caches built around this classification
  • The results show
  • Average 34% reduction in processor power
  • 50% more effective in power savings than an
    instruction filter cache
  • Less than 1% performance degradation, thanks to
    the effective prediction mechanism