1
Power Savings in Embedded Processors through
Decode Filter Cache
  • Weiyu Tang, Rajesh Gupta, Alex Nicolau
  • CECS, University of California, Irvine

2
Overview
  • Introduction
  • Related Work
  • Decode Filter Cache
  • Results and Conclusion

3
Introduction
  • Instruction delivery is a major power consumer in
    embedded systems
  • Instruction fetch
  • 27% of processor power in StrongARM
  • Instruction decode
  • 18% of processor power in StrongARM
  • Goal
  • Reduce power in instruction delivery with minimal
    performance penalty

4
Related Work
  • Architectural approaches to reduce instruction
    fetch power
  • Store instructions in small and power efficient
    storages
  • Examples
  • Line buffers
  • Loop cache
  • Filter cache

5
Related Work
  • Architectural approaches to reduce instruction
    decode power
  • Avoid unnecessary decoding by saving decoded
    instructions in a separate cache
  • Trace cache
  • Store decoded instructions in execution order
  • Fixed cache access order
  • Instruction cache is accessed on trace cache
    misses
  • Targeted for high-performance processors
  • Increase fetch bandwidth
  • Require sophisticated branch prediction
    mechanisms
  • Drawbacks
  • Not power efficient as the cache size is large

6
Related Work
  • Micro-op cache
  • Store decoded instructions in program order
  • Fixed cache access order
  • Instruction cache and micro-op cache are accessed
    in parallel to minimize micro-op cache miss
    penalty
  • Drawbacks
  • Need extra stage in the pipeline, which increases
    misprediction penalty
  • Require a branch predictor
  • Per access power is large
  • Micro-op cache size is large
  • Power consumption from both micro-op cache and
    instruction cache

7
Decode Filter Cache
  • Targeted processors
  • Single-issue, in-order execution
  • Research goal
  • Use a small (and power efficient) cache to save
    decoded instructions
  • Reduce instruction fetch power and decode power
    simultaneously
  • Reduce power without sacrificing performance
  • Problems to deal with
  • What kind of cache organization to use
  • Where to fetch instructions as instructions can
    be provided from multiple sources
  • How to minimize decode filter cache miss latency

8
Decode Filter Cache
(Figure: block diagram of the instruction delivery path; the fetch address drives the line buffer, decode filter cache, and instruction cache lookups.)
9
Decode Filter Cache
  • Decode filter cache organization
  • Problems with traditional cache organization
  • The decoded instruction width varies
  • Saving all decoded instructions would waste
    cache space
  • Our approach
  • Instruction classification
  • Classify instructions into cacheable and
    uncacheable depending on instruction width
    distribution
  • Use a cacheable ratio to balance the cache
    utilization vs. the number of instructions that
    can be cached
  • Sectored cache organization
  • Each instruction can be cached independently of
    neighboring lines
  • Neighboring lines share a tag to reduce cache tag
    store cost (sketched below)
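
Below is a minimal sketch of this organization in C, assuming a direct-mapped layout, 8-instruction sectors, and a 64-bit decoded slot; the names and sizes are illustrative assumptions, not the paper's exact parameters:

```c
#include <stdint.h>
#include <stdbool.h>
#include <string.h>

#define NUM_SECTORS        8   /* direct-mapped sectors                     */
#define INSTRS_PER_SECTOR  8   /* neighboring slots share one tag           */
#define CACHEABLE_BITS    64   /* 8 x 8 x 8B = 512B of decoded storage      */

typedef struct {
    bool    valid;                        /* per-instruction valid bit      */
    uint8_t decoded[CACHEABLE_BITS / 8];  /* decoded control word           */
} DecodedSlot;

typedef struct {
    uint32_t    tag;                      /* one tag for the whole sector   */
    DecodedSlot slot[INSTRS_PER_SECTOR];
} Sector;

static Sector dfc[NUM_SECTORS];

/* Classification: only instructions whose decoded form fits the fixed slot
 * width are cacheable. Raising the threshold caches more instructions but
 * wastes more slot space (the cacheable-ratio tradeoff).                   */
static bool is_cacheable(unsigned decoded_width_bits)
{
    return decoded_width_bits <= CACHEABLE_BITS;
}

static void index_addr(uint32_t pc, uint32_t *tag, unsigned *set, unsigned *off)
{
    uint32_t word = pc >> 2;                         /* 4-byte instructions */
    *off = word % INSTRS_PER_SECTOR;
    *set = (word / INSTRS_PER_SECTOR) % NUM_SECTORS;
    *tag = word / (INSTRS_PER_SECTOR * NUM_SECTORS);
}

/* Fill one decoded instruction. Uncacheable instructions are skipped, so a
 * sector may hold only some of its neighboring instructions.              */
void dfc_fill(uint32_t pc, const uint8_t *decoded, unsigned width_bits)
{
    uint32_t tag; unsigned set, off;
    if (!is_cacheable(width_bits))
        return;
    index_addr(pc, &tag, &set, &off);
    Sector *s = &dfc[set];
    if (s->tag != tag) {                /* new sector: set tag, clear slots */
        s->tag = tag;
        for (unsigned i = 0; i < INSTRS_PER_SECTOR; i++)
            s->slot[i].valid = false;
    }
    memcpy(s->slot[off].decoded, decoded, (width_bits + 7) / 8);
    s->slot[off].valid = true;
}

/* Lookup: a hit needs both the shared sector tag and the slot's valid bit. */
const uint8_t *dfc_lookup(uint32_t pc)
{
    uint32_t tag; unsigned set, off;
    index_addr(pc, &tag, &set, &off);
    Sector *s = &dfc[set];
    return (s->tag == tag && s->slot[off].valid) ? s->slot[off].decoded : NULL;
}
```

On a hit, dfc_lookup supplies the already decoded word, so both the I-cache read and the decoder are bypassed.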

10
Decode Filter Cache
  • Where to fetch instructions
  • Instructions can be provided from one of the
    following sources
  • Line buffer
  • Decode filter cache
  • Instruction cache
  • Predictive order for instruction fetch
  • For power efficiency, either the decode filter
    cache or the line buffer is accessed first when
    an instruction is likely to hit
  • To minimize the decode filter cache miss penalty,
    the instruction cache is accessed directly when
    the decode filter cache is likely to miss (see
    the fetch sketch below)
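
Below is a minimal sketch of the fetch-stage dispatch among the three sources; the types and accessor functions are placeholders assumed for illustration, not interfaces from the paper:

```c
#include <stdint.h>

typedef enum { SRC_LINE_BUFFER, SRC_DECODE_FILTER_CACHE, SRC_ICACHE } FetchSource;

typedef struct {
    uint64_t control_word;          /* decoded control signals (placeholder) */
} DecodedInstr;

/* Stub accessors for the three sources (assumed, for illustration only). */
static DecodedInstr decode(uint32_t raw)        { return (DecodedInstr){ raw }; }
static uint32_t read_line_buffer(uint32_t pc)   { (void)pc; return 0; }
static uint32_t read_icache(uint32_t pc)        { (void)pc; return 0; }
static DecodedInstr read_decode_filter_cache(uint32_t pc)
{
    (void)pc; return (DecodedInstr){ 0 };
}

DecodedInstr fetch(uint32_t pc, FetchSource predicted_source)
{
    switch (predicted_source) {
    case SRC_LINE_BUFFER:
        /* Smallest structure; the raw instruction still needs decoding.   */
        return decode(read_line_buffer(pc));
    case SRC_DECODE_FILTER_CACHE:
        /* Already decoded: both the I-cache access and the decode stage
         * are skipped, saving fetch and decode power at once.             */
        return read_decode_filter_cache(pc);
    default:
        /* Predicted decode filter cache miss: go to the I-cache directly,
         * so there is no extra miss cycle.                                */
        return decode(read_icache(pc));
    }
}
```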

11
Decode Filter Cache
  • Prediction mechanism
  • When the next fetch address and the current
    address map to the same cache line
  • If the current fetch source is the line buffer,
    the next fetch source remains the same
  • If the current fetch source is the decode filter
    cache and the corresponding instruction is valid,
    the next fetch source remains the same
  • Otherwise, the next fetch source is the
    instruction cache
  • When the next fetch address and the current
    address map to different cache lines
  • Predict based on the next fetch prediction table,
    which exploits control flow predictability
  • If the tag of the current fetch address and the
    tag of the predicted next fetch address are the
    same, the next fetch source is the decode filter
    cache
  • Otherwise, the next fetch source is the
    instruction cache (see the prediction sketch
    below)
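
Below is a minimal sketch of these prediction rules; the table size, line size, and 512B-based tag split are assumptions, and only the decision logic follows the bullets above:

```c
#include <stdint.h>
#include <stdbool.h>

typedef enum { SRC_LINE_BUFFER, SRC_DECODE_FILTER_CACHE, SRC_ICACHE } FetchSource;

#define LINE_BYTES      32u     /* assumed I-cache line size                */
#define PRED_ENTRIES    64u     /* next fetch prediction table entries      */
#define DFC_INDEX_BITS   9u     /* tag split for a 512B decode filter cache */

/* For each cache line, remember which line followed it last time
 * (control-flow predictability).                                           */
static uint32_t next_line_table[PRED_ENTRIES];

static uint32_t line_of(uint32_t pc) { return pc / LINE_BYTES; }
static uint32_t dfc_tag(uint32_t pc) { return pc >> DFC_INDEX_BITS; }

FetchSource predict_next_source(uint32_t cur_pc, uint32_t next_pc,
                                FetchSource cur_src, bool dfc_instr_valid)
{
    if (line_of(next_pc) == line_of(cur_pc)) {
        /* Same cache line: keep using the current small structure as long
         * as it can still supply the instruction.                          */
        if (cur_src == SRC_LINE_BUFFER)
            return SRC_LINE_BUFFER;
        if (cur_src == SRC_DECODE_FILTER_CACHE && dfc_instr_valid)
            return SRC_DECODE_FILTER_CACHE;
        return SRC_ICACHE;
    }

    /* Different cache line: consult the next fetch prediction table. If the
     * predicted next address shares the current address's decode filter
     * cache tag, the decoded copy is likely resident.                       */
    uint32_t predicted_next_pc =
        next_line_table[line_of(cur_pc) % PRED_ENTRIES] * LINE_BYTES;
    if (dfc_tag(predicted_next_pc) == dfc_tag(cur_pc))
        return SRC_DECODE_FILTER_CACHE;
    return SRC_ICACHE;      /* predicted miss: go straight to the I-cache */
}

/* Update the table with the line that actually followed. */
void train_next_fetch(uint32_t cur_pc, uint32_t actual_next_pc)
{
    next_line_table[line_of(cur_pc) % PRED_ENTRIES] = line_of(actual_next_pc);
}
```

train_next_fetch would be called once the actual next line is known, so the table tracks the program's most recent control flow.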

12
Results
  • Simulation setup
  • MediaBench benchmarks
  • Cache size
  • 512B decode filter cache, 16KB instruction cache,
    8KB data cache.
  • Configurations investigated (table omitted)

13
Results: reduction in I-cache fetches (chart omitted)
14
Results: reduction in instruction decodes (chart omitted)
15
Results: normalized delay (chart omitted)
16
Results: reduction in processor power (chart omitted)
17
Conclusion
  • There is a basic tradeoff between
  • the number of instructions cached (as in
    instruction caches), and
  • greater savings in power by reducing decode and
    fetch work (as in decode caches)
  • We tip this balance in favor of the decode cache
    through the coordinated operation of
  • instruction classification/selective decoding
    (into smaller widths)
  • sectored caches built around this classification
  • The results show
  • Average 34% reduction in processor power
  • 50% more effective in power savings than an
    instruction filter cache
  • Less than 1% performance degradation, thanks to
    the effective prediction mechanism