Simulation of Decode Filter Cache using SimpleScalar simulator

Provided by: feih3

1
Simulation of Decode Filter Cache using
SimpleScalar simulator
  • Presented by Fei Hong

2
Motivation & Goals
  • Instruction fetches and decodes are the major
    on-chip power consumers
  • Optimize the power consumption by reducing
    instruction fetches and decodes
  • Simulate the decode filter cache (DFC)
    architecture using SimpleScalar
  • Test the performance of the DFC

3
Prediction Mechanism
  • Each sector in the DFC has the following fields:
    (tag, sector_valid, next_address)
  • If A is not equal to C, a different control path
    will be taken:
    tag(A) != tag(C)    (1)
  • A and B are consecutively accessed. If they
    belong to a small loop:
    tag(A) = tag(B)     (2)
  • Based on (1) and (2), the prediction for the next
    fetch is:
    tag(C) != tag(B)    (3)
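
The sector fields and the tag comparison above can be sketched in a few lines of Python. This is only an illustrative model, not the code added to SimpleScalar: the names `Sector`, `DFC`, `split`, `access` and `predict_hit` are hypothetical, and the geometry (32 sectors, 4 instructions per sector, 4-byte instructions) is taken from the memory hierarchy parameters slide. It assumes that B is the address recorded in A's next_address field and that C is whatever currently occupies B's sector, so the next fetch is predicted to miss when tag(C) != tag(B).

```python
from dataclasses import dataclass
from typing import Optional

INSTR_BYTES = 4            # instruction size from the parameters slide
INSTRS_PER_SECTOR = 4      # decoded instructions per sector
NUM_SECTORS = 32           # direct-mapped DFC
SECTOR_BYTES = INSTR_BYTES * INSTRS_PER_SECTOR


@dataclass
class Sector:
    """One DFC sector: (tag, sector_valid, next_address)."""
    tag: int = 0
    sector_valid: bool = False
    next_address: Optional[int] = None


def split(addr):
    """Return (tag, index) of an instruction address."""
    sector_no = addr // SECTOR_BYTES
    return sector_no // NUM_SECTORS, sector_no % NUM_SECTORS


class DFC:
    def __init__(self):
        self.sectors = [Sector() for _ in range(NUM_SECTORS)]

    def access(self, addr, prev_addr=None):
        """Look up addr, refill on a miss, and record it as the
        successor of prev_addr. Returns True on a hit."""
        tag, idx = split(addr)
        s = self.sectors[idx]
        hit = s.sector_valid and s.tag == tag
        if not hit:
            self.sectors[idx] = Sector(tag, True, None)
        if prev_addr is not None:
            ptag, pidx = split(prev_addr)
            p = self.sectors[pidx]
            # Only update the previous sector if it is still resident
            # and the fetch actually moved to a different sector.
            if p.sector_valid and p.tag == ptag and (ptag, pidx) != (tag, idx):
                p.next_address = addr
        return hit

    def predict_hit(self, addr_a):
        """Predict whether the fetch that follows addr_a will hit."""
        tag_a, idx_a = split(addr_a)
        a = self.sectors[idx_a]
        if not (a.sector_valid and a.tag == tag_a) or a.next_address is None:
            return False
        tag_b, idx_b = split(a.next_address)
        c = self.sectors[idx_b]          # current occupant of B's sector
        return c.sector_valid and c.tag == tag_b


# A two-sector loop: the second pass hits and the prediction is "hit".
dfc = DFC()
prev = None
hits = []
for addr in [0x1000, 0x1010, 0x1000, 0x1010]:
    hits.append(dfc.access(addr, prev))
    prev = addr
```

After the loop, fetching 0x1210 (same index as 0x1010, different tag) evicts the loop's second sector, and `predict_hit(0x1000)` changes from True to False because the stored tag no longer matches.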

4
Working Process

5
The Platform
  • Host computer: ACPI x86-based PC
  • Host computer operating system: Microsoft Windows
    Vista Ultimate
  • Virtual machine: VMware Workstation version 6.03
  • Linux operating system: Fedora Core 6
  • Simulator: SimpleScalar version 3.0

6
Work done so far
  • Set up the platform
  • Read the source code of SimpleScalar
  • Apply my DFC structure and working process to
    SimpleScalar
  • Find benchmarks and compile them on the platform
  • Run simulations using the given memory hierarchy
    parameters

7
MiBench
  • dijkstra: constructs a large graph in an
    adjacency matrix representation and then
    calculates the shortest path between every pair
    of nodes using repeated applications of
    Dijkstra's algorithm.
  • stringsearch: searches for given words in
    phrases using a case-insensitive comparison
    algorithm.
  • rijndael encrypt/decrypt: Rijndael was selected
    as the National Institute of Standards and
    Technology's Advanced Encryption Standard (AES).
  • CRC32: performs a 32-bit Cyclic Redundancy Check
    (CRC) on a file. CRC checks are often used to
    detect errors in data transmission.
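
To give a feel for what two of these kernels compute, here is a small Python sketch. It is not the MiBench source: the function names `dijkstra` and `all_pairs` and the 4-node graph are made up for illustration, and the CRC uses Python's `zlib.crc32`, on the assumption that the benchmark uses the same common CRC-32 polynomial.

```python
import zlib

INF = float("inf")  # marks "no edge" in the adjacency matrix


def dijkstra(adj, src):
    """Single-source shortest paths on an adjacency matrix (O(n^2))."""
    n = len(adj)
    dist = [INF] * n
    dist[src] = 0
    done = [False] * n
    for _ in range(n):
        # Pick the closest node that has not been finalized yet.
        candidates = [v for v in range(n) if not done[v]]
        u = min(candidates, key=lambda v: dist[v])
        if dist[u] == INF:
            break  # remaining nodes are unreachable
        done[u] = True
        for v in range(n):
            w = adj[u][v]
            if w != INF and dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    return dist


def all_pairs(adj):
    """Repeated applications of Dijkstra, one per source node."""
    return [dijkstra(adj, s) for s in range(len(adj))]


# Hypothetical 4-node undirected graph.
graph = [
    [0,   4, 1, INF],
    [4,   0, 2, 5],
    [1,   2, 0, 8],
    [INF, 5, 8, 0],
]
table = all_pairs(graph)

# 32-bit CRC over a byte string; for a file, feed its contents instead.
checksum = zlib.crc32(b"123456789")
```

Both kernels spend most of their time in short loops, which is the kind of code a small DFC is meant to capture.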

8
Memory hierarchy parameters
Parameter     Value
Instr. size   4B
DFC           direct-mapped, 32 sectors, 4 decoded instr. per sector, 8B per decoded instr.
L1 I-cache    16KB, 2-way, 32B line, 1-cycle hit latency
L1 D-cache    8KB, 2-way, 32B line, 1-cycle hit latency
Memory        30-cycle latency
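
The sizes implied by these parameters can be worked out as follows. The sketch also builds L1 cache strings in SimpleScalar's `<name>:<nsets>:<bsize>:<assoc>:<repl>` format, as passed to `-cache:il1` and `-cache:dl1`. The helper names are hypothetical, and the LRU replacement policy (`l`) is an assumption, since the slide does not state one.

```python
# Parameters copied from the table above.
INSTR_BYTES = 4
DFC_SECTORS = 32
INSTRS_PER_SECTOR = 4
DECODED_INSTR_BYTES = 8


def num_sets(size_bytes, line_bytes, assoc):
    """Number of sets in a set-associative cache."""
    return size_bytes // (line_bytes * assoc)


def cache_config(name, size_bytes, line_bytes, assoc, repl="l"):
    """Build a <name>:<nsets>:<bsize>:<assoc>:<repl> string.
    repl="l" (LRU) is an assumption, not taken from the slides."""
    sets = num_sets(size_bytes, line_bytes, assoc)
    return f"{name}:{sets}:{line_bytes}:{assoc}:{repl}"


il1 = cache_config("il1", 16 * 1024, 32, 2)
dl1 = cache_config("dl1", 8 * 1024, 32, 2)

# Storage for decoded instructions held by the DFC.
dfc_storage_bytes = DFC_SECTORS * INSTRS_PER_SECTOR * DECODED_INSTR_BYTES
# Amount of original (undecoded) code the DFC can cover at once.
dfc_code_span_bytes = DFC_SECTORS * INSTRS_PER_SECTOR * INSTR_BYTES

print(il1, dl1, dfc_storage_bytes, dfc_code_span_bytes)
```

With these parameters the DFC holds 1 KB of decoded instructions covering 512 bytes of code, compared with a 16 KB L1 I-cache.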
9
Simulation results
  • Reduction in instruction fetches and
    decodes

10
Simulation results
  • Prediction hit rate

11
Simulation results
Statistic       dijkstra    stringsearch  rijndael    CRC32
sim_num_insn    255620304   4437612       391487315   533385529
il1.accesses    43508918    1605417       236160209   972328
il1.hits        43399500    1568976       228694324   971600
il1.misses      109418      36441         7465885     728
il1.miss_rate   0.0025      0.0227        0.0316      0.0007
dfc.accesses    215740165   3269067       232531480   532674172
dfc.hits        212111386   2832195       155327106   532413201
dfc.misses      3628779     436872        77204374    260971
dfc.miss_rate   0.0168      0.1336        0.3320      0.0005
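
The miss rates in the table follow directly from the access and miss counters. The short check below recomputes them; the dictionary layout and the `miss_rate` helper are illustrative only, while the counter values are copied from the table.

```python
# (accesses, misses) per cache, copied from the table above.
RESULTS = {
    "dijkstra":     {"il1": (43508918, 109418),   "dfc": (215740165, 3628779)},
    "stringsearch": {"il1": (1605417, 36441),     "dfc": (3269067, 436872)},
    "rijndael":     {"il1": (236160209, 7465885), "dfc": (232531480, 77204374)},
    "CRC32":        {"il1": (972328, 728),        "dfc": (532674172, 260971)},
}


def miss_rate(accesses, misses):
    """Fraction of accesses that missed."""
    return misses / accesses


dfc_miss = {name: miss_rate(*c["dfc"]) for name, c in RESULTS.items()}
il1_miss = {name: miss_rate(*c["il1"]) for name, c in RESULTS.items()}
dfc_hit = {name: 1.0 - m for name, m in dfc_miss.items()}

for name in RESULTS:
    print(f"{name:13s} il1 miss {il1_miss[name]:.4f}  "
          f"dfc miss {dfc_miss[name]:.4f}  dfc hit {dfc_hit[name]:.4f}")
```

By this calculation the DFC hit rate ranges from about 67% for rijndael to over 99.9% for CRC32.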
12
Conclusion
  • The DFC stores decoded instructions and can be
    very small and energy-efficient.
  • A DFC hit eliminates both the access to the
    much larger instruction cache and the entire
    decoding step.
  • The simulation results show that most
    instruction fetches and decodes can be eliminated
    by using the DFC, which makes it an efficient way
    to reduce the power consumption of embedded
    processors.

13
Thank you!