Simulation of Decode Filter Cache using SimpleScalar simulator


1
Simulation of Decode Filter Cache using
SimpleScalar simulator

Presented by Fei Hong

2
Motivation & Goals

Instruction fetches and decodes are the major
on-chip power consumers
Optimize power consumption by reducing
instruction fetches and decodes
Simulate the decode filter cache (DFC) architecture
using SimpleScalar
Test the performance of the DFC

3
Prediction Mechanism

Each sector in the DFC has the following fields:
(tag, sector_valid, next_address)
If A is not equal to C, a different control path
will be taken:
tag(A) ≠ tag(C)    (1)
A and B are consecutively accessed. If they
belong to a small loop:
tag(A) = tag(B)    (2)
Based on (1) and (2), the prediction for the next
fetch is:
tag(C) ≠ tag(B)    (3)
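Relation (2) rests on two nearby fetch addresses sharing a tag in a small direct-mapped structure. The following is a minimal sketch of that idea, not the presenter's SimpleScalar code. It assumes a conventional offset/index/tag address split derived from the parameters given later in the slides (4B instructions, 4 decoded instructions per sector, 32 sectors); the slides do not state the split, and the helper names `index`, `tag`, `fill`, and `dfc_hit` are hypothetical.

```python
from dataclasses import dataclass

# Geometry from the parameter slide; the bit split below is an assumption.
INSTR_BYTES = 4          # instruction size
INSTRS_PER_SECTOR = 4    # decoded instructions per sector
NUM_SECTORS = 32         # direct-mapped DFC

# One sector covers 16 bytes of instruction addresses -> 4 offset bits.
OFFSET_BITS = (INSTR_BYTES * INSTRS_PER_SECTOR).bit_length() - 1
# 32 sectors -> 5 index bits.
INDEX_BITS = NUM_SECTORS.bit_length() - 1


@dataclass
class Sector:
    tag: int = 0
    sector_valid: bool = False
    next_address: int = 0  # field named on the slide; unused in this sketch


def index(addr: int) -> int:
    """Sector selected by a fetch address (hypothetical helper)."""
    return (addr >> OFFSET_BITS) & (NUM_SECTORS - 1)


def tag(addr: int) -> int:
    """Address bits above the offset and index (hypothetical helper)."""
    return addr >> (OFFSET_BITS + INDEX_BITS)


def fill(dfc: list, addr: int) -> None:
    """Mark the sector for addr as holding valid decoded instructions."""
    sector = dfc[index(addr)]
    sector.tag, sector.sector_valid = tag(addr), True


def dfc_hit(dfc: list, addr: int) -> bool:
    """Direct-mapped lookup: valid sector with a matching tag."""
    sector = dfc[index(addr)]
    return sector.sector_valid and sector.tag == tag(addr)


dfc = [Sector() for _ in range(NUM_SECTORS)]
a, b = 0x400100, 0x400140        # two fetches inside one 512-byte region
fill(dfc, a)
fill(dfc, b)
print(tag(a) == tag(b))          # addresses in a small loop share a tag
print(dfc_hit(dfc, b))           # matching tag in a valid sector
print(dfc_hit(dfc, b + 512))     # same index, different tag
```

Under these assumptions the DFC spans 32 × 16 = 512 bytes of instruction addresses, so any loop that fits in one aligned 512-byte region keeps a single tag across all of its fetches.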

4
Working Process

5
The Platform

Host computer: ACPI x86-based PC
Host computer operating system: Microsoft Windows
Vista Ultimate
Virtual machine: VMware Workstation version 6.03
Linux operating system: Fedora Core 6
Simulator: SimpleScalar version 3.0

6
Work done so far

Set up the platform
Read the source code of SimpleScalar
Applied my DFC structure and working process to
SimpleScalar
Found benchmarks and compiled them on the platform
Ran simulations using the given memory hierarchy
parameters

7
MiBench

dijkstra: constructs a large graph in an
adjacency matrix representation and then
calculates the shortest path between every pair
of nodes using repeated applications of
Dijkstra's algorithm.
stringsearch: searches for given words in
phrases using a case-insensitive comparison
algorithm.
rijndael encrypt/decrypt: Rijndael was selected as
the National Institute of Standards and Technology's
Advanced Encryption Standard (AES).
CRC32: performs a 32-bit Cyclic
Redundancy Check (CRC) on a file. CRC checks are
often used to detect errors in data transmission.
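To make the CRC32 workload concrete, here is a minimal bitwise CRC-32 sketch. It is not the MiBench source (which is written in C); it assumes the common reflected polynomial 0xEDB88320 used by zlib and Ethernet, and the function name `crc32_bitwise` is hypothetical. The result is cross-checked against Python's standard `zlib.crc32`.

```python
import zlib


def crc32_bitwise(data: bytes) -> int:
    """Bit-at-a-time CRC-32 (reflected polynomial 0xEDB88320, assumed)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right one bit; fold in the polynomial when the low bit is set.
            crc = (crc >> 1) ^ (0xEDB88320 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


message = b"123456789"
checksum = crc32_bitwise(message)
print(hex(checksum))
print(checksum == zlib.crc32(message))      # agrees with the library routine

corrupted = b"123456780"                    # last byte altered in transit
print(crc32_bitwise(corrupted) != checksum) # the error is detected
```

The inner loop is short and runs once per input bit, which is consistent with the kind of tight-loop behaviour a small instruction-side structure is meant to capture.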

8
Memory hierarchy parameters

Parameter     Value
Instr. size   4B
DFC           direct-mapped, 32 sectors, 4 decoded instr. per sector, 8B per decoded instr.
L1 I-cache    16KB, 2-way, 32B line, 1-cycle hit latency
L1 D-cache    8KB, 2-way, 32B line, 1-cycle hit latency
Memory        30-cycle latency
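The parameters above imply the following sizes and set counts. This is a small arithmetic sketch, not the presenter's configuration: the slides do not show the command line that was used. The strings follow SimpleScalar's `<name>:<nsets>:<bsize>:<assoc>:<repl>` cache format, and the LRU replacement policy (`l`) is an assumption because the slides do not name one.

```python
def num_sets(size_bytes: int, line_bytes: int, assoc: int) -> int:
    """Number of sets in a set-associative cache."""
    return size_bytes // (line_bytes * assoc)


# DFC: 32 sectors x 4 decoded instructions x 8 bytes each.
dfc_bytes = 32 * 4 * 8
# Instruction address range covered: 32 sectors x 4 instructions x 4 bytes.
dfc_covered_bytes = 32 * 4 * 4

il1_sets = num_sets(16 * 1024, 32, 2)
dl1_sets = num_sets(8 * 1024, 32, 2)

# Cache configuration strings; the "l" (LRU) policy is assumed.
il1_cfg = f"il1:{il1_sets}:32:2:l"
dl1_cfg = f"dl1:{dl1_sets}:32:2:l"

print(f"DFC storage: {dfc_bytes} bytes, covering {dfc_covered_bytes} bytes of code")
print(f"-cache:il1 {il1_cfg}")
print(f"-cache:dl1 {dl1_cfg}")
```

The DFC holds 1KB of decoded instructions, a sixteenth of the L1 I-cache capacity, which is the basis for calling it small relative to the instruction cache.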

9
Simulation results

Reduction in instruction fetches and decodes

10
Simulation results

Prediction hit rate

11
Simulation results

Statistic        dijkstra     stringsearch   rijndael     CRC32
sim_num_insn     255620304    4437612        391487315    533385529
il1.accesses     43508918     1605417        236160209    972328
il1.hits         43399500     1568976        228694324    971600
il1.misses       109418       36441          7465885      728
il1.miss_rate    0.0025       0.0227         0.0316       0.0007
dfc.accesses     215740165    3269067        232531480    532674172
dfc.hits         212111386    2832195        155327106    532413201
dfc.misses       3628779      436872         77204374     260971
dfc.miss_rate    0.0168       0.1336         0.3320       0.0005
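The reported miss rates can be recomputed from the raw counters as misses divided by accesses. The sketch below copies the counts from the table and checks that hits plus misses equals accesses for each cache; the dictionary layout and the `miss_rate` helper are illustrative choices, not part of the simulator output.

```python
# (accesses, hits, misses) per cache, copied from the results table.
stats = {
    "dijkstra": {
        "il1": (43508918, 43399500, 109418),
        "dfc": (215740165, 212111386, 3628779),
    },
    "stringsearch": {
        "il1": (1605417, 1568976, 36441),
        "dfc": (3269067, 2832195, 436872),
    },
    "rijndael": {
        "il1": (236160209, 228694324, 7465885),
        "dfc": (232531480, 155327106, 77204374),
    },
    "CRC32": {
        "il1": (972328, 971600, 728),
        "dfc": (532674172, 532413201, 260971),
    },
}


def miss_rate(accesses: int, hits: int, misses: int) -> float:
    """Miss rate, after checking the counters are self-consistent."""
    assert hits + misses == accesses
    return misses / accesses


for benchmark, caches in stats.items():
    for cache, counters in caches.items():
        rate = miss_rate(*counters)
        print(f"{benchmark:12s} {cache}: miss rate {rate:.4f}, hit rate {1 - rate:.4f}")
```

Run this way, the counters are internally consistent and the rounded rates match the `il1.miss_rate` and `dfc.miss_rate` rows of the table.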

12
Conclusion

The DFC stores decoded instructions and can be
very small and energy-efficient.
On a DFC hit, both the access to the much larger
instruction cache and the entire decoding step
are eliminated.
The simulation results show DFC hit rates from
about 67% (rijndael) to over 99.9% (CRC32), so most
instruction fetches and decodes can be eliminated by
using the DFC. It is therefore an efficient way
to reduce the power consumption of embedded
processors.

13
Thank you!
