Title: Automatic Cache Tuning for Energy-Efficiency using Local Regression Modeling
Slide 1: Automatic Cache Tuning for Energy-Efficiency using Local Regression Modeling
- Peter Hallschmid and Resve Saleh
- June 7th, 2007
Slide 2: Overview
- Configurable Processors (ASIPs)
- Design Space Exploration (DSE) for ASIPs
- Previous Work for DSE
- DSE Modeling Approach (LRM-DSE)
- Local Regression Models (LOESS)
- Advantages of non-parametric statistics
- Applying LRM-DSE to Cache Tuning
- Previous work for Cache Tuning
- Experimental platform and base architecture
- Results and Conclusions
Slide 3: Configurable Processors
- Application-Specific Instruction-Set Processors (ASIPs): a processing element configured for an application or application domain
  - speed and power-efficiency
  - amortization of design and fabrication costs
  - ease of use
- Industry (Soft ASIPs)
- Xilinx MicroBlaze
- Altera Nios
- Industry (Hard ASIPs)
- Tensilica Xtensa
- ARC ARCtangent
- Improv Jazz
- Academics
- ASIP Meister (Osaka University)
- LISA (Aachen University)
Slide 4: Current State of ASIPs
Typical Customization Parameters
- Memory size, MMU support, and addressing
- Cache size, associativity, and write-back policy
- Functional units: FP or not? DSP or not?
- Bus sizes and endianness
- Register file size and type
- Number of interrupts
- Synthesis for area, speed, or power?
- Process and standard library specification
- Instruction-set Architecture
How do we automate ASIP configuration?
Room for Improvement
- Increased customizability
- ( mostly prescribed by base architecture )
- Increased automation
- ( still need to be an expert to use ASIPs )
Slide 5: Automatic Customization (for ASIPs)
Design Space (DS)
Design Space Exploration (DSE)
- S_pi = the set of all configurations of parameter i
- DS = S_p1 × S_p2 × S_p3 × S_p4 × … × S_pn
The sheer size of the design space makes it computationally prohibitive to perform an exhaustive search!
Each point is VERY expensive to evaluate!
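The cross-product structure of the design space can be sketched numerically. The parameter value sets below are illustrative placeholders, not the actual sets from this work:

```python
from itertools import product

# Hypothetical cache-parameter value sets (illustrative only).
param_values = {
    "l1_size_kb":    [1, 2, 4, 8, 16, 32],
    "l1_assoc":      [1, 2, 4, 8],
    "l1_line_bytes": [16, 32, 64],
    "l2_size_kb":    [64, 128, 256, 512],
    "l2_assoc":      [1, 2, 4, 8],
}

# |DS| = |S_p1| * |S_p2| * ... * |S_pn|
ds_size = 1
for values in param_values.values():
    ds_size *= len(values)
print(ds_size)  # 1152 configurations from just five parameters

# An exhaustive search would have to simulate every one of these.
all_configs = list(product(*param_values.values()))
assert len(all_configs) == ds_size
```

Each element of `all_configs` would require a full, expensive simulation, which is why the product form of DS is intractable.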
Slide 6: Previous Work I: Pruning Techniques
- Decomposition: Sanghavi et al. (Tensilica), DAC'01
  - to a first degree, decompose the space into parts that do not affect each other:
  - DS = S_PHT,1 × S_PHT,2 × S_PHT,3 × S_BTB,1 × S_BTB,2
  - DS ≈ (S_PHT,1 × S_PHT,2 × S_PHT,3) + (S_BTB,1 × S_BTB,2)
- Sensitivity Analysis: Fornaciari, Sciuto, Silvano, and Zaccaria (Politecnico di Milano), Design Automation for Embedded Systems
  - ordered the importance of each design parameter (from benchmarks)
  - for the target benchmark, each dimension was optimized independently:
  - DS = S_p1 × S_p2 × S_p3 × S_p4 × … × S_pn
  - DS ≈ S_p1 + S_p2 + S_p3 + S_p4 + … + S_pn
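The reduction from a product-sized search to a sum-sized search can be sketched as a coordinate-wise optimization. Here `cost` is a hypothetical stand-in for one expensive simulator run, and the parameter sets are illustrative:

```python
# Illustrative parameter sets (not the actual sets from this work).
param_values = {
    "p1": [1, 2, 4, 8],
    "p2": [16, 32, 64],
    "p3": [1, 2, 4],
}

def cost(config):
    # Hypothetical stand-in for one expensive simulation.
    return sum((v - 4) ** 2 for v in config.values())

# Exhaustive search: |S_p1| * |S_p2| * |S_p3| = 36 evaluations.
exhaustive_evals = 1
for vals in param_values.values():
    exhaustive_evals *= len(vals)

# Independent per-dimension search: |S_p1| + |S_p2| + |S_p3| = 10.
best = {p: vals[0] for p, vals in param_values.items()}
independent_evals = 0
for p, vals in param_values.items():
    best[p] = min(vals, key=lambda v: cost({**best, p: v}))
    independent_evals += len(vals)

print(exhaustive_evals, independent_evals)  # 36 vs 10
```

The independent search finds the true optimum here only because this toy `cost` has no cross-parameter interactions, which is exactly the assumption the pruning techniques above rely on.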
Slide 7: Previous Work II: Methodologies / Heuristics
- Mescal Methodology: Gries, Keutzer et al. (Berkeley)
  - amongst other things, describes benchmarking, design space specification, and DSE heuristics
- PICO: Kathail, Aditya et al. (HP)
  - describes both DSE heuristics and pruning techniques
  - originally developed by HP
- Pareto Simulated Annealing: Agosta, Palermo, and Silvano (Politecnico di Milano)
  - applies simulated annealing with simultaneous objectives
  - used for co-optimization of the architecture and compiler transformations
Slide 8: Our Approach: Modeling the Design Space
The Design Space (DS)
1) Specify neighborhood and sample points
2) Evaluate sample points (expensive)
3) Create statistical model
4) Estimate all points from the model
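The four steps can be sketched end-to-end on a toy 2-D design space. Here `evaluate` is a hypothetical stand-in for one SimpleScalar/PowerAnalyzer run, and a nearest-neighbour lookup stands in for the local regression model described on the following slides:

```python
import random

def evaluate(point):
    # Hypothetical stand-in for one expensive simulation.
    x, y = point
    return (x - 5) ** 2 + (y - 3) ** 2

# 1) Specify the design space and choose sample points.
design_space = [(x, y) for x in range(10) for y in range(10)]
random.seed(0)
samples = random.sample(design_space, 15)

# 2) Evaluate the sample points (the only expensive step).
observed = {p: evaluate(p) for p in samples}

# 3) Create a statistical model from the observations
#    (nearest-neighbour lookup as a placeholder for LOESS).
def predict(point):
    nearest = min(observed, key=lambda s: (s[0] - point[0]) ** 2
                                        + (s[1] - point[1]) ** 2)
    return observed[nearest]

# 4) Estimate every point cheaply from the model and pick the best.
best = min(design_space, key=predict)
print(best)
```

Only 15 of the 100 points are ever simulated; the remaining 85 estimates cost almost nothing, which is the source of the speed-up claimed later in the talk.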
Slide 9: Regression Modeling
- Configuration Parameters → Predictor Variables
- Cost Function → Response Variable
Can we use standard parametric regressions?
- Difficult to predict the nature of
predictor/response interactions and
predictor/predictor interactions a priori
- The nature of the interactions may change based
on the application
Non-Parametric Statistics
- The data itself defines the function
- Captures trends without assumptions about the
nature of the underlying function
- We use an adapted version of Local Regression (LOESS) by Cleveland (Journal of the American Statistical Association, 1979)
- Hence the name Local Regression Modeling-based
Design Space Exploration (LRM-DSE)
Slide 10: Local Regressions (LOESS)
Based on W. S. Cleveland, Journal of the American Statistical Association, Vol. 74, 1979.
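A minimal one-dimensional sketch in the spirit of Cleveland's method: for each query point, fit a weighted least-squares line over the nearest fraction of the samples, using tricube weights. The function name and smoothing fraction are illustrative, not from the talk:

```python
import numpy as np

def loess(x, y, x0, frac=0.5):
    """Estimate the response at x0 by locally weighted linear regression."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    k = max(3, int(frac * len(x)))           # neighbourhood size
    d = np.abs(x - x0)
    idx = np.argsort(d)[:k]                  # k nearest samples
    h = d[idx].max() or 1.0                  # neighbourhood radius
    w = (1.0 - (d[idx] / h) ** 3) ** 3       # tricube weights
    # Weighted least squares: scale rows by sqrt(w).
    sw = np.sqrt(w)
    A = np.stack([np.ones(k), x[idx]], axis=1) * sw[:, None]
    beta, *_ = np.linalg.lstsq(A, y[idx] * sw, rcond=None)
    return beta[0] + beta[1] * x0            # local line evaluated at x0

# Smooth a noisy quadratic; the true (noise-free) value at x = 5 is 0.
rng = np.random.default_rng(0)
xs = np.linspace(0.0, 10.0, 50)
ys = (xs - 5.0) ** 2 + rng.normal(0.0, 1.0, 50)
print(loess(xs, ys, 5.0))
```

Because the data itself defines the fit, no global functional form is assumed, which is the property the previous slide highlights.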
Slide 11: Similar Work
- Lee and Brooks (Harvard)
- uses regressions to model the design space after
a statistical analysis requiring clustering,
association, and correlation analysis.
- predictors are manually combined to form new
predictors to model predictor/predictor
interactions
- nonlinear predictor/response interactions
identified and modeled using cubic splines
→ better results, but the methodology is not automatic!
- Ipek et al. (Cornell and Lawrence Livermore National Laboratory)
  - train artificial neural networks (ANNs) using samples and then predict the remaining points in the design space
  - nonlinear predictor/response interactions are modeled by the activation functions
  - results suggest a sigmoid activation function successfully captures predictor/predictor interactions
→ automatic, but predictor/predictor interactions are not transparent!
Slide 12: Experimental Platform
- SimpleScalar: U. of Michigan (Austin), U. of Texas (Burger), U. of Wisconsin (Sohi)
- PowerAnalyzer: U. of Michigan (Mudge, Austin), U. of Colorado (Grunwald), Intel, ARM
Slide 13: Applying LRM-DSE to Cache Tuning
Predictors (Configuration Parameters)
Optimization Goal
- Power reported by Panalyzer (J/s) × cycles
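Scaling the reported power (J/s) by the execution time implied by the cycle count yields an energy measure. A sketch with illustrative numbers, none of which are from the talk:

```python
# Energy (J) = average power (J/s) * execution time (s),
# where execution time = cycle count / clock frequency.
avg_power_w = 0.45       # illustrative average power from the power model
cycles = 120_000_000     # illustrative cycle count from the simulator
freq_hz = 200_000_000    # illustrative clock frequency

energy_j = avg_power_w * cycles / freq_hz
print(energy_j)  # ~0.27 J for this illustrative configuration
```

Optimizing energy rather than raw power matters here, because a slower configuration can draw less power yet consume more energy overall.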
Slide 14: Experiments
Base Processor
- StrongARM Architecture but with a separate level
1 cache and a unified level 2 cache.
Benchmark Applications
- Ten benchmark applications from the SPEC CPU2000 integer suite and the MediaBench suite.
Sample Selection
- Evenly Distributed Selection
Slide 15: Model Accuracy
(Note: reasonable sampling rates; accuracy results shown as figures.)
Slide 16: Optimization Goal Improvements
Slide 17: Run-time Improvements
LRM-DSE:
- Simulate sample points: 193 points × 11.4 minutes = 36 hours 40 minutes
- Model and estimate all 19,278 points: 1.79 sec
Brute-force simulation of the entire space:
- 19,278 points × 11.4 minutes = 3,662 hours 49 minutes
More than 3,600 machine-hours of simulation time saved!
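The comparison on this slide can be reproduced arithmetically from the per-point simulation time and point counts quoted above:

```python
sim_min_per_point = 11.4
total_points = 19_278        # full design space
sample_points = 193          # points LRM-DSE actually simulates
model_sec = 1.79             # time to fit the model and estimate all points

brute_force_h = total_points * sim_min_per_point / 60
lrm_dse_h = (sample_points * sim_min_per_point + model_sec / 60) / 60
saved_h = brute_force_h - lrm_dse_h

print(round(brute_force_h), round(lrm_dse_h), round(saved_h))
# 3663 37 3626
```

The ratio of the two totals is roughly 100, matching the speed-up claimed in the conclusions.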
Slide 18: Conclusions
- ASIPs provide the flexibility of a general-purpose processor but with performance closer to that of ASICs
- Introduced a method of performing DSE with a 100x speed-up that is capable of finding near-optimal results
- Reduced the power dissipation of our experimental processor by 14% by tuning the cache for each application
Impact of Research
- Faster, cheaper, and more energy-efficient ASIPs
- Improved accessibility of this technology through improved automation
- Designers can more easily explore system-level issues through improved system modeling