Title: Automatic Cache Tuning for Energy-Efficiency using Local Regression Modeling
Slide 1: Automatic Cache Tuning for Energy-Efficiency using Local Regression Modeling
- Peter Hallschmid and Resve Saleh
- June 7th, 2007
Slide 2: Overview
- Configurable Processors (ASIPs)
- Design Space Exploration (DSE) for ASIPs
- Previous Work for DSE
- DSE Modeling Approach (LRM-DSE)
- Local Regression Models (LOESS)
- Advantages of non-parametric statistics
- Applying LRM-DSE to Cache Tuning
- Previous work for Cache Tuning
- Experimental platform and base architecture
- Results and Conclusions
Slide 3: Configurable Processors
- Application-Specific Instruction-Set Processors (ASIPs): a processing element configured for an application or application domain
  - speed and power-efficiency
  - amortization of design and fabrication costs
  - ease of use
- Industry (Soft ASIPs)
- Xilinx MicroBlaze
- Altera Nios
- Industry (Hard ASIPs)
- Tensilica Xtensa
- ARC ARCtangent
- Improv Jazz
- Academics
- ASIP Meister (Osaka University)
- LISA (Aachen University)
Slide 4: Current State of ASIPs
Typical Customization Parameters
- Memory size, MMU support, and addressing
- Cache size, associativity, and write-back policy
- Functional units: FP or not? DSP or not?
- Bus sizes and endianness
- Register file size and type
- Number of interrupts
- Synthesis for area, speed, or power?
- Process and standard library specification
- Instruction-set Architecture
How do we automate ASIP configuration?
Room for Improvement
- Increased customizability
- ( mostly prescribed by base architecture )
- Increased automation
- ( still need to be an expert to use ASIPs )
Slide 5: Automatic Customization (for ASIPs)
Design Space (DS)
Design Space Exploration (DSE)
- S_pi = the set of all configurations of parameter i
- DS = S_p1 × S_p2 × S_p3 × S_p4 × … × S_pn
The sheer size of the design space makes it computationally prohibitive to perform an exhaustive search!
Each point is VERY expensive to evaluate!
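The cross-product structure of the design space can be sketched numerically. The parameter value sets below are illustrative placeholders, not the actual sets from this work:

```python
from itertools import product

# Hypothetical cache-parameter value sets (illustrative only).
param_values = {
    "l1_size_kb":    [1, 2, 4, 8, 16, 32],
    "l1_assoc":      [1, 2, 4, 8],
    "l1_line_bytes": [16, 32, 64],
    "l2_size_kb":    [64, 128, 256, 512],
    "l2_assoc":      [1, 2, 4, 8],
}

# |DS| = |S_p1| * |S_p2| * ... * |S_pn|
ds_size = 1
for values in param_values.values():
    ds_size *= len(values)
print(ds_size)  # 1152 configurations from just five parameters

# An exhaustive search would have to simulate every one of these.
all_configs = list(product(*param_values.values()))
assert len(all_configs) == ds_size
```

Each element of `all_configs` would require a full, expensive simulation, which is why the product form of DS is intractable.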
Slide 6: Previous Work I: Pruning Techniques
- Decomposition: Sanghavi et al. (Tensilica), DAC'01
  - to a first degree, decompose the space into parts that do not affect each other:
  - DS = S_PHT,1 × S_PHT,2 × S_PHT,3 × S_BTB,1 × S_BTB,2
  - DS ≈ (S_PHT,1 × S_PHT,2 × S_PHT,3) + (S_BTB,1 × S_BTB,2)
- Sensitivity Analysis: Fornaciari, Sciuto, Silvano, and Zaccaria (Politecnico di Milano), Design Automation for Embedded Systems
  - ordered the importance of each design parameter (from benchmarks)
  - for the target benchmark, each dimension was optimized independently:
  - DS = S_p1 × S_p2 × S_p3 × S_p4 × … × S_pn
  - DS ≈ S_p1 + S_p2 + S_p3 + S_p4 + … + S_pn
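The reduction from a product-sized search to a sum-sized search can be sketched as a coordinate-wise optimization. Here `cost` is a hypothetical stand-in for one expensive simulator run, and the parameter sets are illustrative:

```python
# Illustrative parameter sets (not the actual sets from this work).
param_values = {
    "p1": [1, 2, 4, 8],
    "p2": [16, 32, 64],
    "p3": [1, 2, 4],
}

def cost(config):
    # Hypothetical stand-in for one expensive simulation.
    return sum((v - 4) ** 2 for v in config.values())

# Exhaustive search: |S_p1| * |S_p2| * |S_p3| = 36 evaluations.
exhaustive_evals = 1
for vals in param_values.values():
    exhaustive_evals *= len(vals)

# Independent per-dimension search: |S_p1| + |S_p2| + |S_p3| = 10.
best = {p: vals[0] for p, vals in param_values.items()}
independent_evals = 0
for p, vals in param_values.items():
    best[p] = min(vals, key=lambda v: cost({**best, p: v}))
    independent_evals += len(vals)

print(exhaustive_evals, independent_evals)  # 36 vs 10
```

The independent search finds the true optimum here only because this toy `cost` has no cross-parameter interactions, which is exactly the assumption the pruning techniques above rely on.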
Slide 7: Previous Work II: Methodologies / Heuristics
- Mescal Methodology: Gries, Keutzer et al. (Berkeley)
  - amongst other things, describes benchmarking, design space specification, and DSE heuristics
- PICO: Kathail, Aditya et al. (HP)
  - describes both DSE heuristics and pruning techniques
  - originally developed by HP
- Pareto Simulated Annealing: Agosta, Palermo, and Silvano (Politecnico di Milano)
  - applies simulated annealing with simultaneous objectives
  - used for co-optimization of the architecture and compiler transformations
Slide 8: Our Approach: Modeling the Design Space
The Design Space (DS)
1) Specify neighborhood and sample points
2) Evaluate sample points (expensive)
3) Create statistical model
4) Estimate all points from the model
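The four steps can be sketched end-to-end on a toy 2-D design space. Here `evaluate` is a hypothetical stand-in for one SimpleScalar/PowerAnalyzer run, and a nearest-neighbour lookup stands in for the local regression model described on the following slides:

```python
import random

def evaluate(point):
    # Hypothetical stand-in for one expensive simulation.
    x, y = point
    return (x - 5) ** 2 + (y - 3) ** 2

# 1) Specify the design space and choose sample points.
design_space = [(x, y) for x in range(10) for y in range(10)]
random.seed(0)
samples = random.sample(design_space, 15)

# 2) Evaluate the sample points (the only expensive step).
observed = {p: evaluate(p) for p in samples}

# 3) Create a statistical model from the observations
#    (nearest-neighbour lookup as a placeholder for LOESS).
def predict(point):
    nearest = min(observed, key=lambda s: (s[0] - point[0]) ** 2
                                        + (s[1] - point[1]) ** 2)
    return observed[nearest]

# 4) Estimate every point cheaply from the model and pick the best.
best = min(design_space, key=predict)
print(best)
```

Only 15 of the 100 points are ever simulated; the remaining 85 estimates cost almost nothing, which is the source of the speed-up claimed later in the talk.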
Slide 9: Regression Modeling
- Configuration Parameters → Predictor Variables
- Cost Function → Response Variable
Can we use standard parametric regressions?
- Difficult to predict the nature of
predictor/response interactions and
predictor/predictor interactions a priori
- The nature of the interactions may change based
on the application
Non-Parametric Statistics
- The data itself defines the function
- Captures trends without assumptions about the
nature of the underlying function
- We use an adapted version of Local Regression (LOESS) by Cleveland (Journal of the American Statistical Association, 1979)
- Hence the name Local Regression Modeling-based
Design Space Exploration (LRM-DSE)
Slide 10: Local Regressions (LOESS)
Based on W. S. Cleveland, Journal of the American Statistical Association, Vol. 74, 1979.
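A minimal one-dimensional sketch in the spirit of Cleveland's method: for each query point, fit a weighted least-squares line over the nearest fraction of the samples, using tricube weights. The function name and smoothing fraction are illustrative, not from the talk:

```python
import numpy as np

def loess(x, y, x0, frac=0.5):
    """Estimate the response at x0 by locally weighted linear regression."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    k = max(3, int(frac * len(x)))           # neighbourhood size
    d = np.abs(x - x0)
    idx = np.argsort(d)[:k]                  # k nearest samples
    h = d[idx].max() or 1.0                  # neighbourhood radius
    w = (1.0 - (d[idx] / h) ** 3) ** 3       # tricube weights
    # Weighted least squares: scale rows by sqrt(w).
    sw = np.sqrt(w)
    A = np.stack([np.ones(k), x[idx]], axis=1) * sw[:, None]
    beta, *_ = np.linalg.lstsq(A, y[idx] * sw, rcond=None)
    return beta[0] + beta[1] * x0            # local line evaluated at x0

# Smooth a noisy quadratic; the true (noise-free) value at x = 5 is 0.
rng = np.random.default_rng(0)
xs = np.linspace(0.0, 10.0, 50)
ys = (xs - 5.0) ** 2 + rng.normal(0.0, 1.0, 50)
print(loess(xs, ys, 5.0))
```

Because the data itself defines the fit, no global functional form is assumed, which is the property the previous slide highlights.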
Slide 11: Similar Work
- Lee and Brooks (Harvard)
- uses regressions to model the design space after
a statistical analysis requiring clustering,
association, and correlation analysis.
- predictors are manually combined to form new
predictors to model predictor/predictor
interactions
- nonlinear predictor/response interactions
identified and modeled using cubic splines
→ better results, but the methodology is not automatic!
- Ipek et al. (Cornell and Lawrence Livermore National Laboratory)
  - train artificial neural networks (ANNs) using samples and then predict the remaining points in the design space
  - nonlinear predictor/response interactions are modeled by the activation functions
  - results suggest a sigmoid activation function successfully captures predictor/predictor interactions
→ automatic, but predictor/predictor interactions are not transparent!
Slide 12: Experimental Platform
- SimpleScalar: U. of Michigan (Austin), U. of Texas (Burger), U. of Wisconsin (Sohi)
- PowerAnalyzer: U. of Michigan (Mudge, Austin), U. of Colorado (Grunwald), Intel, ARM
Slide 13: Applying LRM-DSE to Cache Tuning
Predictors (Configuration Parameters)
Optimization Goal
- Power reported by Panalyzer (J/s) × cycles
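Scaling the reported power (J/s) by the execution time implied by the cycle count yields an energy measure. A sketch with illustrative numbers, none of which are from the talk:

```python
# Energy (J) = average power (J/s) * execution time (s),
# where execution time = cycle count / clock frequency.
avg_power_w = 0.45       # illustrative average power from the power model
cycles = 120_000_000     # illustrative cycle count from the simulator
freq_hz = 200_000_000    # illustrative clock frequency

energy_j = avg_power_w * cycles / freq_hz
print(energy_j)  # ~0.27 J for this illustrative configuration
```

Optimizing energy rather than raw power matters here, because a slower configuration can draw less power yet consume more energy overall.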
Slide 14: Experiments
Base Processor
- StrongARM Architecture but with a separate level
1 cache and a unified level 2 cache.
Benchmark Applications
- Ten benchmark applications from the SPEC CPU2000 integer suite and the MediaBench suite.
Sample Selection
- Evenly Distributed Selection
Slide 15: Model Accuracy
(Note: reasonable sampling rates; accuracy results shown as figures.)
Slide 16: Optimization Goal Improvements
Slide 17: Run-time Improvements
LRM-DSE:
- Simulate sample points: 193 points × 11.4 minutes = 36 hours 40 minutes
- Model and estimate all 19,278 points: 1.79 sec
Brute-force simulation of the entire space:
- 19,278 points × 11.4 minutes = 3,662 hours 49 minutes
More than 3,600 machine-hours of simulation time saved!
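The comparison on this slide can be reproduced arithmetically from the per-point simulation time and point counts quoted above:

```python
sim_min_per_point = 11.4
total_points = 19_278        # full design space
sample_points = 193          # points LRM-DSE actually simulates
model_sec = 1.79             # time to fit the model and estimate all points

brute_force_h = total_points * sim_min_per_point / 60
lrm_dse_h = (sample_points * sim_min_per_point + model_sec / 60) / 60
saved_h = brute_force_h - lrm_dse_h

print(round(brute_force_h), round(lrm_dse_h), round(saved_h))
# 3663 37 3626
```

The ratio of the two totals is roughly 100, matching the speed-up claimed in the conclusions.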
Slide 18: Conclusions
- ASIPs provide the flexibility of a general-purpose processor but with performance closer to that of ASICs
- Introduced a method of performing DSE with a 100x speed-up that is capable of finding near-optimal results
- Reduced the power dissipation of our experimental processor by 14% by tuning the cache for each application
Impact of Research
- Faster, cheaper, and more energy-efficient ASIPs
- Improved accessibility of this technology through improved automation
- Designers can more easily explore system-level issues through improved system modeling