Automatic Cache Tuning for Energy-Efficiency using Local Regression Modeling

1
Automatic Cache Tuning for Energy-Efficiency
using Local Regression Modeling
  • Peter Hallschmid and Resve Saleh
  • June 7th, 2007

2
Overview
  • Configurable Processors (ASIPs)
  • Design Space Exploration (DSE) for ASIPs
  • Previous Work for DSE
  • DSE Modeling Approach (LRM-DSE)
  • Local Regression Models (LOESS)
  • Advantages to non-parametric statistics
  • Applying LRM-DSE to Cache Tuning
  • Previous work for Cache Tuning
  • Experimental platform and base architecture
  • Results and Conclusions

3
Configurable Processors
  • Application Specific Instruction-Set Processors
    (ASIPs)
  • A processing element configured for an
    application or application domain

  • Speed and power-efficiency
  • Amortization of design and fabrication costs
  • Ease of use
  • Industry (Soft ASIPs)
  • Xilinx MicroBlaze
  • Altera Nios
  • Industry (Hard ASIPs)
  • Tensilica Xtensa
  • ARC ARCtangent
  • Improv Jazz
  • Academics
  • ASIP Meister (Osaka University)
  • LISA (Aachen University)

4
Current State of ASIPs
Typical Customization Parameters
  • Memory size, MMU support, and addressing
  • Cache size, associativity, and write-back policy
  • Functional units: FP or not? DSP or not?
  • Bus sizes and endianness
  • Register file size and type
  • Number of interrupts
  • Synthesis for area, speed, or power?
  • Process and standard library specification
  • Instruction-set Architecture

How do we automate ASIP configuration?
Room for Improvement
  • Increased customizability
    (mostly prescribed by the base architecture)
  • Increased automation
    (you still need to be an expert to use ASIPs)

5
Automatic Customization (for ASIPs)
Design Space (DS)
Design Space Exploration (DSE)
S_pi = set of all configurations of parameter i
DS = S_p1 × S_p2 × S_p3 × S_p4 × … × S_pn
Sheer size of the design space makes it
computationally prohibitive to perform an
exhaustive search!
Each point is VERY expensive to evaluate!
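To make the combinatorial blow-up concrete, here is a small Python sketch; the parameter names and option counts are hypothetical, not from the presentation. The design space is the cross product of the per-parameter configuration sets, so its size is the product of the option counts.

```python
# Hypothetical option counts for a handful of configuration parameters.
from math import prod

options = {
    "l1_size":   6,  # e.g. 1 KB .. 32 KB in powers of two (assumed)
    "l1_assoc":  4,
    "l2_size":   6,
    "l2_assoc":  4,
    "line_size": 4,
    "policy":    2,
}

# |DS| = |S_p1| * |S_p2| * ... * |S_pn|
space_size = prod(options.values())
print(space_size)  # 4608 points -- each one a full, expensive simulation
```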
6
Previous Work I Pruning Techniques
  • Decomposition: Sanghavi et al. (Tensilica),
    DAC'01
  • to a first degree, decompose the space into
    parts that do not affect each other

DS = S_PHT,1 × S_PHT,2 × S_PHT,3 × S_BTB,1 × S_BTB,2
DS ≈ (S_PHT,1 × S_PHT,2 × S_PHT,3) ∪ (S_BTB,1 × S_BTB,2)
  • Sensitivity Analysis: Fornaciari, Sciuto,
    Silvano, and Zaccaria (Politecnico di Milano),
    Design Automation for Embedded Systems
  • ordered the importance of each design parameter
    (from benchmarks)
  • for the target benchmark, each dimension was
    optimized independently

DS = S_p1 × S_p2 × S_p3 × S_p4 × … × S_pn
DS ≈ S_p1 ∪ S_p2 ∪ S_p3 ∪ S_p4 ∪ … ∪ S_pn
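The pruning idea above can be sketched in Python. The cost function below is a hypothetical, separable stand-in for a real simulation; with it, per-dimension optimization (cost grows with the sum of the dimension sizes) finds the same point as the exhaustive cross product (product of the sizes).

```python
# Sketch: sensitivity-style pruning vs. exhaustive search. Each dimension
# is optimized independently while the others sit at a baseline value.
from itertools import product

dims = [range(4), range(6), range(5)]              # hypothetical parameter ranges
cost = lambda cfg: sum((x - 2) ** 2 for x in cfg)  # assumed separable cost model

# Exhaustive: 4 * 6 * 5 = 120 evaluations.
best_full = min(product(*dims), key=cost)

# Pruned: one dimension at a time -> only 4 + 6 + 5 = 15 evaluations.
baseline = [0, 0, 0]
for i, dim in enumerate(dims):
    baseline[i] = min(dim, key=lambda v: cost(baseline[:i] + [v] + baseline[i+1:]))

print(best_full, tuple(baseline))  # both (2, 2, 2) because the cost is separable
```

With non-separable interactions the pruned search can miss the true optimum, which is exactly the risk these techniques trade against simulation time.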
7
Previous Work II Methodologies / Heuristics
  • Mescal Methodology: Gries, Keutzer, et al.
    (Berkeley)
  • Amongst other things, describes benchmarking,
    design space specification, and DSE heuristics
  • PICO: Kathail, Aditya, et al. (HP)
  • Describes both DSE heuristics and pruning
    techniques
  • Originally developed by HP
  • Pareto Simulated Annealing: Agosta, Palermo, and
    Silvano (Politecnico di Milano)
  • Applies simulated annealing techniques with
    simultaneous objectives
  • Used for co-optimization of the architecture and
    compiler transformations

8
Our Approach: Modeling the Design Space
The Design Space (DS)
1) Specify neighborhood and sample points
2) Evaluate sample points (expensive)
3) Create statistical model
4) Estimate all points from the model
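The four steps can be sketched on a one-dimensional toy space; the analytic cost function and the piecewise-linear model below are illustrative stand-ins, not the LOESS model used in the paper.

```python
# A minimal sketch of the sample -> evaluate -> model -> estimate loop.
import random
random.seed(0)

space = list(range(64))                       # 1) the (toy) design space
samples = sorted(random.sample(space, 8))     #    choose sample points
expensive_eval = lambda x: (x - 40) ** 2      # 2) stand-in for a simulation

pairs = [(x, expensive_eval(x)) for x in samples]

def estimate(x):                              # 3) + 4) piecewise-linear model
    lo = max((p for p in pairs if p[0] <= x), default=pairs[0])
    hi = min((p for p in pairs if p[0] >= x), default=pairs[-1])
    if lo[0] == hi[0]:
        return lo[1]
    t = (x - lo[0]) / (hi[0] - lo[0])
    return lo[1] + t * (hi[1] - lo[1])

best = min(space, key=estimate)               # pick the predicted optimum
```

Only the 8 sampled points are "simulated"; all 64 are then estimated from the model, which is the source of the speed-up.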
9
Regression Modeling
  • Configuration Parameters → Predictor Variables
  • Cost Function → Response Variable

Can we use standard parametric regressions?
  • Difficult to predict the nature of
    predictor/response interactions and
    predictor/predictor interactions a priori
  • The nature of the interactions may change based
    on the application

Non-Parametric Statistics
  • The data itself defines the function
  • Captures trends without assumptions about the
    nature of the underlying function
  • We use an adapted version of Local Regression
    (LOESS) by Cleveland (Journal of the American
    Statistical Association, vol. 74, 1979)
  • Hence the name Local Regression Modeling-based
    Design Space Exploration (LRM-DSE)

10
Local Regressions (LOESS)

Based on W. S. Cleveland, Journal of the American
Statistical Association, vol. 74, 1979.
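A minimal one-dimensional LOESS sketch in the spirit of Cleveland's method (the function and parameter names are ours, illustrative only): each query point gets its own weighted linear fit over its nearest neighbors, with tricube weights that fall off with distance.

```python
# Minimal 1-D LOESS: local weighted linear regression per query point.
def loess(xs, ys, x0, frac=0.5):
    k = max(2, int(frac * len(xs)))
    # k nearest neighbors of the query point x0
    nbrs = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x0))[:k]
    # slight inflation keeps the farthest neighbor's weight nonzero
    d_max = max(abs(xs[i] - x0) for i in nbrs) * 1.0001 or 1.0
    w = {i: (1 - (abs(xs[i] - x0) / d_max) ** 3) ** 3 for i in nbrs}
    # weighted least squares line: minimize sum w_i * (y_i - a - b*x_i)^2
    sw = sum(w.values())
    mx = sum(w[i] * xs[i] for i in nbrs) / sw
    my = sum(w[i] * ys[i] for i in nbrs) / sw
    cov = sum(w[i] * (xs[i] - mx) * (ys[i] - my) for i in nbrs)
    var = sum(w[i] * (xs[i] - mx) ** 2 for i in nbrs)
    b = cov / var if var else 0.0
    return my + b * (x0 - mx)

xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [0, 1, 2, 3, 4, 5, 6, 7]     # linear data: LOESS reproduces the line
print(loess(xs, ys, 3.5))         # 3.5
```

Because the fit is local, no global functional form is assumed; this is the non-parametric property the previous slide relies on.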
11
Similar Work
  • Lee and Brooks (Harvard)
  • uses regressions to model the design space after
    a statistical analysis requiring clustering,
    association, and correlation analysis.
  • predictors are manually combined to form new
    predictors to model predictor/predictor
    interactions
  • nonlinear predictor/response interactions
    identified and modeled using cubic splines

Better results, but the methodology is not
automatic!
  • Ipek et al (Cornell and Lawrence Livermore
    National Laboratory)
  • train artificial neural networks (ANNs) using
    samples and then predict the remaining points in
    the design space
  • nonlinear predictor/response interactions modeled
    by the activation functions
  • results suggest a sigmoid activation function
    successfully captures predictor/predictor
    interactions

Automatic, but predictor/predictor interactions
are not transparent!
12
Experimental Platform
SimpleScalar: U. of Michigan (Austin), U. of
Texas (Burger), U. of Wisconsin (Sohi)
PowerAnalyzer: U. of Michigan (Mudge, Austin),
U. of Colorado (Grunwald), Intel, ARM
13
Applying LRM-DSE to Cache Tuning
Predictors (Configuration Parameters)
Optimization Goal
Power reported by PowerAnalyzer as (J/s) × cycles
14
Experiments
Base Processor
  • StrongARM Architecture but with a separate level
    1 cache and a unified level 2 cache.

Benchmark Applications
  • Ten benchmark applications from the SPECcpu2000
    Integer suite and the MediaBench suite.

Sample Selection
  • Evenly Distributed Selection
  • Random Selection
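The two selection schemes can be sketched as index selectors over a flattened design space (function names are ours, not the paper's):

```python
# Two ways to pick sample points from an n-point design space.
import random

def evenly_distributed(n_points, n_samples):
    # regular grid over the index range, endpoints included
    step = (n_points - 1) / (n_samples - 1)
    return [round(i * step) for i in range(n_samples)]

def random_selection(n_points, n_samples, seed=0):
    # uniform sampling without replacement, seeded for repeatability
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_points), n_samples))

print(evenly_distributed(19278, 5))  # [0, 4819, 9638, 14458, 19277]
```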

15
Model Accuracy
Note
reasonable sampling rates
16
Optimization Goal Improvements
17
Run-time Improvements
LRM-DSE:
  Model and estimate all 19,278 points: 1.79 sec
  Simulate 193 sample points at 11.4 minutes each:
  36 hours 40 minutes
Brute-force simulation of the entire space:
  Simulate 19,278 points at 11.4 minutes each:
  3,662 hours 49 minutes
More than 3,600 machine-hours of simulation time
saved!
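A quick arithmetic check of the figures above, assuming a flat 11.4 minutes per simulated point:

```python
# Simulation time scales linearly with the number of simulated points.
per_point_min = 11.4                      # minutes per simulated point
sampled = 193 * per_point_min / 60        # hours to simulate the samples
full = 19278 * per_point_min / 60         # hours to simulate every point
print(round(sampled, 1), round(full, 1))  # 36.7 vs 3662.8 hours
print(round(full - sampled))              # ~3626 machine-hours saved
```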
18
Conclusions
  • ASIPs provide the flexibility of a
    general-purpose processor but with performance
    closer to that of ASICs
  • Introduced a method of performing DSE with a 100x
    speed-up that is capable of finding near-optimal
    results
  • Reduced the power dissipation of our experimental
    processor by 14% by tuning the cache for each
    application

Impact of Research
  • Faster, cheaper, and more energy-efficient ASIPs
  • Improved accessibility of this technology through
    improved automation
  • Designers can more easily explore system level
    issues through improved system modeling