Floorplanning Optimization with Trajectory Piecewise-Linear Model for Pipelined Interconnects

Learn more at: http://eda.ee.ucla.edu


1
Floorplanning Optimization with Trajectory
Piecewise-Linear Model for Pipelined Interconnects
  • C. Long, L. J. Simonson, W. Liao and L. He
  • EDA Lab, EE Dept. UCLA
  • DAC 2004

2
Outline
  • Motivation
  • Background
  • Trajectory piecewise-linear CPI model
  • CPI-aware floorplanning
  • Experiment results
  • Conclusion and discussions

3
Motivation
  • Traditional design flow
  • Architecture optimization: minimize CPI
  • Floorplanning optimization: maximize clock
    frequency
  • Architectural optimization is separated from the
    physical optimization under the assumption that
    layout does NOT change CPI.

4
Traditional Flow
  • A few years ago
  • Clock rates were much lower
  • More time for signal to reach its destination
  • Inductance was less of a factor in delay
  • Interconnect delay was smaller
  • Less resistance
  • Lower aspect ratio meant less capacitance
  • Inter-module communication takes less than one
    cycle
  • Interconnect length used to determine the clock
    period (just clock it faster until it doesn't
    work)
  • Floorplanning had no impact on the cycle-by-cycle
    operation (CPI) of the processor

5
A New Interconnect Centric Reality
  • Now
  • Clock rates have increased by an order of
    magnitude
  • My P2 from 1998 runs at 400 MHz; the Prescott P4
    is expected to reach 4.0 GHz by the fourth quarter
    of 2004 and has 31 pipeline stages for integer
    operations, some of which exist exclusively for
    interconnect pipelining
  • Interconnects have longer delay with higher
    aspect ratio
  • Die size is the same
  • A signal can take up to ten clock cycles to
    travel from one corner of a chip to the opposite
    corner in 90nm technology
  • Inter-module communication is therefore likely
    to take more than one cycle
  • Clock period is now a constraint, not an
    objective
  • Interconnect is pipelined when it cannot meet the
    constraint
  • A pipelined interconnect delays the cycle in
    which a signal arrives
  • Changes the cycle-by-cycle behavior (CPI) of the
    system
  • Determined by floorplanning

6
How to solve this problem?
  • Evaluate performance during floorplanning
    optimization
  • Efficiency of the evaluation is key
  • Cycle-accurate simulation is too slow for this
    purpose

[Figure: flow diagram with ISA/configuration, architecture optimization, floorplanning optimization, and performance evaluation]
7
Contributions of our work
  • We have pointed out that interconnect latency
    has a significant impact on architecture
    performance and that it is critical to consider
    it during floorplanning
  • We have developed an efficient table-based
    cycles-per-instruction (CPI) model
  • Called the trajectory piecewise-linear (TPWL)
    model, with less than 3.0% error
  • We have integrated the TPWL CPI model with
    floorplan optimization
  • Reducing CPI by up to 28.57% with a small area
    overhead of 5.72%

8
Background
  • Architecture and partitioning
  • A superscalar implementation of the MIPS
    instruction set
  • Similar to Alpha 21264
  • Twelve blocks

Block     Area (mm²)    Block     Area (mm²)
IALU1     1.00          IALU2     1.00
IALU3     1.00          IMULT     1.00
F_ADD     1.94          F_MULT    2.07
RUU       3.04          Decode    1.44
Branch    2.27          L2        75.6
IL1       8.99          DL1       10.03
9
Bus Latency Vectors
  • Interface between physical level and architecture
    level
  • Twelve buses
  • Bus latency vector (B)
  • E.g., B = (3, 4, 7, …)
  • Characterize a floorplan as a vector containing
    the latency of each interconnect

Bus id  Terminals        Bus id  Terminals
1       IALU1, RUU       7       IL1, L2
2       IALU2, RUU       8       DL1, L2
3       IALU3, RUU       9       Branch, IL1
4       IMULT, RUU       10      Decode, Branch
5       FPADD, RUU       11      LSQ, DL1
6       FPMUL, RUU       12      Decode, RUU
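The slides do not say how a floorplan is turned into a bus latency vector. A minimal sketch, assuming (not from the slides) that wire length is the Manhattan distance between block centers, that buffered wire delay grows linearly with length, and that every bus takes at least one cycle. The floorplan coordinates, `delay_per_mm_ps`, and `clock_period_ps` are hypothetical inputs.

```python
import math

def center(rect):
    """Center of a block given as (x, y, width, height) in mm."""
    x, y, w, h = rect
    return (x + w / 2.0, y + h / 2.0)

def bus_latency_vector(floorplan, buses, delay_per_mm_ps, clock_period_ps):
    """Return one latency (in clock cycles) per bus.

    Assumptions: Manhattan center-to-center wire length, delay linear in
    length, and a minimum latency of one cycle per bus.
    """
    latencies = []
    for a, b in buses:
        (xa, ya), (xb, yb) = center(floorplan[a]), center(floorplan[b])
        length_mm = abs(xa - xb) + abs(ya - yb)
        delay_ps = length_mm * delay_per_mm_ps
        latencies.append(max(1, math.ceil(delay_ps / clock_period_ps)))
    return latencies
```

For example, with a hypothetical placement `{"IL1": (0, 0, 3, 3), "L2": (3, 0, 9, 8.4), "Branch": (0, 3, 1.5, 1.5)}`, 100 ps/mm and a 250 ps clock, buses 7 (IL1, L2) and 9 (Branch, IL1) get latencies `[4, 2]`.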
10
Miss Events and Performance Loss
  • Types of miss events
  • Data Cache Miss
  • Instruction Cache Miss
  • TLB Miss
  • Branch Misprediction
  • Other sources of performance loss
  • Data dependencies
  • Resource Contention

11
Measuring Performance
  • No hardware to measure
  • Need a model of the hardware
  • Simulate the execution of the machine
  • Two types of simulation
  • Trace driven simulation
  • E.g., Shade to generate instruction and address
    traces, Dinero to model caches
  • Fast, 10s of instructions on host machine per
    instruction on target machine
  • Inaccurate
  • good for I-Cache performance loss measurement
  • bad for D-Cache performance loss measurement
  • poor for branch miss prediction performance loss
  • very bad for data dependency performance loss
  • Execution driven simulation
  • State of target hardware is maintained and
    updated in memory as each instruction is
    processed
  • Slow, 1000s of instructions on host machine per
    instruction on target machine
  • Cycle-accurate: true to the cycle-by-cycle
    behavior of the hardware
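To make the trace-driven idea concrete, here is a minimal sketch of a direct-mapped cache driven by an address trace, in the spirit of a Dinero-style cache model (it is not Dinero's actual interface). It only counts misses; nothing feeds timing back into the instruction stream, which is why trace-driven results are poor for data dependencies and branch effects.

```python
def simulate_direct_mapped(trace, num_lines, line_bytes):
    """Count misses of a direct-mapped cache over a byte-address trace."""
    tags = [None] * num_lines  # one stored tag per cache line
    misses = 0
    for addr in trace:
        block = addr // line_bytes   # memory block number
        index = block % num_lines    # cache line the block maps to
        tag = block // num_lines     # remaining high-order bits
        if tags[index] != tag:
            misses += 1
            tags[index] = tag        # fill (and evict any previous block)
    return misses
```

With 2 lines of 64 bytes, the trace `[0, 4, 64, 0, 128, 0]` gives 4 misses: address 128 evicts block 0, so the final access to 0 misses again.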

12
Cycle Accurate Simulation
  • Given B, compute CPI
  • Modify the architecture according to B
  • Change the configuration file
  • Insert buffers between modules
  • Measure CPI for a subset of the SPEC2000
    benchmark suite
  • Floating point benchmarks equake and mesa
  • Integer benchmarks gzip, vortex and mcf
  • Take the arithmetic mean of the CPIs of these
    benchmarks as the CPI for B
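The aggregation step amounts to the following sketch. The benchmark names and cycle/instruction counts in the usage line are placeholders, not measured results.

```python
def cpi(cycles, instructions):
    """Cycles per instruction for one simulation run."""
    return cycles / instructions

def mean_cpi(results):
    """Arithmetic mean of per-benchmark CPIs.

    results maps benchmark name -> (cycles, instructions).
    """
    cpis = [cpi(cycles, insts) for cycles, insts in results.values()]
    return sum(cpis) / len(cpis)
```

For instance, `mean_cpi({"bench_a": (150, 100), "bench_b": (300, 100)})` averages 1.5 and 3.0 to 2.25. Averaging per-benchmark CPIs weights every benchmark equally regardless of how many instructions each one runs.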

13
CPI Models
  • A CPI model estimates CPI as a function of
    parameters of interest, such as interconnect
    latency, architecture configuration, etc.
  • CPI models in the literature
  • Statistical simulation [Nussbaum01]
  • Based on a single detailed simulation
  • Generate a synthetic instruction trace
  • Take advantage of cache and branch prediction
    statistics
  • Statistical sampling of cycle accurate simulation
  • Sampling instead of truncation: measure in
    detail only an appropriate subset of the
    benchmark
  • Configuring a systematic sampling simulation run
    to achieve a desired confidence in estimates
  • More efficient than full cycle-accurate
    simulation but still slow; none of them
    considers interconnect latency
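A minimal sketch of the systematic-sampling idea: take every `period`-th measurement interval, report the mean CPI with a normal-approximation confidence half-width, and size the sample for a target relative error. In a real run only the sampled intervals would be simulated in detail; here all per-interval CPIs are passed in as a list purely for illustration, and the function names are hypothetical.

```python
import math
import statistics

def systematic_sample_cpi(interval_cpis, period, offset=0, z=1.96):
    """Estimate mean CPI from every `period`-th interval.

    Returns (mean, half_width); mean +/- half_width is an approximate
    confidence interval (z = 1.96 for roughly 95%).
    """
    sample = interval_cpis[offset::period]
    if len(sample) < 2:
        raise ValueError("need at least two sampled intervals")
    mean = statistics.mean(sample)
    half_width = z * statistics.stdev(sample) / math.sqrt(len(sample))
    return mean, half_width

def required_samples(cv, rel_error, z=1.96):
    """Samples needed so the half-width is about rel_error * mean.

    From z * sigma / sqrt(n) <= rel_error * mu, with cv = sigma / mu.
    """
    return math.ceil((z * cv / rel_error) ** 2)
```

For example, a coefficient of variation of 1.0 and a ±3% target at about 95% confidence call for `required_samples(1.0, 0.03) == 4269` sampled intervals.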

14
Traditional floorplanning
  • Optimize floorplan via simulated annealing (SA)
    algorithm
  • Objective function
  • Moves
  • Change the position or shape of blocks
  • Cooling scheme
  • Initial temperature
  • Constant cooling rate
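The SA loop above can be sketched generically. The floorplan representation and move set are not specified on the slides, so `perturb` is a caller-supplied function (for a floorplan it would change the position or shape of a block); all parameter defaults are illustrative assumptions. The list of accepted states returned here is the "trajectory" discussed later in the deck.

```python
import math
import random

def simulated_annealing(initial, cost, perturb, t0=100.0, alpha=0.95,
                        moves_per_temp=100, t_min=1e-3, seed=0):
    """Generic SA with a geometric (constant-rate) cooling schedule.

    perturb(state, rng) must return a neighboring state.
    Returns (best_state, best_cost, trajectory of accepted states).
    """
    rng = random.Random(seed)
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    trajectory = [current]
    t = t0
    while t > t_min:
        for _ in range(moves_per_temp):
            candidate = perturb(current, rng)
            candidate_cost = cost(candidate)
            delta = candidate_cost - current_cost
            # Metropolis rule: accept improvements, sometimes accept uphill moves
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                current, current_cost = candidate, candidate_cost
                trajectory.append(current)
                if current_cost < best_cost:
                    best, best_cost = current, current_cost
        t *= alpha  # constant cooling rate
    return best, best_cost, trajectory
```

As a toy check, minimizing `(x - 3) ** 2` over integers with moves of ±1 from `x = 10` ends with `best == 3`.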

15
Floorplanning considering CPI
  • Based on simulated annealing
  • Objective function
  • Extended from the traditional floorplanning
    framework
  • The key is to estimate CPI efficiently
  • Moves and cooling schedule remain the same
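The slides do not show the objective function. One common form, assumed here rather than taken from the paper, is a weighted sum, where F is a floorplan, A(F) its area, W(F) its total wire length, B(F) the bus latency vector it induces, and α, β, γ user-chosen weights:

```latex
\Phi(F) = \alpha \, A(F) + \beta \, W(F) + \gamma \, \mathrm{CPI}\bigl(B(F)\bigr)
```

Setting γ = 0 recovers the traditional area and wire-length objective.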

16
Trajectory of SA
  • The path that SA follows during optimization is a
    trajectory in the solution space
  • We only need to accurately estimate CPI in the
    area where the trajectory travels
  • The trajectory of SA with objective of area, wire
    length and CPI is close to that of area and wire
    length only

[Figure: SA trajectories in the Bus1/Bus2 latency plane for the objectives "area and wire length" and "area, wire length and CPI"]
17
Trajectory Piecewise-linear CPI Model
  • Build a piecewise-linear model for a small
    solution region around the trajectories of SA
  • Three phases: sampling, collecting and
    simulating
  • An example for a two-dimensional bus latency
    vector

[Figure: latency (bus1) vs. latency (bus2), with simulation points]
18
TPWL Sampling
[Figure: latency (bus1) vs. latency (bus2)]
  • Sample a complete simulated annealing run with
    the objective of area and total wire length to
    obtain a set of bus latency vectors (points in
    n-dimensional space)

19
TPWL Collecting
[Figure: latency (bus1) vs. latency (bus2)]
  • Cover all the points obtained in the sampling
    phase with as few balls as possible (TPC problem)
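The slides name the TPC problem but give neither its exact formulation nor the algorithm used. As an illustration only, this sketch assumes balls of a fixed radius centered at sampled points and uses a greedy set-cover heuristic, which is not guaranteed to use the minimum number of balls.

```python
import math

def greedy_ball_cover(points, radius):
    """Cover all points with fixed-radius balls centered at input points (greedy)."""
    uncovered = list(points)
    centers = []
    while uncovered:
        # Pick the point whose ball covers the most still-uncovered points.
        best = max(uncovered,
                   key=lambda c: sum(math.dist(c, p) <= radius for p in uncovered))
        centers.append(best)
        uncovered = [p for p in uncovered if math.dist(best, p) > radius]
    return centers
```

On the points `(0, 0), (1, 0), (0, 1), (10, 10), (11, 10)` with radius 1.5, it returns two centers, `(0, 0)` and `(10, 10)`; these would become the rows of the CPI table.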

20
TPWL Simulating
[Figure: latency (bus1) vs. latency (bus2), with simulation at ball centers]
  • Obtain CPI by cycle-accurate simulation for the
    center of each ball
  • Build a CPI table indexed by these center points
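A sketch of building the table. The first-order estimate on the next slide needs a slope per bus for each entry, and the slides do not say how those slopes are obtained. Here they are assumed to come from forward differences (one extra cycle of latency per bus), which costs n + 1 simulations per center. `simulate` stands in for a cycle-accurate run and is a hypothetical callable.

```python
def build_cpi_table(centers, simulate):
    """Return [(center, cpi, gradient)] using forward differences.

    simulate(b) -> CPI for bus latency vector b (hypothetical callable).
    """
    table = []
    for c in centers:
        base = simulate(tuple(c))
        grad = []
        for i in range(len(c)):
            bumped = list(c)
            bumped[i] += 1  # one extra cycle of latency on bus i
            grad.append(simulate(tuple(bumped)) - base)
        table.append((tuple(c), base, grad))
    return table
```

With a toy linear stand-in such as `lambda b: 1.0 + 0.1 * b[0] + 0.2 * b[1]`, the entry for center `(2, 3)` stores CPI 1.8 and slopes of about 0.1 and 0.2.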

21
CPI estimation under TPWL model
  • From each table entry, the CPI of a target B
    can be estimated by first-order expansion
  • For each entry, a weight is calculated from the
    distance between the target B and that entry in
    the CPI table
  • The final estimate is the weighted sum of the
    estimates based on each entry
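A sketch of the estimation step, assuming each entry stores (center, CPI, per-bus slopes). The slides only say that the weights depend on distance. The normalized exponential weighting used here, w_i ∝ exp(-β·d_i / min_j d_j) with β = 25, is a choice of the kind used in TPWL model-order-reduction work and is an assumption, not necessarily the paper's formula.

```python
import math

def tpwl_estimate(b, table, beta=25.0):
    """Weighted sum of first-order CPI expansions around each table entry.

    table: list of (center, cpi, gradient); weighting scheme is assumed.
    """
    dists = [math.dist(b, c) for c, _, _ in table]
    m = min(dists)
    if m == 0.0:
        # Exact hit on a simulated center: return its simulated CPI.
        return table[dists.index(0.0)][1]
    raw = [math.exp(-beta * d / m) for d in dists]  # nearest entry dominates
    total = sum(raw)
    estimate = 0.0
    for w, (c, cpi_c, grad) in zip(raw, table):
        linear = cpi_c + sum(g * (bi - ci) for g, bi, ci in zip(grad, b, c))
        estimate += (w / total) * linear
    return estimate
```

A useful sanity check: if CPI really were linear in B and every entry had exact slopes, all per-entry expansions would agree and the weights would not matter. With entries at `(0, 0)` and `(4, 4)` for `1.0 + 0.1*b1 + 0.2*b2`, the estimate at `(1, 2)` is 1.5.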

22
CPI-aware Floorplanning: Overview
  • Integrate the TPWL CPI model with a traditional
    floorplanning tool

23
Iterative TPWL model
  • When the trajectory with the objective of area
    and total wire length differs significantly from
    the trajectory with the objective of area, total
    wire length and CPI, an iterative TPWL model is
    needed

[Figure: Bus1/Bus2 latency plane showing the "area and wire length" trajectory, iteration 1, iteration 2, and the "area, wire length and CPI" trajectory]
24
Iterative TPWL Model
  • Iteratively expand the CPI table to build an
    iterative TPWL (iTPWL) model
  • Based on the TPWL model, but from the second
    iteration on, the objective of SA is area, total
    wire length and CPI
  • Improve the accuracy of CPI estimation and the
    quality of the final floorplan

25
Summary on TPWL CPI Model
  • Originally proposed for modeling non-linear
    systems [Rewienski03]
  • Outperforms other techniques based on quadratic
    reduction
  • TPWL model is suitable for floorplanning
    optimization
  • The trajectory of SA with objective of area,
    total wire length and CPI is close to that with
    objective of area and total wire length only
  • When these two trajectories are not close, iTPWL
    model is employed to improve the accuracy
  • Contribution of this paper on TPWL model
  • Introduce the TPC problem
  • Expand TPWL model to iTPWL model

26
Experiment results
  • Verification of CPI models
  • Error of TPWL model: 2.62%; error of iTPWL
    model: 1.66%

27
Impact of models on final floorplans
  • Comparison of the floorplans obtained with the
    access ratio, sensitivity rate, TPWL and iTPWL
    models, with the objective of area, total wire
    length and CPI
  • Access ratio: use the access ratio of
    interconnects to represent their impact on
    system performance
  • Sensitivity rate: estimate CPI by first-order
    expansion around the original point
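Written out, a first-order expansion around the original point B₀ = (b₀,₁, …, b₀,ₙ) takes the form below, where B = (b₁, …, bₙ) is the bus latency vector (n = 12 buses here) and sᵢ is the sensitivity of CPI to the latency of bus i. The slides give no formula, so this is a reconstruction of the stated technique, not the paper's notation.

```latex
\mathrm{CPI}(B) \approx \mathrm{CPI}(B_0) + \sum_{i=1}^{n} s_i \,\bigl(b_i - b_{0,i}\bigr),
\qquad
s_i = \left.\frac{\partial\,\mathrm{CPI}}{\partial b_i}\right|_{B = B_0}
```

The TPWL model applies the same kind of expansion around every table center rather than around the single original point.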

28
Floorplanning with iTPWL Model
  • Comparison between floorplans obtained by
    different objectives

29
Running time
  • SimpleScalar simulation times to build the TPWL
    and iTPWL models

30
Conclusion and discussion
  • Propose an accurate CPI model with less than
    3.0% error
  • The CPI-aware floorplanner reduces CPI by up to
    28.57% with a small area overhead of 5.72%
  • Expand the TPWL model to iTPWL and improve the
    accuracy of estimation
  • The accuracy of the iTPWL model leads to
    high-quality floorplanning solutions and enables
    us to develop good heuristics, such as access
    ratio, to minimize CPI without explicit CPI
    calculation
  • Plan to apply this model to architecture changes