Title: FastPlace: Efficient Analytical Placement using Cell Shifting, Iterative Local Refinement and a Hybrid Net Model
1 FastPlace: Efficient Analytical Placement using Cell Shifting, Iterative Local Refinement and a Hybrid Net Model
- Natarajan Viswanathan
- Chris Chong-Nuen Chu
- Iowa State University
- International Symposium on Physical Design
- April 19, 2004
2 FastPlace Key Features
Efficient Analytical Placement using
1. Cell Shifting
2. Iterative Local Refinement
3. Hybrid Net Model
- Standard cell placement
- Wirelength minimization
- Flat placement
3 Are Existing Algorithms Adequate?
- Solution Quality
- There may be significant room for improvement
- For existing wirelength-driven placement algorithms
- Cong et al. ASPDAC 03, ISPD 03
- For existing timing-driven placement algorithms
- Cong et al. ICCAD 03
- Efficiency
- Important to have fast placement algorithms
- Circuit sizes are huge in modern design
- Placement must be run in early design stages
4 Why Analytical?
- Inherently minimizes the wirelength
- Intrinsically efficient
- Elegant convex quadratic programming formulation
- Very efficient techniques to solve convex QP
- Typically employs a flat placement methodology
- All cells are placed simultaneously
- Maintains relative positions of cells throughout the placement process
5 Analytical Placement Formulation
- Analytical Placement Framework
- repeat
- Solve the convex quadratic program
- Spread the cells
- until the cells are evenly distributed
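The equations on this slide did not survive extraction. As a hedged reminder of the standard quadratic placement formulation (an assumption about what the slide showed, with the symbols W_ij, Q, d_x and d_y chosen here for illustration), the cost of a 2-pin net and the resulting convex QP can be written as:

```latex
\begin{align}
% Cost of a 2-pin net of weight W_{ij} between cells i and j (assumed form)
c_{ij} &= \tfrac{1}{2}\, W_{ij} \left[ (x_i - x_j)^2 + (y_i - y_j)^2 \right] \\
% Summing over all 2-pin nets gives a convex quadratic in x and y
\Phi(\mathbf{x}, \mathbf{y}) &= \tfrac{1}{2}\, \mathbf{x}^T Q\, \mathbf{x} + \mathbf{d}_x^T \mathbf{x}
  + \tfrac{1}{2}\, \mathbf{y}^T Q\, \mathbf{y} + \mathbf{d}_y^T \mathbf{y} + \text{const} \\
% The minimum is found by solving one linear system per direction
Q\,\mathbf{x} + \mathbf{d}_x &= 0, \qquad Q\,\mathbf{y} + \mathbf{d}_y = 0
\end{align}
```

Under this form the x and y problems are independent and share the same matrix Q, so each pass of the framework solves two linear systems.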
6 FastPlace Approach
- Framework
- repeat
- Solve the convex quadratic program
- Reduce wirelength by iterative heuristic
- Spread the cells
- until the cells are evenly distributed
- Special features of FastPlace
- Cell Shifting
- Easy-to-compute technique
- Enable fast convergence
- Hybrid Net Model
- Speed up solving of convex QP
- Iterative Local Refinement
- Minimize wirelength based on linear objective
7 Outline
FastPlace: Efficient Analytical Placement using
1. Cell Shifting
2. Iterative Local Refinement
3. Hybrid Net Model
8 Spreading by Cell Shifting
- Quadratic placement should produce good relative positions of cells
- Simple shifting of cells should be able to produce a good placement
- Major difficulties
- How to shift cells in a 2-D region?
- How to make sure wirelength will still be good?
- Our Approach
- Perform 1-D shifting in x and y directions independently
- Interleave a small amount of shifting with quadratic placement
9 Cell Shifting
- Shifting of bin boundary
[Figure: Uniform Bin Structure; Non-uniform Bin Structure]
- Shifting of cells linearly within each bin
- Apply to all rows and all columns independently
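The two steps above can be sketched for a single row. The slide does not give the boundary-update rule, so the utilization-weighted average below, the smoothing constant delta and both function names are assumptions made for illustration only.

```python
def shift_bin_boundaries(old_bounds, utilization, delta=1.5):
    """Move each interior boundary of one row of bins (1-D).

    old_bounds  : n + 1 boundary coordinates for n bins
    utilization : n per-bin utilization values
    delta       : assumed smoothing constant, keeps the denominator positive
    """
    new_bounds = list(old_bounds)
    for i in range(1, len(old_bounds) - 1):
        u_left = utilization[i - 1]   # bin to the left of boundary i
        u_right = utilization[i]      # bin to the right of boundary i
        # A crowded left bin pulls the boundary to the right, and vice versa.
        new_bounds[i] = (
            old_bounds[i - 1] * (u_right + delta)
            + old_bounds[i + 1] * (u_left + delta)
        ) / (u_left + u_right + 2 * delta)
    return new_bounds


def shift_cell(x, old_bounds, new_bounds):
    """Map a cell linearly from its old bin onto the resized bin."""
    for i in range(len(old_bounds) - 1):
        lo, hi = old_bounds[i], old_bounds[i + 1]
        if lo <= x <= hi:
            t = (x - lo) / (hi - lo)  # relative position inside the bin
            return new_bounds[i] + t * (new_bounds[i + 1] - new_bounds[i])
    raise ValueError("cell lies outside the row")


old_bounds = [0.0, 10.0, 20.0, 30.0]
utilization = [3.0, 1.0, 1.0]          # the first bin is overfull
new_bounds = shift_bin_boundaries(old_bounds, utilization)
new_x = shift_cell(5.0, old_bounds, new_bounds)
```

In this example the first bin widens, so a cell in the middle of it moves to the right. The same routine would be applied to every row for x and to every column for y.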
10 Cell Shifting Animation
[Animation not reproduced; surviving label: NBi]
11 Pseudo pin and Pseudo net
- Need to add forces to prevent cells from collapsing back
- Done by adding pseudo pins and pseudo nets
- Only diagonal and linear terms of the quadratic system need to be updated
- Takes a single pass of O(n) time to regenerate matrix Q (which is common for both x and y problems)
[Figure labels: Original Position, Target Position, Pseudo pin, Pseudo net, Additional Force]
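A minimal sketch of why only the diagonal and linear terms change, assuming the objective has the form 1/2 x^T Q x + d^T x and that a pseudo net of weight w ties cell i to a fixed pseudo pin at position p. The function names and the two-cell example are hypothetical.

```python
def add_pseudo_net(Q, d, i, weight, target):
    """Connect movable cell i to a fixed pseudo pin at `target`.

    The added cost is 1/2 * weight * (x_i - target)^2, which touches
    only Q[i][i] and d[i]; off-diagonal entries stay as they were.
    """
    Q[i][i] += weight
    d[i] -= weight * target


def solve_2x2(Q, d):
    """Solve Q x = -d for a 2x2 system by Cramer's rule."""
    (a, b), (c, e) = Q
    det = a * e - b * c
    r0, r1 = -d[0], -d[1]
    return [(r0 * e - b * r1) / det, (a * r1 - c * r0) / det]


# Two movable cells in 1-D: pad at 0 -- cell 0 -- cell 1 -- pad at 10,
# with every net of weight 1.
Q = [[2.0, -1.0], [-1.0, 2.0]]
d = [0.0, -10.0]
before = solve_2x2(Q, d)

# Pull cell 0 toward a target position of 6 with a unit-weight pseudo net.
add_pseudo_net(Q, d, 0, 1.0, 6.0)
after = solve_2x2(Q, d)
```

After the update, cell 0 settles between its original equilibrium and the target, which is the additional force the slide describes.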
13 Iterative Local Refinement
- Iteratively go through all the cells one by one
- For each cell, consider moving it in four directions by a certain distance
- Compute a score for each direction based on
- Half-perimeter wirelength (HPWL) reduction
- Cell density at the source and destination regions
- Move in the direction with highest positive score
- (Do not move if no positive score)
- Distance moved (H or V) is decreasing over iterations
- Detailed placement is handled by the same heuristic
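The loop above can be sketched as follows. The slide does not give the exact scoring function, so the score used here (HPWL gain plus a weighted density relief term), the bin-counting scheme and all names are assumptions for illustration.

```python
from collections import Counter


def hpwl(net, pos):
    """Half-perimeter wirelength of one net (a list of cell or pin names)."""
    xs = [pos[c][0] for c in net]
    ys = [pos[c][1] for c in net]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def bin_of(point, bin_size):
    return (int(point[0] // bin_size), int(point[1] // bin_size))


def refine_pass(pos, nets, movable, step, bin_size, density_weight=0.5):
    """One pass over all movable cells; returns the number of moves made."""
    counts = Counter(bin_of(pos[c], bin_size) for c in movable)
    moves = 0
    for cell in movable:
        x, y = pos[cell]
        src = bin_of((x, y), bin_size)
        my_nets = [n for n in nets if cell in n]
        before = sum(hpwl(n, pos) for n in my_nets)
        best_score, best_xy = 0.0, None
        for dx, dy in ((step, 0), (-step, 0), (0, step), (0, -step)):
            cand = (x + dx, y + dy)
            pos[cell] = cand                      # trial move
            gain = before - sum(hpwl(n, pos) for n in my_nets)
            dst = bin_of(cand, bin_size)
            # Assumed density term: reward leaving a fuller bin for an emptier one.
            relief = counts[src] - (counts[dst] + 1) if dst != src else 0
            score = gain + density_weight * relief
            if score > best_score:                # only positive scores qualify
                best_score, best_xy = score, cand
        pos[cell] = (x, y)                        # undo the last trial move
        if best_xy is not None:
            counts[src] -= 1
            counts[bin_of(best_xy, bin_size)] += 1
            pos[cell] = best_xy
            moves += 1
    return moves


# One movable cell "a" between two fixed pins; the step shrinks each pass.
pos = {"p0": (0, 0), "p1": (10, 0), "a": (5, 6)}
nets = [["p0", "a"], ["a", "p1"]]
for step in (4, 2, 1):
    refine_pass(pos, nets, ["a"], step, bin_size=100)
final_hpwl = sum(hpwl(n, pos) for n in nets)
```

With a single large bin the density term is zero, so the cell simply walks toward the line between the two pins and stops once no direction gives a positive score.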
15 Effect of Net Model on Runtime
- Need to replace each multi-pin net by 2-pin nets
- Then the placement problem (even with pseudo nets) can be formulated as a convex QP
- Solved by any convex QP algorithm
- Use Incomplete Cholesky Conjugate Gradient (ICCG)
- Runtime is proportional to the number of non-zero entries in Q
- Each non-zero entry in Q corresponds to one 2-pin net
- Traditionally, placers model each multi-pin net by a clique
- High-degree nets will generate a lot of 2-pin nets
- Slows down convex QP algorithms significantly
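To make the link between non-zero entries and runtime concrete, here is a small sketch of plain conjugate gradient on a sparse matrix stored as a dictionary. It omits the incomplete Cholesky preconditioner that ICCG uses, and the storage format and names are assumptions for illustration. The matrix-vector product, which dominates each iteration, visits every stored non-zero exactly once.

```python
def matvec(Q, x):
    """Multiply a sparse matrix {(i, j): value} by x; cost is one step per non-zero."""
    y = [0.0] * len(x)
    for (i, j), v in Q.items():
        y[i] += v * x[j]
    return y


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(Q, b, tol=1e-12, max_iter=100):
    """Solve Q x = b for a symmetric positive definite Q (no preconditioner)."""
    x = [0.0] * len(b)
    r = list(b)            # residual b - Q x for the initial guess x = 0
    p = list(r)
    rs = dot(r, r)
    for _ in range(max_iter):
        if rs < tol:
            break
        Qp = matvec(Q, p)
        alpha = rs / dot(p, Qp)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, Qp)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


# Chain in 1-D: pad at 0 -- c0 -- c1 -- c2 -- pad at 12, all weights 1.
Q = {(0, 0): 2.0, (0, 1): -1.0,
     (1, 0): -1.0, (1, 1): 2.0, (1, 2): -1.0,
     (2, 1): -1.0, (2, 2): 2.0}
b = [0.0, 0.0, 12.0]
x = conjugate_gradient(Q, b)
nonzeros = len(Q)
```

The three cells end up evenly spaced between the pads. Adding more 2-pin nets adds entries to the dictionary and therefore work to every iteration.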
16 Clique, Star and Hybrid Net Models
- Star model is introduced by Mo et al. ICCAD-00 for macro placement
- Introduce a star node even for 2-pin nets
- Not clear how the placement result will be affected
Hybrid Model
# pins   Net Model
2        Clique
3        Clique
4        Star
5        Star
6        Star
[Figure labels: Clique Model, Star Model, Star Node]
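A short sketch of how the hybrid rule in the table limits the number of 2-pin nets. The table stops at 6 pins; applying the star model to every larger net is an assumption here, and the function names and the star node naming are hypothetical.

```python
from itertools import combinations


def expand_clique(pins):
    """Clique model: one 2-pin net per pair of pins, k(k-1)/2 in total."""
    return list(combinations(pins, 2))


def expand_hybrid(net_name, pins):
    """Hybrid model: clique for 2- and 3-pin nets, star otherwise."""
    if len(pins) <= 3:
        return expand_clique(pins)
    star = "star:" + net_name          # extra movable node for this net
    return [(star, p) for p in pins]   # k 2-pin nets for a k-pin net


# Four nets with 2, 3, 4 and 10 pins.
nets = {
    "n2": ["a", "b"],
    "n3": ["a", "b", "c"],
    "n4": ["a", "b", "c", "d"],
    "n10": ["c%d" % i for i in range(10)],
}
clique_total = sum(len(expand_clique(p)) for p in nets.values())
hybrid_total = sum(len(expand_hybrid(n, p)) for n, p in nets.items())
```

For 2- and 3-pin nets the clique produces no more 2-pin nets than a star would, so the saving comes entirely from the higher-degree nets.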
17 Equivalence of Clique and Star Models
- Lemma: By setting the net weights appropriately, clique and star net models are equivalent.
- Proof: When the star node is at its equilibrium position, total forces on each cell are the same for clique and star net models.
- Clique Model: Weight = W
- Star Model: Weight = kW for a k-pin net
[Figure label: Star Node]
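The lemma can be checked numerically in one dimension. This is a sketch under the assumption that the force on a pin is the net weight times the displacement to the other endpoint, and that the star node sits at the mean of its pins when it is in equilibrium; the function names are hypothetical.

```python
def clique_forces(xs, w):
    """Force on each pin when every pair of pins is joined with weight w."""
    return [w * sum(xj - xi for xj in xs) for xi in xs]


def star_forces(xs, w):
    """Force on each pin when all pins connect to a star node with weight k*w."""
    k = len(xs)
    star = sum(xs) / k                 # equilibrium position of the star node
    return [k * w * (star - xi) for xi in xs]


pins = [0.0, 1.0, 4.0, 7.0]            # a 4-pin net in 1-D
fc = clique_forces(pins, 0.5)
fs = star_forces(pins, 0.5)
max_diff = max(abs(a - b) for a, b in zip(fc, fs))
```

Both expressions reduce to w times (sum of all pin positions minus k times the pin's own position), which is why the two lists agree up to rounding.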
18 Experimental Setup
- ISPD-02 mixed-mode benchmark suite by IBM
- Macro blocks replaced by standard cells with width set to 4 x average cell width
- 10% whitespace
- FastPlace implemented in C
- Compared with
- MetaPl-Capo 8.8 in default mode
- Dragon 2.2.3 in fixed die mode
- All placers run on a 750MHz Sun Sparc-2 machine
19 Placement Benchmark Statistics
Circuit Nodes Terminals Nets Pins Rows
ibm01 12506 246 14111 50566 96
ibm02 19342 259 19584 81199 109
ibm03 22853 283 27401 93573 121
ibm04 27220 287 31970 105859 136
ibm05 28146 1201 28446 126308 139
ibm06 32332 166 34826 128182 126
ibm07 45639 287 48117 175639 166
ibm08 51023 286 50513 204890 170
ibm09 53110 285 60902 222088 183
ibm10 68685 744 75196 297567 234
ibm11 70152 406 81454 280786 208
ibm12 70439 637 77240 317760 242
ibm13 83709 490 99666 357075 224
ibm14 147088 517 152772 546816 305
ibm15 161187 383 186608 715823 303
ibm16 182980 504 190048 778823 347
ibm17 184752 743 189581 860036 379
ibm18 210341 272 201920 819697 361
20 Clique Net Model vs Hybrid Net Model
Circuit   Non-zero Entries   Non-zero Entries   Non-zero Entries    Speed-Up
          (Clique Model)     (Hybrid Model)     (Clique / Hybrid)   (Hybrid over Clique)
ibm01     109183             41164              2.65                1.5
ibm02     343409             70014              4.90                2.4
ibm03     206069             74680              2.76                1.4
ibm04     220423             84556              2.61                1.2
ibm05     349676             108282             3.23                1.3
ibm06     321308             106835             3.01                1.6
ibm07     373328             147009             2.54                1.3
ibm08     732550             173541             4.22                2.0
ibm09     478777             185102             2.59                1.4
ibm10     707969             251101             2.82                1.6
ibm11     508442             230865             2.20                1.2
ibm12     748371             270849             2.76                1.6
ibm13     744500             295048             2.52                1.5
ibm14     1125147            456474             2.46                1.3
ibm15     1751474            607289             2.88                1.4
ibm16     1923995            668491             2.88                1.3
ibm17     2235716            753507             2.97                1.4
ibm18     2221860            711702             3.12                1.4
Average   -                  -                  2.95                1.5
21 Half Perimeter Wirelength
Average Wirelength Ratio
- FastPlace / Capo: 1.010
- FastPlace / Dragon: 1.016
22 Runtime Comparison
Circuit   Runtime       Runtime        Runtime      Speed-Up      Speed-Up
          Capo 8.8      Dragon 2.2.3   FastPlace    (Capo / FP)   (Dragon / FP)
ibm01     3 m 59 s      29 m 06 s      13 s         18.4x         134.3x
ibm02     7 m 15 s      31 m 13 s      33 s         13.2x         56.8x
ibm03     8 m 23 s      31 m 49 s      33 s         15.2x         57.8x
ibm04     10 m 46 s     1 h 5 m        39 s         16.6x         100.0x
ibm05     10 m 44 s     1 h 48 m       51 s         12.6x         127.1x
ibm06     12 m 08 s     1 h 21 m       45 s         16.2x         108.0x
ibm07     18 m 32 s     1 h 47 m       1 m 19 s     14.1x         81.3x
ibm08     19 m 53 s     4 h 30 m       1 m 33 s     12.8x         174.2x
ibm09     22 m 50 s     3 h 43 m       1 m 42 s     13.4x         131.2x
ibm10     29 m 04 s     3 h 19 m       2 m 25 s     12.0x         82.3x
ibm11     31 m 11 s     2 h 22 m       2 m 13 s     14.1x         64.1x
ibm12     30 m 41 s     3 h 48 m       2 m 23 s     12.9x         95.7x
ibm13     39 m 27 s     3 h 04 m       2 m 54 s     13.6x         63.4x
ibm14     1 h 12 m      7 h 37 m       5 m 34 s     12.9x         82.1x
ibm15     1 h 30 m      10 h 34 m      8 m 45 s     10.3x         72.4x
ibm16     1 h 31 m      12 h 06 m      10 m 52 s    8.4x          66.8x
ibm17     1 h 43 m      26 h 54 m      11 m 30 s    9.0x          140.3x
ibm18     1 h 44 m      23 h 39 m      12 m 21 s    8.4x          114.9x
Average   -             -              -            13.0x         97.4x
23 FastPlace - Breakdown of Runtime
All runtimes in seconds
24 Complexity Analysis
Runtime: O(n^1.412), where n = # of pins
Runtime: O(n^1.37), where n = # of pins
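An empirical exponent of this kind is typically obtained from a straight-line fit on a log-log plot. The sketch below applies an ordinary least-squares fit to the pin counts from the benchmark statistics and the FastPlace runtimes from the runtime comparison, converted to seconds. Whether the slide's figures were derived exactly this way is an assumption, and the function name is hypothetical.

```python
import math

# Pins per circuit (ibm01 .. ibm18), from the benchmark statistics table.
pins = [50566, 81199, 93573, 105859, 126308, 128182, 175639, 204890, 222088,
        297567, 280786, 317760, 357075, 546816, 715823, 778823, 860036, 819697]

# FastPlace runtime in seconds (ibm01 .. ibm18), from the runtime table.
runtime = [13, 33, 33, 39, 51, 45, 79, 93, 102,
           145, 133, 143, 174, 334, 525, 652, 690, 741]


def fit_exponent(n_values, t_values):
    """Least-squares slope of log(t) against log(n), i.e. p in t ~ c * n^p."""
    xs = [math.log(n) for n in n_values]
    ys = [math.log(t) for t in t_values]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


exponent = fit_exponent(pins, runtime)
print("fitted exponent: %.3f" % exponent)
```

Because the tabulated runtimes are rounded to whole seconds, the fitted value is only indicative, but it should fall in the same range as the exponents quoted on this slide.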
25 Summary
- FastPlace -- Efficient Flat Placement Algorithm
- 13.0x faster than Capo
- 97.4x faster than Dragon
- Comparable WL to Capo and Dragon
- Based on three techniques
- Cell Shifting
- Fast convergence
- Simple computation
- Iterative Local Refinement
- Reduce wirelength based on HPWL measure
- Hybrid Net Model
- 1.5x speedup compared to Clique
- Applicable to any analytical placement tools