Steven F. Ashby Center for Applied Scientific Computing Month DD, 1997 - PowerPoint PPT Presentation

About This Presentation

Title: Steven F. Ashby Center for Applied Scientific Computing Month DD, 1997 Author: Computations Last modified by: Department of Computer Science – PowerPoint PPT presentation

Date added: 3 October 2019
Slides: 67
Provided by: Computations
Learn more at: http://vis.computer.org
Transcript and Presenter's Notes

1
Mesh Layouts for Block-Based Caches
Sung-Eui Yoon and Peter Lindstrom
Lawrence Livermore National Laboratory
2
Goal
  • Provide cache-coherent layouts of meshes and
    graphs
  • Derive metrics that measure cache-coherence of
    layouts
  • Generality
  • Simplicity
  • Efficiency
  • Accuracy

3
Cache-Coherent Metrics
  • Measure the expected number of cache misses of a
    layout given block-based caches
  • Should correlate well with the observed number of
    cache misses
  • Cache-aware metrics
  • Measure cache-coherence given known cache
    parameters (e.g., block size)
  • Cache-oblivious metrics
  • Consider all possible cache parameters

4
Motivation
  • Lower growth rate of data access speed

[Figure: accumulated growth rate during 1993-2004 (log scale); bar labels
130X, 46X, 20X, and 1.5X (during 99-04)]
Courtesy Anselmo Lastra, http://www.hcibook.com/e3/online/moores-law/
5
Memory Hierarchies and Block-Based Caches
[Figure: memory hierarchy of CPU, fast memory or cache, slow memory, and disk,
connected by block transfers; access times shown are 10^-8 sec, 10^-7 sec, and
10^-2 sec]
6
Main Contributions
  • Propose novel and practical cache-aware and
    cache-oblivious metrics
  • Derive metrics given block-based caches
  • Propose efficient cache-coherent layout
    constructions
  • Apply to different applications

7
Related Work
  • Computation reordering
  • Data layout optimization

8
Computational Reordering
  • Cache-aware [Coleman and McKinley 95, Vitter 01,
    Sen et al. 02]
  • Cache-oblivious [Frigo et al. 99, Arge et al. 04]

Focus on specific problems such as sorting and
linear algebra computations
9
Data Layout Optimization
  • Graph and matrix layout [Diaz et al. 02]
  • Minimum linear arrangement (MLA), bandwidth,
    wavefront, etc.
  • Space-filling curves
  • [Sagan 94, Pascucci and Frank 01, Lindstrom and
    Pascucci 01, Gopi and Eppstein 04]
  • Rendering and processing sequences
  • [Deering 95, Hoppe 99, Bogomjakov and Gotsman 02,
    Isenburg and Lindstrom 05]
  • Cache-oblivious mesh layout
  • [Yoon et al. 05]

10
Outline
  • Computation models
  • Cache-aware and cache-oblivious metrics
  • Results

11
Outline
  • Computation models
  • Cache-aware and cache-oblivious metrics
  • Results

12
General Framework of Layout Computation
[Figure: a layout algorithm f, guided by a cache-coherent metric, maps the
input directed graph G = (N, A) with nodes na, nb, nc, nd to a 1D layout
f(N) = na, nd, nc, nb, ...]
13
Two-Level I/O Model [Aggarwal and Vitter 88]
[Figure: a layout algorithm maps the input directed graph (nodes na, nb, nc,
nd) to a 1D layout na, nd, nc, nb, ... with block size 3; the cache holds M
cache blocks, each of size B]
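
The two-level model above can be exercised with a small simulator. The following Python sketch counts block misses for a sequence of node accesses, given a layout, a block size B, and M cache blocks. The least-recently-used replacement policy, the function name `count_cache_misses`, and the example access sequence are assumptions made for illustration; the slides do not specify a replacement policy.

```python
from collections import OrderedDict


def count_cache_misses(access_sequence, position, block_size, num_blocks):
    """Count block misses for a node access sequence.

    position maps each node to its index in the 1D layout.
    The cache holds num_blocks blocks of block_size nodes each and
    evicts the least recently used block (an assumed policy).
    """
    cache = OrderedDict()  # block index -> True, ordered by recency
    misses = 0
    for node in access_sequence:
        block = position[node] // block_size
        if block in cache:
            cache.move_to_end(block)  # mark as most recently used
        else:
            misses += 1
            cache[block] = True
            if len(cache) > num_blocks:
                cache.popitem(last=False)  # evict least recently used
    return misses


# Layout na, nd, nc, nb with block size 3, as in the slide.
layout = {"na": 0, "nd": 1, "nc": 2, "nb": 3}
walk = ["na", "nd", "nc", "nb", "na"]  # hypothetical access sequence
misses_one_block = count_cache_misses(walk, layout, block_size=3, num_blocks=1)
misses_two_blocks = count_cache_misses(walk, layout, block_size=3, num_blocks=2)
```

With a single cache block, every access that changes block is a miss, which is the case the later single-block slides analyze.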
14
Graph Representation
  • Directed graph, G = (N, A)
  • Represents access patterns between nodes
  • Nodes, N
  • Data element
  • (e.g., mesh vertex or mesh triangle)
  • Directed arcs, A
  • An arc connects two nodes if they are accessed
    sequentially

[Figure: example directed graph with nodes na, nb, nc, nd]
15
Weights of Nodes and Arcs
  • Indicate probabilities that each element will be
    accessed
  • Computed in the equilibrium state of an infinite
    random walk
  • Assume that applications access the data
    indefinitely according to the input graph
  • Node weights correspond to the eigenvector with
    eigenvalue 1 of the probability transition matrix
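
As a rough illustration of the equilibrium weights described above, the following Python sketch approximates node visit probabilities by power iteration and derives arc weights from them. The function names, the dictionary representation of arcs, and the 3-node example graph are assumptions for illustration, and the iteration only converges for walks that are irreducible and aperiodic.

```python
def stationary_distribution(arcs, num_nodes, iterations=1000):
    """Approximate node visit probabilities of a random walk.

    arcs maps (i, j) to the transition probability Pr(j | i); the
    outgoing probabilities of each node are assumed to sum to 1.
    """
    pi = [1.0 / num_nodes] * num_nodes
    for _ in range(iterations):
        nxt = [0.0] * num_nodes
        for (i, j), p in arcs.items():
            nxt[j] += pi[i] * p  # probability mass flowing along arc (i, j)
        pi = nxt
    return pi


def arc_weights(arcs, pi):
    """Probability that a step of the walk traverses each arc."""
    return {(i, j): pi[i] * p for (i, j), p in arcs.items()}


# Hypothetical 3-node graph: 0 -> 1, 1 -> 0 or 2, 2 -> 0.
example_arcs = {(0, 1): 1.0, (1, 0): 0.5, (1, 2): 0.5, (2, 0): 1.0}
node_weights = stationary_distribution(example_arcs, 3)
weights = arc_weights(example_arcs, node_weights)
```

The node weights sum to 1, and so do the arc weights, since each step of the walk traverses exactly one arc.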

16
Cache-Coherence of a Layout given Block-Based
Caches
  • Expected number of cache misses of a layout: the sum,
    over all arcs, of the product of two terms
  • Probability of accessing a node from another node by
    traversing an arc
  • Conditional probability that we will have a cache
    miss given the above access pattern

[Figure: example directed graph with nodes na, nb, nc, nd]
17
Specialization to Meshes
  • Expected number of cache misses of a layout
  • Probability of accessing a node from another node by
    traversing an arc: constant for meshes
  • Conditional probability that we will have a cache
    miss given the above access pattern
  • Assumptions for the implicitly derived graph
  • 1. Each mesh edge gives two opposite directed arcs
  • 2. Uniform distribution to access adjacent nodes
    given a node

[Figure: an input mesh and its implicitly derived graph, with nodes na, nb, nc, nd]
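
The two mesh assumptions above can be sketched in Python as follows. The triangle-list input and the function names are assumptions for illustration. For a connected mesh, a random walk that picks neighbors uniformly visits node i with probability deg(i) / |A|, so every arc is traversed with the same probability 1 / |A|, which is why the arc probability becomes a constant.

```python
from collections import defaultdict


def mesh_to_arcs(triangles):
    """Derive a directed graph from a triangle mesh.

    Each mesh edge yields two opposite arcs, and each arc (u, v) gets
    the transition probability 1 / deg(u) (uniform neighbor choice).
    """
    neighbors = defaultdict(set)
    for a, b, c in triangles:
        for u, v in ((a, b), (b, c), (c, a)):
            neighbors[u].add(v)
            neighbors[v].add(u)
    return {
        (u, v): 1.0 / len(adjacent)
        for u, adjacent in neighbors.items()
        for v in adjacent
    }


def stationary_arc_probability(arcs):
    """Equilibrium probability of traversing any one arc: 1 / |A|."""
    return 1.0 / len(arcs)


# Two triangles sharing the edge (1, 2): 5 edges, hence 10 arcs.
example_mesh = [(0, 1, 2), (1, 3, 2)]
example_mesh_arcs = mesh_to_arcs(example_mesh)
```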
18
Outline
  • Computation models
  • Cache-aware and cache-oblivious metrics
  • Results

19
Four Different Cases
Cache-aware case: single cache block, M = 1
Cache-oblivious case: single cache block, M = 1
Cache-aware case: multiple cache blocks, M > 1
Cache-oblivious case: multiple cache blocks, M > 1
20
Cache-Aware: Single Cache Block, M = 1
[Figure: the input directed graph (nodes na, nb, nc, nd) and its 1D layout na,
nd, nc, nb, ... with block size 3; straddling arcs are marked; the cache has a
single block of size B]
21
Cache-Aware: Multiple Cache Blocks, M > 1
[Figure: the input directed graph (nodes na, nb, nc, nd) and its 1D layout na,
nd, nc, nb, ... with a block boundary; straddling arcs are marked; the cache
has multiple blocks]
22
Final Cache-Aware Metric
  • Counts the number of straddling arcs of the
    layout given a block size B

The formula is expressed in terms of the index of the block containing
node i and a unit step function, which is 1 if x > 0 and 0 otherwise.
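
A minimal Python sketch of counting straddling arcs follows. It assumes the block index of a node is its layout position divided by B, rounded down, and treats an arc as straddling when its two endpoints have different block indices. The function name and the example arcs are hypothetical, and any normalization used in the original formula is not reproduced here.

```python
def count_straddling_arcs(arcs, position, block_size):
    """Count arcs whose endpoints lie in different blocks.

    position maps each node to its index in the 1D layout; the block
    index of a node is assumed to be position // block_size.
    """
    count = 0
    for i, j in arcs:
        if position[i] // block_size != position[j] // block_size:
            count += 1
    return count


# Layout na, nd, nc, nb with hypothetical arcs.
layout = {"na": 0, "nd": 1, "nc": 2, "nb": 3}
arcs = [("na", "nb"), ("na", "nd"), ("nd", "nc"), ("nc", "nb")]
straddling_b3 = count_straddling_arcs(arcs, layout, block_size=3)
```

With block size 3, only the arcs that end at nb cross the boundary between the first and second block.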
23
High Accuracy of Cache-Aware Metric
Tested block size: 4KB

Linear correlation [-1, 1] with the observed number of cache misses
                      With 5 cache blocks   With 25 cache blocks
Cache-aware metric    0.97                  0.97

[Figure: Z-curve on a uniform grid]
Tested layouts: Z-curve, Hilbert curve, H-order,
minimum linear arrangement layout, βΩ-layout,
geometric CO layout, (bi or uni) row-by-row, and (bi
or uni) diagonal layouts
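
Two of the tested layouts are simple to construct on a uniform grid. The Python sketch below builds a Z-curve layout by interleaving the bits of the cell coordinates (Morton order) and a unidirectional row-by-row layout. The function names and the choice of which coordinate takes the low bit are assumptions for illustration, not details taken from the slides.

```python
def morton_index(x, y, bits=16):
    """Interleave the bits of x and y; x occupies the even bit positions."""
    index = 0
    for b in range(bits):
        index |= ((x >> b) & 1) << (2 * b)
        index |= ((y >> b) & 1) << (2 * b + 1)
    return index


def z_curve_layout(width, height):
    """Map each grid cell (x, y) to its rank along the Z-curve."""
    cells = [(x, y) for y in range(height) for x in range(width)]
    cells.sort(key=lambda cell: morton_index(cell[0], cell[1]))
    return {cell: rank for rank, cell in enumerate(cells)}


def row_by_row_layout(width, height):
    """Map each grid cell (x, y) to its unidirectional row-major index."""
    return {(x, y): y * width + x for y in range(height) for x in range(width)}


z_layout = z_curve_layout(4, 4)
row_layout = row_by_row_layout(4, 4)
```

Either dictionary can be used as the node-to-position map when evaluating a layout metric on the grid.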
24
Four Different Cases
Cache-aware case: single cache block, M = 1
Cache-oblivious case: single cache block, M = 1
Cache-aware case: multiple cache blocks, M > 1
Cache-oblivious case: multiple cache blocks, M > 1
25
Cache-Oblivious: Single Cache Block, M = 1
Does not assume a particular block size. Then,
what are good representatives for block sizes?
26
Two Possible Block Size Progressions
  • Arithmetic progression
  • 1, 2, 3, 4, ...
  • Geometric progression
  • 2^0, 2^1, 2^2, 2^3, ...
  • Well reflects current caching architectures
  • E.g., L1: 32B, L2: 64B, page: 4KB, etc.

27
Probability that an Arc is a Straddling Arc
Computed as a function of arc length, l
Is an arc straddling given a block size?
[Figure: an arc of length l = 2 in the 1D layout na, nd, nc, nb, ...]
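
One way to make this probability concrete is sketched below in Python. It assumes that the position of the layout relative to block boundaries is uniformly random, enumerates every possible offset, and counts how often an arc of length l crosses a boundary. Under that assumption the result equals min(l / B, 1). The function name and the uniform-offset model are assumptions for illustration.

```python
def straddle_probability(arc_length, block_size):
    """Fraction of block offsets at which an arc crosses a block boundary.

    The arc starts at offset p within a block and ends at p + arc_length;
    all block_size starting offsets are assumed equally likely.
    """
    straddling = sum(
        1
        for offset in range(block_size)
        if offset // block_size != (offset + arc_length) // block_size
    )
    return straddling / block_size


# An arc of length 2, as in the slide, for a few block sizes.
probabilities = {b: straddle_probability(2, b) for b in (1, 2, 4, 8)}
```

Longer arcs straddle with higher probability, and an arc at least as long as the block always straddles.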
28
Two Cache-Oblivious Metrics
  • Arithmetic cache-oblivious metric: the arithmetic
    mean of arc lengths (the MLA metric)
  • Geometric cache-oblivious metric: the geometric
    mean of arc lengths

Both are computed from the arc length of each arc (i, j).
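
The two means can be computed directly from a layout, as in the Python sketch below. It assumes the arc length of arc (i, j) is the absolute difference of the two layout positions and that all arcs carry equal weight; the function names and the example arcs are hypothetical.

```python
import math


def arc_lengths(arcs, position):
    """Length of each arc: distance between its endpoints in the layout."""
    return [abs(position[i] - position[j]) for i, j in arcs]


def arithmetic_co_metric(arcs, position):
    """Arithmetic mean of arc lengths."""
    lengths = arc_lengths(arcs, position)
    return sum(lengths) / len(lengths)


def geometric_co_metric(arcs, position):
    """Geometric mean of arc lengths, computed via the mean of logarithms."""
    lengths = arc_lengths(arcs, position)
    return math.exp(sum(math.log(length) for length in lengths) / len(lengths))


# Layout na, nd, nc, nb with hypothetical arcs of lengths 3, 1, 1, 1.
layout = {"na": 0, "nd": 1, "nc": 2, "nb": 3}
arcs = [("na", "nb"), ("na", "nd"), ("nd", "nc"), ("nc", "nb")]
arithmetic = arithmetic_co_metric(arcs, layout)
geometric = geometric_co_metric(arcs, layout)
```

Because the geometric mean works on logarithms, one long arc raises it far less than it raises the arithmetic mean.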
29
Validation for Cache-Oblivious (CO) Metrics
73% of tested power-of-two block sizes
97% of tested block sizes
  • Geometric cache-oblivious metric
  • Practical and useful

[Figure: number of cache misses when M = 1 (log scale) for the geometric CO
layout and the arithmetic CO layout]
30
Correlations between Metrics and Observed Number
of Cache Misses
Tested block size: 4KB

Linear correlation [-1, 1] with the observed number of cache misses
                        With 1 cache block   With 5 cache blocks
Geometric CO metric     0.98                 0.81
Arithmetic CO metric    -0.19                -0.32

Tested with 10 different layouts on a uniform grid
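
The linear correlations reported here can be reproduced in spirit with a standard Pearson correlation between metric values and observed miss counts, one pair per layout. The Python sketch below is a plain implementation; the function name and the sample numbers are hypothetical and are not the data behind the slide.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson linear correlation coefficient, in the range [-1, 1]."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance_x = sum((x - mean_x) ** 2 for x in xs)
    variance_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(variance_x * variance_y)


# Hypothetical metric values and miss counts for four layouts.
metric_values = [1.2, 1.9, 2.4, 3.1]
observed_misses = [110, 180, 260, 300]
correlation = pearson_correlation(metric_values, observed_misses)
```

A value near 1 means the metric ranks layouts in nearly the same way as the measured misses do.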
31
Efficient Layout Computation for Our Metrics
  • Cache-aware layouts
  • Optimized with cache-aware metric given a block
    size B
  • Computed from the graph partitioning
  • Geometric cache-oblivious metric
  • Very efficient
  • Can be used in different layout methods

32
Layout Computation with Geometric Cache-Oblivious
Metric
  • Multi-level construction method
  • Partition an input mesh into k different sets
  • Lay out partitions based on our metric
  • Generalized layout method
  • for unstructured meshes

1. Partition
2. Lay out
33
Outline
  • Computation models
  • Cache-aware and cache-oblivious metrics
  • Results

34
Applications
  • Isosurface extraction
  • View-dependent rendering

35
Iso-Surface Extraction
Spx model (140K vertices)
  • Uses contour tree [van Kreveld et al. 97]
  • Runtime is dominated by the traversal of the
    iso-surface
  • Layout graph
  • Use an input tetrahedral mesh

36
High Correlation with Number of Cache Misses
Tested block size: 4KB

Linear correlation [-1, 1] with the observed number of cache misses
                       With 1 cache block   With 10K cache blocks
Geometric CO metric    0.99                 0.98

Tested with 8 different layouts: our geometric
CO, our cache-aware, breadth-first (and
depth-first) layouts, spectral [Juvan and Mohar
92], cache-oblivious mesh [Yoon et al. 05],
Z-curve [Sagan 94], and X-axis sorted layouts
37
High Correlation with Runtime Performance
Memory access time is the major bottleneck
Disk I/O time is the major bottleneck

Linear correlation [-1, 1] with iso-surface extraction time
                       First extraction   Second extraction
Geometric CO metric    0.94               0.94
38
Comparison with Other Layouts
[Figure: first iso-surface extraction time (sec) for each layout]
8-77% improvement, and very close to the
cache-aware performance
39
View-Dependent Rendering
  • Lay out vertices and triangles of progressive
    meshes
  • Used in an efficient VDR system [Yoon et al. 04]
  • Reduce misses in GPU vertex cache

40
Cache Miss Ratio on Bunny Model
[Figure: GPU vertex cache miss ratio against vertex cache size for the
universal rendering sequence [Bogomjakov and Gotsman 02], Hoppe's rendering
sequence [Hoppe 99], the theoretical lower bound [Bar-Yehuda and Gotsman 96],
and the geometric CO layout]
41
Cache Miss Ratio on Power Plant Model
[Figure: GPU vertex cache miss ratio against vertex cache size for the
Z-curve, COML [Yoon et al. 05], Hoppe's rendering sequence [Hoppe 99], the
theoretical lower bound [Bar-Yehuda and Gotsman 96], and the geometric CO
layout]
42
Conclusion
  • Novel cache-aware and cache-oblivious metrics to
    evaluate layouts
  • Derived metrics based on two-level I/O model
  • Improved the performance of applications without
    modifying code

OpenCCL, open source library: http://gamma.cs.unc.edu/COL/OpenCCL
43
Ongoing and Future Work
  • Derive a lower bound on our geometric
    cache-oblivious metric
  • Employ mesh compression to further reduce disk
    I/O accesses
  • Investigate efficient layout method for deforming
    models
  • Apply to non-graphics applications
  • e.g., shortest path or other graph computations

44
Cache-Efficient Layouts of Bounding Volume
Hierarchies
  • Yoon and Manocha, Eurographics 06

Ray tracing
Collision detection
45
Acknowledgements
  • Ajith Mascarenhas
  • Martin Isenburg
  • Dinesh Manocha
  • Fabio Bernardon, Joao Comba, and Claudio Silva
  • For their unstructured tetrahedra rendering
    program
  • Members of data analysis group in LLNL
  • Anonymous reviewers

46
UCRL-PRES-225448
This work was performed under the auspices of the
U.S. Department of Energy by University of
California Lawrence Livermore National Laboratory
under contract No. W-7405-ENG-48.
47
Additional slides
48
Cache-Coherence of Layouts
  • Well known heuristics for cache-coherent layouts
  • Space-filling curves [Sagan 94]
  • How can we compute better layouts?
  • Requires metrics measuring cache-coherence of
    layouts

49
Main Results
  • Define cache-coherence of a layout as
  • Expected number of cache misses during random
    walks of a graph given block-based caches
  • Then, the expected number of cache misses is
  • Number of straddling arcs in the cache-aware case
  • Geometric mean of arc lengths in the
    cache-oblivious case

50
Data Layout Optimization
  • Rendering sequences
  • Triangle strips
  • [Deering 95, Hoppe 99, Bogomjakov and Gotsman 02]
  • Processing sequences
  • [Isenburg and Gumhold 03, Isenburg and Lindstrom
    05]

Assume that access pattern globally follows the
layout order
51
Data Layout Optimization
  • Space-filling curves
  • [Sagan 94, Velho and Gomes 91, Pascucci and Frank
    01, Lindstrom and Pascucci 01, Gopi and Eppstein
    04]

Assume geometric regularity
52
Data Layout Optimization
  • Graph and matrix layout
  • A survey [Diaz et al. 02]
  • Minimum linear arrangement (MLA)
  • Bandwidth
  • Profile
  • Wavefront, etc.

Does not necessarily produce good layouts for
block-based caches
53
Cache-Aware Metric
  • Expected number of cache misses given a block
    size B

54
Correlation between Cache-Aware Metric and
Observed Number of Cache Misses
[Figure: cache-aware metric against the observed number of cache misses for
M = 5 (R² = 0.97) and M = 25 (R² = 0.97); cache block size 4KB]
55
Evaluating Existing Layouts
An existing layout, f
  • No known tight bound
  • Compare against the best layout we can construct
  • Employ an efficient sampling method

Is it close to the optimal layout? If so, use it; otherwise, build a new one.
56
Implementation
  • Modify our open source layout computation code,
    OpenCCL
  • Based on METIS graph partitioning library
    [Karypis and Kumar 98]
  • Processing speed
  • 15k triangles / second

57
Comparison with Cache-Oblivious Mesh Layouts
(COML) [Yoon et al. 05]
  • Two major improvements
  • Accuracy
  • Usability

58
Pros and Cons
  • Limitations
  • Not directly applicable to dynamic models
  • Pros
  • Generality
  • Can have benefits without modifying underlying
    codes

59
Specialization to Meshes
  • Assume an equal likelihood of accessing adjacent
    nodes given a node

[Figure: an input mesh and its implicitly derived graph, with nodes na, nb, nc, nd]
60
Two Possible Block Size Progressions
  • Arithmetic progression
  • 1, 2, 3, 4, ...
  • Geometric progression
  • 2^0, 2^1, 2^2, 2^3, ...
  • Well reflects current caching architectures
  • E.g., L1: 32B, L2: 64B, page: 4KB, etc.

[Figure: two plots of Pr(B) against B, one for a uniform distribution and one
for a geometric distribution]
61
Cache Miss Ratio with Different Mesh Resolutions
of Bunny Model
[Figure: GPU vertex cache miss ratio (cache size 32) against mesh resolution
for Hoppe's rendering sequence [Hoppe 99] and COLg]
62
Correlation between Cache-Aware Metric and Number
of Cache Misses
[Figure: cache-aware metric against the observed number of cache misses for
M = 5 (R² = 0.97) and M = 25 (R² = 0.97); cache block size 4KB]
63
Correlations with Observed Number of Cache Misses
[Figure: arithmetic CO metric and geometric CO metric against the observed
number of cache misses, for one cache block and for multiple cache blocks;
labels R² = 0.98 and R² = 0.81; cache block size 4KB]
64
Comparison with Other Layouts
[Figure: comparison of the X-axis, BFL, spectral [Juvan and Mohar 92], DFL,
COML [Yoon et al. 05], Z-curve, COLg, and cache-aware layouts; correlation
labels 0.94, 0.98, 0.94, and 0.99]
Up to 2X speedup; 9% lower than cache-aware layout
65
Comparison with Other Layouts
  • Compute eight different layouts
  • Our geometric cache-oblivious layout
  • Our cache-aware layout
  • Breadth-first layout
  • Depth-first layout
  • Spectral layout [Juvan and Mohar 92]
  • Cache-oblivious mesh layout [Yoon et al. 05]
  • Z-curve [Sagan 94]
  • X-axis sorted layout

66
Our Layout of Bunny Model