Title: The GLIMPSES Toolkit: Rapid code prototyping for SPEs
1 The GLIMPSES Toolkit: Rapid code prototyping for SPEs
- Jaswanth Sreeram, Santosh Pande
2 Overview of Toolkit
- GLIMPSES Toolkit: GLobal Interprocedural Memory and ParalleliSm Estimator for SPUs
- Profile instrumentation support
- Profile parsers and interpreters
- Analyzers for memory allocation and access behavior
- Visualization Engine
3 GLIMPSES toolkit
- One of two tools available in the public domain
- Rapid prototyping, legacy code migration and performance tuning on Cell SPEs
- The second one is asmvis
- Released on SourceForge in mid-July
- http://glimpses.sourceforge.net
- OSI-certified open source license(s)
- Has received interest for adoption in academia and industry
- Samsung Korea, Codecs and Media Computing Group
- Sony Computer Entertainment America (SCEA)
4 GLIMPSES Motivation
- Prototyping large codebases for porting to SPEs is challenging
- Find a partition (set of functions)
- Find a set of upward exposed references
- DMA transfer them and lay them out (alignment)
- After execution, store the results back
- Make sure memory requirements do not exceed capacity
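The capacity check in the last bullet can be sketched as a simple budget test. This is a minimal illustration, assuming the 256 KB local store of a Cell SPE; the function name and the example sizes are hypothetical and not part of GLIMPSES.

```python
LOCAL_STORE_BYTES = 256 * 1024  # local store size of a Cell SPE


def fits_in_local_store(code, stack, heap, dma_buffers=0,
                        capacity=LOCAL_STORE_BYTES):
    """Return (fits, headroom_in_bytes) for a candidate partition.

    All sizes are in bytes. Code, stack, heap and DMA buffers share the
    same local store, so they are simply summed.
    """
    total = code + stack + heap + dma_buffers
    return total <= capacity, capacity - total


# Hypothetical partition: 100 KB code, 16 KB stack, 64 KB heap, 32 KB DMA buffers.
fits, headroom = fits_in_local_store(100 * 1024, 16 * 1024, 64 * 1024, 32 * 1024)
print(fits, headroom)  # True 45056
```

A negative headroom indicates by how many bytes the candidate partition overshoots the local store.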
5 Motivation contd.
- Challenges due to architectural attributes
- Limited local store
- High branch penalty
- Suited for vectorizable code rather than scalar code
- SPE/PPE interactions
- Provide programmer with tools to
- Understand program behavior (esp. memory usage)
- Quickly construct candidate partitions for SPEs
- Evaluate/quantify the suitability of partitions for SPEs
6 GLIMPSES Details
- Memory estimation tools enable the programmer to
- Estimate static and dynamic memory usage
- Code, Stack, Heap
- Understand program behavior
- Detect program objects affecting dynamic memory behavior
- Show the correlation between these program objects and memory usage
- Rank program segments
- Criteria: memory requirements, vectorizability, branching, etc.
- Visualize results interactively
7 Features overview
- Dynamic call graph visualization: ability to select a call tree
- Memory requirements
- Dynamic
- Analytical: "what if" scenario calculator for memory capacity
- Memory access patterns
- Locality (spatial, temporal, neighbor affinity)
- Ranking
- Criteria-based estimates
- Alias and safe pre-fetching information
- Multiple alias analyses available
8 Overview
Components (diagram labels): Dyn. Memory Estimator, Analytical Memory Estimator, Partition Estimator, GraphML Trace, Visualization Engine, Test Inputs, Profile Trace
9 Visualization
Labeled regions: Graph Visualization Area, Results Display Panel
10 Visualization contd
11 Visualization contd
- Zoom view
- Shows dynamic call chains for a program run (in this case the program is mpeg2-decode)
12 Visualization contd
- Function characteristics
- Alias analysis algorithm used
- Type of aliases displayed (Must Alias, May Alias, No Alias)
- Aliasing information for pairs of variables/memory regions
13 Analytical Memory Estimation
- Correlate dynamic memory usage with program objects
- Dynamic memory usage depends on inputs, etc.
- Compiler analysis
- From each malloc, do a backward traversal to find instructions that influence the arguments to malloc
- Construct an arithmetic expression for the amount of memory allocated, in terms of inputs or other program objects
- Handles control flow constructs (if-then-else, loops, etc.)
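The backward traversal can be illustrated on a toy intermediate representation. This is only a sketch under stated assumptions: the `defs` table, the temporaries `t1` and `t2`, and the constant 16 are hypothetical, and the sketch covers straight-line code only, not the control flow handling mentioned above.

```python
def symbolic_size(value, defs):
    """Expand `value` backwards through the instructions that define it.

    `defs` maps a temporary name to (opcode, left operand, right operand).
    Anything without a defining instruction is treated as a program input
    or a constant, which is where the traversal stops.
    """
    if value not in defs:
        return str(value)
    op, lhs, rhs = defs[value]
    symbol = {"mul": "*", "add": "+"}[op]
    return f"({symbolic_size(lhs, defs)} {symbol} {symbolic_size(rhs, defs)})"


# Hypothetical instructions feeding the argument of one malloc call.
defs = {
    "t1": ("mul", "Picture_Width", "Picture_Height"),
    "t2": ("add", "t1", 16),
}
print(symbolic_size("t2", defs))  # ((Picture_Width * Picture_Height) + 16)
```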
14 Memory Behavior: Analytical Estimation
__Malloc_size__1 = Picture_Width * Picture_Height
__Malloc_size__2 = Picture_Width * Picture_Height
__Malloc_size__3 = Picture_Width * Picture_Height
__Malloc_size__4 = Picture_Width * Picture_Height
__Malloc_size__5 = Chroma_Width * Chroma_Height
__Malloc_size__6 = Chroma_Width * Chroma_Height
__Malloc_size__7 = Chroma_Width * Chroma_Height
__Malloc_size__8 = Chroma_Width * Chroma_Height
if (cc == 0) size = Picture_Width * Picture_Height
else size = Chroma_Width * Chroma_Height
..
for (.)
  if (..) malloc(size)
  if (..) malloc(size)
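Once the allocation sizes are expressed in terms of inputs, a "what if" query reduces to evaluating the expressions for a chosen input. The sketch below assumes that all eight allocations are live at the same time and that each element is one byte; the picture dimensions are hypothetical, and the 256 KB figure is the local store size of a Cell SPE.

```python
LOCAL_STORE_BYTES = 256 * 1024  # local store size of a Cell SPE


def total_heap_bytes(picture_width, picture_height, chroma_width, chroma_height):
    """Sum the eight allocation sizes: four luma-sized, four chroma-sized."""
    luma = 4 * picture_width * picture_height
    chroma = 4 * chroma_width * chroma_height
    return luma + chroma


# Hypothetical input: 352x288 picture with 176x144 chroma planes.
total = total_heap_bytes(352, 288, 176, 144)
print(total, total <= LOCAL_STORE_BYTES)  # 506880 False
```

For this input the heap alone would exceed the local store, which is the kind of answer the analytical estimator is meant to give before any porting effort is spent.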
15 Memory References
- Memory reference metrics
- Temporal (frequency)
- Spatial
- Neighbor affinity
- Metrics measured per memory line
- Per-function metrics or per-partition metrics
- Visually represented via a color map
- Pale Violet (low) -> Bright Red (high)
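The temporal (frequency) metric per memory line can be sketched as a histogram over an address trace. This is an illustration only: the trace is hypothetical, and the default 1024-byte line size is an assumption chosen to match the mpeg2decode reference map.

```python
from collections import Counter


def line_frequencies(addresses, line_size=1024):
    """Count how many references fall into each memory line."""
    return Counter(addr // line_size for addr in addresses)


# Hypothetical trace of referenced byte addresses.
trace = [0, 8, 1024, 2050, 2051, 3000]
freq = line_frequencies(trace)
print(freq.most_common(1))  # [(2, 3)]
```

Mapping each count onto a color scale then gives the low-to-high color map described above.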
16 Memory Ref. Frequency (mpeg2decode)
Memory reference map (per partition) with 1024 B memory lines
17 mpeg2decode: Load recurrence
18 Neighbor Affinity
- Metric to describe how well the memory layout is suited to caching
- Consider a slice S of length w of the whole memory access trace and two loads L1, L2 ∈ S
- If |L1.addr - L2.addr| < line size, then L1 and L2 exhibit neighbor affinity for slice size w
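A minimal sketch of the definition above follows. The slide defines affinity for a pair of loads only, so two choices here are assumptions: slices are taken as sliding windows of w consecutive loads, and the result is aggregated as the fraction of load pairs that exhibit affinity.

```python
from itertools import combinations


def neighbor_affinity(addresses, w, line_size):
    """Fraction of load pairs within each length-w slice that lie closer
    together than one memory line."""
    affine = total = 0
    for start in range(len(addresses) - w + 1):
        window = addresses[start:start + w]
        for a, b in combinations(window, 2):
            total += 1
            if abs(a - b) < line_size:
                affine += 1
    return affine / total if total else 0.0


# Hypothetical load addresses, slice size 2, 128-byte lines.
loads = [0, 64, 4096, 4100]
print(neighbor_affinity(loads, w=2, line_size=128))
```

With this trace, two of the three adjacent pairs fall within one line, so the result is 2/3.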
19 Load Neighbor Affinity
20 Alias Analysis for libode
- Basic AA (least precise, fastest)
- Aggressive local analysis
- Non-context-sensitive
- Non-flow-sensitive
- Total number of queries: 119520497
- No Alias: 35924925
- May Alias: 83492482
- Must Alias: 103090
21 Alias Analysis (contd)
- Globals Mod/Ref
- Context-sensitive mod/ref and alias analysis for internal global variables
- Very fast, very precise, limited scope
- Total number of queries: 119520497
- No Alias: 35944215
- May Alias: 83473192
- Must Alias: 103090
22 Alias Analysis (contd)
- Andersen's AA algorithm
- Subset-based, flow-insensitive, context-insensitive, and field-insensitive alias analysis
- Very precise, but slow
- Total number of queries: 119520497
- No Alias: 79361105
- May Alias: 40057171
- Must Alias: 102221
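The three result sets are easier to compare as shares of the total query count. The short calculation below uses only the numbers reported on the alias analysis slides; the dictionary layout and the function name are illustrative.

```python
TOTAL_QUERIES = 119520497

# (No Alias, May Alias, Must Alias) counts as reported for libode.
results = {
    "Basic AA": (35924925, 83492482, 103090),
    "Globals Mod/Ref": (35944215, 83473192, 103090),
    "Andersen's AA": (79361105, 40057171, 102221),
}


def no_alias_share(name):
    """Percentage of queries answered No Alias by the named analysis."""
    no_alias, may_alias, must_alias = results[name]
    # The three categories account for every query.
    assert no_alias + may_alias + must_alias == TOTAL_QUERIES
    return 100 * no_alias / TOTAL_QUERIES


for name in results:
    print(f"{name}: {no_alias_share(name):.1f}% No Alias")
```

Basic AA and Globals Mod/Ref both disambiguate about 30.1% of the queries, while Andersen's analysis reaches about 66.4%.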
23 Ranking (MPEG2Encode)
- Criteria-based
- Code Size (csize)
- Stack Size (ssize)
- Heap Size (hsize)
- Branch density (br_density)
- Autovectorizable loops (av_loops)
- Is LS memory limit likely to be hit (ls_limit)
Rank = w1*csize + w2*ssize + w3*hsize + w4*br_density + w5/(1 + av_loops) + w6*ls_limit (w1..w6 are the weights for each criterion)
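The ranking formula can be sketched as a weighted sum. The operators were lost in the slide text, so reading the terms as a sum, and the denominator as 1 + av_loops, is an assumption; the weights and criterion values in the example are hypothetical.

```python
def rank(csize, ssize, hsize, br_density, av_loops, ls_limit, weights):
    """Weighted rank of a program segment.

    Assumed reading: more autovectorizable loops shrink the w5 term, and
    ls_limit is 1 if the local store limit is likely to be hit, else 0.
    """
    w1, w2, w3, w4, w5, w6 = weights
    return (w1 * csize + w2 * ssize + w3 * hsize + w4 * br_density
            + w5 / (1 + av_loops) + w6 * ls_limit)


# Hypothetical segment with unit weights.
print(rank(csize=1, ssize=2, hsize=3, br_density=4, av_loops=1, ls_limit=0,
           weights=(1, 1, 1, 1, 1, 1)))  # 10.5
```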
24 Partitioning
- Preprocessing: propagate ranks upwards in the call graph
- Rank(n) = Rank(n) + Σ_i Rank(n→child_i)
- Input: call graph consisting of nodes annotated with ranks
- Output: graph partitions that are suitable for execution on the SPEs
- A partition P is deemed suitable if
- Rank(P→root) < Threshold
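The propagation and threshold test can be sketched as follows. This assumes the call graph is a tree (no recursion, no shared callees) and that ranks are summed over children; the function names, the call graph and the rank values are hypothetical, and the top-down selection of the largest suitable subtrees is one possible strategy, not necessarily the one GLIMPSES uses.

```python
def propagate(graph, ranks, node, out):
    """Bottom-up pass: a node's rank becomes its own rank plus the
    propagated ranks of all its children."""
    total = ranks[node] + sum(propagate(graph, ranks, child, out)
                              for child in graph.get(node, []))
    out[node] = total
    return total


def suitable_roots(graph, propagated, root, threshold):
    """Top-down pass: return the roots of the largest subtrees whose
    propagated rank is below the threshold."""
    if propagated[root] < threshold:
        return [root]
    roots = []
    for child in graph.get(root, []):
        roots += suitable_roots(graph, propagated, child, threshold)
    return roots


# Hypothetical call tree and per-function ranks.
graph = {"main": ["decode", "output"], "decode": ["idct", "motion"]}
ranks = {"main": 5, "decode": 4, "idct": 3, "motion": 2, "output": 1}
propagated = {}
propagate(graph, ranks, "main", propagated)
print(suitable_roots(graph, propagated, "main", threshold=10))  # ['decode', 'output']
print(suitable_roots(graph, propagated, "main", threshold=5))   # ['idct', 'motion', 'output']
```

Lowering the threshold splits the program into more, smaller partitions.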
25 Effect of threshold on partitions (mpeg2decode)
26 GLIMPSES status
- Beta version available for download at http://glimpses.sourceforge.net
- 300 MB source code package (includes visualizer)
- Lines of code (C/C++): 447,000
- Third-party tools integrated: LLVM (compiler), Prefuse (visualization)
- Executable size: 422 MB (x86 binaries)
- Typical trace size: 900 MB (LIBODE)
- Man-hour effort: 750
- Releases
- v.0.8 based on LLVM version 1.8 (July 7th)
- v.1.0 based on LLVM version 2.0 (undergoing testing)
- Tested to work with large codebases
- LIBODE (115000 lines of code), mpeg2 (10000 lines of code), SPEC INT 2000, etc.
27 Ongoing and future work
- More validation
- Compare partitions produced with those generated by expert programmers
- An inter-procedural, flow-sensitive, context-sensitive alias analysis algorithm
28 Ongoing and future work
- Function data dependence graph
- Encapsulates data flow between functions
- Arguments, aliases, globals
- Important factor in partitioning decisions: affinity between pairs of functions