The GLIMPSES Toolkit: Rapid code prototyping for SPEs

1
The GLIMPSES Toolkit: Rapid code prototyping for
SPEs
  • Jaswanth Sreeram, Santosh Pande

2
Overview of Toolkit
  • GLIMPSES Toolkit: GLobal Interprocedural Memory
    and ParalleliSm Estimator for SPUs
  • Profile instrumentation support
  • Profile parsers and interpreters.
  • Analyzers for memory allocation and access behavior
  • Visualization Engine

3
GLIMPSES toolkit
  • One of two tools available in public domain
  • Rapid Prototyping, Legacy Code Migration and
    Performance Tuning on Cell SPEs
  • Second one is asmvis
  • Released on SourceForge in mid-July
  • http://glimpses.sourceforge.net
  • OSI certified open source license(s).
  • Has received interest for adoption in academia
    and industry
  • Samsung Korea, Codecs and Media computing Group.
  • Sony Computer Entertainment America (SCEA)

4
GLIMPSES Motivation
  • Prototyping large codebases for porting to SPEs
    is challenging
  • Find a partition (set of functions)
  • Find a set of upward exposed references
  • DMA-transfer them and lay them out with correct alignment
  • After execution store the results back
  • Make sure memory requirements do not exceed
    capacity
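
The last step, checking that a candidate partition fits in memory, can be sketched in a few lines. This is a minimal illustration and not GLIMPSES code: it assumes the 256 KB local store of a Cell SPE, and the function names and byte counts in `sizes` are hypothetical. Summing code, stack, and heap for every function gives a conservative upper bound.

```python
LOCAL_STORE_BYTES = 256 * 1024  # local store of one Cell SPE

def partition_footprint(partition, sizes, dma_buffer_bytes=0):
    """Upper bound on bytes needed by a set of functions plus DMA buffers."""
    total = dma_buffer_bytes
    for fn in partition:
        s = sizes[fn]
        total += s["code"] + s["stack"] + s["heap"]
    return total

def fits_in_local_store(partition, sizes, dma_buffer_bytes=0,
                        limit=LOCAL_STORE_BYTES):
    """True if the conservative footprint stays within the limit."""
    return partition_footprint(partition, sizes, dma_buffer_bytes) <= limit

# Hypothetical per-function sizes in bytes (illustrative only).
sizes = {
    "idct": {"code": 12_000, "stack": 2_048, "heap": 0},
    "motion_comp": {"code": 20_000, "stack": 4_096, "heap": 65_536},
    "decode_picture": {"code": 30_000, "stack": 8_192, "heap": 200_000},
}

small = ["idct", "motion_comp"]
large = ["idct", "motion_comp", "decode_picture"]
print(fits_in_local_store(small, sizes, dma_buffer_bytes=16_384))
print(fits_in_local_store(large, sizes, dma_buffer_bytes=16_384))
```

With these made-up numbers the two-function partition fits and the three-function partition does not, which is the kind of answer a programmer needs before attempting a port.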

5
Motivation contd.
  • Challenges due to architectural attributes
  • Limited local store
  • High branch penalty
  • Suited for vectorizable code rather than scalar
    code
  • SPE/PPE interactions
  • Provide programmer with tools to
  • Understand program behavior (esp. memory usage)
  • Quickly construct candidate partitions for SPE
  • Evaluate/quantify partitions' suitability for
    SPEs

6
GLIMPSES Details
  • Memory Estimation tools enable programmer to
  • Estimate static and dynamic memory usage
  • Code, Stack, Heap
  • Understand program behavior
  • Detect program objects affecting dynamic memory
    behavior
  • Show the correlation between these program
    objects and memory usage.
  • Rank program segments
  • Criteria: memory requirements, vectorizability,
    branching, etc.
  • Visualize results interactively.

7
Features overview
  • Dynamic call graph visualization: ability to
    select a call tree
  • Memory Requirements
  • Dynamic
  • Analytical: what-if scenario calculator for
    memory capacity
  • Memory Access Patterns
  • Locality (spatial, temporal, neighbor affinity)
  • Ranking
  • Criteria-based estimates
  • Alias and safe pre-fetching information
  • Multiple alias analyses available

8
Overview
[Diagram components: Test Inputs, Profile Trace, Dyn. Memory Estimator, Analytical Memory Estimator, Partition Estimator, GraphML Trace, Visualization Engine]
9
Visualization
Graph Visualization Area
Results Display Panel
10
Visualization contd
11
Visualization contd
  • Zoom view
  • Shows dynamic call chains for a program run (in
    this case the program is mpeg2-decode)

12
Visualization contd
  • Function characteristics
  • Alias analysis algorithm used
  • Type of aliases displayed (Must Alias, May
    Alias, No Alias)
  • Aliasing information for pairs of
    variables/memory regions
13
Analytical Memory Estimation
  • Correlate dynamic memory usage with program
    objects
  • Dynamic memory usage depends on inputs, etc.
  • Compiler Analysis
  • From each malloc, do a backward traversal to find
    instructions that influence the arguments to
    malloc.
  • Construct an arithmetic expression for amount of
    memory allocated, in terms of inputs or other
    program objects.
  • Handles control flow constructs (if-then-else,
    loops, etc.)
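
The backward traversal can be illustrated on a toy representation. This is a sketch under stated assumptions, not the GLIMPSES or LLVM implementation: `defs` is a hypothetical table mapping each value to its defining instruction, the variable names and the division by 2 for the chroma dimensions are illustrative, and control flow is not handled here.

```python
def build_expr(var, defs):
    """Walk backward from `var` through its defining instructions and
    return an arithmetic expression over program inputs and constants."""
    d = defs[var]
    kind = d[0]
    if kind == "input":      # program input: stop and keep the name
        return var
    if kind == "const":      # literal constant: stop and keep the value
        return str(d[1])
    op, lhs, rhs = d         # binary instruction: recurse into operands
    return f"({build_expr(lhs, defs)} {op} {build_expr(rhs, defs)})"

# Hypothetical definitions feeding two malloc size arguments.
defs = {
    "Picture_Width": ("input",),
    "Picture_Height": ("input",),
    "two": ("const", 2),
    "Chroma_Width": ("/", "Picture_Width", "two"),
    "Chroma_Height": ("/", "Picture_Height", "two"),
    "size_luma": ("*", "Picture_Width", "Picture_Height"),
    "size_chroma": ("*", "Chroma_Width", "Chroma_Height"),
}

print(build_expr("size_luma", defs))
print(build_expr("size_chroma", defs))
```

Stopping at inputs is what makes the result useful: the allocation size is expressed in terms a programmer can vary, rather than as a single number observed in one run.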

14
Memory Behavior: Analytical Estimation
__Malloc_size__1 = Picture_Width * Picture_Height
__Malloc_size__2 = Picture_Width * Picture_Height
__Malloc_size__3 = Picture_Width * Picture_Height
__Malloc_size__4 = Picture_Width * Picture_Height
__Malloc_size__5 = Chroma_Width * Chroma_Height
__Malloc_size__6 = Chroma_Width * Chroma_Height
__Malloc_size__7 = Chroma_Width * Chroma_Height
__Malloc_size__8 = Chroma_Width * Chroma_Height

if (cc == 0) size = Picture_Width * Picture_Height
else size = Chroma_Width * Chroma_Height
..
for (..)
    if (..) malloc(size)
    if (..) malloc(size)
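
Expressions of this kind support what-if calculations. The sketch below assumes each allocation size is a product of width and height, with four luma-sized and four chroma-sized allocations as listed on this slide. The frame dimensions are hypothetical examples, and the 256 KB limit is the local store size of a Cell SPE.

```python
LOCAL_STORE_BYTES = 256 * 1024

def estimated_heap_bytes(picture_width, picture_height,
                         chroma_width, chroma_height):
    """Total heap from four luma-sized and four chroma-sized allocations."""
    luma = picture_width * picture_height
    chroma = chroma_width * chroma_height
    return 4 * luma + 4 * chroma

# Two hypothetical inputs: 352x288 and 176x144 frames, chroma at half size.
cif = estimated_heap_bytes(352, 288, 176, 144)
qcif = estimated_heap_bytes(176, 144, 88, 72)

for name, total in (("352x288", cif), ("176x144", qcif)):
    verdict = "fits" if total <= LOCAL_STORE_BYTES else "exceeds local store"
    print(f"{name}: {total} bytes, {verdict}")
```

The answer depends only on the inputs named in the expressions, so the programmer can test a range of picture sizes without rerunning the decoder.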
15
Memory References
  • Memory reference metrics
  • Temporal (frequency)
  • Spatial
  • Neighbor affinity
  • Metrics measured per memory line
  • Per function metrics or per-partition metrics
  • Visually represented via a color map
  • Pale Violet (low) -> Bright Red (high)
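
A per-line frequency metric can be computed directly from an address trace. This is a minimal sketch, assuming a trace is a plain list of byte addresses and a memory line is a fixed-size aligned block. The trace values are made up, and the normalized intensity stands in for the position on the color map.

```python
from collections import Counter

def line_frequencies(addresses, line_size=1024):
    """Count references per memory line (address // line_size)."""
    return Counter(addr // line_size for addr in addresses)

def normalize(freqs):
    """Scale counts to [0, 1]; 1.0 would map to the hottest color."""
    peak = max(freqs.values())
    return {line: count / peak for line, count in freqs.items()}

# Hypothetical trace of referenced byte addresses.
trace = [0, 8, 16, 1024, 1032, 4096, 0, 8]
freqs = line_frequencies(trace)
heat = normalize(freqs)

for line in sorted(freqs):
    print(f"line {line}: {freqs[line]} refs, intensity {heat[line]:.2f}")
```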

16
Memory Ref. Frequency (mpeg2decode)
Memory Reference map (per partition) with 1024B
memory lines
17
Mpeg2decode Load recurrence
18
Neighbor Affinity
  • Metric to describe how well memory layout is
    suited to caching
  • Consider a slice S of length w of the whole
    memory access trace and two loads
  • L1, L2 ∈ S
  • If |L1.addr - L2.addr| < line size
    then
  • L1, L2 exhibit neighbor affinity for slice
    size w
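
The definition can be turned into a measurable quantity. The sketch below assumes the address comparison is an absolute difference and reports the fraction of load pairs that share some slice of length w and lie within one line size of each other. That aggregation is an illustrative choice and not necessarily the one GLIMPSES uses.

```python
def neighbor_affinity(addresses, w, line_size):
    """Fraction of load pairs inside a common slice of length w whose
    addresses differ by less than line_size."""
    close = total = 0
    n = len(addresses)
    for i in range(n):
        # Loads i and j share a slice of length w when j - i <= w - 1.
        for j in range(i + 1, min(i + w, n)):
            total += 1
            if abs(addresses[i] - addresses[j]) < line_size:
                close += 1
    return close / total if total else 0.0

# Hypothetical load addresses.
loads = [0, 64, 4096, 4100]
print(neighbor_affinity(loads, w=2, line_size=128))
print(neighbor_affinity(loads, w=4, line_size=128))
```

Widening the slice admits more distant pairs, so the score for a fixed trace generally changes with w, which is why the slide qualifies affinity "for slice size w".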

19
Load Neighbor Affinity
20
Alias Analysis for libode
  • Basic AA (least precise, fastest)
  • Aggressive local analysis
  • Context-insensitive
  • Flow-insensitive
  • Total number of queries: 119520497
  • No Alias: 35924925
  • May Alias: 83492482
  • Must Alias: 103090

21
Alias Analysis (contd)
  • Globals Mod/Ref
  • context-sensitive mod/ref and alias analysis for
    internal global variables
  • Very fast, very precise, limited scope
  • Total number of queries: 119520497
  • No Alias: 35944215
  • May Alias: 83473192
  • Must Alias: 103090

22
Alias Analysis (contd)
  • Andersen's AA algorithm
  • Subset-based, flow-insensitive,
    context-insensitive, and field-insensitive alias
    analysis
  • Very precise, but slow.
  • Total number of queries: 119520497
  • No Alias: 79361105
  • May Alias: 40057171
  • Must Alias: 102221
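
The three result sets are easier to compare as percentages of the query total. The short script below uses only the counts reported on these slides; the dictionary keys are labels chosen here for illustration.

```python
TOTAL_QUERIES = 119520497

# (No Alias, May Alias, Must Alias) counts as reported for libode.
results = {
    "basic": (35924925, 83492482, 103090),
    "globals_modref": (35944215, 83473192, 103090),
    "andersen": (79361105, 40057171, 102221),
}

def no_alias_percent(counts, total=TOTAL_QUERIES):
    """Share of queries answered No Alias, in percent."""
    return 100.0 * counts[0] / total

for name, counts in results.items():
    assert sum(counts) == TOTAL_QUERIES  # each breakdown covers all queries
    print(f"{name}: {no_alias_percent(counts):.2f}% No Alias")
```

Basic AA and Globals Mod/Ref each resolve roughly 30% of queries as No Alias, while Andersen's analysis resolves roughly 66%. A higher No Alias share means fewer conservative May Alias answers.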

23
Ranking (MPEG2Encode)
  • Criteria-based
  • Code Size (csize)
  • Stack Size (ssize)
  • Heap Size (hsize)
  • Branch density (br_density)
  • Autovectorizable loops (av_loops)
  • Is LS memory limit likely to be hit (ls_limit)

Rank = w1*csize + w2*ssize + w3*hsize
+ w4*br_density + w5/(1 + av_loops) + w6*ls_limit
(wi are the weights for each criterion)
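
The ranking formula is a weighted sum and can be sketched directly. This reading assumes the terms are added, each weight multiplies its criterion, and the vectorization term is w5 / (1 + av_loops), so more autovectorizable loops lower the rank. The weights and metric values below are hypothetical.

```python
def rank(metrics, weights):
    """Weighted rank of a function; lower means more suitable for an SPE."""
    return (weights["csize"] * metrics["csize"]
            + weights["ssize"] * metrics["ssize"]
            + weights["hsize"] * metrics["hsize"]
            + weights["br_density"] * metrics["br_density"]
            + weights["av_loops"] / (1 + metrics["av_loops"])
            + weights["ls_limit"] * metrics["ls_limit"])

# Hypothetical weights and measurements (units are illustrative).
weights = {"csize": 1.0, "ssize": 1.0, "hsize": 1.0,
           "br_density": 1.0, "av_loops": 1.0, "ls_limit": 1.0}
metrics = {"csize": 10, "ssize": 2, "hsize": 4,
           "br_density": 0.5, "av_loops": 3, "ls_limit": 0}

print(rank(metrics, weights))
```

In practice the weights would need tuning so that criteria measured in different units (bytes, densities, counts) contribute on comparable scales.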
24
Partitioning
  • Preprocessing: propagate ranks upward in the
    call graph
  • Rank(n) = Rank(n) + Σ_i Rank(n→child_i)
  • Input: call graph consisting of nodes annotated
    with ranks
  • Output: graph partitions that are suitable for
    execution on the SPEs
  • A partition P is deemed suitable if
  • Rank(P→root) < Threshold
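
The preprocessing and suitability test can be sketched as follows. The sketch assumes the call graph is a tree (no recursion or shared callees), reads the propagation rule as a node's own rank plus the propagated ranks of its children, and uses a top-down search that is an illustrative choice and not necessarily the GLIMPSES algorithm. Node names and ranks are hypothetical.

```python
def propagate(graph, ranks):
    """Return propagated rank per node: own rank plus all descendants."""
    total = {}

    def visit(node):
        if node not in total:
            total[node] = ranks[node] + sum(visit(c) for c in graph.get(node, []))
        return total[node]

    for node in ranks:
        visit(node)
    return total

def suitable_roots(graph, total, root, threshold):
    """Roots of the largest subtrees whose propagated rank is below threshold."""
    if total[root] < threshold:
        return [root]
    roots = []
    for child in graph.get(root, []):
        roots.extend(suitable_roots(graph, total, child, threshold))
    return roots

# Hypothetical call tree and per-function ranks.
graph = {"main": ["decode", "output"], "decode": ["idct", "motion"]}
ranks = {"main": 5, "decode": 3, "idct": 2, "motion": 4, "output": 1}

total = propagate(graph, ranks)
print(suitable_roots(graph, total, "main", threshold=10))
print(suitable_roots(graph, total, "main", threshold=5))
```

Lowering the threshold splits the program into smaller partitions rooted deeper in the call tree, which is the effect the next slide examines for mpeg2decode.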

25
Effect of threshold on partitions
mpeg2decode
26
GLIMPSES status
  • Beta version available for download at
  • http://glimpses.sourceforge.net
  • 300 MB source code package (includes visualizer)
  • Lines of code (C/C++): 447,000
  • Third-party tools integrated: LLVM (compiler),
    Prefuse (visualization)
  • Executable size: 422 MB (x86 binaries)
  • Typical trace size: 900 MB (LIBODE)
  • Man-hour effort: 750
  • Releases
  • v.0.8 based on LLVM version 1.8 (July 7th)
  • v.1.0 based on LLVM version 2.0 (undergoing
    testing)
  • Tested to work with large codebases
  • LIBODE (115,000 lines of code), mpeg2 (10,000 lines
    of code), SPEC INT 2000, etc.

27
Ongoing and future work
  • More Validation
  • Compare partitions produced with those generated
    by expert programmers
  • An inter-procedural, flow-sensitive,
    context-sensitive alias analysis algorithm

28
Ongoing and future work
  • Function data dependence graph
  • Encapsulates data flow between functions
  • Arguments, aliases, globals
  • Important factor in partitioning decisions:
    affinity between pairs of functions