CGO 2006: The Fourth International Symposium on Code Generation and Optimization, New York, March 26-29, 2006

1
CGO 2006: The Fourth International Symposium on
Code Generation and Optimization
New York, March 26-29, 2006
  • Conference Review
  • Presented by Ivan Matosevic

2
Outline
  • Conference overview
  • Brief summaries of sessions
  • Keynote speeches
  • Best paper

3
Conference Overview
  • Primary focus: back-end compilation techniques
  • Static analysis and optimization
  • Profiling
  • Run-time techniques
  • 8 sessions, 29 papers
  • Dominating topics: multicores, dynamic compilation

4
Overview of Sessions
  • Dynamic Optimization
  • Object-Oriented Code Generation and Optimization
  • Phase Detection and Profiling
  • Tiled and Multicore Compilation
  • Static Code Generation and Optimization Issues
  • SIMD Compilation
  • Optimization Space Exploration
  • Security and Reliability

5
Session 1: Dynamic Optimization
  • Kim Hazelwood (University of Virginia), Robert
    Cohn (Intel), A Cross-Architectural Interface for
    Code Cache Manipulation
  • Pin dynamic instrumentation system with code
    cache
  • The paper describes an API for various operations
    with the code cache (callbacks, lookups,
    statistics, etc.)
  • Derek Bruening, Vladimir Kiriansky, Tim Garnett,
    Sanjeev Banerji (Determina Corporation),
    Thread-Shared Software Code Caches
  • Problem: sharing a code cache across multiple
    threads
  • Authors propose a fine-grained locking scheme
  • Evaluation using DynamoRIO

6
Session 1: Dynamic Optimization
  • Keith Cooper, Anshuman Dasgupta (Rice Univ.),
    Tailoring Graph-coloring Register Allocation For
    Runtime Compilation
  • Problem: register allocation in JIT compilers
  • Authors propose a novel lightweight
    graph-colouring technique
  • Weifeng Zhang, Brad Calder, Dean Tullsen (UC San
    Diego), A Self Repairing Prefetcher in an
    Event-Driven Dynamic Optimization Framework
  • Extension of the Trident event-driven dynamic
    optimization framework (previously proposed by
    the same authors)
  • Dynamic insertion of prefetching instructions
    based on run-time analysis

7
Session 2: Object-Oriented Code Generation and
Optimization
  • Suresh Srinivas, Yun Wang, Miaobo Chen, Qi Zhang,
    Eric Lin, Valery Ushakov, Yoav Zach, Shalom
    Goldenberg (Intel Corporation), Java JNI Bridge:
    An MRTE Framework for Mixed Native ISA Execution
  • Uses a dynamic translator to execute native
    calls compiled for one ISA on a Java platform
    running on a different ISA
  • Kris Venstermans, Lieven Eeckhout, Koen De
    Bosschere (Ghent University), Space-Efficient
    64-bit Java Objects through Selective Typed
    Virtual Addressing
  • Use address bits on a 64-bit architecture to
    encode object type in order to save memory
  • Objects of the same type allocated in a
    contiguous (virtual) region
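
A minimal sketch of the typed-addressing idea, assuming a hypothetical layout in which the top bits of a 64-bit virtual address hold a type ID. The field widths, the `TypedHeap` class, and the `type_of` helper are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical layout (an assumption): the top TYPE_BITS bits of a 64-bit
# address hold the type ID, the remaining bits are an offset in that
# type's contiguous virtual region.
TYPE_BITS = 16
OFFSET_BITS = 64 - TYPE_BITS
OFFSET_MASK = (1 << OFFSET_BITS) - 1


class TypedHeap:
    """Bump allocator with one contiguous virtual region per type."""

    def __init__(self):
        self.next_offset = {}  # type_id -> next free offset in its region

    def alloc(self, type_id, size):
        if not 0 <= type_id < (1 << TYPE_BITS):
            raise ValueError("type_id does not fit in the type field")
        offset = self.next_offset.get(type_id, 0)
        if offset + size > OFFSET_MASK + 1:
            raise MemoryError("region for this type is full")
        self.next_offset[type_id] = offset + size
        return (type_id << OFFSET_BITS) | offset


def type_of(address):
    """Recover the type from the address alone; no object header is read."""
    return address >> OFFSET_BITS
```

Because the type is implied by the address, an object in this sketch needs no per-object type field, which is where the memory saving would come from.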

8
Session 2: Object-Oriented Code Generation and
Optimization
  • Daryl Maier, Pramod Ramarao, Mark Stoodley, Vijay
    Sundaresan (IBM Canada), Experiences with
    Multi-threading and Dynamic Class Loading in a
    Java Just-In-Time Compiler
  • The IBM TestaRossa JIT compiler
  • This paper focuses on code patching and profiling
    in a multi-threaded environment with a lot of
    class loading/unloading
  • Lixin Su, Mikko H. Lipasti (University of
    Wisconsin-Madison), Dynamic Class Hierarchy
    Mutation
  • Run-time reassignment of objects from one derived
    class to another, changing their virtual tables
  • Offers opportunity for optimizations based on
    specialization

9
Session 3: Phase Detection and Profiling
  • Priya Nagpurkar (UCSB), Michael Hind (IBM),
    Chandra Krintz (UCSB), Peter Sweeney, V.T. Rajan
    (IBM), Online Phase Detection Algorithms
  • Detecting phase behaviour in virtual machines
  • Track dynamic program parameters (methods
    invoked, branch directions) over time and apply
    a similarity model
  • Jeremy Lau, Erez Perelman, Brad Calder (UC San
    Diego), Selecting Software Phase Markers with
    Code Structure Analysis
  • Portions of code whose execution correlates with
    phase changes
  • Procedure calls and returns, loop boundaries
  • Profile-based hierarchical loop-call graph

10
Session 3: Phase Detection and Profiling
  • Shashidhar Mysore, Banit Agrawal, Timothy
    Sherwood, Nisheeth Shrivastava, Subhash Suri (UC
    Santa Barbara), Profiling over Adaptive Ranges
  • Voted best paper; details later
  • Hyesoon Kim, Muhammad Aater Suleman, Onur Mutlu,
    Yale N. Patt (UT-Austin), 2D-Profiling: Detecting
    Input-Dependent Branches with a Single Input Data
    Set
  • Predicts whether the prediction accuracy of each
    branch will vary across input sets
  • Heuristic approach used to derive representative
    profiling results from a single input set

11
Session 4: Tiled and Multicore Compilation
  • David Wentzlaff, Anant Agarwal (MIT),
    Constructing Virtual Architectures on a Tiled
    Processor
  • Map components of a superscalar architecture
    (Pentium III) onto a parallel tiled architecture
    (Raw) using dynamic translation
  • In a way, uses Raw as a coarse-grain FPGA
  • Aaron Smith (UT-Austin), J. Burrill (UMass at
    Amherst), J. Gibson, B. Maher, N. Nethercote, B.
    Yoder, D. Burger, K. S. McKinley (UT-Austin),
    Compiling for EDGE Architectures
  • TRIPS EDGE (Explicit Data Graph Execution)
    architecture
  • This paper focuses on compilation of standard C
    and FORTRAN benchmarks

12
Session 4: Tiled and Multicore Compilation
  • Shih-wei Liao, Zhaohui Du, Gansha Wu, Guei-Yuan
    Lueh (Intel), Data and Computation
    Transformations for Brook Streaming Applications
    on Multiprocessors
  • Parallel compiler for the Brook streaming
    language
  • An extension of C that enables specifying data
    parallelism
  • Michael L. Chu, Scott A. Mahlke (University of
    Michigan), Compiler-directed Object Partitioning
    for Multicluster Processors
  • Partitioning of data in clustered architectures
    such as Raw
  • I didn't really understand what programming model
    these authors have in mind

13
Session 5: Static Code Generation
and Optimization Issues
  • Two papers about the HP-UX Itanium compiler
  • Dhruva R. Chakrabarti, Shin-Ming Liu
    (Hewlett-Packard), Inline Analysis Beyond
    Selection Heuristics
  • Cross-module techniques for selection of inlined
    call sites and the choice of specialized function
    versions
  • Robert Hundt, Dhruva R. Chakrabarti, Sandya S.
    Mannarswamy (Hewlett-Packard), Practical
    Structure Layout Optimization and Advice
  • Data layout and placement on the heap to improve
    locality
  • Structure splitting, structure peeling, dead
    field removal, and field reordering

14
Session 5: Static Code Generation
and Optimization Issues
  • Chris Lupo, Kent Wilken (University of
    California, Davis), Post Register Allocation
    Spill Code Optimization
  • Authors propose a profile-based algorithm for
    placement of save/restore instructions handling
    spilled variables in function calls
  • Implemented as a part of GCC
  • Seung Woo Son, Guangyu Chen, Mahmut Kandemir
    (Pennsylvania State University), A
    Compiler-Guided Approach for Reducing Disk Power
    Consumption by Exploiting Disk Access Locality
  • Goal: restructure code so that disk idle periods
    are lengthened
  • The approach targets array-based programs; the
    disk layout of array data is exposed to the compiler

15
Session 6: SIMD Compilation
  • Jianhui Li, Qi Zhang, Shu Xu, Bo Huang (Intel
    China Software Center), Optimizing Dynamic Binary
    Translation for SIMD Instructions
  • Algorithms for dynamic binary translation of SIMD
    instructions in general-purpose architectures
    (such as MMX in x86)
  • Evaluation using IA-32 binaries on Itanium 2
  • Dorit Nuzman (IBM), Richard Henderson (Red Hat),
    Multi-Platform Auto-Vectorization
  • Implementation of automatic vectorizer for GCC
    4.0

16
Session 7: Optimization Space Exploration
  • Felix Agakov, Edwin Bonilla, John Cavazos, Bjoern
    Franke, Grigori Fursin, Michael O'Boyle, Marc
    Toussaint, John Thomson, Chris Williams (U. of
    Edinburgh), Using Machine Learning to Focus
    Iterative Optimization
  • Predictive modelling used to search the
    optimization space
  • Targets embedded platforms: AMD Au1500 and Texas
    Instruments TI C6713
  • Prasad Kulkarni, David Whalley, Gary Tyson
    (Florida State University), Jack Davidson
    (University of Virginia), Exhaustive Optimization
    Phase Order Space Exploration
  • Exhaustive search of the phase order space (15
    phases) using aggressive pruning; takes time on
    the order of minutes to hours
  • Targets StrongARM SA-100
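
A rough sketch of how an exhaustive phase-order search can stay tractable through pruning. The two pruning rules used here (skip a phase that changes nothing, and explore each distinct code version only once) are illustrative assumptions, as are the toy phases and the `explore_phase_orders` name; the slide does not spell out the paper's exact pruning criteria.

```python
def explore_phase_orders(code, phases):
    """Enumerate every distinct code version reachable by applying phases in any order.

    Pruning rule 1: a phase that leaves the code unchanged is not extended further.
    Pruning rule 2: a code version already reached via another ordering is not revisited.
    """
    seen = {code}
    worklist = [code]
    attempted = 0  # number of phase applications actually tried
    while worklist:
        current = worklist.pop()
        for phase in phases:
            attempted += 1
            result = phase(current)
            if result == current or result in seen:
                continue
            seen.add(result)
            worklist.append(result)
    return seen, attempted


# Toy "phases" over a tuple of instruction strings (purely illustrative).
def remove_nops(code):
    return tuple(i for i in code if i != "nop")


def fold_constants(code):
    return tuple("load 3" if i == "add 1 2" else i for i in code)


def remove_duplicate_loads(code):
    out = []
    for i in code:
        if i.startswith("load") and out and out[-1] == i:
            continue
        out.append(i)
    return tuple(out)
```

Starting from `("load 3", "nop", "add 1 2")`, the search visits only five distinct code versions, and the shortest one found is `("load 3",)`.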

17
Session 7: Optimization Space Exploration
  • Zhelong Pan, Rudolf Eigenmann (Purdue
    University), Fast and Effective Orchestration of
    Compiler Optimizations for Automatic Performance
    Tuning
  • Problem: find the optimal combination of 38 GCC
    -O3 options, targeting Pentium IV and Sparc II
  • Proposed heuristic algorithm provides a
    quality solution in time on the order of several
    hours

18
Session 8: Security and Reliability
  • Edson Borin (UNICAMP), Cheng Wang, Youfeng Wu
    (Intel), Guido Araujo (UNICAMP), Software-Based
    Transparent and Comprehensive Control-Flow Error
    Detection
  • Addresses the problem of soft (transient) errors
    that cause branches to incorrect instructions
  • Implemented in SW as a part of a dynamic binary
    translator
  • Tao Zhang, Xiaotong Zhuang, Santosh Pande
    (Georgia Tech), Compiler Optimizations to Reduce
    Security Overheads
  • Optimizations that specifically target techniques
    that implement software protection with minimal
    HW support

19
Session 8: Security and Reliability
  • Susanta Nanda, Wei Li, Tzi-cker Chiueh (State
    University of NY at Stony Brook), BIRD: Binary
    Interpretation using Runtime Disassembly
  • Goal: a framework for automatic detection of
    vulnerabilities such as buffer overflows when the
    source code is not available
  • Static and dynamic disassembly and
    instrumentation; targets Windows x86 applications

20
Keynote Speeches
  • Wei Li, Principal Engineer, Intel: "Parallel
    Programming 2.0"
  • Kevin Stoodley, Fellow and CTO of Compilation
    Technology, IBM: "Productivity and Performance:
    Future Directions in Compilers"

21
Wei Li: Parallel Programming 2.0
  • Major technological change
  • Moore's Law continues to increase transistor
    counts
  • However, power, memory latency, and limits to ILP
    are setting an effective performance ceiling
  • General trend towards thread-level on-chip
    parallelism
  • SMT
  • Chip multiprocessors

22
Wei Li: Parallel Programming 2.0
  • Parallel Programming 2.0 refers to the advent
    of multicores
  • A very optimistic future vision

23
Wei Li: Parallel Programming 2.0
  • Key issue: where will the parallelism come from?
  • Parallel programming needs to become more
    mainstream
  • Consumer vs. HPC/server/database
  • Inclusion into education at more elementary level
  • New tools for greater ease of programming
  • Intel's parallel programming tools
  • http://www.intel.com/software

24
K. Stoodley: "Productivity and Performance:
Future Directions in Compilers"
  • Limits to traditional static compilation
  • Overview of IBM compiler technology
  • Testarossa JIT compiler, Toronto Portable
    Optimizer, Tobey backend
  • Challenges at present and near future
  • Software abstraction complexity forces the
    scope of compilation to higher levels
  • Maintaining high-performance backwards
    compatibility is increasingly difficult

25
K. Stoodley: "Productivity and Performance:
Future Directions in Compilers"
  • Future convergence/combination of dynamic and
    static compilation technologies

26
Best Paper
  • Shashidhar Mysore, Banit Agrawal, Timothy
    Sherwood, Nisheeth Shrivastava, Subhash Suri (UC
    Santa Barbara), Profiling over Adaptive Ranges

27
Profiling over Adaptive Ranges
  • Problem: how to count specific events efficiently
    and accurately?
  • Code segments executed
  • Memory regions accessed
  • IP addresses of routed packets
  • In all cases, impossible to maintain separate
    counters for the entire range of values
  • Each basic block, memory address, IP address

28
Trade-off: Precision vs. Efficiency
Uniform ranges
Unlimited counters
  • Profiling with uniform ranges fails to
    distinguish hot code
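
To illustrate why uniform ranges lose hot spots, here is a minimal sketch that counts events with a fixed number of equal-width ranges. The function name, the address values, and the counter budget are all made up for the example.

```python
def uniform_profile(addresses, lo, hi, num_counters):
    """Count events using num_counters equal-width ranges over [lo, hi)."""
    width = max(1, (hi - lo) // num_counters)
    counters = [0] * num_counters
    for a in addresses:
        # Clamp so that a range size not divisible by num_counters stays in bounds.
        index = min((a - lo) // width, num_counters - 1)
        counters[index] += 1
    return counters
```

With four counters over `[0, 1024)`, 90 hits on address 100 and 10 hits on address 130 land in the same counter, so the profile reports one range with 100 events and cannot say which address inside it is hot.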

29
Higher Precision for Hot Regions
  • Good trade-off with limited resources
  • High precision for hot regions
  • Low precision for colder ones, but this affects
    the accuracy less
  • Challenge: how to determine what exactly to count
    with what precision?

30
Solution: Adaptive Profiling
  • Start with one counter; split counters as they
    become hot
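
The splitting step can be sketched as a tree of range counters. This is a simplified illustration under stated assumptions: it splits each hot range in two and uses a fixed `split_threshold`, and the class names are hypothetical; the paper's own splitting heuristics and branching factor may differ.

```python
class RangeNode:
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi      # half-open range [lo, hi)
        self.count = 0                 # events counted at this node
        self.children = []             # empty for a leaf


class AdaptiveProfile:
    def __init__(self, lo, hi, split_threshold):
        self.root = RangeNode(lo, hi)  # start with a single counter
        self.split_threshold = split_threshold

    def record(self, value):
        node = self.root
        while node.children:           # descend to the leaf covering value
            mid = node.children[1].lo
            node = node.children[0] if value < mid else node.children[1]
        node.count += 1
        # A leaf that has become hot is split; it keeps the events it already saw.
        if node.count >= self.split_threshold and node.hi - node.lo > 1:
            mid = (node.lo + node.hi) // 2
            node.children = [RangeNode(node.lo, mid), RangeNode(mid, node.hi)]

    def leaves(self):
        stack, out = [self.root], []
        while stack:
            node = stack.pop()
            if node.children:
                stack.extend(node.children)
            else:
                out.append((node.lo, node.hi, node.count))
        return sorted(out)
```

Feeding this sketch twenty events at value 5 and one at value 12 over the range `[0, 16)` ends with a width-1 counter for `[5, 6)` while the cold half `[8, 16)` stays a single coarse counter.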

33
Counter Merging
  • Problem: what if program behaviour changes after
    the initialization phase?

35
Counter Merging
  • Solution: perform counter merging along with
    splitting

36
Counter Merging
  • Counters of merged child nodes added to the parent
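
Merging can be sketched as a bottom-up pass that folds cold children back into their parent. The `Node` class, the `merge_threshold` parameter, and the rule "merge when the children together hold fewer events than the threshold" are assumptions for illustration, not the paper's exact heuristic.

```python
class Node:
    def __init__(self, lo, hi, count=0, children=None):
        self.lo, self.hi = lo, hi
        self.count = count
        self.children = children or []


def subtree_count(node):
    """Total events recorded anywhere in this subtree."""
    return node.count + sum(subtree_count(c) for c in node.children)


def merge_cold(node, merge_threshold):
    """Bottom-up pass: collapse leaf children that together hold few events."""
    for child in node.children:
        merge_cold(child, merge_threshold)
    if node.children and all(not c.children for c in node.children):
        cold_total = sum(c.count for c in node.children)
        if cold_total < merge_threshold:
            node.count += cold_total   # child counters are added to the parent
            node.children = []         # the freed counters can be reused
```

The total event count of the tree is unchanged by a merge; only the precision of the cold region drops.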

38
Counter Merging
  • Problem: how to identify nodes for merging?
  • They are by definition the ones that are not
    updated frequently
  • Solution: periodic batched merge operations
  • Tree depth grows at a logarithmic rate, so merging
    can be done at exponentially increasing intervals
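
A small sketch of what "exponentially increasing intervals" could look like in practice, assuming each batched merge pass is triggered when the event count reaches twice the previous trigger point. The doubling factor and the `merge_schedule` name are assumptions for illustration.

```python
def merge_schedule(first_interval, total_events):
    """Event counts at which a batched merge pass runs; each point is twice the previous one."""
    if first_interval <= 0:
        raise ValueError("first_interval must be positive")
    points = []
    next_merge = first_interval
    while next_merge <= total_events:
        points.append(next_merge)
        next_merge *= 2
    return points
```

Under this schedule the number of merge passes grows only logarithmically with the number of profiled events, so the amortized cost of merging per event shrinks as the run gets longer.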

39
Additional Contributions
  • Heuristics for splitting and merging
  • Theoretical analysis of accuracy guarantees
  • Proposal for hardware implementation
  • Experimental evaluation
  • Memory requirements
  • Average and worst-case errors on benchmarks
  • Performance of HW implementation
  • Accuracies on the order of 98.0-99.8% with only
    8-64K of memory

40
Conclusions
  • Highly interesting program
  • My short presentation certainly doesn't do
    justice to most of the mentioned works!
  • Readings to perhaps consider for future CARG
  • D. Wentzlaff, A. Agarwal, Constructing Virtual
    Architectures on a Tiled Processor
  • A. Smith et al., Compiling for EDGE Architectures
  • F. Agakov et al., Using Machine Learning to Focus
    Iterative Optimization
  • (Highly subjective!)