Catching Accurate Profiles in Hardware
ICS 280/259
  • Satish Narayanasamy, Timothy Sherwood, Suleyman
    Sair, Brad Calder, George Varghese

Presented by Jelena Trajkovic

Outline
  • Introduction and Motivation
  • Goal
  • Related Work (Stratified Sampler)
  • Interval-based Profiling for a Single Hash
  • Experimental results
  • Multiple-hash Profiler
  • Experimental results

Introduction and Motivation
  • SW used to gather program behavior information
  • Architectural support for generating profiles at
  • HW is used to assist SW,
  • dependent on system SW (for management or
    aggregation of events)
  • HW-only profiler

Introduction and Motivation (cont.)
  • HW optimizations that can take advantage of info
    gathered at run time
  • Cache replacement and prefetching
  • identifying loads that cause the majority of misses
  • Value-based optimization
  • 50% of memory accesses are dominated by 10
    distinct values
  • capture this dynamically => this information is
    used for storing compressed values in the data cache
  • Trace formation
  • dynamically extracting and ordering frequently
    executed code => more efficient I-fetch
  • Multiple path execution
  • find branches that are hard to predict and
    execute down multiple paths

Goal
  • The goal is to build a profiling scheme that
    satisfies the following properties
  • Area efficient: capacity constraints (fixed
    amount of area)
  • Accurate: identify important / frequent events
    and count them accurately
  • Timely: up-to-date information about program
  • Performance efficient and SW independent:
    independent of system SW support to manage
    profiles (accumulate and analyze events),
    with identification done in HW

Related Work
  • SW profiling
  • Binary instrumentation (ATOM by Calder et al.)
  • HW counter assisted profiling
  • DCPI system for Alpha Processors
  • HW table based profiling
  • Stratified sampling (Sastry et al.)
  • Co-processor profiler
  • Distill information passed from main processor
    (Zilles and Sohi)

Profiling Events
  • Profiling event: combination of several variables
  • instruction PC, load address, register value or
    name, cache miss
  • Tuple: represents an event as a combination of 2
    variables, e.g. <pc, value>

Related Work: Stratified Sampler
  • Divides the original input stream into multiple
    streams via hashing (independently sampled)
  • Table of counters
  • number of occurrences of different events
  • counter is selected by applying hash function on
    the input event
  • incremented when event appears in the input
  • on reaching threshold value, counter is reset
    and event is reported (interrupt to the OS)
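The counter-table behavior listed above can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not the design from the cited work: the class name, the table size, the threshold, and the use of Python's built-in `hash` as the hash function are all hypothetical, and the `reported` list merely stands in for the interrupt sent to the OS.

```python
class StratifiedSampler:
    """Toy model of a hashed counter table that reports events at a threshold."""

    def __init__(self, num_counters=8, threshold=4):
        self.num_counters = num_counters
        self.threshold = threshold
        self.counters = [0] * num_counters
        self.reported = []  # stands in for the interrupt sent to the OS

    def _index(self, event):
        # Hypothetical hash: Python's built-in hash reduced to a table index.
        return hash(event) % self.num_counters

    def observe(self, event):
        # Select a counter by hashing the input event and increment it.
        i = self._index(event)
        self.counters[i] += 1
        # On reaching the threshold, reset the counter and report the event.
        if self.counters[i] >= self.threshold:
            self.counters[i] = 0
            self.reported.append(event)


sampler = StratifiedSampler(num_counters=8, threshold=4)
for _ in range(8):
    sampler.observe((0x400, 7))  # an illustrative <pc, value> tuple
```

Because the counters carry no tags in this sketch, two different events that hash to the same counter are indistinguishable, which is the aliasing problem the following slide addresses.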

Related Work: Stratified Sampler (cont.)
  • To reduce aliasing and improve accuracy
  • Partial tags, miss counters, state information
  • Hit counters: number of occurrences
  • Miss counters: tuple hashes to a particular entry,
    but the tag differs (replacement policy)
  • On reaching threshold value
  • Generate interrupt
  • Buffered: interrupt is sent when buffer fills up
  • Placed in associative counter table, passed to SW
    (via intermediate buffer)
  • Accumulating information in SW (5 interrupt

Interval-based Profiling for a Single Hash
  • Removing SW accumulator table
  • Interval-based
  • significant number of occurrences within
  • reset hash-table counters after every interval
  • improving accuracy - shielding
  • Divide execution time into intervals
  • interval length: fixed number of profiling
    events (tuples)
  • capture only events (candidate tuples) that occur
    more than the candidate threshold (% of interval
Single Hash Architecture
  • accumulator table is fully associative and tagged
  • if (input tuple is in acc. table)
  • increment its counter
  • else
  • hash into hash-table
  • increment corresponding counter
  • hash-table does not contain tags => aliasing
  • if (tuple reaches candidate threshold value)
  • if (acc. table is not full)
  • acc. table entry is allocated
  • mark entry as non-replaceable until the end of
    the interval
  • that entry is no longer given as an input to the
    hash-table => shielding
  • if (end of the interval)
  • flush hash-table
  • mark all entries in acc. table as replaceable
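The steps on this slide can be modeled in software roughly as follows. This is a minimal sketch, not the hardware design itself, and several details are assumptions the slide does not state: Python's `hash` is a placeholder hash function, a new accumulator entry starts from the hash-table counter value, a full accumulator table evicts one replaceable entry if any exists, accumulator counts restart at zero each interval, and a retained entry becomes non-replaceable again once it re-crosses the threshold. All names are hypothetical.

```python
class SingleHashProfiler:
    """Software model of the single-hash, interval-based profiler."""

    def __init__(self, interval_length, threshold_count, table_size, acc_capacity):
        self.interval_length = interval_length
        self.threshold_count = threshold_count
        self.table_size = table_size
        self.acc_capacity = acc_capacity
        self.hash_table = [0] * table_size  # untagged counters, so tuples may alias
        self.acc = {}                       # tuple -> [count, replaceable]
        self.seen = 0
        self.last_profile = {}              # candidates of the last finished interval

    def _index(self, tup):
        return hash(tup) % self.table_size  # placeholder hash (assumption)

    def observe(self, tup):
        entry = self.acc.get(tup)
        if entry is not None:
            # Shielding: tuples held in the acc. table never touch the hash table.
            entry[0] += 1
            if entry[0] >= self.threshold_count:
                entry[1] = False            # assumption: re-protect retained entries
        else:
            i = self._index(tup)
            self.hash_table[i] += 1
            if self.hash_table[i] >= self.threshold_count:
                self._promote(tup, self.hash_table[i])
        self.seen += 1
        if self.seen == self.interval_length:
            self._end_interval()

    def _promote(self, tup, count):
        if len(self.acc) >= self.acc_capacity:
            # Assumption: evict any replaceable entry; otherwise drop the tuple.
            victim = next((t for t, e in self.acc.items() if e[1]), None)
            if victim is None:
                return
            del self.acc[victim]
        self.acc[tup] = [count, False]      # non-replaceable until interval end

    def _end_interval(self):
        self.last_profile = {t: e[0] for t, e in self.acc.items()
                             if e[0] >= self.threshold_count}
        self.hash_table = [0] * self.table_size  # flush hash-table
        for e in self.acc.values():
            e[0], e[1] = 0, True            # keep entries, mark them replaceable
        self.seen = 0
```

Feeding the model one tuple for a whole interval, e.g. 100 observations with `interval_length=100` and `threshold_count=10`, leaves that tuple in `last_profile` with a count of 100 and an all-zero hash table.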

Single Hash Architecture (cont.)
  • Calculate worst case number of entries in the
    acc. table (avoid capacity and aliasing issues)
    as a function of profile interval length and
    candidate threshold
  • Profile interval length: number of events that
    make up a profiling interval
  • Candidate threshold: number of occurrences needed
    to get recorded in the acc. table (percentage of
    interval length)
  • e.g. interval length 10,000
  • candidate threshold 1% => 100 entries
  • 0.1% => 1,000 entries
  • 10,000 w/ 1% and 1 million w/ 0.1%
  • Hash-table: 2K entries
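The worst-case sizing follows from simple counting: if a tuple must occur at least `threshold_count` times in an interval to be recorded, at most `interval_length // threshold_count` distinct tuples can qualify. The short Python check below reproduces the numbers on the slide; the function name is hypothetical, and `Fraction` is used only to keep the percentage arithmetic exact.

```python
import math
from fractions import Fraction


def worst_case_candidates(interval_length, threshold_fraction):
    """Upper bound on tuples that can cross the threshold in one interval."""
    # Occurrences a tuple needs within the interval to become a candidate.
    threshold_count = math.ceil(interval_length * threshold_fraction)
    # Each candidate uses up at least threshold_count of the interval's events.
    return interval_length // threshold_count


one_percent = Fraction(1, 100)
tenth_percent = Fraction(1, 1000)

print(worst_case_candidates(10_000, one_percent))       # 100
print(worst_case_candidates(10_000, tenth_percent))     # 1000
print(worst_case_candidates(1_000_000, tenth_percent))  # 1000
```

When the threshold is an exact fraction of the interval, the bound is just the reciprocal of that fraction, so it does not grow with the interval length.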

Single Hash Architecture (cont.)
  • Hash functions for a given tuple <pc, value>
  • npc = flip(randomize(pc))
  • nv = randomize(value)
  • index = xor-fold(npc xor nv, index-size)
  • Optimizations
  • Retaining: keeps top entries in acc. table from
    the previous interval
  • Resetting: resets the counter in the hash-table after
    it reaches the candidate threshold
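The index computation can be sketched as below. The slide names `randomize`, `flip`, and `xor-fold` but does not define them, so their bodies here are assumptions: `randomize` is modeled as a per-byte lookup into seeded random tables, `flip` as a byte-order reversal, and `xor_fold` as XORing successive `index_size`-bit chunks. Only the way the three are composed comes from the slide.

```python
import random

MASK32 = 0xFFFFFFFF
_rng = random.Random(0)  # fixed seed so the sketch is reproducible
# One table of 256 random 32-bit words per input byte position (assumption).
_TABLES = [[_rng.getrandbits(32) for _ in range(256)] for _ in range(4)]


def randomize(x):
    """Assumed scrambler: XOR together one random word per input byte."""
    x &= MASK32
    out = 0
    for pos in range(4):
        out ^= _TABLES[pos][(x >> (8 * pos)) & 0xFF]
    return out


def flip(x):
    """Assumed meaning of flip: reverse the order of the four bytes."""
    return int.from_bytes((x & MASK32).to_bytes(4, "little"), "big")


def xor_fold(x, index_size):
    """Fold a word down to index_size bits by XORing successive chunks."""
    mask = (1 << index_size) - 1
    out = 0
    while x:
        out ^= x & mask
        x >>= index_size
    return out


def tuple_index(pc, value, index_size):
    """index = xor-fold(npc xor nv, index-size), as composed on the slide."""
    npc = flip(randomize(pc))
    nv = randomize(value)
    return xor_fold(npc ^ nv, index_size)


# A 2K-entry hash table needs an 11-bit index.
idx = tuple_index(0x120001F40, 42, 11)
```

Applying `flip` to only one of the two inputs means a tuple with equal `pc` and `value` does not cancel to zero under the XOR, which is one plausible reason for including it.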

Experimental setup
  • SPEC95: go, li, vortex; SPEC2K: gcc, vortex;
    deltablue, sis, burg
  • Compilation:
  • DEC Alpha 21164, DEC C (full optimizations)
  • Profiling analysis: ATOM
  • Fast forwarded and then ran for 500 million

Error Calculation
  • For each interval compare candidates seen by HW
    profiler and perfect profiler
  • False Positive
  • False Negative
  • Neutral Positive
  • Neutral Negative
  • Total error rate for an interval

Experimental Results
  • Accuracy of HW profiling depends on
  • number of unique tuples in an interval (distinct
  • number of unique tuples that cross threshold
  • Analysis of candidate tuples

Number of distinct tuples seen in an interval
on average
  • Number of unique candidate tuples in an interval
    on average

  • Percentage of variation of candidates from
    one interval to the next

Error rates
  • Single Hash table with retaining/resetting
    results across a set of benchmarks

Multiple-hash Profiler
  • Independent hash functions (for each table)
  • if(no entry in acc. table)
  • hash to each table
  • update each counter
  • if(all entries for particular tuple in hash table
    reach candidate threshold)
  • add entry to the acc. table
  • reset counters in hash-table (immediately or at
    the end of interval)
  • Conservative update: update just the smallest counter
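A rough software model of the multi-hash scheme with conservative update is given below. It is a sketch under assumptions rather than the presented hardware: salting Python's `hash` with the table number stands in for independent hash functions, ties for the smallest counter are all incremented, a promoted tuple starts in the accumulator table with the minimum counter value, and counters are reset only at the end of the interval. Class and method names are hypothetical.

```python
class MultiHashProfiler:
    """Software model of a multi-hash profiler with conservative update."""

    def __init__(self, num_tables, table_size, threshold_count, conservative=True):
        self.tables = [[0] * table_size for _ in range(num_tables)]
        self.table_size = table_size
        self.threshold_count = threshold_count
        self.conservative = conservative
        self.acc = {}  # accumulator table: tuple -> count

    def _indices(self, tup):
        # Salting with the table number stands in for independent hardware
        # hash functions (assumption).
        return [hash((t, tup)) % self.table_size for t in range(len(self.tables))]

    def observe(self, tup):
        if tup in self.acc:
            self.acc[tup] += 1  # already a candidate: count it directly
            return
        idx = self._indices(tup)
        values = [table[i] for table, i in zip(self.tables, idx)]
        lowest = min(values)
        for table, i, v in zip(self.tables, idx, values):
            # Conservative update: only the smallest counter(s) are incremented.
            if not self.conservative or v == lowest:
                table[i] += 1
        # After the update the smallest counter equals lowest + 1, so every
        # counter for this tuple has reached the threshold iff that value has.
        if lowest + 1 >= self.threshold_count:
            self.acc[tup] = lowest + 1

    def end_interval(self):
        """Return this interval's candidates and reset all state."""
        profile = dict(self.acc)
        self.acc.clear()
        for table in self.tables:
            table[:] = [0] * self.table_size
        return profile
```

A tuple is promoted only when its counter in every table is high, so a false positive requires aliasing in all tables at once. Conservative update also avoids inflating counters that are already larger than the smallest one.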

  • Multi-hash profiler for an interval of 10,000, 1%
    candidate threshold, and a total number of 2K
    hash-table entries

Multi-hash profiler for an interval of 1 million,
0.1% candidate threshold, and a total number of
hash-table entries of 2K
  • Varying number of hash tables for the best
    multi-hash profiler - C1, R0 (w/ conservative
    update and w/o resetting) (10,000, 1%: left;
    1 million, 0.1%: right)

Variation in the error across different intervals
(BSH w/ resetting: left; multi-hash w/ conservative
update and no resetting, 4 hash tables: right)
  • Profiling architecture
  • Efficiently filters out important data
  • Efficient in terms of HW cost (6KB (1KB or 10
    KB)) and overhead (no performance overhead)