Cycle Accurate Performance Measurement

Provided by: JohnLo95
Learn more at: http://www.arl.wustl.edu
Title: Cycle Accurate Performance Measurement


1
Cycle Accurate Performance Measurement
  • Richard Hough
  • Phillip Jones, Scott Friedman, Roger Chamberlain,
    Jason Fritts, John Lockwood, and Ron Cytron
  • rh3_at_wustl.edu
  • http://liquid.arl.wustl.edu/

Funded by NSF Grant ITR-0313203
2
Outline
  • Introduction
  • Motivation
  • Background
  • Architecture
  • Usage
  • Results
  • Future Work
  • Related Work
  • Conclusion

3
Introduction: What Are We Doing?
  • Creating a module for capturing cycle-accurate
    profiles of hardware events during the runtime of
    programs on real systems

4
Introduction: What Are We Doing?
  • Creating a module for capturing cycle-accurate
    profiles of hardware events during the runtime of
    programs on real systems

Statistics Module
8
Background - FPX
  • Designed and implemented on the FPX platform
  • The FPX platform:
      • Is designed for developing pluggable network
        circuits
      • Contains a Virtex 2000E FPGA for design
        deployment
      • Possesses a smaller FPGA used as a network
        interface device
      • Can potentially operate at gigabit line rates

9
Background - LEON2
  • Developed by Gaisler Research
  • SPARC V8
  • Open-source VHDL
  • Widely used
      • European Space Agency, etc.
      • Second in popularity only to the MicroBlaze

10
Motivation: Why Not Use Software?
  • Software profiling is:
      • Inaccurate
          • Many data points estimated
          • Time slices not absolute
          • Profiling affects results
      • Inefficient
          • Unreasonable for real-system deployment
      • Ineffective
          • Difficult to separate OS overhead

11
Motivation: Why Not Use Simulation?
  • Simulation is:
      • Slow
          • A simple simulation could require 100X more
            time than running the program
      • Bound by the quality of the model
          • The model used may be inaccurate
          • Processors are often tweaked without updating
            the documentation [Larus]

12
Motivation: Why Use FPGAs?
  • ASICs are expensive
  • FPGAs provide a good blend of cost and accuracy
  • Software simulation of processors is incredibly
    slow
  • FPGAs allow for easy prototyping
      • Test new caching methods, tweak the ISA, etc.

13
Motivation: Why Put the StatsMod in an FPGA?
  • The Statistics Module allows you to:
      • Pull event signals from anywhere
      • Evaluate both software and hardware optimizations
          • Tweak the architecture
          • Integrate hardware-accelerated modules into
            software solutions
          • Adjust the software algorithm
      • Gather repeatable and reliable results

14
Architecture: Naïve Solution
  • Interested in 10 events and 10 methods
  • Naïve solution implements a counter for each
    possibility
      • 100 counters!
  • Not scalable for large systems
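
The 100-counter figure is consistent with one dedicated counter for every pairing of 10 events with 10 monitored code regions. That second factor of 10 is an assumption made here, as is the function name below; the sketch only shows how quickly the naïve count grows.

```python
def naive_counter_count(num_events, num_regions):
    """Counters needed when every (event, region) pair gets its own counter.

    Both parameters are illustrative; "region" stands for whatever unit of
    code (method, address range) the events are attributed to.
    """
    return num_events * num_regions


# Growth is multiplicative: doubling both dimensions quadruples the counters.
for n in (10, 20, 40):
    print(f"{n} events x {n} regions -> {naive_counter_count(n, n)} counters")
```

Each of those counters would occupy FPGA logic whether or not it is ever used in a given run, which is the scalability problem the slide points at.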

15
Architecture: Our Solution
  • Better approach:
      • Associate counters with events and methods at
        run time
      • Covers the problem area, but uses less chip space
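
One way to picture run-time association is a small pool of counters, each bound to an event and an address range before the program starts. The following is a software model under assumed names (`StatsModuleModel`, `bind`, `clock`, the event strings, and the addresses are all hypothetical); it is not the module's actual hardware interface.

```python
from dataclasses import dataclass


@dataclass
class CounterSlot:
    event: str      # name of the event signal this slot watches
    lo: int         # inclusive start of the address range
    hi: int         # inclusive end of the address range
    count: int = 0  # number of cycles in which the event fired in range


class StatsModuleModel:
    """Toy model: a fixed pool of counters bound to (event, range) at run time."""

    def __init__(self, num_slots):
        self.num_slots = num_slots
        self.slots = []

    def bind(self, event, lo, hi):
        """Claim a free counter for one event within one address range."""
        if len(self.slots) >= self.num_slots:
            raise RuntimeError("no free counter slots")
        self.slots.append(CounterSlot(event, lo, hi))
        return len(self.slots) - 1

    def clock(self, pc, active_events):
        """Model one cycle: bump every slot whose event fired while pc is in range."""
        for slot in self.slots:
            if slot.event in active_events and slot.lo <= pc <= slot.hi:
                slot.count += 1


mod = StatsModuleModel(num_slots=4)
cycles = mod.bind("cycle", 0x1000, 0x10FF)
misses = mod.bind("dcache_miss", 0x1000, 0x10FF)

# Hypothetical trace of (program counter, events asserted in that cycle).
trace = [(0x1000, {"cycle"}), (0x1004, {"cycle", "dcache_miss"}), (0x2000, {"cycle"})]
for pc, events in trace:
    mod.clock(pc, events)

print(mod.slots[cycles].count, mod.slots[misses].count)  # 2 1
```

Only the bindings of interest consume a counter, so the pool can stay small even when the number of possible event and range combinations is large.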

16
Architecture: An In-Depth Look
17
Architecture: Scalability
[Figure: comparison with the naïve approach; labels: Address Range Registers, Counters, Events]
18
Usage
19
Results: What do we get?
  • The next few slides contain data from the Linpack
    benchmark running on the FPGA
  • Linpack is an FPU-intensive benchmark
  • While the following slides focus on runtime, it
    is important to remember that the graphs could in
    principle be of any event

20
Results
323,686,726 clock cycles
21
Results
22
Results
23
Results
24
Future Work: Where can we go?
  • As of a week ago, the StatsMod was successfully
    integrated into a Linux 2.6.11 OS running on LEON
  • Changes have been made to allow a clear
    separation between process IDs
      • OS, background tasks, threads
  • A device driver allows any program, including the
    program being profiled, to gather the statistics

25
Future Work: Where can we go?
  • Programs could now potentially collect statistics
    on themselves and perform runtime introspection
      • Adjust operation to conserve power, memory
        accesses, etc.
  • Deeper integration could occur at the kernel
    level to affect scheduler decisions
      • Adds a new dimension for slicing resources
      • Network activity, device activity, page faults,
        etc.

26
Related Work
  • SnoopP
      • Developed by Lesley Shannon and Paul Chow at the
        University of Toronto
      • Collects timing characteristics of programs
        running on a MicroBlaze processor
      • Focuses on clock cycles only
      • Integrated into the EDK

27
Conclusion
  • In closing, I would like to thank:
      • Phillip Jones for his hard work and support
      • Ron Cytron for his mentoring and persistence
      • Scott Friedman for his work on the web interface
      • The rest of the Liquid Architecture team
      • And WISA for the invitation to present

28
Questions?
29
Background - Liquid
30
Usage
  • Connect to a secure web server controlling the
    FPGA hardware
  • Upload the desired binary executable, associated
    map file, and desired programming bitfile
  • A Perl script parses the map file and provides a
    graphical interface for selecting the desired
    address ranges and events
  • Statistics are tabulated at the end of the
    program's execution
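
The map-file step can be sketched as follows. The slides do not give the map file's format, so this assumes an `nm`-style listing (address, symbol type, name) and uses Python rather than the Perl script mentioned above; the function name, addresses, and symbols are hypothetical.

```python
def parse_symbol_ranges(map_text):
    """Return {symbol: (start, end)} for code symbols in an nm-style listing.

    Each symbol's range runs up to the byte before the next code symbol, so
    the highest-addressed symbol has no known end and is left out.
    """
    symbols = []
    for line in map_text.splitlines():
        parts = line.split()
        # Keep only text-section (code) symbols: type "T" or "t".
        if len(parts) == 3 and parts[1] in ("T", "t"):
            symbols.append((int(parts[0], 16), parts[2]))
    symbols.sort()

    ranges = {}
    for (start, name), (next_start, _) in zip(symbols, symbols[1:]):
        ranges[name] = (start, next_start - 1)
    return ranges


# Hypothetical listing; the addresses are made up for illustration.
example = """\
40001000 T main
40001080 T dgefa
40001200 T dgesl
40001300 T end_marker
40002000 D some_data
"""

ranges = parse_symbol_ranges(example)
for name, (start, end) in ranges.items():
    print(f"{name}: 0x{start:08x}-0x{end:08x}")
```

A table like this is what a selection interface would need in order to offer address ranges by function name instead of by raw address.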