Title: Update on HP Caliper, the Performance Tool for Itanium HPUX and Linux Systems
1. Update on HP Caliper, the Performance Tool for Itanium HP-UX and Linux Systems
- September 2006
- Speaker: Stephen Williams
- Caliper Development Team
- Hewlett-Packard
2. Previous webcasts
- An introduction to HP Caliper, what it is, and how to use it. Webcast September 9, 2003
  - Slides: http://h21007.www2.hp.com/dspp/files/unprotected/caliper/HPCaliper090903_ppt.ppt
- An update on HP Caliper for HP-UX and Linux Itanium. Webcast September 21, 2004
  - Slides: http://h21007.www2.hp.com/dspp/files/unprotected/caliper/Caliper36_092104.ppt
- Yet more HP Caliper: an update on the Itanium HP-UX and Linux Performance Tool. Webcast September 20, 2005
  - Slides: http://h21007.www2.hp.com/dspp/files/unprotected/caliper/Caliper050920.ppt
3. Agenda
- Quick overview of HP Caliper
- New features in HP Caliper 3.9, 4.0, and 4.1
- Future directions
- Hints and tips
- Summary
- DSPP information
- Q&A
4. What is HP Caliper?
- Per-process or system-wide performance measurement tool, for any Itanium/Itanium 2 native applications
- For both HP-UX and Linux Integrity servers
- Swiss army knife
  - Many different measurements
  - Common user interface and options
  - Multiple report formats: text, CSV, HTML
  - Graphical user interface (new at 4.0)
- Uses Performance Monitor Unit (PMU) hardware and dynamic instrumentation as needed
5. Example command lines
caliper measurement options application app-opts
caliper measurement options PID1 PID2
caliper measurement options -w
Examples:
caliper fprof --html dir_name sweep3d
caliper dcache -t -p all cc himom.c
caliper cpu -w -o out.txt --dur 10
caliper scgprof -p myproc myscript.sh
caliper icache -o out.txt 8451 8452 8453
6. Measurements
Used for: What? Where? Details? (instrumented)
- Overview: cpu, ecount
- Profiles: alat, branch, dcache, dtlb, fprof, icache, itlb, cycles
- Traces: pmu_trace
- Call graph: scgprof, cgprof
- Coverage: fcover
- Counts: acount, fcount
not in Linux version
7. New features since HP Caliper 3.9
- Improved command line usability
- Quick Start reference card
- Improved reports for multi-process applications
- New cycles measurement (dual-core Itanium 2 only)
- Richer sets of PMU events (dual-core Itanium 2 only)
- System-wide measurements
- Graphical user interface
8. Improved command line usability
- scgprof is now the default measurement
  - caliper myprog: collect scgprof data on myprog
- -a no longer required for attaching to processes
  - caliper 1234: collect scgprof data on process 1234
- Re-reporting of last recorded data is simple
  - caliper report options
- Reporting from an HP Caliper database simplified
  - caliper mydb.db
- New default: report down to source level but not instruction level (use -r all to get disassembly)
- New default: --process all (-p all)
9. Improved command line usability (short options)
More short options added. Here is the complete list:
Short Form                        Long Form
-d                                --database
-e (for elapsed time)             --duration
-f                                --options-file
-H (long form help)               --help
-m                                --metrics
-o                                --output-file
-p                                --process
-r                                --report-details
-s                                --sampling-spec
-t                                --threads all
-v                                --version
-w                                --scope system,attr_mod
-h or -? (short form help)        no equivalent
10. Improved command line usability (short measurement names)
Measurement names have been shortened:
New Name    Old Name
alat        alat_miss
acount      arc_count
branch      branch_prediction
cpu         cpu_metrics
dcache      dcache_miss
dtlb        dtlb_miss
fcount      func_count
fcover      func_cover
icache      icache_miss
itlb        itlb_miss
ecount      total_cpu
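Existing scripts that still use the old measurement names can be updated mechanically. The following is a minimal Python sketch, not part of HP Caliper: the helper name `modernize` is hypothetical, and it assumes the measurement name is the first argument after `caliper`, as in the command-line form shown on slide 5.

```python
# Old-to-new measurement names, copied from the table on this slide.
OLD_TO_NEW = {
    "alat_miss": "alat",
    "arc_count": "acount",
    "branch_prediction": "branch",
    "cpu_metrics": "cpu",
    "dcache_miss": "dcache",
    "dtlb_miss": "dtlb",
    "func_count": "fcount",
    "func_cover": "fcover",
    "icache_miss": "icache",
    "itlb_miss": "itlb",
    "total_cpu": "ecount",
}


def modernize(argv):
    """Return a copy of argv with an old measurement name replaced.

    Hypothetical helper: assumes argv[0] is "caliper" and argv[1] is the
    measurement name. Anything else is returned unchanged.
    """
    if len(argv) >= 2 and argv[0] == "caliper":
        return [argv[0], OLD_TO_NEW.get(argv[1], argv[1])] + list(argv[2:])
    return list(argv)


print(modernize(["caliper", "dcache_miss", "-o", "out.txt", "myprog"]))
```

Names that are already short, or that are not in the table (for example fprof), pass through untouched.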
11. Improved command line usability (simplified merge and diff syntax)
- --join deprecated. Instead, use
  - caliper merge -o out.txt db1 db2 . . .
  - caliper diff -o out.txt db1 db2
- Note that you can merge per-process data in a single database
  - caliper merge -o out.txt mydb
12. Quick Start reference card
http://h21007.www2.hp.com/dspp/files/unprotected/caliper/caliper-quick-start.pdf
13. Quick Start reference card (back side)
14. Improved reports for multi-process applications
- Caliper can now report
  - Across-process CPU events
  - Histograms of processes and associated metrics
    - caliper report -o out.txt mydb
  - Histograms of executables and associated metrics
    - caliper merge -o out.txt mydb
- Use --process-cutoff to change the number of processes or executables reported in the process or executable histogram.
15. Improved reports for multi-process applications (cont.)
- Example of a merged process (executable) summary
Process Summary
-------------------------------------------
% Total    Cumulat
   IP       % of        IP
Samples     Total     Samples   Process
-------------------------------------------
  67.86     67.86       1797    be (1 instances)
  20.17     88.03        534    ecom (1 instances)
   5.25     93.28        139    u2comp (1 instances)
   4.83     98.11        128    ld (1 instances)
   0.72     98.83         19    sh (4 instances)
-------------------------------------------
Minimum process entries: 5, percent cutoff: 2.00, cumulative percent cutoff: 100.00
-------------------------------------------
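The percent and cumulative-percent columns in the summary above follow directly from the per-process sample counts. Below is a minimal Python sketch of that arithmetic. It is not Caliper code. The total of 2648 IP samples is an assumption inferred from the printed percentages (the slide does not state it), and `process_summary` is a hypothetical helper name.

```python
def process_summary(rows, total_samples):
    """Return (percent, cumulative percent, samples, name) per process."""
    out = []
    running = 0
    for name, samples in rows:
        running += samples
        out.append((
            round(100.0 * samples / total_samples, 2),
            round(100.0 * running / total_samples, 2),
            samples,
            name,
        ))
    return out


# Sample counts copied from the slide, hottest process first.
ROWS = [("be", 1797), ("ecom", 534), ("u2comp", 139), ("ld", 128), ("sh", 19)]

# Assumption: total IP samples across all processes, inferred from the
# slide's percentages (for example 1797 / 2648 rounds to 67.86 percent).
TOTAL_SAMPLES = 2648

for pct, cum, samples, name in process_summary(ROWS, TOTAL_SAMPLES):
    print(f"{pct:7.2f} {cum:9.2f} {samples:10d}    {name}")
```

The five listed processes account for 2617 of the assumed 2648 samples, which is why the cumulative column stops at 98.83 rather than 100.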
16. New measurement: cycles
- On dual-core Itanium 2 systems, HP Caliper can now report average cycles per bundle
  - caliper cycles -o out.txt -r all myprog
- Resulting report resembles an fprof report (showing IP sample hits), but provides the following additional information at disassembly level
  - Average cycles used to retire bundles. (With no stalls, a bundle should be retired in one cycle.)
  - Instructions that were split issued (i.e., instructions not issued at the same time as the instruction that precedes them).
17. Richer PMU events sets
- On dual-core Itanium 2 systems, HP Caliper now reports many more PMU events (and derivations) in one run. An example from an IP Sample (fprof) report:
Metrics Summed for Entire Run
--------------------------------------------------------
                          PLM
Event Name                U..K  TH  AC  AT      Count
--------------------------------------------------------
BE_L1D_FPU_BUBBLE.ALL     x___   0   T   F     175989
BE_RSE_BUBBLE.ALL         x___   0   T   F       3250
BE_FLUSH_BUBBLE.ALL       x___   0   T   F      33615
BACK_END_BUBBLE.FE        x___   0   F   F    1208011
CPU_OP_CYCLES.ALL         x___   0   T   F  752736219
BE_EXE_BUBBLE.ALL         x___   0   F   F     209463
BE_L1D_FPU_BUBBLE.L1D     x___   0   T   F     175989
BE_EXE_BUBBLE.GRALL       x___   0   F   F     199727
BE_EXE_BUBBLE.FRALL       x___   0   F   F       8014
BE_EXE_BUBBLE.GRGR        x___   0   F   F         67
CPU_CPL_CHANGES.ALL       x___   0   F   F       1731
--------------------------------------------------------
18. Richer PMU events sets (cont.)
Summary of derived metrics:
- % Unstalled execution (higher is better): 47.44
- % of Cycles lost due to Front end stalls (lower is better): 6.43
  stalls due to ICACHE, ITLB and branch execution
- % of Cycles lost due to Pipeline flush stalls (lower is better): 9.23
  stalls due to branch misprediction or interruption flush
- % of Cycles lost due to data access stalls (lower is better): 33.23
  stalls due to DCACHE and DTLB (includes FR/FR stalls)
- % of Cycles lost due to RSE stalls (lower is better): 1.45
  stalls due to RSE spilling/filling registers to/from memory
- % of Cycles lost due to Scoreboard stalls (lower is better): 2.22
  stalls due to FPU and register dependency (excludes FR/FR stalls)
- Number of privilege level changes to/from all privileges: 73385
  CPU_CPL_CHANGES.ALL
How each metric is derived:
- % of Cycles lost due to Front end stalls: 6.43
  = 100 * (BACK_END_BUBBLE.FE / CPU_OP_CYCLES.ALL)
- % of Cycles lost due to Pipeline flush stalls: 9.23
  = 100 * (BE_FLUSH_BUBBLE.ALL / CPU_OP_CYCLES.ALL)
- % of Cycles lost due to data access stalls (includes FR/FR stalls): 33.23
  = register load stalls (includes FR/FR) + stalls due to L1D
- % of Cycles lost due to RSE stalls: 1.45
  = 100 * (BE_RSE_BUBBLE.ALL / CPU_OP_CYCLES.ALL)
- % of Cycles lost due to Scoreboard stalls (excludes FR/FR stalls): 2.22
  = stalls due to FPU + register dependency stalls
- % of Cycles lost due to register load stalls (includes FR/FR stalls): 26.81
  = GR/load dependency stalls + FR/load or FR/FR dependency stalls
- % of Cycles lost due to FR/load or FR/FR dependency stalls: 0.20
  = 100 * BE_EXE_BUBBLE.FRALL / CPU_OP_CYCLES.ALL
- % of Cycles lost due to GR/load dependency stalls: 26.61
  = 100 * (BE_EXE_BUBBLE.GRALL - BE_EXE_BUBBLE.GRGR) / CPU_OP_CYCLES.ALL
- % of Cycles lost due to stalls in L1D cache and L1/L2 DTLB: 6.42
  = 100 * (BE_L1D_FPU_BUBBLE.L1D / CPU_OP_CYCLES.ALL)
- % of Cycles lost due to register dependency stalls (excludes FR/FR stalls): 2.22
  = (100 * BE_EXE_BUBBLE.ALL / CPU_OP_CYCLES.ALL) - register load stalls
- % of Cycles lost due to GR/GR dependency stalls: 2.14
  = 100 * BE_EXE_BUBBLE.GRGR / CPU_OP_CYCLES.ALL
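The derivations on this slide can be applied to raw event counts by hand. The Python sketch below does so with the counts listed on slide 17. It is an illustration of the arithmetic only, not Caliper's implementation: the function name `derived_metrics` and the dictionary keys are hypothetical, and the additive compositions (register load stalls as GR/load plus FR/load, data access stalls as register load plus L1D) are assumed from the way the slide's numbers add up (26.61 + 0.20 = 26.81 and 26.81 + 6.42 = 33.23). The slide 17 counts appear to come from a different run than the percentages on this slide (CPU_CPL_CHANGES.ALL is 1731 there and 73385 here), so the results will not reproduce the values printed above.

```python
# Event counts copied from the "Metrics Summed for Entire Run" table (slide 17).
COUNTS = {
    "BE_RSE_BUBBLE.ALL": 3250,
    "BE_FLUSH_BUBBLE.ALL": 33615,
    "BACK_END_BUBBLE.FE": 1208011,
    "CPU_OP_CYCLES.ALL": 752736219,
    "BE_EXE_BUBBLE.ALL": 209463,
    "BE_L1D_FPU_BUBBLE.L1D": 175989,
    "BE_EXE_BUBBLE.GRALL": 199727,
    "BE_EXE_BUBBLE.FRALL": 8014,
    "BE_EXE_BUBBLE.GRGR": 67,
}


def derived_metrics(c):
    """Percent of cycles lost per stall category, following slide 18."""
    cycles = c["CPU_OP_CYCLES.ALL"]

    def pct(count):
        return 100.0 * count / cycles

    gr_load = pct(c["BE_EXE_BUBBLE.GRALL"] - c["BE_EXE_BUBBLE.GRGR"])
    fr_load = pct(c["BE_EXE_BUBBLE.FRALL"])
    l1d = pct(c["BE_L1D_FPU_BUBBLE.L1D"])
    # Assumed composition: the slide's numbers are consistent with sums here.
    register_load = gr_load + fr_load
    return {
        "front_end": pct(c["BACK_END_BUBBLE.FE"]),
        "pipeline_flush": pct(c["BE_FLUSH_BUBBLE.ALL"]),
        "rse": pct(c["BE_RSE_BUBBLE.ALL"]),
        "gr_load": gr_load,
        "fr_load": fr_load,
        "gr_gr": pct(c["BE_EXE_BUBBLE.GRGR"]),
        "l1d": l1d,
        "register_load": register_load,
        "register_dependency": pct(c["BE_EXE_BUBBLE.ALL"]) - register_load,
        "data_access": register_load + l1d,
    }


for name, value in derived_metrics(COUNTS).items():
    print(f"{name:20s} {value:8.4f} % of cycles")
```

Every metric is normalized by CPU_OP_CYCLES.ALL, so the categories can be compared directly against each other and against the unstalled share.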
19. System-wide measurements
- Most measurements can now be made system-wide: across all processes and CPUs in both user and kernel space.
- Three levels of sample attribution
  - --scope system,attr-mod|attr-proc|attr-none
  - -w equivalent to --scope system,attr-mod
- PLM: --event-defaults user|kernel|all
- Sample command (collect IP samples in both kernel and user space for 20 seconds)
  - caliper fprof -o o.txt --ev all -w -e 20
20. System-wide measurements (cont.)
- Limitations on HP-UX
  - You must be logged in as the root user
  - Caliper may not be able to locate some executables and shared libraries, resulting in many unattributed samples. Workaround: use --module-search-path
- Limitations on Linux
  - You cannot exclude idle time and the caliper process (though we hope to provide this feature in the future).
- Limitations on both HP-UX and Linux
  - While caliper runs in system-wide mode, no other caliper process can be run on the same system.
21. New graphical user interface
- An Eclipse RCP application
- Makes it easy to
  - Perform measurement collections
  - Browse Caliper databases
  - See measurement data, with easy drill down
- Can be run on a remote Integrity server, with display shown on your desktop X server (not recommended on wide-area network) via
  - caliper -g
- Can be run locally on a Windows or Linux x86-based system (local GUI client communicates with Caliper server via ssh or rexec)
22. New graphical user interface (Projects view and Collect view)
Saved collection setup
Start process, System wide, Attach process
Previously collected data
Start data collection
Required fields and tabs in red
Only applicable collection tabs enabled
23. New graphical user interface (Measurement tab of Collect view)
Data cache misses selected
Stop data collection
Collection in progress
24. New graphical user interface (viewing data)
Analyze view
Saved collection specification
Process tree tab opened
Available data sets
Application output
25. New graphical user interface (CPU event counts)
Show data for entire application
Show CPU events tab
26. New graphical user interface (metrics derived from CPU events)
CPU events tab scrolled to show derived metrics
27. New graphical user interface (histogram viewer)
Maximize or minimize by double-clicking Analyze view tab
Hottest process (double-click to drill down)
Overview of entire histogram
Percent of application's total misses in process be
28. New graphical user interface (drill down to functions)
Use stacking bars
Popups for long function names
Show local percents (percent of total for be)
DagNodedagConstMarkPredArc(DagNode , DagNode , Dag)
Area viewed in table highlighted in Overview
Previous levels visited
29. New graphical user interface (drill down to disassembly)
Show
Source
Source/disasm
Sorted by address
Disassembly
Click to show hotspots in table
30. New graphical user interface (sorting)
Sort bundles by misses
31. New graphical user interface (call graph viewer)
Multiple Analyze views allowed
Callees visited
Current function
Callers
Callees
32. Future directions
- Expected new features at HP Caliper 4.2 (January 07)
  - Load module-centric reports (e.g., across-process profile of libc.so)
  - Call stack profiling (with wall-clock sampling)
  - Bucketing of data cache miss latencies (to help ascertain cache levels accessed)
  - Trap profiling
  - Merge/diff capability in graphical user interface
  - Caliper Advisor integrated with graphical user interface
- Features beyond HP Caliper 4.2
  - Caliper Advisor cheatsheets in graphical user interface
  - Data-centric cache miss reports
  - Integration with Ktrace/Kprofile
  - More data visualization aids in graphical user interface
  - Per-CPU/per-thread CPU metrics
33. Load modules as top level (v4.2)
View load modules as top level
34. Call-stack profile (v4.2)
Graph hot call paths by running time, blocked time, or both
35. CPU metrics overview (v4.2)
Overview of metrics collected by cpu measurement (default metrics)
36. Call-stack samples display (potential future display)
Overview of running and stopped threads
Sample cursor (drag to any point)
Call stacks at sample 754
Playback controls
37. Data-centric cache miss profile display (potential future display)
Double-click row to see function's disassembly
Double-click row (below) to view instruction addresses (above)
Double-click row (below) to view data addresses (above)
38. 3D histograms (potential future display)
Figure from CxPerf User's Guide
39. Hints and tips: caliper command
- Getting CPU event names from caliper
  - Dump all event names and descriptions
    - caliper info all
  - List all event names (no other fields)
    - caliper info all -d name
  - List names of all events containing string L3
    - caliper info L3 -d name
  - Or, use an ambiguous event name
    - caliper ecount --metric L3_READ myprog
    - HP Caliper usage error: Ambiguous event name ("L3_READ") specified for "--metrics". Matches L3_READS.ALL.ALL, L3_READS.ALL.HIT, L3_READS.ALL.MISS, L3_READS.DATA_READ.ALL, L3_READS.DATA_READ.HIT, L3_READS.DATA_READ.MISS, L3_READS.DINST_FETCH.ALL, L3_READS.DINST_FETCH.HIT, L3_READS.DINST_FETCH.MISS, L3_READS.INST_FETCH.ALL, L3_READS.INST_FETCH.HIT, L3_READS.INST_FETCH.MISS.
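The ambiguous-name trick works because a partial event name matches every event whose name contains it. The Python sketch below mimics that lookup over the twelve L3_READS events listed in the error message. It is only an illustration: `match_events` and `resolve_event` are hypothetical names, and plain substring matching is an assumption about how Caliper compares names, based on the slide's wording ("events containing string L3").

```python
# Event names copied from the usage-error message on this slide.
L3_READS_EVENTS = [
    "L3_READS.ALL.ALL", "L3_READS.ALL.HIT", "L3_READS.ALL.MISS",
    "L3_READS.DATA_READ.ALL", "L3_READS.DATA_READ.HIT",
    "L3_READS.DATA_READ.MISS",
    "L3_READS.DINST_FETCH.ALL", "L3_READS.DINST_FETCH.HIT",
    "L3_READS.DINST_FETCH.MISS",
    "L3_READS.INST_FETCH.ALL", "L3_READS.INST_FETCH.HIT",
    "L3_READS.INST_FETCH.MISS",
]


def match_events(pattern, names):
    """Return every event name that contains pattern (assumed rule)."""
    return [name for name in names if pattern in name]


def resolve_event(pattern, names):
    """Return the single matching event name, or raise ValueError."""
    if pattern in names:  # an exact name is never ambiguous
        return pattern
    matches = match_events(pattern, names)
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise ValueError(f'Unknown event name ("{pattern}")')
    raise ValueError(
        f'Ambiguous event name ("{pattern}"). Matches ' + ", ".join(matches)
    )


print(resolve_event("DATA_READ.MISS", L3_READS_EVENTS))
try:
    resolve_event("L3_READ", L3_READS_EVENTS)
except ValueError as err:
    print(err)
```

A fragment that matches several events raises an error listing all of them, which is the same discovery aid the slide describes.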
40. Hints and tips: caliper command (cont.)
- Getting report help
  - Dump help file for cycles measurement
    - caliper info -r cycles
  - Append help to a report
    - caliper cycles --info -o out.txt myprog
- Providing command options using a file
  - caliper fprof -f myOptionsFile
- Helping Caliper find
  - Source code
    - --source-path-map dirmap[:dirmap]
  - Symbols and disassembly
    - --module-search-path dir[:dir]
  - Where dirmap is old_path,new_path
41. Hints and tips: using views
Close
Restore views
Minimize
Restore default locations
Maximize
Local view menu
Common view menu (right-click on tab)
Detached view (not supported by Motif)
42. Summary
- Itanium execution performance tool
- Measures production applications
- Measures entire system
- Wide range of performance metrics available
- Explore performance data using textual or graphical reports
- Help available from caliper-help_at_cup.hp.com
- Available on HP-UX and Linux
- http://www.hp.com/go/caliper
43. DSPP Tools & Resources for Itanium 2 Architecture Set You Up for Success
- Community
  - Itanium architecture forums, source code repository, document sharing and mailing lists
- Training and Education
  - online and classroom training
- News & Events
- Software
  - development environments, compilers, operating systems, installation/configuration tools, performance tools and more
- Technical documentation
  - white papers, tutorials, reference documents and manuals, FAQs, known problems, sample code, etc.
- Partner Resources
  - webconferencing services
  - podcast production services
  - trade show discounts
- Equipment
  - rentals and purchase discounts
44. Where to go
- Software Developer Resource Kit for the Intel Itanium 2 microarchitecture: www.hp.com/go/hpitaniumdvd
- Development and Business Resources from HP & Intel for HP Integrity-based solutions: www.hp.com/go/dspp-eap
- Contact points for additional information
  - Americas: email dspp.dev_at_hp.com, telephone 1.800.249.3294
  - Europe: email dspp.emea_at_hp.com, telephone 800.100.929.70
  - Asia-Pac: email hpdev.support_at_hp.com
  - or go to www.hp.com/go/dspp for local country phone numbers
45. Complete Survey to Win
- HP & Intel are giving away an HP laptop to one lucky winner!
- Promotion Period ends November 19, 2006
- Attend a webcast AND complete the post-event survey.
- Full promotion details can be found on DSPP at http://h21007.www2.hp.com/dspp/bus/bus_BusDetailPage_IDX/1,1252,9284,00.html
46. More Events
- Tuesday, October 24: New Dual-Core Processor and Server Hardware
- Tuesday, November 28: OpenMP
- Tuesday, December 19: HP-MPI
Sign up for the DSPP newsletter to get the latest webcast information sent to you directly.
Webcast replays may also be found at www.hp.com/go/itaniumwebcasts
Did you know... that your company can use this same webconferencing tool at a discounted price to promote your HP Integrity solutions to your staff and customers? For members only: http://h21007.www2.hp.com/dspp/bus/bus_BusDetailPage_IDX/1,,9173!0!,00.html
47. Intel Early Access Program - Technology
- The Early Access Program (EAP) gives you access to Intel technology to support your current development cycle as well as early access to tools and information on new technologies. Your membership includes:
  - Early access to pre-release software development platforms
  - Access to Intel and 3rd party software and testing tools
  - Training through Intel Software College and Web events
  - Technical content and how-to articles
  - Protected remote access to easily evaluate and develop software safely and securely on platforms over the Internet
48. Intel Early Access Program - Marketing Opportunities and Support
- Extensive marketing and business development opportunities
  - Inclusion in online and print versions of the Intel Developer Solutions Catalog
  - Intel quotes to support your PR
  - Case studies
  - Access to Intel's event marketing asset kit
  - Participation in selected industry events and trade shows
- Support in your development efforts provided through
  - Access to an Intel Account Representative who will act as your primary contact
  - Intel Premier Support for confidential technical support
  - 24/7 online support via www.intel.com/software/support
49. Related Intel Resources
- Intel Early Access Program
  - http://www.intel.com/software/EAP
- Intel Software Network
  - http://www.intel.com/software
- Intel Software College
  - http://www.intel.com/software/college
- Intel Software Development Tools
  - http://www.intel.com/software/products
- Experience Intel Itanium 2 Architecture
  - http://www.intel.com/cd/ids/developer/asmo-na/eng/66176.htm
50. Q&A Session
To ask a question over the phone, press 1 on your touch-tone telephone.